Word Embedding Fairness Evaluation

By Pablo Badilla and Felipe Bravo-Marquez.

  Word embeddings are dense vector representations of words trained from document corpora.

They have become a core component of natural language processing (NLP) downstream systems because of their ability to efficiently capture semantic and syntactic relationships between words.

A widely reported shortcoming of word embeddings is that they are prone to inherit stereotypical social biases exhibited in the corpora on which they are trained.

The problem of how to quantify the mentioned biases is currently an active area of research, and several different fairness metrics have been proposed in the literature in the past few years.

Although all metrics have a similar objective, the relationship between them is by no means clear.

Two issues prevent a clean comparison: the metrics operate on different inputs (pairs of words, sets of words, multiple sets of words, and so on), and their outputs are incompatible with each other (real numbers, positive numbers, values in a bounded range, etc.).


This leads to a lack of consistency between them, which causes several problems when trying to compare and validate their results.

We propose the Word Embedding Fairness Evaluation (WEFE) as a framework for measuring fairness in word embeddings, and we released its implementation as an open-source library.

  We propose an abstract view of a fairness metric as a function that receives queries as input, with each query formed by sets of target words and sets of attribute words.

The target words describe the social groups in which fairness is intended to be measured (e.g., women, white people, Muslims), and the attribute words describe traits or attitudes by which a bias towards one of the social groups may be exhibited (e.g., pleasant vs. unpleasant terms).
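To make the query abstraction concrete, here is a minimal sketch in plain Python. The class name `FairnessQuery`, its fields, and the short word lists are illustrative assumptions made for this example; this is not WEFE's own Query class.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FairnessQuery:
    """Hypothetical container for one query: target sets plus attribute sets."""

    target_sets: List[List[str]]     # words describing the social groups
    attribute_sets: List[List[str]]  # words describing traits or attitudes
    target_names: List[str]
    attribute_names: List[str]

    def template(self) -> Tuple[int, int]:
        # (number of target sets, number of attribute sets)
        return (len(self.target_sets), len(self.attribute_sets))


# Illustrative word lists only; real studies use much larger, curated sets.
query = FairnessQuery(
    target_sets=[["john", "paul"], ["amy", "lisa"]],
    attribute_sets=[["career", "salary"], ["family", "home"]],
    target_names=["Male names", "Female names"],
    attribute_names=["Career", "Family"],
)
print(query.template())
```

The shape returned by `template()` captures the point made above: metrics differ in how many target and attribute sets they expect, so a query needs to carry that information.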

For more details on the framework, you can read our recently accepted IJCAI paper [1].

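WEFE implements several fairness metrics, including WEAT, which is used throughout this tutorial.

[Diagram: the standard process for measuring bias using WEFE.]

There are two different ways to install WEFE.

In the following example, we measure the gender bias of word2vec using the WEAT metric.

Given a word embedding $w$, WEAT first defines the measure

$$d(w, A) = \operatorname{mean}_{x \in A} \cos(w, x),$$

where $\cos(w, x)$ is the cosine similarity of the word embedding vectors.

Then, for a query $Q = (\{T_1, T_2\}, \{A_1, A_2\})$, the WEAT metric is defined over the embeddings of the query word sets as:

$$F_{\mathrm{WEAT}}(Q) = \sum_{w \in T_1} \big(d(w, A_1) - d(w, A_2)\big) - \sum_{w \in T_2} \big(d(w, A_1) - d(w, A_2)\big).$$

The idea is that the more positive the value given by $F_{\mathrm{WEAT}}$, the more the target $T_1$ will be related to attribute $A_1$ and target $T_2$ to attribute $A_2$.

On the other hand, the more negative the value, the more target $T_1$ will be related to attribute $A_2$ and target $T_2$ to attribute $A_1$.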

Commonly these values are between  and .

The ideal score is 0.
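As a rough illustration of how a WEAT-style score behaves, the sketch below computes it in plain Python on tiny hand-made 2-dimensional vectors. It assumes the usual formulation (the mean cosine similarity of a word to each attribute set, differenced across the two attribute sets and then across the two target sets). The function names and the toy vectors are made up for this example; this is not WEFE's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity(w, attribute_vectors):
    """Mean cosine similarity between w and every vector of an attribute set."""
    return sum(cosine(w, x) for x in attribute_vectors) / len(attribute_vectors)


def weat_score(t1, t2, a1, a2):
    """WEAT-style score for two target sets and two attribute sets."""

    def association(w):
        # How much closer w is to the first attribute set than to the second.
        return mean_similarity(w, a1) - mean_similarity(w, a2)

    return sum(association(w) for w in t1) - sum(association(w) for w in t2)


# Toy vectors (assumption): target_1 points roughly along attribute_1,
# target_2 roughly along attribute_2.
target_1 = [[1.0, 0.0], [0.9, 0.1]]
target_2 = [[0.0, 1.0], [0.1, 0.9]]
attribute_1 = [[1.0, 0.1]]
attribute_2 = [[0.1, 1.0]]

score = weat_score(target_1, target_2, attribute_1, attribute_2)
print(score)
```

Swapping the two target sets flips the sign of the score, and comparing a target set with itself yields zero, which matches the reading of positive, negative, and ideal values given above.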


We first load a word embedding model using the gensim API.


Then, we create the Query object using the target word sets (Male names and Female names) and two attribute word sets (Career and Family terms).


Finally, we run the Query using WEAT as the metric.

  As we can see, the execution returns a dict with the name of the executed query and its score.

The score is positive and greater than one, which indicates that word2vec exhibits a moderately strong relationship between men's names and careers, and between women's names and family.

Running multiple Queries

In WEFE, we can easily test multiple queries in a single call:

1. Create the queries:

2. Add the queries to an array:

3. Run the queries using WEAT:

Note that these results are returned as DataFrame objects.

 We can see that in all cases, male names are positively associated with career, science and math words, whereas female names are more associated with family and art terms.

While the above results give us an idea of the gender bias that word2vec exhibits, we would also like to know how these biases occur in other word embedding models.

We run the same queries on two other embedding models: “glove-wiki” and “glove-twitter”.


1. Load the GloVe models and execute the queries again:

2. We can also plot the results:

The execution of run_queries in the previous step gave us various result scores.

However, these do not tell us much about the overall fairness of the embedding models.

We would like to have some mechanism to aggregate these results into a single score.

To do this, when using run_queries, you can set the aggregate_results parameter to True.

This aggregates the results by averaging their absolute values and places the aggregated score in the last column of the result table.

It is also possible to ask the function run_queries to return only the aggregated results by setting the return_only_aggregation parameter to True.
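The aggregation itself is simple arithmetic. The following sketch shows the averaging of absolute values in plain Python, independent of WEFE. The model names and scores are made-up placeholders chosen only to make the computation easy to follow; they are not results from any real embedding.

```python
def aggregate_abs_mean(scores):
    """Average of the absolute values of a list of query scores."""
    return sum(abs(s) for s in scores) / len(scores)


# Hypothetical per-query scores for two hypothetical models (not real results).
results = {
    "model_a": [1.2, -0.4, 0.8],
    "model_b": [0.3, 0.1, -0.2],
}

aggregated = {name: aggregate_abs_mean(scores) for name, scores in results.items()}
print(aggregated)
```

Taking absolute values before averaging keeps biases of opposite sign from cancelling each other out, so the aggregated number reflects the magnitude of bias rather than its direction.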

  The idea of this type of aggregation is to quantify the amount of bias of the embedding model according to various queries.

In this case, we can see that glove-twitter has a lower amount of gender bias than the other models.

  Finally, we would like to rank these embedding models according to the overall amount of bias they contain.

This can be done using the create_ranking function, which calculates a fairness ranking from one or more query results.
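Conceptually, a ranking of this kind orders the models by their aggregated score. The sketch below shows that idea in plain Python, assuming that a smaller aggregated absolute score means less bias and therefore a better (lower) rank. The function `rank_models` and the scores are hypothetical, ties are not handled specially, and this is not the create_ranking implementation.

```python
def rank_models(aggregated_scores):
    """Map each model name to its rank, where rank 1 has the smallest score."""
    ordered = sorted(aggregated_scores, key=aggregated_scores.get)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Hypothetical aggregated scores (not real results).
toy_scores = {"model_a": 0.8, "model_b": 0.2, "model_c": 0.5}
ranking = rank_models(toy_scores)
print(ranking)
```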

  You can see this tutorial code in this notebook and the complete reference documentation including a user guide, examples and replication of previous studies at the following link.

If you like the project, you are more than welcome to “star” it on GitHub.

[1] P. Badilla, F. Bravo-Marquez, and J. Pérez. WEFE: The Word Embeddings Fairness Evaluation Framework. In Proceedings of the 29th International Joint Conference on Artificial Intelligence and the 17th Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI 2020), Yokohama, Japan.
