🌱Evaluate Vector Bias

A guide to evaluating whether vectors are biased.

Assumed Knowledge: Vectors
Target Audience: Data Scientists, Vector Enthusiasts, Python Developers
Reading Time: 3 minutes

To identify bias in the representation space, we want to know which direction a vector leans towards. One way to see this is with normed 2D cosine similarity plots, which compare how similar each word vector is to two anchor terms.
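
The following is a minimal sketch of the raw material behind such a plot. It L2-normalises each word vector and computes the vector's cosine similarity to two anchor vectors, which gives one (x, y) point per word. We assume "normed" refers to this L2 normalisation. The toy 3-dimensional vectors and the helper names (`normalize`, `cosine_similarity`, `project_2d`) are hypothetical and exist only for illustration. This is not VectorAI's implementation.

```python
import math


def normalize(v):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine similarity equals the dot product of the two unit vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def project_2d(vectors, anchor_x, anchor_y):
    """Map each word to (similarity to anchor_x, similarity to anchor_y)."""
    return {
        word: (cosine_similarity(vec, anchor_x), cosine_similarity(vec, anchor_y))
        for word, vec in vectors.items()
    }


# Hypothetical toy vectors; real ones would come from your embedding model.
anchor_male = [1.0, 0.1, 0.0]
anchor_female = [0.1, 1.0, 0.0]
words = {
    "prince": [0.9, 0.2, 0.3],
    "princess": [0.2, 0.9, 0.3],
}

coords = project_2d(words, anchor_male, anchor_female)
for word, (x, y) in coords.items():
    print(f"{word}: male={x:.3f}, female={y:.3f}")
```

Cosine similarity is already scale-invariant, so the explicit normalisation does not change the result. It simply makes the "normed" step visible.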

Below, we explore how the model interprets certain terms and how biased they are. Let us start with a male vs female comparison and then look at what the chart shows.

Male Vs Female

Let us go over the main takeaways from the male vs female chart:

  • A purple bar indicates that a word is more biased towards "female", whereas a green bar indicates that the word is more biased towards "male".

  • The words "princess", "skirt", "perfume" and "make-up" are all strongly tied to females.

  • By contrast, the words "computers", "football", "machine", "beer" and "prince" are all strongly tied to males.

  • The magnitudes of the cosine similarities are also informative. They indicate that "princess" is more strongly tied to "female" than "skirt", "perfume" or "make-up" are. Conversely, "prince" is more strongly tied to "male" than "beer" or "machine" are.
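
One plausible way to collapse the two similarities into a single signed bar is to subtract them. The sign then says which anchor a word leans towards (the bar colour), and the absolute value says how strongly it leans (the bar length). This is our assumption about how such a chart could be built, not a description of VectorAI's code. The vectors and the `bias_scores` helper are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the two L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def bias_scores(vectors, anchor_a, anchor_b):
    """Positive score: leans towards anchor_a. Negative: leans towards anchor_b."""
    return {
        word: cosine_similarity(vec, anchor_a) - cosine_similarity(vec, anchor_b)
        for word, vec in vectors.items()
    }


# Hypothetical toy vectors, for illustration only.
female = [0.1, 1.0, 0.0]
male = [1.0, 0.1, 0.0]
words = {
    "princess": [0.1, 0.95, 0.1],
    "perfume": [0.3, 0.7, 0.6],
    "beer": [0.6, 0.3, 0.7],
    "prince": [0.95, 0.1, 0.1],
}

scores = bias_scores(words, female, male)

# Sort by absolute score so the most strongly tied words come first.
for word, score in sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True):
    side = "female" if score > 0 else "male" if score < 0 else "neither"
    print(f"{word:10s} {score:+.3f} -> leans {side}")
```

With these made-up vectors, "princess" gets a larger score than "perfume", and "prince" a more negative one than "beer". This mirrors the kind of magnitude comparison described above.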

Once we better understand the hidden bias in our models, we may want to fine-tune these vectors and models.

Analysing Between Groups

The use cases of vectors extend even further. For example, when optimising a retail layout, we may want to decide where to place items and categories so that customers can intuitively go to a section, find what they need and convert. To do this, we need to decide which section each item belongs in. The bias indicator can help determine where a particular item should go.
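
A minimal sketch of that idea assumes that each item and each section is represented by an embedding. Each item is then placed in the section whose vector it is most similar to. The section names, the vectors and the `assign_sections` helper are hypothetical. The snippet illustrates the principle only and is not a VectorAI API.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the two L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def assign_sections(items, sections):
    """Place each item in the section whose vector it is most similar to."""
    placement = {}
    for item, item_vec in items.items():
        best = max(
            sections,
            key=lambda name: cosine_similarity(item_vec, sections[name]),
        )
        placement[item] = best
    return placement


# Hypothetical section and item vectors, for illustration only.
sections = {
    "technology": [1.0, 0.0, 0.2],
    "home gardening": [0.0, 1.0, 0.2],
}
items = {
    "tablet": [0.9, 0.1, 0.3],
    "garden hose": [0.1, 0.9, 0.3],
}

placement = assign_sections(items, sections)
print(placement)  # → {'tablet': 'technology', 'garden hose': 'home gardening'}
```

Taking the single most similar section works for any number of sections. With exactly two sections, it is equivalent to checking the sign of the bias score between them.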

In the graph for this example, we take a similar look into the representation space. It compares how different household items relate to two categories: technology and home gardening.

  • Telephones, televisions, tablets, PCs and phones are more biased towards technology than home gardening.

  • Manure and the garden hose are more biased towards home gardening than technology.

  • The cultivator is only slightly more biased towards home gardening than technology. This may be because cultivators are useful for gardening (not only home gardening), but they are more often used on farms and are themselves a product of technology.
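
The cultivator case suggests that the size of the gap matters as much as its sign. The sketch below assumes that bias is measured as the difference between an item's cosine similarities to the two category vectors. It flags items whose difference falls below a margin as ambiguous. The margin of 0.1, the vectors and the `classify_with_margin` helper are all hypothetical choices for illustration.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the two L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_with_margin(vectors, anchor_a, anchor_b, label_a, label_b, margin=0.1):
    """Label each item by the anchor it leans towards, flagging near-ties."""
    results = {}
    for item, vec in vectors.items():
        diff = cosine_similarity(vec, anchor_a) - cosine_similarity(vec, anchor_b)
        if diff > 0:
            lean = label_a
        elif diff < 0:
            lean = label_b
        else:
            lean = "neither"
        # Small differences are reported as ambiguous rather than a clear lean.
        results[item] = f"ambiguous (slightly {lean})" if abs(diff) < margin else lean
    return results


# Hypothetical toy vectors, for illustration only.
technology = [1.0, 0.0, 0.0]
gardening = [0.0, 1.0, 0.0]
items = {
    "television": [0.9, 0.1, 0.2],
    "manure": [0.05, 0.9, 0.3],
    "cultivator": [0.5, 0.55, 0.4],
}

results = classify_with_margin(items, technology, gardening, "technology", "home gardening")
for item, label in results.items():
    print(f"{item}: {label}")
```

The right margin depends on the model and the anchors, so in practice it would need tuning against examples you have already judged by hand.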

Using The Bias Indicator In VectorAI

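```python
from vectorai import ViClient
vi = ViClient(username, api_key)  # your VectorAI username and API key
# anchor_docs: documents for the anchor terms; docs: documents to evaluate, identified by their 'word' field
vi.bias_indicator(anchor_docs, docs, metadata_field='word')
```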

This guide would not have been possible without the following paper, whose authors open-sourced their work for research purposes and for others to improve on.

@inproceedings{molino2019parallax,
  author = {Piero Molino and Yang Wang and Jiwei Zhang},
  booktitle = {ACL},
  title = {Parallax: Visualizing and Understanding the Semantics of Embedding Spaces via Algebraic Formulae},
  year = {2019},
}
