Bias for Word Embeddings and LLMs

Angelina Yang
3 min read · Sep 18, 2023

Welcome to today’s data science interview challenge! Many of you are already experts in data science and machine learning. And, THANK YOU, for sitting with me while we review some of these concepts. If you are new to the space, don’t worry — we are here to support you throughout your journey as well!

Now back to the basics:

Question 1: What’s the best way to tackle bias issues with word embeddings?

Question 2: What about biases within LLMs?


Here are some tips for readers’ reference:

Question 1:

What’s the best way to tackle bias issues with word embeddings? There are algorithmic approaches (see a previous post on this topic), but solving the problem purely algorithmically has proven very challenging.
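To make the algorithmic idea concrete, here is a minimal sketch of the "neutralize" step from hard debiasing (Bolukbasi et al., 2016): project out the component of a word vector that lies along an estimated bias direction. The vectors below are toy values for illustration, not from any real pre-trained model.

```python
import numpy as np

def neutralize(v, bias_dir):
    """Remove the component of v that lies along bias_dir
    (the 'neutralize' step of hard debiasing)."""
    bias_dir = bias_dir / np.linalg.norm(bias_dir)
    return v - (v @ bias_dir) * bias_dir

# Toy 3-d "embeddings" (illustrative values only).
he     = np.array([ 1.0, 0.3, 0.2])
she    = np.array([-1.0, 0.3, 0.2])
doctor = np.array([ 0.5, 0.9, 0.4])

# Estimate a gender direction from a definitional pair, then debias.
gender_dir = he - she
debiased = neutralize(doctor, gender_dir)

# After neutralizing, 'doctor' has zero projection on the gender direction.
print(debiased @ (gender_dir / np.linalg.norm(gender_dir)))
```

One caveat this sketch already hints at: removing one estimated direction does not remove all bias — associations can survive in the remaining dimensions, which is part of why purely algorithmic fixes have proven so hard.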

If you use pre-trained word embeddings that have been trained on substantial portions of the internet, there’s going to be bias in there. Eliminating this bias is hard.
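A quick way to see such bias for yourself is to probe how occupation words project onto a demographic direction. The sketch below uses hand-made toy vectors (real pre-trained embeddings such as GloVe or word2vec show the same qualitative pattern).

```python
import numpy as np

# Toy 4-d "embeddings" (illustrative values, chosen so the occupation
# words inherit a gender skew, as real pre-trained embeddings often do).
emb = {
    "man":      np.array([ 1.0, 0.2, 0.1, 0.0]),
    "woman":    np.array([-1.0, 0.2, 0.1, 0.0]),
    "engineer": np.array([ 0.6, 0.8, 0.3, 0.1]),
    "nurse":    np.array([-0.6, 0.8, 0.3, 0.1]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Probe: project occupation words onto the man-woman direction.
gender_dir = emb["man"] - emb["woman"]

for word in ("engineer", "nurse"):
    print(f"{word}: gender projection = {cosine(emb[word], gender_dir):+.2f}")
```

Opposite-signed projections for supposedly neutral occupations are exactly the kind of learned association that is so difficult to scrub out of embeddings trained on web-scale text.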

How exactly these biases propagate is really hard to predict, so the best place to deal with embedding bias is at the application level — in the system that actually consumes the embeddings.
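One common application-level check is a counterfactual swap test: run otherwise identical inputs that differ only in a demographic attribute through the application and compare the outputs. The `score_resume` function below is a hypothetical stand-in for whatever your application computes; the names and template are illustrative.

```python
# Hypothetical toy scorer standing in for a real application; it counts
# matching keywords and, by construction, ignores the applicant's name.
def score_resume(text: str) -> float:
    keywords = {"python", "ml", "engineer"}
    return sum(w in keywords for w in text.lower().split()) / len(keywords)

template = "{name} is a Python ML engineer with five years of experience."
names = ["John", "Maria", "Wei", "Aisha"]

scores = {name: score_resume(template.format(name=name)) for name in names}
spread = max(scores.values()) - min(scores.values())
print(scores)
print(f"spread across names: {spread:.2f}")  # nonzero spread flags bias
```

For an embedding- or LLM-backed scorer, a nonzero spread here is a direct, measurable signal of bias in the place it actually matters: the application's outputs.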

If there’s a big risk that your application can
