What Is 🅕🅞🅞🅓 For AI 🤖❓

May 29, 2022

Dr. Andrew Ng gave a well-known talk last year about data-centric AI, which did much to shift the industry's machine learning development focus from models toward data. In it he introduced the analogy of data as “food for AI”.

How to prep this “food”🍖 to cook a good “meal”🍲?

A few posts ago we talked about feature stores and “model-ready data”. Today, to answer this question, we will dive deeper into data selection for model development.

To extend Andrew’s analogy, we can think of feature stores as grocery stores where we pick out the veggies and meat (raw data) we want to cook with. The selected data is what we bring home, lay on our chopping board, and stare at for a moment while deciding how to cook it. Once the veggies are washed and chopped and the meat is sliced, diced, and ready for cooking, we call them “model-ready data”.

How to select data?

Careful data selection saves resources.

When dealing with big data, one crucial yet resource-demanding task in training modern machine learning models is data annotation. Since labeling every example is rarely affordable, we can rephrase the problem statement as:

How do we efficiently identify the most informative training examples?
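One common way to put this question into practice (an illustration, not necessarily the approach this post goes on to describe) is uncertainty sampling from active learning: score each unlabeled example by how unsure the current model is about it, then spend the annotation budget on the most uncertain ones. The sketch below assumes the model already outputs a class probability distribution per example; the function names and the toy predictions are hypothetical.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    # Skip zero probabilities, since p * log(p) -> 0 as p -> 0.
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(pool_probs: Sequence[Sequence[float]], budget: int) -> list[int]:
    """Hypothetical helper: indices of the `budget` pool examples with the highest entropy.

    `pool_probs[i]` is the model's predicted class distribution for unlabeled example i.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,  # most uncertain first
    )
    return ranked[:budget]


if __name__ == "__main__":
    # Toy predictions from a current model on four unlabeled examples.
    pool_probs = [
        [0.98, 0.02],  # confident: a label would teach the model little
        [0.55, 0.45],  # close to the decision boundary
        [0.10, 0.90],
        [0.50, 0.50],  # maximally uncertain for two classes
    ]
    print(select_most_uncertain(pool_probs, budget=2))  # [3, 1]
```

Entropy is only one possible uncertainty score; margin between the top two classes or least-confidence scoring are common alternatives, and in practice the selected examples are labeled, added to the training set, and the loop repeats with a retrained model.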


Written by Angelina Yang
