What Is 🅕🅞🅞🅓 For AI 🤖❓

Angelina Yang
4 min read · May 29, 2022

Dr. Andrew Ng gave a famous talk last year about data-centric AI, which largely shifted the focus of machine learning development in the industry. He introduced the analogy of data as “food for AI”.

Watch Andrew!

How to prep this “food”🍖 to cook a good “meal”🍲?

A few posts ago we talked about feature stores and “model-ready data”. Today, to answer this question, we will dive deeper into data selection for model development.

To extend Andrew’s analogy, we can think of feature stores as grocery stores where we pick out the veggies and meat (raw data) that we want to cook with, and of selected data as what we bring home, lay on our chopping board and stare at for a moment while thinking about how to cook it. Once the veggies are washed and chopped, and the meat is sliced, diced and ready for cooking, we call them “model-ready data”.

How to select data?

Careful data selection saves resources.

When dealing with big data, one crucial yet resource-demanding task in training modern machine learning models is data annotation. Every example we choose to keep is an example someone has to label, so we can rephrase the problem statement as:

How do we efficiently identify the most informative training examples?
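
To make the question a bit more concrete, here is a minimal sketch of one common way to rank unlabeled examples: uncertainty sampling, where we send to annotators the examples the current model is least sure about. This is an illustrative assumption and not necessarily the method this post has in mind. The function names, the toy `predictions` list and the `budget` parameter are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(predictions, budget):
    """Return the indices of the `budget` examples with the highest entropy.

    `predictions` is a list of class-probability lists, one per unlabeled
    example (hypothetical model outputs, assumed to sum to 1).
    """
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,  # most uncertain first
    )
    return ranked[:budget]


# Made-up model outputs for four unlabeled examples (three classes each).
predictions = [
    [0.98, 0.01, 0.01],  # the model is confident
    [0.34, 0.33, 0.33],  # the model is close to guessing
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],  # torn between two classes
]

# With an annotation budget of two labels, pick the two most uncertain examples.
to_annotate = select_most_uncertain(predictions, budget=2)
print(to_annotate)
```

The idea is that a label for an example the model already gets right teaches it little, so a limited annotation budget goes further on the examples it finds confusing. Entropy is only one possible score; other measures of informativeness could be swapped into `select_most_uncertain` without changing the rest.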
