What Is Food For AI?
Dr. Andrew Ng gave a well-known talk last year on data-centric AI, which largely shifted the focus of machine learning development in the industry. He introduced the analogy of data as “food for AI”.
How to prep this “food” to cook a good “meal”?
A few posts ago we talked about feature stores and “model-ready data”. Today, to answer this question, we will dive deeper into data selection for model development.
To extend Andrew’s analogy, we can think of feature stores as grocery stores where we pick out the veggies and meat (raw data) that we want to cook with, and of selected data as what we bring home, lay on the chopping board, and stare at for a moment while deciding how to cook it. Once the veggies are washed and chopped and the meat is sliced and diced, everything is ready for cooking, and we call it “model-ready data”.
How to select data?
Careful data selection saves resources.
When dealing with big data, one crucial yet resource-demanding task in training modern machine learning models is data annotation. So we can rephrase the problem statement as:
How do we efficiently identify the most informative training examples?