“All You Need Is Attention”😻 for Data Quality Management
The Notorious Data Problem
Frankly, I don’t know any data scientist or ML engineer who would say they’ve never encountered data quality issues. In fact, most people I’ve asked admit what a painstaking and time-consuming task dealing with them can be. A friend of mine claimed her job was a “bait and switch” because of how much data engineering and cleaning she had to do compared with machine learning.
Data quality management is a bottleneck in modern analytics because high-effort tasks such as data validation and cleaning are essential for obtaining accurate results.
The notorious data quality problem involves all kinds of plumbing work such as:
- Data preparation and validation (transformation, standardization, error detection, data repairs)
- Data integration and cataloguing
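To make the preparation and validation tasks above more concrete, here is a minimal Python sketch that runs transformation, standardization, error detection, and data repair over a few toy records. Everything in it is a hypothetical assumption chosen for illustration, not a rule from this post: the field names, the country alias table, the 0 to 120 valid age range, and the policy of repairing bad ages with the median of the valid ones.

```python
import statistics

# Hypothetical raw records; field names and values are illustrative only.
RAW = [
    {"id": 1, "country": " usa ", "age": "34"},
    {"id": 2, "country": "U.S.A.", "age": "-5"},
    {"id": 3, "country": "Canada", "age": ""},
]

# Assumed alias table for standardization (not from the original post).
COUNTRY_ALIASES = {"usa": "US", "u.s.a.": "US", "canada": "CA"}


def standardize(record):
    """Transform and standardize: trim whitespace, map aliases, cast age to int."""
    country = record["country"].strip().lower()
    age_text = record["age"].strip()
    return {
        "id": record["id"],
        "country": COUNTRY_ALIASES.get(country, country.upper()),
        "age": int(age_text) if age_text else None,
    }


def detect_errors(record):
    """Error detection: return the rule violations found in one record."""
    if record["age"] is None:
        return ["age: missing"]
    if not 0 <= record["age"] <= 120:  # assumed plausible age range
        return ["age: out of range"]
    return []


def repair(record, default_age):
    """Data repair: replace a missing or invalid age with a default value."""
    if detect_errors(record):
        return {**record, "age": default_age}
    return record


clean = [standardize(r) for r in RAW]
# Impute with the median of the ages that passed validation (an assumed policy).
default_age = statistics.median(r["age"] for r in clean if not detect_errors(r))
repaired = [repair(r, default_age) for r in clean]

for before, after in zip(clean, repaired):
    print(before["id"], detect_errors(before), after)
```

Even this toy version needs hand-written alias tables, validity rules, and an imputation policy, which hints at why this plumbing work consumes so much time on real datasets.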
What Is the Motivation to Solve This Pain Point?
It’s obvious, but let me put it in words:
1. Both analytics results and ML models are sensitive to low-quality data. See our previous post about how vulnerable SOTA…