
“All You Need Is Attention”😻 for Data Quality Management

Angelina Yang
5 min read · Aug 11, 2022


The Notorious Data Problem

Frankly, I don’t know any data scientist or ML engineer who would say they’ve never encountered data quality issues. In fact, most people I’ve asked admit what a painstaking and time-consuming task dealing with them can be. A friend of mine claimed her job was a “bait and switch” because of how much data engineering and cleaning she had to do compared with actual machine learning.

Data quality management is a bottleneck in modern analytics: high-effort tasks such as data validation and cleaning are essential for obtaining accurate results.

The notorious data quality problem involves all kinds of plumbing work, such as:

  • Data preparation and validation (transformation, standardization, error detection, data repairs)
  • Data integration and cataloguing
Example of data quality issues: missing values and repairs
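To make “error detection” and “data repairs” concrete, here is a minimal, hypothetical sketch of the kind of hand-written plumbing described above. It flags cells that are missing or fall outside a plausible range, then repairs them with the median of the remaining valid values. The function names, the `age` column, and the 0 to 120 range are illustrative assumptions, not taken from this post, and median imputation is a deliberately naive baseline rather than the attention-based approach the title refers to.

```python
from statistics import median
from typing import Optional

# Hypothetical toy table: each row is a dict; None marks a missing value.
Row = dict[str, Optional[float]]


def detect_errors(rows: list[Row], column: str, lo: float, hi: float) -> list[int]:
    """Return indices of rows whose value in `column` is missing or outside [lo, hi]."""
    bad = []
    for i, row in enumerate(rows):
        value = row.get(column)
        if value is None or not (lo <= value <= hi):
            bad.append(i)
    return bad


def repair_with_median(rows: list[Row], column: str, bad: list[int]) -> list[Row]:
    """Replace flagged cells with the median of the unflagged cells (naive repair).

    Returns new row dicts and leaves the input untouched. Raises
    statistics.StatisticsError if every row was flagged.
    """
    bad_set = set(bad)
    valid = [row[column] for i, row in enumerate(rows) if i not in bad_set]
    fill = median(valid)
    repaired = []
    for i, row in enumerate(rows):
        new_row = dict(row)  # copy so the original table is not mutated
        if i in bad_set:
            new_row[column] = fill
        repaired.append(new_row)
    return repaired


rows = [{"age": 34.0}, {"age": None}, {"age": 29.0}, {"age": 420.0}, {"age": 41.0}]
bad = detect_errors(rows, "age", lo=0, hi=120)
print(bad)  # [1, 3]
fixed = repair_with_median(rows, "age", bad)
print([r["age"] for r in fixed])  # [34.0, 34.0, 29.0, 34.0, 41.0]
```

Even this toy version needs a hand-picked rule (the valid range) and a hand-picked repair strategy for each column, which hints at why doing this across many tables and columns becomes such a time sink.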

What Is the Motivation to Solve This Pain Point?

It’s obvious, but let me put it in words:

1. Both analytics results and ML models are sensitive to low-quality data. See our previous post about how vulnerable SOTA…
