What about clean data?

Data quality is an important part of any big data project. Recently, we have seen cases where bad data has affected the outcome of some major projects. One example is Amazon's AI recruiting tool, which showed bias against women. The underlying issue here is bad data, which tends to affect the final result. The model is built from the data, and our hypothesis is drawn from that model. If the data is wrong, everything else is going to be wrong.

Now, data cleaning is not an easy task. Over 50% of developers' time is spent cleaning and organizing the data, and this is not to mention the time spent collecting viable data sets. We have to ensure that data is accessible and available as and when needed. With all of the privacy issues going on now, organizations have to ensure that their data meets the relevant privacy requirements, especially for EU businesses.

We probably heard about this in our Database 101 class: data consistency and integrity. I think these sometimes affect the result of our analysis too. Times have changed, and one might say the reason some AI systems have “failed” or appear to be biased is that the data sets have not changed with the times we are in. I guess that is more a matter of the integrity of the data than of its consistency.
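To make those Database 101 ideas a little more concrete, here is a rough sketch of the kind of checks I have in mind: missing values, duplicate ids, and records that have not been updated in a long time. The function name `check_records`, the field names (`id`, `name`, `updated`) and the one-year cutoff are all made up for illustration; a real project would pick its own rules.

```python
from datetime import date


def check_records(records, required_fields, max_age_days, today):
    """Return (index, problem) pairs for records that fail basic checks.

    Hypothetical helper for illustration only, not a standard API.
    """
    problems = []
    seen_ids = set()
    for i, rec in enumerate(records):
        # Completeness: every required field must be present and non-empty.
        for field in required_fields:
            if rec.get(field) in (None, ""):
                problems.append((i, f"missing {field}"))
        # Uniqueness: the same id should not appear twice.
        rec_id = rec.get("id")
        if rec_id is not None:
            if rec_id in seen_ids:
                problems.append((i, "duplicate id"))
            seen_ids.add(rec_id)
        # Freshness: flag records that have not changed with the times.
        updated = rec.get("updated")
        if updated is not None and (today - updated).days > max_age_days:
            problems.append((i, "stale"))
    return problems


# Made-up sample data, assumed purely for the example.
records = [
    {"id": 1, "name": "Ada", "updated": date(2024, 1, 10)},
    {"id": 2, "name": "", "updated": date(2024, 1, 12)},
    {"id": 1, "name": "Ada", "updated": date(2015, 6, 1)},
]
problems = check_records(records, ["id", "name", "updated"], 365, date(2024, 2, 1))
for index, problem in problems:
    print(f"record {index}: {problem}")
```

Checks like these will not catch bias on their own, but they are a cheap first pass before any analysis starts.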

What if the data was “perfect” and the result was still “wrong” or “unacceptable”? Could it be that human bias is to blame? Or could it be that the right data was used in the wrong scenario, one different from what the original model was built for?


  1. You are right that for any analysis, the bulk of the time is spent on data cleaning, and it’s an important step that can’t be bypassed. My former company once had a project where all we did was verify and validate data and then pass it on to another organization that had the pleasure of doing the analysis on the clean data. I was the “scapegoat” analyst who cleaned the data.
