The Missing Data Twins (aka Null|NaN)

Yet another beautiful day in data town. pd is having a field day: since the missing import issue was fixed we have been on a roll, and just then the NULL|NaN values show up.

designed using powtoon

There is something endearing about pandas. All these attributes and yet still so cool. It takes someone who loves data and understands pd to avoid a clash like the one that happened yesterday between concat and merge.

Yes, they have both come a long way, but sometimes the data engineer (DE) tries to fit a round peg in a square hole during transformation. The famous ETL has changed with the times; these days we have more of ELT-RL.

One main issue is with our favorite nulls and NaN. Have you tried calling pd.concat on multiple tables that have different row counts and share a unique key, and then forgotten to specify the key? And what about the other values in the table? If they don't match, we might be missing details.

Appending the DataFrames yields the same result.
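Here is a rough sketch of how that clash can look. The tables and column names (`customers`, `orders`, `customer_id`) are made up for illustration and are not the data from yesterday. Also note that `DataFrame.append` was removed in pandas 2.0, so stacking with `pd.concat` stands in for it here.

```python
import pandas as pd

# Hypothetical tables that share a unique key but have different row counts.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ada", "Ben", "Cy"]})
orders = pd.DataFrame({"customer_id": [2, 3], "total": [40.0, 15.5]})

# Stacking rows: any column missing from one table is filled with NaN.
stacked = pd.concat([customers, orders], ignore_index=True)

# Side by side without a key: rows are paired by index position,
# not by customer_id, so customer 1 ends up next to the order of customer 2.
side_by_side = pd.concat([customers, orders], axis=1)

# Merging on the key lines rows up by customer_id.
# Customers without an order still get NaN, but now it means something.
merged = customers.merge(orders, on="customer_id", how="left")

print(stacked)
print(side_by_side)
print(merged)
```

In the merged table the NaN in `total` simply says "this customer has no order", which is a lot easier to reason about than rows that were glued together by position.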

A lot of questions, I know, but that was how confused the DE was yesterday. Teamwork, a little googling, reading the pandas documentation and Stack Overflow saved the day. "It feels good when the proper middle name is called to action."

At the end of the day, NaN|Null is not an enemy. Like every story, our imperfections make us perfect. To understand your data, listen to the story it is telling.
