Data Science Operation v2

Word in the big data sphere is that people are beginning to understand that data science is a big cup of mixed nuts. In short, DS is one part of a team and not the whole team. 🙂

Old news!? Okay, but I have noticed that other parts of the "data science" team, dataOps to be specific, are getting more mention lately. Yes! Machine learning and deep learning are important, but like any good meal there is a secret ingredient that holds it all together.

def defineDataOps(dataOps, devOps):
    """We are trying to simplify dataOps."""
    # False whenever someone treats the two as the same thing
    return dataOps != devOps

Yes! They are totally different. One might argue that dataOps is to data science what devOps is to the dev team. That argument holds some truth, but the two address different things and there is more to it. Since our last dataOps post I have seen more information on the topic, so I thought it was about time for a dataOps v2.

What really is dataOps? Is it a manifesto? A strategy? A methodology? This brings us back to our definition of dataOps:

def dataOpsContent(agile, devOps, leanManufacturing):
    """What makes up dataOps?"""
    return agile + devOps + leanManufacturing

dataOps brings balance and proper communication between the data science team and its stakeholders, while reducing waste across the data lifecycle.

Since we have concluded that devOps != dataOps, how does dataOps differ from devOps if devOps is part of the mix? The difference lies in what each one addresses: software development versus data. Then again, the data team does software development too, since the data engineers build the pipelines, which leads us to the processes.
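
To make the code-versus-data difference a bit more concrete, here is a minimal sketch. It is my own illustration and not a standard dataOps API: the names totalPrice and checkBatch, and the rule that a price must be present and non-negative, are assumptions made up for this example.

```python
def totalPrice(prices):
    """Code under test: the kind of thing a devOps pipeline checks."""
    return sum(prices)


def checkBatch(rows):
    """Data under test: return the rows that break a basic quality rule.

    Hypothetical rule for this sketch: every row needs a non-negative price.
    """
    badRows = []
    for row in rows:
        price = row.get("price")
        # A row fails if the price is missing or negative
        if price is None or price < 0:
            badRows.append(row)
    return badRows


# devOps-style check: does the code behave?
assert totalPrice([1, 2, 3]) == 6

# dataOps-style check: is this batch fit to flow further down the pipeline?
batch = [{"price": 10}, {"price": None}, {"price": -3}]
rejected = checkBatch(batch)
print(len(rejected), "of", len(batch), "rows rejected")
```

The first check can pass on every run while the second one fails on a bad batch, which is roughly why the same pipeline code can still deliver broken results.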

As companies embrace this strategy, what skills are needed? dataOps often gets lumped in with data engineering because of the misconception that it is just devOps for data. The role is really about understanding the different facets of a big data team: code, infrastructure, communication and, most importantly, the data.

def dataOpsSkills(interestedInDataOps, haveSkills):
    """Note that the skills for this role depend on:
        - Organization
        - Projects
        - Cost, etc."""
    if not interestedInDataOps:
        return False
    # ...and lots more
    return haveSkills in ['automation', 'scripting', 'communication', 'customer support']

In summary, dataOps has moved on from a set of best practices into more of a data management strategy that incorporates all the different facets of an agile big data team. As companies carve this out into a team of its own, as we already have for devOps, more and more of the issues that keep data projects from making it to production should get resolved.

What do you think about dataOps? Are there learning resources for it like we have for Data Science, BI, and Data Engineering?
