The Big Data Maturity Model and Shifting to a DataOps Enterprise

Big Data isn’t a temporary buzzword; it’s a reality that’s here to stay. The Big Data Maturity Model helps organisations to recognise and strategically grow their own capability towards data driven decision making.

Gartner’s often cited forecast that the number of internet connected devices will surge to over 20 billion by 2020 is a startling indication of Big Data’s longevity. The data influx is now measured in petabytes and zettabytes.

Limited Expertise Hampers Big Data Maturity

Cisco recently issued a prediction that global internet traffic will increase to 105,800 GB per second in the year 2021. To place this in a more revealing context, by the same year, if you wanted to sit down and watch all of the videos uploaded each month, you’d spend more than 5 million years doing so.
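
To put the per-second figure on a monthly scale, the arithmetic can be sketched as follows. The 30-day month and the decimal unit conversion (1 EB = 10^9 GB) are assumptions made for illustration; they are not part of the Cisco prediction.

```python
# Convert the per-second traffic figure quoted above into a monthly volume.
GB_PER_SECOND = 105_800          # figure quoted in the text for 2021
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds
DAYS_PER_MONTH = 30              # assumption: a 30-day month

gb_per_month = GB_PER_SECOND * SECONDS_PER_DAY * DAYS_PER_MONTH
exabytes_per_month = gb_per_month / 1_000_000_000  # assumption: decimal units, 1 EB = 10**9 GB

print(f"{exabytes_per_month:.1f} EB per month")  # prints "274.2 EB per month"
```

At that rate, roughly 274 exabytes would cross the network in a single month.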

However, not all data is equal in value. Arguably, the raw data flowing into your databases is only valuable if it’s used, and its usefulness becomes apparent only after it has been processed. Your enterprise objectives, both internal and external, drive data evaluation and decision making.

But, there’s an infrastructure issue at hand in terms of personnel. While data warehousing and data management solutions continue to appear on the Big Data landscape, companies still require experts in data engineering, data science, and data analytics to parse through the data swamp.

Data scientists in particular (the employees or partners who specialize in the advanced predictive analytics discussed below) are in increasingly short supply. Indeed, by 2018, there will be a need for 181,000 people with “deep analytical skills and 5 times that number” for data management and advanced analytical skillsets.

Underutilization of Advanced (Predictive) Analytics

In KPMG’s summary report “Going Beyond the Data,” only 14% of the companies surveyed said they currently have “all the talent and capabilities they need” to support their Big Data initiatives. In fact, within the same survey:

  • 8% were rarely data driven
  • 53% were somewhat data driven
  • 39% were highly data driven

Among the many trends in Big Data, one of the most noticeable points of departure between these self-reported classifications was the share of enterprises that used predictive analytics. Of the 39% “highly data driven” enterprises, 36% used predictive analytics. By comparison, the figures for the rarely data driven and somewhat data driven enterprises were 13% and 27%, respectively. Many organisations’ Big Data maturity still has a long way to grow.

Your predictive analytics are crucial to answering the question, “What is likely to happen next based on the data?” While the foundation beneath prediction includes descriptive, diagnostic and prescriptive analysis, they only provide:

  1. What happened in the past.
  2. Why “x” happened.
  3. What might be

Meanwhile, predictive analytics serves as a signal of what is likely to happen. The emphasis is on the word “likely” because prediction is probabilistic by nature. The underlying predictive models, however, are mathematically complex. And though machine learning and Artificial Intelligence applications are alleviating some of the personnel quandaries, “machines don’t eliminate human judgment; they change where it’s needed.”
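
To make the word “likely” concrete, here is a minimal sketch of a predictive model that returns a probability instead of a yes/no answer. The churn scenario, the function name `churn_probability`, the input features, and the weights are all hypothetical and chosen only for illustration; a real model would learn its weights from historical data.

```python
import math

def churn_probability(days_since_last_order: float, support_tickets: int) -> float:
    """Hypothetical logistic model: maps two inputs to a probability between 0 and 1."""
    # Illustrative weights only; a real model would learn these from historical data.
    score = -3.0 + 0.05 * days_since_last_order + 0.4 * support_tickets
    return 1.0 / (1.0 + math.exp(-score))

p = churn_probability(days_since_last_order=40, support_tickets=2)
print(f"Estimated probability of churn: {p:.2f}")  # prints 0.45
```

The output is a likelihood, not a verdict. Deciding what to do about a 45% estimate is where human judgment still comes in.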

As such, one of the primary decisions to be made when shifting your enterprise to a data driven operation (DataOps) is whether outsourcing certain analytics personnel is feasible. A Deloitte report gives a snapshot of how Cisco met the data analytics talent shortage.

Another factor to consider is security and accessibility with regard to non-employees. If you outsource this process to another firm, the who, what, where, when, why, and how of the data must be assessed. For example, in the U.S., health data has privacy regulations attached. Also, given the EU’s passage of the General Data Protection Regulation (GDPR), companies that collect data from individuals within the European Union face additional restrictions and mandates.

Once you’ve established your data personnel solution, or while you’re in the process of establishing it, there’s an additional infrastructure dimension to focus on. How you structure the data culture within your enterprise dictates how quickly you can manage data, derive insight, and make robust data driven decisions.

Backend vs. Frontend Data Driven Decision Making

Backend and frontend data driven decisions are deeply intertwined and cannot be wholly uncoupled from one another. Certainly, data analytics depends on data engineering and management protocols such as extraction, transformation, and loading (ETL). Also, without accessibility pipelines that align with your frontend users’ purposes for data usage and analytics, no actionable insight can be derived from the data lake sitting in the cloud or on your on-premises servers.
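
The extraction, transformation, and loading steps mentioned above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the sample data, the `orders` table, and the function names are hypothetical, and a production pipeline would read from real sources and write to a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file, API, or device feed.
RAW_CSV = """order_id,region,amount
1,emea,120.50
2,EMEA,80.00
3,apac,
4,APAC,200.25
"""

def extract(raw: str) -> list[dict]:
    """Extract: read rows from the raw source."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[tuple]:
    """Transform: drop rows with no amount, normalise region codes, cast types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue
        cleaned.append((int(row["order_id"]), row["region"].upper(), float(row["amount"])))
    return cleaned

def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into a table that analysts can query."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# A frontend-style question: total sales per region.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('APAC', 200.25), ('EMEA', 200.5)]
```

The final query is only answerable because the backend steps ran first, which is the coupling between backend and frontend described above.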

Backend data decisions to consider:

Your backend Big Data focus includes the hardware and software used to collect and store your data. The information technology and data engineering teams are the backbone for designing the required schematics based on the type of data and the required pipelines allowing access to other personnel. Several key questions that help to identify Big Data maturity within an organisation include:

  • How many data sources are flowing into your organization? (e.g. a company such as Under Armour has multiple sources such as wearable devices, app downloads from iTunes, user data entered through their website, sales data, etc.)
  • Are you currently using a cloud based data storage and management system?
  • Do you have certain regulatory requirements regarding the storage of your data (e.g. medical information, financial data, etc.)?
  • Will your data engineering and IT teams use an out of the box solution for your database and management systems? If so, is that solution scalable as data volume and velocity increase?

Frontend data decisions to consider:

There’s a distinct difference between your data scientists and your data analysts. Your data scientists need access to a certain amount and type of data to train and test predictive models. As such, they use preferred programming languages to do so (R, Python, and Scala are still the most widely used as of 2017). This bears mentioning because, despite the fact they aren’t data engineers, data scientists have a hybrid approach between the backend and frontend data. Of course, Software as a Service (SaaS) companies are working to shift everything to a frontend graphical user interface (GUI).

Meanwhile, your data analytics team works wholly on the frontend, though the details depend on how you decide to organize the data flow. Data analysts work primarily with frontend software applications to run their descriptive, diagnostic, and prescriptive analytics. Frontend questions often include:

  • Which analytics platform can best handle the various data inputs (social media analytics, CRM, customer service, inventory, accounting, etc.)?
  • Which analytics platform can integrate with your current backend infrastructure?
  • Do any or all of your teams need tailored access to the database?

With regard to data access, data scientists usually require access to a specific area of the database where they can pull data directly, clean the dataset, and test predictive models, whereas data analysts may need access to a limited subset of data depending on their departmental objectives.
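
One way to picture tailored access is a policy that maps each role to the tables and columns it may read. The sketch below is a simplified illustration; the role names, the `orders` table, and the `allowed_columns` helper are hypothetical, and real systems would normally enforce this in the database or warehouse itself.

```python
# Hypothetical access policy: which tables and columns each role may read.
ACCESS_POLICY = {
    "data_scientist": {"orders": {"order_id", "region", "amount", "customer_id"}},
    "marketing_analyst": {"orders": {"region", "amount"}},
}

def allowed_columns(role: str, table: str, requested: list[str]) -> list[str]:
    """Return the requested columns this role may read, in the requested order."""
    permitted = ACCESS_POLICY.get(role, {}).get(table, set())
    return [col for col in requested if col in permitted]

request = ["order_id", "customer_id", "amount"]
print(allowed_columns("data_scientist", "orders", request))     # ['order_id', 'customer_id', 'amount']
print(allowed_columns("marketing_analyst", "orders", request))  # ['amount']
```

The data scientist sees the full request, while the analyst receives only the subset that matches the departmental objective.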

These are by no means exhaustive lines of self-query, but they are foundational in helping you design a data flow framework for your data team. The reliability of your decisions largely depends on the expertise of your data team, which, in turn, provides answers and solutions for your particular enterprise data infrastructure and analyses. Being honest about questions like these will go a long way in helping to assess the Big Data maturity of your organisation.

DataOps and Decision Making in the Enterprise

Per Ashish Thusoo, former Engineering Manager at Facebook, who co-authored the book Creating a Data-Driven Enterprise with Data Ops, “DataOps is a new way of managing data that promotes communication between, and integration of, formerly siloed data, teams, and systems. DataOps closely connects the people who collect and prepare the data, those who analyze the data, and those who put the findings from those analyses to good business use.”

At every point in the Big Data chain there is a decision made that has a domino effect on all other interconnected systems. A DataOps oriented organization integrates and facilitates a democratization of data through a centralized infrastructure.

  • Data is kept all in one place.
  • Though data engineers are still in charge of maintaining and providing access to the data, specialized subgroups of your data team work closely with each department to ensure continuous and accurate access to meet the departmental needs for the right data at the right time.
  • A self-service framework is created for all of the internal data users.
  • A data driven culture is organized.

But, as we will see, Ashish takes this a step further and defines the process for achieving DataOps decision making maturity through the five-step Big Data Maturity Model.

Big Data Maturity Model: Moving from the Data Lake to Data Driven Decisions

Your analytics team needs access to data; that’s a given. But your data team will perpetually be the gatekeepers of access in some way, shape, or form. Given Ashish’s direct experience with one of the behemoths of Big Data, Facebook, he has a first-hand understanding of the trials and tribulations of Big Data. Recently, he constructed a Big Data Maturity Model which reflects how Facebook achieved data driven decision ‘Nirvana’.

Stage 1: Aspiration

  • You’re still researching the ROI of investing in Big Data infrastructure.
  • Data is entering the enterprise through a number of different software applications or other sources.
  • You’ve hired or are about to hire a data engineer — perhaps you already know that you need a data engineering team.
  • If you have a data team already in place, all internal teams or departments must request access to the data through the data team.

Stage 2: Experiment

  • You’re doing further research into a Big Data platform, including cloud based infrastructure and data warehousing.
  • You’ve specified one particular problem encountered with the increasing amount of data, and you’re preparing to solve it — though you might not be sure how to do so at this time.
  • You’re still working on the cost-benefit analysis of the project and might not have secured a budget to move ahead.
  • At this point, there may be an ingrained culture of the data team as gatekeepers, and that also requires a shift.

Stage 3: Expansion

  • By now you have achieved a level of Big Data maturity that has true buy-in from decision makers in your organization.
  • You’re planning or implementing further extensions of your data team, such as data scientists and departmental analysts (e.g. business analysts, marketing analysts, etc.).
  • You’re most likely running different software products for each department such as inventory, sales and marketing, analytics, accounting, and these require access to the real-time data influx.
  • Scalability is providing the pressure to find highly skilled personnel and prioritize the several possible avenues for managing the near constant innovation in how you store, retrieve, and analyze real time data.

Stage 4: Inversion

  • Data accessibility for both consumers and employees has been achieved to a large extent, but there are still bugs in the system.
  • The data team is still the steward of ETL, building pipelines, and the requisite processes for data warehousing; however, everyone has streamlined access to a single data warehouse.
  • Scaling is still an issue because the data flow fluctuates.
  • Costs are continuing to rise, database infrastructure (particularly if it’s on premise) isn’t scaling as fast as the need for accessibility, and there is danger in not meeting your service level agreements.

Stage 5: Nirvana

  • Your data team can now tailor pipelines to meet user needs, and data self service has been achieved.
  • The data is collected into a single accessible place — which democratizes the data.
  • Perhaps you’ve deployed an AI or machine learning model to enable Autonomous Data Management (ADM).
  • Scaling the database infrastructure up or down is simplified, for example through a cloud based platform, so less time and money are spent on infrastructure management.
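
The five stages above can be sketched as an ordered data structure with a simple self-assessment. The stage names come from the model; the yes/no checklist, its question names, and the `assess` function are hypothetical simplifications, with each question loosely based on one bullet from the stage it unlocks.

```python
from enum import IntEnum

class MaturityStage(IntEnum):
    ASPIRATION = 1
    EXPERIMENT = 2
    EXPANSION = 3
    INVERSION = 4
    NIRVANA = 5

# Hypothetical yes/no checklist, one question per transition between stages.
CHECKLIST = [
    "identified_specific_problem",  # Aspiration -> Experiment
    "has_decision_maker_buy_in",    # Experiment -> Expansion
    "single_warehouse_access",      # Expansion -> Inversion
    "self_service_achieved",        # Inversion -> Nirvana
]

def assess(answers: dict[str, bool]) -> MaturityStage:
    """Advance one stage per consecutive 'yes'; stop at the first 'no' or missing answer."""
    stage = MaturityStage.ASPIRATION
    for question in CHECKLIST:
        if not answers.get(question, False):
            break
        stage = MaturityStage(stage + 1)
    return stage

result = assess({
    "identified_specific_problem": True,
    "has_decision_maker_buy_in": True,
    "single_warehouse_access": False,
})
print(result.name)  # EXPANSION
```

Treating the stages as strictly sequential is a design choice of this sketch; real organisations may show traits of several stages at once.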

Key Takeaways

There’s no turning back now. As long as internet connected devices continue to pump raw data into servers across the globe, the Big Data Era will continue in perpetuity. In a sense, it can be likened to a 21st Century gold rush. However, a distinct difference is that data is not scarce; it’s overwhelmingly available. The “mining” has now shifted to finding the right data at the right time to make the right decisions for your enterprise. Tools built on machine learning and AI may help automate many of the lower level processes. But if you’re not operating an internal culture of data driven decision making, then those tools will not be used efficiently or to their full capabilities.

As such, before you can begin functioning from a data driven perspective, your first actionable decisions require establishing a data driven culture complete with a team of experts who perpetuate the DataOps framework. This will increase the speed and efficiency of your organization reaching the Nirvana stage of the Big Data Maturity Model.