Data Governance in the World of Big Data

The Gartner Group recently published a study that stated:

“… by 2017, 33 percent of Fortune 100 organizations will experience an information crisis, due to their inability to effectively value, govern and trust their enterprise information.”–Gartner Press Release, February 27, 2014

They go on to blame this on the ever-increasing amount of data that is being collected, stored and analyzed.

While I agree with the conclusion, I disagree with the timetable. They are describing a situation that has been brewing for the past 15-20 years and is now accelerating to crisis proportions. Today many companies have multiple data sources that contain different definitions of basic things like revenue, customer, sales and fulfillment rate. Companies routinely extract information from enterprise data sources and manipulate it in Excel spreadsheets. At this stage data governance goes out the window.

Establishing appropriate governance over corporate data has always proved to be a challenge. The situation is indeed getting worse. With the addition of Big Data the problem has gained velocity.

Big Data and Analytics Tax the Traditional Data Governance Constructs

One of the primary reasons that Data Governance is a problem is that current constructs of Data Governance are centered on two types of data: Transactional and Master. Implementers of ERP systems focus on these two types of data almost exclusively.

The game changes when, instead of just wanting to process transactions (such as orders, payments, payroll), we want to consolidate or analyze transactional information to make better business decisions.

It should be noted that transactional data isn’t only coming from ERP systems; in the Big Data world it is also a tweet, a post or a click. The key is consolidating this disparate information so we are able to get “Business Decision Data” out of it. Business Decision Data results from the combination of transactional and master data to assess a situation. It could be as simple as finding the mean and variance of orders of a given product. Business Decision Data could also be the customer sentiment about a given product created from consolidating tweets. In short, any time we combine data to create new information we are creating Business Decision Data.
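To make the simplest case concrete, here is a minimal sketch of combining transactional and master data into a piece of Business Decision Data. The product table, the order rows and the function name `order_stats` are all made up for illustration. Note that even here a definition has to be chosen: the sketch uses the sample variance rather than the population variance, which is exactly the kind of choice that should be written down somewhere.

```python
from statistics import mean, variance

# Hypothetical master data: product id -> product name.
PRODUCTS = {"P1": "Widget", "P2": "Gadget"}

# Hypothetical transactional data: one row per order.
ORDERS = [
    {"product_id": "P1", "quantity": 10},
    {"product_id": "P1", "quantity": 12},
    {"product_id": "P2", "quantity": 3},
    {"product_id": "P1", "quantity": 14},
]


def order_stats(product_id, orders=ORDERS, products=PRODUCTS):
    """Combine transactional and master data into one derived record."""
    quantities = [o["quantity"] for o in orders if o["product_id"] == product_id]
    return {
        "product_name": products[product_id],
        "order_count": len(quantities),
        "mean_quantity": mean(quantities),
        # Sample variance (n - 1 denominator); needs at least two orders.
        "variance_quantity": variance(quantities),
    }


stats = order_stats("P1")
print(stats)
```

The resulting record does not exist in either source system. It is new, calculated information, and so it needs its own definition and owner.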

Business Decision Data has been largely ignored in the field of Data Governance. The calculations and algorithms that produce it often go undocumented. As the number of Big Data and analytics projects increases, so does the amount of Business Decision Data, accelerating the problem.

How Business Decision Data Multiplies

For example, if a company does customer sentiment analysis on a product line, it has to develop a definition of what constitutes positive sentiment. The question is where this definition exists. Is it carefully documented in a metadata repository, or is it solely within the mathematical or computational algorithm that is being used to calculate customer sentiment?

The key point is that if the definition is not well documented or governed, and is really just captured in the algorithm, then problems will arise. Multiple organizations may duplicate effort and produce their own definitions, making cross-organization comparisons difficult if not impossible. Or worse yet, the existence of such conflicting definitions will lead to flawed decision-making that harms the business. Having to unbundle the algorithms to reconcile these definitions is a situation best avoided.
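One way to picture the difference is a sketch in which the definition of positive sentiment lives in a small, documented registry and the algorithm merely reads it. Everything here is hypothetical: the `MetricDefinition` fields, the owner name, the 0.6 threshold and the idea that sentiment is a score between 0 and 1 are assumptions for illustration, not a description of any particular metadata repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """A documented, versioned definition of a calculated business term."""
    name: str
    description: str
    owner: str
    version: str
    threshold: float


# Hypothetical stand-in for a metadata repository.
REGISTRY = {
    "positive_sentiment": MetricDefinition(
        name="positive_sentiment",
        description="A message is positive when its score is at or above the threshold.",
        owner="analytics-governance",
        version="1.0",
        threshold=0.6,
    )
}


def is_positive(score, registry=REGISTRY):
    """Classify a sentiment score using the governed definition, not a hard-coded value."""
    definition = registry["positive_sentiment"]
    return score >= definition.threshold


print(is_positive(0.7), is_positive(0.5))
```

If two groups both read the threshold from the same registry entry, their results stay comparable, and a change to the definition shows up as a new version rather than as a silent edit buried in someone's code.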

New Data Governance Considerations

We need to consider data governance in terms of some new conditions that have resulted from the Big Data movement:

  • Data now comes in a wider variety of forms.
  • The form of the data and what we intend to do with it determines how we process it.
  • Solutions that served us well in the ERP/Relational world cannot be relied upon to solve all of the situations that are present in the current Big Data world.
  • Silos of data, each inaccessible to the other, continue to block the ability to create analytics to assist business decision makers.

So the question is: how do we develop a migration path, or perhaps an evolution path, that moves us to break down silos and create the data layer that allows us to apply analytics to a readily accessible data set that answers our business questions?

The Changing Data Landscape

Two big changes are going to drive the information infrastructure for years to come:

  • Business Decision Data is becoming far more important. The importance comes not only from analytics applications, but also from traditional business intelligence uses. This data may come from a combination of Transactional and Master Data, but it may also come from other sources, like social media, sensors or weblogs. A key factor is that Business Decision Data is calculated. This adds complexity, and the calculations need to be governed.
  • Transactional Data is no longer only contained in the ERP system. An email or a tweet is a transaction; so is a phone call or information coming from a sensor on a railroad track.

Business Decision Data has become more important (not that getting the transaction right is any less important). Answering even simple questions (like ‘what is the average number of minutes that a viewer watched a video on a website’) requires a very different set of infrastructure than the ERP/EDW technology of the past.

The nature of what we think of as a transaction has changed. A transaction used to be a well-ordered predefined set of information. It fit neatly into predefined fields (whether the rules for entry were adhered to by the users or the quality of such data was good is a discussion for another day).

Today the transaction can be globally thought of as a packet of information. A subset of transactions (or yesterday’s transaction) is what might be called a structured or templated transaction. A key difference between the two is that in a templated transaction all of the relevant parts have been predefined. This is not true for transactions in general. In fact, the relevant information in an email is dependent on the question we are trying to answer. Some items are standard, like Sender, Receiver, and Time. But others are not. In fact, what we find is highly dependent on what we are looking for. Are we looking for “words of praise” or “litigious phrases”? The “meaning” to be gleaned from a transaction is dependent on the question or context we are interested in.
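A rough sketch of this idea: the same email yields different information depending on the context we bring to it. The email, the two word lists and the function name `extract` are invented for illustration, and simple word matching is only a stand-in for whatever text analysis would really be used.

```python
import re

# A hypothetical email. Sender, receiver and time are the standard, predefined parts.
EMAIL = {
    "sender": "customer@example.com",
    "receiver": "support@example.com",
    "time": "2014-03-01T09:30:00",
    "body": "Your service was excellent, but if the refund is late I will contact my lawyer.",
}

# Hypothetical context definitions. These word lists are themselves
# definitions that would need to be documented and governed.
CONTEXTS = {
    "praise": ["excellent", "great", "thanks"],
    "litigious": ["lawyer", "lawsuit", "sue"],
}


def extract(transaction, context):
    """Return the terms for the given context that appear in the message body."""
    words = set(re.findall(r"[a-z]+", transaction["body"].lower()))
    return [term for term in CONTEXTS[context] if term in words]


print(extract(EMAIL, "praise"))
print(extract(EMAIL, "litigious"))
```

The standard fields are read the same way every time, while what comes out of the body changes with the question being asked.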

This matters because, from the standpoint of developing consistent definitions of terms that allow for aggregation and analysis of data, things have just gotten a lot more complicated.

As semantic experts have noted, transactions can have many contexts. Each of these contexts requires adequate and consistent definitions and governance. For a more detailed discussion of this area, see Kurt Cagle’s blog, “Metadata Management – Semantics for Big Data?”


Some closing thoughts:

  • Data Governance in the “Big Data” era will require governance of Business Decision Data.
  • Metadata and context definition will become more important and more difficult to maintain.
  • New data models such as semantics need to be examined to augment the existing relational data structures.

This will be an evolutionary path, but if we don’t begin, we will get to exactly where Gartner predicts.


About Wayne Applebaum

VP of Analytics and Data Science for Avalon Consulting, LLC.

I have over 30 years of experience in data analytics and enterprise consulting. It is my belief that Great Analytics can only be enabled by Great Data.

I hold a doctorate in statistics and have spent 30 years working for companies like SAP, Oracle, Business Objects, and EDS to guide Fortune 500 executives in aligning analytics with business needs.

Many companies are only using a fraction of the data they need. When information is lacking, analytics projects fail. Events over the past 10 or so years have provided us with a unique opportunity to help companies leverage both structured and unstructured data to create a foundation that drives innovation and measurable business results.
