Want More Value from Analytics? Ask the Right Questions.

When it comes to analytics, if you don’t get the questions right, the answers may not matter. Bad or ill-formulated question sets (or, as we will call them in this blog, the “Analytics Question Space”) lead to answers that are far less usable than they should be.

Just like the joke about the Genie and the Three Wishes, you need to be careful what you ask for. One example of the joke is:

A poor starving peasant couple are granted three wishes and the woman, just taking the first thing that comes to her mind, wishes for one sausage, which she receives immediately. Her husband, pointing out that she could have wished for immense wealth or food to last them a lifetime, becomes angry with her for making such a stupid wish and, not thinking, wishes the sausage were stuck on her nose. Sure enough, the sausage is stuck in the middle of her face, and then they have to use the third wish to make it go away, upon which it disappears completely.

And just like the wishes in the joke, we find that when it comes to analytics, people too often choose their questions badly.

Why do I say this? One reason is based on observation of Fortune 1000 companies and the number of Excel spreadsheets they use to analyze their data. A few years ago an analyst asked me, ‘Why don’t we have a tool that will take the results from report A and add them to report B and place them into the same Excel spreadsheet so I can analyze them without having to do the cut and paste myself?’

I told her we have such a tool. The real surprise came when she found out that it was the same tool they were using to develop the two original reports. The reason the results weren’t in the same report was that no one had ever asked. But the issue goes further. Once she got the spreadsheet, she did the same series of manipulations each week until she got the information she needed to make the decisions and recommendations that were the core of her job. In essence, an analytics tool was being used to extract data to Excel, where it could then be analyzed. Somewhere along the way, someone asked for the single sausage, not for what they really needed.

As the discussion proceeded, I asked about additional data she would have liked to make the analysis better. She had lots of good ideas. She also said she didn’t have the time to combine and manipulate more spreadsheets. Analytics were sub-optimized.

It is not a matter of assigning blame to the analyst or the IT organization or anyone else involved in the process. The point is how to do it better.

Doing it Better

The root cause of the problem is really not asking the right question or set of questions. Let’s define a “question space” as the set of questions (and their associated answers) required to make a decision. Let’s take an example from The Stelter Company, an organization that, per their mission, specializes in “Personal Philanthropy marketing services to strengthen connections with donors and to inspire charitable giving.”

Stelter uses and continues to refine a variety of indicators to find persons more likely to make a contribution. For the purpose of this blog, let’s assume that we are seeking to assist a university. Let’s begin to define things we might want to know about potential donors. Off the top of our heads we might come up with a list like this:

  • Annual Income
  • Net Worth
  • GPA while at the college
  • Age
  • Current affiliations with the college (alumni, athletics, etc.)

This will yield a large but finite list of elements to include in the question space for our analysis. Stelter’s research has shown that age bands are an indicator of giving potential. They also found that age becomes a better predictor when it is combined with “age of children.”

One reason organizations have so much difficulty getting the answers they need is that they ask for a limited amount of information and don’t investigate the:

  • Complete analytics question space
  • Data that feeds the question space (or the Data Space)
  • Questions that others will ask them, once they have delivered the initial answers

We could spend time asking why this is so difficult, but I would rather focus on ways to solve the problem.

Defining the Analytics Question Space

Edward de Bono asks the question:

If you have 217 players in a single elimination tennis tournament, how many matches do you have to play to get a champion?

If you are like most people (and I am afraid that much of the time I am), you will want to grab some paper and a pencil and start drawing brackets. Or perhaps you will note that 128 is the largest power of 2 below 217 (the point where I can have simple brackets the rest of the way). Now all I have to do is figure out how to get the 217 players down to 128, which means eliminating 89 of them first. Once I have finished this exercise, I count up the matches and provide the answer.

Having a bias for action is generally a good thing, but taking a step back and considering the analytics question space is also good.

The question asked, “How many matches?” We assumed that in order to get the number of matches we had to create brackets. But we didn’t examine the problem. To get a winner out of a field of 217 players, we have to get 216 losers. Since each match produces one and only one loser, it takes 216 matches to get a champion.
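The shortcut can be checked against the bracket-drawing approach. The Python sketch below is illustrative only: the function names are my own, and the bracket version assumes that the odd player out in any round simply receives a bye.

```python
def matches_by_brackets(players: int) -> int:
    """Count matches round by round, the pencil-and-paper way.

    Assumption: when a round has an odd number of players,
    one player gets a bye and advances without playing.
    """
    matches = 0
    while players > 1:
        played = players // 2   # number of matches in this round
        matches += played
        players -= played       # each match eliminates exactly one player
    return matches


def matches_by_losers(players: int) -> int:
    """Count matches by counting losers: everyone but the champion loses once."""
    return max(players - 1, 0)


print(matches_by_brackets(217))  # round-by-round count
print(matches_by_losers(217))    # one subtraction
```

Both functions return 216 for a field of 217, but the second one gets there by looking at the question rather than at the brackets.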

A trick question, but it makes the point: consider the question before you plunge headlong into developing the answer. The phrase I use with my clients is, “Take a step back and look at the questions you are really trying to answer.” A previous blog on Big Data, Analytics and Really Big Data Analytics provides a nice example of this for the manufacturing space.

In the tennis example we would have eventually gotten the same answer, but it would have taken us longer. With an enterprise question space, we sometimes plunge in too quickly and risk never getting the answer we really need because we don’t examine the question space fully.

So what is the definition of an analytics question space? I’m going to take the liberty of defining the phrase, based on my 30+ years of experience, since a search of “Analytics Question Space” on Wikipedia and Google did not yield any appropriate results. The Question Space is the set of questions that exist or are necessary to answer a higher order question. Let’s look at an example. If we ask the question, “Is purchasing this set of loans for $10M a good investment?” the answer is, “It depends.” Depends on what? What defines the question space?

To list a few ‘it depends’ questions:

  • Do you have $10M?
  • Can you get $10M?
  • What interest will you have to pay?
  • What are the characteristics of the people who are paying on the loan(s)?

There may be a large number of questions associated with this “good investment” question. While the set is large, for most questions it is probably finite, or at least, given the limits of our imaginations, effectively finite. We get to a certain point and don’t have any more ideas.

The same is true with the sub-question, “characteristics of people who are paying on the loan(s).” In identifying the question space, we are able to quickly understand what questions we need to answer in order to answer the main question. We can also determine the data needed to answer the question and what we will have to do to obtain the data. Another way of saying this is that we need to define the:

  • Analytics Question Space
  • Data Space
  • Data Availability
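As a minimal sketch of what such a reusable structure might look like, the Python below models the loan example as a small tree of questions and derives the data space from it. The class name, the field names, and every data element (such as `cash_on_hand`) are hypothetical placeholders chosen for illustration, not part of any actual tool or data set.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Question:
    """One node in the question space (illustrative structure only)."""
    text: str
    data_needed: List[str] = field(default_factory=list)
    sub_questions: List["Question"] = field(default_factory=list)


def data_space(question: Question) -> Set[str]:
    """Collect every data element needed by a question and its sub-questions."""
    elements = set(question.data_needed)
    for sub in question.sub_questions:
        elements |= data_space(sub)
    return elements


# The question text follows the loan example; the data element
# names are hypothetical and exist only to make the sketch concrete.
loan_question = Question(
    text="Is purchasing this set of loans for $10M a good investment?",
    sub_questions=[
        Question("Do you have $10M?", ["cash_on_hand"]),
        Question("Can you get $10M?", ["credit_line"]),
        Question("What interest will you have to pay?", ["borrowing_rate"]),
        Question(
            "What are the characteristics of the people paying on the loan(s)?",
            ["borrower_payment_history", "borrower_income"],
        ),
    ],
)

needed = data_space(loan_question)
print(sorted(needed))
```

Keeping the questions and the data they need in one structure means the data space falls out of the question space automatically, and data availability can be layered on later without touching either.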

The point is to delay our bias for action just long enough to feel comfortable that we have identified the question space and the data space. Once you embark on this process and capture the results in a structure that you can analyze and reuse, you will find it takes a relatively short amount of time (days or weeks). It creates a framework that can allow you to move faster to develop more complete and valid analytics solutions.

Some tips on how to apply this:

Have you really identified the stakeholders?

These are the folks who will benefit from your being successful, not just whoever funded the project or the interested parties in your immediate organization. This is the extended community of people who will use the results.

The better you identify your stakeholders the better you can define your question space. This is not a go/no-go decision making body. This is a group of willing collaborators who say, “If you can answer these questions, I’m interested.”

Treat the Analytics Question Space Like a Wish List

When defining the Analytics Question Space you’re like Santa Claus. Everyone gets to make his or her wish. Scoping and culling come later, when you are defining the project. Collaborating with all (or at least as many as possible) stakeholders means accepting their questions freely as something in the Question Space.

Define the Data Needed to Support the Analytics Question Space Independently of the Data Availability

Again, a bias for action tends to get in the way here. It is a relatively fast exercise to think about all the data you need to answer a set of questions. It is also an exercise that requires focus. This application of lateral thinking allows us to focus on defining the possibilities first without being inhibited by the details of the solutions.

Defining the Data Availability Last Allows You to Find Low-Hanging Fruit

Once you have described the question space and the data space, the work can begin to define the data availability. Some data will be readily available; other data might be harder to come by. By doing a high-level pass through the data space, we can quickly see what is readily available and what data might be more difficult to acquire. Once we have done this, we can determine the low-hanging fruit and which questions, while more difficult, are still worthwhile.
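A rough sketch of that high-level pass, using the university donor example: the Python below sorts questions into low-hanging fruit and questions that need harder-to-get data. The questions, the data element names, and the set marked as readily available are all assumptions made up for illustration; they are not Stelter’s actual indicators or data.

```python
from typing import Dict, List, Set, Tuple


def triage(
    question_data: Dict[str, List[str]], available: Set[str]
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split questions into those answerable now and those missing data.

    Returns (low_hanging, harder), where harder maps each question
    to the sorted list of data elements that are not yet available.
    """
    low_hanging: List[str] = []
    harder: Dict[str, List[str]] = {}
    for question, needed in question_data.items():
        missing = sorted(set(needed) - available)
        if missing:
            harder[question] = missing
        else:
            low_hanging.append(question)
    return low_hanging, harder


# Hypothetical question space and data space for a university donor analysis.
question_data = {
    "Which age bands give most often?": ["age", "gift_history"],
    "Does age of children change giving?": ["age", "age_of_children", "gift_history"],
    "Do affiliated alumni give more?": ["affiliations", "gift_history"],
}

# Hypothetical data availability: what we assume is already on hand.
available = {"age", "affiliations", "gift_history"}

low_hanging, harder = triage(question_data, available)
print("Low-hanging fruit:", low_hanging)
print("Still worthwhile, but needs data:", harder)
```

Because the data space was defined before availability was considered, the harder questions stay on the list along with exactly what they are missing, rather than never being asked.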

The people involved at each of these stages (question space, data space and data availability) might not be the same.

Choose the best people to participate in each stage of development. The people who are best at defining the big questions may not have the time or patience to define the data set to answer them. Likewise, the people who can help with data availability may only be able to do so after the data that is needed is defined for them.

Reality Shrinks Project Scope

As you scope an individual project, the question space gets smaller, either because the question space is constrained or because of the practicalities of data availability or other resource constraints.

So why ask about something that you may not deliver?  There are two answers to this question:

First, there is a big difference between knowing the complete target and deciding to go after only a part of it, and simply going down the path that is most apparent or expedient. By knowing the overall target we can build on the results of each project we attempt.

Second, defining the question space, data space and data availability early in an analytics program can be done quickly and accurately.  These pieces provide continuity between individual analytics projects and allow us to create analytic programs that deliver results.

Getting the right question space reminds me of this scene from I, Robot: “That, Detective, is the right question.”

(If you found this post informative, see the next post in this series, “Want Answers to your Analytics Question Space? Don’t Be Afraid to Ask”)

About Wayne Applebaum

VP of Analytics and Data Science for Avalon Consulting, LLC.

I have over 30 years of experience in data analytics and enterprise consulting. It is my belief that Great Analytics can only be enabled by Great Data.

I hold a doctorate in statistics and have spent 30 years working for companies like SAP, Oracle, Business Objects, and EDS to guide Fortune 500 executives in aligning analytics with business needs.

Many companies are only using a fraction of the data they need. When information is lacking, analytics projects fail. Events over the past 10 or so years have provided us with a unique opportunity to help companies leverage both structured and unstructured data to create a foundation that drives innovation and measurable business results.

Comments

  1. An intriguing point, and some colorful examples. I was hoping you could share at least one example, though, that would be more germane to business analytics. The university donor pool case would be ideal. Can you identify improvements to question space, data space, and data availability specific to that?
