The Promise of Big Data is More Than Big Throughput

There are good reasons for the buzz behind “Big Data” and the promise it offers for business, government, and other sectors. We hear about the importance of data-driven decisions and how they can deliver exponential improvements in business performance. But along the path to this better place, some are distracted by the simplistic label “Big Data” and lose their way.

Specifically, for many, the idea that new technology can solve current problems faster and cheaper is just too enticing. They like technologies built and priced for big data because those technologies crunch their small data so easily. The risk they run, however, is pigeonholing big data technologies as just another incremental improvement. Make no mistake, the data revolution has far more potential than just solving current problems faster.

If you want to tap the full potential of the data revolution, the key is in the queries. New systems running the same old queries don’t tap the potential of big data. Worse, some big data systems are not even focused on queries; they’re all about transactions and throughput. There’s nothing wrong with making incremental improvements by doing more transactions or running old queries faster. But the exponential benefits of big data come from amazing queries. Add queries that include unstructured data. Add queries that include structured but messy data you avoided before. Think of big data answers as fuzzy answers coming from many angles that together triangulate a highly accurate answer, like a smartphone fixing its position from several cell towers, or Google targeting relevant ads based on the many things it knows about each of us. If we embrace the broader scope of data and queries made possible by big data technologies, we’re on the path to exponential gains.

As evidence, I’ll point to the announcements by most big data vendors integrating search engines. Cloudera and Datastax integrated Solr. CouchDB and Hortonworks integrated Elasticsearch. MongoDB is building its own integrated search engine. As I’ve talked to these various vendors about these integrations, their answers surprised me. They don’t start off emphasizing keyword search or even faceted navigation. They start off emphasizing queries with good old-fashioned filters, like country=US and month=December and color=red, or color=blue and year=2012. While most popular big data key-value stores and columnar databases can do some filter queries, search engines are much more flexible at accommodating ad hoc queries with an unknown number of fields in no predetermined order. So good old field filtering, plus the side benefits of keyword search, faceted navigation, and so on, matters enough to the big data vendors that they’ve done the integration work for you and offer a bundled search engine.
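To make the filter example concrete, here is a minimal sketch of why an inverted index handles ad hoc filters so comfortably. It is a toy in-memory structure, not the Solr or Elasticsearch API; the FieldIndex class, its methods, and the sample documents are all hypothetical and exist only for illustration. Every field=value pair gets its own posting set, so a query can combine any number of fields in any order.

```python
from collections import defaultdict


class FieldIndex:
    """Toy inverted index (hypothetical): maps (field, value) to matching doc ids."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, value) -> set of doc ids
        self.docs = {}                    # doc id -> original document

    def add(self, doc_id, doc):
        """Index every field of the document; no schema is declared up front."""
        self.docs[doc_id] = doc
        for field, value in doc.items():
            self.postings[(field, value)].add(doc_id)

    def filter(self, **criteria):
        """Return sorted ids of documents matching every field=value pair given."""
        if not criteria:
            return sorted(self.docs)
        sets = [self.postings.get((f, v), set()) for f, v in criteria.items()]
        sets.sort(key=len)  # start from the smallest set to keep intersections cheap
        result = set(sets[0])
        for postings in sets[1:]:
            result &= postings
        return sorted(result)


# Made-up sample documents mirroring the filters mentioned in the text.
index = FieldIndex()
index.add(1, {"country": "US", "month": "December", "color": "red", "year": 2012})
index.add(2, {"country": "US", "month": "December", "color": "blue", "year": 2012})
index.add(3, {"country": "CA", "month": "December", "color": "red", "year": 2011})
index.add(4, {"country": "US", "month": "July", "color": "blue", "year": 2012})

print(index.filter(country="US", month="December", color="red"))  # [1]
print(index.filter(color="blue", year=2012))                      # [2, 4]
print(index.filter(year=2012, color="blue"))                      # same result, different field order
```

Because no field is privileged as the key, the second and third queries cost the same regardless of the order in which the fields are named. A key-value store, by contrast, typically needs the access pattern decided when the key is designed.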

My recommendation is to take advantage of these bundled search engines and study up on how to use them for filter queries, grouping and aggregations, and unstructured information analysis. If that sounds too overwhelming, consider tried-and-true data analysis tools like OLAP solutions or even Excel. I can hear the mob gathering to lynch me for that last sentence, but hear me out. Queries are the key, or in other words, data-driven decision making is the key to exponential gains. I submit that organizations using old technologies to achieve a new focus on data-driven decision making will outpace those using shiny new big data technologies just to deliver higher throughput. I further submit that those achieving exponential benefits are using big data technologies to incorporate new data and new queries into their laser focus on data-driven decision making.
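As a small illustration of the grouping and aggregation queries recommended above, the following sketch counts values of one field among records that pass a filter, which is roughly what a search engine's facet counts report. It uses only the Python standard library; the facet_counts function and the sample records are hypothetical and do not represent any vendor's API.

```python
from collections import Counter

# Made-up sample records for illustration only.
orders = [
    {"country": "US", "color": "red", "year": 2012},
    {"country": "US", "color": "blue", "year": 2012},
    {"country": "US", "color": "blue", "year": 2011},
    {"country": "CA", "color": "red", "year": 2012},
]


def facet_counts(records, field, **filters):
    """Count the values of `field` among records matching every filter."""
    matching = (
        r for r in records
        if all(r.get(k) == v for k, v in filters.items())
    )
    # Skip records that lack the field instead of failing on messy data.
    return Counter(r[field] for r in matching if field in r)


print(facet_counts(orders, "color", country="US"))  # Counter({'blue': 2, 'red': 1})
print(facet_counts(orders, "country", year=2012))   # Counter({'US': 2, 'CA': 1})
```

The same question could be answered with a pivot table in Excel or an OLAP cube, which is the point: the value comes from asking the question, not from the engine that answers it.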

About Avalon Consulting, LLC

Avalon Consulting, LLC implements Big Data, Web Presence, Content Publishing, and Enterprise Search solutions. We are the trusted partner to over one hundred clients, primarily Global 2000 companies, public agencies, and institutions of higher learning.

Headquartered in Plano, Texas, Avalon also maintains offices in Austin, Texas, Chicago, Illinois, and Washington, DC.
