A Wake-up Call: Google, support Faceted Search!

I believe confusion persists about whether the Google Search Appliance (GSA) truly supports faceted search. First I’ll point out that Google Labs has two projects which claim to add faceted search features to the GSA (more details on those below). Those projects probably work for some, and they looked tempting when we first found them. As we dug deeper, though, we found they are what I’ll call “bolt-on” faceted search, which simply isn’t scalable for most of our clients. When a search engine doesn’t provide native support, we’ve only seen two approaches to “bolt-on” faceted search:

  1. Run a separate query for each facet value: every user search now costs one additional query per facet value, which adds significantly more search engine traffic, usually too much more.
  2. Pull a large set of results and calculate facets on that set: when there are too many matches to reasonably pull into the “bolt-on” code, the calculation only sees part of the result set, so it cannot provide an accurate list of facets or accurate counts for each facet across the entire result set.
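To make the trade-off concrete, here is a minimal sketch of both approaches against a toy in-memory "engine". Everything in it (the `docs` array, the `search` function, the `type` facet, the document counts) is a hypothetical stand-in chosen for illustration and not part of any GSA API. It only shows how the query count grows in approach 1 and how facet values go missing in approach 2.

```javascript
// Toy corpus: 1000 documents that all match the term "report".
// In ranked order the first 100 are PDFs, so a top-100 sample sees nothing else.
const docs = [];
for (let i = 0; i < 1000; i++) {
  const type = i < 100 ? "pdf" : (i % 2 === 0 ? "html" : "doc");
  docs.push({ id: i, text: "report", type });
}

let queryCount = 0;

// Simulates one round trip to the search engine (hypothetical interface).
function search(term, { type = null, limit = Infinity } = {}) {
  queryCount++;
  const matches = docs.filter(
    d => d.text === term && (type === null || d.type === type)
  );
  return { total: matches.length, results: matches.slice(0, limit) };
}

// Approach 1: one extra query per facet value. Accurate, but traffic multiplies.
function facetsByQueryPerValue(term, facetValues) {
  const counts = {};
  for (const value of facetValues) {
    counts[value] = search(term, { type: value }).total;
  }
  return counts;
}

// Approach 2: count facets only within the first `sampleSize` results.
function facetsBySample(term, sampleSize) {
  const counts = {};
  for (const d of search(term, { limit: sampleSize }).results) {
    counts[d.type] = (counts[d.type] || 0) + 1;
  }
  return counts;
}

queryCount = 0;
search("report", { limit: 10 }); // the user's original query
const perValue = facetsByQueryPerValue("report", ["pdf", "html", "doc"]);
const queriesForApproach1 = queryCount; // 1 + number of facet values

queryCount = 0;
const sampled = facetsBySample("report", 100);
const queriesForApproach2 = queryCount;

console.log(perValue, queriesForApproach1);
console.log(sampled, queriesForApproach2);
```

In this toy setup, approach 1 returns correct counts but issues four queries for a single user search with three facet values. Approach 2 issues one query but reports only the "pdf" facet, because no "html" or "doc" document appears in the first 100 results.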

In both cases the core problem is that the search engine doesn’t offer native support for faceted search.  Faceted Search is one of those features that simply must be supported by the engine in order to provide accurate, scalable facets for high-volume or high-traffic enterprise search implementations.
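By contrast, an engine with native support can tally facet values while it is already walking the full set of matches, so one query returns both a page of results and exact counts. The sketch below illustrates that idea on a small in-memory array. The `searchWithFacets` function, the sample documents, and the field names are assumptions made for illustration. Real engines work from index structures rather than a linear scan, and nothing here reflects how any particular product implements it.

```javascript
// Hypothetical single-pass search: collects one page of results and
// exact facet counts over every matching document in the same loop.
function searchWithFacets(index, term, facetFields, pageSize) {
  const facets = {};
  for (const field of facetFields) facets[field] = {};
  const page = [];
  let total = 0;

  for (const doc of index) {
    if (!doc.text.includes(term)) continue; // not a match
    total++;
    if (page.length < pageSize) page.push(doc.id); // only the first page is returned
    for (const field of facetFields) {
      const value = doc[field];
      facets[field][value] = (facets[field][value] || 0) + 1; // but every match is counted
    }
  }
  return { total, page, facets };
}

// Illustrative data only.
const index = [
  { id: 1, text: "annual report", type: "pdf", year: 2008 },
  { id: 2, text: "sales report", type: "html", year: 2009 },
  { id: 3, text: "meeting notes", type: "doc", year: 2009 },
  { id: 4, text: "audit report", type: "pdf", year: 2009 },
];

const response = searchWithFacets(index, "report", ["type", "year"], 2);
console.log(JSON.stringify(response));
```

The page holds only two document IDs, yet the facet counts cover all three matches. That combination is what neither bolt-on approach can deliver without extra queries or a truncated sample.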

So the simple fact is: as of version 6, Google Search Appliance does not offer native support for faceted search.  As I said, there are two “bolt-on” solutions which we can’t recommend, and here’s why:

  1. gsa-faceted-search takes approach #1 above, running one query against the search engine for each facet value. You can see this in code-snippet-1.1.txt on lines 75 and 79. For the example they provide, that would multiply your search engine traffic by 11. For our customers it’s not unusual to have over 50 facet values, so the search engine would have to handle 50 times its normal traffic.

         41:    for (var i in facetDefinition) {
                . . .
         75:        xmlDoc.load(countURL);
                . . .
         79:            xmlhttp.open("GET", countURL, false);

  2. GSA Lab’s parametric project takes approach #2 above (pull a large set of results and calculate facets on that set). If you look at googleParametric.js, you’ll see that line 168 only requests the first 100 results to calculate its values. You’ll never see a facet value unless it’s attached to one of the first 100 results, and the counts for each facet are not accurate because the code can only know how many of the first 100 results matched that facet.

        168:   var url = "http://" + mTGSAHost + "/search?" + mTURL + "&num=100";

This problem is not unique to Google. I’ve heard many Ultraseek implementers brag about their success with creative ways to “bolt-on” faceted search. But every time I’ve dug deeper, I’ve found one of the two approaches listed above, which means the result is either not scalable or not accurate. Luckily, Ultraseek 6 added IDOL (which has long had native support for faceted search) under the hood, so Ultraseek customers can now upgrade and have scalable and accurate faceted search. We’ve helped many customers through this process, and have been very pleased with the outcome.

Why Google has not yet woken up to faceted search, I cannot explain . . . for more hand-wringing on that topic, see Daniel Takenlung’s post.  Hopefully for their tens of thousands of Google Search Appliance customers, Google will resolve this issue soon.

About Avalon Consulting, LLC

Avalon Consulting, LLC implements Big Data, Web Presence, Content Publishing, and Enterprise Search solutions. We are the trusted partner to over one hundred clients, primarily Global 2000 companies, public agencies, and institutions of higher learning.

Headquartered in Plano, Texas, Avalon also maintains offices in Austin, Texas, Chicago, Illinois, and Washington, DC.
