Posts Tagged ‘google’

Is MarkLogic a Search Engine?

Monday, September 26th, 2011

I am frequently asked if MarkLogic is really a search engine.  It is easy to debate whether MarkLogic fits the classic definition of a search engine.  In my opinion, this is the wrong question.  The question you should be asking is “Does MarkLogic enable great search experiences?”  The answer is undeniably Yes.

MarkLogic comes with all of the standard search capabilities: keyword search, synonyms, fuzzy search, hit highlighting, sorting, faceted navigation and relevance ranking.  These are the basic features that every search engine should have.  MarkLogic checks the box on every one of these and more.

The fact that MarkLogic can do all of the basics makes it just like all of the other search engines on the market.  What sets MarkLogic apart is that it is not just a search engine.  MarkLogic combines some of the best features of search with a fast performing XML database.  This combination allows MarkLogic to offer features that traditional search engines lack.  Four of the most important differentiators are:

  • multi-level searching,
  • editable search results,
  • schema flexibility,
  • and simplified architectures.

MarkLogic allows for multi-level searching.  Most search engines require you to flatten out the data for search results.  MarkLogic is an XML database.  As a result, information can be stored in a hierarchical format and queried at multiple levels.  This is particularly important in more complex search experiences.  For example, if you are searching large documents, you may want to show the documents that contain your search term along with the sections of the documents that have that term.  Normal search engines would require you to create multiple collections or a complex search screen.  MarkLogic handles these situations naturally.
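To make the document-plus-section idea concrete, here is a minimal sketch in plain JavaScript.  It is not MarkLogic code: the nested objects stand in for hierarchical XML documents, and the names `documents` and `searchWithSections` are hypothetical, chosen only for this illustration.

```javascript
// Hypothetical data: nested objects stand in for hierarchical XML documents.
const documents = [
  {
    title: "Search Platform Guide",
    sections: [
      { heading: "Indexing", text: "How content is indexed and stored." },
      { heading: "Facets", text: "Faceted navigation narrows a result set." },
    ],
  },
  {
    title: "Deployment Notes",
    sections: [
      { heading: "Replication", text: "Replication keeps copies in sync." },
    ],
  },
];

// Return each document that contains the term, together with the
// headings of the sections in which the term appears.
function searchWithSections(docs, term) {
  const needle = term.toLowerCase();
  return docs
    .map((doc) => ({
      title: doc.title,
      sections: doc.sections
        .filter((s) => `${s.heading} ${s.text}`.toLowerCase().includes(needle))
        .map((s) => s.heading),
    }))
    .filter((hit) => hit.sections.length > 0);
}

console.log(searchWithSections(documents, "facet"));
```

Because the hierarchy is kept intact, one pass answers both questions: which documents match, and where inside each document the match occurs.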

MarkLogic’s database features allow you to create applications with editable search results.  Our architects call it a “Live” search tool as opposed to a “read only” search tool.  Traditional search engines are designed to be read only.  Edits to existing search data require re-indexing.  Solution providers like Avalon create special indexing routines to allow for updates to content.  These solutions are not real-time and they are not simple.  Fields can be updated or added to a MarkLogic database at any time, transactionally, with full ACID protection.  This flexibility allows us to create a number of really interesting search applications that would have been much more difficult with standard search engines.  For example, we have created tools that allow end-users or administrators to “tag” one or more search results (similar to the functionality in Flickr).  In other applications, we have created search screens where the users can edit the search results without leaving the screen.  Adding these cool features to our search applications is much easier with a combined database and search engine.

As an XML database, MarkLogic provides schema flexibility for storing and querying information.  Our developers and our clients love MarkLogic because it is easy to add new fields to the index.  Traditional search engines typically require administrators to delete and reload the data in order to add specific fields.  In extreme cases you have to re-index an entire data set.  MarkLogic’s schema flexibility becomes even more important when you are working with techniques like entity extraction.  Text Analytics tools can identify people, places and things within unstructured text.  Through this process our clients often find interesting things they want to include in their search applications.  MarkLogic makes it easy to run text analytics against unstructured documents and include the entities in the search results.  Traditional search engines add a great deal of complexity to the process and do not allow for changing structures.

Our architects like MarkLogic because of its simplified architecture.  The next time you meet with your search engine vendor, ask them for a physical architecture diagram of one of their larger implementations.  At a minimum you will have a database or file system to store documents and data, a search indexer, a search server, and a web server.  Large data sets get even more complicated.  Search results have to be clustered and replicated.  You will need multiple indexers and search servers running.  You will also likely need more than one web server and application server for your front end application.  MarkLogic is a database server, search engine and application server in one tool.  It also has built-in replication.  This means fewer servers and less complexity in your dev, test and prod environments.

One final reason to use MarkLogic to power your search applications is that MarkLogic is not just a search engine.  Traditional search engines are very powerful, but they are expensive and limited to search-based use cases.

  • Want to publish thousands of documents to your website or mobile devices?  Some of the largest publishers in the world use MarkLogic to do this every day.
  • Want to build an application that allows users to build reports on the fly by combining sections from other documents?  Those same publishers use MarkLogic to offer custom publishing solutions.
  • Want to create a central repository tracking all of your digital assets?  We are working with three different customers using MarkLogic as a central repository across all of their content management systems.
  • Do you need a tool to capture unstructured information for your Big Data solution?  MarkLogic does this for numerous government customers.

At the end of the day, when your management asks you how much you spent on your search solution, it is nice to say that the tool you bought does more than just search.

In fairness, MarkLogic may not be the best solution for an organization that is looking to build a vanilla search intranet that indexes content from numerous secure repositories.   Search engines like Endeca, Autonomy, Vivisimo and Lucene/Solr were designed for these types of solutions.  If, however, you need to build a powerful search application that will change over time, MarkLogic is a great choice.  It offers many valuable features that are not available in any other search engine.

A Wake-up Call: Google, support Faceted Search!

Monday, January 11th, 2010

I believe confusion persists about whether Google’s search appliance truly supports faceted search.  First I’ll point out that Google Labs has two projects which claim to add faceted search features to the GSA (more details on those below).  Those projects probably work for some, and they looked tempting when we first found them.  As we dug deeper, though, we found they are what I’ll call “bolt-on” faceted search, which simply isn’t scalable for most of our clients.  We’ve only seen two approaches to “bolt-on” faceted search when a search engine doesn’t provide native support:

  1. Run a separate query for each facet value – this multiplies the number of queries by the number of facet values, adding significantly more search engine traffic, usually too much more
  2. Pull a large set of results and calculate facets on that set – when there are too many matches to reasonably pull into the “bolt-on” code, this approach cannot provide an accurate list of facets, or accurate matching results for each facet, across the entire result set

In both cases the core problem is that the search engine doesn’t offer native support for faceted search.  Faceted Search is one of those features that simply must be supported by the engine in order to provide accurate, scalable facets for high-volume or high-traffic enterprise search implementations.
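The accuracy problem with the second approach is easy to reproduce.  The sketch below is illustrative only: the result set, the `type` field, and the `countFacet` helper are all made up for this example, and the code does not call any search engine.  It simply counts facet values over the first 100 results and compares that with a count over every match.

```javascript
// Hypothetical result set: 1,000 matches in relevance order, each tagged
// with one facet value. "pdf" results happen to rank first, "html" last.
const results = [];
for (let i = 0; i < 1000; i++) {
  const type = i < 150 ? "pdf" : i < 600 ? "doc" : "html";
  results.push({ id: i, type });
}

// Count facet values over whatever slice of the results is supplied.
function countFacet(hits, field) {
  const counts = {};
  for (const hit of hits) {
    counts[hit[field]] = (counts[hit[field]] || 0) + 1;
  }
  return counts;
}

// Approach 2: only the first 100 results are pulled into the bolt-on code.
const sampled = countFacet(results.slice(0, 100), "type");
// What counting across the entire result set would report.
const full = countFacet(results, "type");

console.log(sampled); // { pdf: 100 }
console.log(full);    // { pdf: 150, doc: 450, html: 400 }
```

In this made-up data the sampled facets miss two of the three values entirely and undercount the third, even though the two missing values account for 850 of the 1,000 matches.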

So the simple fact is: as of version 6, Google Search Appliance does not offer native support for faceted search.  As I said, there are two “bolt-on” solutions which we can’t recommend, and here’s why:

  1. gsa-faceted-search takes approach #1 above, running a query on the search engine once for each facet value. You can see this in code-snippet-1.1.txt on lines 75 and 79 (excerpted below). For the example they provide, that would multiply your search engine traffic to 11x whatever it was before adding this feature. For our customers it’s not unusual to have over 50 facet values, so that would require the search engine to handle 50x the normal traffic.

    41:    for (var i in facetDefinition) {
    . . .
    75:        xmlDoc.load(countURL);
    . . .
    79:            xmlhttp.open("GET", countURL, false);

  2. GSA Lab’s parametric project takes approach #2 above (Pull a large set of results and calculate facets on that set).  If you look at googleParametric.js, you’ll see that line 168 (excerpted below) only requests the first 100 results to calculate its values, so you’ll never see a facet value unless it’s attached to one of the first 100 results, and the counts for each facet are not accurate because they can only know how many of the first 100 results matched that facet.

    168:   var url = "http://" + mTGSAHost + "/search?" + mTURL + "&num=100";

This problem is not unique to Google.  I’ve heard many Ultraseek implementers brag about their success with creative ways to “bolt-on” faceted search.  But every time I’ve dug deeper, I’ve found one of the two approaches listed above, either not scalable or not accurate.  Luckily, Ultraseek 6 added IDOL (which has long had native support for faceted search) under the hood, so Ultraseek customers can now upgrade and have scalable and accurate faceted search.  We’ve helped many customers through this process, and have been very pleased with the outcome.

Why Google has not yet woken up to faceted search, I cannot explain . . . for more hand-wringing on that topic, see Daniel Takenlung’s post.  Hopefully for their tens of thousands of Google Search Appliance customers, Google will resolve this issue soon.

Does Ebay beat Google at its own Game?

Tuesday, October 7th, 2008

I just read a New York Times article recommending we use Google for everything including searching Ebay, and while I can see the reasoning, I can see at least one critical feature Google must add before I can recommend it as the ideal search experience: Faceted Search.

When our Enterprise Search clients think of improving the search experience for their users, they want to learn from the best, and frequently they look first to get great ideas from Google. Nobody questions that Google is the King of Web Search. Google has captured our hearts and our clicks, and profited tremendously. But is the Google search experience the one to copy for Enterprise Search?

I find it very interesting that Ebay has Faceted Search, yet Google still doesn’t. Why does Google omit this feature? I can see many reasons…but then I could see many reasons they omitted suggest as a default feature, and they surprised me last month and added it to their default search box. So Google may soon surprise us and add Faceted Search, but for now I will keep telling clients that the Ebay search experience is a better one to copy than Google’s.

The first step on Ebay is what we all expect: enter a few keywords, and get back the best results the engine could find based on those keywords. The second step on Ebay is not found on Google: click the most appropriate category link on the left to narrow results to what you were really looking for (box A below).

You’ll also notice several other very handy ways to filter there on the left: by price, new/used, auction/buy it now (box B above). But these filters apply to any product on Ebay. Now comes my favorite feature of all: when you click on the “Laptops & Notebooks” category, you get filters especially helpful for narrowing to exactly the laptop or notebook you’re looking for, like screen size, processor speed, and memory:

The fundamental difference here is that Ebay is working with more structured information, whereas Google is working with mostly unstructured information. The better the structured information, the better the Faceted Search experience. Of course, Google is expert at taking unstructured information and extracting structure, as best shown by the amazing features on Google Maps’ Local Search. If only they would do the same for Froogle, it could quickly be as good as, or better than, the search experience on Ebay.
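As a rough illustration of why structure matters, the sketch below derives category-specific filters directly from structured attributes.  Everything in it is assumed for the example: the listings, the attribute names, and the `categoryFacets` helper are hypothetical and do not reflect Ebay’s actual data or code.

```javascript
// Hypothetical listings; attribute names are illustrative, not Ebay's schema.
const listings = [
  { category: "Laptops & Notebooks", price: 450, attrs: { screenSize: "15 in", memory: "4 GB" } },
  { category: "Laptops & Notebooks", price: 700, attrs: { screenSize: "13 in", memory: "4 GB" } },
  { category: "Laptop Bags", price: 30, attrs: { color: "black" } },
];

// Build facet counts from the structured attributes of one category only,
// so the filters offered are specific to that category.
function categoryFacets(items, category) {
  const facets = {};
  for (const item of items.filter((it) => it.category === category)) {
    for (const [name, value] of Object.entries(item.attrs)) {
      facets[name] = facets[name] || {};
      facets[name][value] = (facets[name][value] || 0) + 1;
    }
  }
  return facets;
}

console.log(categoryFacets(listings, "Laptops & Notebooks"));
```

With attributes like these already attached to each item, the category-specific filters fall out of a simple count.  With unstructured text, the attributes would have to be extracted first.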

I know the search experience on Ebay isn’t perfect, none is…but with the powerful Faceted Search offered by Ebay, we have a much higher chance of finding what we’re looking for. Don’t your users deserve the same? Next time you’re contemplating improvements to one of your organization’s search interfaces, don’t forget to include Faceted Search.