Scalable Search Auto-Complete

For us search integrators, auto-complete is one of the most enjoyable problems we work on. But make no mistake: implementing scalable search auto-complete is hard.

Search auto-complete (aka instant search, find-as-you-type, look-ahead, predictive search, and many other names) is becoming more popular because, done right, it can be great for end users. Performance, however, is no small issue: auto-complete has to respond to *every keystroke* of every user, and if it doesn’t respond almost instantly, it still bogs down your servers without really helping end users. So a search auto-complete solution must be:

  1. responsive (sub-100 ms) – so users see responses as they type
  2. high throughput (5-10x your existing search traffic) – so it can handle every keystroke for every user

Our most recent auto-complete project was for one of the most recognizable names in the financial industry. Their implementation had to run auto-complete over more than 50,000 items, and while auto-complete only works well if it’s *very fast* (under 100 ms), we also had to make it scale for a very high-traffic site. 50,000 items is too much data to send to the web browser for my favorite approach, client-side auto-complete, so in this case we used AJAX. To make a long story short, we found that the most responsive, scalable, and flexible solution was to search strings in memory in the application layer (Java in this case). No, we didn’t go back to the search engine on each keystroke; for this use case, that was neither responsive nor scalable enough.
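The post doesn’t say which data structure our Java layer used, so the following is only a minimal sketch of one common way to do in-memory prefix matching: sort the entries once at startup, then answer each keystroke with a binary search for the first entry at or after the typed prefix, walking forward while entries still match. The class name `PrefixSuggester` and the `limit` parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Hypothetical in-memory prefix matcher; a sketch, not the production code. */
final class PrefixSuggester {
    private final String[] keys;      // lowercased match keys, sorted ascending
    private final String[] originals; // display strings, same order as keys

    PrefixSuggester(List<String> items) {
        // Sort once by lowercased form so matching is case-insensitive
        List<String> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparing((String s) -> s.toLowerCase(Locale.ROOT)));
        originals = sorted.toArray(new String[0]);
        keys = new String[originals.length];
        for (int i = 0; i < originals.length; i++) {
            keys[i] = originals[i].toLowerCase(Locale.ROOT);
        }
    }

    /** Index of the first key that is >= target (classic lower bound). */
    private int lowerBound(String target) {
        int lo = 0, hi = keys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (keys[mid].compareTo(target) < 0) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Up to `limit` items starting with `prefix`, in sorted order. */
    List<String> suggest(String prefix, int limit) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        // In sorted order, all keys with prefix p are contiguous and start at lowerBound(p)
        for (int i = lowerBound(p); i < keys.length && out.size() < limit && keys[i].startsWith(p); i++) {
            out.add(originals[i]);
        }
        return out;
    }
}
```

Each keystroke then costs O(log n) comparisons plus at most `limit` copies, independent of how many items share the prefix, which is why this style of lookup stays fast at tens of thousands of entries.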

I keep looking closely at offerings from search vendors to see whether any provide a more packaged solution than the Java approach we used, and I’m excited to report what I’ve found in MarkLogic. Our enterprise web team has been doing a lot of MarkLogic work, so I decided to experiment with that platform. I found a gem called search:suggest, which has two things I like a lot:

  1. It provides an unusually good user experience, heading down the path of a demo I’ve long seen as the future of advanced auto-complete
  2. My testing shows it is *very responsive and scalable*

My results with 50 concurrent JMeter threads show:

  • 23 ms average response time with 411 ms max response time
  • 82 qps average throughput
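For readers less familiar with load-test reports, here is a rough sketch of how figures like these are typically derived from raw timed samples: average and max come from each request’s elapsed time, while throughput is the sample count divided by the wall-clock span from the first request’s start to the last request’s end (the way JMeter’s summary report defines it). The `LoadSample` and `LoadSummary` types are hypothetical.

```java
import java.util.List;

/** One timed request, e.g. a row from a results file (hypothetical shape). */
record LoadSample(long startMillis, long elapsedMillis) {
    long endMillis() { return startMillis + elapsedMillis; }
}

/** Average/max response time and throughput over a set of samples. */
record LoadSummary(double avgMillis, long maxMillis, double throughputPerSec) {

    static LoadSummary of(List<LoadSample> samples) {
        if (samples.isEmpty()) throw new IllegalArgumentException("no samples");
        long total = 0, max = 0;
        long firstStart = Long.MAX_VALUE, lastEnd = Long.MIN_VALUE;
        for (LoadSample s : samples) {
            total += s.elapsedMillis();
            max = Math.max(max, s.elapsedMillis());
            firstStart = Math.min(firstStart, s.startMillis());
            lastEnd = Math.max(lastEnd, s.endMillis());
        }
        // Wall-clock span of the whole run; clamp to 1 ms to avoid dividing by zero
        long spanMillis = Math.max(1, lastEnd - firstStart);
        double throughput = samples.size() / (spanMillis / 1000.0);
        return new LoadSummary((double) total / samples.size(), max, throughput);
    }
}
```

Note that the average hides the tail: a run can average 23 ms while a few requests, like the 411 ms max above, take far longer.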

I must say I’m impressed!

For another comparison point, I implemented auto-complete the way a large MarkLogic client showed me they were doing it: a cts:element query with a wildcard (*) appended to each search term. It didn’t do nearly as well as search:suggest, with average response times of several seconds and max response times over 11 seconds. So, no big surprise here . . . standard wildcard searches are expensive. Good thing MarkLogic offers us search:suggest.

These were quick tests, and I hope to explore this with much more rigor, but I had to share my initial findings. If you want more details about what I did, feel free to reach out. Briefly, here’s what I did: I pulled out an old JMeter configuration I’ve used in the past to test auto-complete. It’s based on a database culled from geonames.org, with a test script that matches one letter at a time for each city name, mimicking a user typing the city names. In other words, I was matching over 20,000 items, with several keystrokes per item, using a configuration designed to *hammer* the system to test performance and maximum throughput.
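The JMeter configuration itself isn’t shown here, but the request sequence it replays is easy to picture. As a sketch under assumptions, the one-letter-at-a-time queries for a list of city names could be generated like this and then, for example, written to a file that JMeter’s CSV Data Set Config reads. The `KeystrokeQueries` class and the `minChars` parameter (for UIs that wait for a few characters before querying) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical generator for "user typing" auto-complete test queries. */
final class KeystrokeQueries {

    /** Prefixes a user produces typing `name` one character at a time, starting at `minChars`. */
    static List<String> prefixesOf(String name, int minChars) {
        List<String> out = new ArrayList<>();
        // Count code points, not chars, so non-BMP characters are never split
        int count = name.codePointCount(0, name.length());
        for (int n = Math.max(1, minChars); n <= count; n++) {
            out.add(name.substring(0, name.offsetByCodePoints(0, n)));
        }
        return out;
    }

    /** Concatenates the keystroke sequences for every city name. */
    static List<String> buildScript(List<String> cityNames, int minChars) {
        List<String> script = new ArrayList<>();
        for (String city : cityNames) {
            script.addAll(prefixesOf(city, minChars));
        }
        return script;
    }
}
```

With one request per keystroke, the number of queries grows with the total length of all names, not just the item count, which is what makes this kind of test so demanding.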

About Sam Mefford

Sam contends that the next information revolution will be built on Search: not web search, but Enterprise Search. His current role as Enterprise Search Architect and Practice Lead for the Enterprise Search team at Avalon Consulting, LLC not only puts Sam in contact with the engineers, product managers, and CEOs driving innovation at leading Enterprise Search vendors, but also, at the other end of the spectrum, lets him meet face-to-face with end users of Enterprise Search to understand their requirements and educate them on what is possible with modern Search technology. His background as a hard-core developer and his passion for great user experiences put him in a unique position to cut through the hype and promote the technologies that really make a difference for end users, such as Guided Navigation, Entity Extraction, Spotlighting, Virtual Documents, etc.
