Scalable Search Auto-Complete

For us search integrators, auto-complete is one of the most enjoyable problems to work on. But make no mistake, implementing search auto-complete that scales is hard.

Search auto-complete (aka instant search, find-as-you-type, look-ahead, predictive search, and many other names) is becoming more popular because, done right, it is great for end users. Performance is no small issue, though: it has to respond to *every keystroke* of every user, and if it doesn’t respond almost instantly, it still bogs down your servers without really helping end users. So a search auto-complete solution must be:

  1. responsive (sub 100 ms) – so users see responses as they type
  2. high throughput (5-10x your existing search traffic) – so it can handle every keystroke for every user

Our most recent auto-complete project was for one of the most recognizable names in the financial industry. Their implementation had to run auto-complete over more than 50,000 items. Auto-complete only works well if it’s *very fast* (less than 100 ms), and we also had to make it scalable enough for a very high traffic site. 50,000 items is too many to send to the web browser for my favorite approach, client-side auto-complete, so we had to use AJAX in this case. To make a long story short, we found that the most responsive, scalable, and flexible solution was to search strings in memory in the application layer (Java in this case). No, we didn’t go back to the search engine for each keystroke, because for this use case it was neither responsive nor scalable enough.
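To make the in-memory idea concrete, here is a minimal sketch of one way to do prefix matching in the application layer. It is not the code from that project: the class name PrefixSuggester, the sorted-set approach, and the sample city names are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical illustration: keeps all terms in a sorted set in memory
// and answers each keystroke with a range scan, never touching the search engine.
public class PrefixSuggester {
    private final TreeSet<String> terms = new TreeSet<>();

    public PrefixSuggester(Collection<String> items) {
        // Lowercase once at load time so matching is case-insensitive.
        for (String item : items) {
            terms.add(item.toLowerCase(Locale.ROOT));
        }
    }

    // Returns up to `limit` terms that start with what the user has typed so far.
    public List<String> suggest(String typed, int limit) {
        List<String> matches = new ArrayList<>();
        if (typed == null || typed.isEmpty() || limit <= 0) {
            return matches;
        }
        String prefix = typed.toLowerCase(Locale.ROOT);
        // In sorted order, every term sharing the prefix sits in one contiguous
        // run starting at the first term >= prefix, so we can stop at the first miss.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix) || matches.size() >= limit) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        PrefixSuggester suggester = new PrefixSuggester(
                List.of("Plano", "Paris", "Parma", "Austin", "Chicago"));
        System.out.println(suggester.suggest("pa", 10)); // prints [paris, parma]
    }
}
```

Each lookup costs a logarithmic seek plus the handful of matches returned, which is why this kind of structure stays fast at tens of thousands of items. A real implementation would likely also keep the original display form of each term and rank the matches; this sketch does neither.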

I keep looking closely at offerings from search vendors to see if they provide a more packaged solution than the Java approach we used, and I’m excited to report what I’ve found in MarkLogic.   Our enterprise web team has been heavy into MarkLogic work, and I decided to experiment with that platform.  I found a gem called search:suggest which has two things I like a lot:

  1. It provides an uncommonly great user experience, a first step toward a demo I’ve long looked at as the future of advanced auto-complete
  2. My testing shows it is *very responsive and scalable*

My results with 50 concurrent JMeter threads show:

  • 23 ms average response time with 411 ms max response time
  • 82 qps average throughput

I must say I’m impressed!

For another comparison point, I implemented auto-complete the way a large MarkLogic client showed me they were doing it, using cts:element query, with a wildcard (*) appended to each search item. It didn’t do nearly as well as search:suggest: average response times ran to several seconds, and max response times exceeded 11 seconds. So, no big surprise here . . . standard wildcard searches are expensive. Good thing MarkLogic offers us search:suggest.

These were quick tests, and I hope to explore this with much more rigor, but I had to share my initial findings. If you want more details about what I did, feel free to reach out. Briefly, here’s what I did: I pulled out an old JMeter configuration I’ve used in the past to test auto-complete. It’s based on a database culled from geonames.org, with a test script that matches one letter at a time for each city name, mimicking a user typing the city names. In all, I was matching over 20,000 items, with several keystrokes per item, and a configuration to *hammer* the system to test performance and maximum throughput.
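For readers who want to build a similar keystroke-by-keystroke query list, here is a rough sketch of how one might generate it. The class name KeystrokeQueries and the sample names are made up for illustration, and this is not the actual JMeter script; it only shows the idea of turning each city name into the sequence of partial inputs a typing user would send.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: expands names into the queries a user would trigger
// while typing them, one request per keystroke.
public class KeystrokeQueries {

    // Returns every non-empty prefix of `name`, shortest first.
    public static List<String> prefixesOf(String name) {
        List<String> prefixes = new ArrayList<>();
        for (int end = 1; end <= name.length(); end++) {
            prefixes.add(name.substring(0, end));
        }
        return prefixes;
    }

    // Concatenates the keystroke sequences for a whole list of names.
    public static List<String> queriesFor(List<String> names) {
        List<String> queries = new ArrayList<>();
        for (String name : names) {
            queries.addAll(prefixesOf(name));
        }
        return queries;
    }

    public static void main(String[] args) {
        System.out.println(prefixesOf("Plano")); // prints [P, Pl, Pla, Plan, Plano]
        System.out.println(queriesFor(List.of("Plano", "Austin")).size()); // prints 11
    }
}
```

The resulting list could be written to a file, one query per line, and fed to whatever load tool you use. The point is that a single item turns into several requests, which is where the extra traffic comes from.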

About Avalon Consulting, LLC

Avalon Consulting, LLC implements Big Data, Web Presence, Content Publishing, and Enterprise Search solutions. We are the trusted partner to over one hundred clients, primarily Global 2000 companies, public agencies, and institutions of higher learning.

Headquartered in Plano, Texas, Avalon also maintains offices in Austin, Texas, Chicago, Illinois, and Washington, DC.
