Mashable just published an interesting article about Google’s latest advancements in search. The article, “Google Knowledge Graph Could Change Search Forever,” describes a “developing vision for search that takes it beyond mere words and into the world of entities, attributes and the relationship between those entities.”
Google is building a huge knowledge base describing entities, their attributes and the relationships between them. Google’s vision is to make search smarter by identifying the people, places and things in the content that is being indexed. Once Google understands what is in the content, the engine can recommend related information and present more relevant results. To do this, Google is evolving from using statistical algorithms, thesauri and keyword matching to capturing over 200 million entities and their associated relationships.
For example, if you search on Jeremy Lin, Google will understand that you are looking for a basketball player in New York who played for Harvard. Proper results will include New York Knicks box scores and recent basketball news from the NBA, New York Knicks and Harvard. This is a lot more powerful than a list of links to articles containing the words “Jeremy Lin.”
Can we make Enterprise Search (search on your websites) just as smart? We need to. Even well-developed search, using facets and other techniques, is not satisfying the needs of your customers, stakeholders, and employees.
I believe entity-based search is a coming trend and I am not alone in this thought. After posting the story about Google on Avalon’s internal collaboration site, my peers responded with positive and informative comments.
When I first saw the description, I thought it sounded like a step toward RDBMS, but this is more like eduction on steroids. They are not only extracting entities but also deciding how those entities relate to one another. I thought that a leading search engine vendor’s entity extraction with eduction was impressive, but this goes far beyond that.
The mention of Freebase is significant, because Freebase is (was?) a Semantic Web database. In essence, you take 12 million entities and decompose all of the facts known about them into collections of assertions, some of them class relations:
person:BarackObama politics:holdsOffice office:PresidentOfTheUnitedStates.
person:BarackObama potus:predecessor person:GeorgeBush.
person:BarackObama politics:affiliation politics:USDemocraticParty.
person:MichelleObama relationship:spouseOf person:BarackObama.
person:MichelleObama educated:matriculatedFrom school:UniversityOfChicago.
some of them value relationships or labels:
person:BarackObama potus:firstTermStart 2008.
person:BarackObama potus:firstTermEnd 2012.
person:BarackObama foaf:FirstName “Barack”.
With several hundred million such assertions you can then ask generalized questions such as “give me all Democratic presidents who were preceded by Republican presidents in the twenty-first century, and whose wives matriculated at the University of Chicago.”
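To make the idea concrete, here is a minimal sketch of that kind of question asked over a handful of assertions. It is not how Freebase or Google stores or queries data; it simply treats each assertion as a (subject, predicate, object) tuple in a Python set. The facts are copied as-is from the list above for illustration, and the party affiliation for person:GeorgeBush is an assumption added so the query has something to match. The helper names (objects, subjects, matching_presidents) are hypothetical.

```python
# Each assertion is a (subject, predicate, object) triple, copied from the list above.
TRIPLES = {
    ("person:BarackObama", "politics:holdsOffice", "office:PresidentOfTheUnitedStates"),
    ("person:BarackObama", "potus:predecessor", "person:GeorgeBush"),
    ("person:BarackObama", "politics:affiliation", "politics:USDemocraticParty"),
    ("person:MichelleObama", "relationship:spouseOf", "person:BarackObama"),
    ("person:MichelleObama", "educated:matriculatedFrom", "school:UniversityOfChicago"),
    ("person:BarackObama", "potus:firstTermStart", 2008),
    # Assumption: this assertion is not in the original list.
    ("person:GeorgeBush", "politics:affiliation", "politics:USRepublicanParty"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is asserted."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is asserted."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def matching_presidents():
    """Democratic presidents with a Republican predecessor, a first term
    starting in the twenty-first century, and a spouse who matriculated
    from the University of Chicago."""
    results = []
    for president in subjects("politics:holdsOffice", "office:PresidentOfTheUnitedStates"):
        is_democrat = "politics:USDemocraticParty" in objects(president, "politics:affiliation")
        this_century = any(
            isinstance(year, int) and year >= 2001
            for year in objects(president, "potus:firstTermStart")
        )
        republican_predecessor = any(
            "politics:USRepublicanParty" in objects(pred, "politics:affiliation")
            for pred in objects(president, "potus:predecessor")
        )
        spouse_from_chicago = any(
            "school:UniversityOfChicago" in objects(spouse, "educated:matriculatedFrom")
            for spouse in subjects("relationship:spouseOf", president)
        )
        if is_democrat and this_century and republican_predecessor and spouse_from_chicago:
            results.append(president)
    return sorted(results)


print(matching_presidents())
```

Notice that the query hops across three different entities (the president, the predecessor and the spouse). That hopping is what a keyword index cannot do, and a real system would express it in a graph query language rather than in hand-written loops.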
The resulting datasets in turn form graphs, which don’t necessarily correspond to the tree structures that you would expect from XML groves.
Bing has been going down this road for a couple of years; it started out as a semantic database rather than simply a lexical search base. This matters because it is more than data enrichment, which is still largely lexical in nature. It also establishes relationships between entities within the datasets. That is significant because the same entity may have multiple names or designators, and a system that can identify sameAs relationships can make inferences across them, which is what makes the database intelligent.
This is part of the reason why it’s important not to lose sight of this aspect of Big Data. Hadoop/MapReduce is significant because it provides a standardized way of performing parallel processing across commodity hardware. Semantic Web is important because it provides a way to make the resulting data internally aware and referential.
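The sameAs point can be sketched in a few lines. This is an illustration only, not a description of how any search engine implements it: the alias names below are hypothetical, and the merging technique is a plain union-find over the sameAs pairs. Once the aliases are merged, a fact asserted under one name becomes visible under all of them.

```python
# Hypothetical aliases for one entity, linked by sameAs assertions.
SAME_AS = [
    ("person:BarackObama", "person:BarackHObama"),
    ("person:BarackHObama", "person:President44"),
]

# Facts recorded under different names for the same entity.
FACTS = [
    ("person:BarackObama", "politics:affiliation", "politics:USDemocraticParty"),
    ("person:President44", "potus:predecessor", "person:GeorgeBush"),
]


def build_canonical(pairs):
    """Return a function that maps any name to its canonical representative."""
    parent = {}

    def find(name):
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Pick the alphabetically smaller name as the representative.
            parent[max(root_a, root_b)] = min(root_a, root_b)
    return find


def facts_about(name):
    """All (predicate, object) pairs known about name under any of its aliases."""
    find = build_canonical(SAME_AS)
    target = find(name)
    return sorted((p, o) for s, p, o in FACTS if find(s) == target)


print(facts_about("person:BarackHObama"))
```

No fact was ever recorded under person:BarackHObama, yet asking about that name returns both facts, because the two sameAs links tie all three names together.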
Interesting. It seems like this has the potential to close (eliminate?) the gap between two very different types of search: Discovery vs Targeted. That difference is a big reason why Google search works so well and why so many firms have terrible Enterprise search. As the article said, I can Google “the 10 deepest lakes in the U.S.” and get decent results because Google indexes so much content that its “cross our fingers and hope someone on the web has written about these things or topics” approach works.
But in the Enterprise, people’s searches are often much more targeted. On our Client X call today, we went through some very specific use cases where users needed to locate a very specific regulatory filing document. Here is one: “I’ve received a question from the agency regarding the US NDO sBLA submission and need to find the original submission information to answer it.” That is a different problem to solve. Because our engine doesn’t have the AI this article talks about, our solution is to leverage the real intelligence of the user’s mind by giving them the ability to manipulate and filter the results using a combination of facets, keywords and sorting. And it works pretty well, much better than their current solution.
The approach this article takes would theoretically make it possible to cut right to the chase based on the original query. Pretty cool if they can get it to work. The question is when something like that will be available, and affordable, to businesses.
Mike is asking the right question. Your search doesn’t need to cover every possible query, as Google’s does. Instead, I believe most enterprises can enrich their search by adding as few as 200 relevant entities and relationships. An intranet or public site search made intelligent in this way can deliver results that are specifically relevant to your users.
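Here is a rough sketch of what a small entity catalog could look like in practice. Everything in it is an assumption made for illustration: the catalog entries, the field names (doc_type, product) and the interpret function are hypothetical, and the matching is naive single-word lookup. The point is only that recognized entities become filters while the remaining words stay ordinary keywords.

```python
import re

# Hypothetical catalog: a real one would hold the couple of hundred entities
# that matter to your users. Field names and values are illustrative only.
ENTITIES = {
    "sbla": {"type": "submission_type", "filters": {"doc_type": "regulatory filing"}},
    "ndo": {"type": "product", "filters": {"product": "NDO"}},
}


def interpret(query):
    """Split a query into entity-driven filters and leftover keywords."""
    filters, keywords = {}, []
    for token in re.findall(r"[A-Za-z0-9]+", query):
        entity = ENTITIES.get(token.lower())
        if entity:
            filters.update(entity["filters"])
        else:
            keywords.append(token.lower())
    return {"filters": filters, "keywords": keywords}


print(interpret("NDO sBLA original submission"))
```

With a catalog like this, a query such as the one from the Client X call could be narrowed to the right document type and product before the user touches a single facet.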
I’m excited about this new trend in search. Let me know if your organization has started exploring this approach. Avalon has helped dozens of organizations improve the usability and effectiveness of their search experience. We are ready to help you take the next step.