There is a flurry of announcements around horizontal and vertical search engines offering ‘better’ and ‘more relevant’ content retrieval. SEOmoz.org has an interesting article on the ability (or inability) of search engines to return relevant results. In this entire discussion of retrieving more relevant results, there are two seemingly related problems: ‘personalized search’ and ‘disambiguation (of search terms)’.
Personalized search is the ability to combine the search engine results with the user’s profile in order to return the most relevant search results. The user profile can be created by using the following information:
- User browsing history (searches, click-throughs, time spent on a site, etc.)
- Desktop data (files, content, most-accessed files)
- User feedback (asking users to rank order the results based on their preferences)
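To make the idea of combining results with a profile concrete, here is a minimal sketch in Python. It assumes the profile has already been boiled down to per-topic weights and that each result carries topic tags and a base score; the `Result` class, the `personalize` function, the `weight` parameter and all the numbers are hypothetical, and this is not how iGoogle or any real engine does it.

```python
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    topics: set          # topic tags attached to the result (assumed to exist)
    base_score: float    # relevance score before personalization (made up here)


def personalize(results, profile, weight=0.5):
    """Re-rank results by adding a profile-affinity bonus to each base score."""
    def score(result):
        # Sum the user's interest in every topic the result is tagged with.
        affinity = sum(profile.get(topic, 0.0) for topic in result.topics)
        return result.base_score + weight * affinity
    return sorted(results, key=score, reverse=True)


results = [
    Result("Jaguar cars official site", {"cars"}, 0.9),
    Result("Jaguar (Panthera onca) facts", {"wildlife"}, 0.8),
]

# Hypothetical profile built from browsing history, desktop data and feedback.
wildlife_profile = {"wildlife": 1.0, "photography": 0.6}

ranked = personalize(results, wildlife_profile)
print([r.title for r in ranked])
```

With this profile the wildlife page overtakes the car page; setting `weight=0` amounts to switching personalization off and falls back to the engine's original order.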
I tried searching for ‘jaguar’ using my iGoogle account. The results were mostly news about Tata Motors’ latest acquisition of the Land Rover and Jaguar brands from Ford. The sponsored links included Reliance Money. The results were biased towards Jaguar, the car.
iGoogle returns “related searches” for users to narrow down the search results. Privacy remains a concern with personalized search engines like iGoogle: many users would not like to share their browsing history and personal information (read off their personal computers) with the search engine. Another disadvantage of personalized search engines (and, in general, of many profile-based data mining techniques) is their inability to return relevant results when the user is searching for some new information.
I might have searched for wildlife all my life because I am a wildlife photographer. I have made tons of money selling scenic photographs and now I want to buy a Jaguar – the mean machine! A black swan for the search engine – I will have to turn off the personalization feature to get relevant results this time.
Disambiguation, on the other hand, tries to resolve the search terms into multiple ‘meanings’ and allows users to search across one or more of these concepts. The challenge here is the ability to understand all possible meanings, a harder problem for a horizontal search engine to solve. Vertical search engines work on a narrower data source with limited concept sources, so disambiguation is a plausible way to shake the sieve for relevant data. The figure below demonstrates a search for ‘jaguar’ using clusty.com:
The search results have been grouped into one or more clusters (cars, photos, clubs, cat panthera, etc.). Clusty uses a suite of text clustering technologies that allows it to identify and label clusters in real time. This is an example of post-search context disambiguation, where the search results are dynamically clustered to identify concepts. I personally prefer this approach to search engine optimization because it is easier for an uninformed user to narrow down the results through these concept groups.
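As a toy illustration of grouping results after the search has run, the sketch below buckets result snippets by hand-picked cue words. This is a simple keyword stand-in, not real text clustering and certainly not Clusty's technology, which identifies and labels clusters from the text itself. The `CONCEPT_CUES` table, the `group_results` function and the sample snippets are all made up for the example.

```python
from collections import defaultdict

# Hypothetical concept labels and cue words, chosen by hand for this example.
CONCEPT_CUES = {
    "cars": {"car", "xf", "dealer", "motors"},
    "cat (Panthera)": {"cat", "panthera", "rainforest", "predator"},
    "photos": {"photo", "photos", "gallery"},
}


def group_results(snippets):
    """Group result snippets under every concept whose cue words they contain."""
    groups = defaultdict(list)
    for snippet in snippets:
        words = set(snippet.lower().split())
        labels = [label for label, cues in CONCEPT_CUES.items() if words & cues]
        # Snippets that match no concept land in a catch-all group.
        for label in labels or ["other"]:
            groups[label].append(snippet)
    return dict(groups)


snippets = [
    "Jaguar XF car review",
    "Jaguar the big cat of the rainforest",
    "Jaguar photos gallery",
    "Jacksonville Jaguars schedule",
]

groups = group_results(snippets)
for label, items in groups.items():
    print(label, "->", items)
```

The user then picks a group instead of rewriting the query, which is the part of the experience I find useful.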
Other forms of disambiguation would entail providing possible interpretations of the search term and then retrieving results for the identified concepts. This is a tougher problem because it needs a more deterministic solution. How many interpretations can a given term have? Can we ascertain that the retrieved content expresses the same concept as the disambiguated one?
Post-search context disambiguation is my preferred approach for both horizontal and vertical search engines. As a (relatively) informed user I always modify my query if I don’t find useful results in the first few pages. A mix of personalization and clustering of search results is a good combination for relevant retrieval of search results.
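A rough sketch of this interpret-first flow, assuming a small hand-built sense inventory: the `SENSES` table and the `interpretations` and `expand_query` helpers are hypothetical. The fixed table is exactly where the hard questions above show up, since someone has to decide in advance how many senses a term has.

```python
# Hypothetical sense inventory: term -> sense label -> extra query words.
SENSES = {
    "jaguar": {
        "car": ["automobile", "dealer"],
        "animal": ["panthera", "wildlife"],
    },
}


def interpretations(term):
    """List the known senses of a term (empty if the term is not in the inventory)."""
    return sorted(SENSES.get(term.lower(), {}))


def expand_query(term, sense):
    """Build a narrower query for the sense the user picked."""
    extra = SENSES[term.lower()][sense]
    return " ".join([term, *extra])


print(interpretations("jaguar"))
print(expand_query("jaguar", "car"))
print(interpretations("puma"))  # not in the inventory, so no senses to offer
```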