Semantic vector search is a topic that’s getting a lot of attention, and many search developers are wondering if, and how, they should add it to their search solution.
In a recent Haystack LIVE! Meetup titled Evolving from Keyword to Neural Search, Branden Chan of Deepset presented their Haystack NLP framework and was asked what approach he would recommend for adding semantic vector search to a platform based on bag-of-words (keyword) search.
In this post, I will review Branden’s recommendation and suggest a way to implement it.
Traditional search engines, like Elasticsearch, rely heavily on a bag-of-words (keyword) approach that uses an inverted index along with TF-IDF or BM25 ranking functions. A sparse vector, based on literal keyword matching, is used to find similar/relevant information.
As we discussed in a previous blog post, this keyword approach is limited in how well it captures user intent. For example, it can miss documents that are relevant to the query but share few keywords with it.
Combining Keyword and Semantic Search
Recognizing these limitations, many search developers have been wondering if they should add dense vector semantic search to their platform to better capture user intent. In fact, one of the attendees of Branden Chan’s talk asked him for his thoughts on exactly that.
The attendee mentioned that he has a content server and currently uses a bag-of-words (BoW) approach for search. He said that sometimes his users know exactly what they’re searching for, but other times their search is exploratory, and in that case a BoW approach alone makes it hard to understand exactly what the user wants.
He said that he is interested in seeing whether dense vectors could help him better understand user intent. He mentioned that he couldn’t switch to a dense vector solution “overnight,” both for implementation reasons and because of concerns about how the switch could change performance over time. He wanted Branden’s advice on how to take advantage of dense vector search incrementally to improve search.
Branden said that you could use the existing sparse (keyword) score and combine it with a dense (semantic) vector score. He suggested passing the query to both a dense and sparse path, with each path having a score that you could weight.
Branden said you could take a cautious path of lightly weighting the dense score initially and then incrementally bumping it up and experimenting to see how to optimally weight it.
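The weighting idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Branden’s or Haystack’s implementation: the function names, the default weight of 0.1, and the min-max normalization step are all hypothetical choices made here. Normalization is included because raw BM25 scores and dense similarity scores usually live on different scales, so weighting them directly would be hard to interpret.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so sparse and dense scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are identical; treat every document as equally good.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def combine_scores(
    sparse: Dict[str, float],
    dense: Dict[str, float],
    dense_weight: float = 0.1,  # assumed starting point: a light dense weight
) -> List[Tuple[str, float]]:
    """Return (doc_id, combined_score) pairs, best first.

    A document missing from one path contributes 0 for that path.
    """
    if not 0.0 <= dense_weight <= 1.0:
        raise ValueError("dense_weight must be between 0 and 1")
    sparse_n = min_max_normalize(sparse)
    dense_n = min_max_normalize(dense)
    combined = {
        doc_id: (1.0 - dense_weight) * sparse_n.get(doc_id, 0.0)
        + dense_weight * dense_n.get(doc_id, 0.0)
        for doc_id in set(sparse_n) | set(dense_n)
    }
    # Sort by score descending, then by id so ties are deterministic.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
```

With dense_weight set to 0.0 the ranking is exactly the keyword ranking, so the existing behavior is the baseline, and each small increase lets the dense path influence the order a little more.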
How to Add Semantic Search Incrementally
Branden’s approach would allow a user to keep the powerful keyword search (sparse vector) capabilities of Elasticsearch and OpenSearch and combine them with the semantic vector search (dense vector) of one of GSI’s k-NN plugins.
Installing the plugins is easy, and they allow for vector similarity search to be run as simply as any standard Elasticsearch query. The Elasticsearch k-NN plugin provides similarity search results in the standard Elasticsearch format, so a user could follow Branden’s advice of combining the sparse and dense vector scores. The user could lightly weight the dense score (k-NN result) initially and then incrementally bump it up and experiment to see how to optimally weight it.
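Because both result sets arrive in the standard Elasticsearch response format, the experiment could look roughly like the sketch below. It reads only the hits.hits, _id, and _score fields of that format. Everything else is an assumption made for illustration: the helper names, the two hand-written sample responses, the min-max normalization, and the list of weights to try. It makes no calls to Elasticsearch or to the k-NN plugin itself.

```python
from typing import Any, Dict, List


def scores_by_id(response: Dict[str, Any]) -> Dict[str, float]:
    """Extract {_id: _score} from a standard Elasticsearch search response."""
    hits = response.get("hits", {}).get("hits", [])
    return {hit["_id"]: float(hit["_score"]) for hit in hits}


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max rescale to [0, 1]; an assumed way to make the scores comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 1.0 for k, v in scores.items()}


def hybrid_ranking(
    keyword_response: Dict[str, Any],
    knn_response: Dict[str, Any],
    dense_weight: float,
) -> List[str]:
    """Rank document ids by a weighted sum of keyword and k-NN scores."""
    sparse = normalize(scores_by_id(keyword_response))
    dense = normalize(scores_by_id(knn_response))
    ids = set(sparse) | set(dense)
    combined = {
        i: (1.0 - dense_weight) * sparse.get(i, 0.0)
        + dense_weight * dense.get(i, 0.0)
        for i in ids
    }
    return sorted(ids, key=lambda i: (-combined[i], i))


# Hand-written sample responses (hypothetical data, not real search output).
keyword_response = {"hits": {"hits": [
    {"_id": "doc1", "_score": 8.0},
    {"_id": "doc2", "_score": 4.0},
]}}
knn_response = {"hits": {"hits": [
    {"_id": "doc3", "_score": 0.9},
    {"_id": "doc2", "_score": 0.6},
    {"_id": "doc1", "_score": 0.3},
]}}

if __name__ == "__main__":
    # Start with a light dense weight and bump it up step by step.
    for weight in (0.0, 0.1, 0.3, 0.5):
        print(weight, hybrid_ranking(keyword_response, knn_response, weight))
```

In practice, each weight would be judged against relevance judgments or click data rather than by eye, and the weight would only be raised while those measurements keep improving.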
For more information about GSI’s k-NN plugins, contact us at firstname.lastname@example.org.