With massive billion-scale inventories, it is critical for Alibaba and eBay to store their visual data compactly and to search it in a computationally efficient way. One approach both companies use is to convert floating-point feature vectors into compact semantic binary vectors. An efficient Hamming-space search can then be performed on those binary vectors.
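To make this concrete, here is a minimal NumPy sketch of the general idea: binarize a float feature vector by thresholding (sign binarization is one simple assumption; production systems typically learn the hash function), then compare codes by counting differing bits. The function names are illustrative, not from any of the companies' codebases.

```python
import numpy as np

def binarize(features: np.ndarray) -> np.ndarray:
    """Threshold a float feature vector into a binary code (simple sign binarization)."""
    return (features > 0).astype(np.uint8)

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance: the number of bit positions where the two codes differ."""
    return int(np.count_nonzero(a != b))

# Two toy 4-dimensional float embeddings
q = binarize(np.array([0.3, -1.2, 0.8, -0.1]))   # -> [1, 0, 1, 0]
d = binarize(np.array([0.5, -0.7, -0.4, 0.2]))   # -> [1, 0, 0, 1]
print(hamming_distance(q, d))  # 2
```

Because the codes are bits, the distance computation reduces to XOR plus a population count, which modern CPUs execute in a handful of instructions per 64-bit word.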
Walmart Labs/Jet.com is another example of a company that is using binary vectors in their similarity search pipeline.
Multimodal Search with Elasticsearch
In one of our previous posts, we detailed how Walmart Labs/Jet.com used Elasticsearch, a popular text search engine, to build a visual search engine that works alongside their existing text search to provide multimodal search.
Illustration of multimodal search where customers can express their interests both visually and textually.
Walmart Labs/Jet.com has built upon that previous research and recently published a paper detailing how they used Hamming search to further empower their multimodal Elasticsearch architecture. In discussions with the researchers from Walmart Labs/Jet.com, they stated that their tests showed competitive performance against existing approaches (e.g., Facebook AI Similarity Search (FAISS)).
In particular, they found that their approach, which they call FENSHSES (Fast Exact Neighbor Search in Hamming Space on Elasticsearch), is much faster when the objective is finding images with high similarity (e.g., images within a 5% difference in their semantic binary representation relative to the query image). FENSHSES also substantially reduces RAM consumption (by at least half) compared to traditional nearest neighbor search solutions.
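The "within 5% difference" objective corresponds to a radius search in Hamming space: return every item whose code differs from the query in at most r bits, where r is 5% of the code length. The following NumPy sketch illustrates that kind of exact radius search on a toy catalog; it is only a brute-force illustration of the search objective, not the FENSHSES implementation, which performs the search inside Elasticsearch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bits = 256                                   # length of each binary code
codes = rng.integers(0, 2, size=(1000, n_bits), dtype=np.uint8)  # toy "catalog"

# Simulate a near-duplicate query: item 42 with 8 bits (~3% of 256) flipped
query = codes[42].copy()
query[:8] ^= 1

radius = int(0.05 * n_bits)                    # "within 5% difference" -> r = 12 bits
dists = np.count_nonzero(codes != query, axis=1)
hits = np.flatnonzero(dists <= radius)         # indices of items within the radius
print(hits)
```

Random 256-bit codes differ from the query in roughly 128 bits on average, so only the deliberately planted near-duplicate (item 42, at distance 8) falls inside the 12-bit radius. A small radius is exactly the regime where radius-limited search can prune most of the index, which is consistent with the paper's finding that FENSHSES shines on high-similarity queries.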
You can get more information about how Walmart Labs/Jet.com used TensorFlow Serving and Elasticsearch to integrate textual and visual search by visiting their git repo page and this video demo.
Author: Pat Lasserre