Similarity search plays a key role in many applications, such as e-commerce, drug discovery, Natural Language Processing (NLP), and visual search.
One of the key challenges in the era of big data is performing similarity search over databases that scale to 1 billion items and beyond while keeping both latency and cost low. Low latency is critical for many online applications.
In this paper, we introduce the Gemini® Associative Processing Unit (APU) and present its role in an Approximate Nearest Neighbor (ANN) similarity search pipeline. We report latency and recall numbers for query-by-query ANN searches on the DEEP1B dataset, which consists of 1 billion 96-dimensional vectors, with each dimension stored as a 32-bit floating-point (FP32) number. The Gemini APU can efficiently handle either batch-mode or query-by-query requests, but in this paper we present query-by-query results rather than batch-mode numbers because in many real-world online applications, requests arrive one by one.
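To make the scale and the two reported metrics concrete, the following is a minimal Python sketch. It is not the Gemini APU pipeline. It only computes the raw size of DEEP1B from the figures stated above, and shows one common way to measure per-query latency and recall against an exact brute-force baseline. The function names, the squared-L2 distance, and the recall definition (fraction of true top-k neighbors returned) are assumptions made for illustration, not details taken from the whitepaper.

```python
import heapq
import time


def raw_size_bytes(num_vectors=1_000_000_000, dim=96, bytes_per_value=4):
    """Raw storage for the vectors alone: count x dimensions x bytes per FP32 value."""
    return num_vectors * dim * bytes_per_value


def squared_l2(a, b):
    """Squared Euclidean distance (assumed metric; the text does not name one)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_top_k(database, query, k):
    """Brute-force exact search: indices of the k closest vectors, nearest first."""
    return heapq.nsmallest(
        k, range(len(database)), key=lambda i: squared_l2(database[i], query)
    )


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k neighbors that the approximate result contains."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)


def time_query_by_query(database, queries, k):
    """Issue queries one at a time and record each query's latency in seconds."""
    latencies = []
    for query in queries:
        start = time.perf_counter()
        exact_top_k(database, query, k)
        latencies.append(time.perf_counter() - start)
    return latencies
```

With the defaults, `raw_size_bytes()` gives 384,000,000,000 bytes, so the uncompressed DEEP1B vectors occupy about 384 GB before any index structures are added. Timing each query separately, as in `time_query_by_query`, reflects the one-by-one arrival pattern described above, whereas a batch-mode measurement would amortize cost across many queries.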