Scaling Visual Search with Locally Optimized Product Quantization


Locally Optimized Product Quantization is an efficient and accurate approximate nearest neighbor algorithm for large scale and high dimensionality data sets. I presented this talk on the paper and Curalate’s use of it for the Papers We Love meetup in Philadelphia.

Being a meetup, I kept the talk pretty casual and encouraged interruptions from the audience.

Slides

[Keynote]

Abstract

Approximate nearest neighbor (ANN) search is an essential technique for big data applications, and has only become more relevant as the scale of our data has increased dramatically in the past decade. At Curalate, we use ANN to power our visual search technology that enables our clients to identify apparel products in user-generated photos from social networks.

Kalantidis and Avrithi present an extremely fast and accurate ANN algorithm in their paper “Locally Optimized Product Quantization for Approximate Nearest Neighbor Search” (LOPQ). Their key contribution is a quantization method that minimizes distortion by leveraging Principle Component Analysis on local regions of the data. This technique enables fast and accurate searching over data sets of millions or even billions of items.

In this talk, I will present the LOPQ paper by Kalantidis and Avrithi, and follow it with a deep dive into how we have implemented it at Curalate to power our visual search technology. By leveraging LOPQ, we reduced our storage requirements by 90% allowing us to store an index of 31 million images in 1GB of RAM and search it in a only few hundred milliseconds.

Resources