Recognizing objects using a billion-image database

FindFace, an out-of-the-box solution by NtechLab, recognizes human faces and silhouettes, cars, and license plates. It is used in areas such as access control, fraud prevention, public safety, and behavior analysis.

Tasks and

The FindFace database stores image vectors, which are essentially arrays of numbers. The system takes video streams from security cameras as input and detects target objects such as faces, silhouettes, car models, and license plates. For each detected object it records a new vector, which is then compared with the vectors already in the database. Finally, the system displays the search results.

In 2016, FindFace was a working prototype that encountered problems like database corruption and lack of scalability. At the same time, new functionality and reliability requirements arose. To make FindFace a scalable out-of-the-box solution, the developers had to do the following:

  1. Increase operation speed. Running pairwise comparisons on objects from large databases took more than 10 seconds and put a burden on the DBMS. Even with the use of indexes, it was impossible to avoid comparisons with all the images in the database.
  2. Make database expansion possible. Event metadata (such as the date of detection or the camera name) was as valuable as the image vectors themselves. This metadata is necessary for queries like “show the top 10 most similar faces that were on cameras 1 and 2 from Thursday to Friday”. It was stored separately from the vectors, which was inconvenient: metadata could not be filtered quickly by index, and the two stores sometimes fell out of sync.
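
The query quoted in item 2 can be sketched as follows. This is only an illustration, assuming that the vector and its metadata live in one record and that similarity is measured with cosine similarity; the names Event, cosine, and top_similar are hypothetical and do not come from FindFace.

```python
from dataclasses import dataclass
from datetime import datetime
import heapq
import math


@dataclass(frozen=True)
class Event:
    """One detection event: the feature vector and its metadata in one record."""
    vector: tuple[float, ...]
    camera: int
    seen_at: datetime


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (an assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_similar(events, query, cameras, start, end, k=10):
    """Filter by camera and time first, then rank only the survivors."""
    candidates = (
        e for e in events
        if e.camera in cameras and start <= e.seen_at <= end
    )
    return heapq.nlargest(k, candidates, key=lambda e: cosine(e.vector, query))
```

Because the metadata sits in the same record as the vector, the filter and the ranking run over one data set, so there is nothing to keep in sync.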

The NtechLab algorithm

Requirements scheme


To choose the best performing solution, the NtechLab team compared technologies from different vendors. The NtechLab developers wrote a function for comparing two vectors and tested it on databases of up to 10 million images. Tarantool showed the best performance in the comparative tests.
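
A comparative test of this kind might look roughly like the sketch below. It is not the NtechLab benchmark: the metric (squared Euclidean distance), the vector dimension, and the function names are assumptions made for illustration, and the data is synthetic.

```python
import random
import time


def squared_distance(a, b):
    """Squared Euclidean distance; smaller means more similar (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def full_scan(database, query):
    """Compare the query against every stored vector; return the best index."""
    return min(range(len(database)),
               key=lambda i: squared_distance(database[i], query))


def time_full_scan(n_vectors, dim=128, seed=0):
    """Build a synthetic database and time one full scan over it."""
    rng = random.Random(seed)
    database = [[rng.random() for _ in range(dim)] for _ in range(n_vectors)]
    query = database[n_vectors // 2]  # a vector known to be in the database
    started = time.perf_counter()
    best = full_scan(database, query)
    return best, time.perf_counter() - started
```

Calling time_full_scan with growing values of n_vectors shows how the cost of a full pairwise scan grows with database size, which is the quantity such a comparison would measure for each candidate storage engine.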

As of today, there is a Tarantool-based biometric database that stores biometric samples (feature vectors) and face detection events.

The production version uses sharding via a dedicated service that connects to Tarantool: this service holds the addresses of all Tarantool servers and determines which instances to access for each query. For heavy loads and fault tolerance, read-only replicas can be added; they partly relieve the master by handling search queries. Developers use Tarantool as a storage with quick access to a given range of data entries and the ability to filter them with indexes.
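
The routing idea can be sketched as a scatter-gather search. This is an assumption-laden illustration, not the actual service: Shard is an in-memory stand-in for a Tarantool instance, placement by row_id modulo the shard count is a hypothetical rule, and the score is a plain dot product.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def dot(a, b):
    """Dot product used as a stand-in similarity score."""
    return sum(x * y for x, y in zip(a, b))


class Shard:
    """In-memory stand-in for one Tarantool instance holding (id, vector) rows."""

    def __init__(self, rows):
        self.rows = list(rows)

    def search(self, query, k):
        # Each shard returns only its own k best (score, id) pairs.
        scored = ((dot(vec, query), row_id) for row_id, vec in self.rows)
        return heapq.nlargest(k, scored)


class Router:
    """Knows every shard and decides which ones a request goes to."""

    def __init__(self, shards):
        self.shards = shards

    def shard_for(self, row_id):
        # Hypothetical placement rule: spread rows by id modulo shard count.
        return self.shards[row_id % len(self.shards)]

    def insert(self, row_id, vector):
        self.shard_for(row_id).rows.append((row_id, vector))

    def search(self, query, k=10):
        # Scatter the query to all shards in parallel, then merge the partial top-k lists.
        with ThreadPoolExecutor(max_workers=len(self.shards)) as pool:
            partials = pool.map(lambda s: s.search(query, k), self.shards)
            merged = [hit for part in partials for hit in part]
        return heapq.nlargest(k, merged)
```

Each shard sends back at most k hits, so the router merges a small list no matter how many rows the shards hold. Read-only replicas would fit the same picture as additional targets for search calls.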

Solutions scheme

As a result, searches in the database of biometric vectors can be performed in the following configurations.

  • “Regular index” allows you to use camera and time indexes at the same time. Thanks to the regular index, you can quickly process queries in typical cases where new database entries are added very frequently. Vector search and comparison queries are performed not for all database entries, but only for those filtered by index.
  • “Quick index” was designed specifically for quick searches on biometric vectors. This separate feature is a great fit for enormous databases (billions of entries) with a strict SLA on searches. The index is built on biometric vectors, which significantly reduces the number of comparison operations and speeds up database search more than tenfold. The “quick index” search is inverted: first we find entries by vectors, and then we filter them by metadata.
  • A third operating mode uses both indexes simultaneously: “quick index” for old data and “regular index” for new data. This combines the advantages of both approaches but requires more resources, and it is much more difficult to administer and operate.
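
The inverted order of the “quick index” (vectors first, metadata second) can be illustrated with a toy example. The source does not describe how the real index is built; the sketch below substitutes a simple random-hyperplane bucketing scheme purely to show the order of operations, and the QuickIndex class and its parameters are hypothetical.

```python
import math
import random
from collections import defaultdict


class QuickIndex:
    """Toy vector-first index: random-hyperplane signatures map vectors to buckets."""

    def __init__(self, dim, bits=8, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]
        self.buckets = defaultdict(list)

    def _signature(self, vector):
        # One bit per hyperplane: which side of the plane the vector lies on.
        return tuple(
            sum(p * x for p, x in zip(plane, vector)) >= 0
            for plane in self.planes
        )

    def add(self, vector, meta):
        self.buckets[self._signature(vector)].append((vector, meta))

    def search(self, query, meta_filter, k=10):
        # Step 1: find candidates by vector only (one bucket, not the whole table).
        candidates = self.buckets.get(self._signature(query), [])
        # Step 2: apply the metadata filter to the small candidate set.
        kept = [(v, m) for v, m in candidates if meta_filter(m)]
        # Step 3: rank the survivors by exact Euclidean distance to the query.
        kept.sort(key=lambda vm: math.dist(vm[0], query))
        return kept[:k]
```

A bucketing scheme like this is approximate: a close match can land in a neighboring bucket and be missed. The point of the sketch is only that the number of exact comparisons depends on the bucket size, not on the size of the whole database.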


The FindFace service can process information from cameras in less than 0.5 seconds, even if the database contains millions of images. A full walk through the database in search of a similar image no longer overloads the DBMS and is now six times faster. Note that Tarantool incorporates several functions: saving images to the database, searching through the databases, and implementing direct queries to the databases.

  • 0.5 s

    Recognition and search speed for a single image from a database with a billion images using the quick index.

  • > 1 bn

    More than a billion images can be stored in Tarantool.

  • 13.4 million

    images per second are processed during search and verification with “regular index” on one Tarantool shard. In live projects, the number of shards can go up to hundreds and thousands with parallel processing.
