Proximity Graph-based High-dimensional Approximate Nearest Neighbor Search: From Theory to Practice

PhD Qualifying-Exam

Proximity Graph-based High-dimensional Approximate Nearest Neighbor Search: From Theory to Practice

The Hong Kong University of Science and Technology (Guangzhou)

Data Science and Analytics Thrust

PhD Qualifying Examination

By Mr. Mingyu YANG

Abstract

With the rapid development of machine learning techniques and large language models, vector databases, as databases for managing and retrieving high-dimensional vector data, have garnered widespread attention. The applications of vector databases are particularly significant in areas such as recommendation systems, data mining, multimedia data retrieval, and RAGs (Retrieval-Augmented Generation) for large language models. The essence of vector databases lies in efficiently implementing similarity search for high-dimensional data. Therefore, numerous indexes for vector similarity search have been proposed. Among the various indexes, methods based on proximity graphs (PGs) significantly outperform others in terms of retrieval efficiency and represent the current state-of-the-art vector similarity search algorithm which also serves as the core index in numerous vector databases. This survey focuses on the theoretical background of proximity graphs, as well as a variety of construction strategies for different proximity graphs, along with their corresponding search, insertion, and deletion algorithms. Additionally, this survey also analyzes the integration of proximity graphs with other high-dimensional vector similarity search algorithms. Finally, this survey discusses the potential research direction of proximity graph-based vector database index.

PQE Committee

Chairperson: Prof. Nan TANG

Prime Supervisor: Prof Wei WANG

Co-Supervisor: Prof Lei LI

Examiner: Prof Xinlei HE

Date

04 June 2024

Time

11:10:00 - 12:25:00

Location

E1-149