PhD Qualifying Examination

Proximity Graph-based High-dimensional Approximate Nearest Neighbor Search: From Theory to Practice

The Hong Kong University of Science and Technology (Guangzhou)

Data Science and Analytics Thrust

PhD Qualifying Examination

By Mr. Mingyu YANG

Abstract

With the rapid development of machine learning and large language models, vector databases, which manage and retrieve high-dimensional vector data, have attracted widespread attention. They are particularly important in recommendation systems, data mining, multimedia retrieval, and retrieval-augmented generation (RAG) for large language models. At their core, vector databases must perform similarity search over high-dimensional data efficiently, and numerous indexes have been proposed for this purpose. Among them, methods based on proximity graphs (PGs) significantly outperform the alternatives in retrieval efficiency; they represent the current state of the art in vector similarity search and serve as the core index in many vector databases. This survey covers the theoretical background of proximity graphs, the construction strategies of different proximity graphs, and their corresponding search, insertion, and deletion algorithms. It also analyzes how proximity graphs can be integrated with other high-dimensional vector similarity search algorithms. Finally, it discusses potential research directions for proximity graph-based vector database indexes.

PQE Committee

Chairperson: Prof. Nan TANG

Prime Supervisor: Prof. Wei WANG

Co-Supervisor: Prof. Lei LI

Examiner: Prof. Xinlei HE

Date

4 June 2024

Time

11:10:00 - 12:25:00

Venue

E1-149