PhD Qualifying Examination

A Survey on Retrieval-Augmented Table Analysis

The Hong Kong University of Science and Technology (Guangzhou)

Data Science and Analytics Thrust

PhD Qualifying Examination

By Mr Yang Chenyu

Abstract

In this survey, we present a comprehensive framework for Retrieval-Augmented Table Analysis, a paradigm that integrates robust retrieval mechanisms with Large Language Models (LLMs) to enhance analytic tasks over structured tables. We propose a novel two-layer conceptual framework, comprising a Retrieval Layer and an Application Layer, that systematically organizes the burgeoning research in this field. The Retrieval Layer focuses on acquiring contextual information through table content retrieval (table-level, column-level, tuple-level, and cell-level) and external knowledge retrieval (metadata, operational, domain, and commonsense knowledge). The Application Layer details how retrieved information is used in key tasks such as Table Question Answering (TableQA), Natural Language to SQL (NL2SQL), and Data Imputation. Our empirical findings highlight the efficacy of Siamese Dual Encoders and question-passage pre-training for TableQA, while revealing that explicit table structure encoding is less crucial than expected. For Data Imputation, we introduce RetFill, a Retrieve-Rerank-Reason framework that demonstrates significant improvements over LLM-only approaches by leveraging tuple-level retrieval. We discuss current limitations, including underexplored multi-table retrieval and the need for adaptive orchestration in table joinability tasks, and outline promising future research directions.

PQE Committee

Chair of Committee: Prof. WANG Wei

Prime Supervisor: Prof. TANG Nan

Co-Supervisor: Prof. LUO Yuyu

Examiner: Prof. ZHANG Yongqi

Date

11 June 2025

Time

13:00:00 - 14:00:00

Venue

E1-148 (HKUST-GZ)