A Survey on Retrieval-Augmented Table Analysis
The Hong Kong University of Science and Technology (Guangzhou)
Data Science and Analytics Thrust
PhD Qualifying Examination
By Mr Yang Chenyu
Abstract
In this survey, we present a comprehensive framework for Retrieval-Augmented Table Analysis, a paradigm that integrates robust retrieval mechanisms with Large Language Models (LLMs) to enhance analytic tasks over structured tables. We propose a two-layer conceptual framework, comprising a Retrieval Layer and an Application Layer, that systematically organizes the growing body of research in this field. The Retrieval Layer focuses on acquiring contextual information through table content retrieval (table-level, column-level, tuple-level, and cell-level) and external knowledge retrieval (metadata, operational, domain, and commonsense knowledge). The Application Layer details how the retrieved information is used in key tasks such as Table Question Answering (TableQA), Natural Language to SQL (NL2SQL), and Data Imputation. Our empirical findings highlight the efficacy of Siamese Dual Encoders and question-passage pre-training for TableQA, while revealing that explicit table structure encoding is less crucial than expected. For Data Imputation, we introduce RetFill, a Retrieve-Rerank-Reason framework that demonstrates significant improvements over LLM-only approaches by leveraging tuple-level retrieval. We discuss current limitations, including underexplored multi-table retrieval and the need for adaptive orchestration in table joinability tasks, and outline promising directions for future research.
PQE Committee
Chair of Committee: Prof. WANG Wei
Prime Supervisor: Prof. TANG Nan
Co-Supervisor: Prof. LUO Yuyu
Examiner: Prof. ZHANG Yongqi
Date
11 June 2025
Time
13:00 - 14:00
Location
E1-148 (HKUST-GZ)