PhD Qualifying-Exam

A Survey on Retrieval-Augmented Table Analysis

The Hong Kong University of Science and Technology (Guangzhou)

Data Science and Analytics Thrust

PhD Qualifying Examination

By Mr Yang Chenyu

Abstract

In this survey, we present a comprehensive framework for Retrieval-Augmented Table Analysis, a paradigm that integrates robust retrieval mechanisms with Large Language Models (LLMs) to enhance analytic tasks over structured tables. We propose a novel two-layer conceptual framework, comprising a Retrieval Layer and an Application Layer, that systematically organizes the burgeoning research in this field. The Retrieval Layer focuses on acquiring contextual information through table content retrieval (table-level, column-level, tuple-level, cell-level) and external knowledge retrieval (metadata, operational, domain, and commonsense knowledge). The Application Layer details how retrieved information is used in key tasks such as Table Question Answering (TableQA), Natural Language to SQL (NL2SQL), and Data Imputation. Our empirical findings highlight the efficacy of Siamese Dual Encoders and question-passage pre-training for TableQA, while revealing that explicit table structure encoding is less crucial than expected. For Data Imputation, we introduce RetFill, a Retrieve-Rerank-Reason framework that demonstrates significant improvements over LLM-only approaches by leveraging tuple-level retrieval. We discuss current limitations, including underexplored multi-table retrieval and the need for adaptive orchestration in table joinability tasks, and outline promising future research directions.

PQE Committee

Chair of Committee: Prof. WANG Wei

Prime Supervisor: Prof. TANG Nan

Co-Supervisor: Prof. LUO Yuyu

Examiner: Prof. ZHANG Yongqi

Date

11 June 2025

Time

13:00 - 14:00

Location

E1-148 (HKUST-GZ)