A Survey of Data Engineering for Graphical Reasoning in Multimodal Large Language Models

The Hong Kong University of Science and Technology (Guangzhou)

Data Science and Analytics Thrust

PhD Qualifying Examination

By Mr ZENG, Xingchen

Abstract

Multimodal Large Language Models (MLLMs) have demonstrated graphical reasoning ability: they can understand and generate visual representations such as charts and diagrams. Developing this capability relies on extensive, high-quality multimodal training datasets. However, the data engineering behind these datasets, including collection, augmentation, and filtering, presents significant challenges. Real-world graphics often lack essential metadata (e.g., the underlying data tables), which hinders the accurate synthesis of training instances. In contrast, synthetic data produced through LLM-powered code generation offers a robust alternative, because the generating code supplies the metadata inherently and makes it possible to create complex data. The proliferation of diverse real-world and synthetic datasets has created a fragmented landscape that is difficult for researchers to navigate. This survey addresses that fragmentation by providing a comprehensive overview and a structured understanding of data engineering methodologies for MLLM graphical reasoning. It examines current practices, discusses their strengths and weaknesses, and derives future research directions.

PQE Committee

Chair of Committee: Prof. TANG Nan

Prime Supervisor: Prof. ZENG Wei

Co-Supervisor: Prof. WANG Wei

Examiner: Prof. YANG Weikai

Date

10 June 2025

Time

17:00 - 18:00

Venue

E1-149 (HKUST-GZ)