A Survey of Data Engineering for Graphical Reasoning in Multimodal Large Language Models
The Hong Kong University of Science and Technology (Guangzhou)
Data Science and Analytics Thrust
PhD Qualifying Examination
By Mr ZENG, Xingchen
Abstract
Multimodal Large Language Models (MLLMs) have demonstrated graphical reasoning ability: they can understand and generate visual representations such as charts and diagrams. Developing this capability relies on extensive, high-quality multimodal training datasets. However, data engineering for these datasets, including collection, augmentation, and filtering, presents significant challenges. Real-world graphics often lack essential metadata (e.g., underlying data tables), which hinders the accurate synthesis of training instances. In contrast, synthetic data produced via LLM-powered code generation offers a robust alternative, since it provides metadata inherently and enables the creation of complex data. The proliferation of diverse real-world and synthetic datasets has created a fragmented landscape that is difficult for researchers to navigate. This survey addresses this fragmentation by providing a comprehensive overview and structured understanding of data engineering methodologies for MLLM graphical reasoning. It examines current practices, discusses their strengths and weaknesses, and derives future research directions.
PQE Committee
Chair of Committee: Prof. TANG Nan
Prime Supervisor: Prof. ZENG Wei
Co-Supervisor: Prof. WANG Wei
Examiner: Prof. YANG Weikai
Date
10 June 2025
Time
17:00 - 18:00
Venue
E1-149 (HKUST-GZ)