A Survey on Cluster Resource Scheduling for Large Language Models
The Hong Kong University of Science and Technology (Guangzhou)
Data Science and Analytics Thrust
PhD Qualifying Examination
By Mr LIU, Hongbin
Abstract
The transition from traditional deep learning to Large Language Models (LLMs) has fundamentally redefined the requirements for distributed system infrastructure. Resource scheduling, once primarily concerned with fairness and job packing, must now navigate the extreme scale, memory intensity, and complex dependencies unique to Generative AI. This survey presents a systematic taxonomy of LLM resource scheduling, tracing its evolution from general-purpose cluster managers to specialized, fine-grained orchestration engines. We organize the state of the art into three distinct paradigms: Training, Inference Engine Optimization, and Cluster-Level Serving. For training, we analyze strategies for automated 3D parallelism and topology-aware placement that maximize Model FLOPs Utilization (MFU) while ensuring fault tolerance for month-long jobs. At the inference engine level (Micro-Scheduling), we dissect innovations in memory management (e.g., PagedAttention) and flow control (e.g., continuous batching) designed to dismantle the KV cache memory wall. For cluster-level serving (Macro-Scheduling), we explore architectures for disaggregated serving, prefix caching, and multi-tenant LoRA orchestration that balance the conflicting demands of throughput and latency. Finally, we identify the open challenges of training-inference co-location, serverless LLMs, and the scheduling of complex AI agents, charting the path toward a unified "AI Operating System".
PQE Committee
Chair of Committee: Prof. LUO, Qiong
Prime Supervisor: Prof. CHU, Xiaowen
Co-Supervisor: Prof. CUI, Ying (Online)
Examiner: Prof. WEN, Zeyi
Date
10 December 2025
Time
10:00 - 11:00
Venue
E3-201 (HKUST-GZ)