A Survey on Cluster Resource Scheduling for Large Language Models
The Hong Kong University of Science and Technology (Guangzhou)
Data Science and Analytics Thrust
PhD Qualifying Examination
By Mr LIU, Hongbin
Abstract
The transition from traditional deep learning to Large Language Models (LLMs) has fundamentally redefined the requirements for distributed system infrastructure. Resource scheduling, once primarily concerned with fairness and job packing, must now navigate the extreme scale, memory intensity, and complex dependencies unique to Generative AI. This survey presents a systematic taxonomy of LLM resource scheduling, tracing its evolution from general-purpose cluster managers to specialized, fine-grained orchestration engines. We organize the state of the art into three distinct paradigms: Training, Inference Engine Optimization, and Cluster-Level Serving. For training, we analyze strategies for automated 3D parallelism and topology-aware placement that maximize Model FLOPs Utilization (MFU) while ensuring fault tolerance for month-long jobs. At the inference engine level (Micro-Scheduling), we dissect innovations in memory management (e.g., PagedAttention) and flow control (e.g., continuous batching) designed to dismantle the KV cache memory wall. For cluster-level serving (Macro-Scheduling), we explore architectures for disaggregated serving, prefix caching, and multi-tenant LoRA orchestration that balance the conflicting demands of throughput and latency. Finally, we identify the open challenges of training-inference co-location, serverless LLMs, and the scheduling of complex AI agents, charting the path toward a unified "AI Operating System".
PQE Committee
Chair of Committee: Prof. LUO, Qiong
Prime Supervisor: Prof. CHU, Xiaowen
Co-Supervisor: Prof. CUI, Ying (Online)
Examiner: Prof. WEN, Zeyi
Date
10 December 2025
Time
10:00 - 11:00
Venue
E3-201 (HKUST-GZ)