A Survey on Efficient Large Language Model Serving Systems
The Hong Kong University of Science and Technology (Guangzhou)
Data Science and Analytics Thrust
PhD Qualifying Examination
By Ms. Xuemei Peng
Abstract
Large Language Models (LLMs) have become ubiquitous in applications ranging from natural language processing to complex decision-making tasks. With their increasing prevalence, efficient LLM serving systems are critical for ensuring both optimal performance and effective resource utilization. This survey explores the latest techniques for optimizing LLM serving, offering a comprehensive overview of advancements in memory management, computation optimization, and the development of advanced LLM paradigms. We also present our attempts at improving LLM serving efficiency, featuring innovations such as dynamic request packaging, adaptive GPU resource allocation, and the strategic duplication and merging of pipelines. Experimental results validate the effectiveness of our approach. Finally, we propose directions for future research in efficient LLM serving, with the goal of further enhancing performance and resource management.
PQE Committee
Chair of Committee: Prof. Xiaowen CHU
Prime Supervisor: Prof. Zeyi WEN
Co-Supervisor: Prof. Xinyu CHEN
Examiner: Prof. Zeke XIE
Date
27 November 2024
Time
15:00:00 - 16:00:00
Venue
E3-105
Join Link
Zoom Meeting ID: 945 2523 7448
Passcode: dsa2024