Sun, Xiaoyang ORCID: https://orcid.org/0000-0002-6193-809X (2023) Cost-Effective Acceleration Methods for Large-Scale Model Training. PhD thesis, University of Leeds.
Abstract
In recent years, remarkable advancements have been made in foundation models. These models can tackle complex tasks effectively across diverse domains by extracting implicit information and knowledge from vast datasets. They have led to a paradigm shift in how users interact with AI systems, allowing them to generate desired outputs by providing customized prompts without the need for task-specific training or continuous updating of model weights. In particular, Large Language Models (LLMs), serving as practical implementations of foundation models, have played an important role in democratizing AI solutions across various domains. However, training an LLM requires substantial hardware resources and considerable time. For instance, a single training attempt may require thousands of GPUs and span several months, which is financially infeasible for the majority of researchers in academia and industry.
The aim of this PhD study is to investigate and develop a set of new methods and techniques for reducing the cost of LLM training. We introduce an "impossible trinity" theorem for LLM training systems, which provides guidance for balancing competing design forces and informs a range of system designs. We also propose a dynamic offloading mechanism that loads tensor data from CPU memory into GPU memory only when necessary. This mechanism is based on the design of a working window that specifies optimal time windows for overlapping computation and communication operations, effectively reducing training durations. We examine low-level, fine-grained task partitioning at the level of CUDA streams and develop various strategies for dynamic resource allocation to facilitate the parallel execution of tasks. In a real production environment of GPU clusters, leveraging a combination of our new methods and techniques, we achieve a 6.5 times larger trainable model size and a 3.7 times improvement in training throughput compared to leading state-of-the-art solutions.
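The full thesis is under embargo, so its actual implementation is not reproduced here. The sketch below is only a minimal illustration of the working-window offloading idea summarized in the abstract, assuming PyTorch and a CUDA device: weights are kept in pinned CPU memory and prefetched to the GPU on a dedicated CUDA stream a few layers ahead of the computation, so host-to-device copies overlap with compute. The class name `WorkingWindowOffloader`, the `window` parameter, and the simple matmul "layers" are invented for illustration and do not come from the thesis.

```python
import torch


class WorkingWindowOffloader:
    """Illustrative sketch (not the thesis code): keep weights in pinned CPU
    memory and copy them to the GPU on a dedicated stream just before they are
    needed, so H2D transfers overlap with the previous layer's compute."""

    def __init__(self, cpu_weights, window=1):
        # Pinned memory is required for genuinely asynchronous H2D copies.
        self.cpu_weights = [w.pin_memory() for w in cpu_weights]
        self.window = window                    # how many layers to prefetch ahead
        self.copy_stream = torch.cuda.Stream()  # dedicated stream for H2D copies
        self.inflight = {}                      # layer index -> (GPU tensor, copy-done event)

    def _prefetch(self, idx):
        if idx >= len(self.cpu_weights) or idx in self.inflight:
            return
        done = torch.cuda.Event()
        with torch.cuda.stream(self.copy_stream):
            # non_blocking=True lets this copy run alongside compute on the default stream.
            w_gpu = self.cpu_weights[idx].to("cuda", non_blocking=True)
            done.record()
        self.inflight[idx] = (w_gpu, done)

    def forward(self, x):
        x = x.cuda()
        self._prefetch(0)
        for i in range(len(self.cpu_weights)):
            # Keep the working window full: start copies for the next few layers.
            for j in range(i + 1, i + 1 + self.window):
                self._prefetch(j)
            w_gpu, done = self.inflight.pop(i)
            # The compute stream waits only on *this* layer's copy; copies for the
            # rest of the window keep running in the background.
            torch.cuda.current_stream().wait_event(done)
            x = torch.relu(x @ w_gpu)  # stand-in layer compute; w_gpu is freed afterwards
        return x


if __name__ == "__main__":
    weights = [torch.randn(1024, 1024) for _ in range(8)]
    out = WorkingWindowOffloader(weights, window=2).forward(torch.randn(32, 1024))
    print(out.shape)
```

In this simplified form, the per-layer CUDA event is what lets compute for layer i proceed while copies for layers i+1 through i+window are still in flight; the thesis's working-window mechanism and stream-level task partitioning are considerably more general than this single-stream prefetch example.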
Metadata
Supervisors: Xu, Jie; Wang, Zheng; Djemame, Karim
Keywords: Large Language Models, Distributed Training Acceleration, Cost Effectiveness, Offloading, Task Partitioning, Heterogeneous Resources
Awarding institution: University of Leeds
Academic Units: The University of Leeds > Faculty of Engineering (Leeds) > School of Computing (Leeds)
Depositing User: Mr Xiaoyang Sun
Date Deposited: 18 Dec 2024 15:48
Last Modified: 18 Dec 2024 15:48
Open Archives Initiative ID (OAI ID): oai:etheses.whiterose.ac.uk:35914
Download
Final eThesis - complete (pdf)
Embargoed until: 1 December 2029
Filename: SUN_Computing_PhD_2023.pdf
Access to the thesis can be requested via the repository's 'Request a copy' link; requests are sent to someone who may authorise access.