Layer-wise Top-k Gradient Sparsification for Distributed Deep Learning
Guangyao Li
2023-03
Conference Date | February 24-26, 2023
Conference Venue | Guangzhou, China
Abstract (English) | Distributed training is widely used for large-scale deep learning models, and data parallelism is one of the dominant approaches. Data-parallel training incurs additional communication overhead, which can become the bottleneck of the training system. Top-k sparsification is an effective technique for reducing communication volume and breaking this bottleneck. However, top-k sparsification cannot be executed until backpropagation is complete, which prevents overlapping backpropagation computation with gradient communication and limits the system's scaling efficiency. In this paper, we propose a new distributed optimization approach named LKGS-SGD, which combines synchronous SGD (S-SGD) with a novel layer-wise top-k sparsification algorithm (LKGS). LKGS-SGD enables the overlap of computation and communication and performs gradient exchange at layer granularity. Evaluations are conducted on real-world applications. Experimental results show that LKGS-SGD achieves convergence similar to dense S-SGD while outperforming the original S-SGD and S-SGD with top-k sparsification.
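The layer-wise top-k idea described in the abstract can be illustrated with a minimal PyTorch-style sketch. This is not the paper's implementation; the function names, the `k_ratio` parameter, and the `send_fn` callback are illustrative assumptions. The sketch registers a backward hook on each parameter so that, as soon as that layer's gradient is produced, its largest-magnitude entries are selected and handed to a placeholder communication callback, which is what allows gradient exchange to overlap with the remaining backward computation.

```python
import torch
import torch.nn as nn

def topk_sparsify(grad, k_ratio=0.01):
    # Keep only the k largest-magnitude entries of one layer's gradient.
    flat = grad.flatten()
    k = max(1, int(flat.numel() * k_ratio))
    _, idx = torch.topk(flat.abs(), k)
    return idx, flat[idx]

def attach_layerwise_hooks(model, send_fn, k_ratio=0.01):
    # Register a backward hook per parameter: when that layer's gradient is
    # ready, its top-k entries are passed to send_fn (a stand-in for the
    # sparse gradient exchange), so communication can start while the
    # backward pass is still running for earlier layers.
    for name, p in model.named_parameters():
        def hook(grad, name=name):
            idx, vals = topk_sparsify(grad, k_ratio)
            send_fn(name, idx, vals)
            return grad  # keep the local dense gradient unchanged
        p.register_hook(hook)

# Toy usage with a no-op transport in place of real collective communication.
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
attach_layerwise_hooks(model, send_fn=lambda name, idx, vals: None)
loss = model(torch.randn(16, 8)).sum()
loss.backward()
```

In an actual data-parallel setting, `send_fn` would be replaced by a sparse collective operation among workers; the hook-based, per-layer scheduling is what distinguishes this from standard top-k sparsification applied only after the full backward pass.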
Proceedings Publisher | 1
Content Type | Conference Paper
Source URL | [http://ir.ia.ac.cn/handle/173211/52215]
Collection | 融合创新中心
Author Affiliation | Institute of Automation, Chinese Academy of Sciences
Recommended Citation (GB/T 7714) | Guangyao Li. Layer-wise Top-k Gradient Sparsification for Distributed Deep Learning[C]. In: . Guangzhou, China, February 24-26, 2023.