Layer-wise Top-k Gradient Sparsification for Distributed Deep Learning
Guangyao Li
2023-03
Conference Date | February 24-26, 2023
Conference Venue | Guangzhou, China
Abstract (English) | Distributed training is widely used for large-scale deep learning models, and data parallelism is one of the dominant approaches. Data-parallel training incurs additional communication overhead, which can become the bottleneck of the training system. Top-k sparsification is an effective technique for reducing communication volume and breaking this bottleneck. However, top-k sparsification cannot be executed until backpropagation is complete, which prevents overlapping backpropagation computation with gradient communication and limits the system's scaling efficiency. In this paper, we propose a new distributed optimization approach named LKGS-SGD, which combines synchronous SGD (S-SGD) with a novel layer-wise top-k sparsification algorithm (LKGS). LKGS-SGD enables the overlap of computation and communication and performs gradient exchange at layer granularity. Evaluations are conducted on real-world applications. Experimental results show that LKGS-SGD achieves convergence similar to dense S-SGD while outperforming the original S-SGD and S-SGD with top-k sparsification.
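The layer-wise top-k idea described in the abstract can be illustrated with a minimal PyTorch-style sketch. This is not the paper's implementation; the function names, the `k_ratio` parameter, and the `send_fn` callback are illustrative assumptions. The sketch registers a backward hook on each parameter so that, as soon as that layer's gradient is produced, its largest-magnitude entries are selected and handed to a placeholder communication callback, which is what allows gradient exchange to overlap with the remaining backward computation.

```python
import torch
import torch.nn as nn

def topk_sparsify(grad, k_ratio=0.01):
    # Keep only the k largest-magnitude entries of one layer's gradient.
    flat = grad.flatten()
    k = max(1, int(flat.numel() * k_ratio))
    _, idx = torch.topk(flat.abs(), k)
    return idx, flat[idx]

def attach_layerwise_hooks(model, send_fn, k_ratio=0.01):
    # Register a backward hook per parameter: when that layer's gradient is
    # ready, its top-k entries are passed to send_fn (a stand-in for the
    # sparse gradient exchange), so communication can start while the
    # backward pass is still running for earlier layers.
    for name, p in model.named_parameters():
        def hook(grad, name=name):
            idx, vals = topk_sparsify(grad, k_ratio)
            send_fn(name, idx, vals)
            return grad  # keep the local dense gradient unchanged
        p.register_hook(hook)

# Toy usage with a no-op transport in place of real collective communication.
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
attach_layerwise_hooks(model, send_fn=lambda name, idx, vals: None)
loss = model(torch.randn(16, 8)).sum()
loss.backward()
```

In an actual data-parallel setting, `send_fn` would be replaced by a sparse collective operation among workers; the hook-based, per-layer scheduling is what distinguishes this from standard top-k sparsification applied only after the full backward pass.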
Proceedings Publisher | 1
Content Type | Conference Paper
Source URL | [http://ir.ia.ac.cn/handle/173211/52215]
Collection | 融合创新中心
Author Affiliation | Institute of Automation, Chinese Academy of Sciences
Recommended Citation (GB/T 7714) | Guangyao Li. Layer-wise Top-k Gradient Sparsification for Distributed Deep Learning[C]. In: . Guangzhou, China, February 24-26, 2023.