Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs
Authors: Li, Zhihao [1,2,3]; Jia, Haipeng [2]; Zhang, Yunquan [2]; Chen, Tun [2]; Yuan, Liang [2]; Vuduc, Richard [4]
Journal: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
Publication date: 2020-08-01
Volume: 31; Issue: 8; Pages: 1925-1941
Keywords: AutoFFT; FFT; code generation; template; DFT
ISSN: 1045-9219
DOI: 10.1109/TPDS.2020.2977629
Abstract: This article presents AutoFFT, a template-based code generation framework that can automatically generate high-performance FFT kernels for all natural-number radices. AutoFFT is based on the Cooley-Tukey FFT algorithm, which exploits the symmetric and periodic properties of the DFT matrix, as the outer parallelization framework. Because butterflies are the core operations of the Cooley-Tukey algorithm, we explore additional symmetric and periodic properties of the DFT matrix and formulate multiple optimized calculation templates to further reduce the number of floating-point operations for butterflies of arbitrary natural numbers. To fully exploit hardware resources, we encapsulate a series of optimizations in an assembly template optimizer. Given any DFT problem, AutoFFT automatically generates C FFT kernels using these calculation templates and converts them into efficient assembly kernels using the template optimizer. Through a series of experiments on Arm, Intel, and AMD processors, we show that AutoFFT-generated kernels can outperform those in the Fastest Fourier Transform in the West (FFTW), the Arm Performance Libraries (ARMPL), and the Intel Math Kernel Library (MKL).
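For readers unfamiliar with the decomposition the abstract refers to, the following is a minimal sketch of a mixed-radix, decimation-in-time Cooley-Tukey FFT with radix-r butterflies. It is not AutoFFT's generated code and does not use its optimized calculation templates: the butterfly here is a plain O(r^2) DFT, and the function names (`naive_dft`, `smallest_factor`, `fft`, `max_error`) are hypothetical, chosen only for illustration.

```python
import cmath


def naive_dft(x):
    """Direct O(n^2) DFT; serves as the radix-r butterfly and as a reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def smallest_factor(n):
    """Smallest prime factor of n (returns n itself when n is prime or < 2)."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f
        f += 1
    return n


def fft(x):
    """Mixed-radix Cooley-Tukey FFT for any input length (illustrative only)."""
    n = len(x)
    r = smallest_factor(n)
    if r == n:
        # Prime (or trivial) length: fall back to the unoptimized butterfly.
        return naive_dft(x)
    m = n // r
    # r sub-transforms of length m, each over a stride-r slice of the input.
    subs = [fft(x[q::r]) for q in range(r)]
    out = [0j] * n
    for k in range(m):
        # Apply twiddle factors W_n^(q*k), then combine with a radix-r butterfly.
        t = [subs[q][k] * cmath.exp(-2j * cmath.pi * q * k / n) for q in range(r)]
        b = naive_dft(t)
        for p in range(r):
            out[k + p * m] = b[p]
    return out


def max_error(a, b):
    """Largest element-wise absolute difference between two sequences."""
    return max((abs(u - v) for u, v in zip(a, b)), default=0.0)
```

Comparing `fft(x)` against `naive_dft(x)` with `max_error` for a composite length such as 12 or 15 gives a quick correctness check. The framework described in the abstract targets exactly the butterfly step, replacing the generic O(r^2) combination with templates that use fewer floating-point operations.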
Funding: National Key Research and Development Program of China[2107YFB0202105] ; National Key Research and Development Program of China[2016YFB0200803] ; National Key Research and Development Program of China[2017YFB0202302] ; National Natural Science Foundation of China[61602443] ; National Natural Science Foundation of China[61432018] ; National Natural Science Foundation of China[61521092] ; National Natural Science Foundation of China[61502450]
WOS research areas: Computer Science ; Engineering
Language: English
Publisher: IEEE COMPUTER SOC
WOS accession number: WOS:000561084300003
Content type: Journal article
Source URL: [http://119.78.100.204/handle/2XEOYT63/15790]
Collection: Institute of Computing Technology, Chinese Academy of Sciences, Journal Articles (English)
Corresponding author: Jia, Haipeng
Author affiliations:
1. Georgia Inst Technol, Atlanta, GA 30332 USA
2. Chinese Acad Sci, Inst Comp Technol, SKL Comp Architecture, Beijing 100864, Peoples R China
3. Univ Chinese Acad Sci, Beijing 100049, Peoples R China
4. Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
Recommended citation:
GB/T 7714: Li, Zhihao,Jia, Haipeng,Zhang, Yunquan,et al. Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs[J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS,2020,31(8):1925-1941.
APA: Li, Zhihao,Jia, Haipeng,Zhang, Yunquan,Chen, Tun,Yuan, Liang,&Vuduc, Richard.(2020).Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs.IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS,31(8),1925-1941.
MLA: Li, Zhihao,et al."Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs".IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 31.8(2020):1925-1941.