Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs | |
Li, Zhihao1,2,3; Jia, Haipeng2; Zhang, Yunquan2; Chen, Tun2; Yuan, Liang2; Vuduc, Richard4 | |
Journal | IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS |
2020-08-01 | |
Volume | 31; Issue: 8; Pages: 1925-1941 |
Keywords | AutoFFT; FFT; code generation; template; DFT |
ISSN | 1045-9219 |
DOI | 10.1109/TPDS.2020.2977629 |
Abstract | This article presents AutoFFT, a template-based code generation framework that can automatically generate high-performance FFT kernels for all natural-number radices. AutoFFT is based on the Cooley-Tukey FFT algorithm, which exploits the symmetric and periodic properties of the DFT matrix, as the outer parallelization framework. Because butterflies are the core operations of the Cooley-Tukey algorithm, we explore additional symmetric and periodic properties of the DFT matrix and formulate multiple optimized calculation templates to further reduce the number of floating-point operations for butterflies of arbitrary natural numbers. To fully exploit hardware resources, we encapsulate a series of optimizations in an assembly template optimizer. Given any DFT problem, AutoFFT automatically generates C FFT kernels using these calculation templates and converts them into efficient assembly kernels using the template optimizer. Through a series of experiments on Arm, Intel, and AMD processors, we show that AutoFFT-generated kernels can outperform those in Fastest Fourier Transform in the West (FFTW), the Arm Performance Libraries (ARMPL), and the Intel Math Kernel Library (MKL). |
Funding | National Key Research and Development Program of China[2107YFB0202105] ; National Key Research and Development Program of China[2016YFB0200803] ; National Key Research and Development Program of China[2017YFB0202302] ; National Natural Science Foundation of China[61602443] ; National Natural Science Foundation of China[61432018] ; National Natural Science Foundation of China[61521092] ; National Natural Science Foundation of China[61502450] |
WOS Research Areas | Computer Science ; Engineering |
Language | English |
Publisher | IEEE COMPUTER SOC |
WOS Accession Number | WOS:000561084300003 |
Content Type | Journal article |
Source URL | [http://119.78.100.204/handle/2XEOYT63/15790] |
Collection | Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English) |
Corresponding Author | Jia, Haipeng |
Author Affiliations | 1. Georgia Inst Technol, Atlanta, GA 30332 USA; 2. Chinese Acad Sci, Inst Comp Technol, SKL Comp Architecture, Beijing 100864, Peoples R China; 3. Univ Chinese Acad Sci, Beijing 100049, Peoples R China; 4. Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA |
Recommended Citation (GB/T 7714) | Li, Zhihao,Jia, Haipeng,Zhang, Yunquan,et al. Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs[J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS,2020,31(8):1925-1941. |
APA | Li, Zhihao,Jia, Haipeng,Zhang, Yunquan,Chen, Tun,Yuan, Liang,&Vuduc, Richard.(2020).Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs.IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS,31(8),1925-1941. |
MLA | Li, Zhihao,et al."Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs".IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 31.8(2020):1925-1941. |