High performance non-uniform FFT on modern X86-based multi-core systems

Dhiraj D. Kalamkar; Joshua D. Trzaskoz; Srinivas Sridharan; Mikhail Smelyanskiy; Daehyun Kim; Armando Manduca; Yunhong Shu; Matt A. Bernstein; Bharat Kaul; Pradeep Dubey

doi:10.1109/IPDPS.2012.49

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Scopus citations

Abstract

The Non-Uniform Fast Fourier Transform (NUFFT) is a generalization of the FFT to non-equidistant samples. Its applications range from medical imaging to radio astronomy to the numerical solution of partial differential equations. Despite recent advances in speeding up the NUFFT on various platforms, its practical applications are still limited by its high computational cost, which is dominated by the convolution of a signal between non-uniform and uniform grids. This cost is particularly detrimental in cases that require fast reconstruction times, such as iterative 3D non-Cartesian MRI reconstruction. We propose a novel and highly scalable parallel algorithm for performing the NUFFT on x86-based multi-core CPUs. The high performance of our algorithm relies on good SIMD utilization and high parallel efficiency. On convolution, we demonstrate on average 90% SIMD efficiency using SSE, as well as up to linear scalability on a quad-socket, 40-core system based on Intel® Xeon® E7-4870 processors. As a result, on a dual-socket Intel® Xeon® X5670-based server, our NUFFT implementation is more than 4x faster than the best available NUFFT3D implementation when run on the same hardware. On an Intel® Xeon® E5-2670-based server, our NUFFT implementation is 1.5x faster than any NUFFT implementation published to date. Such speed improvements open new uses for the NUFFT. For example, iterative multi-channel reconstruction of a 240x240x240 image could execute in just over 3 minutes, which is on the same order as contemporary non-iterative (and thus less accurate) 3D NUFFT-based MRI reconstructions.
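
To make the convolution step named in the abstract concrete, here is a minimal one-dimensional sketch of spreading non-uniform samples onto a periodic uniform grid. It is not the paper's SIMD-vectorized, multi-threaded algorithm. The function name grid_1d, the Gaussian kernel, and the parameters half_width and sigma are assumptions chosen only for illustration.

```python
import math


def grid_1d(positions, values, grid_size, half_width=2, sigma=0.8):
    """Spread non-uniform samples onto a periodic uniform grid.

    positions: sample coordinates in grid units (hypothetical input layout).
    values:    one real or complex value per sample.
    The Gaussian kernel is an illustrative assumption, not the paper's kernel.
    """
    grid = [0j] * grid_size
    for x, v in zip(positions, values):
        centre = math.floor(x)
        # Visit only the 2 * half_width grid points nearest to the sample.
        for k in range(centre - half_width + 1, centre + half_width + 1):
            weight = math.exp(-((x - k) ** 2) / (2.0 * sigma ** 2))
            # Wrap indices so the grid is treated as periodic.
            grid[k % grid_size] += weight * v
    return grid


if __name__ == "__main__":
    # Two samples at non-integer positions on an 8-point grid.
    print(grid_1d([1.25, 6.7], [1.0, 0.5 + 0.5j], 8))
```

In this sketch each sample touches 2 * half_width grid points. The same loop nest in three dimensions touches (2 * half_width)^3 points per sample, with scattered writes to the grid, which gives a sense of why this step can dominate the cost.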

Original language	English (US)
Title of host publication	Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
Pages	449-460
Number of pages	12
DOIs	https://doi.org/10.1109/IPDPS.2012.49
State	Published - 2012
Event	2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012 - Shanghai, China
Duration	May 21 2012 → May 25 2012

Keywords

Non-uniform FFT
Parallelization
Scalability
Vectorization

ASJC Scopus subject areas

Software

Cite this

Kalamkar, D. D., Trzaskoz, J. D., Sridharan, S., Smelyanskiy, M., Kim, D., Manduca, A., Shu, Y., Bernstein, M. A., Kaul, B., & Dubey, P. (2012). High performance non-uniform FFT on modern X86-based multi-core systems. In Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012 (pp. 449-460). Article 6267881 (Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012). https://doi.org/10.1109/IPDPS.2012.49

High performance non-uniform FFT on modern X86-based multi-core systems. / Kalamkar, Dhiraj D.; Trzaskoz, Joshua D.; Sridharan, Srinivas et al.
Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012. 2012. p. 449-460 6267881 (Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012).

Kalamkar, DD, Trzaskoz, JD, Sridharan, S, Smelyanskiy, M, Kim, D, Manduca, A, Shu, Y, Bernstein, MA, Kaul, B & Dubey, P 2012, High performance non-uniform FFT on modern X86-based multi-core systems. in Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012., 6267881, Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012, pp. 449-460, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012, Shanghai, China, 5/21/12. https://doi.org/10.1109/IPDPS.2012.49

Kalamkar DD, Trzaskoz JD, Sridharan S, Smelyanskiy M, Kim D, Manduca A et al. High performance non-uniform FFT on modern X86-based multi-core systems. In Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012. 2012. p. 449-460. 6267881. (Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012). doi: 10.1109/IPDPS.2012.49

Kalamkar, Dhiraj D.; Trzaskoz, Joshua D.; Sridharan, Srinivas et al. / High performance non-uniform FFT on modern X86-based multi-core systems. Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012. 2012. pp. 449-460 (Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012).

@inproceedings{5fa41168cdb24e26a8d60e65a0f11ef5,
  title = "High performance non-uniform FFT on modern X86-based multi-core systems",
  abstract = "The Non-Uniform Fast Fourier Transform (NUFFT) is a generalization of the FFT to non-equidistant samples. Its applications range from medical imaging to radio astronomy to the numerical solution of partial differential equations. Despite recent advances in speeding up the NUFFT on various platforms, its practical applications are still limited by its high computational cost, which is dominated by the convolution of a signal between non-uniform and uniform grids. This cost is particularly detrimental in cases that require fast reconstruction times, such as iterative 3D non-Cartesian MRI reconstruction. We propose a novel and highly scalable parallel algorithm for performing the NUFFT on x86-based multi-core CPUs. The high performance of our algorithm relies on good SIMD utilization and high parallel efficiency. On convolution, we demonstrate on average 90% SIMD efficiency using SSE, as well as up to linear scalability on a quad-socket, 40-core system based on Intel{\textregistered} Xeon{\textregistered} E7-4870 processors. As a result, on a dual-socket Intel{\textregistered} Xeon{\textregistered} X5670-based server, our NUFFT implementation is more than 4x faster than the best available NUFFT3D implementation when run on the same hardware. On an Intel{\textregistered} Xeon{\textregistered} E5-2670-based server, our NUFFT implementation is 1.5x faster than any NUFFT implementation published to date. Such speed improvements open new uses for the NUFFT. For example, iterative multi-channel reconstruction of a 240x240x240 image could execute in just over 3 minutes, which is on the same order as contemporary non-iterative (and thus less accurate) 3D NUFFT-based MRI reconstructions.",
  keywords = "Non-uniform FFT, Parallelization, Scalability, Vectorization",
  author = "Kalamkar, {Dhiraj D.} and Trzaskoz, {Joshua D.} and Srinivas Sridharan and Mikhail Smelyanskiy and Daehyun Kim and Armando Manduca and Yunhong Shu and Bernstein, {Matt A.} and Bharat Kaul and Pradeep Dubey",
  year = "2012",
  doi = "10.1109/IPDPS.2012.49",
  language = "English (US)",
  isbn = "9780769546759",
  series = "Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012",
  pages = "449--460",
  booktitle = "Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012",
  note = "2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012 ; Conference date: 21-05-2012 Through 25-05-2012",
}

TY - GEN
T1 - High performance non-uniform FFT on modern X86-based multi-core systems
AU - Kalamkar, Dhiraj D.
AU - Trzaskoz, Joshua D.
AU - Sridharan, Srinivas
AU - Smelyanskiy, Mikhail
AU - Kim, Daehyun
AU - Manduca, Armando
AU - Shu, Yunhong
AU - Bernstein, Matt A.
AU - Kaul, Bharat
AU - Dubey, Pradeep
PY - 2012
Y1 - 2012
N2 - The Non-Uniform Fast Fourier Transform (NUFFT) is a generalization of the FFT to non-equidistant samples. Its applications range from medical imaging to radio astronomy to the numerical solution of partial differential equations. Despite recent advances in speeding up the NUFFT on various platforms, its practical applications are still limited by its high computational cost, which is dominated by the convolution of a signal between non-uniform and uniform grids. This cost is particularly detrimental in cases that require fast reconstruction times, such as iterative 3D non-Cartesian MRI reconstruction. We propose a novel and highly scalable parallel algorithm for performing the NUFFT on x86-based multi-core CPUs. The high performance of our algorithm relies on good SIMD utilization and high parallel efficiency. On convolution, we demonstrate on average 90% SIMD efficiency using SSE, as well as up to linear scalability on a quad-socket, 40-core system based on Intel® Xeon® E7-4870 processors. As a result, on a dual-socket Intel® Xeon® X5670-based server, our NUFFT implementation is more than 4x faster than the best available NUFFT3D implementation when run on the same hardware. On an Intel® Xeon® E5-2670-based server, our NUFFT implementation is 1.5x faster than any NUFFT implementation published to date. Such speed improvements open new uses for the NUFFT. For example, iterative multi-channel reconstruction of a 240x240x240 image could execute in just over 3 minutes, which is on the same order as contemporary non-iterative (and thus less accurate) 3D NUFFT-based MRI reconstructions.

KW - Non-uniform FFT
KW - Parallelization
KW - Scalability
KW - Vectorization
UR - http://www.scopus.com/inward/record.url?scp=84866847512&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84866847512&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2012.49
DO - 10.1109/IPDPS.2012.49
M3 - Conference contribution
AN - SCOPUS:84866847512
SN - 9780769546759
T3 - Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
SP - 449
EP - 460
BT - Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
T2 - 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
Y2 - 21 May 2012 through 25 May 2012
ER -
