TY - GEN
T1 - Accelerating coordinate descent in iterative reconstruction
AU - Hsieh, Scott S.
AU - Hoffman, John M.
AU - Noo, Frederic
N1 - Publisher Copyright:
© SPIE. Downloading of the abstract is permitted for personal use only.
PY - 2019
Y1 - 2019
N2 - Iterative coordinate descent (ICD) is an optimization strategy for iterative reconstruction that is sometimes considered incompatible with parallel compute architectures such as graphics processing units (GPUs). We present a series of modifications that render ICD compatible with GPUs and demonstrate the code on a diagnostic, helical CT dataset. Our reference code is an open-source package, FreeCT ICD, which requires several hours for convergence. Three modifications are used. First, as with our reference code FreeCT ICD, the reconstruction is performed on a rotating coordinate grid, enabling the use of a stored system matrix. Second, every other voxel in the z-direction is updated simultaneously, and the sinogram data is shuffled to coalesce memory access. This increases the parallelism available to the GPU. Third, NS voxels in the xy-plane are updated simultaneously. This introduces possible crosstalk between updated voxels, but because the interaction between non-adjacent voxels is small, small values of NS still converge effectively. We find NS = 16 enables faster reconstruction via greater parallelism, and NS = 256 remains stable but has no additional computational benefit. When tested on a pediatric dataset of size 736x16x14000 reconstructed to a matrix size of 512x512x128 on a single GPU, our implementation of ICD can converge within 10 HU RMS in less than 5 minutes. This suggests that ICD could be competitive with simultaneous update algorithms on modern, parallel compute architectures.
AB - Iterative coordinate descent (ICD) is an optimization strategy for iterative reconstruction that is sometimes considered incompatible with parallel compute architectures such as graphics processing units (GPUs). We present a series of modifications that render ICD compatible with GPUs and demonstrate the code on a diagnostic, helical CT dataset. Our reference code is an open-source package, FreeCT ICD, which requires several hours for convergence. Three modifications are used. First, as with our reference code FreeCT ICD, the reconstruction is performed on a rotating coordinate grid, enabling the use of a stored system matrix. Second, every other voxel in the z-direction is updated simultaneously, and the sinogram data is shuffled to coalesce memory access. This increases the parallelism available to the GPU. Third, NS voxels in the xy-plane are updated simultaneously. This introduces possible crosstalk between updated voxels, but because the interaction between non-adjacent voxels is small, small values of NS still converge effectively. We find NS = 16 enables faster reconstruction via greater parallelism, and NS = 256 remains stable but has no additional computational benefit. When tested on a pediatric dataset of size 736x16x14000 reconstructed to a matrix size of 512x512x128 on a single GPU, our implementation of ICD can converge within 10 HU RMS in less than 5 minutes. This suggests that ICD could be competitive with simultaneous update algorithms on modern, parallel compute architectures.
KW - GPU programming
KW - Iterative reconstruction
UR - http://www.scopus.com/inward/record.url?scp=85068399883&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85068399883&partnerID=8YFLogxK
U2 - 10.1117/12.2512615
DO - 10.1117/12.2512615
M3 - Conference contribution
AN - SCOPUS:85068399883
T3 - Progress in Biomedical Optics and Imaging - Proceedings of SPIE
BT - Medical Imaging 2019
A2 - Schmidt, Taly Gilat
A2 - Chen, Guang-Hong
A2 - Bosmans, Hilde
PB - SPIE
T2 - Medical Imaging 2019: Physics of Medical Imaging
Y2 - 17 February 2019 through 20 February 2019
ER -