Iterative coordinate descent (ICD) is an optimization strategy for iterative reconstruction that is sometimes considered incompatible with parallel compute architectures such as graphics processing units (GPUs). We present a series of modifications that render ICD compatible with GPUs and demonstrate the resulting code on a diagnostic helical CT dataset. Our reference code is an open-source package, FreeCT ICD, which requires several hours for convergence. We use three modifications. First, as in FreeCT ICD, the reconstruction is performed on a rotating coordinate grid, enabling the use of a stored system matrix. Second, every other voxel in the z-direction is updated simultaneously, and the sinogram data is shuffled to coalesce memory access. This increases the parallelism available to the GPU. Third, NS voxels in the xy-plane are updated simultaneously. This introduces possible crosstalk between updated voxels, but because the interaction between non-adjacent voxels is small, small values of NS still converge effectively. We find that NS = 16 enables faster reconstruction via greater parallelism, and that NS = 256 remains stable but offers no additional computational benefit. When tested on a pediatric dataset of size 736x16x14000 reconstructed to a matrix size of 512x512x128 on a single GPU, our implementation of ICD can converge to within 10 HU RMS in less than 5 minutes. This suggests that ICD could be competitive with simultaneous update algorithms on modern parallel compute architectures.
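The third modification, updating NS well-separated voxels from the same residual, can be illustrated with a small CPU sketch. This is not the FreeCT ICD or GPU implementation. It assumes a hypothetical one-dimensional toy system in which each voxel has the same five-tap footprint (`KERNEL`), an unregularized least-squares objective, and illustrative names (`column`, `forward_project`, `grouped_icd`) that do not come from the paper. Voxels in a group are spaced `stride` apart so that their footprints barely overlap, which keeps the crosstalk between simultaneous updates small.

```python
# Hypothetical voxel footprint: weight of a voxel in nearby measurements.
# The same stored kernel is reused for every voxel in this toy model.
KERNEL = (0.1, 0.3, 1.0, 0.3, 0.1)


def column(j, n_rows):
    """Nonzero (row, weight) entries of voxel j's toy system-matrix column."""
    half = len(KERNEL) // 2
    return [(j + k - half, w) for k, w in enumerate(KERNEL)
            if 0 <= j + k - half < n_rows]


def forward_project(x):
    """Compute y = A x for the toy system."""
    n = len(x)
    y = [0.0] * n
    for j, xj in enumerate(x):
        for i, w in column(j, n):
            y[i] += w * xj
    return y


def grouped_icd(y, n_s=16, n_sweeps=200):
    """Minimize ||A x - y||^2, updating n_s spaced voxels per step."""
    n = len(y)
    if n_s < 1 or n % n_s != 0:
        raise ValueError("n_s must be a positive divisor of the voxel count")
    stride = n // n_s
    x = [0.0] * n
    residual = list(y)  # equals y - A x because x starts at zero
    for _ in range(n_sweeps):
        for start in range(stride):
            group = range(start, n, stride)
            # Every step in the group is computed from the same (stale)
            # residual; this is the part a GPU could run in parallel.
            steps = []
            for j in group:
                col = column(j, n)
                grad = sum(w * residual[i] for i, w in col)
                norm_sq = sum(w * w for _, w in col)
                steps.append(grad / norm_sq)
            # Apply all steps, then bring the residual up to date.
            for j, step in zip(group, steps):
                x[j] += step
                for i, w in column(j, n):
                    residual[i] -= w * step
    return x
```

Calling `grouped_icd(forward_project(x_true), n_s=16)` on a 64-voxel `x_true` updates 16 voxels at a time, with 4 groups per sweep. Setting `n_s=1` reduces the loop to ordinary sequential ICD. In this toy model, larger `n_s` places simultaneously updated voxels closer together and increases crosstalk, which mirrors the trade-off described above only qualitatively.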