Assessing atrophy measurement techniques in dementia: Results from the MIRIAD atrophy challenge

David M. Cash; Chris Frost; Leonardo O. Iheme; Devrim Ünay; Melek Kandemir; Jurgen Fripp; Olivier Salvado; Pierrick Bourgeat; Martin Reuter; Bruce Fischl; Marco Lorenzi; Giovanni B. Frisoni; Xavier Pennec; Ronald K. Pierson; Jeffrey L. Gunter; Matthew L. Senjem; Clifford R. Jack; Nicolas Guizard; Vladimir S. Fonov; D. Louis Collins; Marc Modat; M. Jorge Cardoso; Kelvin K. Leung; Hongzhi Wang; Sandhitsu R. Das; Paul A. Yushkevich; Ian B. Malone; Nick C. Fox; Jonathan M. Schott; Sebastien Ourselin

doi:10.1016/j.neuroimage.2015.07.087

Radiology

Research output: Contribution to journal › Article › peer-review

41 Scopus citations

Abstract

Structural MRI is widely used for investigating brain atrophy in many neurodegenerative disorders, with several research groups developing and publishing techniques to provide quantitative assessments of this longitudinal change. Often techniques are compared through computation of required sample size estimates for future clinical trials. However, interpretation of such comparisons is rendered complex because, despite using the same publicly available cohorts, the various techniques have been assessed with different data exclusions and different statistical analysis models. We created the MIRIAD atrophy challenge in order to test various capabilities of atrophy measurement techniques. The data consisted of 69 subjects (46 Alzheimer's disease, 23 control) who were scanned multiple (up to twelve) times at nine visits over a follow-up period of one to two years, resulting in 708 total image sets. Nine participating groups from six countries completed the challenge by providing volumetric measurements of key structures (whole brain, lateral ventricle, left and right hippocampi) for each dataset and atrophy measurements of these structures for each time point pair (both forward and backward) of a given subject. From these results, we formally compared techniques using exactly the same dataset. First, we assessed the repeatability of each technique using rates obtained from short intervals where no measurable atrophy is expected. For those techniques that provided direct measures of atrophy between pairs of images, we also assessed symmetry and transitivity. Then, we performed a statistical analysis in a consistent manner using linear mixed effect models. The models, one for repeated measures of volume made at multiple time-points and a second for repeated "direct" measures of change in brain volume, appropriately allowed for the correlation between measures made on the same subject and were shown to fit the data well. From these models, we obtained estimates of the distribution of atrophy rates in the Alzheimer's disease (AD) and control groups and of required sample sizes to detect a 25% treatment effect, in relation to healthy ageing, with 95% significance and 80% power over follow-up periods of 6, 12, and 24 months. Uncertainty in these estimates was assessed, and head-to-head comparisons between techniques were carried out, using the bootstrap. The lateral ventricles provided the most stable measurements, followed by the brain. The hippocampi had much more variability across participants, likely because of differences in segmentation protocol and less distinct boundaries. Most methods showed no indication of bias based on the short-term interval results, and direct measures provided good consistency in terms of symmetry and transitivity. The resulting annualized rates of change derived from the model ranged, for whole brain, from -1.4% to -2.2% (AD) and -0.35% to -0.67% (control); for ventricles, from 4.6% to 10.2% (AD) and 1.2% to 3.4% (control); and for hippocampi, from -1.5% to -7.0% (AD) and -0.4% to -1.4% (control). There were large and statistically significant differences in the sample size requirements between many of the techniques. The lowest sample sizes for each of these structures, for a trial with a 12-month follow-up period, were 242 (95% CI: 154 to 422) for whole brain, 168 (95% CI: 112 to 282) for ventricles, 190 (95% CI: 146 to 268) for left hippocampus, and 158 (95% CI: 116 to 228) for right hippocampus. This analysis represents one of the most extensive statistical comparisons of a large number of different atrophy measurement techniques from around the globe. The challenge data will remain online and publicly available so that other groups can assess their methods.
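The sample size requirement described in the abstract can be illustrated with the textbook two-arm formula for comparing mean rates of change. This is a minimal sketch, not the authors' procedure: the paper derives its estimates from the fitted linear mixed models, whereas the function below assumes a single between-subject standard deviation for the atrophy rate, equal allocation, and a two-sided test. It also assumes that "95% significance" corresponds to alpha = 0.05 and that the result is counted per arm. The function name `sample_size_per_arm` and the input values in the usage note are hypothetical, chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(mean_patient, mean_control, sd_patient,
                        effect=0.25, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-arm trial comparing mean atrophy rates.

    Assumption: the treatment removes a fraction `effect` of the
    disease-specific atrophy, i.e. of the excess of the patient rate
    over the control (healthy ageing) rate.
    """
    # Difference in mean rate that the trial must be able to detect.
    delta = effect * (mean_patient - mean_control)
    if delta == 0:
        raise ValueError("patient and control rates are identical")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sd_patient ** 2 / delta ** 2
    return ceil(n)
```

For example, with hypothetical rates of -2.0% per year in patients and -0.5% per year in controls, and a standard deviation of 1.0% per year, `sample_size_per_arm(-2.0, -0.5, 1.0)` returns 112. Because the required size scales with the squared ratio of variability to detectable difference, techniques with less measurement noise need fewer subjects, which is why sample size serves as a comparison metric between techniques.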

Original language	English (US)
Pages (from-to)	149-164
Number of pages	16
Journal	NeuroImage
Volume	123
DOIs	https://doi.org/10.1016/j.neuroimage.2015.07.087
State	Published - Dec 1 2015

ASJC Scopus subject areas

Neurology
Cognitive Neuroscience

Access to Document

10.1016/j.neuroimage.2015.07.087

Cite this

Cash, D. M., Frost, C., Iheme, L. O., Ünay, D., Kandemir, M., Fripp, J., Salvado, O., Bourgeat, P., Reuter, M., Fischl, B., Lorenzi, M., Frisoni, G. B., Pennec, X., Pierson, R. K., Gunter, J. L., Senjem, M. L., Jack, C. R., Guizard, N., Fonov, V. S., ... Ourselin, S. (2015). Assessing atrophy measurement techniques in dementia: Results from the MIRIAD atrophy challenge. NeuroImage, 123, 149-164. https://doi.org/10.1016/j.neuroimage.2015.07.087

Cash, DM, Frost, C, Iheme, LO, Ünay, D, Kandemir, M, Fripp, J, Salvado, O, Bourgeat, P, Reuter, M, Fischl, B, Lorenzi, M, Frisoni, GB, Pennec, X, Pierson, RK, Gunter, JL, Senjem, ML, Jack, CR, Guizard, N, Fonov, VS, Collins, DL, Modat, M, Cardoso, MJ, Leung, KK, Wang, H, Das, SR, Yushkevich, PA, Malone, IB, Fox, NC, Schott, JM & Ourselin, S 2015, 'Assessing atrophy measurement techniques in dementia: Results from the MIRIAD atrophy challenge', NeuroImage, vol. 123, pp. 149-164. https://doi.org/10.1016/j.neuroimage.2015.07.087

@article{e82d385763a34e4f8e8eab3ffd0e7ae6,

title = "Assessing atrophy measurement techniques in dementia: Results from the MIRIAD atrophy challenge",

author = "Cash, {David M.} and Chris Frost and Iheme, {Leonardo O.} and Devrim {\"U}nay and Melek Kandemir and Jurgen Fripp and Olivier Salvado and Pierrick Bourgeat and Martin Reuter and Bruce Fischl and Marco Lorenzi and Frisoni, {Giovanni B.} and Xavier Pennec and Pierson, {Ronald K.} and Gunter, {Jeffrey L.} and Senjem, {Matthew L.} and Jack, {Clifford R.} and Nicolas Guizard and Fonov, {Vladimir S.} and Collins, {D. Louis} and Marc Modat and Cardoso, {M. Jorge} and Leung, {Kelvin K.} and Hongzhi Wang and Das, {Sandhitsu R.} and Yushkevich, {Paul A.} and Malone, {Ian B.} and Fox, {Nick C.} and Schott, {Jonathan M.} and Sebastien Ourselin",

note = "Publisher Copyright: {\textcopyright} 2015.",

year = "2015",

month = dec,

day = "1",

doi = "10.1016/j.neuroimage.2015.07.087",

language = "English (US)",

volume = "123",

pages = "149--164",

journal = "NeuroImage",

issn = "1053-8119",

publisher = "Academic Press Inc.",

}

TY - JOUR

T1 - Assessing atrophy measurement techniques in dementia

T2 - Results from the MIRIAD atrophy challenge

AU - Cash, David M.

AU - Frost, Chris

AU - Iheme, Leonardo O.

AU - Ünay, Devrim

AU - Kandemir, Melek

AU - Fripp, Jurgen

AU - Salvado, Olivier

AU - Bourgeat, Pierrick

AU - Reuter, Martin

AU - Fischl, Bruce

AU - Lorenzi, Marco

AU - Frisoni, Giovanni B.

AU - Pennec, Xavier

AU - Pierson, Ronald K.

AU - Gunter, Jeffrey L.

AU - Senjem, Matthew L.

AU - Jack, Clifford R.

AU - Guizard, Nicolas

AU - Fonov, Vladimir S.

AU - Collins, D. Louis

AU - Modat, Marc

AU - Cardoso, M. Jorge

AU - Leung, Kelvin K.

AU - Wang, Hongzhi

AU - Das, Sandhitsu R.

AU - Yushkevich, Paul A.

AU - Malone, Ian B.

AU - Fox, Nick C.

AU - Schott, Jonathan M.

AU - Ourselin, Sebastien

PY - 2015/12/1

Y1 - 2015/12/1

UR - http://www.scopus.com/inward/record.url?scp=84942250272&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84942250272&partnerID=8YFLogxK

U2 - 10.1016/j.neuroimage.2015.07.087

DO - 10.1016/j.neuroimage.2015.07.087

M3 - Article

C2 - 26275383

AN - SCOPUS:84942250272

SN - 1053-8119

VL - 123

SP - 149

EP - 164

JO - NeuroImage

JF - NeuroImage

ER -
