Software comparison for evaluating genomic copy number variation for Affymetrix 6.0 SNP array platform

Jeanette E. Eckel-Passow; Elizabeth J. Atkinson; Sooraj Maharjan; Sharon L.R. Kardia; Mariza de Andrade

doi:10.1186/1471-2105-12-220

Software comparison for evaluating genomic copy number variation for Affymetrix 6.0 SNP array platform

Jeanette E. Eckel-Passow, Elizabeth J. Atkinson, Sooraj Maharjan, Sharon L.R. Kardia, Mariza de Andrade

Quantitative Health Sciences

Research output: Contribution to journal › Article › peer-review

40 Scopus citations

Abstract

Background: Copy number data are routinely being extracted from genome-wide association study chips using a variety of software. We empirically evaluated and compared four freely-available software packages designed for Affymetrix SNP chips to estimate copy number: Affymetrix Power Tools (APT), Aroma.Affymetrix, PennCNV and CRLMM. Our evaluation used 1,418 GENOA samples that were genotyped on the Affymetrix Genome-Wide Human SNP Array 6.0. We compared bias and variance in the locus-level copy number data, the concordance amongst regions of copy number gains/deletions and the false-positive rate amongst deleted segments.Results: APT had median locus-level copy numbers closest to a value of two, whereas PennCNV and Aroma.Affymetrix had the smallest variability associated with the median copy number. Of those evaluated, only PennCNV provides copy number specific quality-control metrics and identified 136 poor CNV samples. Regions of copy number variation (CNV) were detected using the hidden Markov models provided within PennCNV and CRLMM/VanillaIce. PennCNV detected more CNVs than CRLMM/VanillaIce; the median number of CNVs detected per sample was 39 and 30, respectively. PennCNV detected most of the regions that CRLMM/VanillaIce did as well as additional CNV regions. The median concordance between PennCNV and CRLMM/VanillaIce was 47.9% for duplications and 51.5% for deletions. The estimated false-positive rate associated with deletions was similar for PennCNV and CRLMM/VanillaIce.Conclusions: If the objective is to perform statistical tests on the locus-level copy number data, our empirical results suggest that PennCNV or Aroma.Affymetrix is optimal. If the objective is to perform statistical tests on the summarized segmented data then PennCNV would be preferred over CRLMM/VanillaIce. Specifically, PennCNV allows the analyst to estimate locus-level copy number, perform segmentation and evaluate CNV-specific quality-control metrics within a single software package. PennCNV has relatively small bias, small variability and detects more regions while maintaining a similar estimated false-positive rate as CRLMM/VanillaIce. More generally, we advocate that software developers need to provide guidance with respect to evaluating and choosing optimal settings in order to obtain optimal results for an individual dataset. Until such guidance exists, we recommend trying multiple algorithms, evaluating concordance/discordance and subsequently consider the union of regions for downstream association tests.

Original language	English (US)
Article number	220
Journal	BMC bioinformatics
Volume	12
DOIs	https://doi.org/10.1186/1471-2105-12-220
State	Published - May 31 2011

ASJC Scopus subject areas

Structural Biology
Biochemistry
Molecular Biology
Computer Science Applications
Applied Mathematics

Access to Document

10.1186/1471-2105-12-220

Cite this

@article{77ec1eef90d8492188fafcc7163b499f,

title = "Software comparison for evaluating genomic copy number variation for Affymetrix 6.0 SNP array platform",

abstract = "Background: Copy number data are routinely being extracted from genome-wide association study chips using a variety of software. We empirically evaluated and compared four freely-available software packages designed for Affymetrix SNP chips to estimate copy number: Affymetrix Power Tools (APT), Aroma.Affymetrix, PennCNV and CRLMM. Our evaluation used 1,418 GENOA samples that were genotyped on the Affymetrix Genome-Wide Human SNP Array 6.0. We compared bias and variance in the locus-level copy number data, the concordance amongst regions of copy number gains/deletions and the false-positive rate amongst deleted segments.Results: APT had median locus-level copy numbers closest to a value of two, whereas PennCNV and Aroma.Affymetrix had the smallest variability associated with the median copy number. Of those evaluated, only PennCNV provides copy number specific quality-control metrics and identified 136 poor CNV samples. Regions of copy number variation (CNV) were detected using the hidden Markov models provided within PennCNV and CRLMM/VanillaIce. PennCNV detected more CNVs than CRLMM/VanillaIce; the median number of CNVs detected per sample was 39 and 30, respectively. PennCNV detected most of the regions that CRLMM/VanillaIce did as well as additional CNV regions. The median concordance between PennCNV and CRLMM/VanillaIce was 47.9% for duplications and 51.5% for deletions. The estimated false-positive rate associated with deletions was similar for PennCNV and CRLMM/VanillaIce.Conclusions: If the objective is to perform statistical tests on the locus-level copy number data, our empirical results suggest that PennCNV or Aroma.Affymetrix is optimal. If the objective is to perform statistical tests on the summarized segmented data then PennCNV would be preferred over CRLMM/VanillaIce. Specifically, PennCNV allows the analyst to estimate locus-level copy number, perform segmentation and evaluate CNV-specific quality-control metrics within a single software package. PennCNV has relatively small bias, small variability and detects more regions while maintaining a similar estimated false-positive rate as CRLMM/VanillaIce. More generally, we advocate that software developers need to provide guidance with respect to evaluating and choosing optimal settings in order to obtain optimal results for an individual dataset. Until such guidance exists, we recommend trying multiple algorithms, evaluating concordance/discordance and subsequently consider the union of regions for downstream association tests.",

author = "Eckel-Passow, {Jeanette E.} and Atkinson, {Elizabeth J.} and Sooraj Maharjan and Kardia, {Sharon L.R.} and {de Andrade}, Mariza",

note = "Funding Information: We thank Robert Scharpf, Henrik Bengtsson and Kai Wang for their helpful comments about CRLMM, Aroma.Affymetrix and PennCNV, respectively. The Genotyping Shared Resource of the Mayo Clinic Technology Center performed all of the genotyping. This paper was supported by the National Institutes of Health research grant R01 HL87660 (MdA and SLRK) and the Genetic Epidemiology and Risk Assessment (GERA) program (P30CA15083-36) from the National Cancer Institute.",

year = "2011",

month = may,

day = "31",

doi = "10.1186/1471-2105-12-220",

language = "English (US)",

volume = "12",

journal = "BMC bioinformatics",

issn = "1471-2105",

publisher = "BioMed Central",

}

TY - JOUR

T1 - Software comparison for evaluating genomic copy number variation for Affymetrix 6.0 SNP array platform

AU - Eckel-Passow, Jeanette E.

AU - Atkinson, Elizabeth J.

AU - Maharjan, Sooraj

AU - Kardia, Sharon L.R.

AU - de Andrade, Mariza

N1 - Funding Information: We thank Robert Scharpf, Henrik Bengtsson and Kai Wang for their helpful comments about CRLMM, Aroma.Affymetrix and PennCNV, respectively. The Genotyping Shared Resource of the Mayo Clinic Technology Center performed all of the genotyping. This paper was supported by the National Institutes of Health research grant R01 HL87660 (MdA and SLRK) and the Genetic Epidemiology and Risk Assessment (GERA) program (P30CA15083-36) from the National Cancer Institute.

PY - 2011/5/31

Y1 - 2011/5/31

N2 - Background: Copy number data are routinely being extracted from genome-wide association study chips using a variety of software. We empirically evaluated and compared four freely-available software packages designed for Affymetrix SNP chips to estimate copy number: Affymetrix Power Tools (APT), Aroma.Affymetrix, PennCNV and CRLMM. Our evaluation used 1,418 GENOA samples that were genotyped on the Affymetrix Genome-Wide Human SNP Array 6.0. We compared bias and variance in the locus-level copy number data, the concordance amongst regions of copy number gains/deletions and the false-positive rate amongst deleted segments.Results: APT had median locus-level copy numbers closest to a value of two, whereas PennCNV and Aroma.Affymetrix had the smallest variability associated with the median copy number. Of those evaluated, only PennCNV provides copy number specific quality-control metrics and identified 136 poor CNV samples. Regions of copy number variation (CNV) were detected using the hidden Markov models provided within PennCNV and CRLMM/VanillaIce. PennCNV detected more CNVs than CRLMM/VanillaIce; the median number of CNVs detected per sample was 39 and 30, respectively. PennCNV detected most of the regions that CRLMM/VanillaIce did as well as additional CNV regions. The median concordance between PennCNV and CRLMM/VanillaIce was 47.9% for duplications and 51.5% for deletions. The estimated false-positive rate associated with deletions was similar for PennCNV and CRLMM/VanillaIce.Conclusions: If the objective is to perform statistical tests on the locus-level copy number data, our empirical results suggest that PennCNV or Aroma.Affymetrix is optimal. If the objective is to perform statistical tests on the summarized segmented data then PennCNV would be preferred over CRLMM/VanillaIce. Specifically, PennCNV allows the analyst to estimate locus-level copy number, perform segmentation and evaluate CNV-specific quality-control metrics within a single software package. PennCNV has relatively small bias, small variability and detects more regions while maintaining a similar estimated false-positive rate as CRLMM/VanillaIce. More generally, we advocate that software developers need to provide guidance with respect to evaluating and choosing optimal settings in order to obtain optimal results for an individual dataset. Until such guidance exists, we recommend trying multiple algorithms, evaluating concordance/discordance and subsequently consider the union of regions for downstream association tests.

AB - Background: Copy number data are routinely being extracted from genome-wide association study chips using a variety of software. We empirically evaluated and compared four freely-available software packages designed for Affymetrix SNP chips to estimate copy number: Affymetrix Power Tools (APT), Aroma.Affymetrix, PennCNV and CRLMM. Our evaluation used 1,418 GENOA samples that were genotyped on the Affymetrix Genome-Wide Human SNP Array 6.0. We compared bias and variance in the locus-level copy number data, the concordance amongst regions of copy number gains/deletions and the false-positive rate amongst deleted segments.Results: APT had median locus-level copy numbers closest to a value of two, whereas PennCNV and Aroma.Affymetrix had the smallest variability associated with the median copy number. Of those evaluated, only PennCNV provides copy number specific quality-control metrics and identified 136 poor CNV samples. Regions of copy number variation (CNV) were detected using the hidden Markov models provided within PennCNV and CRLMM/VanillaIce. PennCNV detected more CNVs than CRLMM/VanillaIce; the median number of CNVs detected per sample was 39 and 30, respectively. PennCNV detected most of the regions that CRLMM/VanillaIce did as well as additional CNV regions. The median concordance between PennCNV and CRLMM/VanillaIce was 47.9% for duplications and 51.5% for deletions. The estimated false-positive rate associated with deletions was similar for PennCNV and CRLMM/VanillaIce.Conclusions: If the objective is to perform statistical tests on the locus-level copy number data, our empirical results suggest that PennCNV or Aroma.Affymetrix is optimal. If the objective is to perform statistical tests on the summarized segmented data then PennCNV would be preferred over CRLMM/VanillaIce. Specifically, PennCNV allows the analyst to estimate locus-level copy number, perform segmentation and evaluate CNV-specific quality-control metrics within a single software package. PennCNV has relatively small bias, small variability and detects more regions while maintaining a similar estimated false-positive rate as CRLMM/VanillaIce. More generally, we advocate that software developers need to provide guidance with respect to evaluating and choosing optimal settings in order to obtain optimal results for an individual dataset. Until such guidance exists, we recommend trying multiple algorithms, evaluating concordance/discordance and subsequently consider the union of regions for downstream association tests.

UR - http://www.scopus.com/inward/record.url?scp=79957638660&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79957638660&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-12-220

DO - 10.1186/1471-2105-12-220

M3 - Article

C2 - 21627824

AN - SCOPUS:79957638660

SN - 1471-2105

VL - 12

JO - BMC bioinformatics

JF - BMC bioinformatics

M1 - 220

ER -

Software comparison for evaluating genomic copy number variation for Affymetrix 6.0 SNP array platform

Abstract

ASJC Scopus subject areas

Access to Document

Other files and links

Fingerprint

Cite this