Quality assessment metrics for whole genome gene expression profiling of paraffin embedded samples

Douglas W. Mahoney, Terry M Therneau, S. Keith Anderson, Jin Jen, Jean-Pierre Kocher, Monica M. Reinholz, Edith A. Perez, Jeanette E Eckel-Passow

Research output: Contribution to journal › Article

12 Citations (Scopus)

Abstract

Background: Formalin-fixed, paraffin-embedded tissues are most commonly used for routine pathology analysis and for long-term tissue preservation in the clinical setting. Many institutions have large archives of formalin-fixed, paraffin-embedded tissues that provide a unique opportunity for understanding genomic signatures of disease. However, genome-wide expression profiling of formalin-fixed, paraffin-embedded samples has been challenging due to RNA degradation. Because of the significant heterogeneity in tissue quality, normalization and analysis of these data present particular challenges. The distribution of intensity values from archival tissues is inherently noisy and skewed due to differential sample degradation, raising two primary concerns: whether a highly skewed array will unduly influence initial normalization of the data, and whether outlier arrays can be reliably identified. Findings: Two simple extensions of common regression diagnostic measures are introduced that measure the stress an array undergoes during normalization and how much a given array deviates from the remaining arrays post-normalization. These metrics are applied to a study involving 1618 formalin-fixed, paraffin-embedded HER2-positive breast cancer samples from the N9831 adjuvant trial processed with Illumina's cDNA-mediated Annealing, Selection, extension and Ligation (DASL) assay. Conclusion: Proper assessment of array quality within a research study is crucial for controlling unwanted variability in the data. The metrics proposed in this paper have direct biological interpretations and can be used to identify arrays that should either be removed from analysis altogether or down-weighted to reduce their influence in downstream analyses.
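The abstract does not give the formulas for the two metrics, so the following is only a rough sketch of the general idea, not the authors' definitions. It assumes quantile normalization (the abstract does not name the normalization method), takes "stress" to be the mean absolute change an array undergoes during normalization, and takes "deviation" to be the mean absolute distance of a normalized array from the probe-wise median of the remaining arrays. The function names, both score definitions, and the toy intensities are hypothetical.

```python
from statistics import mean, median


def quantile_normalize(arrays):
    """Map every array onto a shared reference distribution.

    The reference is the mean of the sorted values at each rank
    (assumed method; the abstract does not specify one).
    """
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(s[i] for s in sorted_arrays) for i in range(n_probes)]
    normalized = []
    for a in arrays:
        # Probe indices ordered by intensity; ties keep their original order.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


def stress(raw, normalized):
    """Illustrative stress score: mean absolute shift applied to each array."""
    return [
        mean(abs(r - n) for r, n in zip(raw_a, norm_a))
        for raw_a, norm_a in zip(raw, normalized)
    ]


def deviation(normalized):
    """Illustrative leave-one-out deviation from the other arrays' median profile."""
    scores = []
    for j, arr in enumerate(normalized):
        others = [a for k, a in enumerate(normalized) if k != j]
        center = [median(a[i] for a in others) for i in range(len(arr))]
        scores.append(mean(abs(x - c) for x, c in zip(arr, center)))
    return scores


# Toy log-intensities: three well-behaved arrays and one skewed, re-ordered
# array standing in for a degraded sample (made-up numbers, not study data).
raw = [
    [7.0, 8.0, 9.0, 10.0, 11.0],
    [7.1, 8.1, 8.9, 10.2, 10.9],
    [6.9, 7.9, 9.1, 9.9, 11.1],
    [6.0, 6.3, 6.1, 12.0, 6.2],
]
normalized = quantile_normalize(raw)
stress_scores = stress(raw, normalized)
deviation_scores = deviation(normalized)

for j, (s, d) in enumerate(zip(stress_scores, deviation_scores)):
    print(f"array {j}: stress={s:.3f} deviation={d:.3f}")
```

In this toy input the fourth array scores highest on both measures, which is the kind of array the abstract suggests removing or down-weighting. Any threshold for flagging would have to come from the paper or from the study's own data.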

Original language: English (US)
Article number: 33
Journal: BMC Research Notes
Volume: 6
Issue number: 1
DOI: 10.1186/1756-0500-6-33
State: Published - 2013

Fingerprint

Gene Expression Profiling
Gene expression
Paraffin
Formaldehyde
Genes
Genome
Tissue
Tissue Preservation
RNA Stability
Degradation
Ligation
Pathology
Complementary DNA
Regression Analysis
Breast Neoplasms
Assays
RNA
Annealing
Research

Keywords

  • Formalin-Fixed
  • High-dimensional array quality
  • Outlier detection
  • Paraffin-embedded tissue

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (all)
  • Medicine (all)

Cite this

Quality assessment metrics for whole genome gene expression profiling of paraffin embedded samples. / Mahoney, Douglas W.; Therneau, Terry M; Anderson, S. Keith; Jen, Jin; Kocher, Jean-Pierre; Reinholz, Monica M.; Perez, Edith A.; Eckel-Passow, Jeanette E.

In: BMC Research Notes, Vol. 6, No. 1, 33, 2013.

Mahoney, Douglas W. ; Therneau, Terry M ; Anderson, S. Keith ; Jen, Jin ; Kocher, Jean-Pierre ; Reinholz, Monica M. ; Perez, Edith A. ; Eckel-Passow, Jeanette E. / Quality assessment metrics for whole genome gene expression profiling of paraffin embedded samples. In: BMC Research Notes. 2013 ; Vol. 6, No. 1.
@article{6694b7efea1d453d9d10a10c0d62da15,
title = "Quality assessment metrics for whole genome gene expression profiling of paraffin embedded samples",
keywords = "Formalin-Fixed, High-dimensional array quality, Outlier detection, Paraffin-embedded tissue",
author = "Mahoney, {Douglas W.} and Therneau, {Terry M} and Anderson, {S. Keith} and Jen, Jin and Kocher, Jean-Pierre and Reinholz, {Monica M.} and Perez, {Edith A.} and Eckel-Passow, {Jeanette E}",
year = "2013",
doi = "10.1186/1756-0500-6-33",
language = "English (US)",
volume = "6",
journal = "BMC Research Notes",
issn = "1756-0500",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Quality assessment metrics for whole genome gene expression profiling of paraffin embedded samples

AU - Mahoney, Douglas W.

AU - Therneau, Terry M

AU - Anderson, S. Keith

AU - Jen, Jin

AU - Kocher, Jean-Pierre

AU - Reinholz, Monica M.

AU - Perez, Edith A.

AU - Eckel-Passow, Jeanette E

PY - 2013

Y1 - 2013

KW - Formalin-Fixed

KW - High-dimensional array quality

KW - Outlier detection

KW - Paraffin-embedded tissue

UR - http://www.scopus.com/inward/record.url?scp=84873027115&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84873027115&partnerID=8YFLogxK

U2 - 10.1186/1756-0500-6-33

DO - 10.1186/1756-0500-6-33

M3 - Article

C2 - 23360712

AN - SCOPUS:84873027115

VL - 6

JO - BMC Research Notes

JF - BMC Research Notes

SN - 1756-0500

IS - 1

M1 - 33

ER -