TagRecon: High-throughput mutation identification through sequence tagging

Surendra Dasari, Matthew C. Chambers, Robbert J. Slebos, Lisa J. Zimmerman, A. J. L. Ham, David L. Tabb

Research output: Contribution to journal (Article)

74 Citations (Scopus)

Abstract

Shotgun proteomics produces collections of tandem mass spectra that contain all the data needed to identify mutated peptides from clinical samples. Identifying these sequence variations, however, has not been feasible with conventional database search strategies, which require exact matches between observed and expected sequences. Searching for mutations as mass shifts on specified residues through database search can incur significant performance penalties and generate substantial false positive rates. Here we describe TagRecon, an algorithm that leverages inferred sequence tags to identify unanticipated mutations in clinical proteomic data sets. TagRecon identifies unmodified peptides as sensitively as the related MyriMatch database search engine. In both LTQ and Orbitrap data sets, TagRecon outperformed state-of-the-art software in recognizing sequence mismatches from data sets with known variants. We developed guidelines for filtering putative mutations from clinical samples, and we applied them in an analysis of cancer cell lines and an examination of colon tissue. Mutations were found in up to 6% of identified peptides, and only a small fraction corresponded to dbSNP entries. The RKO cell line, which is DNA mismatch repair-deficient, yielded more mutant peptides than the mismatch repair-proficient SW480 line. Analysis of colon cancer tumor and adjacent tissue revealed hydroxyproline modifications associated with extracellular matrix degradation. These results demonstrate the value of using sequence tagging algorithms to fully interrogate clinical proteomic data sets.

Original language: English (US)
Pages (from-to): 1716-1726
Number of pages: 11
Journal: Journal of Proteome Research
Volume: 9
Issue number: 4
DOI: https://doi.org/10.1021/pr900850m
State: Published - Apr 5 2010
Externally published: Yes

Fingerprint

Throughput
Proteomics
Peptides
Mutation
DNA Mismatch Repair
Databases
Repair
Cells
Tissue
Cell Line
Search Engine
Hydroxyproline
Colonic Neoplasms
Extracellular Matrix
Tumors
Neoplasms
Colon
Software

Keywords

  • Bioinformatics
  • Hydroxyproline
  • Mutation
  • Sequence tagging

ASJC Scopus subject areas

  • Biochemistry
  • Chemistry (all)

Cite this

Dasari, S., Chambers, M. C., Slebos, R. J., Zimmerman, L. J., Ham, A. J. L., & Tabb, D. L. (2010). TagRecon: High-throughput mutation identification through sequence tagging. Journal of Proteome Research, 9(4), 1716-1726. https://doi.org/10.1021/pr900850m

@article{3c27f39e21374705878009919079a771,
title = "TagRecon: High-throughput mutation identification through sequence tagging",
keywords = "Bioinformatics, Hydroxyproline, Mutation, Sequence tagging",
author = "Dasari, Surendra and Chambers, {Matthew C.} and Slebos, {Robbert J.} and Zimmerman, {Lisa J.} and Ham, {A. J. L.} and Tabb, {David L.}",
year = "2010",
month = "4",
day = "5",
doi = "10.1021/pr900850m",
language = "English (US)",
volume = "9",
pages = "1716--1726",
journal = "Journal of Proteome Research",
issn = "1535-3893",
publisher = "American Chemical Society",
number = "4"
}

TY - JOUR

T1 - TagRecon: High-throughput mutation identification through sequence tagging

AU - Dasari, Surendra

AU - Chambers, Matthew C.

AU - Slebos, Robbert J.

AU - Zimmerman, Lisa J.

AU - Ham, A. J L

AU - Tabb, David L.

PY - 2010/4/5

Y1 - 2010/4/5

KW - Bioinformatics

KW - Hydroxyproline

KW - Mutation

KW - Sequence tagging

UR - http://www.scopus.com/inward/record.url?scp=77950660595&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77950660595&partnerID=8YFLogxK

U2 - 10.1021/pr900850m

DO - 10.1021/pr900850m

M3 - Article

C2 - 20131910

AN - SCOPUS:77950660595

VL - 9

SP - 1716

EP - 1726

JO - Journal of Proteome Research

JF - Journal of Proteome Research

SN - 1535-3893

IS - 4

ER -