TagRecon: High-throughput mutation identification through sequence tagging

Surendra Dasari; Matthew C. Chambers; Robbert J. Slebos; Lisa J. Zimmerman; Amy Joan L. Ham; David L. Tabb

doi:10.1021/pr900850m

Quantitative Health Sciences

Research output: Contribution to journal, Article, peer-reviewed

86 Scopus citations

Abstract

Shotgun proteomics produces collections of tandem mass spectra that contain all the data needed to identify mutated peptides from clinical samples. Identifying these sequence variations, however, has not been feasible with conventional database search strategies, which require exact matches between observed and expected sequences. Searching for mutations as mass shifts on specified residues through database search can incur significant performance penalties and generate substantial false positive rates. Here we describe TagRecon, an algorithm that leverages inferred sequence tags to identify unanticipated mutations in clinical proteomic data sets. TagRecon identifies unmodified peptides as sensitively as the related MyriMatch database search engine. In both LTQ and Orbitrap data sets, TagRecon outperformed state-of-the-art software in recognizing sequence mismatches from data sets with known variants. We developed guidelines for filtering putative mutations from clinical samples, and we applied them in an analysis of cancer cell lines and an examination of colon tissue. Mutations were found in up to 6% of identified peptides, and only a small fraction corresponded to dbSNP entries. The RKO cell line, which is DNA mismatch repair-deficient, yielded more mutant peptides than the mismatch repair-proficient SW480 line. Analysis of colon cancer tumor and adjacent tissue revealed hydroxyproline modifications associated with extracellular matrix degradation. These results demonstrate the value of using sequence tagging algorithms to fully interrogate clinical proteomic data sets.
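The abstract notes that mutations can be searched for as mass shifts on specified residues. The sketch below only illustrates that arithmetic; it is not TagRecon's tag-based method. It uses standard monoisotopic residue masses to compute the shift caused by a single amino-acid substitution. It also shows why some substitutions are hard to tell apart from modifications: A→S changes the residue mass by almost exactly one oxygen atom, the same +15.9949 Da added when proline is hydroxylated. The function names and the 0.01 Da tolerance are illustrative assumptions.

```python
# Illustrative sketch only (not TagRecon's algorithm): a single amino-acid
# substitution modeled as a mass shift on one residue.

# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# Mass added by hydroxylation (one oxygen atom), e.g. proline -> hydroxyproline.
OXYGEN = 15.99491


def substitution_shift(original: str, replacement: str) -> float:
    """Mass change (Da) when residue `original` is replaced by `replacement`."""
    return RESIDUE_MASS[replacement] - RESIDUE_MASS[original]


def isobaric_with(shift: float, reference: float, tol: float = 0.01) -> bool:
    """True if two mass shifts agree within `tol` Da (hypothetical tolerance)."""
    return abs(shift - reference) <= tol


if __name__ == "__main__":
    for orig, new in [("A", "V"), ("L", "I"), ("A", "S"), ("Q", "K")]:
        shift = substitution_shift(orig, new)
        print(f"{orig}->{new}: {shift:+.5f} Da, "
              f"matches +O within 0.01 Da: {isobaric_with(shift, OXYGEN)}")
```

L→I gives a zero shift because leucine and isoleucine are isobaric. Q→K differs by about 0.036 Da, so whether it can be resolved depends on the mass tolerance used.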

Original language	English (US)
Pages (from-to)	1716-1726
Number of pages	11
Journal	Journal of Proteome Research
Volume	9
Issue number	4
DOIs	https://doi.org/10.1021/pr900850m
State	Published - Apr 5 2010

Keywords

Bioinformatics
Hydroxyproline
Mutation
Sequence tagging

ASJC Scopus subject areas

General Chemistry
Biochemistry


Cite this

@article{3c27f39e21374705878009919079a771,

title = "{TagRecon}: High-throughput mutation identification through sequence tagging",

abstract = "Shotgun proteomics produces collections of tandem mass spectra that contain all the data needed to identify mutated peptides from clinical samples. Identifying these sequence variations, however, has not been feasible with conventional database search strategies, which require exact matches between observed and expected sequences. Searching for mutations as mass shifts on specified residues through database search can incur significant performance penalties and generate substantial false positive rates. Here we describe TagRecon, an algorithm that leverages inferred sequence tags to identify unanticipated mutations in clinical proteomic data sets. TagRecon identifies unmodified peptides as sensitively as the related MyriMatch database search engine. In both LTQ and Orbitrap data sets, TagRecon outperformed state-of-the-art software in recognizing sequence mismatches from data sets with known variants. We developed guidelines for filtering putative mutations from clinical samples, and we applied them in an analysis of cancer cell lines and an examination of colon tissue. Mutations were found in up to 6\% of identified peptides, and only a small fraction corresponded to dbSNP entries. The RKO cell line, which is DNA mismatch repair-deficient, yielded more mutant peptides than the mismatch repair-proficient SW480 line. Analysis of colon cancer tumor and adjacent tissue revealed hydroxyproline modifications associated with extracellular matrix degradation. These results demonstrate the value of using sequence tagging algorithms to fully interrogate clinical proteomic data sets.",

keywords = "Bioinformatics, Hydroxyproline, Mutation, Sequence tagging",

author = "Dasari, Surendra and Chambers, {Matthew C.} and Slebos, {Robbert J.} and Zimmerman, {Lisa J.} and Ham, {Amy Joan L.} and Tabb, {David L.}",

year = "2010",

month = apr,

day = "5",

doi = "10.1021/pr900850m",

language = "English (US)",

volume = "9",

pages = "1716--1726",

journal = "Journal of Proteome Research",

issn = "1535-3893",

publisher = "American Chemical Society",

number = "4",

}

TY  - JOUR
T1  - TagRecon: High-throughput mutation identification through sequence tagging
AU  - Dasari, Surendra
AU  - Chambers, Matthew C.
AU  - Slebos, Robbert J.
AU  - Zimmerman, Lisa J.
AU  - Ham, Amy Joan L.
AU  - Tabb, David L.
PY  - 2010/4/5
Y1  - 2010/4/5
N2  - Shotgun proteomics produces collections of tandem mass spectra that contain all the data needed to identify mutated peptides from clinical samples. Identifying these sequence variations, however, has not been feasible with conventional database search strategies, which require exact matches between observed and expected sequences. Searching for mutations as mass shifts on specified residues through database search can incur significant performance penalties and generate substantial false positive rates. Here we describe TagRecon, an algorithm that leverages inferred sequence tags to identify unanticipated mutations in clinical proteomic data sets. TagRecon identifies unmodified peptides as sensitively as the related MyriMatch database search engine. In both LTQ and Orbitrap data sets, TagRecon outperformed state-of-the-art software in recognizing sequence mismatches from data sets with known variants. We developed guidelines for filtering putative mutations from clinical samples, and we applied them in an analysis of cancer cell lines and an examination of colon tissue. Mutations were found in up to 6% of identified peptides, and only a small fraction corresponded to dbSNP entries. The RKO cell line, which is DNA mismatch repair-deficient, yielded more mutant peptides than the mismatch repair-proficient SW480 line. Analysis of colon cancer tumor and adjacent tissue revealed hydroxyproline modifications associated with extracellular matrix degradation. These results demonstrate the value of using sequence tagging algorithms to fully interrogate clinical proteomic data sets.
AB  - Shotgun proteomics produces collections of tandem mass spectra that contain all the data needed to identify mutated peptides from clinical samples. Identifying these sequence variations, however, has not been feasible with conventional database search strategies, which require exact matches between observed and expected sequences. Searching for mutations as mass shifts on specified residues through database search can incur significant performance penalties and generate substantial false positive rates. Here we describe TagRecon, an algorithm that leverages inferred sequence tags to identify unanticipated mutations in clinical proteomic data sets. TagRecon identifies unmodified peptides as sensitively as the related MyriMatch database search engine. In both LTQ and Orbitrap data sets, TagRecon outperformed state-of-the-art software in recognizing sequence mismatches from data sets with known variants. We developed guidelines for filtering putative mutations from clinical samples, and we applied them in an analysis of cancer cell lines and an examination of colon tissue. Mutations were found in up to 6% of identified peptides, and only a small fraction corresponded to dbSNP entries. The RKO cell line, which is DNA mismatch repair-deficient, yielded more mutant peptides than the mismatch repair-proficient SW480 line. Analysis of colon cancer tumor and adjacent tissue revealed hydroxyproline modifications associated with extracellular matrix degradation. These results demonstrate the value of using sequence tagging algorithms to fully interrogate clinical proteomic data sets.
KW  - Bioinformatics
KW  - Hydroxyproline
KW  - Mutation
KW  - Sequence tagging
UR  - http://www.scopus.com/inward/record.url?scp=77950660595&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=77950660595&partnerID=8YFLogxK
U2  - 10.1021/pr900850m
DO  - 10.1021/pr900850m
M3  - Article
C2  - 20131910
AN  - SCOPUS:77950660595
SN  - 1535-3893
VL  - 9
SP  - 1716
EP  - 1726
JO  - Journal of Proteome Research
JF  - Journal of Proteome Research
IS  - 4
ER  - 
