A knowledge-based approach to predict intragenic deletions or duplications

Krishna R. Kalari; Thomas L. Casavant; Todd E. Scheetz

doi:10.1093/bioinformatics/btn370


Quantitative Health Sciences

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Motivation: Despite recent improvements in high-throughput and classic molecular biology approaches, it is still challenging to identify intermediate-resolution genomic variations (50 bp to 50 kb). Although array-based technologies can be used to detect copy number variations in the human genome, they are biased toward detecting only the largest such deletions or duplications. Several studies have identified deletions or duplications occurring within a gene that directly cause or predispose to disease. We have developed a novel computational system, SPeeDD (system to prioritize deletions or duplications), that uses machine learning techniques to predict likely candidate regions in which exon(s) within a gene are deleted or duplicated.

Results: Data mining and machine learning methods were applied to identify sequence features that were predictive of homologous recombination events. The logistic model tree (LMT) method yielded the best results. Sensitivity varied from 20% to 71.6%, depending on the specific machine learning model used, but specificity exceeded 90% for all methods evaluated. In addition, the SPeeDD system successfully predicted and prioritized a recently published novel BRCA1 mutation.

Conclusions: Results suggest that the SPeeDD system is effective at prioritizing candidate deletions and duplications within a gene. Use of SPeeDD enables more focused screening, which reduces the labor and associated costs of molecular assays. It may also guide the targeted design of new array-based screens that focus on candidate areas, accelerating the process of mutation discovery.
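The sensitivity and specificity figures quoted in the abstract are the standard confusion-matrix rates. The following is a minimal Python sketch of those two definitions. The `ConfusionCounts` helper and the counts used are hypothetical and are not data from the paper. The sketch does not reproduce the SPeeDD system or its logistic model tree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical helper: outcomes of comparing predicted vs. known events."""
    tp: int  # true positives: known deletions/duplications that were predicted
    fn: int  # false negatives: known deletions/duplications that were missed
    tn: int  # true negatives: non-event regions correctly rejected
    fp: int  # false positives: non-event regions wrongly predicted as events


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true events recovered: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of non-events correctly rejected: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not results from the paper.
    counts = ConfusionCounts(tp=30, fn=20, tn=190, fp=10)
    print(f"sensitivity = {sensitivity(counts):.1%}")  # sensitivity = 60.0%
    print(f"specificity = {specificity(counts):.1%}")  # specificity = 95.0%
```

This illustrates how a classifier can combine modest sensitivity with high specificity. That is the trade-off the abstract reports across the evaluated models.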

Original language	English (US)
Pages (from-to)	1975-1979
Number of pages	5
Journal	Bioinformatics
Volume	24
Issue number	18
DOIs	https://doi.org/10.1093/bioinformatics/btn370
State	Published - Sep 2008

ASJC Scopus subject areas

Statistics and Probability
Biochemistry
Molecular Biology
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics


Cite this

@article{0e6352058fe341db98fd179d9609bb52,
  title = "A knowledge-based approach to predict intragenic deletions or duplications",
  abstract = "Motivation: Despite recent improvements in high-throughput or classic molecular biology approaches it is still challenging to identify intermediate resolution genomic variations (50 bp to 50 kb). Although array-based technologies can be used to detect copy number variations in the human genome they are biased to detect only the largest such deletions or duplications. Several studies have identified deletions or duplications occurring within a gene that directly cause or predispose to disease. We have developed a novel computational system, SPeeDD (system to prioritize deletions or duplications) that utilizes machine learning techniques to predict likely candidate regions that delete or duplicate exon(s) within a gene. Results: Data mining and machine learning methods were applied to identify sequence features that were predictive of homologous recombination events. The logistic model tree (LMT) method yielded the best results. Sensitivity varied from 20% to 71.6% depending on the specific machine learning model used, but specificity exceeded 90% for all methods evaluated. In addition, the SPeeDD system successfully predicted and prioritized a recently published novel BRCA1 mutation. Conclusions: Results suggest that the SPeeDD system is effective at prioritizing candidate deletions and duplications within a gene. Use of SPeeDD enables more focused screening, which reduces the labor and associated costs of the molecular assays and may also lead to targeted design of new array-based screens to focus on candidate areas to accelerate the process of mutation discovery.",
  author = "Kalari, {Krishna R.} and Casavant, {Thomas L.} and Scheetz, {Todd E.}",
  note = "Funding Information: calculation of melting temperatures. T.E.S. was partially supported through a Career Development Award from Research to Prevent Blindness. We thank the reviewers for their many helpful suggestions.",
  year = "2008",
  month = sep,
  doi = "10.1093/bioinformatics/btn370",
  language = "English (US)",
  volume = "24",
  pages = "1975--1979",
  journal = "Bioinformatics",
  issn = "1367-4803",
  publisher = "Oxford University Press",
  number = "18",
}

TY  - JOUR
T1  - A knowledge-based approach to predict intragenic deletions or duplications
AU  - Kalari, Krishna R.
AU  - Casavant, Thomas L.
AU  - Scheetz, Todd E.
N1  - Funding Information: calculation of melting temperatures. T.E.S. was partially supported through a Career Development Award from Research to Prevent Blindness. We thank the reviewers for their many helpful suggestions.
PY  - 2008/9
Y1  - 2008/9
N2  - Motivation: Despite recent improvements in high-throughput or classic molecular biology approaches it is still challenging to identify intermediate resolution genomic variations (50 bp to 50 kb). Although array-based technologies can be used to detect copy number variations in the human genome they are biased to detect only the largest such deletions or duplications. Several studies have identified deletions or duplications occurring within a gene that directly cause or predispose to disease. We have developed a novel computational system, SPeeDD (system to prioritize deletions or duplications) that utilizes machine learning techniques to predict likely candidate regions that delete or duplicate exon(s) within a gene. Results: Data mining and machine learning methods were applied to identify sequence features that were predictive of homologous recombination events. The logistic model tree (LMT) method yielded the best results. Sensitivity varied from 20% to 71.6% depending on the specific machine learning model used, but specificity exceeded 90% for all methods evaluated. In addition, the SPeeDD system successfully predicted and prioritized a recently published novel BRCA1 mutation. Conclusions: Results suggest that the SPeeDD system is effective at prioritizing candidate deletions and duplications within a gene. Use of SPeeDD enables more focused screening, which reduces the labor and associated costs of the molecular assays and may also lead to targeted design of new array-based screens to focus on candidate areas to accelerate the process of mutation discovery.
AB  - Motivation: Despite recent improvements in high-throughput or classic molecular biology approaches it is still challenging to identify intermediate resolution genomic variations (50 bp to 50 kb). Although array-based technologies can be used to detect copy number variations in the human genome they are biased to detect only the largest such deletions or duplications. Several studies have identified deletions or duplications occurring within a gene that directly cause or predispose to disease. We have developed a novel computational system, SPeeDD (system to prioritize deletions or duplications) that utilizes machine learning techniques to predict likely candidate regions that delete or duplicate exon(s) within a gene. Results: Data mining and machine learning methods were applied to identify sequence features that were predictive of homologous recombination events. The logistic model tree (LMT) method yielded the best results. Sensitivity varied from 20% to 71.6% depending on the specific machine learning model used, but specificity exceeded 90% for all methods evaluated. In addition, the SPeeDD system successfully predicted and prioritized a recently published novel BRCA1 mutation. Conclusions: Results suggest that the SPeeDD system is effective at prioritizing candidate deletions and duplications within a gene. Use of SPeeDD enables more focused screening, which reduces the labor and associated costs of the molecular assays and may also lead to targeted design of new array-based screens to focus on candidate areas to accelerate the process of mutation discovery.
UR  - http://www.scopus.com/inward/record.url?scp=51749095722&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=51749095722&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/btn370
DO  - 10.1093/bioinformatics/btn370
M3  - Article
C2  - 18647756
AN  - SCOPUS:51749095722
SN  - 1367-4803
VL  - 24
SP  - 1975
EP  - 1979
JO  - Bioinformatics
JF  - Bioinformatics
IS  - 18
ER  - 
