PolyMAPr: Programs for polymorphism database mining, annotation, and functional analysis

Robert Freimuth, Gary D. Stormo, Howard L. McLeod

Research output: Contribution to journal › Article

30 Citations (Scopus)

Abstract

Pharmacogenomic and disease-association studies rely on identifying a comprehensive set of polymorphisms within candidate genes. Public SNP databases are a rich source of polymorphism data, but mining them effectively requires overcoming at least four challenges: ensuring accurate annotations for genes and polymorphisms, eliminating both inter- and intra-database redundancy, integrating data from multiple public sources with data generated locally, and prioritizing the variants for further study. PolyMAPr (Polymorphism Mining and Annotation Programs) was developed to overcome these challenges and to improve the efficiency of database mining and polymorphism annotation. PolyMAPr takes as input a file containing a list of genes to be processed and files containing each annotated gene sequence. Polymorphic sequences obtained from public databases (dbSNP, CGAP, and JSNP) or through local SNP discovery efforts, as well as oligonucleotide sequences (e.g., PCR primers), are mapped to the annotated gene sequences and named according to suggested nomenclature guidelines. The functional effects of nonsynonymous coding-region SNPs (cSNPs) and any variants that might alter exon splicing enhancer (ESE) sites, putative transcription factor binding sites, or intron-exon splice sites are predicted. The output files are accessible through a browser interface. In addition, the results are also provided in Extensible Markup Language (XML) format to facilitate uploading them into a local relational database. PolyMAPr increases the efficiency of mining public databases for genetic variants within candidate genes and provides a mechanism by which data from multiple sources (both public and private) can be uniformly integrated, thereby significantly reducing the effort required to obtain a comprehensive set of polymorphisms for pharmacogenomic and disease-association studies. PolyMAPr can be obtained from http://pharmacogenomics.wustl.edu.
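
The workflow summarized in the abstract (map polymorphic sequences onto an annotated gene sequence, collapse redundant records from different sources, name each variant, predict the effect of coding SNPs, and export XML) can be illustrated with a minimal Python sketch. This is not PolyMAPr's code or its file formats: the function names, the flank-matching strategy, the HGVS-style `c.` names, the XML element and attribute names, and the toy sequence are all assumptions made for illustration, and the gene is treated as an intron-free coding sequence that starts at the ATG.

```python
from itertools import product
import xml.etree.ElementTree as ET

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def map_snp(cds, left_flank, right_flank):
    """Return the 0-based index of the variant base, or None unless the
    flanking sequences match the reference at exactly one location."""
    hits = []
    start = cds.find(left_flank)
    while start != -1:
        pos = start + len(left_flank)
        if cds[pos + 1 : pos + 1 + len(right_flank)] == right_flank:
            hits.append(pos)
        start = cds.find(left_flank, start + 1)
    return hits[0] if len(hits) == 1 else None


def merge_records(cds, records):
    """Collapse records that map to the same position with the same alleles,
    remembering which sources reported them (hypothetical record layout)."""
    merged = {}
    for source, left, alleles, right in records:
        pos = map_snp(cds, left, right)
        if pos is not None:
            merged.setdefault((pos, frozenset(alleles)), []).append(source)
    return merged


def coding_effect(cds, pos, alt):
    """Classify a single-base substitution as synonymous or nonsynonymous."""
    offset = pos % 3
    codon_start = pos - offset
    ref_codon = cds[codon_start : codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1 :]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    return f"nonsynonymous {ref_aa}{codon_start // 3 + 1}{alt_aa}"


def to_xml(cds, merged):
    """Serialize the merged variants; the schema here is invented."""
    root = ET.Element("gene")
    for (pos, alleles), sources in sorted(merged.items(), key=lambda kv: kv[0][0]):
        ref = cds[pos]
        for alt in sorted(alleles - {ref}):
            ET.SubElement(
                root, "snp",
                name=f"c.{pos + 1}{ref}>{alt}",  # HGVS-style name (assumed)
                sources=",".join(sorted(set(sources))),
                effect=coding_effect(cds, pos, alt),
            )
    return ET.tostring(root, encoding="unicode")


# Toy coding sequence: Met-Ala-Glu-Thr-Stop.
cds = "ATGGCCGAAACTTGA"
records = [
    ("dbSNP", "GGCCG", "AT", "AACTT"),  # A/T at index 7
    ("local", "CCG", "AT", "AAC"),      # same SNP, shorter flanks
    ("JSNP", "ATGGC", "CT", "GAAA"),    # C/T at index 5
]
merged = merge_records(cds, records)
xml_text = to_xml(cds, merged)
print(xml_text)
```

In this toy input the dbSNP and locally generated records describe the same site with different flank lengths, so they collapse into one entry that lists both sources. Requiring a unique flank match is one simple way to avoid assigning a variant to the wrong location; a real pipeline would also need to handle reverse-complement matches, introns, and insertion/deletion variants.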

Original language: English (US)
Pages (from-to): 110-117
Number of pages: 8
Journal: Human Mutation
Volume: 25
Issue number: 2
DOI: 10.1002/humu.20123
State: Published - 2005
Externally published: Yes

Fingerprint

Databases
Pharmacogenetics
Single Nucleotide Polymorphism
Genes
Information Storage and Retrieval
Exons
Organizational Efficiency
Genetic Databases
Molecular Sequence Annotation
Data Mining
Terminology
Oligonucleotides
Introns
Transcription Factors
Language
Binding Sites
Guidelines
Polymerase Chain Reaction

Keywords

  • Annotation
  • Database mining
  • Pharmacogenetics
  • Pharmacogenomics
  • Polymorphism
  • SNP

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)

Cite this

PolyMAPr: Programs for polymorphism database mining, annotation, and functional analysis. / Freimuth, Robert; Stormo, Gary D.; McLeod, Howard L.

In: Human Mutation, Vol. 25, No. 2, 2005, p. 110-117.

Freimuth, Robert; Stormo, Gary D.; McLeod, Howard L. / PolyMAPr: Programs for polymorphism database mining, annotation, and functional analysis. In: Human Mutation. 2005; Vol. 25, No. 2. pp. 110-117.
@article{09313a89788c4402a006dc9b1fb59800,
title = "PolyMAPr: Programs for polymorphism database mining, annotation, and functional analysis",
abstract = "Pharmacogenomic and disease-association studies rely on identifying a comprehensive set of polymorphisms within candidate genes. Public SNP databases are a rich source of polymorphism data, but mining them effectively requires overcoming at least four challenges: ensuring accurate annotations for genes and polymorphisms, eliminating both inter- and intra-database redundancy, integrating data from multiple public sources with data generated locally, and prioritizing the variants for further study. PolyMAPr (Polymorphism Mining and Annotation Programs) was developed to overcome these challenges and to improve the efficiency of database mining and polymorphism annotation. PolyMAPr takes as input a file containing a list of genes to be processed and files containing each annotated gene sequence. Polymorphic sequences obtained from public databases (dbSNP, CGAP, and JSNP) or through local SNP discovery efforts, as well as oligonucleotide sequences (e.g., PCR primers), are mapped to the annotated gene sequences and named according to suggested nomenclature guidelines. The functional effects of nonsynonymous coding-region SNPs (cSNPs) and any variants that might alter exon splicing enhancer (ESE) sites, putative transcription factor binding sites, or intron-exon splice sites are predicted. The output files are accessible through a browser interface. In addition, the results are also provided in Extensible Markup Language (XML) format to facilitate uploading them into a local relational database. PolyMAPr increases the efficiency of mining public databases for genetic variants within candidate genes and provides a mechanism by which data from multiple sources (both public and private) can be uniformly integrated, thereby significantly reducing the effort required to obtain a comprehensive set of polymorphisms for pharmacogenomic and disease-association studies. PolyMAPr can be obtained from http://pharmacogenomics.wustl.edu.",
keywords = "Annotation, Database mining, Pharmacogenetics, Pharmacogenomics, Polymorphism, SNP",
author = "Robert Freimuth and Stormo, {Gary D.} and McLeod, {Howard L.}",
year = "2005",
doi = "10.1002/humu.20123",
language = "English (US)",
volume = "25",
pages = "110--117",
journal = "Human Mutation",
issn = "1059-7794",
publisher = "Wiley-Liss Inc.",
number = "2",

}

TY - JOUR

T1 - PolyMAPr

T2 - Programs for polymorphism database mining, annotation, and functional analysis

AU - Freimuth, Robert

AU - Stormo, Gary D.

AU - McLeod, Howard L.

PY - 2005

Y1 - 2005

N2 - Pharmacogenomic and disease-association studies rely on identifying a comprehensive set of polymorphisms within candidate genes. Public SNP databases are a rich source of polymorphism data, but mining them effectively requires overcoming at least four challenges: ensuring accurate annotations for genes and polymorphisms, eliminating both inter- and intra-database redundancy, integrating data from multiple public sources with data generated locally, and prioritizing the variants for further study. PolyMAPr (Polymorphism Mining and Annotation Programs) was developed to overcome these challenges and to improve the efficiency of database mining and polymorphism annotation. PolyMAPr takes as input a file containing a list of genes to be processed and files containing each annotated gene sequence. Polymorphic sequences obtained from public databases (dbSNP, CGAP, and JSNP) or through local SNP discovery efforts, as well as oligonucleotide sequences (e.g., PCR primers), are mapped to the annotated gene sequences and named according to suggested nomenclature guidelines. The functional effects of nonsynonymous coding-region SNPs (cSNPs) and any variants that might alter exon splicing enhancer (ESE) sites, putative transcription factor binding sites, or intron-exon splice sites are predicted. The output files are accessible through a browser interface. In addition, the results are also provided in Extensible Markup Language (XML) format to facilitate uploading them into a local relational database. PolyMAPr increases the efficiency of mining public databases for genetic variants within candidate genes and provides a mechanism by which data from multiple sources (both public and private) can be uniformly integrated, thereby significantly reducing the effort required to obtain a comprehensive set of polymorphisms for pharmacogenomic and disease-association studies. PolyMAPr can be obtained from http://pharmacogenomics.wustl.edu.

AB - Pharmacogenomic and disease-association studies rely on identifying a comprehensive set of polymorphisms within candidate genes. Public SNP databases are a rich source of polymorphism data, but mining them effectively requires overcoming at least four challenges: ensuring accurate annotations for genes and polymorphisms, eliminating both inter- and intra-database redundancy, integrating data from multiple public sources with data generated locally, and prioritizing the variants for further study. PolyMAPr (Polymorphism Mining and Annotation Programs) was developed to overcome these challenges and to improve the efficiency of database mining and polymorphism annotation. PolyMAPr takes as input a file containing a list of genes to be processed and files containing each annotated gene sequence. Polymorphic sequences obtained from public databases (dbSNP, CGAP, and JSNP) or through local SNP discovery efforts, as well as oligonucleotide sequences (e.g., PCR primers), are mapped to the annotated gene sequences and named according to suggested nomenclature guidelines. The functional effects of nonsynonymous coding-region SNPs (cSNPs) and any variants that might alter exon splicing enhancer (ESE) sites, putative transcription factor binding sites, or intron-exon splice sites are predicted. The output files are accessible through a browser interface. In addition, the results are also provided in Extensible Markup Language (XML) format to facilitate uploading them into a local relational database. PolyMAPr increases the efficiency of mining public databases for genetic variants within candidate genes and provides a mechanism by which data from multiple sources (both public and private) can be uniformly integrated, thereby significantly reducing the effort required to obtain a comprehensive set of polymorphisms for pharmacogenomic and disease-association studies. PolyMAPr can be obtained from http://pharmacogenomics.wustl.edu.

KW - Annotation

KW - Database mining

KW - Pharmacogenetics

KW - Pharmacogenomics

KW - Polymorphism

KW - SNP

UR - http://www.scopus.com/inward/record.url?scp=13544266572&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=13544266572&partnerID=8YFLogxK

U2 - 10.1002/humu.20123

DO - 10.1002/humu.20123

M3 - Article

C2 - 15643605

AN - SCOPUS:13544266572

VL - 25

SP - 110

EP - 117

JO - Human Mutation

JF - Human Mutation

SN - 1059-7794

IS - 2

ER -