NdPASA: A novel pairwise protein sequence alignment algorithm that incorporates neighbor-dependent amino acid propensities

Junwen Wang, Jin An Feng

Research output: Contribution to journal › Article

10 Citations (Scopus)

Abstract

Sequence alignment has become one of the essential bioinformatics tools in biomedical research. Existing sequence alignment methods can produce reliable alignments for homologous proteins sharing a high percentage of sequence identity. The performance of these methods deteriorates sharply for the sequence pairs sharing less than 25% sequence identity. We report here a new method, NdPASA, for pairwise sequence alignment. This method employs neighbor-dependent propensities of amino acids as a unique parameter for alignment. The values of neighbor-dependent propensity measure the preference of an amino acid pair adopting a particular secondary structure conformation. NdPASA optimizes alignment by evaluating the likelihood of a residue pair in the query sequence matching against a corresponding residue pair adopting a particular secondary structure in the template sequence. Using superpositions of homologous proteins derived from the PSI-BLAST analysis and the Structural Classification of Proteins (SCOP) classification of a nonredundant Protein Data Bank (PDB) database as a gold standard, we show that NdPASA has improved pairwise alignment. Statistical analyses of the performance of NdPASA indicate that the introduction of sequence patterns of secondary structure derived from neighbor-dependent sequence analysis clearly improves alignment performance for sequence pairs sharing less than 20% sequence identity. For sequence pairs sharing 13-21% sequence identity, NdPASA improves the accuracy of alignment over the conventional global alignment (GA) algorithm using the BLOSUM62 by an average of 8.6%. NdPASA is most effective for aligning query sequences with template sequences whose structure is known.
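As a rough illustration of the idea described in the abstract, the following Python sketch adds a neighbor-dependent bonus to a plain global (Needleman-Wunsch style) alignment score. It is not the NdPASA scoring function. The propensity table `TOY_PROPENSITY`, the function names, the match/mismatch scores standing in for BLOSUM62, the linear gap penalty, and the `weight` parameter are all assumptions made for this example, and the propensity values are invented.

```python
from typing import Dict, Tuple

# Hypothetical toy propensities: (residue pair, secondary-structure state) -> bonus.
# Real values would be derived from a structure database; these are made up.
TOY_PROPENSITY: Dict[Tuple[str, str], float] = {
    ("AL", "H"): 1.0,
    ("LA", "H"): 1.0,
    ("VI", "E"): 1.0,
    ("IV", "E"): 1.0,
}


def pair_bonus(query: str, i: int, template_ss: str, j: int,
               table: Dict[Tuple[str, str], float]) -> float:
    """Bonus for the query pair (i, i+1) landing on template positions (j, j+1).

    The bonus applies only when both template positions share one
    secondary-structure state (an assumption of this sketch).
    """
    if i + 1 >= len(query) or j + 1 >= len(template_ss):
        return 0.0
    if template_ss[j] != template_ss[j + 1]:
        return 0.0
    return table.get((query[i:i + 2], template_ss[j]), 0.0)


def align_score(query: str, template: str, template_ss: str,
                table: Dict[Tuple[str, str], float] = TOY_PROPENSITY,
                match: float = 2.0, mismatch: float = -1.0,
                gap: float = -2.0, weight: float = 1.0) -> float:
    """Global alignment score with an added neighbor-dependent term.

    template_ss holds one secondary-structure letter per template residue
    (for example H, E, C). With weight=0 this is ordinary global alignment
    with match/mismatch scores and a linear gap penalty.
    """
    n, m = len(query), len(template)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if query[i - 1] == template[j - 1] else mismatch
            bonus = weight * pair_bonus(query, i - 1, template_ss, j - 1, table)
            dp[i][j] = max(dp[i - 1][j - 1] + sub + bonus,  # align residues
                           dp[i - 1][j] + gap,              # gap in template
                           dp[i][j - 1] + gap)              # gap in query
    return dp[n][m]
```

With these toy values, `align_score("ALA", "ALA", "HHH")` returns 8.0, while the same call with `weight=0.0` or with a coil template (`"CCC"`) returns 6.0. The difference is the contribution of the residue-pair term when the template's secondary structure is known.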

Original language: English (US)
Pages (from-to): 628-637
Number of pages: 10
Journal: Proteins: Structure, Function and Genetics
Volume: 58
Issue number: 3
DOIs: 10.1002/prot.20359
State: Published - Feb 15 2005
Externally published: Yes

Fingerprint

Sequence Alignment
Amino Acids
Proteins
Databases
Computational Biology
Sequence Analysis
Biomedical Research
Bioinformatics
Conformations

Keywords

  • Propensity
  • Protein structures
  • Secondary structure
  • Sequence alignment
  • Sequence pattern

ASJC Scopus subject areas

  • Genetics
  • Structural Biology
  • Biochemistry

Cite this

@article{ae6844c0e2754825853194fd60f045e2,
title = "NdPASA: A novel pairwise protein sequence alignment algorithm that incorporates neighbor-dependent amino acid propensities",
abstract = "Sequence alignment has become one of the essential bioinformatics tools in biomedical research. Existing sequence alignment methods can produce reliable alignments for homologous proteins sharing a high percentage of sequence identity. The performance of these methods deteriorates sharply for the sequence pairs sharing less than 25{\%} sequence identity. We report here a new method, NdPASA, for pairwise sequence alignment. This method employs neighbor-dependent propensities of amino acids as a unique parameter for alignment. The values of neighbor-dependent propensity measure the preference of an amino acid pair adopting a particular secondary structure conformation. NdPASA optimizes alignment by evaluating the likelihood of a residue pair in the query sequence matching against a corresponding residue pair adopting a particular secondary structure in the template sequence. Using superpositions of homologous proteins derived from the PSI-BLAST analysis and the Structural Classification of Proteins (SCOP) classification of a nonredundant Protein Data Bank (PDB) database as a gold standard, we show that NdPASA has improved pairwise alignment. Statistical analyses of the performance of NdPASA indicate that the introduction of sequence patterns of secondary structure derived from neighbor-dependent sequence analysis clearly improves alignment performance for sequence pairs sharing less than 20{\%} sequence identity. For sequence pairs sharing 13-21{\%} sequence identity, NdPASA improves the accuracy of alignment over the conventional global alignment (GA) algorithm using the BLOSUM62 by an average of 8.6{\%}. NdPASA is most effective for aligning query sequences with template sequences whose structure is known.",
keywords = "Propensity, Protein structures, Secondary structure, Sequence alignment, Sequence pattern",
author = "Wang, Junwen and Feng, {Jin An}",
year = "2005",
month = "2",
day = "15",
doi = "10.1002/prot.20359",
language = "English (US)",
volume = "58",
pages = "628--637",
journal = "Proteins: Structure, Function and Bioinformatics",
issn = "0887-3585",
publisher = "Wiley-Liss Inc.",
number = "3",
}

TY - JOUR

T1 - NdPASA

T2 - A novel pairwise protein sequence alignment algorithm that incorporates neighbor-dependent amino acid propensities

AU - Wang, Junwen

AU - Feng, Jin An

PY - 2005/2/15

Y1 - 2005/2/15

N2 - Sequence alignment has become one of the essential bioinformatics tools in biomedical research. Existing sequence alignment methods can produce reliable alignments for homologous proteins sharing a high percentage of sequence identity. The performance of these methods deteriorates sharply for the sequence pairs sharing less than 25% sequence identity. We report here a new method, NdPASA, for pairwise sequence alignment. This method employs neighbor-dependent propensities of amino acids as a unique parameter for alignment. The values of neighbor-dependent propensity measure the preference of an amino acid pair adopting a particular secondary structure conformation. NdPASA optimizes alignment by evaluating the likelihood of a residue pair in the query sequence matching against a corresponding residue pair adopting a particular secondary structure in the template sequence. Using superpositions of homologous proteins derived from the PSI-BLAST analysis and the Structural Classification of Proteins (SCOP) classification of a nonredundant Protein Data Bank (PDB) database as a gold standard, we show that NdPASA has improved pairwise alignment. Statistical analyses of the performance of NdPASA indicate that the introduction of sequence patterns of secondary structure derived from neighbor-dependent sequence analysis clearly improves alignment performance for sequence pairs sharing less than 20% sequence identity. For sequence pairs sharing 13-21% sequence identity, NdPASA improves the accuracy of alignment over the conventional global alignment (GA) algorithm using the BLOSUM62 by an average of 8.6%. NdPASA is most effective for aligning query sequences with template sequences whose structure is known.

KW - Propensity

KW - Protein structures

KW - Secondary structure

KW - Sequence alignment

KW - Sequence pattern

UR - http://www.scopus.com/inward/record.url?scp=12944288129&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=12944288129&partnerID=8YFLogxK

U2 - 10.1002/prot.20359

DO - 10.1002/prot.20359

M3 - Article

C2 - 15616964

AN - SCOPUS:12944288129

VL - 58

SP - 628

EP - 637

JO - Proteins: Structure, Function and Bioinformatics

JF - Proteins: Structure, Function and Bioinformatics

SN - 0887-3585

IS - 3

ER -