Support vector machine-based mucin-type O-linked glycosylation site prediction using enhanced sequence feature encoding.

Manabu Torii, Hongfang Liu, Zhang Zhi Hu

Research output: Contribution to journal › Article

7 Scopus citations

Abstract

Glycosylation is a common and complex protein post-translational modification (PTM). In particular, mucin-type O-linked glycosylation is abundant and serves important biological functions. The number of experimentally determined glycosylation sites is still small, and there remains a need for accurate computational prediction to support annotation and functional understanding of proteins. PTM site prediction can be formulated as a machine learning task. An important step in applying machine learning to this task is encoding protein fragments as feature vectors. Here we assess existing encoding methods as well as an enhanced encoding method named composition of monomer spectrum (CMS) using support vector machines (SVMs). SVMs employing the existing encoding methods achieved an AUC (area under the ROC curve) of 90.3-91.3%, and those employing CMS achieved an AUC of 92.4%. Analysis of the different encoding methods suggests potential for further improving the prediction.
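
The encoding step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code and does not implement CMS, which the abstract names but does not define. It assumes two generic encodings that are common for this kind of task (a position-specific sparse encoding and an amino acid composition encoding), a hypothetical flank size of 3 residues, and a made-up example sequence. The `auc` helper shows how an area under the ROC curve could be computed from classifier scores; the scores here are placeholders, not SVM output.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def extract_window(sequence, site, flank=3, pad="-"):
    """Return the fragment of length 2*flank+1 centred on index `site`.

    Positions that fall outside the sequence are filled with `pad`.
    The flank size is an assumption chosen for illustration.
    """
    padded = pad * flank + sequence + pad * flank
    return padded[site: site + 2 * flank + 1]


def sparse_encode(fragment):
    """Position-specific encoding: 20 binary features per fragment position."""
    vector = []
    for residue in fragment:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def composition_encode(fragment):
    """Position-independent encoding: fraction of each amino acid in the fragment."""
    counts = Counter(r for r in fragment if r in AMINO_ACIDS)
    total = sum(counts.values()) or 1  # avoid division by zero on all-pad input
    return [counts[aa] / total for aa in AMINO_ACIDS]


def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    seq = "MPSTTPAPGS"  # hypothetical sequence, not taken from the paper
    fragment = extract_window(seq, site=3)  # window around the T at index 3
    print(fragment, len(sparse_encode(fragment)), len(composition_encode(fragment)))
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # placeholder scores
```

In a full pipeline, vectors like these would be passed to an SVM implementation, and the resulting decision values would be the scores given to `auc`.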

Original language: English (US)
Pages (from-to): 640-644
Number of pages: 5
Journal: AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
Volume: 2009
State: Published - 2009

ASJC Scopus subject areas

  • Medicine(all)
