Glycosylation is a common and complex protein post-translational modification (PTM). In particular, mucin-type O-linked glycosylation is abundant and plays important biological functions. The number of determined glycosylation sites is still small and there remains the need of accurate computational prediction for annotation and functional understanding of proteins. PTM site prediction can be formulated as a machine learning task. An important step in applying machine learning to this task is encoding protein fragments as feature vectors. Here we assess existing encoding methods as well as an enhanced encoding method named composition of monomer spectrum (CMS) using support vector machines (SVMs). SVMs employing the existing encoding methods achieved AUC (area under ROC curve) of 90.3-91.3%, and ones employing CMS achieved AUC of 92.4%. Analysis of different encoding methods suggests the potential in further improving the prediction.
|Original language||English (US)|
|Number of pages||5|
|Journal||AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium|
|State||Published - 2009|
ASJC Scopus subject areas