Overview of the BioCreative VI Precision Medicine Track

Mining protein interactions and mutations for precision medicine

Rezarta Islamaj Doğan, Sun Kim, Andrew Chatr-Aryamontri, Chih Hsuan Wei, Donald C. Comeau, Rui Antunes, Sérgio Matos, Qingyu Chen, Aparna Elangovan, Nagesh C. Panyam, Karin Verspoor, Hongfang D Liu, Yanshan Wang, Zhuang Liu, Berna Altınel, Zehra Melce Hüsünbeyi, Arzucan Özgür, Aris Fergadis, Chen Kai Wang, Hong Jie Dai, Tung Tran, Ramakanth Kavuluru, Ling Luo, Albert Steppi, Jinfeng Zhang, Jinchan Qu, Zhiyong Lu

Research output: Contribution to journal (Review article)

Abstract

The Precision Medicine Initiative is a multicenter effort that aims to formulate personalized treatments by leveraging individual patient data (clinical, genome sequence and functional genomic data) together with the information in large knowledge bases (KBs) that integrate genome annotation, disease association studies, electronic health records and other data types. The biomedical literature provides a rich foundation for populating these KBs, reporting genetic and molecular interactions that provide the scaffold for the cellular regulatory systems and detailing the influence of genetic variants on these interactions. The goal of the BioCreative VI Precision Medicine Track was to extract this particular type of information. The track was organized into two tasks: (i) a document triage task, focused on identifying scientific literature containing experimentally verified protein-protein interactions (PPIs) affected by genetic mutations, and (ii) a relation extraction task, focused on extracting the affected interactions (protein pairs). To assist system developers and task participants, a large-scale corpus of PubMed documents was manually annotated for this task. Ten teams worldwide contributed 22 distinct text-mining models for the document triage task, and six teams worldwide contributed 14 different text-mining systems for the relation extraction task. When the text-mining system predictions were compared with human annotations for the triage task, the best F-score was 69.06%, the best precision was 62.89%, the best recall was 98.0% and the best average precision was 72.5%. For the relation extraction task, when taking homologous genes into account, the best F-score was 37.73%, the best precision was 46.5% and the best recall was 54.1%. Submitted systems explored a wide range of methods, from traditional rule-based, statistical and machine learning systems to state-of-the-art deep learning methods.
Given the level of participation and the individual team results, we find the precision medicine track to be successful in engaging the text-mining research community. At the same time, the track produced a manually annotated corpus of 5509 PubMed documents, developed by BioGRID curators and relevant for precision medicine. The data set is freely available to the community, and the specific interactions have been integrated into the BioGRID data set. In addition, this challenge provided the first results of automatically identifying PubMed articles that describe PPIs affected by mutations, as well as extracting the affected relations from those articles. Still, much progress is needed for computer-assisted precision medicine text mining to become mainstream. Future work should focus on addressing the remaining technical challenges and incorporating the practical benefits of text-mining tools into real-world precision medicine information-related curation.
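
The abstract reports F-score, precision, recall and average precision without restating how they are computed. The following is a minimal sketch assuming the standard definitions: the F-score is taken to be the balanced F1, and average precision is the mean of the precision values at each rank where a relevant document appears. The function names and the toy labels are hypothetical and are not taken from the track's official evaluation scripts. Note also that each "best" value above is reported per metric, so the values need not come from a single submission or satisfy the F1 identity together.

```python
from typing import Sequence, Tuple


def precision_recall_f1(gold: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float, float]:
    """Binary precision, recall and balanced F-score (F1) for 0/1 labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # relevant and selected
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # selected but not relevant
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # relevant but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_precision(gold_in_rank_order: Sequence[int]) -> float:
    """Average precision for a full ranking.

    `gold_in_rank_order` holds the gold 0/1 labels of the documents,
    ordered from the system's most confident prediction to its least.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(gold_in_rank_order, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


# Hypothetical toy example: six documents, three of them relevant.
gold = [1, 0, 1, 1, 0, 0]
predicted = [1, 1, 1, 0, 0, 0]
p, r, f = precision_recall_f1(gold, predicted)
ap = average_precision(gold)  # treats the list order as the system ranking
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f} AP={ap:.3f}")
```

In a triage setting, precision, recall and F1 score the binary relevant or not-relevant decisions, while average precision scores the order in which a system ranks the documents, which is why the two kinds of numbers are reported separately.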

Original language: English (US)
Journal: Database
Volume: 2019
DOI: https://doi.org/10.1093/database/bay147
State: Published - Jan 28 2019

Fingerprint

Precision Medicine
Medicine
medicine
Data Mining
Proteins
mutation
Mutation
Triage
PubMed
Knowledge Bases
proteins
Genes
Learning systems
protein-protein interactions
Genome
Literature
Molecular interactions
Electronic Health Records
Scaffolds
Molecular Biology

ASJC Scopus subject areas

  • Information Systems
  • Biochemistry, Genetics and Molecular Biology (all)
  • Agricultural and Biological Sciences (all)

Cite this

Islamaj Doğan, R., Kim, S., Chatr-Aryamontri, A., Wei, C. H., Comeau, D. C., Antunes, R., ... Lu, Z. (2019). Overview of the BioCreative VI Precision Medicine Track: Mining protein interactions and mutations for precision medicine. Database, 2019. https://doi.org/10.1093/database/bay147

@article{f5a4394fb7cf46998393de29ef6a9720,
title = "Overview of the BioCreative VI Precision Medicine Track: Mining protein interactions and mutations for precision medicine",
author = "{Islamaj Do{\u{g}}an}, Rezarta and Sun Kim and Andrew Chatr-Aryamontri and Wei, {Chih Hsuan} and Comeau, {Donald C.} and Rui Antunes and S{\'e}rgio Matos and Qingyu Chen and Aparna Elangovan and Panyam, {Nagesh C.} and Karin Verspoor and Liu, {Hongfang D} and Yanshan Wang and Zhuang Liu and Berna Alt{\i}nel and H{\"u}s{\"u}nbeyi, {Zehra Melce} and Arzucan {\"O}zg{\"u}r and Aris Fergadis and Wang, {Chen Kai} and Dai, {Hong Jie} and Tung Tran and Ramakanth Kavuluru and Ling Luo and Albert Steppi and Jinfeng Zhang and Jinchan Qu and Zhiyong Lu",
year = "2019",
month = "1",
day = "28",
doi = "10.1093/database/bay147",
language = "English (US)",
volume = "2019",
journal = "Database : the journal of biological databases and curation",
issn = "1758-0463",
publisher = "Oxford University Press",

}

TY - JOUR

T1 - Overview of the BioCreative VI Precision Medicine Track

T2 - Mining protein interactions and mutations for precision medicine

AU - Islamaj Doğan, Rezarta

AU - Kim, Sun

AU - Chatr-Aryamontri, Andrew

AU - Wei, Chih Hsuan

AU - Comeau, Donald C.

AU - Antunes, Rui

AU - Matos, Sérgio

AU - Chen, Qingyu

AU - Elangovan, Aparna

AU - Panyam, Nagesh C.

AU - Verspoor, Karin

AU - Liu, Hongfang D

AU - Wang, Yanshan

AU - Liu, Zhuang

AU - Altınel, Berna

AU - Hüsünbeyi, Zehra Melce

AU - Özgür, Arzucan

AU - Fergadis, Aris

AU - Wang, Chen Kai

AU - Dai, Hong Jie

AU - Tran, Tung

AU - Kavuluru, Ramakanth

AU - Luo, Ling

AU - Steppi, Albert

AU - Zhang, Jinfeng

AU - Qu, Jinchan

AU - Lu, Zhiyong

PY - 2019/1/28

Y1 - 2019/1/28

UR - http://www.scopus.com/inward/record.url?scp=85060617421&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85060617421&partnerID=8YFLogxK

U2 - 10.1093/database/bay147

DO - 10.1093/database/bay147

M3 - Review article

VL - 2019

JO - Database : the journal of biological databases and curation

JF - Database : the journal of biological databases and curation

SN - 1758-0463

ER -