Identification of Genetic Causality Statements in Medline Abstracts Leveraging Distant Supervision

Liwei Wang, Majid Rastegar-Mojarad, Ravikumar Komandur Elayavilli, Yanshan Wang, Hongfang Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In the era of precision medicine, the clinical utility of next-generation sequencing technology depends heavily on the ability to interpret the causal association between genetic variants and phenotypes, which can be a labor-intensive process. Various resources, such as HGMD and ClinVar, catalog such associations. Given the exponential growth of the literature in the field, it is desirable to accelerate the process by automatically identifying genetic causality statements in the literature. Here, we define the task of identifying these statements as a classification task over sentences containing gene and disease entities. We used the cancer gene census available at the Catalogue of Somatic Mutations in Cancer (COSMIC) and to generate a weakly labeled data set for our classification task. We evaluated multiple feature sets (words, bi-grams, and word embeddings) and several machine-learning methods, and obtained a weighted F-measure of around 95%. Evaluation using the top 50 genetic variant disease sentences demonstrated that the proposed method can identify genetic causality statements.
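
As a rough illustration of the weak-labeling idea described in the abstract, the following minimal Python sketch labels a sentence by looking up its gene and disease pair in a curated list, then extracts word and bi-gram counts as features. The pair list, the function names `weak_label` and `ngram_features`, and the whitespace tokenization are assumptions made for illustration; they are not taken from the paper or from COSMIC.

```python
from collections import Counter

# Hypothetical curated gene-disease pairs, standing in for a resource
# such as a cancer gene census. Illustrative values only.
KNOWN_PAIRS = {("BRAF", "melanoma"), ("KRAS", "colorectal cancer")}


def weak_label(sentence, gene, disease, known_pairs=KNOWN_PAIRS):
    """Return 1 if the co-mentioned (gene, disease) pair is in the curated
    list, else 0. The label is 'weak' because the sentence itself is never
    checked for an explicit causal statement."""
    text = sentence.lower()
    if gene.lower() not in text or disease.lower() not in text:
        raise ValueError("sentence must mention both entities")
    return 1 if (gene, disease) in known_pairs else 0


def ngram_features(sentence):
    """Count lowercase word unigrams and bi-grams (whitespace tokenization)."""
    tokens = sentence.lower().split()
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return Counter(tokens + bigrams)
```

The resulting labels and feature counts could then be passed to any standard classifier; which classifiers the authors used is not specified beyond "several machine-learning methods".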

Original language: English (US)
Title of host publication: Proceedings - 2018 IEEE International Conference on Healthcare Informatics Workshops, ICHI-W 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-8
Number of pages: 8
ISBN (Electronic): 9781538667774
State: Published - Jul 16 2018
Event: 6th IEEE International Conference on Healthcare Informatics Workshops, ICHI-W 2018 - New York, United States
Duration: Jun 4 2018 to Jun 7 2018

Publication series

Name: Proceedings - 2018 IEEE International Conference on Healthcare Informatics Workshops, ICHI-W 2018

Other

Other: 6th IEEE International Conference on Healthcare Informatics Workshops, ICHI-W 2018
Country/Territory: United States
City: New York
Period: 6/4/18 to 6/7/18

Keywords

  • ClinVar
  • MutD
  • Semantic Medline
  • cancer
  • causality
  • classification
  • disease
  • distant supervision
  • genetic variant

ASJC Scopus subject areas

  • Information Systems and Management
  • Health Informatics
