Identification of Genetic Causality Statements in Medline Abstracts Leveraging Distant Supervision

Liwei Wang; Majid Rastegar-Mojarad; Ravikumar Komandur Elayavilli; Yanshan Wang; Hongfang Liu

doi:10.1109/ICHI-W.2018.00008

Digital Health Sciences

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In the era of precision medicine, the clinical utility of next-generation sequencing technology depends heavily on the ability to interpret causal associations between genetic variants and phenotypes, which can be a labor-intensive process. Various resources, such as HGMD and ClinVar, catalog such associations. Given the exponential growth of the literature in the field, it is desirable to accelerate the process by automatically identifying genetic causality statements in the literature. Here, we define the identification of these statements as a classification task over sentences containing gene and disease entities. We used the cancer gene census available at the Catalogue of Somatic Mutations in Cancer (COSMIC) to generate a weakly labeled data set for our classification task. We evaluated multiple feature sets (words, bi-grams, and word embeddings) and several machine-learning methods, obtaining a weighted F-measure of around 95%. Evaluation using the top 50 genetic variant-disease sentences demonstrated that the proposed method can identify genetic causality statements.
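The pipeline the abstract outlines (weak labels from a curated resource, n-gram features, weighted F-measure) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the whitespace tokenizer, and the contents of KNOWN_PAIRS are assumptions made for illustration, and the pairs are hand-picked examples rather than data drawn from COSMIC.

```python
from collections import Counter

# Hypothetical stand-in for a curated gene-disease resource such as the
# COSMIC cancer gene census. These pairs are illustrative assumptions only.
KNOWN_PAIRS = {("BRCA1", "breast cancer"), ("TP53", "li-fraumeni syndrome")}


def weak_label(gene, disease, known_pairs=KNOWN_PAIRS):
    """Distant supervision: label 1 if the pair is in the curated set, else 0."""
    return 1 if (gene.upper(), disease.lower()) in known_pairs else 0


def ngram_features(sentence):
    """Count lowercase word unigrams and bigrams (naive whitespace tokens)."""
    tokens = sentence.lower().split()
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return Counter(tokens + bigrams)


def weighted_f_measure(gold, predicted):
    """Average per-class F1, weighting each class by its gold support."""
    total = 0.0
    for cls in set(gold):
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == cls and p == cls)
        fp = sum(1 for g, p in pairs if g != cls and p == cls)
        fn = sum(1 for g, p in pairs if g == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * (tp + fn)  # (tp + fn) is the class support in gold
    return total / len(gold) if gold else 0.0
```

A sentence mentioning a gene and a disease would receive its weak label from weak_label and its feature vector from ngram_features. Any classifier trained on those features can then be scored with weighted_f_measure, which weights each class by its frequency so that an imbalanced label distribution does not hide poor performance on the smaller class.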

Original language	English (US)
Title of host publication	Proceedings - 2018 IEEE International Conference on Healthcare Informatics Workshops, ICHI-W 2018
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	1-8
Number of pages	8
ISBN (Electronic)	9781538667774
DOIs	https://doi.org/10.1109/ICHI-W.2018.00008
State	Published - Jul 16 2018
Event	6th IEEE International Conference on Healthcare Informatics Workshops, ICHI-W 2018 - New York, United States Duration: Jun 4 2018 → Jun 7 2018

Publication series

Name	Proceedings - 2018 IEEE International Conference on Healthcare Informatics Workshops, ICHI-W 2018

Other

Other	6th IEEE International Conference on Healthcare Informatics Workshops, ICHI-W 2018
Country/Territory	United States
City	New York
Period	6/4/18 → 6/7/18

Keywords

ClinVar
MutD
Semantic Medline
cancer
causality
classification
disease
distant supervision
genetic variant

ASJC Scopus subject areas

Information Systems and Management
Health Informatics

Access to Document

10.1109/ICHI-W.2018.00008

Cite this

Wang, L., Rastegar-Mojarad, M., Komandur Elayavilli, R., Wang, Y., & Liu, H. (2018). Identification of Genetic Causality Statements in Medline Abstracts Leveraging Distant Supervision. In Proceedings - 2018 IEEE International Conference on Healthcare Informatics Workshops, ICHI-W 2018 (pp. 1-8). Article 8411674 (Proceedings - 2018 IEEE International Conference on Healthcare Informatics Workshops, ICHI-W 2018). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICHI-W.2018.00008

@inproceedings{5b72e4c477524bf883c83df9a08bdf13,

title = "Identification of Genetic Causality Statements in Medline Abstracts Leveraging Distant Supervision",

keywords = "ClinVar, MutD, Semantic Medline, cancer, causality, classification, disease, distant supervision, genetic variant",

author = "Liwei Wang and Majid Rastegar-Mojarad and {Komandur Elayavilli}, Ravikumar and Yanshan Wang and Hongfang Liu",

note = "Publisher Copyright: {\textcopyright} 2018 IEEE.; 6th IEEE International Conference on Healthcare Informatics Workshops, ICHI-W 2018 ; Conference date: 04-06-2018 Through 07-06-2018",

year = "2018",

month = jul,

day = "16",

doi = "10.1109/ICHI-W.2018.00008",

language = "English (US)",

series = "Proceedings - 2018 IEEE International Conference on Healthcare Informatics Workshops, ICHI-W 2018",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "1--8",

booktitle = "Proceedings - 2018 IEEE International Conference on Healthcare Informatics Workshops, ICHI-W 2018",

}

TY - GEN

T1 - Identification of Genetic Causality Statements in Medline Abstracts Leveraging Distant Supervision

AU - Wang, Liwei

AU - Rastegar-Mojarad, Majid

AU - Komandur Elayavilli, Ravikumar

AU - Wang, Yanshan

AU - Liu, Hongfang

PY - 2018/7/16

Y1 - 2018/7/16

KW - ClinVar

KW - MutD

KW - Semantic Medline

KW - cancer

KW - causality

KW - classification

KW - disease

KW - distant supervision

KW - genetic variant

UR - http://www.scopus.com/inward/record.url?scp=85051030840&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85051030840&partnerID=8YFLogxK

U2 - 10.1109/ICHI-W.2018.00008

DO - 10.1109/ICHI-W.2018.00008

M3 - Conference contribution

AN - SCOPUS:85051030840

T3 - Proceedings - 2018 IEEE International Conference on Healthcare Informatics Workshops, ICHI-W 2018

SP - 1

EP - 8

BT - Proceedings - 2018 IEEE International Conference on Healthcare Informatics Workshops, ICHI-W 2018

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 6th IEEE International Conference on Healthcare Informatics Workshops, ICHI-W 2018

Y2 - 4 June 2018 through 7 June 2018

ER -
