Using an artificial neural network to map cancer common data elements to the biomedical research integrated domain group model in a semi-automated manner

Robinette Renner; Shengyu Li; Yulong Huang; Ada Chaeli Van Der Zijp-Tan; Shaobo Tan; Dongqi Li; Mohan Vamsi Kasukurthi; Ryan Benton; Glen M. Borchert; Jingshan Huang; Guoqian Jiang

doi:10.1186/s12911-019-0979-5

Artificial Intelligence and Informatics

Research output: Contribution to journal › Article › peer-review

Abstract

Background: The medical community uses a variety of data standards for both clinical and research reporting needs. ISO 11179 Common Data Elements (CDEs) represent one such standard that provides robust data point definitions. Another standard is the Biomedical Research Integrated Domain Group (BRIDG) model, a domain analysis model that provides a contextual framework for biomedical and clinical research data. Mapping the CDEs to the BRIDG model is important; in particular, it can facilitate mapping the CDEs to other standards. Unfortunately, manual mapping, the current method for creating CDE mappings, is error-prone and time-consuming, which creates a significant barrier for researchers who use CDEs.

Methods: In this work, we developed a semi-automated algorithm to map CDEs to likely BRIDG classes. First, we extended and improved our previously developed artificial neural network (ANN) alignment algorithm. We then used a collection of 1284 CDEs with robust mappings to BRIDG classes as the gold standard to train the network and obtain appropriate weights for six CDE attributes. Afterward, we calculated the similarity between a CDE and each BRIDG class. Finally, the algorithm produces a list of candidate BRIDG classes to which the CDE of interest may belong.

Results: For CDEs semantically similar to those used in training, a match rate of over 90% was achieved. For those partially similar, a match rate of 80% was obtained, and for those with drastically different semantics, a match rate of up to 70% was achieved.

Discussion: Our semi-automated mapping process reduces the burden on domain experts. The weights of all six attributes are significant. Experimental results indicate that the availability of training data is more important than the semantic similarity of the testing data to the training data. We address overfitting by selecting CDEs randomly and adjusting the ratio of training to verification samples.

Conclusions: Experimental results on real-world use cases demonstrate the effectiveness and efficiency of our proposed methodology in mapping CDEs to BRIDG classes, both for CDEs seen before and for new, unseen CDEs. In addition, it reduces the mapping burden and improves the mapping quality.
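
The scoring and ranking steps described under Methods (weight six CDE attributes, compute a similarity to each BRIDG class, return a candidate list) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the six attributes or the similarity measure, so the attribute names, the token-overlap (Jaccard) similarity, the uniform weights, and the toy class descriptions below are all assumptions made for illustration. In the paper the weights are learned by the ANN from the 1284 gold-standard mappings; here they are simply fixed.

```python
from typing import Dict, List, Tuple

# Hypothetical attribute names. The abstract states that six CDE attributes
# are weighted but does not list them.
ATTRIBUTES = ["name", "definition", "object_class",
              "property", "value_domain", "context"]


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an assumed stand-in for the paper's measure."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def score(cde: Dict[str, str], bridg_class: Dict[str, str],
          weights: Dict[str, float]) -> float:
    """Weighted sum of per-attribute similarities between a CDE and a class."""
    return sum(
        weights[attr] * jaccard(cde.get(attr, ""), bridg_class.get(attr, ""))
        for attr in ATTRIBUTES
    )


def rank_candidates(cde: Dict[str, str],
                    bridg_classes: Dict[str, Dict[str, str]],
                    weights: Dict[str, float],
                    top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k classes by descending score, for expert review."""
    scored = [(name, score(cde, desc, weights))
              for name, desc in bridg_classes.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Placeholder weights; in the paper these come from ANN training.
uniform_weights = {attr: 1.0 / len(ATTRIBUTES) for attr in ATTRIBUTES}

# Toy data, not taken from caDSR or the BRIDG model.
example_cde = {"name": "patient birth date",
               "definition": "date of birth of the patient"}
toy_classes = {
    "ToyPersonClass": {"name": "person birth date",
                       "definition": "date a person was born"},
    "ToyObservationClass": {"name": "lab result value",
                            "definition": "result of a laboratory observation"},
}

for class_name, value in rank_candidates(example_cde, toy_classes, uniform_weights):
    print(f"{class_name}: {value:.3f}")
```

Returning a ranked list rather than a single answer mirrors the semi-automated design described in the abstract: the algorithm narrows the options and a domain expert makes the final choice.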

Original language	English (US)
Article number	276
Journal	BMC Medical Informatics and Decision Making
Volume	19
DOIs	https://doi.org/10.1186/s12911-019-0979-5
State	Published - Dec 23 2019

Keywords

Artificial neural network
Biomedical research integrated domain group (BRIDG) model
Common data element
Schema mapping

ASJC Scopus subject areas

Health Policy
Health Informatics
Computer Science Applications

Access to Document

10.1186/s12911-019-0979-5

Cite this

Renner, R., Li, S., Huang, Y., Van Der Zijp-Tan, A. C., Tan, S., Li, D., Kasukurthi, M. V., Benton, R., Borchert, G. M., Huang, J., & Jiang, G. (2019). Using an artificial neural network to map cancer common data elements to the biomedical research integrated domain group model in a semi-automated manner. BMC Medical Informatics and Decision Making, 19, Article 276. https://doi.org/10.1186/s12911-019-0979-5

Renner, R, Li, S, Huang, Y, Van Der Zijp-Tan, AC, Tan, S, Li, D, Kasukurthi, MV, Benton, R, Borchert, GM, Huang, J & Jiang, G 2019, 'Using an artificial neural network to map cancer common data elements to the biomedical research integrated domain group model in a semi-automated manner', BMC Medical Informatics and Decision Making, vol. 19, 276. https://doi.org/10.1186/s12911-019-0979-5

@article{38e60bd02d3a4953801e126d65e791a3,

title = "Using an artificial neural network to map cancer common data elements to the biomedical research integrated domain group model in a semi-automated manner",

keywords = "Artificial neural network, Biomedical research integrated domain group (BRIDG) model, Common data element, Schema mapping",

author = "Robinette Renner and Shengyu Li and Yulong Huang and {Van Der Zijp-Tan}, {Ada Chaeli} and Shaobo Tan and Dongqi Li and Kasukurthi, {Mohan Vamsi} and Ryan Benton and Borchert, {Glen M.} and Jingshan Huang and Guoqian Jiang",

note = "Publisher Copyright: {\textcopyright} 2019 The Author(s).",

year = "2019",

month = dec,

day = "23",

doi = "10.1186/s12911-019-0979-5",

language = "English (US)",

volume = "19",

journal = "BMC Medical Informatics and Decision Making",

issn = "1472-6947",

publisher = "BioMed Central",

}

TY - JOUR

T1 - Using an artificial neural network to map cancer common data elements to the biomedical research integrated domain group model in a semi-automated manner

AU - Renner, Robinette

AU - Li, Shengyu

AU - Huang, Yulong

AU - Van Der Zijp-Tan, Ada Chaeli

AU - Tan, Shaobo

AU - Li, Dongqi

AU - Kasukurthi, Mohan Vamsi

AU - Benton, Ryan

AU - Borchert, Glen M.

AU - Huang, Jingshan

AU - Jiang, Guoqian

PY - 2019/12/23

Y1 - 2019/12/23

KW - Artificial neural network

KW - Biomedical research integrated domain group (BRIDG) model

KW - Common data element

KW - Schema mapping

UR - http://www.scopus.com/inward/record.url?scp=85077170263&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85077170263&partnerID=8YFLogxK

U2 - 10.1186/s12911-019-0979-5

DO - 10.1186/s12911-019-0979-5

M3 - Article

C2 - 31865899

AN - SCOPUS:85077170263

SN - 1472-6947

VL - 19

JO - BMC Medical Informatics and Decision Making

JF - BMC Medical Informatics and Decision Making

M1 - 276

ER -
