TY - JOUR
T1 - Using an artificial neural network to map cancer common data elements to the biomedical research integrated domain group model in a semi-automated manner
AU - Renner, Robinette
AU - Li, Shengyu
AU - Huang, Yulong
AU - Van Der Zijp-Tan, Ada Chaeli
AU - Tan, Shaobo
AU - Li, Dongqi
AU - Kasukurthi, Mohan Vamsi
AU - Benton, Ryan
AU - Borchert, Glen M.
AU - Huang, Jingshan
AU - Jiang, Guoqian
N1 - Funding Information:
The publication cost of this article was funded by the National Cancer Institute (NCI) of the National Institutes of Health (NIH) under Award Numbers U01CA180982 and U01CA180940. The views contained in this paper are solely the responsibility of the authors and do not represent the official views, either expressed or implied, of the NIH or the U.S. government.
Abbreviations:
AGNIS: A Growable Network Information System; ANN: Artificial Neural Network; BRIDG: Biomedical Research Integrated Domain Group; caDSR: Cancer Data Standards Registry and Repository; CBER: Center for Biologics Evaluation and Research; CDASH: Clinical Data Acquisition Standards Harmonization; CDE: Common Data Element; CDER: Center for Drug Evaluation and Research; CDISC: Clinical Data Interchange Standards Consortium; CIBMTR: Center for International Blood and Marrow Transplant Research; DEC: Data Element Concept; FDA: Food and Drug Administration; NCI: National Cancer Institute; NMDP: National Marrow Donor Program; OAANN: Ontology Alignment by Artificial Neural Network; SDTM: Study Data Tabulation Model; UML: Unified Modeling Language; VD: Value Domain
Publisher Copyright:
© 2019 The Author(s).
PY - 2019/12/23
Y1 - 2019/12/23
N2 - Background: The medical community uses a variety of data standards for both clinical and research reporting needs. ISO 11179 Common Data Elements (CDEs) represent one such standard that provides robust data point definitions. Another standard is the Biomedical Research Integrated Domain Group (BRIDG) model, a domain analysis model that provides a contextual framework for biomedical and clinical research data. Mapping the CDEs to the BRIDG model is important; in particular, it can facilitate mapping the CDEs to other standards. Unfortunately, manual mapping, the current method for creating CDE mappings, is error-prone and time-consuming, which creates a significant barrier for researchers who use CDEs. Methods: In this work, we developed a semi-automated algorithm to map CDEs to likely BRIDG classes. First, we extended and improved our previously developed artificial neural network (ANN) alignment algorithm. We then used a collection of 1284 CDEs with robust mappings to BRIDG classes as the gold standard to train the network and obtain appropriate weights for six CDE attributes. Afterward, we calculated the similarity between a CDE and each BRIDG class. Finally, the algorithm produced a list of candidate BRIDG classes to which the CDE of interest may belong. Results: For CDEs semantically similar to those used in training, a match rate of over 90% was achieved. For those partially similar, a match rate of 80% was obtained, and for those with drastically different semantics, a match rate of up to 70% was achieved. Discussion: Our semi-automated mapping process reduces the burden on domain experts. The weights of all six attributes are significant. Experimental results indicate that the availability of training data is more important than the semantic similarity of the testing data to the training data. We address the overfitting problem by selecting CDEs randomly and adjusting the ratio of training and verification samples. Conclusions: Experimental results on real-world use cases demonstrate the effectiveness and efficiency of our proposed methodology in mapping CDEs to BRIDG classes, both for CDEs seen before and for new, unseen CDEs. In addition, it reduces the mapping burden and improves the mapping quality.
KW - Artificial neural network
KW - Biomedical research integrated domain group (BRIDG) model
KW - Common data element
KW - Schema mapping
UR - http://www.scopus.com/inward/record.url?scp=85077170263&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85077170263&partnerID=8YFLogxK
U2 - 10.1186/s12911-019-0979-5
DO - 10.1186/s12911-019-0979-5
M3 - Article
C2 - 31865899
AN - SCOPUS:85077170263
SN - 1472-6947
VL - 19
JO - BMC Medical Informatics and Decision Making
JF - BMC Medical Informatics and Decision Making
M1 - 276
ER -