Considerations for automated machine learning in clinical metabolic profiling

Altered homocysteine plasma concentration associated with metformin exposure

Alena Orlenko, Jason H. Moore, Patryk Orzechowski, Randal S. Olson, Junmei Cairns, Pedro Caraballo, Richard M Weinshilboum, Liewei M Wang, Matthew K. Breitenstein

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

1 Citation (Scopus)

Abstract

With the maturation of metabolomics science and the proliferation of biobanks, clinical metabolic profiling is an increasingly opportunistic frontier for advancing translational clinical research. Automated Machine Learning (AutoML) approaches provide an exciting opportunity to guide feature selection in agnostic metabolic profiling endeavors, where potentially thousands of independent data points must be evaluated. In previous research, AutoML using high-dimensional data of varying types has been demonstrably robust, outperforming traditional approaches. However, considerations for application in clinical metabolic profiling remain to be evaluated, particularly regarding the robustness of AutoML in identifying and adjusting for common clinical confounders. In this study, we present a focused case study regarding AutoML considerations for using the Tree-Based Pipeline Optimization Tool (TPOT) in metabolic profiling of exposure to metformin in a biobank cohort. First, we propose a tandem rank-accuracy measure to guide agnostic feature selection and corresponding threshold determination in clinical metabolic profiling endeavors. Second, while AutoML, using default parameters, demonstrated a potential lack of sensitivity to low-effect confounding clinical covariates, we demonstrated residual training and adjustment of metabolite features as an easily applicable approach to ensure AutoML adjustment for potential confounding characteristics. Finally, we present increased homocysteine with long-term exposure to metformin as a potentially novel, non-replicated metabolite association suggested by TPOT, an association not identified in parallel clinical metabolic profiling endeavors. While this association warrants independent replication, our tandem rank-accuracy measure suggests homocysteine to be the metabolite feature with the largest effect and a corresponding priority for further translational clinical research. Residual training and adjustment for a potential confounding effect by BMI only slightly modified the suggested association. Increased homocysteine is thought to be associated with vitamin B12 deficiency; evaluation for potential clinical relevance is suggested. While considerations for clinical metabolic profiling are recommended, including adjustment approaches for clinical confounders, AutoML presents an exciting tool to enhance clinical metabolic profiling and advance translational research endeavors.
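
The residual adjustment of metabolite features mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact procedure, so what follows is a generic residualization under stated assumptions: each metabolite feature is regressed on the confounder (here BMI) by ordinary least squares, and the residuals replace the original feature values before any AutoML step. The function name `residualize`, the simulated data, and the use of NumPy are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def residualize(features, confounders):
    """Remove the linear effect of confounders from each feature column.

    Hypothetical helper: fits an ordinary least-squares model
    (intercept + confounders) per feature and returns the residuals.
    """
    X = np.asarray(features, dtype=float)
    C = np.asarray(confounders, dtype=float)
    if C.ndim == 1:
        C = C[:, None]
    # Design matrix: a column of ones (intercept) followed by the confounders.
    D = np.column_stack([np.ones(len(C)), C])
    # Least-squares coefficients for all feature columns at once.
    coef, *_ = np.linalg.lstsq(D, X, rcond=None)
    # Residuals are the part of each feature not explained by the confounders.
    return X - D @ coef


# Simulated data (an assumption, not study data): 200 subjects, 2 metabolites.
rng = np.random.default_rng(0)
n = 200
bmi = rng.normal(28.0, 4.0, size=n)
metabolites = np.column_stack([
    0.5 * bmi + rng.normal(size=n),  # metabolite partly driven by BMI
    rng.normal(size=n),              # metabolite unrelated to BMI
])

adjusted = residualize(metabolites, bmi)
```

The adjusted matrix could then be passed to an AutoML tool in place of the raw features. In a real analysis, the adjustment model would typically be fitted on training data only and then applied to held-out data, so that the adjustment itself does not leak information across the split.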

Original language: English (US)
Title of host publication: PACIFIC SYMPOSIUM ON BIOCOMPUTING 2018
Publisher: World Scientific Publishing Co. Pte Ltd
Pages: 460-471
Number of pages: 12
Edition: 212669
ISBN (Print): 9789813235533
DOIs: https://doi.org/10.1142/9789813235533_0042
State: Published - Jan 1 2018
Event: 23rd Pacific Symposium on Biocomputing, PSB 2018 - Kohala Coast, United States
Duration: Jan 3 2018 to Jan 7 2018

Other

Other: 23rd Pacific Symposium on Biocomputing, PSB 2018
Country: United States
City: Kohala Coast
Period: 1/3/18 to 1/7/18

Fingerprint

Learning systems
Plasmas
Metabolites
Feature extraction
Vitamins

Keywords

  • Automated machine learning
  • Biobank
  • Clinical metabolic profiling
  • Confounding
  • Homocysteine
  • Metabolomics
  • Metformin
  • Pharmacometabolomics
  • Precision medicine

ASJC Scopus subject areas

  • Biomedical Engineering
  • Computational Theory and Mathematics

Cite this

Orlenko, A., Moore, J. H., Orzechowski, P., Olson, R. S., Cairns, J., Caraballo, P., ... Breitenstein, M. K. (2018). Considerations for automated machine learning in clinical metabolic profiling: Altered homocysteine plasma concentration associated with metformin exposure. In PACIFIC SYMPOSIUM ON BIOCOMPUTING 2018 (212669 ed., pp. 460-471). World Scientific Publishing Co. Pte Ltd. https://doi.org/10.1142/9789813235533_0042

@inproceedings{6507cd039002423bb49d80dcc3f6137d,
title = "Considerations for automated machine learning in clinical metabolic profiling: Altered homocysteine plasma concentration associated with metformin exposure",
keywords = "Automated machine learning, Biobank, Clinical metabolic profiling, Confounding, Homocysteine, Metabolomics, Metformin, Pharmacometabolomics, Precision medicine",
author = "Alena Orlenko and Moore, {Jason H.} and Patryk Orzechowski and Olson, {Randal S.} and Junmei Cairns and Pedro Caraballo and Weinshilboum, {Richard M} and Wang, {Liewei M} and Breitenstein, {Matthew K.}",
year = "2018",
month = "1",
day = "1",
doi = "10.1142/9789813235533_0042",
language = "English (US)",
isbn = "9789813235533",
pages = "460--471",
booktitle = "PACIFIC SYMPOSIUM ON BIOCOMPUTING 2018",
publisher = "World Scientific Publishing Co. Pte Ltd",
address = "Singapore",
edition = "212669",

}

TY - GEN

T1 - Considerations for automated machine learning in clinical metabolic profiling

T2 - Altered homocysteine plasma concentration associated with metformin exposure

AU - Orlenko, Alena

AU - Moore, Jason H.

AU - Orzechowski, Patryk

AU - Olson, Randal S.

AU - Cairns, Junmei

AU - Caraballo, Pedro

AU - Weinshilboum, Richard M

AU - Wang, Liewei M

AU - Breitenstein, Matthew K.

PY - 2018/1/1

Y1 - 2018/1/1

KW - Automated machine learning

KW - Biobank

KW - Clinical metabolic profiling

KW - Confounding

KW - Homocysteine

KW - Metabolomics

KW - Metformin

KW - Pharmacometabolomics

KW - Precision medicine

UR - http://www.scopus.com/inward/record.url?scp=85048503369&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85048503369&partnerID=8YFLogxK

U2 - 10.1142/9789813235533_0042

DO - 10.1142/9789813235533_0042

M3 - Conference contribution

SN - 9789813235533

SP - 460

EP - 471

BT - PACIFIC SYMPOSIUM ON BIOCOMPUTING 2018

PB - World Scientific Publishing Co. Pte Ltd

ER -