TY - JOUR
T1 - Semi-Supervised Topological Analysis for Elucidating Hidden Structures in High-Dimensional Transcriptome Datasets
AU - Feng, Tianshu
AU - Davila, Jaime I.
AU - Liu, Yuanhang
AU - Lin, Sangdi
AU - Huang, Shuai
AU - Wang, Chen
N1 - Publisher Copyright: © 2004-2012 IEEE.
PY - 2021/7/1
Y1 - 2021/7/1
N2 - Topological data analysis (TDA) is a powerful method for reducing data dimensionality, mining underlying data relationships, and intuitively representing the data structure. The Mapper algorithm is one such tool that projects high-dimensional data to 1-dimensional space by using a filter function that is subsequently used to reconstruct the data topology relationships. However, domain context information and prior knowledge have not been considered in current TDA modeling frameworks. Here, we report the development and evaluation of a semi-supervised topological analysis (STA) framework that incorporates discrete or continuously labeled data points and selects the most relevant filter functions accordingly. We validate the proposed STA framework with simulation data and then apply it to samples from Genotype-Tissue Expression data and ovarian cancer transcriptome datasets. The graphs generated by STA for these 2 datasets, based on gene expression profiles, are consistent with prior knowledge, thereby supporting the effectiveness of the proposed framework.
KW - Data and knowledge visualization
KW - bioinformatics (genome or protein) databases
KW - data mining
UR - http://www.scopus.com/inward/record.url?scp=85098855789&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098855789&partnerID=8YFLogxK
U2 - 10.1109/TCBB.2019.2950657
DO - 10.1109/TCBB.2019.2950657
M3 - Article
C2 - 31675340
AN - SCOPUS:85098855789
SN - 1545-5963
VL - 18
SP - 1620
EP - 1631
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
IS - 4
M1 - 8888210
ER -