Semi-Supervised Topological Analysis for Elucidating Hidden Structures in High-Dimensional Transcriptome Datasets

Tianshu Feng, Jaime I. Davila, Yuanhang Liu, Sangdi Lin, Shuai Huang, Chen Wang

Research output: Contribution to journal › Article › peer-review

Abstract

Topological data analysis (TDA) is a powerful method for reducing data dimensionality, mining underlying data relationships, and intuitively representing the data structure. The Mapper algorithm is one such tool that projects high-dimensional data to 1-dimensional space by using a filter function that is subsequently used to reconstruct the data topology relationships. However, domain context information and prior knowledge have not been considered in current TDA modeling frameworks. Here, we report the development and evaluation of a semi-supervised topological analysis (STA) framework that incorporates discrete or continuously labeled data points and selects the most relevant filter functions accordingly. We validate the proposed STA framework with simulation data and then apply it to samples from Genotype-Tissue Expression data and ovarian cancer transcriptome datasets. The graphs generated by STA for these 2 datasets, based on gene expression profiles, are consistent with prior knowledge, thereby supporting the effectiveness of the proposed framework.
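To make the Mapper description above concrete, the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The 1-dimensional filter values are covered with overlapping intervals, the points in each interval are clustered, and clusters that share points are joined by an edge. The names `select_filter`, `mapper`, `n_intervals`, `overlap`, and `eps` are hypothetical. The label-guided step (choosing the candidate filter that correlates most strongly with the labeled points) only illustrates the idea of label-informed filter selection; the abstract does not specify the criterion STA uses.

```python
import numpy as np
from itertools import combinations


def select_filter(candidates, labels):
    """Return the column index of the candidate filter that best tracks the labels.

    candidates: (n_points, n_filters) array of candidate filter values.
    labels: (n_points,) array of continuous labels, NaN where unlabeled.
    Assumption: absolute Pearson correlation is used as the relevance score;
    this is an illustrative rule, not the criterion from the paper.
    """
    known = ~np.isnan(labels)
    scores = [abs(np.corrcoef(candidates[known, j], labels[known])[0, 1])
              for j in range(candidates.shape[1])]
    return int(np.argmax(scores))


def _components(points, eps):
    """Single-linkage clustering: connected components of the eps-neighbor graph."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    comp = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if comp[start] >= 0:
            continue
        comp[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            # Unvisited points within eps of point i join the same component.
            for j in np.flatnonzero((dist[i] <= eps) & (comp < 0)):
                comp[j] = current
                stack.append(j)
        current += 1
    return comp


def mapper(X, filter_values, n_intervals=5, overlap=0.3, eps=0.5):
    """Build a Mapper graph from data X and 1-D filter values.

    Returns (nodes, edges): each node is a set of point indices, and each edge
    (i, j) links two nodes that have at least one point in common.
    """
    lo, hi = filter_values.min(), filter_values.max()
    width = (hi - lo) / n_intervals
    pad = width * overlap  # each interval is extended by this much on both sides
    nodes = []
    for k in range(n_intervals):
        a, b = lo + k * width - pad, lo + (k + 1) * width + pad
        idx = np.flatnonzero((filter_values >= a) & (filter_values <= b))
        if idx.size == 0:
            continue
        comp = _components(X[idx], eps)
        for c in np.unique(comp):
            nodes.append(set(idx[comp == c].tolist()))
    edges = [(i, j) for i, j in combinations(range(len(nodes)), 2)
             if nodes[i] & nodes[j]]
    return nodes, edges
```

As a sanity check, evenly spaced points on a unit circle with the x-coordinate as the filter should come back as a single closed loop of nodes, which is the topology of the input. Real transcriptome data would need a more careful choice of distance, clustering method, and cover parameters than this sketch uses.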

Original language: English (US)
Article number: 8888210
Pages (from-to): 1620-1631
Number of pages: 12
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 18
Issue number: 4
State: Published - Jul 1 2021

Keywords

  • Bioinformatics (genome or protein) databases
  • Data and knowledge visualization
  • Data mining

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
  • Applied Mathematics
