SimplePPT: A simple principal tree algorithm

Qi Mao, Le Yang, Li Wang, Steven Goodison, Yijun Sun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Citations (Scopus)

Abstract

Many scientific datasets are of high dimension, and the analysis usually requires visual manipulation by retaining the most important structures of data. Principal curve is a widely used approach for this purpose. However, many existing methods work only for data with structures that are not self-intersected, which is quite restrictive for real applications. To address this issue, we develop a new model, which captures the local information of the underlying graph structure based on reversed graph embedding. A generalization bound is derived that shows that the model is consistent if the number of data points is sufficiently large. As a special case, a principal tree model is proposed and a new algorithm is developed that learns a tree structure automatically from data. The new algorithm is simple and parameter-free with guaranteed convergence. Experimental results on synthetic and breast cancer datasets show that the proposed method compares favorably with baselines and can discover a breast cancer progression path with multiple branches.

Original language: English (US)
Title of host publication: SIAM International Conference on Data Mining 2015, SDM 2015
Editors: Jieping Ye, Suresh Venkatasubramanian
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 792-800
Number of pages: 9
ISBN (Electronic): 9781510811522
State: Published - Jan 1 2015
Event: SIAM International Conference on Data Mining 2015, SDM 2015 - Vancouver, Canada
Duration: Apr 30 2015 to May 2 2015

Other

Other: SIAM International Conference on Data Mining 2015, SDM 2015
Country: Canada
City: Vancouver
Period: 4/30/15 to 5/2/15

Fingerprint

Trees (mathematics)

Keywords

  • Cancer progression path
  • Principal curve
  • Principal graph
  • Reversed graph embedding

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Mao, Q., Yang, L., Wang, L., Goodison, S., & Sun, Y. (2015). SimplePPT: A simple principal tree algorithm. In J. Ye, & S. Venkatasubramanian (Eds.), SIAM International Conference on Data Mining 2015, SDM 2015 (pp. 792-800). Society for Industrial and Applied Mathematics Publications.

@inproceedings{1b45ab4be7824f2eb7923fc71df6ca8b,
title = "SimplePPT: A simple principal tree algorithm",
keywords = "Cancer progression path, Principal curve, Principal graph, Reversed graph embedding",
author = "Qi Mao and Le Yang and Li Wang and Steven Goodison and Yijun Sun",
year = "2015",
month = "1",
day = "1",
language = "English (US)",
pages = "792--800",
editor = "Jieping Ye and Suresh Venkatasubramanian",
booktitle = "SIAM International Conference on Data Mining 2015, SDM 2015",
publisher = "Society for Industrial and Applied Mathematics Publications",
address = "United States",

}

TY - GEN

T1 - SimplePPT

T2 - A simple principal tree algorithm

AU - Mao, Qi

AU - Yang, Le

AU - Wang, Li

AU - Goodison, Steven

AU - Sun, Yijun

PY - 2015/1/1

Y1 - 2015/1/1

KW - Cancer progression path

KW - Principal curve

KW - Principal graph

KW - Reversed graph embedding

UR - http://www.scopus.com/inward/record.url?scp=84961922790&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84961922790&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84961922790

SP - 792

EP - 800

BT - SIAM International Conference on Data Mining 2015, SDM 2015

A2 - Ye, Jieping

A2 - Venkatasubramanian, Suresh

PB - Society for Industrial and Applied Mathematics Publications

ER -