### Abstract

Many scientific datasets are high-dimensional, and their analysis usually requires visual manipulation that retains the most important structures of the data. The principal curve is a widely used approach for this purpose. However, many existing methods work only for data whose structures do not self-intersect, which is quite restrictive for real applications. To address this issue, we develop a new model that captures the local information of the underlying graph structure based on reversed graph embedding. We derive a generalization bound showing that the model is consistent if the number of data points is sufficiently large. As a special case, we propose a principal tree model and develop a new algorithm that learns a tree structure automatically from data. The new algorithm is simple and parameter-free, with guaranteed convergence. Experimental results on synthetic and breast cancer datasets show that the proposed method compares favorably with baselines and can discover a breast cancer progression path with multiple branches.
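To make the idea of learning a tree structure from data concrete, here is a minimal illustrative sketch. It is not the authors' implementation: it assumes a simplified alternating scheme (a minimum spanning tree over a set of centers, soft assignment of points to centers, then a regularized center update), and the function names and the parameters `k`, `sigma`, and `lam` are hypothetical choices made for this example only.

```python
import numpy as np


def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense symmetric distance matrix.

    Returns a list of (parent, child) edges spanning all nodes.
    """
    k = dist.shape[0]
    in_tree = np.zeros(k, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()            # cheapest known link from each node to the tree
    parent = np.zeros(k, dtype=int)  # tree node that realizes that cheapest link
    edges = []
    for _ in range(k - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j))
        in_tree[j] = True
        closer = (dist[j] < best) & ~in_tree
        parent[closer] = j
        best = np.minimum(best, dist[j])
    return edges


def squared_distances(A, B):
    """Pairwise squared Euclidean distances between rows of A and rows of B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)


def fit_tree_sketch(X, k=10, sigma=0.1, lam=1.0, n_iter=20, seed=0):
    """Alternate tree construction, soft assignment, and center update.

    k, sigma, and lam are assumptions of this sketch, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # 1) Tree over the current centers.
        edges = minimum_spanning_tree(squared_distances(C, C))
        B = np.zeros((k, k))
        for i, j in edges:
            B[i, j] = B[j, i] = 1.0
        L = np.diag(B.sum(axis=1)) - B  # graph Laplacian of the tree

        # 2) Soft assignment of points to centers (numerically stable softmax).
        logits = -squared_distances(X, C) / sigma
        logits -= logits.max(axis=1, keepdims=True)
        R = np.exp(logits)
        R /= R.sum(axis=1, keepdims=True)

        # 3) Centers minimizing
        #    sum_ij R_ij ||x_i - c_j||^2 + lam * sum_{(a,b) in tree} ||c_a - c_b||^2,
        #    which gives the linear system (diag(R^T 1) + lam * L) C = R^T X.
        C = np.linalg.solve(np.diag(R.sum(axis=0)) + lam * L, R.T @ X)

    # Report the tree on the final centers.
    edges = minimum_spanning_tree(squared_distances(C, C))
    return C, edges
```

Calling `fit_tree_sketch(X, k=8)` on an `(n, d)` array returns `k` centers and `k - 1` tree edges. Unlike the parameter-free algorithm described in the abstract, this sketch exposes its smoothing and regularization weights explicitly.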

Original language | English (US) |
---|---|
Title of host publication | SIAM International Conference on Data Mining 2015, SDM 2015 |
Editors | Jieping Ye, Suresh Venkatasubramanian |
Publisher | Society for Industrial and Applied Mathematics Publications |
Pages | 792-800 |
Number of pages | 9 |
ISBN (Electronic) | 9781510811522 |
State | Published - Jan 1 2015 |
Event | SIAM International Conference on Data Mining 2015, SDM 2015 - Vancouver, Canada; duration: Apr 30 2015 → May 2 2015 |

### Other

Other | SIAM International Conference on Data Mining 2015, SDM 2015 |
---|---|
Country | Canada |
City | Vancouver |
Period | 4/30/15 → 5/2/15 |

### Keywords

- Cancer progression path
- Principal curve
- Principal graph
- Reversed graph embedding

### ASJC Scopus subject areas

- Computational Theory and Mathematics
- Computer Vision and Pattern Recognition
- Software

### Cite this

Mao, Q., Yang, L., Wang, L., Goodison, S., & Sun, Y. (2015). **SimplePPT: A simple principal tree algorithm.** In J. Ye & S. Venkatasubramanian (Eds.), *SIAM International Conference on Data Mining 2015, SDM 2015* (pp. 792-800). Society for Industrial and Applied Mathematics Publications.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

TY - GEN

T1 - SimplePPT

T2 - A simple principal tree algorithm

AU - Mao, Qi

AU - Yang, Le

AU - Wang, Li

AU - Goodison, Steven

AU - Sun, Yijun

PY - 2015/1/1

Y1 - 2015/1/1

KW - Cancer progression path

KW - Principal curve

KW - Principal graph

KW - Reversed graph embedding

UR - http://www.scopus.com/inward/record.url?scp=84961922790&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84961922790&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84961922790

SP - 792

EP - 800

BT - SIAM International Conference on Data Mining 2015, SDM 2015

A2 - Ye, Jieping

A2 - Venkatasubramanian, Suresh

PB - Society for Industrial and Applied Mathematics Publications

ER -