Genomic similarity and kernel methods I: Advancements by building on mathematical and statistical foundations

Daniel J. Schaid

doi:10.1159/000312641

Quantitative Health Sciences

Research output: Contribution to journal › Review article › peer-review

68 Scopus citations

Abstract

Measures of genomic similarity are the basis of many statistical analytic methods. We review the mathematical and statistical basis of similarity methods, particularly based on kernel methods. A kernel function converts information for a pair of subjects to a quantitative value representing either similarity (larger values meaning more similar) or distance (smaller values meaning more similar), with the requirement that it must create a positive semidefinite matrix when applied to all pairs of subjects. This review emphasizes the wide range of statistical methods and software that can be used when similarity is based on kernel methods, such as nonparametric regression, linear mixed models and generalized linear mixed models, hierarchical models, score statistics, and support vector machines. The mathematical rigor for these methods is summarized, as is the mathematical framework for making kernels. This review provides a framework to move from intuitive and heuristic approaches to define genomic similarities to more rigorous methods that can take advantage of powerful statistical modeling and existing software. A companion paper reviews novel approaches to creating kernels that might be useful for genomic analyses, providing insights with examples [1].
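The positive semidefinite requirement described in the abstract can be checked numerically with an eigenvalue decomposition. The sketch below is only an illustration under stated assumptions, not code from the review. The toy genotype matrix, the choice of a linear and a Gaussian kernel, the bandwidth `sigma`, and the helper names `linear_kernel_matrix`, `squared_distance_matrix`, `gaussian_kernel_matrix`, and `is_positive_semidefinite` are all hypothetical.

```python
import numpy as np


def linear_kernel_matrix(genotypes: np.ndarray) -> np.ndarray:
    """Linear kernel: inner products of column-centered genotype vectors."""
    centered = genotypes - genotypes.mean(axis=0)
    return centered @ centered.T


def squared_distance_matrix(genotypes: np.ndarray) -> np.ndarray:
    """Squared Euclidean distance between every pair of subjects."""
    diff = genotypes[:, None, :] - genotypes[None, :, :]
    return (diff ** 2).sum(axis=-1)


def gaussian_kernel_matrix(genotypes: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Gaussian kernel: turns distances into similarities in (0, 1]."""
    return np.exp(-squared_distance_matrix(genotypes) / (2.0 * sigma ** 2))


def is_positive_semidefinite(matrix: np.ndarray, tol: float = 1e-8) -> bool:
    """True if the matrix is symmetric and no eigenvalue is below -tol."""
    if not np.allclose(matrix, matrix.T):
        return False
    eigenvalues = np.linalg.eigvalsh(matrix)  # eigenvalues of a symmetric matrix
    return bool(eigenvalues.min() >= -tol)


# Hypothetical toy data: 4 subjects x 5 markers, coded as allele counts 0/1/2.
genotypes = np.array(
    [
        [0, 1, 2, 0, 1],
        [0, 1, 2, 1, 1],
        [2, 0, 0, 1, 0],
        [1, 2, 0, 2, 2],
    ],
    dtype=float,
)

K_linear = linear_kernel_matrix(genotypes)
K_gauss = gaussian_kernel_matrix(genotypes)
D_squared = squared_distance_matrix(genotypes)

print("linear kernel PSD:  ", is_positive_semidefinite(K_linear))
print("Gaussian kernel PSD:", is_positive_semidefinite(K_gauss))
print("raw distances PSD:  ", is_positive_semidefinite(D_squared))
```

In this toy case both kernel matrices pass the check, while the raw squared-distance matrix does not: it has a zero diagonal, so its eigenvalues sum to zero and at least one must be negative. This is one reason a distance measure is typically transformed before it is used where a positive semidefinite matrix is required.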

Original language	English (US)
Pages (from-to)	109-131
Number of pages	23
Journal	Human Heredity
Volume	70
Issue number	2
DOIs	https://doi.org/10.1159/000312641
State	Published - Jul 2010

Keywords

Distance
Eigenvalue decomposition
Nonparametric regression
Regularization
Similarity kernel
Support vector machine

ASJC Scopus subject areas

Genetics
Genetics (clinical)

Cite this

@article{9803075ce0744e7696b47c7cd0d429fc,
title = "Genomic similarity and kernel methods I: Advancements by building on mathematical and statistical foundations",

abstract = "Measures of genomic similarity are the basis of many statistical analytic methods. We review the mathematical and statistical basis of similarity methods, particularly based on kernel methods. A kernel function converts information for a pair of subjects to a quantitative value representing either similarity (larger values meaning more similar) or distance (smaller values meaning more similar), with the requirement that it must create a positive semidefinite matrix when applied to all pairs of subjects. This review emphasizes the wide range of statistical methods and software that can be used when similarity is based on kernel methods, such as nonparametric regression, linear mixed models and generalized linear mixed models, hierarchical models, score statistics, and support vector machines. The mathematical rigor for these methods is summarized, as is the mathematical framework for making kernels. This review provides a framework to move from intuitive and heuristic approaches to define genomic similarities to more rigorous methods that can take advantage of powerful statistical modeling and existing software. A companion paper reviews novel approaches to creating kernels that might be useful for genomic analyses, providing insights with examples [1].",

keywords = "Distance, Eigenvalue decomposition, Nonparametric regression, Regularization, Similarity kernel, Support vector machine",
author = "Schaid, {Daniel J.}",
year = "2010",
month = jul,
doi = "10.1159/000312641",
language = "English (US)",
volume = "70",
pages = "109--131",
journal = "Human Heredity",
issn = "0001-5652",
publisher = "S. Karger AG",
number = "2",
}

TY - JOUR
T1 - Genomic similarity and kernel methods I
T2 - Advancements by building on mathematical and statistical foundations
AU - Schaid, Daniel J.
PY - 2010/7
Y1 - 2010/7

AB - Measures of genomic similarity are the basis of many statistical analytic methods. We review the mathematical and statistical basis of similarity methods, particularly based on kernel methods. A kernel function converts information for a pair of subjects to a quantitative value representing either similarity (larger values meaning more similar) or distance (smaller values meaning more similar), with the requirement that it must create a positive semidefinite matrix when applied to all pairs of subjects. This review emphasizes the wide range of statistical methods and software that can be used when similarity is based on kernel methods, such as nonparametric regression, linear mixed models and generalized linear mixed models, hierarchical models, score statistics, and support vector machines. The mathematical rigor for these methods is summarized, as is the mathematical framework for making kernels. This review provides a framework to move from intuitive and heuristic approaches to define genomic similarities to more rigorous methods that can take advantage of powerful statistical modeling and existing software. A companion paper reviews novel approaches to creating kernels that might be useful for genomic analyses, providing insights with examples [1].

KW - Distance
KW - Eigenvalue decomposition
KW - Nonparametric regression
KW - Regularization
KW - Similarity kernel
KW - Support vector machine
UR - http://www.scopus.com/inward/record.url?scp=77954207265&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77954207265&partnerID=8YFLogxK
U2 - 10.1159/000312641
DO - 10.1159/000312641
M3 - Review article
C2 - 20610906
AN - SCOPUS:77954207265
SN - 0001-5652
VL - 70
SP - 109
EP - 131
JO - Human Heredity
JF - Human Heredity
IS - 2
ER -
