OPTIMAL FALSE DISCOVERY RATE CONTROL FOR LARGE SCALE MULTIPLE TESTING WITH AUXILIARY INFORMATION

Hongyuan Cao; Jun Chen; Xianyang Zhang

doi:10.1214/21-AOS2128

OPTIMAL FALSE DISCOVERY RATE CONTROL FOR LARGE SCALE MULTIPLE TESTING WITH AUXILIARY INFORMATION

Hongyuan Cao, Jun Chen, Xianyang Zhang

Quantitative Health Sciences

Research output: Contribution to journal › Article › peer-review

Abstract

Large-scale multiple testing is a fundamental problem in high dimensional statistical inference. It is increasingly common that various types of auxiliary information, reflecting the structural relationship among the hypotheses, are available. Exploiting such auxiliary information can boost statistical power. To this end, we propose a framework based on a two-group mixture model with varying probabilities of being null for different hypotheses a priori, where a shape-constrained relationship is imposed between the auxiliary information and the prior probabilities of being null. An optimal rejection rule is designed to maximize the expected number of true positives when average false discovery rate is controlled. Focusing on the ordered structure, we develop a robust EM algorithm to estimate the prior probabilities of being null and the distribution of p-values under the alternative hypothesis simultaneously.We show that the proposed method has better power than state-of-the-art competitors while controlling the false discovery rate, both empirically and theoretically. Extensive simulations demonstrate the advantage of the proposed method. Datasets from genome-wide association studies are used to illustrate the new methodology.

Original language	English (US)
Pages (from-to)	807-857
Number of pages	51
Journal	Annals of Statistics
Volume	50
Issue number	2
DOIs	https://doi.org/10.1214/21-AOS2128
State	Published - Apr 2022

ASJC Scopus subject areas

Statistics and Probability
Statistics, Probability and Uncertainty

Access to Document

10.1214/21-AOS2128

Cite this

@article{ec79cba7f92c413e9329fcd6ce6a039a,

title = "OPTIMAL FALSE DISCOVERY RATE CONTROL FOR LARGE SCALE MULTIPLE TESTING WITH AUXILIARY INFORMATION",

abstract = "Large-scale multiple testing is a fundamental problem in high dimensional statistical inference. It is increasingly common that various types of auxiliary information, reflecting the structural relationship among the hypotheses, are available. Exploiting such auxiliary information can boost statistical power. To this end, we propose a framework based on a two-group mixture model with varying probabilities of being null for different hypotheses a priori, where a shape-constrained relationship is imposed between the auxiliary information and the prior probabilities of being null. An optimal rejection rule is designed to maximize the expected number of true positives when average false discovery rate is controlled. Focusing on the ordered structure, we develop a robust EM algorithm to estimate the prior probabilities of being null and the distribution of p-values under the alternative hypothesis simultaneously.We show that the proposed method has better power than state-of-the-art competitors while controlling the false discovery rate, both empirically and theoretically. Extensive simulations demonstrate the advantage of the proposed method. Datasets from genome-wide association studies are used to illustrate the new methodology.",

author = "Hongyuan Cao and Jun Chen and Xianyang Zhang",

note = "Publisher Copyright: {\textcopyright} Institute of Mathematical Statistics, 2022.",

year = "2022",

month = apr,

doi = "10.1214/21-AOS2128",

language = "English (US)",

volume = "50",

pages = "807--857",

journal = "Annals of Statistics",

issn = "0090-5364",

publisher = "Institute of Mathematical Statistics",

number = "2",

}

TY - JOUR

T1 - OPTIMAL FALSE DISCOVERY RATE CONTROL FOR LARGE SCALE MULTIPLE TESTING WITH AUXILIARY INFORMATION

AU - Cao, Hongyuan

AU - Chen, Jun

AU - Zhang, Xianyang

N1 - Publisher Copyright: © Institute of Mathematical Statistics, 2022.

PY - 2022/4

Y1 - 2022/4

N2 - Large-scale multiple testing is a fundamental problem in high dimensional statistical inference. It is increasingly common that various types of auxiliary information, reflecting the structural relationship among the hypotheses, are available. Exploiting such auxiliary information can boost statistical power. To this end, we propose a framework based on a two-group mixture model with varying probabilities of being null for different hypotheses a priori, where a shape-constrained relationship is imposed between the auxiliary information and the prior probabilities of being null. An optimal rejection rule is designed to maximize the expected number of true positives when average false discovery rate is controlled. Focusing on the ordered structure, we develop a robust EM algorithm to estimate the prior probabilities of being null and the distribution of p-values under the alternative hypothesis simultaneously.We show that the proposed method has better power than state-of-the-art competitors while controlling the false discovery rate, both empirically and theoretically. Extensive simulations demonstrate the advantage of the proposed method. Datasets from genome-wide association studies are used to illustrate the new methodology.

AB - Large-scale multiple testing is a fundamental problem in high dimensional statistical inference. It is increasingly common that various types of auxiliary information, reflecting the structural relationship among the hypotheses, are available. Exploiting such auxiliary information can boost statistical power. To this end, we propose a framework based on a two-group mixture model with varying probabilities of being null for different hypotheses a priori, where a shape-constrained relationship is imposed between the auxiliary information and the prior probabilities of being null. An optimal rejection rule is designed to maximize the expected number of true positives when average false discovery rate is controlled. Focusing on the ordered structure, we develop a robust EM algorithm to estimate the prior probabilities of being null and the distribution of p-values under the alternative hypothesis simultaneously.We show that the proposed method has better power than state-of-the-art competitors while controlling the false discovery rate, both empirically and theoretically. Extensive simulations demonstrate the advantage of the proposed method. Datasets from genome-wide association studies are used to illustrate the new methodology.

UR - http://www.scopus.com/inward/record.url?scp=85128332948&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85128332948&partnerID=8YFLogxK

U2 - 10.1214/21-AOS2128

DO - 10.1214/21-AOS2128

M3 - Article

AN - SCOPUS:85128332948

SN - 0090-5364

VL - 50

SP - 807

EP - 857

JO - Annals of Statistics

JF - Annals of Statistics

IS - 2

ER -

OPTIMAL FALSE DISCOVERY RATE CONTROL FOR LARGE SCALE MULTIPLE TESTING WITH AUXILIARY INFORMATION

Abstract

ASJC Scopus subject areas

Access to Document

Other files and links

Fingerprint

Cite this