Testing Independence in High Dimensions & Identifiability of Graphical Models


Author(s): Leung, Dennis
Contributor(s)

Drton, Mathias

Date(s)

14/07/2016

14/07/2016

01/06/2016

Abstract

Thesis (Ph.D.)--University of Washington, 2016-06

This thesis studies two problems in multivariate statistics. In the first chapter, we treat the problem of testing independence between m continuous observations when m can be larger than the available sample size n. We consider three types of test statistics that are constructed as sums of many pairwise rank correlation signals. In the asymptotic regime where both m and n converge to infinity, a martingale central limit theorem is applied to show that the null distributions of these statistics converge to Gaussian limits, which are valid with no specific distributional or moment assumptions on the data. Using the framework of U-statistics, our result covers a variety of rank correlations including Kendall's tau and a dominating term of Spearman's rank correlation coefficient (rho), but also degenerate U-statistics such as Hoeffding's D, or the tau* of Bergsma and Dassios. As in the classical theory of U-statistics, the test statistics need to be scaled differently when the rank correlations used to construct them are degenerate U-statistics. The power of the considered tests is explored in rate-optimality theory under a Gaussian equicorrelation alternative as well as in numerical experiments for specific cases of more general alternatives. In the second chapter, we study parameter identifiability of directed Gaussian graphical models with one latent variable. In the scenario we consider, the latent variable is a confounder that forms a source node of the graph and is a parent to all other nodes, which correspond to the observed variables. We give a graphical condition that is sufficient for the Jacobian matrix of the parametrization map to be of full rank, which entails that the parametrization is generically finite-to-one, a property that is sometimes also referred to as local identifiability. We also derive a graphical condition that is necessary for such identifiability. Finally, we give a condition under which generic parameter identifiability can be determined from identifiability of a model associated with a subgraph. The power of these criteria is assessed via an exhaustive algebraic computational study for small models with 4, 5, and 6 observable variables, and a simulation study for large models with 25 or 35 observable variables.
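
As an illustration of the first chapter's idea of building a test statistic from many pairwise rank correlations, the following is a minimal sketch and not the thesis's exact statistic or scaling. It assumes continuous data without ties and sums the squared pairwise Kendall's tau values, centering each term by the known null variance of tau, 2(2n + 5) / (9n(n - 1)). The function names `kendall_tau` and `sum_of_squared_taus` are hypothetical and chosen only for this example; the normalization needed for the Gaussian limit is deliberately left out.

```python
import itertools
import random


def kendall_tau(x, y):
    """Kendall's tau for two equal-length sequences without ties (O(n^2))."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            # +1 for a concordant pair, -1 for a discordant pair
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))


def sum_of_squared_taus(columns):
    """Sum over all variable pairs of (tau^2 - null variance of tau).

    `columns` is a list of m sequences, each holding the n observations
    of one variable. Under independence with continuous data, tau has
    mean 0 and variance 2(2n + 5) / (9 n (n - 1)), so each squared term
    is centered by that value. No further scaling is applied here
    (assumption: illustration only).
    """
    n = len(columns[0])
    null_var = 2.0 * (2 * n + 5) / (9.0 * n * (n - 1))
    total = 0.0
    for a, b in itertools.combinations(columns, 2):
        total += kendall_tau(a, b) ** 2 - null_var
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    n, m = 30, 8
    # Independent columns: the centered sum should fluctuate around zero.
    independent = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(m)]
    # Dependent columns: every variable shares one common factor.
    factor = [rng.gauss(0, 1) for _ in range(n)]
    dependent = [[f + 0.5 * rng.gauss(0, 1) for f in factor] for _ in range(m)]
    print("independent:", sum_of_squared_taus(independent))
    print("dependent:  ", sum_of_squared_taus(dependent))
```

With m variables the sum runs over m(m - 1)/2 pairs, so this quadratic-time tau is only practical for small n and m; a faster rank-correlation routine would be needed at the dimensions the abstract considers.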

Format

application/pdf

Identifier

Leung_washington_0250E_15650.pdf

http://hdl.handle.net/1773/36847

Language(s)

en_US

Keywords

#Statistics

Type

Thesis