960 results for 112 Statistics and probability


Relevance: 100.00%

Abstract:

Four papers, written in collaboration with the author’s graduate school advisor, are presented. In the first paper, uniform and non-uniform Berry-Esseen (BE) bounds on the convergence to normality of a general class of nonlinear statistics are provided; novel applications to specific statistics, including the non-central Student’s, Pearson’s, and the non-central Hotelling’s, are also stated. In the second paper, a BE bound on the rate of convergence of the F-statistic used in testing hypotheses from a general linear model is given. The third paper considers the asymptotic relative efficiency (ARE) between the Pearson, Spearman, and Kendall correlation statistics; conditions sufficient to ensure that the Spearman and Kendall statistics are equally (asymptotically) efficient are provided, and several models are considered which illustrate the use of such conditions. Lastly, the fourth paper proves that, in the bivariate normal model, the ARE between any of these correlation statistics possesses certain monotonicity properties; quadratic lower and upper bounds on the ARE are stated as direct applications of such monotonicity patterns.
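As a rough illustration of the three correlation statistics compared in the third and fourth papers, the sketch below computes the Pearson, Spearman, and Kendall statistics on a simulated bivariate normal sample. It does not compute the ARE itself. The sample size, the correlation value 0.6, the seed, and all function names are assumptions chosen for illustration, and the rank helper assumes there are no ties.

```python
import math
import random

def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(v):
    """Ranks 1..n of the values in v (assumes no ties)."""
    order = sorted(range(len(v)), key=v.__getitem__)
    r = [0.0] * len(v)
    for rank, idx in enumerate(order, start=1):
        r[idx] = float(rank)
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall(x, y):
    """Kendall tau: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

def bivariate_normal_sample(n, rho, seed=0):
    """Draw n pairs from a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    return xs, ys

# Hypothetical settings: 500 pairs, true correlation 0.6.
x, y = bivariate_normal_sample(500, 0.6)
r, rs, tau = pearson(x, y), spearman(x, y), kendall(x, y)
print(f"Pearson {r:.3f}  Spearman {rs:.3f}  Kendall {tau:.3f}")
```

The three statistics estimate different population quantities, so their raw values are not directly comparable; an efficiency comparison such as the ARE accounts for this.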

Relevance: 100.00%

Abstract:

Complex human diseases are a major challenge for biological research. The goal of my research is to develop effective biostatistical methods in order to create more opportunities for the prevention and cure of human diseases. This dissertation proposes statistical methods that can be adapted to sequencing data in family-based designs and that account for joint effects as well as gene-gene and gene-environment interactions in genome-wide association (GWA) studies. The framework includes statistical methods for rare and common variant association studies. Although next-generation DNA sequencing technologies have made rare variant association studies feasible, the development of powerful statistical methods for rare variant association studies is still underway. Chapter 2 demonstrates two adaptive weighting methods for rare variant association studies based on family data for quantitative traits. The results show that both proposed methods are robust to population stratification, robust to the direction and magnitude of the effects of causal variants, and more powerful than the methods using weights suggested by Madsen and Browning [2009]. In Chapter 3, I extended the previously proposed test for Testing the effect of an Optimally Weighted combination of variants (TOW) [Sha et al., 2012] for unrelated individuals to TOW-F, TOW for Family-based design. Simulation results show that TOW-F can control for population stratification in a wide range of population structures, including spatially structured populations, is robust to the directions of effect of causal variants, and is relatively robust to the percentage of neutral variants. For GWA studies, this dissertation presents a two-locus joint effect analysis and a two-stage approach accounting for gene-gene and gene-environment interactions. Chapter 4 proposes a novel two-stage approach, which shows promise for identifying joint effects, especially for monotonic models. The proposed approach outperforms a single-marker method and a regular two-stage analysis based on the two-locus genotypic test. In Chapter 5, I proposed a gene-based two-stage approach to identify gene-gene and gene-environment interactions in GWA studies, which can include rare variants. The two-stage approach is applied to the GAW 17 dataset to identify the interaction between the KDR gene and smoking status.

Relevance: 100.00%

Abstract:

The developmental processes and functions of an organism are controlled by the genes and the proteins that are derived from these genes. The identification of key genes and the reconstruction of gene networks can provide a model to help us understand the regulatory mechanisms for the initiation and progression of biological processes or functional abnormalities (e.g. diseases) in living organisms. In this dissertation, I have developed statistical methods to identify the genes and transcription factors (TFs) involved in biological processes, constructed their regulatory networks, and also evaluated some existing association methods to find robust methods for coexpression analyses. Two kinds of data sets were used for this work: genotype data and gene expression microarray data. On the basis of these data sets, this dissertation has two major parts, together forming six chapters. The first part deals with developing association methods for rare variants using genotype data (chapters 4 and 5). The second part deals with developing and/or evaluating statistical methods to identify genes and TFs involved in biological processes, and with constructing their regulatory networks using gene expression data (chapters 2, 3, and 6). For the first part, I have developed two methods to find the groupwise association of rare variants with given diseases or traits. The first method is based on kernel machine learning and can be applied to both quantitative and qualitative traits. Simulation results showed that the proposed method has improved power over the existing weighted sum method (WS) in most settings. The second method uses multiple phenotypes to select a few top significant genes. It then finds the association of each gene with each phenotype while controlling for population stratification by adjusting the data for ancestry using principal components. This method was applied to GAW 17 data and was able to find several disease risk genes. For the second part, I have worked on three problems. The first problem involved the evaluation of eight gene association methods. A very comprehensive comparison of these methods, with further analysis, clearly demonstrates the distinct and common performance of these eight gene association methods. For the second problem, an algorithm named the bottom-up graphical Gaussian model was developed to identify the TFs that regulate pathway genes and reconstruct their hierarchical regulatory networks. This algorithm has produced very significant results and it is the first report to produce such hierarchical networks for these pathways. The third problem dealt with developing another algorithm, called the top-down graphical Gaussian model, that identifies the network governed by a specific TF. The network produced by the algorithm is proven to be of very high accuracy.

Relevance: 100.00%

Abstract:

This morning Dr. Battle will introduce descriptive statistics and linear regression and how to apply these concepts in mathematical modeling. You will also learn how to use a spreadsheet to help with statistical analysis and to create graphs.

Relevance: 100.00%

Abstract:

OBJECTIVE: In Switzerland there is a shortage of population-based information on stroke incidence and case fatalities (CF). The aim of this study was to estimate stroke event rates and both in- and out-of-hospital CF rates. METHODS: Data on stroke diagnoses, coded according to I60-I64 (ICD 10), were taken from the Federal Hospital Discharge Statistics database (HOST) and the Cause of Death database (CoD) for the year 2004. The number of total stroke events and the age- and gender-specific and age-standardised event rates were estimated; overall, in-hospital, and out-of-hospital CF were determined. RESULTS: Of the 13 996 hospital discharges for stroke (HOST), fewer were in women (n = 6736) than in men (n = 7260). A total of 3568 deaths (2137 women and 1431 men) due to stroke were recorded in the CoD database. The number of estimated stroke events was 15 733, and was higher in women (n = 7933) than in men (n = 7800). Men presented significantly higher age-specific stroke event rates and a higher age-standardised event rate (178.7/100 000 versus 119.7/100 000). Overall CF rates were significantly higher for women (26.9%) than for men (18.4%). The same was true of out-of-hospital CF but not of in-hospital CF rates. CONCLUSION: The data on estimated stroke events indicate that the stroke discharge rate underestimates the stroke event rate. Out-of-hospital deaths from stroke accounted for the largest proportion of total stroke deaths. Sex differences in both the number of total stroke events and deaths could be explained by the higher proportion of women than men aged 55+ in the Swiss population.
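The counts reported in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes that overall case fatality is defined as stroke deaths divided by estimated stroke events; the abstract does not spell out the formula, so this definition and the variable names are assumptions. Under that assumption the recomputed value for women matches the reported 26.9%, while the value for men comes out at about 18.3%, marginally below the reported 18.4%, which may reflect rounding or a slightly different denominator in the study.

```python
# Counts as reported in the abstract (Switzerland, 2004).
discharges = {"women": 6736, "men": 7260}   # hospital discharges (HOST)
deaths = {"women": 2137, "men": 1431}       # stroke deaths (CoD)
events = {"women": 7933, "men": 7800}       # estimated stroke events

def case_fatality(deaths_n, events_n):
    """Case fatality in percent, ASSUMED here to be deaths / estimated events."""
    return 100.0 * deaths_n / events_n

# Totals recomputed from the sex-specific counts.
total_discharges = sum(discharges.values())
total_deaths = sum(deaths.values())
total_events = sum(events.values())

# Sex-specific and overall case fatality, rounded to one decimal place.
cf = {sex: round(case_fatality(deaths[sex], events[sex]), 1) for sex in events}
overall_cf = round(case_fatality(total_deaths, total_events), 1)

print(total_discharges, total_deaths, total_events)
print(cf, overall_cf)
```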

Relevance: 100.00%

Abstract:

A characterization is provided for the von Mises–Fisher random variable, in terms of first exit point from the unit hypersphere of the drifted Wiener process. Laplace transform formulae for the first exit time from the unit hypersphere of the drifted Wiener process are provided. Post representations in terms of Bell polynomials are provided for the densities of the first exit times from the circle and from the sphere.

Relevance: 100.00%

Abstract:

Voting power is commonly measured using a probability. But what kind of probability is this? Is it a degree of belief or an objective chance or some other sort of probability? The aim of this paper is to answer this question. The answer depends on the use to which a measure of voting power is put. Some objectivist interpretations of probabilities are appropriate when we employ such a measure for descriptive purposes. By contrast, when voting power is used to normatively assess voting rules, the probabilities are best understood as classical probabilities, which count possibilities. This is so because, from a normative stance, voting power is most plausibly taken to concern rights and thus possibilities. The classical interpretation also underwrites the use of the Bernoulli model upon which the Penrose/Banzhaf measure is based.
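To make the idea of probabilities that "count possibilities" concrete, here is a minimal sketch of the Penrose/Banzhaf measure for a weighted voting rule. Under the Bernoulli model every yes/no profile of the other voters is treated as equally likely, so a voter's power is the share of those profiles in which that voter's vote decides the outcome. The weights and quota below are hypothetical and the function names are illustrative only.

```python
from itertools import product

def banzhaf_swings(weights, quota):
    """For each voter, count the yes/no profiles of the other voters
    in which that voter is decisive (the motion passes only if they vote yes)."""
    n = len(weights)
    swings = [0] * n
    for i in range(n):
        others = [w for j, w in enumerate(weights) if j != i]
        for profile in product((0, 1), repeat=n - 1):
            yes_weight = sum(w for w, v in zip(others, profile) if v)
            # Decisive: fails without voter i, passes with voter i.
            if yes_weight < quota <= yes_weight + weights[i]:
                swings[i] += 1
    return swings

def penrose_banzhaf(weights, quota):
    """Share of the 2**(n-1) equally weighted profiles in which each voter is decisive."""
    n = len(weights)
    return [s / 2 ** (n - 1) for s in banzhaf_swings(weights, quota)]

# Hypothetical rule: four voters with weights 4, 3, 2, 1 and a quota of 6.
weights = [4, 3, 2, 1]
quota = 6
power = penrose_banzhaf(weights, quota)
print(power)  # [0.625, 0.375, 0.375, 0.125]
```

In this example the voters with weights 3 and 2 have equal power despite their different weights, because they are decisive in the same number of possible profiles.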

Relevance: 100.00%

Abstract:

Objectives. This paper seeks to assess the effect on statistical power of regression model misspecification in a variety of situations. Methods and results. The effect of misspecification in regression can be approximated by evaluating the correlation between the correct specification and the misspecification of the outcome variable (Harris 2010). In this paper, three misspecified models (linear, categorical and fractional polynomial) were considered. In the first section, the mathematical method of calculating the correlation between correct and misspecified models with simple mathematical forms was derived and demonstrated. In the second section, data from the National Health and Nutrition Examination Survey (NHANES 2007-2008) were used to examine such correlations. Our study shows that, compared to linear or categorical models, the fractional polynomial models, with their higher correlations, provided a better approximation of the true relationship, which was illustrated by LOESS regression. In the third section, we present the results of simulation studies that demonstrate that, overall, misspecification in regression can produce marked decreases in power with small sample sizes. However, the categorical model had the greatest power, ranging from 0.877 to 0.936 depending on sample size and outcome variable used. The power of the fractional polynomial model was close to that of the linear model, which ranged from 0.69 to 0.83, and appeared to be affected by the increased degrees of freedom of this model. Conclusion. Correlations between alternative model specifications can be used to provide a good approximation of the effect on statistical power of misspecification when the sample size is large. When model specifications have known simple mathematical forms, such correlations can be calculated mathematically. Actual public health data from NHANES 2007-2008 were used as examples to demonstrate the situations with unknown or complex correct model specification. Simulation of power for misspecified models confirmed the results based on correlation methods but also illustrated the effect of model degrees of freedom on power.
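The correlation idea can be illustrated with a toy case. The sketch below assumes a hypothetical true specification in which the outcome depends on x squared, with x spread evenly over the interval from 0 to 1, and computes the correlation between that correct form and two misspecified predictors: a linear term and a median-split category. The functional form, the grid, and all names are assumptions for illustration; this is not the procedure or the data used in the study, and it says nothing about the power figures reported there.

```python
import math

def pearson(u, v):
    """Pearson correlation between two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

# Evenly spaced midpoints approximating a uniform predictor on (0, 1).
n = 10_000
x = [(i + 0.5) / n for i in range(n)]

# Hypothetical correct specification: the outcome depends on x**2.
true_form = [xi ** 2 for xi in x]

# Two misspecified predictors: linear in x, and a median split of x.
linear = x
categorical = [1.0 if xi > 0.5 else 0.0 for xi in x]

corr_linear = pearson(true_form, linear)
corr_categorical = pearson(true_form, categorical)
print(f"linear: {corr_linear:.4f}  categorical: {corr_categorical:.4f}")
```

For this toy form the values can also be derived by hand: the correlation between x and x squared on a uniform interval from 0 to 1 is sqrt(15)/4, about 0.968, and the correlation with the median split is sqrt(45)/8, about 0.839.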

Relevance: 100.00%

Abstract:

The magnesium isotope compositions of diagenetic dolomites and their adjacent pore fluids were studied in a 250 m thick sedimentary section drilled into the Peru Margin during Ocean Drilling Program (ODP) Leg 201 (Site 1230) and Leg 112 (Site 685). Previous studies revealed the presence of two types of dolomite: type I dolomite forms at ~6 m below seafloor (mbsf) due to an increase in alkalinity associated with anaerobic methane oxidation, and type II dolomite forms at focused sites below ~230 mbsf due to episodic inflow of deep-sourced fluids into an intense methanogenesis zone. The pore fluid δ26Mg composition becomes progressively enriched in 26Mg with depth, from values similar to seawater (i.e. -0.8 per mil, relative to DSM3 Mg reference material) in the top few mbsf to 0.8 ± 0.2 per mil within the sediments located below 100 mbsf. Type I dolomites have a δ26Mg of -3.5 per mil and exhibit apparent dolomite-pore fluid fractionation factors of about -2.6 per mil, consistent with previous studies of dolomite precipitation from seawater. In contrast, type II dolomites have δ26Mg values ranging from -2.5 to -3.0 per mil and are up to -3.6 per mil lighter than the modern pore fluid Mg isotope composition. The enrichment of pore fluids in 26Mg and depletion in total Mg concentration below ~200 mbsf is likely the result of Mg isotope fractionation during dolomite formation. The 26Mg enrichment of pore fluids in the upper ~200 mbsf of the sediment sequence can be attributed to desorption of Mg from clay mineral surfaces. The obtained results indicate that Mg isotopes recorded in the diagenetic carbonate record can distinguish near-surface from deep-formed dolomite, demonstrating their usefulness as a paleo-diagenetic proxy.