8 results for 112 Statistics and probability

at Duke University


Relevance: 100.00%

Abstract:

BACKGROUND: The proportion of births attended by skilled health personnel is one of two indicators used to measure progress towards Millennium Development Goal 5, which aims for a 75% reduction in global maternal mortality ratios by 2015. Rwanda has one of the highest maternal mortality ratios in the world, estimated at between 249 and 584 maternal deaths per 100,000 live births. The objectives of this study were to quantify secular trends in health facility delivery and to identify factors that affect the uptake of intrapartum healthcare services among women living in rural villages in Bugesera District, Eastern Province, Rwanda. METHODS: Using census data and probability-proportional-to-size cluster sampling, 30 villages were selected for community-based, cross-sectional surveys of women aged 18-50 who had given birth in the previous three years. Complete obstetric histories and detailed demographic data were elicited from respondents using iPad technology. Geospatial coordinates were used to calculate the path distances between each village and its designated health center and district hospital. Bivariate and multivariate logistic regressions were used to identify factors associated with delivery in health facilities. RESULTS: Analysis of 3106 lifetime deliveries from 859 respondents shows a sharp increase in the percentage of health facility deliveries in recent years. Delivering a penultimate baby at a health facility (OR = 4.681 [3.204 - 6.839]), possessing health insurance (OR = 3.812 [1.795 - 8.097]), managing household finances (OR = 1.897 [1.046 - 3.439]), attending more antenatal care visits (OR = 1.567 [1.163 - 2.112]), delivering more recently (OR = 1.438 [1.120 - 1.847] annually), and living closer to a health center (OR = 0.909 [0.846 - 0.976] per km) were independently associated with facility delivery. CONCLUSIONS: The strongest correlates of facility-based delivery in Bugesera District include previous delivery at a health facility, possession of health insurance, greater financial autonomy, more recent interactions with the health system, and proximity to a health center. Recent structural interventions in Rwanda, including the rapid scale-up of community-financed health insurance, likely contributed to the dramatic improvement in the health facility delivery rate observed in our study.
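As a minimal illustration of the sampling design described above (not the study's actual code), the sketch below draws 30 clusters with probability proportional to size; the village names and population counts are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical census data: population counts for candidate villages.
villages = [f"village_{i}" for i in range(120)]
populations = rng.integers(200, 2000, size=len(villages))

# Probability-proportional-to-size (PPS) sampling: each village's
# selection probability is its share of the total population.
probs = populations / populations.sum()
sampled = rng.choice(villages, size=30, replace=False, p=probs)

print(sorted(sampled))
```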

Relevance: 100.00%

Abstract:

Uncertainty quantification (UQ) is both an old and a new concept. The current novelty lies in the interactions and synthesis of mathematical models, computer experiments, statistics, field/real experiments, and probability theory, with particular emphasis on large-scale simulations by computer models. The challenges come not only from the complexity of the scientific questions, but also from the sheer size of the data. The focus of this thesis is to provide statistical models that scale to the massive data produced in computer experiments and real experiments, through fast and robust statistical inference.

Chapter 2 provides a practical approach for simultaneously emulating/approximating a massive number of functions, with an application to hazard quantification for the Soufrière Hills volcano on the island of Montserrat. Chapter 3 discusses another massive-data problem, in which the number of observations of a function is large; an exact algorithm, linear in time, is developed for interpolating methylation levels. Chapters 4 and 5 both concern robust inference for these models. Chapter 4 proposes a new robustness criterion for parameter estimation and shows that several inference methods satisfy it. Chapter 5 develops a new prior that satisfies additional criteria and is therefore proposed for practical use.
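As a rough, generic illustration of the emulation idea in Chapter 2 (a standard Gaussian process emulator, not the thesis's scalable method), one fits a GP to a handful of expensive simulator runs and then predicts cheaply, with uncertainty, at untried inputs; the toy simulator below is invented.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expensive_simulator(x):
    # Stand-in for a costly computer model run.
    return np.sin(3 * x) + 0.5 * x

# A few design points where the simulator has actually been run.
X_train = np.linspace(0, 2, 8).reshape(-1, 1)
y_train = expensive_simulator(X_train).ravel()

# The emulator approximates the simulator's output surface.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5))
gp.fit(X_train, y_train)

# Cheap predictions, with uncertainty, at many untried inputs.
X_new = np.linspace(0, 2, 50).reshape(-1, 1)
mean, sd = gp.predict(X_new, return_std=True)
```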

Relevance: 100.00%

Abstract:

© Cambridge University Press 2014. Background: Asian Americans (AAs) and Native Hawaiians/Pacific Islanders (NHs/PIs) are the fastest growing segments of the US population. However, their population sizes are small, and thus AAs and NHs/PIs are often aggregated into a single racial/ethnic group or omitted from research and health statistics. The groups' substance use disorders (SUDs) and treatment needs have been under-recognized. Method: We examined recent epidemiological data on the extent of alcohol and drug use disorders and the use of treatment services by AAs and NHs/PIs. Results: NHs/PIs on average were less educated and had lower levels of household income than AAs. Considered as a single group, AAs and NHs/PIs showed a low prevalence of substance use and disorders. Analyses of survey data that compared AAs and NHs/PIs revealed higher prevalences of substance use (alcohol, drugs), depression and delinquency among NHs than among AAs. Among treatment-seeking patients in mental healthcare settings, NHs/PIs had higher prevalences of DSM-IV diagnoses than AAs (alcohol/drug, mood, adjustment, childhood-onset disruptive or impulse-control disorders), although co-morbidity was common in both groups. AAs and NHs/PIs with an SUD were unlikely to use treatment, especially treatment for alcohol problems, and treatment use tended to be related to involvement with the criminal justice system. Conclusions: Although available data are limited by small sample sizes of AAs and NHs/PIs, they demonstrate the need to separate AAs and NHs/PIs in health statistics and increase research into substance use and treatment needs for these fast-growing but understudied population groups.

Relevance: 100.00%

Abstract:

Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, even though uncertainty quantification remains essential in the sciences, where the number of parameters to estimate often exceeds the sample size despite the huge increases in n typically seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n=all" has little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and it is the primary motivation for the work presented here.

Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is the design and characterization of computational algorithms that scale better in n or p. In the first area, the focus is on joint inference outside of the standard setting of multivariate continuous data that has been the major focus of previous theoretical work. In the second, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms and for characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.

One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
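To make the latent structure representation concrete: a latent class (PARAFAC) model writes the joint probability mass function as a mixture over a latent class index, so the probability tensor has reduced nonnegative rank. A small numpy sketch with invented dimensions and parameters:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
k, d = 3, 4   # latent classes; categories per variable (two variables)

# Mixture weights over latent classes, and per-class marginal pmfs.
nu = rng.dirichlet(np.ones(k))                  # shape (k,)
lam = rng.dirichlet(np.ones(d), size=(2, k))    # shape (2, k, d)

# PARAFAC form: P(x1, x2) = sum_h nu[h] * lam[0][h, x1] * lam[1][h, x2]
pmf = np.einsum("h,hi,hj->ij", nu, lam[0], lam[1])
assert np.isclose(pmf.sum(), 1.0)   # a valid joint pmf of rank <= k
```

The collapsed Tucker decompositions proposed in Chapter 2 generalize this form to bridge PARAFAC and Tucker structures.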

Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and we provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and in other common population structure inference problems is assessed in simulations and a real data application.

In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis-Ylvisaker priors for the parameters of log-linear models do not give rise to closed-form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis-Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
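The chapter derives the KL-optimal Gaussian approximation; as a generic stand-in for illustration only, the sketch below computes the more familiar Laplace approximation (posterior mode plus inverse-Hessian covariance) for a toy Poisson log-linear model with an assumed N(0, 10^2) prior on each coefficient.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(seed=2)
X = rng.normal(size=(200, 3))            # design matrix (invented)
beta_true = np.array([0.5, -0.3, 0.2])
y = rng.poisson(np.exp(X @ beta_true))   # Poisson log-linear counts

def neg_log_post(beta):
    # Poisson log-linear likelihood plus a N(0, 10^2) prior per coefficient
    # (constants dropped; they do not affect the mode or Hessian).
    eta = X @ beta
    return -(y @ eta - np.exp(eta).sum()) + 0.5 * beta @ beta / 100.0

# Gaussian approximation: mean = posterior mode, cov = inverse Hessian.
fit = minimize(neg_log_post, np.zeros(3), method="BFGS")
mean = fit.x
cov = fit.hess_inv   # BFGS's inverse-Hessian estimate at the mode
```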

Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.
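The basic observable in this paradigm is simple to compute: record the times at which a series exceeds a high threshold and take the gaps between them. A minimal sketch with a simulated heavy-tailed series (threshold level and data are invented):

```python
import numpy as np

rng = np.random.default_rng(seed=3)
series = rng.standard_t(df=3, size=10_000)   # heavy-tailed toy series

threshold = np.quantile(series, 0.99)        # high threshold (99th pct)
exceed_times = np.flatnonzero(series > threshold)

# Waiting times between successive threshold exceedances: these gaps
# carry both the strength and the temporal structure of tail dependence.
waiting_times = np.diff(exceed_times)
```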

The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo (MCMC), the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel, but comparatively little attention has been paid to convergence and estimation error in the resulting approximating Markov chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.
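One of the approximation families mentioned (random subsets of data) can be sketched as follows: each Metropolis step estimates the log-likelihood from a random minibatch, yielding a deliberately approximate transition kernel; the error-versus-budget trade-off of such kernels is what the chapter's framework quantifies. The toy model below is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
data = rng.normal(loc=1.5, scale=1.0, size=50_000)  # toy dataset

def subsampled_loglik(mu, batch_size=500):
    # Approximate log-likelihood: rescale a random minibatch's
    # contribution to the full-data size. This perturbs the kernel.
    batch = rng.choice(data, size=batch_size, replace=False)
    return len(data) / batch_size * np.sum(-0.5 * (batch - mu) ** 2)

mu, chain = 0.0, []
for _ in range(2000):
    prop = mu + 0.05 * rng.normal()   # random-walk proposal, flat prior
    if np.log(rng.uniform()) < subsampled_loglik(prop) - subsampled_loglik(mu):
        mu = prop
    chain.append(mu)
```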

Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
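For concreteness, the truncated Normal sampler referred to here is the classic Albert-Chib data augmentation scheme for probit regression: augment each observation with a latent Gaussian truncated to the side determined by its label, then draw the coefficients from a conjugate Gaussian full conditional. A bare-bones sketch with a flat prior and invented data:

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(seed=5)
n, p = 1000, 2
X = rng.normal(size=(n, p))
y = (X @ np.array([1.0, -0.5]) + rng.normal(size=n) > 0).astype(int)

XtX_inv = np.linalg.inv(X.T @ X)   # posterior covariance under a flat prior
beta = np.zeros(p)
for _ in range(500):
    # 1. Draw latent z_i ~ N(x_i'beta, 1), truncated by the label y_i
    #    (positive side if y_i = 1, negative side if y_i = 0).
    mu = X @ beta
    lo = np.where(y == 1, -mu, -np.inf)
    hi = np.where(y == 1, np.inf, -mu)
    z = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)
    # 2. Draw beta | z from its conjugate Gaussian full conditional.
    mean = XtX_inv @ X.T @ z
    beta = rng.multivariate_normal(mean, XtX_inv)
```

In the rare-events regime studied in Chapter 7 (large n, very few successes), chains of exactly this form exhibit the slow mixing described above.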

Relevance: 100.00%

Abstract:

This article describes advances in statistical computation for large-scale data analysis in structured Bayesian mixture models via graphics processing unit (GPU) programming. The developments are partly motivated by computational challenges arising in fitting models of increasing heterogeneity to increasingly large datasets. An example context concerns common biological studies using high-throughput technologies that generate many very large datasets and require increasingly high-dimensional mixture models with large numbers of mixture components. We outline important strategies and processes for GPU computation in Bayesian simulation and optimization approaches, give examples of the benefits of GPU implementations in terms of processing speed and scale-up in the ability to analyze large datasets, and provide a detailed, tutorial-style exposition that will benefit readers interested in developing GPU-based approaches in other statistical models. Novel, GPU-oriented approaches to modifying existing algorithms and software design can lead to vast speed-ups and, critically, enable statistical analyses that presently are not performed due to compute-time limitations in traditional computational environments. Supplemental materials are provided with all source code, example data, and details that will enable readers to implement and explore the GPU approach in this mixture modeling context. © 2010 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.
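The computational core that maps naturally onto GPUs is the dense grid of per-observation, per-component density evaluations repeated at every iteration. The numpy sketch below shows that grid for a Gaussian mixture; it is this embarrassingly parallel array computation that a GPU implementation distributes across threads (sizes and parameters are invented, and this sketch runs on the CPU).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=6)
n, k = 100_000, 64                  # observations x mixture components
x = rng.normal(size=n)
weights = np.full(k, 1.0 / k)
means = rng.normal(size=k)

# The O(n * k) grid of component log-densities dominates each iteration
# of EM or Gibbs for mixtures; it is independent across every (i, j) pair.
log_dens = norm.logpdf(x[:, None], loc=means[None, :])   # shape (n, k)
log_weighted = np.log(weights)[None, :] + log_dens

# Component responsibilities via a stable log-sum-exp over components.
m = log_weighted.max(axis=1, keepdims=True)
resp = np.exp(log_weighted - m)
resp /= resp.sum(axis=1, keepdims=True)
```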

Relevance: 100.00%

Abstract:

The rivalry between the men's basketball teams of Duke University and the University of North Carolina-Chapel Hill (UNC) is one of the most storied traditions in college sports. A subculture of students at each university form social bonds with fellow fans, develop expertise in college basketball rules, team statistics, and individual players, and self-identify as members of a fan group. The present study capitalized on the high personal investment of these fans and the strong affective tenor of a Duke-UNC basketball game to examine the neural correlates of emotional memory retrieval for a complex sporting event. Male fans watched a competitive, archived game in a social setting. During a subsequent functional magnetic resonance imaging session, participants viewed video clips depicting individual plays of the game that ended with the ball being released toward the basket. For each play, participants recalled whether or not the shot went into the basket. Hemodynamic signal changes time locked to correct memory decisions were analyzed as a function of emotional intensity and valence, according to the fan's perspective. Results showed intensity-modulated retrieval activity in midline cortical structures, sensorimotor cortex, the striatum, and the medial temporal lobe, including the amygdala. Positively valent memories specifically recruited processing in dorsal frontoparietal regions, with additional activity in the insula and medial temporal lobe for positively valent shots recalled with high confidence. This novel paradigm reveals how brain regions implicated in emotion, memory retrieval, visuomotor imagery, and social cognition contribute to the recollection of specific plays in the mind of a sports fan.

Relevance: 100.00%

Abstract:

© 2015 Chinese Nursing Association. Background: Although self-management approaches have shown strong evidence of positive outcomes for urinary incontinence prevention and management, few programs have been developed for Korean rural communities. Objectives: This pilot study aimed to develop, implement, and evaluate a urinary incontinence self-management program for community-dwelling women aged 55 and older with urinary incontinence in rural South Korea. Methods: This study used a one-group pre-/post-test design to measure the effects of the intervention using standardized urinary incontinence symptom, knowledge, and attitude measures. Seventeen community-dwelling older women completed weekly 90-min group sessions for 5 weeks. Descriptive statistics and paired t-tests were used to analyze the data. Results: The mean overall interference of urine leakage with daily life (pre-test: M = 5.76 ± 2.68, post-test: M = 2.29 ± 1.93, t = -4.609, p < 0.001) and the sum of International Consultation on Incontinence Questionnaire scores (pre-test: M = 11.59 ± 3.00, post-test: M = 5.29 ± 3.02, t = -5.881, p < 0.001) indicated significant improvement after the intervention. Improvement was also noted in the mean knowledge (pre-test: M = 19.07 ± 3.34, post-test: M = 23.15 ± 2.60, t = 7.550, p < 0.001) and attitude scores (pre-test: M = 2.64 ± 0.19, post-test: M = 3.08 ± 0.41, t = 5.150, p < 0.001). Weekly assignments were completed 82.4% of the time. Participants showed a high level of satisfaction (M = 26.82 ± 1.74, range 22-28) with the group program. Conclusions: Implementation of a urinary incontinence self-management program was accompanied by improved outcomes for older Korean women living in rural communities, who have scarce resources for urinary incontinence management and treatment. Urinary incontinence self-management education approaches have potential for widespread implementation in nursing practice.
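The reported analysis is a standard paired t-test on pre/post scores; a minimal sketch with invented scores, loosely scaled to the reported questionnaire means (not the study's data):

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(seed=7)
pre = rng.normal(11.6, 3.0, size=17)        # invented pre-test scores
post = pre - rng.normal(6.3, 2.0, size=17)  # invented post-test scores

t_stat, p_value = ttest_rel(post, pre)      # paired t-test on the same women
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```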

Relevance: 100.00%

Abstract:

BACKGROUND: In the domain of academia, the scholarship of research may include, but is not limited to, peer-reviewed publications, presentations, and grant submissions. Programmatic research productivity is one of many measures of academic program reputation and ranking. Another measure of learning success among physical therapist education programs in the USA is a 100% three-year pass rate of graduates on the standardized National Physical Therapy Examination (NPTE). In this study, we endeavored to determine whether there was an association between research productivity, measured through scholarly artifacts, and 100% three-year pass rates on the NPTE. METHODS: This observational study used a pre-approved database exploration representing all accredited programs in the USA that graduated physical therapists during 2009, 2010, and 2011. Descriptive variables captured included raw research productivity artifacts such as peer-reviewed publications and books, number of professional presentations, number of scholarly submissions, total grant dollars, and number of grants submitted. Descriptive statistics and comparisons (using chi-square and t-tests) among program characteristics and research artifacts were calculated. Univariate logistic regression analyses, with appropriate control variables, were used to determine associations between research artifacts and 100% pass rates. RESULTS: The numbers of scholarly artifacts submitted, faculty with grants, and grant proposals submitted were significantly higher in programs with 100% three-year pass rates. However, after controlling for program characteristics such as grade point average, diversity percentage of the cohort, public/private institution, and number of faculty, there were no significant associations between scholarly artifacts and 100% three-year pass rates. CONCLUSIONS: Factors outside of research artifacts are likely better predictors of passing the NPTE.
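The association analysis described (logistic regression for 100% pass rates with program-level controls, summarized as odds ratios) can be sketched generically; all variable names and data below are invented.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(seed=8)
n = 200
artifacts = rng.poisson(5, size=n)        # scholarly artifacts (invented)
gpa = rng.normal(3.5, 0.2, size=n)        # control variable (invented)
pass_100 = rng.binomial(1, 0.6, size=n)   # 100% three-year pass rate flag

# Logistic regression of the pass-rate flag on artifacts plus a control.
X = sm.add_constant(np.column_stack([artifacts, gpa]))
fit = sm.Logit(pass_100, X).fit(disp=0)

odds_ratios = np.exp(fit.params)   # odds ratio per unit of each predictor
conf_int = np.exp(fit.conf_int())  # 95% CIs on the odds-ratio scale
```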