14 results for Random Variable

at Duke University


Relevance: 60.00%

Abstract:

A class of multi-process models is developed for collections of time indexed count data. Autocorrelation in counts is achieved with dynamic models for the natural parameter of the binomial distribution. In addition to modeling binomial time series, the framework includes dynamic models for multinomial and Poisson time series. Markov chain Monte Carlo (MCMC) and Pólya-Gamma data augmentation (Polson et al., 2013) are critical for fitting multi-process models of counts. To facilitate computation when the counts are high, a Gaussian approximation to the Pólya-Gamma random variable is developed.
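
As a rough illustration of the Gaussian approximation idea, the sketch below matches a normal distribution to the known mean and variance of a Pólya-Gamma PG(b, c) random variable. Moment matching is an assumption made here for illustration; the abstract does not say how the dissertation's approximation is built, and the function names are hypothetical.

```python
import math
import random


def pg_mean(b: float, c: float) -> float:
    """Mean of PG(b, c): b / (2c) * tanh(c / 2)."""
    if abs(c) < 1e-3:
        return b / 4.0  # limit of the closed form as c -> 0
    return b / (2.0 * c) * math.tanh(c / 2.0)


def pg_var(b: float, c: float) -> float:
    """Variance of PG(b, c): b / (4 c^3) * (sinh(c) - c) * sech^2(c / 2)."""
    if abs(c) < 1e-3:
        return b / 24.0  # limit as c -> 0; also avoids cancellation in sinh(c) - c
    return b / (4.0 * c**3) * (math.sinh(c) - c) / math.cosh(c / 2.0) ** 2


def approx_pg_draw(b: float, c: float, rng: random.Random) -> float:
    """Draw from a normal with the same mean and variance as PG(b, c).

    Assumption: only sensible for large b (high counts). A normal draw can be
    negative, whereas a true Polya-Gamma variable is always positive.
    """
    return rng.gauss(pg_mean(b, c), math.sqrt(pg_var(b, c)))


if __name__ == "__main__":
    rng = random.Random(0)
    print(approx_pg_draw(500.0, 1.2, rng))
```

The appeal of such an approximation is cost: one normal draw replaces an exact sampler whose work typically grows with b.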

Three applied analyses are presented to explore the utility and versatility of the framework. The first analysis develops a model for complex dynamic behavior of themes in collections of text documents. Documents are modeled as a “bag of words”, and the multinomial distribution is used to characterize uncertainty in the vocabulary terms appearing in each document. State-space models for the natural parameters of the multinomial distribution induce autocorrelation in themes and their proportional representation in the corpus over time.

The second analysis develops a dynamic mixed membership model for Poisson counts. The model is applied to a collection of time series which record neuron level firing patterns in rhesus monkeys. The monkey is exposed to two sounds simultaneously, and Gaussian processes are used to smoothly model the time-varying rate at which the neuron’s firing pattern fluctuates between features associated with each sound in isolation.

The third analysis presents a switching dynamic generalized linear model for the time-varying home run totals of professional baseball players. The model endows each player with an age specific latent natural ability class and a performance enhancing drug (PED) use indicator. As players age, they randomly transition through a sequence of ability classes in a manner consistent with traditional aging patterns. When the performance of the player significantly deviates from the expected aging pattern, he is identified as a player whose performance is consistent with PED use.

All three models provide a mechanism for sharing information across related series locally in time. The models are fit with variations on the Pólya-Gamma Gibbs sampler, MCMC convergence diagnostics are developed, and reproducible inference is emphasized throughout the dissertation.

Relevance: 30.00%

Abstract:

HIV testing has been promoted as a key HIV prevention strategy in low-resource settings, despite studies showing variable impact on risk behavior. We sought to examine rates of HIV testing and the association between testing and sexual risk behaviors in Kisumu, Kenya. Participants were interviewed about HIV testing and sexual risk behaviors. They then underwent HIV serologic testing. We found that 47% of women and 36% of men reported prior testing. Two-thirds of participants who tested HIV-positive in this study reported no prior HIV test. Women who had undergone recent testing were less likely to report high-risk behaviors than women who had never been tested; this was not seen among men. Although rates of HIV testing were higher than seen in previous studies, the majority of HIV-infected people were unaware of their status. Efforts should be made to increase HIV testing among this population.

Relevance: 20.00%

Abstract:

Consensus HIV-1 genes can decrease the genetic distances between candidate immunogens and field virus strains. To ensure the functionality and optimal presentation of immunologic epitopes, we generated two group-M consensus env genes that contain variable regions either from a wild-type B/C recombinant virus isolate (CON6) or minimal consensus elements (CON-S) in the V1, V2, V4, and V5 regions. C57BL/6 and BALB/c mice were primed twice with CON6, CON-S, and subtype control (92UG37_A and HXB2/Bal_B) DNA and boosted with recombinant vaccinia virus (rVV). Mean antibody titers against 92UG37_A, 89.6_B, 96ZM651_C, CON6, and CON-S Env protein were determined. Both CON6 and CON-S induced higher mean antibody titers against several of the proteins, as compared with the subtype controls. However, no significant differences were found in mean antibody titers in animals immunized with CON6 or CON-S. Cellular immune responses were measured by using five complete Env overlapping peptide sets: subtype A (92UG37_A), subtype B (MN_B, 89.6_B and SF162_B), and subtype C (Chn19_C). The intensity of the induced cellular responses was measured by using pooled Env peptides; T-cell epitopes were identified by using matrix peptide pools and individual peptides. No significant differences in T-cell immune-response intensities were noted between CON6 and CON-S immunized BALB/c and C57BL/6 mice. In BALB/c mice, 10 and eight nonoverlapping T-cell epitopes were identified in CON6 and CON-S, whereas eight epitopes were identified in 92UG37_A and HXB2/BAL_B. In C57BL/6 mice, nine and six nonoverlapping T-cell epitopes were identified after immunization with CON6 and CON-S, respectively, whereas only four and three were identified in 92UG37_A and HXB2/BAL_B, respectively. When combined together from both mouse strains, 18 epitopes were identified. 
The group M artificial consensus env genes, CON6 and CON-S, were equally immunogenic in breadth and intensity for inducing humoral and cellular immune responses.

Relevance: 20.00%

Abstract:

Continuing our development of a mathematical theory of stochastic microlensing, we study the random shear and expected number of random lensed images of different types. In particular, we characterize the first three leading terms in the asymptotic expression of the joint probability density function (pdf) of the random shear tensor due to point masses in the limit of an infinite number of stars. Up to this order, the pdf depends on the magnitude of the shear tensor, the optical depth, and the mean number of stars through a combination of radial position and the star's mass. As a consequence, the pdf's of the shear components are seen to converge, in the limit of an infinite number of stars, to shifted Cauchy distributions, which shows that the shear components have heavy tails in that limit. The asymptotic pdf of the shear magnitude in the limit of an infinite number of stars is also presented. All the results on the random microlensing shear are given for a general point in the lens plane. Extending to the general random distributions (not necessarily uniform) of the lenses, we employ the Kac-Rice formula and Morse theory to deduce general formulas for the expected total number of images and the expected number of saddle images. We further generalize these results by considering random sources defined on a countable compact covering of the light source plane. This is done to introduce the notion of global expected number of positive parity images due to a general lensing map. Applying the result to microlensing, we calculate the asymptotic global expected number of minimum images in the limit of an infinite number of stars, where the stars are uniformly distributed. This global expectation is bounded, while the global expected number of images and the global expected number of saddle images diverge as the order of the number of stars. © 2009 American Institute of Physics.

Relevance: 20.00%

Abstract:

We consider the problem of variable selection in regression modeling in high-dimensional spaces where there is known structure among the covariates. This is an unconventional variable selection problem for two reasons: (1) The dimension of the covariate space is comparable to, and often much larger than, the number of subjects in the study, and (2) the covariate space is highly structured, and in some cases it is desirable to incorporate this structural information into the model building process. We approach this problem through the Bayesian variable selection framework, where we assume that the covariates lie on an undirected graph and formulate an Ising prior on the model space for incorporating structural information. Certain computational and statistical problems arise that are unique to such high-dimensional, structured settings, the most interesting being the phenomenon of phase transitions. We propose theoretical and computational schemes to mitigate these problems. We illustrate our methods on two different graph structures: the linear chain and the regular graph of degree k. Finally, we use our methods to study a specific application in genomics: the modeling of transcription factor binding sites in DNA sequences. © 2010 American Statistical Association.
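
To make the Ising prior concrete, here is a minimal sketch of an unnormalized log prior over inclusion indicators on a linear chain. The parameterization (a sparsity weight `a` and an interaction weight `b`) and the function names are assumptions for illustration; the abstract does not give the paper's exact form.

```python
from typing import List, Sequence, Tuple


def chain_edges(p: int) -> List[Tuple[int, int]]:
    """Edges of a linear chain on p covariates: 0-1, 1-2, ..., (p-2)-(p-1)."""
    return [(i, i + 1) for i in range(p - 1)]


def ising_log_prior(
    gamma: Sequence[int],
    edges: Sequence[Tuple[int, int]],
    a: float,
    b: float,
) -> float:
    """Unnormalized log prior of an inclusion vector gamma in {0, 1}^p.

    Assumed form: a * sum_i gamma_i + b * sum_{(i, j) in edges} gamma_i * gamma_j.
    A negative a penalizes model size; a positive b rewards including
    covariates that are neighbors on the graph.
    """
    sparsity = a * sum(gamma)
    interaction = b * sum(gamma[i] * gamma[j] for i, j in edges)
    return sparsity + interaction


if __name__ == "__main__":
    edges = chain_edges(5)
    adjacent = [1, 1, 0, 0, 0]   # two neighboring covariates included
    scattered = [1, 0, 0, 1, 0]  # two non-neighboring covariates included
    print(ising_log_prior(adjacent, edges, a=-2.0, b=1.0))
    print(ising_log_prior(scattered, edges, a=-2.0, b=1.0))
```

With the same number of included covariates, the adjacent configuration gets the higher prior weight, which is how the graph structure enters model selection.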

Relevance: 20.00%

Abstract:

This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham's-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising asymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains. © Institute of Mathematical Statistics, 2010.
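
One way to see an automatic multiplicity correction is to compare a fixed inclusion probability with one that is given a uniform prior and integrated out. The sketch below assumes a uniform (Beta(1, 1)) prior on the common inclusion probability, chosen here for illustration; the function names are hypothetical and the paper's own priors may differ.

```python
from math import comb


def model_prior_fixed(k: int, m: int, p: float = 0.5) -> float:
    """Prior probability of one specific model with k of m variables, fixed p."""
    return p**k * (1.0 - p) ** (m - k)


def model_prior_uniform(k: int, m: int) -> float:
    """Same, with p ~ Uniform(0, 1) integrated out.

    The integral of p^k (1 - p)^(m - k) over [0, 1] equals 1 / ((m + 1) * C(m, k)).
    """
    return 1.0 / ((m + 1) * comb(m, k))


def prior_odds_against_adding(k: int, m: int) -> float:
    """Prior odds of a given size-k model versus a given size-(k + 1) model.

    Under the uniform prior this equals (m - k) / (k + 1), so it grows with
    the number m of candidate variables.
    """
    return model_prior_uniform(k, m) / model_prior_uniform(k + 1, m)


if __name__ == "__main__":
    for m in (10, 100, 1000):
        print(m, prior_odds_against_adding(1, m))
```

With p fixed at 0.5 the corresponding odds are always 1, so no penalty appears as more candidate variables are considered; with p integrated out, the penalty for adding a variable scales with m.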

Relevance: 20.00%

Abstract:

Genome rearrangement often produces chromosomes with two centromeres (dicentrics) that are inherently unstable because of bridge formation and breakage during cell division. However, mammalian dicentrics, and particularly those in humans, can be quite stable, usually because one centromere is functionally silenced. Molecular mechanisms of centromere inactivation are poorly understood since there are few systems to experimentally create dicentric human chromosomes. Here, we describe a human cell culture model that enriches for de novo dicentrics. We demonstrate that transient disruption of human telomere structure non-randomly produces dicentric fusions involving acrocentric chromosomes. The induced dicentrics vary in structure near fusion breakpoints and, like naturally occurring dicentrics, exhibit various inter-centromeric distances. Many functional dicentrics persist for months after formation. Even those with distantly spaced centromeres remain functionally dicentric for 20 cell generations. Other dicentrics within the population reflect centromere inactivation. In some cases, centromere inactivation occurs by an apparently epigenetic mechanism. In other dicentrics, the size of the alpha-satellite DNA array associated with CENP-A is reduced compared to the same array before dicentric formation. Extra-chromosomal fragments that contained CENP-A often appear in the same cells as dicentrics. Some of these fragments are derived from the same alpha-satellite DNA array as inactivated centromeres. Our results indicate that dicentric human chromosomes undergo alternative fates after formation. Many retain two active centromeres and are stable through multiple cell divisions. Others undergo centromere inactivation. This event occurs within a broad temporal window and can involve deletion of chromatin that marks the locus as a site for CENP-A maintenance/replenishment.

Relevance: 20.00%

Abstract:

Although many feature selection methods for classification have been developed, there is a need to identify genes in high-dimensional data with censored survival outcomes. Traditional methods for gene selection in classification problems have several drawbacks. First, the majority of the gene selection approaches for classification are single-gene based. Second, many of the gene selection procedures are not embedded within the algorithm itself. The technique of random forests has been found to perform well in high-dimensional data settings with survival outcomes. It also has an embedded feature to identify variables of importance. Therefore, it is an ideal candidate for gene selection in high-dimensional data with survival outcomes. In this paper, we develop a novel method based on random forests to identify a set of prognostic genes. We compare our method with several machine learning methods and various node split criteria using several real data sets. Our method performed well in both simulations and real data analysis. Additionally, we have shown the advantages of our approach over single-gene-based approaches. Our method incorporates multivariate correlations in microarray data for survival outcomes. The described method allows us to better utilize the information available from microarray data with survival outcomes.

Relevance: 20.00%

Abstract:

Antigenically variable RNA viruses are significant contributors to the burden of infectious disease worldwide. One reason for their ubiquity is their ability to escape herd immunity through rapid antigenic evolution and thereby to reinfect previously infected hosts. However, the ways in which these viruses evolve antigenically are highly diverse. Some have only limited diversity in the long run, with every emergence of a new antigenic variant coupled with a replacement of the older variant. Other viruses rapidly accumulate antigenic diversity over time. Others still exhibit dynamics that can be considered evolutionary intermediates between these two extremes. Here, we present a theoretical framework that aims to understand these differences in evolutionary patterns by considering a virus's epidemiological dynamics in a given host population. Our framework, based on a dimensionless number, probabilistically anticipates patterns of viral antigenic diversification and thereby quantifies a virus's evolutionary potential. It is therefore similar in spirit to the basic reproduction number, the well-known dimensionless number which quantifies a pathogen's reproductive potential. We further outline how our theoretical framework can be applied to empirical viral systems, using influenza A/H3N2 as a case study. We end with predictions of our framework and work that remains to be done to further integrate viral evolutionary dynamics with disease ecology.

Relevance: 20.00%

Abstract:

© 2015 IOP Publishing Ltd & London Mathematical Society. This is a detailed analysis of invariant measures for one-dimensional dynamical systems with random switching. In particular, we prove the smoothness of the invariant densities away from critical points and describe the asymptotics of the invariant densities at critical points.
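
For readers unfamiliar with random switching, the sketch below simulates a toy one-dimensional system that alternates between two flows, dx/dt = -x and dx/dt = 1 - x, at exponentially distributed times. The two vector fields, the switching rate and the function name are assumptions chosen for illustration and are not taken from the paper.

```python
import math
import random
from typing import List


def simulate_switching(
    n_switches: int, rate: float = 1.0, x0: float = 0.5, seed: int = 0
) -> List[float]:
    """Return the state x recorded at each switching time.

    Between switches, x follows dx/dt = target - x with target in {0, 1},
    which has the exact solution x(t) = target + (x(0) - target) * exp(-t).
    The target flips after an Exponential(rate) holding time.
    """
    rng = random.Random(seed)
    x, target = x0, 0
    states = []
    for _ in range(n_switches):
        tau = rng.expovariate(rate)          # holding time in the current regime
        x = target + (x - target) * math.exp(-tau)
        states.append(x)
        target = 1 - target                  # switch to the other vector field
    return states


if __name__ == "__main__":
    xs = simulate_switching(20000)
    print(min(xs), max(xs), sum(xs) / len(xs))
```

A histogram of a long run approximates the distribution of the state at switching times, which stays inside [0, 1]. The endpoints 0 and 1 are the equilibria of the two flows, the kind of critical points where the abstract describes the asymptotics of the invariant densities.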

Relevance: 20.00%

Abstract:

© 2015 Society for Industrial and Applied Mathematics. We consider parabolic PDEs with randomly switching boundary conditions. In order to analyze these random PDEs, we consider more general stochastic hybrid systems and prove convergence to, and properties of, a stationary distribution. Applying these general results to the heat equation with randomly switching boundary conditions, we find explicit formulae for various statistics of the solution and obtain almost sure results about its regularity and structure. These results are of particular interest for biological applications as well as for their significant departure from behavior seen in PDEs forced by disparate Gaussian noise. Our general results also have applications to other types of stochastic hybrid systems, such as ODEs with randomly switching right-hand sides.

Relevance: 20.00%

Abstract:

MOTIVATION: Technological advances that allow routine identification of high-dimensional risk factors have led to high demand for statistical techniques that enable full utilization of these rich sources of information for genetics studies. Variable selection for censored outcome data as well as control of false discoveries (i.e. inclusion of irrelevant variables) in the presence of high-dimensional predictors present serious challenges. This article develops a computationally feasible method based on boosting and stability selection. Specifically, we modified the component-wise gradient boosting to improve the computational feasibility and introduced random permutation in stability selection for controlling false discoveries. RESULTS: We have proposed a high-dimensional variable selection method by incorporating stability selection to control false discovery. Comparisons between the proposed method and the commonly used univariate and Lasso approaches for variable selection reveal that the proposed method yields fewer false discoveries. The proposed method is applied to study the associations of 2339 common single-nucleotide polymorphisms (SNPs) with overall survival among cutaneous melanoma (CM) patients. The results have confirmed that BRCA2 pathway SNPs are likely to be associated with overall survival, as reported by previous literature. Moreover, we have identified several new Fanconi anemia (FA) pathway SNPs that are likely to modulate survival of CM patients. AVAILABILITY AND IMPLEMENTATION: The related source code and documents are freely available at https://sites.google.com/site/bestumich/issues. CONTACT: yili@umich.edu.