12 results for Probability Density Function

at Duke University


Relevance: 100.00%

Abstract:

© 2010 by the American Geophysical Union. The cross-scale probabilistic structure of rainfall intensity records collected over time scales ranging from hours to decades at sites dominated by both convective and frontal systems is investigated. Across these sites, intermittency build-up from slow to fast time scales is analyzed in terms of heavy-tailed and asymmetric signatures in the scale-wise evolution of rainfall probability density functions (pdfs). The analysis demonstrates that rainfall records dominated by convective storms develop heavier-tailed power law pdfs toward finer scales when compared with their frontal-system counterparts. Also, a concomitant marked asymmetry build-up emerges at such finer time scales. A scale-dependent probabilistic description of such fat tails and asymmetry appearance is proposed based on a modified q-Gaussian model, able to describe the cross-scale rainfall pdfs in terms of the nonextensivity parameter q, a lacunarity (intermittency) correction and a tail asymmetry coefficient, linked to the rainfall generation mechanism.
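
As a rough illustration of why the nonextensivity parameter q controls tail weight, the sketch below evaluates the standard (unmodified) q-Gaussian shape. It does not reproduce the modified model of the abstract: the lacunarity correction and the tail asymmetry coefficient are omitted, and the function names and the scale parameter `beta` are hypothetical choices made for this example only.

```python
import math


def q_gaussian_unnormalized(x: float, q: float, beta: float = 1.0) -> float:
    """Unnormalized standard q-Gaussian, [1 - (1 - q) * beta * x^2]^(1 / (1 - q)).

    For q -> 1 this reduces to the Gaussian shape exp(-beta * x^2).
    For 1 < q < 3 the tails decay as a power law, |x|^(-2 / (q - 1)).
    """
    if abs(q - 1.0) < 1e-12:
        return math.exp(-beta * x * x)
    base = 1.0 - (1.0 - q) * beta * x * x
    # For q < 1 the shape has compact support and is zero where base <= 0.
    if base <= 0.0:
        return 0.0
    return base ** (1.0 / (1.0 - q))


def tail_ratio(q: float, x: float = 10.0) -> float:
    """Density at x relative to the density at the mode (x = 0)."""
    return q_gaussian_unnormalized(x, q) / q_gaussian_unnormalized(0.0, q)


if __name__ == "__main__":
    for q in (1.0, 1.2, 1.5):
        print(f"q = {q}: relative density at x = 10 is {tail_ratio(q):.3e}")
```

Larger q gives a larger relative density far from the mode, which is the sense in which heavier tails correspond to larger q in this family.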

Relevance: 100.00%

Abstract:

Continuing our development of a mathematical theory of stochastic microlensing, we study the random shear and expected number of random lensed images of different types. In particular, we characterize the first three leading terms in the asymptotic expression of the joint probability density function (pdf) of the random shear tensor due to point masses in the limit of an infinite number of stars. Up to this order, the pdf depends on the magnitude of the shear tensor, the optical depth, and the mean number of stars through a combination of radial position and the star's mass. As a consequence, the pdf's of the shear components are seen to converge, in the limit of an infinite number of stars, to shifted Cauchy distributions, which shows that the shear components have heavy tails in that limit. The asymptotic pdf of the shear magnitude in the limit of an infinite number of stars is also presented. All the results on the random microlensing shear are given for a general point in the lens plane. Extending to the general random distributions (not necessarily uniform) of the lenses, we employ the Kac-Rice formula and Morse theory to deduce general formulas for the expected total number of images and the expected number of saddle images. We further generalize these results by considering random sources defined on a countable compact covering of the light source plane. This is done to introduce the notion of global expected number of positive parity images due to a general lensing map. Applying the result to microlensing, we calculate the asymptotic global expected number of minimum images in the limit of an infinite number of stars, where the stars are uniformly distributed. This global expectation is bounded, while the global expected number of images and the global expected number of saddle images diverge as the order of the number of stars. © 2009 American Institute of Physics.

Relevance: 100.00%

Abstract:

A novel approach is proposed to estimate the natural streamflow regime of a river and to assess the extent of the alterations induced by dam operation related to anthropogenic (e.g., agricultural, hydropower) water uses in engineered river basins. The method consists in the comparison between the seasonal probability density function (pdf) of observed streamflows and the purportedly natural streamflow pdf obtained by a recently proposed and validated probabilistic model. The model employs a minimum of landscape and climate parameters and unequivocally separates the effects of anthropogenic regulations from those produced by hydroclimatic fluctuations. The approach is applied to evaluate the extent of the alterations of intra-annual streamflow variability in a highly engineered alpine catchment of north-eastern Italy, the Piave river. Streamflows observed downstream of the regulation devices in the Piave catchment are found to exhibit smaller means/modes, larger coefficients of variation, and more pronounced peaks than the flows that would be observed in the absence of anthropogenic regulation, suggesting that the anthropogenic disturbance leads to remarkable reductions of river flows, with an increase of the streamflow variability and of the frequency of preferential states far from the mean. Some structural limitations of management approaches based on minimum streamflow requirements (widely used to guide water policies) as opposed to criteria based on whole distributions are also discussed. Copyright © 2010 by the American Geophysical Union.

Relevance: 100.00%

Abstract:

The time reversal of stochastic diffusion processes is revisited with emphasis on the physical meaning of the time-reversed drift and the noise prescription in the case of multiplicative noise. The local kinematics and mechanics of free diffusion are linked to the hydrodynamic description. These properties also provide an interpretation of the Pope-Ching formula for the steady-state probability density function along with a geometric interpretation of the fluctuation-dissipation relation. Finally, the statistics of the local entropy production rate of diffusion are discussed in the light of local diffusion properties, and a stochastic differential equation for entropy production is obtained using the Girsanov theorem for reversed diffusion. The results are illustrated for the Ornstein-Uhlenbeck process.
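
For readers unfamiliar with the Ornstein-Uhlenbeck process used as the illustration, the following minimal sketch simulates dX = -theta X dt + sigma dW with its exact one-step update and compares the sample variance with the known steady-state value sigma^2 / (2 theta). It does not implement the time-reversal or entropy-production results of the abstract; the function names, parameter values, and seed are assumptions made for this example.

```python
import math
import random


def simulate_ou(theta: float, sigma: float, dt: float, n_steps: int, seed: int = 0) -> list:
    """Sample an Ornstein-Uhlenbeck path using the exact transition density."""
    rng = random.Random(seed)
    decay = math.exp(-theta * dt)
    # Standard deviation of the Gaussian increment over one step of length dt.
    noise_std = math.sqrt(sigma ** 2 / (2.0 * theta) * (1.0 - decay ** 2))
    x = 0.0
    path = []
    for _ in range(n_steps):
        x = x * decay + noise_std * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def sample_variance(values: list) -> float:
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


if __name__ == "__main__":
    path = simulate_ou(theta=1.0, sigma=1.0, dt=0.1, n_steps=200_000)
    print("sample variance:", sample_variance(path))
    print("steady-state variance sigma^2 / (2 theta):", 0.5)
```

The steady-state pdf here is Gaussian with zero mean, so matching the variance is enough to check the simulated path against it.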

Relevance: 100.00%

Abstract:

Dynamics of biomolecules over various spatial and time scales are essential for biological functions such as molecular recognition, catalysis and signaling. However, reconstruction of biomolecular dynamics from experimental observables requires the determination of a conformational probability distribution. Unfortunately, these distributions cannot be fully constrained by the limited information from experiments, making the problem an ill-posed one in the terminology of Hadamard. The ill-posed nature of the problem comes from the fact that it has no unique solution. Multiple or even an infinite number of solutions may exist. To avoid the ill-posed nature, the problem needs to be regularized by making assumptions, which inevitably introduce biases into the result.

Here, I present two continuous probability density function approaches to solve an important inverse problem called the RDC trigonometric moment problem. By focusing on interdomain orientations we reduced the problem to determination of a distribution on the 3D rotational space from residual dipolar couplings (RDCs). We derived an analytical equation that relates alignment tensors of adjacent domains, which serves as the foundation of the two methods. In the first approach, the ill-posed nature of the problem was avoided by introducing a continuous distribution model, which enjoys a smoothness assumption. To find the optimal solution for the distribution, we also designed an efficient branch-and-bound algorithm that exploits the mathematical structure of the analytical solutions. The algorithm is guaranteed to find the distribution that best satisfies the analytical relationship. We observed good performance of the method when tested under various levels of experimental noise and when applied to two protein systems. The second approach avoids the use of any model by employing maximum entropy principles. This 'model-free' approach delivers the least biased result which presents our state of knowledge. In this approach, the solution is an exponential function of Lagrange multipliers. To determine the multipliers, a convex objective function is constructed. Consequently, the maximum entropy solution can be found easily by gradient descent methods. Both algorithms can be applied to biomolecular RDC data in general, including data from RNA and DNA molecules.
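
The structure of the maximum entropy approach (an exponential-family solution whose Lagrange multipliers minimize a convex objective by gradient descent) can be sketched on a toy problem. The example below is not the RDC trigonometric moment problem: it finds the maximum entropy distribution over the six faces of a die subject to a single mean constraint, and every name and parameter value in it is a hypothetical choice for illustration.

```python
import math


def maxent_distribution(states, target_mean, step=0.1, n_iter=2000):
    """Maximum entropy pmf on `states` with a prescribed mean.

    The solution has the form p_i proportional to exp(lam * x_i). The
    multiplier lam minimizes the convex dual objective
        D(lam) = log(sum_i exp(lam * x_i)) - lam * target_mean,
    whose gradient is E_p[x] - target_mean.
    """
    lam = 0.0
    for _ in range(n_iter):
        weights = [math.exp(lam * x) for x in states]
        z = sum(weights)
        mean = sum(w * x for w, x in zip(weights, states)) / z
        lam -= step * (mean - target_mean)  # plain gradient descent step
    weights = [math.exp(lam * x) for x in states]
    z = sum(weights)
    return [w / z for w in weights], lam


if __name__ == "__main__":
    probs, lam = maxent_distribution([1, 2, 3, 4, 5, 6], target_mean=4.5)
    print("multiplier:", lam)
    print("probabilities:", [round(p, 4) for p in probs])
```

Because the dual objective is convex in the multiplier, a fixed small step size is enough here; a problem with many constraints would have one multiplier per constraint and the same gradient form.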

Relevance: 80.00%

Abstract:

The paper investigates stochastic processes forced by independent and identically distributed jumps occurring according to a Poisson process. The impact of different distributions of the jump amplitudes is analyzed for processes with linear drift. Exact expressions of the probability density functions are derived when jump amplitudes are distributed as exponential, gamma, and mixture of exponential distributions for both natural and reflecting boundary conditions. The mean level-crossing properties are studied in relation to the different jump amplitudes. As an example of application of the previous theoretical derivations, the role of different rainfall-depth distributions in an existing stochastic soil water balance model is analyzed. It is shown how the shape of the distribution of daily rainfall depths plays a more relevant role in the soil moisture probability distribution as the rainfall frequency decreases, as predicted by future climatic scenarios. © 2010 The American Physical Society.
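
A minimal simulation sketch of the simplest such process follows, under assumptions not stated in the abstract: the linear drift is taken to be a loss term, dx/dt = -k x, and jump amplitudes are exponential. For that special case the steady-state pdf is a gamma distribution with shape rate/k and scale equal to the mean jump, so the steady-state mean is rate * mean_jump / k. The function and parameter names are hypothetical.

```python
import math
import random


def time_averaged_level(rate: float, decay: float, mean_jump: float,
                        n_jumps: int, seed: int = 0) -> float:
    """Time average of x(t) for dx/dt = -decay * x plus Poisson jumps.

    Jumps arrive at Poisson rate `rate` with exponentially distributed
    amplitudes of mean `mean_jump`. Between jumps the decay is integrated
    exactly, so there is no time-discretization error.
    """
    rng = random.Random(seed)
    x = 0.0
    total_time = 0.0
    integral = 0.0
    for _ in range(n_jumps):
        wait = rng.expovariate(rate)  # waiting time to the next jump
        shrink = math.exp(-decay * wait)
        integral += x * (1.0 - shrink) / decay  # integral of x over the wait
        x *= shrink
        total_time += wait
        x += rng.expovariate(1.0 / mean_jump)  # exponential jump amplitude
    return integral / total_time


if __name__ == "__main__":
    avg = time_averaged_level(rate=2.0, decay=1.0, mean_jump=0.5, n_jumps=100_000)
    print("simulated time-averaged level:", avg)
    print("steady-state mean rate * mean_jump / decay:", 2.0 * 0.5 / 1.0)
```

Swapping the last `expovariate` call for a different amplitude sampler is one way to explore how the jump distribution changes the steady state, which is the comparison the abstract describes analytically.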

Relevance: 80.00%

Abstract:

BACKGROUND: Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. RESULTS: Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), rhinovirus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of performing nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. CONCLUSIONS: Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data.

Relevance: 80.00%

Abstract:

Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size, despite huge increases in the value of n typically seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n=all" is of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.

Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.

One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.

Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.

In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis--Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis--Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.

Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.

The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo. The Markov Chain Monte Carlo method is the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov Chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.

Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.

Relevance: 80.00%

Abstract:

Recent theoretical advances predict the existence, deep into the glass phase, of a novel phase transition, the so-called Gardner transition. This transition is associated with the emergence of a complex free energy landscape composed of many marginally stable sub-basins within a glass metabasin. In this study, we explore several methods to detect numerically the Gardner transition in a simple structural glass former, the infinite-range Mari-Kurchan model. The transition point is robustly located from three independent approaches: (i) the divergence of the characteristic relaxation time, (ii) the divergence of the caging susceptibility, and (iii) the abnormal tail in the probability distribution function of cage order parameters. We show that the numerical results are fully consistent with the theoretical expectation. The methods we propose may also be generalized to more realistic numerical models as well as to experimental systems.

Relevance: 30.00%

Abstract:

Based on Pulay's direct inversion iterative subspace (DIIS) approach, we present a method to accelerate self-consistent field (SCF) convergence. In this method, the quadratic augmented Roothaan-Hall (ARH) energy function, proposed recently by Høst and co-workers [J. Chem. Phys. 129, 124106 (2008)], is used as the object of minimization for obtaining the linear coefficients of Fock matrices within DIIS. This differs from the traditional DIIS of Pulay, which uses an object function derived from the commutator of the density and Fock matrices. Our results show that the present algorithm, abbreviated ADIIS, is more robust and efficient than the energy-DIIS (EDIIS) approach. In particular, several examples demonstrate that the combination of ADIIS and DIIS ("ADIIS+DIIS") is highly reliable and efficient in accelerating SCF convergence.

Relevance: 30.00%

Abstract:

Exogenous gene delivery to alter the function of the heart is a potential novel therapeutic strategy for treatment of cardiovascular diseases such as heart failure (HF). Before gene therapy approaches to alter cardiac function can be realized, efficient and reproducible in vivo gene techniques must be established to efficiently transfer transgenes globally to the myocardium. We have been testing the hypothesis that genetic manipulation of the myocardial beta-adrenergic receptor (beta-AR) system, which is impaired in HF, can enhance cardiac function. We have delivered adenoviral transgenes, including the human beta2-AR (Adeno-beta2AR), to the myocardium of rabbits using an intracoronary approach. Catheter-mediated Adeno-beta2AR delivery produced diffuse multichamber myocardial expression, peaking 1 week after gene transfer. A total of 5 × 10^11 viral particles of Adeno-beta2AR reproducibly produced 5- to 10-fold beta-AR overexpression in the heart, which, at 7 and 21 days after delivery, resulted in increased in vivo hemodynamic function compared with control rabbits that received an empty adenovirus. Several physiological parameters, including dP/dtmax as a measure of contractility, were significantly enhanced basally and showed increased responsiveness to the beta-agonist isoproterenol. Our results demonstrate that global myocardial in vivo gene delivery is possible and that genetic manipulation of beta-AR density can result in enhanced cardiac performance. Thus, replacement of lost receptors seen in HF may represent novel inotropic therapy.

Relevance: 30.00%

Abstract:

© 2015 IOP Publishing Ltd and Deutsche Physikalische Gesellschaft. A key component in calculations of exchange and correlation energies is the Coulomb operator, which requires the evaluation of two-electron integrals. For localized basis sets, these four-center integrals are most efficiently evaluated with the resolution of identity (RI) technique, which expands basis-function products in an auxiliary basis. In this work we show the practical applicability of a localized RI-variant ('RI-LVL'), which expands products of basis functions only in the subset of those auxiliary basis functions which are located at the same atoms as the basis functions. We demonstrate the accuracy of RI-LVL for Hartree-Fock calculations, for the PBE0 hybrid density functional, as well as for RPA and MP2 perturbation theory. Molecular test sets used include the S22 set of weakly interacting molecules, the G3 test set, as well as the G2-1 and BH76 test sets, and heavy elements including titanium dioxide, copper and gold clusters. Our RI-LVL implementation paves the way for linear-scaling RI-based hybrid functional calculations for large systems and for all-electron many-body perturbation theory with significantly reduced computational and memory cost.