945 results for Probability Distribution Function


Relevance:

90.00%

Publisher:

Abstract:

2000 Mathematics Subject Classification: 60J80.

Relevance:

90.00%

Publisher:

Abstract:

2000 Mathematics Subject Classification: 60K10, 62P05

Relevance:

90.00%

Publisher:

Abstract:

2010 Mathematics Subject Classification: 60E05, 62P05.

Relevance:

90.00%

Publisher:

Abstract:

We examined the anatomy of expanding, mature, and senescing leaves of tropical plants for the presence of red pigments: anthocyanins and betacyanins. We studied 463 species in 370 genera, belonging to 94 families. These included 21 species from five families in the Caryophyllales, where betacyanins are the basis for red color. We also included 14 species of ferns and gymnosperms in seven families and 29 species with undersurface coloration at maturity. We analyzed 399 angiosperm species (74 families) for factors (especially developmental and evolutionary) influencing anthocyanin production during expansion and senescence. During expansion, 44.9% of species produced anthocyanins; during senescence, only 13.5% did. At both stages, relatively few patterns of tissue distribution developed, primarily in the mesophyll, and very few taxa produced anthocyanins in dermal and ground tissue simultaneously. Of the 35 species producing anthocyanins both in development and senescence, most had similar cellular distributions at the two stages. Anthocyanin distributions were identical in different developing leaves of three heteroblastic taxa. Phylogeny has influenced the distribution of anthocyanins in the epidermis and mesophyll of expanding leaves and in the palisade parenchyma during senescence, although these influences are not strong. Betacyanins appear to have similar distributions in leaves of taxa within the Caryophyllales and, perhaps, similar functions. The presence of anthocyanins in the mesophyll of so many species is inconsistent with the hypothesis of protection against UV damage or fungal pathogens, and the differing tissue distributions indicate that the pigments may function in different ways, such as photoprotection and free-radical scavenging.

Relevance:

90.00%

Publisher:

Abstract:

Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, yet uncertainty quantification remains essential in the sciences, where the number of parameters to estimate often exceeds the sample size despite the huge increases in n typically seen in many fields. The tendency in some areas of industry to dispense with traditional statistical analysis on the grounds that "n=all" is therefore of little relevance outside of certain narrow applications. In most fields, the main result of the Big Data revolution has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and it is the primary motivation for the work presented here.

Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is the design and characterization of computational algorithms that scale better in n or p. In the first case, the focus is on joint inference outside the standard setting of multivariate continuous data that has been the major focus of previous theoretical work in this area. In the second, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms and for characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.

One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
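
As a toy illustration of the tensor view (a minimal sketch with assumed dimensions, not code from the thesis), a latent class model expresses the joint pmf of p categorical variables as a nonnegative PARAFAC factorization, so the probability tensor has nonnegative rank at most the number of latent classes:

```python
import numpy as np

rng = np.random.default_rng(0)
p, d, k = 3, 4, 2        # 3 variables with 4 levels each, 2 latent classes (toy sizes)

nu = rng.dirichlet(np.ones(k))                    # latent class weights
psi = rng.dirichlet(np.ones(d), size=(k, p))      # psi[h, j] = P(x_j = . | class h)

def pmf(x, nu, psi):
    """PARAFAC form of the joint pmf: sum_h nu_h * prod_j psi[h, j, x_j]."""
    probs = nu.copy()
    for j, xj in enumerate(x):
        probs *= psi[:, j, xj]
    return probs.sum()

# The full d x d x d probability tensor has nonnegative rank <= k by construction.
table = np.array([[[pmf((a, b, c), nu, psi) for c in range(d)]
                   for b in range(d)] for a in range(d)])
assert np.isclose(table.sum(), 1.0)
```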

Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and we provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations, and in other common population structure inference problems, is assessed in simulations and a real data application.

In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis--Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4, we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis--Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
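
For intuition, the sketch below builds a generic Gaussian (mode-and-Hessian, Laplace-style) approximation to a toy Poisson log-linear posterior. This is a stand-in, not the optimal-KL approximation derived in Chapter 4, and the design, counts, and Gaussian prior are assumed for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Toy Poisson log-linear model with a N(0, I) prior standing in for a
# conjugate prior (assumption: generic Laplace/Gaussian approximation only).
X = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.]])   # design matrix
y = np.array([2., 3., 6., 14.])                          # cell counts

def neg_log_post(theta):
    eta = X @ theta
    return -(y @ eta - np.exp(eta).sum()) + 0.5 * theta @ theta

opt = minimize(neg_log_post, np.zeros(2), method="BFGS")
mean = opt.x                                  # Gaussian mean = posterior mode
# Hessian of the negative log posterior at the mode gives the precision matrix
W = np.diag(np.exp(X @ mean))
cov = np.linalg.inv(X.T @ W @ X + np.eye(2))  # Gaussian covariance
print(mean, cov)
```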

Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.
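
The basic observable in this paradigm is straightforward to extract; a minimal sketch (toy heavy-tailed series, assumed threshold level) computes waiting times between threshold exceedances:

```python
import numpy as np

def exceedance_waiting_times(x, threshold):
    """Waiting times (in index units) between exceedances of a high threshold."""
    times = np.flatnonzero(x > threshold)
    return np.diff(times)

rng = np.random.default_rng(1)
series = rng.standard_t(df=3, size=10000)    # heavy-tailed toy series
u = np.quantile(series, 0.99)                # high threshold (assumed level)
waits = exceedance_waiting_times(series, u)
print(waits.mean(), waits.std())             # inputs to inference on tail dependence
```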

The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo (MCMC), the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel, but comparatively little attention has been paid to convergence and estimation error in the resulting approximating Markov chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.
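
A minimal sketch of one such approximation, under an assumed toy Gaussian model: a Metropolis-Hastings chain whose log-likelihood is estimated from a random subset of the data. The estimate perturbs the exact transition kernel, which is precisely the kind of error the framework trades off against per-iteration cost:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(1.0, 1.0, size=100000)   # toy model: unknown mean, known variance

def subsampled_loglik(theta, m=1000):
    """Log-likelihood estimated from a random subset of size m; plugging this
    into Metropolis-Hastings yields an approximating (perturbed) kernel."""
    idx = rng.integers(0, data.size, m)
    return data.size / m * np.sum(-0.5 * (data[idx] - theta) ** 2)

def approximate_mh(n_iter=5000, step=0.01):
    theta, ll = 0.0, subsampled_loglik(0.0)
    chain = np.empty(n_iter)
    for t in range(n_iter):
        prop = theta + step * rng.normal()
        ll_prop = subsampled_loglik(prop)
        if np.log(rng.random()) < ll_prop - ll:   # flat prior, symmetric proposal
            theta, ll = prop, ll_prop
        chain[t] = theta
    return chain

# The chain targets the exact posterior only approximately; the estimation
# error depends on the subset size m and the computational budget.
print(approximate_mh().mean())
```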

Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
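
A minimal sketch of the truncated-normal sampler in the rare-events regime (the standard Albert-Chib construction with a flat prior; the sizes and data below are toy assumptions):

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(3)

def probit_da_gibbs(X, y, n_iter=1000):
    """Albert-Chib truncated-normal data augmentation for probit regression
    (flat prior on beta)."""
    n, p = X.shape
    V = np.linalg.inv(X.T @ X)       # conditional posterior covariance of beta
    L = np.linalg.cholesky(V)
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        mu = X @ beta
        # Latent z_i ~ N(mu_i, 1), truncated to (0, inf) if y_i = 1, else (-inf, 0)
        lo = np.where(y == 1, -mu, -np.inf)
        hi = np.where(y == 1, np.inf, -mu)
        z = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)
        beta = V @ (X.T @ z) + L @ rng.standard_normal(p)
        draws[t] = beta
    return draws

# Rare-events regime: large n, few observed successes -> expect slow mixing
# and high autocorrelation, as shown in Chapter 7.
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (rng.random(n) < 0.01).astype(int)    # ~1% successes
print(probit_da_gibbs(X, y)[200:].mean(axis=0))   # crude posterior mean after burn-in
```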

Relevance:

90.00%

Publisher:

Abstract:

Dynamics of biomolecules over various spatial and time scales are essential for biological functions such as molecular recognition, catalysis, and signaling. However, reconstruction of biomolecular dynamics from experimental observables requires the determination of a conformational probability distribution. Unfortunately, these distributions cannot be fully constrained by the limited information available from experiments, making the problem ill-posed in the terminology of Hadamard: it has no unique solution, and multiple or even infinitely many solutions may exist. To cope with this ill-posedness, the problem must be regularized by making assumptions, which inevitably introduce biases into the result.

Here, I present two continuous probability density function approaches to solve an important inverse problem called the RDC trigonometric moment problem. By focusing on interdomain orientations, we reduced the problem to the determination of a distribution on the 3D rotational space from residual dipolar couplings (RDCs). We derived an analytical equation that relates the alignment tensors of adjacent domains, which serves as the foundation of the two methods. In the first approach, the ill-posedness is avoided by introducing a continuous distribution model built on a smoothness assumption. To find the optimal distribution, we also designed an efficient branch-and-bound algorithm that exploits the mathematical structure of the analytical solutions and is guaranteed to find the distribution that best satisfies the analytical relationship. We observed good performance of the method when tested under various levels of experimental noise and when applied to two protein systems. The second approach avoids the use of any model by employing maximum entropy principles; this 'model-free' approach delivers the least biased result that represents our state of knowledge. In this approach, the solution is an exponential function of Lagrange multipliers. To determine the multipliers, a convex objective function is constructed, so the maximum entropy solution can be found easily by gradient descent methods. Both algorithms can be applied to biomolecular RDC data in general, including data from RNA and DNA molecules.
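
To illustrate the maximum entropy construction (a toy 1D sketch with assumed basis functions and moment values; the actual problem lives on the 3D rotational space), the solution has the form p(x) proportional to exp(sum_k lambda_k f_k(x)), with the multipliers found by gradient descent on the convex dual:

```python
import numpy as np

# Maximum entropy on a discretized domain: p(x) ~ exp(sum_k lam_k f_k(x)),
# with lam chosen so that the model moments match the measured ones.
x = np.linspace(-np.pi, np.pi, 400)                 # stand-in 1D grid
f = np.stack([np.cos(x), np.sin(x), np.cos(2*x)])   # assumed basis functions
c = np.array([0.30, 0.10, 0.05])                    # "measured" moments (toy values)

def dual(lam):
    """Convex dual objective: log partition function minus lam . c."""
    return np.log(np.exp(lam @ f).sum()) - lam @ c

lam = np.zeros(3)
for _ in range(2000):                 # plain gradient descent on the convex dual
    w = np.exp(lam @ f)
    p = w / w.sum()
    lam -= 0.5 * (f @ p - c)          # gradient = model moments - target moments

print(dual(lam), f @ p)   # moments ~= c: the least biased density given the data
```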

Relevance:

90.00%

Publisher:

Abstract:

This work presents a computational code, called MOMENTS, developed for use in process control to determine a characteristic transfer function of industrial units when radiotracer techniques are applied to study the unit's performance. The methodology is based on measuring the residence time distribution (RTD) function and calculating the first and second temporal moments of the tracer data obtained by two NaI scintillation detectors positioned to register the complete tracer movement inside the unit. A non-linear regression technique is used to fit various mathematical models, and a statistical test selects the best result for the transfer function. Using the MOMENTS code, twelve different models can be fitted to a curve to calculate technical parameters of the unit.
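
A minimal sketch of the moment calculation (hypothetical variable names, not the MOMENTS code itself): normalize the tracer curve to an RTD and integrate for the first and second temporal moments:

```python
import numpy as np

def trapezoid(y, t):
    """Trapezoidal integration of y(t)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(t) / 2.0))

def rtd_moments(t, c):
    """First temporal moment (mean residence time) and second central moment
    (variance) of a tracer curve c(t), used to select a transfer-function model."""
    E = c / trapezoid(c, t)                  # residence time distribution E(t)
    mean = trapezoid(t * E, t)               # first moment
    var = trapezoid((t - mean) ** 2 * E, t)  # second central moment
    return mean, var

t = np.linspace(0, 60, 601)        # time (s)
c = t * np.exp(-t / 5.0)           # toy detector response
print(rtd_moments(t, c))           # mean ~ 10 s, variance ~ 50 s^2 for this curve
```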

Relevance:

90.00%

Publisher:

Abstract:

This thesis addresses the industrial and economic optimization of structural elements made of ultra-high performance fiber-reinforced concrete (UHPFRC, or BFUP), guaranteeing ductility at the structural level while adjusting the fiber content and optimizing the fabrication method. The developed model explicitly describes the contribution of the fiber reinforcement in tension at the local level, chaining a strain-hardening phase followed by a softening phase. The constitutive law is a function of the fiber density, the orientation of the fibers with respect to the principal tensile directions, their aspect ratio, and other usual material parameters related to the fibers, the cementitious matrix, and their interaction. Fiber orientation is taken into account through a normal probability law with one or two variables, which can reproduce any orientation obtained from a computation representative of the casting of fresh UHPFRC or measured experimentally on a prototype. Finally, the model reproduces the cracking of UHPFRC on the principle of smeared, rotating crack models. The constitutive law is integrated into a finite element structural analysis code, allowing it to be used as a predictive tool for the reliability and global ductility of UHPFRC elements. Two experimental campaigns were carried out, one at Université Laval in Quebec and the other at Ifsttar, Marne-la-Vallée. The first validates the model's ability to reproduce the global behavior under typical tensile and bending loads in simple structural elements for which the preferential fiber orientation was determined by tomography. The second campaign demonstrates the model's capabilities in an optimization process, for the fabrication of relatively complex ribbed plates of potential industrial interest, for which different fabrication methods and UHPFRC mixes with varying fiber contents were considered. The fiber distribution and orientation were controlled through mechanical tests on cored samples. The model predictions were compared with the experimentally observed global structural behavior and ductility. The model could thus be qualified against the usual analytical engineering methods, taking statistical variability into account. Directions for improvement and further development were identified.
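
As a toy illustration of the orientation law (parameter values assumed; the actual model uses a one- or two-variable normal law informed by casting simulations or tomography): sample fiber angles relative to the principal tensile direction and compute a simple orientation factor:

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed quantities: fiber orientation angle theta relative to the principal
# tensile direction drawn from a one-variable normal law, and a simple
# orientation factor <cos^2 theta> feeding a tensile constitutive law.
mu, sigma = 0.0, np.deg2rad(25.0)     # mean and spread of the orientation law
theta = rng.normal(mu, sigma, size=100000)
orientation_factor = np.mean(np.cos(theta) ** 2)
print(orientation_factor)   # -> 1 for perfectly aligned fibers, lower as spread grows
```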

Relevance:

90.00%

Publisher:

Abstract:

In this thesis, wind wave prediction and analysis in the Southern Caspian Sea are surveyed. This subject is of great importance for reducing loss of life and financial damage in marine activities such as monitoring marine pollution, designing marine structures, shipping, fishing, the offshore industry, and tourism. The study uses Caspian Sea topography data extracted from the Caspian Sea hydrography map of the Iran Armed Forces Geographical Organization, 10-meter wind field data extracted from the GTS synoptic data transmitted by regional centers to the Forecasting Center of the Iran Meteorological Organization for wave prediction, and the 20012 waves recorded by the oil company's buoy located 28 kilometers off the Neka shore for wave analysis. The results of this research are as follows. Because of the disagreement between the predictions of the SMB method for the Caspian Sea and the wave data from the Anzali and Neka buoys, the SMB method is not able to predict wave characteristics in the Southern Caspian Sea. Because of the relatively good agreement between the WAM model output for the Caspian Sea and the wave data from the Anzali buoy, the WAM model is able to predict wave characteristics in the Southern Caspian Sea with relatively high accuracy. The extreme wave height distribution function fitted to the Southern Caspian Sea wave data is obtained by determining the free parameters of the Poisson-Gumbel function through the moment method; these parameters are A = 2.41 and B = 0.33. The maximum relative error between the 4-year return value of significant wave height estimated by this function and the wave data of the Neka buoy is about 35%. The 100-year return value of the Southern Caspian Sea significant wave height is about 4.97 meters. The maximum relative error between the 4-year return value estimated by the peaks-over-threshold statistical model and the wave data of the Neka buoy is about 2.28%. The parametric relation fitted to the Southern Caspian Sea frequency spectra is obtained by determining the free parameters of the Strekalov, Massel, and Krylov et al. multipeak spectra through a mathematical method; these parameters are A = 2.9, B = 26.26, C = 0.0016, m = 0.19, and n = 3.69. The maximum relative error between the calculated free parameters of the Southern Caspian Sea multipeak spectrum and the free parameters of the double-peaked spectrum proposed by Massel and Strekalov from experimental Caspian Sea data is about 36.1% in the energetic part of the spectrum and about 74M% in the high-frequency part. The peaks-over-threshold wave rose of the Southern Caspian Sea shows that the maximum occurrence probability corresponds to waves with heights of 2-2.5 meters. The error sources in the statistical analysis are mainly due to: 1) wave data missing over a 2-year period because of battery discharge of the Neka buoy; and 2) the 15% deviation of the single-year annual mean significant height from the long-period average value, caused by a lack of adequate measurements of oceanic waves. The error sources in the spectral analysis are mainly due to the above items and to the low accuracy of the proposed free parameters of the double-peaked spectrum from the experimental Caspian Sea data.
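
For reference, a plain Gumbel method-of-moments fit and return-level calculation (a simplified sketch with toy annual maxima; the thesis uses the Poisson-Gumbel function, whose fitted parameters A and B are given above):

```python
import numpy as np

def gumbel_moment_fit(annual_maxima):
    """Method-of-moments Gumbel fit (plain Gumbel, not the Poisson-Gumbel above)."""
    mean, std = np.mean(annual_maxima), np.std(annual_maxima, ddof=1)
    beta = std * np.sqrt(6) / np.pi     # scale parameter
    mu = mean - 0.5772 * beta           # location (Euler-Mascheroni constant)
    return mu, beta

def return_level(mu, beta, T):
    """T-year return value: the quantile with annual exceedance probability 1/T."""
    return mu - beta * np.log(-np.log(1 - 1 / T))

hs = np.array([2.1, 2.8, 3.3, 2.5, 3.9, 3.0, 2.7, 3.5])   # toy annual maxima (m)
mu, beta = gumbel_moment_fit(hs)
print(return_level(mu, beta, 100))   # 100-year significant wave height estimate
```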

Relevance:

90.00%

Publisher:

Abstract:

This paper proposes a new memetic evolutionary algorithm to achieve explicit learning in rule-based nurse rostering, which involves applying a set of heuristic rules for each nurse's assignment. The main framework of the algorithm is an estimation of distribution algorithm, in which an ant-miner methodology improves the individual solutions produced in each generation. Unlike our previous work (where learning is implicit), the learning in the memetic estimation of distribution algorithm is explicit, i.e. we are able to identify building blocks directly. The overall approach learns by building a probabilistic model, i.e. an estimation of the probability distribution of individual nurse-rule pairs that are used to construct schedules. The local search processor (i.e. the ant-miner) reinforces nurse-rule pairs that receive higher rewards. A challenging real world nurse rostering problem is used as the test problem. Computational results show that the proposed approach outperforms most existing approaches. It is suggested that the learning methodologies suggested in this paper may be applied to other scheduling problems where schedules are built systematically according to specific rules.
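
A minimal sketch of the estimation-of-distribution step (toy fitness function and sizes; the ant-miner local search is omitted): maintain a probability model over nurse-rule pairs, sample schedules from it, and re-estimate the model from the elite samples:

```python
import numpy as np

rng = np.random.default_rng(5)
n_nurses, n_rules = 10, 4    # toy sizes; rules are the schedule-building heuristics

def fitness(assign):
    """Stand-in objective; real rostering would score the schedule the rules build."""
    return -np.abs(assign - np.arange(n_nurses) % n_rules).sum()

# P[i, r] = probability that heuristic rule r builds nurse i's assignment
P = np.full((n_nurses, n_rules), 1.0 / n_rules)
for gen in range(50):
    pop = np.stack([
        np.array([rng.choice(n_rules, p=P[i]) for i in range(n_nurses)])
        for _ in range(40)
    ])
    elite = pop[np.argsort([fitness(s) for s in pop])[-10:]]   # keep the best 10
    for i in range(n_nurses):                                  # re-estimate the model
        counts = np.bincount(elite[:, i], minlength=n_rules)
        P[i] = 0.9 * counts / counts.sum() + 0.1 / n_rules     # smoothing

print(P.round(2))   # probabilities concentrate on high-reward nurse-rule pairs
```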

Relevance:

90.00%

Publisher:

Abstract:

The service of a critical infrastructure, such as a municipal wastewater treatment plant (MWWTP), is taken for granted until a flood or another low-frequency, high-consequence crisis brings its fragility to attention. The unique aspects of the MWWTP call for a method to quantify the flood stage-duration-frequency relationship. By developing a bivariate joint distribution model of flood stage and duration, this study adds a second dimension, time, to flood risk studies. A new parameter, inter-event time, is developed to further illustrate the effect of event separation on the frequency assessment. The method is tested on riverine, estuary, and tidal sites in the Mid-Atlantic region. Equipment damage functions are characterized by linear and step damage models. The Expected Annual Damage (EAD) of the underground equipment is then estimated from the parametric joint distribution model, which is a function of both flood stage and duration, demonstrating the application of the bivariate model in risk assessment. Flood likelihood may change under climate change, so a sensitivity analysis method is developed to assess future flood risk by estimating flood frequency under conditions of higher sea level and of stream flow response to increased precipitation intensity. Scenarios based on steady and unsteady flow analysis are generated for the current climate, future climate within this century, and future climate beyond this century, consistent with the MWWTP planning horizons. The spatial extent of flood risk is visualized by inundation mapping and a GIS-Assisted Risk Register (GARR). This research will help stakeholders of this critical infrastructure become aware of the flood risk, vulnerability, and the inherent uncertainty.
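
As a toy illustration of how a bivariate stage-duration model feeds the EAD calculation (all distributions, thresholds, and rates below are assumed stand-ins, not the study's fitted values):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(6)

# Toy bivariate (stage, duration) model: Gaussian copula with lognormal margins,
# an assumed stand-in for the fitted joint distribution in the study.
rho = 0.6
z = multivariate_normal.rvs(mean=[0, 0], cov=[[1, rho], [rho, 1]],
                            size=100000, random_state=rng)
stage = np.exp(0.5 + 0.4 * z[:, 0])       # flood stage (m), lognormal margin
duration = np.exp(2.0 + 0.5 * z[:, 1])    # flood duration (h), lognormal margin

def damage(stage, duration):
    """Step damage model: underground equipment fails past a stage threshold,
    with losses growing the longer it stays submerged."""
    return np.where(stage > 2.0, 50000 + 1000 * np.minimum(duration, 72), 0.0)

events_per_year = 0.8                     # assumed flood event rate
ead = events_per_year * damage(stage, duration).mean()
print(f"Expected Annual Damage ~ ${ead:,.0f}")
```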

Relevance:

90.00%

Publisher:

Abstract:

There are different applications in Engineering that require computing improper integrals of the first kind (integrals defined on an unbounded domain), such as: the work required to move an object from the surface of the Earth to infinity (kinetic energy), the electric potential created by a charged sphere, the probability density function and the cumulative distribution function in Probability Theory, the values of the Gamma function (useful for computing the Beta function, which is used to compute trigonometric integrals), and the Laplace and Fourier transforms (very useful, for example, in Differential Equations).
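
A short numerical sketch of two such integrals, using scipy's adaptive quadrature over an unbounded domain: the Gamma function evaluated at 5, and an upper tail probability of the standard normal density:

```python
import numpy as np
from scipy.integrate import quad

# Improper integrals of the first kind: the upper limit is infinity.
gamma_5, _ = quad(lambda t: t**4 * np.exp(-t), 0, np.inf)    # Gamma(5) = 4! = 24
tail, _ = quad(lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi), 1.96, np.inf)
print(gamma_5, tail)   # ~24.0 and ~0.025 (standard normal upper tail beyond 1.96)
```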

Relevance:

90.00%

Publisher:

Abstract:

This paper shows that the proposed Rician shadowed model for multi-antenna communications allows for the unification of a wide set of models, both for multiple-input multiple-output (MIMO) and single-input single-output (SISO) communications. The MIMO Rayleigh and MIMO Rician models can be deduced from the MIMO Rician shadowed model, as can their SISO counterparts. Other, more general SISO models besides the Rician shadowed are also included, such as the κ-μ and its recent generalization, the κ-μ shadowed model; moreover, the SISO η-μ and Nakagami-q models are included in the MIMO Rician shadowed model as well. The literature already gives the probability density function (pdf) of the Rician shadowed Gram channel matrix in terms of the well-known gamma-Wishart distribution. We here derive its moment generating function in a tractable form. Closed-form expressions for the cumulative distribution function and for the pdf of the maximum eigenvalue are also derived.
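
A Monte Carlo sketch of the SISO Rician shadowed envelope for intuition (assumed toy parameters; the paper itself works with the Gram channel matrix and the gamma-Wishart distribution): the line-of-sight amplitude fluctuates according to a Nakagami-m law:

```python
import numpy as np

rng = np.random.default_rng(7)

# m = shadowing severity, K = Rician factor, Omega = mean power (toy values)
m, K, Omega, n = 2.0, 3.0, 1.0, 200000
sigma2 = Omega / (2 * (1 + K))                  # scattered power per dimension
los_power = Omega * K / (1 + K)
xi = np.sqrt(rng.gamma(m, los_power / m, n))    # Nakagami-m LOS amplitude
phase = rng.uniform(0, 2 * np.pi, n)
scatter = rng.normal(0, np.sqrt(sigma2), n) + 1j * rng.normal(0, np.sqrt(sigma2), n)
r = np.abs(xi * np.exp(1j * phase) + scatter)   # Rician shadowed envelope
print(r.mean(), (r**2).mean())                  # E[r^2] ~= Omega
```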

Relevance:

90.00%

Publisher:

Abstract:

Methodology: an instrumental variable quantile regression model for panel data using the partial production function.