982 results for Probability Distribution


Relevance:

60.00%

Publisher:

Abstract:

Various physical systems have dynamics that can be modeled by percolation processes. Percolation is used to study problems ranging from fluid diffusion through disordered media to the fragmentation of a computer network under hacker attacks. A common feature of all of these systems is the presence of two non-coexistent regimes associated with certain properties of the system; for example, a disordered medium may or may not allow fluid flow, depending on its porosity. The change from one regime to the other characterizes the percolation phase transition. The standard way of analyzing this transition uses the order parameter, a variable related to some characteristic of the system that is zero in one regime and nonzero in the other. The proposal introduced in this thesis is that this phase transition can be investigated without explicit use of the order parameter, but rather through the Shannon entropy. This entropy is a measure of the degree of uncertainty in the information content of a probability distribution. The proposal is evaluated in the context of cluster formation in random graphs, and we apply the method to both classical (Erdős–Rényi) percolation and explosive percolation. It is based on the computation of the entropy of the cluster-size probability distribution, and the results show that the critical point of the transition is related to the derivatives of the entropy. Furthermore, the difference between the smooth and abrupt character of the classical and explosive percolation transitions, respectively, is reinforced by the observation that the entropy attains a maximum at the critical point of the classical transition, whereas no such correspondence occurs for explosive percolation.
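
To make the proposed measure concrete, here is a minimal sketch (my own illustration, not the thesis code) that computes the Shannon entropy of the cluster-size distribution of Erdős–Rényi graphs across a sweep of the mean degree; taking P(s) as the fraction of clusters having size s is one of several reasonable conventions and is an assumption of this sketch.

```python
import numpy as np
import networkx as nx

def cluster_size_entropy(n, p, seed=0):
    """Shannon entropy of the cluster-size probability distribution of G(n, p)."""
    g = nx.gnp_random_graph(n, p, seed=seed)
    sizes = np.array([len(c) for c in nx.connected_components(g)])
    _, counts = np.unique(sizes, return_counts=True)
    probs = counts / counts.sum()            # P(s): fraction of clusters of size s
    return -np.sum(probs * np.log(probs))

n = 2000
for mean_degree in np.linspace(0.2, 2.0, 10):   # classical transition at <k> = 1
    h = cluster_size_entropy(n, mean_degree / n)
    print(f"<k> = {mean_degree:.2f}   H = {h:.3f}")
```

According to the abstract, the signature of the critical point is to be sought in the derivatives of H with respect to the control parameter.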

Relevance:

60.00%

Publisher:

Abstract:

The aim of this paper is to suggest a simple methodology that renewable power generators can use to bid in the Spanish markets in order to minimize the cost of their imbalances. As is well known, the optimal bid depends on the probability distribution function of the energy to be produced, on the probability distribution function of the future system imbalance, and on its expected cost. We assume simple methods for estimating each of these quantities and, using actual data from 2014, we test the potential economic benefit for a wind generator of using our optimal bid instead of simply bidding the expected power generation. We find evidence that Spanish wind generators' savings would range from 7% to 26%.
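
The classical result behind such bidding strategies is that, once the system-imbalance distribution is collapsed into constant expected per-MWh penalties, the cost-minimizing bid is a quantile of the production forecast. The sketch below illustrates that result only; the penalties and the forecast distribution are hypothetical, not the paper's estimators.

```python
import numpy as np

def optimal_bid(energy_scenarios, pen_short, pen_long):
    """Quantile bid minimizing expected imbalance cost.

    energy_scenarios: samples from the forecast distribution of production (MWh)
    pen_short: expected cost per MWh delivered below the bid
    pen_long:  expected cost per MWh delivered above the bid
    """
    tau = pen_long / (pen_long + pen_short)   # critical quantile level
    return np.quantile(energy_scenarios, tau)

rng = np.random.default_rng(1)
scenarios = 40.0 * rng.weibull(2.0, size=10_000)            # hypothetical wind forecast (MWh)
print(optimal_bid(scenarios, pen_short=30.0, pen_long=10.0))
# The bid lies below the median: shortfalls cost three times more, so the generator bids conservatively.
```

Bidding the expected generation ignores this asymmetry between the two penalties, which is precisely the gap the paper quantifies.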

Relevance:

60.00%

Publisher:

Abstract:

Dynamics of biomolecules over various spatial and time scales are essential for biological functions such as molecular recognition, catalysis and signaling. However, reconstructing biomolecular dynamics from experimental observables requires determining a conformational probability distribution. Unfortunately, such distributions cannot be fully constrained by the limited information available from experiments, which makes the problem ill-posed in the terminology of Hadamard: it has no unique solution, and multiple or even infinitely many solutions may exist. To overcome this ill-posedness, the problem must be regularized by making assumptions, which inevitably introduce biases into the result.

Here, I present two continuous probability density function approaches to solve an important inverse problem called the RDC trigonometric moment problem. By focusing on interdomain orientations, we reduced the problem to the determination of a distribution on the 3D rotational space from residual dipolar couplings (RDCs). We derived an analytical equation that relates alignment tensors of adjacent domains, which serves as the foundation of the two methods. In the first approach, the ill-posed nature of the problem was avoided by introducing a continuous distribution model, which incorporates a smoothness assumption. To find the optimal solution for the distribution, we also designed an efficient branch-and-bound algorithm that exploits the mathematical structure of the analytical solutions. The algorithm is guaranteed to find the distribution that best satisfies the analytical relationship. We observed good performance of the method when tested under various levels of experimental noise and when applied to two protein systems. The second approach avoids the use of any model by employing maximum entropy principles. This 'model-free' approach delivers the least biased result consistent with our current state of knowledge. In this approach, the solution is an exponential function of Lagrange multipliers. To determine the multipliers, a convex objective function is constructed. Consequently, the maximum entropy solution can be found easily by gradient descent methods. Both algorithms can be applied to biomolecular RDC data in general, including data from RNA and DNA molecules.
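
As an illustration of the second, maximum-entropy approach, here is a sketch under simplifying assumptions: a discretized 1-D orientation grid rather than the full 3-D rotation space, and hypothetical basis functions and measured averages. The Lagrange multipliers are found by gradient descent on the convex dual, exactly as described above.

```python
import numpy as np

theta = np.linspace(0.0, 2 * np.pi, 360, endpoint=False)             # discretized orientation space
basis = np.vstack([np.cos(theta), np.sin(theta), np.cos(2 * theta)])  # observables f_k(theta)
measured = np.array([0.30, -0.10, 0.05])                              # target averages <f_k> (hypothetical)

lam = np.zeros(len(measured))                 # Lagrange multipliers
for _ in range(5000):                         # gradient descent on the convex dual log Z(lam) - lam . c
    weights = np.exp(lam @ basis)             # maxent density is proportional to exp(sum_k lam_k f_k)
    p = weights / weights.sum()
    grad = basis @ p - measured               # model expectations minus measured averages
    lam -= 0.5 * grad

print("fitted averages:", (basis @ p).round(3))   # ~ measured: constraints met with maximal entropy
```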

Relevance:

60.00%

Publisher:

Abstract:

Recent theoretical advances predict the existence, deep into the glass phase, of a novel phase transition, the so-called Gardner transition. This transition is associated with the emergence of a complex free energy landscape composed of many marginally stable sub-basins within a glass metabasin. In this study, we explore several methods to detect numerically the Gardner transition in a simple structural glass former, the infinite-range Mari-Kurchan model. The transition point is robustly located from three independent approaches: (i) the divergence of the characteristic relaxation time, (ii) the divergence of the caging susceptibility, and (iii) the abnormal tail in the probability distribution function of cage order parameters. We show that the numerical results are fully consistent with the theoretical expectation. The methods we propose may also be generalized to more realistic numerical models as well as to experimental systems.
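
For illustration, a minimal sketch of ingredients (ii) and (iii) under one common convention, with the caging susceptibility taken as N times the sample variance of a cage order parameter Δ measured across independent samples; the data below are synthetic placeholders, not Mari–Kurchan results.

```python
import numpy as np

def caging_susceptibility(delta, n_particles):
    """chi = N (<Delta^2> - <Delta>^2), with one Delta value per glass sample."""
    return n_particles * (np.mean(delta ** 2) - np.mean(delta) ** 2)

rng = np.random.default_rng(0)
delta = rng.gamma(shape=2.0, scale=0.05, size=4000)        # placeholder cage order parameters
chi = caging_susceptibility(delta, n_particles=1024)
pdf, edges = np.histogram(delta, bins=60, density=True)    # P(Delta): inspect the tail near the transition
print(f"chi = {chi:.2f}")
```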

Relevance:

60.00%

Publisher:

Abstract:

Prior research has established that the idiosyncratic volatility of security prices exhibits a positive trend. This trend, among other factors, has made the merits of investment diversification and portfolio construction more compelling. A new optimization technique, a greedy algorithm, is proposed to optimize the weights of assets in a portfolio. The main benefits of using this algorithm are to: a) increase the efficiency of the portfolio optimization process, b) enable large-scale optimizations, and c) improve the resulting optimal weights. In addition, the technique utilizes a novel approach to constructing a time-varying covariance matrix: a modified integrated dynamic conditional correlation GARCH (IDCC-GARCH) model is applied to account for the dynamics of the conditional covariance matrices that are employed. The stochastic aspects of the expected returns of the securities are integrated into the technique through Monte Carlo simulation: instead of being represented as deterministic values, the expected returns are assigned simulated values based on their historical measures. The time series of the securities are fitted to a probability distribution that matches their characteristics, using the Anderson-Darling goodness-of-fit criterion. Simulated and actual data sets are used to further generalize the results. Using the S&P 500 securities as the base, 2000 simulated data sets are created by Monte Carlo simulation; in addition, the Russell 1000 securities are used to generate 50 sample data sets. The results indicate an increase in risk-return performance. With Value-at-Risk (VaR) as the criterion and the Crystal Ball portfolio optimizer, a commercially available product, as the benchmark, the new greedy technique clearly outperforms the alternatives on samples of the S&P 500 and Russell 1000 securities. The resulting improvements in performance are consistent across five security-selection methods (maximum, minimum, random, absolute minimum, and absolute maximum) and three covariance structures (unconditional, orthogonal GARCH, and integrated dynamic conditional GARCH).
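
To illustrate the greedy idea in isolation, here is a deliberately stripped-down sketch: it uses a plain mean-variance objective with made-up inputs rather than the dissertation's IDCC-GARCH covariances, simulated returns and VaR criterion. Weight is added in small increments to whichever asset most improves the objective.

```python
import numpy as np

def greedy_weights(mu, cov, step=0.01, risk_aversion=5.0):
    """Greedy allocation: repeatedly give a small weight increment to the best asset."""
    n = len(mu)
    w = np.zeros(n)
    def objective(w):
        return w @ mu - risk_aversion * w @ cov @ w
    while w.sum() < 1.0 - 1e-9:
        candidates = [w + step * np.eye(n)[i] for i in range(n)]
        w = max(candidates, key=objective)     # greedy: keep the best single increment
    return w

mu = np.array([0.08, 0.05, 0.11])              # hypothetical expected returns
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.02, 0.00],
                [0.00, 0.00, 0.09]])           # hypothetical covariance matrix
print(greedy_weights(mu, cov).round(2))
```

Each step evaluates only one candidate increment per asset, which is the property that keeps greedy schemes cheap at large scale.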

Relevance:

60.00%

Publisher:

Abstract:

We calculate the first two moments and the full probability distribution of the work performed on a system of bosonic particles in a two-mode Bose-Hubbard Hamiltonian when the self-interaction term is varied either instantaneously or with a finite-time ramp. In the instantaneous case, we show how the irreversible work scales differently depending on whether the system is driven to the Josephson or to the Fock regime of the bosonic Josephson junction. In the finite-time case, we use optimal control techniques to reduce the irreversible work to negligible values. Our analysis can be implemented in present-day experiments with ultracold atoms, and we show how to relate the work statistics to those of the population imbalance of the two modes.
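
For the instantaneous case, the work statistics follow from the overlaps between the initial ground state and the final eigenstates. The sketch below is my own construction with hypothetical parameters, not the paper's code: it builds the two-mode Bose-Hubbard Hamiltonian in the Fock basis, quenches the interaction U_i -> U_f, and evaluates the mean work and the irreversible work (zero-temperature convention: mean work minus the ground-state energy difference).

```python
import numpy as np

def two_mode_bh(N, J, U):
    """H = -J(a^dag b + b^dag a) + (U/2)[n_a(n_a-1) + n_b(n_b-1)] in the Fock basis |n, N-n>."""
    H = np.zeros((N + 1, N + 1))
    for n in range(N + 1):
        H[n, n] = 0.5 * U * (n * (n - 1) + (N - n) * (N - n - 1))
        if n < N:
            H[n + 1, n] = H[n, n + 1] = -J * np.sqrt((n + 1) * (N - n))
    return H

N, J, U_i, U_f = 20, 1.0, 0.1, 2.0                 # hypothetical parameters
E_i, V_i = np.linalg.eigh(two_mode_bh(N, J, U_i))
E_f, V_f = np.linalg.eigh(two_mode_bh(N, J, U_f))
psi0 = V_i[:, 0]                                   # initial ground state
p = np.abs(V_f.T @ psi0) ** 2                      # transition probabilities to final eigenstates
work = E_f - E_i[0]                                # work values W_m = E_m(U_f) - E_0(U_i)
print("mean work:", p @ work)
print("irreversible work:", p @ work - (E_f[0] - E_i[0]))
```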

Relevance:

60.00%

Publisher:

Abstract:

One of the important problems in machine learning is determining the complexity of the model to be learned. Too much complexity leads to overfitting, which amounts to finding structures that do not actually exist in the data, while too little complexity leads to underfitting, meaning that the expressiveness of the model is insufficient to capture all the structures present in the data. For some probabilistic models, model complexity takes the form of one or more latent variables whose role is to explain the generative process of the data. Various approaches exist for identifying the appropriate number of latent variables of a model. This thesis focuses on Bayesian nonparametric methods for determining the number of latent variables to use as well as their dimensionality. The popularization of Bayesian nonparametric statistics within the machine learning community is fairly recent. Their main appeal is that they offer highly flexible models whose complexity adjusts in proportion to the amount of available data. In recent years, research on Bayesian nonparametric learning methods has focused on three main aspects: the construction of new models, the development of inference algorithms, and applications. This thesis presents our contributions to these three research topics in the context of learning latent-variable models. First, we introduce the Pitman-Yor process mixture of Gaussians, a model for learning infinite mixtures of Gaussians. We also present an inference algorithm for discovering the hidden components of the model, which we evaluate on two concrete robotics applications. Our results show that the proposed approach outperforms classical learning approaches in both performance and flexibility. Second, we propose the extended cascading Indian buffet process, a model that serves as a prior probability distribution over the space of directed acyclic graphs. In the context of Bayesian networks, this prior makes it possible to identify both the presence of latent variables and the network structure among them. A Markov chain Monte Carlo inference algorithm is used for the evaluation on structure-identification and density-estimation problems. Finally, we propose the Indian chefs process, a model more general than the extended cascading Indian buffet process, for learning graphs and orders. The advantage of the new model is that it allows connections between observable variables and takes the order of the variables into account. We present a reversible-jump Markov chain Monte Carlo inference algorithm for jointly learning graphs and orders. The evaluation is carried out on density-estimation and independence-testing problems. This model is the first Bayesian nonparametric model capable of learning Bayesian networks with a completely arbitrary structure.
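
As a small illustration of the kind of prior underlying the first contribution, here is a sketch of the standard Pitman-Yor "Chinese restaurant" predictive rule (not the thesis's inference algorithm; the parameter values are arbitrary).

```python
import numpy as np

def pitman_yor_partition(n_points, alpha=1.0, d=0.5, seed=0):
    """Sample a random partition from a Pitman-Yor process with concentration
    alpha and discount d (d = 0 recovers the Dirichlet process)."""
    rng = np.random.default_rng(seed)
    counts = []                                   # customers per table (cluster sizes)
    assignment = []
    for _ in range(n_points):
        K = len(counts)
        weights = np.array([c - d for c in counts] + [alpha + d * K])
        weights /= weights.sum()
        k = rng.choice(K + 1, p=weights)
        if k == K:
            counts.append(1)                      # open a new cluster
        else:
            counts[k] += 1
        assignment.append(k)
    return assignment, counts

_, counts = pitman_yor_partition(1000)
print(len(counts), "clusters; largest sizes:", sorted(counts, reverse=True)[:5])
```

The number of clusters grows with the number of points, which is the sense in which the model's complexity adjusts to the amount of available data.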

Relevance:

60.00%

Publisher:

Abstract:

This thesis is set in the context of the industrial and economic optimization of structural elements made of ultra-high performance fibre-reinforced concrete (UHPFRC, French acronym BFUP), with the aim of guaranteeing their ductility at the structural level while adjusting the fibre content and optimizing the fabrication process. The developed model explicitly describes the contribution of the fibre reinforcement in tension at the local level, chaining a strain-hardening phase followed by a softening phase. The constitutive law is a function of the fibre density, the orientation of the fibres with respect to the principal tensile directions, their aspect ratio, and other usual material parameters related to the fibres, the cementitious matrix and their interaction. The fibre orientation is taken into account through a normal probability law with one or two variables, able to reproduce any orientation obtained either from a computation representative of the casting of the fresh UHPFRC or from experimental analysis on a prototype. Finally, the model reproduces the cracking of UHPFRC following the principle of smeared, rotating crack models. The constitutive law is implemented in a finite element structural analysis code, so that it can be used as a predictive tool for the reliability and global ductility of UHPFRC elements. Two experimental campaigns were carried out, one at Université Laval in Quebec City and the other at Ifsttar in Marne-la-Vallée. The first validates the ability of the model to reproduce the global behaviour under typical tensile and bending loads in simple structural elements for which the preferential fibre orientation was characterized by tomography. The second campaign demonstrates the capabilities of the model in an optimization process for the fabrication of relatively complex ribbed slabs of potential industrial interest, for which different fabrication methods and UHPFRC mixes with more or less fibre were considered. The distribution and orientation of the fibres were checked through mechanical tests on cored samples. The model predictions were compared with the global structural behaviour and the ductility observed experimentally. The model could thus be qualified against the usual analytical engineering methods, while taking statistical variability into account. Directions for improvement and further development were identified.
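
To make the orientation ingredient concrete, here is a minimal sketch (illustrative only; the thesis's constitutive law couples orientation with fibre density, aspect ratio and matrix parameters): fibre angles are drawn from a normal law about the principal tensile direction and summarized by a simple orientation factor, here taken as the mean of cos²θ, which is an assumption of this sketch.

```python
import numpy as np

def orientation_factor(mean_angle_deg, std_angle_deg, n_fibres=100_000, seed=0):
    """Mean of cos^2(theta), theta being the angle between a fibre and the tensile direction."""
    rng = np.random.default_rng(seed)
    theta = np.radians(rng.normal(mean_angle_deg, std_angle_deg, n_fibres))
    return np.mean(np.cos(theta) ** 2)

# Well-aligned casting versus a more dispersed one (hypothetical parameters):
print(orientation_factor(0.0, 10.0))   # close to 1: fibres mostly along the tensile direction
print(orientation_factor(0.0, 45.0))   # lower: less efficient fibre participation
```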

Relevance:

60.00%

Publisher:

Abstract:

Abstract. Two ideas taken from Bayesian optimization and classifier systems are presented for personnel scheduling, based on choosing a suitable scheduling rule from a set for each person's assignment. Unlike our previous work using genetic algorithms, in which learning is implicit, the learning in both approaches is explicit, i.e. we are able to identify building blocks directly. To achieve this, the Bayesian optimization algorithm builds a Bayesian network of the joint probability distribution of the rules used to construct solutions, while the adapted classifier system assigns each rule a strength value that is constantly updated according to its usefulness in the current situation. Computational results on 52 real data instances of nurse scheduling demonstrate the success of both approaches. It is also suggested that the learning mechanism in the proposed approaches might be suitable for other scheduling problems.
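
A simplified sketch of the estimation-of-distribution idea follows. The paper builds a full Bayesian network over the rules and also adapts a classifier system; here, for brevity, the distribution is factorized per assignment and the scheduling objective is a hypothetical stub.

```python
import numpy as np

rng = np.random.default_rng(0)
n_assignments, n_rules, pop_size, elite = 20, 4, 60, 15

def schedule_cost(rule_choices):
    # Hypothetical stand-in for evaluating the roster built by applying these rules.
    return np.sum((rule_choices - 1) ** 2)

probs = np.full((n_assignments, n_rules), 1.0 / n_rules)    # P(rule | assignment)
for generation in range(50):
    pop = np.array([[rng.choice(n_rules, p=probs[a]) for a in range(n_assignments)]
                    for _ in range(pop_size)])               # sample solutions from the model
    best = pop[np.argsort([schedule_cost(ind) for ind in pop])[:elite]]
    for a in range(n_assignments):                            # re-estimate the distribution from the elite
        counts = np.bincount(best[:, a], minlength=n_rules) + 1e-3
        probs[a] = counts / counts.sum()

print("most probable rule per assignment:", probs.argmax(axis=1))
```

The learned probabilities are explicit and can be inspected directly, which is the "building block" transparency the abstract contrasts with genetic algorithms.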

Relevance:

60.00%

Publisher:

Abstract:

In the past decade, systems that extract information from millions of Internet documents have become commonplace. Knowledge graphs -- structured knowledge bases that describe entities, their attributes and the relationships between them -- are a powerful tool for understanding and organizing this vast amount of information. However, significant obstacles to knowledge graph construction are the unreliability of the extracted information, caused by noise and ambiguity in the underlying data or by errors made by the extraction system, and the complexity of reasoning about the dependencies between these noisy extractions. My dissertation addresses these challenges by exploiting the interdependencies between facts to improve the quality of the knowledge graph in a scalable framework. I introduce a new approach called knowledge graph identification (KGI), which resolves the entities, attributes and relationships in the knowledge graph by incorporating uncertain extractions from multiple sources, entity co-references, and ontological constraints. I define a probability distribution over possible knowledge graphs and infer the most probable knowledge graph using a combination of probabilistic and logical reasoning. Such probabilistic models are frequently dismissed due to scalability concerns, but my implementation of KGI maintains tractable performance on large problems through the use of hinge-loss Markov random fields, which have a convex inference objective. This allows the inference of large knowledge graphs using 4M facts and 20M ground constraints in 2 hours. To further scale the solution, I develop a distributed approach to the KGI problem which runs in parallel across multiple machines, reducing inference time by 90%. Finally, I extend my model to the streaming setting, where a knowledge graph is continuously updated by incorporating newly extracted facts. I devise a general approach for approximately updating inference in convex probabilistic models, and quantify the approximation error by defining and bounding inference regret for online models. Together, my work retains the attractive features of probabilistic models while providing the scalability necessary for large-scale knowledge graph construction. These models have been applied to a number of real-world knowledge graph projects, including the NELL project at Carnegie Mellon and the Google Knowledge Graph.
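
A toy sketch in the spirit of hinge-loss potentials over continuous truth values follows (not the actual KGI implementation; the extraction confidences, weights and the single mutual-exclusion constraint are invented for illustration).

```python
import numpy as np

x = np.array([0.5, 0.5])              # truth values of two mutually exclusive candidate facts
extractions = np.array([0.9, 0.7])    # noisy extractor confidences (hypothetical)
w_extract, w_mutex = 1.0, 5.0

def grad(x):
    # Squared-hinge extraction potentials w * max(0, confidence - x)^2 pull x up toward each extraction.
    g = -2.0 * w_extract * np.maximum(0.0, extractions - x)
    # Linear-hinge mutual-exclusion potential w * max(0, x0 + x1 - 1) pushes the pair back below 1.
    if x.sum() > 1.0:
        g = g + w_mutex
    return g

for _ in range(4000):
    x = np.clip(x - 0.005 * grad(x), 0.0, 1.0)   # projected (sub)gradient step keeps x in [0, 1]

print("inferred truth values:", x.round(2))       # roughly (0.6, 0.4): the better-supported fact wins
```

Because every potential is convex in the truth values, inference is a convex program, which is the property the dissertation exploits for scalability.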

Relevance:

60.00%

Publisher:

Abstract:

International audience

Relevance:

60.00%

Publisher:

Abstract:

The present work proposes a hypothesis test to detect a shift in the variance of a series of independent normal observations, using a statistic based on the p-values of the F distribution. Since the probability distribution function of this statistic is intractable, critical values were estimated numerically through extensive simulation. A regression approach was used to simplify quantile evaluation and extrapolation. The power of the test was assessed by Monte Carlo simulation, and the results were compared with the Chen (1997) test to demonstrate its efficiency. Time series analysts may find the test useful in homoscedasticity studies where at most one change might be involved.
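
The abstract does not spell out the statistic, so the sketch below only illustrates the general construction under assumptions of mine: a scan statistic equal to the smallest two-sided F-test p-value over all candidate change points, with its critical value obtained by simulation as in the paper.

```python
import numpy as np
from scipy.stats import f

def min_f_pvalue(x, min_seg=5):
    """Smallest two-sided variance-ratio p-value over candidate change points."""
    n = len(x)
    pvals = []
    for k in range(min_seg, n - min_seg):
        v1, v2 = np.var(x[:k], ddof=1), np.var(x[k:], ddof=1)
        p_one = f.cdf(v1 / v2, k - 1, n - k - 1)    # P(F <= observed variance ratio)
        pvals.append(2 * min(p_one, 1 - p_one))     # two-sided p-value
    return min(pvals)

rng = np.random.default_rng(0)
null_stats = [min_f_pvalue(rng.normal(size=100)) for _ in range(1000)]
crit = np.quantile(null_stats, 0.05)                 # simulated 5% critical value

x = np.concatenate([rng.normal(0, 1, 50), rng.normal(0, 2, 50)])  # variance shifts mid-series
print(min_f_pvalue(x) < crit)                        # True: shift detected
```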

Relevance:

60.00%

Publisher:

Abstract:

The service of a critical infrastructure, such as a municipal wastewater treatment plant (MWWTP), is taken for granted until a flood or another low-frequency, high-consequence crisis brings its fragility to attention. The unique aspects of the MWWTP call for a method to quantify the flood stage-duration-frequency relationship. By developing a bivariate joint distribution model of flood stage and duration, this study adds a second dimension, time, to flood risk studies. A new parameter, the inter-event time, is introduced to further illustrate the effect of event separation on the frequency assessment. The method is tested on riverine, estuarine and tidal sites in the Mid-Atlantic region. Equipment damage functions are characterized by linear and step damage models. The Expected Annual Damage (EAD) of the underground equipment is then estimated from the parametric joint distribution model, which is a function of both flood stage and duration, demonstrating the application of the bivariate model in risk assessment. Flood likelihood may change due to climate change. A sensitivity analysis method is developed to assess future flood risk by estimating flood frequency under conditions of higher sea level and of stream flow response to increased precipitation intensity. Scenarios based on steady and unsteady flow analysis are generated for the current climate, for the future climate within this century, and for the future climate beyond this century, consistent with the MWWTP planning horizons. The spatial extent of the flood risk is visualized by inundation mapping and a GIS-Assisted Risk Register (GARR). This research will help the stakeholders of this critical infrastructure understand the flood risk, the vulnerability, and the inherent uncertainty.
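
A minimal sketch of how a bivariate stage-duration model feeds an EAD estimate is given below. Every ingredient is a hypothetical placeholder: a Gaussian copula rather than the study's fitted joint distribution, invented marginals, a step damage model and an assumed event rate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, rho, events_per_year = 100_000, 0.6, 2.0

# Gaussian copula linking flood stage (m) and flood duration (h).
z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
u = stats.norm.cdf(z)
stage = stats.gumbel_r.ppf(u[:, 0], loc=2.0, scale=0.8)      # placeholder marginal of flood stage
duration = stats.expon.ppf(u[:, 1], scale=24.0)              # placeholder marginal of flood duration

def damage(stage, duration):
    """Step damage model: equipment is lost once both thresholds are exceeded."""
    return 1.0e6 * ((stage > 3.5) & (duration > 12.0))

ead = events_per_year * damage(stage, duration).mean()        # expected annual damage ($/yr)
print(f"EAD = {ead:,.0f} $/yr")
```

Because damage requires both a high stage and a long duration, a univariate stage-frequency analysis alone would misstate the risk, which is the motivation for the bivariate model.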