920 results for probability distribution


Relevance: 60.00%

Abstract:

One of the important problems in machine learning is determining the complexity of the model to be learned. Too much complexity leads to overfitting, which amounts to finding structures that do not actually exist in the data, while too little complexity leads to underfitting, meaning that the expressiveness of the model is insufficient to capture all of the structure present in the data. For some probabilistic models, model complexity translates into the introduction of one or more latent variables whose role is to explain the generative process of the data. Various approaches exist for identifying the appropriate number of latent variables in a model. This thesis focuses on Bayesian nonparametric methods for determining the number of latent variables to use as well as their dimensionality. The popularization of Bayesian nonparametric statistics within the machine learning community is fairly recent. Their main appeal is that they offer highly flexible models whose complexity scales with the amount of available data. In recent years, research on Bayesian nonparametric learning methods has focused on three main aspects: the construction of new models, the development of inference algorithms, and applications. This thesis presents our contributions to these three research topics in the context of learning latent-variable models. First, we introduce the Pitman-Yor process mixture of Gaussians, a model for learning infinite mixtures of Gaussians. We also present an inference algorithm for discovering the latent components of the model, which we evaluate on two concrete robotics applications. Our results show that the proposed approach outperforms classical learning approaches in both performance and flexibility. Second, we propose the extended cascading Indian buffet process, a model serving as a prior probability distribution over the space of directed acyclic graphs. In the context of Bayesian networks, this prior makes it possible to identify both the presence of latent variables and the network structure among them. A Markov chain Monte Carlo inference algorithm is used for evaluation on structure identification and density estimation problems. Finally, we propose the Indian chefs process, a model more general than the extended cascading Indian buffet process, for learning graphs and orders. The advantage of the new model is that it allows connections between observable variables and takes the order of the variables into account. We present a reversible-jump Markov chain Monte Carlo inference algorithm for the joint learning of graphs and orders. The evaluation is carried out on density estimation and independence testing problems. This model is the first Bayesian nonparametric model capable of learning Bayesian networks with a completely arbitrary structure.
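
As a concrete illustration of the flavour of such models, the following is a minimal sketch, not the thesis's algorithm, of how a Pitman-Yor process mixture of Gaussians generates data through its generalized Chinese-restaurant seating rule; all parameter values are illustrative.

```python
import numpy as np

def sample_py_mixture(n, alpha=1.0, d=0.5, base_std=5.0, noise_std=0.5, rng=None):
    """Generate n points from a Pitman-Yor process mixture of 1-D Gaussians.

    alpha: concentration, d: discount (0 <= d < 1). Cluster means are drawn
    from the base measure N(0, base_std^2); observations get N(mean, noise_std^2).
    """
    rng = np.random.default_rng(rng)
    counts, means, data, labels = [], [], [], []
    for i in range(n):
        # Generalized CRP: join existing cluster k with prob (n_k - d)/(i + alpha),
        # open a new cluster with prob (alpha + d * K)/(i + alpha).
        K = len(counts)
        probs = np.array([c - d for c in counts] + [alpha + d * K]) / (i + alpha)
        k = rng.choice(K + 1, p=probs)
        if k == K:  # new cluster: draw its mean from the base measure
            counts.append(0)
            means.append(rng.normal(0.0, base_std))
        counts[k] += 1
        labels.append(k)
        data.append(rng.normal(means[k], noise_std))
    return np.array(data), np.array(labels)

x, z = sample_py_mixture(500, alpha=1.0, d=0.25, rng=0)
print("clusters generated by the prior:", z.max() + 1)
```

The number of clusters is not fixed in advance: it grows with the data, which is exactly the "complexity adjusts to the data" property described above.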

Relevance: 60.00%

Abstract:

This thesis is set in the context of the industrial and economic optimization of structural elements made of UHPFRC (ultra-high performance fibre-reinforced concrete, BFUP), so as to guarantee ductility at the structural level while adjusting the fibre content and optimizing the fabrication process. The model developed explicitly describes the contribution of the fibre reinforcement in tension at the local level, chaining a strain-hardening phase followed by a softening phase. The constitutive law is a function of the fibre density, the orientation of the fibres with respect to the principal tensile directions, their aspect ratio, and other usual material parameters related to the fibres, the cementitious matrix, and their interaction. Fibre orientation is taken into account through a one- or two-variable normal probability law, which can reproduce any orientation obtained either from a computation representative of the casting of the fresh UHPFRC or from experimental analysis on a prototype. Finally, the model reproduces the cracking of UHPFRC following the principle of smeared, rotating crack models. The constitutive law is integrated into a finite element structural analysis code, allowing it to be used as a predictive tool for the reliability and overall ductility of UHPFRC elements. Two experimental campaigns were carried out, one at Université Laval in Quebec City and the other at Ifsttar, Marne-la-Vallée. The first validates the model's ability to reproduce the overall behaviour under typical tensile and bending loads in simple structural elements for which the preferential fibre orientation was characterized by tomography. The second campaign demonstrates the model's capabilities in an optimization process, for the fabrication of relatively complex ribbed slabs of potential industrial interest, for which different fabrication methods and UHPFRC mixes with varying fibre contents were considered. The fibre distribution and orientation were checked through mechanical tests on extracted specimens. The model's predictions were compared with the overall structural behaviour and ductility observed experimentally. The model could thus be qualified against the usual analytical methods of engineering practice, taking statistical variability into account. Avenues for improvement and further development were identified.
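
Purely as an illustration of the orientation idea, the sketch below samples fibre angles from a one-variable normal law and averages a cos²θ efficiency weight, a common first-order assumption for fibre-reinforced composites; the thesis's actual constitutive law is richer and may weight orientation differently.

```python
import numpy as np

def orientation_efficiency(mu_deg=0.0, sigma_deg=20.0, n=100_000, rng=0):
    """Monte Carlo estimate of a fibre orientation efficiency factor.

    Fibre angles relative to the principal tensile direction are drawn from
    N(mu, sigma); each fibre's contribution is weighted by cos^2(theta), a
    first-order efficiency assumption used here for illustration only.
    """
    g = np.random.default_rng(rng)
    theta = np.deg2rad(g.normal(mu_deg, sigma_deg, size=n))
    return np.mean(np.cos(theta) ** 2)

# Well-aligned casting versus a more dispersed one
print(orientation_efficiency(sigma_deg=10.0))   # close to 1.0
print(orientation_efficiency(sigma_deg=45.0))   # markedly lower
```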

Relevance: 60.00%

Abstract:

Abstract. Two ideas taken from Bayesian optimization and classifier systems are presented for personnel scheduling based on choosing a suitable scheduling rule from a set for each person's assignment. Unlike our previous work using genetic algorithms, in which learning is implicit, the learning in both approaches is explicit, i.e. we are able to identify building blocks directly. To achieve this, the Bayesian optimization algorithm builds a Bayesian network of the joint probability distribution of the rules used to construct solutions, while the adapted classifier system assigns each rule a strength value that is constantly updated according to its usefulness in the current situation. Computational results from 52 real data instances of nurse scheduling demonstrate the success of both approaches. It is also suggested that the learning mechanism in the proposed approaches might be suitable for other scheduling problems.
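
The classifier-system side of the idea can be sketched as follows; this is a toy illustration with invented names and constants, not the paper's exact update scheme: each rule carries a strength, rules are chosen in proportion to strength, and the strength of the chosen rule is nudged towards the reward it earns.

```python
import numpy as np

class RuleSelector:
    """Toy classifier-system-style rule selection with strength updates."""

    def __init__(self, n_rules, beta=0.2, rng=0):
        self.strength = np.ones(n_rules)      # optimistic initial strengths
        self.beta = beta                      # learning rate of the update
        self.rng = np.random.default_rng(rng)

    def choose(self):
        # Pick a rule with probability proportional to its current strength.
        p = self.strength / self.strength.sum()
        return self.rng.choice(len(self.strength), p=p)

    def update(self, rule, reward):
        # Widrow-Hoff-style move of the strength towards the observed reward.
        self.strength[rule] += self.beta * (reward - self.strength[rule])

sel = RuleSelector(n_rules=4)
for _ in range(200):
    r = sel.choose()
    reward = 1.0 if r == 2 else 0.2           # pretend rule 2 schedules well
    sel.update(r, reward)
print(np.round(sel.strength, 2))              # rule 2 dominates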

Relevance: 60.00%

Abstract:

In the past decade, systems that extract information from millions of Internet documents have become commonplace. Knowledge graphs -- structured knowledge bases that describe entities, their attributes, and the relationships between them -- are a powerful tool for understanding and organizing this vast amount of information. However, a significant obstacle to knowledge graph construction is the unreliability of the extracted information, due to noise and ambiguity in the underlying data or errors made by the extraction system, together with the complexity of reasoning about the dependencies between these noisy extractions. My dissertation addresses these challenges by exploiting the interdependencies between facts to improve the quality of the knowledge graph in a scalable framework. I introduce a new approach called knowledge graph identification (KGI), which resolves the entities, attributes, and relationships in the knowledge graph by incorporating uncertain extractions from multiple sources, entity co-references, and ontological constraints. I define a probability distribution over possible knowledge graphs and infer the most probable knowledge graph using a combination of probabilistic and logical reasoning. Such probabilistic models are frequently dismissed due to scalability concerns, but my implementation of KGI maintains tractable performance on large problems through the use of hinge-loss Markov random fields, which have a convex inference objective. This allows the inference of large knowledge graphs with 4M facts and 20M ground constraints in 2 hours. To further scale the solution, I develop a distributed approach to the KGI problem which runs in parallel across multiple machines, reducing inference time by 90%. Finally, I extend my model to the streaming setting, where a knowledge graph is continuously updated by incorporating newly extracted facts. I devise a general approach for approximately updating inference in convex probabilistic models, and quantify the approximation error by defining and bounding inference regret for online models. Together, my work retains the attractive features of probabilistic models while providing the scalability necessary for large-scale knowledge graph construction. These models have been applied to a number of real-world knowledge graph projects, including the NELL project at Carnegie Mellon and the Google Knowledge Graph.
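
The ingredient that keeps inference convex can be sketched briefly. In a hinge-loss Markov random field, each ground rule contributes a potential equal to its distance to satisfaction under the Łukasiewicz relaxation of logic over [0, 1]; the rule below is an invented KGI-style example, not one from the dissertation.

```python
def hinge_loss_potential(body, head, weight=1.0, squared=False):
    """Distance to satisfaction of a soft rule body -> head.

    Truth values live in [0, 1]; under the Lukasiewicz relaxation the rule
    is satisfied when head >= body, and the potential grows linearly (or
    quadratically) with the violation max(0, body - head).
    """
    violation = max(0.0, body - head)
    return weight * (violation ** 2 if squared else violation)

# Illustrative KGI-style rule: extracted(x, "spouse", y) -> rel(x, "spouse", y)
print(hinge_loss_potential(body=0.9, head=0.4, weight=2.0))  # 1.0
```

Because every potential is a (possibly squared) hinge of linear expressions, the MAP problem is a convex program, which is what makes inference over millions of ground constraints tractable.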

Relevance: 60.00%

Abstract:

The present work proposes a hypothesis test to detect a shift in the variance of a series of independent normal observations, using a statistic based on the p-values of the F distribution. Since the probability distribution function of this statistic is intractable, critical values were estimated numerically through extensive simulation. A regression approach was used to simplify quantile evaluation and extrapolation. The power of the test was estimated by Monte Carlo simulation, and the results were compared with the Chen (1997) test to demonstrate its efficiency. Time series analysts may find the test useful in homoscedasticity studies where at most one change might be involved.
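
The abstract does not reproduce the statistic itself, so the sketch below uses a plausible stand-in: scan the candidate changepoints, compare the variances on either side of each split with an F test, and take the smallest p-value as the statistic; the critical value is then read off a simulated null distribution, in the spirit of the calibration described above.

```python
import numpy as np
from scipy import stats

def min_pvalue_stat(x, trim=5):
    """Minimum, over candidate split points, of the two-sided F-test
    p-value comparing sample variances before and after the split."""
    n, best = len(x), 1.0
    for k in range(trim, n - trim):
        F = np.var(x[:k], ddof=1) / np.var(x[k:], ddof=1)
        p = 2 * min(stats.f.cdf(F, k - 1, n - k - 1),
                    stats.f.sf(F, k - 1, n - k - 1))
        best = min(best, p)
    return best

def critical_value(n, alpha=0.05, reps=2000, rng=0):
    """Estimate the null critical value by simulating N(0,1) series."""
    g = np.random.default_rng(rng)
    null_stats = [min_pvalue_stat(g.standard_normal(n)) for _ in range(reps)]
    return np.quantile(null_stats, alpha)   # reject when the statistic is SMALL

cv = critical_value(n=100)
x = np.concatenate([np.random.default_rng(1).standard_normal(50),
                    2.0 * np.random.default_rng(2).standard_normal(50)])
print(min_pvalue_stat(x) < cv)              # True: variance shift detected
```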

Relevance: 60.00%

Abstract:

The service of a critical infrastructure, such as a municipal wastewater treatment plant (MWWTP), is taken for granted until a flood or another low-frequency, high-consequence crisis brings its fragility to attention. The unique aspects of the MWWTP call for a method to quantify the flood stage-duration-frequency relationship. By developing a bivariate joint distribution model of flood stage and duration, this study adds a second dimension, time, to flood risk studies. A new parameter, inter-event time, is introduced to further illustrate the effect of event separation on the frequency assessment. The method is tested on riverine, estuary, and tidal sites in the Mid-Atlantic region. Equipment damage functions are characterized by linear and step damage models. The Expected Annual Damage (EAD) of the underground equipment is then estimated from the parametric joint distribution model, which is a function of both flood stage and duration, demonstrating the application of the bivariate model in risk assessment. Flood likelihood may change due to climate change. A sensitivity analysis method is developed to assess future flood risk by estimating flood frequency under conditions of higher sea level and stream flow response to increased precipitation intensity. Scenarios based on steady and unsteady flow analysis are generated for the current climate, future climate within this century, and future climate beyond this century, consistent with the MWWTP planning horizons. The spatial extent of flood risk is visualized through inundation mapping and a GIS-Assisted Risk Register (GARR). This research will help the stakeholders of this critical infrastructure become aware of the flood risk, vulnerability, and the inherent uncertainty.
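
A minimal sketch of how an EAD falls out of a bivariate stage-duration model, assuming, purely for illustration, a Gaussian copula joining a Gumbel stage marginal to a lognormal duration marginal, with a step damage model; every parameter value here is invented.

```python
import numpy as np
from scipy import stats

def simulate_ead(n=100_000, rho=0.6, threshold_stage=2.5,
                 threshold_hours=24, damage=1.5e6, rng=0):
    """Monte Carlo expected damage per event-year under a Gaussian copula
    joining a Gumbel flood-stage marginal and a lognormal duration marginal.
    All parameter values are invented for illustration."""
    g = np.random.default_rng(rng)
    z = g.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
    u = stats.norm.cdf(z)                                      # copula uniforms
    stage = stats.gumbel_r.ppf(u[:, 0], loc=1.0, scale=0.5)    # metres
    duration = stats.lognorm.ppf(u[:, 1], s=0.8, scale=12.0)   # hours
    # Step damage model: full loss only if the flood is both high AND long.
    loss = damage * ((stage > threshold_stage) & (duration > threshold_hours))
    return loss.mean()

print(f"EAD ~ ${simulate_ead():,.0f}")
```

The point of the second dimension is visible here: a stage-only model would charge the damage whenever the stage threshold is exceeded, while the bivariate model also requires the duration condition, changing the expected damage.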

Relevance: 60.00%

Abstract:

Mobile and wireless networks have long exploited mobility predictions, focused on predicting the future location of given users, to perform more efficient network resource management. In this paper, we present a new approach in which predictions are provided as a probability distribution of the likelihood of moving to a set of future locations. This approach gives wireless services a greater amount of knowledge and enables them to perform more effectively. We present a framework for the evaluation of this new type of predictor, and develop two new predictors, HEM and G-Stat. We evaluate our predictors' accuracy in predicting future cells for mobile users using two large geolocation data sets, from MDC [11], [12] and Crawdad [13]. We show that our predictors can successfully predict with as little as 2.2% average inaccuracy in certain scenarios.
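
HEM and G-Stat are not specified in the abstract; the baseline below only illustrates the core idea of a distribution-valued predictor, returning Laplace-smoothed transition frequencies from the current cell rather than a single predicted cell.

```python
from collections import Counter, defaultdict

class NextCellPredictor:
    """Baseline distribution-valued mobility predictor: normalized,
    Laplace-smoothed counts of observed cell-to-cell transitions."""

    def __init__(self, smoothing=1.0):
        self.counts = defaultdict(Counter)
        self.cells = set()
        self.smoothing = smoothing

    def observe(self, trajectory):
        self.cells.update(trajectory)
        for a, b in zip(trajectory, trajectory[1:]):
            self.counts[a][b] += 1

    def predict(self, current):
        # Full probability distribution over candidate next cells.
        c, s = self.counts[current], self.smoothing
        total = sum(c.values()) + s * len(self.cells)
        return {cell: (c[cell] + s) / total for cell in self.cells}

p = NextCellPredictor()
p.observe(["A", "B", "C", "B", "C", "A", "B", "C"])
print(p.predict("B"))   # mass concentrated on "C"
```

A consumer of such a predictor (e.g., a handover or caching policy) can act on the whole distribution, hedging across several likely cells instead of committing to the single most probable one.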

Relevance: 60.00%

Abstract:

This thesis builds a framework for evaluating downside risk from multivariate data via a special class of risk measures (RM). The distinctive feature of the analysis is that it dispenses with strong distributional assumptions on the data and is oriented towards the most critical data in risk management: those with asymmetries and heavy tails. At the same time, under typical assumptions, such as ellipticity of the data's probability distribution, conformity with classical methods is shown. The constructed class of RM is a multivariate generalization of the coherent distortion RM, which possesses valuable properties for a risk manager. The framework is designed in two parts. The first part contains new computational geometry methods for high-dimensional data. The developed algorithms demonstrate the computability of the geometrical concepts used to construct the RM. These concepts bring visual clarity and simplify interpretation of the RM. The second part develops models for applying the framework to practical problems. The spectrum of applications ranges from robust portfolio selection to broader areas, such as stochastic conic optimization with risk constraints and supervised machine learning.
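
For orientation, the univariate member of this class is easy to state: sort the losses and weight them by increments of a concave distortion function g. The sketch below is illustrative of the class the thesis generalizes, not of its multivariate construction; with g(u) = min(u/(1-α), 1) it reproduces Expected Shortfall, a standard coherent distortion RM.

```python
import numpy as np

def distortion_rm(losses, g):
    """Empirical distortion risk measure: sorted losses weighted by the
    increments of the distortion g applied to the tail probability.
    A concave g yields a coherent risk measure."""
    x = np.sort(np.asarray(losses))[::-1]          # worst losses first
    n = len(x)
    u = np.arange(n + 1) / n                       # tail-probability grid
    w = g(u[1:]) - g(u[:-1])                       # distortion increments
    return float(np.dot(w, x))

alpha = 0.95
cvar_g = lambda u: np.minimum(u / (1 - alpha), 1.0)   # CVaR/ES distortion

losses = np.random.default_rng(0).standard_t(df=4, size=100_000)
print(distortion_rm(losses, cvar_g))   # ~ Expected Shortfall at 95%
```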

Relevance: 60.00%

Abstract:

Abstract: The objectives of this study were to evaluate the combined effects of soil biotic and abiotic factors on the incidence of Fusarium corn stalk rot, during four annual incorporations of two types of sewage sludge into soil in a 5-year field assay under tropical conditions, and to predict the effects of these variables on the disease. For each type of sewage sludge, the following treatments were included: control with mineral fertilization recommended for corn; control without fertilization; sewage sludge based on the nitrogen concentration that provided the same amount of nitrogen as the mineral fertilizer treatment; and sewage sludge that provided two, four, and eight times the nitrogen concentration recommended for corn. Increasing dosages of both types of sewage sludge incorporated into soil resulted in increased corn stalk rot incidence, which was negatively correlated with corn yield. A global analysis highlighted the effect of the year of the experiment, followed by the sewage sludge dosage. The type of sewage sludge did not affect disease incidence. A multiple logistic model was fitted using a stepwise procedure, based on the selection of a model that included three explanatory parameters for disease incidence: electrical conductivity, magnesium, and Fusarium population. In the selected model, the probability of higher disease incidence increased with an increase in these three explanatory parameters. When the explanatory parameters were compared, electrical conductivity had a dominant effect and was the main variable for predicting the probability distribution curves of Fusarium corn stalk rot after sewage sludge application to the soil.
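
The selected model has the standard multiple-logistic form; the sketch below fits such a model on invented data in which electrical conductivity dominates, mirroring the reported finding, and then traces a predicted probability curve against EC. Variable ranges and coefficients are illustrative only.

```python
import numpy as np
import statsmodels.api as sm

# Invented example data: electrical conductivity (dS/m), magnesium,
# log Fusarium population, and observed stalk rot (0/1).
rng = np.random.default_rng(0)
n = 120
ec = rng.uniform(0.5, 3.0, n)
mg = rng.uniform(5.0, 25.0, n)
fus = rng.uniform(2.0, 6.0, n)
logit = -6.0 + 2.0 * ec + 0.1 * mg + 0.5 * fus     # EC dominates, as reported
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)

X = sm.add_constant(np.column_stack([ec, mg, fus]))
model = sm.Logit(y, X).fit(disp=0)
print(model.params)                                # fitted coefficients

# Predicted disease probability as a function of EC, at median Mg and Fusarium
grid = sm.add_constant(np.column_stack([np.linspace(0.5, 3.0, 5),
                                        np.full(5, np.median(mg)),
                                        np.full(5, np.median(fus))]))
print(model.predict(grid))
```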

Relevance: 40.00%

Abstract:

We derive a very general expression for the survival probability and the first passage time distribution of a particle executing Brownian motion in full phase space, with an absorbing boundary condition at a point in position space, which is valid irrespective of the statistical nature of the dynamics. The expression, together with Jensen's inequality, naturally leads to a lower bound on the actual survival probability and an approximate first passage time distribution. These are expressed in terms of the position-position, velocity-velocity, and position-velocity variances. Knowledge of these variances enables one to compute a lower bound on the survival probability and consequently the first passage distribution function. As examples, we compute these for a Gaussian Markovian process and, in the case of a non-Markovian process, with an exponentially decaying friction kernel and also with a power-law friction kernel. Our analysis shows that the survival probability decays exponentially at long times, irrespective of the nature of the dynamics, with an exponent equal to the transition state rate constant.
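
Schematically, and under the assumption, made here only for illustration since the paper's exact expression is not reproduced, that the survival probability is an average of an exponential functional over realizations of the noise, the Jensen step reads:

```latex
% Assumed form, for illustration only: survival probability as an average
% of an exponential functional over realizations of the noise,
S(t) = \bigl\langle e^{-W[x(\cdot)]} \bigr\rangle .
% Convexity of the exponential (Jensen's inequality) then gives
S(t) \ge e^{-\langle W \rangle},
% where, for Gaussian dynamics, \langle W \rangle is computable from the
% two-point functions \sigma_{xx}(t), \sigma_{vv}(t), \sigma_{xv}(t):
% the position-position, velocity-velocity, and position-velocity
% variances named in the abstract.
```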

Relevance: 40.00%

Abstract:

This paper proposes a PSO-based approach to increase the probability of delivering power to any load point by identifying new investments in distribution energy systems. The statistical failure and repair data of distribution components is the main basis of the proposed methodology, which uses fuzzy-probabilistic modeling of the components' outage parameters. The fuzzy membership functions of the outage parameters of each component are based on statistical records. A Modified Discrete PSO optimization model is developed to identify adequate investments in distribution energy system components that increase the probability of delivering power to any customer in the distribution system at the minimum possible cost for the system operator. To illustrate the application of the proposed methodology, the paper includes a case study that considers a 180-bus distribution network.
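
A generic binary PSO in the Kennedy-Eberhart style conveys the mechanics, with each bit flagging one candidate investment; this is a sketch with an invented fitness, not the paper's Modified Discrete PSO or its fuzzy-probabilistic outage model.

```python
import numpy as np

def binary_pso(fitness, n_bits, n_particles=30, iters=200,
               w=0.7, c1=1.5, c2=1.5, rng=0):
    """Generic binary PSO with sigmoid velocity mapping: each bit flags one
    candidate investment; fitness scores a portfolio of investments."""
    g = np.random.default_rng(rng)
    x = g.integers(0, 2, (n_particles, n_bits))
    v = g.uniform(-1, 1, (n_particles, n_bits))
    pbest, pbest_f = x.copy(), np.array([fitness(p) for p in x])
    gbest = pbest[pbest_f.argmax()].copy()
    for _ in range(iters):
        r1, r2 = g.random(x.shape), g.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        # Map velocities to bit-flip probabilities through a sigmoid.
        x = (g.random(x.shape) < 1 / (1 + np.exp(-v))).astype(int)
        f = np.array([fitness(p) for p in x])
        improved = f > pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[pbest_f.argmax()].copy()
    return gbest, pbest_f.max()

# Toy fitness: reliability gain of reinforced components minus their cost.
gain = np.array([5.0, 1.0, 4.0, 0.5, 3.0])
cost = np.array([2.0, 2.0, 1.0, 1.0, 2.5])
best, score = binary_pso(lambda b: float(b @ (gain - cost)), n_bits=5)
print(best, score)   # selects only the cost-effective reinforcements
```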