975 results for probabilistic topic models
Abstract:
This paper presents an effective decision-making system for leak detection based on multiple generalized linear models and clustering techniques. The training data for the proposed decision system are obtained from an experimental pipeline set up as a fully operational distribution system. The system is equipped with data logging for three variables: inlet pressure, outlet pressure, and outlet flow. The experimental setup is designed so that multiple operational conditions of the distribution system, including multiple pressures and flows, can be obtained. We then show statistically that the pressure and flow variables can be used as a signature of a leak under the designed multi-operational conditions. It is then shown that detecting leakages by training and testing the proposed multi-model decision system with prior data clustering, under multi-operational conditions, produces better recognition rates than training based on a single-model approach. The decision system is further equipped with the estimation of confidence limits, and a method is proposed for using these confidence limits to obtain more robust leakage recognition results.
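A minimal sketch of the multi-model idea described above, assuming scikit-learn and hypothetical variable names (the paper's own implementation is not shown): operating conditions are pre-clustered, one linear model is fitted per cluster, and a leak is flagged when a new observation's residual exceeds that cluster's confidence limit.

```python
# Sketch only: cluster operating conditions, fit one linear model per cluster,
# and flag a leak when the residual of a new sample exceeds the cluster's limit.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# X: [inlet_pressure, outlet_pressure], y: outlet_flow (synthetic placeholder data)
X = rng.normal(size=(300, 2))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

# 1) Pre-clustering of operating conditions (multi-pressure / multi-flow regimes)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# 2) One model per cluster, plus a residual-based confidence limit (here 3 sigma)
models, limits = {}, {}
for c in range(3):
    mask = kmeans.labels_ == c
    m = LinearRegression().fit(X[mask], y[mask])
    resid = y[mask] - m.predict(X[mask])
    models[c], limits[c] = m, 3.0 * resid.std()

def is_leak(x_new, flow_new):
    """Flag a leak when the observed flow deviates beyond the cluster's limit."""
    c = int(kmeans.predict(x_new.reshape(1, -1))[0])
    resid = flow_new - models[c].predict(x_new.reshape(1, -1))[0]
    return abs(resid) > limits[c]

print(is_leak(np.array([0.5, -0.2]), flow_new=5.0))  # large deviation -> True
```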
Abstract:
The Models@run.time (MRT) workshop series offers a discussion forum for the rising need to leverage modeling techniques at runtime for the software of the future. MRT has become a mature research topic, which is reflected, for example, in dedicated conference sessions covering MRT approaches only. The target venues of the workshop's audience have shifted from workshops to conferences. Hence, new topics in the area of MRT need to be identified that are not yet mature enough for conferences. Consequently, the main goal of this edition was to reflect on the past decade of the workshop's history and to identify new future directions for the workshop.
Abstract:
2000 Mathematics Subject Classification: 78A50
Abstract:
Topic classification (TC) of short text messages offers an effective and fast way to reveal events happening around the world, ranging from those related to Disaster (e.g. Hurricane Sandy) to those related to Violence (e.g. the Egyptian revolution). Previous approaches to TC have mostly focused on exploiting individual knowledge sources (KSs) (e.g. DBpedia or Freebase) without considering the graph structures that surround concepts present in KSs when detecting the topics of tweets. In this paper we introduce a novel approach for harnessing such graph structures from multiple linked KSs by: (i) building a conceptual representation of the KSs, (ii) leveraging contextual information about concepts by exploiting semantic concept graphs, and (iii) providing a principled way to combine KSs. Experiments evaluating our TC classifier in the context of Violence detection (VD) and Emergency Response (ER) show promising results that significantly outperform various baseline models, including an approach using a single KS without linked data and an approach using only tweets.
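A hedged sketch of the concept-graph enrichment step, with toy graphs and labels standing in for the linked knowledge sources used in the paper: each tweet's concepts are expanded with their graph neighbours from every knowledge source before a standard classifier is trained on the combined bag of concepts.

```python
# Toy illustration only: per-knowledge-source concept graphs are used to expand
# tweet concepts, and a plain classifier is trained on the enriched text.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical concept graphs for two linked knowledge sources (adjacency as dicts)
concept_graphs = {
    "dbpedia":  {"hurricane": ["storm", "disaster"], "protest": ["violence"]},
    "freebase": {"hurricane": ["flooding"], "protest": ["riot", "unrest"]},
}

def enrich(tweet_concepts):
    """Add graph neighbours from every knowledge source to the tweet's concepts."""
    expanded = list(tweet_concepts)
    for graph in concept_graphs.values():
        for c in tweet_concepts:
            expanded.extend(graph.get(c, []))
    return " ".join(expanded)

tweets = [["hurricane"], ["protest"], ["hurricane", "flooding"], ["protest", "riot"]]
labels = ["disaster", "violence", "disaster", "violence"]

clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit([enrich(t) for t in tweets], labels)
print(clf.predict([enrich(["protest"])]))  # -> ['violence']
```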
Abstract:
An investigation into karst hazard in southern Ontario has been undertaken with the intention of leading to the development of predictive karst models for this region. The reason such models are not currently feasible is a lack of sufficient karst data, though this is not entirely due to a lack of karst features. Geophysical data were collected at Lake on the Mountain, Ontario, as part of this karst investigation, in order to test the long-standing hypothesis that Lake on the Mountain was formed by a sinkhole collapse. Sub-bottom acoustic profiling data were collected in order to image the lake-bottom sediments and bedrock. Vertical bedrock features interpreted as solutionally enlarged fractures were taken as evidence for karst processes on the lake bottom. Additionally, the bedrock topography shows a narrower and more elongated basin than was previously identified, lying parallel to a mapped fault system in the area. This suggests that Lake on the Mountain formed over a fault zone, which also supports the sinkhole hypothesis, as a fault zone would provide groundwater pathways for karst dissolution to occur. Previous sediment cores suggest that Lake on the Mountain would have formed at some point during the Wisconsinan glaciation, with glacial meltwater and glacial loading as potential contributing factors to sinkhole development. A probabilistic karst model for the state of Kentucky, USA, has been generated using the Weights of Evidence method. This model is presented as an example of the predictive capabilities of this kind of data-driven modelling technique and to show how such models could be applied to karst in Ontario. The model was able to classify 70% of the validation dataset correctly while minimizing false positive identifications; this is moderately successful and could be improved. Finally, suggestions for improving the current karst model of southern Ontario are offered, with the goal of increasing investigation into karst in Ontario and streamlining the reporting system for sinkholes, caves, and other karst features so as to improve the current Ontario karst database.
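The Weights of Evidence calculation itself is standard; the sketch below shows it in its usual form with illustrative counts (not the thesis's Kentucky data): for one evidential layer, the positive and negative weights and their contrast are computed from the co-occurrence of the layer with known karst features.

```python
# Standard Weights of Evidence weights for one evidential layer (toy counts).
import math

def weights_of_evidence(n_feature_and_karst, n_feature, n_karst, n_total):
    """Return (W+, W-, contrast) for one evidential layer."""
    # Conditional probabilities of observing the evidence given karst / no karst
    p_b_given_d     = n_feature_and_karst / n_karst
    p_b_given_not_d = (n_feature - n_feature_and_karst) / (n_total - n_karst)
    w_plus  = math.log(p_b_given_d / p_b_given_not_d)
    w_minus = math.log((1 - p_b_given_d) / (1 - p_b_given_not_d))
    return w_plus, w_minus, w_plus - w_minus

# e.g. 80 of 100 known karst cells lie on a carbonate unit covering 2000 of 10000 cells
print(weights_of_evidence(80, 2000, 100, 10000))  # ~ (1.42, -1.39, 2.81)
```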
Abstract:
One of the important problems in machine learning is determining the complexity of the model to be learned. Too much complexity leads to overfitting, which corresponds to finding structures that do not actually exist in the data, while too little complexity leads to underfitting, meaning that the expressiveness of the model is insufficient to capture all the structures present in the data. For some probabilistic models, model complexity translates into the introduction of one or more latent variables whose role is to explain the generative process of the data. Various approaches exist for identifying the appropriate number of latent variables in a model. This thesis focuses on Bayesian nonparametric methods for determining the number of latent variables to use as well as their dimensionality. The popularization of Bayesian nonparametric statistics within the machine learning community is fairly recent. Their main appeal is that they provide highly flexible models whose complexity adjusts in proportion to the amount of available data. In recent years, research on Bayesian nonparametric learning methods has focused on three main aspects: the construction of new models, the development of inference algorithms, and applications. This thesis presents our contributions to these three research topics in the context of learning latent-variable models. First, we introduce the Pitman-Yor process mixture of Gaussians, a model for learning infinite mixtures of Gaussians. We also present an inference algorithm for discovering the latent components of the model, which we evaluate on two concrete robotics applications. Our results show that the proposed approach outperforms classical learning approaches in both performance and flexibility. Second, we propose the extended cascading Indian buffet process, a model serving as a prior probability distribution over the space of directed acyclic graphs. In the context of Bayesian networks, this prior makes it possible to identify both the presence of latent variables and the network structure among them. A Markov chain Monte Carlo inference algorithm is used for the evaluation on structure identification and density estimation problems. Finally, we propose the Indian chefs process, a model more general than the extended cascading Indian buffet process for learning graphs and orders. The advantage of the new model is that it allows connections among the observable variables and takes the order of the variables into account. We present a reversible-jump Markov chain Monte Carlo inference algorithm for jointly learning graphs and orders. The evaluation is carried out on density estimation and independence testing problems. This model is the first Bayesian nonparametric model capable of learning Bayesian networks with a completely arbitrary structure.
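A minimal generative sketch of the first model mentioned above, using a truncated stick-breaking construction of the Pitman-Yor process (the thesis's inference algorithm is not reproduced here); all numbers are illustrative.

```python
# Truncated Pitman-Yor stick-breaking weights and samples from the resulting
# approximation of an infinite mixture of Gaussians (generative sketch only).
import numpy as np

def pitman_yor_weights(d, alpha, truncation, rng):
    """Stick-breaking weights: V_k ~ Beta(1 - d, alpha + k*d), k = 1..truncation."""
    betas = rng.beta(1.0 - d, alpha + d * np.arange(1, truncation + 1))
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

rng = np.random.default_rng(1)
w = pitman_yor_weights(d=0.3, alpha=1.0, truncation=50, rng=rng)
means = rng.normal(0.0, 5.0, size=50)            # one Gaussian mean per component
z = rng.choice(50, size=1000, p=w / w.sum())     # component assignments
x = rng.normal(means[z], 1.0)                    # observations from the mixture
print("components actually used:", len(np.unique(z)))
```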
Abstract:
This work provides a holistic investigation into feature modeling within software product lines. It identifies limitations and challenges in current feature modeling approaches. These limitations include, but are not limited to, the lack of a satisfactory cognitive presentation, poor suitability for scalable systems, inflexibility in adapting to change, the absence of predictability of model behavior, and the lack of probabilistic quantification of a model's implications and of decision support for reasoning under uncertainty. The work in this thesis addresses these challenges by proposing a series of solutions. The first is the construction of a Bayesian Belief Feature Model, a novel modeling approach capable of quantifying the uncertainty in model parameters by incorporating probabilistic modeling into a conventional modeling approach. The Bayesian Belief Feature Model presents a new, enhanced feature modeling approach in terms of truth quantification and visual expressiveness. The second solution addresses the unclear support for reasoning under uncertainty and the challenging constraint satisfaction problem in software product lines. This has been done through the development of a mathematical reasoner designed to satisfy the model constraints by considering probability weights for all involved parameters and quantifying the actual implications of the problem constraints. The developed Uncertain Constraint Satisfaction Problem approach has been tested and validated through a set of designated experiments. In summary, the main contributions of this thesis include the following:
• Develop a framework for probabilistic graphical modeling to build the proposed Bayesian Belief Feature Model.
• Extend the model to enhance visual expressiveness through the integration of colour-degree variation, in which the colour varies with respect to the predefined probabilistic weights.
• Enhance constraint satisfaction by measuring the uncertainty of the parameters' truth assumptions.
• Validate the developed approach against different experimental settings to determine its functionality and performance.
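A toy illustration, under assumed structure and probabilities, of what a probabilistically quantified feature model computes: each child feature carries a conditional probability of selection given its parent, and marginal selection probabilities are propagated down the tree by simple enumeration. Feature names and weights are hypothetical.

```python
# Hypothetical two-level feature tree with probabilistic weights (sketch only).
parent_prob = {"Security": 0.9}                      # P(parent feature selected)
child_given_parent = {                               # P(child | parent selected)
    "Encryption":     ("Security", 0.8),
    "Authentication": ("Security", 0.6),
}

def marginal(child):
    """P(child selected) = P(child | parent selected) * P(parent selected)."""
    parent, p_cond = child_given_parent[child]
    return p_cond * parent_prob[parent]

for f in child_given_parent:
    print(f, round(marginal(f), 3))   # Encryption 0.72, Authentication 0.54
```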
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-08
Abstract:
Visual recognition is a fundamental research topic in computer vision. This dissertation explores datasets, features, learning, and models used for visual recognition. In order to train visual models and evaluate different recognition algorithms, this dissertation develops an approach to collect object image datasets from web pages using an analysis of the text around the image and of the image's appearance. This method exploits established online knowledge resources (Wikipedia pages for text; Flickr and Caltech datasets for images), which provide rich text and object appearance information. This dissertation describes results on two datasets. The first is Berg's collection of 10 animal categories; on this dataset, we significantly outperform previous approaches. On an additional set of 5 categories, experimental results show the effectiveness of the method. Images are represented as features for visual recognition. This dissertation introduces a text-based image feature and demonstrates that it consistently improves performance on hard object classification problems. The feature is built using an auxiliary dataset of images annotated with tags, downloaded from the Internet. Image tags are noisy. The method obtains the text features of an unannotated image from the tags of its k nearest neighbors in this auxiliary collection. A visual classifier presented with an object viewed under novel circumstances (say, a new viewing direction) must rely on its visual examples, whereas this text feature may not change, because the auxiliary dataset likely contains a similar picture. While the tags associated with images are noisy, they are more stable when appearance changes. The performance of this feature is tested using the PASCAL VOC 2006 and 2007 datasets. This feature performs well; it consistently improves the performance of visual object classifiers, and is particularly effective when the training dataset is small. As more and more training data are collected, computational cost becomes a bottleneck, especially when training sophisticated classifiers such as kernelized SVMs. This dissertation proposes a fast training algorithm called the Stochastic Intersection Kernel Machine (SIKMA). This training method will be useful for many vision problems, as it can produce a kernel classifier that is more accurate than a linear classifier, and can be trained on tens of thousands of examples in two minutes. It processes training examples one by one in a sequence, so memory cost is no longer the bottleneck when processing large-scale datasets. This dissertation applies this approach to train classifiers of Flickr groups with many training examples per group. The resulting Flickr group prediction scores can be used to measure the similarity between two images. Experimental results on the Corel dataset and a PASCAL VOC dataset show that the learned Flickr features perform better on image matching, retrieval, and classification than conventional visual features. Visual models are usually trained to best separate positive and negative training examples. However, when recognizing a large number of object categories, there may not be enough training examples for most objects, due to the intrinsic long-tailed distribution of objects in the real world. This dissertation proposes an approach to use comparative object similarity. The key insight is that, given a set of object categories which are similar and a set of categories which are dissimilar, a good object model should respond more strongly to examples from similar categories than to examples from dissimilar categories. This dissertation develops a regularized kernel machine algorithm to use this category-dependent similarity regularization. Experiments on hundreds of categories show that our method can make significant improvements for categories with few or even no positive examples.
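A minimal sketch of the text-feature construction described above, with synthetic visual features and tags standing in for the auxiliary Flickr-style collection: the tags of an unannotated image's k nearest visual neighbours are averaged into a soft tag histogram.

```python
# Sketch only (hypothetical data): build a text feature for an unannotated image
# from the tags of its k nearest neighbours in an auxiliary tagged collection.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
aux_visual = rng.normal(size=(500, 64))                 # visual features of tagged images
vocab = ["dog", "cat", "car", "beach", "tree"]
aux_tags = rng.integers(0, 2, size=(500, len(vocab)))   # noisy binary tag matrix

knn = NearestNeighbors(n_neighbors=10).fit(aux_visual)

def text_feature(visual_feat):
    """Average the (noisy) tag vectors of the k nearest tagged neighbours."""
    _, idx = knn.kneighbors(visual_feat.reshape(1, -1))
    return aux_tags[idx[0]].mean(axis=0)                # soft tag histogram, length |vocab|

query = rng.normal(size=64)                             # unannotated test image
print(dict(zip(vocab, np.round(text_feature(query), 2))))
```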
Abstract:
People, animals, and the environment can be exposed to multiple chemicals at once from a variety of sources, but current risk assessment is usually carried out for one chemical substance at a time. In human health risk assessment, ingestion of food is considered a major route of exposure to many contaminants, namely mycotoxins, a wide group of fungal secondary metabolites that are known to potentially cause toxic and carcinogenic outcomes. Mycotoxins are commonly found in a variety of foods, including those intended for consumption by infants and young children, and have been found in processed cereal-based foods available on the Portuguese market. The use of mathematical models, including probabilistic approaches using Monte Carlo simulations, is a prominent issue in human health risk assessment in general and in mycotoxin exposure assessment in particular. The present study aims to characterize, for the first time, the risk associated with the exposure of Portuguese children to single and multiple mycotoxins present in processed cereal-based foods (CBF). Food consumption data for Portuguese children (0-3 years old, n=103) were collected using a 3-day food diary. Contamination data, concerning the quantification of 12 mycotoxins (aflatoxins, ochratoxin A, fumonisins, and trichothecenes), were obtained for 20 CBF samples marketed in 2014 and 2015 in Lisbon; the samples were analyzed by HPLC-FLD, LC-MS/MS, and GC-MS. Daily exposure of children to mycotoxins was estimated using deterministic and probabilistic approaches, and different strategies were used to treat the left-censored data. For aflatoxins, as carcinogenic compounds, the margin of exposure (MoE) was calculated as the ratio of the BMDL (benchmark dose lower confidence limit) to the aflatoxin exposure; the magnitude of the MoE gives an indication of the risk level. For the remaining mycotoxins, the estimated exposure was compared with the tolerable daily intake (TDI) in order to calculate hazard quotients (HQ, the ratio between exposure and a reference dose). For the cumulative risk assessment of multiple mycotoxins, the concentration addition (CA) concept was used, and the combined margin of exposure (MoET) and the hazard index (HI) were calculated for aflatoxins and the remaining mycotoxins, respectively. Of the analyzed CBF samples, 71% were contaminated with mycotoxins (at values below the legal limits), and approximately 56% of the studied children consumed CBF at least once during the 3 days. Preliminary results showed that children's exposure to single mycotoxins present in CBF was below the TDI. The aflatoxin MoE and MoET revealed a reduced potential risk from exposure through consumption of CBF (with values around 10000 or more). HQ and HI values for the remaining mycotoxins were below 1. Children are a particularly vulnerable population group with respect to food contaminants, and the present results point to an urgent need to establish legal limits and control strategies regarding the presence of multiple mycotoxins in children's foods in order to protect their health. The development of packaging materials with antifungal properties is a possible solution to control the growth of moulds and consequently to reduce mycotoxin production, contributing to guaranteeing the quality and safety of foods intended for children's consumption.
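The risk metrics named above reduce to simple arithmetic; the sketch below runs them through a toy Monte Carlo exposure model with illustrative values for body weight, intake, contamination, BMDL, and TDI (none of these are the study's data).

```python
# Illustrative Monte Carlo exposure and the MoE / HQ / HI metrics (toy values).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
body_weight   = rng.normal(12.0, 2.0, n).clip(6, 20)        # kg, young children
intake        = rng.lognormal(np.log(30.0), 0.5, n)         # g CBF per day
concentration = rng.lognormal(np.log(0.1), 1.0, n)          # µg toxin per kg food

exposure = concentration * intake / 1000.0 / body_weight    # µg/kg bw/day

# Carcinogen (aflatoxin-style): margin of exposure against a BMDL (illustrative)
BMDL = 0.4                                                  # µg/kg bw/day
moe = BMDL / exposure
print("P5 of MoE:", np.percentile(moe, 5))                  # concern if well below 10000

# Threshold-type mycotoxins: hazard quotients against TDIs, summed into a hazard index
TDI = {"ochratoxin_A": 0.017, "deoxynivalenol": 1.0}        # µg/kg bw/day (illustrative)
hq = {m: exposure / tdi for m, tdi in TDI.items()}          # same exposure reused as a stand-in
hi = sum(hq.values())
print("P95 of HI:", np.percentile(hi, 95))                  # concern if > 1
```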
Abstract:
Our research has shown that schedules can be built mimicking a human scheduler by using a set of rules that involve domain knowledge. This chapter presents a Bayesian Optimization Algorithm (BOA) for the nurse scheduling problem that chooses suitable scheduling rules from such a set for each nurse's assignment. Based on the idea of using probabilistic models, the BOA builds a Bayesian network over the set of promising solutions and samples this network to generate new candidate solutions. Computational results from 52 real data instances demonstrate the success of this approach. It is also suggested that the learning mechanism in the proposed algorithm may be suitable for other scheduling problems.
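A simplified stand-in for the approach, shown only to make the loop concrete: instead of the full Bayesian network of the BOA, a univariate probability model over rules per nurse is re-estimated from the promising solutions and sampled for new candidates (toy fitness, hypothetical problem sizes).

```python
# Univariate estimation-of-distribution loop as a simplified stand-in for the BOA.
import numpy as np

rng = np.random.default_rng(0)
n_nurses, n_rules, pop_size = 20, 4, 60
target = rng.integers(0, n_rules, n_nurses)      # toy "ideal" rule per nurse

def fitness(sol):                                # toy objective: match the target
    return (sol == target).sum()

# Start from uniform rule probabilities per nurse
probs = np.full((n_nurses, n_rules), 1.0 / n_rules)
for _ in range(40):
    # Sample a population of candidate rule assignments from the current model
    pop = np.stack([[rng.choice(n_rules, p=probs[i]) for i in range(n_nurses)]
                    for _ in range(pop_size)])
    best = pop[np.argsort([fitness(s) for s in pop])[-pop_size // 3:]]  # promising set
    # Re-estimate the probability model from the promising solutions (light smoothing)
    for i in range(n_nurses):
        counts = np.bincount(best[:, i], minlength=n_rules) + 0.5
        probs[i] = counts / counts.sum()

print("best schedule fitness:", max(fitness(s) for s in pop), "of", n_nurses)
```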
Abstract:
In this thesis, we present a quantitative approach using probabilistic verification techniques for the analysis of the reliability, availability, maintainability, and safety (RAMS) properties of satellite systems. The subject of our research is satellites used in mission-critical industrial applications. Our verification results make a strong case for using probabilistic model checking to support RAMS analysis of satellite systems. This study is intended to build a foundation that helps reliability engineers with a basic background in model checking to apply probabilistic model checking to small satellite systems. We make two major contributions. The first is the application of RAMS analysis to satellite systems. In the past, RAMS analysis has been extensively applied in the field of electrical and electronics engineering; it allows system designers and reliability engineers to predict the likelihood of failures from historical or current operational data. There is high potential for the application of RAMS analysis in the field of space science and engineering; however, there is a lack of standardisation and of suitable procedures for the correct study of the RAMS characteristics of satellite systems. This thesis considers the promising application of RAMS analysis to satellite design, use, and maintenance, focusing on the system segments. Data collection and verification procedures are discussed, and a number of considerations are also presented on how to predict the probability of failure. Our second contribution is leveraging the power of probabilistic model checking to analyse satellite systems. We present techniques for analysing satellite systems that differ from the more common quantitative approaches based on traditional simulation and testing; these techniques have not been applied in this context before. We present the use of probabilistic techniques via a suite of detailed examples, together with their analysis. The presentation is incremental in terms of the complexity of the application domains and system models, and includes a detailed PRISM model of each scenario. We also provide results from practical work, together with a discussion of future improvements.
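As a hedged illustration of the kind of RAMS measure a probabilistic model checker computes, the sketch below solves a two-state repairable-component continuous-time Markov chain in plain Python for its steady-state availability; it is not a PRISM model, and the rates are illustrative.

```python
# Steady-state availability of a repairable component modelled as a two-state CTMC
# (plain-Python stand-in for the quantity a model checker would verify).
import numpy as np

lam, mu = 1e-4, 1e-2          # failure rate, repair rate (per hour; illustrative)

# Generator matrix for states [UP, DOWN]
Q = np.array([[-lam,  lam],
              [  mu,  -mu]])

# Solve pi Q = 0 with sum(pi) = 1 for the steady-state distribution
A = np.vstack([Q.T, np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print("steady-state availability:", pi[0])   # ~ 0.9901
print("closed form mu/(lam+mu): ", mu / (lam + mu))
```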
Abstract:
Investigation of large, destructive earthquakes is challenged by their infrequent occurrence and the remote nature of geophysical observations. This thesis sheds light on the source processes of large earthquakes from two perspectives: robust and quantitative observational constraints through Bayesian inference for earthquake source models, and physical insights on the interconnections of seismic and aseismic fault behavior from elastodynamic modeling of earthquake ruptures and aseismic processes.
To constrain the shallow deformation during megathrust events, we develop semi-analytical and numerical Bayesian approaches to explore the maximum resolution of the tsunami data, with a focus on incorporating the uncertainty in the forward modeling. These methodologies are then applied to invert for the coseismic seafloor displacement field in the 2011 Mw 9.0 Tohoku-Oki earthquake using near-field tsunami waveforms and for the coseismic fault slip models in the 2010 Mw 8.8 Maule earthquake with complementary tsunami and geodetic observations. From posterior estimates of model parameters and their uncertainties, we are able to quantitatively constrain the near-trench profiles of seafloor displacement and fault slip. Similar characteristic patterns emerge during both events, featuring the peak of uplift near the edge of the accretionary wedge with a decay toward the trench axis, with implications for fault failure and tsunamigenic mechanisms of megathrust earthquakes.
To understand the behavior of earthquakes at the base of the seismogenic zone on continental strike-slip faults, we simulate the interactions of dynamic earthquake rupture, aseismic slip, and heterogeneity in rate-and-state fault models coupled with shear heating. Our study explains the long-standing enigma of seismic quiescence on major fault segments known to have hosted large earthquakes by deeper penetration of large earthquakes below the seismogenic zone, where mature faults have well-localized creeping extensions. This conclusion is supported by the simulated relationship between seismicity and large earthquakes as well as by observations from recent large events. We also use the modeling to connect the geodetic observables of fault locking with the behavior of seismicity in numerical models, investigating how a combination of interseismic geodetic and seismological estimates could constrain the locked-creeping transition of faults and potentially their co- and post-seismic behavior.
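The generic linear Gaussian Bayesian inversion underlying such slip and displacement estimates can be written in a few lines; the sketch below uses a synthetic forward operator and data, not the thesis's tsunami or geodetic Green's functions.

```python
# Linear Gaussian Bayesian inversion: posterior mean and covariance of model
# parameters m given d = G m + noise, with a zero-mean Gaussian prior (sketch only).
import numpy as np

rng = np.random.default_rng(0)
n_data, n_model = 40, 10
G = rng.normal(size=(n_data, n_model))           # synthetic forward operator
m_true = rng.normal(size=n_model)
sigma_d, sigma_m = 0.1, 1.0                      # data noise and prior standard deviations
d = G @ m_true + rng.normal(scale=sigma_d, size=n_data)

# C_post = (G^T C_d^-1 G + C_m^-1)^-1,  m_post = C_post G^T C_d^-1 d
C_post = np.linalg.inv(G.T @ G / sigma_d**2 + np.eye(n_model) / sigma_m**2)
m_post = C_post @ G.T @ d / sigma_d**2

print("posterior mean error:", np.linalg.norm(m_post - m_true))
print("1-sigma uncertainties:", np.sqrt(np.diag(C_post)))
```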
Abstract:
The U.S. Nuclear Regulatory Commission implemented a safety goal policy in response to the 1979 Three Mile Island accident. This policy addresses the question “How safe is safe enough?” by specifying quantitative health objectives (QHOs) for comparison with results from nuclear power plant (NPP) probabilistic risk analyses (PRAs) to determine whether proposed regulatory actions are justified based on potential safety benefit. Lessons learned from recent operating experience—including the 2011 Fukushima accident—indicate that accidents involving multiple units at a shared site can occur with non-negligible frequency. Yet risk contributions from such scenarios are excluded by policy from safety goal evaluations—even for the nearly 60% of U.S. NPP sites that include multiple units. This research develops and applies methods for estimating risk metrics for comparison with safety goal QHOs using models from state-of-the-art consequence analyses to evaluate the effect of including multi-unit accident risk contributions in safety goal evaluations.
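The risk metric in question is a frequency-weighted sum over release scenarios compared against a QHO; the arithmetic below is purely illustrative, with made-up frequencies, consequences, and limit.

```python
# Illustrative arithmetic only: individual early-fatality risk with and without
# a multi-unit scenario, compared against a hypothetical QHO limit.
scenarios = {
    # name: (frequency per year, conditional early-fatality probability)
    "single_unit_release": (2e-6, 1e-2),
    "multi_unit_release":  (5e-7, 3e-2),   # the kind of contribution excluded by policy
}

qho_limit = 5e-7   # illustrative QHO value

risk_single_only = scenarios["single_unit_release"][0] * scenarios["single_unit_release"][1]
risk_all = sum(f * p for f, p in scenarios.values())

print(f"single-unit only: {risk_single_only:.2e}  (vs QHO {qho_limit:.0e})")
print(f"with multi-unit:  {risk_all:.2e}")
```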
Abstract:
The topic of this Ph.D. project is the modelling of the soil-water dynamics inside an instrumented embankment section along the Secchia River (Cavezzo, MO) in the period from 2017 to 2018 and the quantification of the performance of the direct and inverse simulations. The commercial code Hydrus 2D by PC-Progress was chosen to run the direct simulations. Different soil-hydraulic models have been adopted and compared. The parameters of the different hydraulic models are calibrated using a local optimization method based on the Levenberg-Marquardt algorithm implemented in the Hydrus package. The calibration programme is carried out using different datasets of observation points, different weighting distributions, different combinations of optimized parameters, and different initial parameter sets. The final goal is an in-depth study of the potential and limits of inverse analysis when applied to a complex geotechnical problem such as the case study. The second part of the research focuses on the effects of plant roots and soil-vegetation-atmosphere interaction on the spatial and temporal distribution of pore water pressure in soil. The investigated soil belongs to the West Charlestown Bypass embankment, Newcastle, Australia, which has shown shallow instabilities in past years; long-stem planting is intended to stabilize the slope. The chosen plant species is Melaleuca styphelioides, native to eastern Australia. The research activity included the design and realization of a specific large-scale apparatus for laboratory experiments. Local suction measurements at certain intervals of depth and radial distance from the root bulb are recorded within the vegetated soil mass under controlled boundary conditions. The experiments are then reproduced numerically using the commercial code Hydrus 2D. Laboratory data are used to calibrate the root water uptake (RWU) parameters and the parameters of the hydraulic model.
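A minimal sketch of the same kind of inverse problem outside Hydrus, assuming SciPy: Levenberg-Marquardt calibration of van Genuchten retention parameters against synthetic suction/water-content observations.

```python
# Levenberg-Marquardt fit of van Genuchten retention parameters to synthetic data
# (sketch of the inverse-analysis idea, not the Hydrus 2D workflow).
import numpy as np
from scipy.optimize import least_squares

def van_genuchten(h, theta_r, theta_s, alpha, n):
    """Water content as a function of suction head h (positive, in cm)."""
    m = 1.0 - 1.0 / n
    return theta_r + (theta_s - theta_r) / (1.0 + (alpha * h) ** n) ** m

# Synthetic "observations" generated from known parameters plus noise
rng = np.random.default_rng(0)
h_obs = np.logspace(0, 4, 30)                       # suction heads, 1 to 10000 cm
true_p = (0.08, 0.43, 0.036, 1.56)                  # loam-like parameter values
theta_obs = van_genuchten(h_obs, *true_p) + rng.normal(0, 0.005, h_obs.size)

def residuals(p):
    return van_genuchten(h_obs, *p) - theta_obs

fit = least_squares(residuals, x0=(0.05, 0.40, 0.01, 1.3), method="lm")
print("calibrated parameters:", np.round(fit.x, 3))
```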