932 resultados para Probabilistic latent semantic model
Resumo:
This work provides a holistic investigation into the realm of feature modeling within software product lines. The work presented identifies limitations and challenges within the current feature modeling approaches. Those limitations include, but not limited to, the dearth of satisfactory cognitive presentation, inconveniency in scalable systems, inflexibility in adapting changes, nonexistence of predictability of models behavior, as well as the lack of probabilistic quantification of model’s implications and decision support for reasoning under uncertainty. The work in this thesis addresses these challenges by proposing a series of solutions. The first solution is the construction of a Bayesian Belief Feature Model, which is a novel modeling approach capable of quantifying the uncertainty measures in model parameters by a means of incorporating probabilistic modeling with a conventional modeling approach. The Bayesian Belief feature model presents a new enhanced feature modeling approach in terms of truth quantification and visual expressiveness. The second solution takes into consideration the unclear support for the reasoning under the uncertainty process, and the challenging constraint satisfaction problem in software product lines. This has been done through the development of a mathematical reasoner, which was designed to satisfy the model constraints by considering probability weight for all involved parameters and quantify the actual implications of the problem constraints. The developed Uncertain Constraint Satisfaction Problem approach has been tested and validated through a set of designated experiments. Profoundly stating, the main contributions of this thesis include the following: • Develop a framework for probabilistic graphical modeling to build the purported Bayesian belief feature model. • Extend the model to enhance visual expressiveness throughout the integration of colour degree variation; in which the colour varies with respect to the predefined probabilistic weights. • Enhance the constraints satisfaction problem by the uncertainty measuring of the parameters truth assumption. • Validate the developed approach against different experimental settings to determine its functionality and performance.
Resumo:
In the past decade, systems that extract information from millions of Internet documents have become commonplace. Knowledge graphs -- structured knowledge bases that describe entities, their attributes and the relationships between them -- are a powerful tool for understanding and organizing this vast amount of information. However, a significant obstacle to knowledge graph construction is the unreliability of the extracted information, due to noise and ambiguity in the underlying data or errors made by the extraction system and the complexity of reasoning about the dependencies between these noisy extractions. My dissertation addresses these challenges by exploiting the interdependencies between facts to improve the quality of the knowledge graph in a scalable framework. I introduce a new approach called knowledge graph identification (KGI), which resolves the entities, attributes and relationships in the knowledge graph by incorporating uncertain extractions from multiple sources, entity co-references, and ontological constraints. I define a probability distribution over possible knowledge graphs and infer the most probable knowledge graph using a combination of probabilistic and logical reasoning. Such probabilistic models are frequently dismissed due to scalability concerns, but my implementation of KGI maintains tractable performance on large problems through the use of hinge-loss Markov random fields, which have a convex inference objective. This allows the inference of large knowledge graphs using 4M facts and 20M ground constraints in 2 hours. To further scale the solution, I develop a distributed approach to the KGI problem which runs in parallel across multiple machines, reducing inference time by 90%. Finally, I extend my model to the streaming setting, where a knowledge graph is continuously updated by incorporating newly extracted facts. I devise a general approach for approximately updating inference in convex probabilistic models, and quantify the approximation error by defining and bounding inference regret for online models. Together, my work retains the attractive features of probabilistic models while providing the scalability necessary for large-scale knowledge graph construction. These models have been applied on a number of real-world knowledge graph projects, including the NELL project at Carnegie Mellon and the Google Knowledge Graph.
Resumo:
Edge-labeled graphs have proliferated rapidly over the last decade due to the increased popularity of social networks and the Semantic Web. In social networks, relationships between people are represented by edges and each edge is labeled with a semantic annotation. Hence, a huge single graph can express many different relationships between entities. The Semantic Web represents each single fragment of knowledge as a triple (subject, predicate, object), which is conceptually identical to an edge from subject to object labeled with predicates. A set of triples constitutes an edge-labeled graph on which knowledge inference is performed. Subgraph matching has been extensively used as a query language for patterns in the context of edge-labeled graphs. For example, in social networks, users can specify a subgraph matching query to find all people that have certain neighborhood relationships. Heavily used fragments of the SPARQL query language for the Semantic Web and graph queries of other graph DBMS can also be viewed as subgraph matching over large graphs. Though subgraph matching has been extensively studied as a query paradigm in the Semantic Web and in social networks, a user can get a large number of answers in response to a query. These answers can be shown to the user in accordance with an importance ranking. In this thesis proposal, we present four different scoring models along with scalable algorithms to find the top-k answers via a suite of intelligent pruning techniques. The suggested models consist of a practically important subset of the SPARQL query language augmented with some additional useful features. The first model called Substitution Importance Query (SIQ) identifies the top-k answers whose scores are calculated from matched vertices' properties in each answer in accordance with a user-specified notion of importance. The second model called Vertex Importance Query (VIQ) identifies important vertices in accordance with a user-defined scoring method that builds on top of various subgraphs articulated by the user. Approximate Importance Query (AIQ), our third model, allows partial and inexact matchings and returns top-k of them with a user-specified approximation terms and scoring functions. In the fourth model called Probabilistic Importance Query (PIQ), a query consists of several sub-blocks: one mandatory block that must be mapped and other blocks that can be opportunistically mapped. The probability is calculated from various aspects of answers such as the number of mapped blocks, vertices' properties in each block and so on and the most top-k probable answers are returned. An important distinguishing feature of our work is that we allow the user a huge amount of freedom in specifying: (i) what pattern and approximation he considers important, (ii) how to score answers - irrespective of whether they are vertices or substitution, and (iii) how to combine and aggregate scores generated by multiple patterns and/or multiple substitutions. Because so much power is given to the user, indexing is more challenging than in situations where additional restrictions are imposed on the queries the user can ask. The proposed algorithms for the first model can also be used for answering SPARQL queries with ORDER BY and LIMIT, and the method for the second model also works for SPARQL queries with GROUP BY, ORDER BY and LIMIT. We test our algorithms on multiple real-world graph databases, showing that our algorithms are far more efficient than popular triple stores.
Resumo:
The number of connected devices collecting and distributing real-world information through various systems, is expected to soar in the coming years. As the number of such connected devices grows, it becomes increasingly difficult to store and share all these new sources of information. Several context representation schemes try to standardize this information, but none of them have been widely adopted. In previous work we addressed this challenge, however our solution had some drawbacks: poor semantic extraction and scalability. In this paper we discuss ways to efficiently deal with representation schemes' diversity and propose a novel d-dimension organization model. Our evaluation shows that d-dimension model improves scalability and semantic extraction.
Resumo:
Markov Chain analysis was recently proposed to assess the time scales and preferential pathways into biological or physical networks by computing residence time, first passage time, rates of transfer between nodes and number of passages in a node. We propose to adapt an algorithm already published for simple systems to physical systems described with a high resolution hydrodynamic model. The method is applied to bays and estuaries on the Eastern Coast of Canada for their interest in shellfish aquaculture. Current velocities have been computed by using a 2 dimensional grid of elements and circulation patterns were summarized by averaging Eulerian flows between adjacent elements. Flows and volumes allow computing probabilities of transition between elements and to assess the average time needed by virtual particles to move from one element to another, the rate of transfer between two elements, and the average residence time of each system. We also combined transfer rates and times to assess the main pathways of virtual particles released in farmed areas and the potential influence of farmed areas on other areas. We suggest that Markov chain is complementary to other sets of ecological indicators proposed to analyse the interactions between farmed areas - e.g. depletion index, carrying capacity assessment. Markov Chain has several advantages with respect to the estimation of connectivity between pair of sites. It makes possible to estimate transfer rates and times at once in a very quick and efficient way, without the need to perform long term simulations of particle or tracer concentration.
Resumo:
In this thesis, we propose to infer pixel-level labelling in video by utilising only object category information, exploiting the intrinsic structure of video data. Our motivation is the observation that image-level labels are much more easily to be acquired than pixel-level labels, and it is natural to find a link between the image level recognition and pixel level classification in video data, which would transfer learned recognition models from one domain to the other one. To this end, this thesis proposes two domain adaptation approaches to adapt the deep convolutional neural network (CNN) image recognition model trained from labelled image data to the target domain exploiting both semantic evidence learned from CNN, and the intrinsic structures of unlabelled video data. Our proposed approaches explicitly model and compensate for the domain adaptation from the source domain to the target domain which in turn underpins a robust semantic object segmentation method for natural videos. We demonstrate the superior performance of our methods by presenting extensive evaluations on challenging datasets comparing with the state-of-the-art methods.
Resumo:
In this thesis, we present a quantitative approach using probabilistic verification techniques for the analysis of reliability, availability, maintainability, and safety (RAMS) properties of satellite systems. The subject of our research is satellites used in mission critical industrial applications. A strong case for using probabilistic model checking to support RAMS analysis of satellite systems is made by our verification results. This study is intended to build a foundation to help reliability engineers with a basic background in model checking to apply probabilistic model checking to small satellite systems. We make two major contributions. One of these is the approach of RAMS analysis to satellite systems. In the past, RAMS analysis has been extensively applied to the field of electrical and electronics engineering. It allows system designers and reliability engineers to predict the likelihood of failures from the indication of historical or current operational data. There is a high potential for the application of RAMS analysis in the field of space science and engineering. However, there is a lack of standardisation and suitable procedures for the correct study of RAMS characteristics for satellite systems. This thesis considers the promising application of RAMS analysis to the case of satellite design, use, and maintenance, focusing on its system segments. Data collection and verification procedures are discussed, and a number of considerations are also presented on how to predict the probability of failure. Our second contribution is leveraging the power of probabilistic model checking to analyse satellite systems. We present techniques for analysing satellite systems that differ from the more common quantitative approaches based on traditional simulation and testing. These techniques have not been applied in this context before. We present the use of probabilistic techniques via a suite of detailed examples, together with their analysis. Our presentation is done in an incremental manner: in terms of complexity of application domains and system models, and a detailed PRISM model of each scenario. We also provide results from practical work together with a discussion about future improvements.
Resumo:
International audience
Resumo:
Investigation of large, destructive earthquakes is challenged by their infrequent occurrence and the remote nature of geophysical observations. This thesis sheds light on the source processes of large earthquakes from two perspectives: robust and quantitative observational constraints through Bayesian inference for earthquake source models, and physical insights on the interconnections of seismic and aseismic fault behavior from elastodynamic modeling of earthquake ruptures and aseismic processes.
To constrain the shallow deformation during megathrust events, we develop semi-analytical and numerical Bayesian approaches to explore the maximum resolution of the tsunami data, with a focus on incorporating the uncertainty in the forward modeling. These methodologies are then applied to invert for the coseismic seafloor displacement field in the 2011 Mw 9.0 Tohoku-Oki earthquake using near-field tsunami waveforms and for the coseismic fault slip models in the 2010 Mw 8.8 Maule earthquake with complementary tsunami and geodetic observations. From posterior estimates of model parameters and their uncertainties, we are able to quantitatively constrain the near-trench profiles of seafloor displacement and fault slip. Similar characteristic patterns emerge during both events, featuring the peak of uplift near the edge of the accretionary wedge with a decay toward the trench axis, with implications for fault failure and tsunamigenic mechanisms of megathrust earthquakes.
To understand the behavior of earthquakes at the base of the seismogenic zone on continental strike-slip faults, we simulate the interactions of dynamic earthquake rupture, aseismic slip, and heterogeneity in rate-and-state fault models coupled with shear heating. Our study explains the long-standing enigma of seismic quiescence on major fault segments known to have hosted large earthquakes by deeper penetration of large earthquakes below the seismogenic zone, where mature faults have well-localized creeping extensions. This conclusion is supported by the simulated relationship between seismicity and large earthquakes as well as by observations from recent large events. We also use the modeling to connect the geodetic observables of fault locking with the behavior of seismicity in numerical models, investigating how a combination of interseismic geodetic and seismological estimates could constrain the locked-creeping transition of faults and potentially their co- and post-seismic behavior.
Resumo:
For climate risk management, cumulative distribution functions (CDFs) are an important source of information. They are ideally suited to compare probabilistic forecasts of primary (e.g. rainfall) or secondary data (e.g. crop yields). Summarised as CDFs, such forecasts allow an easy quantitative assessment of possible, alternative actions. Although the degree of uncertainty associated with CDF estimation could influence decisions, such information is rarely provided. Hence, we propose Cox-type regression models (CRMs) as a statistical framework for making inferences on CDFs in climate science. CRMs were designed for modelling probability distributions rather than just mean or median values. This makes the approach appealing for risk assessments where probabilities of extremes are often more informative than central tendency measures. CRMs are semi-parametric approaches originally designed for modelling risks arising from time-to-event data. Here we extend this original concept beyond time-dependent measures to other variables of interest. We also provide tools for estimating CDFs and surrounding uncertainty envelopes from empirical data. These statistical techniques intrinsically account for non-stationarities in time series that might be the result of climate change. This feature makes CRMs attractive candidates to investigate the feasibility of developing rigorous global circulation model (GCM)-CRM interfaces for provision of user-relevant forecasts. To demonstrate the applicability of CRMs, we present two examples for El Ni ? no/Southern Oscillation (ENSO)-based forecasts: the onset date of the wet season (Cairns, Australia) and total wet season rainfall (Quixeramobim, Brazil). This study emphasises the methodological aspects of CRMs rather than discussing merits or limitations of the ENSO-based predictors.
Resumo:
The ontology engineering research community has focused for many years on supporting the creation, development and evolution of ontologies. Ontology forecasting, which aims at predicting semantic changes in an ontology, represents instead a new challenge. In this paper, we want to give a contribution to this novel endeavour by focusing on the task of forecasting semantic concepts in the research domain. Indeed, ontologies representing scientific disciplines contain only research topics that are already popular enough to be selected by human experts or automatic algorithms. They are thus unfit to support tasks which require the ability of describing and exploring the forefront of research, such as trend detection and horizon scanning. We address this issue by introducing the Semantic Innovation Forecast (SIF) model, which predicts new concepts of an ontology at time t + 1, using only data available at time t. Our approach relies on lexical innovation and adoption information extracted from historical data. We evaluated the SIF model on a very large dataset consisting of over one million scientific papers belonging to the Computer Science domain: the outcomes show that the proposed approach offers a competitive boost in mean average precision-at-ten compared to the baselines when forecasting over 5 years.
Resumo:
Event extraction from texts aims to detect structured information such as what has happened, to whom, where and when. Event extraction and visualization are typically considered as two different tasks. In this paper, we propose a novel approach based on probabilistic modelling to jointly extract and visualize events from tweets where both tasks benefit from each other. We model each event as a joint distribution over named entities, a date, a location and event-related keywords. Moreover, both tweets and event instances are associated with coordinates in the visualization space. The manifold assumption that the intrinsic geometry of tweets is a low-rank, non-linear manifold within the high-dimensional space is incorporated into the learning framework using a regularization. Experimental results show that the proposed approach can effectively deal with both event extraction and visualization and performs remarkably better than both the state-of-the-art event extraction method and a pipeline approach for event extraction and visualization.
Resumo:
In restructured power systems, generation and commercialization activities became market activities, while transmission and distribution activities continue as regulated monopolies. As a result, the adequacy of transmission network should be evaluated independent of generation system. After introducing the constrained fuzzy power flow (CFPF) as a suitable tool to quantify the adequacy of transmission network to satisfy 'reasonable demands for the transmission of electricity' (as stated, for instance, at European Directive 2009/72/EC), the aim is now showing how this approach can be used in conjunction with probabilistic criteria in security analysis. In classical security analysis models of power systems are considered the composite system (generation plus transmission). The state of system components is usually modeled with probabilities and loads (and generation) are modeled by crisp numbers, probability distributions or fuzzy numbers. In the case of CFPF the component’s failure of the transmission network have been investigated. In this framework, probabilistic methods are used for failures modeling of the transmission system components and possibility models are used to deal with 'reasonable demands'. The enhanced version of the CFPF model is applied to an illustrative case.
Resumo:
Maintenance of transport infrastructure assets is widely advocated as the key in minimizing current and future costs of the transportation network. While effective maintenance decisions are often a result of engineering skills and practical knowledge, efficient decisions must also account for the net result over an asset's life-cycle. One essential aspect in the long term perspective of transport infrastructure maintenance is to proactively estimate maintenance needs. In dealing with immediate maintenance actions, support tools that can prioritize potential maintenance candidates are important to obtain an efficient maintenance strategy. This dissertation consists of five individual research papers presenting a microdata analysis approach to transport infrastructure maintenance. Microdata analysis is a multidisciplinary field in which large quantities of data is collected, analyzed, and interpreted to improve decision-making. Increased access to transport infrastructure data enables a deeper understanding of causal effects and a possibility to make predictions of future outcomes. The microdata analysis approach covers the complete process from data collection to actual decisions and is therefore well suited for the task of improving efficiency in transport infrastructure maintenance. Statistical modeling was the selected analysis method in this dissertation and provided solutions to the different problems presented in each of the five papers. In Paper I, a time-to-event model was used to estimate remaining road pavement lifetimes in Sweden. In Paper II, an extension of the model in Paper I assessed the impact of latent variables on road lifetimes; displaying the sections in a road network that are weaker due to e.g. subsoil conditions or undetected heavy traffic. The study in Paper III incorporated a probabilistic parametric distribution as a representation of road lifetimes into an equation for the marginal cost of road wear. Differentiated road wear marginal costs for heavy and light vehicles are an important information basis for decisions regarding vehicle miles traveled (VMT) taxation policies. In Paper IV, a distribution based clustering method was used to distinguish between road segments that are deteriorating and road segments that have a stationary road condition. Within railway networks, temporary speed restrictions are often imposed because of maintenance and must be addressed in order to keep punctuality. The study in Paper V evaluated the empirical effect on running time of speed restrictions on a Norwegian railway line using a generalized linear mixed model.
Resumo:
Conceptual interpretation of languages has gathered peak interest in the world of artificial intelligence. The challenge in modeling various complications involved in a language is the main motivation behind our work. Our main focus in this work is to develop conceptual graphical representation for image captions. We have used discourse representation structure to gain semantic information which is further modeled into a graphical structure. The effectiveness of the model is evaluated by a caption based image retrieval system. The image retrieval is performed by computing subgraph based similarity measures. Best retrievals were given an average rating of . ± . out of 4 by a group of 25 human judges. The experiments were performed on a subset of the SBU Captioned Photo Dataset. This purpose of this work is to establish the cognitive sensibility of the approach to caption representations