2 resultados para Probability Distribution

em DRUM (Digital Repository at the University of Maryland)


Relevância:

60.00% 60.00%

Publicador:

Resumo:

In the past decade, systems that extract information from millions of Internet documents have become commonplace. Knowledge graphs -- structured knowledge bases that describe entities, their attributes and the relationships between them -- are a powerful tool for understanding and organizing this vast amount of information. However, a significant obstacle to knowledge graph construction is the unreliability of the extracted information, due to noise and ambiguity in the underlying data or errors made by the extraction system and the complexity of reasoning about the dependencies between these noisy extractions. My dissertation addresses these challenges by exploiting the interdependencies between facts to improve the quality of the knowledge graph in a scalable framework. I introduce a new approach called knowledge graph identification (KGI), which resolves the entities, attributes and relationships in the knowledge graph by incorporating uncertain extractions from multiple sources, entity co-references, and ontological constraints. I define a probability distribution over possible knowledge graphs and infer the most probable knowledge graph using a combination of probabilistic and logical reasoning. Such probabilistic models are frequently dismissed due to scalability concerns, but my implementation of KGI maintains tractable performance on large problems through the use of hinge-loss Markov random fields, which have a convex inference objective. This allows the inference of large knowledge graphs using 4M facts and 20M ground constraints in 2 hours. To further scale the solution, I develop a distributed approach to the KGI problem which runs in parallel across multiple machines, reducing inference time by 90%. Finally, I extend my model to the streaming setting, where a knowledge graph is continuously updated by incorporating newly extracted facts. I devise a general approach for approximately updating inference in convex probabilistic models, and quantify the approximation error by defining and bounding inference regret for online models. Together, my work retains the attractive features of probabilistic models while providing the scalability necessary for large-scale knowledge graph construction. These models have been applied on a number of real-world knowledge graph projects, including the NELL project at Carnegie Mellon and the Google Knowledge Graph.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The service of a critical infrastructure, such as a municipal wastewater treatment plant (MWWTP), is taken for granted until a flood or another low frequency, high consequence crisis brings its fragility to attention. The unique aspects of the MWWTP call for a method to quantify the flood stage-duration-frequency relationship. By developing a bivariate joint distribution model of flood stage and duration, this study adds a second dimension, time, into flood risk studies. A new parameter, inter-event time, is developed to further illustrate the effect of event separation on the frequency assessment. The method is tested on riverine, estuary and tidal sites in the Mid-Atlantic region. Equipment damage functions are characterized by linear and step damage models. The Expected Annual Damage (EAD) of the underground equipment is further estimated by the parametric joint distribution model, which is a function of both flood stage and duration, demonstrating the application of the bivariate model in risk assessment. Flood likelihood may alter due to climate change. A sensitivity analysis method is developed to assess future flood risk by estimating flood frequency under conditions of higher sea level and stream flow response to increased precipitation intensity. Scenarios based on steady and unsteady flow analysis are generated for current climate, future climate within this century, and future climate beyond this century, consistent with the WWTP planning horizons. The spatial extent of flood risk is visualized by inundation mapping and GIS-Assisted Risk Register (GARR). This research will help the stakeholders of the critical infrastructure be aware of the flood risk, vulnerability, and the inherent uncertainty.