2 resultados para approximation error
em DRUM (Digital Repository at the University of Maryland)
Resumo:
In the past decade, systems that extract information from millions of Internet documents have become commonplace. Knowledge graphs -- structured knowledge bases that describe entities, their attributes and the relationships between them -- are a powerful tool for understanding and organizing this vast amount of information. However, a significant obstacle to knowledge graph construction is the unreliability of the extracted information, due to noise and ambiguity in the underlying data or errors made by the extraction system and the complexity of reasoning about the dependencies between these noisy extractions. My dissertation addresses these challenges by exploiting the interdependencies between facts to improve the quality of the knowledge graph in a scalable framework. I introduce a new approach called knowledge graph identification (KGI), which resolves the entities, attributes and relationships in the knowledge graph by incorporating uncertain extractions from multiple sources, entity co-references, and ontological constraints. I define a probability distribution over possible knowledge graphs and infer the most probable knowledge graph using a combination of probabilistic and logical reasoning. Such probabilistic models are frequently dismissed due to scalability concerns, but my implementation of KGI maintains tractable performance on large problems through the use of hinge-loss Markov random fields, which have a convex inference objective. This allows the inference of large knowledge graphs using 4M facts and 20M ground constraints in 2 hours. To further scale the solution, I develop a distributed approach to the KGI problem which runs in parallel across multiple machines, reducing inference time by 90%. Finally, I extend my model to the streaming setting, where a knowledge graph is continuously updated by incorporating newly extracted facts. I devise a general approach for approximately updating inference in convex probabilistic models, and quantify the approximation error by defining and bounding inference regret for online models. Together, my work retains the attractive features of probabilistic models while providing the scalability necessary for large-scale knowledge graph construction. These models have been applied on a number of real-world knowledge graph projects, including the NELL project at Carnegie Mellon and the Google Knowledge Graph.
Resumo:
We present a detailed analysis of the application of a multi-scale Hierarchical Reconstruction method for solving a family of ill-posed linear inverse problems. When the observations on the unknown quantity of interest and the observation operators are known, these inverse problems are concerned with the recovery of the unknown from its observations. Although the observation operators we consider are linear, they are inevitably ill-posed in various ways. We recall in this context the classical Tikhonov regularization method with a stabilizing function which targets the specific ill-posedness from the observation operators and preserves desired features of the unknown. Having studied the mechanism of the Tikhonov regularization, we propose a multi-scale generalization to the Tikhonov regularization method, so-called the Hierarchical Reconstruction (HR) method. First introduction of the HR method can be traced back to the Hierarchical Decomposition method in Image Processing. The HR method successively extracts information from the previous hierarchical residual to the current hierarchical term at a finer hierarchical scale. As the sum of all the hierarchical terms, the hierarchical sum from the HR method provides an reasonable approximate solution to the unknown, when the observation matrix satisfies certain conditions with specific stabilizing functions. When compared to the Tikhonov regularization method on solving the same inverse problems, the HR method is shown to be able to decrease the total number of iterations, reduce the approximation error, and offer self control of the approximation distance between the hierarchical sum and the unknown, thanks to using a ladder of finitely many hierarchical scales. We report numerical experiments supporting our claims on these advantages the HR method has over the Tikhonov regularization method.