3 resultados para probability models

em DRUM (Digital Repository at the University of Maryland)


Relevância:

30.00% 30.00%

Publicador:

Resumo:

In the past decade, systems that extract information from millions of Internet documents have become commonplace. Knowledge graphs -- structured knowledge bases that describe entities, their attributes and the relationships between them -- are a powerful tool for understanding and organizing this vast amount of information. However, a significant obstacle to knowledge graph construction is the unreliability of the extracted information, due to noise and ambiguity in the underlying data or errors made by the extraction system and the complexity of reasoning about the dependencies between these noisy extractions. My dissertation addresses these challenges by exploiting the interdependencies between facts to improve the quality of the knowledge graph in a scalable framework. I introduce a new approach called knowledge graph identification (KGI), which resolves the entities, attributes and relationships in the knowledge graph by incorporating uncertain extractions from multiple sources, entity co-references, and ontological constraints. I define a probability distribution over possible knowledge graphs and infer the most probable knowledge graph using a combination of probabilistic and logical reasoning. Such probabilistic models are frequently dismissed due to scalability concerns, but my implementation of KGI maintains tractable performance on large problems through the use of hinge-loss Markov random fields, which have a convex inference objective. This allows the inference of large knowledge graphs using 4M facts and 20M ground constraints in 2 hours. To further scale the solution, I develop a distributed approach to the KGI problem which runs in parallel across multiple machines, reducing inference time by 90%. Finally, I extend my model to the streaming setting, where a knowledge graph is continuously updated by incorporating newly extracted facts. I devise a general approach for approximately updating inference in convex probabilistic models, and quantify the approximation error by defining and bounding inference regret for online models. Together, my work retains the attractive features of probabilistic models while providing the scalability necessary for large-scale knowledge graph construction. These models have been applied on a number of real-world knowledge graph projects, including the NELL project at Carnegie Mellon and the Google Knowledge Graph.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The American woodcock (Scolopax minor) population index in North America has declined 0.9% a year since 1968 prompting managers to identify priority information and management needs for the species (Sauer et al 2008). Managers identified a need for a population model that better informs on the status of American woodcock populations (Case et al. 2010). Population reconstruction techniques use long-term age-at-harvest data and harvest effort to estimate abundances with error estimates. Four new models were successfully developed using survey data (1999 to 2013). The optimal model estimates sex specific harvest probability for adult females at 0.148 (SE = 0.017) and all other age-sex cohorts at 0.082 (SE = 0.008) for the most current year 2013. The model estimated a yearly survival rate of 0.528 (SE = 0.008). Total abundance ranged from 5,206,000 woodcock in 2007 to 6,075,800 woodcock in 1999. This study represents the first population estimates of woodcock populations.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Secure computation involves multiple parties computing a common function while keeping their inputs private, and is a growing field of cryptography due to its potential for maintaining privacy guarantees in real-world applications. However, current secure computation protocols are not yet efficient enough to be used in practice. We argue that this is due to much of the research effort being focused on generality rather than specificity. Namely, current research tends to focus on constructing and improving protocols for the strongest notions of security or for an arbitrary number of parties. However, in real-world deployments, these security notions are often too strong, or the number of parties running a protocol would be smaller. In this thesis we make several steps towards bridging the efficiency gap of secure computation by focusing on constructing efficient protocols for specific real-world settings and security models. In particular, we make the following four contributions: - We show an efficient (when amortized over multiple runs) maliciously secure two-party secure computation (2PC) protocol in the multiple-execution setting, where the same function is computed multiple times by the same pair of parties. - We improve the efficiency of 2PC protocols in the publicly verifiable covert security model, where a party can cheat with some probability but if it gets caught then the honest party obtains a certificate proving that the given party cheated. - We show how to optimize existing 2PC protocols when the function to be computed includes predicate checks on its inputs. - We demonstrate an efficient maliciously secure protocol in the three-party setting.