7 results for Logic and Probabilistic Models

in DRUM (Digital Repository at the University of Maryland)


Relevance:

100.00%

Publisher:

Abstract:

In the past decade, systems that extract information from millions of Internet documents have become commonplace. Knowledge graphs -- structured knowledge bases that describe entities, their attributes, and the relationships between them -- are a powerful tool for understanding and organizing this vast amount of information. However, a significant obstacle to knowledge graph construction is the unreliability of the extracted information, due to noise and ambiguity in the underlying data, errors made by the extraction system, and the complexity of reasoning about the dependencies between these noisy extractions. My dissertation addresses these challenges by exploiting the interdependencies between facts to improve the quality of the knowledge graph in a scalable framework. I introduce a new approach called knowledge graph identification (KGI), which resolves the entities, attributes, and relationships in the knowledge graph by incorporating uncertain extractions from multiple sources, entity co-references, and ontological constraints. I define a probability distribution over possible knowledge graphs and infer the most probable knowledge graph using a combination of probabilistic and logical reasoning. Such probabilistic models are frequently dismissed due to scalability concerns, but my implementation of KGI maintains tractable performance on large problems through the use of hinge-loss Markov random fields, which have a convex inference objective. This allows the inference of large knowledge graphs with 4M facts and 20M ground constraints in 2 hours. To further scale the solution, I develop a distributed approach to the KGI problem which runs in parallel across multiple machines, reducing inference time by 90%. Finally, I extend my model to the streaming setting, where a knowledge graph is continuously updated by incorporating newly extracted facts. I devise a general approach for approximately updating inference in convex probabilistic models, and quantify the approximation error by defining and bounding inference regret for online models. Together, my work retains the attractive features of probabilistic models while providing the scalability necessary for large-scale knowledge graph construction. These models have been applied to a number of real-world knowledge graph projects, including the NELL project at Carnegie Mellon and the Google Knowledge Graph.
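
To make the inference machinery concrete, here is a minimal sketch of convex MAP inference with hinge-loss potentials, in the spirit of hinge-loss Markov random fields and probabilistic soft logic. The toy rule, the extractor confidences, and the weights are illustrative assumptions, not the dissertation's actual model.

    # Minimal sketch: convex MAP inference over soft truth values in [0,1]
    # with hinge-loss potentials (probabilistic-soft-logic style). Illustrative only.
    import numpy as np
    from scipy.optimize import minimize

    # Noisy extractor confidences for two candidate facts: label(A) and label(B).
    extractor_conf = np.array([0.9, 0.2])
    w_extract = 1.0                 # weight tying inferred truth to extractor confidence
    # Weight of the rule  label(A) & sameEntity(A,B) -> label(B),
    # with sameEntity(A,B) observed at soft truth 0.8.
    w_rule, same_entity = 2.0, 0.8

    def hinge(x):
        return np.maximum(x, 0.0)

    def objective(truth):
        a, b = truth
        # Distance to satisfaction of the rule under Lukasiewicz logic:
        # max(0, a + sameEntity - 1 - b)
        rule_loss = hinge(a + same_entity - 1.0 - b)
        # |truth - confidence| written as two hinges, so the objective stays convex.
        extract_loss = np.sum(hinge(extractor_conf - truth) + hinge(truth - extractor_conf))
        return w_rule * rule_loss + w_extract * extract_loss

    # Convex objective over the box [0,1]^2, so a bounded local method finds the MAP state.
    result = minimize(objective, x0=np.array([0.5, 0.5]), bounds=[(0.0, 1.0)] * 2)
    print("MAP truth values:", result.x)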

Relevance:

100.00%

Publisher:

Abstract:

Secure computation involves multiple parties computing a common function while keeping their inputs private, and is a growing field of cryptography due to its potential for maintaining privacy guarantees in real-world applications. However, current secure computation protocols are not yet efficient enough to be used in practice. We argue that this is due to much of the research effort being focused on generality rather than specificity. Namely, current research tends to focus on constructing and improving protocols for the strongest notions of security or for an arbitrary number of parties. However, in real-world deployments, these security notions are often too strong, or the number of parties running a protocol is smaller. In this thesis we take several steps towards bridging the efficiency gap of secure computation by focusing on constructing efficient protocols for specific real-world settings and security models. In particular, we make the following four contributions:
- We show an efficient (when amortized over multiple runs) maliciously secure two-party computation (2PC) protocol in the multiple-execution setting, where the same function is computed multiple times by the same pair of parties.
- We improve the efficiency of 2PC protocols in the publicly verifiable covert security model, where a party can cheat with some probability, but if it gets caught then the honest party obtains a certificate proving that the given party cheated.
- We show how to optimize existing 2PC protocols when the function to be computed includes predicate checks on its inputs.
- We demonstrate an efficient maliciously secure protocol in the three-party setting.
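
As a point of reference only, the following sketch shows XOR-based secret sharing, a standard building block in many efficient multi-party (including three-party) protocols. It is not one of the protocols constructed in this thesis, and the bit width and secret value are arbitrary.

    # Illustrative sketch: XOR-based (n-of-n) secret sharing among three parties.
    # Shows only how an input can be split so no single party learns anything about it.
    import secrets

    def share3(secret_bits: int, nbits: int = 32):
        """Split `secret_bits` into three XOR shares that together reconstruct it."""
        s1 = secrets.randbits(nbits)
        s2 = secrets.randbits(nbits)
        s3 = secret_bits ^ s1 ^ s2      # the three shares XOR back to the secret
        return s1, s2, s3

    def reconstruct(shares):
        value = 0
        for s in shares:
            value ^= s
        return value

    secret = 0xDEADBEEF
    shares = share3(secret)
    assert reconstruct(shares) == secret
    # Any strict subset of the shares is uniformly random and reveals nothing about
    # the secret; only all three shares together reconstruct it.
    print("shares:", [hex(s) for s in shares], "->", hex(reconstruct(shares)))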

Relevance:

100.00%

Publisher:

Abstract:

Despite the extensive implementation of Superstreets on congested arterials, reliable methodologies for such designs remain unavailable. The purpose of this research is to fill this gap by offering reliable tools to assist traffic professionals in the design of Superstreets with and without signal control. The toolset developed in this thesis consists of three models. The first model is used to determine the minimum U-turn offset length for an unsignalized Superstreet, given the arterial headway distribution of the traffic flows and the distribution of critical gaps among drivers. The second model is designed to estimate the queue size and its variation on each critical link in a signalized Superstreet, based on the given signal plan and the range of observed volumes. Recognizing that the operational benefits of a Superstreet cannot be realized without an effective signal plan, the third model provides a signal optimization method that can generate progression offsets for heavy arterial flows moving into and out of such an intersection design.
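
For intuition, the following Monte Carlo sketch illustrates the gap-acceptance reasoning underlying the first (unsignalized) model: assumed exponential arterial headways are compared against assumed lognormal driver critical gaps. All distributions and parameter values are hypothetical; the thesis derives the minimum U-turn offset from such inputs rather than by this kind of simulation.

    # Illustrative Monte Carlo sketch of gap acceptance at an unsignalized U-turn.
    # Headway and critical-gap distributions and all parameters are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    mean_headway_s = 6.0                          # assumed mean arterial headway (seconds)
    headways = rng.exponential(mean_headway_s, size=n)

    # Critical gaps vary across drivers; a lognormal spread around roughly 5 s is assumed.
    critical_gaps = rng.lognormal(mean=np.log(5.0), sigma=0.2, size=n)

    accepted = headways > critical_gaps
    p_accept = accepted.mean()

    # With independent gaps, the number of rejected gaps before acceptance is roughly
    # geometric, giving a crude expected wait before the U-turn maneuver can be made.
    expected_rejected_gaps = (1 - p_accept) / p_accept
    expected_wait_s = expected_rejected_gaps * mean_headway_s

    print(f"P(gap accepted) ~ {p_accept:.2f}")
    print(f"Expected wait before an acceptable gap ~ {expected_wait_s:.1f} s")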

Relevance:

100.00%

Publisher:

Abstract:

We propose three research problems to explore the relations between trust and security in the setting of distributed computation. In the first problem, we study trust-based adversary detection in distributed consensus computation. The adversaries we consider behave arbitrarily, disobeying the consensus protocol. We propose a trust-based consensus algorithm with local and global trust evaluations. The algorithm can be abstracted as a two-layer structure, with the top layer running a trust-based consensus algorithm and the bottom layer executing a global trust update scheme as a subroutine. We utilize a set of pre-trusted nodes, called headers, to propagate local trust opinions throughout the network. This two-layer framework is flexible in that it can easily be extended to incorporate more complicated decision rules and global trust schemes.

The first problem assumes that normal nodes are homogeneous, i.e., it is guaranteed that a normal node always behaves as it is programmed. In the second and third problems, however, we assume that nodes are heterogeneous, i.e., given a task, the probability that a node generates a correct answer varies from node to node. The adversaries considered in these two problems are workers from the open crowd who either invest little effort in the tasks assigned to them or intentionally give wrong answers to questions. In the second part of the thesis, we consider a typical crowdsourcing task that aggregates input from multiple workers as a problem in information fusion. To cope with the issue of noisy and sometimes malicious input from workers, trust is used to model workers' expertise. In a multi-domain knowledge learning task, however, using scalar-valued trust to model a worker's performance is not sufficient to reflect the worker's trustworthiness in each of the domains. To address this issue, we propose a probabilistic model to jointly infer the multi-dimensional trust of workers, the multi-domain properties of questions, and the true labels of questions. Our model is flexible and extensible, and can incorporate metadata associated with questions. To show this, we further propose two extended models, one of which handles input tasks with real-valued features and the other of which handles tasks with text features by incorporating topic models. Our models can effectively recover the trust vectors of workers, which can be very useful for future task assignment that adapts to workers' trust. These results can be applied to the fusion of information from multiple data sources such as sensors, human input, machine learning results, or a hybrid of them.

In the second subproblem, we address crowdsourcing with adversaries under logical constraints. We observe that questions are often not independent in real-life applications; instead, there are logical relations between them. Similarly, the workers that provide answers are not independent of each other either: answers given by workers with similar attributes tend to be correlated. Therefore, we propose a novel unified graphical model consisting of two layers. The top layer encodes domain knowledge, which allows users to express logical relations using first-order logic rules, and the bottom layer encodes a traditional crowdsourcing graphical model. Our model can be seen as a generalized probabilistic soft logic framework that encodes both logical relations and probabilistic dependencies. To solve the collective inference problem efficiently, we have devised a scalable joint inference algorithm based on the alternating direction method of multipliers.
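
As a simplified illustration of the joint inference of worker trust and true labels described above, the sketch below alternates a trust-weighted vote with a trust update. It uses a scalar trust per worker, binary labels, and a made-up answer matrix, whereas the dissertation's model infers multi-dimensional trust, question domains, and labels jointly.

    # Illustrative sketch: alternating trust-weighted label aggregation.
    # Scalar trust, binary labels, and the answer matrix are all assumptions.
    import numpy as np

    # answers[w, q] in {0, 1}: answer of worker w to question q (hypothetical data).
    answers = np.array([
        [1, 1, 0, 1, 0],
        [1, 0, 0, 1, 0],
        [0, 1, 1, 0, 1],   # this worker disagrees with the other two most of the time
    ])

    trust = np.full(answers.shape[0], 0.5)         # initial scalar trust per worker
    for _ in range(20):
        # Trust-weighted vote: each worker's +/-1 answer counts in proportion to its trust.
        score = (trust[:, None] * (2 * answers - 1)).sum(axis=0)
        labels = (score > 0).astype(int)
        # A worker's trust is then its agreement rate with the current label estimate.
        trust = (answers == labels).mean(axis=1)

    print("inferred labels:", labels)
    print("worker trust   :", np.round(trust, 2))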
The third part of the thesis considers the problem of optimal assignment under budget constraints when workers are unreliable and sometimes malicious. In a real crowdsourcing market, each answer obtained from a worker incurs a cost. The cost is associated with both the level of trustworthiness of workers and the difficulty of tasks: access to expert-level (more trustworthy) workers is typically more expensive than access to the average crowd, and completion of a challenging task is more costly than a click-away question. Here, we address the problem of optimally assigning heterogeneous tasks to workers of varying trust levels under budget constraints. Specifically, we design a trust-aware task allocation algorithm that takes as inputs the estimated trust of workers and a pre-set budget, and outputs the optimal assignment of tasks to workers. We derive a bound on the total error probability that relates naturally to the budget, the trustworthiness of the crowd, and the cost of obtaining labels from the crowd: a higher budget, more trustworthy crowds, and less costly jobs result in a lower theoretical bound. Our allocation scheme does not depend on the specific design of the trust evaluation component and can therefore be combined with generic trust evaluation algorithms.
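
A toy greedy sketch of budget-constrained assignment of tasks to workers of varying trust and cost follows. The greedy rule, the error model (expected error taken as 1 - trust), and all numbers are illustrative assumptions; the thesis designs a trust-aware allocation algorithm with a proven bound on total error probability.

    # Toy sketch: greedy, budget-constrained assignment of tasks to workers.
    # Workers, costs, difficulties, and the 1-trust error model are assumptions.
    workers = [                     # (name, trust level, cost per label)
        ("expert", 0.95, 5.0),
        ("crowd_a", 0.80, 1.0),
        ("crowd_b", 0.70, 0.5),
    ]
    tasks = [("hard_q", 3.0), ("easy_q", 1.0)]   # (task, relative difficulty multiplier)
    budget = 12.0

    assignment, spent = [], 0.0
    # Greedy: for each task (hardest first), pick the affordable worker with the
    # lowest expected error, where the label cost scales with task difficulty.
    for task, difficulty in sorted(tasks, key=lambda t: -t[1]):
        affordable = [(1.0 - trust, name, cost * difficulty)
                      for name, trust, cost in workers
                      if spent + cost * difficulty <= budget]
        if not affordable:
            continue
        err, name, cost = min(affordable)
        assignment.append((task, name, cost))
        spent += cost

    print("assignment:", assignment, "| spent:", spent, "of", budget)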

Relevance:

100.00%

Publisher:

Abstract:

Leafy greens are an essential part of a healthy diet. Because of their health benefits, production and consumption of leafy greens have increased considerably in the U.S. in the last few decades. However, leafy greens have also been associated with a large number of foodborne disease outbreaks in the last few years. The overall goal of this dissertation was to use the current knowledge of predictive models and available data to understand the growth, survival, and death of enteric pathogens in leafy greens at pre- and post-harvest levels. Temperature plays a major role in the growth and death of bacteria in foods. A growth-death model was developed for Salmonella and Listeria monocytogenes in leafy greens for the varying temperature conditions typically encountered in the supply chain. The developed growth-death models were validated using experimental dynamic time-temperature profiles available in the literature. Furthermore, these growth-death models for Salmonella and Listeria monocytogenes and a similar model for E. coli O157:H7 were used to predict the growth of these pathogens in leafy greens during transportation without temperature control. Refrigeration of leafy greens increases their shelf life and mitigates bacterial growth, but storage at lower temperatures also increases the storage cost. Nonlinear programming was used to optimize the storage temperature of leafy greens in the supply chain, minimizing the storage cost while maintaining the desired levels of sensory quality and microbial safety. Most of the outbreaks associated with consumption of leafy greens contaminated with E. coli O157:H7 in the U.S. have occurred during July-November. A dynamic system model consisting of subsystems and inputs (soil, irrigation, cattle, wildlife, and rainfall), simulating a farm in a major leafy-greens-producing area in California, was developed. The model was simulated incorporating the events of planting, irrigation, harvesting, ground preparation for the new crop, contamination of soil and plants, and survival of E. coli O157:H7. The predictions of this system model are in agreement with the seasonality of outbreaks. Overall, this dissertation utilized growth, survival, and death models of enteric pathogens in leafy greens during production and the supply chain.
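
To illustrate how a temperature-dependent growth model can be integrated over a supply-chain time-temperature profile, here is a small sketch using an assumed square-root (Ratkowsky-type) secondary model. The parameter values and the temperature profile are hypothetical and are not the validated models developed in the dissertation.

    # Illustrative sketch: integrating a temperature-dependent growth rate over a
    # dynamic time-temperature profile. All parameters and the profile are assumptions.
    import numpy as np

    def growth_rate(temp_c, b=0.03, t_min=2.0):
        """Specific growth rate (log10 CFU per hour) from a square-root secondary model."""
        return np.where(temp_c > t_min, (b * (temp_c - t_min)) ** 2, 0.0)

    # Hypothetical dynamic profile: 12 h of refrigerated transport, then 6 h of abuse.
    hours = np.arange(0, 18, 0.5)
    temps = np.where(hours < 12, 5.0, 20.0)       # degrees C

    # Integrate growth over the profile (simple forward Euler in log10 counts).
    log_count = 2.0                               # initial contamination, log10 CFU/g
    dt = 0.5
    for t in temps:
        log_count += growth_rate(t) * dt

    print(f"Predicted level after 18 h: {log_count:.2f} log10 CFU/g")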

Relevance:

100.00%

Publisher:

Abstract:

This dissertation proposes statistical methods to formulate, estimate, and apply complex transportation models. Two main problems are part of the analyses conducted and presented in this dissertation. The first method addresses an econometric problem and is concerned with the joint estimation of models that contain both discrete and continuous decision variables. The use of ordered models along with a regression is proposed, and their effectiveness is evaluated with respect to unordered models. Procedures to calculate and optimize the log-likelihood functions of both discrete-continuous approaches are derived, and the difficulties associated with the estimation of unordered models are explained. Numerical approximation methods based on the Genz algorithm are implemented in order to solve the multidimensional integral associated with the unordered modeling structure. The problems deriving from the lack of smoothness of the probit model around the maximum of the log-likelihood function, which makes the optimization and the calculation of standard deviations very difficult, are carefully analyzed. A methodology to perform out-of-sample validation in the context of a joint model is proposed. Comprehensive numerical experiments have been conducted on both simulated and real data. In particular, the discrete-continuous models are estimated and applied to vehicle ownership and use models on data extracted from the 2009 National Household Travel Survey. The second part of this work offers a comprehensive statistical analysis of free-flow speed distributions; the method is applied to data collected on a sample of roads in Italy. A linear mixed model that includes speed quantiles among its predictors is estimated. Results show that there is no road effect in the analysis of free-flow speeds, which is particularly important for model transferability. A very general framework to predict random effects with few observations and incomplete access to model covariates is formulated and applied to predict the distribution of free-flow speed quantiles. The speed distribution of most road sections is successfully predicted; jack-knife estimates are calculated and used to explain why some sections are poorly predicted. Overall, this work contributes to the literature in transportation modeling by proposing econometric model formulations for discrete-continuous variables, more efficient methods for the calculation of multivariate normal probabilities, and random-effects models for free-flow speed estimation that take into account the survey design. All methods are rigorously validated on both real and simulated data.
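
For a sense of the integrals involved, the snippet below evaluates a trivariate normal rectangle probability of the kind that appears in the unordered (probit-style) likelihood; SciPy's multivariate normal CDF is based on Genz's method. The covariance matrix and the evaluation point are illustrative only.

    # Illustrative sketch: the multivariate normal probability a probit-style
    # likelihood needs for each observation, evaluated with SciPy (Genz's method).
    import numpy as np
    from scipy.stats import multivariate_normal

    mean = np.zeros(3)
    cov = np.array([[1.0, 0.5, 0.3],
                    [0.5, 1.0, 0.4],
                    [0.3, 0.4, 1.0]])

    # P(X1 <= 0.2, X2 <= -0.1, X3 <= 0.5): no closed form in more than one dimension,
    # hence the need for numerical approximation methods such as the Genz algorithm.
    p = multivariate_normal(mean=mean, cov=cov).cdf([0.2, -0.1, 0.5])
    print(f"P(X <= x) ~ {p:.4f}")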

Relevance:

100.00%

Publisher:

Abstract:

Energy Conservation Measure (ECM) project selection is made difficult by real-world constraints, limited resources to implement savings retrofits, various suppliers in the market, and project financing alternatives. Many of these energy-efficient retrofit projects should be viewed as a series of investments with annual returns for these traditionally risk-averse agencies. Given a list of available ECMs, federal, state, and local agencies must determine how to implement projects at the lowest cost. The most common methods of implementation planning are suboptimal with respect to cost. Federal, state, and local agencies can obtain greater returns on their energy conservation investment than with traditional methods, regardless of the implementing organization. This dissertation outlines several approaches to improve the traditional energy conservation models. Public buildings in regions with similar energy conservation goals, in the United States or internationally, can also benefit greatly from this research. Additionally, many private owners of buildings are under mandates to conserve energy; e.g., Local Law 85 of the New York City Energy Conservation Code requires any building, public or private, to meet the most current energy code for any alteration or renovation. Thus, both public and private stakeholders can benefit from this research.

The research in this dissertation advances and presents models that decision-makers can use to optimize the selection of ECM projects with respect to the total cost of implementation. A practical application of a two-level mathematical program with equilibrium constraints (MPEC) improves the current best practice for agencies concerned with making the most cost-effective selection while leveraging energy services companies or utilities. The two-level model maximizes savings to the agency and profit to the energy services companies (Chapter 2).

An additional model leverages a single congressional appropriation to implement ECM projects (Chapter 3). Returns from implemented ECM projects are used to fund additional ECM projects. In these cases, fluctuations in energy costs and uncertainty in the estimated savings strongly influence ECM project selection and the amount of the appropriation requested. A proposed risk-aversion method imposes a minimum on the number of projects completed in each stage, and a comparative method using Conditional Value at Risk is analyzed. Time consistency is also addressed in this chapter. This work demonstrates how a risk-based, stochastic, multi-stage model with binary decision variables at each stage provides a much more accurate estimate for planning than the agency's traditional approach and deterministic models.

Finally, in Chapter 4, a rolling-horizon model allows for subadditivity and superadditivity of the energy savings to simulate interactive effects between ECM projects. The approach makes use of inequalities (McCormick, 1976) to re-express constraints that involve the product of binary variables with an exact linearization (related to the convex hull of those constraints). This model additionally shows the benefits of learning between stages while remaining consistent with the single-congressional-appropriation framework.
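
As a small check of the exact linearization referenced above (McCormick, 1976), the snippet below enumerates the binary cases and verifies that replacing z = x * y with the linear constraints z <= x, z <= y, z >= x + y - 1 recovers the product exactly; the variable names are generic placeholders.

    # Brute-force verification of the McCormick linearization for binary variables.
    from itertools import product

    def mccormick_feasible_z(x, y):
        """Return every z in {0, 1} satisfying z <= x, z <= y, z >= x + y - 1."""
        return [z for z in (0, 1) if z <= x and z <= y and z >= x + y - 1]

    for x, y in product((0, 1), repeat=2):
        assert mccormick_feasible_z(x, y) == [x * y]
    print("For binary x, y the linear constraints force z = x * y exactly.")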