992 resultados para Knowledge Bases
Resumo:
This thesis addressed the problem of risk analysis in mental healthcare, with respect to the GRiST project at Aston University. That project provides a risk-screening tool based on the knowledge of 46 experts, captured as mind maps that describe relationships between risks and patterns of behavioural cues. Mind mapping, though, fails to impose control over content, and is not considered to formally represent knowledge. In contrast, this thesis treated GRiSTs mind maps as a rich knowledge base in need of refinement; that process drew on existing techniques for designing databases and knowledge bases. Identifying well-defined mind map concepts, though, was hindered by spelling mistakes, and by ambiguity and lack of coverage in the tools used for researching words. A novel use of the Edit Distance overcame those problems, by assessing similarities between mind map texts, and between spelling mistakes and suggested corrections. That algorithm further identified stems, the shortest text string found in related word-forms. As opposed to existing approaches’ reliance on built-in linguistic knowledge, this thesis devised a novel, more flexible text-based technique. An additional tool, Correspondence Analysis, found patterns in word usage that allowed machines to determine likely intended meanings for ambiguous words. Correspondence Analysis further produced clusters of related concepts, which in turn drove the automatic generation of novel mind maps. Such maps underpinned adjuncts to the mind mapping software used by GRiST; one such new facility generated novel mind maps, to reflect the collected expert knowledge on any specified concept. Mind maps from GRiST are stored as XML, which suggested storing them in an XML database. In fact, the entire approach here is ”XML-centric”, in that all stages rely on XML as far as possible. A XML-based query language allows user to retrieve information from the mind map knowledge base. The approach, it was concluded, will prove valuable to mind mapping in general, and to detecting patterns in any type of digital information.
Resumo:
Document detialing plans to develop the Medical Library's knowledge base collection. Provides an overview of databases and knowledge bases, as well as recommended databases.
Resumo:
In the past decade, systems that extract information from millions of Internet documents have become commonplace. Knowledge graphs -- structured knowledge bases that describe entities, their attributes and the relationships between them -- are a powerful tool for understanding and organizing this vast amount of information. However, a significant obstacle to knowledge graph construction is the unreliability of the extracted information, due to noise and ambiguity in the underlying data or errors made by the extraction system and the complexity of reasoning about the dependencies between these noisy extractions. My dissertation addresses these challenges by exploiting the interdependencies between facts to improve the quality of the knowledge graph in a scalable framework. I introduce a new approach called knowledge graph identification (KGI), which resolves the entities, attributes and relationships in the knowledge graph by incorporating uncertain extractions from multiple sources, entity co-references, and ontological constraints. I define a probability distribution over possible knowledge graphs and infer the most probable knowledge graph using a combination of probabilistic and logical reasoning. Such probabilistic models are frequently dismissed due to scalability concerns, but my implementation of KGI maintains tractable performance on large problems through the use of hinge-loss Markov random fields, which have a convex inference objective. This allows the inference of large knowledge graphs using 4M facts and 20M ground constraints in 2 hours. To further scale the solution, I develop a distributed approach to the KGI problem which runs in parallel across multiple machines, reducing inference time by 90%. Finally, I extend my model to the streaming setting, where a knowledge graph is continuously updated by incorporating newly extracted facts. I devise a general approach for approximately updating inference in convex probabilistic models, and quantify the approximation error by defining and bounding inference regret for online models. Together, my work retains the attractive features of probabilistic models while providing the scalability necessary for large-scale knowledge graph construction. These models have been applied on a number of real-world knowledge graph projects, including the NELL project at Carnegie Mellon and the Google Knowledge Graph.
Resumo:
We examine the representation of judgements of stochastic independence in probabilistic logics. We focus on a relational logic where (i) judgements of stochastic independence are encoded by directed acyclic graphs, and (ii) probabilistic assessments are flexible in the sense that they are not required to specify a single probability measure. We discuss issues of knowledge representation and inference that arise from our particular combination of graphs, stochastic independence, logical formulas and probabilistic assessments. (C) 2007 Elsevier B.V. All rights reserved.
Resumo:
In recent years emerged several initiatives promoted by educational organizations to adapt Service Oriented Architectures (SOA) to e-learning. These initiatives commonly named eLearning Frameworks share a common goal: to create flexible learning environments by integrating heterogeneous systems already available in many educational institutions. However, these frameworks were designed for integration of systems participating in business like processes rather than on complex pedagogical processes as those related to automatic evaluation. Consequently, their knowledge bases lack some fundamental components that are needed to model pedagogical processes. The objective of the research described in this paper is to study the applicability of eLearning frameworks for modelling a network of heterogeneous eLearning systems, using the automatic evaluation of programming exercises as a case study. The paper surveys the existing eLearning frameworks to justify the selection of the e-Framework. This framework is described in detail and identified the necessary components missing from its knowledge base, more precisely, a service genre, expression and usage model for an evaluation service. The extensibility of the framework is tested with the definition of this service. A concrete model for evaluation of programming exercises is presented as a validation of the proposed approach.
Resumo:
Dissertação para obtenção do Grau de Doutor em Informática
Resumo:
Context-aware recommendation of personalised tourism resources is possible because of personal mobile devices and powerful data filtering algorithms. The devices contribute with computing capabilities, on board sensors, ubiquitous Internet access and continuous user monitoring, whereas the filtering algorithms provide the ability to match the profile (interests and the context) of the tourist against a large knowledge bases of tourism resources. While, in terms of technology, personal mobile devices can gather user-related information, including the user context and access multiple data sources, the creation and maintenance of an updated knowledge base of tourism-related resources requires a collaborative approach due to the heterogeneity, volume and dynamic nature of the resources. The current PhD thesis aims to contribute to the solution of this problem by adopting a Crowdsourcing approach for the collaborative maintenance of the knowledge base of resources, Trust and Reputation for the validation of uploaded resources as well as publishers, Big Data for user profiling and context-aware filtering algorithms for the personalised recommendation of tourism resources.
Resumo:
Ontologies formalized by means of Description Logics (DLs) and rules in the form of Logic Programs (LPs) are two prominent formalisms in the field of Knowledge Representation and Reasoning. While DLs adhere to the OpenWorld Assumption and are suited for taxonomic reasoning, LPs implement reasoning under the Closed World Assumption, so that default knowledge can be expressed. However, for many applications it is useful to have a means that allows reasoning over an open domain and expressing rules with exceptions at the same time. Hybrid MKNF knowledge bases make such a means available by formalizing DLs and LPs in a common logic, the Logic of Minimal Knowledge and Negation as Failure (MKNF). Since rules and ontologies are used in open environments such as the Semantic Web, inconsistencies cannot always be avoided. This poses a problem due to the Principle of Explosion, which holds in classical logics. Paraconsistent Logics offer a solution to this issue by assigning meaningful models even to contradictory sets of formulas. Consequently, paraconsistent semantics for DLs and LPs have been investigated intensively. Our goal is to apply the paraconsistent approach to the combination of DLs and LPs in hybrid MKNF knowledge bases. In this thesis, a new six-valued semantics for hybrid MKNF knowledge bases is introduced, extending the three-valued approach by Knorr et al., which is based on the wellfounded semantics for logic programs. Additionally, a procedural way of computing paraconsistent well-founded models for hybrid MKNF knowledge bases by means of an alternating fixpoint construction is presented and it is proven that the algorithm is sound and complete w.r.t. the model-theoretic characterization of the semantics. Moreover, it is shown that the new semantics is faithful w.r.t. well-studied paraconsistent semantics for DLs and LPs, respectively, and maintains the efficiency of the approach it extends.
Resumo:
Most of today’s systems, especially when related to the Web or to multi-agent systems, are not standalone or independent, but are part of a greater ecosystem, where they need to interact with other entities, react to complex changes in the environment, and act both over its own knowledge base and on the external environment itself. Moreover, these systems are clearly not static, but are constantly evolving due to the execution of self updates or external actions. Whenever actions and updates are possible, the need to ensure properties regarding the outcome of performing such actions emerges. Originally purposed in the context of databases, transactions solve this problem by guaranteeing atomicity, consistency, isolation and durability of a special set of actions. However, current transaction solutions fail to guarantee such properties in dynamic environments, since they cannot combine transaction execution with reactive features, or with the execution of actions over domains that the system does not completely control (thus making rolling back a non-viable proposition). In this thesis, we investigate what and how transaction properties can be ensured over these dynamic environments. To achieve this goal, we provide logic-based solutions, based on Transaction Logic, to precisely model and execute transactions in such environments, and where knowledge bases can be defined by arbitrary logic theories.
Resumo:
The vast territories that have been radioactively contaminated during the 1986 Chernobyl accident provide a substantial data set of radioactive monitoring data, which can be used for the verification and testing of the different spatial estimation (prediction) methods involved in risk assessment studies. Using the Chernobyl data set for such a purpose is motivated by its heterogeneous spatial structure (the data are characterized by large-scale correlations, short-scale variability, spotty features, etc.). The present work is concerned with the application of the Bayesian Maximum Entropy (BME) method to estimate the extent and the magnitude of the radioactive soil contamination by 137Cs due to the Chernobyl fallout. The powerful BME method allows rigorous incorporation of a wide variety of knowledge bases into the spatial estimation procedure leading to informative contamination maps. Exact measurements (?hard? data) are combined with secondary information on local uncertainties (treated as ?soft? data) to generate science-based uncertainty assessment of soil contamination estimates at unsampled locations. BME describes uncertainty in terms of the posterior probability distributions generated across space, whereas no assumption about the underlying distribution is made and non-linear estimators are automatically incorporated. Traditional estimation variances based on the assumption of an underlying Gaussian distribution (analogous, e.g., to the kriging variance) can be derived as a special case of the BME uncertainty analysis. The BME estimates obtained using hard and soft data are compared with the BME estimates obtained using only hard data. The comparison involves both the accuracy of the estimation maps using the exact data and the assessment of the associated uncertainty using repeated measurements. Furthermore, a comparison of the spatial estimation accuracy obtained by the two methods was carried out using a validation data set of hard data. Finally, a separate uncertainty analysis was conducted that evaluated the ability of the posterior probabilities to reproduce the distribution of the raw repeated measurements available in certain populated sites. The analysis provides an illustration of the improvement in mapping accuracy obtained by adding soft data to the existing hard data and, in general, demonstrates that the BME method performs well both in terms of estimation accuracy as well as in terms estimation error assessment, which are both useful features for the Chernobyl fallout study.
Resumo:
With the dramatic increase in the volume of experimental results in every domain of life sciences, assembling pertinent data and combining information from different fields has become a challenge. Information is dispersed over numerous specialized databases and is presented in many different formats. Rapid access to experiment-based information about well-characterized proteins helps predict the function of uncharacterized proteins identified by large-scale sequencing. In this context, universal knowledgebases play essential roles in providing access to data from complementary types of experiments and serving as hubs with cross-references to many specialized databases. This review outlines how the value of experimental data is optimized by combining high-quality protein sequences with complementary experimental results, including information derived from protein 3D-structures, using as an example the UniProt knowledgebase (UniProtKB) and the tools and links provided on its website ( http://www.uniprot.org/ ). It also evokes precautions that are necessary for successful predictions and extrapolations.
Resumo:
BACKGROUND: The annotation of protein post-translational modifications (PTMs) is an important task of UniProtKB curators and, with continuing improvements in experimental methodology, an ever greater number of articles are being published on this topic. To help curators cope with this growing body of information we have developed a system which extracts information from the scientific literature for the most frequently annotated PTMs in UniProtKB. RESULTS: The procedure uses a pattern-matching and rule-based approach to extract sentences with information on the type and site of modification. A ranked list of protein candidates for the modification is also provided. For PTM extraction, precision varies from 57% to 94%, and recall from 75% to 95%, according to the type of modification. The procedure was used to track new publications on PTMs and to recover potential supporting evidence for phosphorylation sites annotated based on the results of large scale proteomics experiments. CONCLUSIONS: The information retrieval and extraction method we have developed in this study forms the basis of a simple tool for the manual curation of protein post-translational modifications in UniProtKB/Swiss-Prot. Our work demonstrates that even simple text-mining tools can be effectively adapted for database curation tasks, providing that a thorough understanding of the working process and requirements are first obtained. This system can be accessed at http://eagl.unige.ch/PTM/.
Resumo:
Réalisé en cotutelle avec l'Université Joseph Fourier École Doctorale Ingénierie pour la Santé,la Cognition et l'Environnement (France)
Resumo:
The motivation for this thesis work is the need for improving reliability of equipment and quality of service to railway passengers as well as a requirement for cost-effective and efficient condition maintenance management for rail transportation. This thesis work develops a fusion of various machine vision analysis methods to achieve high performance in automation of wooden rail track inspection.The condition monitoring in rail transport is done manually by a human operator where people rely on inference systems and assumptions to develop conclusions. The use of conditional monitoring allows maintenance to be scheduled, or other actions to be taken to avoid the consequences of failure, before the failure occurs. Manual or automated condition monitoring of materials in fields of public transportation like railway, aerial navigation, traffic safety, etc, where safety is of prior importance needs non-destructive testing (NDT).In general, wooden railway sleeper inspection is done manually by a human operator, by moving along the rail sleeper and gathering information by visual and sound analysis for examining the presence of cracks. Human inspectors working on lines visually inspect wooden rails to judge the quality of rail sleeper. In this project work the machine vision system is developed based on the manual visual analysis system, which uses digital cameras and image processing software to perform similar manual inspections. As the manual inspection requires much effort and is expected to be error prone sometimes and also appears difficult to discriminate even for a human operator by the frequent changes in inspected material. The machine vision system developed classifies the condition of material by examining individual pixels of images, processing them and attempting to develop conclusions with the assistance of knowledge bases and features.A pattern recognition approach is developed based on the methodological knowledge from manual procedure. The pattern recognition approach for this thesis work was developed and achieved by a non destructive testing method to identify the flaws in manually done condition monitoring of sleepers.In this method, a test vehicle is designed to capture sleeper images similar to visual inspection by human operator and the raw data for pattern recognition approach is provided from the captured images of the wooden sleepers. The data from the NDT method were further processed and appropriate features were extracted.The collection of data by the NDT method is to achieve high accuracy in reliable classification results. A key idea is to use the non supervised classifier based on the features extracted from the method to discriminate the condition of wooden sleepers in to either good or bad. Self organising map is used as classifier for the wooden sleeper classification.In order to achieve greater integration, the data collected by the machine vision system was made to interface with one another by a strategy called fusion. Data fusion was looked in at two different levels namely sensor-level fusion, feature- level fusion. As the goal was to reduce the accuracy of the human error on the rail sleeper classification as good or bad the results obtained by the feature-level fusion compared to that of the results of actual classification were satisfactory.
Resumo:
The development of new technologies that use peer-to-peer networks grows every day, with the object to supply the need of sharing information, resources and services of databases around the world. Among them are the peer-to-peer databases that take advantage of peer-to-peer networks to manage distributed knowledge bases, allowing the sharing of information semantically related but syntactically heterogeneous. However, it is a challenge to ensure the efficient search for information without compromising the autonomy of each node and network flexibility, given the structural characteristics of these networks. On the other hand, some studies propose the use of ontology semantics by assigning standardized categorization of information. The main original contribution of this work is the approach of this problem with a proposal for optimization of queries supported by the Ant Colony algorithm and classification though ontologies. The results show that this strategy enables the semantic support to the searches in peer-to-peer databases, aiming to expand the results without compromising network performance. © 2011 IEEE.