875 results for OWL web ontology language
Abstract:
The evaluation of ontologies is vital for the growth of the Semantic Web. We consider a number of problems in evaluating a knowledge artifact like an ontology. We propose in this paper that one approach to ontology evaluation should be corpus or data driven. A corpus is the most accessible form of knowledge and its use allows a measure to be derived of the ‘fit’ between an ontology and a domain of knowledge. We consider a number of methods for measuring this ‘fit’ and propose a measure to evaluate structural fit, and a probabilistic approach to identifying the best ontology.
Abstract:
Ontologies have become a key component in the Semantic Web and Knowledge Management. One accepted goal is to construct ontologies from a domain-specific set of texts. An ontology reflects the background knowledge used in writing and reading a text. However, a text is an act of knowledge maintenance, in that it reinforces the background assumptions, alters links and associations in the ontology, and adds new concepts. This means that background knowledge is rarely expressed in a machine-interpretable manner. When it is, it is usually at the conceptual boundaries of the domain, e.g. in textbooks or when ideas are borrowed into other domains. We argue that a partial solution to this lies in searching external resources such as specialized glossaries and the internet. We show that concept pairs selected at random from the Gene Ontology do not occur in a relevant corpus of texts from the journal Nature. In contrast, a significant proportion can be found on the internet. Thus, we conclude that sources external to the domain corpus are necessary for the automatic construction of ontologies.
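The corpus check described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function names (`tokenize`, `pair_coverage`), the toy concept pairs and the two-sentence corpus are all hypothetical. Co-occurrence is simplified here to both terms appearing as word tokens in the same document.

```python
import re


def tokenize(text):
    """Lowercase a text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pair_coverage(concept_pairs, documents):
    """Fraction of concept pairs whose two terms co-occur in at least one document.

    Simplifying assumption: a (possibly multi-word) term counts as present
    when all of its tokens appear somewhere in the document.
    """
    doc_tokens = [tokenize(d) for d in documents]
    found = 0
    for a, b in concept_pairs:
        a_tokens, b_tokens = tokenize(a), tokenize(b)
        if any(a_tokens <= toks and b_tokens <= toks for toks in doc_tokens):
            found += 1
    return found / len(concept_pairs) if concept_pairs else 0.0


# Hypothetical toy data, not taken from the Gene Ontology or from Nature.
pairs = [
    ("kinase", "phosphorylation"),
    ("ribosome", "translation"),
    ("apoptosis", "caspase"),
]
corpus = [
    "The kinase drives phosphorylation of the substrate.",
    "Ribosome structure was resolved at high resolution.",
]

coverage = pair_coverage(pairs, corpus)
print(f"Pairs found in corpus: {coverage:.2f}")
```

Running the same function over a second, external document collection would give the comparison the abstract draws between the domain corpus and the internet.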
Abstract:
Automatic ontology building is a vital issue in many fields where ontologies are currently built manually. This paper presents a user-centred methodology for ontology construction based on the use of Machine Learning and Natural Language Processing. In our approach, the user selects a corpus of texts and sketches a preliminary ontology (or selects an existing one) for a domain, with a preliminary vocabulary associated with the elements in the ontology (lexicalisations). Examples of sentences involving such lexicalisations (e.g. of the ISA relation) in the corpus are automatically retrieved by the system. Retrieved examples are validated by the user and used by an adaptive Information Extraction system to generate patterns that discover other lexicalisations of the same objects in the ontology, possibly identifying new concepts or relations. New instances are added to the existing ontology or used to tune it. This process is repeated until a satisfactory ontology is obtained. The methodology largely automates the ontology construction process, and the output is an ontology with an associated trained learner to be used for further ontology modifications.
Abstract:
In the context of the needs of the Semantic Web and Knowledge Management, we consider what the requirements of ontologies are. The ontology as an artifact of knowledge representation is in danger of becoming a Chimera. We present a series of facts concerning the foundations on which automated ontology construction must build. We discuss a number of different functions that an ontology seeks to fulfill, and also a wish list of ideal functions. Our objective is to stimulate discussion as to the real requirements of ontology engineering, and we take the view that only a selective and restricted set of requirements will enable the beast to fly.
Abstract:
The fundamental failure of current approaches to ontology learning is to view it as a single pipeline with one or more specific inputs and a single static output. In this paper, we present a novel approach to ontology learning which takes an iterative view of knowledge acquisition for ontologies. Our approach is founded on three open-ended resources: a set of texts, a set of learning patterns and a set of ontological triples, and the system seeks to maintain these in equilibrium. As events occur which disturb this equilibrium, actions are triggered to re-establish a balance between the resources. We present a gold-standard-based evaluation of the final output of the system, the intermediate output showing the iterative process, and a comparison of performance using different seed inputs. The results are comparable to existing performance in the literature.
Abstract:
INTAMAP is a web processing service for the automatic interpolation of measured point data. Requirements were (i) using open standards for spatial data such as those developed in the context of the Open Geospatial Consortium (OGC), (ii) using a suitable environment for statistical modelling and computation, and (iii) producing an open source solution. The system couples the 52-North web processing service, accepting data in the form of an Observations and Measurements (O&M) document, with a computing back-end realized in the R statistical environment. The probability distribution of interpolation errors is encoded with UncertML, a new markup language to encode uncertain data. Automatic interpolation needs to be useful for a wide range of applications, and the algorithms have been designed to cope with anisotropies and extreme values. In the light of the INTAMAP experience, we discuss the lessons learnt.
Abstract:
INTAMAP is a Web Processing Service for the automatic spatial interpolation of measured point data. Requirements were (i) using open standards for spatial data such as developed in the context of the Open Geospatial Consortium (OGC), (ii) using a suitable environment for statistical modelling and computation, and (iii) producing an integrated, open source solution. The system couples an open-source Web Processing Service (developed by 52°North), accepting data in the form of standardised XML documents (conforming to the OGC Observations and Measurements standard) with a computing back-end realised in the R statistical environment. The probability distribution of interpolation errors is encoded with UncertML, a markup language designed to encode uncertain data. Automatic interpolation needs to be useful for a wide range of applications and the algorithms have been designed to cope with anisotropy, extreme values, and data with known error distributions. Besides a fully automatic mode, the system can be used with different levels of user control over the interpolation process.
Abstract:
OBJECTIVES: The objective of this research was to design a clinical decision support system (CDSS) that supports heterogeneous clinical decision problems and runs on multiple computing platforms. Meeting this objective required a novel design to create an extendable and easy-to-maintain CDSS for point-of-care support. The proposed solution was evaluated in a proof-of-concept implementation. METHODS: Building on our earlier research on the design of a mobile CDSS for emergency triage, we used ontology-driven design to represent essential components of a CDSS. Models of clinical decision problems were derived from the ontology and processed into executable applications at runtime. This allowed the applications' functionality to be scaled to the capabilities of the computing platforms. A prototype of the system was implemented using the extended client-server architecture and Web services to distribute the functions of the system and to make it operational in limited connectivity conditions. RESULTS: The proposed design provided a common framework that facilitated development of diversified clinical applications running seamlessly on a variety of computing platforms. It was prototyped for two clinical decision problems and settings (triage of acute pain in the emergency department and postoperative management of radical prostatectomy on the hospital ward) and implemented on two computing platforms: desktop and handheld computers. CONCLUSIONS: The requirement of CDSS heterogeneity was satisfied with ontology-driven design. Processing application models described with the help of ontological models allowed a complex system to run on multiple computing platforms with different capabilities. Finally, separation of models and runtime components contributed to improved extensibility and maintainability of the system.
Abstract:
Models are central tools for modern scientists and decision makers, and there are many existing frameworks to support their creation, execution and composition. Many frameworks are based on proprietary interfaces, and do not lend themselves to the integration of models from diverse disciplines. Web-based systems, or systems based on web services, such as Taverna and Kepler, allow composition of models based on standard web service technologies. At the same time, the Open Geospatial Consortium has been developing its own service stack, which includes the Web Processing Service, designed to facilitate the execution of geospatial processing, including complex environmental models. The current Open Geospatial Consortium service stack employs Extensible Markup Language as a default data exchange standard, and widely used encodings such as JavaScript Object Notation can often only be used when incorporated with Extensible Markup Language. Similarly, no successful engagement of the Web Processing Service standard with the well-supported technologies of Simple Object Access Protocol and Web Services Description Language has been seen. In this paper we propose a pure Simple Object Access Protocol/Web Services Description Language processing service which addresses some of the issues with the Web Processing Service specification and brings us closer to achieving a degree of interoperability between geospatial models, and thus realising the vision of a useful 'model web'.
Abstract:
This paper proposes a novel framework for incorporating protein-protein interaction (PPI) ontology knowledge into PPI extraction from biomedical literature in order to address the emerging challenges of deep natural language understanding. It is built upon existing work on relation extraction using the Hidden Vector State (HVS) model. The HVS model belongs to the category of statistical learning methods. It can be trained directly from un-annotated data in a constrained way whilst at the same time being able to capture the underlying named entity relationships. However, it is difficult to incorporate background knowledge or non-local information into the HVS model. This paper proposes to represent the HVS model as a conditionally trained undirected graphical model in which non-local features derived from the PPI ontology through inference can be easily incorporated. The seamless fusion of ontology inference with statistical learning produces a new paradigm for information extraction.
Abstract:
Web document cluster analysis plays an important role in information retrieval by organizing large amounts of documents into a small number of meaningful clusters. Traditional web document clustering is based on the Vector Space Model (VSM), which takes into account only two-level (document and term) knowledge granularity but ignores the bridging paragraph granularity. However, this two-level granularity may lead to unsatisfactory clustering results with “false correlation”. To deal with this problem, a Hierarchical Representation Model with Multi-granularity (HRMM), which consists of a five-layer representation of data and a two-phase clustering process, is proposed based on granular computing and article structure theory. To deal with the zero-valued similarity problem resulting from the sparse term-paragraph matrix, an ontology-based strategy and a tolerance-rough-set-based strategy are introduced into HRMM. By using granular computing, structural knowledge hidden in documents can be more efficiently and effectively captured in HRMM, and thus web document clusters of higher quality can be generated. Extensive experiments show that HRMM, HRMM with the tolerance-rough-set strategy, and HRMM with the ontology strategy all significantly outperform VSM and a representative non-VSM-based algorithm, WFP, in terms of the F-Score.
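The zero-valued similarity problem mentioned in this abstract can be illustrated with plain term-count vectors and cosine similarity. This is a minimal sketch of the baseline Vector Space Model only, not of HRMM; the helper names and the three toy texts are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a sparse term-count vector from whitespace-separated words."""
    return Counter(text.lower().split())


def cosine_similarity(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy texts: the first two are related in meaning but share no terms.
doc_a = term_vector("car engine repair")
doc_b = term_vector("automobile motor maintenance")
doc_c = term_vector("car engine maintenance")

sim_ab = cosine_similarity(doc_a, doc_b)  # no shared terms
sim_ac = cosine_similarity(doc_a, doc_c)  # two shared terms out of three
print(sim_ab, sim_ac)
```

Because `doc_a` and `doc_b` share no surface terms, their similarity is exactly zero even though they describe the same topic. A strategy that maps terms to shared concepts, such as the ontology-based one the abstract names, is one way to obtain a non-zero score in such cases.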
Abstract:
With the recent rapid growth of the Semantic Web (SW), the processes of searching and querying content that is both massive in scale and heterogeneous have become increasingly challenging. User-friendly interfaces, which can support end users in querying and exploring this novel and diverse, structured information space, are needed to make the vision of the SW a reality. We present a survey on ontology-based Question Answering (QA), which has emerged in recent years to exploit the opportunities offered by structured semantic information on the Web. First, we provide a comprehensive perspective by analyzing the general background and history of the QA research field, from influential works from the artificial intelligence and database communities developed in the 70s and later decades, through open domain QA stimulated by the QA track in TREC since 1999, to the latest commercial semantic QA solutions, before tackling the current state of the art in open user-friendly interfaces for the SW. Second, we examine the potential of this technology to go beyond the current state of the art to support end users in reusing and querying the SW content. We conclude our review with an outlook for this novel research area, focusing in particular on the R&D directions that need to be pursued to realize the goal of efficient and competent retrieval and integration of answers from large scale, heterogeneous, and continuously evolving semantic sources.
Abstract:
The Semantic Web relies on carefully structured, well-defined data to allow machines to communicate and understand one another. In many domains (e.g. geospatial) the data being described contains some uncertainty, often due to incomplete knowledge; meaningful processing of this data requires these uncertainties to be carefully analysed and integrated into the process chain. Currently, within the Semantic Web there is no standard mechanism for interoperable description and exchange of uncertain information, which renders the automated processing of such information implausible, particularly where error must be considered and captured as it propagates through a processing sequence. In particular we adopt a Bayesian perspective and focus on the case where the inputs and outputs are naturally treated as random variables. This paper discusses a solution to the problem in the form of the Uncertainty Markup Language (UncertML). UncertML is a conceptual model, realised as an XML schema, that allows uncertainty to be quantified in a variety of ways, i.e. realisations, statistics and probability distributions. UncertML is based upon a soft-typed XML schema design that provides a generic framework from which any statistic or distribution may be created. Making extensive use of Geography Markup Language (GML) dictionaries, UncertML provides a collection of definitions for common uncertainty types. Containing both written descriptions and mathematical functions, encoded as MathML, the definitions within these dictionaries provide a robust mechanism for defining any statistic or distribution and can be easily extended. Universal Resource Identifiers (URIs) are used to introduce semantics to the soft-typed elements by linking to these dictionary definitions. The INTAMAP (INTeroperability and Automated MAPping) project provides a use case for UncertML.
This paper demonstrates how observation errors can be quantified using UncertML and wrapped within an Observations & Measurements (O&M) Observation. The interpolation service uses the information within these observations to influence the prediction outcome. The output uncertainties may be encoded in a variety of UncertML types, e.g. a series of marginal Gaussian distributions, a set of statistics, such as the first three marginal moments, or a set of realisations from a Monte Carlo treatment. Quantifying and propagating uncertainty in this way allows such interpolation results to be consumed by other services. This could form part of a risk management chain or a decision support system, and ultimately paves the way for complex data processing chains in the Semantic Web.
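The relation between two of the output forms named in this abstract, a set of realisations and a set of marginal moments, can be sketched as follows. This is an illustration only: it does not use or reproduce the UncertML schema, the function name `marginal_moments` is hypothetical, the samples are drawn from an assumed Gaussian rather than from an interpolation service, and reading "the first three marginal moments" as mean, variance and skewness is itself an assumption.

```python
import random
import statistics


def marginal_moments(samples):
    """Summarise Monte Carlo realisations at one location.

    Returns (mean, variance, skewness), using the population variance and
    the standardised third central moment (an assumed reading of
    "first three marginal moments").
    """
    n = len(samples)
    mean = statistics.fmean(samples)
    variance = sum((x - mean) ** 2 for x in samples) / n
    if variance == 0:
        skewness = 0.0
    else:
        third = sum((x - mean) ** 3 for x in samples) / n
        skewness = third / variance ** 1.5
    return mean, variance, skewness


# Hypothetical realisations of a predicted value at a single location,
# drawn from an assumed Gaussian with mean 10 and standard deviation 2.
rng = random.Random(0)
realisations = [rng.gauss(10.0, 2.0) for _ in range(20000)]

mean, variance, skewness = marginal_moments(realisations)
print(f"mean={mean:.2f} variance={variance:.2f} skewness={skewness:.2f}")
```

A downstream service that only needs summary statistics could consume the three numbers, while one that propagates uncertainty by simulation would need the realisations themselves.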
Abstract:
Despite years of effort in building organisational taxonomies, the potential of ontologies to support knowledge management in complex technical domains is under-exploited. The authors of this chapter present an approach to using rich domain ontologies to support sense-making tasks associated with resolving mechanical issues. Using Semantic Web technologies, the authors have built a framework and a suite of tools which support the whole semantic knowledge lifecycle. These are presented by describing the process of issue resolution for a simulated investigation concerning failure of bicycle brakes. Foci of the work have included ensuring that semantic tasks fit in with users’ everyday tasks, to achieve user acceptability and support the flexibility required by communities of practice with differing local sub-domains, tasks, and terminology.
Abstract:
Linked Data semantic sources, in particular DBpedia, can be used to answer many user queries. PowerAqua is an open multi-ontology Question Answering (QA) system for the Semantic Web (SW). However, the emergence of Linked Data, characterized by its openness, heterogeneity and scale, introduces a new dimension to the Semantic Web scenario, in which exploiting the relevant information to extract answers for Natural Language (NL) user queries is a major challenge. In this paper we discuss the issues and lessons learned from our experience of integrating PowerAqua as a front-end for DBpedia and a subset of Linked Data sources. As such, we go one step beyond the state of the art in end-user interfaces for Linked Data by introducing the mapping and fusion techniques needed to translate a user query by means of multiple sources. Our first informal experiments probe whether it is in fact feasible to obtain answers to user queries by composing information across semantic sources and Linked Data, even in its current form, where the strength of Linked Data is more a by-product of its size than its quality. We believe our experiences can be extrapolated to a variety of end-user applications that wish to scale, open up, exploit and re-use what possibly is the greatest wealth of data about everything in the history of Artificial Intelligence. © 2010 Springer-Verlag.