955 results for Temporal Information Extraction
Abstract:
Distant Supervision methods for Information Extraction rely on using known correct tuples to acquire mentions of those tuples, which are then used to train a traditional supervised information extraction system. In this article we analyse the sources of noise in the mentions and explore simple methods for filtering noisy mentions. The results show that combining tuple filtering by frequency, mutual information, and the removal of mentions that lie far from the centroids of their respective labels significantly improves the results of two information extraction models.
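The centroid-based filtering step lends itself to a compact illustration. The sketch below is a minimal, hypothetical rendering of that idea, assuming mentions are already encoded as feature vectors and labelled with the relation of the matching tuple; the feature representation and the keep_fraction cutoff are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np

def filter_mentions_by_centroid(vectors, labels, keep_fraction=0.8):
    """Drop mentions that lie far from the centroid of their label.

    vectors: (n, d) array of mention feature vectors (assumed given).
    labels:  length-n list of relation labels from distant supervision.
    keep_fraction: per-label fraction of closest mentions to keep;
                   an assumed parameter, not the paper's threshold.
    """
    vectors = np.asarray(vectors, dtype=float)
    labels = np.asarray(labels)
    keep = np.zeros(len(labels), dtype=bool)
    for label in set(labels):
        idx = np.flatnonzero(labels == label)
        centroid = vectors[idx].mean(axis=0)
        dists = np.linalg.norm(vectors[idx] - centroid, axis=1)
        # Retain only the mentions closest to the label centroid.
        n_keep = max(1, int(keep_fraction * len(idx)))
        keep[idx[np.argsort(dists)[:n_keep]]] = True
    return keep
```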
Abstract:
In the Computer Science world, several proposals have been developed for assessing the quality of digital objects, based on the capabilities and facilities offered by current technologies and the available resources. For years, researchers and specialists from both the educational and technological fields have been committed to developing strategies that improve the quality of education. At present, another important concern in the field of teaching and learning is the need to improve how knowledge is gained in education, and the use of learning strategies represents a major advance in the teaching-learning process at institutions of higher education. This paper presents QEES, a proposal for evaluating the quality of the learning objects employed in learning strategies to support students during their education, using information extraction techniques and ontologies.
Abstract:
Currently there is an overwhelming number of scientific publications in the Life Sciences, especially in Genetics and Biotechnology. This huge amount of information is structured in corporate Data Warehouses (DW) or in Biological Databases (e.g. UniProt, RCSB Protein Data Bank, CEREALAB or GenBank), whose main drawback is the cost of keeping them up to date, which quickly renders them obsolete. However, these Databases are the main tool for enterprises when they want to update their internal information, for example when a plant breeder enterprise needs to enrich its genetic information (internal structured Database) with recently discovered genes related to specific phenotypic traits (external unstructured data) in order to choose the desired parentals for breeding programs. In this paper, we propose to complement the internal information with external data from the Web using Question Answering (QA) techniques. We go a step further by providing a complete framework for integrating unstructured and structured information, combining traditional Database and DW architectures with QA systems. The great advantage of our framework is that decision makers can instantly compare internal data with external data from competitors, allowing them to take quick strategic decisions based on richer data.
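At its most schematic, the proposed integration can be pictured as querying the internal warehouse and the web-facing QA system side by side and returning both answer sets for comparison. The snippet below is only a sketch of that flow; the warehouse and qa_system interfaces are hypothetical stand-ins, and the paper's actual framework is considerably richer.

```python
def answer_with_comparison(question, warehouse, qa_system):
    """Sketch: pair internal facts with web-derived answers.

    warehouse: hypothetical object with query(question) -> list of
               facts from the internal structured Database/DW.
    qa_system: hypothetical object with ask(question) -> list of
               answers extracted from the Web.
    """
    internal = warehouse.query(question)   # structured, curated data
    external = qa_system.ask(question)     # fresh, unstructured data
    # Returning both lets a decision maker compare internal data with
    # external data instantly, as the framework intends.
    return {"question": question, "internal": internal, "external": external}
```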
Abstract:
We present a tool based on drug-effect co-occurrences for detecting adverse reactions and indications in user comments from a Spanish-language medical forum. We also describe the automatic construction of the first Spanish-language database of drug indications and adverse effects.
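The core of such a tool is simple to sketch: count how often a drug and an effect expression appear in the same comment. The toy version below assumes pre-built lexicons and lower-cased text; the real tool works on Spanish forum posts, and the examples here are invented English stand-ins.

```python
from collections import Counter
from itertools import product

def drug_effect_cooccurrences(comments, drugs, effects):
    """Count drug-effect pairs co-occurring in the same comment."""
    counts = Counter()
    for text in comments:
        found_drugs = [d for d in drugs if d in text]
        found_effects = [e for e in effects if e in text]
        counts.update(product(found_drugs, found_effects))
    return counts

# Invented toy usage:
comments = ["took ibuprofen and now i have a headache",
            "ibuprofen relieved my headache quickly"]
print(drug_effect_cooccurrences(comments, ["ibuprofen"], ["headache"]))
# Counter({('ibuprofen', 'headache'): 2})
```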
Abstract:
In this paper, a novel approach for exploiting multitemporal remote sensing data, focused on real-time monitoring of agricultural crops, is presented. The methodology is defined in a dynamical system context using state-space techniques, which makes it possible to merge past temporal information with an update for each new acquisition. This dynamical system context allows us to exploit classical tools from the domain to estimate the relevant variables. A general methodology is proposed, and a particular instance is defined in this study based on polarimetric radar data to track the phenological stages of a set of crops. Model generation from empirical data through principal component analysis is presented, and an extended Kalman filter is adapted to perform phenological stage estimation. Results employing quad-pol Radarsat-2 data over three different cereals are analyzed. The potential of this methodology to retrieve vegetation variables in real time is shown.
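The estimation machinery is a standard extended Kalman filter; a generic predict-update step is sketched below. The transition and observation functions f and h (in the paper, the latter would be the PCA-derived model linking phenological stage to polarimetric observables), their Jacobians, and the noise covariances are deliberately left abstract.

```python
import numpy as np

def ekf_step(x, P, z, f, F, h, H, Q, R):
    """One predict-update cycle of an extended Kalman filter.

    x, P : current state estimate (e.g. phenological stage) and covariance
    z    : new observation vector (e.g. polarimetric features)
    f, F : state transition function and its Jacobian
    h, H : observation function and its Jacobian
    Q, R : process and observation noise covariances
    """
    # Predict: propagate the past temporal information forward.
    x_pred = f(x)
    F_k = F(x)
    P_pred = F_k @ P @ F_k.T + Q
    # Update: merge the prediction with the new acquisition.
    H_k = H(x_pred)
    S = H_k @ P_pred @ H_k.T + R
    K = P_pred @ H_k.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x_new)) - K @ H_k) @ P_pred
    return x_new, P_new
```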
Abstract:
Evolutionary change results from selection acting on genetic variation. For migration to be successful, many different aspects of an animal's physiology and behaviour need to function in a coordinated way. Changes in one migratory trait are therefore likely to be accompanied by changes in other migratory and life-history traits. At present, we have some knowledge of the pressures that operate at the various stages of migration, but we know very little about the extent of genetic variation in various aspects of the migratory syndrome. As a consequence, our ability to predict which species is capable of what kind of evolutionary change, and at which rate, is limited. Here, we review how our evolutionary understanding of migration may benefit from taking a quantitative-genetic approach and present a framework for studying the causes of phenotypic variation. We review past research, which has mainly studied single migratory traits in captive birds, and discuss how this work could be extended to study genetic variation in the wild and to account for genetic correlations and correlated selection. In the future, reaction-norm approaches may become very important, as they allow the study of genetic and environmental effects on phenotypic expression within a single framework, as well as of their interactions. We advocate making more use of repeated measurements on single individuals to study the causes of among-individual variation in the wild, as they are easier to obtain than data on relatives and can provide valuable information for identifying and selecting traits. This approach will be particularly informative if it involves systematic testing of individuals under different environmental conditions. We propose extending this research agenda by using optimality models to predict levels of variation and covariation among traits and constraints. This may help us to select traits in which we might expect genetic variation, and to identify the most informative environmental axes. We also recommend an expansion of the passerine model, as it does not apply to birds, such as geese, in which cultural transmission of spatio-temporal information is an important determinant of migration patterns and their variation.
Abstract:
The Leximancer system is a relatively new method for transforming lexical co-occurrence information from natural language into semantic patterns in an unsupervised manner. It employs two stages of co-occurrence information extraction, semantic and relational, using a different algorithm for each stage. The algorithms used are statistical, but they employ nonlinear dynamics and machine learning. This article is an attempt to validate the output of Leximancer, using a set of evaluation criteria taken from content analysis that are appropriate for knowledge discovery tasks.
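A crude analogue of the two-stage structure can be written in a few lines: a semantic stage that selects concept words, then a relational stage that counts concept co-occurrences. Leximancer's actual algorithms are nonlinear and learned, so the sketch below only mirrors the pipeline's shape, not its substance.

```python
from collections import Counter
from itertools import combinations

def two_stage_cooccurrence(sentences, n_concepts=10):
    """Toy two-stage pipeline: concept selection, then relations."""
    tokenised = [s.lower().split() for s in sentences]
    # Stage 1 (semantic): most frequent longer words stand in for concepts.
    freq = Counter(w for toks in tokenised for w in toks if len(w) > 3)
    concepts = {w for w, _ in freq.most_common(n_concepts)}
    # Stage 2 (relational): count concept pairs sharing a sentence.
    relations = Counter()
    for toks in tokenised:
        present = sorted(concepts.intersection(toks))
        relations.update(combinations(present, 2))
    return concepts, relations
```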
Abstract:
Government agencies responsible for riparian environments are assessing the combined utility of field survey and remote sensing for mapping and monitoring indicators of riparian zone health. The objective of this work was to determine if the structural attributes of savanna riparian zones in northern Australia can be detected from commercially available remotely sensed image data. Two QuickBird images and coincident field data covering sections of the Daly River and the South Alligator River - Barramundie Creek in the Northern Territory were used. Semi-variograms were calculated to determine the characteristic spatial scales of riparian zone features, both vegetative and landform. Interpretation of semi-variograms showed that structural dimensions of riparian environments could be detected and estimated from the QuickBird image data. The results also show that selecting the correct spatial resolution and spectral bands is essential to maximize the accuracy of mapping spatial characteristics of savanna riparian features. The distribution of foliage projective cover of riparian vegetation affected spectral reflectance variations in individual spectral bands differently. Pan-sharpened image data enabled small-scale information extraction (< 6 m) on riparian zone structural parameters. The semi-variogram analysis results provide the basis for an inversion approach using high spatial resolution satellite image data to map indicators of savanna riparian zone health.
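For reference, the empirical semi-variogram on which such an analysis rests is the classical estimator (the abstract does not restate it): with z(x_i) the image value at pixel location x_i and N(h) the set of pixel pairs separated by lag h,

```latex
\hat{\gamma}(h) = \frac{1}{2\,|N(h)|} \sum_{(i,j)\in N(h)} \bigl(z(x_i) - z(x_j)\bigr)^2
```

The lag at which the semi-variogram levels off at its sill is what yields the characteristic spatial scales of the vegetative and landform features discussed above.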
Abstract:
Automatic ontology building is a vital issue in many fields where ontologies are currently built manually. This paper presents a user-centred methodology for ontology construction based on the use of Machine Learning and Natural Language Processing. In our approach, the user selects a corpus of texts and sketches a preliminary ontology (or selects an existing one) for a domain, with a preliminary vocabulary associated with the elements of the ontology (lexicalisations). Examples of sentences involving such lexicalisations (e.g. the ISA relation) in the corpus are automatically retrieved by the system. Retrieved examples are validated by the user and used by an adaptive Information Extraction system to generate patterns that discover other lexicalisations of the same objects in the ontology, possibly identifying new concepts or relations. New instances are added to the existing ontology or used to tune it. This process is repeated until a satisfactory ontology is obtained. The methodology largely automates the ontology construction process, and the output is an ontology with an associated trained learner that can be used for further ontology modifications.
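The iterative loop described above can be summarised in pseudocode form. Every interface in this sketch (ie_system, user, ontology) is a hypothetical stand-in for the adaptive IE component, the human validation step, and the ontology store; it is meant only to show the shape of the bootstrap cycle.

```python
def build_ontology(corpus, ontology, lexicalisations, ie_system, user):
    """Sketch of the user-centred ontology bootstrapping loop."""
    while not user.is_satisfied(ontology):
        # Retrieve corpus sentences involving known lexicalisations.
        examples = ie_system.find_examples(corpus, lexicalisations)
        validated = user.validate(examples)
        # Induce extraction patterns from the validated examples.
        patterns = ie_system.learn_patterns(validated)
        # Apply patterns to discover new lexicalisations and instances.
        candidates = ie_system.apply_patterns(corpus, patterns)
        ontology.add(user.validate(candidates))
        lexicalisations.update(ontology.lexicalisations())
    return ontology
```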
Abstract:
With this paper, we propose a set of techniques to largely automate the process of knowledge acquisition (KA), using technologies based on Information Extraction (IE), Information Retrieval and Natural Language Processing. We aim to reduce all the impeding factors mentioned above and thereby contribute to the wider utility of knowledge management tools. In particular, we intend to reduce the introspection of knowledge engineers and the extended elicitation of knowledge from experts through extensive textual analysis using a variety of methods and tools, as texts are largely available and in them, we believe, lies most of an organization's memory.
Abstract:
The main argument of this paper is that Natural Language Processing (NLP) does, and will continue to, underlie the Semantic Web (SW), including its initial construction from unstructured sources like the World Wide Web (WWW), whether its advocates realise this or not. Chiefly, we argue, such NLP activity is the only way up to a defensible notion of meaning at conceptual levels (in the original SW diagram) based on lower level empirical computations over usage. Our aim is definitely not to claim logic-bad, NLP-good in any simple-minded way, but to argue that the SW will be a fascinating interaction of these two methodologies, again like the WWW (which has been basically a field for statistical NLP research) but with deeper content. Only NLP technologies (and chiefly information extraction) will be able to provide the requisite RDF knowledge stores for the SW from existing unstructured text databases in the WWW, and in the vast quantities needed. There is no alternative at this point, since a wholly or mostly hand-crafted SW is also unthinkable, as is a SW built from scratch and without reference to the WWW. We also assume that, whatever the limitations on current SW representational power we have drawn attention to here, the SW will continue to grow in a distributed manner so as to serve the needs of scientists, even if it is not perfect. The WWW has already shown how an imperfect artefact can become indispensable.
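As a concrete illustration of the claim that IE can feed RDF knowledge stores, the snippet below turns extracted (subject, relation, object) strings into RDF triples with the rdflib library. The namespace, the naming convention and the example triple are invented for the sketch; real SW population would need proper URI minting and ontology alignment.

```python
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/sw/")  # invented namespace

def triples_to_rdf(extracted):
    """Convert IE output, (subject, relation, object) strings, to RDF."""
    g = Graph()
    g.bind("ex", EX)
    for subj, rel, obj in extracted:
        g.add((EX[subj.replace(" ", "_")],
               EX[rel.replace(" ", "_")],
               Literal(obj)))
    return g

g = triples_to_rdf([("Semantic Web", "builds_on", "the World Wide Web")])
print(g.serialize(format="turtle"))
```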
Abstract:
Automatic Term Recognition (ATR) is a fundamental processing step preceding more complex tasks such as semantic search and ontology learning. Of the large number of methodologies available in the literature, only a few are able to handle both single- and multi-word terms. In this paper we present a comparison of five such algorithms and propose a combined approach using a voting mechanism. We evaluated the six approaches using two different corpora and show that the voting algorithm performs best on one corpus (a collection of texts from Wikipedia) and less well on the Genia corpus (a standard life science corpus). This indicates that the choice and design of the corpus has a major impact on the evaluation of term recognition algorithms. Our experiments also showed that single-word terms can be equally important and account for a fairly large proportion of terms in certain domains. As a result, algorithms that ignore single-word terms may cause problems for tasks built on top of ATR. Effective ATR systems also need to take into account both the unstructured text and the structured aspects, which means information extraction techniques need to be integrated into the term recognition process.
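The voting mechanism can be approximated by a simple rank-fusion rule: each algorithm contributes a score to every term it ranks, and terms are re-ordered by total score. Summed reciprocal rank is an assumption for this sketch; the paper's exact voting scheme may differ.

```python
from collections import defaultdict

def vote_terms(ranked_lists):
    """Fuse ranked term lists from several ATR algorithms by voting."""
    scores = defaultdict(float)
    for terms in ranked_lists:
        for rank, term in enumerate(terms, start=1):
            scores[term] += 1.0 / rank  # assumed reciprocal-rank weight
    return sorted(scores, key=scores.get, reverse=True)

# Toy usage with three extractors, mixing single- and multi-word terms:
print(vote_terms([["cell", "t cell", "gene expression"],
                  ["t cell", "cell"],
                  ["gene expression", "t cell"]]))
```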
Abstract:
Yorick Wilks is a central figure in the fields of Natural Language Processing and Artificial Intelligence. His influence extends to many areas and includes contributions to Machine Translation, word sense disambiguation, dialogue modeling and Information Extraction. This book celebrates the work of Yorick Wilks in the form of a selection of his papers, intended to reflect the range and depth of his work. The volume accompanies a Festschrift which celebrates his contribution to the fields of Computational Linguistics and Artificial Intelligence. The papers include early work carried out at Cambridge University, descriptions of groundbreaking work on Machine Translation and Preference Semantics, as well as more recent work on belief modeling and computational semantics. The selected papers reflect Yorick’s contribution to both practical and theoretical aspects of automatic language processing.
Abstract:
Attractor properties of a popular discrete-time neural network model are illustrated through numerical simulations. The most complex dynamics is found to occur within particular ranges of parameters controlling the symmetry and magnitude of the weight matrix. A small network model is observed to produce fixed points, limit cycles, mode-locking, the Ruelle-Takens route to chaos, and the period-doubling route to chaos. Training algorithms for tuning this dynamical behaviour are discussed. Training can be an easy or difficult task, depending on whether the problem requires the use of temporal information distributed over long time intervals. Such problems require training algorithms which can handle hidden nodes. The most prominent of these algorithms, back-propagation through time, solves the temporal credit assignment problem in a way which works only if the relevant information is distributed locally in time. The Moving Targets algorithm works for the more general case, but is computationally intensive and prone to local minima.
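A minimal simulation makes these attractor types easy to reproduce. The sketch below iterates a generic discrete-time network x_{t+1} = tanh(W x_t) and crudely estimates the period of the attractor it settles into; the model form, gain value and tolerance are generic assumptions rather than the specific network studied in the abstract.

```python
import numpy as np

def iterate_network(W, x0, steps=500, transient=400):
    """Iterate x_{t+1} = tanh(W @ x_t); return orbit and period guess."""
    x = np.asarray(x0, dtype=float)
    orbit = []
    for t in range(steps):
        x = np.tanh(W @ x)
        if t >= transient:
            orbit.append(x.copy())
    orbit = np.array(orbit)
    # Period = smallest shift after which the orbit repeats (p=1: fixed point).
    for p in range(1, len(orbit) // 2):
        if np.allclose(orbit[p:], orbit[:-p], atol=1e-6):
            return orbit, p
    return orbit, None  # aperiodic within tolerance (possibly chaotic)

rng = np.random.default_rng(0)
W = 2.5 * rng.standard_normal((3, 3))  # weight magnitude scales the dynamics
orbit, period = iterate_network(W, rng.standard_normal(3))
print("estimated period:", period)
```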
Abstract:
The primary objective of this research was to understand what kinds of knowledge and skills people use in 'extracting' relevant information from text, and to assess the extent to which expert systems techniques could be applied to automate the process of abstracting. The approach adopted in this thesis is based on research in cognitive science, information science, psycholinguistics and text linguistics. The study addressed the significance of domain knowledge and heuristic rules by developing an information extraction system called INFORMEX. This system, implemented partly in SPITBOL and partly in PROLOG, used a set of heuristic rules to analyse five scientific papers of the expository type, to interpret their content in relation to the key abstract elements, and to extract a set of sentences recognised as relevant for abstracting purposes. The analysis of these extracts revealed that an adequate abstract could be generated. Furthermore, INFORMEX showed that a rule-based system is a suitable computational model for representing experts' knowledge and strategies. This computational technique provided the basis for a new approach to the modelling of cognition: it showed how experts tackle the task of abstracting by integrating formal knowledge as well as experiential learning. This thesis demonstrated that empirical and theoretical knowledge can be effectively combined in expert systems technology to provide a valuable starting approach to automatic abstracting.
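A toy version of the heuristic, rule-based extraction idea is easy to state in modern terms: score each sentence against cue-phrase rules and keep the top scorers in document order. The cue list and weights below are invented for illustration; INFORMEX's rules, derived from the study of expert abstractors, were far richer.

```python
import re

# Hypothetical cue phrases and weights, not INFORMEX's actual rules.
CUES = {"we present": 3, "results show": 3, "in conclusion": 3,
        "the aim of": 2, "this paper": 2, "method": 1}

def extract_sentences(text, top_n=5):
    """Score sentences by cue phrases; return the best, in order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    scored = []
    for i, s in enumerate(sentences):
        score = sum(w for cue, w in CUES.items() if cue in s.lower())
        if i == 0:
            score += 2  # opening sentences often state the topic
        scored.append((score, i, s))
    best = sorted(scored, key=lambda t: t[0], reverse=True)[:top_n]
    return [s for _, _, s in sorted(best, key=lambda t: t[1])]
```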