10 resultados para Information Extraction
em CentAUR: Central Archive University of Reading - UK
Resumo:
A new robust neurofuzzy model construction algorithm has been introduced for the modeling of a priori unknown dynamical systems from observed finite data sets in the form of a set of fuzzy rules. Based on a Takagi-Sugeno (T-S) inference mechanism a one to one mapping between a fuzzy rule base and a model matrix feature subspace is established. This link enables rule based knowledge to be extracted from matrix subspace to enhance model transparency. In order to achieve maximized model robustness and sparsity, a new robust extended Gram-Schmidt (G-S) method has been introduced via two effective and complementary approaches of regularization and D-optimality experimental design. Model rule bases are decomposed into orthogonal subspaces, so as to enhance model transparency with the capability of interpreting the derived rule base energy level. A locally regularized orthogonal least squares algorithm, combined with a D-optimality used for subspace based rule selection, has been extended for fuzzy rule regularization and subspace based information extraction. By using a weighting for the D-optimality cost function, the entire model construction procedure becomes automatic. Numerical examples are included to demonstrate the effectiveness of the proposed new algorithm.
Resumo:
Remote sensing can potentially provide information useful in improving pollution transport modelling in agricultural catchments. Realisation of this potential will depend on the availability of the raw data, development of information extraction techniques, and the impact of the assimilation of the derived information into models. High spatial resolution hyperspectral imagery of a farm near Hereford, UK is analysed. A technique is described to automatically identify the soil and vegetation endmembers within a field, enabling vegetation fractional cover estimation. The aerially-acquired laser altimetry is used to produce digital elevation models of the site. At the subfield scale the hypothesis that higher resolution topography will make a substantial difference to contaminant transport is tested using the AGricultural Non-Point Source (AGNPS) model. Slope aspect and direction information are extracted from the topography at different resolutions to study the effects on soil erosion, deposition, runoff and nutrient losses. Field-scale models are often used to model drainage water, nitrate and runoff/sediment loss, but the demanding input data requirements make scaling up to catchment level difficult. By determining the input range of spatial variables gathered from EO data, and comparing the response of models to the range of variation measured, the critical model inputs can be identified. Response surfaces to variation in these inputs constrain uncertainty in model predictions and are presented. Although optical earth observation analysis can provide fractional vegetation cover, cloud cover and semi-random weather patterns can hinder data acquisition in Northern Europe. A Spring and Autumn cloud cover analysis is carried out over seven UK sites close to agricultural districts, using historic satellite image metadata, climate modelling and historic ground weather observations. Results are assessed in terms of probability of acquisition probability and implications for future earth observation missions. (C) 2003 Elsevier Ltd. All rights reserved.
Resumo:
Children’s eye movements during reading. In this chapter, we evaluate the literature on children’s eye movements during reading to date. We describe the basic, developmental changes that occur in eye movement behaviour during reading, discuss age-related changes in the extent and time course of information extraction during fixations in reading, and compare the effects of visual and linguistic manipulations in the text on children’s eye movement behaviour in relation to skilled adult readers. We argue that future research will benefit from examining how eye movement behaviour during reading develops in relation to language and literacy skills, and use of computational modelling with children’s eye movement data may improve our understanding of the mechanisms that underlie the progression from beginning to skilled reader.
Resumo:
Automatic indexing and retrieval of digital data poses major challenges. The main problem arises from the ever increasing mass of digital media and the lack of efficient methods for indexing and retrieval of such data based on the semantic content rather than keywords. To enable intelligent web interactions, or even web filtering, we need to be capable of interpreting the information base in an intelligent manner. For a number of years research has been ongoing in the field of ontological engineering with the aim of using ontologies to add such (meta) knowledge to information. In this paper, we describe the architecture of a system (Dynamic REtrieval Analysis and semantic metadata Management (DREAM)) designed to automatically and intelligently index huge repositories of special effects video clips, based on their semantic content, using a network of scalable ontologies to enable intelligent retrieval. The DREAM Demonstrator has been evaluated as deployed in the film post-production phase to support the process of storage, indexing and retrieval of large data sets of special effects video clips as an exemplar application domain. This paper provides its performance and usability results and highlights the scope for future enhancements of the DREAM architecture which has proven successful in its first and possibly most challenging proving ground, namely film production, where it is already in routine use within our test bed Partners' creative processes. (C) 2009 Published by Elsevier B.V.
Resumo:
Automatic keyword or keyphrase extraction is concerned with assigning keyphrases to documents based on words from within the document. Previous studies have shown that in a significant number of cases author-supplied keywords are not appropriate for the document to which they are attached. This can either be because they represent what the author believes the paper is about not what it actually is, or because they include keyphrases which are more classificatory than explanatory e.g., “University of Poppleton” instead of “Knowledge Discovery in Databases”. Thus, there is a need for a system that can generate appropriate and diverse range of keyphrases that reflect the document. This paper proposes a solution that examines the synonyms of words and phrases in the document to find the underlying themes, and presents these as appropriate keyphrases. The primary method explores taking n-grams of the source document phrases, and examining the synonyms of these, while the secondary considers grouping outputs by their synonyms. The experiments undertaken show the primary method produces good results and that the secondary method produces both good results and potential for future work.
Resumo:
There are many published methods available for creating keyphrases for documents. Previous work in the field has shown that in a significant proportion of cases author selected keyphrases are not appropriate for the document they accompany. This requires the use of such automated methods to improve the use of keyphrases. Often the keyphrases are not updated when the focus of a paper changes or include keyphrases that are more classificatory than explanatory. The published methods are all evaluated using different corpora, typically one relevant to their field of study. This not only makes it difficult to incorporate the useful elements of algorithms in future work but also makes comparing the results of each method inefficient and ineffective. This paper describes the work undertaken to compare five methods across a common baseline of six corpora. The methods chosen were term frequency, inverse document frequency, the C-Value, the NC-Value, and a synonym based approach. These methods were compared to evaluate performance and quality of results, and to provide a future benchmark. It is shown that, with the comparison metric used for this study Term Frequency and Inverse Document Frequency were the best algorithms, with the synonym based approach following them. Further work in the area is required to determine an appropriate (or more appropriate) comparison metric.
Resumo:
Analysis of microbial gene expression during host colonization provides valuable information on the nature of interaction, beneficial or pathogenic, and the adaptive processes involved. Isolation of bacterial mRNA for in planta analysis can be challenging where host nucleic acid may dominate the preparation, or inhibitory compounds affect downstream analysis, e.g., quantitative reverse transcriptase PCR (qPCR), microarray, or RNA-seq. The goal of this work was to optimize the isolation of bacterial mRNA of food-borne pathogens from living plants. Reported methods for recovery of phytopathogen-infected plant material, using hot phenol extraction and high concentration of bacterial inoculation or large amounts of infected tissues, were found to be inappropriate for plant roots inoculated with Escherichia coli O157:H7. The bacterial RNA yields were too low and increased plant material resulted in a dominance of plant RNA in the sample. To improve the yield of bacterial RNA and reduce the number of plants required, an optimized method was developed which combines bead beating with directed bacterial lysis using SDS and lysozyme. Inhibitory plant compounds, such as phenolics and polysaccharides, were counteracted with the addition of high-molecular-weight polyethylene glycol and hexadecyltrimethyl ammonium bromide. The new method increased the total yield of bacterial mRNA substantially and allowed assessment of gene expression by qPCR. This method can be applied to other bacterial species associated with plant roots, and also in the wider context of food safety.
Resumo:
Twitter has become a dependable microblogging tool for real time information dissemination and newsworthy events broadcast. Its users sometimes break news on the network faster than traditional newsagents due to their presence at ongoing real life events at most times. Different topic detection methods are currently used to match Twitter posts to real life news of mainstream media. In this paper, we analyse tweets relating to the English FA Cup finals 2012 by applying our novel method named TRCM to extract association rules present in hash tag keywords of tweets in different time-slots. Our system identify evolving hash tag keywords with strong association rules in each time-slot. We then map the identified hash tag keywords to event highlights of the game as reported in the ground truth of the main stream media. The performance effectiveness measure of our experiments show that our method perform well as a Topic Detection and Tracking approach.
Resumo:
This article is concerned with the risks associated with the monopolisation of information that is available from a single source only. Although there is a longstanding consensus that sole-source databases should not receive protection under the EU Database Directive, and there are legislative provisions to ensure that lawful users have access to a database’s contents, Ryanair v PR Aviation challenges this assumption by affirming that the use of non-protected databases can be restricted by contract. Owners of non-protected databases can contractually exclude lawful users from taking the benefit of statutorily permitted uses, because such databases are not covered from the legislation that declares this kind of contract null and void. We argue that this judgment is not consistent with the legislative history and can have a profound impact on the functioning of the digital single market, where new information services, such as meta-search engines or price-comparison websites, base their operation on the systematic extraction and re-utilisation of materials available from online sources. This is an issue that the Commission should address in a forthcoming evaluation of the Database Directive.