8 resultados para Extraction techniques
em CentAUR: Central Archive University of Reading - UK
Resumo:
Keyphrases are added to documents to help identify the areas of interest they contain. However, in a significant proportion of papers author selected keyphrases are not appropriate for the document they accompany: for instance, they can be classificatory rather than explanatory, or they are not updated when the focus of the paper changes. As such, automated methods for improving the use of keyphrases are needed, and various methods have been published. However, each method was evaluated using a different corpus, typically one relevant to the field of study of the method’s authors. This not only makes it difficult to incorporate the useful elements of algorithms in future work, but also makes comparing the results of each method inefficient and ineffective. This paper describes the work undertaken to compare five methods across a common baseline of corpora. The methods chosen were Term Frequency, Inverse Document Frequency, the C-Value, the NC-Value, and a Synonym based approach. These methods were analysed to evaluate performance and quality of results, and to provide a future benchmark. It is shown that Term Frequency and Inverse Document Frequency were the best algorithms, with the Synonym approach following them. Following these findings, a study was undertaken into the value of using human evaluators to judge the outputs. The Synonym method was compared to the original author keyphrases of the Reuters’ News Corpus. The findings show that authors of Reuters’ news articles provide good keyphrases but that more often than not they do not provide any keyphrases.
Resumo:
Remote sensing can potentially provide information useful in improving pollution transport modelling in agricultural catchments. Realisation of this potential will depend on the availability of the raw data, development of information extraction techniques, and the impact of the assimilation of the derived information into models. High spatial resolution hyperspectral imagery of a farm near Hereford, UK is analysed. A technique is described to automatically identify the soil and vegetation endmembers within a field, enabling vegetation fractional cover estimation. The aerially-acquired laser altimetry is used to produce digital elevation models of the site. At the subfield scale the hypothesis that higher resolution topography will make a substantial difference to contaminant transport is tested using the AGricultural Non-Point Source (AGNPS) model. Slope aspect and direction information are extracted from the topography at different resolutions to study the effects on soil erosion, deposition, runoff and nutrient losses. Field-scale models are often used to model drainage water, nitrate and runoff/sediment loss, but the demanding input data requirements make scaling up to catchment level difficult. By determining the input range of spatial variables gathered from EO data, and comparing the response of models to the range of variation measured, the critical model inputs can be identified. Response surfaces to variation in these inputs constrain uncertainty in model predictions and are presented. Although optical earth observation analysis can provide fractional vegetation cover, cloud cover and semi-random weather patterns can hinder data acquisition in Northern Europe. A Spring and Autumn cloud cover analysis is carried out over seven UK sites close to agricultural districts, using historic satellite image metadata, climate modelling and historic ground weather observations. Results are assessed in terms of probability of acquisition probability and implications for future earth observation missions. (C) 2003 Elsevier Ltd. All rights reserved.
Resumo:
In this paper, we introduce a novel high-level visual content descriptor which is devised for performing semantic-based image classification and retrieval. The work can be treated as an attempt to bridge the so called “semantic gap”. The proposed image feature vector model is fundamentally underpinned by the image labelling framework, called Collaterally Confirmed Labelling (CCL), which incorporates the collateral knowledge extracted from the collateral texts of the images with the state-of-the-art low-level image processing and visual feature extraction techniques for automatically assigning linguistic keywords to image regions. Two different high-level image feature vector models are developed based on the CCL labelling of results for the purposes of image data clustering and retrieval respectively. A subset of the Corel image collection has been used for evaluating our proposed method. The experimental results to-date already indicates that our proposed semantic-based visual content descriptors outperform both traditional visual and textual image feature models.
Resumo:
In this paper, we introduce a novel high-level visual content descriptor devised for performing semantic-based image classification and retrieval. The work can be treated as an attempt for bridging the so called "semantic gap". The proposed image feature vector model is fundamentally underpinned by an automatic image labelling framework, called Collaterally Cued Labelling (CCL), which incorporates the collateral knowledge extracted from the collateral texts accompanying the images with the state-of-the-art low-level visual feature extraction techniques for automatically assigning textual keywords to image regions. A subset of the Corel image collection was used for evaluating the proposed method. The experimental results indicate that our semantic-level visual content descriptors outperform both conventional visual and textual image feature models.
Resumo:
Multispectral iris recognition uses information from multiple bands of the electromagnetic spectrum to better represent certain physiological characteristics of the iris texture and enhance obtained recognition accuracy. This paper addresses the questions of single versus cross spectral performance and compares score-level fusion accuracy for different feature types, combining different wavelengths to overcome limitations in less constrained recording environments. Further it is investigated whether Doddington's “goats” (users who are particularly difficult to recognize) in one spectrum also extend to other spectra. Focusing on the question of feature stability at different wavelengths, this work uses manual ground truth segmentation, avoiding bias by segmentation impact. Experiments on the public UTIRIS multispectral iris dataset using 4 feature extraction techniques reveal a significant enhancement when combining NIR + Red for 2-channel and NIR + Red + Blue for 3-channel fusion, across different feature types. Selective feature-level fusion is investigated and shown to improve overall and especially cross-spectral performance without increasing the overall length of the iris code.
Resumo:
Three procedures for the isolation of volatiles from grilled goat meat were compared: dynamic headspace entrainment on Tenax TA, simultaneous steam distillation-extraction, and solid-phase microextraction. Headspace entrainment on Tenax TA extracted the highest number of Maillard-derived volatile compounds. Two hundred and three volatile components were identified: 159 are reported for the first time in goat meat. Most of the volatiles detected (155) were lipid oxidation products, such as hydrocarbons, aldehydes, alcohols, ketones, carboxylic acids and esters. Forty-eight Maillard-derived compounds were identified. comprising pyrazines, pyrroles, thiophenes, furanthiol derivatives, alkyl and alicyclic sulphides, pyridines, and thiazoles. Some reported character impact compounds of cooked meat, e.g., 12-methyltridecanal, (EE)-2,4-decadienal, methional, and dimethyl trisulphide were identified in the volatile profile of goat meat, together with a series of C-2 to C-5 alkylformylcyclopentenes, which have been reported in cooked chicken, pork, beef and lamb, as being important for the characteristic flavour impression of different animal species. (C) 2009 Published by Elsevier Ltd.
Resumo:
n the past decade, the analysis of data has faced the challenge of dealing with very large and complex datasets and the real-time generation of data. Technologies to store and access these complex and large datasets are in place. However, robust and scalable analysis technologies are needed to extract meaningful information from these datasets. The research field of Information Visualization and Visual Data Analytics addresses this need. Information visualization and data mining are often used complementary to each other. Their common goal is the extraction of meaningful information from complex and possibly large data. However, though data mining focuses on the usage of silicon hardware, visualization techniques also aim to access the powerful image-processing capabilities of the human brain. This article highlights the research on data visualization and visual analytics techniques. Furthermore, we highlight existing visual analytics techniques, systems, and applications including a perspective on the field from the chemical process industry.