432 resultados para Automatic term extraction


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Guaranteeing the quality of extracted features that describe relevant knowledge to users or topics is a challenge because of the large number of extracted features. Most popular existing term-based feature selection methods suffer from noisy feature extraction, which is irrelevant to the user needs (noisy). One popular method is to extract phrases or n-grams to describe the relevant knowledge. However, extracted n-grams and phrases usually contain a lot of noise. This paper proposes a method for reducing the noise in n-grams. The method first extracts more specific features (terms) to remove noisy features. The method then uses an extended random set to accurately weight n-grams based on their distribution in the documents and their terms distribution in n-grams. The proposed approach not only reduces the number of extracted n-grams but also improves the performance. The experimental results on Reuters Corpus Volume 1 (RCV1) data collection and TREC topics show that the proposed method significantly outperforms the state-of-art methods underpinned by Okapi BM25, tf*idf and Rocchio.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This work aims at developing a planetary rover capable of acting as an assistant astrobiologist: making a preliminary analysis of the collected visual images that will help to make better use of the scientists time by pointing out the most interesting pieces of data. This paper focuses on the problem of detecting and recognising particular types of stromatolites. Inspired by the processes actual astrobiologists go through in the field when identifying stromatolites, the processes we investigate focus on recognising characteristics associated with biogenicity. The extraction of these characteristics is based on the analysis of geometrical structure enhanced by passing the images of stromatolites into an edge-detection filter and its Fourier Transform, revealing typical spatial frequency patterns. The proposed analysis is performed on both simulated images of stromatolite structures and images of real stromatolites taken in the field by astrobiologists.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Camera-laser calibration is necessary for many robotics and computer vision applications. However, existing calibration toolboxes still require laborious effort from the operator in order to achieve reliable and accurate results. This paper proposes algorithms that augment two existing trustful calibration methods with an automatic extraction of the calibration object from the sensor data. The result is a complete procedure that allows for automatic camera-laser calibration. The first stage of the procedure is automatic camera calibration which is useful in its own right for many applications. The chessboard extraction algorithm it provides is shown to outperform openly available techniques. The second stage completes the procedure by providing automatic camera-laser calibration. The procedure has been verified by extensive experimental tests with the proposed algorithms providing a major reduction in time required from an operator in comparison to manual methods.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A long query provides more useful hints for searching relevant documents, but it is likely to introduce noise which affects retrieval performance. In order to smooth such adverse effect, it is important to reduce noisy terms, introduce and boost additional relevant terms. This paper presents a comprehensive framework, called Aspect Hidden Markov Model (AHMM), which integrates query reduction and expansion, for retrieval with long queries. It optimizes the probability distribution of query terms by utilizing intra-query term dependencies as well as the relationships between query terms and words observed in relevance feedback documents. Empirical evaluation on three large-scale TREC collections demonstrates that our approach, which is automatic, achieves salient improvements over various strong baselines, and also reaches a comparable performance to a state of the art method based on user’s interactive query term reduction and expansion.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Used frequently in food contact materials, bisphenol A (BPA) has been studied extensively in recent years, and ubiquitous exposure in the general population has been demonstrated worldwide. Characterising within- and between-individual variability of BPA concentrations is important for characterising exposure in biomonitoring studies, and this has been investigated previously in adults, but not in children. The aim of this study was to characterise the short-term variability of BPA in spot urine samples in young children. Children aged ≥2-<4 years (n = 25) were recruited from an existing cohort in Queensland Australia, and donated four spot urine samples each over a two day period. Samples were analysed for total BPA using isotope dilution online solid phase extraction-liquid chromatography-tandem mass spectrometry, and concentrations ranged from 0.53–74.5 ng/ml, with geometric mean and standard deviation of 2.70 ng/ml and 2.94 ng/ml, respectively. Sex and time of sample collection were not significant predictors of BPA concentration. The between-individual variability was approximately equal to the within-individual variability (ICC = 0.51), and this ICC is somewhat higher than previously reported literature values. This may be the result of physiological or behavioural differences between children and adults or of the relatively short exposure window assessed. Using a bootstrapping methodology, a single sample resulted in correct tertile classification approximately 70% of the time. This study suggests that single spot samples obtained from young children provide a reliable characterization of absolute and relative exposure over the short time window studied, but this may not hold true over longer timeframes.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Robustness to variations in environmental conditions and camera viewpoint is essential for long-term place recognition, navigation and SLAM. Existing systems typically solve either of these problems, but invariance to both remains a challenge. This paper presents a training-free approach to lateral viewpoint- and condition-invariant, vision-based place recognition. Our successive frame patch-tracking technique infers average scene depth along traverses and automatically rescales views of the same place at different depths to increase their similarity. We combine our system with the condition-invariant SMART algorithm and demonstrate place recognition between day and night, across entire 4-lane-plus-median-strip roads, where current algorithms fail.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Objective This paper presents an automatic active learning-based system for the extraction of medical concepts from clinical free-text reports. Specifically, (1) the contribution of active learning in reducing the annotation effort, and (2) the robustness of incremental active learning framework across different selection criteria and datasets is determined. Materials and methods The comparative performance of an active learning framework and a fully supervised approach were investigated to study how active learning reduces the annotation effort while achieving the same effectiveness as a supervised approach. Conditional Random Fields as the supervised method, and least confidence and information density as two selection criteria for active learning framework were used. The effect of incremental learning vs. standard learning on the robustness of the models within the active learning framework with different selection criteria was also investigated. Two clinical datasets were used for evaluation: the i2b2/VA 2010 NLP challenge and the ShARe/CLEF 2013 eHealth Evaluation Lab. Results The annotation effort saved by active learning to achieve the same effectiveness as supervised learning is up to 77%, 57%, and 46% of the total number of sequences, tokens, and concepts, respectively. Compared to the Random sampling baseline, the saving is at least doubled. Discussion Incremental active learning guarantees robustness across all selection criteria and datasets. The reduction of annotation effort is always above random sampling and longest sequence baselines. Conclusion Incremental active learning is a promising approach for building effective and robust medical concept extraction models, while significantly reducing the burden of manual annotation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We introduce a framework for population analysis of white matter tracts based on diffusion-weighted images of the brain. The framework enables extraction of fibers from high angular resolution diffusion images (HARDI); clustering of the fibers based partly on prior knowledge from an atlas; representation of the fiber bundles compactly using a path following points of highest density (maximum density path; MDP); and registration of these paths together using geodesic curve matching to find local correspondences across a population. We demonstrate our method on 4-Tesla HARDI scans from 565 young adults to compute localized statistics across 50 white matter tracts based on fractional anisotropy (FA). Experimental results show increased sensitivity in the determination of genetic influences on principal fiber tracts compared to the tract-based spatial statistics (TBSS) method. Our results show that the MDP representation reveals important parts of the white matter structure and considerably reduces the dimensionality over comparable fiber matching approaches.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Spoken term detection (STD) is the task of looking up a spoken term in a large volume of speech segments. In order to provide fast search, speech segments are first indexed into an intermediate representation using speech recognition engines which provide multiple hypotheses for each speech segment. Approximate matching techniques are usually applied at the search stage to compensate the poor performance of automatic speech recognition engines during indexing. Recently, using visual information in addition to audio information has been shown to improve phone recognition performance, particularly in noisy environments. In this paper, we will make use of visual information in the form of lip movements of the speaker in indexing stage and will investigate its effect on STD performance. Particularly, we will investigate if gains in phone recognition accuracy will carry through the approximate matching stage to provide similar gains in the final audio-visual STD system over a traditional audio only approach. We will also investigate the effect of using visual information on STD performance in different noise environments.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador: