Biblioteca Digital

This paper presents some developments in query expansion and document representation of our spoken document retrieval system and shows how various retrieval techniques affect performance for different sets of transcriptions derived from a common speech source. Modifications of the document representation are used, which combine several techniques for query expansion, knowledge-based on one hand and statistics-based on the other. Taken together, these techniques can improve Average Precision by over 19% relative to a system similar to that which we presented at TREC-7. These new experiments have also confirmed that the degradation of Average Precision due to a word error rate (WER) of 25% is quite small (3.7% relative) and can be reduced to almost zero (0.2% relative). The overall improvement of the retrieval system can also be observed for seven different sets of transcriptions from different recognition engines with a WER ranging from 24.8% to 61.5%. We hope to repeat these experiments when larger document collections become available, in order to evaluate the scalability of these techniques.
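The abstract above reports its gains and losses as relative changes in Average Precision. As a minimal sketch, assuming the standard non-interpolated definition of Average Precision and assuming that "relative" means the difference divided by the reference score (the abstract spells out neither), the two quantities could be computed as follows. The function names, document IDs and scores are hypothetical and are not taken from the paper.

```python
def average_precision(ranked_ids, relevant_ids):
    """Non-interpolated Average Precision for one query.

    ranked_ids: document IDs in the order the system returned them.
    relevant_ids: IDs judged relevant for the query.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only at relevant documents.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def relative_degradation(ap_reference, ap_degraded):
    """Percentage drop of ap_degraded relative to ap_reference (assumed definition)."""
    return 100.0 * (ap_reference - ap_degraded) / ap_reference


# Toy ranking: relevant documents appear at ranks 1 and 3.
ap = average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})

# Made-up scores for a reference transcript and a recogniser transcript.
drop = relative_degradation(0.50, 0.48)
```

With these toy inputs, `ap` averages the precision at ranks 1 and 3, and `drop` gives the percentage loss of the second score against the first.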

Decision tree-based context clustering based on cross validation and hierarchical priors

Relevance:

20.00%

Abstract:

The standard, ad-hoc stopping criteria used in decision tree-based context clustering are known to be sub-optimal and require parameters to be tuned. This paper proposes a new approach for decision tree-based context clustering based on cross validation and hierarchical priors. Combination of cross validation and hierarchical priors within decision tree-based context clustering offers better model selection and more robust parameter estimation than conventional approaches, with no tuning parameters. Experimental results on HMM-based speech synthesis show that the proposed approach achieved significant improvements in naturalness of synthesized speech over the conventional approaches. © 2011 IEEE.

High-technology clustering through spin-out and attraction: the Cambridge case

Relevance:

20.00%

Corporate document mining for technology intelligence: An analysis of needs, utilization and possibilities

Relevance:

20.00%

Abstract:

This paper focuses on document data, one of the most significant sources for technology intelligence. To help organisations use their knowledge in documents effectively, this research aims to identify what organisations really want from documents and what might be possible to obtain from them. The research involves a literature review, a series of in-depth/on-site interviews and a descriptive analysis of document mining applications. The output of the research includes: a document mining framework; an analysis of the current condition of document mining in technology-based organisations together with their future requirements; and guidelines for introducing document mining into an organisation along with a discussion on the practical issues that are faced by users. Copyright © 2011 Inderscience Enterprises Ltd.

Spectral Methods for Automatic Multiscale Data Clustering.

Relevance:

20.00%

Spatio-temporal clustering of probabilistic region trajectories

Relevance:

20.00%

Abstract:

We propose a novel model for the spatio-temporal clustering of trajectories based on motion, which applies to challenging street-view video sequences of pedestrians captured by a mobile camera. A key contribution of our work is the introduction of novel probabilistic region trajectories, motivated by the non-repeatability of segmentation of frames in a video sequence. Hierarchical image segments are obtained by using a state-of-the-art hierarchical segmentation algorithm, and connected from adjacent frames in a directed acyclic graph. The region trajectories and measures of confidence are extracted from this graph using a dynamic programming-based optimisation. Our second main contribution is a Bayesian framework with a twofold goal: to learn the optimal, in a maximum likelihood sense, Random Forests classifier of motion patterns based on video features, and construct a unique graph from region trajectories of different frames, lengths and hierarchical levels. Finally, we demonstrate the use of Isomap for effective spatio-temporal clustering of the region trajectories of pedestrians. We support our claims with experimental results on new and existing challenging video sequences. © 2011 IEEE.

MCBoost: Multiple classifier boosting for perceptual co-clustering of images and visual features

Relevance:

20.00%

Clustering user data for user modelling in the GUIDE multi-modal set-top box

Relevance:

20.00%

Abstract:

Chapter 20: Clustering User Data for User Modelling in the GUIDE Multi-modal Set-top Box. PM Langdon and P. Biswas. 20.1 ... It utilises advanced user modelling and simulation in conjunction with a single layer interface that permits a ...

A Nonparametric Bayesian Model for Multiple Clustering with Overlapping Feature Views.

Relevance:

20.00%

MCBoost: Multiple classifier boosting for perceptual co-clustering of images and visual features

Relevance:

20.00%

Abstract:

We present a new co-clustering problem of images and visual features. The problem involves a set of non-object images in addition to a set of object images and features to be co-clustered. Co-clustering is performed in a way that maximises discrimination of object images from non-object images, thus emphasizing discriminative features. This provides a way of obtaining perceptual joint-clusters of object images and features. We tackle the problem by simultaneously boosting multiple strong classifiers which compete for images by their expertise. Each boosting classifier is an aggregation of weak-learners, i.e. simple visual features. The obtained classifiers are useful for object detection tasks which exhibit multimodalities, e.g. multi-category and multi-view object detection tasks. Experiments on a set of pedestrian images and a face data set demonstrate that the method yields intuitive image clusters with associated features and is much superior to conventional boosting classifiers in object detection tasks.

72 results for document clustering
