229 results for Visual Speaker Recognition, Visual Speech Recognition, Cascading Appearance-Based Features


Relevance:

100.00%

Publisher:

Abstract:

State-of-the-art speech recognisers are usually based on hidden Markov models (HMMs). They model a hidden symbol sequence with a Markov process, with the observations independent given that sequence. These assumptions yield efficient algorithms, but limit the power of the model. An alternative model that allows a wide range of features, including word- and phone-level features, is a log-linear model. To handle, for example, word-level variable-length features, the original feature vectors must be segmented into words. Thus, decoding must find the optimal combination of segmentation of the utterance into words and word sequence. Features must therefore be extracted for each possible segment of audio. For many types of features, this becomes slow. In this paper, long-span features are derived from the likelihoods of word HMMs. Derivatives of the log-likelihoods, which break the Markov assumption, are appended. Previously, decoding with this model took cubic time in the length of the sequence, and longer for higher-order derivatives. This paper shows how to decode in quadratic time. © 2013 IEEE.
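The joint search over segmentations and word sequences described in the abstract above can be illustrated with a generic segmental (semi-Markov) Viterbi recursion. This is only a minimal sketch under stated assumptions, not the paper's algorithm: `segmental_decode` and the `segment_score` callback are hypothetical names, no language model is included, and the score of a segment is assumed to be supplied by the caller (for example, a log-linear score built from word-level features). The loop visits every (start, end) pair once, so the number of segments scored grows quadratically with the number of frames.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Segment = Tuple[int, int, str]  # (start frame, end frame exclusive, word)


def segmental_decode(
    num_frames: int,
    vocab: Sequence[str],
    segment_score: Callable[[int, int, str], float],
) -> Tuple[float, List[Segment]]:
    """Find the best-scoring segmentation and word labelling of the frames.

    segment_score(start, end, word) is a hypothetical, caller-supplied score
    for labelling frames start..end-1 with `word`; higher is better.
    """
    neg_inf = float("-inf")
    # best[t] is the best total score of any segmentation of frames 0..t-1.
    best: List[float] = [neg_inf] * (num_frames + 1)
    # back[t] remembers the start and word of the last segment ending at t.
    back: List[Optional[Tuple[int, str]]] = [None] * (num_frames + 1)
    best[0] = 0.0

    for end in range(1, num_frames + 1):
        for start in range(end):
            if best[start] == neg_inf:
                continue  # no valid segmentation reaches this start point
            for word in vocab:
                score = best[start] + segment_score(start, end, word)
                if score > best[end]:
                    best[end] = score
                    back[end] = (start, word)

    # Trace the back-pointers from the final frame to recover the segments.
    segments: List[Segment] = []
    end = num_frames
    while end > 0:
        pointer = back[end]
        if pointer is None:
            raise ValueError("no finite-scoring segmentation exists")
        start, word = pointer
        segments.append((start, end, word))
        end = start
    segments.reverse()
    return best[num_frames], segments
```

The cost of each call to `segment_score` dominates in practice, which is why the abstract stresses that extracting features for every possible segment becomes slow.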

Relevance:

80.00%

Publisher:

Abstract:

For speech recognition, mismatches between training and testing conditions caused by speaker and noise differences are normally handled separately. The work presented in this paper aims at jointly applying speaker adaptation and model-based noise compensation by embedding speaker adaptation as part of the noise mismatch function. The proposed method gives faster and closer-to-optimal adaptation than compensating for these two factors separately. It is also more consistent with respect to the basic assumptions of speaker and noise adaptation. Experimental results show significant and consistent gains from the proposed method. © 2011 IEEE.

Relevance:

80.00%

Publisher:

Abstract:

As-built models have proven useful in many project-related applications, such as progress monitoring and quality control. However, they are not produced in most projects because considerable manual effort is still needed to convert remote sensing data from photogrammetry or laser scanning into an as-built model. To automate the generation of as-built models, the first and fundamental step is to automatically recognize infrastructure-related elements in the remote sensing data. This paper outlines a framework for creating visual pattern recognition models that can automate the recognition of infrastructure-related elements based on their visual features. The framework starts by identifying the visual characteristics of infrastructure element types and representing them numerically using image analysis tools. The derived representations, along with their relative topology, are then used to form element visual pattern recognition (VPR) models. So far, VPR models of four infrastructure-related elements have been created using the framework. The high recognition performance of these models validates the effectiveness of the framework in recognizing infrastructure-related elements.

Relevance:

70.00%

Publisher:

Abstract:

Four types of neural networks, previously established for speech recognition and tested on a small, seven-speaker, 100-sentence database, are applied to the TIMIT database. The networks are a recurrent network phoneme recognizer, a modified Kanerva model morph recognizer, a compositional representation phoneme-to-word recognizer, and a modified Kanerva model morph-to-word recognizer. The major result is for the recurrent net, which gives a phoneme recognition accuracy of 57% on the si and sx sentences. The Kanerva morph recognizer achieves 66.2% accuracy on a small subset of the sa and sx sentences. The results for the word recognizers are incomplete.

Relevance:

70.00%

Publisher:

Abstract:

The visual system must learn to infer the presence of objects and features in the world from the images it encounters, and as such it must, either implicitly or explicitly, model the way these elements interact to create the image. Do the response properties of cells in the mammalian visual system reflect this constraint? To address this question, we constructed a probabilistic model in which the identity and attributes of simple visual elements were represented explicitly and learnt the parameters of this model from unparsed, natural video sequences. After learning, the behaviour and grouping of variables in the probabilistic model corresponded closely to functional and anatomical properties of simple and complex cells in the primary visual cortex (V1). In particular, feature identity variables were activated in a way that resembled the activity of complex cells, while feature attribute variables responded much like simple cells. Furthermore, the grouping of the attributes within the model closely parallelled the reported anatomical grouping of simple cells in cat V1. Thus, this generative model makes explicit an interpretation of complex and simple cells as elements in the segmentation of a visual scene into basic independent features, along with a parametrisation of their moment-by-moment appearances. We speculate that such a segmentation may form the initial stage of a hierarchical system that progressively separates the identity and appearance of more articulated visual elements, culminating in view-invariant object recognition.