998 results for Speech segmentation
Abstract:
In this paper we present the application of Hidden Conditional Random Fields (HCRFs) to modelling speech for visual speech recognition. HCRFs may be easily adapted to model long-range dependencies across an observation sequence. As a result, visual word recognition performance can be improved, since the model can take a more contextual approach to generating state sequences. Results are presented from a speaker-dependent, isolated-digit, visual speech recognition task, with comparisons against a baseline HMM system. We first show that word recognition rates on clean video using HCRFs can be improved by increasing the number of past and future observations taken into account by each state. Second, we compare model performance under various levels of video compression applied to the test set. As far as we are aware, this is the first attempted use of HCRFs for visual speech recognition.
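As a rough illustration of the kind of model described above, the sketch below scores an isolated word with an HCRF-style log-space forward pass in which each hidden state's observation potential is computed from a window of K past and future frames. The state count, weight names (obs_w, trans_w), window size K and random inputs are illustrative assumptions, not details taken from the paper.

```python
# Minimal HCRF-style sketch for isolated-word visual speech recognition.
# Assumptions (not from the paper): a fixed set of word classes, a small
# number of hidden states per class, and a symmetric window of K past and
# future frames feeding each state's observation potential.
import numpy as np

def window_features(frames, t, K):
    """Concatenate frames t-K..t+K, padding with edge frames at the boundaries."""
    T = len(frames)
    idx = np.clip(np.arange(t - K, t + K + 1), 0, T - 1)
    return frames[idx].ravel()

def hcrf_log_score(frames, obs_w, trans_w, K):
    """Log-sum over hidden-state paths (unnormalised) for one word class.

    obs_w:   (S, D*(2K+1)) observation weights, one row per hidden state
    trans_w: (S, S) hidden-state transition weights
    """
    T = len(frames)
    # Per-frame state potentials computed from the windowed features.
    node = np.array([obs_w @ window_features(frames, t, K) for t in range(T)])  # (T, S)
    # Forward recursion in log space.
    alpha = node[0]
    for t in range(1, T):
        m = alpha[:, None] + trans_w                       # previous x current state
        alpha = node[t] + np.logaddexp.reduce(m, axis=0)   # log-sum over previous states
    return np.logaddexp.reduce(alpha)

def recognise(frames, models, K=2):
    """Pick the word class whose HCRF gives the highest log score."""
    scores = {w: hcrf_log_score(frames, p["obs_w"], p["trans_w"], K)
              for w, p in models.items()}
    return max(scores, key=scores.get)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, S, K = 8, 3, 2                      # feature dim, hidden states, context window
    frames = rng.normal(size=(30, D))      # stand-in for per-frame visual features
    models = {w: {"obs_w": rng.normal(size=(S, D * (2 * K + 1))),
                  "trans_w": rng.normal(size=(S, S))}
              for w in ["zero", "one", "two"]}
    print(recognise(frames, models, K))
```

Widening K lets each state condition on more temporal context, which is the mechanism the abstract credits for the improved word recognition rates.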
Abstract:
Image segmentation plays an important role in the analysis of retinal images, as extraction of the optic disc provides important cues for the accurate diagnosis of various retinopathic diseases. In recent years, gradient vector flow (GVF) based algorithms have been used successfully to segment a variety of medical imagery. However, owing to the compromise between the internal and external energy forces within the resulting partial differential equations, these methods can lead to less accurate segmentation results in certain cases. In this paper, we propose a new mean shift-based GVF segmentation algorithm that drives the internal/external energies towards the correct direction. The proposed method incorporates a mean shift operation within the standard GVF cost function to arrive at a more accurate segmentation. Experimental results on a large dataset of retinal images demonstrate that the presented method optimally detects the border of the optic disc.
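For orientation, the sketch below computes the standard GVF external-force field by iterative diffusion and applies a small mean-shift-style filtering of the image before the edge map is formed; this stands in for, but does not reproduce, the paper's mean shift term inside the GVF cost function. The regularisation weight mu, the iteration counts and the kernel bandwidths are assumed values.

```python
# Standard GVF field plus an illustrative mean-shift-style smoothing step.
# This is a sketch under assumed parameters, not the paper's combined
# cost-function formulation.
import numpy as np

def laplacian(a):
    """5-point Laplacian with replicated borders."""
    p = np.pad(a, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * a

def mean_shift_filter(img, spatial_r=2, range_h=0.1, n_iter=3):
    """Small mean-shift-style smoothing: each pixel moves toward the
    range-weighted mean of its (2r+1)x(2r+1) neighbourhood."""
    out = img.astype(float).copy()
    for _ in range(n_iter):
        p = np.pad(out, spatial_r, mode="edge")
        num = np.zeros_like(out)
        den = np.zeros_like(out)
        for dy in range(-spatial_r, spatial_r + 1):
            for dx in range(-spatial_r, spatial_r + 1):
                nb = p[spatial_r + dy: spatial_r + dy + out.shape[0],
                       spatial_r + dx: spatial_r + dx + out.shape[1]]
                w = np.exp(-((nb - out) / range_h) ** 2)   # range kernel weight
                num += w * nb
                den += w
        out = num / den
    return out

def gvf(edge_map, mu=0.2, n_iter=80):
    """Iterative gradient vector flow field (u, v) for an edge map f."""
    fy, fx = np.gradient(edge_map)
    mag2 = fx ** 2 + fy ** 2
    u, v = fx.copy(), fy.copy()
    for _ in range(n_iter):
        u += mu * laplacian(u) - mag2 * (u - fx)
        v += mu * laplacian(v) - mag2 * (v - fy)
    return u, v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((64, 64))                       # stand-in for a retinal image
    gy, gx = np.gradient(mean_shift_filter(img))     # smoothed edge information
    u, v = gvf(np.hypot(gx, gy))
    print(u.shape, v.shape)
```

In a full active-contour pipeline, the (u, v) field would then drive the snake that delineates the optic disc boundary.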
Abstract:
In this paper, we present a new approach to visual speech recognition that improves contextual modelling by combining Inter-Frame Dependent and Hidden Markov Models. This approach captures contextual information in visual speech that may be lost when using a Hidden Markov Model alone. We apply contextual modelling to a large speaker-independent, isolated-digit recognition task, and compare our approach with two commonly adopted feature-based techniques for incorporating speech dynamics. Results are presented for the baseline feature-based systems and for the combined modelling technique. We show that both feature-based techniques achieve similar levels of performance when used independently; however, significant improvements in performance can be achieved through a combination of the two. In particular, we report an improvement in excess of 17% relative Word Error Rate in comparison with our best baseline system.
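One way to read the combination described above is an HMM whose per-state emission score is augmented with a term that depends on the change from the previous frame, so inter-frame context contributes to decoding. The sketch below implements that reading with diagonal Gaussians and a weighting lam; the functional form, parameter names and weighting are assumptions rather than the paper's exact model.

```python
# Hedged sketch: HMM forward log-likelihood where each state scores the
# current frame (static Gaussian) plus the frame-to-frame change
# (inter-frame dependent term). All parameter values are illustrative.
import numpy as np

def diag_gauss_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def word_log_likelihood(frames, model, lam=1.0):
    """Forward-algorithm log likelihood of one word model.

    model: dict with log_pi (S,), log_A (S, S), static mean/var (S, D) and
           dmean/dvar (S, D) for the inter-frame term.
    """
    T, S = len(frames), len(model["log_pi"])

    def emit(t, s):
        static = diag_gauss_logpdf(frames[t], model["mean"][s], model["var"][s])
        if t == 0:
            return static
        delta = diag_gauss_logpdf(frames[t] - frames[t - 1],
                                  model["dmean"][s], model["dvar"][s])
        return static + lam * delta   # combined static + inter-frame score

    alpha = model["log_pi"] + np.array([emit(0, s) for s in range(S)])
    for t in range(1, T):
        trans = np.logaddexp.reduce(alpha[:, None] + model["log_A"], axis=0)
        alpha = trans + np.array([emit(t, s) for s in range(S)])
    return np.logaddexp.reduce(alpha)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, D = 4, 6
    frames = rng.normal(size=(25, D))   # stand-in for per-frame visual features
    model = {"log_pi": np.log(np.full(S, 1.0 / S)),
             "log_A": np.log(np.full((S, S), 1.0 / S)),
             "mean": rng.normal(size=(S, D)), "var": np.ones((S, D)),
             "dmean": np.zeros((S, D)),       "dvar": np.ones((S, D))}
    print(word_log_likelihood(frames, model))
```

Setting lam to zero recovers a plain HMM over static features, which is the kind of baseline the abstract compares against.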