Biblioteca Digital

26 resultados para Speech Processing

Analysis of information leakage from encrypted Skype conversations

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Voice over IP (VoIP) has experienced a tremendous growth over the last few years and is now widely used among the population and for business purposes. The security of such VoIP systems is often assumed, creating a false sense of privacy. This paper investigates in detail the leakage of information from Skype, a widely used and protected VoIP application. Experiments have shown that isolated phonemes can be classified and given sentences identified. By using the dynamic time warping (DTW) algorithm, frequently used in speech processing, an accuracy of 60% can be reached. The results can be further improved by choosing specific training data and reach an accuracy of 83% under specific conditions. The initial results being speaker dependent, an approach involving the Kalman filter is proposed to extract the kernel of all training signals.

Data Processing: Imaging of speech data

Relevância:

40.00% 40.00%

Publicador:

Emotion and Mental State Recognition from Speech: Special Issue of EURASIP Journals on Advances in Signal Processing vol 15

Relevância:

40.00% 40.00%

Publicador:

Emotional speech: towards a new generation of databases

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Research on speech and emotion is moving from a period of exploratory research into one where there is a prospect of substantial applications, notably in human-computer interaction. Progress in the area relies heavily on the development of appropriate databases. This paper addresses the issues that need to be considered in developing databases of emotional speech, and shows how the challenge of developing apropriate databases is being addressed in three major recent projects - the Belfast project, the Reading-Leeds project and the CREST-ESP project. From these and other studies the paper draws together the tools and methods that have been developed, addresses the problems that arise and indicates the future directions for the development of emotional speech databases.

Subband Correlation and Robust Speech Recognition

Relevância:

30.00% 30.00%

Publicador:

Robust Speech recognition using probabilistic union models

Relevância:

30.00% 30.00%

Publicador:

Speech Recognition with unknown partial feature corruption - a review of the union model

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper provides a summary of our studies on robust speech recognition based on a new statistical approach – the probabilistic union model. We consider speech recognition given that part of the acoustic features may be corrupted by noise. The union model is a method for basing the recognition on the clean part of the features, thereby reducing the effect of the noise on recognition. To this end, the union model is similar to the missing feature method. However, the two methods achieve this end through different routes. The missing feature method usually requires the identity of the noisy data for noise removal, while the union model combines the local features based on the union of random events, to reduce the dependence of the model on information about the noise. We previously investigated the applications of the union model to speech recognition involving unknown partial corruption in frequency band, in time duration, and in feature streams. Additionally, a combination of the union model with conventional noise-reduction techniques was studied, as a means of dealing with a mixture of known or trainable noise and unknown unexpected noise. In this paper, a unified review, in the context of dealing with unknown partial feature corruption, is provided into each of these applications, giving the appropriate theory and implementation algorithms, along with an experimental evaluation.

Noise compensation for speech recognition with arbitray additive noise

Relevância:

30.00% 30.00%

Publicador:

Describing the emotional states that are expressed in speech.

Relevância:

30.00% 30.00%

Publicador:

Union: a new approach for combining sub-band observations for noisy speech recognition

Relevância:

30.00% 30.00%

Publicador:

Comparison of Image Transform-Based Features for Visual Speech Recognition in Clean and Corrupted Videos

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present results of a study into the performance of a variety of different image transform-based feature types for speaker-independent visual speech recognition of isolated digits. This includes the first reported use of features extracted using a discrete curvelet transform. The study will show a comparison of some methods for selecting features of each feature type and show the relative benefits of both static and dynamic visual features. The performance of the features will be tested on both clean video data and also video data corrupted in a variety of ways to assess each feature type's robustness to potential real-world conditions. One of the test conditions involves a novel form of video corruption we call jitter which simulates camera and/or head movement during recording.

Hidden Conditional Random Fields for Visual Speech Recognition

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we present the application of Hidden Conditional Random Fields (HCRFs) to modelling speech for visual speech recognition. HCRFs may be easily adapted to model long range dependencies across an observation sequence. As a result visual word recognition performance can be improved as the model is able to take more of a contextual approach to generating state sequences. Results are presented from a speaker-dependent, isolated digit, visual speech recognition task using comparisons with a baseline HMM system. We firstly illustrate that word recognition rates on clean video using HCRFs can be improved by increasing the number of past and future observations being taken into account by each state. Secondly we compare model performances using various levels of video compression on the test set. As far as we are aware this is the first attempted use of HCRFs for visual speech recognition.

Maximizing the continuity in segmentation - a new approach to model, segment and recognize speech

Relevância:

30.00% 30.00%

Publicador:

Modeling long-range dependencies in speech data for text-independent speaker recognition

Relevância:

30.00% 30.00%

Publicador:

Inter-Frame Contextual Modelling For Visual Speech Recognition

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we present a new approach to visual speech recognition which improves contextual modelling by combining Inter-Frame Dependent and Hidden Markov Models. This approach captures contextual information in visual speech that may be lost using a Hidden Markov Model alone. We apply contextual modelling to a large speaker independent isolated digit recognition task, and compare our approach to two commonly adopted feature based techniques for incorporating speech dynamics. Results are presented from baseline feature based systems and the combined modelling technique. We illustrate that both of these techniques achieve similar levels of performance when used independently. However significant improvements in performance can be achieved through a combination of the two. In particular we report an improvement in excess of 17% relative Word Error Rate in comparison to our best baseline system.

«
1
2
»