Digital Library

914 results for robust speech recognition

Robust treatment of impulsive noise in speech and audio signals

Relevance: 40.00%

Feature Selection for Speech Emotion Recognition in Spanish and Basque: On the Use of Machine Learning to Improve Human-Computer Interaction

Relevance: 40.00%

Abstract:

The study of emotions in human-computer interaction is a growing research area. This paper shows an attempt to select the most significant features for emotion recognition in spoken Basque and Spanish using different methods for feature selection. The RekEmozio database was used as the experimental data set. Several Machine Learning paradigms were used for the emotion classification task. Experiments were executed in three phases, using different sets of features as classification variables in each phase. Moreover, feature subset selection was applied at each phase in order to search for the most relevant feature subset. The three-phase approach was selected to check the validity of the proposed approach. Achieved results show that an instance-based learning algorithm using feature subset selection techniques based on evolutionary algorithms is the best Machine Learning paradigm in automatic emotion recognition, with all different feature sets, obtaining a mean emotion recognition rate of 80.05% in Basque and 74.82% in Spanish. In order to check the goodness of the proposed process, a greedy searching approach (FSS-Forward) has been applied and a comparison between them is provided. Based on achieved results, a set of most relevant non-speaker dependent features is proposed for both languages and new perspectives are suggested.
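
The greedy forward search named in the abstract (FSS-Forward) can be illustrated with a minimal sketch. This is not the authors' implementation: the 1-nearest-neighbour classifier stands in for "an instance-based learning algorithm", leave-one-out accuracy is an assumed evaluation criterion, and the toy data and the function names `loo_1nn_accuracy` and `forward_selection` are hypothetical.

```python
import numpy as np

def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the chosen columns."""
    Xs = X[:, features]
    # Pairwise squared Euclidean distances between all samples.
    d = ((Xs[:, None, :] - Xs[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a sample may not be its own neighbour
    nearest = d.argmin(axis=1)
    return float((y[nearest] == y).mean())

def forward_selection(X, y):
    """Greedy forward search: add the single best feature until accuracy stops improving."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = [(loo_1nn_accuracy(X, y, selected + [f]), f) for f in remaining]
        # Highest score wins; ties go to the lowest feature index.
        score, f = max(scores, key=lambda s: (s[0], -s[1]))
        if score <= best:
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected, best

# Toy data (hypothetical): feature 0 separates the classes, features 1 and 2 are noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = np.column_stack([y * 5.0 + rng.normal(0, 0.1, 40),
                     rng.normal(0, 1, 40),
                     rng.normal(0, 1, 40)])
selected, acc = forward_selection(X, y)
print(selected, acc)
```

A greedy search of this kind evaluates far fewer subsets than an evolutionary search, which is presumably why the abstract uses it as a baseline for comparison.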

Sub-band coding techniques for robust 4.8 to 8 kb/s speech coders

Relevance: 40.00%

The auditory processing and recognition of speech

Relevance: 40.00%

Automatic recognition of spontaneous speech for access to multilingual oral history archives

Relevance: 40.00%

A robust 8000 bit/s. sub-band speech coder

Relevance: 40.00%

SUB-BAND CODING TECHNIQUES FOR ROBUST 4.8 TO 8 kb/s SPEECH CODERS.

Relevance: 40.00%

Abstract:

This paper describes a speech coding technique that has been developed in order to provide a method of digitising speech at bit rates in the range 4.8 to 8 kb/s that is insensitive to the effects of acoustic background noise and bit errors on the digital link. The main aim has been to develop a coding scheme which provides speech quality and robustness against noise and errors similar to those of a 16000 b/s continuously variable slope delta (CVSD) coder, but which operates at half its data rate or less. A desirable aim was to keep the complexity of the coding scheme within the scope of what could reasonably be handled by current signal processing chips or by a single custom integrated circuit. Application areas include mobile radio and small Satcomms terminals.

An efficient and robust pitch marking algorithm on the speech waveform for TD-PSOLA

Relevance: 40.00%

Abstract:

In a Text-to-Speech system based on time-domain techniques that employ pitch-synchronous manipulation of the speech waveforms, one of the most important issues that affect the output quality is the way the analysis points of the speech signal are estimated and the actual points, i.e. the analysis pitchmarks. In this paper we present our methodology for calculating the pitchmarks of a speech waveform, a pitchmark detection algorithm, which after thorough experimentation and in comparison with other algorithms, proves to behave better with our TD-PSOLA-based Text-to-Speech synthesizer (Time-Domain Pitch-Synchronous Overlap Add Text to Speech System).

Robust F0 estimation based on a multi-microphone periodicity function for distant-talking speech

Relevance: 40.00%

Abstract:

This work addresses the problem of deriving F0 from distant-talking speech signals acquired by a microphone network. The proposed method exploits the redundancy across the channels by jointly processing the different signals. To this purpose, a multi-microphone periodicity function is derived from the magnitude spectrum of all the channels. This function allows F0 to be estimated reliably, even under reverberant conditions, without the need for any post-processing or smoothing technique. Experiments, conducted on real data, showed that the proposed frequency-domain algorithm is more suitable than other time-domain based ones.
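
The idea of pooling magnitude spectra across microphones before looking for periodicity can be sketched as follows. This is only an illustration under stated assumptions, not the periodicity function defined in the paper: here each channel's magnitude spectrum is peak-normalised, the squared spectra are summed, and the inverse FFT of that sum is searched for its strongest lag. The function name `multichannel_f0`, the search range, and the synthetic two-channel signal are all hypothetical.

```python
import numpy as np

def multichannel_f0(channels, fs, fmin=60.0, fmax=400.0):
    """Estimate F0 (Hz) from an array of shape (n_mics, n_samples)."""
    n = channels.shape[1]
    window = np.hanning(n)
    # Zero-pad to 2n so the inverse FFT yields a linear, not circular, autocorrelation.
    spectra = np.abs(np.fft.rfft(channels * window, n=2 * n, axis=1))
    # Peak-normalise each channel so that no single microphone dominates the sum.
    spectra /= spectra.max(axis=1, keepdims=True) + 1e-12
    pooled = (spectra ** 2).sum(axis=0)
    # Using magnitudes only discards phase, so inter-channel delays do not matter.
    periodicity = np.fft.irfft(pooled)[:n]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(periodicity[lo:hi + 1]))
    return fs / lag

# Synthetic check (hypothetical data): a 200 Hz harmonic source seen by two
# microphones with different delay, gain, and additive noise.
rng = np.random.default_rng(0)
fs, n, f0 = 16000, 1024, 200.0
t = np.arange(n + 50) / fs
source = sum(a * np.sin(2 * np.pi * f0 * k * t)
             for k, a in [(1, 1.0), (2, 0.5), (3, 0.25)])
channels = np.stack([
    source[0:n] + 0.05 * rng.normal(size=n),
    0.6 * source[37:37 + n] + 0.05 * rng.normal(size=n),
])
estimate = multichannel_f0(channels, fs)
print(round(estimate, 1))
```

The sketch uses integer lags only, so its resolution is limited to fs / lag steps; it also does not model reverberation, which is the condition the abstract is actually concerned with.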

Robust recognition of chess-boards under deformation

Relevance: 40.00%

Abstract:

Current methods for forming detected chess-board vertices into a grid structure tend to be weak in situations with a warped grid and with false or missing vertex features. In this paper we present a highly robust, yet efficient, scheme suitable for inference of regular 2D square mesh structure from vertices recorded both during projection of a chess-board pattern onto 3D objects, and in the simpler case of camera calibration. Examples of the method's performance in a lung function measuring application, observing chess-boards projected onto patients' chests, are given. The method presented is resilient to significant surface deformation, and tolerates inexact vertex-feature detection. This robustness results from the scheme's novel exploitation of feature orientation information. © 2013 IEEE.

Mandarin speech emotion recognition based on high dimensional geometry theory

Relevance: 40.00%

Abstract:

In this paper, a novel approach to Mandarin speech emotion recognition, based on high dimensional geometry theory, is proposed. Human emotions are classified into 6 archetypal classes: fear, anger, happiness, sadness, surprise and disgust. According to the characteristics of these emotional speech signals, the amplitude, pitch frequency and formant are used as the feature parameters for speech emotion recognition. The new method, called high dimensional geometry theory, is applied for recognition. Compared with the traditional GSVM model, the new method has some advantages. It is noted that this method has significant value for future research and applications.

Robust and Efficient 3D Recognition by Alignment

Relevance: 40.00%

Abstract:

Alignment is a prevalent approach for recognizing 3D objects in 2D images. A major problem with current implementations is how to robustly handle errors that propagate from uncertainties in the locations of image features. This thesis gives a technique for bounding these errors. The technique makes use of a new solution to the problem of recovering 3D pose from three matching point pairs under weak-perspective projection. Furthermore, the error bounds are used to demonstrate that using line segments for features instead of points significantly reduces the false positive rate, to the extent that alignment can remain reliable even in cluttered scenes.

Robust 2-D Model-Based Object Recognition

Relevance: 40.00%

Abstract:

Techniques, suitable for parallel implementation, for robust 2D model-based object recognition in the presence of sensor error are studied. Models and scene data are represented as local geometric features and robust hypothesis of feature matchings and transformations is considered. Bounds on the error in the image feature geometry are assumed constraining possible matchings and transformations. Transformation sampling is introduced as a simple, robust, polynomial-time, and highly parallel method of searching the space of transformations to hypothesize feature matchings. Key to the approach is that error in image feature measurement is explicitly accounted for. A Connection Machine implementation and experiments on real images are presented.

Towards Real-time Speech Emotion Recognition for Affective E-Learning

Relevance: 40.00%

Abstract:

The original article is available as an open access file on the Springer website at the following link: http://link.springer.com/article/10.1007/s10639-015-9388-2

Modeling long-range dependencies in speech data for text-independent speaker recognition

Relevance: 40.00%
