951 resultados para Visual Speech Recognition, Multiple Views, Frontal View, Profile View
Resumo:
In this letter, a speech recognition algorithm based on the least-squares method is presented. Particularly, the intention is to exemplify how such a traditional numerical technique can be applied to solve a signal processing problem that is usually treated by using more elaborated formulations.
Resumo:
[EN] In the last years we have developed some methods for 3D reconstruction. First we began with the problem of reconstructing a 3D scene from a stereoscopic pair of images. We developed some methods based on energy functionals which produce dense disparity maps by preserving discontinuities from image boundaries. Then we passed to the problem of reconstructing a 3D scene from multiple views (more than 2). The method for multiple view reconstruction relies on the method for stereoscopic reconstruction. For every pair of consecutive images we estimate a disparity map and then we apply a robust method that searches for good correspondences through the sequence of images. Recently we have proposed several methods for 3D surface regularization. This is a postprocessing step necessary for smoothing the final surface, which could be afected by noise or mismatch correspondences. These regularization methods are interesting because they use the information from the reconstructing process and not only from the 3D surface. We have tackled all these problems from an energy minimization approach. We investigate the associated Euler-Lagrange equation of the energy functional, and we approach the solution of the underlying partial differential equation (PDE) using a gradient descent method.
Resumo:
We used magnetoencephalography (MEG) to map the spatiotemporal evolution of cortical activity for visual word recognition. We show that for five-letter words, activity in the left hemisphere (LH) fusiform gyrus expands systematically in both the posterior-anterior and medial-lateral directions over the course of the first 500 ms after stimulus presentation. Contrary to what would be expected from cognitive models and hemodynamic studies, the component of this activity that spatially coincides with the visual word form area (VWFA) is not active until around 200 ms post-stimulus, and critically, this activity is preceded by and co-active with activity in parts of the inferior frontal gyrus (IFG, BA44/6). The spread of activity in the VWFA for words does not appear in isolation but is co-active in parallel with spread of activity in anterior middle temporal gyrus (aMTG, BA 21 and 38), posterior middle temporal gyrus (pMTG, BA37/39), and IFG. © 2004 Elsevier Inc. All rights reserved.
Resumo:
Speech perception routinely takes place in noisy or degraded listening environments, leading to ambiguity in the identity of the speech token. Here, I present one review paper and two experimental papers that highlight cognitive and visual speech contributions to the listening process, particularly in challenging listening environments. First, I survey the literature linking audiometric age-related hearing loss and cognitive decline and review the four proposed causal mechanisms underlying this link. I argue that future research in this area requires greater consideration of the functional overlap between hearing and cognition. I also present an alternative framework for understanding causal relationships between age-related declines in hearing and cognition, with emphasis on the interconnected nature of hearing and cognition and likely contributions from multiple causal mechanisms. I also provide a number of testable hypotheses to examine how impairments in one domain may affect the other. In my first experimental study, I examine the direct contribution of working memory (through a cognitive training manipulation) on speech in noise comprehension in older adults. My results challenge the efficacy of cognitive training more generally, and also provide support for the contribution of sentence context in reducing working memory load. My findings also challenge the ubiquitous use of the Reading Span test as a pure test of working memory. In a second experimental (fMRI) study, I examine the role of attention in audiovisual speech integration, particularly when the acoustic signal is degraded. I demonstrate that attentional processes support audiovisual speech integration in the middle and superior temporal gyri, as well as the fusiform gyrus. My results also suggest that the superior temporal sulcus is sensitive to intelligibility enhancement, regardless of how this benefit is obtained (i.e., whether it is obtained through visual speech information or speech clarity). In addition, I also demonstrate that both the cingulo-opercular network and motor speech areas are recruited in difficult listening conditions. Taken together, these findings augment our understanding of cognitive contributions to the listening process and demonstrate that memory, working memory, and executive control networks may flexibly be recruited in order to meet listening demands in challenging environments.
Resumo:
The aim of this work is to identify key factors of a sustainable urban mobility concept in a particular context. A multiple criteria decision analysis method was developed to identify the main variables associated to the concept. Looking at the results obtained in 11 cities of the five Brazilian regions, we conclude that the method is able to capture the different views and approaches discussed in the formulation of the mobility concept. Therefore, it can be used as a starting point for the formulation of public policies and also in the development of tools designed for monitoring the mobility conditions. (C) 2008 Elsevier Ltd. All rights reserved.
Resumo:
Magdeburg, Univ., Fak. für Elektrotechnik und Informationstechnik, Diss., 2007
Resumo:
We analyze the social representations of violence against women from the perspective of city managers, professionals and health workers in rural settings of the southern half of Rio Grande do Sul. The study has a qualitative approach and adds a theoretical/methodological perspective of social representations. The data were generated by means of the associative method, question-stimulus of words and expressions emergence. The analysis of word association was performed with EVOC software, considering frequency and order of association with inducing terms. Participants recognize violence against women as gender destination that induces consent, resignation, guilt and fear, and results in naturalization and trivialization of this social phenomenon. We highlight the need to produce ruptures in established and traditional forms of health care, in the conservative and stereotypical views of violence, favoring access to friendly service and avoiding the reproduction of gender inequalities.
Resumo:
This study explored changes in scalp electrophysiology across two Working Memory (WM) tasks and two age groups. Continuous electroencephalography (EEG) was recorded from 18 healthy adults (18-34 years) and 12 healthy adolescents (14-17) during the performance of two Oculomotor Delayed Response (ODR) WM tasks; (i.e. eye movements were the metric of motor response). Delay-period, EEG data in the alpha frequency was sampled from anterior and parietal scalp sites to achieve a general measure of frontal and parietal activity, respectively. Frontal-parietal, alpha coherence was calculated for each participant for each ODR-WM task. Coherence significantly decreased in adults moving across the two ODR tasks, whereas, coherence significantly increased in adolescents moving across the two ODR tasks. The effects of task in the adolescent and adult groups were large and medium, respectively. Within the limits of this study, the results provide empirical support that WM development during adolescence include complex, qualitative, change.
Resumo:
Speech processing and consequent recognition are important areas of Digital Signal Processing since speech allows people to communicate more natu-rally and efficiently. In this work, a speech recognition system is developed for re-cognizing digits in Malayalam. For recognizing speech, features are to be ex-tracted from speech and hence feature extraction method plays an important role in speech recognition. Here, front end processing for extracting the features is per-formed using two wavelet based methods namely Discrete Wavelet Transforms (DWT) and Wavelet Packet Decomposition (WPD). Naive Bayes classifier is used for classification purpose. After classification using Naive Bayes classifier, DWT produced a recognition accuracy of 83.5% and WPD produced an accuracy of 80.7%. This paper is intended to devise a new feature extraction method which produces improvements in the recognition accuracy. So, a new method called Dis-crete Wavelet Packet Decomposition (DWPD) is introduced which utilizes the hy-brid features of both DWT and WPD. The performance of this new approach is evaluated and it produced an improved recognition accuracy of 86.2% along with Naive Bayes classifier.
Resumo:
Speech is the most natural means of communication among human beings and speech processing and recognition are intensive areas of research for the last five decades. Since speech recognition is a pattern recognition problem, classification is an important part of any speech recognition system. In this work, a speech recognition system is developed for recognizing speaker independent spoken digits in Malayalam. Voice signals are sampled directly from the microphone. The proposed method is implemented for 1000 speakers uttering 10 digits each. Since the speech signals are affected by background noise, the signals are tuned by removing the noise from it using wavelet denoising method based on Soft Thresholding. Here, the features from the signals are extracted using Discrete Wavelet Transforms (DWT) because they are well suitable for processing non-stationary signals like speech. This is due to their multi- resolutional, multi-scale analysis characteristics. Speech recognition is a multiclass classification problem. So, the feature vector set obtained are classified using three classifiers namely, Artificial Neural Networks (ANN), Support Vector Machines (SVM) and Naive Bayes classifiers which are capable of handling multiclasses. During classification stage, the input feature vector data is trained using information relating to known patterns and then they are tested using the test data set. The performances of all these classifiers are evaluated based on recognition accuracy. All the three methods produced good recognition accuracy. DWT and ANN produced a recognition accuracy of 89%, SVM and DWT combination produced an accuracy of 86.6% and Naive Bayes and DWT combination produced an accuracy of 83.5%. ANN is found to be better among the three methods.
Resumo:
Digit speech recognition is important in many applications such as automatic data entry, PIN entry, voice dialing telephone, automated banking system, etc. This paper presents speaker independent speech recognition system for Malayalam digits. The system employs Mel frequency cepstrum coefficient (MFCC) as feature for signal processing and Hidden Markov model (HMM) for recognition. The system is trained with 21 male and female voices in the age group of 20 to 40 years and there was 98.5% word recognition accuracy (94.8% sentence recognition accuracy) on a test set of continuous digit recognition task.
Resumo:
Performance of any continuous speech recognition system is dependent on the accuracy of its acoustic model. Hence, preparation of a robust and accurate acoustic model lead to satisfactory recognition performance for a speech recognizer. In acoustic modeling of phonetic unit, context information is of prime importance as the phonemes are found to vary according to the place of occurrence in a word. In this paper we compare and evaluate the effect of context dependent tied (CD tied) models, context dependent (CD) and context independent (CI) models in the perspective of continuous speech recognition of Malayalam language. The database for the speech recognition system has utterance from 21 speakers including 11 female and 10 males. Our evaluation results show that CD tied models outperforms CI models over 21%.
Resumo:
A connected digit speech recognition is important in many applications such as automated banking system, catalogue-dialing, automatic data entry, automated banking system, etc. This paper presents an optimum speaker-independent connected digit recognizer forMalayalam language. The system employs Perceptual Linear Predictive (PLP) cepstral coefficient for speech parameterization and continuous density Hidden Markov Model (HMM) in the recognition process. Viterbi algorithm is used for decoding. The training data base has the utterance of 21 speakers from the age group of 20 to 40 years and the sound is recorded in the normal office environment where each speaker is asked to read 20 set of continuous digits. The system obtained an accuracy of 99.5 % with the unseen data.