971 resultados para Speech Processing


Relevância:

70.00% 70.00%

Publicador:

Relevância:

70.00% 70.00%

Publicador:

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Speech rhythm is an essential part of speech processing. It is the outcome of the workings of a combination of linguistic and non-linguistic parameters, many of which also have other functions in speech. This study focusses on the acoustic and auditive realization of two linguistic parameters of rhythm: (1) sentence stress, and (2) speech rate and pausing. The aim was to find out how well Finnish comprehensive school pupils realize these two parameters in English and how native speakers of English react to Finnish pupils English rhythm. The material was elicited by means of a story-telling task and questionnaires. Three female and three male pupils representing different levels of oral skills in English were selected as the experimental group. The control group consisted of two female and two male native speakers of English. The stories were analysed acoustically and auditorily with respect to interstress intervals, weak forms, fundamental frequency, pausing, and speech as well as articulation rate. In addition, 52 native speakers of English were asked to rate the intelligibility of the Finnish pupils English with respect to speech rhythm and give their attitudes on what the pupils sounded like. Results showed that Finnish pupils can produce isochronous interstress intervals in English, but that too large a proportion of these intervals contain pauses. A closer analysis of the pauses revealed that Finnish pupils pause too frequently and in inappropriate places when they speak English. Frequent pausing was also found to cause slow speech rates. The findings of the fundamental frequency (F0) measurements indicate that Finnish pupils tend to make a slightly narrower F0 difference between stressed and unstressed syllables than the native speakers of English. Furthermore, Finnish pupils appear to know how to reduce the duration and quality of unstressed sounds, but they fail to do it frequently enough. Native listeners gave lower intelligibility and attitude scores to pupils with more anomalous speech rhythm. Finnish pupils rhythm anomalies seemed to derive from various learning- or learner-related factors rather than from the differences between English and Finnish. This study demonstrates that pausing may be a more important component of English speech rhythm than sentence stress as far as Finnish adolescents are concerned and that interlanguage development is affected by various factors and characterised by jumps or periods of stasis. Other theoretical, methodological and pedagogical implications of the results are also discussed.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Acoustic modeling using mixtures of multivariate Gaussians is the prevalent approach for many speech processing problems. Computing likelihoods against a large set of Gaussians is required as a part of many speech processing systems and it is the computationally dominant phase for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. We express the likelihood computation as a multiplication of matrices representing augmented feature vectors and Gaussian parameters. The computational gain of this approach over traditional methods is by exploiting the structure of these matrices and efficient implementation of their multiplication. In particular, we explore direct low-rank approximation of the Gaussian parameter matrix and indirect derivation of low-rank factors of the Gaussian parameter matrix by optimum approximation of the likelihood matrix. We show that both the methods lead to similar speedups but the latter leads to far lesser impact on the recognition accuracy. Experiments on 1,138 work vocabulary RM1 task and 6,224 word vocabulary TIMIT task using Sphinx 3.7 system show that, for a typical case the matrix multiplication based approach leads to overall speedup of 46 % on RM1 task and 115 % for TIMIT task. Our low-rank approximation methods provide a way for trading off recognition accuracy for a further increase in computational performance extending overall speedups up to 61 % for RM1 and 119 % for TIMIT for an increase of word error rate (WER) from 3.2 to 3.5 % for RM1 and for no increase in WER for TIMIT. We also express pairwise Euclidean distance computation phase in Dynamic Time Warping (DTW) in terms of matrix multiplication leading to saving of approximately of computational operations. In our experiments using efficient implementation of matrix multiplication, this leads to a speedup of 5.6 in computing the pairwise Euclidean distances and overall speedup up to 3.25 for DTW.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Automatic and accurate detection of the closure-burst transition events of stops and affricates serves many applications in speech processing. A temporal measure named the plosion index is proposed to detect such events, which are characterized by an abrupt increase in energy. Using the maxima of the pitch-synchronous normalized cross correlation as an additional temporal feature, a rule-based algorithm is designed that aims at selecting only those events associated with the closure-burst transitions of stops and affricates. The performance of the algorithm, characterized by receiver operating characteristic curves and temporal accuracy, is evaluated using the labeled closure-burst transitions of stops and affricates of the entire TIMIT test and training databases. The robustness of the algorithm is studied with respect to global white and babble noise as well as local noise using the TIMIT test set and on telephone quality speech using the NTIMIT test set. For these experiments, the proposed algorithm, which does not require explicit statistical training and is based on two one-dimensional temporal measures, gives a performance comparable to or better than the state-of-the-art methods. In addition, to test the scalability, the algorithm is applied on the Buckeye conversational speech corpus and databases of two Indian languages. (C) 2014 Acoustical Society of America.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Speech polarity detection is a crucial first step in many speech processing techniques. In this paper, an algorithm is proposed that improvises the existing technique using the skewness of the voice source (VS) signal. Here, the integrated linear prediction residual (ILPR) is used as the VS estimate, which is obtained using linear prediction on long-term frames of the low-pass filtered speech signal. This excludes the unvoiced regions from analysis and also reduces the computation. Further, a modified skewness measure is proposed for decision, which also considers the magnitude of the skewness of the ILPR along with its sign. With the detection error rate (DER) as the performance metric, the algorithm is tested on 8 large databases and its performance (DER=0.20%) is found to be comparable to that of the best technique (DER=0.06%) on both clean and noisy speech. Further, the proposed method is found to be ten times faster than the best technique.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This paper presents an overview of the Text-to-Speech synthesis system developed at the Institute for Language and Speech Processing (ILSP). It focuses on the key issues regarding the design of the system components. The system currently fully supports three languages (Greek, English, Bulgarian) and is designed in such a way to be as language and speaker independent as possible. Also, experimental results are presented which show that the system produces high quality synthetic speech in terms of naturalness and intelligibility. The system was recently ranked among the first three systems worldwide in terms of achieved quality for the English language, at the international Blizzard Challenge 2013 workshop. © 2014 Springer International Publishing.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

A neuroanatomical parcellation system is described which encompasses the entire cerebral cortex and the cerebellum. The cortical system modified version of the scheme described by Caviness et al. (1996) and is designed particularly for studies of speech processing. The cerebellum is parcellated into 6 cortical regions of interest (ROIs) and an ROI representing the deep cerebellar nuclei in each hemisphere. The boundaries of each ROI are based on individual anatomical markers that are clearly visible from standard structural MRI acquistions. The system permits averaginh of functional imaging data sets from multiple sujects while accounting for individual anatomical variability. Used in conjuction with region-of-interest analysis techniques such as that described by Nieto-Castanon et al. (2003), the parcellation system provides a more powerful means of analyzing functional data.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The use of visual cues during the processing of audiovisual (AV) speech is known to be less efficient in children and adults with language difficulties and difficulties are known to be more prevalent in children from low-income populations. In the present study, we followed an economically diverse group of thirty-seven infants longitudinally from 6–9 months to 14–16 months of age. We used eye-tracking to examine whether individual differences in visual attention during AV processing of speech in 6–9 month old infants, particularly when processing congruent and incongruent auditory and visual speech cues, might be indicative of their later language development. Twenty-two of these 6–9 month old infants also participated in an event-related potential (ERP) AV task within the same experimental session. Language development was then followed-up at the age of 14–16 months, using two measures of language development, the Preschool Language Scale and the Oxford Communicative Development Inventory. The results show that those infants who were less efficient in auditory speech processing at the age of 6–9 months had lower receptive language scores at 14–16 months. A correlational analysis revealed that the pattern of face scanning and ERP responses to audiovisually incongruent stimuli at 6–9 months were both significantly associated with language development at 14–16 months. These findings add to the understanding of individual differences in neural signatures of AV processing and associated looking behavior in infants.