966 results for Speech data


Relevance:

60.00%

Publisher:

Abstract:

This paper proposes an HMM-based approach to generating emotional intonation patterns. A set of models was built to represent syllable-length intonation units. In a classification framework, the models were able to detect a sequence of intonation units from raw fundamental frequency values. Using the models in a generative framework, we were able to synthesize smooth and natural-sounding pitch contours. As a case study for emotional intonation generation, Maximum Likelihood Linear Regression (MLLR) adaptation was used to transform the neutral model parameters with a small amount of happy and sad speech data. Perceptual tests showed that listeners could identify the speech with the sad intonation 80% of the time. On the other hand, listeners formed a bimodal distribution in their ability to detect the system-generated happy intonation, and on average they detected it only 46% of the time. © Springer-Verlag Berlin Heidelberg 2005.

Relevance:

60.00%

Publisher:

Abstract:

This paper investigates the problem of speaker identification and verification in noisy conditions, assuming that speech signals are corrupted by environmental noise, but knowledge about the noise characteristics is not available. This research is motivated in part by the potential application of speaker recognition technologies on handheld devices or the Internet. While the technologies promise an additional biometric layer of security to protect the user, the practical implementation of such systems faces many challenges. One of these is environmental noise. Due to the mobile nature of such systems, the noise sources can be highly time-varying and potentially unknown. This raises the requirement for noise robustness in the absence of information about the noise. This paper describes a method that combines multicondition model training and missing-feature theory to model noise with unknown temporal-spectral characteristics. Multicondition training is conducted using simulated noisy data with limited noise variation, providing a “coarse” compensation for the noise, and missing-feature theory is applied to refine the compensation by ignoring noise variation outside the given training conditions, thereby reducing the training and testing mismatch. This paper is focused on several issues relating to the implementation of the new model for real-world applications. These include the generation of multicondition training data to model noisy speech, the combination of different training data to optimize the recognition performance, and the reduction of the model’s complexity. The new algorithm was tested using two databases with simulated and realistic noisy speech data. The first database is a redevelopment of the TIMIT database by rerecording the data in the presence of various noise types, used to test the model for speaker identification with a focus on the varieties of noise. The second database is a handheld-device database collected in realistic noisy conditions, used to further validate the model for real-world speaker verification. The new model is compared to baseline systems and is found to achieve lower error rates.
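The idea of ignoring unreliable features can be illustrated with a minimal sketch. It is not the method of the paper above: it assumes a single diagonal-covariance Gaussian and a reliability mask that is already known, and the function name `masked_log_likelihood` is hypothetical. Features flagged as unreliable are simply left out of the likelihood, which is the basic marginalization step in missing-feature approaches.

```python
import numpy as np

def masked_log_likelihood(x, mean, var, reliable):
    """Log-likelihood of x under a diagonal Gaussian, using only reliable features.

    Assumption for illustration: `reliable` is a boolean mask supplied by the
    caller. How the mask is estimated is outside the scope of this sketch.
    """
    x, mean, var = (np.asarray(v, dtype=float) for v in (x, mean, var))
    mask = np.asarray(reliable, dtype=bool)
    diff = x[mask] - mean[mask]
    # Sum the per-dimension Gaussian log-densities over the reliable dimensions only.
    return float(-0.5 * np.sum(np.log(2 * np.pi * var[mask]) + diff * diff / var[mask]))
```

With a mask that excludes a heavily corrupted dimension, the score depends only on the remaining dimensions, so a single noisy feature cannot dominate the comparison between speaker models.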

Relevance:

60.00%

Publisher:

Abstract:

Malayalam is one of the 22 scheduled languages in India with more than 130 million speakers. This paper reports on the development of a speaker-independent, continuous transcription system for Malayalam. The system employs Hidden Markov Models (HMMs) for acoustic modeling and Mel Frequency Cepstral Coefficients (MFCCs) for feature extraction. It is trained with 21 male and female speakers aged between 20 and 40 years. The system obtained a word recognition accuracy of 87.4% and a sentence recognition accuracy of 84% when tested with a set of continuous speech data.

Relevance:

60.00%

Publisher:

Abstract:

This letter describes a novel algorithm, based on autoregressive decomposition and pole tracking, that is used to recognize two patterns of speech data: normal voice and dysphonic voice caused by nodules. The presented method relates the poles to the peaks of the signal spectrum, which represent the periodic components of the voice. The results show that the perturbation contained in the signal is clearly reflected in the pole positions, whose variability is related to jitter and shimmer. The pole dispersion for pathological voices is about 20% higher than for normal voices; therefore, the proposed approach is a more trustworthy measure than the classical ones. © 2007.
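The link between autoregressive (AR) poles and spectral peaks can be sketched as follows. This is an illustration under stated assumptions, not the algorithm of the letter: it fits the AR model with the standard autocorrelation (Yule-Walker) method, and the helper names `ar_poles` and `pole_frequencies` are hypothetical. The angle of each pole gives the frequency of the corresponding spectral peak.

```python
import numpy as np

def ar_poles(signal, order):
    """Fit an AR model by the autocorrelation (Yule-Walker) method and return its poles."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased autocorrelation estimates r[0..order].
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])
    # Solve the Toeplitz normal equations R a = r[1:] for the AR coefficients.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    # The poles are the roots of z^p - a1 z^(p-1) - ... - ap.
    return np.roots(np.concatenate(([1.0], -a)))

def pole_frequencies(poles, sample_rate):
    """Convert the angles of the upper-half-plane poles to frequencies in Hz."""
    upper = poles[np.imag(poles) > 0]
    return np.sort(np.angle(upper) * sample_rate / (2 * np.pi))
```

Applied frame by frame, the pole positions could then be tracked over time, and their spread compared between recordings. That tracking step is left out here because the source does not describe it in detail.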

Relevance:

60.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

60.00%

Publisher:

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevance:

60.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

60.00%

Publisher:

Abstract:

This monograph studies the speech of people from Ibiraci, MG. It is a phonetic study based on recorded speech data from eight informants aged between 20 and 85. The aim is to transcribe the recordings for acoustic analysis with the Praat program, considering aspects such as duration and intonation. The monograph contributes to the phonetic description of a variety of Brazilian Portuguese. In Brazil, linguistic studies have mostly considered the speech of people from large cities and state capitals. There is a great deal of data on those cities, but little data has been collected for other cities and communities.

Relevance:

60.00%

Publisher:

Abstract:

The Spanish spoken in the city of Malaga, like Andalusian Spanish in general, was in the past often considered an incorrect, low-prestige variety of Spanish that was strongly associated with the poor, rural, backward South of Spain. This southern Spanish variety is easily recognised because of its innovative phonetic features that diverge from the national standard, although in recent years some features have converged towards the standard. Despite its low prestige, the local variety of Spanish is quite often used on social network sites, where it is considered urban, fashionable and cool. Thus, this paper aims at analysing whether the Spanish used in the city of Malaga is undergoing an attitude change. The study draws on naturally occurring speech, data extracted from Facebook and a series of questionnaires about the salience, attitude and perception of the local variety of Spanish. The influence of the social factors age and gender is analysed, since they are both known to play a crucial role in many instances of language change. Age is of special interest, as during the Franco dictatorship dialect use was not accepted in schools and in the media. Results show that, on the one hand, people from Malaga hold a more positive attitude towards non-standard features used on social network sites than in spoken language. On the other hand, young female users employ most non-standard features online and unsurprisingly have an extremely positive attitude towards this use. However, in spoken Spanish the use of some features and the attitudes towards them are led by men and speakers educated during the Franco dictatorship, while other features, such as elision of intervocalic /d/, elision of final /ɾ/, /l/ and /d/ and ceceo, are predominantly employed by younger speakers and women. These features are considered salient in the local variety and work as local identity markers.

Relevance:

40.00%

Publisher:

Abstract:

Keyword Spotting is the task of detecting keywords of interest within continuous speech. The applications of this technology range from call centre dialogue systems to covert speech surveillance devices. Keyword spotting is particularly well suited to data mining tasks such as real-time keyword monitoring and unrestricted vocabulary audio document indexing. However, to date, many keyword spotting approaches have suffered from poor detection rates, high false alarm rates, or slow execution times, thus reducing their commercial viability. This work investigates the application of keyword spotting to data mining tasks. The thesis makes a number of major contributions to the field of keyword spotting. The first major contribution is the development of a novel keyword verification method named Cohort Word Verification. This method combines high level linguistic information with cohort-based verification techniques to obtain dramatic improvements in verification performance, in particular for the problematic short duration target word class. The second major contribution is the development of a novel audio document indexing technique named Dynamic Match Lattice Spotting. This technique augments lattice-based audio indexing principles with dynamic sequence matching techniques to provide robustness to erroneous lattice realisations. The resulting algorithm obtains significant improvement in detection rate over lattice-based audio document indexing while still maintaining extremely fast search speeds. The third major contribution is the study of multiple verifier fusion for the task of keyword verification. The reported experiments demonstrate that substantial improvements in verification performance can be obtained through the fusion of multiple keyword verifiers. The research focuses on combinations of speech background model based verifiers and cohort word verifiers. The final major contribution is a comprehensive study of the effects of limited training data for keyword spotting. This study is performed with consideration as to how these effects impact the immediate development and deployment of speech technologies for non-English languages.
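The notion of dynamic sequence matching against erroneous recognizer output can be sketched with a plain edit distance over phone sequences. This is an assumption-laden illustration, not the Dynamic Match Lattice Spotting algorithm itself: it uses uniform costs, it compares against flat phone sequences rather than a lattice, and the names `edit_distance` and `spot` are hypothetical.

```python
def edit_distance(target, observed, sub_cost=1.0, ins_cost=1.0, del_cost=1.0):
    """Minimum edit cost to turn the observed phone sequence into the target one."""
    prev = [j * ins_cost for j in range(len(observed) + 1)]
    for i, t in enumerate(target, start=1):
        curr = [i * del_cost]
        for j, o in enumerate(observed, start=1):
            curr.append(min(
                prev[j] + del_cost,        # target phone missing from the observation
                curr[j - 1] + ins_cost,    # extra phone in the observation
                prev[j - 1] + (0.0 if t == o else sub_cost),  # match or substitution
            ))
        prev = curr
    return prev[-1]

def spot(target, phone_sequences, max_cost):
    """Return indices of sequences whose edit cost to the target is within max_cost."""
    return [i for i, seq in enumerate(phone_sequences)
            if edit_distance(target, seq) <= max_cost]
```

For example, `spot(["k", "ae", "t"], [["k", "ae", "t"], ["k", "eh", "t"], ["d", "ao", "g"]], 1.0)` accepts the exact match and the sequence with one substituted phone, and rejects the third. Allowing a small non-zero cost is what gives tolerance to recognition errors, at the price of more false alarms as the threshold grows.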