980 results for Phonetic Detail
Abstract:
TOPIC: acoustic analysis of speech. OBJECTIVE: to analyse acoustically the substitutions involving the contrast between /t/ and /k/ in the speech of children with typical and deviant acquisition of this contrast, in order to identify and quantify the existence of covert contrasts. METHOD: a speech production experiment was designed involving the repetition of words that combined /t/ and /k/ with /a/ and /u/ in stressed position, by 9 children divided into three groups: children in the process of acquiring the contrast under investigation (G1); children with phonological disorder (G2); and children with typical productions (G3). Using the Praat software, the productions were edited and analysed according to the following acoustic parameters: spectral characteristics of the burst; CV transition; and temporal characteristics. The statistical tests used were Friedman's ANOVA and MANOVA. The level of statistical significance adopted was less than 0.05. RESULTS: in the productions of both the G2 children and the G1 children, we detected, to a large extent (80% and 57.4%, respectively), the presence of covert contrasts in the substitution errors of the stops investigated. Additionally, the acoustic analysis revealed differences in how the children use phonetic-acoustic cues to mark the distinction between /t/ and /k/. CONCLUSION: many of the substitutions present in the speech production of children in typical and deviant acquisition are in fact covert phonic contrasts. Furthermore, the use of acoustic analysis allowed the detection of subtle differences in the children's speech production.
Abstract:
The present study focuses on the presence of covert contrasts in the speech of children with a phonological disorder. The hypothesis is that children with phonological disorders manipulate secondary acoustic cues in an attempt to distinguish the phonological contrasts. We used five audio recordings of the speech of five children with speech disorders, between four and five years of age, who showed the so-called “phonic substitution” involving the sound group of the fricatives. The data were edited and analyzed using the software PRAAT. A phonetic transcription of the first repetition of each child was performed by three evaluators, reaching a 66% agreement level. After the transcription, we carried out a contrastive phonological analysis of the production of the five children and, finally, an acoustic analysis of all the “substitutions”, based on six parameters. We discovered the existence of covert contrasts in the productions auditorily regarded as homophones by the evaluators, representing a total of 54% of total substitutions identified through an impressionistic approach by the evaluators. Children with phonological disorders are seen to rely on secondary acoustic cues in an attempt to distinguish fricative phonemes. The data obtained in this study allow us to reflect on the importance of considering the phonetic detail within the phonological models.
Abstract:
Spoken term detection (STD) popularly involves performing word or sub-word level speech recognition and indexing the result. This work challenges the assumption that improved speech recognition accuracy implies better indexing for STD. Using an index derived from phone lattices, this paper examines the effect of language model selection on the relationship between phone recognition accuracy and STD accuracy. Results suggest that language models usually improve phone recognition accuracy but their inclusion does not always translate to improved STD accuracy. The findings suggest that using phone recognition accuracy to measure the quality of an STD index can be problematic, and highlight the need for an alternative that is more closely aligned with the goals of the specific detection task.
Abstract:
While spoken term detection (STD) systems based on word indices provide good accuracy, there are several practical applications where it is infeasible or too costly to employ an LVCSR engine. An STD system is presented, which is designed to incorporate a fast phonetic decoding front-end and be robust to decoding errors whilst still allowing for rapid search speeds. This goal is achieved through mono-phone open-loop decoding coupled with fast hierarchical phone lattice search. Results demonstrate that an STD system that is designed with the constraint of a fast and simple phonetic decoding front-end requires a compromise to be made between search speed and search accuracy.
Abstract:
This paper introduces a novel technique to directly optimise the Figure of Merit (FOM) for phonetic spoken term detection (STD). The FOM is a popular measure of STD accuracy, making it an ideal candidate for use as an objective function. A simple linear model is introduced to transform the phone log-posterior probabilities output by a phone classifier to produce enhanced log-posterior features that are more suitable for the STD task. Direct optimisation of the FOM is then performed by training the parameters of this model using a non-linear gradient descent algorithm. Substantial FOM improvements of 11% relative are achieved on held-out evaluation data, demonstrating the generalisability of the approach.
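As a rough illustration of the linear model mentioned in this abstract, the sketch below applies a square weight matrix to a single frame of phone log-posteriors. The function names, the three-phone inventory, and the identity initialisation are assumptions made for illustration only; the abstract does not specify them, and the FOM-driven training of the weights is not shown.

```python
from typing import List

def identity(n: int) -> List[List[float]]:
    """Return an n x n identity matrix (an assumed starting point for training)."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def transform_frame(weights: List[List[float]], log_posteriors: List[float]) -> List[float]:
    """Apply a linear transform to one frame of phone log-posteriors."""
    return [sum(w * x for w, x in zip(row, log_posteriors)) for row in weights]

# Hypothetical log-posteriors for three phones in one frame.
frame = [-0.2, -1.9, -3.1]
W = identity(3)
enhanced = transform_frame(W, frame)
```

With the identity matrix the enhanced features equal the input, so any gain would have to come from the trained weights.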
Abstract:
Traditional speech enhancement methods optimise signal-level criteria such as signal-to-noise ratio, but such approaches are sub-optimal for noise-robust speech recognition. Likelihood-maximising (LIMA) frameworks, on the other hand, optimise the parameters of speech enhancement algorithms based on state sequences generated by a speech recogniser for utterances of known transcriptions. Previous applications of LIMA frameworks have generated a set of global enhancement parameters for all model states without taking into account the distribution of model occurrence, making optimisation susceptible to favouring frequently occurring models, in particular silence. In this paper, we demonstrate the existence of highly disproportionate phonetic distributions on two corpora with distinct speech tasks, and propose to normalise the influence of each phone based on a priori occurrence probabilities. Likelihood analysis and speech recognition experiments verify this approach for improving ASR performance in noisy environments.
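One possible reading of normalising each phone's influence by its a priori occurrence probability is an inverse-prior weighting, sketched below. The weighting formula, the function name, and the example priors are assumptions for illustration and are not taken from the paper.

```python
from typing import Dict

def inverse_prior_weights(priors: Dict[str, float]) -> Dict[str, float]:
    """Weight each phone by 1 / (N * P(phone)), so rare phones count more.

    With this scaling (an assumption), the expected weight under the
    prior distribution is exactly 1.
    """
    n = len(priors)
    return {phone: 1.0 / (n * p) for phone, p in priors.items()}

# Hypothetical priors in which silence dominates the corpus.
priors = {"sil": 0.75, "a": 0.125, "t": 0.125}
weights = inverse_prior_weights(priors)
expected_weight = sum(priors[p] * weights[p] for p in priors)
```

Under these assumed priors, a silence frame receives a much smaller weight than a frame of either speech phone, which counteracts the bias towards frequently occurring models.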
Abstract:
Automatic spoken Language Identification (LID) is the process of identifying the language spoken within an utterance. The challenge that this task presents is that no prior information is available indicating the content of the utterance or the identity of the speaker. The trend of globalization and the pervasive popularity of the Internet will amplify the need for the capabilities spoken language identification systems provide. A prominent application arises in call centers dealing with speakers speaking different languages. Another important application is to index or search huge speech data archives and corpora that contain multiple languages. The aim of this research is to develop techniques targeted at producing a fast and more accurate automatic spoken LID system compared to the previous National Institute of Standards and Technology (NIST) Language Recognition Evaluation. Acoustic and phonetic speech information are targeted as the most suitable features for representing the characteristics of a language. To model the acoustic speech features a Gaussian Mixture Model based approach is employed. Phonetic speech information is extracted using existing speech recognition technology. Various techniques to improve LID accuracy are also studied. One approach examined is the employment of Vocal Tract Length Normalization to reduce the speech variation caused by different speakers. A linear data fusion technique is adopted to combine the various aspects of information extracted from speech. As a result of this research, a LID system was implemented and presented for evaluation in the 2003 Language Recognition Evaluation conducted by the NIST.
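A linear fusion of per-language scores from several subsystems can be sketched as a weighted sum, as below. The function name, the two subsystems, the scores, and the equal weights are all hypothetical; the abstract does not state how the fusion weights were chosen.

```python
from typing import Dict, List

def fuse_scores(system_scores: List[Dict[str, float]], weights: List[float]) -> Dict[str, float]:
    """Combine per-language scores from several systems as a weighted sum."""
    languages = system_scores[0].keys()
    return {
        lang: sum(w * scores[lang] for w, scores in zip(weights, system_scores))
        for lang in languages
    }

# Hypothetical scores from an acoustic (GMM) and a phonetic subsystem.
acoustic = {"english": 0.25, "spanish": 0.75}
phonetic = {"english": 0.5, "spanish": 0.5}
fused = fuse_scores([acoustic, phonetic], [0.5, 0.5])
best_language = max(fused, key=fused.get)
```

The language with the highest fused score is taken as the decision; in practice the weights would be tuned on development data.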
Abstract:
For the first time in human history, large volumes of spoken audio are being broadcast, made available on the internet, archived, and monitored for surveillance every day. New technologies are urgently required to unlock these vast and powerful stores of information. Spoken Term Detection (STD) systems provide access to speech collections by detecting individual occurrences of specified search terms. The aim of this work is to develop improved STD solutions based on phonetic indexing. In particular, this work aims to develop phonetic STD systems for applications that require open-vocabulary search, fast indexing and search speeds, and accurate term detection. Within this scope, novel contributions are made within two research themes, that is, accommodating phone recognition errors and, secondly, modelling uncertainty with probabilistic scores. A state-of-the-art Dynamic Match Lattice Spotting (DMLS) system is used to address the problem of accommodating phone recognition errors with approximate phone sequence matching. Extensive experimentation on the use of DMLS is carried out and a number of novel enhancements are developed that provide for faster indexing, faster search, and improved accuracy. Firstly, a novel comparison of methods for deriving a phone error cost model is presented to improve STD accuracy, resulting in up to a 33% improvement in the Figure of Merit. A method is also presented for drastically increasing the speed of DMLS search by at least an order of magnitude with no loss in search accuracy. An investigation is then presented of the effects of increasing indexing speed for DMLS, by using simpler modelling during phone decoding, with results highlighting the trade-off between indexing speed, search speed and search accuracy. The Figure of Merit is further improved by up to 25% using a novel proposal to utilise word-level language modelling during DMLS indexing. 
Analysis shows that this use of language modelling can, however, be unhelpful or even disadvantageous for terms with a very low language model probability. The DMLS approach to STD involves generating an index of phone sequences using phone recognition. An alternative approach to phonetic STD is also investigated that instead indexes probabilistic acoustic scores in the form of a posterior-feature matrix. A state-of-the-art system is described and its use for STD is explored through several experiments on spontaneous conversational telephone speech. A novel technique and framework is proposed for discriminatively training such a system to directly maximise the Figure of Merit. This results in a 13% improvement in the Figure of Merit on held-out data. The framework is also found to be particularly useful for index compression in conjunction with the proposed optimisation technique, providing for a substantial index compression factor in addition to an overall gain in the Figure of Merit. These contributions significantly advance the state-of-the-art in phonetic STD, by improving the utility of such systems in a wide range of applications.
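Approximate phone sequence matching with a phone error cost model can be illustrated with a weighted edit distance, as in the sketch below. This is a generic dynamic-programming example and not the DMLS implementation; the function name, the default costs, and the example substitution cost are assumptions.

```python
from typing import Dict, List, Tuple

def match_cost(
    target: List[str],
    observed: List[str],
    sub_cost: Dict[Tuple[str, str], float],
    indel_cost: float = 1.0,
) -> float:
    """Minimum cost of aligning a target phone sequence to an observed one."""
    rows, cols = len(target) + 1, len(observed) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * indel_cost
    for j in range(1, cols):
        dist[0][j] = j * indel_cost
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = target[i - 1], observed[j - 1]
            # Identical phones are free; unlisted substitutions cost 1.0 (assumed).
            sub = 0.0 if a == b else sub_cost.get((a, b), 1.0)
            dist[i][j] = min(
                dist[i - 1][j - 1] + sub,       # match or substitution
                dist[i - 1][j] + indel_cost,    # deletion from the target
                dist[i][j - 1] + indel_cost,    # insertion in the observation
            )
    return dist[-1][-1]

# Hypothetical cost: confusing "ae" with "eh" is cheap because it is a common error.
costs = {("ae", "eh"): 0.25}
cost = match_cost(["k", "ae", "t"], ["k", "eh", "t"], costs)
```

A search term would then be accepted as a putative hit when this cost falls below a chosen threshold, which is how cheaper costs for likely recognition errors make the search tolerant of them.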
Abstract:
This work proposes to improve spoken term detection (STD) accuracy by optimising the Figure of Merit (FOM). In this article, the index takes the form of a phonetic posterior-feature matrix. Accuracy is improved by formulating STD as a discriminative training problem and directly optimising the FOM, through its use as an objective function to train a transformation of the index. The outcome of indexing is then a matrix of enhanced posterior-features that are directly tailored for the STD task. The technique is shown to improve the FOM by up to 13% on held-out data. Additional analysis explores the effect of the technique on phone recognition accuracy, examines the actual values of the learned transform, and demonstrates that using an extended training data set results in further improvement in the FOM.
Abstract:
This article considers the implications of the decision in Clayton Utz Lawyers v P & W Enterprises Pty Ltd [2011] QDC 5, and the meaning of "itemised bill" as defined in the Legal Profession Act 2007 (Qld).
Abstract:
The incipient Underground Coal Gasification (UCG) industry in Queensland, Australia, undertook three trial projects in two Mesozoic basins of southeast Queensland. The experiences of these three operations provide useful retrospective insight into gasifier productivity. This paper identifies key output measures of gasifier ‘success’ including output gas composition, presence of contaminants in groundwater and consistency of chamber operation. Likewise, a review of the geological and hydrogeological understanding of each site prior to gasifier commissioning was undertaken. Productivity parameters from gasification were then correlated against the level of baseline geological/hydrogeological understanding for each site. The aim of the study was to identify the optimum scope of geological and hydrogeological understanding required at the site assessment phase to ensure safe, maximum gasifier output during production phase. This approach allows identification of poor or unexpected performance that is attributable to pre-existing uncertainty. A historical review of gasifier conditions inferred from the three trial projects is presented. Hence from the Queensland experiences it is possible to identify what aspects of baseline geological understanding should be clearly understood at the site selection phase in order to limit anomalous gasifier performance and undesirable deviations, and maximise production output.