997 results for Speech Dialog Systems
Abstract:
The present thesis focuses on the overall structure of the language of two types of Speech Exchange Systems (SES): Interview (INT) and Conversation (CON). The linguistic structures of INT and CON are quantitatively investigated on three different but interrelated levels of analysis: lexis, syntax and information structure. The corpus of data investigated for the project consists of eight sessions of pairs of conversants in carefully planned interviews, followed by unplanned, surreptitiously recorded conversational encounters of the same pairs of speakers. The data comprise a total of approximately 15,200 words of INT talk and about 19,200 words of CON. Taking account of the debatable assumption that the language of SES might be complex on certain linguistic levels (e.g. syntax) (Halliday 1979) and simple on others (e.g. lexis) in comparison with written discourse, the thesis sets out to investigate this complexity using a statistical approach to the computation of the structures recurrent in the language of INT and CON. The findings clearly indicate the presence of linguistic complexity in both types. They also show the language of INT to be slightly more syntactically and lexically complex than that of CON. Lexical density seems to be relatively high in both types of spoken discourse. The language of INT also seems to be more complex than that of CON on the level of information structure. This is manifested in the greater use of Inferable and other linguistically complex entities of discourse. Halliday's suggestion that the language of SES is syntactically complex is confirmed, but not his claim that the more casual the conversation, the more syntactically complex it becomes. The results of the analysis point to the general conclusion that the linguistic complexity of types of SES lies not only in the high recurrence of syntactic structures, but also in the combination of these features with each other and with other linguistic and extralinguistic features.
The linguistic analysis of the language of SES can be useful in understanding and pinpointing the intricacies of spoken discourse in general, and will help discourse analysts and applied linguists exploit it for both theoretical and pedagogical purposes.
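The lexical-density measure discussed in this abstract is conventionally the proportion of content words among all running words. A minimal sketch of that computation, with a small illustrative stop-word list standing in for a full function-word inventory (not the thesis's own word lists):

```python
# Lexical density: content words / total words.
# FUNCTION_WORDS is a tiny illustrative stand-in for a full
# function-word inventory, not the list used in the thesis.
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "and", "is", "to", "it", "that"}

def lexical_density(text):
    """Return the proportion of content (non-function) words in text."""
    words = [w.lower() for w in text.split()]
    if not words:
        return 0.0
    content = [w for w in words if w not in FUNCTION_WORDS]
    return len(content) / len(words)

print(lexical_density("the analysis of spoken discourse is complex"))
```

A corpus-level comparison of INT and CON would apply the same ratio to each subcorpus and compare the results.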
Abstract:
While humans can easily segregate and track a speaker's voice in a loud, noisy environment, most modern speech recognition systems still perform poorly in loud background noise. The computational principles behind auditory source segregation in humans are not yet fully understood. In this dissertation, we develop a computational model for source segregation inspired by auditory processing in the brain. To support the key principles behind the computational model, we conduct a series of electroencephalography (EEG) experiments using both simple tone-based stimuli and more natural speech stimuli. Most source segregation algorithms utilize some form of prior information about the target speaker or use more than one simultaneous recording of the noisy speech mixtures. Other methods build models of the noise characteristics. Source segregation of simultaneous speech mixtures from a single microphone recording, with no knowledge of the target speaker, remains a challenge. Using the principle of temporal coherence, we develop a novel computational model that exploits the difference in the temporal evolution of features belonging to different sources to perform unsupervised monaural source segregation. While using no prior information about the target speaker, the method can gracefully incorporate knowledge about the target speaker to further enhance the segregation. Through a series of EEG experiments we collect neurological evidence to support the principle behind the model. Aside from its unusual structure and computational innovations, the proposed model provides testable hypotheses about the physiological mechanisms underlying the remarkable perceptual ability of humans to segregate acoustic sources, and about its psychophysical manifestations in navigating complex sensory environments.
Results from the EEG experiments offer further insight into the assumptions behind the model and motivate future single-unit studies that could supply more direct evidence for the principle of temporal coherence.
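The temporal-coherence principle can be illustrated with a toy computation: feature channels whose temporal envelopes rise and fall together are grouped into one source. A minimal sketch, assuming simple hand-made envelopes and a greedy correlation threshold (this is an illustration of the principle, not the dissertation's model):

```python
import math

def correlation(x, y):
    """Pearson correlation of two equal-length envelope sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def group_by_coherence(envelopes, threshold=0.8):
    """Greedily group channels whose envelopes are temporally coherent."""
    groups = []
    for i, env in enumerate(envelopes):
        for g in groups:
            # Compare against the first member of each existing group.
            if correlation(env, envelopes[g[0]]) >= threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

# Two channels sharing one rhythm, and one channel with the opposite rhythm:
e1 = [0, 1, 0, 1, 0, 1]
e2 = [0, 2, 0, 2, 0, 2]
e3 = [1, 0, 1, 0, 1, 0]
print(group_by_coherence([e1, e2, e3]))  # channels 0 and 1 cohere; 2 does not
```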
Abstract:
In this paper, a rule-based automatic syllabifier for Danish based on the Maximal Onset Principle is described. The prior success of rule-based methods in Portuguese and Catalan syllabification modules formed the basis of this work. The system was implemented and tested using a very small set of rules. The results yielded word accuracy rates of 96.9% and 98.7%, contrary to our initial expectations, Danish being a language with a complex syllabic structure and thus thought difficult to handle with rules. A comparison with a data-driven syllabification system based on artificial neural networks showed the rule-based system to have the higher accuracy rate.
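The Maximal Onset Principle assigns as many intervocalic consonants as legally possible to the onset of the following syllable. A minimal sketch over orthographic strings, with a hypothetical legal-onset inventory for illustration (the paper's Danish rule set is not reproduced here):

```python
import re

VOWELS = "aeiouyæøå"
# Hypothetical inventory of legal onset clusters, for illustration only.
LEGAL_ONSETS = {"", "b", "d", "f", "g", "k", "l", "m", "n", "p", "r",
                "s", "t", "v", "sk", "sp", "st", "str", "tr", "pr", "kl"}

def syllabify(word):
    """Split a word into syllables using the Maximal Onset Principle."""
    # Alternating consonant/vowel chunks, e.g. "hamster" -> h, a, mst, e, r
    chunks = re.findall(f"[{VOWELS}]+|[^{VOWELS}]+", word)
    syllables = []
    onset = ""
    i = 0
    while i < len(chunks):
        chunk = chunks[i]
        if chunk[0] not in VOWELS:
            onset = chunk  # word-initial consonants
            i += 1
            continue
        syl = onset + chunk
        onset = ""
        nxt = chunks[i + 1] if i + 1 < len(chunks) else ""
        if nxt and nxt[0] not in VOWELS:
            if i + 2 < len(chunks):
                # Medial cluster: the longest legal suffix becomes the
                # next syllable's onset; the rest stays as the coda.
                for k in range(len(nxt) + 1):
                    if nxt[k:] in LEGAL_ONSETS:
                        syl += nxt[:k]
                        onset = nxt[k:]
                        break
            else:
                syl += nxt  # word-final coda
            i += 1
        syllables.append(syl)
        i += 1
    return syllables

print(syllabify("hamster"))
```

Here the medial cluster "mst" is split as "m" + "st" because "st" is the longest legal onset, giving "ham-ster".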
Abstract:
Current development platforms for designing spoken dialog services feature different kinds of strategies to help designers build, test, and deploy their applications. In general, these platforms are made up of several assistants that handle the different design stages (e.g. definition of the dialog flow, prompt and grammar definition, database connection, or debugging and testing the running application). In spite of all the advances in this area, the process of designing speech-based dialog services is generally a time-consuming task that needs to be accelerated. In this paper we describe a complete development platform that reduces the design time by using different types of acceleration strategies based on information from the data model structure and database contents, as well as cumulative information obtained throughout the successive steps of the design. Thanks to these accelerations, interaction with the platform is simplified and the design is reduced, in most cases, to simple confirmations of the proposals that the platform automatically provides at each stage. Different kinds of proposals are available to complete the application flow, such as the possibility of selecting which information slots should be requested from the user together, predefined templates for common dialogs, the most probable actions that make up each state defined in the flow, and different solutions to specific speech-modality problems such as the presentation of lists of results retrieved from the backend database. The platform also includes accelerations for creating speech grammars and prompts, and the SQL queries for accessing the database at runtime. Finally, we describe the setup and results of simultaneous summative, subjective and objective evaluations with different designers, used to test the usability of the proposed accelerations as well as their contribution to reducing design time and interaction.
Abstract:
This thesis addresses the viability of automatic speech recognition (ASR) for control room systems; with careful system design, ASR devices can be a useful means of human-computer interaction in specific types of task. These tasks can be defined as complex verbal activities, such as command and control, and can be paired with spatial tasks, such as monitoring, without detriment. It is suggested that ASR use be confined to routine plant operation, as opposed to critical incidents, due to possible effects of stress on the operators' speech. It is proposed that using ASR will require operators to adapt a commonly used skill to cater for a novel use of speech. Before using the ASR device, new operators will require some form of training. It is shown that a demonstration by an experienced user of the device can lead to better performance than written instructions; thus, a relatively cheap and very efficient form of operator training can be supplied by demonstration from experienced ASR operators. From a series of studies into speech-based interaction with computers, it is concluded that the interaction should be designed to capitalise upon the tendency of operators to use short, succinct, task-specific styles of speech. From studies comparing different types of feedback, it is concluded that operators should be given screen-based feedback, rather than auditory feedback, for control room operation. Feedback will take two forms: use of the ASR device will require recognition feedback, which is best supplied as text, while performance of a process control task will require task feedback integrated into the mimic display. The latter feedback can be either textual or symbolic, but it is suggested that symbolic feedback will be more beneficial. Related to both interaction style and feedback is the issue of handling recognition errors. These should be corrected by simple command repetition, rather than by error-handling dialogues. This method of error correction is held to be non-intrusive to primary command and control operations. The thesis also addresses some of the problems of user error in ASR use, and provides a number of recommendations for its reduction.
Abstract:
Spectral peak resolution was investigated in normal-hearing (NH), hearing-impaired (HI), and cochlear implant (CI) listeners. The task involved discriminating between two rippled noise stimuli in which the frequency positions of the log-spaced peaks and valleys were interchanged. The ripple spacing was varied adaptively from 0.13 to 11.31 ripples/octave, and the minimum ripple spacing at which a reversal in peak and trough positions could be detected was determined as the spectral peak resolution threshold for each listener. Spectral peak resolution was best, on average, in NH listeners, poorest in CI listeners, and intermediate for HI listeners. There was a significant relationship between spectral peak resolution and both vowel and consonant recognition in quiet across the three listener groups. The results indicate that the degree of spectral peak resolution required for accurate vowel and consonant recognition in quiet backgrounds is around 4 ripples/octave, and that spectral peak resolution poorer than around 1-2 ripples/octave may result in highly degraded speech recognition. These results suggest that efforts to improve spectral peak resolution for HI and CI users may lead to improved speech recognition.
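The rippled-noise stimuli described here have spectral envelopes that are sinusoidal on a log-frequency axis, and the "reversed" stimulus inverts the ripple phase so that peaks and troughs swap. A minimal sketch of such an envelope (parameter names and the reference frequency are illustrative, not those of the study):

```python
import math

def ripple_envelope(freq_hz, ripples_per_octave, f0_hz=100.0, inverted=False):
    """Spectral amplitude envelope with log-spaced peaks and valleys.

    Peaks repeat every 1/ripples_per_octave octaves above f0_hz;
    inverted=True swaps peak and trough positions, which is exactly
    the contrast the discrimination task asks listeners to detect.
    """
    octaves = math.log2(freq_hz / f0_hz)
    phase = math.pi if inverted else 0.0
    return 1.0 + math.cos(2.0 * math.pi * ripples_per_octave * octaves + phase)

# At the reference frequency the standard envelope has a peak (2.0)
# while the inverted envelope has a trough (0.0):
print(ripple_envelope(100.0, 4.0))
print(ripple_envelope(100.0, 4.0, inverted=True))
```

Tightening the ripple spacing (more ripples/octave) packs the peaks closer together, which is what makes the reversal harder to detect near threshold.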
Abstract:
The purpose of this study was to explore the potential advantages, both theoretical and applied, of preserving low-frequency acoustic hearing in cochlear implant patients. Several hypotheses are presented that predict that residual low-frequency acoustic hearing along with electric stimulation for high frequencies will provide an advantage over traditional long-electrode cochlear implants for the recognition of speech in competing backgrounds. A simulation experiment in normal-hearing subjects demonstrated a clear advantage for preserving low-frequency residual acoustic hearing for speech recognition in a background of other talkers, but not in steady noise. Three subjects with an implanted "short-electrode" cochlear implant and preserved low-frequency acoustic hearing were also tested on speech recognition in the same competing backgrounds and compared to a larger group of traditional cochlear implant users. Each of the three short-electrode subjects performed better than any of the traditional long-electrode implant subjects for speech recognition in a background of other talkers, but not in steady noise, in general agreement with the simulation studies. When compared to a subgroup of traditional implant users matched according to speech recognition ability in quiet, the short-electrode patients showed a 9-dB advantage in the multitalker background. These experiments provide strong preliminary support for retaining residual low-frequency acoustic hearing in cochlear implant patients. The results are consistent with the idea that better perception of voice pitch, which can aid in separating voices in a background of other talkers, was responsible for this advantage.
Abstract:
The purpose of the present study was to examine the benefits of providing audible speech to listeners with sensorineural hearing loss when the speech is presented in a background noise. Previous studies have shown that when listeners have a severe hearing loss in the higher frequencies, providing audible speech (in a quiet background) at these higher frequencies usually results in no improvement in speech recognition. In the present experiments, speech was presented in a background of multitalker babble to listeners with various severities of hearing loss. The signal was low-pass filtered at numerous cutoff frequencies, and speech recognition was measured as additional high-frequency speech information was provided to the hearing-impaired listeners. It was found in all cases, regardless of hearing loss or frequency range, that providing audible speech resulted in an increase in recognition score. The change in recognition as the cutoff frequency was increased, along with the amount of audible speech information in each condition (articulation index), was used to calculate the "efficiency" of providing audible speech. Efficiencies were positive for all degrees of hearing loss. However, the gains in recognition were small, and the maximum score obtained by any listener was low, due to the noise background. An analysis of error patterns showed that, despite the limited speech audibility in a noise background, even severely impaired listeners used additional speech audibility in the high frequencies to improve their perception of the "easier" features of speech, including voicing.
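The "efficiency" measure described here relates the change in recognition score to the change in Articulation Index (AI) as the low-pass cutoff is raised. A minimal sketch of that ratio (the numbers in the example are illustrative, not the study's data):

```python
def audibility_efficiency(score_lo, score_hi, ai_lo, ai_hi):
    """Change in recognition score per unit change in Articulation Index.

    score_* are proportions correct at the lower and higher cutoff
    frequencies; ai_* are the corresponding AI values. A positive
    result means the added high-frequency audibility improved recognition.
    """
    return (score_hi - score_lo) / (ai_hi - ai_lo)

# Raising the cutoff adds 0.10 of AI and 4 percentage points of recognition:
print(audibility_efficiency(0.42, 0.46, 0.30, 0.40))
```

A uniformly positive efficiency across listeners is what supports the study's conclusion that added audibility helps even in noise, albeit modestly.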
Abstract:
Parkinson's disease (PD) is a neurodegenerative movement disorder primarily due to basal ganglia dysfunction. While much research has been conducted on Parkinsonian deficits in the traditional arena of musculoskeletal limb movement, research in other functional motor tasks is lacking. The present study examined articulation in PD with increasingly complex sequences of articulatory movement. Of interest was whether dysfunction would affect articulation in the same manner as in limb-movement impairment. In particular, since very similar (homogeneous) articulatory sequences (the tongue-twister effect) are more difficult for healthy individuals to produce than dissimilar (heterogeneous) gestures, while the reverse may apply to skeletal movements in PD, we asked which factor would dominate when PD patients articulated various grades of artificial tongue twisters: the influence of the disease, or a possible difference between the two motor systems. Execution was especially impaired when articulation involved a sequence of motor programs heterogeneous in terms of place of articulation. The results are suggestive of a hypokinetic tendency in complex sequential articulatory movement, as in limb movement. It appears that PD patients do show abnormalities in articulatory movement similar to those of the musculoskeletal system. The present study suggests that an underlying disease effect modulates movement impairment across different functional motor systems. (C) 1998 Academic Press.
Abstract:
Glutamate is the major excitatory neurotransmitter in the retina and is removed from the extracellular space by an energy-dependent process involving neuronal and glial cell transporters. The radial glial Muller cells express the glutamate transporter GLAST and preferentially accumulate glutamate. However, during an ischaemic episode, extracellular glutamate concentrations may rise to excitotoxic levels. Is this catastrophic rise in extracellular glutamate due to a failure of GLAST? Using immunocytochemistry, we monitored the transport of the glutamate transporter substrate, D-aspartate, in the retina under normal and ischaemic conditions. Two models of compromised retinal perfusion were compared: (1) Anaesthetised rats had their carotid arteries occluded for 7 days to produce a chronic reduction in retinal blood flow. Retinal function was assessed by electroretinography. D-aspartate was injected into the eye for 45 min. Following euthanasia, the retina was processed for D-aspartate, GLAST and glutamate immunocytochemistry. Although reduced retinal perfusion suppresses the electroretinogram b-wave, neither retinal histology, GLAST expression, nor the ability of Muller cells to take up D-aspartate is affected. As this insult does not appear to cause excitotoxic neuronal damage, these data suggest that GLAST function and glutamate clearance are maintained during periods of reduced retinal perfusion. (2) Occlusion of the central retinal artery for 60 min abolishes retinal perfusion, inducing histological damage and electroretinogram suppression. Although GLAST expression appears to be normal, its ability to transport D-aspartate into Muller cells is greatly reduced. Interestingly, D-aspartate is transported into neuronal cells, i.e. photoreceptors, bipolar and ganglion cells. This suggests that while GLAST is vitally important for the clearance of excess extracellular glutamate, its capability to sustain inward transport is particularly susceptible to an acute ischaemic attack. Manipulation of GLAST function could alleviate the degeneration and blindness that result from ischaemic retinal disease. (C) 2001 Elsevier Science Ltd. All rights reserved.
Abstract:
The first and second authors would like to thank Fundação para a Ciência e a Tecnologia (FCT) for the support of the PhD grants with references SFRH/BD/28817/2006 and SFRH/PROTEC/49517/2009, respectively. This work was partially done in the scope of the project "Methodologies to Analyze Organs from Complex Medical Images - Applications to the Female Pelvic Cavity", with reference PTDC/EEA-CRO/103320/2008, financially supported by FCT.
Abstract:
In this paper, a linguistically rule-based grapheme-to-phone (G2P) transcription algorithm for European Portuguese is described. A complete set of phonological and phonetic transcription rules for the European Portuguese standard variety is presented. The algorithm was implemented and tested using online newspaper articles. The experimental results yielded an accuracy rate of 98.80%. Future developments to increase this value are foreseen. Our purpose with this work is to develop a module/tool that can improve synthetic speech naturalness in European Portuguese. Other applications of this system, such as language teaching/learning, can also be expected. These results, together with our prospects for future improvements, demonstrate the critical importance of linguistic knowledge in the development of Text-to-Speech (TTS) systems.
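A rule-based G2P module of this kind is typically an ordered list of rewrite rules applied longest-grapheme-first. A minimal sketch with a few hypothetical SAMPA-style rules (the paper's European Portuguese rule set is not reproduced here):

```python
# Ordered grapheme-to-phone rules: multi-letter graphemes listed before
# the single letters they contain. These few rules are hypothetical
# illustrations, not the paper's rule set.
RULES = [
    ("nh", "J"),   # palatal nasal
    ("ch", "S"),   # voiceless postalveolar fricative
    ("lh", "L"),   # palatal lateral
    ("ss", "s"),
    ("a", "a"), ("e", "e"), ("i", "i"), ("o", "o"), ("u", "u"),
    ("b", "b"), ("c", "k"), ("d", "d"), ("f", "f"), ("g", "g"),
    ("l", "l"), ("m", "m"), ("n", "n"), ("p", "p"), ("r", "r"),
    ("s", "s"), ("t", "t"), ("v", "v"), ("z", "z"),
]

def g2p(word):
    """Transcribe a word by applying the first matching rule at each position."""
    phones = []
    i = 0
    while i < len(word):
        for grapheme, phone in RULES:  # ordered, so "ch" wins over "c"
            if word.startswith(grapheme, i):
                phones.append(phone)
                i += len(grapheme)
                break
        else:
            i += 1  # skip characters not covered by the rule set
    return phones

print(g2p("chave"))
```

A real system would add context conditions to the rules (neighbouring graphemes, stress, word position), which is where the linguistic knowledge the paper emphasises comes in.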