924 results for acoustic speech recognition system
Abstract:
Detecting user affect automatically during real-time conversation is the main challenge towards our greater aim of infusing social intelligence into a natural-language, mixed-initiative High-Fidelity (Hi-Fi) audio control spoken dialog agent. In recent years, studies on affect detection from voice have moved on to realistic, non-acted data, in which emotions are subtler. Subtler emotions are harder to perceive, as tasks such as labelling and machine prediction demonstrate. This paper addresses part of this challenge by considering the role of user satisfaction ratings and of conversational/dialog features in discriminating contentment and frustration, two emotions known to be prevalent in spoken human-computer interaction. Given the laboratory constraints, however, users might be positively biased when rating the system, making the reliability of the satisfaction data questionable. Machine learning experiments were conducted on two datasets, one labelled by users and one by annotators, which were then compared in order to assess their reliability. Our results indicated that standard classifiers were significantly more successful in discriminating the abovementioned emotions and their intensities (reflected by user satisfaction ratings) from annotator data than from user data. These results corroborate two points: first, satisfaction data can be used directly as an alternative target variable to model affect, and can be predicted exclusively from dialog features; second, this holds only when predicting these emotions from annotators' data, suggesting that user bias does exist in a laboratory-led evaluation.
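A minimal sketch of the dataset comparison described above, using synthetic placeholder data: the abstract does not name the "standard classifiers" used, so an RBF-kernel SVM from scikit-learn stands in, and the feature and label arrays below are invented for illustration only.

```python
# Sketch: train the same classifier on user-rated vs. annotator-rated
# emotion labels over identical dialog features and compare accuracy.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))              # dialog features per interaction (placeholder)
y_user = rng.integers(0, 2, size=200)       # contentment/frustration from user ratings
y_annotator = rng.integers(0, 2, size=200)  # same emotions labelled by annotators

clf = SVC(kernel="rbf")
for name, y in [("user", y_user), ("annotator", y_annotator)]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name} labels: mean CV accuracy = {acc:.3f}")
```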
Abstract:
This paper describes the language identification (LID) system developed by the Patrol team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance the state of the art in detection capabilities on audio from highly degraded communication channels. We show that techniques originally developed for LID on telephone speech (e.g., for the NIST language recognition evaluations) remain effective on the noisy RATS data, provided that careful consideration is applied when designing the training and development sets. In addition, we show significant improvements from the use of Wiener filtering, neural-network-based and language-dependent i-vector modeling, and fusion.
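As a rough illustration of the i-vector scoring and score-level fusion mentioned above: the abstract does not detail the backend, so cosine scoring against per-language mean i-vectors and equal-weight fusion are assumptions here, and all data are synthetic.

```python
# Sketch: cosine-score a test i-vector against per-language means,
# then fuse two subsystems' scores by simple averaging.
import numpy as np

def cosine_scores(ivec, lang_means):
    """Score one i-vector against each language's mean i-vector."""
    ivec = ivec / np.linalg.norm(ivec)
    means = lang_means / np.linalg.norm(lang_means, axis=1, keepdims=True)
    return means @ ivec

rng = np.random.default_rng(0)
lang_means = rng.normal(size=(5, 400))   # 5 languages, 400-dim i-vector space
test_ivec = rng.normal(size=400)

scores_a = cosine_scores(test_ivec, lang_means)                                    # subsystem A
scores_b = cosine_scores(test_ivec + rng.normal(scale=0.1, size=400), lang_means)  # subsystem B
fused = 0.5 * scores_a + 0.5 * scores_b                                            # equal-weight fusion
print("predicted language index:", int(np.argmax(fused)))
```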
Abstract:
This paper proposes a methodology for developing a speech-to-sign-language translation system following a user-centered strategy. The methodology consists of four main steps: analysis of technical and user requirements, data collection, technology adaptation to the new domain, and, finally, evaluation of the system. The two most demanding tasks are sign generation and translation-rule generation. Many other aspects can be updated automatically from a parallel corpus that includes sentences (in Spanish and LSE: Lengua de Signos Española) related to the application domain. In this paper, we explain how to apply this methodology in order to develop two translation systems in two specific domains: bus transport information and hotel reception.
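A minimal sketch of the rule-based core of such a pipeline: recognized Spanish text is mapped to LSE sign glosses through a rule table that, in the described system, would be derived from the domain's parallel corpus. The words and glosses below are invented for illustration, and real rule generation is far richer than this word-level lookup.

```python
# Sketch: word-to-gloss translation rules applied to recognized speech.
lexical_rules = {        # Spanish word -> LSE gloss (illustrative only)
    "autobus": "AUTOBUS",
    "llegar": "LLEGAR",
    "hora": "HORA",
    "habitacion": "HABITACION",
    "reserva": "RESERVA",
}

def translate_to_lse(recognized_text):
    """Map each word to its sign gloss; flag unknown words for review."""
    glosses = []
    for word in recognized_text.lower().split():
        glosses.append(lexical_rules.get(word, f"<UNK:{word}>"))
    return " ".join(glosses)

print(translate_to_lse("autobus llegar hora"))  # -> "AUTOBUS LLEGAR HORA"
```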
Abstract:
When designing human-machine interfaces it is important to consider not only the bare-bones functionality but also the ease of use and accessibility they provide. For voice-based interfaces, it has been shown that imbuing synthetic voices with expressiveness significantly increases their perceived naturalness, which is very helpful when building user-friendly interfaces. This paper proposes an adaptation-based expressiveness transplantation system capable of copying the emotions of a source speaker into any desired target speaker with just a few minutes of read speech and without requiring the recording of additional expressive data. The system was evaluated through a perceptual test with 3 speakers, achieving average emotion recognition rates of up to 52% relative to the recognition rates for natural voice, while at the same time keeping good scores in similarity and naturalness.
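A worked example of the relative metric quoted above, in which transplanted-voice emotion recognition is expressed relative to the rate obtained on natural expressive recordings. The absolute rates below are invented; only the 52% relative figure comes from the abstract.

```python
# Sketch: "relative recognition rate" = transplanted rate / natural rate.
natural_rate = 0.80        # emotion recognition rate on natural voice (assumed)
transplanted_rate = 0.416  # rate on the adapted/transplanted voice (assumed)
relative = transplanted_rate / natural_rate
print(f"relative recognition rate: {relative:.0%}")  # -> 52%
```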
Abstract:
The specificity of the yeast proprotein-processing Kex2 protease was examined in vivo by using a sensitive, quantitative assay. A truncated prepro-α-factor gene encoding an α-factor precursor with a single α-factor repeat was constructed with restriction sites for cassette mutagenesis flanking the single Kex2 cleavage site (-SLDKR↓EAEA-). All 19 substitutions for the Lys (P2) residue in the cleavage site were made. The wild-type and mutant precursors were expressed in a yeast strain lacking the chromosomal genes encoding Kex2 and prepro-α-factor. Cleavage of the 20 sites by Kex2, expressed at the wild-type level, was assessed by using a quantitative mating assay with an effective range greater than six orders of magnitude. All substitutions for Lys at P2 decreased mating, from 2-fold for Arg to >10^6-fold for Trp. Eviction of the Kex2-encoding plasmid indicated that cleavage of mutant sites by other cellular proteases was not a complicating factor. Mating efficiencies of strains expressing the mutant precursors correlated well with the specificity (kcat/KM) of purified Kex2 for comparable model peptide substrates, validating the in vivo approach as a quantitative method. The results support the conclusion that KM, which is heavily influenced by the nature of the P2 residue, is a major determinant of cleavage efficiency in vivo. P2 preference followed the rank order: Lys > Arg > Thr > Pro > Glu > Ile > Ser > Ala > Asn > Val > Cys > Asp > Gln > Gly > His > Met > Leu > Tyr > Phe > Trp.
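For context on why kcat/KM is the relevant in vivo specificity measure (standard Michaelis-Menten kinetics, not specific to this paper): at substrate concentrations well below KM, the rate law reduces to a form in which kcat/KM alone sets the relative cleavage rates of competing precursors.

```latex
% Michaelis-Menten rate law and its low-substrate limit:
\[
  v \;=\; \frac{k_{\mathrm{cat}}\,[\mathrm{E}]_0\,[\mathrm{S}]}{K_M + [\mathrm{S}]}
  \;\approx\; \frac{k_{\mathrm{cat}}}{K_M}\,[\mathrm{E}]_0\,[\mathrm{S}]
  \qquad \text{when } [\mathrm{S}] \ll K_M .
\]
```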
Abstract:
Auditory responses in the caudomedial neostriatum (NCM) of the zebra finch (Taeniopygia guttata) forebrain habituate to repeated presentations of a novel conspecific song. This habituation is long lasting and specific to individual stimuli. Here we test the acoustic and ethological basis of this stimulus-specific habituation by recording extracellular multiunit activity in the NCM of awake male and female zebra finches presented with a variety of conspecific and heterospecific vocalizations, white noise, and tones. Initial responses to conspecific song and calls and to human speech were higher than responses to the other stimuli. Immediate habituation rates were high for all novel stimuli except tones, which habituated at a lower rate. Habituation to conspecific calls and songs outlasted habituation to other stimuli. The extent of immediate habituation induced by a particular novel song was not diminished when other conspecific songs were presented in alternation. In addition, the persistence of habituation was not diminished by exposure to other songs before testing, nor was it influenced by gender or laterality. Our results suggest that the NCM is specialized for remembering the calls and songs of many individual conspecifics.
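A minimal sketch of one common way to quantify the habituation rates discussed above: fit a line to the normalized multiunit response amplitude across repeated presentations of a stimulus and take the slope. The data are synthetic, and the study's exact normalization and fitting procedure are not reproduced here.

```python
# Sketch: habituation rate as the fitted slope of normalized response
# amplitude versus presentation number.
import numpy as np

rng = np.random.default_rng(0)
trials = np.arange(1, 26)
# Simulated response: exponential-like decline plus noise, normalized to trial 1.
response = np.exp(-0.05 * (trials - 1)) + rng.normal(scale=0.03, size=trials.size)
response /= response[0]

slope, intercept = np.polyfit(trials, response, 1)
print(f"habituation rate ~ {slope * 100:.2f}% of initial response per trial")
```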
Abstract:
Advances in digital speech processing are now supporting the application and deployment of a variety of speech technologies for human/machine communication. In fact, new businesses are rapidly forming around these technologies. But these capabilities are of little use unless society can afford them. Happily, explosive advances in microelectronics over the past two decades have assured affordable access to this sophistication as well as to the underlying computing technology. The research challenges in speech processing remain in the traditionally identified areas of recognition, synthesis, and coding. These three areas have typically been addressed individually, often with significant isolation among the efforts. But they are all facets of the same fundamental issue: how to represent and quantify the information in the speech signal. This implies deeper understanding of the physics of speech production, the constraints that the conventions of language impose, and the mechanism for information processing in the auditory system. In ongoing research, therefore, we seek more accurate models of speech generation, better computational formulations of language, and realistic perceptual guides for speech processing, along with ways to coalesce the fundamental issues of recognition, synthesis, and coding. A successful solution will yield the long-sought dictation machine, high-quality synthesis from text, and the ultimate in low bit-rate transmission of speech. It will also open the door to language-translating telephony, where the synthetic foreign translation can be in the voice of the originating talker.
Abstract:
Deep brain stimulation (DBS) provides significant therapeutic benefit for movement disorders such as Parkinson's disease (PD). Current DBS devices lack real-time feedback (and are thus open loop), and stimulation parameters are adjusted during scheduled visits with a clinician. A closed-loop DBS system may reduce power consumption and side effects by adjusting stimulation parameters based on the patient's behavior; behavior detection is therefore a major step in designing such systems. Various physiological signals can be used to recognize behaviors. The Subthalamic Nucleus (STN) Local Field Potential (LFP) is a strong candidate signal for neural feedback, because it can be recorded from the stimulation lead and does not require additional sensors. This thesis proposes novel detection and classification techniques for behavior recognition based on deep brain LFP; behavior detection from such signals is a vital step in developing the next generation of closed-loop DBS devices. LFP recordings from 13 subjects are used in this study to design and evaluate our method. Recordings were performed during surgery, and the subjects were asked to perform various behavioral tasks. Several techniques are used to understand how the behaviors modulate the STN. One method studies the time-frequency patterns in the STN LFP during the tasks. Another measures the temporal inter-hemispheric connectivity of the STN as well as the connectivity between the STN and the Pre-frontal Cortex (PFC). Experimental results demonstrate that different behaviors create different modulation patterns in the STN and its connectivity. We use these patterns as features to classify behaviors. A method for single-trial recognition of the patient's current task is proposed. This method uses wavelet coefficients as features and a support vector machine (SVM) as the classifier for recognizing a selection of behaviors: speech, motor, and random. The proposed method is 82.4% accurate for binary classification and 73.2% accurate for classifying three tasks. As the next step, a practical behavior detection method that detects behaviors asynchronously is proposed. This method does not use any a priori knowledge of behavior onsets and is capable of asynchronously detecting the finger movements of PD patients. Our study indicates that there is a motor-modulated inter-hemispheric connectivity between LFP signals recorded bilaterally from the STN. We utilize a non-linear regression method to measure this inter-hemispheric connectivity and to detect finger movements. Our experimental results using STN LFP recorded from eight patients with PD demonstrate that this is a promising approach for behavior detection and for developing novel closed-loop DBS systems.
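A minimal sketch of the wavelet-feature + SVM pipeline described above, on synthetic stand-in trials: per-trial discrete wavelet coefficients are summarized into features and fed to an SVM. The wavelet family, decomposition level, and trial counts are placeholders, not the thesis's actual settings.

```python
# Sketch: classify behavioral tasks from LFP trials using DWT band
# energies as features and an SVM classifier.
import numpy as np
import pywt
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_trials, n_samples = 120, 1024
X_raw = rng.normal(size=(n_trials, n_samples))   # stand-in for STN LFP trials
y = rng.integers(0, 3, size=n_trials)            # speech / motor / random labels

def wavelet_features(trial, wavelet="db4", level=4):
    """Summarize each wavelet decomposition band by its log energy."""
    coeffs = pywt.wavedec(trial, wavelet, level=level)
    return np.array([np.log(np.sum(c ** 2) + 1e-12) for c in coeffs])

X = np.array([wavelet_features(t) for t in X_raw])
acc = cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()
print(f"3-class CV accuracy on synthetic data: {acc:.3f}")
```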