961 results for Perceptual Speech Evaluation
Abstract:
Connected digit speech recognition is important in many applications, such as automated banking systems, catalogue dialing, and automatic data entry. This paper presents an optimized speaker-independent connected digit recognizer for the Malayalam language. The system employs Perceptual Linear Predictive (PLP) cepstral coefficients for speech parameterization and continuous-density Hidden Markov Models (HMMs) for recognition, with the Viterbi algorithm used for decoding. The training database contains utterances from 21 speakers aged 20 to 40 years, recorded in a normal office environment, with each speaker asked to read 20 sets of connected digits. The system achieved an accuracy of 99.5% on unseen data.
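The decoding step mentioned here can be illustrated with a minimal Viterbi sketch. Note the assumptions: the paper uses continuous-density HMMs over PLP features, while this toy example uses a discrete-observation HMM with invented transition and emission probabilities.

```python
import numpy as np

def viterbi(obs, log_A, log_B, log_pi):
    """Most likely state path for a discrete-observation HMM.

    obs    : sequence of observation indices
    log_A  : (N, N) log transition matrix, log_A[i, j] = log P(j | i)
    log_B  : (N, M) log emission matrix,  log_B[i, o] = log P(o | i)
    log_pi : (N,)   log initial state distribution
    """
    N, T = log_A.shape[0], len(obs)
    delta = np.zeros((T, N))           # best log-probability ending in each state
    psi = np.zeros((T, N), dtype=int)  # back-pointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # (from-state, to-state)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    # Backtrack from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

# Toy 2-state, 3-symbol model (all numbers are illustrative only).
log_A = np.log([[0.7, 0.3], [0.4, 0.6]])
log_B = np.log([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
log_pi = np.log([0.6, 0.4])
print(viterbi([0, 1, 2, 2], log_A, log_B, log_pi))
```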
Abstract:
This paper describes findings from an intonation and intensity study of emotive speech conducted with minimal use of signal processing algorithms. The study covered six basic emotions and a neutral state, elicited from 1660 English utterances recorded by six Indian women. The correctness of the emotional content was verified through perceptual listening tests. Marked similarity was noted among the pitch contours of like-worded, positive-valence emotions, though no such similarity was observed among the four negative-valence emotional expressions. The intensity patterns were also studied. The results were validated using arbitrary television recordings for four emotions. The findings are useful to technical researchers, social psychologists, and lay readers interested in the dynamics of vocal expression of emotion.
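For readers who want to reproduce this style of contour analysis, here is a minimal sketch that extracts a pitch contour and an RMS intensity contour with the librosa library; the file name, sample rate, and pitch range are assumptions, not parameters from the study.

```python
import librosa
import numpy as np

# Load an utterance (file name is a placeholder).
y, sr = librosa.load("utterance.wav", sr=16000)

# Pitch contour via probabilistic YIN; range chosen for adult female voices.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C3"), fmax=librosa.note_to_hz("C6"), sr=sr
)

# Intensity contour as frame-wise RMS energy, converted to dB.
rms = librosa.feature.rms(y=y)[0]
intensity_db = librosa.amplitude_to_db(rms, ref=np.max)

print("voiced frames:", np.sum(voiced_flag), "median F0:", np.nanmedian(f0))
```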
Abstract:
This paper describes an auditory interface using directional sound as a possible support for pilots during approach in an instrument landing scenario. Several ways of producing directional sounds are illustrated. One approach, using speaker pairs and controlling the power distribution between speakers, is evaluated experimentally. Results show that power alone is insufficient for positioning single isolated sound events, although discrimination in the horizontal plane is better than in the vertical plane. Additional sound parameters to compensate for this are proposed.
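A standard way to control power distribution between a speaker pair is equal-power (constant-power) amplitude panning. The sketch below illustrates that general technique; it is not the paper's implementation, and the tone and pan position are arbitrary.

```python
import numpy as np

def equal_power_pan(signal, pan):
    """Pan a mono signal between a speaker pair.

    pan = -1.0 is fully left, +1.0 fully right; gains follow a
    cosine/sine law so total power stays constant across positions.
    """
    theta = (pan + 1.0) * np.pi / 4.0   # map [-1, 1] -> [0, pi/2]
    left = np.cos(theta) * signal
    right = np.sin(theta) * signal
    return np.stack([left, right], axis=-1)

# 440 Hz tone panned halfway to the right (values are illustrative).
sr = 44100
t = np.linspace(0, 1.0, sr, endpoint=False)
stereo = equal_power_pan(0.5 * np.sin(2 * np.pi * 440 * t), pan=0.5)
print(stereo.shape)  # (44100, 2)
```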
Abstract:
This thesis presents a perceptual system for a humanoid robot that integrates abilities such as object localization and recognition with the deeper developmental machinery required to forge those competences out of raw physical experiences. It shows that a robotic platform can build up and maintain a system for object localization, segmentation, and recognition, starting from very little. What the robot starts with is a direct solution to achieving figure/ground separation: it simply 'pokes around' in a region of visual ambiguity and watches what happens. If the arm passes through an area, that area is recognized as free space. If the arm collides with an object, causing it to move, the robot can use that motion to segment the object from the background. Once the robot can acquire reliable segmented views of objects, it learns from them, and from then on recognizes and segments those objects without further contact. Both low-level and high-level visual features can also be learned in this way, and examples are presented for both: orientation detection and affordance recognition, respectively. The motivation for this work is simple. Training on large corpora of annotated real-world data has proven crucial for creating robust solutions to perceptual problems such as speech recognition and face detection. But the powerful tools used during training of such systems are typically stripped away at deployment. Ideally they should remain, particularly for unstable tasks such as object detection, where the set of objects needed in a task tomorrow might be different from the set of objects needed today. The key limiting factor is access to training data, but as this thesis shows, that need not be a problem on a robotic platform that can actively probe its environment, and carry out experiments to resolve ambiguity. This work is an instance of a general approach to learning a new perceptual judgment: find special situations in which the perceptual judgment is easy and study these situations to find correlated features that can be observed more generally.
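As a rough, hypothetical stand-in for the motion-based segmentation step described above, the following OpenCV sketch segments whatever moved between two frames by simple frame differencing; the thesis's actual algorithm is more sophisticated.

```python
import cv2

def motion_mask(frame_before, frame_after, thresh=25):
    """Segment the region that moved between two frames (e.g. a poked object).

    Returns a binary mask: 255 where pixel intensity changed, 0 elsewhere.
    """
    g0 = cv2.cvtColor(frame_before, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame_after, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(g0, g1)
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    # Morphological opening removes isolated noise pixels.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
```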
Abstract:
This dissertation examines the auditory-perceptual theory of speech perception and the concept and validity of perceptual target zones for vowels.
Abstract:
This study evaluates the progress of children with cochlear implants on the Speech Perception Instructional Curriculum and Evaluation (SPICE) auditory training protocol.
Abstract:
The equivalency of 34 TIMIT sentence lists was evaluated using adult cochlear implant recipients to determine if they should be recommended for future clinical or research use. Because these sentences incorporate gender, dialect and speaking rate variations, they have the potential to better represent speech recognition abilities in real-world communication situations.
Abstract:
This paper describes the results of an investigation which examined the efficacy of a feedback equalization algorithm incorporated into the Central Institute for the Deaf Wearable Digital Hearing Aid. The study examined whether the feedback equalization would allow for greater usable gains when subjects listened to soft speech signals, and if so, whether or not this would improve speech intelligibility.
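Feedback equalization in digital hearing aids is commonly built around an adaptive filter that estimates the acoustic feedback path and subtracts its output from the microphone signal. The normalized-LMS sketch below illustrates that general idea under assumed parameters; it is not the CID device's algorithm.

```python
import numpy as np

def nlms_feedback_canceller(mic, loudspeaker, taps=32, mu=0.1, eps=1e-8):
    """Cancel acoustic feedback with a normalized-LMS adaptive filter.

    mic         : microphone samples (desired signal plus feedback)
    loudspeaker : loudspeaker samples feeding the feedback path
    Returns the error signal, i.e. the microphone input with the
    estimated feedback component removed.
    """
    w = np.zeros(taps)                     # feedback-path estimate
    out = np.zeros(len(mic))
    for n in range(taps, len(mic)):
        x = loudspeaker[n - taps:n][::-1]  # most recent samples first
        e = mic[n] - w @ x                 # subtract predicted feedback
        w += mu * e * x / (x @ x + eps)    # NLMS weight update
        out[n] = e
    return out
```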
Abstract:
This study evaluated the speech perception of hearing-impaired adults with varying degrees of deafness when using a video teleconferencing system over an Integrated Services Digital Network (ISDN).
Abstract:
This paper discusses a study to validate the metric developed in the Geers and Moog Cochlear Implant Study at CID for measuring the speech production of hearing-impaired children.
Abstract:
Perceptual constancy effects are observed when differing amounts of reverberation are applied to a context sentence and a test‐word embedded in it. Adding reverberation to members of a “sir”‐“stir” test‐word continuum causes temporal‐envelope distortion, which has the effect of eliciting more sir responses from listeners. If the same amount of reverberation is also applied to the context sentence, the number of sir responses decreases again, indicating an “extrinsic” compensation for the effects of reverberation. Such a mechanism would effect perceptual constancy of phonetic perception when temporal envelopes vary in reverberation. This experiment asks whether such effects precede or follow grouping. Eight auditory‐filter shaped noise‐bands were modulated with the temporal envelopes that arise when speech is played through these filters. The resulting “gestalt” percept is the appropriate speech rather than the sound of noise‐bands, presumably due to across‐channel “grouping.” These sounds were played to listeners in “matched” conditions, where reverberation was present in the same bands in both context and test‐word, and in “mismatched” conditions, where the bands in which reverberation was added differed between context and test‐word. Constancy effects were obtained in matched conditions, but not in mismatched conditions, indicating that this type of constancy in hearing precedes across‐channel grouping.
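The stimuli described here are essentially noise-vocoded speech. The following sketch illustrates the standard technique with SciPy; the band edges, filter order, and envelope extraction method are assumptions rather than the experiment's exact auditory-filter parameters.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(speech, sr, band_edges):
    """Noise-vocode speech: modulate band-limited noise with the
    temporal envelope of speech in each analysis band."""
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(len(speech))
    out = np.zeros(len(speech), dtype=float)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
        band = sosfiltfilt(sos, speech)
        env = np.abs(hilbert(band))        # temporal envelope of this band
        carrier = sosfiltfilt(sos, noise)  # matching noise band
        out += env * carrier
    return out / np.max(np.abs(out))

# Eight bands spaced between 100 Hz and 5 kHz (spacing is illustrative).
edges = np.geomspace(100, 5000, num=9)
```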
Abstract:
Background: Word deafness is a rare condition in which pathologically degraded speech perception results in impaired repetition and comprehension but otherwise intact linguistic skills. Although impaired linguistic systems in aphasias resulting from damage to the neural language system (here termed central impairments) have been consistently shown to be amenable to external influences such as linguistic or contextual information (e.g., cueing effects in naming), it is not known whether similar influences can be shown for aphasia arising from damage to a perceptual system (here termed peripheral impairments). Aims: This study aimed to investigate the extent to which pathologically degraded speech perception could be facilitated or disrupted by providing visual as well as auditory information. Methods and Procedures: In three word repetition tasks, the participant with word deafness (AB) repeated words under different conditions: words were repeated in the context of a pictorial or written target, a distractor (semantic, unrelated, rhyme, or phonological neighbour), or a blank page (nothing). Accuracy and error types were analysed. Results: AB was impaired at repetition in the blank condition, confirming her degraded speech perception. Repetition was significantly facilitated when accompanied by a picture or written example of the word and significantly impaired by the presence of a written rhyme. Errors in the blank condition were primarily formal, whereas errors in the rhyme condition were primarily miscues (saying the distractor word rather than the target). Conclusions: Cross-modal input can both facilitate and further disrupt repetition in word deafness. The cognitive mechanisms behind these findings are discussed. Both top-down influence from the lexical layer on perceptual processes and intra-lexical competition within the lexical layer may play a role.
Abstract:
Research evaluating perceptual responses to music has identified many structural features as correlates that might be incorporated in computer music systems for affectively charged algorithmic composition and/or expressive music performance. To investigate the possible integration of isolated musical features into such a system, a discrete feature known to correlate with emotional responses, rhythmic density, was selected from a literature review and incorporated into a prototype system. This system produces variation in rhythmic density via a transformative process. A stimulus set created using this system was then subjected to perceptual evaluation. Pairwise comparisons were used to scale differences between 48 stimuli, and listener responses were analysed with multidimensional scaling (MDS). The two-dimensional solution was rotated to place the stimuli with the largest range of variation across the horizontal plane. Stimuli with variation in rhythmic density were placed further from the source material than stimuli generated by random permutation. This, combined with the striking similarity between the MDS configuration and the two-dimensional emotional model used by some affective algorithmic composition systems, suggests that isolated musical feature manipulation can be used to parametrically control affectively charged automated composition in a larger system.
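The pairwise-comparison analysis can be reproduced with off-the-shelf multidimensional scaling. The sketch below runs metric MDS on a precomputed dissimilarity matrix with scikit-learn; the random matrix is a placeholder for real listener-derived dissimilarities.

```python
import numpy as np
from sklearn.manifold import MDS

# Placeholder dissimilarity matrix for 48 stimuli; in the study this
# would be derived from listeners' pairwise-comparison responses.
rng = np.random.default_rng(0)
d = rng.random((48, 48))
dissim = (d + d.T) / 2.0      # symmetrize
np.fill_diagonal(dissim, 0.0)

# 2-D MDS solution, as used to place stimuli on a plane.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)
print(coords.shape)  # (48, 2)
```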
Abstract:
Resistive respiratory loading is an established stimulus for the induction of experimental dyspnoea. In comparison to unloaded breathing, resistive loaded breathing alters end-tidal CO2 (PETCO2), which has independent physiological effects (e.g. upon cerebral blood flow). We investigated the subjective effects of resistive loaded breathing with stabilized PETCO2 (isocapnia), maintained by manual control of inspired gases, at varying baseline levels of mild hypercapnia (increased PETCO2). Furthermore, to investigate whether perceptual habituation to dyspnoea stimuli occurs, the study was repeated over four experimental sessions. Isocapnic hypercapnia did not affect dyspnoea unpleasantness during resistive loading. A post hoc analysis revealed a small increase in respiratory unpleasantness during unloaded breathing at +0.6 kPa, the level that reliably induced isocapnia. We did not observe perceptual habituation over the four sessions. We conclude that isocapnic respiratory loading allows stable induction of respiratory unpleasantness, making it a good stimulus for multi-session studies of dyspnoea.
Abstract:
In addition to 9 vowel and 18 consonant phonemes, Swedish has three prosodic phonemic contrasts: word stress, quantity, and tonal word accent. There are also examples of distinctive phrase or sentence stress, where a verb can be followed by either an unstressed preposition or a stressed particle. This study focuses on the word level, specifically on word stress and tonal word accent in disyllabic words. When designing curricula for second language learners, teachers benefit from knowing which phonetic or phonological features are more or less crucial for the intelligibility of speech, and there is some structural and anecdotal evidence that word stress plays a more important role for the intelligibility of Swedish than the tonal word accent does. Swedish word stress concerns prominence contrasts between syllables, signaled mainly by syllable duration, while the tonal word accent is signaled mainly by pitch contour. The word stress contrast, as in armen [´arːmən] ‘the arm’ - armén [ar´meːn] ‘the army’, the first word trochaic and the second iambic, is present in all regional varieties of Swedish and realized with roughly the same acoustic cues, while the tonal word accent, as in anden [´anːdən] ‘the duck’ - anden [`anːdən] ‘the spirit’, is absent in some dialects (as well as in singing) and is signaled with a variety of tonal patterns depending on region. The present study compares the perceptual weight of these two contrasts. Two lexical decision tests were carried out in which a total of 34 native Swedish listeners had to decide whether a stimulus was a real word or a non-word. Real words of all the mentioned categories were mixed with nonsense words and with words mispronounced with the opposite stress pattern or the opposite tonal word accent category. The results show that distorted word stress caused more non-word judgments and more missed words than distorted word accent did. We conclude that the intelligibility of Swedish is more sensitive to a distorted word stress pattern than to a distorted tonal word accent pattern. This is consistent with the structural arguments presented above, and with our own intuition.