931 results for Speech-processing technologies


Relevance: 90.00%

Abstract:

Supported by Contract AT(11-1)-1018 and Contract AT(11-1)-2118 with U.S. Atomic Energy Commission.

Relevance: 90.00%

Abstract:

The purpose of this study was to compare the robustness of the event-related potential (ERP) response called the mismatch negativity (MMN) when elicited by simple tone stimuli (differing in frequency, duration, or intensity) and by speech stimuli (CV nonword contrast /de:/ vs. /ge:/ and CV word contrast /deI/ vs. /geI/). The study was conducted with 30 young adult subjects (Groups A and B; n = 15 each). The speech stimuli were presented to Group A at a stimulus onset asynchrony (SOA) of 610 msec and to Group B at an SOA of 900 msec. The tone stimuli were presented to both groups at an SOA of 610 msec. MMN responses were elicited by the simple tone stimuli (66.7%-96.7% of subjects with MMN "present," i.e. significantly different from zero, p < 0.05) but not by the speech stimuli (10% of subjects with MMN present for nonwords, 10% for words). The length of the SOA (610 msec or 900 msec) had no effect on the ability to obtain consistent MMN responses to the speech stimuli. The results indicated a lack of robust MMN elicited by speech stimuli with fine acoustic contrasts under carefully controlled methodological conditions. The implications of these results are discussed in relation to conflicting reports of speech-elicited MMNs in the literature, and to the importance of appropriate methodological design in MMN studies investigating speech processing in normal and pathological populations.
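The abstract counts an MMN as "present" when it is significantly different from zero (p < 0.05), but it does not say which test was used. The sketch below is a minimal illustration that assumes the conventional deviant-minus-standard difference wave, a mean-amplitude measure over a latency window, and a sign-flip permutation test standing in for the authors' unstated statistic. The function names and the per-sweep amplitude input are hypothetical.

```python
import random
import statistics


def difference_wave(deviant, standard):
    """Point-by-point deviant-minus-standard waveform (the MMN difference wave)."""
    if len(deviant) != len(standard):
        raise ValueError("waveforms must have the same number of samples")
    return [d - s for d, s in zip(deviant, standard)]


def window_mean(wave, start, stop):
    """Mean amplitude over sample indices [start, stop), e.g. an MMN latency window."""
    return statistics.fmean(wave[start:stop])


def sign_flip_p_value(values, n_permutations=10_000, seed=0):
    """Two-sided sign-flip permutation test of H0: mean(values) == 0.

    Under H0 each amplitude is assumed symmetric about zero, so randomly
    flipping signs generates the null distribution of the mean.
    (Assumed test; the abstract does not name the one actually used.)
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(values))
    extreme = 0
    for _ in range(n_permutations):
        flipped = [v if rng.random() < 0.5 else -v for v in values]
        if abs(statistics.fmean(flipped)) >= observed:
            extreme += 1
    # +1 correction keeps the Monte Carlo p-value strictly positive
    return (extreme + 1) / (n_permutations + 1)


def mmn_present(amplitudes, alpha=0.05):
    """Hypothetical 'MMN present' rule: window amplitudes differ from zero at alpha."""
    return sign_flip_p_value(amplitudes) < alpha
```

Here `amplitudes` would be one subject's window-mean difference-wave amplitudes, for example one per sub-average of sweeps. A permutation test was chosen only because it needs nothing beyond the standard library.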

Relevance: 90.00%

Abstract:

Adults show great variation in their auditory skills, such as the ability to discriminate between foreign speech sounds. Previous research has demonstrated that structural features of the auditory cortex can predict auditory abilities; here we are interested in the maturation of 2-Hz frequency-modulation (FM) detection, a task thought to tap into mechanisms underlying language abilities. We hypothesized that an individual's FM threshold would correlate with gray-matter density in left Heschl's gyrus, and that this function-structure relationship would change through adolescence. To test this hypothesis, we collected anatomical magnetic resonance imaging data from participants who were tested and scanned at three time points: at 10, 11.5 and 13 years of age. Participants judged which of two tones contained FM; the modulation depth was adjusted using an adaptive staircase procedure, and each participant's threshold was calculated as the geometric mean of the last eight reversals. Using voxel-based morphometry, we found that FM threshold was significantly correlated with gray-matter density in left Heschl's gyrus at the age of 10 years, but that this correlation weakened with age. While there were no differences between girls and boys at Times 1 and 2, at Time 3 there was a relationship between gray-matter density in left Heschl's gyrus and FM threshold in boys but not in girls. Taken together, our results confirm that the structure of the auditory cortex can predict temporal processing abilities, namely that gray-matter density in left Heschl's gyrus can predict 2-Hz FM detection threshold. FM detection depends on the processing of sounds that change over time, a skill believed to be necessary for speech processing. We tested this assumption and found that FM threshold significantly correlated with spelling abilities at Time 1, but that this correlation was found only in boys.
This correlation decreased at Time 2, and at Time 3 we found a significant correlation between reading and FM threshold, but again, only in boys. We examined the sex differences in both the imaging and behavioral data taking into account pubertal stages, and found that the correlation between FM threshold and spelling was strongest pre-pubertally, and the correlation between FM threshold and gray-matter density in left Heschl's gyrus was strongest mid-pubertally.
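The threshold procedure in this abstract is an adaptive staircase on modulation depth, scored as the geometric mean of the last eight reversals. The abstract does not give the staircase rule or step size. The sketch below therefore assumes a common 2-down/1-up rule with a multiplicative step, and it stops after 12 reversals. The `respond` callback, which simulates a listener, and all parameter defaults are hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean: exp of the mean log value (values must be positive)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def run_staircase(respond, start_depth, step_factor=1.5, n_reversals=12,
                  last_k=8, max_trials=500):
    """Assumed 2-down/1-up staircase on FM modulation depth.

    respond(depth) -> True if the trial at that depth was answered correctly.
    Depth is divided by step_factor after two consecutive correct answers and
    multiplied by step_factor after any error. A reversal is recorded at the
    depth where the direction of change flips. Returns the geometric mean of
    the depths at the last `last_k` reversals.
    """
    if step_factor <= 1.0:
        raise ValueError("step_factor must be greater than 1")
    depth = start_depth
    correct_run = 0
    direction = None          # "down" (easier -> harder) or "up"
    reversals = []
    for _ in range(max_trials):
        if respond(depth):
            correct_run += 1
            if correct_run < 2:
                continue      # need two correct in a row before stepping down
            correct_run = 0
            new_direction, next_depth = "down", depth / step_factor
        else:
            correct_run = 0
            new_direction, next_depth = "up", depth * step_factor
        if direction is not None and new_direction != direction:
            reversals.append(depth)
        direction = new_direction
        depth = next_depth
        if len(reversals) >= n_reversals:
            break
    if len(reversals) < last_k:
        raise RuntimeError("not enough reversals to estimate a threshold")
    return geometric_mean(reversals[-last_k:])
```

With a deterministic simulated listener who is correct only when the depth is at least 4, starting at 16 with `step_factor=2.0`, the track oscillates between 2 and 4. The returned threshold is then 2**1.5 (about 2.83). The geometric mean suits a multiplicative step because it averages the reversals on a log scale.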

Relevance: 90.00%

Abstract:

Speech recognition technology is regarded as a key enabler for increasing the usability of applications deployed on mobile devices, which are becoming increasingly prevalent in modern hospital-based healthcare. Although the use of speech recognition is not new to the hospital-based healthcare domain, its use with mobile devices has thus far been limited. This paper presents the results of a literature review we conducted to observe how speech recognition technology has been used in hospital-based healthcare and to understand how this technology is being evaluated, in terms of its dependability and reliability, in healthcare settings. Our intent is that this review will help identify scope for future uses of speech recognition technologies in the healthcare domain, as well as implications for the meaningful evaluation of such technologies given the specific context of use.

Relevance: 90.00%

Abstract:

We propose a novel analysis alternative for emotion recognition from speech, based on two Fourier transforms. Fourier analysis allows different signals to be displayed and synthesized in terms of their power spectral density distributions. A spectrogram of the voice signal is obtained by performing a short-time Fourier transform with Gaussian windows; this spectrogram portrays frequency-related features, such as vocal tract resonances and quasi-periodic excitation during voiced sounds. Emotions induce such characteristics in speech, and these become apparent in the time-frequency distributions of the spectrogram. The time-frequency representation of the signal is then treated as an image and processed with a two-dimensional Fourier transform to perform a spatial Fourier analysis of it. Finally, features related to emotions in voiced speech are extracted and presented.
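The abstract describes a two-stage pipeline. First, a short-time Fourier transform with Gaussian windows produces a spectrogram. That spectrogram is then treated as an image and passed through a 2-D Fourier transform. The NumPy sketch below covers only those two stages. The frame length, hop size, and Gaussian width are assumed values, not taken from the paper. The final emotion-feature extraction step is omitted because the abstract does not describe it.

```python
import numpy as np


def gaussian_window(length, std):
    """Symmetric Gaussian window centred on the middle of the frame."""
    n = np.arange(length) - (length - 1) / 2.0
    return np.exp(-0.5 * (n / std) ** 2)


def spectrogram(signal, frame_len=256, hop=128, std=None):
    """Power spectrogram |STFT|^2 using a Gaussian analysis window.

    Rows are frequency bins (0 .. frame_len // 2), columns are frames.
    frame_len, hop and the default std (frame_len / 6) are assumptions.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    if std is None:
        std = frame_len / 6.0
    window = gaussian_window(frame_len, std)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    stft = np.fft.rfft(frames, axis=1)      # shape (n_frames, frame_len // 2 + 1)
    return (np.abs(stft) ** 2).T            # shape (freq_bins, n_frames)


def spectrogram_2d_spectrum(spec):
    """Treat the spectrogram as an image and return |2-D FFT|, zero frequency centred."""
    return np.abs(np.fft.fftshift(np.fft.fft2(spec)))
```

For example, a 1 s, 200 Hz sine sampled at 8 kHz gives a 129 x 61 spectrogram whose strongest frequency bin lies within one bin (31.25 Hz) of 200 Hz. Because that tone is stationary, its 2-D spectrum peaks at the centred zero-frequency point. Time-varying structure in the spectrogram spreads energy to non-zero spatial frequencies.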

Relevance: 80.00%

Abstract:

Sleeper is an 18'00" musical work for live performer and laptop computer which exists as both a live performance work and a recorded work for audio CD. The work has been presented at a range of international performance events and survey exhibitions. These include the 2003 International Computer Music Conference (Singapore), where it was selected for CD publication, Variable Resistance (San Francisco Museum of Modern Art, USA), and i.audio, a survey of experimental sound at the Performance Space, Sydney. The source sound materials are drawn from field recordings made in acoustically resonant spaces in the Australian urban environment, amplified and acoustic instruments, radio signals, and sound synthesis procedures. The processing techniques blur the boundaries between, and exploit the perceptual ambiguities of, de-contextualised and processed sound. The work thus challenges the arbitrary distinctions between sound, noise and music and attempts to reveal the inherent musicality in so-called non-musical materials via digitally re-processed location audio. Thematically the work investigates Paul Virilio’s theory that technology ‘collapses space’ via the relationship of technology to speed. Technically this is explored through the design of a music composition process that draws upon spatially and temporally dispersed sound materials treated using digital audio processing technologies. One of the contributions to knowledge in this work is a demonstration of how disparate materials may be employed within a compositional process to produce music through the establishment of musically meaningful morphological, spectral and pitch relationships. This is achieved through the design of novel digital audio processing networks and a software performance interface.
The work explores, tests and extends the music perception theories of ‘reduced listening’ (Schaeffer, 1967) and ‘surrogacy’ (Smalley, 1997) by demonstrating how, through specific audio processing techniques, sounds may be shifted away from ‘causal’ listening contexts towards abstract aesthetic listening contexts. In doing so, it shows how various time- and frequency-domain processing techniques may be used to achieve this shift.

Relevance: 80.00%

Abstract:

Gabor representations have been widely used in facial analysis (face recognition, face detection and facial expression detection) due to their biological relevance and computational properties. Two popular Gabor representations used in the literature are: 1) Log-Gabor and 2) Gabor energy filters. Even though these representations are somewhat similar, they also have distinct differences: the Log-Gabor filters mimic the simple cells in the visual cortex, while the Gabor energy filters emulate the complex cells, which causes subtle differences in their responses. In this paper, we analyze the difference between these two Gabor representations and quantify these differences on the task of facial action unit (AU) detection. In our experiments conducted on the Cohn-Kanade dataset, we report an average area under the ROC curve (A') of 92.60% across 17 AUs for the Gabor energy filters, while the Log-Gabor representation achieved an average A' of 96.11%. This result suggests that the small spatial differences that the Log-Gabor filters pick up on are more useful for AU detection than the differences in contours and edges that the Gabor energy filters extract.
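The comparison above is reported as the area under the ROC curve (A'), averaged over 17 AUs. The sketch below is a minimal version of that metric. It assumes each AU detector outputs a real-valued score per image and that ground-truth labels are binary. It computes the AUC through its Mann-Whitney interpretation and then averages per AU. The dictionary layout, AU keys, and function names are hypothetical; the paper's own evaluation code is not described.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive example scores
    higher than a randomly chosen negative one; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(per_au):
    """Average AUC over action units; per_au maps an AU name to (scores, labels)."""
    values = [auc(scores, labels) for scores, labels in per_au.values()]
    return sum(values) / len(values)
```

For instance, `auc([0.8, 0.4, 0.6, 0.2], [1, 1, 0, 0])` is 0.75, because three of the four positive/negative pairs are ranked correctly. The pairwise loop costs O(P·N) and is meant only for clarity. A rank-based formulation after sorting gives the same value in O(n log n).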