5 results for speaker diarization

in Bucknell University Digital Commons - Pennsylvania - USA


Relevance:

10.00%

Publisher:

Abstract:

We present a new method for the enhancement of speech. The method is designed for scenarios in which targeted speaker enrollment, as well as system training within the typical noise environment, are feasible. The proposed procedure is fundamentally different from most conventional and state-of-the-art denoising approaches: instead of filtering a distorted signal, we resynthesize a new "clean" signal based on its likely characteristics, which are estimated from the distorted signal. A successful implementation of the proposed method is presented. Experiments were performed in a scenario with roughly one hour of clean speech training data. Our results show that the proposed method compares very favorably to other state-of-the-art systems in both objective and subjective speech quality assessments. Potential applications for the proposed method include jet cockpit communication systems and offline methods for the restoration of audio recordings.
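The analysis/resynthesis idea in this abstract can be sketched in a few lines. This is a toy illustration, not the authors' system: here the estimated "characteristics" are just a smoothed per-frame magnitude envelope, standing in for the model-based estimates (from speaker enrollment and noise-environment training) that the abstract describes.

```python
import numpy as np

def enhance_by_resynthesis(noisy, frame=512, hop=128):
    """Toy resynthesis-based enhancer (illustrative sketch only).

    Rather than filtering the noisy waveform, each frame is reduced to a
    small description (a smoothed magnitude envelope) and a new frame is
    synthesized from that description plus the frame's phase.
    """
    window = np.hanning(frame)
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame, hop):
        seg = noisy[start:start + frame] * window
        spec = np.fft.rfft(seg)
        # "Characteristics": a heavily smoothed magnitude envelope, a
        # crude stand-in for model-based estimates of the clean speech.
        envelope = np.convolve(np.abs(spec), np.ones(9) / 9.0, mode="same")
        # Resynthesize a new frame from the envelope and original phase.
        clean = np.fft.irfft(envelope * np.exp(1j * np.angle(spec)), n=frame)
        out[start:start + frame] += clean * window
        norm[start:start + frame] += window ** 2
    norm[norm == 0] = 1.0  # avoid division by zero at the signal edges
    return out / norm

# Usage: a 440 Hz tone buried in white noise, 16 kHz sampling rate.
rng = np.random.default_rng(0)
t = np.arange(4000) / 16000.0
noisy = np.sin(2 * np.pi * 440 * t) + 0.3 * rng.standard_normal(len(t))
enhanced = enhance_by_resynthesis(noisy)
```

A real system of the kind the abstract describes would replace the smoothed envelope with characteristics predicted by a model trained on the enrolled speaker's clean speech, which is what allows resynthesis rather than filtering.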

Relevance:

10.00%

Publisher:

Abstract:

Each year, the Research Committee of the Ohio Music Education Association sponsors a half-day Research Forum prior to the beginning of the state music education association conference. In 2004, Dr. Patricia J. Flowers, Professor of Music at the Ohio State University, was the guest speaker. This article summarizes her talk on the process of becoming a music education researcher.

Relevance:

10.00%

Publisher:

Abstract:

Speech is often a multimodal process, presented audiovisually through a talking face. One area of speech perception influenced by visual speech is speech segmentation, or the process of breaking a stream of speech into individual words. Mitchel and Weiss (2013) demonstrated that a talking face contains specific cues to word boundaries and that subjects can correctly segment a speech stream when given a silent video of a speaker. The current study expanded upon these results, using an eye tracker to identify highly attended facial features of the audiovisual display used in Mitchel and Weiss (2013). In Experiment 1, subjects were found to spend the most time watching the eyes and mouth, with a trend suggesting that the mouth was viewed more than the eyes. Although subjects displayed significant learning of word boundaries, performance was not correlated with gaze duration on any individual feature, nor was performance correlated with a behavioral measure of autistic-like traits (the Social Responsiveness Scale; SRS). However, trends suggested that as autistic-like traits increased, gaze duration on the mouth increased and gaze duration on the eyes decreased, similar to significant trends seen in autistic populations (Boraston & Blakemore, 2007). In Experiment 2, the same video was modified so that a black bar covered either the eyes or the mouth. Both videos elicited learning of word boundaries that was equivalent to that seen in the first experiment. Again, no correlations were found between segmentation performance and SRS scores in either condition. These results, taken with those in Experiment 1, suggest that neither the eyes nor the mouth are critical to speech segmentation and that perhaps more global head movements indicate word boundaries (see Graf, Cosatto, Strom, & Huang, 2002). Future work will elucidate the contribution of individual features relative to global head movements, as well as extend these results to additional types of speech tasks.

Relevance:

10.00%

Publisher:

Abstract:

The Telephone Conference Network, sponsored by The Pennsylvania State University's Coordinating Council for Health Care, is designed as a cost-effective format for providing inservice training in geriatric mental health for individuals who serve the elderly. Institutions that subscribe to the Telephone Conference Network are equipped with a conference speaker and telephone hook-up providing a two-way line of communication, and may choose from a variety of inservice programs. Mailed evaluations were completed by participants (N=73) in the "Skills to Manage Moods" program, a series of four 1-hour sessions designed to teach participants the skills needed to help patients cope with depression and to deliver the program to others. The majority of respondents reported high levels of satisfaction with the Telephone Conference Network system and the specific program in which they participated. Although 85 percent reported that they would be able to use the skills learned in the program on the job, 50 percent reported that they would not be interested in teaching these skills to others. The convenience and efficiency of the Telephone Conference Network were the most frequently mentioned strengths of the system, while the physical facilities and the program delivery format adopted by the individual institutions were the most frequently mentioned weaknesses. These data suggested several recommendations for Network subscribers and for professionals offering telephone conference programs, including ensuring optimal class enrollment, providing adequate physical facilities, and involving participants in program implementation.

Relevance:

10.00%

Publisher:

Abstract:

Recent advances in the field of statistical learning have established that learners are able to track regularities of multimodal stimuli, yet it is unknown whether the statistical computations are performed on integrated representations or on separate, unimodal representations. In the present study, we investigated the ability of adults to integrate audio and visual input during statistical learning. We presented learners with a speech stream synchronized with a video of a speaker's face. In the critical condition, the visual (e.g., /gi/) and auditory (e.g., /mi/) signals were occasionally incongruent, which we predicted would produce the McGurk illusion, resulting in the perception of an audiovisual syllable (e.g., /ni/). In this way, we used the McGurk illusion to manipulate the underlying statistical structure of the speech streams, such that perception of these illusory syllables facilitated participants' ability to segment the speech stream. Our results therefore demonstrate that participants can integrate audio and visual input to perceive the McGurk illusion during statistical learning. We interpret our findings as support for modality-interactive accounts of statistical learning.