9 results for Text-to-speech

in Bucknell University Digital Commons - Pennsylvania - USA


Relevance:

100.00%

Publisher:

Abstract:

Speech is often a multimodal process, presented audiovisually through a talking face. One area of speech perception influenced by visual speech is speech segmentation, or the process of breaking a stream of speech into individual words. Mitchel and Weiss (2013) demonstrated that a talking face contains specific cues to word boundaries and that subjects can correctly segment a speech stream when given a silent video of a speaker. The current study expanded upon these results, using an eye tracker to identify highly attended facial features of the audiovisual display used in Mitchel and Weiss (2013). In Experiment 1, subjects were found to spend the most time watching the eyes and mouth, with a trend suggesting that the mouth was viewed more than the eyes. Although subjects displayed significant learning of word boundaries, performance was not correlated with gaze duration on any individual feature, nor was performance correlated with a behavioral measure of autistic-like traits. However, trends suggested that as autistic-like traits increased, gaze duration on the mouth increased and gaze duration on the eyes decreased, similar to significant trends seen in autistic populations (Boraston & Blakemore, 2007). In Experiment 2, the same video was modified so that a black bar covered the eyes or mouth. Both videos elicited learning of word boundaries that was equivalent to that seen in the first experiment. Again, no correlations were found between segmentation performance and SRS scores in either condition. These results, taken with those in Experiment 1, suggest that neither the eyes nor the mouth is critical to speech segmentation and that perhaps more global head movements indicate word boundaries (see Graf, Cosatto, Strom, & Huang, 2002). Future work will elucidate the contribution of individual features relative to global head movements, as well as extend these results to additional types of speech tasks.

Relevance:

80.00%

Publisher:

Abstract:

Street art and graffiti are integral parts of Berlin’s urban space, which has undergone dramatic transformations in the past two decades. Graffiti texts constitute a critical comment on these urban transformations. This talk analyzes the connection between the phenomenon of street art and trajectories in urban planning in post-wall Berlin. My current research explores the meaning of various forms of street art (such as graffiti, posters, sticker art, stencils) as texts in Berlin’s linguistic landscape. Linguistic Landscape research pays critical attention to language, words, and images displayed and exposed in public spaces. The field of Linguistic Landscapes has only recently begun to include graffiti texts in analyses of text and space to fully comprehend the semiotics of the street. In the case of Germany’s capital, graffiti writing enters into a critical dialogue with the environment and provides a readable text to understand the city.

Relevance:

80.00%

Publisher:

Abstract:

For as far back as human history can be traced, mankind has questioned what it means to be human. One of the most common approaches throughout Western culture's intellectual tradition in attempting to answer this question has been to compare humans with or against other animals. I argue that it was not until Charles Darwin's publication of The Descent of Man and Selection in Relation to Sex (1871) that Western culture was forced to seriously consider human identity in relation to the human/nonhuman primate line. Since no thinker prior to Charles Darwin had caused such an identity crisis in Western thought, this interdisciplinary analysis of the history of how the human/nonhuman primate line has been understood focuses on the reciprocal relationship of popular culture and scientific representations from 1871 to the Human Genome Consortium in 2000. Focusing on the concept coined as the "Darwin-Müller debate," representations of the human/nonhuman primate line are traced through themes of language, intelligence, and claims of variation throughout the popular texts: Descent of Man, The Jungle Books (1894), Tarzan of the Apes (1914), and Planet of the Apes (1963). Additional themes such as the nature versus nurture debate and other comparative phenotypic attributes commonly used for comparison between man and apes are also analyzed. Such popular culture representations are compared with related or influential scientific research during the respective time period of each text to shed light on the reciprocal nature of Western intellectual tradition, popular notions of the human/nonhuman primate line, and the development of the field of primatology. Ultimately, this thesis shows that the Darwin-Müller debate is indeterminable, and such a lack of resolution makes man uncomfortable. Man's unsettled response and desire for self-knowledge further facilitate a continued search for answers to human identity.
As the Human Genome Project has led to the rise of new debates, and primate research has become less anthropocentric over time, the mysteries of man's future have become more concerning than the questions of our past. The human/nonhuman primate line is reduced to a 1% difference, and new debates have begun to overshadow the Darwin-Müller debate. In conclusion, I argue that human identity is best represented through the metaphor of evolution: both have an unknown beginning, both have an indeterminable future with no definite end, and like a species under the influence of evolution, what it means to be human is a constant, indeterminable process of change.

Relevance:

50.00%

Publisher:

Abstract:

Speech is typically a multimodal phenomenon, yet few studies have focused on the exclusive contributions of visual cues to language acquisition. To address this gap, we investigated whether visual prosodic information can facilitate speech segmentation. Previous research has demonstrated that language learners can use lexical stress and pitch cues to segment speech and that learners can extract this information from talking faces. Thus, we created an artificial speech stream that contained minimal segmentation cues and paired it with two synchronous facial displays in which visual prosody was either informative or uninformative for identifying word boundaries. Across three familiarisation conditions (audio stream alone, facial streams alone, and paired audiovisual), learning occurred only when the facial displays were informative to word boundaries, suggesting that facial cues can help learners solve the early challenges of language acquisition.

Relevance:

40.00%

Publisher:

Abstract:

We present a new method for the enhancement of speech. The method is designed for scenarios in which targeted speaker enrollment as well as system training within the typical noise environment are feasible. The proposed procedure is fundamentally different from most conventional and state-of-the-art denoising approaches. Instead of filtering a distorted signal, we resynthesize a new “clean” signal based on its likely characteristics. These characteristics are estimated from the distorted signal. A successful implementation of the proposed method is presented. Experiments were performed in a scenario with roughly one hour of clean speech training data. Our results show that the proposed method compares very favorably to other state-of-the-art systems in both objective and subjective speech quality assessments. Potential applications for the proposed method include jet cockpit communication systems and offline methods for the restoration of audio recordings.

Relevance:

40.00%

Publisher:

Abstract:

A new idea for waveform coding using vector quantisation (VQ) is introduced. This idea makes it possible to deal with codevectors much larger than before for a fixed bit per sample rate. Also, a solution to the matching problem (inherent in the present context) in the &-norm describing a measure of nearness is presented. The overall computational complexity of this solution is O(n^3 log n). Sample results are presented to demonstrate the advantage of using this technique in the context of coding of speech waveforms.
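
For readers unfamiliar with the baseline, the following is a minimal sketch of conventional block vector quantisation, not the new scheme the abstract introduces. It shows why the bit rate ties codebook size to codevector length: a codebook of N vectors of dimension k costs log2(N)/k bits per sample. The function names, the toy codebook, and the use of squared Euclidean distance as the nearness measure are all assumptions made for illustration; the abstract's own norm is not recoverable from this listing.

```python
import math


def bits_per_sample(codebook_size, dim):
    """Rate of plain VQ: log2(codebook size) bits shared by `dim` samples."""
    return math.log2(codebook_size) / dim


def encode(samples, codebook):
    """Replace each block of samples by the index of its nearest codevector.

    Nearness is squared Euclidean distance (an assumption of this sketch).
    A trailing partial block, if any, is dropped.
    """
    dim = len(codebook[0])
    indices = []
    for start in range(0, len(samples) - dim + 1, dim):
        block = samples[start:start + dim]
        best = min(
            range(len(codebook)),
            key=lambda i: sum((b - c) ** 2 for b, c in zip(block, codebook[i])),
        )
        indices.append(best)
    return indices


def decode(indices, codebook):
    """Rebuild an approximate waveform by concatenating the chosen codevectors."""
    out = []
    for i in indices:
        out.extend(codebook[i])
    return out


# Hypothetical toy codebook: 4 codevectors of dimension 2.
CODEBOOK = [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]]
WAVEFORM = [0.9, 1.1, -0.8, -1.2, 0.1, -0.1]

if __name__ == "__main__":
    idx = encode(WAVEFORM, CODEBOOK)
    print(idx, decode(idx, CODEBOOK), bits_per_sample(len(CODEBOOK), 2))
```

In this plain scheme, doubling the codevector length at the same bits-per-sample rate squares the codebook size, which is the kind of growth that makes large codevectors costly.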

Relevance:

40.00%

Publisher:

Abstract:

We present a new approach for corpus-based speech enhancement that significantly improves over a method published by Xiao and Nickel in 2010. Corpus-based enhancement systems do not merely filter an incoming noisy signal, but resynthesize its speech content via an inventory of pre-recorded clean signals. The goal of the procedure is to perceptually improve the sound of speech signals in background noise. The proposed new method modifies Xiao's method in four significant ways. Firstly, it employs a Gaussian mixture model (GMM) instead of a vector quantizer in the phoneme recognition front-end. Secondly, the state decoding of the recognition stage is supported with an uncertainty modeling technique. With the GMM and the uncertainty modeling, it is possible to eliminate the need for noise-dependent system training. Thirdly, the post-processing of the original method via sinusoidal modeling is replaced with a powerful cepstral smoothing operation. Lastly, due to the improvements of these modifications, it is possible to extend the operational bandwidth of the procedure from 4 kHz to 8 kHz. The performance of the proposed method was evaluated across different noise types and different signal-to-noise ratios. The new method was able to significantly outperform traditional methods, including the one by Xiao and Nickel, in terms of PESQ scores and other objective quality measures. Results of subjective CMOS tests over a smaller set of test samples support our claims.
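
As a rough illustration of the first modification only, the sketch below contrasts the hard decision of a vector quantizer with the soft posteriors of a Gaussian mixture model on a single scalar feature. It is not the authors' front-end: the one-dimensional feature, the two-component mixture, and all names and parameter values are hypothetical choices made to keep the example small.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_posteriors(x, weights, means, variances):
    """Soft assignment: posterior probability of each component given x."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


def vq_assign(x, centroids):
    """Hard assignment: index of the nearest centroid (first one wins ties)."""
    return min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))


# Hypothetical two-class model on a scalar feature.
WEIGHTS = [0.5, 0.5]
MEANS = [0.0, 2.0]
VARIANCES = [1.0, 1.0]

if __name__ == "__main__":
    for x in (0.2, 1.0):
        print(x, vq_assign(x, MEANS), gmm_posteriors(x, WEIGHTS, MEANS, VARIANCES))
```

For a feature midway between the two classes, the quantizer still commits to one index, while the mixture reports equal posteriors. Keeping that ambiguity available to a later decoding stage is one plausible reason a soft model pairs well with uncertainty modeling.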