581 results for Speaker


Relevance: 10.00%

Abstract:

Speech recognition involves three processes: extraction of acoustic indices from the speech signal, estimation of the probability that the observed index string was caused by a hypothesized utterance segment, and determination of the recognized utterance via a search among hypothesized alternatives. This paper is not concerned with the first process. Estimation of the probability of an index string involves a model of index production by any given utterance segment (e.g., a word). Hidden Markov models (HMMs) are used for this purpose [Makhoul, J. & Schwartz, R. (1995) Proc. Natl. Acad. Sci. USA 92, 9956-9963]. Their parameters are state transition probabilities and output probability distributions associated with the transitions. The Baum algorithm that obtains the values of these parameters from speech data via their successive reestimation will be described in this paper. The recognizer wishes to find the most probable utterance that could have caused the observed acoustic index string. That probability is the product of two factors: the probability that the utterance will produce the string and the probability that the speaker will wish to produce the utterance (the language model probability). Even if the vocabulary size is moderate, it is impossible to search for the utterance exhaustively. One practical algorithm is described [Viterbi, A. J. (1967) IEEE Trans. Inf. Theory IT-13, 260-267] that, given the index string, has a high likelihood of finding the most probable utterance.
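As a compact illustration of the Viterbi search described above, the following Python sketch decodes the most probable hidden-state sequence of a discrete-output HMM for an observed index string, using dynamic programming in log space. The state, transition, and output probabilities are invented toy values for the example and do not come from the paper.

import numpy as np

def viterbi(pi, A, B, obs):
    """Return the most probable state path and its log probability.

    pi  : (N,)   initial state probabilities
    A   : (N, N) transition probabilities, A[i, j] = P(state j | state i)
    B   : (N, M) output probabilities, B[i, k] = P(symbol k | state i)
    obs : list of observed symbol indices (the acoustic index string)
    """
    N = len(pi)
    T = len(obs)
    # Work in log space to avoid underflow on long index strings.
    log_delta = np.log(pi) + np.log(B[:, obs[0]])
    backptr = np.zeros((T, N), dtype=int)

    for t in range(1, T):
        # scores[i, j] = log prob of the best path ending in state i, then moving to j
        scores = log_delta[:, None] + np.log(A)
        backptr[t] = np.argmax(scores, axis=0)
        log_delta = scores[backptr[t], np.arange(N)] + np.log(B[:, obs[t]])

    # Trace back from the best final state to recover the full path.
    path = [int(np.argmax(log_delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1], float(np.max(log_delta))

# Toy example: 2 hidden states, 3 observable acoustic index symbols.
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1],
               [0.1, 0.3, 0.6]])
best_path, best_logp = viterbi(pi, A, B, obs=[0, 1, 2, 2])
print(best_path, best_logp)

In a full recognizer the per-path score would also include the language model probability of the hypothesized utterance, as the abstract notes; the sketch shows only the acoustic search over one model.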

Relevance: 10.00%

Abstract:

As the telecommunications industry evolves over the next decade to provide the products and services that people will desire, several key technologies will become commonplace. Two of these, automatic speech recognition and text-to-speech synthesis, will provide users with more freedom over when, where, and how they access information. While these technologies are currently in their infancy, their capabilities are rapidly increasing and their deployment in today's telephone network is expanding. The economic impact of just one application, the automation of operator services, is well over $100 million per year. Yet there are still many technical challenges that must be resolved before these technologies can be deployed ubiquitously in products and services throughout the worldwide telephone network. These challenges include: (i) High level of accuracy. The technology must be perceived by the user as highly accurate, robust, and reliable. (ii) Ease of use. Speech is only one of several possible input/output modalities for conveying information between a human and a machine, much like a computer terminal or Touch-Tone pad on a telephone. It is not the final product. Therefore, speech technologies must be hidden from the user; that is, the burden of using the technology must be on the technology itself. (iii) Quick prototyping and development of new products and services. The technology must support the creation of new products and services based on speech in an efficient and timely fashion. In this paper I present a vision of the voice-processing industry with a focus on the areas with the broadest base of user penetration: speech recognition, text-to-speech synthesis, natural language processing, and speaker recognition technologies. The current and future applications of these technologies in the telecommunications industry will be examined in terms of their strengths, limitations, and the degree to which user needs have been or have yet to be met. Although noteworthy gains have been made in areas with potentially small user bases and in the more mature speech-coding technologies, these subjects are outside the scope of this paper.

Relevance: 10.00%

Abstract:

Speech interface technology, which includes automatic speech recognition, synthetic speech, and natural language processing, is beginning to have a significant impact on business and personal computer use. Today, powerful and inexpensive microprocessors and improved algorithms are driving commercial applications in computer command, consumer, data entry, speech-to-text, telephone, and voice verification. Robust speaker-independent recognition systems for command and navigation in personal computers are now available; telephone-based transaction and database inquiry systems using both speech synthesis and recognition are coming into use. Large-vocabulary speech interface systems for document creation and read-aloud proofing are expanding beyond niche markets. Today's applications represent a small preview of a rich future for speech interface technology that will eventually replace keyboards with microphones and loudspeakers to give easy accessibility to increasingly intelligent machines.

Relevance: 10.00%

Abstract:

The deployment of systems for human-to-machine communication by voice requires overcoming a variety of obstacles that affect the speech-processing technologies. Problems encountered in the field might include variation in speaking style, acoustic noise, ambiguity of language, or confusion on the part of the speaker. The diversity of these practical problems encountered in the "real world" leads to the perceived gap between laboratory and "real-world" performance. To answer the question "What applications can speech technology support today?" the concept of the "degree of difficulty" of an application is introduced. The degree of difficulty depends not only on the demands placed on the speech recognition and speech synthesis technologies but also on the expectations of the user of the system. Experience has shown that deployment of effective speech communication systems requires an iterative process. This paper discusses general deployment principles, which are illustrated by several examples of human-machine communication systems.

Relevance: 10.00%

Abstract:

This paper predicts speech synthesis, speech recognition, and speaker recognition technology for the year 2001, and it describes the most important research problems to be solved in order to arrive at these ultimate synthesis and recognition systems. The problems for speech synthesis include natural and intelligible voice production, prosody control based on meaning, capability of controlling synthesized voice quality and choosing individual speaking style, multilingual and multidialectal synthesis, choice of application-oriented speaking styles, capability of adding emotion, and synthesis from concepts. The problems for speech recognition include robust recognition against speech variations, adaptation/normalization to variations due to environmental conditions and speakers, automatic knowledge acquisition for acoustic and linguistic modeling, spontaneous speech recognition, naturalness and ease of human-machine interaction, and recognition of emotion. The problems for speaker recognition are similar to those for speech recognition. The research topics related to all these techniques include the use of articulatory and perceptual constraints and evaluation methods for measuring the quality of technology and systems.

Relevance: 10.00%

Abstract:

Trick Rider is a book-length poem in four sections that uses characteristics of the epic and the gothic, as well as strategies of chance operations, to explore the compositional process in relation to time: how time is experienced during the writing process and how it is communicated through the text as an object and through the process of reading. The polyphonic speaker of Trick Rider is a stunt double who experiences doubling, being both representative of and an outsider to the community she channels; this tension is simultaneously cause and effect of the text.

Relevance: 10.00%

Abstract:

Robert Kennedy's announcement of the assassination of Martin Luther King, Jr., in an Indianapolis urban community that did not erupt in riots on April 4, 1968, provides one significant example in which feelings, energy, and bodily risk resonate alongside the articulated message. The relentless focus on Kennedy's spoken words, in historical biographies and other critical research, presents a problem of isolated effect because the power really comes from elements outside the speech act. Thus, this project embraces the complexities of rhetorical effectivity, which involves such things as the unique situational context, all participants (both Kennedy and his audience) of the speech act, aesthetic argument, and the ethical implications. This version of the story embraces the many voices of the participants through firsthand interviews and new oral history reports. Using evidence provided by actual participants in the 1968 Indianapolis event, this project reflects critically upon the world disclosure of the event as it emerges from those remembrances. Phenomenology provides one answer to the constitutive dilemma of rhetorical effectivity that stems from the lack of a framework that gets at questions of ethics, aesthetics, feelings, energy, etc. Thus, this work takes a pedagogical shift away from discourse (verbal/written) as the primary place to render judgments about the effects of communication interaction. With a turn to explore extra-sensory reasoning, by way of the physical, emotional, and numinous, a multi-dimensional look at public address is delivered. The rhetorician will be interested in new ways of assessing effects. The communication ethicist will appreciate the work as concepts like answerability, emotional-volitional tone, and care for the other come to life via application and consideration of Kennedy's appearance. For argumentation scholars, the interest comes forth in a re-thinking of how we do argumentation. And the critical cultural scholar will find this story ripe with opportunities to uncover the politics of representation, racialized discourse, privilege, power, ideological hegemony, and reconciliation. Through an approach of multiple layers, this real-life tale will expose the power of presence among audience and speaker, emotive argument, and the magical turn of fate, all of which contribute to the possibility of a dialogic rhetoric.

Relevance: 10.00%

Abstract:

The contribution of music to the human sciences has been increasingly valued by the health sciences in recent decades, fostering connections between speech-language pathology (Fonoaudiologia) and music therapy. The assessment of music perception seeks to understand basic principles such as the discrimination of timbres, melodies, rhythms, intensity, pitch, note duration, and density, among others, in addition to knowledge inherent to hearing and to musical experiences over the course of life. The aim of this study was to develop a computerized test for assessing the recognition of traditional Brazilian melodies and to verify the performance of children with normal hearing on this instrument. A test was developed, named Avaliação do Reconhecimento de Melodias Tradicionais em Crianças normo-ouvintes (ARMTC), in website format, comprising 15 traditional melodies of Brazilian culture recorded with a synthesized piano timbre and standardized with variable tempos, similar intensities, tonality according to the score used, a duration of 12 seconds per melody, and four-second pauses between melodies. The sample consisted of 155 children of both sexes, aged eight to 11 years, with hearing thresholds at frequencies from 500 Hz to 4000 Hz within normal limits and a type A tympanometric curve. All children underwent audiological screening (500 Hz, 1 kHz, 2 kHz, and 4 kHz), tympanometry, and the ARMTC. The ARMTC was administered in free field at an intensity of 65 dB HL, with the loudspeaker positioned at 0° azimuth, one meter from the participant, who remained seated. The children were instructed to click, on the notebook screen, the icon corresponding to the name and illustration of the melody they had heard, and to proceed in this way until the end of the 15 melodies presented. For most of the selected melodies there was no significant difference in the number of errors/correct responses or in reaction time when these variables were correlated with sex, age, and the place where the test was administered. The most recognized melodies were Cai, cai balão; Boi da cara preta, which had the same score as Caranguejo; Escravos de Jó; O cravo; Parabéns a você; and Marcha soldado, all recognized in more than 70% of responses; the least recognized melody was Capelinha de melão.

Relevance: 10.00%

Abstract:

Three-page manuscript copy of the salutatory address composed in Latin by graduate Jonathan Trumbull for the 1759 Harvard Commencement. The item is dated June 29, 1759.

Relevance: 10.00%

Abstract:

This hardcover modern binding contains a twenty-page manuscript copy of the salutatory address given by Elisha Cooke at the 1697 Harvard College Commencement. The text includes edits and struck-through words. A one-page copy of the first page of the oration signed by Thomas Banister and William Phips is at the end of the volume.

Relevance: 10.00%

Abstract:

Four-page manuscript copy of the valedictory Commencement oration composed by Jonathan Trumbull for the 1762 Harvard College Commencement.

Relevance: 10.00%

Abstract:

Benjamin Colman wrote this letter to Edward Wigglesworth on March 4, 1728; it was sent from Colman, in Boston, to Wigglesworth, in Cambridge. The letter concerns their mutual friend, John Leverett, who had died several years before. It appears that Wigglesworth was charged with writing an epitaph for Leverett and had solicited input from Colman. Colman writes of his great admiration for Leverett, praising his "virtue & piety, wisdom & gravity [...] majesty & authority [...] eye & voice, goodness & courtesie."

Relevance: 10.00%

Abstract:

Doctoral thesis, Linguistics (Applied Linguistics), Universidade de Lisboa, Faculdade de Letras, 2016.

Relevance: 10.00%

Abstract:

The aim of this thesis is to explore a further possible use of the conference interpreter's skills in a field still little known to most: the production of audiobooks and the socially oriented activity of voice donors. Starting from a general overview of orality and the value of the voice in metaphysical-anthropological terms, followed by an introduction to the world of audiobooks for blind and visually impaired people and to the activity of the Centro Internazionale del Libro Parlato in Feltre (the subject of the second chapter), the third chapter focuses on the value of the vocal gesture in oral communication and, specifically, for the conference interpreter, a reflection of particular interest in light of the author's own course of study. The fourth chapter then asserts the importance of vocal pedagogy and of its tools for regulating the vocal gesture in relation to the expressive and communicative goals of an "interpreter", whether an actor, a singer, a radio speaker, an audio describer, a voice donor, or an interlinguistic and intercultural interpreter. In the fifth chapter the vocal gesture is described from an anatomical-physiological perspective, with an analysis of the various systems that make up the pneumophonoarticulatory apparatus responsible for phonation. Finally, the thesis closes with the points of view of a former voice donor, of the professor of Oral Presentation Techniques at the Scuola di Lingue e Letterature, Traduzione e Interpretazione in Forlì, and of a blind audiobook user, gathered in their own words through interviews intended to provide real-world confirmation of what is argued theoretically in the course of the discussion.

Relevance: 10.00%

Abstract:

The human turn-taking system regulates the smooth and precise exchange of speaking turns during face-to-face interaction. Recent studies have investigated the processing of ongoing turns during conversation by measuring the eye movements of noninvolved observers. The findings suggest that humans shift their gaze toward the next speaker in anticipation, before the start of the next turn. Moreover, there is evidence that the ability to detect turn transitions in a timely manner relies mainly on the lexico-syntactic content provided by the conversation. Consequently, patients with aphasia, who often experience deficits in both semantic and syntactic processing, might have difficulty detecting turn transitions and shifting their gaze at them in time. To test this assumption, we presented video vignettes of natural conversations to aphasic patients and healthy controls while their eye movements were measured. The frequency and latency of event-related gaze shifts, with respect to the end of the current turn in the videos, were compared between the two groups. Our results suggest that, compared with healthy controls, aphasic patients are less likely to shift their gaze at turn transitions but do not show significantly increased gaze shift latencies. In healthy controls, but not in aphasic patients, the probability of a gaze shift at the turn transition increased when the video content of the current turn had a higher lexico-syntactic complexity. Furthermore, the results from voxel-based lesion symptom mapping indicate that the association between lexico-syntactic complexity and gaze shift latency in aphasic patients is predicted by brain lesions located in the posterior branch of the left arcuate fasciculus. Higher lexico-syntactic processing demands seem to lead to a reduced gaze shift probability in aphasic patients. This finding may represent missed opportunities for patients to place their contributions during everyday conversation.
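As a rough illustration of how such event-related gaze-shift measures can be derived, the Python sketch below computes, for one group, the proportion of turn transitions followed by a gaze shift and the mean shift latency relative to the turn end. The timestamps and the helper name gaze_shift_stats are invented for the example; this is not the authors' analysis pipeline.

import numpy as np

# Minimal sketch: for each turn transition we assume a turn-end timestamp and,
# if the observer shifted gaze to the next speaker, the timestamp of that shift
# (NaN when no shift was detected).
def gaze_shift_stats(turn_end_times, shift_times):
    turn_end_times = np.asarray(turn_end_times, dtype=float)
    shift_times = np.asarray(shift_times, dtype=float)
    shifted = ~np.isnan(shift_times)               # turns with a detected gaze shift
    probability = shifted.mean()                   # gaze shift probability
    latencies = shift_times[shifted] - turn_end_times[shifted]  # negative = anticipatory shift
    return probability, latencies.mean()

# Toy data in seconds: three turn transitions, one without a gaze shift.
prob, mean_latency = gaze_shift_stats(
    turn_end_times=[2.4, 5.1, 9.8],
    shift_times=[2.2, np.nan, 10.1],
)
print(prob, mean_latency)  # 0.666..., 0.05 s (one anticipatory shift, one delayed shift)

Group comparisons such as those reported in the abstract would then contrast these per-observer probabilities and latencies between patients and controls.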