633 results for Nonnative speaker
Abstract:
In the past decade, tremendous advances in the state of the art of automatic speech recognition by machine have taken place. A reduction in the word error rate by more than a factor of 5 and an increase in recognition speeds by several orders of magnitude (brought about by a combination of faster recognition search algorithms and more powerful computers) have combined to make high-accuracy, speaker-independent, continuous speech recognition for large vocabularies possible in real time, on off-the-shelf workstations, without the aid of special hardware. These advances promise to make speech recognition technology readily available to the general public. This paper focuses on the speech recognition advances made through better speech modeling techniques, chiefly through more accurate mathematical modeling of speech sounds.
Abstract:
Speech recognition involves three processes: extraction of acoustic indices from the speech signal, estimation of the probability that the observed index string was caused by a hypothesized utterance segment, and determination of the recognized utterance via a search among hypothesized alternatives. This paper is not concerned with the first process. Estimation of the probability of an index string involves a model of index production by any given utterance segment (e.g., a word). Hidden Markov models (HMMs) are used for this purpose [Makhoul, J. & Schwartz, R. (1995) Proc. Natl. Acad. Sci. USA 92, 9956-9963]. Their parameters are state transition probabilities and output probability distributions associated with the transitions. The Baum algorithm that obtains the values of these parameters from speech data via their successive reestimation will be described in this paper. The recognizer wishes to find the most probable utterance that could have caused the observed acoustic index string. That probability is the product of two factors: the probability that the utterance will produce the string and the probability that the speaker will wish to produce the utterance (the language model probability). Even if the vocabulary size is moderate, it is impossible to search for the utterance exhaustively. One practical algorithm is described [Viterbi, A. J. (1967) IEEE Trans. Inf. Theory IT-13, 260-267] that, given the index string, has a high likelihood of finding the most probable utterance.
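The search step described in the abstract above can be illustrated with a small Viterbi decoder. This is a minimal sketch and not the paper's recognizer: it assumes a discrete HMM whose output distributions are attached to states (the abstract attaches them to transitions), it omits the language model factor, and the function name, state labels, and toy probabilities are all hypothetical.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for a discrete HMM.

    Assumption: emissions are tied to states, not transitions.
    """
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor that maximizes the path score into s
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


def logs(probabilities):
    """Convert a dict of probabilities to log-probabilities."""
    return {key: math.log(value) for key, value in probabilities.items()}


# Hypothetical two-state model over the acoustic indices "x" and "y"
states = ["A", "B"]
log_start = logs({"A": 0.6, "B": 0.4})
log_trans = {"A": logs({"A": 0.7, "B": 0.3}), "B": logs({"A": 0.4, "B": 0.6})}
log_emit = {"A": logs({"x": 0.9, "y": 0.1}), "B": logs({"x": 0.2, "y": 0.8})}

log_prob, path = viterbi(["x", "x", "y"], states, log_start, log_trans, log_emit)
print(path, math.exp(log_prob))
```

Working in log-probabilities avoids numerical underflow on long index strings. In a full recognizer the transition scores would also carry the language model probability of the hypothesized utterance.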
Abstract:
As the telecommunications industry evolves over the next decade to provide the products and services that people will desire, several key technologies will become commonplace. Two of these, automatic speech recognition and text-to-speech synthesis, will provide users with more freedom on when, where, and how they access information. While these technologies are currently in their infancy, their capabilities are rapidly increasing and their deployment in today's telephone network is expanding. The economic impact of just one application, the automation of operator services, is well over $100 million per year. Yet there still are many technical challenges that must be resolved before these technologies can be deployed ubiquitously in products and services throughout the worldwide telephone network. These challenges include: (i) High level of accuracy. The technology must be perceived by the user as highly accurate, robust, and reliable. (ii) Easy to use. Speech is only one of several possible input/output modalities for conveying information between a human and a machine, much like a computer terminal or Touch-Tone pad on a telephone. It is not the final product. Therefore, speech technologies must be hidden from the user. That is, the burden of using the technology must be on the technology itself. (iii) Quick prototyping and development of new products and services. The technology must support the creation of new products and services based on speech in an efficient and timely fashion. In this paper I present a vision of the voice-processing industry with a focus on the areas with the broadest base of user penetration: speech recognition, text-to-speech synthesis, natural language processing, and speaker recognition technologies. The current and future applications of these technologies in the telecommunications industry will be examined in terms of their strengths, limitations, and the degree to which user needs have been or have yet to be met. 
Although noteworthy gains have been made in areas with potentially small user bases and in the more mature speech-coding technologies, these subjects are outside the scope of this paper.
Abstract:
Speech interface technology, which includes automatic speech recognition, synthetic speech, and natural language processing, is beginning to have a significant impact on business and personal computer use. Today, powerful and inexpensive microprocessors and improved algorithms are driving commercial applications in computer command, consumer, data entry, speech-to-text, telephone, and voice verification. Robust speaker-independent recognition systems for command and navigation in personal computers are now available; telephone-based transaction and database inquiry systems using both speech synthesis and recognition are coming into use. Large-vocabulary speech interface systems for document creation and read-aloud proofing are expanding beyond niche markets. Today's applications represent a small preview of a rich future for speech interface technology that will eventually replace keyboards with microphones and loudspeakers to give easy accessibility to increasingly intelligent machines.
Abstract:
The deployment of systems for human-to-machine communication by voice requires overcoming a variety of obstacles that affect the speech-processing technologies. Problems encountered in the field might include variation in speaking style, acoustic noise, ambiguity of language, or confusion on the part of the speaker. The diversity of these practical problems encountered in the "real world" leads to the perceived gap between laboratory and "real-world" performance. To answer the question "What applications can speech technology support today?" the concept of the "degree of difficulty" of an application is introduced. The degree of difficulty depends not only on the demands placed on the speech recognition and speech synthesis technologies but also on the expectations of the user of the system. Experience has shown that deployment of effective speech communication systems requires an iterative process. This paper discusses general deployment principles, which are illustrated by several examples of human-machine communication systems.
Abstract:
This paper predicts speech synthesis, speech recognition, and speaker recognition technology for the year 2001, and it describes the most important research problems to be solved in order to arrive at these ultimate synthesis and recognition systems. The problems for speech synthesis include natural and intelligible voice production, prosody control based on meaning, capability of controlling synthesized voice quality and choosing individual speaking style, multilingual and multidialectal synthesis, choice of application-oriented speaking styles, capability of adding emotion, and synthesis from concepts. The problems for speech recognition include robust recognition against speech variations, adaptation/normalization to variations due to environmental conditions and speakers, automatic knowledge acquisition for acoustic and linguistic modeling, spontaneous speech recognition, naturalness and ease of human-machine interaction, and recognition of emotion. The problems for speaker recognition are similar to those for speech recognition. The research topics related to all these techniques include the use of articulatory and perceptual constraints and evaluation methods for measuring the quality of technology and systems.
Abstract:
The GroE proteins are molecular chaperones involved in protein folding. The general mechanism by which they facilitate folding is still enigmatic. One of the central open questions is the conformation of the GroEL-bound nonnative protein. Several suggestions have been made concerning the folding stage at which a protein can interact with GroEL. Furthermore, the possibility exists that binding of the nonnative protein to GroEL results in its unfolding. We have addressed these issues that are basic for understanding the GroE-mediated folding cycle by using folding intermediates of an Fab antibody fragment as molecular probes to define the binding properties of GroEL. We show that, in addition to binding to an early folding intermediate, GroEL is able to recognize and interact with a late quaternary-structured folding intermediate (Dc) without measurably unfolding it. Thus, the prerequisite for binding is not a certain folding stage of a nonnative protein. In contrast, general surface properties of nonnative proteins seem to be crucial for binding. Furthermore, unfolding of a highly structured intermediate does not necessarily occur upon binding to GroEL. Folding of Dc in the presence of GroEL and ATP involves cycles of binding and release. Because in this system no off-pathway reactions or kinetic traps are involved, a quantitative analysis of the reactivation kinetics observed is possible. Our results indicate that the association reaction of Dc and GroEL in the presence of ATP is rather slow, whereas in the absence of ATP association is several orders of magnitude more efficient. Therefore, it seems that ATP functions by inhibiting reassociation rather than promoting release of the bound substrate.
Abstract:
We have prepared a family of peptide fragments of the 64-residue chymotrypsin inhibitor 2, corresponding to its progressive elongation from the N terminus. The growing polypeptide chain has little tendency to form stable structure until it is largely synthesized, and what structures are formed are nonnative and lack, in particular, the native secondary structural elements of alpha-helix and beta-sheet. These elements then develop as sufficient tertiary interactions are made in the nearly full-length chain. The growth of structure in the small module is highly cooperative and does not result from the hierarchical accretion of substructures.
Abstract:
Trick Rider is a book-length poem in four sections, which uses characteristics of the epic and gothic, as well as strategies of chance operations, to explore the compositional process in relation to time, how time is experienced during the writing process and is communicated through the text as an object and through the process of reading. The polyphonic speaker of Trick Rider is a stunt double and experiences doubling, being both representative of and an outsider to the community she channels; this tension is simultaneously cause and effect of the text.
Abstract:
Robert Kennedy's announcement of the assassination of Martin Luther King, Jr., in an Indianapolis urban community that did not revolt in riots on April 4, 1968, provides one significant example in which feelings, energy, and bodily risk resonate alongside the articulated message. The relentless focus on Kennedy's spoken words, in historical biographies and other critical research, presents a problem of isolated effect because the power really comes from elements outside the speech act. Thus, this project embraces the complexities of rhetorical effectivity, which involves such things as the unique situational context, all participants (both Kennedy and his audience) of the speech act, aesthetic argument, and the ethical implications. This version of the story embraces the many voices of the participants through firsthand interviews and new oral history reports. Using evidence provided from actual participants in the 1968 Indianapolis event, this project reflects critically upon the world disclosure of the event as it emerges from those remembrances. Phenomenology provides one answer to the constitutive dilemma of rhetorical effectivity that stems from a lack of a framework that gets at questions of ethics, aesthetics, feelings, energy, etc. Thus, this work takes a pedagogical shift away from discourse (verbal/written) as the primary place to render judgments about the effects of communication interaction. With a turn to explore extra-sensory reasoning, by way of the physical, emotional, and numinous, a multi-dimensional look at public address is delivered. The rhetorician will be interested in new ways of assessing effects. The communication ethicist will appreciate the work as concepts like answerability, emotional-volitional tone, and care for the other, come to life via application and consideration of Kennedy's appearance. For argumentation scholars, the interest comes forth in a re-thinking of how we do argumentation. 
And the critical cultural scholar will find this story ripe with opportunities to uncover the politics of representation, racialized discourse, privilege, power, ideological hegemony, and reconciliation. Through an approach of multiple layers, this real-life tale will expose the power of the presence among audience and speaker, emotive argument, as well as the magical turn of fate, which all contribute to the possibility of a dialogic rhetoric.
Abstract:
The contribution of music in the field of the human sciences has been increasingly valued by the health sciences in recent decades, fostering links between Speech-Language Pathology and Audiology and Music Therapy. The assessment of musical perception seeks to understand basic principles such as the discrimination of timbres, melodies, rhythms, intensity, pitch, note duration, and density, among others, as well as inherent knowledge related to hearing and musical experiences over the course of life. The aim of this study was to develop a computerized test to assess the recognition of traditional Brazilian melodies and to verify the performance of children with normal hearing on this instrument. A test named Avaliação do Reconhecimento de Melodias Tradicionais em Crianças normo-ouvintes (ARMTC; Assessment of Traditional Melody Recognition in Normal-Hearing Children) was developed in website format, comprising 15 traditional melodies of Brazilian culture, recorded with a synthesized piano timbre and standardized with variable tempos, similar intensities, key according to the score used, 12 seconds of playback per melody, and four-second pauses between melodies. The sample consisted of 155 children of both sexes, aged eight to 11 years, with hearing thresholds within normal limits at frequencies from 500 Hz to 4000 Hz and a type A tympanometric curve. All children underwent audiological screening (at 500 Hz, 1 kHz, 2 kHz, and 4 kHz), tympanometry, and the ARMTC. The ARMTC was administered in free field at an intensity of 65 dB HL, with the loudspeaker positioned at 0° azimuth, one meter from the participant, who remained seated. The children were instructed to click, on the notebook screen, the icon corresponding to the name and illustration of the melody they heard, and to continue in this way until all 15 melodies had been presented. 
For most of the selected melodies, there was no significant difference in the number of errors/correct responses or in reaction time when these variables were correlated with sex, age, and the location where the test was administered. The most recognized melodies were Cai, cai balão; Boi da cara preta, which had the same score as Caranguejo; Escravos de Jó; O cravo; Parabéns a você; and Marcha soldado, all of which were recognized with more than 70% correct responses; the least recognized melody was Capelinha de melão.
Abstract:
Nonnative aquatic species are invasive worldwide. These species adversely affect natural aquatic ecosystems in a variety of ways and can negatively affect agriculture, recreation, and industry. This study addresses the identification and control of aquatic plant species of concern in Colorado State Parks. Seventeen species were identified as potential threats to the parks, and safe, effective chemical control methodologies were determined for each species. A matrix was developed to include the plants, appropriate chemical controls, and the type of aquatic habitat where chemical use would be safe and effective. The matrix and recommendations for its use will be provided to the Colorado Division of Parks and Outdoor Recreation to develop a management plan under Section 1204 of the National Invasive Species Act.
Abstract:
Three-page manuscript copy of the salutatory address composed in Latin by graduate Jonathan Trumbull for the 1759 Harvard Commencement. The item is dated June 29, 1759.
Abstract:
This hardcover modern binding contains a twenty-page manuscript copy of the salutatory address given by Elisha Cooke at the 1697 Harvard College Commencement. The text includes edits and struck-through words. A one-page copy of the first page of the oration signed by Thomas Banister and William Phips is at the end of the volume.
Abstract:
Four-page manuscript copy of the valedictory Commencement oration composed by Jonathan Trumbull for the 1762 Harvard College Commencement.