842 results for speaker diarization


Abstract:

Featured Speaker

Abstract:

Keynote Speaker

Abstract:

Journalism is one of the main sources of topics for public debate and the formation of public opinion, but it depends on a technical system for its transmission. For more than a hundred years, the information produced by the press was emitted, stored, transmitted, and received through the so-called mass media, which use a centralized network characterized by material scarcity, serial production, and massification. This system separates senders and receivers in time and space, creating an unequal power relationship in which large companies controlled the flow of information and decided which facts would be reported as news. In 1995, the internet, whose information circulates over distributed-network technology, was taken up by society, changing how information is produced, stored, and transmitted. The technology raised hopes that this tool could provide more dialogic and democratic communication. Gradually, however, new companies can be seen appropriating the distributed-network technology over which the internet runs, generating a new form of control over the flow of information. This research carried out a literature review to build a critical reflection on the different intermediaries between fact and news in both the centralized and the distributed network, with the aim of prompting a discussion that may offer new ideas for policy, as well as alternatives for more democratic and more libertarian communication.

Abstract:

This paper describes a variety of statistical methods for obtaining precise quantitative estimates of the similarities and differences in the structures of semantic domains in different languages. The methods include comparing mean correlations within and between groups, principal components analysis of interspeaker correlations, and analysis of variance of speaker by question data. Methods for graphical displays of the results are also presented. The methods give convergent results that are mutually supportive and equivalent under suitable interpretation. The methods are illustrated on the semantic domain of emotion terms in a comparison of the semantic structures of native English and native Japanese speaking subjects. We suggest that, in comparative studies concerning the extent to which semantic structures are universally shared or culture-specific, both similarities and differences should be measured and compared rather than placing total emphasis on one or the other polar position.
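
The first method named in this abstract, comparing mean correlations within and between groups, can be illustrated with a minimal sketch. It assumes each speaker is represented by a vector of numeric answers to the same questions; the function names and the toy data layout are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length response vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def mean_correlations(group_a, group_b):
    """Mean interspeaker correlation within each group and between groups.

    Assumes each group holds at least two speakers and that no speaker
    gives constant answers (which would make the correlation undefined).
    """
    def mean(values):
        return sum(values) / len(values)

    within_a = [pearson(x, y) for x, y in combinations(group_a, 2)]
    within_b = [pearson(x, y) for x, y in combinations(group_b, 2)]
    between = [pearson(x, y) for x in group_a for y in group_b]
    return {
        "within_a": mean(within_a),
        "within_b": mean(within_b),
        "between": mean(between),
    }
```

Under these assumptions, a between-group mean close to the within-group means would suggest a largely shared semantic structure, while a clearly lower between-group mean would point to group-specific structure.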

Abstract:

Optimism is growing that the near future will witness rapid growth in human-computer interaction using voice. System prototypes have recently been built that demonstrate speaker-independent real-time speech recognition, and understanding of naturally spoken utterances with vocabularies of 1000 to 2000 words, and larger. Already, computer manufacturers are building speech recognition subsystems into their new product lines. However, before this technology can be broadly useful, a substantial knowledge base is needed about human spoken language and performance during computer-based spoken interaction. This paper reviews application areas in which spoken interaction can play a significant role, assesses potential benefits of spoken interaction with machines, and compares voice with other modalities of human-computer interaction. It also discusses information that will be needed to build a firm empirical foundation for the design of future spoken and multimodal interfaces. Finally, it argues for a more systematic and scientific approach to investigating spoken input and performance with future language technology.

Abstract:

In the past decade, tremendous advances in the state of the art of automatic speech recognition by machine have taken place. A reduction in the word error rate by more than a factor of 5 and an increase in recognition speeds by several orders of magnitude (brought about by a combination of faster recognition search algorithms and more powerful computers), have combined to make high-accuracy, speaker-independent, continuous speech recognition for large vocabularies possible in real time, on off-the-shelf workstations, without the aid of special hardware. These advances promise to make speech recognition technology readily available to the general public. This paper focuses on the speech recognition advances made through better speech modeling techniques, chiefly through more accurate mathematical modeling of speech sounds.

Abstract:

Speech recognition involves three processes: extraction of acoustic indices from the speech signal, estimation of the probability that the observed index string was caused by a hypothesized utterance segment, and determination of the recognized utterance via a search among hypothesized alternatives. This paper is not concerned with the first process. Estimation of the probability of an index string involves a model of index production by any given utterance segment (e.g., a word). Hidden Markov models (HMMs) are used for this purpose [Makhoul, J. & Schwartz, R. (1995) Proc. Natl. Acad. Sci. USA 92, 9956-9963]. Their parameters are state transition probabilities and output probability distributions associated with the transitions. The Baum algorithm that obtains the values of these parameters from speech data via their successive reestimation will be described in this paper. The recognizer wishes to find the most probable utterance that could have caused the observed acoustic index string. That probability is the product of two factors: the probability that the utterance will produce the string and the probability that the speaker will wish to produce the utterance (the language model probability). Even if the vocabulary size is moderate, it is impossible to search for the utterance exhaustively. One practical algorithm is described [Viterbi, A. J. (1967) IEEE Trans. Inf. Theory IT-13, 260-267] that, given the index string, has a high likelihood of finding the most probable utterance.
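
The search step described in this abstract can be sketched as textbook Viterbi decoding over a small HMM. This is an illustration under stated assumptions, not the paper's recognizer: outputs are attached to states rather than to transitions, all probabilities are assumed to be strictly positive so that logarithms are defined, and the two-state model with symbols `x`, `y`, `z` is a hypothetical toy.

```python
from math import log


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path and its log-probability."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p] + log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            scores[s] = score + log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the backpointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical toy model: two hidden states, three observable symbols.
STATES = ("S1", "S2")
START_P = {"S1": 0.6, "S2": 0.4}
TRANS_P = {"S1": {"S1": 0.7, "S2": 0.3}, "S2": {"S1": 0.4, "S2": 0.6}}
EMIT_P = {
    "S1": {"x": 0.5, "y": 0.4, "z": 0.1},
    "S2": {"x": 0.1, "y": 0.3, "z": 0.6},
}

path, log_prob = viterbi(["x", "y", "z"], STATES, START_P, TRANS_P, EMIT_P)
print(path)  # ['S1', 'S1', 'S2']
```

Working in log-probabilities avoids numerical underflow on long observation strings. In a recognizer, the language model probability mentioned in the abstract would enter as an additional log term in the path score.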

Abstract:

As the telecommunications industry evolves over the next decade to provide the products and services that people will desire, several key technologies will become commonplace. Two of these, automatic speech recognition and text-to-speech synthesis, will provide users with more freedom on when, where, and how they access information. While these technologies are currently in their infancy, their capabilities are rapidly increasing and their deployment in today's telephone network is expanding. The economic impact of just one application, the automation of operator services, is well over $100 million per year. Yet there still are many technical challenges that must be resolved before these technologies can be deployed ubiquitously in products and services throughout the worldwide telephone network. These challenges include: (i) High level of accuracy. The technology must be perceived by the user as highly accurate, robust, and reliable. (ii) Easy to use. Speech is only one of several possible input/output modalities for conveying information between a human and a machine, much like a computer terminal or Touch-Tone pad on a telephone. It is not the final product. Therefore, speech technologies must be hidden from the user. That is, the burden of using the technology must be on the technology itself. (iii) Quick prototyping and development of new products and services. The technology must support the creation of new products and services based on speech in an efficient and timely fashion. In this paper I present a vision of the voice-processing industry with a focus on the areas with the broadest base of user penetration: speech recognition, text-to-speech synthesis, natural language processing, and speaker recognition technologies. The current and future applications of these technologies in the telecommunications industry will be examined in terms of their strengths, limitations, and the degree to which user needs have been or have yet to be met. 
Although noteworthy gains have been made in areas with potentially small user bases and in the more mature speech-coding technologies, these subjects are outside the scope of this paper.

Abstract:

Speech interface technology, which includes automatic speech recognition, synthetic speech, and natural language processing, is beginning to have a significant impact on business and personal computer use. Today, powerful and inexpensive microprocessors and improved algorithms are driving commercial applications in computer command, consumer, data entry, speech-to-text, telephone, and voice verification. Robust speaker-independent recognition systems for command and navigation in personal computers are now available; telephone-based transaction and database inquiry systems using both speech synthesis and recognition are coming into use. Large-vocabulary speech interface systems for document creation and read-aloud proofing are expanding beyond niche markets. Today's applications represent a small preview of a rich future for speech interface technology that will eventually replace keyboards with microphones and loud-speakers to give easy accessibility to increasingly intelligent machines.

Abstract:

The deployment of systems for human-to-machine communication by voice requires overcoming a variety of obstacles that affect the speech-processing technologies. Problems encountered in the field might include variation in speaking style, acoustic noise, ambiguity of language, or confusion on the part of the speaker. The diversity of these practical problems encountered in the "real world" leads to the perceived gap between laboratory and "real-world" performance. To answer the question "What applications can speech technology support today?" the concept of the "degree of difficulty" of an application is introduced. The degree of difficulty depends not only on the demands placed on the speech recognition and speech synthesis technologies but also on the expectations of the user of the system. Experience has shown that deployment of effective speech communication systems requires an iterative process. This paper discusses general deployment principles, which are illustrated by several examples of human-machine communication systems.

Abstract:

This paper predicts speech synthesis, speech recognition, and speaker recognition technology for the year 2001, and it describes the most important research problems to be solved in order to arrive at these ultimate synthesis and recognition systems. The problems for speech synthesis include natural and intelligible voice production, prosody control based on meaning, capability of controlling synthesized voice quality and choosing individual speaking style, multilingual and multidialectal synthesis, choice of application-oriented speaking styles, capability of adding emotion, and synthesis from concepts. The problems for speech recognition include robust recognition against speech variations, adaptation/normalization to variations due to environmental conditions and speakers, automatic knowledge acquisition for acoustic and linguistic modeling, spontaneous speech recognition, naturalness and ease of human-machine interaction, and recognition of emotion. The problems for speaker recognition are similar to those for speech recognition. The research topics related to all these techniques include the use of articulatory and perceptual constraints and evaluation methods for measuring the quality of technology and systems.

Abstract:

Trick Rider is a book-length poem in four sections, which uses characteristics of the epic and gothic, as well as strategies of chance operations, to explore the compositional process in relation to time, how time is experienced during the writing process and is communicated through the text as an object and through the process of reading. The polyphonic speaker of Trick Rider is a stunt double and experiences doubling, being both representative of and an outsider to the community she channels; this tension is simultaneously cause and effect of the text.

Abstract:

Robert Kennedy's announcement of the assassination of Martin Luther King, Jr., in an Indianapolis urban community that did not revolt in riots on April 4, 1968, provides one significant example in which feelings, energy, and bodily risk resonate alongside the articulated message. The relentless focus on Kennedy's spoken words, in historical biographies and other critical research, presents a problem of isolated effect because the power really comes from elements outside the speech act. Thus, this project embraces the complexities of rhetorical effectivity, which involves such things as the unique situational context, all participants (both Kennedy and his audience) of the speech act, aesthetic argument, and the ethical implications. This version of the story embraces the many voices of the participants through first hand interviews and new oral history reports. Using evidence provided from actual participants in the 1968 Indianapolis event, this project reflects critically upon the world disclosure of the event as it emerges from those remembrances. Phenomenology provides one answer to the constitutive dilemma of rhetorical effectivity that stems from a lack of a framework that gets at questions of ethics, aesthetics, feelings, energy, etc. Thus, this work takes a pedagogical shift away from discourse (verbal/written) as the primary place to render judgments about the effects of communication interaction. With a turn to explore extra-sensory reasoning, by way of the physical, emotional, and numinous, a multi-dimensional look at public address is delivered. The rhetorician will be interested in new ways of assessing effects. The communication ethicist will appreciate the work as concepts like answerability, emotional-volitional tone, and care for the other, come to life via application and consideration of Kennedy's appearance. For argumentation scholars, the interest comes forth in a re-thinking of how we do argumentation. 
And the critical cultural scholar will find this story ripe with opportunities to uncover the politics of representation, racialized discourse, privilege, power, ideological hegemony, and reconciliation. Through an approach of multiple layers, this real-life tale will expose the power of presence among audience and speaker, emotive argument, and the magical turn of fate, all of which contribute to the possibility of a dialogic rhetoric.

Abstract:

The contribution of music to the human sciences has been increasingly valued by the health sciences in recent decades, fostering links between Speech-Language Pathology and Audiology and Music Therapy. The assessment of music perception seeks to understand basic principles such as the discrimination of timbres, melodies, rhythms, intensity, pitch, note duration, and density, among others, as well as inherent knowledge related to hearing and musical experiences over the course of life. The aim of this study was to develop a computerized test for assessing the recognition of traditional Brazilian melodies and to verify the performance of children with normal hearing on this instrument. A test was developed, named Avaliação do Reconhecimento de Melodias Tradicionais em Crianças normo-ouvintes (ARMTC; Assessment of the Recognition of Traditional Melodies in Normal-Hearing Children), in website format, comprising 15 traditional melodies of Brazilian culture, recorded with a synthesized piano timbre and standardized with variable tempos, similar intensities, key according to the score used, 12 seconds of playback per melody, and four-second pauses between melodies. The sample consisted of 155 children of both sexes, aged between eight and 11 years, with hearing thresholds at frequencies from 500 Hz to 4000 Hz within normal limits and a type A tympanometric curve. All children underwent audiological screening (frequencies of 500 Hz, 1 kHz, 2 kHz, and 4 kHz), tympanometry, and the ARMTC. The ARMTC was administered in free field at an intensity of 65 dB HL, with the loudspeaker positioned at 0° azimuth, one meter from the participant, who remained seated. The children were instructed to click on the notebook screen on the icon corresponding to the name and illustration of the melody they had heard, and to continue in this way until the end of the 15 melodies presented. For most of the selected melodies, there was no significant difference in the number of errors/correct answers or in reaction time when these variables were correlated with sex, age, and the location where the test was administered. The most recognized melodies were Cai, cai balão; Boi da cara preta, which had the same score as Caranguejo; Escravos de Jó; O cravo; Parabéns a você; and Marcha soldado, all of which were recognized correctly more than 70% of the time. The least recognized melody was Capelinha de melão.

Abstract:

Three-page manuscript copy of the salutatory address composed in Latin by graduate Jonathan Trumbull for the 1759 Harvard Commencement. The item is dated June 29, 1759.