15 results for cross-language speaker recognition

in National Center for Biotechnology Information - NCBI


Relevance:

100.00%

Publisher:

Abstract:

As the telecommunications industry evolves over the next decade to provide the products and services that people will desire, several key technologies will become commonplace. Two of these, automatic speech recognition and text-to-speech synthesis, will provide users with more freedom on when, where, and how they access information. While these technologies are currently in their infancy, their capabilities are rapidly increasing and their deployment in today's telephone network is expanding. The economic impact of just one application, the automation of operator services, is well over $100 million per year. Yet there still are many technical challenges that must be resolved before these technologies can be deployed ubiquitously in products and services throughout the worldwide telephone network. These challenges include: (i) High level of accuracy. The technology must be perceived by the user as highly accurate, robust, and reliable. (ii) Easy to use. Speech is only one of several possible input/output modalities for conveying information between a human and a machine, much like a computer terminal or Touch-Tone pad on a telephone. It is not the final product. Therefore, speech technologies must be hidden from the user. That is, the burden of using the technology must be on the technology itself. (iii) Quick prototyping and development of new products and services. The technology must support the creation of new products and services based on speech in an efficient and timely fashion. In this paper I present a vision of the voice-processing industry with a focus on the areas with the broadest base of user penetration: speech recognition, text-to-speech synthesis, natural language processing, and speaker recognition technologies. The current and future applications of these technologies in the telecommunications industry will be examined in terms of their strengths, limitations, and the degree to which user needs have been or have yet to be met. 
Although noteworthy gains have been made in areas with potentially small user bases and in the more mature speech-coding technologies, these subjects are outside the scope of this paper.

Relevance:

100.00%

Publisher:

Abstract:

This paper predicts speech synthesis, speech recognition, and speaker recognition technology for the year 2001, and it describes the most important research problems to be solved in order to arrive at these ultimate synthesis and recognition systems. The problems for speech synthesis include natural and intelligible voice production, prosody control based on meaning, capability of controlling synthesized voice quality and choosing individual speaking style, multilingual and multidialectal synthesis, choice of application-oriented speaking styles, capability of adding emotion, and synthesis from concepts. The problems for speech recognition include robust recognition against speech variations, adaptation/normalization to variations due to environmental conditions and speakers, automatic knowledge acquisition for acoustic and linguistic modeling, spontaneous speech recognition, naturalness and ease of human-machine interaction, and recognition of emotion. The problems for speaker recognition are similar to those for speech recognition. The research topics related to all these techniques include the use of articulatory and perceptual constraints and evaluation methods for measuring the quality of technology and systems.

Relevance:

40.00%

Publisher:

Abstract:

Speech recognition involves three processes: extraction of acoustic indices from the speech signal, estimation of the probability that the observed index string was caused by a hypothesized utterance segment, and determination of the recognized utterance via a search among hypothesized alternatives. This paper is not concerned with the first process. Estimation of the probability of an index string involves a model of index production by any given utterance segment (e.g., a word). Hidden Markov models (HMMs) are used for this purpose [Makhoul, J. & Schwartz, R. (1995) Proc. Natl. Acad. Sci. USA 92, 9956-9963]. Their parameters are state transition probabilities and output probability distributions associated with the transitions. The Baum algorithm that obtains the values of these parameters from speech data via their successive reestimation will be described in this paper. The recognizer wishes to find the most probable utterance that could have caused the observed acoustic index string. That probability is the product of two factors: the probability that the utterance will produce the string and the probability that the speaker will wish to produce the utterance (the language model probability). Even if the vocabulary size is moderate, it is impossible to search for the utterance exhaustively. One practical algorithm is described [Viterbi, A. J. (1967) IEEE Trans. Inf. Theory IT-13, 260-267] that, given the index string, has a high likelihood of finding the most probable utterance.
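
The search step described in this abstract can be illustrated with a minimal Viterbi sketch. It is an illustration only, not the paper's implementation: the two states, the three acoustic indices, and all probabilities below are hypothetical toy values, and output distributions are attached to states rather than to transitions (the abstract describes the latter) to keep the example short. Log-probabilities are used so that long index strings do not underflow.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log-probability, state path) of the most probable state sequence."""
    # Each column maps state -> (best log-prob of a path ending there, predecessor).
    columns = [{
        s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
        for s in states
    }]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            score, prev = max(
                (columns[-1][p][0] + math.log(trans_p[p][s]), p) for p in states
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        columns.append(column)

    # Backtrack from the best final state.
    last = max(states, key=lambda s: columns[-1][s][0])
    log_prob = columns[-1][last][0]
    path = [last]
    for column in reversed(columns[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return log_prob, path

# Hypothetical toy model: two hidden states, three acoustic indices.
states = ("A", "B")
start_p = {"A": 0.6, "B": 0.4}
trans_p = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit_p = {
    "A": {"x": 0.5, "y": 0.4, "z": 0.1},
    "B": {"x": 0.1, "y": 0.3, "z": 0.6},
}

log_prob, path = viterbi(("x", "y", "z"), states, start_p, trans_p, emit_p)
print(path, math.exp(log_prob))  # path is ['A', 'A', 'B'], probability about 0.01512
```

In a full recognizer the start and transition scores would also carry the language model probability mentioned in the abstract; this sketch omits that factor.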

Relevance:

30.00%

Publisher:

Abstract:

In three experiments, electric brain waves of 19 subjects were recorded under several different experimental conditions for two purposes. One was to test how well we could recognize which sentence, from a set of 24 or 48 sentences, was being processed in the cortex. The other was to study the invariance of brain waves between subjects. As in our earlier work, the analysis consisted of averaging over trials to create prototypes and test samples, to both of which Fourier transforms were applied, followed by filtering and an inverse transformation to the time domain. A least-squares criterion of fit between prototypes and test samples was used for classification. In all three experiments, averaging over subjects improved the recognition rates. The most significant finding was the following. When brain waves were averaged separately for two nonoverlapping groups of subjects, one for prototypes and the other for test samples, we were able to recognize correctly 90% of the brain waves generated by 48 different sentences about European geography.
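
The analysis pipeline named in this abstract (average over trials, Fourier transform, filter, inverse transform, least-squares match against prototypes) can be sketched roughly as follows. Everything concrete here is an assumption made for illustration: the synthetic sine-wave "recordings", the labels, the simple band-pass filter, and the function names are hypothetical and are not taken from the paper, which does not specify its filter in the abstract.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (fine for short toy signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse transform back to the time domain, keeping the real part."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def bandpass(x, lo, hi):
    """Zero every frequency bin outside lo..hi (and its mirror), then invert."""
    n = len(x)
    X = dft(x)
    kept = [X[k] if lo <= min(k, n - k) <= hi else 0 for k in range(n)]
    return idft(kept)

def average(trials):
    """Point-by-point mean over a list of equal-length trials."""
    return [sum(col) / len(trials) for col in zip(*trials)]

def classify(test_trials, prototypes, lo, hi):
    """Return the prototype label with the smallest sum of squared differences."""
    sample = bandpass(average(test_trials), lo, hi)
    def squared_error(label):
        return sum((a - b) ** 2 for a, b in zip(sample, prototypes[label]))
    return min(prototypes, key=squared_error)

# Synthetic stand-in data: each "sentence" is a sine wave at a different
# frequency bin, plus a higher-frequency nuisance component.
n = 32
def trial(freq_bin, noise_phase):
    return [math.sin(2 * math.pi * freq_bin * t / n)
            + 0.5 * math.sin(2 * math.pi * 10 * t / n + noise_phase)
            for t in range(n)]

prototypes = {
    "sentence_1": bandpass(average([trial(2, 0.0), trial(2, 1.0)]), 1, 5),
    "sentence_2": bandpass(average([trial(3, 0.0), trial(3, 1.0)]), 1, 5),
}
label = classify([trial(3, 2.0), trial(3, 3.0)], prototypes, 1, 5)
print(label)  # sentence_2
```

The prototype and test trials are kept disjoint here, loosely mirroring the abstract's use of separate groups for prototypes and test samples.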

Relevance:

30.00%

Publisher:

Abstract:

The immunodominant, CD8+ cytotoxic T lymphocyte (CTL) response to the HLA-B8-restricted peptide, RAKFKQLL, located in the Epstein–Barr virus immediate-early antigen, BZLF1, is characterized by a diverse T cell receptor (TCR) repertoire. Here, we show that this diversity can be partitioned on the basis of crossreactive cytotoxicity patterns involving the recognition of a self peptide—RSKFRQIV—located in a serine/threonine kinase and a bacterial peptide—RRKYKQII—located in Staphylococcus aureus replication initiation protein. Thus CTL clones that recognized the viral, self, and bacterial peptides expressed a highly restricted αβ TCR phenotype. The CTL clones that recognized viral and self peptides were more oligoclonal, whereas clones that strictly recognized the viral peptide displayed a diverse TCR profile. Interestingly, the self and bacterial peptides equally were substantially less effective than the cognate viral peptide in sensitizing target cell lysis, and also resulted only in a weak reactivation of memory CTLs in limiting dilution assays, whereas the cognate peptide was highly immunogenic. The described crossreactions show that human antiviral, CD8+ CTL responses can be shaped by peptide ligands derived from autoantigens and environmental bacterial antigens, thereby providing a firm structural basis for molecular mimicry involving class I-restricted CTLs in the pathogenesis of autoimmune disease.

Relevance:

30.00%

Publisher:

Abstract:

Electrical and magnetic brain waves of two subjects were recorded for the purpose of recognizing which one of 12 sentences or seven words auditorily presented was processed. The analysis consisted of averaging over trials to create prototypes and test samples, to each of which a Fourier transform was applied, followed by filtering and an inverse transformation to the time domain. The filters used were optimal predictive filters, selected for each subject. A still further improvement was obtained by taking differences between recordings of two electrodes to obtain bipolar pairs that then were used for the same analysis. Recognition rates, based on a least-squares criterion, varied, but the best were above 90%. The first words of prototypes of sentences also were cut and pasted to test, at least partially, the invariance of a word’s brain wave in different sentence contexts. The best result was above 80% correct recognition. Test samples made up only of individual trials also were analyzed. The best result was 134 correct of 288 (47%), which is promising, given that the expected recognition number by chance is just 24 (or 8.3%). The work reported in this paper extends our earlier work on brain-wave recognition of words only. The recognition rates reported here further strengthen the case that recordings of electric brain waves of words or sentences, together with extensive mathematical and statistical analysis, can be the basis of new developments in our understanding of brain processing of language.

Relevance:

30.00%

Publisher:

Abstract:

Hepatitis B virus (HBV) infection is thought to be controlled by virus-specific cytotoxic T lymphocytes (CTL). We have recently shown that HBV-specific CTL can abolish HBV replication noncytopathically in the liver of transgenic mice by secreting tumor necrosis factor alpha (TNF-alpha) and interferon gamma (IFN-gamma) after antigen recognition. We now demonstrate that hepatocellular HBV replication is also abolished noncytopathically during lymphocytic choriomeningitis virus (LCMV) infection, and we show that this process is mediated by TNF-alpha and IFN-alpha/beta produced by LCMV-infected hepatic macrophages. These results confirm the ability of these inflammatory cytokines to abolish HBV replication; they elucidate the mechanism likely to be responsible for clearance of HBV in chronically infected patients who become superinfected by other hepatotropic viruses; they suggest that pharmacological activation of intrahepatic macrophages may have therapeutic value in chronic HBV infection; and they raise the possibility that conceptually similar events may be operative in other viral infections as well.

Relevance:

30.00%

Publisher:

Abstract:

Optimism is growing that the near future will witness rapid growth in human-computer interaction using voice. System prototypes have recently been built that demonstrate speaker-independent real-time speech recognition, and understanding of naturally spoken utterances with vocabularies of 1000 to 2000 words, and larger. Already, computer manufacturers are building speech recognition subsystems into their new product lines. However, before this technology can be broadly useful, a substantial knowledge base is needed about human spoken language and performance during computer-based spoken interaction. This paper reviews application areas in which spoken interaction can play a significant role, assesses potential benefits of spoken interaction with machines, and compares voice with other modalities of human-computer interaction. It also discusses information that will be needed to build a firm empirical foundation for the design of future spoken and multimodal interfaces. Finally, it argues for a more systematic and scientific approach to investigating spoken input and performance with future language technology.

Relevance:

30.00%

Publisher:

Abstract:

In the past decade, tremendous advances in the state of the art of automatic speech recognition by machine have taken place. A reduction in the word error rate by more than a factor of 5 and an increase in recognition speeds by several orders of magnitude (brought about by a combination of faster recognition search algorithms and more powerful computers), have combined to make high-accuracy, speaker-independent, continuous speech recognition for large vocabularies possible in real time, on off-the-shelf workstations, without the aid of special hardware. These advances promise to make speech recognition technology readily available to the general public. This paper focuses on the speech recognition advances made through better speech modeling techniques, chiefly through more accurate mathematical modeling of speech sounds.

Relevance:

30.00%

Publisher:

Abstract:

This paper provides an overview of the colloquium's discussion session on natural language understanding, which followed presentations by M. Bates [Bates, M. (1995) Proc. Natl. Acad. Sci. USA 92, 9977-9982] and R. C. Moore [Moore, R. C. (1995) Proc. Natl. Acad. Sci. USA 92, 9983-9988]. The paper reviews the dual role of language processing in providing understanding of the spoken input and an additional source of constraint in the recognition process. To date, language processing has successfully provided understanding but has provided only limited (and computationally expensive) constraint. As a result, most current systems use a loosely coupled, unidirectional interface, such as N-best or a word network, with natural language constraints as a postprocess, to filter or resort the recognizer output. However, the level of discourse context provides significant constraint on what people can talk about and how things can be referred to; when the system becomes an active participant, it can influence this order. But sources of discourse constraint have not been extensively explored, in part because these effects can only be seen by studying systems in the context of their use in interactive problem solving. This paper argues that we need to study interactive systems to understand what kinds of applications are appropriate for the current state of technology and how the technology can move from the laboratory toward real applications.

Relevance:

30.00%

Publisher:

Abstract:

The integration of speech recognition with natural language understanding raises issues of how to adapt natural language processing to the characteristics of spoken language; how to cope with errorful recognition output, including the use of natural language information to reduce recognition errors; and how to use information from the speech signal, beyond just the sequence of words, as an aid to understanding. This paper reviews current research addressing these questions in the Spoken Language Program sponsored by the Advanced Research Projects Agency (ARPA). I begin by reviewing some of the ways that spontaneous spoken language differs from standard written language and discuss methods of coping with the difficulties of spontaneous speech. I then look at how systems cope with errors in speech recognition and at attempts to use natural language information to reduce recognition errors. Finally, I discuss how prosodic information in the speech signal might be used to improve understanding.

Relevance:

30.00%

Publisher:

Abstract:

Speech interface technology, which includes automatic speech recognition, synthetic speech, and natural language processing, is beginning to have a significant impact on business and personal computer use. Today, powerful and inexpensive microprocessors and improved algorithms are driving commercial applications in computer command, consumer, data entry, speech-to-text, telephone, and voice verification. Robust speaker-independent recognition systems for command and navigation in personal computers are now available; telephone-based transaction and database inquiry systems using both speech synthesis and recognition are coming into use. Large-vocabulary speech interface systems for document creation and read-aloud proofing are expanding beyond niche markets. Today's applications represent a small preview of a rich future for speech interface technology that will eventually replace keyboards with microphones and loud-speakers to give easy accessibility to increasingly intelligent machines.

Relevance:

30.00%

Publisher:

Abstract:

The deployment of systems for human-to-machine communication by voice requires overcoming a variety of obstacles that affect the speech-processing technologies. Problems encountered in the field might include variation in speaking style, acoustic noise, ambiguity of language, or confusion on the part of the speaker. The diversity of these practical problems encountered in the "real world" leads to the perceived gap between laboratory and "real-world" performance. To answer the question "What applications can speech technology support today?" the concept of the "degree of difficulty" of an application is introduced. The degree of difficulty depends not only on the demands placed on the speech recognition and speech synthesis technologies but also on the expectations of the user of the system. Experience has shown that deployment of effective speech communication systems requires an iterative process. This paper discusses general deployment principles, which are illustrated by several examples of human-machine communication systems.

Relevance:

30.00%

Publisher:

Abstract:

Because of variations in tRNA sequences in evolution, tRNA synthetases either do not acylate their cognate tRNAs from other organisms or execute misacylations which can be deleterious in vivo. We report here the cloning and primary sequence of a 958-aa Saccharomyces cerevisiae alanyl-tRNA synthetase. The enzyme is a close homologue of the human and Escherichia coli enzymes, particularly in the region of the primary structure needed for aminoacylation of RNA duplex substrates based on alanine tRNA acceptor stems with a G3.U70 base pair. An ala1 disrupted allele demonstrated that the gene is essential and that, therefore, ALA1 encodes an enzyme required for cytoplasmic protein synthesis. Growth of cells harboring the ala1 disrupted allele was restored by a cDNA clone encoding human alanyl-tRNA synthetase, which is a serum antigen for many polymyositis-afflicted individuals. The human enzyme in extracts from rescued yeast was detected with autoimmune antibodies from a polymyositis patient. We conclude that, in spite of substantial differences between human and yeast tRNA sequences in evolution, strong conservation of the G3.U70 system of recognition is sufficient to yield accurate aminoacylation in vivo across wide species distances.