17 results for Speech Recognition System using LPC
in National Center for Biotechnology Information - NCBI
Abstract:
This paper predicts speech synthesis, speech recognition, and speaker recognition technology for the year 2001, and it describes the most important research problems to be solved in order to arrive at these ultimate synthesis and recognition systems. The problems for speech synthesis include natural and intelligible voice production, prosody control based on meaning, capability of controlling synthesized voice quality and choosing individual speaking style, multilingual and multidialectal synthesis, choice of application-oriented speaking styles, capability of adding emotion, and synthesis from concepts. The problems for speech recognition include robust recognition against speech variations, adaptation/normalization to variations due to environmental conditions and speakers, automatic knowledge acquisition for acoustic and linguistic modeling, spontaneous speech recognition, naturalness and ease of human-machine interaction, and recognition of emotion. The problems for speaker recognition are similar to those for speech recognition. The research topics related to all these techniques include the use of articulatory and perceptual constraints and evaluation methods for measuring the quality of technology and systems.
Abstract:
The high incidence of neurological disorders in patients afflicted with acquired immunodeficiency syndrome (AIDS) may result from human immunodeficiency virus type 1 (HIV-1) induction of chemotactic signals and cytokines within the brain by virus-encoded gene products. Transforming growth factor beta1 (TGF-beta1) is an immunomodulator and potent chemotactic molecule present at elevated levels in HIV-1-infected patients, and its expression may thus be induced by viral trans-activating proteins such as Tat. In this report, a replication-defective herpes simplex virus (HSV)-1 tat gene transfer vector, dSTat, was used to transiently express HIV-1 Tat in glial cells in culture and following intracerebral inoculation in mouse brain in order to directly determine whether Tat can increase TGF-beta1 mRNA expression. dSTat infection of Vero cells transiently transfected by a panel of HIV-1 long terminal repeat deletion mutants linked to the bacterial chloramphenicol acetyltransferase reporter gene demonstrated that vector-expressed Tat activated the long terminal repeat in a trans-activation response element-dependent fashion independent of the HSV-mediated induction of the HIV-1 enhancer, or NF-kappaB domain. Northern blot analysis of human astrocytic glial U87-MG cells transfected by dSTat vector DNA resulted in a substantial increase in steady-state levels of TGF-beta1 mRNA. Furthermore, intracerebral inoculation of dSTat followed by Northern blot analysis of whole mouse brain RNA revealed an increase in levels of TGF-beta1 mRNA similar to that observed in cultured glial cells transfected by dSTat DNA. These results provided direct in vivo evidence for the involvement of HIV-1 Tat in activation of TGF-beta1 gene expression in brain. Tat-mediated stimulation of TGF-beta1 expression suggests a novel pathway by which HIV-1 may alter the expression of cytokines in the central nervous system, potentially contributing to the development of AIDS-associated neurological disease.
Abstract:
This paper introduces the session on advanced speech recognition technology. The two papers comprising this session argue that current technology yields a performance that is only an order of magnitude in error rate away from human performance and that incremental improvements will bring us to that desired level. I argue that, to the contrary, present performance is far removed from human performance and a revolution in our thinking is required to achieve the goal. It is further asserted that to bring about the revolution more effort should be expended on basic research and less on trying to prematurely commercialize a deficient technology.
Abstract:
In the past decade, tremendous advances in the state of the art of automatic speech recognition by machine have taken place. A reduction in the word error rate by more than a factor of 5 and an increase in recognition speeds by several orders of magnitude (brought about by a combination of faster recognition search algorithms and more powerful computers) have combined to make high-accuracy, speaker-independent, continuous speech recognition for large vocabularies possible in real time, on off-the-shelf workstations, without the aid of special hardware. These advances promise to make speech recognition technology readily available to the general public. This paper focuses on the speech recognition advances made through better speech modeling techniques, chiefly through more accurate mathematical modeling of speech sounds.
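The word error rate cited above is conventionally computed from the word-level edit distance between the recognizer's output and a reference transcript, normalized by the reference length. A minimal sketch in Python (the example sentences are invented for illustration):

```python
# Word error rate: (substitutions + deletions + insertions) / reference length,
# computed via Levenshtein edit distance over words. Example strings are invented.

def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j]: edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("speech recognition works in real time",
                      "speech recognition worked real time"))  # 2/6, about 0.33
```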
Abstract:
Speech recognition involves three processes: extraction of acoustic indices from the speech signal, estimation of the probability that the observed index string was caused by a hypothesized utterance segment, and determination of the recognized utterance via a search among hypothesized alternatives. This paper is not concerned with the first process. Estimation of the probability of an index string involves a model of index production by any given utterance segment (e.g., a word). Hidden Markov models (HMMs) are used for this purpose [Makhoul, J. & Schwartz, R. (1995) Proc. Natl. Acad. Sci. USA 92, 9956-9963]. Their parameters are state transition probabilities and output probability distributions associated with the transitions. The Baum algorithm that obtains the values of these parameters from speech data via their successive reestimation will be described in this paper. The recognizer wishes to find the most probable utterance that could have caused the observed acoustic index string. That probability is the product of two factors: the probability that the utterance will produce the string and the probability that the speaker will wish to produce the utterance (the language model probability). Even if the vocabulary size is moderate, it is impossible to search for the utterance exhaustively. One practical algorithm is described [Viterbi, A. J. (1967) IEEE Trans. Inf. Theory IT-13, 260-267] that, given the index string, has a high likelihood of finding the most probable utterance.
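A minimal sketch of the Viterbi search described above, applied to a toy discrete-output HMM; the states, transition probabilities, and output probabilities here are illustrative assumptions, not taken from the paper:

```python
# Minimal Viterbi decoder for a discrete-output HMM (probabilities are
# multiplied directly for clarity; a production decoder would sum
# log-probabilities to avoid numerical underflow).

def viterbi(obs, states, start_p, trans_p, emit_p):
    # delta[t][s]: probability of the best state path ending in s at time t
    delta = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    backptr = [{}]
    for t in range(1, len(obs)):
        delta.append({})
        backptr.append({})
        for s in states:
            # Best predecessor state r for landing in s at time t
            r_best = max(states, key=lambda r: delta[t - 1][r] * trans_p[r][s])
            delta[t][s] = delta[t - 1][r_best] * trans_p[r_best][s] * emit_p[s][obs[t]]
            backptr[t][s] = r_best
    # Trace back the most probable state sequence
    last = max(states, key=lambda s: delta[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    return list(reversed(path)), delta[-1][last]

# Toy model: two acoustic states emitting two acoustic index symbols
states = ("A", "B")
start_p = {"A": 0.6, "B": 0.4}
trans_p = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit_p = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}}
print(viterbi(["x", "y", "y"], states, start_p, trans_p, emit_p))
```

In a full recognizer, each path score would also be weighted by the language model probability of the hypothesized utterance, and the maximization would run over lattices of word hypotheses rather than a two-state toy model.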
Abstract:
Optimism is growing that the near future will witness rapid growth in human-computer interaction using voice. System prototypes have recently been built that demonstrate speaker-independent, real-time speech recognition and understanding of naturally spoken utterances with vocabularies of 1000 to 2000 words and larger. Already, computer manufacturers are building speech recognition subsystems into their new product lines. However, before this technology can be broadly useful, a substantial knowledge base is needed about human spoken language and performance during computer-based spoken interaction. This paper reviews application areas in which spoken interaction can play a significant role, assesses potential benefits of spoken interaction with machines, and compares voice with other modalities of human-computer interaction. It also discusses information that will be needed to build a firm empirical foundation for the design of future spoken and multimodal interfaces. Finally, it argues for a more systematic and scientific approach to investigating spoken input and performance with future language technology.
Abstract:
Speech interface technology, which includes automatic speech recognition, synthetic speech, and natural language processing, is beginning to have a significant impact on business and personal computer use. Today, powerful and inexpensive microprocessors and improved algorithms are driving commercial applications in computer command, consumer, data entry, speech-to-text, telephone, and voice verification. Robust speaker-independent recognition systems for command and navigation in personal computers are now available; telephone-based transaction and database inquiry systems using both speech synthesis and recognition are coming into use. Large-vocabulary speech interface systems for document creation and read-aloud proofing are expanding beyond niche markets. Today's applications represent a small preview of a rich future for speech interface technology that will eventually replace keyboards with microphones and loudspeakers to give easy accessibility to increasingly intelligent machines.
Abstract:
How a reacting system climbs through a transition state during the course of a reaction has been an intriguing subject for decades. Here we present and quantify a technique to identify and characterize local invariances about the transition state of an N-particle Hamiltonian system, using Lie canonical perturbation theory combined with microcanonical molecular dynamics simulation. We show that at least three distinct energy regimes of dynamical behavior occur in the region of the transition state, distinguished by the extent of their local dynamical invariance and regularity. Isomerization of a six-atom Lennard–Jones cluster illustrates this: up to energies high enough to make the system manifestly chaotic, approximate invariants of motion associated with a reaction coordinate in phase space imply a many-body dividing hypersurface in phase space that is free of recrossings even in a sea of chaos. The method makes it possible to visualize the stable and unstable invariant manifolds leading to and from the transition state, i.e., the reaction path in phase space, and how this regularity turns to chaos with increasing total energy of the system. This, in turn, illuminates a new type of phase space bottleneck in the region of a transition state that emerges as the total energy and mode coupling increase, which keeps a reacting system increasingly trapped in that region.
Abstract:
Successful cryopreservation of most multicompartmental biological systems has not been achieved. One prerequisite for success is quantitative information on cryoprotectant permeation into and amongst the compartments. This report describes direct measurements of cryoprotectant permeation into a multicompartmental system using chemical shift selective magnetic resonance (MR) microscopy and MR spectroscopy. We used the developing zebrafish embryo as a model for studying these complex systems because these embryos are composed of two membrane-limited compartments: (i) a large yolk (surrounded by the yolk syncytial layer) and (ii) differentiating blastoderm cells (each surrounded by a plasma membrane). MR images of the spatial distribution of three cryoprotectants (dimethyl sulfoxide, propylene glycol, and methanol) demonstrated that methanol permeated the entire embryo within 15 min. In contrast, the other cryoprotectants exhibited little or no permeation over 2.5 h. MR spectroscopy and microinjections of cryoprotectants into the yolk indicated that the yolk syncytial layer plays a critical role in limiting the permeation of some cryoprotectants throughout the embryo. This study demonstrates the power of MR technology combined with micromanipulation for elucidating key physiological factors in cryobiology.
Abstract:
The scientific bases for human-machine communication by voice are in the fields of psychology, linguistics, acoustics, signal processing, computer science, and integrated circuit technology. The purpose of this paper is to highlight the basic scientific and technological issues in human-machine communication by voice and to point out areas of future research opportunity. The discussion is organized around the following major issues in implementing human-machine voice communication systems: (i) hardware/software implementation of the system, (ii) speech synthesis for voice output, (iii) speech recognition and understanding for voice input, and (iv) usability factors related to how humans interact with machines.
Abstract:
Computer speech synthesis has reached a high level of performance, with increasingly sophisticated models of linguistic structure, low error rates in text analysis, and high intelligibility in synthesis from phonemic input. Mass market applications are beginning to appear. However, the results are still not good enough for the ubiquitous application that such technology will eventually have. A number of alternative directions of current research aim at the ultimate goal of fully natural synthetic speech. One especially promising trend is the systematic optimization of large synthesis systems with respect to formal criteria of evaluation. Speech recognition has progressed rapidly in the past decade through such approaches, and it seems likely that their application in synthesis will produce similar improvements.
Abstract:
The integration of speech recognition with natural language understanding raises issues of how to adapt natural language processing to the characteristics of spoken language; how to cope with errorful recognition output, including the use of natural language information to reduce recognition errors; and how to use information from the speech signal, beyond just the sequence of words, as an aid to understanding. This paper reviews current research addressing these questions in the Spoken Language Program sponsored by the Advanced Research Projects Agency (ARPA). I begin by reviewing some of the ways that spontaneous spoken language differs from standard written language and discuss methods of coping with the difficulties of spontaneous speech. I then look at how systems cope with errors in speech recognition and at attempts to use natural language information to reduce recognition errors. Finally, I discuss how prosodic information in the speech signal might be used to improve understanding.
Abstract:
As the telecommunications industry evolves over the next decade to provide the products and services that people will desire, several key technologies will become commonplace. Two of these, automatic speech recognition and text-to-speech synthesis, will provide users with more freedom on when, where, and how they access information. While these technologies are currently in their infancy, their capabilities are rapidly increasing and their deployment in today's telephone network is expanding. The economic impact of just one application, the automation of operator services, is well over $100 million per year. Yet there still are many technical challenges that must be resolved before these technologies can be deployed ubiquitously in products and services throughout the worldwide telephone network. These challenges include: (i) High level of accuracy. The technology must be perceived by the user as highly accurate, robust, and reliable. (ii) Ease of use. Speech is only one of several possible input/output modalities for conveying information between a human and a machine, much like a computer terminal or Touch-Tone pad on a telephone. It is not the final product. Therefore, speech technologies must be hidden from the user. That is, the burden of using the technology must be on the technology itself. (iii) Quick prototyping and development of new products and services. The technology must support the creation of new products and services based on speech in an efficient and timely fashion. In this paper I present a vision of the voice-processing industry with a focus on the areas with the broadest base of user penetration: speech recognition, text-to-speech synthesis, natural language processing, and speaker recognition technologies. The current and future applications of these technologies in the telecommunications industry will be examined in terms of their strengths, limitations, and the degree to which user needs have been or have yet to be met. Although noteworthy gains have been made in areas with potentially small user bases and in the more mature speech-coding technologies, these subjects are outside the scope of this paper.
Abstract:
Assistive technology involving voice communication is used primarily by people who are deaf, hard of hearing, or who have speech and/or language disabilities. It is also used to a lesser extent by people with visual or motor disabilities. A very wide range of devices has been developed for people with hearing loss. These devices can be categorized not only by the modality of stimulation [i.e., auditory, visual, tactile, or direct electrical stimulation of the auditory nerve (auditory-neural)] but also in terms of the degree of speech processing that is used. At least four such categories can be distinguished: assistive devices (a) that are not designed specifically for speech, (b) that take the average characteristics of speech into account, (c) that process articulatory or phonetic characteristics of speech, and (d) that embody some degree of automatic speech recognition. Assistive devices for people with speech and/or language disabilities typically involve some form of speech synthesis or symbol generation for severe forms of language disability. Speech synthesis is also used in text-to-speech systems for sightless persons. Other applications of assistive technology involving voice communication include voice control of wheelchairs and other devices for people with mobility disabilities.
Abstract:
The deployment of systems for human-to-machine communication by voice requires overcoming a variety of obstacles that affect the speech-processing technologies. Problems encountered in the field might include variation in speaking style, acoustic noise, ambiguity of language, or confusion on the part of the speaker. The diversity of these practical problems encountered in the "real world" leads to the perceived gap between laboratory and "real-world" performance. To answer the question "What applications can speech technology support today?" the concept of the "degree of difficulty" of an application is introduced. The degree of difficulty depends not only on the demands placed on the speech recognition and speech synthesis technologies but also on the expectations of the user of the system. Experience has shown that deployment of effective speech communication systems requires an iterative process. This paper discusses general deployment principles, which are illustrated by several examples of human-machine communication systems.