7 results for Speech Recognition System using MFCC

in National Center for Biotechnology Information - NCBI


Relevance:

100.00%

Publisher:

Abstract:

This paper predicts speech synthesis, speech recognition, and speaker recognition technology for the year 2001, and it describes the most important research problems to be solved in order to arrive at these ultimate synthesis and recognition systems. The problems for speech synthesis include natural and intelligible voice production, prosody control based on meaning, capability of controlling synthesized voice quality and choosing individual speaking style, multilingual and multidialectal synthesis, choice of application-oriented speaking styles, capability of adding emotion, and synthesis from concepts. The problems for speech recognition include robust recognition against speech variations, adaptation/normalization to variations due to environmental conditions and speakers, automatic knowledge acquisition for acoustic and linguistic modeling, spontaneous speech recognition, naturalness and ease of human-machine interaction, and recognition of emotion. The problems for speaker recognition are similar to those for speech recognition. The research topics related to all these techniques include the use of articulatory and perceptual constraints and evaluation methods for measuring the quality of technology and systems.

Relevance:

100.00%

Publisher:

Abstract:

The high incidence of neurological disorders in patients afflicted with acquired immunodeficiency syndrome (AIDS) may result from human immunodeficiency virus type 1 (HIV-1) induction of chemotactic signals and cytokines within the brain by virus-encoded gene products. Transforming growth factor beta1 (TGF-beta1) is an immunomodulator and potent chemotactic molecule present at elevated levels in HIV-1-infected patients, and its expression may thus be induced by viral trans-activating proteins such as Tat. In this report, a replication-defective herpes simplex virus (HSV)-1 tat gene transfer vector, dSTat, was used to transiently express HIV-1 Tat in glial cells in culture and following intracerebral inoculation in mouse brain in order to directly determine whether Tat can increase TGF-beta1 mRNA expression. dSTat infection of Vero cells transiently transfected by a panel of HIV-1 long terminal repeat deletion mutants linked to the bacterial chloramphenicol acetyltransferase reporter gene demonstrated that vector-expressed Tat activated the long terminal repeat in a trans-activation response element-dependent fashion independent of the HSV-mediated induction of the HIV-1 enhancer, or NF-kappaB domain. Northern blot analysis of human astrocytic glial U87-MG cells transfected by dSTat vector DNA resulted in a substantial increase in steady-state levels of TGF-beta1 mRNA. Furthermore, intracerebral inoculation of dSTat followed by Northern blot analysis of whole mouse brain RNA revealed an increase in levels of TGF-beta1 mRNA similar to that observed in cultured glial cells transfected by dSTat DNA. These results provided direct in vivo evidence for the involvement of HIV-1 Tat in activation of TGF-beta1 gene expression in brain. Tat-mediated stimulation of TGF-beta1 expression suggests a novel pathway by which HIV-1 may alter the expression of cytokines in the central nervous system, potentially contributing to the development of AIDS-associated neurological disease.

Relevance:

100.00%

Publisher:

Abstract:

This paper introduces the session on advanced speech recognition technology. The two papers comprising this session argue that current technology yields a performance that is only an order of magnitude in error rate away from human performance and that incremental improvements will bring us to that desired level. I argue that, to the contrary, present performance is far removed from human performance and a revolution in our thinking is required to achieve the goal. It is further asserted that to bring about the revolution more effort should be expended on basic research and less on trying to prematurely commercialize a deficient technology.

Relevance:

100.00%

Publisher:

Abstract:

In the past decade, tremendous advances in the state of the art of automatic speech recognition by machine have taken place. A reduction in the word error rate by more than a factor of 5 and an increase in recognition speeds by several orders of magnitude (brought about by a combination of faster recognition search algorithms and more powerful computers) have combined to make high-accuracy, speaker-independent, continuous speech recognition for large vocabularies possible in real time, on off-the-shelf workstations, without the aid of special hardware. These advances promise to make speech recognition technology readily available to the general public. This paper focuses on the speech recognition advances made through better speech modeling techniques, chiefly through more accurate mathematical modeling of speech sounds.
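The word error rate cited above is conventionally computed as the minimum number of word substitutions, insertions, and deletions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. A minimal sketch of that metric in Python (the function name and example sentences are illustrative, not taken from the paper):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference length,
    computed with a standard Levenshtein dynamic program over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# Example: one substitution in a four-word reference gives a 25% word error rate.
print(word_error_rate("the cat sat down", "the cat sat town"))  # 0.25
```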

Relevance:

100.00%

Publisher:

Abstract:

Speech recognition involves three processes: extraction of acoustic indices from the speech signal, estimation of the probability that the observed index string was caused by a hypothesized utterance segment, and determination of the recognized utterance via a search among hypothesized alternatives. This paper is not concerned with the first process. Estimation of the probability of an index string involves a model of index production by any given utterance segment (e.g., a word). Hidden Markov models (HMMs) are used for this purpose [Makhoul, J. & Schwartz, R. (1995) Proc. Natl. Acad. Sci. USA 92, 9956-9963]. Their parameters are state transition probabilities and output probability distributions associated with the transitions. The Baum algorithm that obtains the values of these parameters from speech data via their successive reestimation will be described in this paper. The recognizer wishes to find the most probable utterance that could have caused the observed acoustic index string. That probability is the product of two factors: the probability that the utterance will produce the string and the probability that the speaker will wish to produce the utterance (the language model probability). Even if the vocabulary size is moderate, it is impossible to search for the utterance exhaustively. One practical algorithm is described [Viterbi, A. J. (1967) IEEE Trans. Inf. Theory IT-13, 260-267] that, given the index string, has a high likelihood of finding the most probable utterance.
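As a concrete illustration of the search step described above, the sketch below implements the Viterbi dynamic program for a toy discrete-output HMM; the state count and the transition and emission probabilities are invented for illustration and do not come from the paper. In a real recognizer the observations would be acoustic index strings and the states would belong to word or phone models, with the language model probability folded into the transitions.

```python
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most probable state path for an observation sequence
    under an HMM with the given start, transition, and emission probabilities."""
    n_states = len(start_p)
    T = len(obs)
    # delta[t, s]: log-probability of the best path that ends in state s at time t
    delta = np.full((T, n_states), -np.inf)
    backptr = np.zeros((T, n_states), dtype=int)
    delta[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, T):
        for s in range(n_states):
            scores = delta[t - 1] + np.log(trans_p[:, s])
            backptr[t, s] = np.argmax(scores)
            delta[t, s] = scores[backptr[t, s]] + np.log(emit_p[s, obs[t]])
    # Trace the best path back from the most probable final state.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Toy example (all numbers illustrative): two hidden states, three observation symbols.
start = np.array([0.6, 0.4])
trans = np.array([[0.7, 0.3],
                  [0.4, 0.6]])
emit = np.array([[0.5, 0.4, 0.1],
                 [0.1, 0.3, 0.6]])
print(viterbi([0, 1, 2], start, trans, emit))  # [0, 0, 1]
```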

Relevance:

100.00%

Publisher:

Abstract:

Optimism is growing that the near future will witness rapid growth in human-computer interaction using voice. System prototypes have recently been built that demonstrate speaker-independent, real-time speech recognition and understanding of naturally spoken utterances with vocabularies of 1000 to 2000 words and larger. Already, computer manufacturers are building speech recognition subsystems into their new product lines. However, before this technology can be broadly useful, a substantial knowledge base is needed about human spoken language and performance during computer-based spoken interaction. This paper reviews application areas in which spoken interaction can play a significant role, assesses potential benefits of spoken interaction with machines, and compares voice with other modalities of human-computer interaction. It also discusses information that will be needed to build a firm empirical foundation for the design of future spoken and multimodal interfaces. Finally, it argues for a more systematic and scientific approach to investigating spoken input and performance with future language technology.

Relevance:

100.00%

Publisher:

Abstract:

Speech interface technology, which includes automatic speech recognition, synthetic speech, and natural language processing, is beginning to have a significant impact on business and personal computer use. Today, powerful and inexpensive microprocessors and improved algorithms are driving commercial applications in computer command, consumer, data entry, speech-to-text, telephone, and voice verification. Robust speaker-independent recognition systems for command and navigation in personal computers are now available; telephone-based transaction and database inquiry systems using both speech synthesis and recognition are coming into use. Large-vocabulary speech interface systems for document creation and read-aloud proofing are expanding beyond niche markets. Today's applications represent a small preview of a rich future for speech interface technology that will eventually replace keyboards with microphones and loudspeakers to give easy accessibility to increasingly intelligent machines.