914 resultados para robust speech recognition


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this letter, a speech recognition algorithm based on the least-squares method is presented. Particularly, the intention is to exemplify how such a traditional numerical technique can be applied to solve a signal processing problem that is usually treated by using more elaborated formulations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper introduces the session on advanced speech recognition technology. The two papers comprising this session argue that current technology yields a performance that is only an order of magnitude in error rate away from human performance and that incremental improvements will bring us to that desired level. I argue that, to the contrary, present performance is far removed from human performance and a revolution in our thinking is required to achieve the goal. It is further asserted that to bring about the revolution more effort should be expended on basic research and less on trying to prematurely commercialize a deficient technology.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the past decade, tremendous advances in the state of the art of automatic speech recognition by machine have taken place. A reduction in the word error rate by more than a factor of 5 and an increase in recognition speeds by several orders of magnitude (brought about by a combination of faster recognition search algorithms and more powerful computers), have combined to make high-accuracy, speaker-independent, continuous speech recognition for large vocabularies possible in real time, on off-the-shelf workstations, without the aid of special hardware. These advances promise to make speech recognition technology readily available to the general public. This paper focuses on the speech recognition advances made through better speech modeling techniques, chiefly through more accurate mathematical modeling of speech sounds.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Speech recognition involves three processes: extraction of acoustic indices from the speech signal, estimation of the probability that the observed index string was caused by a hypothesized utterance segment, and determination of the recognized utterance via a search among hypothesized alternatives. This paper is not concerned with the first process. Estimation of the probability of an index string involves a model of index production by any given utterance segment (e.g., a word). Hidden Markov models (HMMs) are used for this purpose [Makhoul, J. & Schwartz, R. (1995) Proc. Natl. Acad. Sci. USA 92, 9956-9963]. Their parameters are state transition probabilities and output probability distributions associated with the transitions. The Baum algorithm that obtains the values of these parameters from speech data via their successive reestimation will be described in this paper. The recognizer wishes to find the most probable utterance that could have caused the observed acoustic index string. That probability is the product of two factors: the probability that the utterance will produce the string and the probability that the speaker will wish to produce the utterance (the language model probability). Even if the vocabulary size is moderate, it is impossible to search for the utterance exhaustively. One practical algorithm is described [Viterbi, A. J. (1967) IEEE Trans. Inf. Theory IT-13, 260-267] that, given the index string, has a high likelihood of finding the most probable utterance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Thesis (Ph.D.)--University of Washington, 2016-06

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis addresses the viability of automatic speech recognition for control room systems; with careful system design, automatic speech recognition (ASR) devices can be useful means for human computer interaction in specific types of task. These tasks can be defined as complex verbal activities, such as command and control, and can be paired with spatial tasks, such as monitoring, without detriment. It is suggested that ASR use be confined to routine plant operation, as opposed the critical incidents, due to possible problems of stress on the operators' speech.  It is proposed that using ASR will require operators to adapt a commonly used skill to cater for a novel use of speech. Before using the ASR device, new operators will require some form of training. It is shown that a demonstration by an experienced user of the device can lead to superior performance than instructions. Thus, a relatively cheap and very efficient form of operator training can be supplied by demonstration by experienced ASR operators. From a series of studies into speech based interaction with computers, it is concluded that the interaction be designed to capitalise upon the tendency of operators to use short, succinct, task specific styles of speech. From studies comparing different types of feedback, it is concluded that operators be given screen based feedback, rather than auditory feedback, for control room operation. Feedback will take two forms: the use of the ASR device will require recognition feedback, which will be best supplied using text; the performance of a process control task will require task feedback integrated into the mimic display. This latter feedback can be either textual or symbolic, but it is suggested that symbolic feedback will be more beneficial. Related to both interaction style and feedback is the issue of handling recognition errors. These should be corrected by simple command repetition practices, rather than use error handling dialogues. This method of error correction is held to be non intrusive to primary command and control operations. This thesis also addresses some of the problems of user error in ASR use, and provides a number of recommendations for its reduction.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The research presented in this paper is part of an ongoing investigation into how best to incorporate speech-based input within mobile data collection applications. In our previous work [1], we evaluated the ability of a single speech recognition engine to support accurate, mobile, speech-based data input. Here, we build on our previous research to compare the achievable speaker-independent accuracy rates of a variety of speech recognition engines; we also consider the relative effectiveness of different speech recognition engine and microphone pairings in terms of their ability to support accurate text entry under realistic mobile conditions of use. Our intent is to provide some initial empirical data derived from mobile, user-based evaluations to support technological decisions faced by developers of mobile applications that would benefit from, or require, speech-based data entry facilities.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Speech recognition technology is regarded as a key enabler for increasing the usability of applications deployed on mobile devices -- devices which are becoming increasingly prevalent in modern hospital-based healthcare. Although the use of speech recognition is not new to the hospital-based healthcare domain, its use with mobile devices has thus far been limited. This paper presents the results of a literature review we conducted in order to observe the manner in which speech recognition technology has been used in hospital-based healthcare and to gain an understanding of how this technology is being evaluated, in terms of its dependability and reliability, in healthcare settings. Our intent is that this review will help identify scope for future uses of speech recognition technologies in the healthcare domain, as well as to identify implications for the meaningful evaluation of such technologies given the specific context of use.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The research presented in this paper is part of an ongoing investigation into how best to incorporate speech-based input within mobile data collection applications. In our previous work [1], we evaluated the ability of a single speech recognition engine to support accurate, mobile, speech-based data input. Here, we build on our previous research to compare the achievable speaker-independent accuracy rates of a variety of speech recognition engines; we also consider the relative effectiveness of different speech recognition engine and microphone pairings in terms of their ability to support accurate text entry under realistic mobile conditions of use. Our intent is to provide some initial empirical data derived from mobile, user-based evaluations to support technological decisions faced by developers of mobile applications that would benefit from, or require, speech-based data entry facilities.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Speech recognition technology is regarded as a key enabler for increasing the usability of applications deployed on mobile devices -- devices which are becoming increasingly prevalent in modern hospital-based healthcare. Although the use of speech recognition is not new to the hospital-based healthcare domain, its use with mobile devices has thus far been limited. This paper presents the results of a literature review we conducted in order to observe the manner in which speech recognition technology has been used in hospital-based healthcare and to gain an understanding of how this technology is being evaluated, in terms of its dependability and reliability, in healthcare settings. Our intent is that this review will help identify scope for future uses of speech recognition technologies in the healthcare domain, as well as to identify implications for the meaningful evaluation of such technologies given the specific context of use.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we propose a speech recognition engine using hybrid model of Hidden Markov Model (HMM) and Gaussian Mixture Model (GMM). Both the models have been trained independently and the respective likelihood values have been considered jointly and input to a decision logic which provides net likelihood as the output. This hybrid model has been compared with the HMM model. Training and testing has been done by using a database of 20 Hindi words spoken by 80 different speakers. Recognition rates achieved by normal HMM are 83.5% and it gets increased to 85% by using the hybrid approach of HMM and GMM.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Mémoire numérisé par la Direction des bibliothèques de l'Université de Montréal.