31 results for Proportional apparent error rate

in Cambridge University Engineering Department Publications Database


Relevance:

100.00%

Publisher:

Abstract:

The non-deterministic relationship between Bit Error Rate and Packet Error Rate is demonstrated for an optical media access layer in common use. We show that frequency components of coded, non-random data can cause this relationship. © 2005 Optical Society of America.
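For contrast with the non-deterministic behaviour reported above, the idealized textbook relationship can be sketched as follows. This assumes independent, identically distributed bit errors, which is exactly the assumption that coded, non-random data can violate; the function name and the 12,000-bit packet length are illustrative choices, not values from the paper.

```python
import math


def packet_error_rate(ber: float, packet_bits: int) -> float:
    """Packet error rate assuming independent bit errors (an idealization).

    A packet is counted as errored if at least one of its bits is wrong,
    so PER = 1 - (1 - BER) ** packet_bits.
    """
    if not 0.0 <= ber <= 1.0:
        raise ValueError("ber must lie in [0, 1]")
    if packet_bits < 0:
        raise ValueError("packet_bits must be non-negative")
    if ber == 1.0:
        # log1p(-1.0) is undefined, so handle the degenerate case directly.
        return 1.0 if packet_bits > 0 else 0.0
    # log1p/expm1 keep precision when ber is tiny (e.g. 1e-12).
    return -math.expm1(packet_bits * math.log1p(-ber))


if __name__ == "__main__":
    # Hypothetical example: a 1500-byte (12,000-bit) packet.
    print(packet_error_rate(1e-12, 12_000))
```

Under this model a given bit error rate maps to exactly one packet error rate for a fixed packet length; measured links that depart from it indicate correlated or data-dependent errors.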

Relevance:

100.00%

Publisher:

Abstract:

This paper introduces a novel method for training a complementary acoustic model with respect to a set of given acoustic models. The method is based upon an extension of the Minimum Phone Error (MPE) criterion and aims at producing a model whose phone errors are complementary to those of the models already trained. The technique is therefore called Complementary Phone Error (CPE) training. The method is evaluated using an Arabic large vocabulary continuous speech recognition task. After combination with a CPE-trained system, reductions in word error rate (WER) of up to 0.7% absolute were obtained for a system trained on 172 hours of acoustic data, and of up to 0.2% absolute for the final system trained on nearly 2000 hours of Arabic data.
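As a reminder of how the metric quoted above is usually computed, here is a minimal sketch of word error rate as word-level edit distance divided by reference length, plus the distinction between absolute and relative reductions. The function names and the 20.0% baseline are hypothetical; the abstract does not state the baseline WER.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def absolute_and_relative_reduction(baseline: float, improved: float):
    """Return (absolute, relative) reduction for two error rates."""
    absolute = baseline - improved
    return absolute, absolute / baseline


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Hypothetical baseline of 20.0% WER reduced by 0.7% absolute.
    print(absolute_and_relative_reduction(20.0, 19.3))
```

An absolute reduction is a difference in percentage points, while a relative reduction divides that difference by the baseline, so the same absolute gain is a larger relative gain on a stronger baseline.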

Relevance:

100.00%

Publisher:

Abstract:

We consider the problem of blind multiuser detection. We adopt a Bayesian approach where unknown parameters are considered random and integrated out. Computing the maximum a posteriori estimate of the input data sequence requires solving a combinatorial optimization problem. We propose here to apply the Cross-Entropy method recently introduced by Rubinstein. The performance of cross-entropy is compared to Markov chain Monte Carlo. For similar Bit Error Rate performance, we demonstrate that Cross-Entropy outperforms a generic Markov chain Monte Carlo method in terms of operation time.

Relevance:

100.00%

Publisher:

Abstract:

This paper investigates unsupervised test-time adaptation of language models (LM) using discriminative methods for a Mandarin broadcast speech transcription and translation task. A standard approach to adapting interpolated language models is to optimize the component weights by minimizing the perplexity on supervision data. This is a widely made approximation for language modeling in automatic speech recognition (ASR) systems. For speech translation tasks, it is unclear whether a strong correlation still exists between perplexity and various forms of error cost functions in recognition and translation stages. The proposed minimum Bayes risk (MBR) based approach provides a flexible framework for unsupervised LM adaptation. It generalizes to a variety of forms of recognition and translation error metrics. LM adaptation is performed at the audio document level using either the character error rate (CER) or the translation edit rate (TER) as the cost function. An efficient parameter estimation scheme using the extended Baum-Welch (EBW) algorithm is proposed. Experimental results on a state-of-the-art speech recognition and translation system are presented. The MBR adapted language models gave the best recognition and translation performance and reduced the TER score by up to 0.54% absolute. © 2007 IEEE.
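The standard perplexity-based baseline mentioned above (not the paper's MBR method) is commonly implemented with expectation-maximization over the interpolation weights. The sketch below assumes that each component LM's probability for every token of the supervision data has already been computed; the function names and the toy probabilities are hypothetical.

```python
import math


def estimate_interpolation_weights(component_probs, iterations=50):
    """EM estimate of linear interpolation weights.

    component_probs[t][k] is the probability that component LM k assigns
    to the t-th token of the supervision data (assumed precomputed and > 0).
    """
    num_components = len(component_probs[0])
    weights = [1.0 / num_components] * num_components
    for _ in range(iterations):
        totals = [0.0] * num_components
        for probs in component_probs:
            mixture = sum(w * p for w, p in zip(weights, probs))
            # E-step: posterior responsibility of each component for this token.
            for k in range(num_components):
                totals[k] += weights[k] * probs[k] / mixture
        # M-step: new weight is the average responsibility.
        weights = [total / len(component_probs) for total in totals]
    return weights


def perplexity(component_probs, weights):
    """Perplexity of the interpolated model on the same token stream."""
    log_sum = sum(
        math.log(sum(w * p for w, p in zip(weights, probs)))
        for probs in component_probs
    )
    return math.exp(-log_sum / len(component_probs))


if __name__ == "__main__":
    # Toy data: two component LMs scored on four tokens.
    toy = [[0.5, 0.1], [0.4, 0.1], [0.6, 0.2], [0.1, 0.3]]
    tuned = estimate_interpolation_weights(toy)
    print(tuned, perplexity(toy, tuned), perplexity(toy, [0.5, 0.5]))
```

Each EM iteration cannot increase perplexity on the supervision data, which is why this procedure is a convenient baseline; it optimizes perplexity only, not CER or TER directly.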

Relevance:

100.00%

Publisher:

Abstract:

In speech recognition systems, language models (LMs) are often constructed by training and combining multiple n-gram models. They can be used either to represent different genres or tasks found in diverse text sources, or to capture stochastic properties of different linguistic symbol sequences, for example, syllables and words. Unsupervised LM adaptation may also be used to further improve robustness to varying styles or tasks. When using these techniques, extensive software changes are often required. In this paper an alternative and more general approach based on weighted finite state transducers (WFSTs) is investigated for LM combination and adaptation. As it is entirely based on well-defined WFST operations, minimal change to decoding tools is needed. A wide range of LM combination configurations can be flexibly supported. An efficient on-the-fly WFST decoding algorithm is also proposed. Significant error rate gains of 7.3% relative were obtained on a state-of-the-art broadcast audio recognition task using a history dependently adapted multi-level LM modelling both syllable and word sequences. © 2010 IEEE.

Relevance:

100.00%

Publisher:

Abstract:

This paper describes the development of the CU-HTK Mandarin Speech-To-Text (STT) system and assesses its performance as part of a transcription-translation pipeline which converts broadcast Mandarin audio into English text. Recent improvements to the STT system are described and these give Character Error Rate (CER) gains of 14.3% absolute for a Broadcast Conversation (BC) task and 5.1% absolute for a Broadcast News (BN) task. The output of these STT systems is then post-processed, so that it consists of sentence-like segments, and translated into English text using a Statistical Machine Translation (SMT) system. The performance of the transcription-translation pipeline is evaluated using the Translation Edit Rate (TER) and BLEU metrics. It is shown that improving both the STT system and the post-STT segmentations can lower the TER scores by up to 5.3% absolute and increase the BLEU scores by up to 2.7% absolute. © 2007 IEEE.

Relevance:

100.00%

Publisher:

Abstract:

This paper discusses the Cambridge University HTK (CU-HTK) system for the automatic transcription of conversational telephone speech. A detailed discussion of the most important techniques in front-end processing, acoustic modeling and model training, and language and pronunciation modeling is presented. These include the use of conversation side based cepstral normalization, vocal tract length normalization, heteroscedastic linear discriminant analysis for feature projection, minimum phone error training and speaker adaptive training, lattice-based model adaptation, confusion network based decoding and confidence score estimation, pronunciation selection, language model interpolation, and class based language models. The transcription system developed for participation in the 2002 NIST Rich Transcription evaluations of English conversational telephone speech data is presented in detail. In this evaluation the CU-HTK system gave an overall word error rate of 23.9%, which was the best performance by a statistically significant margin. Further details on the derivation of faster systems with moderate performance degradation are discussed in the context of the 2002 CU-HTK 10 × RT conversational speech transcription system. © 2005 IEEE.

Relevance:

100.00%

Publisher:

Abstract:

State-of-the-art large vocabulary continuous speech recognition (LVCSR) systems often combine outputs from multiple subsystems developed at different sites. Cross system adaptation can be used as an alternative to direct hypothesis level combination schemes such as ROVER. In normal cross adaptation it is assumed that useful diversity among systems exists only at the acoustic level. However, complementary features among complex LVCSR systems also manifest themselves in other layers of the modelling hierarchy, e.g., the subword and word levels. It is thus interesting to also cross adapt language models (LM) to capture them. In this paper cross adaptation of multi-level LMs modelling both syllable and word sequences was investigated to improve LVCSR system combination. Significant error rate gains up to 6.7% rel. were obtained over ROVER and acoustic model only cross adaptation when combining 13 Chinese LVCSR subsystems used in the 2010 DARPA GALE evaluation. © 2010 ISCA.

Relevance:

100.00%

Publisher:

Abstract:

We demonstrate a record 150 km transmission of microwave signals by a directly-modulated radio-over-fiber link with a bit-error-rate of less than 10⁻¹². Cascaded semiconductor optical amplifiers are employed in this link to extend the transmission link length. © 2005 Optical Society of America.

Relevance:

100.00%

Publisher:

Abstract:

The performance of 40 Gbit/s optical time-division multiplexed (OTDM) communication systems can be severely limited when the extinction ratio of the optical pulses is low. This is a consequence of the coherent interference noise between individual OTDM channels. When taken alone, the multiple quantum well-distributed feedback laser+dispersion compensating fiber source exhibits a relatively poor extinction ratio which impairs its potential for use in a 40 Gbit/s OTDM system. However, with the addition of an electroabsorption modulator to suppress the pulse pedestals to better than 30 dB extinction, coherent interference noise is reduced, the bit-error-rate performance is greatly improved, and the source shows good potential for 40 Gbit/s OTDM communication.

Relevance:

100.00%

Publisher:

Abstract:

A Fabry-Perot laser source operating at 1300 nm was modulated at 2.5 Gb/s with a 2⁷-1 pseudo-random bit sequence. Three techniques were examined for increasing the bandwidth of optical links using multimode fiber (MMF). With an offset launch of 14 μm, the eye remained open after the 2 km link of 50 μm core MMF containing seven connectors and three splices. An approximate four-fold bandwidth improvement was obtained using the offset launch with a bandwidth-length product of 7.5 Gb/s·km and a bit error rate below 10⁻¹⁰. The bandwidth enhancement was stable against environmental influences on the fiber link, such as mechanical agitation. Detailed simulations demonstrated that the technique allows enhanced operating bandwidths in over 99% of existing links.

Relevance:

100.00%

Publisher:

Abstract:

This paper presents some developments in query expansion and document representation of our spoken document retrieval system and shows how various retrieval techniques affect performance for different sets of transcriptions derived from a common speech source. Modifications of the document representation are used, which combine several techniques for query expansion, knowledge-based on one hand and statistics-based on the other. Taken together, these techniques can improve Average Precision by over 19% relative to a system similar to that which we presented at TREC-7. These new experiments have also confirmed that the degradation of Average Precision due to a word error rate (WER) of 25% is quite small (3.7% relative) and can be reduced to almost zero (0.2% relative). The overall improvement of the retrieval system can also be observed for seven different sets of transcriptions from different recognition engines with a WER ranging from 24.8% to 61.5%. We hope to repeat these experiments when larger document collections become available, in order to evaluate the scalability of these techniques.

Relevance:

100.00%

Publisher:

Abstract:

A parallel processing network derived from Kanerva's associative memory theory (Kanerva 1984) is shown to be able to train rapidly on connected speech data and recognize further speech data with a label error rate of 0·68%. This modified Kanerva model can be trained substantially faster than other networks with comparable pattern discrimination properties. Kanerva presented his theory of a self-propagating search in 1984, and showed theoretically that large-scale versions of his model would have powerful pattern matching properties. This paper describes how the design for the modified Kanerva model is derived from Kanerva's original theory. Several designs are tested to discover which form may be implemented fastest while still maintaining versatile recognition performance. A method is developed to deal with the time varying nature of the speech signal by recognizing static patterns together with a fixed quantity of contextual information. In order to recognize speech features in different contexts it is necessary for a network to be able to model disjoint pattern classes. This type of modelling cannot be performed by a single layer of links. Network research was once held back by the inability of single-layer networks to solve this sort of problem, and the lack of a training algorithm for multi-layer networks. Rumelhart, Hinton & Williams (1985) provided one solution by demonstrating the "back propagation" training algorithm for multi-layer networks. A second alternative is used in the modified Kanerva model. A non-linear fixed transformation maps the pattern space into a space of higher dimensionality in which the speech features are linearly separable. A single-layer network may then be used to perform the recognition. The advantage of this solution over the other using multi-layer networks lies in the greater power and speed of the single-layer network training algorithm. © 1989.
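The general idea of a fixed non-linear expansion followed by a single trainable layer can be illustrated on XOR, a standard example of disjoint pattern classes that no single layer of links can separate in the original input space. This is only a toy sketch: the product feature below is an assumed stand-in for the fixed transformation, not the transformation used in the modified Kanerva model, and all names are hypothetical.

```python
def expand(point):
    """Fixed (untrained) non-linear map from 2 inputs to 4 features."""
    x1, x2 = point
    return (1.0, x1, x2, x1 * x2)  # bias, raw inputs, product term


def predict(weights, point):
    """Single-layer threshold unit applied to the expanded features."""
    activation = sum(w * f for w, f in zip(weights, expand(point)))
    return 1 if activation > 0 else 0


def train_perceptron(samples, max_epochs=1000):
    """Classic perceptron rule; only the single output layer is trained."""
    weights = [0.0] * 4
    for _ in range(max_epochs):
        mistakes = 0
        for point, label in samples:
            error = label - predict(weights, point)
            if error != 0:
                weights = [w + error * f for w, f in zip(weights, expand(point))]
                mistakes += 1
        if mistakes == 0:  # every sample classified correctly
            break
    return weights


XOR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

if __name__ == "__main__":
    trained = train_perceptron(XOR_SAMPLES)
    print([predict(trained, point) for point, _ in XOR_SAMPLES])
```

Because XOR becomes linearly separable once the product term is added, the perceptron convergence theorem guarantees that this single-layer training terminates, with no need for back propagation through hidden layers.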

Relevance:

100.00%

Publisher:

Abstract:

One important issue in designing state-of-the-art LVCSR systems is the choice of acoustic units. Context dependent (CD) phones remain the dominant form of acoustic units. They can capture the co-articulatory effect in speech via explicit modelling. However, for other more complicated phonological processes, they rely on the implicit modelling ability of the underlying statistical models. Alternatively, it is possible to construct acoustic models based on higher level linguistic units, for example, syllables, to explicitly capture these complex patterns. When sufficient training data is available, this approach may show an advantage over implicit acoustic modelling. In this paper a wide range of acoustic units are investigated to improve LVCSR system performance. Significant error rate gains up to 7.1% relative (0.8% abs.) were obtained on a state-of-the-art Mandarin Chinese broadcast audio recognition task using word and syllable position dependent triphone and quinphone models. © 2011 IEEE.

Relevance:

100.00%

Publisher:

Abstract:

A scalable multi-channel optical regenerative bus architecture based on the use of polymer waveguides is presented for the first time. The architecture offers high-speed interconnection between electrical cards, allowing regenerative bus extension with multiple segments and therefore connection of an arbitrary number of cards onto the bus. In a proof-of-principle demonstration, a 4-channel 3-card polymeric bus module is designed and fabricated on standard FR4 substrates. Low insertion losses (≤ -15 dB) and low crosstalk values (< -30 dB) are achieved for the fabricated samples, while -1 dB alignment tolerances better than ±6 μm are obtained. 10 Gb/s data communication with a bit-error-rate (BER) lower than 10⁻¹² is demonstrated for the first time between card interfaces on two different bus modules using a prototype 3R regenerator. © 2012 Optical Society of America.