846 results for Bit error rate
Abstract:
In this article, we aim at reducing the error rate of the online Tamil symbol recognition system by employing multiple experts to reevaluate certain decisions of the primary support vector machine classifier. Motivated by the relatively high percentage of occurrence of base consonants in the script, a reevaluation technique has been proposed to correct any ambiguities arising in the base consonants. Secondly, a dynamic time-warping method is proposed to automatically extract the discriminative regions for each set of confused characters. Class-specific features derived from these regions aid in reducing the degree of confusion. Thirdly, statistics of specific features are proposed for resolving any confusions in vowel modifiers. The reevaluation approaches are tested on two databases (a) the isolated Tamil symbols in the IWFHR test set, and (b) the symbols segmented from a set of 10,000 Tamil words. The recognition rate of the isolated test symbols of the IWFHR database improves by 1.9 %. For the word database, the incorporation of the reevaluation step improves the symbol recognition rate by 3.5 % (from 88.4 to 91.9 %). This, in turn, boosts the word recognition rate by 11.9 % (from 65.0 to 76.9 %). The reduction in the word error rate has been achieved using a generic approach, without the incorporation of language models.
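The gains quoted above are absolute differences in recognition rate. As a rough illustration (the function name below is hypothetical, not from the article), the same figures can be restated as relative reductions in error rate, taking the error rate as 100 % minus the recognition rate:

```python
def error_rate_change(acc_before, acc_after):
    """Return (absolute accuracy gain in points, relative error reduction in percent).

    Assumes error rate = 100 - recognition rate, both in percent.
    """
    err_before = 100.0 - acc_before
    err_after = 100.0 - acc_after
    absolute_gain = acc_after - acc_before
    relative_error_reduction = 100.0 * (err_before - err_after) / err_before
    return absolute_gain, relative_error_reduction


# Recognition rates quoted in the abstract for the word database.
symbol = error_rate_change(88.4, 91.9)
word = error_rate_change(65.0, 76.9)

print("symbol: +%.1f points, %.1f%% fewer errors" % symbol)
print("word:   +%.1f points, %.1f%% fewer errors" % word)
```

Under this reading, the 11.9-point gain in word recognition rate corresponds to roughly a one-third relative reduction in word error rate (from 35.0 % to 23.1 %).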
Abstract:
A design methodology based on the Minimum Bit Error Ratio (MBER) framework is proposed for a non-regenerative Multiple-Input Multiple-Output (MIMO) relay-aided system to determine various linear parameters. We consider both the Relay-Destination (RD) as well as the Source-Relay-Destination (SRD) link design based on this MBER framework, including the pre-coder, the Amplify-and-Forward (AF) matrix and the equalizer matrix of our system. It has been shown in the previous literature that MBER based communication systems are capable of reducing the Bit-Error-Ratio (BER) compared to their Linear Minimum Mean Square Error (LMMSE) based counterparts. We design a novel relay-aided system using various signal constellations, ranging from QPSK to the general M-QAM and M-PSK constellations. Finally, we propose its sub-optimal versions for reducing the computational complexity imposed. Our simulation results demonstrate that the proposed scheme indeed achieves a significant BER reduction over the existing LMMSE scheme.
Abstract:
In this letter, we quantify the transmit diversity order of the spatial modulation (SM) system operating in a closed-loop scenario. Specifically, the SM system relying on Euclidean distance based antenna subset selection (EDAS) is considered and the achievable diversity gain is evaluated. Furthermore, the resultant trade-off between the achievable diversity gain and switching gain is studied. Simulation results confirm our theoretical results. Specifically, at a symbol error rate of about 10^-4, the signal-to-noise ratio gain achieved by EDAS is about 7 dB for 16-QAM and about 5 dB for 64-QAM.
Abstract:
This paper considers the problem of receive antenna selection (AS) in a multiple-antenna communication system having a single radio-frequency (RF) chain. The AS decisions are based on noisy channel estimates obtained using known pilot symbols embedded in the data packets. The goal here is to minimize the average packet error rate (PER) by exploiting the known temporal correlation of the channel. As the underlying channels are only partially observed using the pilot symbols, the problem of AS for PER minimization is cast into a partially observable Markov decision process (POMDP) framework. Under mild assumptions, the optimality of a myopic policy is established for the two-state channel case. Moreover, two heuristic AS schemes are proposed based on a weighted combination of the estimated channel states on the different antennas. These schemes utilize the continuous valued received pilot symbols to make the AS decisions, and are shown to offer performance comparable to the POMDP approach, which requires one to quantize the channel and observations to a finite set of states. The performance improvement offered by the POMDP solution and the proposed heuristic solutions relative to existing AS training-based approaches is illustrated using Monte Carlo simulations.
Abstract:
Speech polarity detection is a crucial first step in many speech processing techniques. In this paper, an algorithm is proposed that improves upon the existing technique based on the skewness of the voice source (VS) signal. Here, the integrated linear prediction residual (ILPR) is used as the VS estimate, which is obtained using linear prediction on long-term frames of the low-pass filtered speech signal. This excludes the unvoiced regions from analysis and also reduces the computation. Further, a modified skewness measure is proposed for the decision, which considers the magnitude of the skewness of the ILPR along with its sign. With the detection error rate (DER) as the performance metric, the algorithm is tested on 8 large databases and its performance (DER=0.20%) is found to be comparable to that of the best technique (DER=0.06%) on both clean and noisy speech. Further, the proposed method is found to be ten times faster than the best technique.
Abstract:
We propose a practical feature-level and score-level fusion approach that combines acoustic and estimated articulatory information for both text-independent and text-dependent speaker verification. From a practical point of view, we study how to improve speaker verification performance by combining dynamic articulatory information with the conventional acoustic features. On text-independent speaker verification, we find that concatenating articulatory features obtained from measured speech production data with conventional Mel-frequency cepstral coefficients (MFCCs) improves the performance dramatically. However, since directly measuring articulatory data is not feasible in many real-world applications, we also experiment with estimated articulatory features obtained through acoustic-to-articulatory inversion. We explore both feature-level and score-level fusion methods and find that the overall system performance is significantly enhanced even with estimated articulatory features. Such a performance boost could be due to the inter-speaker variation information embedded in the estimated articulatory features. Since the dynamics of articulation contain important information, we included inverted articulatory trajectories in text-dependent speaker verification. We demonstrate that the articulatory constraints introduced by inverted articulatory features help to reject wrong password trials and improve the performance after score-level fusion. We evaluate the proposed methods on the X-ray Microbeam database and the RSR 2015 database, respectively, for the aforementioned two tasks. Experimental results show that we achieve more than 15% relative equal error rate reduction for both speaker verification tasks. © 2015 Elsevier Ltd. All rights reserved.
Discriminative language model adaptation for Mandarin broadcast speech transcription and translation
Abstract:
This paper investigates unsupervised test-time adaptation of language models (LMs) using discriminative methods for a Mandarin broadcast speech transcription and translation task. A standard approach to adapting interpolated language models is to optimize the component weights by minimizing the perplexity on supervision data. This is a widely made approximation for language modeling in automatic speech recognition (ASR) systems. For speech translation tasks, it is unclear whether a strong correlation still exists between perplexity and various forms of error cost functions in the recognition and translation stages. The proposed minimum Bayes risk (MBR) based approach provides a flexible framework for unsupervised LM adaptation. It generalizes to a variety of forms of recognition and translation error metrics. LM adaptation is performed at the audio document level using either the character error rate (CER) or the translation edit rate (TER) as the cost function. An efficient parameter estimation scheme using the extended Baum-Welch (EBW) algorithm is proposed. Experimental results on a state-of-the-art speech recognition and translation system are presented. The MBR adapted language models gave the best recognition and translation performance and reduced the TER score by up to 0.54% absolute. © 2007 IEEE.
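For orientation, the perplexity-based baseline mentioned above (not the MBR method the paper proposes) is commonly implemented with an EM update on the interpolation weights. The sketch below is a minimal illustration under assumptions: the per-token probabilities are made-up numbers, and the function names are hypothetical.

```python
import math


def perplexity(weights, token_probs):
    """Perplexity of the linearly interpolated model on the given tokens."""
    log_sum = 0.0
    for probs in token_probs:
        log_sum += math.log(sum(w * p for w, p in zip(weights, probs)))
    return math.exp(-log_sum / len(token_probs))


def em_interpolation_weights(token_probs, iterations=50):
    """Estimate interpolation weights that lower perplexity, using EM.

    token_probs[t][k] is the probability component model k assigns to token t.
    """
    k = len(token_probs[0])
    weights = [1.0 / k] * k  # start from uniform weights
    for _ in range(iterations):
        counts = [0.0] * k
        for probs in token_probs:
            mix = sum(w * p for w, p in zip(weights, probs))
            for i in range(k):
                # E-step: posterior responsibility of component i for this token
                counts[i] += weights[i] * probs[i] / mix
        # M-step: new weight is the average responsibility
        weights = [c / len(token_probs) for c in counts]
    return weights


# Hypothetical probabilities from two component LMs on four tokens.
token_probs = [(0.20, 0.05), (0.10, 0.02), (0.01, 0.30), (0.15, 0.05)]
uniform = [0.5, 0.5]
tuned = em_interpolation_weights(token_probs)

print("uniform perplexity:", perplexity(uniform, token_probs))
print("tuned perplexity:  ", perplexity(tuned, token_probs))
```

Each EM iteration cannot increase perplexity on the data used for estimation, which is why this criterion is convenient; the abstract's point is that lower perplexity does not necessarily track CER or TER.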
Abstract:
In speech recognition systems, language models (LMs) are often constructed by training and combining multiple n-gram models. They can be used either to represent different genres or tasks found in diverse text sources, or to capture stochastic properties of different linguistic symbol sequences, for example, syllables and words. Unsupervised LM adaptation may also be used to further improve robustness to varying styles or tasks. When using these techniques, extensive software changes are often required. In this paper an alternative and more general approach based on weighted finite state transducers (WFSTs) is investigated for LM combination and adaptation. As it is entirely based on well-defined WFST operations, minimal change to decoding tools is needed. A wide range of LM combination configurations can be flexibly supported. An efficient on-the-fly WFST decoding algorithm is also proposed. Significant error rate gains of 7.3% relative were obtained on a state-of-the-art broadcast audio recognition task using a history dependently adapted multi-level LM modelling both syllable and word sequences. © 2010 IEEE.
Abstract:
We studied the effect of including seductive biographical details in an expository science text. Sixty-six students with low prior knowledge of the topic read the text either without the seductive detail or with an interesting biographical anecdote. The interest associated with the materials was examined in a preliminary study. Measures were collected of retention (text recall and verification of literal statements), comprehension (verification of inferred statements) and text synthesis (selection of a title). The results indicated that the condition that received the biographical detail had more difficulty recalling the content and responding to statements from the section of the text near the detail. These results are interpreted in light of the diverted integration hypothesis.
Abstract:
This paper describes the development of the CU-HTK Mandarin Speech-To-Text (STT) system and assesses its performance as part of a transcription-translation pipeline which converts broadcast Mandarin audio into English text. Recent improvements to the STT system are described and these give Character Error Rate (CER) gains of 14.3% absolute for a Broadcast Conversation (BC) task and 5.1% absolute for a Broadcast News (BN) task. The output of these STT systems is then post-processed, so that it consists of sentence-like segments, and translated into English text using a Statistical Machine Translation (SMT) system. The performance of the transcription-translation pipeline is evaluated using the Translation Edit Rate (TER) and BLEU metrics. It is shown that improving both the STT system and the post-STT segmentations can lower the TER scores by up to 5.3% absolute and increase the BLEU scores by up to 2.7% absolute. © 2007 IEEE.
Abstract:
This paper discusses the Cambridge University HTK (CU-HTK) system for the automatic transcription of conversational telephone speech. A detailed discussion of the most important techniques in front-end processing, acoustic modeling and model training, and language and pronunciation modeling is presented. These include the use of conversation side based cepstral normalization, vocal tract length normalization, heteroscedastic linear discriminant analysis for feature projection, minimum phone error training and speaker adaptive training, lattice-based model adaptation, confusion network based decoding and confidence score estimation, pronunciation selection, language model interpolation, and class based language models. The transcription system developed for participation in the 2002 NIST Rich Transcription evaluations of English conversational telephone speech data is presented in detail. In this evaluation the CU-HTK system gave an overall word error rate of 23.9%, which was the best performance by a statistically significant margin. Further details on the derivation of faster systems with moderate performance degradation are discussed in the context of the 2002 CU-HTK 10 × RT conversational speech transcription system. © 2005 IEEE.
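As background for the word error rate figure above, WER is conventionally the word-level edit distance (substitutions, deletions and insertions) divided by the number of reference words. The following is a minimal sketch of that definition with a made-up sentence pair; it is not the scoring tool used in the evaluation, and the function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Illustrative example: one substitution and one deletion over six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print("WER = %.1f%%" % (100.0 * wer))
```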
Abstract:
State-of-the-art large vocabulary continuous speech recognition (LVCSR) systems often combine outputs from multiple subsystems developed at different sites. Cross system adaptation can be used as an alternative to direct hypothesis level combination schemes such as ROVER. In normal cross adaptation it is assumed that useful diversity among systems exists only at the acoustic level. However, complementary features among complex LVCSR systems also manifest themselves in other layers of the modelling hierarchy, e.g., at the subword and word levels. It is thus interesting to also cross adapt language models (LMs) to capture them. In this paper cross adaptation of multi-level LMs modelling both syllable and word sequences was investigated to improve LVCSR system combination. Significant error rate gains of up to 6.7% relative were obtained over ROVER and acoustic model only cross adaptation when combining 13 Chinese LVCSR subsystems used in the 2010 DARPA GALE evaluation. © 2010 ISCA.
Abstract:
Neurons in the primate lateral intraparietal area (area LIP) carry visual, saccade-related and eye position activities. The visual and saccade activities are anchored in a retinotopic framework and the overall response magnitude is modulated by eye position. It was proposed that the modulation by eye position might be the basis of a distributed coding of target locations in a head-centered space. Other recording studies demonstrated that area LIP is involved in oculomotor planning. These results overall suggest that area LIP transforms sensory information for motor functions. In this thesis I further explore the role of area LIP in processing saccadic eye movements by observing the effects of reversible inactivation of this area. Macaque monkeys were trained to do visually guided and memory saccades and a double saccade task to examine the use of eye position signal. Finally, by intermixing visual saccades with trials in which two targets were presented at opposite sides of the fixation point, I examined the behavior of visual extinction.
In chapter 2, I will show that lesion of area LIP results in increased latency of contralesional visual and memory saccades. Contralesional memory saccades are also hypometric and slower in velocity. Moreover, the impairment of memory saccades does not vary with the duration of the delay period. This suggests that the oculomotor deficits observed after inactivation of area LIP are not due to the disruption of spatial memory.
In chapter 3, I will show that lesion of area LIP does not severely affect the processing of spontaneous eye movement. However, the monkeys made fewer contralesional saccades and tended to confine their gaze to the ipsilesional field after inactivation of area LIP. On the other hand, lesion of area LIP results in extinction of the contralesional stimulus. When the initial fixation position was varied so that the retinal and spatial locations of the targets could be dissociated, it was found that the extinction behavior could best be described in a head-centered coordinate.
In chapter 4, I will show that inactivation of area LIP disrupts the use of eye position signal to compute the second movement correctly in the double saccade task. If the first saccade steps into the contralesional field, the error rate and latency of the second saccade are both increased. Furthermore, the direction of the first eye movement largely does not have any effect on the impairment of the second saccade. I will argue that this study provides important evidence that the extraretinal signal used for saccadic localization is eye position rather than a displacement vector.
In chapter 5, I will demonstrate that in parietal monkeys the eye drifts toward the lesion side at the end of the memory saccade in darkness. This result suggests that the eye position activity in the posterior parietal cortex is active in nature and subserves gaze holding.
Overall, these results further support the view that area LIP neurons encode spatial locations in a craniotopic framework and are involved in processing voluntary eye movements.
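Abstract:
To address the nonlinear attenuation of the spectrum caused by the optical fiber itself, a spectral correction method based on the Fourier transform is proposed. First, the output current of the photomultiplier tube is Fourier transformed for the two cases, with and without the fiber, to obtain the correction function in the spectral frequency domain; the correction function in the spectral domain is then obtained by the inverse Fourier transform. For testing, a photoelectric detection system was built and measurements were made in the visible range. Data were collected for the two cases, with and without the fiber. Applying the correction method compensates well for the attenuation of the light, with an error below 1.54%. The experimental results show that the method corrects the nonlinear spectral attenuation of a specific fiber transmission system well.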
Abstract:
Riboflavin is employed as the photosensitizer of a novel photopolymer material for holographic recording. This material has a broad absorption spectrum range (more than 200 nm) due to the addition of this dye. The experimental results show that our material has high diffraction efficiency and large refractive index modulation. The maximum diffraction efficiency of the photopolymer is about 56%. Digital data pages are stored in this medium and the reconstructed data page has good fidelity, with a bit-error-ratio of about 1.8 × 10^-4. The photopolymer material is found to be suitable for high-density volume holographic digital storage.
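For readers unfamiliar with the metric, a bit-error-ratio is simply the number of bits read back incorrectly divided by the number of bits stored. The sketch below illustrates the calculation on a made-up data page; the page size and error positions are assumptions chosen only to reproduce a ratio of 1.8 × 10^-4, not data from the paper.

```python
def bit_error_ratio(stored_bits, read_bits):
    """Fraction of bit positions where the read-back page differs from the stored page."""
    if len(stored_bits) != len(read_bits):
        raise ValueError("pages must have the same number of bits")
    if not stored_bits:
        raise ValueError("pages must not be empty")
    errors = sum(1 for a, b in zip(stored_bits, read_bits) if a != b)
    return errors / len(stored_bits)


# Hypothetical 50,000-bit data page with 9 bits flipped on readout.
stored = [i % 2 for i in range(50000)]
read = list(stored)
for pos in range(0, 9 * 5000, 5000):
    read[pos] ^= 1  # flip one bit at each chosen position

ber = bit_error_ratio(stored, read)
print("BER = %.1e" % ber)
```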