Biblioteca Digital

373 resultados para speech features

Development of the 2003 CU-HTK conversational telephone speech transcription system

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper describes the development of the 2003 CU-HTK large vocabulary speech recognition system for Conversational Telephone Speech (CTS). The system was designed based on a multi-pass, multi-branch structure where the output of all branches is combined using system combination. A number of advanced modelling techniques such as Speaker Adaptive Training, Heteroscedastic Linear Discriminant Analysis, Minimum Phone Error estimation and specially constructed Single Pronunciation dictionaries were employed. The effectiveness of each of these techniques and their potential contribution to the result of system combination was evaluated in the framework of a state-of-the-art LVCSR system with sophisticated adaptation. The final 2003 CU-HTK CTS system constructed from some of these models is described and its performance on the DARPA/NIST 2003 Rich Transcription (RT-03) evaluation test set is discussed.

Improving speech transcription for Mandarin-English translation

Relevância:

20.00% 20.00%

Publicador:

On the Benefits of Confidence Visualization in Speech Recognition

Relevância:

20.00% 20.00%

Publicador:

Speech Dasher: Fast Writing using Speech and Gaze

Relevância:

20.00% 20.00%

Publicador:

Computational study of viscous effects on lobed mixer flow features and performance

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This article describes a computational study of viscous effects on lobed mixer flowfields. The computations, which were carried out using a compressible, three-dimensional, unstructured-mesh Navier-Stokes solver, were aimed at assessing the impacts on mixer performance of inlet boundary-layer thickness and boundary-layer separation within the lobe. The geometries analyzed represent a class of lobed mixer configurations used in turbofan engines. Parameters investigated included lobe penetration angles from 22 to 45 deg, stream-to-stream velocity ratios from 0.5 to 1.0, and two inlet boundary-layer displacement thicknesses. The results show quantitatively the increasing influence of viscous effects as lobe penetration angle is increased. It is shown that the simple estimate of shed circulation given by Skebe et al. (Experimental Investigation of Three-Dimensional Forced Mixer Lobe Flow Field, AIAA Paper 88-3785, July, 1988) can be extended even to situations in which the flow is separated, provided an effective mixer exit angle and height are defined. An examination of different loss sources is also carried out to illustrate the relative contributions of mixing loss and of boundary-layer viscous effects in cases of practical interest.

Turbulence modeling for flows around convex features giving rapid eddy distortion

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Reynolds averaged Navier-Stokes model performances in the stagnation and wake regions for turbulent flows with relatively large Lagrangian length scales (generally larger than the scale of geometrical features) approaching small cylinders (both square and circular) is explored. The effective cylinder (or wire) diameter based Reynolds number, ReW ≤ 2.5 × 103. The following turbulence models are considered: a mixing-length; standard Spalart and Allmaras (SA) and streamline curvature (and rotation) corrected SA (SARC); Secundov's νt-92; Secundov et al.'s two equation νt-L; Wolfshtein's k-l model; the Explicit Algebraic Stress Model (EASM) of Abid et al.; the cubic model of Craft et al.; various linear k-ε models including those with wall distance based damping functions; Menter SST, k-ω and Spalding's LVEL model. The use of differential equation distance functions (Poisson and Hamilton-Jacobi equation based) for palliative turbulence modeling purposes is explored. The performance of SA with these distance functions is also considered in the sharp convex geometry region of an airfoil trailing edge. For the cylinder, with ReW ≈ 2.5 × 103 the mixing length and k-l models give strong turbulence production in the wake region. However, in agreement with eddy viscosity estimates, the LVEL and Secundov νt-92 models show relatively little cylinder influence on turbulence. On the other hand, two equation models (as does the one equation SA) suggest the cylinder gives a strong turbulence deficit in the wake region. Also, for SA, an order or magnitude cylinder diameter decrease from ReW = 2500 to 250 surprisingly strengthens the cylinder's disruptive influence. Importantly, results for ReW ≪ 250 are virtually identical to those for ReW = 250 i.e. no matter how small the cylinder/wire its influence does not, as it should, vanish. Similar tests for the Launder-Sharma k-ε, Menter SST and k-ω show, in accordance with physical reality, the cylinder's influence diminishing albeit slowly with size. Results suggest distance functions palliate the SA model's erroneous trait and improve its predictive performance in wire wake regions. Also, results suggest that, along the stagnation line, such functions improve the SA, mixing length, k-l and LVEL results. For the airfoil, with SA, the larger Poisson distance function increases the wake region turbulence levels by just under 5%. © 2007 Elsevier Inc. All rights reserved.

Discriminative language model adaptation for Mandarin broadcast speech transcription and translation

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper investigates unsupervised test-time adaptation of language models (LM) using discriminative methods for a Mandarin broadcast speech transcription and translation task. A standard approach to adapt interpolated language models to is to optimize the component weights by minimizing the perplexity on supervision data. This is a widely made approximation for language modeling in automatic speech recognition (ASR) systems. For speech translation tasks, it is unclear whether a strong correlation still exists between perplexity and various forms of error cost functions in recognition and translation stages. The proposed minimum Bayes risk (MBR) based approach provides a flexible framework for unsupervised LM adaptation. It generalizes to a variety of forms of recognition and translation error metrics. LM adaptation is performed at the audio document level using either the character error rate (CER), or translation edit rate (TER) as the cost function. An efficient parameter estimation scheme using the extended Baum-Welch (EBW) algorithm is proposed. Experimental results on a state-of-the-art speech recognition and translation system are presented. The MBR adapted language models gave the best recognition and translation performance and reduced the TER score by up to 0.54% absolute. © 2007 IEEE.

Exploiting Chinese character models to improve speech recognition performance

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The Chinese language is based on characters which are syllabic in nature. Since languages have syllabotactic rules which govern the construction of syllables and their allowed sequences, Chinese character sequence models can be used as a first level approximation of allowed syllable sequences. N-gram character sequence models were trained on 4.3 billion characters. Characters are used as a first level recognition unit with multiple pronunciations per character. For comparison the CU-HTK Mandarin word based system was used to recognize words which were then converted to character sequences. The character only system error rates for one best recognition were slightly worse than word based character recognition. However combining the two systems using log-linear combination gives better results than either system separately. An equally weighted combination gave consistent CER gains of 0.1-0.2% absolute over the word based standard system. Copyright © 2009 ISCA.

Robust noise reduction for speech and audio signals

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Statistical model-based methods are presented for the reconstruction of autocorrelated signals in impulsive plus continuous noise environments. Signals are modelled as autoregressive and noise sources as discrete and continuous mixtures of Gaussians, allowing for robustness in highly impulsive and non-Gaussian environments. Markov Chain Monte Carlo methods are used for reconstruction of the corrupted waveforms within a Bayesian probabilistic framework and results are presented for contaminated voice and audio signals.

Coarse-to-fine vision-based localization by indexing scale-invariant features

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents a novel coarse-to-fine global localization approach inspired by object recognition and text retrieval techniques. Harris-Laplace interest points characterized by scale-invariant transformation feature descriptors are used as natural landmarks. They are indexed into two databases: a location vector space model (LVSM) and a location database. The localization process consists of two stages: coarse localization and fine localization. Coarse localization from the LVSM is fast, but not accurate enough, whereas localization from the location database using a voting algorithm is relatively slow, but more accurate. The integration of coarse and fine stages makes fast and reliable localization possible. If necessary, the localization result can be verified by epipolar geometry between the representative view in the database and the view to be localized. In addition, the localization system recovers the position of the camera by essential matrix decomposition. The localization system has been tested in indoor and outdoor environments. The results show that our approach is efficient and reliable. © 2006 IEEE.

Image-based localization and pose recovery using scale invariant features

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we propose a vision based mobile robot localization strategy. Local scale-invariant features are used as natural landmarks in unstructured and unmodified environment. The local characteristics of the features we use prove to be robust to occlusion and outliers. In addition, the invariance of the features to viewpoint change makes them suitable landmarks for mobile robot localization. Scale-invariant features detected in the first exploration are indexed into a location database. Indexing and voting allow efficient recognition of global localization. The localization result is verified by epipolar geometry between the representative view in database and the view to be localized, thus the probability of false localization will be decreased. The localization system can recover the pose of the camera mounted on the robot by essential matrix decomposition. Then the position of the robot can be computed easily. Both calibrated and un-calibrated cases are discussed and relative position estimation based on calibrated camera turns out to be the better choice. Experimental results show that our approach is effective and reliable in the case of illumination changes, similarity transformations and extraneous features. © 2004 IEEE.

Improving speech transcription for Mandarin-english translation

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper describes the development of the CU-HTK Mandarin Speech-To-Text (STT) system and assesses its performance as part of a transcription-translation pipeline which converts broadcast Mandarin audio into English text. Recent improvements to the STT system are described and these give Character Error Rate (CER) gains of 14.3% absolute for a Broadcast Conversation (BC) task and 5.1% absolute for a Broadcast News (BN) task. The output of these STT systems is then post-processed, so that it consists of sentence-like segments, and translated into English text using a Statistical Machine Translation (SMT) system. The performance of the transcription-translation pipeline is evaluated using the Translation Edit Rate (TER) and BLEU metrics. It is shown that improving both the STT system and the post-STT segmentations can lower the TER scores by up to 5.3% absolute and increase the BLEU scores by up to 2.7% absolute. © 2007 IEEE.

The CU-HTK Mandarin broadcast news transcription system

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper discusses the development of the CU-HTK Mandarin Broadcast News (BN) transcription system. The Mandarin BN task includes a significant amount of English data. Hence techniques have been investigated to allow the same system to handle both Mandarin and English by augmenting the Mandarin training sets with English acoustic and language model training data. A range of acoustic models were built including models based on Gaussianised features, speaker adaptive training and feature-space MPE. A multi-branch system architecture is described in which multiple acoustic model types, alternate phone sets and segmentations can be used in a system combination framework to generate the final output. The final system shows state-of-the-art performance over a range of test sets. ©2006 British Crown Copyright.

Automatic transcription of conversational telephone speech

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper discusses the Cambridge University HTK (CU-HTK) system for the automatic transcription of conversational telephone speech. A detailed discussion of the most important techniques in front-end processing, acoustic modeling and model training, language and pronunciation modeling are presented. These include the use of conversation side based cepstral normalization, vocal tract length normalization, heteroscedastic linear discriminant analysis for feature projection, minimum phone error training and speaker adaptive training, lattice-based model adaptation, confusion network based decoding and confidence score estimation, pronunciation selection, language model interpolation, and class based language models. The transcription system developed for participation in the 2002 NIST Rich Transcription evaluations of English conversational telephone speech data is presented in detail. In this evaluation the CU-HTK system gave an overall word error rate of 23.9%, which was the best performance by a statistically significant margin. Further details on the derivation of faster systems with moderate performance degradation are discussed in the context of the 2002 CU-HTK 10 × RT conversational speech transcription system. © 2005 IEEE.

Language model cross adaptation for LVCSR system combination

Relevância:

20.00% 20.00%

Publicador:

Resumo:

State-of-the-art large vocabulary continuous speech recognition (LVCSR) systems often combine outputs from multiple subsystems developed at different sites. Cross system adaptation can be used as an alternative to direct hypothesis level combination schemes such as ROVER. In normal cross adaptation it is assumed that useful diversity among systems exists only at acoustic level. However, complimentary features among complex LVCSR systems also manifest themselves in other layers of modelling hierarchy, e.g., subword and word level. It is thus interesting to also cross adapt language models (LM) to capture them. In this paper cross adaptation of multi-level LMs modelling both syllable and word sequences was investigated to improve LVCSR system combination. Significant error rate gains up to 6.7% rel. were obtained over ROVER and acoustic model only cross adaptation when combining 13 Chinese LVCSR subsystems used in the 2010 DARPA GALE evaluation. © 2010 ISCA.

«
1
2
3
4
5
6
7
8
...
24
25
»