Digital Library

This paper investigates unsupervised test-time adaptation of language models (LMs) using discriminative methods for a Mandarin broadcast speech transcription and translation task. A standard approach to adapting interpolated language models is to optimize the component weights by minimizing the perplexity on supervision data. This is a widely made approximation for language modeling in automatic speech recognition (ASR) systems. For speech translation tasks, it is unclear whether a strong correlation still exists between perplexity and the various forms of error cost functions in the recognition and translation stages. The proposed minimum Bayes risk (MBR) based approach provides a flexible framework for unsupervised LM adaptation. It generalizes to a variety of recognition and translation error metrics. LM adaptation is performed at the audio document level using either the character error rate (CER) or the translation edit rate (TER) as the cost function. An efficient parameter estimation scheme using the extended Baum-Welch (EBW) algorithm is proposed. Experimental results on a state-of-the-art speech recognition and translation system are presented. The MBR adapted language models gave the best recognition and translation performance and reduced the TER score by up to 0.54% absolute. © 2007 IEEE.
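The perplexity-based baseline mentioned in the abstract above can be illustrated with a minimal sketch. It is not the paper's MBR/EBW method: it shows only the standard expectation-maximization update for linear interpolation weights, which lowers perplexity on the adaptation text. The function names and the toy probabilities are assumptions made for illustration.

```python
import math

def em_interpolation_weights(component_probs, iterations=50):
    """Estimate linear interpolation weights by EM.

    component_probs: list of tuples, one per word in the adaptation text;
    each tuple holds the probability each component LM assigns to that word.
    (Hypothetical input format, chosen for this sketch.)
    """
    k = len(component_probs[0])
    weights = [1.0 / k] * k  # start from uniform weights
    for _ in range(iterations):
        counts = [0.0] * k
        for probs in component_probs:
            mix = sum(w * p for w, p in zip(weights, probs))
            # Posterior responsibility of each component for this word
            for i in range(k):
                counts[i] += weights[i] * probs[i] / mix
        weights = [c / len(component_probs) for c in counts]
    return weights

def perplexity(component_probs, weights):
    """Perplexity of the interpolated model on the same text."""
    log_sum = sum(
        math.log(sum(w * p for w, p in zip(weights, probs)))
        for probs in component_probs
    )
    return math.exp(-log_sum / len(component_probs))

# Toy data: two component LMs scored on four words (made-up values).
data = [(0.5, 0.1), (0.4, 0.1), (0.3, 0.2), (0.05, 0.3)]
weights = em_interpolation_weights(data)
print(weights, perplexity(data, weights))
```

Each EM iteration cannot increase perplexity on the text it is run on, which is why this criterion is convenient; the abstract's point is that lower perplexity need not translate into lower CER or TER.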

Language model combination and adaptation using weighted finite state transducers

Abstract:

In speech recognition systems, language models (LMs) are often constructed by training and combining multiple n-gram models. They can be used either to represent different genres or tasks found in diverse text sources, or to capture stochastic properties of different linguistic symbol sequences, for example, syllables and words. Unsupervised LM adaptation may also be used to further improve robustness to varying styles or tasks. When using these techniques, extensive software changes are often required. In this paper an alternative and more general approach based on weighted finite state transducers (WFSTs) is investigated for LM combination and adaptation. As it is entirely based on well-defined WFST operations, minimal changes to decoding tools are needed. A wide range of LM combination configurations can be flexibly supported. An efficient on-the-fly WFST decoding algorithm is also proposed. Significant error rate gains of 7.3% relative were obtained on a state-of-the-art broadcast audio recognition task using a history dependently adapted multi-level LM modelling both syllable and word sequences. © 2010 IEEE.

Hydrogen content estimation of hydrogenated amorphous carbon by visible Raman spectroscopy

Abstract:

In the present study, we report the hydrogen content estimation of hydrogenated amorphous carbon (a-C:H) films using visible Raman spectroscopy in a fast and nondestructive way. Hydrogenated diamondlike carbon films were deposited by plasma enhanced chemical vapor deposition, plasma beam source, and integrated distributed electron cyclotron resonance techniques. Methane and acetylene were used as source gases, resulting in different hydrogen contents and sp2/sp3 fractions. Ultraviolet-visible (UV-Vis) spectroscopic ellipsometry (1.5-5 eV) as well as UV-Vis spectroscopy provided the optical band gap (Tauc gap). The sp2/sp3 fraction and the hydrogen content were independently estimated by electron energy loss spectroscopy and elastic recoil detection analysis-Rutherford back scattering, respectively. The Raman spectra acquired in the visible region using the 488 nm line show the superposition of Raman features on a photoluminescence (PL) background. The direct relationship between the sp2 content and the optical band gap has been confirmed. The difference in the PL background for samples of the same optical band gap (sp2 content) and different hydrogen content was demonstrated, and an empirical relationship between the visible Raman spectra PL background slope and the corresponding hydrogen content was extracted. © 2004 American Institute of Physics.

Combining interest points and edges for content-based image retrieval

Abstract:

This paper presents a novel approach using combined features to retrieve images containing specific objects, scenes or buildings. The content of an image is characterized by two kinds of features: Harris-Laplace interest points described by the SIFT descriptor, and edges described by the edge color histogram. Edges and corners contain the maximal amount of information necessary for image retrieval. The feature detection in this work is an integrated process: edges are detected directly based on the Harris function; Harris interest points are detected at several scales, and Harris-Laplace interest points are found using the Laplace function. The combination of edges and interest points brings efficient feature detection and a high recognition ratio to the image retrieval system. Experimental results show this system has good performance. © 2005 IEEE.
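As a rough illustration of how one Harris function can serve both edge and corner detection, the sketch below uses the standard Harris-Stephens response R = det(M) - k·trace(M)², where M is the smoothed structure tensor at a pixel. The abstract does not give the paper's exact edge criterion, so the sign-based labelling, the constant `k = 0.04`, and the `threshold` value are assumptions taken from the common textbook formulation.

```python
def harris_response(sxx, syy, sxy, k=0.04):
    """Harris response from structure-tensor entries at one pixel.

    sxx, syy, sxy are the locally smoothed products of image gradients
    (Ix*Ix, Iy*Iy, Ix*Iy). k is the usual empirical constant.
    """
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace

def classify(sxx, syy, sxy, k=0.04, threshold=1e-3):
    """Label a pixel by the sign of its Harris response (assumed criterion)."""
    r = harris_response(sxx, syy, sxy, k)
    if r > threshold:
        return "corner"  # strong gradients in two directions
    if r < -threshold:
        return "edge"    # strong gradient in one direction only
    return "flat"        # no significant gradient

print(classify(1.0, 1.0, 0.0))  # gradients in both directions
print(classify(1.0, 0.0, 0.0))  # gradient in one direction
print(classify(0.0, 0.0, 0.0))  # uniform region
```

Because the same response map yields both labels, edges and interest points can be extracted in a single pass over the gradient products, which is consistent with the "integrated process" the abstract describes.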

Analysis of the ellipsometric spectra of amorphous carbon thin films for evaluation of the sp3-bonded carbon content

Abstract:

Using spectroscopic ellipsometry (SE), we have measured the optical properties and optical gaps of a series of amorphous carbon (a-C) films ∼ 100-300 Å thick, prepared using a filtered beam of C+ ions from a cathodic arc. Such films exhibit a wide range of sp3-bonded carbon contents from 20 to 76 at.%, as measured by electron energy loss spectroscopy (EELS). The Tauc optical gaps of the a-C films increase monotonically from 0.65 eV for 20 at.% sp3 C to 2.25 eV for 76 at.% sp3 C. Spectra in the ellipsometric angles (1.5-5 eV) have been analyzed using different effective medium theories (EMTs) applying a simplified optical model for the dielectric function of a-C, assuming a composite material with sp2 C and sp3 C components. The most widely used EMT, namely that of Bruggeman (with three-dimensionally isotropic screening), yields atomic fractions of sp3 C that correlate monotonically with those obtained from EELS. The results of the SE analysis, however, range from 10 to 25 at.% higher than those from EELS. In fact, we have found that the volume percent sp3 C from SE using the Bruggeman EMT shows good numerical agreement with the atomic percent sp3 C from EELS. The SE-EELS discrepancy has been reduced by using an optical model in which the dielectric function of the a-C is determined as a volume-fraction-weighted average of the dielectric functions of the sp2 C and sp3 C components. © 1998 Elsevier Science S.A.
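The two mixing rules named in the abstract can be compared with a small sketch. It solves the textbook two-component Bruggeman equation with isotropic screening, f1(ε1 - ε)/(ε1 + 2ε) + f2(ε2 - ε)/(ε2 + 2ε) = 0, which reduces to the quadratic 2ε² - bε - ε1ε2 = 0 with b = (3f1 - 1)ε1 + (3f2 - 1)ε2, and contrasts it with the volume-fraction-weighted average. The dielectric values are made-up placeholders, not data from the paper, and the root selection assumes the convention in which absorbing media have a non-negative imaginary part.

```python
import cmath

def bruggeman(eps1, eps2, f1):
    """Effective dielectric function of a two-component Bruggeman mixture.

    eps1, eps2: component dielectric functions (real or complex).
    f1: volume fraction of component 1; component 2 fills the rest.
    """
    f2 = 1.0 - f1
    b = (3 * f1 - 1) * eps1 + (3 * f2 - 1) * eps2
    root = cmath.sqrt(b * b + 8 * eps1 * eps2)
    candidates = [(b + root) / 4, (b - root) / 4]
    # Assumed physical root: non-negative imaginary part, then larger real part
    return max(candidates, key=lambda e: (e.imag >= -1e-12, e.real))

def volume_average(eps1, eps2, f1):
    """Volume-fraction-weighted average of the component dielectric functions."""
    return f1 * eps1 + (1.0 - f1) * eps2

# Hypothetical component values at a single photon energy.
eps_sp3, eps_sp2, f_sp3 = 4.0, 2.0, 0.3
print(bruggeman(eps_sp3, eps_sp2, f_sp3))
print(volume_average(eps_sp3, eps_sp2, f_sp3))
```

In an actual fit, the volume fraction would be adjusted until the modelled ellipsometric spectra match the measured ones; converting the fitted volume fraction to an atomic fraction additionally requires the component densities, which this sketch does not model.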

Language model cross adaptation for LVCSR system combination

Abstract:

State-of-the-art large vocabulary continuous speech recognition (LVCSR) systems often combine outputs from multiple subsystems developed at different sites. Cross system adaptation can be used as an alternative to direct hypothesis level combination schemes such as ROVER. In normal cross adaptation it is assumed that useful diversity among systems exists only at the acoustic level. However, complementary features among complex LVCSR systems also manifest themselves in other layers of the modelling hierarchy, e.g., the subword and word levels. It is thus interesting to also cross adapt language models (LMs) to capture them. In this paper cross adaptation of multi-level LMs modelling both syllable and word sequences was investigated to improve LVCSR system combination. Significant error rate gains of up to 6.7% relative were obtained over ROVER and acoustic model only cross adaptation when combining 13 Chinese LVCSR subsystems used in the 2010 DARPA GALE evaluation. © 2010 ISCA.
