15 results for Interpreting and translation
in Cambridge University Engineering Department Publications Database
Discriminative language model adaptation for Mandarin broadcast speech transcription and translation
Abstract:
This paper investigates unsupervised test-time adaptation of language models (LM) using discriminative methods for a Mandarin broadcast speech transcription and translation task. A standard approach to adapting interpolated language models is to optimize the component weights by minimizing the perplexity on supervision data. This is a widely made approximation for language modeling in automatic speech recognition (ASR) systems. For speech translation tasks, it is unclear whether a strong correlation still exists between perplexity and various forms of error cost functions in recognition and translation stages. The proposed minimum Bayes risk (MBR) based approach provides a flexible framework for unsupervised LM adaptation. It generalizes to a variety of forms of recognition and translation error metrics. LM adaptation is performed at the audio document level using either the character error rate (CER) or the translation edit rate (TER) as the cost function. An efficient parameter estimation scheme using the extended Baum-Welch (EBW) algorithm is proposed. Experimental results on a state-of-the-art speech recognition and translation system are presented. The MBR adapted language models gave the best recognition and translation performance and reduced the TER score by up to 0.54% absolute. © 2007 IEEE.
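The perplexity-based baseline mentioned in this abstract (not the paper's MBR/EBW method) can be illustrated with a minimal sketch. It assumes each component language model has already assigned a probability to every word of the supervision text; the function names and the toy probabilities are hypothetical, and the weights are re-estimated with the standard EM update for linear interpolation.

```python
import math

def perplexity(component_probs, weights):
    """Perplexity of the interpolated model on the supervision words.

    component_probs[i][k] is the probability that component LM k
    assigns to the i-th word given its history (assumed precomputed).
    """
    log_sum = 0.0
    for probs in component_probs:
        mix = sum(w * p for w, p in zip(weights, probs))
        log_sum += math.log(mix)
    return math.exp(-log_sum / len(component_probs))

def em_interpolation_weights(component_probs, n_iter=50):
    """Estimate interpolation weights that lower perplexity, using EM."""
    n_components = len(component_probs[0])
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        counts = [0.0] * n_components
        for probs in component_probs:
            mix = sum(w * p for w, p in zip(weights, probs))
            # Posterior responsibility of each component for this word.
            for k in range(n_components):
                counts[k] += weights[k] * probs[k] / mix
        weights = [c / len(component_probs) for c in counts]
    return weights

# Toy supervision data: two component LMs, four words (made-up values).
toy_probs = [(0.1, 0.4), (0.2, 0.3), (0.05, 0.5), (0.3, 0.1)]
uniform = [0.5, 0.5]
adapted = em_interpolation_weights(toy_probs)
```

Each EM iteration cannot increase the perplexity on the supervision data, so `adapted` is at least as good as `uniform` under that criterion. Whether lower perplexity also means lower CER or TER is exactly what the abstract calls into question.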
Abstract:
Small RNAs have several important biological functions. MicroRNAs (miRNAs) and trans-acting small interfering RNAs (tasiRNAs) regulate mRNA stability and translation, and siRNAs cause post-transcriptional gene silencing of transposons, viruses and transgenes and are important in both the establishment and maintenance of cytosine DNA methylation. Here, we study the role of the four Arabidopsis thaliana DICER-LIKE genes (DCL1-DCL4) in these processes. Sequencing of small RNAs from a dcl2 dcl3 dcl4 triple mutant showed markedly reduced tasiRNA and siRNA production and indicated that DCL1, in addition to its role as the major enzyme for processing miRNAs, has a previously unknown role in the production of small RNAs from endogenous inverted repeats. DCL2, DCL3 and DCL4 showed functional redundancy in siRNA and tasiRNA production and in the establishment and maintenance of DNA methylation. Our studies also suggest that asymmetric DNA methylation can be maintained by pathways that do not require siRNAs.
Abstract:
The brain extracts useful features from a maelstrom of sensory information, and a fundamental goal of theoretical neuroscience is to work out how it does so. One proposed feature extraction strategy is motivated by the observation that the meaning of sensory data, such as the identity of a moving visual object, is often more persistent than the activation of any single sensory receptor. This notion is embodied in the slow feature analysis (SFA) algorithm, which uses “slowness” as an heuristic by which to extract semantic information from multi-dimensional time-series. Here, we develop a probabilistic interpretation of this algorithm showing that inference and learning in the limiting case of a suitable probabilistic model yield exactly the results of SFA. Similar equivalences have proved useful in interpreting and extending comparable algorithms such as independent component analysis. For SFA, we use the equivalent probabilistic model as a conceptual spring-board, with which to motivate several novel extensions to the algorithm.
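For orientation, the classical (non-probabilistic) linear SFA that this abstract takes as its starting point can be sketched as below. This is a minimal illustration assuming NumPy is available, not the probabilistic model the paper develops; the function name `linear_sfa` and the toy signals are hypothetical.

```python
import numpy as np

def linear_sfa(x, n_features=1):
    """Linear slow feature analysis on a (time, dimension) array.

    Returns the slowest unit-variance linear features and the
    projection matrix that produces them from the centred input.
    """
    x = x - x.mean(axis=0)
    # Whiten the input so every direction has unit variance.
    cov = x.T @ x / len(x)
    evals, evecs = np.linalg.eigh(cov)
    whiten = evecs / np.sqrt(evals)
    z = x @ whiten
    # Slowness: minimise the variance of the temporal differences.
    dz = np.diff(z, axis=0)
    dcov = dz.T @ dz / len(dz)
    _, d_evecs = np.linalg.eigh(dcov)  # eigenvalues in ascending order
    w = whiten @ d_evecs[:, :n_features]
    return x @ w, w

# Toy data: a slow and a fast sinusoid, linearly mixed.
t = np.linspace(0.0, 2.0 * np.pi, 2000)
slow = np.sin(t)
fast = np.sin(40.0 * t)
mixed = np.column_stack([slow + 0.5 * fast, 0.3 * slow - fast])
features, projection = linear_sfa(mixed)
```

On this toy mixture the first extracted feature should track the slow sinusoid up to sign and scale, which is the "slowness" heuristic the abstract describes.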
Abstract:
This paper investigates several approaches to bootstrapping a new spoken language understanding (SLU) component in a target language given a large dataset of semantically-annotated utterances in some other source language. The aim is to reduce the cost associated with porting a spoken dialogue system from one language to another by minimising the amount of data required in the target language. Since word-level semantic annotations are costly, Semantic Tuple Classifiers (STCs) are used in conjunction with statistical machine translation models, both of which are trained from unaligned data to further reduce development time. The paper presents experiments in which a French SLU component in the tourist information domain is bootstrapped from English data. Results show that training STCs on automatically translated data produced the best performance for predicting the utterance's dialogue act type; however, individual slot/value pairs are best predicted by training STCs on the source language and using them to decode translated utterances. © 2010 ISCA.
Abstract:
This paper introduces a rule-based classification of single-word and compound verbs into a statistical machine translation approach. By substituting verb forms with the lemma of their head verb, the data sparseness problem caused by highly-inflected languages can be successfully addressed. On the other hand, the information from seen verb forms can be used to generate new translations for unseen verb forms. Translation results for an English to Spanish task are reported, producing a significant performance improvement.
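The substitution step described in this abstract can be pictured with a toy sketch. The auxiliary list, the lemma table, and the function name below are hypothetical stand-ins chosen for illustration; the paper's actual rule set is not given in the abstract.

```python
# Hypothetical, deliberately tiny resources for illustration only.
AUXILIARIES = {"has", "have", "had", "is", "are", "was", "were",
               "been", "being", "will", "would"}
LEMMAS = {"working": "work", "worked": "work", "works": "work",
          "gone": "go", "went": "go", "goes": "go"}

def lemmatize_verb_groups(tokens):
    """Replace each verb group (auxiliaries + head verb) by the head's lemma."""
    out = []
    i = 0
    while i < len(tokens):
        # Skip over any run of auxiliaries starting at position i.
        j = i
        while j < len(tokens) and tokens[j] in AUXILIARIES:
            j += 1
        if j < len(tokens) and tokens[j] in LEMMAS:
            # Auxiliaries plus a known head verb collapse to one lemma.
            out.append(LEMMAS[tokens[j]])
            i = j + 1
        else:
            # No known head verb follows: keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out

compound = lemmatize_verb_groups("she has been working here".split())
simple = lemmatize_verb_groups("she works here".split())
```

Both the compound form and the simple form reduce to the same token sequence, which is how the substitution reduces sparseness: several surface forms share the statistics of a single lemma.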
Abstract:
In this paper, a method to incorporate linguistic information regarding single-word and compound verbs is proposed, as a first step towards an SMT model based on linguistically-classified phrases. By substituting these verb structures with the base form of the head verb, we achieve a better statistical word alignment performance, and are able to better estimate the translation model and generalize to unseen verb forms during translation. Preliminary experiments for the English-Spanish language pair are performed, and future research lines are detailed. © 2005 Association for Computational Linguistics.
Abstract:
The most widespread vibration measurement on musical instrument bodies is of the point mobility at the bridge. Analysis of such measurements is presented, with a view to assessing what range of information could feasibly be extracted from the corpus of data. Analysis approaches include (1) pole-residue extraction; (2) damping trend analysis based on time decay information; (3) statistical estimates based on SEA power-balance and variance theory. Comparative results are shown for some key quantities. Damping trends with frequency are shown to have unexpectedly different forms for violins and for guitars. Linear averaging to estimate the "direct field" component gives a simple and clear visualisation of any local resonance behaviour near the bridge, such as the "bridge hill", and reveals some violins that show a double hill, while viols show only weak hills, and guitars none at all. © S. Hirzel Verlag · EAA.