894 results for Machine translation system
Abstract:
This paper describes the UPM system for the Spanish-English translation task at the NAACL 2012 workshop on statistical machine translation. The system is based on Moses. We used all freely available corpora, cleaning them and removing some repetitions. We also propose a technique for selecting the sentences used to tune the system, based on their similarity to the sentences to be translated. With this approach, the BLEU score improves from 28.37% to 28.57%. In the WMT12 challenge, we obtained 31.80% BLEU on the 2012 test set. Finally, we describe several experiments carried out after the competition.
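The abstract does not say which similarity measure is used to pick tuning sentences, so the following is only a minimal sketch under an assumed one: word-level Jaccard overlap between each candidate development sentence and the source sentences to be translated. The function names `jaccard` and `select_tuning_sentences` and the sample sentences are hypothetical and are not taken from the paper.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Word-set overlap in [0, 1]; an assumed stand-in for the paper's similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_tuning_sentences(dev: list[str], test: list[str], k: int) -> list[str]:
    """Keep the k development sentences most similar to any test sentence."""
    test_sets = [set(s.lower().split()) for s in test]

    def score(sentence: str) -> float:
        words = set(sentence.lower().split())
        # Best match against the sentences we will actually have to translate
        return max((jaccard(words, t) for t in test_sets), default=0.0)

    # sorted() is stable, so equally scored sentences keep their original order
    return sorted(dev, key=score, reverse=True)[:k]


# Illustrative data only
dev_sentences = ["el gato negro", "la casa grande", "un perro pequeño"]
test_sentences = ["el gato blanco"]
selected = select_tuning_sentences(dev_sentences, test_sentences, k=1)
```

In a Moses-style setup, the selected subset would then replace the full development set during tuning; how the authors actually score and threshold sentences is not specified in the abstract.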
Abstract:
The language barrier prevents Latino students from experiencing academic success, and prevents Latino parents from participating in their children's education. Through a review of journal articles, research projects, doctoral dissertations, legislation, and books, this project studies the benefits and dangers of various methods of translating and interpreting in the education system, including issues created by language barriers in schools, common methods of translating and interpreting, and legislation addressing language barriers and education. The project reveals that schools use various methods to translate and interpret, including relying on children, school staff and machine translation, although such methods are often problematic and inaccurate. The project also reveals that professional translation and interpretation are superior to the various non-professional methods.
Abstract:
Paper presented at the Cross-Language Evaluation Forum (CLEF 2008), Aarhus, Denmark, September 17-19, 2008.
Abstract:
Mode of access: Internet.
Abstract:
A diagnostic method based on Bayesian Networks (probabilistic graphical models) is presented. Unlike conventional diagnostic approaches, in this method instead of focusing on system residuals at one or a few operating points, diagnosis is done by analyzing system behavior patterns over a window of operation. It is shown how this approach can loosen the dependency of diagnostic methods on precise system modeling while maintaining the desired characteristics of fault detection and diagnosis (FDD) tools (fault isolation, robustness, adaptability, and scalability) at a satisfactory level. As an example, the method is applied to fault diagnosis in HVAC systems, an area with considerable modeling and sensor network constraints.
Abstract:
In this paper, we describe a voting mechanism for accurate named entity (NE) translation in English–Chinese question answering (QA). This mechanism draws on translations from three different sources: machine translation, an online encyclopaedia, and web documents. The translation with the highest number of votes is selected. We evaluated this approach using the test collection, topics and assessment results from the NTCIR-8 evaluation forum. The mechanism achieved 95% accuracy in NE translation and 0.3756 MAP in English–Chinese cross-lingual information retrieval for QA.
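The voting step described above can be sketched as a simple majority count over the three sources. This is a minimal illustration, not the authors' implementation: the function name `vote_translation`, the source keys, the whitespace normalisation, and the tie-breaking rule (first source listed wins) are all assumptions, and the sample strings are made up.

```python
from collections import Counter
from typing import Optional


def vote_translation(candidates: dict[str, Optional[str]]) -> Optional[str]:
    """Return the translation proposed by the largest number of sources."""
    # Skip sources that returned nothing; strip whitespace so trivially
    # different strings count as the same vote (an assumed normalisation)
    votes = Counter(c.strip() for c in candidates.values() if c and c.strip())
    if not votes:
        return None
    # most_common() lists equal counts in first-seen order, so a tie goes
    # to the source listed first (an assumed tie-break)
    return votes.most_common(1)[0][0]


# Illustrative data only: two of the three sources agree
proposals = {
    "machine_translation": "translation A",
    "online_encyclopaedia": "translation B",
    "web_documents": "translation B",
}
winner = vote_translation(proposals)
```

When all three sources disagree, every candidate has one vote, so the outcome depends entirely on the tie-breaking rule; the abstract does not say how such cases are resolved.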
Abstract:
Nowadays people rely heavily on the Internet for information and knowledge. Wikipedia is an online multilingual encyclopaedia that contains a very large number of detailed articles covering most written languages. It is often considered to be a treasury of human knowledge. It includes extensive hypertext links between documents of the same language for easy navigation. However, the pages in different languages are rarely cross-linked except for direct equivalent pages on the same subject in different languages. This could pose serious difficulties to users seeking information or knowledge from different lingual sources, or where there is no equivalent page in one language or another. In this thesis, a new information retrieval task, cross-lingual link discovery (CLLD), is proposed to tackle the problem of the lack of cross-lingual anchored links in a knowledge base such as Wikipedia. In contrast to traditional information retrieval tasks, cross-lingual link discovery algorithms actively recommend a set of meaningful anchors in a source document and establish links to documents in an alternative language. In other words, cross-lingual link discovery is a way of automatically finding hypertext links between documents in different languages, which is particularly helpful for knowledge discovery in different language domains. This study is specifically focused on Chinese / English link discovery (C/ELD), a special case of the cross-lingual link discovery task. It involves natural language processing (NLP), cross-lingual information retrieval (CLIR) and cross-lingual link discovery. To justify the effectiveness of CLLD, a standard evaluation framework is also proposed. The evaluation framework includes topics, document collections, a gold standard dataset, evaluation metrics, and toolkits for run pooling, link assessment and system evaluation.
With the evaluation framework, the performance of CLLD approaches and systems can be quantified. This thesis contributes to the research on natural language processing and cross-lingual information retrieval in CLLD: 1) a new, simple but effective Chinese segmentation method, n-gram mutual information, is presented for determining the boundaries of Chinese text; 2) a voting mechanism for named entity translation is demonstrated for achieving high precision in English / Chinese machine translation; 3) a link mining approach that mines the existing link structure for anchor probabilities achieves encouraging results in suggesting cross-lingual Chinese / English links in Wikipedia. This approach was examined in the experiments for better, automatic generation of cross-lingual links that were carried out as part of the study. The overall major contribution of this thesis is the provision of a standard evaluation framework for cross-lingual link discovery research. It is important in CLLD evaluation to have this framework, which helps in benchmarking the performance of various CLLD systems and in identifying good CLLD realisation approaches. The evaluation methods and the evaluation framework described in this thesis have been utilised to quantify system performance in the NTCIR-9 Crosslink task, which is the first information retrieval track of this kind.
Abstract:
Results of an investigation dealing with the behaviour of grid-connected induction generators (GCIGs) driven by typical prime movers such as mini-hydro/wind turbines are presented. Certain practical operational problems of such systems are identified. Analytical techniques are developed to study the behaviour of such systems. The system consists of the induction generator (IG) feeding an 11 kV grid through a step-up transformer and a transmission line. Terminal capacitors to compensate for the lagging VAr are included in the study. Computer simulation was carried out to predict the system performance at the given input power from the turbine. Effects of variations in grid voltage, frequency, input power, and terminal capacitance on the machine and system performance are studied. An analysis of self-excitation conditions on disconnection of supply was carried out. The behaviour of a 220 kW hydel system and of 55/11 kW and 22 kW wind-driven systems corresponding to actual field conditions is discussed.
Abstract:
Scatter/Gather systems are becoming increasingly useful for browsing document corpora. The usability of present-day systems is restricted to monolingual corpora, and their methods for clustering and labeling do not easily extend to the multilingual setting, especially in the absence of dictionaries/machine translation. In this paper, we study the cluster labeling problem for multilingual corpora in the absence of machine translation, but using comparable corpora. Using a variational approach, we show that multilingual topic models can effectively handle the cluster labeling problem, which in turn allows us to design a novel Scatter/Gather system, ShoBha. Experimental results on three datasets, namely the Canadian Hansards corpus, the entire overlapping Wikipedia of English, Hindi and Bengali articles, and a trilingual news corpus containing 41,000 articles, confirm the utility of the proposed system.
Discriminative language model adaptation for Mandarin broadcast speech transcription and translation
Abstract:
This paper investigates unsupervised test-time adaptation of language models (LM) using discriminative methods for a Mandarin broadcast speech transcription and translation task. A standard approach to adapting interpolated language models is to optimize the component weights by minimizing the perplexity on supervision data. This is a widely made approximation for language modeling in automatic speech recognition (ASR) systems. For speech translation tasks, it is unclear whether a strong correlation still exists between perplexity and various forms of error cost functions in the recognition and translation stages. The proposed minimum Bayes risk (MBR) based approach provides a flexible framework for unsupervised LM adaptation. It generalizes to a variety of forms of recognition and translation error metrics. LM adaptation is performed at the audio document level using either the character error rate (CER) or the translation edit rate (TER) as the cost function. An efficient parameter estimation scheme using the extended Baum-Welch (EBW) algorithm is proposed. Experimental results on a state-of-the-art speech recognition and translation system are presented. The MBR adapted language models gave the best recognition and translation performance and reduced the TER score by up to 0.54% absolute. © 2007 IEEE.
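For orientation, the perplexity-based baseline mentioned in the abstract can be written in its usual textbook form. The notation here is an assumption and is not taken from the paper: $P_i$ are the component language models, $\lambda_i$ their interpolation weights, and $w_1, \dots, w_N$ the words of the supervision data with histories $h_n$.

```latex
P(w \mid h) = \sum_{i} \lambda_i \, P_i(w \mid h),
\qquad \sum_{i} \lambda_i = 1, \quad \lambda_i \ge 0

\hat{\lambda}
  = \arg\min_{\lambda} \,
    \exp\!\left( -\frac{1}{N} \sum_{n=1}^{N}
      \log \sum_{i} \lambda_i \, P_i(w_n \mid h_n) \right)
```

The MBR-based approach in the abstract replaces this perplexity objective with an expected error cost such as CER or TER; its exact formulation is not given in the abstract and is not reproduced here.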
Abstract:
148 p.: graphs.
Abstract:
In recent years, the use of morphological decomposition strategies for Arabic Automatic Speech Recognition (ASR) has become increasingly popular. Systems trained on morphologically decomposed data are often used in combination with standard word-based approaches, and they have been found to yield consistent performance improvements. The present article contributes to this ongoing research endeavour by exploring the use of the 'Morphological Analysis and Disambiguation for Arabic' (MADA) tools for this purpose. System integration issues concerning language modelling and dictionary construction, as well as the estimation of pronunciation probabilities, are discussed. In particular, a novel solution for morpheme-to-word conversion is presented which makes use of an N-gram Statistical Machine Translation (SMT) approach. System performance is investigated within a multi-pass adaptation/combination framework. All the systems described in this paper are evaluated on an Arabic large vocabulary speech recognition task which includes both Broadcast News and Broadcast Conversation test data. It is shown that the use of MADA-based systems, in combination with word-based systems, can reduce the Word Error Rates by up to 8.1% relative. © 2012 Elsevier Ltd. All rights reserved.
Abstract:
This paper investigates the use of plug-in parking lots (SmartPark) as integral energy storage to improve small-signal stability using plug-in electric vehicles (PEV). The paper establishes the Phillips-Heffron model of a power system for a SmartPark solution. Based on this model, SmartPark-based stabilisers have been designed using phase compensation to improve power system oscillation stability. The effectiveness of stabilisation superimposed on the active and reactive power regulators is verified by simulations obtained from a multi-machine power system model with SmartPark and a large-scale wind farm inclusion.
Abstract:
This paper presents a case study of a PMU application with PSS support in a real large-scale Chinese power system to suppress inter-area oscillations. The paper uses PMU-measured feedback signals from a PSS input signal for dynamic torque analysis (DTA). A mathematical model of a multi-machine power system is described, followed by the formation of the residue and DTA indices. Simulations of the model are used with a large-scale power system model to demonstrate the role of the PSS and the equivalence of the DTA and residue indices.
Abstract:
Inspired by the commercial application of the Exechon machine, this paper proposes a novel parallel kinematic machine (PKM) named Exe-Variant. By exchanging the sequence of kinematic pairs in each limb of the Exechon machine, the Exe-Variant PKM adopts a 2UPR/1SPR topology, consisting of two identical UPR limbs and one SPR limb. The inverse kinematics of the 2UPR/1SPR parallel mechanism is first analyzed, based on which a conceptual design of the Exe-Variant is carried out. Then an algorithm for searching the reachable workspace of the Exe-Variant and the Exechon is proposed. Finally, the workspaces of two example systems of the Exechon and the Exe-Variant with approximate dimensions are numerically simulated and compared. The comparison shows that the Exe-Variant possesses a workspace competitive with that of the Exechon machine, indicating it can be used as a promising reconfigurable module in a hybrid 5-DOF machine tool system.