227 resultados para sistemi integrati, CAT tools, machine translation
em Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Resumo:
This paper presents a webservice architecture for Statistical Machine Translation aimed at non-technical users. A workfloweditor allows a user to combine different webservices using a graphical user interface. In the current state of this project,the webservices have been implemented for a range of sentential and sub-sententialaligners. The advantage of a common interface and a common data format allows the user to build workflows exchanging different aligners.
Resumo:
This article describes the developmentof an Open Source shallow-transfer machine translation system from Czech to Polish in theApertium platform. It gives details ofthe methods and resources used in contructingthe system. Although the resulting system has quite a high error rate, it is still competitive with other systems.
Resumo:
This paper proposes to enrich RBMTdictionaries with Named Entities(NEs) automatically acquired fromWikipedia. The method is appliedto the Apertium English-Spanishsystem and its performance comparedto that of Apertium with and withouthandtagged NEs. The system withautomatic NEs outperforms the onewithout NEs, while results vary whencompared to a system with handtaggedNEs (results are comparable forSpanish to English but slightly worstfor English to Spanish). Apart fromthat, adding automatic NEs contributesto decreasing the amount of unknownterms by more than 10%.
Resumo:
This paper discusses the qualitativecomparative evaluation performed on theresults of two machine translation systemswith different approaches to the processing ofmulti-word units. It proposes a solution forovercoming the difficulties multi-word unitspresent to machine translation by adopting amethodology that combines the lexicongrammar approach with OpenLogos ontologyand semantico-syntactic rules. The paper alsodiscusses the importance of a qualitativeevaluation metrics to correctly evaluate theperformance of machine translation engineswith regards to multi-word units.
Resumo:
Softcatalà is a non-profit associationcreated more than 10 years ago to fightthe marginalisation of the Catalan languagein information and communicationtechnologies. It has led the localisationof many applications and thecreation of a website which allows itsusers to translate texts between Spanishand Catalan using an external closed-sourcetranslation engine. Recently,the closed-source translation back-endhas been replaced by a free/open-sourcesolution completely managed by Softcatalà: the Apertium machine translationplatform and the ScaleMT web serviceframework. Thanks to the opennessof the new solution, it is possibleto take advantage of the huge amount ofusers of the Softcatalà translation serviceto improve it, using a series ofmethods presented in this paper. In addition,a study of the translations requestedby the users has been carriedout, and it shows that the translationback-end change has not affected theusage patterns.
Resumo:
This paper describes the development of a two-way shallow-transfer rule-based machine translation system between Bulgarian and Macedonian. It gives an account of the resources and the methods used for constructing the system, including the development of monolingual and bilingual dictionaries, syntactic transfer rules and constraint grammars. An evaluation of thesystem's performance was carried out and compared to another commercially available MT system for the two languages. Some future work was suggested.
Estudi comparatiu per ala millora dels resultatsdels traductorsautomàtics de l’empresaAutomaticTrans
Resumo:
Aquest treball pretén millorar els resultats dels traductors automàtics de l’empresa AutomaticTrans i la traducció a l’agència de notícies EuropaPress mitjançant la comparació d’un corpus de notícies en castellà amb la corresponent traducció al català per dos traductors automàtics: l’ATS1, utilitzat per EuropaPress, i l’ATS4, l’última versió del traductor
Resumo:
This documents sums up a projectaimed at building a new web interfaceto the Apertium machine translationplatform, including pre-editing andpost-editing environments. It containsa description of the accomplished workon this project, as well as an overviewof possible evolutions.
Resumo:
This paper presents the platform developed in the PANACEA project, a distributed factory that automates the stages involved in the acquisition, production, updating and maintenance of Language Resources required by Machine Translation and other Language Technologies. We adopt a set of tools that have been successfully used in the Bioinformatics field, they are adapted to the needs of our field and used to deploy web services, which can be combined to build more complex processing chains (workflows). This paper describes the platform and its different components (web services, registry, workflows, social network and interoperability). We demonstrate the scalability of the platform by carrying out a set of massive data experiments. Finally, a validation of the platform across a set of required criteria proves its usability for different types of users (non-technical users and providers).
Resumo:
Este artículo describe una metodología de construcción de WordNets que se basa en la traducción automática de un corpus en inglés desambiguado por sentidos. El corpus que utilizamos está formado por las propias glosas de WN 3.0 etiquetadas semánticamente y por el corpus Semcor. Los resultados de precisión son comparables a los obtenidos mediante métodos basados en diccionarios bilingües para las mismas lenguas. La metodología descrita se está utilizando, en combinación con otras estrategias, en la creación de los WordNets 3.0 del español y catalán.
Resumo:
The objective of PANACEA is to build a factory of LRs that automates the stages involved in the acquisition, production, updating and maintenance of LRs required by MT systems and by other applications based on language technologies, and simplifies eventual issues regarding intellectual property rights. This automation will cut down the cost, time and human effort significantly. These reductions of costs and time are the only way to guarantee the continuous supply of LRs that MT and other language technologies will be demanding in the multilingual Europe.
Resumo:
The objective of the PANACEA ICT-2007.2.2 EU project is to build a platform that automates the stages involved in the acquisition,production, updating and maintenance of the large language resources required by, among others, MT systems. The development of a Corpus Acquisition Component (CAC) for extracting monolingual and bilingual data from the web is one of the most innovative building blocks of PANACEA. The CAC, which is the first stage in the PANACEA pipeline for building Language Resources, adopts an efficient and distributed methodology to crawl for web documents with rich textual content in specific languages and predefined domains. The CAC includes modules that can acquire parallel data from sites with in-domain content available in more than one language. In order to extrinsically evaluate the CAC methodology, we have conducted several experiments that used crawled parallel corpora for the identification and extraction of parallel sentences using sentence alignment. The corpora were then successfully used for domain adaptation of Machine Translation Systems.
Resumo:
This paper presents a novel efficiencybased evaluation of sentence and word aligners. This assessment is critical in order to make a reliable use in industrial scenarios. The evaluation shows that the resourcesrequired by aligners differ rather broadly. Subsequently, we establish limitation mechanisms on a set of aligners deployed as web services. These results, paired with the quality expected from the aligners, allow providers to choose the most appropriate aligner according to the task at hand.
Resumo:
Extensible Dependency Grammar (XDG; Debusmann, 2007) is a flexible, modular dependency grammarframework in which sentence analyses consist of multigraphs and processing takes the form of constraint satisfaction. This paper shows how XDGlends itself to grammar-driven machine translation and introduces the machinery necessary for synchronous XDG. Since the approach relies on a shared semantics, it resembles interlingua MT.It differs in that there are no separateanalysis and generation phases. Rather, translation consists of the simultaneousanalysis and generation of a single source-target sentence.
Resumo:
There are a number of morphological analysers for Polish. Most of these, however, are non-free resources. What is more, different analysers employ different tagsets and tokenisation strategies. This situation calls for a simpleand universal framework to join different sources of morphological information, including the existing resources as well as user-provided dictionaries. We present such a configurable framework that allows to write simple configuration files that define tokenisation strategies and the behaviour of morphologicalanalysers, including simple tagset conversion.