This work aims to improve the output of the machine translation systems developed by the company AutomaticTrans, and translation at the news agency EuropaPress, by comparing a corpus of news items in Spanish with their corresponding Catalan translations produced by two machine translation systems: ATS1, used by EuropaPress, and ATS4, the latest version of the translator.


This project was carried out for a client that operates about 100 coin-operated, timed internet-access computers located on different premises, and that had observed growing demand for a printing service, among other things for Ryanair check-in. The aim of the project is to create a system that charges for printouts in advance and automatically, in a way that is also clear to the end customer. The system must operate autonomously, freeing the staff on the premises from printer management tasks. It must work on both Linux and Windows XP or later. The electronics and software for the print server will be designed, as well as the communications between the print server and the coin-operated computers. On the coin-operated computers, communication with the coin acceptor will be implemented in order to track the available credit and deduct the time corresponding to the printouts. In addition, a user interface will be built that shows the user the price of the printouts, the time that will be deducted, the time currently available, and the time that will remain after printing. On the same screen the user will be given the option to accept or reject the print job before it is deducted from the available time.
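
The quote shown on the confirmation screen can be sketched as follows. This is only an illustration of the arithmetic described above, not the project's implementation: the function name, the `PrintQuote` fields, and the default rates (`price_per_page_cents`, `cents_per_minute`) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PrintQuote:
    price_cents: int        # price of the print job
    seconds_deducted: int   # time that will be deducted for it
    seconds_available: int  # time the user has right now
    seconds_remaining: int  # time left if the user accepts
    affordable: bool        # False when the credit does not cover the job


def quote_print_job(pages, seconds_available,
                    price_per_page_cents=10, cents_per_minute=5):
    """Convert a print job into the session time it costs (hypothetical rates)."""
    if pages < 0 or seconds_available < 0:
        raise ValueError("pages and seconds_available must be non-negative")
    price = pages * price_per_page_cents
    # Ceiling division: round the deducted time up to a whole second.
    seconds = -(-price * 60 // cents_per_minute)
    remaining = seconds_available - seconds
    return PrintQuote(price, seconds, seconds_available, remaining, remaining >= 0)


def settle(quote, accepted):
    """Return the user's available time after accepting or rejecting the quote."""
    if accepted and quote.affordable:
        return quote.seconds_remaining
    return quote.seconds_available
```

Keeping the quote separate from the settlement step mirrors the requirement that nothing is deducted until the user accepts.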


The objective of this work was to evaluate the performance of a probabilistic stratified point sampling model and to define an adequate sample size for estimating the soybean-cultivated area in Rio Grande do Sul. The area was stratified according to the percentage of soybean cultivated in each municipality of the state: less than 20%, 20 to 40%, and more than 40%. Estimates were evaluated for six sample sizes, resulting from the combination of three significance levels (10, 5, and 1%) and two sampling error values (5 and 2.5%). For each sample size, 400 random draws were performed. The estimates were evaluated against the soybean area obtained from a reference thematic map, derived from a careful automatic and visual classification of multitemporal TM/Landsat-5 and ETM+/Landsat-7 satellite images available for the 2000/2001 crop season. The soybean area in Rio Grande do Sul can be estimated by means of a stratified probabilistic point sampling model. The best estimate is obtained with the largest sample size (1,990 points), which differs by only -0.14% from the reference map estimate and has a coefficient of variation of 6.98%.
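
As an illustration of how a stratified point sample turns into an area estimate, the sketch below uses the textbook estimator: each stratum's area is weighted by the fraction of its sample points that fall on the crop. The abstract does not give the estimator actually used, so this form is an assumption, and the function names and the numbers in the usage note are hypothetical.

```python
def estimate_area(strata):
    """Estimate crop area from a stratified point sample.

    `strata` is a list of (stratum_area, points_sampled, points_on_crop)
    tuples. Each stratum contributes its area times the observed crop
    proportion among its sample points.
    """
    total = 0.0
    for area, points_sampled, points_on_crop in strata:
        if points_sampled <= 0:
            raise ValueError("each stratum needs at least one sample point")
        if not 0 <= points_on_crop <= points_sampled:
            raise ValueError("points_on_crop must lie in [0, points_sampled]")
        total += area * points_on_crop / points_sampled
    return total


def relative_difference_pct(estimate, reference):
    """Signed difference of an estimate from a reference value, in percent."""
    return 100.0 * (estimate - reference) / reference
```

With made-up strata `[(1000.0, 100, 10), (2000.0, 200, 60), (3000.0, 300, 150)]`, `estimate_area` sums 100 + 600 + 1500 area units. Comparing such an estimate with a reference map through `relative_difference_pct` gives a signed percentage of the kind reported in the abstract.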


This work covers the analysis, design, and implementation of a prototype system for automatically obtaining and analysing news, aimed at use in financial markets.


The most adequate approach for benchmarking web accessibility is manual expert evaluation supplemented by automatic analysis tools. But manual evaluation has a high cost and is impractical to apply to large websites. In reality, there is no choice but to rely on automated tools when reviewing large websites for accessibility. The question is: to what extent can the results from automatic evaluation of a website and of individual web pages be used as an approximation of manual results? This paper presents the initial results of an investigation aimed at answering this question. We have performed both manual and automatic evaluations of the accessibility of web pages from two sites and compared the results. In our data set, automatically retrieved results could most definitely be used as an approximation of manual evaluation results.


This document sums up a project aimed at building a new web interface to the Apertium machine translation platform, including pre-editing and post-editing environments. It contains a description of the work accomplished on this project, as well as an overview of possible evolutions.


This article describes the development of an Open Source shallow-transfer machine translation system from Czech to Polish in the Apertium platform. It gives details of the methods and resources used in constructing the system. Although the resulting system has quite a high error rate, it is still competitive with other systems.


Extensible Dependency Grammar (XDG; Debusmann, 2007) is a flexible, modular dependency grammar framework in which sentence analyses consist of multigraphs and processing takes the form of constraint satisfaction. This paper shows how XDG lends itself to grammar-driven machine translation and introduces the machinery necessary for synchronous XDG. Since the approach relies on a shared semantics, it resembles interlingua MT. It differs in that there are no separate analysis and generation phases. Rather, translation consists of the simultaneous analysis and generation of a single source-target sentence.


This paper proposes to enrich RBMT dictionaries with Named Entities (NEs) automatically acquired from Wikipedia. The method is applied to the Apertium English-Spanish system and its performance compared to that of Apertium with and without hand-tagged NEs. The system with automatic NEs outperforms the one without NEs, while results vary when compared to a system with hand-tagged NEs (results are comparable for Spanish to English but slightly worse for English to Spanish). Apart from that, adding automatic NEs reduces the number of unknown terms by more than 10%.


There are a number of morphological analysers for Polish. Most of these, however, are non-free resources. What is more, different analysers employ different tagsets and tokenisation strategies. This situation calls for a simple and universal framework to join different sources of morphological information, including the existing resources as well as user-provided dictionaries. We present such a configurable framework, which allows users to write simple configuration files that define tokenisation strategies and the behaviour of morphological analysers, including simple tagset conversion.


This paper discusses the qualitative comparative evaluation performed on the results of two machine translation systems with different approaches to the processing of multi-word units. It proposes a solution for overcoming the difficulties multi-word units present to machine translation by adopting a methodology that combines the lexicon-grammar approach with the OpenLogos ontology and semantico-syntactic rules. The paper also discusses the importance of qualitative evaluation metrics for correctly evaluating the performance of machine translation engines with regard to multi-word units.


We describe a series of experiments in which we start with English to French and English to Japanese versions of an Open Source rule-based speech translation system for a medical domain, and bootstrap corresponding statistical systems. Comparative evaluation reveals that the rule-based systems are still significantly better than the statistical ones, despite the fact that considerable effort has been invested in tuning both the recognition and translation components; also, a hybrid system only marginally improved recall at the cost of a loss in precision. The result suggests that rule-based architectures may still be preferable to statistical ones for safety-critical speech translation tasks.


Softcatalà is a non-profit association created more than 10 years ago to fight the marginalisation of the Catalan language in information and communication technologies. It has led the localisation of many applications and the creation of a website which allows its users to translate texts between Spanish and Catalan using an external closed-source translation engine. Recently, the closed-source translation back-end has been replaced by a free/open-source solution completely managed by Softcatalà: the Apertium machine translation platform and the ScaleMT web service framework. Thanks to the openness of the new solution, it is possible to take advantage of the huge number of users of the Softcatalà translation service to improve it, using a series of methods presented in this paper. In addition, a study of the translations requested by the users has been carried out, and it shows that the translation back-end change has not affected the usage patterns.


This paper presents an Italian to Catalan RBMT system automatically built by combining the linguistic data of the existing pairs Spanish-Catalan and Spanish-Italian. A lightweight manual postprocessing is carried out in order to fix inconsistencies in the automatically derived dictionaries and to add very frequent words that are missing according to a corpus analysis. The system is evaluated on the KDE4 corpus and outperforms Google Translate by approximately ten absolute points in terms of both TER and GTM.
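
The idea of deriving a bilingual dictionary through a shared pivot language can be sketched as below. This is a deliberately simplified illustration, not the procedure used in the paper: it treats dictionaries as plain word-to-words mappings and ignores the morphological information that real Apertium dictionaries carry. The function names and the toy entries are hypothetical.

```python
from collections import defaultdict


def compose_via_pivot(src_to_pivot, pivot_to_tgt):
    """Derive source-to-target entries by chaining through a pivot language.

    Both arguments map a word to an iterable of translations. Source words
    whose pivot translations are all missing from `pivot_to_tgt` are dropped.
    """
    derived = defaultdict(set)
    for src_word, pivot_words in src_to_pivot.items():
        for pivot_word in pivot_words:
            derived[src_word].update(pivot_to_tgt.get(pivot_word, ()))
    return {word: sorted(targets) for word, targets in derived.items() if targets}


def ambiguous_entries(dictionary):
    """Entries with several candidate targets, i.e. candidates for manual review."""
    return {word: targets for word, targets in dictionary.items() if len(targets) > 1}


# Toy data: Italian -> Spanish and Spanish -> Catalan.
it_es = {"cane": ["perro"], "piano": ["piano", "plano"], "gatto": ["gato"]}
es_ca = {"perro": ["gos"], "piano": ["piano"], "plano": ["pla"]}
it_ca = compose_via_pivot(it_es, es_ca)
```

In this toy example "gatto" is lost because its Spanish translation has no Catalan entry, and "piano" ends up with two candidates. Gaps and ambiguities of this kind are what a manual postprocessing step has to resolve.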


The main goal of this project is to classify sports videos by category. Text search engines build a vocabulary from the meaning of different words in order to identify a document. In this project the same was done, but with visual words. For example, we tried to group the different wheels appearing on rally cars into a single word. From the frequency with which the words of the different groups appeared in an image, we built vocabulary histograms that gave us a description of the image. To classify a video, the histograms describing its frames were used. Since each histogram could be treated as a vector of integer values, we chose to use a vector classifier: a support vector machine (SVM).
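
The histogram step described above can be sketched as follows. It assumes that each frame has already been reduced to a list of numeric local descriptors and that a visual vocabulary (one representative vector per visual word) is available. The abstract does not say how either was obtained, so the function names and the nearest-word assignment by squared Euclidean distance are illustrative assumptions.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to a descriptor (squared Euclidean distance)."""
    def squared_distance(index):
        return sum((d - c) ** 2 for d, c in zip(descriptor, vocabulary[index]))
    return min(range(len(vocabulary)), key=squared_distance)


def frame_histogram(descriptors, vocabulary):
    """Count how often each visual word occurs among a frame's descriptors."""
    histogram = [0] * len(vocabulary)
    for descriptor in descriptors:
        histogram[nearest_word(descriptor, vocabulary)] += 1
    return histogram


def video_histograms(frames, vocabulary):
    """One integer histogram per frame; these vectors are what a classifier would see."""
    return [frame_histogram(descriptors, vocabulary) for descriptors in frames]
```

Each histogram is a fixed-length vector of integers, whatever the number of descriptors in the frame, which is what makes it suitable as input to a vector classifier such as an SVM. Training the classifier itself is not shown here.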