155 resultados para Language resources

em Consorci de Serveis Universitaris de Catalunya (CSUC), Spain


Relevância:

100.00% 100.00%

Publicador:

Resumo:

El objetivo de PANACEA es engranar diferentes herramientas avanzadas para construir una fábrica de Recursos Lingüísticos (RL), una línea de producción que automatice los pasos implicados en la adquisición, producción, actualización y mantenimiento de los RL que la Traducción Automática y otras tecnologías lingüísticas, necesitan.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The objective of PANACEA is to build a factory of LRs that automates the stages involved in the acquisition, production, updating and maintenance of LRs required by MT systems and by other applications based on language technologies, and simplifies eventual issues regarding intellectual property rights. This automation will cut down the cost, time and human effort significantly. These reductions of costs and time are the only way to guarantee the continuous supply of LRs that MT and other language technologies will be demanding in the multilingual Europe.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Language Resources are a critical component for Natural Language Processing applications. Throughout the years many resources were manually created for the same task, but with different granularity and coverage information. To create richer resources for a broad range of potential reuses, nformation from all resources has to be joined into one. The hight cost of comparing and merging different resources by hand has been a bottleneck for merging existing resources. With the objective of reducing human intervention, we present a new method for automating merging resources. We have addressed the merging of two verbs subcategorization frame (SCF) lexica for Spanish. The results achieved, a new lexicon with enriched information and conflicting information signalled, reinforce our idea that this approach can be applied for other task of NLP.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents the platform developed in the PANACEA project, a distributed factory that automates the stages involved in the acquisition, production, updating and maintenance of Language Resources required by Machine Translation and other Language Technologies. We adopt a set of tools that have been successfully used in the Bioinformatics field, they are adapted to the needs of our field and used to deploy web services, which can be combined to build more complex processing chains (workflows). This paper describes the platform and its different components (web services, registry, workflows, social network and interoperability). We demonstrate the scalability of the platform by carrying out a set of massive data experiments. Finally, a validation of the platform across a set of required criteria proves its usability for different types of users (non-technical users and providers).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This document describes some of the technological aspects of a project devoted to the creation of a factory for language resources. The project’s objectives are explained, as well as the idea to create a distributed infrastructure of web services. This document focuses on two main topics of the factory: (1) the technological approaches chosen to develop the factory, i.e. software, protocols, servers, etc. (2) and Interoperability as the main challenge is to permit different NLP tools work together in the factory. This document explains why XCES and GrAF are chosen as the main formats used for the linguistic data exchange.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper demonstrates a novel distributed architecture to facilitate the acquisition of Language Resources. We build a factory that automates the stages involved in the acquisition, production, updating and maintenance of these resources. The factory is designed as a platform where functionalities are deployed as web services, which can be combined in complex acquisition chains using workflows. We show a case study, which acquires a Translation Memory for a given pair of languages and a domain using web services for crawling, sentence alignment and conversion to TMX.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Presentamos el proyecto CLARIN, un proyecto cuyo objetivo es potenciar el uso de instrumentos tecnológicos en la investigación en las Humanidades y Ciencias Sociales

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This article reports on the results of the research done towards the fully automatically merging of lexical resources. Our main goal is to show the generality of the proposed approach, which have been previously applied to merge Spanish Subcategorization Frames lexica. In this work we extend and apply the same technique to perform the merging of morphosyntactic lexica encoded in LMF. The experiments showed that the technique is general enough to obtain good results in these two different tasks which is an important step towards performing the merging of lexical resources fully automatically.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The objective of the PANACEA ICT-2007.2.2 EU project is to build a platform that automates the stages involved in the acquisition,production, updating and maintenance of the large language resources required by, among others, MT systems. The development of a Corpus Acquisition Component (CAC) for extracting monolingual and bilingual data from the web is one of the most innovative building blocks of PANACEA. The CAC, which is the first stage in the PANACEA pipeline for building Language Resources, adopts an efficient and distributed methodology to crawl for web documents with rich textual content in specific languages and predefined domains. The CAC includes modules that can acquire parallel data from sites with in-domain content available in more than one language. In order to extrinsically evaluate the CAC methodology, we have conducted several experiments that used crawled parallel corpora for the identification and extraction of parallel sentences using sentence alignment. The corpora were then successfully used for domain adaptation of Machine Translation Systems.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Subjective language detection is one of the most important challenges in Sentiment Analysis. Because of the weight and frequency in opinionated texts, adjectives are considered a key piece in the opinion extraction process. These subjective units are more and more frequently collected in polarity lexicons in which they appear annotated with their prior polarity. However, at the moment, any polarity lexicon takes into account prior polarity variations across domains. This paper proves that a majority of adjectives change their prior polarity value depending on the domain. We propose a distinction between domain dependent and romain independent adjectives. Moreover, our analysis led us to propose a further classification related to subjectivity degree: constant, mixed and highly subjective adjectives. Following this classification, polarity values will be a better support for Sentiment Analysis.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The work we present here addresses cue-based noun classification in English and Spanish. Its main objective is to automatically acquire lexical semantic information by classifying nouns into previously known noun lexical classes. This is achieved by using particular aspects of linguistic contexts as cues that identify a specific lexical class. Here we concentrate on the task of identifying such cues and the theoretical background that allows for an assessment of the complexity of the task. The results show that, despite of the a-priori complexity of the task, cue-based classification is a useful tool in the automatic acquisition of lexical semantic classes.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this work we present the results of experimental work on the development of lexical class-based lexica by automatic means. Our purpose is to assess the use of linguistic lexical-class based information as a feature selection methodology for the use of classifiers in quick lexical development. The results show that the approach can help reduce the human effort required in the development of language resources significantly.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

En aquest article presentem CLARIN (Common Language Resources and Technologies), un projecte de col·laboració europea a gran escala l"objectiu del qual és potenciar l"ús d"instruments tecnològics en la recerca en els àmbits de les humanitats i les ciències socials. CLARIN és un dels trenta-cinc projectes seleccionats pel Comitè ESFRI (European Strategy Forum on Research Infraestructures) per a la llista de les infraestructures que s"han d"haver construït, per la seva importància per a la recerca, d"aquí a deu anys. CLARIN vol portar a les humanitats i a les ciències socials els beneficis de l"accés compartit i en col·laboració a recursos digitals, i també l"ús del còmput intensiu amb instruments específics d"anàlisi i explotació per a l"accés intel·ligent a grans bases de dades. Amb aquest objectiu, CLARIN crearà la infraestructura necessària per a poder donar un accés genèric a grans bancs de dades i als instruments d"anàlisi i explotació d"aquestes dades mitjançant la utilització de tecnologia. Per a això implementarà, en una estructura de xarxa grid, i mitjançant tecnologia de serveis web i de web semàntic, una única interfície d"accés a les dades i als instruments d"anàlisi, i també a eines de processament i altres serveis necessaris. Aquesta interfície, pel fet de ser dissenyada per a servir els objectius comuns de la recerca en humanitats i ciències socials, en facilitarà l"ús a investigadors de diferents àmbits sense necessitat de tenir coneixements sobre les tecnologies implicades.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Lexical Resources are a critical component for Natural Language Processing applications. However, the high cost of comparing and merging different resources has been a bottleneck to have richer resources with a broad range of potential uses for a significant number of languages.With the objective of reducing cost byeliminating human intervention, we present a new method for automating the merging of resources,with special emphasis in what we call the mapping step. This mapping step, which converts the resources into a common format that allows latter the merging, is usually performed with huge manual effort and thus makes the whole process very costly. Thus, we propose a method to perform this mapping fully automatically. To test our method, we have addressed the merging of two verb subcategorization frame lexica for Spanish, The resultsachieved, that almost replicate human work, demonstrate the feasibility of the approach.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Lexical Resources are a critical component for Natural Language Processing applications. However, the high cost of comparing and merging different resources has been a bottleneck to obtain richer resources and a broader range of potential uses for a significant number of languages. With the objective of reducing cost by eliminating human intervention, we present a new method towards the automatic merging of resources. This method includes both, the automatic mapping of resources involved to a common format and merging them, once in this format. This paper presents how we have addressed the merging of two verb subcategorization frame lexica for Spanish, but our method will be extended to cover other types of Lexical Resources. The achieved results, that almost replicate human work, demonstrate the feasibility of the approach.