Digital Library

155 results for Language resources

in Consorci de Serveis Universitaris de Catalunya (CSUC), Spain

Platform for Automatic, Normalized Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies: PANACEA

Relevance:

100.00%

Abstract:

The objective of PANACEA is to bring together various advanced tools to build a Language Resources (LR) factory: a production line that automates the steps involved in acquiring, producing, updating and maintaining the LRs that Machine Translation and other language technologies need.

PANACEA (Platform for Automatic, Normalised Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies)

Relevance:

100.00%

Abstract:

The objective of PANACEA is to build a factory of LRs that automates the stages involved in the acquisition, production, updating and maintenance of the LRs required by MT systems and by other applications based on language technologies, and that simplifies any potential issues regarding intellectual property rights. This automation will significantly cut down cost, time and human effort. Such reductions in cost and time are the only way to guarantee the continuous supply of LRs that MT and other language technologies will demand in a multilingual Europe.

Towards the automatic merging of language resources

Relevance:

100.00%

Abstract:

Language Resources are a critical component of Natural Language Processing applications. Over the years, many resources have been created manually for the same task, but with different granularity and coverage. To create richer resources for a broad range of potential reuses, the information from all of them has to be joined into one. The high cost of comparing and merging different resources by hand has been a bottleneck for merging existing resources. With the objective of reducing human intervention, we present a new method for automating the merging of resources. We have addressed the merging of two verb subcategorization frame (SCF) lexica for Spanish. The results achieved, a new lexicon with enriched information and with conflicting information signalled, reinforce our belief that this approach can be applied to other NLP tasks.
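The abstract does not spell out the merging algorithm. As a rough sketch of the general idea of joining two lexica while signalling conflicts, the Python below assumes a hypothetical representation in which each lexicon maps a verb to its subcategorization frames and each frame to a dictionary of features. Frames are united, shared frames are unified feature by feature, and a clashing value is kept from the first lexicon and reported instead of being silently overwritten. The function name `merge_lexica` and the example entries are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: verb -> frame label -> feature name -> value
Lexicon = Dict[str, Dict[str, Dict[str, str]]]
Conflict = Tuple[str, str, str]  # (verb, frame, feature)


def merge_lexica(a: Lexicon, b: Lexicon) -> Tuple[Lexicon, List[Conflict]]:
    """Unite two lexica, unify shared frames and record incompatible features."""
    merged: Lexicon = {}
    conflicts: List[Conflict] = []
    for verb in sorted(set(a) | set(b)):
        frames_a = a.get(verb, {})
        frames_b = b.get(verb, {})
        merged[verb] = {}
        for frame in sorted(set(frames_a) | set(frames_b)):
            # Copy so the input lexica are never modified
            unified = dict(frames_a.get(frame, {}))
            for feature, value in frames_b.get(frame, {}).items():
                if feature in unified and unified[feature] != value:
                    # Incompatible values: keep lexicon a's value and signal the clash
                    conflicts.append((verb, frame, feature))
                else:
                    unified[feature] = value
            merged[verb][frame] = unified
    return merged, conflicts
```

With `a = {"comer": {"NP": {"obj": "NP"}}}` and `b = {"comer": {"NP": {"obj": "NP-def"}}}`, the merged entry keeps `"NP"` and the conflict list contains `("comer", "NP", "obj")` for later review.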

Towards a User-Friendly Platform for Building Language Resources based on Web Services

Relevance:

100.00%

Abstract:

This paper presents the platform developed in the PANACEA project, a distributed factory that automates the stages involved in the acquisition, production, updating and maintenance of the Language Resources required by Machine Translation and other Language Technologies. We adopt a set of tools that have been used successfully in the bioinformatics field; these tools are adapted to the needs of our field and used to deploy web services, which can be combined to build more complex processing chains (workflows). This paper describes the platform and its different components (web services, registry, workflows, social network and interoperability). We demonstrate the scalability of the platform by carrying out a set of massive data experiments. Finally, a validation of the platform against a set of required criteria shows its usability for different types of users (non-technical users and providers).

Interoperability and Technology for a Language Resources Factory

Relevance:

100.00%

Abstract:

This document describes some of the technological aspects of a project devoted to the creation of a factory for language resources. The project's objectives are explained, as well as the idea of creating a distributed infrastructure of web services. This document focuses on two main topics of the factory: (1) the technological approaches chosen to develop the factory, i.e. software, protocols, servers, etc.; and (2) interoperability, since the main challenge is to allow different NLP tools to work together in the factory. This document explains why XCES and GrAF were chosen as the main formats for linguistic data exchange.

Language Resources Factory: case study on the acquisition of Translation Memories

Relevance:

100.00%

Abstract:

This paper demonstrates a novel distributed architecture to facilitate the acquisition of Language Resources. We build a factory that automates the stages involved in the acquisition, production, updating and maintenance of these resources. The factory is designed as a platform where functionalities are deployed as web services, which can be combined into complex acquisition chains using workflows. We present a case study that acquires a Translation Memory for a given language pair and domain using web services for crawling, sentence alignment and conversion to TMX.
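The abstract names TMX as the final output of the acquisition chain but does not describe the conversion service itself. A minimal sketch, assuming the sentence pairs have already been aligned, builds a TMX 1.4 document with Python's standard `xml.etree.ElementTree`. The function name `pairs_to_tmx` and the header values (such as `creationtool`) are placeholders, not PANACEA's actual service or settings.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

# ElementTree serializes this Clark-notation name as the reserved xml:lang attribute
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def pairs_to_tmx(pairs: Iterable[Tuple[str, str]], src: str, tgt: str) -> str:
    """Serialize aligned (source, target) sentence pairs as a TMX 1.4 string."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tmx-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": src,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src_text, tgt_text in pairs:
        # One translation unit per aligned pair, one variant per language
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src, src_text), (tgt, tgt_text)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text  # special characters are escaped on output
    return ET.tostring(tmx, encoding="unicode")
```

Building the tree with ElementTree instead of string concatenation means characters such as `&` in crawled text are escaped automatically.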

CLARIN: common language resources and technology infrastructure

Relevance:

100.00%

Abstract:

We present the CLARIN project, whose objective is to promote the use of technological instruments in research in the Humanities and Social Sciences.

Towards the Fully Automatic Merging of Lexical Resources: A Step Forward

Relevance:

70.00%

Abstract:

This article reports on the results of research towards the fully automatic merging of lexical resources. Our main goal is to show the generality of the proposed approach, which has previously been applied to merging Spanish subcategorization frame lexica. In this work we extend the same technique and apply it to merging morphosyntactic lexica encoded in LMF. The experiments showed that the technique is general enough to obtain good results in these two different tasks, which is an important step towards merging lexical resources fully automatically.

Mining and exploiting domain-specific corpora in the PANACEA platform

Relevance:

60.00%

Abstract:

The objective of the PANACEA ICT-2007.2.2 EU project is to build a platform that automates the stages involved in the acquisition, production, updating and maintenance of the large language resources required by, among others, MT systems. The development of a Corpus Acquisition Component (CAC) for extracting monolingual and bilingual data from the web is one of the most innovative building blocks of PANACEA. The CAC, which is the first stage in the PANACEA pipeline for building Language Resources, adopts an efficient and distributed methodology to crawl for web documents with rich textual content in specific languages and predefined domains. The CAC includes modules that can acquire parallel data from sites with in-domain content available in more than one language. To evaluate the CAC methodology extrinsically, we conducted several experiments that used crawled parallel corpora to identify and extract parallel sentences by means of sentence alignment. The resulting corpora were then successfully used for domain adaptation of Machine Translation systems.
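The abstract does not say which sentence aligner the experiments used. To illustrate what length-based sentence alignment does, here is a simplified dynamic-programming aligner in the spirit of Gale and Church's method. It allows 1-1, 1-0, 0-1, 2-1 and 1-2 groupings ("beads") and scores each bead by the relative difference in character length plus a fixed penalty for anything other than 1-1. The cost function, the penalty values and the names `align` and `BEADS` are illustrative assumptions, not the original probabilistic model or the CAC's configuration.

```python
from typing import Dict, List, Tuple

# (source sentences consumed, target sentences consumed) -> fixed penalty (assumed values)
BEADS: Dict[Tuple[int, int], float] = {(1, 1): 0.0, (1, 0): 1.0, (0, 1): 1.0, (2, 1): 0.5, (1, 2): 0.5}


def bead_cost(src_len: int, tgt_len: int, penalty: float) -> float:
    """Relative character-length mismatch of a bead plus its grouping penalty."""
    return abs(src_len - tgt_len) / (src_len + tgt_len + 1) + penalty


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Return the minimum-cost sequence of beads covering both sentence lists."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor of (i, j) is final before (i, j) is expanded
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                candidate = cost[i][j] + bead_cost(s_len, t_len, penalty)
                if candidate < cost[ni][nj]:
                    cost[ni][nj] = candidate
                    back[ni][nj] = (i, j)
    # Follow back-pointers from the end to recover the beads
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    beads.reverse()
    return beads
```

Because (n, m) is always reachable through 1-0 and 0-1 beads, the back-trace never gets stuck; real aligners typically add lexical cues or a probabilistic length model on top of this skeleton.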

A Classification of Adjectives for Polarity Lexicons Enhancement

Relevance:

60.00%

Abstract:

Subjective language detection is one of the most important challenges in Sentiment Analysis. Because of their weight and frequency in opinionated texts, adjectives are considered a key piece in the opinion extraction process. These subjective units are increasingly collected in polarity lexicons, where they are annotated with their prior polarity. However, at the moment, no polarity lexicon takes into account variations in prior polarity across domains. This paper shows that a majority of adjectives change their prior polarity value depending on the domain. We propose a distinction between domain-dependent and domain-independent adjectives. Moreover, our analysis led us to propose a further classification related to the degree of subjectivity: constant, mixed and highly subjective adjectives. Following this classification, polarity values will provide better support for Sentiment Analysis.
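As an illustration of the domain-dependent versus domain-independent distinction, assume (hypothetically) that each adjective's prior polarity has been annotated separately for each domain. An adjective is then treated as domain-independent when it receives the same label in every domain in which it was annotated, and as domain-dependent otherwise. The sketch does not attempt the finer subjectivity classes (constant, mixed, highly subjective), whose criteria the abstract does not give; the function names and the example data are invented for illustration.

```python
from typing import Dict

# Hypothetical input: adjective -> domain -> prior polarity label
PolarityTable = Dict[str, Dict[str, str]]


def classify_adjectives(table: PolarityTable) -> Dict[str, str]:
    """Label an adjective domain-independent if its polarity is identical in all domains."""
    labels: Dict[str, str] = {}
    for adjective, by_domain in table.items():
        # Assumes every adjective is annotated in at least one domain
        distinct = set(by_domain.values())
        labels[adjective] = "domain-independent" if len(distinct) == 1 else "domain-dependent"
    return labels


def dependent_share(labels: Dict[str, str]) -> float:
    """Fraction of adjectives whose prior polarity varies across domains."""
    if not labels:
        return 0.0
    return sum(1 for label in labels.values() if label == "domain-dependent") / len(labels)
```

With invented annotations such as `"unpredictable": {"films": "positive", "cars": "negative"}`, the adjective would be labelled domain-dependent, while one tagged positive everywhere would be domain-independent.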

Automatic Lexical Semantic Classification of Nouns

Relevance:

60.00%

Abstract:

The work we present here addresses cue-based noun classification in English and Spanish. Its main objective is to acquire lexical semantic information automatically by classifying nouns into previously known noun lexical classes. This is achieved by using particular aspects of linguistic contexts as cues that identify a specific lexical class. Here we concentrate on the task of identifying such cues and on the theoretical background that allows the complexity of the task to be assessed. The results show that, despite the a priori complexity of the task, cue-based classification is a useful tool for the automatic acquisition of lexical semantic classes.

Automatic Detection of Non-deverbal Event Nouns for Quick Lexicon Production

Relevance:

60.00%

Abstract:

In this work we present the results of experimental work on developing lexical class-based lexica by automatic means. Our purpose is to assess the use of linguistic lexical class-based information as a feature selection methodology for classifiers used in quick lexical development. The results show that the approach can significantly reduce the human effort required to develop language resources.

The CLARIN Project: A Scientific Research Infrastructure for the Humanities and the Social Sciences

Relevance:

60.00%

Abstract:

In this article we present CLARIN (Common Language Resources and Technologies), a large-scale European collaborative project whose objective is to promote the use of technological instruments in research in the humanities and social sciences. CLARIN is one of the thirty-five projects selected by the ESFRI Committee (European Strategy Forum on Research Infrastructures) for its list of infrastructures that, because of their importance for research, should be built within the next ten years. CLARIN aims to bring to the humanities and social sciences the benefits of shared and collaborative access to digital resources, as well as the use of intensive computing with specific analysis and exploitation tools for intelligent access to large databases. To this end, CLARIN will create the infrastructure needed to provide generic access to large data banks and to the tools for analysing and exploiting these data through the use of technology. To do so, it will implement, in a grid network structure and by means of web service and semantic web technology, a single interface for accessing the data and the analysis tools, as well as processing tools and other necessary services. Because this interface is designed to serve the common research goals of the humanities and social sciences, it will make these resources easy to use for researchers from different fields without requiring knowledge of the technologies involved.

Towards the Automatic Merging of Lexical Resources: Automatic Mapping

Relevance:

30.00%

Abstract:

Lexical Resources are a critical component of Natural Language Processing applications. However, the high cost of comparing and merging different resources has been a bottleneck to obtaining richer resources with a broad range of potential uses for a significant number of languages. With the objective of reducing cost by eliminating human intervention, we present a new method for automating the merging of resources, with special emphasis on what we call the mapping step. This mapping step, which converts the resources into a common format that later allows them to be merged, is usually performed with huge manual effort and thus makes the whole process very costly. We therefore propose a method to perform this mapping fully automatically. To test our method, we have addressed the merging of two verb subcategorization frame lexica for Spanish. The results achieved, which almost replicate human work, demonstrate the feasibility of the approach.

A Method Towards the Fully Automatic Merging of Lexical Resources

Relevance:

30.00%

Abstract:

Lexical Resources are a critical component of Natural Language Processing applications. However, the high cost of comparing and merging different resources has been a bottleneck to obtaining richer resources and a broader range of potential uses for a significant number of languages. With the objective of reducing cost by eliminating human intervention, we present a new method towards the automatic merging of resources. This method includes both the automatic mapping of the resources involved to a common format and their merging once they are in this format. This paper presents how we have addressed the merging of two verb subcategorization frame lexica for Spanish, but our method will be extended to cover other types of Lexical Resources. The results achieved, which almost replicate human work, demonstrate the feasibility of the approach.
