464 results for multilingual
Abstract:
This paper presents the 2005 MIRACLE team's approach to Cross-Language Geographical Retrieval (GeoCLEF). The main goal of the MIRACLE team's GeoCLEF participation was to test the effect that geographical information retrieval techniques have on information retrieval. The baseline approach is based on the development of named entity recognition and geospatial information retrieval tools and on their combination with linguistic techniques to carry out indexing and retrieval tasks.
Abstract:
This paper presents the 2005 MIRACLE team's approach to the Ad-Hoc Information Retrieval tasks. The goal of this year's experiments was twofold: to continue testing the effect of combination approaches on information retrieval tasks, and to improve our basic processing and indexing tools, adapting them to new languages with unusual encoding schemes. The starting point was a set of basic components: stemming, transforming, filtering, proper noun extraction, paragraph extraction, and pseudo-relevance feedback. Some of these basic components were used in different combinations and orders of application for document indexing and for query processing. Second-order combinations were also tested, by averaging or selectively combining the documents retrieved by different approaches for a particular query. In the multilingual track, we concentrated our work on the process of merging the results of monolingual runs to obtain the overall multilingual result, relying on available translations. In both cross-lingual tracks, we used available translation resources, and in some cases a combination approach.
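The abstract does not give implementation details, but the averaging style of second-order combination it mentions can be sketched as follows. This is a minimal illustration, assuming each run is a dictionary mapping document IDs to retrieval scores; the min-max normalization and the sample runs are hypothetical, not the MIRACLE team's actual method.

```python
def normalize(run):
    """Min-max normalize retrieval scores to [0, 1] so runs are comparable."""
    lo, hi = min(run.values()), max(run.values())
    span = (hi - lo) or 1.0
    return {doc: (score - lo) / span for doc, score in run.items()}

def average_merge(runs):
    """Merge several runs by averaging each document's normalized scores.

    Documents missing from a run implicitly contribute a score of 0 there.
    """
    merged = {}
    for run in map(normalize, runs):
        for doc, score in run.items():
            merged.setdefault(doc, []).append(score)
    return sorted(((sum(scores) / len(runs), doc)
                   for doc, scores in merged.items()), reverse=True)

# Example: two monolingual runs merged into one multilingual ranking.
run_en = {"doc1": 12.3, "doc2": 8.1, "doc3": 5.0}
run_es = {"doc2": 0.9, "doc4": 0.7}
print(average_merge([run_en, run_es]))
```

A selective variant would instead pick, per query, the single run expected to perform best rather than averaging across all of them.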
Abstract:
The main goal of the MIRACLE team's bilingual and monolingual participation in CLEF 2004 was to test the effect of combination approaches on information retrieval. The starting point was a set of basic components: stemming, transformation, filtering, n-gram generation, weighting, and relevance feedback. Some of these basic components were used in different combinations and orders of application for document indexing and for query processing. A second-order combination was also tested, mainly by averaging or selectively combining the documents retrieved by different approaches for a particular query.
Abstract:
ImageCLEF is a pilot experiment run at CLEF 2003 for cross-language image retrieval using textual captions related to image contents. In this paper, we describe the participation of the MIRACLE research team (Multilingual Information RetrievAl at CLEF), detailing the different experiments and discussing their preliminary results.
Abstract:
The Semantic Web aims to allow machines to make inferences using the explicit conceptualisations contained in ontologies. By pointing to ontologies, Semantic Web-based applications are able to interoperate and share common information easily. Nevertheless, multilingual semantic applications are still rare, because most online ontologies are monolingual in English. Solving this issue requires techniques for ontology localisation and translation. However, traditional machine translation is difficult to apply to ontologies, because ontology labels tend to be quite short and linguistically different from the free-text paradigm. In this paper, we propose an approach to enhance machine translation of ontologies by exploiting the well-structured concept descriptions contained in the ontology. In particular, our approach leverages the semantics contained in the ontology by using Cross-Lingual Explicit Semantic Analysis (CLESA) for context-based disambiguation in phrase-based Statistical Machine Translation (SMT). To the best of our knowledge, this is the first application of CLESA in SMT.
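As a rough illustration of CLESA-style context-based disambiguation: the source-language context and each candidate translation are represented as vectors over a shared, language-independent concept space (typically derived from a comparable corpus such as Wikipedia), and the candidate closest to the context is preferred. The sketch below uses toy hand-made vectors and plain cosine similarity; it is a simplification under those assumptions, not the paper's actual SMT integration.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse concept vectors (dicts)."""
    dot = sum(u[c] * v[c] for c in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Toy ESA vectors over a shared concept space. In CLESA the same concept
# space indexes texts in both languages, so a source-language context can
# be compared directly with target-language candidates.
context_vec = {"Bank_(finance)": 0.9, "Loan": 0.7}
candidates = {
    "banco (financial institution)": {"Bank_(finance)": 0.8, "Money": 0.5},
    "banco (bench)": {"Bench_(furniture)": 0.9},
}
best = max(candidates, key=lambda t: cosine(context_vec, candidates[t]))
print(best)  # the candidate whose concepts best match the context
```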
Abstract:
Folksonomies emerge as the result of the free tagging activity of a large number of users over a variety of resources. They can be considered valuable sources from which to obtain emerging vocabularies that can be leveraged in knowledge extraction tasks. However, when it comes to understanding the meaning of tags in folksonomies, several problems arise, mainly related to synonymous and ambiguous tags, especially in multilingual contexts. The authors aim to turn folksonomies into knowledge structures in which tag meanings are identified and relations between them are asserted. For this purpose, they use DBpedia as a general knowledge base, leveraging its multilingual capabilities.
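A hedged sketch of the kind of DBpedia lookup such an approach relies on: retrieve candidate resources whose label matches a raw tag, together with that resource's labels in other languages. The helper and query below are illustrative assumptions (using the public DBpedia SPARQL endpoint and the SPARQLWrapper library), not the authors' actual pipeline.

```python
from SPARQLWrapper import SPARQLWrapper, JSON  # pip install sparqlwrapper

def candidate_senses(tag, lang="en"):
    """Return DBpedia resources whose rdfs:label matches a raw folksonomy tag."""
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery(f'''
        SELECT DISTINCT ?resource ?label WHERE {{
            ?resource rdfs:label "{tag}"@{lang} ;
                      rdfs:label ?label .
        }} LIMIT 20
    ''')
    sparql.setReturnFormat(JSON)
    rows = sparql.query().convert()["results"]["bindings"]
    # Each matching resource comes back with labels in every available
    # language, which is what lets tags written in different languages
    # be mapped to the same sense.
    return [(r["resource"]["value"], r["label"]["value"]) for r in rows]

print(candidate_senses("Jaguar"))
```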
Abstract:
In the context of the Semantic Web, resources on the net can be enriched by well-defined, machine-understandable metadata describing their associated conceptual meaning. These metadata, consisting of natural language descriptions of concepts, are the focus of the activity we describe in this chapter, namely, ontology localization. In the framework of the NeOn Methodology, ontology localization is defined as the activity of adapting an ontology to a particular language and culture. This adaptation mainly involves translating the natural language descriptions of the ontology from a source natural language to a target natural language, with the final objective of obtaining a multilingual ontology, that is, an ontology documented in several natural languages. The purpose of this chapter is to provide detailed and prescriptive methodological guidelines to support carrying out this activity.
Abstract:
Interoperability on multiple levels, concerning both the ontologies themselves and their engineering activities, is a key requirement for ontology networks to be efficient, with minimal redundancy and high reuse. This requirement is binding for software tools that can support some of these interoperability levels, yet such tools can be hindered by a lack of shared models and vocabularies describing both the resources to be handled and the ways of handling them. Here, three examples of metalevel vocabularies are proposed, each covering at least one particular interoperability aspect: OMV for modeling the artifacts themselves, LIR for managing a multilingual layer on top of them, and C-ODO Light for modeling collaboration-supportive life cycle management tasks and processes. All of these models lend themselves to handling by dedicated software tools, and all are being employed within NeOn products.
Abstract:
Much progress has been made since the Digital Earth notion was envisioned thirteen years ago. However, the mechanisms for integrating geographic information into the Digital Earth are still quite limited. In this context, we have developed a process to generate, integrate and publish geospatial Linked Data from several Spanish national datasets. These datasets are related to four Infrastructure for Spatial Information in the European Community (INSPIRE) themes, specifically Administrative units, Hydrography, Statistical units, and Meteorology. Our main goal is to combine different sources (heterogeneous, multidisciplinary, multitemporal, multiresolution, and multilingual) using Linked Data principles. This makes it possible to overcome current problems of information integration and to drive geographical information toward the next decade's scenario, that is, the "Linked Digital Earth".
Abstract:
This document describes the design and implementation of a virtual exercise that is part of a practice performed in a virtual biotechnology laboratory, the adaptation of that practice so that high-school students can carry it out, and finally the adaptation of the laboratory to a multilingual environment. In the practice, a tree (a poplar) is genetically transformed to give it greater resistance to diseases, especially those caused by fungi; specifically, the exercise, or phase of the practice, developed here consists of introducing into a plasmid a gene amplified by PCR in the previous phase of the virtual practice. The adaptation for high-school students is intended to foster their interest in biotechnology, while the adaptation to a multilingual environment will allow several students speaking different languages, such as Erasmus students, to carry out the practice simultaneously. As part of this work, we analyzed OpenSimulator, the tool used to create the virtual environment, as well as its graphical viewers for visiting and developing the virtual world. Because this project takes as its starting point a virtual laboratory with part of the virtual practice already developed, a description of that laboratory is included to better explain the work carried out in this project. Finally, this document presents the models and specifications for the extension of the virtual laboratory.
Abstract:
This approach aims at aligning, unifying and expanding the sentiment lexicons available on the web in order to increase the robustness of their coverage. A sentiment lexicon is a critical and essential resource for tagging subjective corpora on the web or elsewhere. In many situations, the multilingual property of a sentiment lexicon is important because writers alternate between two languages in the same text, message or post. Our USL approach computes the unified strength of polarity of each lexical entry based on the Pearson correlation coefficient, which measures how correlated lexical entries are with a value between -1 and 1, where 1 indicates that the lexical entries are perfectly correlated, 0 indicates no correlation, and -1 means they are perfectly inversely correlated, together with the UnifiedMetrics procedure, implemented for both CPU and GPU.
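For concreteness, the Pearson correlation described above can be computed over the polarity values that two lexicons assign to their shared entries; a value near 1 suggests the lexicons agree and can be safely unified. The lexicons and scores below are made up for illustration; this is a minimal sketch, not the USL/UnifiedMetrics implementation itself.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

# Two hypothetical lexicons scoring the same entries in [-1, 1].
lex_a = {"good": 0.8, "bad": -0.7, "awful": -0.9, "great": 0.9}
lex_b = {"good": 0.6, "bad": -0.8, "awful": -0.6, "great": 0.7}
shared = sorted(lex_a.keys() & lex_b.keys())
r = pearson([lex_a[w] for w in shared], [lex_b[w] for w in shared])
print(f"correlation between lexicons: {r:.3f}")  # near 1: safe to unify
```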
Abstract:
Language resources, such as multilingual lexica and multilingual electronic dictionaries, contain collections of lexical entries in several languages. Having access to the corresponding explicit or implicit translation relations between such entries would be of great interest for many NLP-based applications. By using Semantic Web-based techniques, translations can be made available on the Web to be consumed by other (semantics-enabled) resources in a direct manner, without relying on application-specific formats. To that end, in this paper we propose a model for representing translations as linked data, as an extension of the lemon model. Our translation module represents some core information associated with term translations and does not commit to specific views or translation theories. As a proof of concept, we have extracted the translations of the terms contained in Terminesp, a multilingual terminological database, and represented them as linked data. We have made them accessible on the Web both for humans (via a Web interface) and for software agents (via a SPARQL endpoint).
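To give a flavour of what representing a translation as linked data with an extension of lemon can look like, here is a small rdflib sketch that reifies a translation between two lexical entries, so that metadata can later be attached to the relation itself. The TRANS and EX namespaces are placeholders invented for this example; the published translation module's URIs and property names may differ.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

# LEMON is the core lemon namespace; TRANS and EX are hypothetical
# namespaces used only for this sketch.
LEMON = Namespace("http://lemon-model.net/lemon#")
TRANS = Namespace("http://example.org/translation#")
EX = Namespace("http://example.org/terminesp/")

g = Graph()
g.bind("lemon", LEMON)
g.bind("trans", TRANS)

# Two lexical entries in different languages...
g.add((EX["perro-es"], RDF.type, LEMON.LexicalEntry))
g.add((EX["dog-en"], RDF.type, LEMON.LexicalEntry))

# ...linked through a reified translation, so that provenance, direction
# or confidence can be attached to the translation relation itself.
t = EX["translation-1"]
g.add((t, RDF.type, TRANS.Translation))
g.add((t, TRANS.translationSource, EX["perro-es"]))
g.add((t, TRANS.translationTarget, EX["dog-en"]))

print(g.serialize(format="turtle"))
```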
Abstract:
Recently, experts and practitioners in language resources have started recognizing the benefits of the linked data (LD) paradigm for the representation and exploitation of linguistic data on the Web. The adoption of LD principles is leading to an emerging ecosystem of multilingual open resources that conform to the Linguistic Linked Open Data Cloud, in which datasets of linguistic data are interconnected and represented following common vocabularies, which facilitates linguistic information discovery, integration and access. In order to contribute to this initiative, this paper summarizes several key aspects of the representation of linguistic information as linked data from a practical perspective. The main goal of this document is to provide the basic ideas and tools for migrating language resources (lexicons, corpora, etc.) to LD on the Web and for developing some useful NLP tasks with them (e.g., word sense disambiguation). This material was the basis of a tutorial given at the EKAW'14 conference, which is also reported in the paper.
Abstract:
In this paper we present a dataset composed of domain-specific sentiment lexicons in six languages for two domains. We used existing collections of reviews from Trip Advisor, Amazon, the Stanford Network Analysis Project and the OpinRank Review Dataset. We use an RDF model based on the lemon and Marl formats to represent the lexicons. We describe the methodology that we applied to generate the domain-specific lexicons and provide access information for our datasets.
Abstract:
Software has become the backbone of today's world, a complex human creation that influences the life, business and communication of everyone in the Information Society. The rapid growth experienced in software development has enabled the creation of advanced technological structures, called "Software-Intensive Systems", capable of communicating with other systems, devices, sensors and people. Over the coming years, systems will face greater complexity, arising from the need to operate in large-scale environments with non-deterministic behaviors. Current methods and tools are not powerful enough to design, build, implement and maintain software-intensive systems with these characteristics, and halting the construction of software-intensive systems, or building inflexible or unreliable ones, is not a real alternative. The development of software-intensive systems may involve different entities or software companies, often in different geographical locations and made up of large, multidisciplinary and even multilingual development teams. Because the activities they carry out independently are critical to the resulting system, these activities must be controlled and monitored to ensure the correct integration of all the elements of the complete system. The goal of this project is the creation of a software tool to support the management and monitoring of the construction and integration of software-intensive systems, which is also extensible to projects of other kinds. The resulting tool is called Positioning System, a single-page web application (SPA) created with state-of-the-art technology such as the AngularJS JavaScript framework and back-end technology such as SlimPHP. Positioning System provides the functionality needed to create projects, along with the families and subfamilies of products that constitute the software products of the created projects, and to manage the business partners and contacts of those projects. All of these features are easily monitored and controlled through statistical graphs generated for each project.