16 results for multilingual dictionary
at Universidad Politécnica de Madrid
Abstract:
This paper describes the first set of experiments defined by the MIRACLE (Multilingual Information RetrievAl for the CLEf campaign) research group for some of the cross-language tasks defined by CLEF. These experiments combine different basic techniques, both linguistically and statistically oriented, applied to the indexing and retrieval processes.
Abstract:
In the context of the Semantic Web, natural language descriptions associated with ontologies have proven to be of major importance not only to support ontology developers and adopters, but also to assist in tasks such as ontology mapping, information extraction, or natural language generation. In the state of the art we find some attempts to provide guidelines for URI local names in English, and also some disagreement on the use of URIs for describing ontology elements. When trying to extrapolate these ideas to a multilingual scenario, some of these approaches fail to provide a valid solution. On the basis of some real experiences in the translation of ontologies from English into Spanish, we provide a preliminary set of guidelines for naming and labeling ontologies in a multilingual scenario.
Abstract:
Recently, the Semantic Web has experienced significant advancements in standards and techniques, as well as in the amount of semantic information available online. Nevertheless, mechanisms are still needed to automatically reconcile information when it is expressed in different natural languages on the Web of Data, in order to improve the access to semantic information across language barriers. In this context several challenges arise [1], such as: (i) ontology translation/localization, (ii) cross-lingual ontology mappings, (iii) representation of multilingual lexical information, and (iv) cross-lingual access and querying of linked data. In the following we will focus on the second challenge, which is the necessity of establishing, representing and storing cross-lingual links among semantic information on the Web. In fact, in a “truly” multilingual Semantic Web, semantic data with lexical representations in one natural language would be mapped to equivalent or related information in other languages, thus making navigation across multilingual information possible for software agents.
Abstract:
Recently, the Semantic Web has experienced significant advancements in standards and techniques, as well as in the amount of semantic information available online. Even so, mechanisms are still needed to automatically reconcile semantic information when it is expressed in different natural languages, so that access to Web information across language barriers can be improved. That requires developing techniques for discovering and representing cross-lingual links on the Web of Data. In this paper we explore the different dimensions of such a problem and reflect on possible avenues of research on that topic.
Abstract:
The Web has witnessed an enormous growth in the amount of semantic information published in recent years. This growth has been stimulated to a large extent by the emergence of Linked Data. Although this brings us a big step closer to the vision of a Semantic Web, it also raises new issues such as the need for dealing with information expressed in different natural languages. Indeed, although the Web of Data can contain any kind of information in any language, it still lacks explicit mechanisms to automatically reconcile such information when it is expressed in different languages. This leads to situations in which data expressed in a certain language is not easily accessible to speakers of other languages. The Web of Data shows the potential for being extended to a truly multilingual web as vocabularies and data can be published in a language-independent fashion, while associated language-dependent (linguistic) information supporting the access across languages can be stored separately. In this sense, the multilingual Web of Data can be realized in our view as a layer of services and resources on top of the existing Linked Data infrastructure adding i) linguistic information for data and vocabularies in different languages, ii) mappings between data with labels in different languages, and iii) services to dynamically access and traverse Linked Data across different languages. In this article we present this vision of a multilingual Web of Data. We discuss challenges that need to be addressed to make this vision come true and discuss the role that techniques such as ontology localization, ontology mapping, and cross-lingual ontology-based information access and presentation will play in achieving this. Further, we propose an initial architecture and describe a roadmap that can provide a basis for the implementation of this vision.
Abstract:
The Semantic Web is growing at a fast pace, recently boosted by the creation of the Linked Data initiative and principles. Methods, standards, techniques and the state of technology are becoming more mature and therefore are easing the task of publication and consumption of semantic information on the Web.
Abstract:
In this paper we present the MultiFarm dataset, which has been designed as a benchmark for multilingual ontology matching. The MultiFarm dataset is composed of a set of ontologies translated into different languages and the corresponding alignments between these ontologies. It is based on the OntoFarm dataset, which has been used successfully for several years in the Ontology Alignment Evaluation Initiative (OAEI). By translating the ontologies of the OntoFarm dataset into eight different languages – Chinese, Czech, Dutch, French, German, Portuguese, Russian, and Spanish – we created a comprehensive set of realistic test cases. Based on these test cases, it is possible to evaluate and compare the performance of matching approaches with a special focus on multilingualism.
Abstract:
Today's knowledge management (KM) systems seldom account for language management and, especially, multilingual information processing. Document management is one of the strongest components of KM systems. If these systems do not include a multilingual knowledge management policy, intranet searches, excessive document space occupancy and redundant information slow down what are the most effective processes in a single language environment. In this paper, we model information flow from the sources of knowledge to the persons/systems searching for specific information. Within this framework, we focus on the importance of multilingual information processing, which is a hugely complex component of modern organizations.
Abstract:
Many attempts have been made to provide multilinguality to the Semantic Web, by means of annotation properties in Natural Language (NL), such as RDFS or SKOS labels, and other lexicon-ontology models, such as lemon, but there are still many issues to be solved if we want to have a truly accessible Multilingual Semantic Web (MSW). Reusability of monolingual resources (ontologies, lexicons, etc.), accessibility of multilingual resources hindered by many formats, reliability of ontological sources, disambiguation problems and multilingual presentation to the end user of all this information in NL can be mentioned as some of the most relevant problems. Unless this NL presentation is achieved, the MSW will be restricted to IT experts, and even then with great dissatisfaction and disenchantment.
Abstract:
In this paper we present a revisited classification of term variation in the light of the Linked Data initiative. Linked Data refers to a set of best practices for publishing and connecting structured data on the Web with the idea of transforming it into a global graph. One of the crucial steps of this initiative is the linking step, in which datasets in one or more languages need to be linked or connected with one another. We claim that the linking process would be facilitated if datasets were enriched with lexical and terminological information. With that as the final aim, we propose a classification of lexical, terminological and semantic variants that will become part of a model of linguistic descriptions that is currently being proposed within the framework of the W3C Ontology-Lexica Community Group to enrich ontologies and Linked Data vocabularies. Examples of modeling solutions of the different types of variants are also provided.
Abstract:
In this article, we argue that there is a growing number of linked datasets in different natural languages, and that there is a need for guidelines and mechanisms to ensure the quality and organic growth of this emerging multilingual data network. However, we have little knowledge regarding the actual state of this data network, its current practices, and the open challenges that it poses. Questions regarding the distribution of natural languages, the links that are established across data in different languages, or how linguistic features are represented, remain mostly unanswered. Addressing these and other language-related issues can help to identify existing problems, propose new mechanisms and guidelines or adapt the ones in use for publishing linked data including language-related features, and, ultimately, provide metrics to evaluate quality aspects. In this article we review, discuss, and extend current guidelines for publishing linked data by focusing on those methods, techniques and tools that can help RDF publishers to cope with language barriers. Whenever possible, we will illustrate and discuss each of these guidelines, methods, and tools on the basis of practical examples that we have encountered in the publication of the datos.bne.es dataset.
Abstract:
This paper describes the text normalization module of a fully trainable text-to-speech conversion system and its application to number transcription. The main target is to generate a language-independent text normalization module, based on data instead of on expert rules. This paper proposes a general architecture based on statistical machine translation techniques. This proposal is composed of three main modules: a tokenizer for splitting the text input into a token graph, a phrase-based translation module for token translation, and a post-processing module for removing some tokens. This architecture has been evaluated for number transcription in several languages: English, Spanish and Romanian. Number transcription is an important aspect of the text normalization problem.
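The three-module architecture named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the token format, the `<drop>` marker, the function names, and the hand-written `PHRASE_TABLE` are all hypothetical, a flat token list stands in for the token graph, and a dictionary lookup stands in for a trained phrase-based translation model. It covers only a few one- and two-digit English numbers.

```python
# Hypothetical toy phrase table; in a data-driven system this mapping
# would be learned from examples rather than written by hand.
PHRASE_TABLE = {
    "0_t": "<drop>", "2_t": "twenty", "3_t": "thirty",
    "0_u": "<drop>", "1_u": "one", "2_u": "two", "3_u": "three",
}


def tokenize(text):
    """Split text on whitespace; split short numbers into positional digit tokens."""
    tokens = []
    for word in text.split():
        if word.isdigit() and len(word) <= 2:
            padded = word.zfill(2)          # "3" -> "03"
            tokens.append(padded[0] + "_t")  # tens digit
            tokens.append(padded[1] + "_u")  # units digit
        else:
            tokens.append(word)
    return tokens


def translate(tokens):
    """Map each token through the phrase table; unknown tokens pass through."""
    return [PHRASE_TABLE.get(tok, tok) for tok in tokens]


def postprocess(tokens):
    """Remove tokens that the translation step marked as droppable."""
    return [tok for tok in tokens if tok != "<drop>"]


def normalize(text):
    """Run the tokenizer, translation, and post-processing steps in order."""
    return " ".join(postprocess(translate(tokenize(text))))


if __name__ == "__main__":
    print(normalize("take 30 steps to room 23"))
```

Digits the toy table does not cover (for example the teens) pass through untranslated, which is the kind of gap a model trained on data would be expected to fill.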
Abstract:
This paper provides an overview of an ongoing research project, “A Polytechnical Bilingual Dictionary of Metaphors: Spanish-English/English-Spanish”, carried out by the UPM consolidated research group “DISCYT” (Estudios Cognitivos del Discurso Científico-Técnico). A detailed explanation of the method adopted to identify key metaphors collected from the different subject areas is included. Drawing from recognized empirical methods (Pragglejaz 2007, Cameron 2007, Steen 2007), the examples have been examined according to the main tenets of conceptual metaphor and conceptual integration theory (Deignan 2005, Gibbs 2008, Lakoff 1993, Lakoff & Johnson 1999, Steen 2007, Fauconnier & Turner 2008). This forthcoming dictionary comprises metaphors from over 10 scientific and technical areas such as Aeronautical engineering, Agronomy, Architecture, Biotechnology, Civil engineering, Geology and Mining, Mechanical engineering, Nanotechnology, Naval and Maritime engineering, Sports and Telecommunications. In this paper, we focus on the study of examples taken from civil engineering, materials engineering and naval engineering. Representative cases are analyzed from several points of view (multimodal metaphor, linguistic information strategies and translation into the target language), highlighting cross-linguistic variations between Spanish and English.
Abstract:
Within the European Union, member states are setting up official data catalogues as entry points to access PSI (Public Sector Information). In this context, it is important to describe the metadata of these data portals, i.e., of data catalogs, and allow for interoperability among them. To tackle these issues, the Government Linked Data Working Group developed DCAT (Data Catalog Vocabulary), an RDF vocabulary for describing the metadata of data catalogs. This topic report analyzes the current use of the DCAT vocabulary in several European data catalogs and proposes some recommendations to deal with an inconsistent use of the metadata across countries. The enrichment of such metadata vocabularies with multilingual descriptions, as well as an account for cultural divergences, is seen as a necessary step to guarantee interoperability and ensure wider adoption.