927 results for Multilingual lexical
Abstract:
Language resources, such as multilingual lexica and multilingual electronic dictionaries, contain collections of lexical entries in several languages. Access to the explicit or implicit translation relations between such entries is of great interest for many NLP-based applications. With Semantic Web techniques, translations can be made available on the Web and consumed directly by other (semantically enabled) resources, without relying on application-specific formats. To that end, in this paper we propose a model for representing translations as linked data, as an extension of the lemon model. Our translation module represents some core information associated with term translations and does not commit to specific views or translation theories. As a proof of concept, we have extracted the translations of the terms contained in Terminesp, a multilingual terminological database, and represented them as linked data. We have made them accessible on the Web both for humans (via a Web interface) and for software agents (via a SPARQL endpoint).
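The idea of a translation relation that links entries in two languages can be illustrated with a small sketch. It is not the translation module described in the abstract: every identifier below (the `ex:` names, `translationSource`, `translationTarget`, `writtenForm`) is a hypothetical placeholder, and a plain Python list of triples stands in for a real RDF store and SPARQL endpoint.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical data: one translation node linking a Spanish sense
# to an English sense. None of these names come from the lemon model.
TRIPLES = [
    Triple("ex:trans1", "rdf:type", "ex:Translation"),
    Triple("ex:trans1", "ex:translationSource", "ex:sense_red_es"),
    Triple("ex:trans1", "ex:translationTarget", "ex:sense_network_en"),
    Triple("ex:sense_red_es", "ex:writtenForm", "red@es"),
    Triple("ex:sense_network_en", "ex:writtenForm", "network@en"),
]


def objects(triples: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object of the triples matching subject and predicate."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]


def translations_of(triples: List[Triple], source_sense: str) -> List[str]:
    """Follow translation nodes from a source sense to target written forms."""
    forms: List[str] = []
    for t in triples:
        if t.predicate == "ex:translationSource" and t.obj == source_sense:
            for target in objects(triples, t.subject, "ex:translationTarget"):
                forms.extend(objects(triples, target, "ex:writtenForm"))
    return forms


print(translations_of(TRIPLES, "ex:sense_red_es"))
```

Modelling the translation as its own node, rather than as a direct link between the two senses, leaves room to attach further information to the relation itself.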
Abstract:
Within the European Union, member states are setting up official data catalogs as entry points to access PSI (Public Sector Information). In this context, it is important to describe the metadata of these data portals, i.e., of data catalogs, and to allow for interoperability among them. To tackle these issues, the Government Linked Data Working Group developed DCAT (Data Catalog Vocabulary), an RDF vocabulary for describing the metadata of data catalogs. This topic report analyzes the current use of the DCAT vocabulary in several European data catalogs and proposes some recommendations for dealing with the inconsistent use of the metadata across countries. Enriching such metadata vocabularies with multilingual descriptions, and accounting for cultural divergences, is seen as a necessary step to guarantee interoperability and ensure wider adoption.
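As a rough illustration of what multilingual descriptions in catalog metadata could look like, the sketch below stores language-tagged titles for a dataset record and picks one with a fallback. The record, its contents, and the `pick_label` helper are hypothetical; only the key name `dct:title` echoes a property that DCAT descriptions use for titles.

```python
from typing import Dict, Optional

# Hypothetical dataset record with titles keyed by language tag.
dataset = {
    "dct:title": {
        "en": "Air quality measurements",
        "es": "Mediciones de calidad del aire",
    }
}


def pick_label(labels: Dict[str, str], preferred: str,
               fallback: str = "en") -> Optional[str]:
    """Return the label in the preferred language, else the fallback, else None."""
    if preferred in labels:
        return labels[preferred]
    return labels.get(fallback)


print(pick_label(dataset["dct:title"], "es"))
print(pick_label(dataset["dct:title"], "fr"))
```

A catalog that records only one untagged title offers consumers in other languages no such fallback path, which is one way the inconsistency across countries can surface.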
Abstract:
The W3C Best Practices for Multilingual Linked Open Data community group was born one year ago during the last MLW workshop in Rome. Today it continues to lead the effort of a large community towards a shared view of the issues that multilingualism raises on the Web of Data and of their possible solutions. Despite our initial optimism, we found the task of identifying best practices for ML-LOD a difficult one, requiring a deep understanding of the Web of Data in its multilingual dimension and in its practical problems. In this talk we will review the progress of the group so far, mainly in the identification and analysis of topics, use cases, and design patterns, as well as the future challenges.
Abstract:
Three studies investigated the relation between symbolic gestures and words, aiming to discover the neural basis and behavioural features of the lexical-semantic processing and integration of the two communicative signals. The first study aimed to determine whether the processing of communicative signals (symbolic gestures and words) is always accompanied by their integration with each other and whether, if present, this integration supports the existence of a shared control mechanism. Experiment 1 aimed to determine whether and how a gesture is integrated with a word. Participants were administered a semantic priming paradigm with a lexical decision task and pronounced a target word, which was preceded by a meaningful or meaningless prime gesture. When meaningful, the gesture could be either congruent or incongruent with the word meaning. The duration of prime presentation (100, 250, 400 ms) varied randomly. Voice spectra, lip kinematics, and time to response were recorded and analyzed. Formant 1 of the voice spectra and mean velocity in lip kinematics increased when the prime was meaningful and congruent with the word, as compared to a meaningless gesture. In other words, parameters of voice and movement were magnified by congruence, but this occurred only when prime duration was 250 ms. Time to response to a meaningful gesture was shorter in the congruent condition than in the incongruent one. Experiment 2 aimed to determine whether the mechanism integrating a prime word with a target word is similar to that integrating a prime gesture with a target word. Formant 1 of the target word increased when the word prime was meaningful and congruent, as compared to a meaningless congruent prime. The increase was, however, present for every prime word duration. In the second study, experiment 3 aimed to determine whether comprehension of a symbolic prime gesture makes use of motor simulation.
Transcranial Magnetic Stimulation was delivered to the left primary motor cortex 100, 250, or 500 ms after prime gesture presentation. The Motor Evoked Potential of the First Dorsal Interosseus increased when stimulation occurred 100 ms post-stimulus. Thus, the gesture was understood within 100 ms and integrated with the target word within 250 ms. Experiment 4 ruled out any hand motor simulation in the comprehension of a prime word. The effect of the prior presentation of a symbolic gesture on the processing of a congruent target word was investigated in study 3. In experiment 5, symbolic gestures were presented as primes, followed by semantically congruent target words or pseudowords. In this case, the lexical-semantic decision was accompanied by motor simulation 100 ms after the onset of the verbal stimuli. In sum, the same type of integration with a word was present for both prime gesture and prime word. It was probably subsequent to the understanding of the signal, which used motor simulation for gestures and direct access to semantics for words. However, gestures and words could be understood at the same motor level through simulation if words were preceded by an adequate gestural context. Results are discussed from the perspective of a continuum between transitive actions and emblems, in parallel with language; the grounded/symbolic content of the different signals points to a relation between sensorimotor and linguistic systems, which could interact at different levels.
Abstract:
This thesis, intended to contribute to reflection on the history of the formation of the Portuguese language in Brazil, has as its general objective a study of the lexicon in the municipality of Cáceres-MT, based on a discussion of maintenance, tendency toward maintenance, disuse, tendency toward disuse, and semantic neologism of lexical units extracted from a nineteenth-century manuscript. The specific objectives are the following: (i) to understand the social history of the Captaincy of Mato Grosso and of the municipality of Cáceres, based on the information in the manuscript Memoria, together with aspects of the conditions under which the document was produced and the biography of its author; (ii) to survey the lexicon of the manuscript, restricted to nouns and adjectives, as a basis for selecting the lexical units to be tested in loco, and to investigate the sense recorded in the document for each lexical unit, thereby characterizing the lexicon of the nineteenth century; (iii) to carry out a lexicographic comparison covering general dictionaries from the eighteenth to the twenty-first century; (iv) to test and identify, from the oral corpus compiled through field research in the urban area of Cáceres, the degree of maintenance, tendency toward maintenance, disuse, tendency toward disuse, and semantic neologism in relation to the lexical units and their respective senses recorded in the manuscript. Accordingly, the written-language corpus analyzed is the nineteenth-century manuscript Memoria sobre o plano de guerra offensiva e deffensiva da Capitania de Matto Grosso, and the lexical units selected and extracted from it guided the field research that collected the oral-language corpus.
Before this collection, drawing on Dialectology and Geolinguistics as the theoretical and methodological basis, the locality (municipality of Cáceres - MT) and the informants (sixteen in total) were selected; the semantic-lexical questionnaire was prepared, based primarily on the proposal presented by the National Committee of the ALiB Project (2001); and the field research and the interview transcriptions were carried out. The semantic-lexical analysis of the corpora drew on lexicographic and lexicological studies. The results of the study show that the linguistic reality of the Cáceres informant contains units that already belonged to the nineteenth-century lexicon of written Portuguese in Brazil; that is, there is a semantic-lexical memory that persists in the lexical system, probably owing to the sociocultural conditions of the municipality of Cáceres, Mato Grosso, whose population largely lived in rural areas for almost two hundred years. Nevertheless, a certain balance was observed between the maintenance of the nineteenth-century lexicon and the innovation and polysemic mechanism that are constitutive of the lexicon.
Abstract:
Doctoral thesis with European mention in natural language processing, completed at the Universidad de Alicante by Ester Boldrini under the supervision of Dr. Patricio Martínez-Barco. The thesis defense took place at the Universidad de Alicante on 23 January 2012 before a committee formed by Dr. Manuel Palomar (Universidad de Alicante), Dr. Paloma Moreda (UA), Dr. Mariona Taulé (Universidad de Barcelona), Dr. Horacio Saggion (Universitat Pompeu Fabra), and Dr. Mike Thelwall (University of Wolverhampton). Grade: Sobresaliente Cum Laude, awarded unanimously.
Abstract:
Extension to new languages is a well-known bottleneck for rule-based systems. Considerable human effort, which typically consists of rewriting large numbers of rules from scratch, is required to transfer the knowledge available to the system from one language to a new one. Given sufficient annotated data, machine learning algorithms make it possible to minimize the costs of such knowledge transfer but, to date, have proved ineffective for some specific tasks. Among these, the recognition and normalization of temporal expressions still remains out of their reach. Focusing on this task, and still adhering to the rule-based framework, this paper presents a set of experiments on the automatic porting to Italian of a system originally developed for Spanish. Different automatic rule translation strategies are evaluated and discussed, providing a comprehensive overview of the challenge.
Abstract:
This paper presents the automatic extension to other languages of TERSEO, a knowledge-based system for the recognition and normalization of temporal expressions originally developed for Spanish. TERSEO was first extended to English through the automatic translation of the temporal expressions. Then, an improved porting process was applied to Italian, where the automatic translation of the temporal expressions from English and from Spanish was combined with the extraction of new expressions from an Italian annotated corpus. Experimental results demonstrate how, while still adhering to the rule-based paradigm, the development of automatic rule translation procedures allowed us to minimize the effort required for porting to new languages. Relying on such procedures, and without any manual effort or previous knowledge of the target language, TERSEO recognizes and normalizes temporal expressions in Italian with good results (72% precision and 83% recall for recognition).
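The two recognition figures can be combined into a single F1 score, the harmonic mean of precision and recall. The abstract does not report this value; the sketch below only derives it from the stated 72% precision and 83% recall, and the function name `f1_score` is an illustrative choice.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract for recognition in Italian.
f1 = f1_score(0.72, 0.83)
print(round(f1, 3))  # prints 0.771
```

The harmonic mean sits closer to the lower of the two figures, so the derived score of about 0.77 reflects the weaker precision more than the stronger recall.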
Abstract:
In this paper we present an automatic system for the extraction of syntactic-semantic patterns, applied to the development of multilingual processing tools. To arrive at effective methods for the automatic treatment of more than one language, we propose the use of syntactic-semantic patterns. These patterns are formed by a verbal head and its main arguments, and they are aligned across languages. The system extracts and aligns syntactic-semantic patterns from two manually annotated corpora, and we evaluate the main linguistic problems that must be dealt with in the alignment process.
Abstract:
In the last few years, research on textual information systems has developed widely. The goal is to improve these systems so that information stored in digital format (Digital Databases, Documental Databases, and so on) can be easily located, processed, and accessed. There are many applications focused on information access (for example, Web-search systems like Google or Altavista). However, these applications have problems when they must access cross-language information, or when they need to show information in a language different from that of the query. This paper explores the use of syntactic-semantic patterns as a method for accessing multilingual information and reviews, in the case of Information Retrieval, where it is possible and useful to employ patterns with respect to the multilingual and interactive aspects. On the one hand, the multilingual aspects studied are those related to accessing documents in languages different from that of the query, as well as the automatic translation of the document, i.e. a machine translation system based on patterns. On the other hand, this paper examines in depth the interactive aspects related to reformulating a query based on the syntactic-semantic pattern of the request.
Abstract:
This paper presents a proposal for a multi-modal dialogue system oriented to multilingual question-answering. The system includes the following modes of access: voice, text, avatar, gestures, and sign language. The proposal is oriented to the question-answering task as a user interaction mechanism. It is in the first stages of its development, and the architecture is presented here for the first time, on the basis of previous experience in question-answering and dialogue systems. The main objective of this research work is the development of a solid platform that will permit the modular integration of the proposed architecture.
Abstract:
Paper presented at the Cross-Language Evaluation Forum (CLEF 2008), Aarhus, Denmark, September 17-19, 2008.
Abstract:
The goal of the project is to analyze, experiment with, and develop intelligent, interactive, and multilingual Text Mining technologies as a key element of the next generation of search engines: systems with the capacity to find "the need behind the query". This new generation will provide specialized services and interfaces according to the search domain and the type of information needed. Moreover, it will integrate textual search (websites) and multimedia search (images, audio, video), and it will be able to find and organize information rather than generate ranked lists of websites.
Abstract:
The English language and the Internet, both separately and taken together, are nowadays well acknowledged as powerful forces that influence the lexico-grammatical characteristics of other languages worldwide. In fact, many authors, such as Crystal (2004), have pointed out the emergence of so-called Netspeak, that is, the language used on the Net or World Wide Web; as Crystal himself (2004: 19) puts it, ‘a type of language displaying features that are unique to the Internet […] arising out of its character as a medium which is electronic, global and interactive’. This ‘language’, however, may be understood in two ways: either as an adaptation of the English language proper to Internet requirements and purposes, or as a new, rapidly changing language resulting from the rapid adaptation to Internet requirements of almost all world languages, for which English is a trendsetter. If the second and probably most plausible interpretation is adopted, there are three salient features of ‘Netspeak’: (a) the rapid expansion of all its new linguistic developments thanks to the Internet itself, which may lead to the generalization and widespread acceptance of new words, coinages, or meanings hundreds of times faster than was the case with the printed media; (b) as noted above, the visible influence of English, the most prevalent language on the Internet; and, consequently, (c) the tendency of this new language to reduce the ‘distance’ between English and other languages, as well as the ignorance of the former among speakers of other languages, since the ‘Netspeak’ version of the latter adopts grammatical, syntactic and lexical features of English. Thus, linguistic differences may even disappear when code-switching and/or borrowing occurs, as whole fragments of English appear in other language contexts.
As a consequence of the new situation, an ideal context appears for interlanguage or multilingual word formation to thrive: puns, blends, compounds and word creativity in general find in the web the ideal place to gain rapid acceptance world-wide, as a result of fashion, coincidence, or sheer merit of the new linguistic proposals.