890 results for Cross-lingual conceptual-semantic relations
Abstract:
The study focuses on picture captions: their grammar and interplay with photographs and their position as semi-independent elements of the news stories. The research was conducted in the framework of critical discourse analysis, social semiotic visual theory and Fennistic syntactic research. The data consist of 441 press photographs, 1,815 captions and a number of news items from Finnish dailies. The generic structure potential of the caption includes the caption headline, the caption proper, i.e. the verbalization of the picture content, and the frame. In the data, 41 per cent of the captions have a headline, and 44 per cent contain a caption proper. Characteristic of the caption proper is omission of the finite verb and the use of the present tense, both of which have decreased in Finnish papers during the 20th century. The caption proper is typically a main clause, and both subordinate clauses and participial phrases occur mostly in the frame. When caption variants attached to the same pictures were compared, the processes and their participants proved to be identified in largely the same way, following the news agency captions. The reader's interpretation of a picture could instead be directed by framing it in different ways. For example, the caption may focus on the only person depicted, deal with a whole group, or give an abstract account of the situation. The caption is a paratext, a typographically marked, semi-independent element of a news story. Between the headline and the caption, four semantic relations have been identified. The caption may be a paraphrase of the headline, or a close-up illustrating an abstract headline with a concrete example. If the name of the person depicted is their only common factor, the relation between the caption and the headline is additive. A specifying caption will give more details than the headline. The caption may complete, repeat, or summarize the body copy. 
Naturally, most captions completing the story verbalize the content of the picture. As the caption is often based on the story, it may even repeat the body copy verbatim. The summarizing function is probably becoming increasingly important, as most Finnish newspapers have abandoned the use of a separate standfirst.
Abstract:
Recent advances in neural language models have contributed new methods for learning distributed vector representations of words (also called word embeddings). Two such methods are the continuous bag-of-words model and the skipgram model. These methods have been shown to produce embeddings that capture higher-order relationships between words and that are highly effective in natural language processing tasks involving word similarity and word analogy. Despite these promising results, there has been little analysis of the use of these word embeddings for retrieval. Motivated by these observations, in this paper we set out to determine how these word embeddings can be used within a retrieval model and what the benefit might be. To this end, we use neural word embeddings within the well-known translation language model for information retrieval. This language model captures implicit semantic relations between the words in queries and those in relevant documents, thus producing more accurate estimations of document relevance. The word embeddings used to estimate neural language models produce translations that differ from previous translation language model approaches; differences that deliver improvements in retrieval effectiveness. The models are robust to choices made in building word embeddings and, even more so, our results show that embeddings do not even need to be produced from the same corpus being used for retrieval.
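As a rough illustration of how embeddings can feed a translation language model, the sketch below scores a document with p(q|d) = Σ_w p_t(q|w) · p(w|d), where the translation probability p_t is taken to be cosine similarity normalised over the vocabulary. That normalisation, the clipping of negative similarities to zero, the absence of smoothing, and the toy two-dimensional vectors are all assumptions made for this example; the abstract does not specify the estimator the authors used.

```python
import math
from collections import Counter

# Toy 2-d embeddings; hypothetical values chosen only for illustration.
EMBEDDINGS = {
    "car": (1.0, 0.1),
    "automobile": (0.9, 0.2),
    "banana": (0.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def translation_prob(target, source, embeddings):
    """p_t(target | source): similarity to `source`, normalised over the vocabulary.

    Negative similarities are clipped to zero (an assumption of this sketch).
    """
    sims = {
        word: max(cosine(vec, embeddings[source]), 0.0)
        for word, vec in embeddings.items()
    }
    total = sum(sims.values())
    return sims[target] / total if total else 0.0


def query_likelihood(query, document, embeddings):
    """Unsmoothed p(q|d) = prod over q of sum over w in d of p_t(q|w) * p(w|d).

    Assumes every query term is in the vocabulary; document terms outside
    the vocabulary are ignored.
    """
    counts = Counter(w for w in document if w in embeddings)
    length = sum(counts.values())
    if length == 0:
        return 0.0
    score = 1.0
    for q in query:
        score *= sum(
            translation_prob(q, w, embeddings) * (n / length)
            for w, n in counts.items()
        )
    return score


related = query_likelihood(["car"], ["automobile"], EMBEDDINGS)
unrelated = query_likelihood(["car"], ["banana"], EMBEDDINGS)
print(related > unrelated)
```

The document containing only "automobile" scores higher for the query "car" than the one containing only "banana", even though neither contains the query term itself, which is the vocabulary-mismatch effect a translation model is meant to address.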
Abstract:
FinnWordNet is a WordNet for Finnish that conforms to the framework given in Fellbaum (1998) and Vossen (ed.) (1998). FinnWordNet is open source and currently contains 117,000 synsets. A classic WordNet consists of synsets, or sets of partial synonyms whose shared meaning is described and exemplified by a gloss, a common part of speech and a hyperonym. Synsets in a WordNet are arranged in hierarchical partial orderings according to semantic relations like hyponymy/hyperonymy. Together the gloss, part of speech and hyperonym fix the meaning of a word and constrain the possible translations of a word in a given synset. The Finnish group has opted for translating Princeton WordNet 3.0 synsets wholesale into Finnish by professional translators, because the translation process can be controlled with regard to quality, coverage, cost and speed of translation. The project was financed by FIN-CLARIN at the University of Helsinki. According to our preliminary evaluation, the translation process was diligent and the quality is on a par with the original Princeton WordNet.
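The synset structure described in this abstract can be pictured with a small data model. The sketch below is only an illustration: the class, field names, and synset identifiers are hypothetical, and each synset is given a single hypernym for simplicity. It shows how translating a synset can replace the lemmas while leaving the gloss, part of speech, and hypernym link untouched.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Synset:
    synset_id: str                  # hypothetical identifier scheme
    pos: str                        # part of speech shared by all lemmas
    gloss: str                      # definition that fixes the shared meaning
    lemmas: List[str]               # the partial synonyms
    hypernym: Optional[str] = None  # id of the more general synset, if any


def hypernym_chain(synsets: Dict[str, Synset], synset_id: str) -> List[str]:
    """Follow hypernym links upward and return the ids visited, nearest first."""
    chain = []
    current = synsets[synset_id].hypernym
    while current is not None:
        chain.append(current)
        current = synsets[current].hypernym
    return chain


def translate_synset(source: Synset, target_lemmas: List[str]) -> Synset:
    """Copy a synset, swapping in target-language lemmas only."""
    return Synset(source.synset_id, source.pos, source.gloss,
                  list(target_lemmas), source.hypernym)


english = {
    "n-animal": Synset("n-animal", "noun", "a living organism", ["animal"]),
    "n-dog": Synset("n-dog", "noun", "a domesticated canine", ["dog"], "n-animal"),
}
finnish_dog = translate_synset(english["n-dog"], ["koira"])
print(finnish_dog.lemmas, hypernym_chain(english, "n-dog"))
```

Because the translated synset keeps the same identifier and hypernym, the hierarchy built for the source language carries over unchanged.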
Abstract:
To effectively support today’s global economy, database systems need to manage data in multiple languages simultaneously. While current database systems do support the storage and management of multilingual data, they are not capable of querying across different natural languages. To address this lacuna, we have recently proposed two cross-lingual functionalities, LexEQUAL[13] and SemEQUAL[14], for matching multilingual names and concepts, respectively. In this paper, we investigate the native implementation of these multilingual functionalities as first-class operators on relational engines. Specifically, we propose a new multilingual storage datatype, and an associated algebra of the multilingual operators on this datatype. These components have been successfully implemented in the PostgreSQL database system, including integration of the algebra with the query optimizer and inclusion of a metric index in the access layer. Our experiments demonstrate that the native implementation is up to two orders of magnitude faster than the corresponding outside-the-server implementation. Further, these multilingual additions do not adversely impact the existing functionality and performance. To the best of our knowledge, our prototype represents the first practical implementation of a cross-lingual database query engine.
Abstract:
Identifying translations from comparable corpora is a well-known problem with several applications, e.g. dictionary creation in resource-scarce languages. Scarcity of high-quality corpora, especially in Indian languages, makes this problem hard: state-of-the-art techniques achieve a mean reciprocal rank (MRR) of 0.66 for English-Italian, and a mere 0.187 for Telugu-Kannada. There exist comparable corpora in many Indian languages with other "auxiliary" languages. We observe that translations have many topically related words in common in the auxiliary language. To model this, we define the notion of a translingual theme, a set of topically related words from auxiliary language corpora, and present a probabilistic framework for translation induction. Extensive experiments on 35 comparable corpora using English and French as auxiliary languages show that this approach can yield dramatic improvements in performance (e.g. MRR improves by 124% to 0.419 for Telugu-Kannada). A user study on WikiTSu, a system for cross-lingual Wikipedia title suggestion that uses our approach, shows a 20% improvement in the quality of titles suggested.
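For readers unfamiliar with the metric, mean reciprocal rank averages 1/rank of the first correct candidate over all source words, counting 0 when no correct candidate is returned. The sketch below uses placeholder words and hypothetical function names; the last two lines only check the arithmetic relating the two Telugu-Kannada figures quoted in the abstract.

```python
def mean_reciprocal_rank(ranked_candidates, gold):
    """Average of 1/rank of the first correct translation per source word.

    ranked_candidates: dict mapping a source word to its ranked candidate list.
    gold: dict mapping a source word to the set of acceptable translations.
    A source word with no correct candidate contributes 0.
    """
    total = 0.0
    for source, candidates in ranked_candidates.items():
        for rank, candidate in enumerate(candidates, start=1):
            if candidate in gold[source]:
                total += 1.0 / rank
                break
    return total / len(ranked_candidates)


def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline


# Placeholder data: correct answer at rank 1, at rank 2, and missing.
ranked = {"s1": ["a", "b"], "s2": ["c", "d"], "s3": ["e", "f"]}
gold = {"s1": {"a"}, "s2": {"d"}, "s3": {"z"}}
print(mean_reciprocal_rank(ranked, gold))  # (1 + 0.5 + 0) / 3 = 0.5

# Figures from the abstract: 0.187 rising to 0.419 is a gain of about 124%.
print(round(100 * relative_improvement(0.187, 0.419)))  # 124
```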
Abstract:
Context-aware applications need mechanisms to retrieve information about their execution context. Based on the current context, such applications can adapt themselves to provide appropriate information and services to their users. The common approach in support infrastructures for context-aware applications provides resource discovery services through the use of pairs
Abstract:
This study analyzes competing projects in the production of curriculum policies for EJA (Educação de Jovens e Adultos, youth and adult education). To this end, it examines material produced in two arenas in which texts voicing the demands of various groups circulate, namely the ENEJAs (Encontro Nacional de Educação de Jovens e Adultos) and GT 18 of ANPEd (Associação Nacional de Pós-graduação e Pesquisa em Educação). In these, we identify and problematize competing demands in the production of EJA curriculum policies which, articulated in the process of signification, seek to constitute a hegemonic discourse in the curriculum aimed at EJA. The study highlights the role that epistemic communities play, in different contexts, in the policy production process as they attempt to influence and hegemonize certain meanings surrounding the production of EJA curriculum policies. Curriculum policies are accordingly understood as discourse, which implies approaching and problematizing the discourses that circulate in different contexts as traversed by knowledge-power relations. To this end, we engage with the discourse theory proposed by Ernesto Laclau, Stephen Ball's methodological approach of the continuous policy cycle, and the analytical strand of epistemic communities. We argue that curriculum policies are produced in different contexts, with the involvement of different social actors. We further argue, based on the analysis of different documents and of the demands, that possible discourses are constituted through the articulation of certain demands that are made equivalent and that seek to hegemonize certain meanings of and in EJA curriculum policy. We also point to the possibility of future research in the field of EJA curriculum policies.
Abstract:
The aim of this work is to study melancholia along two paths. The first, which we call historical-investigative, studies the concept in ancient philosophy (Aristotle), which carries the direct legacy of Hippocratic medicine, and also traces the trajectory of melancholia in modern psychiatry. In other words, it studies how melancholia was constructed as an affliction of the body at these two fundamental historical moments of the term. The second path analyzes the fundamental role of the construction of the concept of melancholia within and throughout Freud's work, both in its function of delimiting a properly psychoanalytic field of reflection on this illness and in its relations of conceptual proximity (which involve some of the most fundamental concepts of Freud's work).
Abstract:
«Realismo e Lirismo em Terra Sonâmbula e Chuva Braba» is a work of reading that reflects our perception of two particular worlds constructed from the works of the two African writers. We opted for a pragmatic structure for the study, focusing on the reading and interpretation of the novels, without including a specific chapter of theoretical references. This strategy made it possible to interweave the conceptual framework with the textual information resulting from the analysis and interpretation of the corpus. The two works intersect in the linguistic and cultural domain as a consequence of a shared historical, political and social past. The geographical location of Cabo Verde and its prolonged famine, on the one hand, and the catastrophic war that shook Mozambique between 1976 and 1992, on the other, made it possible to extrapolate recurring themes inspired by the authors' impressions and experiences, related to practices and lived experiences that, in literary treatment, gained a lyrical-realist dimension of great hermeneutic value. The insularity and continentality that set Cabo Verde and Mozambique apart, the famine and war that respectively characterize them, and the search for a literary space grounded in the marks of Creole and Mozambican identity (crioulidade and moçambicanidade) make up a set of aesthetic values that shape the cultural imaginary of the two Portuguese-speaking African countries. This thesis investigates the images and fundamental aspects inherent in the two novels, seeking to show to what extent lyrical-realist narratives can be constructed from themes of famine and war. The study showed that images of suffering, desolation and unrest generally constitute the aesthetic paradigm of the lyrical and realist writing of Mia Couto and Manuel Lopes.
Abstract:
Thesis digitized by the Division de la gestion de documents et des archives of the Université de Montréal
Abstract:
Search result diversification (DRR, diversification des résultats de recherche) aims to select diverse documents from the search results so as to cover as many intents as possible. Existing approaches assume that the initial results are sufficiently diverse and cover the aspects of the query well. In practice, however, the initial results often fail to cover some aspects. In this thesis, we propose a new approach to DRR that diversifies query expansion (DER, diversification de l'expansion de requête) in order to cover the aspects better. Expansion terms are selected from one or more resources following the maximal marginal relevance principle. In our first contribution, we propose a term-level method for DER in which the similarity between terms is measured at a surface level using the resources. When several resources are used for DER, the literature has combined them uniformly, which ignores the individual contribution of each resource with respect to the query. In the second contribution of this thesis, we propose a new query-dependent method for weighting resources. Our method uses a set of features integrated into a linear regression model, and generates from each resource a number of expansion terms proportional to that resource's weight. The methods proposed for DER concentrate on eliminating redundancy among expansion terms, without regard to whether the selected terms actually cover the different aspects of the query. To overcome this drawback, the third contribution of this thesis introduces a new aspect-level method for DER. Our method is trained in a supervised manner on the principle that related terms should correspond to the same aspect. 
This method selects expansion terms at a latent semantic level so as to cover as many different aspects of the query as possible. In addition, it allows several resources to be integrated when suggesting expansion terms, and supports the integration of several constraints, such as the dispersion constraint. We evaluate our methods on ClueWeb09B data and three query collections from the TREC Web track, and show the usefulness of our approaches compared with existing methods.
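The maximal marginal relevance principle named in the abstract above can be sketched as a greedy loop: at each step, pick the candidate term that maximises λ · relevance − (1 − λ) · (highest similarity to any term already chosen). The function name, the trade-off parameter `lam`, and the toy scores below are assumptions made for illustration; they are not the thesis's actual features, resources, or weights.

```python
def mmr_select(candidates, relevance, similarity, k, lam=0.5):
    """Greedily pick up to k expansion terms by maximal marginal relevance.

    relevance: dict mapping each term to its relevance to the query.
    similarity: function (term, term) -> similarity in [0, 1].
    lam: trade-off between relevance (lam = 1) and novelty (lam = 0).
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(term):
            # Redundancy is the closest match among the terms chosen so far.
            redundancy = max((similarity(term, s) for s in selected), default=0.0)
            return lam * relevance[term] - (1 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy scores for the ambiguous query "java" (hypothetical values).
relevance = {"programming": 0.9, "coding": 0.85, "coffee": 0.6}
pair_similarity = {frozenset(("programming", "coding")): 0.95}


def similarity(a, b):
    return pair_similarity.get(frozenset((a, b)), 0.1)


terms = ["programming", "coding", "coffee"]
print(mmr_select(terms, relevance, similarity, k=2, lam=0.5))
# ['programming', 'coffee']
print(mmr_select(terms, relevance, similarity, k=2, lam=1.0))
# ['programming', 'coding']
```

With `lam=0.5` the near-duplicate "coding" is skipped in favour of "coffee", which covers a different sense of the query; with `lam=1.0` the selection reduces to plain relevance ranking.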
Abstract:
Social tagging has become very popular around the Internet as well as in research. The main idea behind tagging is to allow users to provide metadata to the web content from their perspective to facilitate categorization and retrieval. There are many factors that influence users' tag choice. Many studies have been conducted to reveal these factors by analysing tagging data. This paper uses two theories to identify these factors, namely the semiotics theory and activity theory. The former treats tags as signs and the latter treats tagging as an activity. The paper uses both theories to analyse tagging behaviour by explaining all aspects of a tagging system, including tags, tagging system components and the tagging activity. The theoretical analysis produced a framework that was used to identify a number of factors. These factors can be considered as categories that can be consulted to redirect user tagging choice in order to support particular tagging behaviour, such as cross-lingual tagging.
Abstract:
Perception is linked to action via two routes: a direct route based on affordance information in the environment and an indirect route based on semantic knowledge about objects. The present study explored the factors modulating the recruitment of the two routes, in particular the factors affecting the selection of paired objects. In Experiment 1, we presented real objects among semantically related or unrelated distracters. Participants had to select two objects that can interact. The presence of distracters affected selection times, but the semantic relations of the objects with the distracters did not. Furthermore, participants first selected the active object (e.g. teaspoon) with their right hand, followed by the passive object (e.g. mug), often with their left hand. In Experiment 2, we presented pictures of the same objects with no hand grip, a congruent hand grip, or an incongruent hand grip. Participants had to decide whether the two objects can interact. Action decisions were faster when the presentation of the active object preceded the presentation of the passive object, and when the grip was congruent. Interestingly, participants were slower when the objects were semantically but not functionally related; this effect increased with congruently gripped objects. Our data showed that action decisions in the presence of strong affordance cues (real objects, pictures of congruently gripped objects) relied on sensory-motor representation, supporting the direct route from perception to action that bypasses semantic knowledge. However, in the case of weak affordance cues (pictures), semantic information interfered with action decisions, indicating that semantic knowledge impacts action decisions. The data support the dual-route account of the link from perception to action.
Abstract:
This paper presents a proposal for the semantic treatment of ambiguous homographic forms in Brazilian Portuguese and offers linguistic strategies for its computational implementation in Systems of Natural Language Processing (SNLP). Pustejovsky's Generative Lexicon was used as a theoretical model. From this model, the Qualia Structure - QS (and the Formal, Telic, Agentive and Constitutive roles) was selected as one of the linguistic and semantic expedients for the disambiguation of homonym forms. So that the analyzed and treated data could be manipulated, we built a Lexical Knowledge Base (LKB) in which lexical items are correlated and interconnected by different kinds of semantic relations in the QS and by ontological information.