882 results for Word (Linguistics)
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Models of word meaning built from a corpus of text have succeeded in emulating human performance on a number of cognitive tasks. Many of these models use geometric representations of words to store semantic associations between words. Word order information is often not captured in these models, and this lack of structural information has been raised as a weakness when performing cognitive tasks. This paper presents an efficient tensor-based approach to modelling word meaning that builds on recent attempts to encode word order information, while providing flexible methods for extracting task-specific semantic information.
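To illustrate why word order matters for such models, here is a minimal sketch that contrasts order-preserving and order-blind co-occurrence counts. It is not the tensor-based method of the paper, whose details the abstract does not give. The function names and the toy corpus are hypothetical.

```python
from collections import Counter

def ordered_counts(sentences):
    """Count adjacent word pairs as (left, right), so order is preserved."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for left, right in zip(tokens, tokens[1:]):
            counts[(left, right)] += 1
    return counts

def unordered_counts(sentences):
    """Count adjacent word pairs as sets, so order is discarded."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for left, right in zip(tokens, tokens[1:]):
            counts[frozenset((left, right))] += 1
    return counts

# Hypothetical toy corpus: same words, opposite meanings.
corpus = ["dog bites man", "man bites dog"]
ordered = ordered_counts(corpus)
unordered = unordered_counts(corpus)

# The ordered counts keep ("dog", "bites") and ("bites", "dog") apart;
# the unordered counts merge them into a single entry.
print(ordered[("dog", "bites")], ordered[("bites", "dog")])
print(unordered[frozenset(("dog", "bites"))])
```

An order-blind model sees the two sentences as identical, which is the weakness the abstract refers to.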
Abstract:
This paper explores the literature and analyses the different uses and understandings of the word “design” in Portuguese-colonised countries, using Brazil as the main example. It investigates the relationship between the linguistic existence of terms to define and describe “design” as an activity and field, and the roles and perceptions of Design in society at large. It also addresses the cultural effects that the lack of a proper translation has on the local community. The current perception of Design in Portuguese colonies is associated with two main aspects: linguistic and historical. Both differentiate the countries considered here from countries with a different background. The changes in the meaning of “design” over the years have had a great impact on people's perceptions of Design. Conversely, the development of Design has also influenced the changes in the meaning of the term, as a result of the legacy of the colonisation period and as a characteristic of the Portuguese language. Design has developed and reached a level of excellence in Portuguese-colonised countries that competes with the most traditional Design cultures in the world. However, this level of Design is confined to an elite belonging to universities and specialised markets, and Design is therefore not democratised. The ultimate aim of this study is to promote discussion of how to make the discourse surrounding this area more accessible to people from non-English-speaking countries that do not have the word “design” in their local language.
Abstract:
The work is based on the assumption that words with similar syntactic usage have similar meaning, which was proposed by Zellig S. Harris (1954, 1968). We study his assumption from two aspects: firstly, different meanings (word senses) of a word should manifest themselves in different usages (contexts), and secondly, similar usages (contexts) should lead to similar meanings (word senses). If we start with the different meanings of a word, we should be able to find distinct contexts for the meanings in text corpora. We separate the meanings by grouping and labeling contexts in an unsupervised or weakly supervised manner (Publications 1, 2 and 3). We are confronted with the question of how best to represent contexts in order to induce effective classifiers of contexts, because differences in context are the only means we have to separate word senses. If we start with words in similar contexts, we should be able to discover similarities in meaning. We can do this monolingually or multilingually. In the monolingual material, we find synonyms and other related words in an unsupervised way (Publication 4). In the multilingual material, we find translations by supervised learning of transliterations (Publication 5). In both the monolingual and multilingual cases, we first discover words with similar contexts, i.e., synonym or translation lists. In the monolingual case we also aim at finding structure in the lists by discovering groups of similar words, e.g., synonym sets. In this introduction to the publications of the thesis, we consider the larger background issues of how meaning arises, how it is quantized into word senses, and how it is modeled. We also consider how to define, collect and represent contexts. We discuss how to evaluate the trained context classifiers and discovered word sense classifications, and finally we present the word sense discovery and disambiguation methods of the publications.
This work supports Harris' hypothesis by implementing three new methods modeled on his hypothesis. The methods have practical consequences for creating thesauruses and translation dictionaries, e.g., for information retrieval and machine translation purposes. Keywords: Word senses, Context, Evaluation, Word sense disambiguation, Word sense discovery.
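The idea that words occurring in similar contexts have similar meanings can be sketched with simple context-count vectors compared by cosine similarity. This is an illustrative toy under stated assumptions, not one of the thesis's three methods. The function names, the window size, and the corpus are all hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt

def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical toy corpus.
corpus = [
    "the cat drinks milk",
    "the kitten drinks milk",
    "the car needs fuel",
]
vecs = context_vectors(corpus)

sim_cat_kitten = cosine(vecs["cat"], vecs["kitten"])
sim_cat_car = cosine(vecs["cat"], vecs["car"])
print(sim_cat_kitten > sim_cat_car)
```

In this corpus "cat" and "kitten" share all their context words, while "cat" and "car" share only "the", so the first pair scores higher. Real systems would need far larger corpora and a weighting scheme for frequent words.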
Abstract:
ACM SIGIR; ACM SIGWEB
Abstract:
The analytic advantages of central concepts from linguistics and information theory, and the analogies demonstrated between them, for understanding patterns of retrieval from full-text indexes to documents are developed. The interaction between the syntagm and the paradigm in computational operations on written language in indexing, searching, and retrieval is used to account for transformations of the signified or meaning between documents and their representation and between queries and documents retrieved. Characteristics of the message, and messages for selection for written language, are brought to explain the relative frequency of occurrence of words and multiple word sequences in documents. The examples given in the companion article are revisited and a fuller example introduced. The signified of the sequence stood for, the term classically used in the definitions of the sign, as something standing for something else, can itself change rapidly according to its syntagm. A greater than ordinary discourse understanding of patterns in retrieval is obtained.
Abstract:
An analogy is established between the syntagm and paradigm from Saussurean linguistics and the message and messages for selection from the information theory initiated by Claude Shannon. The analogy is pursued both as an end in itself and for its analytic value in understanding patterns of retrieval from full-text systems. The multivalency of individual words when isolated from their syntagm is contrasted with the relative stability of meaning of multi-word sequences, when searching ordinary written discourse. The syntagm is understood as the linear sequence of oral and written language. Saussure's understanding of the word, as a unit which compels recognition by the mind, is endorsed, although not regarded as final. The lesser multivalency of multi-word sequences is understood as the greater determination of signification by the extended syntagm. The paradigm is primarily understood as the network of associations a word acquires when considered apart from the syntagm. The restriction of information theory to expression or signals, and its focus on the combinatorial aspects of the message, is sustained. The message in the model of communication in information theory can include sequences of written language. Shannon's understanding of the written word, as a cohesive group of letters, with strong internal statistical influences, is added to the Saussurean conception. Sequences of more than one word are regarded as weakly correlated concatenations of cohesive units.
Abstract:
This thesis presents a study of the morphology of what is generally called the nominal plural of Persian (Tehran variety) within a word-based theory of morphology: Whole Word Morphology, developed by Ford and Singh (1991). This lexicalist model takes a stronger position than the models proposed by Aronoff (1976) and Anderson (1992) by admitting no morphological operation on units smaller than the word. According to this theory, a morphological description consists of enumerating the Word Formation Strategies (WFS), each licensed by at least two pairs of words showing the same formal and semantic covariation. All WFSs follow the same schema. We identified 49 WFSs covering plurals and collectives. We find that it is difficult to capture the Persian nominal plural as a syntactic category and that the various “plural markers” presented in the literature do not form a homogeneous set: they all share a sense of plurality, which nevertheless varies from a referential interpretation to a non-referential collective interpretation. This study aims to describe morphological competence, which does not depend on any extralinguistic consideration. In particular, we argue against the Arabic/Persian dichotomy generally accepted in the literature. We also provide explanations for the production of double plurals and discuss the variation assumed to result from a multiple choice of “plural markers”.