847 results for lexical unit
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Department of Linguistics and Translation
Abstract:
Our thesis examines the attrition of the usage label PROVERBIAL between the 7th (1878) and the 8th (1932-35) editions of the Dictionnaire de l’Académie française. The computerization of the eight completed editions of the work (Redon 2002), which allows both quantitative and qualitative data collection, shows that the label is highly stable across the first seven editions but that its use drops considerably from the 1878 edition to that of 1932-35. Within the limits of our project, we try to understand how this erosion came about. To do so, we exhaustively inventoried the lexical units affected from the 7th to the 8th edition, taking into account the logically possible cases: removal of an entry or a lexical unit in the 8th edition, and lexical units shared with the 7th edition that were (a) stripped of the label, (b) labelled differently, or (c) left with the original labelling. In the 1878 edition, PROVERBIAL applies to 4,674 lexical units distributed across 1,645 entries. At the end of our research, we identify the proverbial expressions that disappeared or were maintained in the crucial transition from the 7th to the 8th edition of the Dictionnaire de l’Académie française. From this we draw results that bear, among other things, on the transformation of the labelling system in the tradition of the institution's Dictionnaire.
Abstract:
Semantic role annotation is a task that assigns role labels such as Agent, Patient, Instrument, Location, Destination, etc. to the various participants, whether actants or circumstants (arguments or adjuncts), of a predicative lexical unit. This task requires rich lexical resources or large corpora containing sentences manually annotated by linguists, on which certain automation approaches (statistical or machine learning) can rely. Previous work in this area has focused mainly on English, which has rich resources such as PropBank, VerbNet and FrameNet that have been used to feed automated annotation systems. Annotation in other languages, for which no manually annotated corpus is available, often relies on the English FrameNet. A resource such as the English FrameNet is more than necessary for automated annotation systems, and the manual annotation of thousands of sentences by linguists is a tedious and time-consuming task. In this thesis we propose an automatic system to assist linguists in this task, so that they could limit themselves to validating the annotations proposed by the system. In our work, we consider only verbs, which are more likely than nouns to be accompanied by actants realized in sentences. These verbs are specialized terms from computing and the Internet (e.g. accéder, configurer, naviguer, télécharger) whose actantial structure is manually enriched with semantic roles. The actantial structure of the verbal lexical units is described according to the principles of Mel’čuk's Explanatory and Combinatorial Lexicology (ECL) and draws partially (as far as semantic roles are concerned) on the notion of Frame Element as described in Fillmore's Frame Semantics (FS) theory. What these two theories have in common is that both lead to dictionaries different from those produced by traditional approaches. The computing and Internet verbal lexical units that were manually annotated in several contexts make up our specialized corpus. Our system, which automatically assigns semantic roles to actants, is based on rules or classifiers trained on more than 2,300 contexts. We are limited to a restricted list of roles because some roles in our corpus do not have enough manually annotated examples. In our system, we handled only the roles Patient, Agent and Destination, each of which has more than 300 examples. We created a class named Autre (Other), in which we grouped the remaining roles, each with fewer than 100 annotated examples. We subdivided the annotation task into subtasks: identifying the participants (actants and circumstants) and assigning semantic roles only to the actants, which contribute to the meaning of the verbal lexical unit. We submitted the sentences of our corpus to the Syntex syntactic parser in order to extract the syntactic information describing the various participants of a verbal lexical unit in a sentence. This information served as features in our learning model. We proposed two techniques for identifying participants: a rule-based technique, for which we extracted about thirty rules, and a technique based on machine learning. The same techniques were used for the task of distinguishing actants from circumstants. For the task of assigning semantic roles to actants, we proposed a semi-supervised clustering method for the instances, which we compared with the semantic role classification method. We used CHAMÉLÉON, an agglomerative (bottom-up) hierarchical algorithm.
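The grouping of sparsely attested roles into a catch-all class (Autre, "Other") described in the abstract above can be illustrated with a minimal sketch. The function name `collapse_rare_roles`, the `min_count` threshold and the toy labels are assumptions made for illustration only; the abstract reports the example counts of the roles it kept and merged but does not state the exact cut-off or how the merge was implemented.

```python
from collections import Counter


def collapse_rare_roles(labels, min_count, other_label="Autre"):
    """Replace every role seen fewer than `min_count` times with `other_label`.

    `min_count` is a hypothetical cut-off chosen for this sketch.
    """
    counts = Counter(labels)
    return [lab if counts[lab] >= min_count else other_label for lab in labels]


# Toy role labels, NOT data from the thesis corpus.
labels = ["Agent"] * 4 + ["Patient"] * 3 + ["Instrument", "Lieu"]
collapsed = collapse_rare_roles(labels, min_count=3)

# Frequent roles keep their label; the two rare ones share the catch-all class.
print(Counter(collapsed))
```

Merging rare roles this way trades label granularity for enough training examples per class, which seems to be the motivation given in the abstract.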
Abstract:
This research deals with the interface between lexical semantics and syntax, and is part of the DiCo lexical database project (an acronym for Dictionnaire de combinatoire) at the Observatoire de Linguistique Sens-Texte [OLST] of the Université de Montréal. The project stems from a desire to record concisely and completely, within the dictionary itself, the syntactic behaviour typical of each lexical unit. To this end, we encode the co-occurrence of the DiCo's nominal lexical units with their actants in a lexical government pattern table (also known as a valency schema, argument structure, subcategorization frame, predicate-argument structure, etc.), noting among other things the surface syntactic dependencies involved. In this thesis, we present the syntactic properties of one French nominal dependency, which we have named attributive adnominal, in order to lay out a methodology for identifying and characterizing surface syntactic dependencies. We also give the list of governed nominal dependencies identified in the course of this work. We then describe the creation of a database of generalized government patterns of French named CARNAVAL. Finally, we discuss possible applications of our work, particularly with regard to creating a typology of French lexical government patterns.
Abstract:
Almost all texts contain some complex lexical units, belonging to the phraseology of the language of a specialized field or of the general language. The translator must first identify this phraseologism, and then understand its meaning. However, it is not enough to propose an explanation in the target language: the translator has to establish its phraseologically equivalent lexical unit in meaning and in phraseological formulations.
Abstract:
Graduate Program in Linguistic Studies - IBILCE
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
This study investigates the idiomatic expressions (IEs) or combinations related to the Italian lexical units testa and capo, in comparison with the Portuguese lexical unit cabeça. Since testa and capo come from two completely different etyma, they are not perfect synonyms; on the contrary, they gave rise to several expressions that are specific to just one of the lexical units. The corpus was selected from monolingual general Italian dictionaries, and the data were then classified by type: idioms found only with capo; idioms found only with testa; idioms that are synonymous with both; and IEs whose translations refer to other parts of the body. As a result, we found that most of the IEs with capo or testa have common semes, but most are also specific to one lexical unit or the other, confirming the difference in semantic features between them as well as the non-univocity between languages.
Abstract:
OntoTag - A Linguistic and Ontological Annotation Model Suitable for the Semantic Web
1. INTRODUCTION. LINGUISTIC TOOLS AND ANNOTATIONS: THEIR LIGHTS AND SHADOWS
Computational Linguistics is already a consolidated research area. It builds upon the results of two other major areas, namely Linguistics and Computer Science and Engineering, and it aims at developing computational models of human language (or natural language, as it is termed in this area). Possibly, its best-known applications are the different tools developed so far for processing human language, such as machine translation systems and speech recognizers or dictation programs.
These tools for processing human language are commonly referred to as linguistic tools. Apart from the examples mentioned above, there are also other types of linguistic tools that perhaps are not so well-known, but on which most of the other applications of Computational Linguistics are built. These other types of linguistic tools comprise POS taggers, natural language parsers and semantic taggers, amongst others. All of them can be termed linguistic annotation tools.
Linguistic annotation tools are important assets. In fact, POS and semantic taggers (and, to a lesser extent, also natural language parsers) have become critical resources for the computer applications that process natural language. Hence, any computer application that has to analyse a text automatically and ‘intelligently’ will include at least a module for POS tagging. The more an application needs to ‘understand’ the meaning of the text it processes, the more linguistic tools and/or modules it will incorporate and integrate.
However, linguistic annotation tools still have some limitations, which can be summarised as follows:
1. Normally, they perform annotations only at a certain linguistic level (that is, Morphology, Syntax, Semantics, etc.).
2. They usually introduce a certain rate of errors and ambiguities when tagging. This error rate ranges from 10 percent up to 50 percent of the units annotated for unrestricted, general texts.
3. Their annotations are most frequently formulated in terms of an annotation schema designed and implemented ad hoc.
A priori, it seems that the interoperation and the integration of several linguistic tools into an appropriate software architecture could most likely solve the limitations stated in (1). Besides, integrating several linguistic annotation tools and making them interoperate could also minimise the limitation stated in (2). Nevertheless, in the latter case, all these tools should produce annotations for a common level, which would have to be combined in order to correct their corresponding errors and inaccuracies. Yet, the limitation stated in (3) prevents both types of integration and interoperation from being easily achieved.
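One simple way to combine annotations that several tools produce for a common level is a per-token majority vote. The sketch below illustrates that idea only and is not the combination method of the work summarised here; the function name `majority_vote`, the tag labels and the three tool outputs are hypothetical, and it assumes the tools already share one tokenisation and one tagset, which limitation (3) suggests is rarely true in practice.

```python
from collections import Counter


def majority_vote(annotations):
    """Combine tag sequences (one per tool) token by token.

    Returns the tag chosen by a strict majority of tools for each token,
    or None where no tag reaches a majority.
    """
    if len({len(seq) for seq in annotations}) != 1:
        raise ValueError("all tools must annotate the same number of tokens")
    combined = []
    for tags in zip(*annotations):
        tag, votes = Counter(tags).most_common(1)[0]
        combined.append(tag if votes > len(tags) / 2 else None)
    return combined


# Hypothetical outputs of three POS taggers for "Time flies fast".
tool_a = ["NOUN", "VERB", "ADV"]
tool_b = ["NOUN", "NOUN", "ADV"]
tool_c = ["VERB", "VERB", "ADJ"]

print(majority_vote([tool_a, tool_b, tool_c]))  # ['NOUN', 'VERB', 'ADV']
```

Each tool errs on a different token here, so the vote recovers a sequence that no single tool produced in full, which is the kind of error reduction the paragraph above has in mind.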
In addition, most high-level annotation tools rely on other lower-level annotation tools and their outputs to generate their own. For example, sense-tagging tools (operating at the semantic level) often use POS taggers (operating at a lower level, i.e., the morphosyntactic) to identify the grammatical category of the word or lexical unit they are annotating. Accordingly, if a faulty or inaccurate low-level annotation tool is to be used by another, higher-level one in its process, the errors and inaccuracies of the former should be minimised in advance. Otherwise, these errors and inaccuracies would be transferred to (and even magnified in) the annotations of the high-level annotation tool.
Therefore, it would be quite useful to find a way to
(i) correct or, at least, reduce the errors and the inaccuracies of lower-level linguistic tools;
(ii) unify the annotation schemas of different linguistic annotation tools or, more generally speaking, make these tools (as well as their annotations) interoperate.
Clearly, solving (i) and (ii) should ease the automatic annotation of web pages by means of linguistic tools, and their transformation into Semantic Web pages (Berners-Lee, Hendler and Lassila, 2001). Yet, as stated above, (ii) is a type of interoperability problem. There again, ontologies (Gruber, 1993; Borst, 1997) have been successfully applied thus far to solve several interoperability problems. Hence, ontologies should also help solve the aforementioned problems and limitations of linguistic annotation tools.
Thus, to summarise, the main aim of the present work was to combine these separate approaches, mechanisms and tools for annotation from Linguistics and Ontological Engineering (and the Semantic Web) in a hybrid (linguistic and ontological) annotation model, suitable for both areas. This hybrid (semantic) annotation model should (a) benefit from the advances, models, techniques, mechanisms and tools of these two areas; (b) minimise (and even solve, when possible) some of the problems found in each of them; and (c) be suitable for the Semantic Web. The concrete goals that helped attain this aim are presented in the following section.
2. GOALS OF THE PRESENT WORK
As mentioned above, the main goal of this work was to specify a hybrid (that is, linguistically-motivated and ontology-based) model of annotation suitable for the Semantic Web (i.e. it had to produce a semantic annotation of web page contents). This entailed that the tags included in the annotations of the model had to (1) represent linguistic concepts (or linguistic categories, as they are termed in ISO/DCR (2008)), in order for this model to be linguistically-motivated; (2) be ontological terms (i.e., use an ontological vocabulary), in order for the model to be ontology-based; and (3) be structured (linked) as a collection of ontology-based
Abstract:
This paper presents a proposal for a recognition model for the appraisal value of sentences. It is based on splitting the text into independent sentences (at full stops) and then analysing the appraisal elements contained in each sentence according to their prior value in the appraisal lexicon. In this lexicon, positive words are assigned a positive coefficient (+1) and negative words a negative coefficient (-1). We take into account words such as “too”, “little” (when it does not mean “a bit”), “less”, and “nothing”, which can modify the polarity of a lexical unit when they appear in its nearby environment. If any of these elements is present, the prior coefficient is multiplied by (-1), that is, its sign changes. Our results show a nearly theoretical effectiveness of 90%, despite not achieving the recognition (or misrecognition) of implicit elements. These elements represent approximately 4% of the total of sentences analysed for appraisal and include the errors in the recognition of coordinated sentences. On the one hand, we found that 3.6% of the sentences could not be recognized because they use connectors different from those included in the model; on the other hand, in 8.6% of the sentences the rules we developed could not be applied even though they use some of the described connectors. The percentage relative to the whole group of appraisal sentences in the corpus was approximately 5%.
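The lexicon-and-modifier scheme described in this abstract can be sketched in a few lines. Everything specific below is an assumption made for illustration: the four lexicon entries, the two-token `WINDOW` standing in for the "nearby environment", the handling of "a little" as the "a bit" reading, and summing the coefficients to get one value per sentence are not stated in the abstract.

```python
import re

# Hypothetical appraisal lexicon: +1 for positive words, -1 for negative ones.
LEXICON = {"good": 1, "useful": 1, "bad": -1, "boring": -1}
# Modifiers named in the abstract.
MODIFIERS = {"too", "little", "less", "nothing"}
WINDOW = 2  # assumed size of the "nearby environment", in tokens


def is_modifier(tokens, j):
    """True if tokens[j] is a polarity modifier; 'a little' (= 'a bit') is not."""
    if tokens[j] not in MODIFIERS:
        return False
    return not (tokens[j] == "little" and j > 0 and tokens[j - 1] == "a")


def sentence_scores(text):
    """Split on full stops and return (sentence, summed coefficient) pairs."""
    scores = []
    for sentence in text.split("."):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        if not tokens:
            continue
        total = 0
        for i, tok in enumerate(tokens):
            if tok not in LEXICON:
                continue
            coeff = LEXICON[tok]
            lo, hi = max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)
            # Flip the sign once if any modifier occurs in the window.
            if any(is_modifier(tokens, j) for j in range(lo, hi)):
                coeff *= -1
            total += coeff
        scores.append((sentence.strip(), total))
    return scores


text = "The talk was good. The plot was less good. It was a little boring."
for sentence, score in sentence_scores(text):
    print(score, sentence)
```

In this toy input, "less" flips "good" to -1, while "a little boring" keeps its -1 because "little" is read there as "a bit". Splitting only on full stops mirrors the abstract and would mishandle abbreviations and other sentence-final punctuation.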
Abstract:
This research falls within the so-called specialized languages; for this thesis, that of Information and Communication Technologies (ICT). Of all the aspects of the study of these languages that may be of linguistic interest, the analysis of the terminological component has taken precedence. Traditionally, the conceptualization of a field of knowledge was represented mostly through the nominal element, as defended by the General Theory of Terminology (Wüster, 1968). Both lexicology and lexicography have made important contributions to terminological studies for identifying the lexical component through which specialized information is conveyed. Notwithstanding those early terminological studies, which pointed to the noun as the denominative-conceptual element, other more recent theories, among which we highlight the Communicative Theory of Terminology (Cabré, 1999), identify other morphosyntactic structures made up of non-nominal elements that likewise carry that conceptual load. On this basis, we have selected the relational adjective for this study, since it represents a grammatical category other than the noun and maintains a link with the noun owing to its origin. All of this may be of some terminological interest. Through this research, we set out to demonstrate the following hypotheses: 1. The relational adjective contributes specialized content in its association with the nominal component. 2. The relational adjective carries a semantic value that makes it possible to identify more precisely the conceptual relation between the elements (adjective and noun) of the resulting lexical combination, especially in some ambiguous formations. 3. The relational adjective, as a natural modifier of the noun it accompanies, could impose some restriction on its combinations and thus make a discriminating selection of the members of the specialized lexical combination. Bearing in mind the above hypotheses, this research has delimited and characterized the lexical segment under study: the ‘specialized lexical combination’ (combinación léxica especializada, CLE), formally represented by the syntactic structure [adjR+n], where adjR is the adjective and n the noun it accompanies. We have likewise described the theoretical framework from which we approach our analysis: the theory of the Generative Lexicon (GL) and the semantic representation (Pustejovsky, 1995) it proposes to explain the generation of meanings. We analysed the various structures of lexical representation, in particular the qualia structure, through which we identified the semantic relation between the two lexical items [adjR+n] of the syntactic structure under study. The semantic study of the two lexical pieces also made it possible to verify the denominative value of the adjective in the combination. It was necessary to compile a corpus of written texts in English and Spanish belonging to the specialized discourse of ICT. This material was processed for our purposes using various electronic tools. We used electronic lexicons, general and specialized online dictionaries, and online reference corpora, the latter in order to eventually validate our data. Search engines, among them WordNet Search 3.1, were also used to obtain semantic information on our lexical items. Our conclusions corroborated the hypotheses put forward in this thesis, especially the one concerning the denominative-conceptual value of the relational adjective, which, together with the noun it accompanies, forms part of the cognitive representation of the specialized language of ICT. As a continuation of this study, suggestions are made for future lines of research as well as for the design of software tools that could incorporate these semantic data as a complement to lexical items endowed with denominative-conceptual value.
ABSTRACT This research falls within the field of the so-called Specialized Languages, which for the purpose of this study is the Information and Communication Technology (ICT) discourse. Of their several distinguishing features, terminology is the focus of our interest from the point of view of linguistics. It is broadly assumed that terms represent concepts of a subject field. For the classical view of terminology (Wüster, 1968) these terms are formally represented by nouns. Both lexicology and terminology have made significant contributions to the study of terms. Later research as well as other theories on Terminology such as the Communicative Theory of Terminology (Cabré, 1993) have shown that other lexical units can also represent knowledge organization. On these bases, we have focused our research on the relational adjective, which represents a functional unit different from a noun while still connected to the noun by means of its nominal root. This may have a potential terminological interest. Therefore the present research is based on the following hypotheses: 1. The relational adjective conveys specialized information when combined with the noun. 2. The relational adjective has a semantic meaning which helps understand the conceptual relationship between the adjective and the noun being modified and disambiguate certain senses of the resulting lexical combination. 3. 
The relational adjective may impose some restrictions when choosing the nouns it modifies. Considering the above hypotheses, this study has identified and described a multi-word lexical unit pattern [Radj+n] referred to as a Specialized Lexical Combination (SLC), linguistically realized by a relational adjective, Radj, and a noun, n. The analysis of this syntactic pattern is addressed from the framework of the Generative Lexicon (Pustejovsky, 1995). This theory provides several levels of semantic description which support lexical decomposition performed generatively. These levels of semantic representation are connected through generative operations or generative devices which account for the compositional interpretation of any linguistic utterance in a given context. This study analyses these different levels and focuses on one of them, i.e. the qualia structure, since it may encode the conceptual meaning of the syntactic pattern [Radj+n]. The semantic study of these two lexical items has ultimately confirmed the conceptual meaning of the relational adjective. A corpus made of online ICT articles from magazines written in English and Spanish, some of them translations, has been used for the word extraction. For this purpose some word processing software packages have been employed. Moreover, online general language and specialized language dictionaries have been consulted. Search engines, namely WordNet Search 3.1, have also been exploited to find the semantic information of our lexical units. Online reference corpora in English and Spanish have been used for a contrastive analysis of our data. Finally, our conclusions have confirmed our initial hypotheses, i.e. relational adjectives are specialized lexical units which together with the nouns are part of the knowledge representation of the ICT subject field. 
Proposals for new research have been made together with some other suggestions for the design of computer applications to visually show the conceptual meaning of certain lexical units.
Abstract:
This paper surveys the presence of Argentine universities in the national press. We will systematize mentions of them in the digital editions of newspapers to observe their frequency of appearance and the content associated with them. We start from the hypothesis that the hegemonic media only record events at universities in the “interior of the country” when they are linked to curious or violent incidents. We are interested in finding out in which particular cases references to these universities are linked to the academic production and dissemination of scientific knowledge, among other things. We built the corpus from fragments of newspaper texts that allow us to examine grammatical structures in order to reveal the representations that the press naturalizes through those forms. We follow theoretical and methodological considerations from Critical Linguistics concerning the syntactic level: when a distortion appears in the surface structure, there is an ideological manipulation of meaning. We include part of the theory of transformations: in the account of an event, any alteration of the cause-consequence schema constructs an ideology through which we perceive the facts differently. Likewise, to establish semantic correlations at the discursive level, we will analyse terms according to types of meaning and the euphoric/dysphoric value of the lexical base. In addition, drawing on the Theory of Enunciation, we will distinguish objective and subjective words: enunciative facts consist of the linguistic traces of the speaker's presence in the utterance, called subjectivemes. Besides pronouns and spatial and temporal features as enunciative markers, we will select lexical units such as nouns, adjectives, verbs and adverbs. The existence of subjectivemes rests on the fact that every lexical unit implies an interpretation of the world, that is, an ideology in the sense we adopt in our work from Critical Discourse Analysis. As a complement to the interpretive perspectives mentioned, we will first quantify references by means of Content Analysis as methodological support. To that end, the study envisages the use of a technological tool designed ad hoc for counting frequencies and surveying the linguistic contexts of occurrence and lexical associations within the sentence, as a systematization prior to the qualitative work of assessing lexico-semantic and morpho-syntactic structures.
The research reflects the presence of Argentine universities in nationwide media. We will systematize their mention in the online versions of the La Nación and Clarín newspapers to observe their frequency of appearance and the contents they are related to. With the hypothesis that hegemonic media only register events at universities from the “interior of the country” when they are related to curious or violent events, we will investigate in which particular cases the reference to such universities is related to the production and academic diffusion of scientific knowledge. We built up the corpus with fragments from articles which allow us to examine the deep grammatical structures to reveal the representations the press naturalizes through these forms. We follow theoretical-methodological considerations of Critical Linguistics (CL) about the syntactical level: when a distortion in the surface structure appears, there exists an ideological manipulation of meaning. Also, to establish semantic correlations in the discursive field, we will analyze terms as regards types of meaning and the euphoric/dysphoric value of the lexical base. From the Theory of Enunciation, we will distinguish between objective and subjective words (“subjectivemes”): every lexical unit implies an interpretation of the world, that is, an ideology in the sense we assume in our work from Critical Discourse Analysis (CDA). As a complement to CL and CDA, we will carry out a prior quantification using a technological tool designed ad hoc.
Abstract:
The percentage of subjects recalling each unit in a list or prose passage is considered as a dependent measure. When the same units are recalled in different tasks, processing is assumed to be the same; when different units are recalled, processing is assumed to be different. Two collections of memory tasks are presented, one for lists and one for prose. The relations found in these two collections are supported by an extensive reanalysis of the existing prose memory literature. The same set of words was learned by 13 different groups of subjects under 13 different conditions. Included were intentional free-recall tasks, incidental free recall following lexical decision, and incidental free recall following ratings of orthographic distinctiveness and emotionality. Although the nine free-recall tasks varied widely with regard to the amount of recall, the relative probability of recall for the words was very similar among the tasks. Imagery encoding and recognition produced relative probabilities of recall that were different from each other and from the free-recall tasks. Similar results were obtained with a prose passage. A story was learned by 13 different groups of subjects under 13 different conditions. Eight free-recall tasks, which varied with respect to incidental or intentional learning, retention interval, and the age of the subjects, produced similar relative probabilities of recall, whereas recognition and prompted recall produced relative probabilities of recall that were different from each other and from the free-recall tasks. A review of the prose literature was undertaken to test the generality of these results. Analysis of variance is the most common statistical procedure in this literature. If the relative probability of recall of units varied across conditions, a units by condition interaction would be expected. 
For the 12 studies that manipulated retention interval, an average of 21% of the variance was accounted for by the main effect of retention interval, 17% by the main effect of units, and only 2% by the retention interval by units interaction. Similarly, for the 12 studies that varied the age of the subjects, 6% of the variance was accounted for by the main effect of age, 32% by the main effect of units, and only 1% by the interaction of age by units. (ABSTRACT TRUNCATED AT 400 WORDS)
Abstract:
Supervised teaching practice report, Master's in Teaching Spanish as a Foreign Language, Universidade de Lisboa, 2011