996 results for Corpora (Linguistics)
Abstract:
Qualitative and quantitative research approaches are often considered incompatible, and when they are brought together in a study, the analyses usually remain within a single research field. The present study combines the two methods from the perspectives of different disciplines and seeks to determine to what degree a corpus-based analysis can support the traditional content-focused approach to qualitative data and yield additional results.
Abstract:
This paper proposes teaching pragmatics through a corpus-based approach. Corpora have had a profound impact on how linguistics is practised today. Teaching linguistics, however, remains traditional in scope and stands apart from the growing tendency to incorporate authentic samples into the theoretical classroom, so lecturers keep presenting the same canonical examples students find in their textbooks or other introductory monographs. Our view is that using corpus linguistics, especially corpora freely available on the World Wide Web, results in a more engaging and fresh approach to the Pragmatics course, while promoting early research among students. In this way, they learn the concepts but, most importantly, how to identify pragmatic phenomena in real text. We focus here on methodology, presenting clear examples of corpus-based pragmatic activities, and one clear result is that students also learn to analyse data autonomously. Our proposal moves from more controlled tasks towards autonomy, and targets students enrolled in the course Pragmática de la Lengua inglesa, currently part of the curriculum in Lenguas Modernas, Universidad de Las Palmas de Gran Canaria.
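By way of illustration, the kind of corpus-based classroom activity described above can be as simple as a keyword-in-context (KWIC) concordance. The following Python sketch is a minimal, hypothetical example; the sample sentences, window size and function name are illustrative assumptions, not materials from the course itself:

```python
# Minimal keyword-in-context (KWIC) concordancer, sketching the kind of
# corpus-based activity the proposal describes. The sample text, the
# window size and the function name are illustrative assumptions.
import re

def kwic(text, keyword, window=4):
    """Print each occurrence of `keyword` with `window` words of context."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            print(f"{left:>30}  [{tok}]  {right}")

sample = ("Could you possibly pass the salt? "
          "Could you open the window, please?")
kwic(sample, "could")
```

Students can start from supplied queries like this one and move on to formulating their own, matching the progression from controlled tasks to autonomy described in the proposal.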
Abstract:
This doctoral thesis develops a methodology for fully applying the corpus-based approach to simultaneous interpreting research. DIRSI-C is a parallel (Italian-English/English-Italian) aligned electronic corpus containing transcripts of recordings from international medical conferences mediated by professional simultaneous interpreters working both from and into their foreign language. Against this backdrop, directionality is the research parameter used to analyse the interpreters' performance by means of corpus linguistics tools.
Abstract:
This thesis is concerned with the role played by software tools in the analysis and dissemination of linguistic corpora and with their contribution to a more widespread adoption of corpora in different fields. Chapter 1 contains an overview of some of the most relevant corpus analysis tools available today, presenting their most interesting features and some of their drawbacks. Chapter 2 begins by explaining why none of the available tools appears to satisfy the requirements of the user community, and then gives a technical overview of the current status of the new system developed as part of this work. This presentation is followed by highlights of the features that make the system appealing to users and corpus builders (i.e. scholars willing to make their corpora available to the public). The chapter concludes with future directions for the project and information on the current availability of the software. Chapter 3 describes the design of an experiment devised to evaluate the usability of the new system in comparison with another corpus tool. Usage of the tool was tested in the context of a documentation task performed on a real assignment during a translation class in a master's degree course. Chapter 4 presents the findings of the experiment on two levels of analysis: first, a discussion of how participants interacted with and evaluated the two corpus tools in terms of interface and interaction design, usability and perceived ease of use; then an analysis of how users interacted with corpora to complete the task and what kinds of queries they submitted. Finally, some general conclusions are drawn and areas for future work are outlined.
Abstract:
The thesis is divided into four parts. The first, feminist in orientation, offers an overview of femicide as a social phenomenon and of the related international legal situation. The second deals with the quality press in general, the medium chosen for the linguistic analysis. The third part presents a micro-corpus of Italian press coverage of femicide, and the fourth a micro-corpus of French press coverage of the "Affaire DSK", both accompanied by an analysis of their lexical and discursive components (Analyse du discours). It is a comparative work, whose results highlight and demonstrate how the Italian and French quality press tend to implicitly convey a sexist, stereotyped and discriminatory image of femicide and of the victims of violence.
Abstract:
This thesis concerns artificially intelligent natural language processing systems that are capable of learning the properties of lexical items (properties like verbal valency or inflectional class membership) autonomously while fulfilling the tasks for which they were deployed in the first place. Many of these tasks require a deep analysis of the language input, which can be characterized as a mapping of the utterances in a given input C to a set S of linguistically motivated structures, with the help of the linguistic information encoded in a grammar G and a lexicon L:

G + L + C → S (1)

The idea underlying intelligent lexical acquisition systems is to modify this schematic formula so that the system can exploit the information encoded in S to create a new, improved version of the lexicon:

G + L + S → L' (2)

Moreover, the thesis claims that a system can only be considered intelligent if it does not just make maximum use of the learning opportunities in C, but is also able to revise falsely acquired lexical knowledge. One of the central elements of this work is therefore the formulation of a set of criteria for intelligent lexical acquisition systems, subsumed under one paradigm: the Learn-Alpha design rule.

The thesis describes the design and quality of a prototype of such a system, whose acquisition components were developed from scratch and built on top of one of the state-of-the-art Head-driven Phrase Structure Grammar (HPSG) processing systems. The quality of this prototype is investigated in a series of experiments in which the system is fed with extracts of a large English corpus. While the idea of using machine-readable language input to automatically acquire lexical knowledge is not new, we are not aware of any system that fulfils Learn-Alpha and is able to deal with large corpora. To illustrate four major challenges in constructing such a system: a) the high number of possible structural descriptions caused by highly underspecified lexical entries demands a parser with a very effective ambiguity management system; b) the automatic construction of concise lexical entries out of a bulk of observed lexical facts requires a special technique of data alignment; c) the reliability of these entries depends on the system's decision on whether it has seen 'enough' input; and d) general properties of language may render some lexical features indeterminable if the system tries to acquire them with too high a precision.

The cornerstone of this dissertation is the motivation and development of a general theory of automatic lexical acquisition that is applicable to every language and independent of any particular theory of grammar or lexicon. The work is divided into five chapters. The introductory chapter first contrasts three different and mutually incompatible approaches to (artificial) lexical acquisition: cue-based queries, head-lexicalized probabilistic context-free grammars, and learning by unification. The Learn-Alpha design rule is then postulated. The second chapter outlines the theory underlying Learn-Alpha and introduces the notions and concepts required for a proper understanding of artificial lexical acquisition. Chapter 3 develops the prototyped acquisition method, called ANALYZE-LEARN-REDUCE, a framework which implements Learn-Alpha. The fourth chapter presents the design and results of a bootstrapping experiment conducted on this prototype: lexeme detection, learning of verbal valency, categorization into nominal count/mass classes, and selection of prepositions and sentential complements, among others. The thesis concludes with a review of the conclusions and motivation for further improvements, as well as proposals for future research on the automatic induction of lexical features.
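Read operationally, formulas (1) and (2) describe a parse-then-update cycle. The sketch below is a minimal, schematic rendering of that cycle in Python; every name in it (parse, extract_facts, revise) is a hypothetical placeholder, not a component of the thesis's actual HPSG-based prototype:

```python
# Schematic rendering of the acquisition cycle:
#   (1) G + L + C -> S   parse the corpus into structures
#   (2) G + L + S -> L'  exploit the structures to improve the lexicon
# All function arguments are hypothetical placeholders.

def learn_alpha_pass(grammar, lexicon, corpus, parse, extract_facts, revise):
    """One acquisition pass: parse, learn, and revise faulty entries."""
    structures = [parse(grammar, lexicon, utt) for utt in corpus]      # (1)
    observed = extract_facts(structures)  # bulk of observed lexical facts
    new_lexicon = dict(lexicon)
    for lexeme, facts in observed.items():                             # (2)
        # `revise` may also retract falsely acquired knowledge, as the
        # Learn-Alpha rule requires, rather than only adding to an entry.
        new_lexicon[lexeme] = revise(new_lexicon.get(lexeme), facts)
    return new_lexicon
```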
Abstract:
The aim of this dissertation is to create a new controlled language, called Español Técnico Simplificado (ETS). Based on the Simplified Technical English (STE) specification, officially known as ASD-STE100, the controlled Spanish ETS takes the form of a metalinguistic document providing technical writers and translators with specific rules for producing technical documents. The implementation strategy starts from a preliminary study of controlled languages similar to STE, such as Français Rationalisé and Simplified Technical Spanish. Through an approach characteristic of corpus linguistics, the proposed solution derives the new controlled language by extracting specific information from a purpose-built, queryable ad hoc corpus of Spanish. The results point to a (controlled) linguistic method for producing technical documentation free of ambiguity. The ETS system is founded on the concept of intelligibility as a necessary condition for the production of a controlled text, and through its macrostructure the ETS document provides the tools needed to make the controlled text unambiguous. Its bipartite structure divides the prescriptions logically: the first part contains syntactic and stylistic rules; the second contains a dictionary of a limited number of carefully selected lemmas. All of this serves the principle of one-to-one correspondence between signs, in this case of the Spanish language. The project as a whole opens the door to a new alternative to the existing controlled languages, created entirely in academia, which serves as a prototype for further research projects.
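To make the bipartite structure concrete: a controlled-language rule set of this kind is typically enforced by a checker that validates a text against both the style rules and the restricted dictionary. The Python sketch below is a toy illustration under invented rules; the approved word list and the 20-word sentence limit are assumptions, not rules from ETS or ASD-STE100:

```python
# Toy controlled-language checker illustrating the bipartite structure:
# part one, a style rule (sentence length); part two, a restricted
# dictionary. Both the word list and the limit are invented examples.
APPROVED = {"cerrar", "la", "válvula", "antes", "de", "desconectar", "el", "tubo"}
MAX_WORDS = 20  # assumed style rule, for illustration only

def check_sentence(sentence):
    """Return a list of rule violations found in one sentence."""
    words = sentence.lower().rstrip(".").split()
    problems = []
    if len(words) > MAX_WORDS:
        problems.append(f"sentence exceeds {MAX_WORDS} words")
    problems += [f"unapproved word: {w}" for w in words if w not in APPROVED]
    return problems

print(check_sentence("Cerrar la válvula antes de desconectar el tubo."))  # -> []
```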
Abstract:
Contacts between languages have always led to mutual influence. Today, the authoritative position of the English language affects Italian in many ways, especially in scientific and technical fields. When new studies conceived in the English-speaking world reach the Italian public, we are faced not only with the translation of texts but, most importantly, with the rendition of theoretical constructs that do not always have a suitable rendering in the target language. That is why we often find anglicisms in Italian texts. This work aims to show their frequency in a specific field, underlining how and when they are used and sometimes preferred to the corresponding Italian word. The dissertation looks at a sample of essays from the specialised magazine "Lavoro Sociale", published by Edizioni Centro Studi Erickson, searching for borrowings from English and discussing their use in order to form hypotheses about the reasons for this phenomenon, against the wider background of translation studies and research on translation universals. What I am most interested in is understanding the similarities and differences in the use of anglicisms by authors of Italian texts and by translators from English into Italian, so as to identify the main dynamics and tendencies. The paper has four parts. Chapter 1 briefly explains the theoretical background of translation studies and introduces and discusses the notion of translation universals; after that, the research methodology and the theoretical background on linguistic borrowings (especially anglicisms) in Italian are summarized. Chapter 2 presents the study, explaining the organisation of the material, the methodology used and the object of interest. Chapter 3 is the core of the dissertation: it contains the qualitative and quantitative data taken from the texts and an examination of the dynamics of the use of anglicisms. Finally, Chapter 4 compares the conclusions drawn in the previous chapter with the opinions of authors, translators and proof-readers, whom I asked to answer a questionnaire written specifically to investigate the mechanisms and choices behind their use of anglicisms.
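In corpus terms, the quantitative side of such a study reduces to counting occurrences of candidate borrowings. The Python sketch below is a toy version under invented data; the word list and the sample sentence are illustrative assumptions, while the dissertation's actual material comes from the "Lavoro Sociale" essays:

```python
# Toy frequency count of candidate anglicisms in an Italian text.
from collections import Counter
import re

ANGLICISMS = {"welfare", "target", "teamwork", "counseling"}  # invented list

def anglicism_counts(text):
    """Count tokens from the candidate list in a lowercased text."""
    tokens = re.findall(r"[a-zàèéìòù]+", text.lower())
    return Counter(t for t in tokens if t in ANGLICISMS)

sample = "Il welfare locale richiede teamwork e un target chiaro."
print(anglicism_counts(sample))
# Counter({'welfare': 1, 'teamwork': 1, 'target': 1})
```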
Collocations in translation and interpreting between Italian and French: A study of eptic_01_2011
Abstract:
This dissertation investigates differences in phraseological patterns between translated and interpreted language, on the basis of the intermodal corpus EPTIC_01_2011 and focusing on Italian and French. First, an overview is offered of the main studies and theories in corpus linguistics and collocation research: the notion of corpus is defined and a typology (focusing on intermodal corpora) is presented, before moving on to the linguistic phenomenon of collocation and its investigation through corpus linguistics methods. Second, the general structure of EPTIC_01_2011 is presented, including the ways in which its texts have been assembled, edited according to ad hoc conventions and enriched with metadata. The methodology proposed by Durrant and Schmitt (2009), slightly adapted to fit the present study, has been used to extract and compare noun+adjective/adjective+noun bigrams from a quantitative point of view. A subset of these data has then been extracted and analysed manually. The results of the study are presented through graphs and examples, with an in-depth discussion of the bigrams considered. Lastly, the data collected are analysed and categorised in terms of the shifts occurring in translation and in interpreting, potential causes are discussed, and ideas for further research and for the development of the EPTIC corpus are sketched.
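For concreteness, the core of a Durrant and Schmitt-style extraction is a pass over a POS-tagged corpus collecting adjacent noun+adjective pairs. The Python sketch below shows the idea on an invented tagged sample; the tag set and example words are assumptions, while the real study works on EPTIC_01_2011:

```python
# Minimal extraction of noun+adjective bigrams from a POS-tagged text,
# in the spirit of the Durrant and Schmitt (2009) procedure mentioned
# above. The tagged sample and tag set are invented for illustration.
from collections import Counter

tagged = [("decisione", "NOUN"), ("importante", "ADJ"),
          ("una", "DET"), ("crisi", "NOUN"), ("economica", "ADJ")]

def noun_adj_bigrams(tokens):
    """Collect (noun, adjective) pairs from adjacent tagged tokens."""
    pairs = [(w1, w2) for (w1, t1), (w2, t2) in zip(tokens, tokens[1:])
             if t1 == "NOUN" and t2 == "ADJ"]
    return Counter(pairs)

print(noun_adj_bigrams(tagged))
# Counter({('decisione', 'importante'): 1, ('crisi', 'economica'): 1})
```

Counts obtained this way from the translated and the interpreted subcorpora can then be compared quantitatively, which is the comparison the study carries out before the manual analysis of a data subset.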