819 results for financial terminology, linguistic variation, corpus-based analysis
Abstract:
The starting point of this terminological research project was a traineeship at the Directorate-General for Translation (DGT) of the European Commission in Luxembourg. The traineeship project, namely the updating and revision of IATE entries in the financial domain, together with the problems encountered while compiling those entries, led to the definition of this thesis. The study sets out to analyse the reception of the terminology specific to the Basel III regulatory framework, examining the phenomenon of linguistic variation in Italian and German corpora. The first chapter briefly describes the traineeship at the DGT, presents the IATE termbase and the terminology work carried out, and illustrates the considerations that led to the development of the thesis project. The second chapter explores the domain under investigation, outlining the financial crisis that led to the drafting of the new Basel III rules, and presents the key points of the Basel III Accords. The third chapter offers an overview of the characteristics of economic and financial language and of the linguistic consequences of the new regulation, highlighting the peculiarities of the terminology analysed. The fourth chapter describes the methodology followed and the resources used for the thesis project, namely ad hoc Italian and German corpora for the analysis of the terms, together with the related terminological entries. The fifth chapter focuses on the phenomenon of linguistic variation, providing a theoretical overview of the different approaches to terminology, followed by the analysis of the corpora and a discussion of the results obtained; the theoretical reflections are then reconsidered in the light of what emerged from the examination of the corpora. Finally, the appendix contains the IATE entries compiled during the traineeship and the terminological entries drafted on the basis of the analysis presented in this work.
Abstract:
The financial and economic crisis that broke out in Europe in 2008 has been a genuine testing ground for all kinds of expressive resources deployed to explain the complex nature of the economic situation. Far from being a mere rhetorical ornament, metaphor thus acquires a symbolic meaning that allows concepts to be expressed naturally and, at times, as if taken for granted. The aim of this study is to analyse the use of metaphor in the Spanish and Italian economic press in the discourse on the crisis in Europe, through a corpus-based analysis. It also seeks to uncover similarities and differences while highlighting original features in both languages. The general hypothesis, in line with previous research on economic metaphors, is that metaphors can be expected to be used frequently. The thesis is divided into four chapters. The first chapter reviews the main theories of metaphor, from the classical approach to the cognitivist view developed from the Conceptual Metaphor Theory proposed by Lakoff and Johnson (1980). The second chapter focuses on economic metaphor and on corpus-based metaphor studies, in order to understand the different source domains used in economic and financial discourse. The third chapter provides the information on how the corpora were built and analysed, based on a careful collection of articles from the Italian and Spanish press, drawn from the newspapers Il Sole 24 Ore and Expansión. The articles were analysed with the software AntConc. Finally, the fourth chapter presents the analysis of the metaphors detected in the two newspaper corpora and the results obtained as regards similarities and differences between the two languages. To this end, four source domains were selected: two more frequent ones (medicine; nature) and two less frequent ones (construction; mechanisms). In addition, particular attention is paid to examples of highly conventional metaphors as well as to novel or creative cases in both languages, with a view to identifying the main source domains and the contexts of use in the coverage of the crisis. The results of the analysis show a deep entrenchment of the source domains of medicine and nature in both languages, together with a corresponding metaphorical creativity. Overall, the Italian corpus shows greater metaphorical creativity and more metaphorical chains than the Spanish corpus, judging by the hapaxes detected. The Spanish corpus, in turn, shows more creative cases in the construction domain than the Italian one. The analysis empirically confirms the hypothesis of broad uniformity in the use of metaphor across the different newspapers, even allowing for some differences between the two languages. It must be acknowledged that, given the limited size of the corpora, this study is an extensible testing ground, and the results presented open the door to new research opportunities in economic and financial discourse and even in the teaching of economics.
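For readers unfamiliar with concordancers, the following minimal sketch illustrates the kind of keyword-in-context (KWIC) search that AntConc performs. The directory layout and the Spanish medicine-domain keywords are hypothetical stand-ins for illustration, not the query lists actually used in the study.

```python
import re
from pathlib import Path

# Hypothetical source-domain keywords (medicine); illustrative only.
KEYWORDS = ["contagio", "síntoma", "terapia", "recaída"]

def kwic(text, keyword, width=40):
    """Yield keyword-in-context lines, concordancer-style."""
    for m in re.finditer(rf"\b{re.escape(keyword)}\w*", text, re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        yield f"{left:>{width}} [{m.group(0)}] {right}"

# Assumed layout: one plain-text article per file in corpus_es/.
corpus = " ".join(p.read_text(encoding="utf-8")
                  for p in Path("corpus_es").glob("*.txt"))
for kw in KEYWORDS:
    for line in kwic(corpus, kw):
        print(line)
```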
Abstract:
This paper evaluates the efficiency of a number of popular corpus-based distributional models in performing literature-based discovery on very large document sets, including online collections. Literature-based discovery is the process of identifying previously unknown connections from text, often published literature, that could lead to the development of new techniques or technologies. It has attracted growing research interest ever since Swanson's serendipitous discovery of the therapeutic effects of fish oil on Raynaud's disease in 1986. The successful application of distributional models in automating the identification of the indirect associations underpinning literature-based discovery has been amply demonstrated in the medical domain. However, we wish to investigate the computational complexity of distributional models for literature-based discovery on much larger document collections, as they may provide computationally tractable solutions to tasks such as predicting future disruptive innovations. In this paper we perform a computational complexity analysis of four successful corpus-based distributional models to evaluate their fitness for such tasks. Our results indicate that corpus-based distributional models that store their representations in fixed dimensions provide superior efficiency on literature-based discovery tasks.
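As background, literature-based discovery in Swanson's sense rests on indirect A-B-C associations: a start term A shares intermediate terms B with a target term C it never co-occurs with directly. The minimal sketch below shows the idea on a toy document set; it illustrates the task itself and is not one of the four distributional models the paper analyses.

```python
from collections import defaultdict
from itertools import combinations

# Toy document set; real systems process millions of abstracts.
docs = [
    {"fish oil", "blood viscosity"},
    {"blood viscosity", "raynaud disease"},
    {"fish oil", "platelet aggregation"},
    {"platelet aggregation", "raynaud disease"},
]

# Symmetric term co-occurrence map built from the documents.
cooc = defaultdict(set)
for terms in docs:
    for a, b in combinations(terms, 2):
        cooc[a].add(b)
        cooc[b].add(a)

def open_discovery(a_term):
    """A-B-C model: terms linked to A only indirectly, via some B."""
    b_terms = cooc[a_term]
    c_terms = set()
    for b in b_terms:
        c_terms |= cooc[b]
    return c_terms - b_terms - {a_term}

print(open_discovery("fish oil"))  # -> {'raynaud disease'}
```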
Abstract:
In this dissertation, I present an overall methodological framework for studying linguistic alternations, focusing specifically on lexical variation in denoting a single meaning, that is, synonymy. As a practical example, I employ the synonymous set of the four most common Finnish verbs denoting THINK, namely ajatella, miettiä, pohtia and harkita ‘think, reflect, ponder, consider’. As a continuation of previous work, I describe in considerable detail the extension of statistical methods from dichotomous linguistic settings (e.g., Gries 2003; Bresnan et al. 2007) to polytomous ones, that is, those concerning more than two possible alternative outcomes. The applied statistical methods are arranged into a succession of stages of increasing complexity, proceeding from univariate via bivariate to multivariate techniques. As the central multivariate method, I argue for the use of polytomous logistic regression and demonstrate its practical application to the studied phenomenon, thus extending the work of Bresnan et al. (2007), who applied simple (binary) logistic regression to a dichotomous structural alternation in English. The results of the various statistical analyses confirm that a wide range of contextual features across different categories are indeed associated with the use and selection of the selected THINK lexemes; however, a substantial part of these features are not exemplified in current Finnish lexicographical descriptions. The multivariate analysis results indicate that the semantic classifications of syntactic argument types are on average the most distinctive feature category, followed by overall semantic characterizations of the verb chains, and then syntactic argument types alone, with morphological features pertaining to the verb chain and extra-linguistic features relegated to the last position. In terms of the overall performance of the multivariate analysis and modeling, the prediction accuracy seems to reach a ceiling at a recall rate of roughly two-thirds of the sentences in the research corpus. The analysis of these results suggests a limit to what can be explained and determined within the immediate sentential context by applying the conventional descriptive and analytical apparatus based on currently available linguistic theories and models. The results also support Bresnan’s (2007) and others’ (e.g., Bod et al. 2003) probabilistic view of the relationship between linguistic usage and the underlying linguistic system, in which only a minority of linguistic choices are categorical, given the known context (represented as a feature cluster) that can be analytically grasped and identified. Instead, most contexts exhibit degrees of variation as to their outcomes, resulting in proportionate choices over longer stretches of usage in texts or speech.
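For readers unfamiliar with polytomous logistic regression, here is a minimal sketch of the modelling setup in scikit-learn. The binary context features and their values are invented for illustration; the dissertation's actual feature set is far richer.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented binary context features per sentence, e.g. columns for
# "agent is human", "clause is interrogative", "verb chain is negated".
X = np.array([
    [1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1],
    [1, 0, 0], [0, 1, 0], [1, 1, 1], [0, 0, 0],
])
# Observed lexeme choice for each sentence (the polytomous outcome).
y = np.array(["ajatella", "miettiä", "pohtia", "harkita",
              "ajatella", "pohtia", "miettiä", "harkita"])

# With the default lbfgs solver, scikit-learn fits a multinomial
# (i.e., polytomous) model when y has more than two classes.
model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Estimated probability of each of the four verbs for a new context.
print(dict(zip(model.classes_, model.predict_proba([[1, 0, 1]])[0])))
```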
Abstract:
The present study provides a usage-based account of how three grammatical structures (declarative content clauses, interrogative content clauses, and as-predicative constructions) are used in academic research articles. These structures may be used in both knowledge claims and citations, and they often express evaluative meanings. Using the methodology of quantitative corpus linguistics, I investigate how the culture of the academic discipline influences the way in which these constructions are used in research articles. The study compares the rates of occurrence of these grammatical structures and investigates their co-occurrence patterns in articles representing four different disciplines (medicine, physics, law, and literary criticism). The analysis is based on a purpose-built 2-million-word corpus, which has been part-of-speech tagged. The analysis demonstrates that the use of these grammatical structures varies between disciplines, and further shows that the differences observed in the corpus data are linked with differences in the nature of knowledge and the patterns of enquiry. The constructions in focus tend to be more frequently used in the soft disciplines, law and literary criticism, where their co-occurrence patterns are also more varied. This reflects both the greater variety of topics discussed in these disciplines and the higher frequency of references to statements made by other researchers. Knowledge-building in the soft fields normally requires a careful contextualisation of the arguments, giving rise to statements reporting earlier research that employ the constructions in focus. In contrast, knowledge-building in the hard fields is typically a cumulative process, based on agreed-upon methods of analysis. This characteristic is reflected in the structure and contents of research reports, which offer fewer opportunities for using these constructions.
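As an illustration of the rate-of-occurrence comparisons such a study relies on, the sketch below normalises raw counts to frequencies per million words across the four disciplinary subcorpora. All counts and subcorpus sizes are invented, not taken from the study.

```python
# Invented construction counts and subcorpus sizes (in words).
counts = {"medicine": 412, "physics": 305,
          "law": 978, "literary criticism": 1104}
sizes = {"medicine": 500_000, "physics": 500_000,
         "law": 500_000, "literary criticism": 500_000}

# Normalise to the per-million-words rate standard in corpus linguistics.
for discipline, n in counts.items():
    per_million = n / sizes[discipline] * 1_000_000
    print(f"{discipline:20s} {per_million:8.1f} per million words")
```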
Abstract:
This Master's thesis (pro gradu) deals with the so-called core set of English modal auxiliaries: will, would, can, could, shall, should, may, might and must. Semantically, these auxiliaries are particularly complex: their interpretation often involves considerable nuance, even though traditional grammars suggest that each has two or three clearly distinct meanings. They therefore pose particular challenges in a foreign-language learning environment. Recent developments in corpus-linguistic methods have produced increasingly precise descriptions of how modal auxiliaries are used in present-day English and of the direction in which they have developed even over short periods. The aim of this study has been to compare the results of this new research with the reality that upper secondary school English teaching materials in Finland offer the student. My premise was that the authenticity and communicativeness required of language teaching by the curriculum should be reflected in an even-handed treatment of the modal auxiliaries. My initial hypothesis, however, was that there are discrepancies between how modality manifests itself in authentic settings and how it is presented in the textbooks. My approach in this study was corpus-driven. From two upper secondary school textbook series I selected the books in which the modal auxiliaries are explicitly discussed. I scanned every (complete) text in the four books and built a small corpus from this material. Using a program designed for corpus analysis, I retrieved from this corpus all the sentences containing modal auxiliaries. I then analysed each modal auxiliary semantically in its sentential context. On the basis of this analysis I was able to construct tables and compare the results with those of the most recent research. This study shows that discrepancies do exist. In general terms, the relative frequencies of the modal auxiliaries pointed in the right direction: no auxiliary was used significantly more or less than recent research would suggest. By contrast, the semantic distribution of the auxiliaries showed at times considerable differences between the meanings emphasised in the textbooks and those that appear to be more frequent in present-day English. Can and must stood out in particular, in that the picture of their use offered by the textbooks is the opposite of what one would expect: the use of can was clearly weighted towards 'ability' rather than 'possibility', which in the light of current research is its main use. Must, on the other hand, overwhelmingly denoted 'obligation' in the material, whereas nowadays it denotes 'inference' as often as 'obligation'. In addition, permission was requested strikingly rarely in the material. On the basis of these results I suggest that textbook writers should, at a general level, abandon the fossilised notions of traditional grammar books and dare to expose students to the full range of meanings of the modal auxiliaries.
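To give a concrete sense of how a textbook-corpus sense distribution can be compared against reference-corpus findings, here is a minimal sketch using a chi-square test of independence on invented counts for the senses of can. The thesis itself reports comparative frequency tables; this particular test is only one possible way to formalise the comparison.

```python
from scipy.stats import chi2_contingency

# Invented counts of the senses of 'can' (ability, possibility,
# permission) in a textbook corpus versus a reference corpus.
textbook = [60, 25, 5]
reference = [30, 55, 15]

chi2, p, dof, expected = chi2_contingency([textbook, reference])
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # small p: distributions differ
```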
Abstract:
Temporal dynamics and speaker characteristics are two important features of speech that distinguish speech from noise. In this paper, we propose a method to maximally extract these two features of speech for speech enhancement. We demonstrate that this can reduce the requirement for prior information about the noise, which can be difficult to estimate for fast-varying noise. Given noisy speech, the new approach estimates clean speech by recognizing long segments of the clean speech as whole units. In the recognition, clean speech sentences, taken from a speech corpus, are used as examples. Matching segments are identified between the noisy sentence and the corpus sentences. The estimate is formed by using the longest matching segments found in the corpus sentences. Longer speech segments as whole units contain more distinct dynamics and richer speaker characteristics, and can be identified more accurately from noise than shorter speech segments. Therefore, estimation based on the longest recognized segments increases the noise immunity and hence the estimation accuracy. The new approach consists of a statistical model to represent up to sentence-long temporal dynamics in the corpus speech, and an algorithm to identify the longest matching segments between the noisy sentence and the corpus sentences. The algorithm is made more robust to noise uncertainty by introducing missing-feature based noise compensation into the corpus sentences. Experiments have been conducted on the TIMIT database for speech enhancement from various types of nonstationary noise including song, music, and crosstalk speech. The new approach has shown improved performance over conventional enhancement algorithms in both objective and subjective evaluations.
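A heavily simplified sketch of the core matching step follows: it finds the longest run of consecutive frame pairs between a noisy utterance and a clean corpus sentence. The real method models sentence-long temporal dynamics statistically and uses missing-feature noise compensation rather than the fixed Euclidean threshold assumed here.

```python
import numpy as np

def longest_matching_segment(noisy, clean, thresh=1.0):
    """Longest run of consecutive frame pairs whose Euclidean distance
    stays below `thresh` (a toy stand-in for the paper's matching)."""
    n, m = len(noisy), len(clean)
    dp = np.zeros((n + 1, m + 1), dtype=int)  # run length ending at (i, j)
    best = (0, 0, 0)                          # (length, end_i, end_j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if np.linalg.norm(noisy[i - 1] - clean[j - 1]) < thresh:
                dp[i, j] = dp[i - 1, j - 1] + 1
                if dp[i, j] > best[0]:
                    best = (dp[i, j], i, j)
    length, _, j = best
    return clean[j - length:j]  # clean frames covering the matched span

# Toy feature frames (e.g., 13-dimensional spectral vectors).
rng = np.random.default_rng(0)
clean = rng.normal(size=(50, 13))
noisy = clean[10:30] + rng.normal(scale=0.2, size=(20, 13))
print(longest_matching_segment(noisy, clean).shape)  # ~ (20, 13)
```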
Abstract:
Computational models of meaning trained on naturally occurring text successfully model human performance on tasks involving simple similarity measures, but they characterize meaning in terms of undifferentiated bags of words or topical dimensions. This has led some to question their psychological plausibility (Murphy, 2002; Schunn, 1999). We present here a fully automatic method for extracting a structured and comprehensive set of concept descriptions directly from an English part-of-speech-tagged corpus. Concepts are characterized by weighted properties, enriched with concept-property types that approximate classical relations such as hypernymy and function. Our model outperforms comparable algorithms in cognitive tasks pertaining not only to concept-internal structures (discovering properties of concepts, grouping properties by property type) but also to inter-concept relations (clustering into superordinates), suggesting the empirical validity of the property-based approach.
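As a toy illustration of extracting weighted concept-property pairs from a part-of-speech-tagged corpus, the sketch below counts adjacent adjective-noun pairs and weights them with pointwise mutual information. The actual model extracts a richer set of typed concept-property relations and uses its own weighting scheme.

```python
import math
from collections import Counter

# Toy POS-tagged stream of (word, tag); real input is a tagged corpus.
tagged = [("green", "JJ"), ("apple", "NN"), ("sweet", "JJ"), ("apple", "NN"),
          ("green", "JJ"), ("grass", "NN"), ("sweet", "JJ"), ("sugar", "NN")]

pairs, nouns, adjs = Counter(), Counter(), Counter()
for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
    if t1 == "JJ" and t2 == "NN":  # adjacent adjective-noun pair
        pairs[(w2, w1)] += 1       # concept -> candidate property
        nouns[w2] += 1
        adjs[w1] += 1

total = sum(pairs.values())
for (concept, prop), n in pairs.items():
    # Pointwise mutual information as a simple association weight.
    pmi = math.log2((n / total) / ((nouns[concept] / total) * (adjs[prop] / total)))
    print(f"{concept} --has_property--> {prop}  (PMI = {pmi:.2f})")
```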
Abstract:
This paper presents a new approach to single-channel speech enhancement that addresses both additive noise and channel distortion (i.e., convolutional noise). The approach is based on finding the longest matching segments (LMS) from a corpus of clean, wideband speech. It adds three novel developments to our previous LMS research. First, we address channel distortion as well as additive noise. Second, we present an improved method for modeling noise. Third, we present an iterative algorithm for improved speech estimates. In speech recognition experiments on the Aurora 4 database, using our enhancement approach as a preprocessor for feature extraction significantly improved the performance of a baseline recognition system. In another comparison against conventional enhancement algorithms, both the PESQ and segmental SNR ratings of the LMS algorithm were superior to those of the other methods for noisy speech enhancement. Index Terms: corpus-based speech model, longest matching segment, speech enhancement, speech recognition
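Of the two measures reported, segmental SNR is simple to compute directly; the sketch below shows the standard frame-averaged, clipped formulation (PESQ, by contrast, is a standardised perceptual model and is not reproduced here). The frame length and clipping limits are conventional choices, not values taken from the paper.

```python
import numpy as np

def segmental_snr(clean, estimate, frame=256, lo=-10.0, hi=35.0):
    """Frame-level SNR in dB, clipped to [lo, hi] and averaged."""
    snrs = []
    for i in range(0, len(clean) - frame + 1, frame):
        s = clean[i:i + frame]
        e = s - estimate[i:i + frame]                  # residual error
        ratio = np.sum(s ** 2) / (np.sum(e ** 2) + 1e-12)
        snrs.append(np.clip(10 * np.log10(ratio + 1e-12), lo, hi))
    return float(np.mean(snrs))

# Toy signals: clean speech stand-in plus mild additive noise.
rng = np.random.default_rng(1)
clean = rng.normal(size=4096)
noisy = clean + rng.normal(scale=0.1, size=4096)
print(segmental_snr(clean, noisy))  # around 20 dB for this toy case
```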
Abstract:
Master's dissertation, Natural Language Processing & Human Language Technology, Faculdade de Ciências Humanas e Sociais, Univ. do Algarve, 2011