867 results for corpus, collocations, corpus linguistics, EPTIC
Abstract:
Little is known about Ancient Arabia before the arrival of Islam, as it was an area with few inhabited settlements and served mostly as a passageway for traders. Some settled Arabs lived in those settlements, but the prevailing lifestyle was that of the rest of the population: nomadic Bedouin Arabs who travelled from place to place looking for water and pasture for the cattle they lived off. The desert was their natural habitat, a hostile environment full of danger where life was not easy. Camel taming made that nomadic lifestyle possible, and the Bedouins became inseparable from their camels, horses and cattle. To make a living they worked as hunters, transported caravans, and also plundered. In the pre-Islamic era, knowledge was transmitted orally, so very little written information about that time and place remains. Proverbs, however, have been handed down, and after the 8th century several writers began to collect them in written works. Because those proverbs have been preserved almost intact from their origins, we can learn much from them about the lifestyle in Ancient Arabia. This thesis investigates whether Paremiology makes it possible to learn more about this area in the period that precedes the arrival of Islam and during the first years of this religion. To learn about history we usually rely on historians and palaeontologists, but this work will demonstrate that Paremiology gives access to other aspects of culture: knowledge, way of life, thinking, society, and so on.
Abstract:
A distinct metonymic pattern was discovered in the course of conducting a corpus-based study of figurative uses of WORD. The pattern involved examples such as ‘Not one word of it made any sense’ and ‘I agree with every word’. It was labelled ‘hyperbolic synecdoche’, defined as a case in which a lexeme which typically refers to part of an entity (a) is used to stand for the whole entity and (b) is described with reference to the end point on a scale. Specifically, the speaker/writer selects the perspective of a lower-level unit (such as word for ‘utterance’), which is quantified as NOTHING or ALL, thus forming a subset of ‘extreme case formulations’. Hyperbolic synecdoche was found to exhibit a restricted range of lexicogrammatical patterns involving word, with the negated NOTHING patterns being considerably more common than the ALL patterns. The phenomenon was shown to be common in metonymic uses in general, constituting one-fifth of all cases of metonymy in word. The examples of hyperbolic synecdoche were found not to be covered by the oft-quoted ‘abbreviation’ rationale for metonymy; instead, they represent a more roundabout way of expression. It is shown that other cases of hyperbolic synecdoche exist outside of word and the domain of communication (such as ‘time’ and ‘money’).
Abstract:
In this paper we analyse a 600,000-word corpus comprising policy statements produced within supranational, national, state and local legislatures about the nature and causes of (un)employment. We identify significant rhetorical and discursive features deployed by third sector (un)employment policy authors that function to extend their legislative grasp to encompass the most intimate aspects of human association.
Abstract:
In this article I outline and demonstrate a synthesis of the methods developed by Lemke (1998) and Martin (2000) for analyzing evaluations in English. I demonstrate the synthesis using examples from a 1.3-million-word technology policy corpus drawn from institutions at the local, state, national, and supranational levels. Lemke's (1998) critical model is organized around the broad 'evaluative dimensions' that are deployed to evaluate propositions and proposals in English. Martin's (2000) model is organized with a more overtly systemic-functional orientation around the concept of 'encoded feeling'. In applying both these models at different times, whilst recognizing their individual usefulness and complementarity, I found specific limitations that led me to work towards a synthesis of the two approaches. I also argue for the need to consider genre, media, and institutional aspects more explicitly when claiming intertextual and heteroglossic relations as the basis for inferred evaluations. A basic assertion made in this article is that the perceived Desirability of a process, person, circumstance, or thing is identical to its 'value'. But the Desirability of anything is a socially and thus historically conditioned attribution that requires significant amounts of institutional inculcation of other 'types' of value: appropriateness, importance, beauty, power, and so on. I therefore propose a method informed by critical discourse analysis (CDA) that sees evaluation as happening on at least four interdependent levels of abstraction.
Abstract:
In this paper we argue that the term “capitalism” is no longer useful for understanding the current system of political economic relations in which we live. Rather, we argue that the system can be more usefully characterised as neofeudal corporatism. Using examples drawn from a 300,000-word corpus of public utterances by three political leaders from the “coalition of the willing” (George W. Bush, Tony Blair, and John Howard), we show some defining characteristics of this relatively new system and how they are manifest in political language about the invasion of Iraq.
Abstract:
In computational linguistics, information retrieval and applied cognition, words and concepts are often represented as vectors in high dimensional spaces computed from a corpus of text. These high dimensional spaces are often referred to as Semantic Spaces. We describe a novel and efficient approach to computing these semantic spaces via the use of complex valued vector representations. We report on the practical implementation of the proposed method and some associated experiments. We also briefly discuss how the proposed system relates to previous theoretical work in Information Retrieval and Quantum Mechanics and how the notions of probability, logic and geometry are integrated within a single Hilbert space representation. In this sense the proposed system has more general application and gives rise to a variety of opportunities for future research.
Abstract:
Models of word meaning, built from a corpus of text, have demonstrated success in emulating human performance on a number of cognitive tasks. Many of these models use geometric representations of words to store semantic associations between words. Often word order information is not captured in these models. The lack of structural information used by these models has been raised as a weakness when performing cognitive tasks. This paper presents an efficient tensor based approach to modelling word meaning that builds on recent attempts to encode word order information, while providing flexible methods for extracting task specific semantic information.
Abstract:
This paper develops and evaluates an enhanced corpus based approach for semantic processing. Corpus based models that build representations of words directly from text do not require pre-existing linguistic knowledge, and have demonstrated psychologically relevant performance on a number of cognitive tasks. However, they have been criticised in the past for not incorporating sufficient structural information. Using ideas underpinning recent attempts to overcome this weakness, we develop an enhanced tensor encoding model to build representations of word meaning for semantic processing. Our enhanced model demonstrates superior performance when compared to a robust baseline model on a number of semantic processing tasks.
Abstract:
This paper outlines a novel approach for modelling semantic relationships within medical documents. Medical terminologies contain a rich source of semantic information critical to a number of techniques in medical informatics, including medical information retrieval. Recent research suggests that corpus-driven approaches are effective at automatically capturing semantic similarities between medical concepts, thus making them an attractive option for accessing semantic information. Most previous corpus-driven methods only considered syntagmatic associations. In this paper, we adapt a recent approach that explicitly models both syntagmatic and paradigmatic associations. We show that the implicit similarity between certain medical concepts can only be modelled using paradigmatic associations. In addition, the inclusion of both types of associations overcomes the sensitivity to the training corpus experienced by previous approaches, making our method both more effective and more robust. This finding may have implications for researchers in the area of medical information retrieval.
Abstract:
A user’s query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques ignore information about the dependencies that exist between words in natural language. However, more recent approaches have demonstrated that, by explicitly modeling associations between terms, significant improvements in retrieval effectiveness can be achieved over approaches that ignore these dependencies. State-of-the-art dependency-based approaches have been shown to primarily model syntagmatic associations. Syntagmatic associations infer a likelihood that two terms co-occur more often than by chance. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process will improve retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval, it demonstrates significant improvements in retrieval effectiveness over a strong baseline system based on a commercial search engine.
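As an illustration of what "co-occur more often than by chance" can mean in practice, the sketch below scores word pairs with pointwise mutual information (PMI) over document-level co-occurrence. PMI is a standard association measure chosen here for illustration; it is not the model developed in the article, and the function name `pmi_table` and the toy documents are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """Return PMI (log base 2) for every word pair that co-occurs in a document.

    Probabilities are estimated from document frequencies: a pair counts
    once per document in which both words appear.
    """
    n = len(documents)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words of a pair
    for doc in documents:
        words = sorted(set(doc.lower().split()))
        word_df.update(words)
        pair_df.update(combinations(words, 2))  # keys are (a, b) with a < b
    table = {}
    for (a, b), count in pair_df.items():
        p_ab = count / n
        p_a = word_df[a] / n
        p_b = word_df[b] / n
        # PMI > 0: the pair co-occurs more often than independence predicts.
        table[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return table


# Toy documents, invented for illustration only.
docs = ["query expansion", "query expansion", "search engine", "search query"]
scores = pmi_table(docs)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

A positive score marks a pair as a candidate syntagmatic association; a paradigmatic association (words that occur in similar contexts without necessarily co-occurring) would need a different measure, such as comparing the words' context vectors.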
Abstract:
This paper presents our system to address the CogALex-IV 2014 shared task of identifying a single word most semantically related to a group of 5 words (queries). Our system uses an implementation of a neural language model and identifies the answer word by finding the word representation most semantically similar to the sum of the query representations. It is a fully unsupervised system trained on around 20% of the UkWaC corpus. It correctly identifies 85 exact targets out of 2,000 queries, and 285 approximate targets in lists of 5 suggestions.
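The retrieval step described in this abstract (sum the query representations, then pick the most similar word representation) can be sketched as follows. This is a minimal illustration and not the authors' system: the three-dimensional vectors are invented, cosine similarity is assumed as the similarity measure, and `most_related` is a hypothetical helper name.

```python
import math

# Toy word vectors; the values are invented for illustration only.
VECTORS = {
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.0],
    "puppy": [0.85, 0.15, 0.05],
    "pet": [0.9, 0.2, 0.05],
    "car": [0.0, 0.1, 0.9],
    "truck": [0.05, 0.0, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_related(queries, vectors, top_n=5):
    """Rank non-query words by similarity to the sum of the query vectors."""
    dims = len(next(iter(vectors.values())))
    total = [0.0] * dims
    for word in queries:
        for i, value in enumerate(vectors[word]):
            total[i] += value
    ranked = sorted(
        ((cosine(total, vec), word)
         for word, vec in vectors.items() if word not in queries),
        reverse=True,
    )
    return [word for _, word in ranked[:top_n]]


print(most_related(["dog", "cat", "puppy"], VECTORS, top_n=3))
```

The top-ranked word plays the role of the exact answer, and the first five ranked words correspond to a list of 5 suggestions. The shared task uses five query words; three are used here only to keep the toy vocabulary small.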
Abstract:
This article begins with the premise that morality is an intrinsic, although often invisible, aspect of everyday social action. Drawing on a corpus of fifty audio-recorded telephone calls to Kids Helpline, an Australian helpline for children and young people, we examine one call to show how the young caller and counsellor co-construct ‘morality-in-action’. Ethnomethodological understandings and, in particular, Sacks’ (1992) description of ‘Class 2’ rules and infractions show how an adolescent caller and counsellor collaboratively assemble moral versions of the caller. In puzzling out possible motives, the caller and counsellor can be seen to be attending to the implications of different moral versions of the caller. This attribution of motives is moral work in action: motives are contingently assembled, displayed and evaluated, and such work can be understood as a display of moral reasoning. The counselling call makes visible the counsellor’s interactional work to support and empower the client. Analysis such as this offers counsellors ways of understanding and making visible their interactional and moral work within helpline call interactions.
Abstract:
Valency Realization in Short Excerpts of News Text: A Pragmatics-Founded Analysis
This dissertation is a study of so-called pragmatic valency. The aim of the study is to examine the phenomenon both theoretically, by discussing the research literature, and empirically, based on evidence from a text corpus consisting of 218 short excerpts of news text from the German newspaper Frankfurter Allgemeine Zeitung. In the theoretical part of the study, the central concepts of valency and pragmatic valency are discussed. In the research literature, valency denotes the relation between the verb and its obligatory and optional complements. Pragmatic valency can be defined as modification of the so-called system valency in the parole, including non-realization of an obligatory complement, non-realization of an optional complement and realization of an optional complement. Furthermore, the investigation of pragmatic valency includes the role of adjuncts, elements that are not defined by the valency, in the concrete valency realization. The corpus study investigates the valency behaviour of German verbs in a corpus of about 1500 sentences, combining the methodology and concepts of valency theory, semantics and text linguistics. The analysis focuses on the approximately 600 sentences that show deviations from the system valency, providing over 800 examples of modification of the system valency as codified in the (valency) dictionaries. The study attempts to answer the following primary question: Why is the system valency modified in the parole? To answer the question, the concept of modification types is introduced. The modification types are recognized using distinctive feature bundles in which each feature, with a negative or a positive value, refers to one reason for the modification treated in the research literature.
For example, the features of irrelevance and relevance, focus, world and text type knowledge, text theme, theme-rheme structure and cohesive chains are applied. The valency approach appears in a new light when explored through corpus-based investigation; both the optionality of complements and the distinction between complements and adjuncts as defined in the present valency approach seem in some respects defective. Furthermore, the analysis indicates that the adjuncts outside the valency domain play a central role in the concrete realization of the valency. Finally, the study suggests a definition of pragmatic valency, based on the modification types introduced in the study and tested in the corpus analysis.
Abstract:
This dissertation discusses the relation between lexis, grammar and textual organisation. The major premise adopted here is that grammatical structures are motivated both by the semantic potential of words and by text-pragmatic demands. In other words, it is argued that grammatical structures form the interface between lexis and textual organisation, and that linguistic analysis should not concentrate on analysing grammatical structures in isolation, independent of context. From this point of view, grammatical structures are said to be 'well-formed' only in relation to the context they occur in. This study is based on a corpus of three million words of recent Finnish fiction, from which all the occurrences of the coordinated verb pairs ([V ja V] pairs) containing one of the intransitive motion verbs 'lähteä' (to go), 'mennä' (to go), 'päästä' (to get into), 'nousta' (to get up), and 'laskea' (to go down) were extracted. This set of verbs was established using methods described in earlier work by Lagus & Airola (2001 and 2005). The quantitative analysis of the [V ja V] pairs was used to carry out a qualitative analysis of individual texts. In analysing the texts, an analogy was made between musical and textual structure. The results show, among other things, that individual verbs specialise in different functions when occurring in coordinated verb pairs. One finding was that verb pairs including the verb 'nousta' tend to function as markers of textual boundaries and thus reflect the organisation of narrative substance. The verb 'mennä' has weakened literal meanings but strengthened modal meanings when occurring in [V ja V] pairs, and, in many cases, the verb 'lähteä' in [V ja V] pairs functions as an aspectual marker rather than a pure verb of motion.
That there is a gradient from the concrete sense of motion into more differentiated senses of a verb in [V ja V] pairs, alongside the structure-creating potential of the [V ja V] pairs themselves, suggests an ongoing grammaticalisation process of the patterns discussed.