974 results for Corpus Linguistic


Relevance:

30.00%

Abstract:

Based on a corpus of English, German, and Polish spoken academic discourse, this article analyzes the distribution and function of humor in academic research presentations. The corpus, the product of a European research cooperation project, consists of 300,000 tokens of spoken academic language and focuses on three genres: research presentations, student presentations, and oral examinations. The article investigates the differences between German and English research cultures as expressed in the genre of specialist research presentations, and the role of humor as a pragmatic device in their respective contexts. The data are analyzed within the paradigm of corpus-assisted discourse studies (CADS). The findings show that humor is used in research presentations as an expression of discourse reflexivity. They also reveal a considerable difference in the quantitative distribution of humor in research presentations depending on the educational, linguistic, and cultural background of the presenters, thus confirming the notion of distinct research cultures. Such research cultures nurture distinct attitudes to genres of academic language: whereas in one of the cultures identified, researchers conform to the constraints and structures of the genre, those working in another attempt to subvert them, for example by using humor. © 2012 Elsevier B.V.

Relevance:

30.00%

Abstract:

The focus of this paper is on the doctoral research training experienced by one of the authors and the ways in which the diverse linguistic and disciplinary perspectives of her two supervisors (co-authors of this paper) mediated the completion of her study. The doctoral candidate is a professional translator/interpreter and translation teacher. The paper describes why and how she identified her research area and then focused on the major research questions in collaboration with her two supervisors, who brought their differing perspectives from the field of linguistics to this translation research, even though they are not translators by profession or disciplinary background and do not speak Korean. In addition, the discussion considers the focus, purpose and theoretical orientation of the research itself (which addressed questions of readability in translated English-Korean texts through detailed analysis of a corpus and implications for professional translator training) as well as the supervisory and conceptual processes and practices involved. The authors contend that doctoral research of this kind can be seen as a mutual learning process and that inter-disciplinary research can make a contribution not only to the development of rigorous research in the field of translation studies but also to the other disciplinary fields involved.

Relevance:

30.00%

Abstract:

We analyze a big data set of geo-tagged tweets collected over one year (Oct. 2013–Oct. 2014) to understand regional linguistic variation in the U.S. Prior work on regional linguistic variation usually required lengthy data collection and focused on either rural or urban areas. Geo-tagged Twitter data offer an unprecedented database with rich linguistic representation at fine spatiotemporal resolution and continuity. From the one-year Twitter corpus, we extract lexical characteristics for Twitter users by summarizing how often each user used the variants of a set of lexical alternations. We spatially aggregate and smooth each lexical characteristic to derive county-based linguistic variables, from which orthogonal dimensions are extracted using principal component analysis (PCA). Finally, a regionalization method is applied to the PCA components to discover hierarchical dialect regions. The regionalization results reveal interesting regional linguistic variation in the U.S. The discovered regions not only confirm past findings in the literature but also provide new insights and a more detailed understanding of very recent linguistic patterns in the U.S.
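The pipeline described above (per-user alternation frequencies, aggregation to counties, then PCA) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the alternation counts, the user-to-county mapping and all function names are hypothetical, the spatial smoothing and regionalization steps are omitted, and only the first principal component is extracted, by power iteration on the covariance matrix.

```python
import math
from collections import defaultdict

def user_proportions(user_counts):
    """user -> (uses of variant A, uses of variant B) for one hypothetical alternation.
    Returns each user's share of variant A, skipping users with no uses."""
    return {u: a / (a + b) for u, (a, b) in user_counts.items() if a + b > 0}

def county_means(props, user_county):
    """Average per-user shares within each county (no spatial smoothing here)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for user, p in props.items():
        county = user_county[user]
        sums[county] += p
        counts[county] += 1
    return {c: sums[c] / counts[c] for c in sums}

def first_principal_component(rows, iters=200):
    """rows: one list of linguistic variables per county.
    Returns (loadings, scores) for the first component of the covariance matrix."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    cov = [[sum(x[i] * x[j] for x in centered) / (n - 1) for j in range(k)]
           for i in range(k)]
    v = [1.0] * k  # start vector; assumed not orthogonal to the top eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # the variables show no variance at all
            break
        v = [x / norm for x in w]
    scores = [sum(x[j] * v[j] for j in range(k)) for x in centered]
    return v, scores
```

In practice a standardized (correlation-matrix) PCA, for example via scikit-learn's `PCA` class, would be the more usual choice; the plain-Python version only makes the arithmetic visible.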

Relevance:

30.00%

Abstract:

This paper discusses three important aspects of John Sinclair’s legacy: the corpus, lexicography, and the notion of ‘corpus-driven’. The corpus represents his concern with the nature of linguistic evidence. Lexicography is for him the canonical mode of language description at the lexical level. And his belief that the corpus should ‘drive’ the description is reflected in his constant attempts to utilize the emergent computer technologies to automate the initial stages of analysis and defer the intuitive, interpretative contributions of linguists to increasingly later stages in the process. Sinclair’s model of corpus-driven lexicography has spread far beyond its initial implementation at Cobuild, to most EFL dictionaries, to native-speaker dictionaries (e.g. the New Oxford Dictionary of English, and many national language dictionaries in emerging or re-emerging speech communities) and bilingual dictionaries (e.g. Collins, Oxford-Hachette).

Relevance:

30.00%

Abstract:

The concept of plagiarism is not uncommonly associated with the concept of intellectual property, for both historical and legal reasons: the approach to the ownership of ‘moral’, nonmaterial goods has evolved into a right to individual property, and consequently a need arose for a legal framework to deal with infringements of those rights. Solutions to plagiarism therefore most often fall into two categories: ethical and legal. On the ethical side, education and intercultural studies have addressed plagiarism critically, not only as a means to improve academic ethics policies (PlagiarismAdvice.org, 2008), but mainly to demonstrate that, if anything, the concept of plagiarism is far from universal (Howard & Robillard, 2008). Albeit in different ways, Howard (1995) and Scollon (1994, 1995) argued, and Angèlil-Carter (2000) and Pecorari (2008) later emphasised, that plagiarism cannot be studied on the assumption that a single definition is clearly understood by everyone. Scollon (1994, 1995), for example, claimed that authorship attribution is a particular problem in non-native writing in English, as did Pecorari (2008) in her comprehensive analysis of academic plagiarism. If, among higher education students, plagiarism is often a problem of literacy, with prior, conflicting social discourses interfering with academic discourse, as Angèlil-Carter (2000) demonstrates, then we must accept that a distinction should be made between intentional and inadvertent plagiarism: plagiarism should be prosecuted when intentional, but if it is part of the learning process and results from the plagiarist’s unfamiliarity with the text or topic, it should be considered ‘positive plagiarism’ (Howard, 1995: 796) and hence not an offense. Determining the intention behind instances of plagiarism therefore determines the nature of the disciplinary action adopted.
Unfortunately, in order to demonstrate an intention to deceive and charge students with plagiarism, teachers necessarily have to position themselves as ‘plagiarism police’, although it has been argued otherwise (Robillard, 2008). Practice shows that in their daily activities teachers find themselves required to command investigative skills and tools that they most often lack. We thus claim that the ‘intention to deceive’ cannot be dissociated from plagiarism as a legal issue, even if Garner (2009) asserts that plagiarism is generally immoral but not illegal, and Goldstein (2003) draws the same distinction. However, these claims, and the claim that only cases of copyright infringement tend to go to court, have recently been challenged, mainly by forensic linguists who have been actively involved in plagiarism cases. Turell (2008), for instance, demonstrated that plagiarism is often connoted with the illegal appropriation of ideas. Previously, Turell (2004) had shown, by comparing four Spanish translations of Shakespeare’s Julius Caesar, that linguistic evidence can demonstrate instances of plagiarism. This challenge is also reinforced by practice in international organisations such as the IEEE, for which plagiarism potentially has ‘severe ethical and legal consequences’ (IEEE, 2006: 57). What the plagiarism definitions used by publishers and organisations have in common, and what academia usually lacks, is their focus on the legal nature of plagiarism. We speculate that this is due to the relation they intentionally establish with copyright law, whereas in education the focus tends to shift from the legal to the ethical aspects. However, the number of plagiarism cases taken to court is very small, and jurisprudence on the topic is still developing.
In countries within the Civil Law tradition, Turell (2008) claims, (forensic) linguists are seldom called upon as expert witnesses in plagiarism cases, either because plagiarists are rarely taken to court or because there is little tradition of accepting linguistic evidence. Although forensic linguistics has the investigative and evidential potential to demonstrate the plagiarist’s intention or otherwise, this potential is limited by the ability to identify a text as suspected of plagiarism in the first place. In an era of such massive textual production, ‘policing’ plagiarism thus becomes an extraordinarily difficult task without the assistance of plagiarism detection systems. Although plagiarism detection has attracted the attention of computer engineers and software developers for years, much research is still needed. Given the investigative nature of academic plagiarism, plagiarism detection must of necessity consider not only education and computational linguistics but also forensic linguistics, especially if it is to counter claims of being a ‘simplistic response’ (Robillard & Howard, 2008). In this paper, we use a corpus of essays written by university students who were accused of plagiarism to demonstrate that a forensic linguistic analysis of improper paraphrasing in suspect texts has the potential to identify, and provide evidence of, intention. A linguistic analysis of the corpus texts shows that the plagiarist acts on the paradigmatic axis, replacing relevant lexical items with related words from the same semantic field, i.e. a synonym, a subordinate, a superordinate, etc. In other words, relevant lexical items were replaced with related, but not identical, ones. Additionally, the analysis demonstrates that word order is often changed intentionally to disguise the borrowing. On the other hand, the linguistic analysis of linking and explanatory verbs (i.e. referencing verbs) and prepositions shows that these have the potential to discriminate between instances of ‘patchwriting’ and instances of plagiarism. This research demonstrates that, when the plagiarism is inadvertent, referencing verbs are borrowed from the original in an attempt to construct the new text cohesively, whereas, when it is intentional, the plagiarist has made an effort to prevent the reader from identifying the text as plagiarism. In some of these cases, the referencing elements prove able to identify direct quotations and thus ‘betray’ and expose plagiarism. Finally, we demonstrate that a forensic linguistic analysis of these verbs is critical to allow detection software to identify them as proper paraphrasing and not, mistakenly and simplistically, as plagiarism.
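The two disguise strategies described above, substitution on the paradigmatic axis and changes of word order, leave a characteristic trace when a suspect text is compared with its source: overlap of vocabulary stays high while overlap of word sequences drops. The following is a minimal, hypothetical sketch of that contrast, not the forensic method used in the paper; the tiny synonym table stands in for a real lexical resource and is purely illustrative.

```python
import re

# Hypothetical synonym table (illustration only): maps variants to one form.
SYNONYMS = {"big": "large", "huge": "large", "shows": "demonstrates"}

def tokens(text):
    """Lowercase word tokens, with listed synonyms collapsed to one form."""
    return [SYNONYMS.get(w, w) for w in re.findall(r"[a-z']+", text.lower())]

def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def ngrams(seq, n=3):
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

def compare(source, suspect):
    s, t = tokens(source), tokens(suspect)
    return {
        "lexical_overlap": jaccard(s, t),                  # ignores word order
        "trigram_overlap": jaccard(ngrams(s), ngrams(t)),  # sensitive to word order
    }

print(compare("The study shows a big effect on learners.",
              "A huge effect on learners, the study demonstrates."))
```

A high lexical overlap combined with a much lower trigram overlap could flag a passage for closer manual analysis; on its own it says nothing about intention.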

Relevance:

30.00%

Abstract:

This research focuses on Native Language Identification (NLID), and in particular, on the linguistic identifiers of L1 Persian speakers writing in English. This project comprises three sub-studies; the first study devises a coding system to account for interlingual features present in a corpus of L1 Persian speakers blogging in English, and a corpus of L1 English blogs. Study One then demonstrates that it is possible to use interlingual identifiers to distinguish authorship by L1 Persian speakers. Study Two examines the coding system in relation to the L1 Persian corpus and a corpus of L1 Azeri and L1 Pashto speakers. The findings of this section indicate that the NLID method and features designed are able to discriminate between L1 influences from different languages. Study Three focuses on elicited data, in which participants were tasked with disguising their language to appear as L1 Persian speakers writing in English. This study indicated that there was a significant difference between the features in the L1 Persian corpus, and the corpus of disguise texts. The findings of this research indicate that NLID and the coding system devised have a very strong potential to aid forensic authorship analysis in investigative situations. Unlike existing research, this project focuses predominantly on blogs, as opposed to student data, making the findings more appropriate to forensic casework data.

Relevance:

30.00%

Abstract:

Corpora (large collections of written and/or spoken text stored and accessed electronically) provide a means of investigating language and are of growing importance both academically and professionally. Corpora are now routinely used in the following fields: the production of dictionaries and other reference materials; the development of aids to translation; language teaching materials; the investigation of ideologies and cultural assumptions; natural language processing; and the investigation of all aspects of linguistic behaviour, including vocabulary, grammar and pragmatics.

Relevance:

30.00%

Abstract:

This paper investigates whether the position of adverb phrases in sentences is regionally patterned in written Standard American English, based on an analysis of a 25-million-word corpus of letters to the editor representing the language of 200 cities across the United States. Seven measures of adverb position were tested for regional patterns using the global spatial autocorrelation statistic Moran’s I and the local spatial autocorrelation statistic Getis-Ord Gi*. Three of these seven measures were identified as exhibiting significant levels of spatial autocorrelation, contrasting the language of the Northeast with that of the Southeast and the South Central states. These results demonstrate that continuous regional grammatical variation exists in American English and that regional linguistic variation exists in written Standard English.
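For reference, the global Moran’s I named above is I = (n / S0) · ΣiΣj wij zi zj / Σi zi², where zi = xi − x̄ is each location's deviation from the mean and S0 is the sum of all spatial weights. A minimal plain-Python sketch follows. The spatial weight matrix is an assumption (a simple binary adjacency matrix in the example), since the abstract does not say how neighbouring cities were defined; the local Getis-Ord Gi* statistic is not shown.

```python
def morans_i(values, weights):
    """Global Moran's I.

    values:  one measurement per location (e.g. a city's rate for one adverb-position measure)
    weights: n x n spatial weight matrix with weights[i][i] == 0
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]             # deviations from the mean
    s0 = sum(sum(row) for row in weights)      # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    variance_term = sum(d * d for d in z)
    return (n / s0) * (cross / variance_term)

# Four hypothetical locations on a line, each adjacent to its neighbours.
adjacency = [[0, 1, 0, 0],
             [1, 0, 1, 0],
             [0, 1, 0, 1],
             [0, 0, 1, 0]]
print(morans_i([1, 1, 0, 0], adjacency))  # positive: similar values cluster together
print(morans_i([1, 0, 1, 0], adjacency))  # negative: neighbours tend to differ
```

Values well above the null expectation of −1/(n − 1) indicate positive spatial autocorrelation; significance is normally assessed with a permutation test or a normal approximation, which this sketch leaves out.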

Relevance:

30.00%

Abstract:

This study uses a purpose-built corpus to explore the linguistic legacy of Britain’s maritime history found in the form of hundreds of specialised ‘Maritime Expressions’ (MEs), such as TAKEN ABACK, ANCHOR and ALOOF, that permeate modern English. Selecting only those expressions beginning with ‘A’, it analyses 61 MEs in detail and describes the processes by which these technical expressions, from a highly specialised occupational discourse community, have made their way into modern English. The Maritime Text Corpus (MTC) comprises 8.8 million words, encompassing a range of text types and registers, selected to provide a cross-section of ‘maritime’ writing. It is analysed using WordSmith analytical software (Scott, 2010), with the 100-million-word British National Corpus (BNC) as a reference corpus. Using the MTC, a list of keywords of specific salience within maritime discourse has been compiled and, using frequency data, concordances and collocations, these MEs are described in detail and their use and form in the MTC and the BNC are compared. The study examines the transformation from ME to figurative use in general discourse, in terms of form and metaphoricity. MEs are classified according to their metaphorical strength and their transference from maritime usage into new registers and domains, such as business, politics, sports and reportage. A revised model of metaphoricity is developed and a new category of figurative expression, the ‘resonator’, is proposed. Additionally, developing the work of Lakoff and Johnson, Kövecses and others on Conceptual Metaphor Theory (CMT), a number of Maritime Conceptual Metaphors are identified and their cultural significance is discussed.
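Keyword lists of this kind are usually produced by comparing each word's frequency in the study corpus with its frequency in a reference corpus using a keyness statistic. WordSmith's KeyWords tool offers log-likelihood among its measures, but the abstract does not say which setting was used, so the sketch below simply assumes the common log-likelihood (G²) formulation. The corpus sizes in the example come from the abstract; the word frequencies are hypothetical.

```python
import math

def log_likelihood(freq_study, freq_ref, size_study, size_ref):
    """Log-likelihood (G2) keyness of one word, study corpus vs reference corpus.

    freq_*: occurrences of the word; size_*: total tokens in each corpus.
    """
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    g2 = 0.0
    if freq_study > 0:  # the term 0 * log(0) is taken as 0
        g2 += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        g2 += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * g2

def is_positive_key(freq_study, freq_ref, size_study, size_ref):
    """True if the word is relatively more frequent in the study corpus."""
    return freq_study / size_study > freq_ref / size_ref

# Hypothetical frequencies; sizes are those of the MTC (8.8M) and the BNC (100M).
print(log_likelihood(50, 10, 8_800_000, 100_000_000))
```

G² is always non-negative, so the direction of the difference has to be checked separately, which is why the sign test is kept as its own helper.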

Relevance:

30.00%

Abstract:

The paper presents our considerations related to the creation of a digital corpus of Bulgarian dialects. The dialectological archive of the Bulgarian language consists of more than 250 audio tapes, all recorded between 1955 and 1965 in the course of regular dialectological expeditions throughout the country. The recordings typically contain interviews with inhabitants of small villages in Bulgaria. The topics covered usually relate to issues such as birth, everyday life, marriage, family relationships and death. Only a few tapes contain folk songs from different regions of the country. Given the progressive deterioration of the magnetic media and the realistic prospect of data loss, the Institute for Bulgarian Language at the Academy of Sciences launched a project in 1997 aimed at the restoration and digital preservation of the dialectological archive. Within this project, more than half of the recordings were digitized, de-noised and stored on digital recording media. Since then, restoration and digitization have been carried out at the Institute on a regular basis. As a result, a large collection of sound files has been gathered. Our further efforts are aimed at the creation of a digital corpus of Bulgarian dialects, which will be made available for phonological and linguistic research. Besides the sound files, such corpora typically include two basic elements: a transcription aligned with the sound file, and a set of standardized metadata that defines the corpus. In our work we present considerations on how these tasks could be realized for the corpus of Bulgarian dialects. Our suggestions are based on a comparative analysis of existing methods and techniques for building such corpora, from which we select those that best fit our particular needs. Our experience can be used by similar institutions storing folklore archives, history-related spoken recordings, etc.
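The two elements named above, a time-aligned transcription and standardized metadata attached to each sound file, can be modelled with a very small data structure. This is a hypothetical sketch, not the Institute's design: the class names, field names and metadata keys are illustrative assumptions, and a real project would more likely adopt an established annotation format (such as ELAN files or Praat TextGrids) together with an established metadata standard.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    """One transcribed stretch of speech, aligned to the sound file by time."""
    start: float   # seconds from the beginning of the sound file
    end: float
    speaker: str   # e.g. "interviewer" or an anonymised informant code
    text: str

@dataclass
class Recording:
    """A digitized tape: its sound file, metadata and aligned transcription."""
    sound_file: str
    metadata: dict = field(default_factory=dict)  # hypothetical keys, e.g. village, year
    segments: list = field(default_factory=list)

    def add_segment(self, segment):
        if segment.end <= segment.start:
            raise ValueError("a segment must have a positive duration")
        self.segments.append(segment)
        self.segments.sort(key=lambda s: s.start)  # keep time order; overlaps allowed

    def segments_at(self, t):
        """All segments (possibly from overlapping speakers) that cover time t."""
        return [s for s in self.segments if s.start <= t < s.end]
```

Overlapping segments are deliberately allowed, since interviews often contain simultaneous speech; tier-based formats handle this by giving each speaker a separate tier.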

Relevance:

30.00%

Abstract:

In this paper, I concentrate on court cases with litigants in person (lay people who act on their own behalf in legal proceedings without a counsel or solicitor) and discuss the challenges of building a corpus of courtroom discourse in which it is crucial to distinguish between speakers because of their distinct institutional roles. The corpus incorporates seven sub-corpora of verbatim transcripts from different court cases with litigants in person and comprises over eleven million tokens. The focus of this paper is on the interplay between legal and lay discourse types and on how judges project their institutional roles through turns initiated with ‘well’ and directed at litigants in person and at counsel. As a versatile discourse marker, ‘well’ provides a good opportunity to explore how judges have to adapt their roles to ensure that litigants in person receive the necessary support and that their lack of competence does not impinge on the fairness of the proceedings. Given the breadth and importance of the topic of litigation in person, I discuss how the tools and approaches of corpus linguistics can be helpful in this multidisciplinary area, where the multiple functions and uses of individual linguistic features need to be explored in depth.
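Because the corpus keeps speakers apart by institutional role, a question such as "how often do judges open a turn with ‘well’ when addressing litigants in person rather than counsel?" reduces to counting over labelled turns. The sketch below assumes a hypothetical turn format of (speaker role, addressee role, text); real transcripts would first need speaker and addressee annotation, and each turn-initial ‘well’ would still need manual checking for its discourse function.

```python
import re
from collections import Counter

WELL_INITIAL = re.compile(r"\s*well\b", re.IGNORECASE)  # 'well' as the first word

def well_initial_rates(turns):
    """turns: iterable of (speaker_role, addressee_role, text).
    Returns {(speaker_role, addressee_role): share of turns beginning with 'well'}."""
    totals, hits = Counter(), Counter()
    for speaker, addressee, text in turns:
        key = (speaker, addressee)
        totals[key] += 1
        if WELL_INITIAL.match(text):
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}

turns = [  # hypothetical example turns
    ("judge", "litigant", "Well, Mr Smith, let me explain."),
    ("judge", "litigant", "Thank you."),
    ("judge", "counsel", "Well yes."),
    ("counsel", "judge", "Wellington Street, my Lord."),  # not matched: no word boundary
]
print(well_initial_rates(turns))
```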

Relevance:

30.00%

Abstract:

While language use has been argued to reflect gender asymmetry, increasing parity has been evidenced in official settings (Holmes, 2000; Dister and Moreau, 2006). Our hypothesis is that the French national press has developed a norm of equal linguistic treatment of men and women. In a corpus of articles from Libération, Le Monde, and Le Figaro, we examine the treatment of Arlette Laguiller, the female leader of the French extreme-left Workers' Struggle party (Lutte Ouvrière), during the run-up to the 2007 presidential elections. The way Laguiller is referred to and described, compared with her male counterparts, shows no asymmetry. Breaches of parity are found only in the right-wing newspaper Le Figaro. The ideological distance between the newspaper and the candidate suggests that power struggles are a primary source of asymmetrical treatment. The discursive functions of such treatment can be understood through an investigation based on a portable corpus linguistics methodology for measuring discrimination. © 2011 Elsevier B.V.

Relevance:

30.00%

Abstract:

In this article I argue that the study of the linguistic aspects of epistemology has become unhelpfully focused on the corpus-based study of hedging and that a corpus-driven approach can help to improve upon this. Focusing on a corpus of texts from one discourse community (that of genetics) and identifying frequent tri-lexical clusters containing highly frequent lexical items identified as keywords, I undertake an inductive analysis identifying patterns of epistemic significance. Several of these patterns are shown to be hedging devices, and the whole-corpus frequencies of the most salient of these, ‘candidate’ and ‘putative’, are then compared with the whole-corpus frequencies of comparable wordforms and clusters of epistemic significance. Finally, I interviewed a ‘friendly geneticist’ in order to check my interpretation of some of the terms used and to obtain an expert interpretation of the overall findings. In summary, I argue that the highly unexpected patterns of hedging found in genetics demonstrate the value of adopting a corpus-driven approach and constitute an advance in our current understanding of how to approach the relationship between language and epistemology.
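The first analytical step described above, extracting frequent three-word clusters that contain keywords, can be sketched as follows. This is a minimal illustration under assumptions: the tokenizer, the frequency threshold and the keyword set are hypothetical, and dedicated concordancers apply their own tokenization rules.

```python
import re
from collections import Counter

def keyword_clusters(text, keywords, n=3, min_freq=2):
    """Return n-word clusters that occur at least min_freq times and contain a keyword."""
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())  # simple hypothetical tokenizer
    counts = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return {cluster: freq for cluster, freq in counts.items()
            if freq >= min_freq and any(w in keywords for w in cluster)}

sample = "A candidate gene for the candidate gene for the disease."
print(keyword_clusters(sample, {"candidate"}))
```

Only ("candidate", "gene", "for") survives here: ("gene", "for", "the") is just as frequent but contains no keyword.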

Relevance:

30.00%

Abstract:

This research investigated vowel nasality in the spontaneous speech of inhabitants of the quilombola communities of Brejo dos Crioulos and Poções (MG). As a theoretical framework, we drew on the assumptions of Phonetics and Phonology and on renowned scholars of nasality (CAGLIARI, 1977; CÂMARA JR., 1984, 2013; BISOL, 2013; ABAURRE; PAGOTTO, 1996; SILVA, 2015), with support from Corpus Linguistics. The general goal was to investigate the occurrence of nasality in the dialect of these quilombola communities, and its linguistic behavior, considering the linguistic factors that can interfere in the phenomenon. The specific aims were to: a) detect the occurrence of nasalized vowels with the help of the resources that Corpus Linguistics provides (Praat and WordSmith Tools); b) distinguish the different types of contexts in which nasalized vowels occur; c) carry out quantitative and qualitative analyses of the nasalized vowels in the study corpus; d) describe and analyze the behavior of nasalized vowels; and e) contrast the F1 and F2 values of oral and nasalized vowels. It was hypothesized that nasality is conditioned by the nasal segment following the nasalized vowel (the phonological process of “assimilation”), by its position in the primary-stressed syllable, and by grammatical category. It was expected that the quilombola communities of Brejo dos Crioulos and Poções produce nasalized vowels in their speech and that this linguistic phenomenon is favored by the adjacent presence of nasal consonants or vowels. Furthermore, it was hypothesized that the F1 and F2 values of oral and nasalized vowels in these communities are distinct. The following research questions were formulated: (i) is the presence of nasalized vowels in the speech of these quilombola communities conditioned by the presence of a nasal segment? (ii) does a nasal segment following the vowel favor the occurrence of nasality? (iii) is there a difference between the F1 and F2 values of oral and nasalized vowels in the two quilombola communities?

To compose our corpus, 24 interview recordings were used (12 female and 12 male speakers). It was found that a following nasal segment tends to condition the nasalized vowel. In general, the vowel assimilates the soft-palate lowering of the immediately following nasal consonant, but there are also cases with a following nasal vowel segment (regressive assimilation). The stressed syllable tends to favor nasality, but nasality also occurs in pretonic and posttonic positions. The F1 and F2 values of oral and nasalized vowels in the quilombola communities of Poções and Brejo dos Crioulos are distinct: speakers from Brejo dos Crioulos tend to produce oral and nasalized vowels with a lower F1 than speakers from Poções, and with F2 in a more anterior position. Nasality tends to occur in verbs and nouns, although it is not specific to any grammatical category. This research found cases of spurious nasalization, confirming previous studies. It also revealed lexical items with a context favorable to nasalization in which nasalization nevertheless did not occur: although soft-palate lowering in such contexts is considered uniform in Brazilian Portuguese (BP), these vowels were pronounced without it. That is, variation was detected in the nasalization phenomenon in BP. This work contributes to the discussion of nasality and to linguistic studies of how Brazilian Portuguese functions in this geographical context.
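Aim (e), contrasting the F1 and F2 values of oral and nasalized vowels, amounts to comparing group means in the F1/F2 plane. The sketch below is hypothetical: the token format is an assumption, formant values would in practice be measured in Praat, and the abstract does not state which statistical test was applied, so only means and the Euclidean distance between them are computed. Raw Hz values are used; a speaker normalization (e.g. Lobanov) may be preferable when comparing across speakers.

```python
import math
from statistics import mean

def formant_means(tokens):
    """tokens: iterable of (group, f1_hz, f2_hz), e.g. group = ("Poções", "oral").
    Returns {group: (mean F1, mean F2)}."""
    by_group = {}
    for group, f1, f2 in tokens:
        by_group.setdefault(group, []).append((f1, f2))
    return {g: (mean(p[0] for p in pts), mean(p[1] for p in pts))
            for g, pts in by_group.items()}

def formant_distance(a, b):
    """Euclidean distance in Hz between two (F1, F2) means."""
    return math.hypot(a[0] - b[0], a[1] - b[1])
```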

Relevance:

30.00%

Abstract:

The study of lexical combinations according to their degree of fixedness, and their classification into free combinations, collocations and idioms, has been carried out from a synchronic perspective. We explore the possibility of applying the criteria for distinguishing these types of structures to diachronic material. Specifically, we draw on the documents that make up the Corpus del Español del Reino de Granada (CORDEREGRA) to assess the materials of this historical-linguistic corpus and to test whether synchronic criteria can be applied to the study of documents from other centuries.