996 results for Corpora (Linguistics)



Abstract:

Communication in Forensic Contexts provides in-depth coverage of the complex area of communication in forensic situations. Drawing on expertise from forensic psychology, linguistics and law enforcement worldwide, the text bridges the gap between these fields in a definitive guide to best practice.
• Offers best practice for understanding and improving communication in forensic contexts, including interviewing of victims, witnesses and suspects, discourse in courtrooms, and discourse via interpreters
• Bridges the knowledge gaps between forensic psychology, forensic linguistics and law enforcement, with chapters written by teams bringing together expertise from each field
• Published in collaboration with the International Investigative Interviewing Research Group, dedicated to furthering evidence-based practice and practice-based research amongst researchers and practitioners
• International, cross-disciplinary team includes contributors from North America, Europe and Asia Pacific, and from psychology, linguistics and forensic practice


Abstract:

This article uses a research project into the online conversations of sex offenders and the children they abuse to further the arguments for the acceptability of experimental work as a research tool for linguists. The research reported here contributes to the growing body of work within linguistics that has found experimental methods to be useful in answering questions about representation and constraints on linguistic expression (Hemforth 2013). The wider project examines online identity assumption in online paedophile activity and the policing of such activity, and involves dealing with the linguistic analysis of highly sensitive sexual grooming transcripts. Within the linguistics portion of the project, we examine theories of idiolect and identity through analysis of the ‘talk’ of perpetrators of online sexual abuse, and of the undercover officers that must assume alternative identities in order to investigate such crimes. The essential linguistic question in this article is methodological and concerns the applicability of experimental work to exploration of online identity and identity disguise. Although we touch on empirical questions, such as the sufficiency of linguistic description that will enable convincing identity disguise, we do not explore the experimental results in detail. In spite of the preference within a range of discourse analytical paradigms for ‘naturally occurring’ data, we argue that not only does the term prove conceptually problematic, but in certain contexts, and particularly in the applied forensic context described, a rejection of experimentally elicited data would limit the possible types and extent of analyses. Thus, it would restrict the contribution that academic linguistics can make in addressing a serious social problem.


Abstract:

This article briefly reviews multilingual language resources for Bulgarian developed within several international projects: the first-ever annotated Bulgarian MTE digital lexical resources, a Bulgarian-Polish corpus, a Bulgarian-Slovak parallel and aligned corpus, and a Bulgarian-Polish-Lithuanian corpus. These resources are valuable multilingual datasets for language engineering research and development for the Bulgarian language. Multilingual corpora are large repositories of language data that play an important role in preserving and supporting the world's cultural heritage, because natural language is an outstanding part of human cultural values and collective memory, and a bridge between cultures.


Abstract:

The article briefly reviews a bilingual Slovak-Bulgarian/Bulgarian-Slovak parallel and aligned corpus. The corpus was collected and developed as a result of collaboration within a joint research project between the Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, and the Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences. Multilingual corpora are large repositories of language data that play an important role in preserving and supporting the world's cultural heritage, because natural language is an outstanding part of human cultural values and collective memory, and a bridge between cultures. This bilingual corpus will be widely applicable to contrastive studies of the two Slavic languages, and will also be a useful resource for language engineering research and development, especially in machine translation.


Abstract:

The paper describes three software packages, the main components of a software system for processing and web presentation of Bulgarian language resources: parallel corpora and bilingual dictionaries. The author briefly presents the current versions of the core components “Dictionary” and “Corpus”, as well as the recently developed component “Connection”, which links “Dictionary” and “Corpus”. The components' main functionalities are also described, and some examples of the use of the system’s web applications are included.


Abstract:

Relatively little research on dialect variation has been based on corpora of naturally occurring language. Instead, dialect variation has been studied based primarily on language elicited through questionnaires and interviews. Eliciting dialect data has several advantages, including allowing dialectologists to select individual informants, control the communicative situation in which language is collected, elicit rare forms directly, and make high-quality audio recordings. Although far less common, a corpus-based approach to data collection also has several advantages, including allowing dialectologists to collect large amounts of data from a large number of informants, observe dialect variation across a range of communicative situations, and analyze quantitative linguistic variation in large samples of natural language. Although both approaches allow for dialect variation to be observed, they provide different perspectives on language variation and change. The corpus-based approach to dialectology has therefore produced a number of new findings, many of which challenge traditional assumptions about the nature of dialect variation. Most important, this research has shown that dialect variation involves a wider range of linguistic variables and exists across a wider range of language varieties than has previously been assumed. The goal of this chapter is to introduce this emerging approach to dialectology. The first part of this chapter reviews the growing body of research that analyzes dialect variation in corpora, including research on variation across nations, regions, genders, ages, and classes, in both speech and writing, and from both a synchronic and diachronic perspective, with a focus on dialect variation in the English language. Although collections of language data elicited through interviews and questionnaires are now commonly referred to as corpora in sociolinguistics and dialectology (e.g. see Bauer 2002; Tagliamonte 2006; Kretzschmar et al. 2006; D'Arcy 2011), this review focuses on corpora of naturally occurring texts and discourse. The second part of this chapter presents the results of an analysis of variation in not contraction across region, gender, and time in a corpus of American English letters to the editor in order to exemplify a corpus-based approach to dialectology.
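
As a rough illustration of the kind of measurement an analysis of not contraction involves, the sketch below computes the share of contracted forms (e.g. "didn't") out of all contracted plus full forms (e.g. "did not") per region. The toy letters, the region labels, the auxiliary list, and the function name `contraction_rates` are all assumptions made for this example; the chapter's own corpus, variable definition, and statistics are not reproduced here.

```python
import re
from collections import defaultdict

# Hypothetical toy data: (region, text) pairs standing in for letters to the editor.
LETTERS = [
    ("Northeast", "We do not agree, and the council didn't listen."),
    ("Northeast", "It is not fair; they can't ignore us."),
    ("South", "I don't think so. It isn't right, and we won't pay."),
    ("South", "They did not respond."),
]

# Contracted forms end in n't (straight apostrophe only in this sketch).
CONTRACTED = re.compile(r"\b\w+n't\b", re.IGNORECASE)
# Full forms: a small, assumed list of auxiliaries followed by "not".
FULL = re.compile(
    r"\b(?:do|does|did|is|are|was|were|has|have|had|"
    r"would|could|should|will|can)\s+not\b",
    re.IGNORECASE,
)


def contraction_rates(letters):
    """Return {region: contracted / (contracted + full)} for each region."""
    contracted = defaultdict(int)
    full = defaultdict(int)
    for region, text in letters:
        contracted[region] += len(CONTRACTED.findall(text))
        full[region] += len(FULL.findall(text))
    rates = {}
    for region in sorted(set(contracted) | set(full)):
        total = contracted[region] + full[region]
        if total:  # skip regions with no relevant tokens
            rates[region] = contracted[region] / total
    return rates


rates = contraction_rates(LETTERS)
for region, rate in rates.items():
    print(f"{region}: {rate:.2f}")
```

A real study would need a fuller definition of the variable context (for instance "cannot", curly apostrophes, and cases where contraction is not possible) before comparing rates across regions.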


Abstract:

In this paper, I concentrate on court cases with litigants in person (lay people who act on their own behalf in legal proceedings without a counsel or solicitor) and discuss the challenges of building a corpus of courtroom discourse where it is crucial to distinguish between speakers due to their distinct institutional roles. The corpus incorporates seven sub-corpora of verbatim transcripts from different court cases with litigants in person and comprises over eleven million tokens. The focus of this paper is on the interplay between the legal and lay discourse types and how judges project their institutional roles through turns initiated with ‘well’ and directed at litigants in person and counsels. As a versatile discourse marker, ‘well’ provides a good opportunity to explore how judges have to adapt their roles to ensure that lay litigants in person receive the necessary support and that their lack of competence does not impinge on the fairness of the proceedings. Given the breadth and importance of the topic of litigation in person, I discuss how the tools and approaches of corpus linguistics can be helpful in this multi-disciplinary area where multiple functions and uses of individual linguistic features need to be explored in depth.


Abstract:

Starting with a description of the software and hardware used for corpus linguistics in the late 1980s to early 1990s, this contribution discusses difficulties faced by the software designer when attempting to allow users to study text. Future human-machine interfaces may develop to be much more sophisticated, and certainly the aspects of text which can be studied will progress beyond plain text without images. Another area which will develop further is the study of patternings involving not just single words but word-relations across large stretches of text.


Abstract:

This paper introduces a quantitative method for identifying newly emerging word forms in large time-stamped corpora of natural language and then describes an analysis of lexical emergence in American social media using this method, based on a multi-billion-word corpus of Tweets collected between October 2013 and November 2014. In total, 29 emerging word forms, which represent various semantic classes, grammatical parts of speech, and word formation processes, were identified through this analysis. These 29 forms are then examined from various perspectives in order to begin to better understand the process of lexical emergence.
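
The abstract does not spell out the method, so the following is only a minimal sketch of one simple way to flag candidate emerging forms in time-stamped text: compare relative frequencies in an early and a late time slice, and keep forms that are absent early and reasonably frequent late. The function names, the thresholds `min_late` and `max_early`, and the toy tokens (including the invented form "zorp") are assumptions for illustration, not the paper's procedure or data.

```python
from collections import Counter


def relative_freqs(tokens):
    """Map each word form to its share of all tokens in the slice."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def emerging_forms(early_tokens, late_tokens, min_late=0.05, max_early=0.0):
    """Return forms rare or absent in the early slice but frequent in the late one.

    min_late:  minimum relative frequency required in the late slice (assumed).
    max_early: maximum relative frequency allowed in the early slice (assumed).
    """
    early = relative_freqs(early_tokens)
    late = relative_freqs(late_tokens)
    return sorted(
        word
        for word, freq in late.items()
        if freq >= min_late and early.get(word, 0.0) <= max_early
    )


# Hypothetical toy slices; "zorp" is an invented form, not a result from the paper.
EARLY = "the cat sat on the mat".split()
LATE = "the cat is on zorp on zorp".split()

candidates = emerging_forms(EARLY, LATE, min_late=0.2)
print(candidates)
```

With a multi-billion-word corpus, a two-slice comparison like this would be far too crude; tracking relative frequency over many time steps and filtering out names, typos, and bursts tied to single events would be needed.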


Abstract:

The distribution of variants used to express future temporal reference has been the object of many studies, focused on conversational speech or on written data. This article sheds new light on the issue by studying future markers in a communicative setting which consists of prepared speech (the televised weather forecast) from a diatopic perspective (comparison of French and Québécois corpora). The distributional analysis points to a distribution of variants specific to this discursive setting. Furthermore, the Goldvarb X multivariate analysis reveals diatopic variation and the influence of some linguistic factors, most notably the type of verb, as well as the effect of constraints specific to the two speech communities under study.


Abstract:

Introduction: The research and teaching of French linguistics in UK higher education (HE) institutions have a venerable history; a number of universities have traditionally offered philology or history of the language courses, which complement literary study. A deeper understanding of the way that the phonology, syntax and semantics of the French language have evolved gives students linguistic insights that dovetail with their study of the Roman de Renart, Rabelais, Racine or the nouveau roman. There was, in the past, some coverage of contemporary French phonetics but little on sociolinguistic issues. More recently, new areas of research and teaching have been developed, with a particular focus on contemporary spoken French and on sociolinguistics. Well supported by funding councils, UK researchers are also making an important contribution in other areas: phonetics and phonology, syntax, pragmatics and second-language acquisition. A fair proportion of French linguistics research occurs outside French sections in psychology or applied linguistics departments. In addition, the UK plays a particular role in bringing together European and North American intellectual traditions and methodologies and in promoting the internationalisation of French linguistics research through the strength of its subject associations, and that of the Journal of French Language Studies. The following sections treat each of these areas in turn. History of the French Language: There is a long and distinguished tradition in Britain of teaching and research on the history of the French language, particularly, but by no means exclusively, at the universities of Cambridge, Manchester and Oxford.


Abstract:

It is important to help researchers find valuable papers from a large literature collection. To this end, many graph-based ranking algorithms have been proposed. However, most of these algorithms suffer from the problem of ranking bias. Ranking bias hurts the usefulness of a ranking algorithm because it returns a ranking list with an undesirable time distribution. This paper is a focused study on how to alleviate ranking bias by leveraging the heterogeneous network structure of the literature collection. We propose a new graph-based ranking algorithm, MutualRank, that integrates mutual reinforcement relationships among networks of papers, researchers, and venues to achieve a more synthetic, accurate, and less-biased ranking than previous methods. MutualRank provides a unified model that involves both intra- and inter-network information for ranking papers, researchers, and venues simultaneously. We use the ACL Anthology Network as the benchmark data set and construct the gold standard from computer linguistics course websites of well-known universities and two well-known textbooks. The experimental results show that, in ranking papers, MutualRank greatly outperforms the state-of-the-art competitors, including PageRank, HITS, CoRank, FutureRank, and P-Rank, both improving ranking effectiveness and alleviating ranking bias. Rankings of researchers and venues by MutualRank are also quite reasonable.
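
To make the idea of graph-based ranking concrete, here is a minimal sketch of plain PageRank, one of the baselines named above, run on a tiny citation graph. It is not MutualRank, which the abstract describes only at a high level; the graph, the paper identifiers, and the parameter defaults are assumptions for illustration.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [nodes it cites]}.

    Every node must appear as a key; nodes that cite nothing (dangling
    nodes) spread their rank evenly over all nodes.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Each node starts with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, cited in graph.items():
            targets = cited if cited else nodes  # dangling-node handling
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical citation graph: papers A and B both cite paper C.
CITATIONS = {
    "paper_a": ["paper_c"],
    "paper_b": ["paper_c"],
    "paper_c": [],
}

ranks = pagerank(CITATIONS)
for paper, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{paper}: {score:.3f}")
```

Because a score of this kind depends only on citation links, papers that have had more time to accumulate citations tend to rank higher, which is one way to picture the time-related ranking bias the abstract refers to.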