11 results for linguistic variation
in Aston University Research Archive
Abstract:
This paper introduces a method for the analysis of regional linguistic variation. The method identifies individual and common patterns of spatial clustering in a set of linguistic variables measured over a set of locations based on a combination of three statistical techniques: spatial autocorrelation, factor analysis, and cluster analysis. To demonstrate how to apply this method, it is used to analyze regional variation in the values of 40 continuously measured, high-frequency lexical alternation variables in a 26-million-word corpus of letters to the editor representing 206 cities from across the United States.
Abstract:
We analyze a Big Data set of geo-tagged tweets for a year (Oct. 2013–Oct. 2014) to understand regional linguistic variation in the U.S. Prior work on regional linguistic variation usually took a long time to collect data and focused on either rural or urban areas. Geo-tagged Twitter data offers an unprecedented database with rich linguistic representation of fine spatiotemporal resolution and continuity. From the one-year Twitter corpus, we extract lexical characteristics for Twitter users by summarizing the frequencies of a set of lexical alternations that each user has used. We spatially aggregate and smooth each lexical characteristic to derive county-based linguistic variables, from which orthogonal dimensions are extracted using principal component analysis (PCA). Finally, a regionalization method is used to discover hierarchical dialect regions using the PCA components. The regionalization results reveal interesting regional linguistic variation in the U.S. The discovered regions not only confirm past research findings in the literature but also provide new insights and a more detailed understanding of very recent linguistic patterns in the U.S.
Abstract:
This paper investigates whether the position of adverb phrases in sentences is regionally patterned in written Standard American English, based on an analysis of a 25-million-word corpus of letters to the editor representing the language of 200 cities from across the United States. Seven measures of adverb position were tested for regional patterns using the global spatial autocorrelation statistic Moran’s I and the local spatial autocorrelation statistic Getis-Ord Gi*. Three of these seven measures were identified as exhibiting significant levels of spatial autocorrelation, contrasting the language of the Northeast with the language of the Southeast and the South Central states. These results demonstrate that continuous regional grammatical variation exists in American English and that regional linguistic variation exists in written Standard English.
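To make the global statistic named in the abstract above concrete, here is a minimal sketch of Moran's I in plain Python. It assumes binary distance-band weights (two locations are neighbours if they lie within `max_dist` of each other); the abstract does not say which weighting scheme the paper used, and the function name, parameters, and toy data are illustrative only.

```python
from math import hypot


def morans_i(values, coords, max_dist):
    """Global Moran's I with binary distance-band weights (an assumed scheme).

    values   : one measurement per location (e.g. a rate per city)
    coords   : (x, y) pairs, one per location
    max_dist : locations closer than this count as neighbours (w_ij = 1)
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    cross = 0.0  # sum over i != j of w_ij * dev_i * dev_j
    s0 = 0.0     # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = hypot(coords[i][0] - coords[j][0],
                         coords[i][1] - coords[j][1])
            if dist <= max_dist:
                s0 += 1.0
                cross += dev[i] * dev[j]

    variance_sum = sum(x * x for x in dev)
    if s0 == 0 or variance_sum == 0:
        raise ValueError("need at least one neighbour pair and non-constant values")
    return (n / s0) * (cross / variance_sum)


# Toy data: six hypothetical locations on a line, one unit apart.
line = [(float(i), 0.0) for i in range(6)]
clustered = [1, 1, 1, 0, 0, 0]    # similar values sit next to each other
alternating = [1, 0, 1, 0, 1, 0]  # neighbours always differ

print(morans_i(clustered, line, 1.0))    # positive (about 0.6)
print(morans_i(alternating, line, 1.0))  # negative (about -1.0)
```

Positive values indicate that nearby locations tend to have similar values, which is the kind of regional patterning the study tests for; significance testing and the local Getis-Ord Gi* statistic are outside the scope of this sketch.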
Abstract:
Relatively little research on dialect variation has been based on corpora of naturally occurring language. Instead, dialect variation has been studied based primarily on language elicited through questionnaires and interviews. Eliciting dialect data has several advantages, including allowing for dialectologists to select individual informants, control the communicative situation in which language is collected, elicit rare forms directly, and make high-quality audio recordings. Although far less common, a corpus-based approach to data collection also has several advantages, including allowing for dialectologists to collect large amounts of data from a large number of informants, observe dialect variation across a range of communicative situations, and analyze quantitative linguistic variation in large samples of natural language. Although both approaches allow for dialect variation to be observed, they provide different perspectives on language variation and change. The corpus-based approach to dialectology has therefore produced a number of new findings, many of which challenge traditional assumptions about the nature of dialect variation. Most important, this research has shown that dialect variation involves a wider range of linguistic variables and exists across a wider range of language varieties than has previously been assumed. The goal of this chapter is to introduce this emerging approach to dialectology. The first part of this chapter reviews the growing body of research that analyzes dialect variation in corpora, including research on variation across nations, regions, genders, ages, and classes, in both speech and writing, and from both a synchronic and diachronic perspective, with a focus on dialect variation in the English language. Although collections of language data elicited through interviews and questionnaires are now commonly referred to as corpora in sociolinguistics and dialectology (e.g. see Bauer 2002; Tagliamonte 2006; Kretzschmar et al. 2006; D'Arcy 2011), this review focuses on corpora of naturally occurring texts and discourse. The second part of this chapter presents the results of an analysis of variation in not contraction across region, gender, and time in a corpus of American English letters to the editor in order to exemplify a corpus-based approach to dialectology.
Abstract:
The goal of this study is to determine if various measures of contraction rate are regionally patterned in written Standard American English. In order to answer this question, this study employs a corpus-based approach to data collection and a statistical approach to data analysis. Based on a spatial autocorrelation analysis of the values of eleven measures of contraction across a 25 million word corpus of letters to the editor representing the language of 200 cities from across the contiguous United States, two primary regional patterns were identified: easterners tend to produce relatively few standard contractions (not contraction, verb contraction) compared to westerners, and northeasterners tend to produce relatively few non-standard contractions (to contraction, non-standard not contraction) compared to southeasterners. These findings demonstrate that regional linguistic variation exists in written Standard American English and that regional linguistic variation is more common than is generally assumed.
Abstract:
This article presents a new method for data collection in regional dialectology based on site-restricted web searches. The method measures the usage and determines the distribution of lexical variants across a region of interest using common web search engines, such as Google or Bing. The method involves estimating the proportions of the variants of a lexical alternation variable over a series of cities by counting the number of webpages that contain the variants on newspaper websites originating from these cities through site-restricted web searches. The method is evaluated by mapping the 26 variants of 10 lexical variables with known distributions in American English. In almost all cases, the maps based on site-restricted web searches align closely with traditional dialect maps based on data gathered through questionnaires, demonstrating the accuracy of this method for the observation of regional linguistic variation. However, unlike collecting dialect data using traditional methods, which is a relatively slow process, the use of site-restricted web searches allows for dialect data to be collected from across a region as large as the United States in a matter of days.
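The proportion-estimation step described in the abstract above can be sketched as follows. This is only an illustration of the arithmetic: the page counts are made-up placeholders, no search engine is queried, and the function name `variant_proportions` and the city and variant labels are hypothetical.

```python
def variant_proportions(hit_counts):
    """Turn per-city page counts into per-city variant proportions.

    hit_counts maps a city name to {variant: number of matching pages},
    as might be gathered by hand from site-restricted searches of that
    city's newspaper website. Cities with no hits at all map to None.
    """
    proportions = {}
    for city, counts in hit_counts.items():
        total = sum(counts.values())
        if total == 0:
            proportions[city] = None  # no evidence for this city
        else:
            proportions[city] = {v: c / total for v, c in counts.items()}
    return proportions


# Made-up counts for two hypothetical cities and one two-variant variable.
counts = {
    "City A": {"variant_1": 30, "variant_2": 70},
    "City B": {"variant_1": 0, "variant_2": 0},
}

for city, props in variant_proportions(counts).items():
    print(city, props)
```

Normalizing within each city, rather than comparing raw page counts, keeps cities with large newspaper websites from dominating the resulting map.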
Abstract:
There are several unresolved problems in forensic authorship profiling, including a lack of research focusing on the types of texts that are typically analysed in forensic linguistics (e.g. threatening letters, ransom demands) and a general disregard for the effect of register variation when testing linguistic variables for use in profiling. The aim of this dissertation is therefore to make a first step towards filling these gaps by testing whether established patterns of sociolinguistic variation appear in malicious forensic texts that are controlled for register. This dissertation begins with a literature review that highlights a series of correlations between language use and various social factors, including gender, age, level of education and social class. This dissertation then presents the primary data set used in this study, which consists of a corpus of 287 fabricated malicious texts from 3 different registers produced by 96 authors stratified across the 4 social factors listed above. Since this data set is fabricated, its validity was also tested through a comparison with another corpus consisting of 104 naturally occurring malicious texts, which showed that no important differences exist between the language of the fabricated malicious texts and the authentic malicious texts. The dissertation then reports the findings of the analysis of the corpus of fabricated malicious texts, which shows that the major patterns of sociolinguistic variation identified in previous research are valid for forensic malicious texts and that controlling register variation greatly improves the performance of profiling. In addition, it is shown that through regression analysis it is possible to use these patterns of linguistic variation to profile the demographic background of authors across the four social factors with an average accuracy of 70%. Overall, the present study therefore makes a first step towards developing a principled model of forensic authorship profiling.
Abstract:
This thesis is part of a project whose overall aim is to assist participants on an MSc TESOL course who wish to begin to publish articles in the field to do so. The project, which is undertaken within a naturalistic paradigm, has two intimately related and mutually constitutive strands: one descriptive, one interventionist. The descriptive strand consists of an analytical model of the TESOL article genre, and it is instantiated in this thesis. The interventionist strand consists of a series of pedagogic interactions and materials intended to assist project participants in formulating a text suitable for publication within the target genre, and it is reported on in this thesis. I begin the thesis by looking in detail at the research approach which characterises the project. I then attempt to explain the situational context of the work and to position it within the context of other research in the areas of discourse community membership, academic genres, genre learning and academic enculturation. Having thus contextualised the work, I next attempt a detailed exploration of the problems of postgraduate students in TESOL when first attempting to write in the TESOL article genre: this exploration is undertaken from both a linguistic and a pedagogic perspective. Then, in subsequent chapters, both a linguistic and a pedagogic response to these problems are proposed: the first consisting of an analytical model of the target genre, the second consisting of a series of pedagogic interactions and materials. The relationships between the two lines of response are also examined in some detail. Then, in the final part of the thesis, I report feedback from the interventionist strand and attempt to conduct an evaluation of the whole project to date. Criteria for evaluation are proposed and examined in some detail in the context of the research approach of the project. The concluding chapter is a brief discussion of future directions for this work.
Abstract:
This paper presents a statistical comparison of regional phonetic and lexical variation in American English. Both the phonetic and lexical datasets were first subjected to separate multivariate spatial analyses in order to identify the most common dimensions of spatial clustering in these two datasets. The dimensions of phonetic and lexical variation extracted by these two analyses were then correlated with each other, after being interpolated over a shared set of reference locations, in order to measure the similarity of regional phonetic and lexical variation in American English. This analysis shows that regional phonetic and lexical variation are remarkably similar in Modern American English.
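The final comparison step in the abstract above, correlating two sets of dimension scores over shared reference locations, can be illustrated with a plain Pearson correlation. This is a sketch under the assumption that each dimension has already been interpolated to the same ordered list of locations; the abstract does not specify the correlation measure used, and the scores below are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two vectors of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


# Invented scores for one phonetic and one lexical dimension,
# listed in the same order of five hypothetical reference locations.
phonetic_dim = [-1.2, -0.4, 0.1, 0.6, 1.3]
lexical_dim = [-1.0, -0.5, 0.0, 0.8, 1.1]

print(round(pearson_r(phonetic_dim, lexical_dim), 3))
```

A value near 1 or -1 would indicate that the two dimensions trace nearly the same regional pattern (the sign of an extracted dimension is arbitrary), while a value near 0 would indicate unrelated patterns.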
Abstract:
The first study of its kind, Regional Variation in Written American English takes a corpus-based approach to map over a hundred grammatical alternation variables across the United States. A multivariate spatial analysis of these maps shows that grammatical alternation variables follow a relatively small number of common regional patterns in American English, which can be explained based on both linguistic and extra-linguistic factors. Based on this rigorous analysis of extensive data, Grieve identifies five primary modern American dialect regions, demonstrating that regional variation is far more pervasive and complex in natural language than is generally assumed. The wealth of maps and data and the groundbreaking implications of this volume make it essential reading for students and researchers in linguistics, English language, geography, computer science, sociology and communication studies.