7 results for "linguistic variables" in the Aston University Research Archive


Relevance:

70.00%

Publisher:

Abstract:

This paper introduces a method for the analysis of regional linguistic variation. The method identifies individual and common patterns of spatial clustering in a set of linguistic variables measured over a set of locations based on a combination of three statistical techniques: spatial autocorrelation, factor analysis, and cluster analysis. To demonstrate how to apply this method, it is used to analyze regional variation in the values of 40 continuously measured, high-frequency lexical alternation variables in a 26-million-word corpus of letters to the editor representing 206 cities from across the United States.
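As an illustration of the first of the three techniques, the sketch below computes global Moran's I, one common measure of spatial autocorrelation, for a single variable. The abstract does not say which autocorrelation statistic the authors used, so the choice of Moran's I, the function name `morans_i`, and the toy data are all assumptions made for illustration.

```python
def morans_i(values, weights):
    """Global Moran's I for one variable measured at n locations.

    values  -- list of n measurements (e.g. the relative frequency of one
               lexical variant at each location)
    weights -- n x n spatial weights matrix; weights[i][j] > 0 when
               locations i and j count as neighbours
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, weighted by spatial proximity.
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


# Hypothetical toy data: four locations on a line, adjacent ones are neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1.0, 1.0, 0.0, 0.0], chain)    # similar values sit together
alternating = morans_i([1.0, 0.0, 1.0, 0.0], chain)  # neighbours always differ
```

A positive value indicates that similar values cluster in space, and a negative value indicates that neighbouring locations tend to differ. In a pipeline like the one described, a variable with clear spatial clustering would then be passed on to factor analysis and cluster analysis.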

Relevance:

70.00%

Publisher:

Abstract:

We analyze a Big Data set of geo-tagged tweets for a year (Oct. 2013–Oct. 2014) to understand regional linguistic variation in the U.S. Prior work on regional linguistic variation usually took a long time to collect data and focused on either rural or urban areas. Geo-tagged Twitter data offers an unprecedented database with rich linguistic representation of fine spatiotemporal resolution and continuity. From the one-year Twitter corpus, we extract lexical characteristics for Twitter users by summarizing the frequencies of a set of lexical alternations that each user has used. We spatially aggregate and smooth each lexical characteristic to derive county-based linguistic variables, from which orthogonal dimensions are extracted using principal component analysis (PCA). Finally, a regionalization method is used to discover hierarchical dialect regions using the PCA components. The regionalization results reveal interesting regional linguistic variation in the U.S. The discovered regions not only confirm past research findings in the literature but also provide new insights and a more detailed understanding of very recent linguistic patterns in the U.S.
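The first two steps of this pipeline (per-user alternation frequencies, then aggregation to counties) can be sketched as follows. This is a minimal sketch under stated assumptions: the abstract does not give the exact formula, so the share `a / (a + b)`, the plain per-county mean without smoothing, and the names `alternation_share` and `county_means` are hypothetical.

```python
from collections import defaultdict


def alternation_share(count_a, count_b):
    """Share of variant A among a user's uses of an A/B lexical alternation.

    Returns None when the user never used either variant.
    """
    total = count_a + count_b
    return count_a / total if total else None


def county_means(users):
    """Average the per-user shares within each county.

    users -- iterable of (county, count_a, count_b) tuples
    Smoothing across neighbouring counties is deliberately left out.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for county, count_a, count_b in users:
        share = alternation_share(count_a, count_b)
        if share is None:
            continue  # user contributes no evidence for this alternation
        sums[county] += share
        counts[county] += 1
    return {county: sums[county] / counts[county] for county in sums}


# Hypothetical toy data: (county, uses of variant A, uses of variant B).
toy_users = [("A", 3, 1), ("A", 1, 1), ("B", 0, 4), ("B", 0, 0)]
county_variable = county_means(toy_users)
```

Repeating this for every alternation would give one column per lexical characteristic and one row per county, which is the kind of matrix that PCA and a regionalization method would then operate on.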

Relevance:

60.00%

Publisher:

Abstract:

Requirements-aware systems address the need to reason about uncertainty at runtime to support adaptation decisions, by representing quality-of-service (QoS) requirements for service-based systems (SBS) with precise values in run-time queryable model specifications. However, current approaches do not support updating the specification to reflect changes in the service market, such as newly available services or improved QoS of existing ones. Thus, even if the specification models reflect requirements that were acceptable at design time, they may become obsolete and miss opportunities for system improvement by self-adaptation. This article proposes to distinguish "abstract" and "concrete" specification models: the former consists of linguistic variables (e.g. "fast") agreed upon at design time, and the latter consists of precise numeric values (e.g. "2ms") that are dynamically calculated at run-time, thus incorporating up-to-date QoS information. If and when freshly calculated concrete specifications are no longer satisfied by the current service configuration, an adaptation is triggered. The approach was validated using four simulated SBS that use services from a previously published, real-world dataset; in all cases, the system was able to detect unsatisfied requirements at run-time and trigger suitable adaptations. Ongoing work focuses on policies to determine recalculation of specifications. This approach will allow engineers to build SBS that can be protected against market-caused obsolescence of their requirements specifications. © 2012 IEEE.
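The split between abstract and concrete specifications can be illustrated with a small sketch. The abstract does not describe how concrete values are calculated, so everything here is an assumption: the mapping of "fast" to the lowest quarter of observed response times, the names `ABSTRACT_SPEC`, `concrete_threshold`, and `needs_adaptation`, and the sample data are hypothetical.

```python
# Assumed design-time agreement: each linguistic label maps to a fraction of
# the market's observed response times (hypothetical, not from the article).
ABSTRACT_SPEC = {"fast": 0.25}


def concrete_threshold(label, observed_ms):
    """Turn a linguistic label into a numeric bound from current market data."""
    ranked = sorted(observed_ms)
    index = min(len(ranked) - 1, int(ABSTRACT_SPEC[label] * len(ranked)))
    return ranked[index]


def needs_adaptation(label, current_ms, observed_ms):
    """True when the current service no longer satisfies the recalculated bound."""
    return current_ms > concrete_threshold(label, observed_ms)


# Hypothetical response times (ms) of services available in the market.
market_before = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
market_after = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # faster services appeared

current_service_ms = 5
ok_before = needs_adaptation("fast", current_service_ms, market_before)
ok_after = needs_adaptation("fast", current_service_ms, market_after)
```

With the first market snapshot, a 5 ms service still counts as "fast"; once faster services appear, the recalculated bound tightens and the same service triggers an adaptation, even though the design-time label never changed.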

Relevance:

60.00%

Publisher:

Abstract:

There are several unresolved problems in forensic authorship profiling, including a lack of research focusing on the types of texts that are typically analysed in forensic linguistics (e.g. threatening letters, ransom demands) and a general disregard for the effect of register variation when testing linguistic variables for use in profiling. The aim of this dissertation is therefore to make a first step towards filling these gaps by testing whether established patterns of sociolinguistic variation appear in malicious forensic texts that are controlled for register. This dissertation begins with a literature review that highlights a series of correlations between language use and various social factors, including gender, age, level of education and social class. This dissertation then presents the primary data set used in this study, which consists of a corpus of 287 fabricated malicious texts from 3 different registers produced by 96 authors stratified across the 4 social factors listed above. Since this data set is fabricated, its validity was also tested through a comparison with another corpus consisting of 104 naturally occurring malicious texts, which showed that no important differences exist between the language of the fabricated malicious texts and the authentic malicious texts. The dissertation then reports the findings of the analysis of the corpus of fabricated malicious texts, which shows that the major patterns of sociolinguistic variation identified in previous research are valid for forensic malicious texts and that controlling register variation greatly improves the performance of profiling. In addition, it is shown that through regression analysis it is possible to use these patterns of linguistic variation to profile the demographic background of authors across the four social factors with an average accuracy of 70%. Overall, the present study therefore makes a first step towards developing a principled model of forensic authorship profiling.

Relevance:

60.00%

Publisher:

Abstract:

This article considers how conscious use of dialect in writing is an intentional act and can be accounted for through the notion of enregisterment. It does this by exploring the value of dialect in social and ideological contexts in relation to a regional dialect of British speech, that of the Black Country in the West Midlands region of England. The article provides a summary of recent directions in sociolinguistic research and an overview of the Black Country speech community, including a summary of its distinctive linguistic variables. This description is then used as an external evaluation of the authenticity of written representations of Black Country speech and the items enregistered in writing. Analysis of three written texts taken from three different genres across a time span of 30 years reveals the extent to which identified linguistic features are drawn upon in each of the three texts and the extent to which any one is enregistered across all three. The article discusses the social and linguistic contexts within which the writing occurs by way of accounting for the enregisterment of these features as markers of identity linked to region and place. It also considers the ways in which the texts juxtapose norms and values of those "within" the community with those from "outside" the community in ways that subvert traditional notions of linguistic hierarchy.

Relevance:

60.00%

Publisher:

Abstract:

Relatively little research on dialect variation has been based on corpora of naturally occurring language. Instead, dialect variation has been studied based primarily on language elicited through questionnaires and interviews. Eliciting dialect data has several advantages, including allowing dialectologists to select individual informants, control the communicative situation in which language is collected, elicit rare forms directly, and make high-quality audio recordings. Although far less common, a corpus-based approach to data collection also has several advantages, including allowing dialectologists to collect large amounts of data from a large number of informants, observe dialect variation across a range of communicative situations, and analyze quantitative linguistic variation in large samples of natural language. Although both approaches allow for dialect variation to be observed, they provide different perspectives on language variation and change. The corpus-based approach to dialectology has therefore produced a number of new findings, many of which challenge traditional assumptions about the nature of dialect variation. Most important, this research has shown that dialect variation involves a wider range of linguistic variables and exists across a wider range of language varieties than has previously been assumed. The goal of this chapter is to introduce this emerging approach to dialectology. The first part of this chapter reviews the growing body of research that analyzes dialect variation in corpora, including research on variation across nations, regions, genders, ages, and classes, in both speech and writing, and from both a synchronic and diachronic perspective, with a focus on dialect variation in the English language. Although collections of language data elicited through interviews and questionnaires are now commonly referred to as corpora in sociolinguistics and dialectology (e.g. see Bauer 2002; Tagliamonte 2006; Kretzschmar et al. 2006; D'Arcy 2011), this review focuses on corpora of naturally occurring texts and discourse. The second part of this chapter presents the results of an analysis of variation in "not" contraction across region, gender, and time in a corpus of American English letters to the editor in order to exemplify a corpus-based approach to dialectology.

Relevance:

30.00%

Publisher:

Abstract:

We compared reading acquisition in English and Italian children up to late primary school, analyzing reaction times (RTs) and errors as a function of various psycholinguistic variables and changes due to experience. Our results show that reading becomes progressively more reliant on larger processing units with age, but that this is modulated by the consistency of the language. In English, an inconsistent orthography, reliance on larger units occurs earlier and is demonstrated by faster RTs, a stronger effect of lexical variables, and the lack of a length effect (by fifth grade). However, not all English children are able to master this mode of processing, yielding larger inter-individual variability. In Italian, a consistent orthography, reliance on larger units occurs later and is less pronounced. This is demonstrated by larger length effects, which remain significant even in older children, and by larger effects of a global factor (related to speed of orthographic decoding) explaining changes of performance across ages. Our results show the importance of considering not only overall performance but also inter-individual variability and variability between conditions when interpreting cross-linguistic differences.