Digital Library

6 results for Chagatai language--Dictionaries--Persian

in Aston University Research Archive

Corpus-driven lexicography

Relevance: 100.00%

Abstract:

This paper discusses three important aspects of John Sinclair’s legacy: the corpus, lexicography, and the notion of ‘corpus-driven’. The corpus represents his concern with the nature of linguistic evidence. Lexicography is for him the canonical mode of language description at the lexical level. And his belief that the corpus should ‘drive’ the description is reflected in his constant attempts to utilize the emergent computer technologies to automate the initial stages of analysis and defer the intuitive, interpretative contributions of linguists to increasingly later stages in the process. Sinclair’s model of corpus-driven lexicography has spread far beyond its initial implementation at Cobuild, to most EFL dictionaries, to native-speaker dictionaries (e.g. the New Oxford Dictionary of English, and many national language dictionaries in emerging or re-emerging speech communities) and bilingual dictionaries (e.g. Collins, Oxford-Hachette).

Language corpora on computer and dictionaries on CD-Rom

Relevance: 40.00%

Authority in the classroom: dictionaries and corpora, COBUILD and the Bank of English

Relevance: 30.00%

Abstract:

Language learners ask a variety of questions about words and their meanings and uses: “What does X mean? What is the word for X in English? Can you say X? When do you use X and when do you use Y (e.g. synonyms, grammatical structures, prepositional choices, variant phrases, etc.)?”

Linguistic identifiers of L1 Persian speakers writing in English: NLID for authorship analysis

Relevance: 30.00%

Abstract:

This research focuses on Native Language Identification (NLID), and in particular on the linguistic identifiers of L1 Persian speakers writing in English. The project comprises three sub-studies. Study One devises a coding system to account for interlingual features present in a corpus of L1 Persian speakers blogging in English and in a corpus of L1 English blogs, and then demonstrates that interlingual identifiers can be used to distinguish authorship by L1 Persian speakers. Study Two examines the coding system in relation to the L1 Persian corpus and a corpus of L1 Azeri and L1 Pashto speakers. Its findings indicate that the NLID method and the features designed are able to discriminate between L1 influences from different languages. Study Three focuses on elicited data, in which participants were tasked with disguising their language to appear as L1 Persian speakers writing in English. This study found a significant difference between the features in the L1 Persian corpus and those in the corpus of disguise texts. Overall, the findings indicate that NLID and the coding system devised have strong potential to aid forensic authorship analysis in investigative situations. Unlike existing research, which has largely used student data, this project focuses predominantly on blogs, making the findings more applicable to forensic casework data.

Language as chunks, not words

Relevance: 30.00%

Abstract:

Many people think of language as words. Words are small, convenient units, especially in written English, where they are separated by spaces. Dictionaries seem to reinforce this idea, because their entries are arranged as a list of alphabetically ordered words. Traditionally, linguists and teachers focused on grammar and treated words as self-contained units of meaning that fill the available grammatical slots in a sentence. More recently, attention has shifted from grammar to lexis, and from words to chunks. Dictionary headwords are convenient points of access for the user, but modern dictionary entries usually deal with chunks, because meanings often arise not from individual words but from the chunks in which those words occur. Corpus research confirms that native speakers of a language actually work with larger “chunks” of language. This paper will show that teachers and learners will benefit from treating language as chunks rather than words.

Native language identification (NLID) for forensic authorship analysis of weblogs

Relevance: 30.00%

Abstract:

This chapter introduces Native Language Identification (NLID) and considers its casework applications in the authorship analysis of online material. It presents findings from research identifying which linguistic features were the best indicators of native (L1) Persian speakers blogging in English, and analyses how well these features distinguish between native influences from languages that are linguistically and culturally related. The first section of the chapter outlines the area of Native Language Identification and demonstrates its potential for application through a discussion of relevant case history. The next section discusses the development of a methodology for identifying influence from L1 Persian in an anonymous blog author, and presents the findings. The third section discusses how these features apply to casework situations and how they can form an easily applicable model, and demonstrates that model in casework. The research presented in this chapter can be considered a case study for the wider potential application of NLID.
