2 resultados para Corpus Lexicography
em CORA - Cork Open Research Archive - University College Cork - Ireland
Resumo:
The Leaving Certificate (LC) is the national, standardised state examination in Ireland necessary for entry to third level education – this presents a massive, raw corpus of data with the potential to yield invaluable insight into the phenomena of learner interlanguage. With samples of official LC Spanish examination data, this project has compiled a digitised corpus of learner Spanish comprised of the written and oral production of 100 candidates. This corpus was then analysed using a specific investigative corpus technique, Computer-aided Error Analysis (CEA, Dagneaux et al, 1998). CEA is a powerful apparatus in that it greatly facilitates the quantification and analysis of a large learner corpus in digital format. The corpus was both compiled and analysed with the use of UAM Corpus Tool (O’Donnell 2013). This Tool allows for the recording of candidate-specific variables such as grade, examination level, task type and gender, therefore allowing for critical analysis of the corpus as one unit, as separate written and oral sub corpora and also of performance per task, level and gender. This is an interdisciplinary work combining aspects of Applied Linguistics, Learner Corpus Research and Foreign Language (FL) Learning. Beginning with a review of the context of FL learning in Ireland and Europe, I go on to discuss the disciplinary context and theoretical framework for this work and outline the methodology applied. I then perform detailed quantitative and qualitative analyses before going on to combine all research findings outlining principal conclusions. This investigation does not make a priori assumptions about the data set, the LC Spanish examination, the context of FLs or of any aspect of learner competence. It undertakes to provide the linguistic research community and the domain of Spanish language learning and pedagogy in Ireland with an empirical, descriptive profile of real learner performance, characterising learner difficulty.
Resumo:
Users seeking information may not find relevant information pertaining to their information need in a specific language. But information may be available in a language different from their own, but users may not know that language. Thus users may experience difficulty in accessing the information present in different languages. Since the retrieval process depends on the translation of the user query, there are many issues in getting the right translation of the user query. For a pair of languages chosen by a user, resources, like incomplete dictionary, inaccurate machine translation system may exist. These resources may be insufficient to map the query terms in one language to its equivalent terms in another language. Also for a given query, there might exist multiple correct translations. The underlying corpus evidence may suggest a clue to select a probable set of translations that could eventually perform a better information retrieval. In this paper, we present a cross language information retrieval approach to effectively retrieve information present in a language other than the language of the user query using the corpus driven query suggestion approach. The idea is to utilize the corpus based evidence of one language to improve the retrieval and re-ranking of news documents in the other language. We use FIRE corpora - Tamil and English news collections in our experiments and illustrate the effectiveness of the proposed cross language information retrieval approach.