37 resultados para Albanian language
Resumo:
In this dissertation I study language complexity from a typological perspective. Since the structuralist era, it has been assumed that local complexity differences in languages are balanced out in cross-linguistic comparisons and that complexity is not affected by the geopolitical or sociocultural aspects of the speech community. However, these assumptions have seldom been studied systematically from a typological point of view. My objective is to define complexity so that it is possible to compare it across languages and to approach its variation with the methods of quantitative typology. My main empirical research questions are: i) does language complexity vary in any systematic way in local domains, and ii) can language complexity be affected by the geographical or social environment? These questions are studied in three articles, whose findings are summarized in the introduction to the dissertation. In order to enable cross-language comparison, I measure complexity as the description length of the regularities in an entity; I separate it from difficulty, focus on local instead of global complexity, and break it up into different types. This approach helps avoid the problems that plagued earlier metrics of language complexity. My approach to grammar is functional-typological in nature, and the theoretical framework is basic linguistic theory. I delimit the empirical research functionally to the marking of core arguments (the basic participants in the sentence). I assess the distributions of complexity in this domain with multifactorial statistical methods and use different sampling strategies, implementing, for instance, the Greenbergian view of universals as diachronic laws of type preference. My data come from large and balanced samples (up to approximately 850 languages), drawn mainly from reference grammars. The results suggest that various significant trends occur in the marking of core arguments in regard to complexity and that complexity in this domain correlates with population size. These results provide evidence that linguistic patterns interact among themselves in terms of complexity, that language structure adapts to the social environment, and that there may be cognitive mechanisms that limit complexity locally. My approach to complexity and language universals can therefore be successfully applied to empirical data and may serve as a model for further research in these areas.
Resumo:
Language Documentation and Description as Language Planning Working with Three Signed Minority Languages Sign languages are minority languages that typically have a low status in society. Language planning has traditionally been controlled from outside the sign-language community. Even though signed languages lack a written form, dictionaries have played an important role in language description and as tools in foreign language learning. The background to the present study on sign language documentation and description as language planning is empirical research in three dictionary projects in Finland-Swedish Sign Language, Albanian Sign Language, and Kosovar Sign Language. The study consists of an introductory article and five detailed studies which address language planning from different perspectives. The theoretical basis of the study is sociocultural linguistics. The research methods used were participant observation, interviews, focus group discussions, and document analysis. The primary research questions are the following: (1) What is the role of dictionary and lexicographic work in language planning, in research on undocumented signed language, and in relation to the language community as such? (2) What factors are particular challenges in the documentation of a sign language and should therefore be given special attention during lexicographic work? (3) Is a conventional dictionary a valid tool for describing an undocumented sign language? The results indicate that lexicographic work has a central part to play in language documentation, both as part of basic research on undocumented sign languages and for status planning. Existing dictionary work has contributed new knowledge about the languages and the language communities. The lexicographic work adds to the linguistic advocacy work done by the community itself with the aim of vitalizing the language, empowering the community, receiving governmental recognition for the language, and improving the linguistic (human) rights of the language users. The history of signed languages as low status languages has consequences for language planning and lexicography. One challenge that the study discusses is the relationship between the sign-language community and the hearing sign linguist. In order to make it possible for the community itself to take the lead in a language planning process, raising linguistic awareness within the community is crucial. The results give rise to questions of whether lexicographic work is of more importance for status planning than for corpus planning. A conventional dictionary as a tool for describing an undocumented sign language is criticised. The study discusses differences between signed and spoken/written languages that are challenging for lexicographic presentations. Alternative electronic lexicographic approaches including both lexicon and grammar are also discussed. Keywords: sign language, Finland-Swedish Sign Language, Albanian Sign Language, Kosovar Sign Language, language documentation and description, language planning, lexicography
Resumo:
Researchers and developers in academia and industry would benefit from a facility that enables them to easily locate, licence and use the kind of empirical data they need for testing and refining their hypotheses and to deposit and disseminate their data e.g. to support replication and validation of reported scientific experiments. To answer these needs initially in Finland, there is an ongoing project at University of Helsinki and its collaborators to create a user-friendly web service for researchers and developers in Finland and other countries. In our talk, we describe ongoing work to create a palette of extensive but easily available Finnish language resources and technologies for the research community, including lexical resources, wordnets, morphologically tagged corpora, dependency syntactic treebanks and parsebanks, open-source finite state toolkits and libraries and language models to support text analysis and processing at customer site. Also first publicly available results are presented.
Resumo:
This paper introduces the META-NORD project which develops Nordic and Baltic part of the European open language resource infrastructure. META-NORD works on assembling, linking across languages, and making widely available the basic language resources used by developers, professionals and researchers to build specific products and applications. The goals of the project, overall approach and specific focus lines on wordnets, terminology resources and treebanks are described. Moreover, results achieved in first five months of the project, i.e. language whitepapers, metadata specification and IPR, are presented.
Resumo:
In this paper we present simple methods for construction and evaluation of finite-state spell-checking tools using an existing finite-state lexical automaton, freely available finite-state tools and Internet corpora acquired from projects such as Wikipedia. As an example, we use a freely available open-source implementation of Finnish morphology, made with traditional finite-state morphology tools, and demonstrate rapid building of Northern Sámi and English spell checkers from tools and resources available from the Internet.