979 results for Finno-Ugric languages.
Abstract:
Presentation at "Soome-ugri keelte andmebaasid ja e-leksikograafia" at Eesti Keele Instituut (Institute of the Estonian Language) in Tallinn on the 18th of November 2014.
Abstract:
Presentation by Jussi-Pekka Hakkarainen at the Emtacl15 conference on the 20th of April 2015 in Trondheim, Norway.
Abstract:
Emerging technologies have recently challenged libraries to reconsider their role as mere mediators between collections, researchers, and wider audiences (Sula, 2013), and libraries, especially nationwide institutions such as national libraries, have not always managed to meet the challenge (Nygren et al., 2014). In the Digitization Project of Kindred Languages, the National Library of Finland has become a node that connects the partners so that they can interact and work towards shared goals and objectives. In this paper, I describe the crowdsourcing methods that have been established during the project to support both linguistic research and lingual diversity. The National Library of Finland has been carrying out the Digitization Project of Kindred Languages since 2012. The project seeks to digitize and publish approximately 1,200 monograph titles and more than 100 newspaper titles in various, and in some cases endangered, Uralic languages. Once the digitization has been completed in 2015, the Fenno-Ugrica online collection will consist of 110,000 monograph pages and around 90,000 newspaper pages to which all users will have open access regardless of their place of residence. The majority of the digitized literature was originally published in the 1920s and 1930s in the Soviet Union, during the genesis and consolidation period of literary languages. This was the era when many Uralic languages were converted into media of popular education, enlightenment, and dissemination of information pertinent to the developing political agenda of the Soviet state. The ‘deluge’ of popular literature in the 1920s and 1930s suddenly challenged the lexical and orthographic norms of the limited ecclesiastical publications from the 1880s onward. Newspapers were now written in orthographies and in word forms that the locals would understand. Textbooks were written to address the separate needs of both adults and children. New concepts were introduced into the language.
This was the beginning of a renaissance and period of enlightenment (Rueter, 2013). The linguistically oriented population can also find writings to their delight, especially lexical items specific to a given publication, and orthographically documented specifics of phonetics. The project is financially supported by the Kone Foundation in Helsinki and is part of the Foundation’s Language Programme. One of the key objectives of the Kone Foundation Language Programme is to support a culture of openness and interaction in linguistic research, but also to promote citizen science as a tool for the participation of the language community in research. In addition to sharing this aspiration, our objective within the Language Programme is to make sure that old and new corpora in Uralic languages are made available for the open and interactive use of the academic community as well as the language societies. Wordlists are available in 17 languages, but they have not been tokenized, lemmatized, or otherwise processed. This approach was verified with the scholars, and we consider the wordlists raw data for linguists. Our data is used for creating morphological analyzers and online dictionaries at the universities of Helsinki and Tromsø, for instance. In order to reach the targets, we will produce not only the digitized materials but also the development tools for supporting linguistic research and citizen science. The Digitization Project of Kindred Languages is thus linked with research in language technology. The mission is to improve the usage and usability of digitized content. During the project, we have advanced methods that will refine the raw data for further use, especially in linguistic research. How does the library meet these objectives, which appear to lie beyond its traditional playground?
The written materials from this period are a gold mine, so how can we retrieve these hidden linguistic treasures from a stack of more than 200,000 pages of literature in various Uralic languages? The problem is that the machine-encoded text produced by optical character recognition (OCR) often contains too many mistakes to be used as such in research. The mistakes in OCRed texts must be corrected. To enhance the OCRed texts, the National Library of Finland developed an open-source OCR editor that enables the editing of machine-encoded text for the benefit of linguistic research. This tool was necessary because these rare and peripheral prints often include characters that have since fallen out of use, which are sadly neglected by modern OCR software developers but belong to the historical context of kindred languages and are thus an essential part of the linguistic heritage (van Hemel, 2014). Our crowdsourcing tool is essentially an editor for the ALTO XML format. It consists of a back-end for managing users, permissions, and files, which communicates through a REST API with a front-end interface, that is, the actual editor for correcting the OCRed text. The enhanced XML files can be retrieved from the Fenno-Ugrica collection for further purposes. Could the crowd do this work to support academic research? The challenge in crowdsourcing lies in its nature. In traditional crowdsourcing, the targets have often been split into several microtasks that do not require any special skills from the anonymous people, a faceless crowd. This way of crowdsourcing may produce quantitative results, but from the point of view of research, there is a danger that the needs of linguists are not necessarily met. A further notable downside is the lack of a shared goal or social affinity. There is no reward in the traditional methods of crowdsourcing (de Boer et al., 2012).
Also, there has been criticism that digital humanities makes the humanities too data-driven and oriented towards quantitative methods, losing the values of critical qualitative methods (Fish, 2012). On top of that, the downsides of traditional crowdsourcing become more apparent when you leave the Anglophone world. Our potential crowd is geographically scattered across Russia. This crowd is linguistically heterogeneous, speaking 17 different languages. In many cases the languages are close to extinction or in need of revitalization, and the native speakers do not always have Internet access, so an open call for crowdsourcing would not have produced satisfactory results for linguists. Thus, one has to identify carefully the potential niches to complete the needed tasks. When using the help of a crowd in a project that aims to support both linguistic research and the survival of endangered languages, the approach has to be a different one. In nichesourcing, the tasks are distributed amongst a small crowd of citizen scientists (communities). Although communities provide smaller pools from which to draw resources, their specific richness in skill is suited to the complex tasks with high-quality product expectations found in nichesourcing. Communities have a purpose and identity, and their regular interaction engenders social trust and reputation. These communities can respond to the needs of research more precisely (de Boer et al., 2012). Instead of repetitive and rather trivial tasks, we are trying to utilize the knowledge and skills of citizen scientists to provide qualitative results. In nichesourcing, we hand out assignments that precisely fill the gaps in linguistic research. A typical task would be editing and collecting words in those fields of vocabulary where researchers require more information. For instance, there is a lack of Hill Mari words and terminology in anatomy.
We have digitized books on medicine, and we could try to track the words related to human organs by assigning citizen scientists to edit and collect words with the OCR editor. From the perspective of nichesourcing, it is essential that altruism plays a central role when the language communities are involved. In nichesourcing, our goal is to reach a certain level of interplay, where the language communities benefit from the results. For instance, the corrected words in Ingrian will be added to an online dictionary, which is made freely available to the public, so that society can benefit, too. This objective of interplay can be understood as an aspiration to support the endangered languages and the maintenance of lingual diversity, but also as a servant of ‘two masters’: research and society.
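As an illustration of the kind of raw wordlist mentioned in the abstract above, the following minimal Python sketch counts the word forms stored in the `CONTENT` attributes of `String` elements in an ALTO XML document. It is not the project's editor or its code: the sample document, the function names, and the decision to apply no case folding or lemmatization are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny hand-made ALTO-like sample (hypothetical content, not project data).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="kala"/><SP/><String CONTENT="vesi"/>
    </TextLine>
    <TextLine>
      <String CONTENT="Kala"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def raw_wordlist(alto_xml: str) -> Counter:
    """Count word forms exactly as they appear in String/@CONTENT.

    No tokenization beyond the ALTO String boundaries, no case folding,
    and no lemmatization: the result is raw data for later processing.
    """
    root = ET.fromstring(alto_xml)
    counts = Counter()
    for element in root.iter():
        if local_name(element.tag) == "String":
            content = element.get("CONTENT")
            if content:
                counts[content] += 1
    return counts


if __name__ == "__main__":
    for word, count in sorted(raw_wordlist(SAMPLE_ALTO).items()):
        print(word, count)
```

Matching on the local tag name keeps the sketch independent of the ALTO schema version, since different versions use different namespace URIs.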
Abstract:
The article-based doctoral dissertation deals with adult individuals in Western societies who were born into multilingual and multicultural families and have parents of different nationalities. The study’s participants grew up outside their parents’ countries of origin and relate to a multitude of bonds that link them across various cultures, languages and places. The study explores the social dimension of cultural belonging and examines diverse approaches that enable the participants to create notions of belonging and identification despite possessing at times contradictory transnational allegiances. The work offers new perspectives on transnational belonging and makes a timely contribution to discussions in the fields of cultural heritage studies, ethnology and transnational studies. The dissertation combines qualitative research methods with an insider perspective. The empirical material is based on semi-structured interviews with fifteen participants, among whom are also the author’s siblings. The study addresses the relevance of the author’s personal situatedness and her multi-faceted roles as well as ethical concerns related to the methodological approach of insider research. The social dimension of cultural identities affects both the participants’ identification with their multiple attachments and their language use in everyday life. The key research findings present interrelated discussions of the participants’ notion of being a mixture, the importance of family bonds and multilingualism, a specific mixed family lifestyle, the notion of non-belonging and the study participants’ sense of otherness as a means of creating communality with others. The study discusses the participants’ various life strategies for constructing a coherent sense of belonging: flexible relativising, juggling multiple affiliations, “blending in”, and a sense of ironic nation-ness.
The author argues that multicultural belonging is inextricably connected to an association with multiple languages, cultures and places. Multicultural belonging is relational and depends on the context, social relationships and locations. The study proposes that multicultural belonging creates a tolerant understanding of membership and enables experiences of cosmopolitanism and selected notions of allegiance.
Abstract:
There are more than 7000 languages in the world, and many of these have emerged through linguistic divergence. While questions related to the drivers of linguistic diversity have been studied before, including studies with quantitative methods, there is no consensus as to which factors drive linguistic divergence, and how. In the thesis, I have studied linguistic divergence with a multidisciplinary approach, applying the framework and quantitative methods of evolutionary biology to language data. With quantitative methods, large datasets may be analyzed objectively, while approaches from evolutionary biology make it possible to revisit old questions (related to, for example, the shape of the phylogeny) with new methods, and adopt novel perspectives to pose novel questions. My chief focus was on the effects exerted on the speakers of a language by environmental and cultural factors. My approach was thus an ecological one, in the sense that I was interested in how the local environment affects humans and whether this human-environment connection plays a possible role in the divergence process. I studied this question in relation to the Uralic language family and to the dialects of Finnish, thus covering two different levels of divergence. However, as the Uralic languages have not previously been studied using quantitative phylogenetic methods, nor have population genetic methods been previously applied to any dialect data, I first evaluated the applicability of these biological methods to language data. I found the biological methodology to be applicable to language data, as my results were rather similar to traditional views as to both the shape of the Uralic phylogeny and the division of Finnish dialects. I also found environmental conditions, or changes in them, to be plausible inducers of linguistic divergence: whether in the first steps in the divergence process, i.e. dialect divergence, or on a large scale with the entire language family. 
My findings concerning Finnish dialects led me to conclude that the functional connection between linguistic divergence and environmental conditions may arise through human cultural adaptation to varying environmental conditions. This is also one possible explanation on the scale of the Uralic language family as a whole. The results of the thesis bring insights on several different issues in both a local and a global context. First, they shed light on the emergence of the Finnish dialects. If the approach used in the thesis is applied to the dialects of other languages, broader generalizations may be drawn as to the inducers of linguistic divergence. This again brings us closer to understanding the global patterns of linguistic diversity. Secondly, the quantitative phylogeny of the Uralic languages, with estimated times of language divergences, yields another hypothesis as to the shape and age of the language family tree. In addition, the Uralic languages can now be added to the growing list of language families studied with quantitative methods. This will allow broader inferences as to global patterns of language evolution, and more language families can be included in constructing the tree of the world’s languages. Studying history through language, however, is only one way to illuminate the human past. Therefore, thirdly, the findings of the thesis, when combined with studies of other language families, and those for example in genetics and archaeology, bring us again closer to an understanding of human history.
Abstract:
The major hypothesis of this paper is that any deviance in syntax present in oral language will be evident in oral reading behaviour. Using Lee and Canter's Developmental Sentence Scoring technique (1971) and Y. Goodman and Burke's Reading Miscue Inventory (1972), linguistic competence was established in three male children, ages 10 to 11, patterns of strengths and weaknesses in reading were determined, and the relationships that were established were examined. Results of the study indicate that oral language behaviour is closely tied to oral reading behaviour. This type of approach can be used as a basis for a diagnosis of a reading difficulty and then a prescription for language and reading skills.
Abstract:
UANL
Abstract:
UANL
Abstract:
Thesis initially distributed as part of a pilot project of the Presses de l'Université de Montréal/Centre d'édition numérique UdeM (1997-2008) with the author's permission.
Abstract:
Problem: My thesis examines individual identity as an inquiry into personal stakes and into what constitutes hybrid identification within competing notions of authenticity. More precisely, I approach the concept of hybrid identifications as in-between zones of linguistic code-switching and as a negotiation of spaces in continual movement between cultures and languages. Such negotiation generates tensions and/or brings about a creative connection. Tensions are inherent in any construction of identity where the lines that define people are not specific to one culture or one language, where notions of pure identity are contested and common codes of belonging are compromised. The creative connection occurs in instances where linguistic code-switching or the negotiation of spaces produces open and fluid movement between competing codes of reference and differences across racial discrimination, sexuality, culture, and language. The works I have selected represent a cross-section of several minority migrant authors in North America who switch linguistic codes in this way. The works detail time and space in their treatment of identity and in the way they delineate hybridity in the following texts: The Woman Warrior by Maxine Hong Kingston (1975-76), Hunger of Memory by Richard Rodriguez (1982), Comment faire l’amour avec un nègre sans se fatiguer by Dany Laferrière (1985), Borderlands/La Frontera by Gloria Anzaldúa (1987), Lost in Translation by Eva Hoffman (1989), Avril ou l’anti-passion by Antonio D’Alfonso (1990) and Chorus of Mushrooms by Hiromi Goto (1994). Issues/Questions: The notion of hybrid identification is a provocative subject. It calls pure identity into question.
It is a subject that has prompted much discussion in literature, politics, society, linguistics, and communications, as well as within philosophical circles. The subject is complicated because it shakes the foundation of the fixed, structured spaces of identity in its cultural and linguistic meaning. For example, the notion of homeland is not represented exclusively by the country of origin or the host country. Likewise, notions of race, ethnic belonging, and sexual spaces are sometimes received negatively if they stem from socially accepted, normalized codes imposed from outside. Such codes of meaning are often defined by the label of heterosexual, white identification. In today’s globalized environment, more than ever, a person must negotiate who they are, in the sense of belonging to themselves as an individual, in the face of local, regional, national, and even global models of subjectivity. We can interpret this movement as a series of superimposed layers of meaning. When we meet a person for the first time, we see only the top layer. Their inner self, moreover, is hidden by numerous superimposed layers (see Joseph D. Straubhaar). Beneath that top layer, however, lie many other layers, and, as with an onion, they must be peeled away one by one for a person’s full individuality to be revealed and understood. A person’s core is a crucial starting point for contrasting who they were with the way they are continually transforming. This base, or core, depends on the moment and includes, but is not limited to, their origins, childhood environment and experiences, education, notion of family, and friendships. In addition, notions of self-love and love for others, of altruism, are also important.
There is a reciprocal relationship between self and other that establishes our degree of self-esteem. Because of globalization, the way we understand culture, indeed how we consume and define culture, is rapidly becoming a phenomenon of displacement. Within this arena of globalized culture, the way in which people are originally Chinese, Mexican, Italian, or other, and continue their cultural evolution, is no longer as easily defined as before. Approach: My thesis thus explores hybrid subjectivity as a site of tensions and/or creative relations between cultures and languages. Although I in no way wish to simplify either the process or the questions of self-identification, it seems to me that hybrid subjectivity is today a growing reality in the globalized arena of culture. This process of exchange is particularly complex among migrant populations in conflict with their desire to integrate into the newly adopted spaces, that is, their host country. This real desire to belong can conflict with the desire to keep the original spaces of the culture defined by their country of origin. Thus, the earlier references of a person’s identification, the foundations of their individuality, their core, may not always correspond to, or function harmoniously with, the external references and changing layers of identification that they take on from the host country. Since our politics, religions, and educational institutions stem from national representations of culture and community, the process of identification and the creation of one’s outward individuality are shaped by contact with these institutions.
The way a person seeks identification between personal and public spaces thus determines the degree of conflict and/or creative connection experienced between the modes and codes of cultural and linguistic spaces. Consequently, the identification of migrant populations suggests that the “community and culture will represent both a hybridization of home and host cultures” (Straubhaar 27). Much has been written about hybridity and questions of identity and homeland; this thesis, however, addresses the creative value of cultural and linguistic code-switching. What the literature will show: The platform from which I explore hybridity therefore moves between Homi Bhabha’s postcolonial interpretation of the hybrid third space; Mikhail Bakhtin’s model of heteroglossia, which encompasses several of my examples; Roland Barthes’s representation of identity as a transgressive space, which serves as a reference model; and Chantal Zabus’s contribution on the palimpsest and African code-switching. I also use Sherry Simon’s model of the hybrid urban space of Montreal, which establishes an important link with the value of cultural and linguistic exchange, and the analyses of Janet Paterson. Indeed, the way she treats the figure of the Other in literary models in Quebec provides a regional and national insight into hybrid identification. Finally, Doris Sommer’s exploration of bilingualism as an aesthetic and even humorous space of identification situates hybridity in a space of creative encounter. Outcome: My approach in this thesis does not claim to solve the problems that may arise from the platforms of hybrid subjectivity. For that reason, I avoid any political or nationalist approach to identity that rejects hybrid identification.
Likewise, I do not offer an in-depth discussion of postcolonial questions. The aim of this thesis is to show to what extent hybrid subjectivity can be a zone of creative relation when code-switching allows more intimate communicative exchanges between cultures and languages. It is a space that becomes creative because it fosters a more open attitude towards the different fields that run through culture: language as well as sexuality, politics, or religion. Hybrid zones of identification allow us to challenge outdated traditions, customs, modes of communication, and non-acceptance, all of which imprison desire and prevent us from exploring and adopting codes outside the norms and models of culture contained in the dominant white discourse of globalized cultural and linguistic belonging. It thus appears that these zones of multi-ethnic relations demand more attention from academic circles, since the population of urban centres across North America is increasingly fed by other kinds of populations. There is therefore a real need to establish sincere communication that would allow the population to understand the adopted populations properly. It is an invitation to foster a closer relationship with one another. It is clear, however, that effective communication across the boundaries of linguistic, cultural, sexual, religious, and political codes requires continual negotiation. But such negotiation can foster a fairer understanding of differences (cultural or linguistic) if academic institutions offer curricula that integrate migrant literatures more fully. My thesis aims to illustrate (through its choice of literature) hybrid identification as an important reality in the globalized cultures that continue to grow today.
Geographical spaces keep us apart from one another, but our consumption of exotic products, cultural or otherwise, and even our consumption of the other, has narrowed appreciably over the past two decades, and indicators suggest that this process is not a trend but rather a new way of experiencing life and knowing others. Thus the markers that form our external boundaries, as well as the markers that define us from within, call for careful examination of these inter(trans)cultural issues, especially if we wish to hold on successfully to present languages and cultural codes while fostering cultural and linguistic diversity. KEYWORDS: hybrid identification, open movement, linguistic code-switching, negotiation of spaces, tensions, creative connectivity
Abstract:
This thesis summarizes the results of studies on a syntax-based approach to translation between Malayalam, one of the Dravidian languages, and English, and on the development of the major modules of a prototype machine translation system from Malayalam to English. The development of the system is a pioneering effort for the Malayalam language, unattempted by previous researchers. The computational models chosen for the system are the first of their kind for Malayalam. An in-depth study has been carried out on the design of the computational models and data structures needed for the different modules required for the prototype system: a morphological analyzer, a parser, a syntactic structure transfer module, and a target-language sentence generator. The list of part-of-speech tags, chunk tags, and the hierarchical dependencies among the chunks required for the translation process has also been generated. In the development process, the major goals are (a) accuracy of translation, (b) speed, and (c) space. Accuracy-wise, smart tools for handling transfer grammar and translation standards, including equivalent words, expressions, phrases, and styles in the target language, are to be developed. The grammar should be optimized with a view to obtaining a single correct parse and hence a single translated output. Speed-wise, innovative use of corpus analysis, an efficient parsing algorithm, efficient data structure design, and run-time frequency-based rearrangement of the grammar, which substantially reduces parsing and generation time, are required. The space requirement also has to be minimised.
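The abstract above does not say how the run-time frequency-based rearrangement of the grammar works. One possible reading, sketched below in Python, is to count how often each rule matches and keep the rule list sorted so that frequently used rules are tried first. The `Rule` and `FrequencyOrderedGrammar` names, the chunk-tag patterns, and the exact-match lookup are all hypothetical and stand in for whatever the thesis actually uses.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Rule:
    name: str
    pattern: Tuple[str, ...]  # hypothetical sequence of chunk tags
    hits: int = 0             # how many times this rule has matched


class FrequencyOrderedGrammar:
    """Rule list that is re-sorted at run time by match frequency."""

    def __init__(self, rules: Sequence[Rule]) -> None:
        self.rules = list(rules)

    def match(self, tags: Sequence[str]) -> Optional[Rule]:
        """Return the first rule whose pattern equals `tags`, or None."""
        wanted = tuple(tags)
        for rule in self.rules:
            if rule.pattern == wanted:
                rule.hits += 1
                # Stable sort: frequent rules move to the front, while
                # rules with equal counts keep their relative order.
                self.rules.sort(key=lambda r: r.hits, reverse=True)
                return rule
        return None


if __name__ == "__main__":
    grammar = FrequencyOrderedGrammar([
        Rule("S -> NP VP", ("NP", "VP")),
        Rule("S -> NP NP VP", ("NP", "NP", "VP")),
    ])
    grammar.match(["NP", "NP", "VP"])
    grammar.match(["NP", "NP", "VP"])
    print([rule.name for rule in grammar.rules])
```

With a linear scan over the rules, moving frequent rules forward shortens the average search, which is one way such a rearrangement could reduce parsing time.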
Abstract:
Few major research efforts are under way in the field of handwritten word recognition (HWR) for Indian languages. This paper surveys the major works on offline and online handwritten word recognition. Techniques involved in word recognition are also discussed. Major works carried out in Bangla, Urdu, Tamil and Hindi are mentioned in this paper. Advances towards HWR in other Indian languages are also discussed, as are applications of offline HWR.
Abstract:
In this publication, we report on an online survey that was carried out among parallel programmers. More than 250 people worldwide have submitted answers to our questions, and their responses are analyzed here. Although not statistically sound, the data we provide give useful insights about which parallel programming systems and languages are known and in actual use. For instance, the collected data indicate that for our survey group MPI and (to a lesser extent) C are the most widely used parallel programming system and language, respectively.