948 results for Modern language
Abstract:
Hypertexts are digital texts characterized by interactive hyperlinking and a fragmented textual organization. Increasingly prominent since the early 1990s, hypertexts have become a common text type both on the Internet and in a variety of other digital contexts. Although studied widely in disciplines like hypertext theory and media studies, formal linguistic approaches to hypertext continue to be relatively rare. This study examines coherence negotiation in hypertext with particular reference to hypertext fiction. Coherence, or the quality of making sense, is a fundamental property of textness. Proceeding from the premise that coherence is a subjectively evaluated property rather than an objective quality arising directly from textual cues, the study focuses on the processes through which readers interact with hyperlinks and negotiate continuity between hypertextual fragments. The study begins with a typological discussion of textuality and an overview of the historical and technological precedents of modern hypertexts. Then, making use of text linguistic, discourse analytical, pragmatic, and narratological approaches to textual coherence, the study takes established models developed for analyzing and describing conventional texts, and examines their applicability to hypertext. Primary data derived from a collection of hyperfictions is used throughout to illustrate the mechanisms in practice. Hypertextual coherence negotiation is shown to require the ability to operate cognitively between local and global coherence by means of processing lexical cohesion, discourse topical continuities, inferences and implications, and shifting cognitive frames. The main conclusion of the study is that the style of reading required by hypertextuality fosters a new paradigm of coherence.
Defined as fuzzy coherence, this new approach to textual sensemaking is predicated on an acceptance of the coherence challenges readers experience when the act of reading comes to involve repeated encounters with referentially imprecise hyperlinks and discourse topical shifts. A practical application of fuzzy coherence is shown to be in effect in the way coherence is actively manipulated in hypertext narratives.
Abstract:
This dissertation is a descriptive grammar of Ternate Chabacano, a Spanish-lexifier Creole spoken by 3,000 people in the town of Ternate, Philippines. The dissertation offers an analysis of the phonological, morphological, and syntactic system of the language. It includes an overview of the historical background, the current situation of the speech community and a collection of annotated texts. Ternate Chabacano shares many characteristics with its main adstrate language Tagalog as well as the dialectal varieties of Spanish. At present, English also exerts an influence, though mainly on the lexicon. The description offered is based on fieldwork conducted in Ternate. Spoken language collected through thematic interviews forms the main type of the material analysed. Information regarding the informants and text types is included in the examples. Ternate Chabacano has a five-vowel system and 17 consonant phonemes. The morphology of the language is largely isolating. Clitics are used extensively for expressing adverbial relations. The verbal system is based on the preverbal markers that express the categories of tense, modality and aspect, among which aspect is the main dimension. Complex predicates and verbal chains are used in order to further distinguish aspect and modality, as well as changes of voice and valency. Intransitive verbs express motion, states, and reflexive actions, even though the majority of verbs can occur in both intransitive and transitive clauses. Ternate Chabacano is a nominative-accusative type language but the typological configuration of the Philippine languages influences the marking of its constituents. A case in point is the nominal determination system. The basic constituent order in a clause is VSO. Equative and attributive clauses are formed by juxtaposition while the locative clauses feature a copula. Indefinite terms are expressed through existential constructions.
The negation of existential clauses differs from standard negation but both are intensified in the same way. In spoken discourse, tag-questions are common. Pragmatic elements and social formulas largely reflect the corresponding Tagalog expressions. Coordination and subordination typically occur without overt markers but a variety of markers exists for expressing different relations, especially those made explicit by adverbial clauses. Verbal chains form a continuum from serial verbs to complementation and ultimately to coordination.
Abstract:
Language Documentation and Description as Language Planning: Working with Three Signed Minority Languages
Sign languages are minority languages that typically have a low status in society. Language planning has traditionally been controlled from outside the sign-language community. Even though signed languages lack a written form, dictionaries have played an important role in language description and as tools in foreign language learning. The background to the present study on sign language documentation and description as language planning is empirical research in three dictionary projects in Finland-Swedish Sign Language, Albanian Sign Language, and Kosovar Sign Language. The study consists of an introductory article and five detailed studies which address language planning from different perspectives. The theoretical basis of the study is sociocultural linguistics. The research methods used were participant observation, interviews, focus group discussions, and document analysis. The primary research questions are the following: (1) What is the role of dictionary and lexicographic work in language planning, in research on undocumented signed languages, and in relation to the language community as such? (2) What factors are particular challenges in the documentation of a sign language and should therefore be given special attention during lexicographic work? (3) Is a conventional dictionary a valid tool for describing an undocumented sign language? The results indicate that lexicographic work has a central part to play in language documentation, both as part of basic research on undocumented sign languages and for status planning. Existing dictionary work has contributed new knowledge about the languages and the language communities.
The lexicographic work adds to the linguistic advocacy work done by the community itself with the aim of vitalizing the language, empowering the community, receiving governmental recognition for the language, and improving the linguistic (human) rights of the language users. The history of signed languages as low status languages has consequences for language planning and lexicography. One challenge that the study discusses is the relationship between the sign-language community and the hearing sign linguist. In order to make it possible for the community itself to take the lead in a language planning process, raising linguistic awareness within the community is crucial. The results give rise to questions of whether lexicographic work is of more importance for status planning than for corpus planning. A conventional dictionary as a tool for describing an undocumented sign language is criticised. The study discusses differences between signed and spoken/written languages that are challenging for lexicographic presentations. Alternative electronic lexicographic approaches including both lexicon and grammar are also discussed.
Keywords: sign language, Finland-Swedish Sign Language, Albanian Sign Language, Kosovar Sign Language, language documentation and description, language planning, lexicography
Abstract:
HFST, the Helsinki Finite-State Technology (hfst.sf.net), is a framework for compiling and applying linguistic descriptions with finite-state methods. HFST currently connects some of the most important finite-state tools for creating morphologies and spellers into one open-source platform, and it supports extending and improving the descriptions with weights so that statistical information can be modeled. HFST offers a path from language descriptions to efficient language applications in key environments and operating systems. HFST also provides an opportunity to exchange transducers between different software providers in order to get the best out of each finite-state library.
Abstract:
FinnWordNet is a wordnet for Finnish that complies with the format of the Princeton WordNet (PWN) (Fellbaum, 1998). It was built by having human translators translate the Princeton WordNet 3.0 synsets into Finnish. It is open source and contains 117,000 synsets. The Finnish translations were inserted into the PWN structure, resulting in a bilingual lexical database. In natural language processing (NLP), wordnets have been used for infusing computers with semantic knowledge, on the assumption that humans already have a sufficient amount of this knowledge. In this paper we present a case study of using wordnets as an electronic dictionary. We tested whether native Finnish speakers benefit from using a wordnet while completing English sentence completion tasks. We found that using either an English wordnet or a bilingual English-Finnish wordnet significantly improves performance in the task. This should be taken into account when setting standards and comparing human and computer performance on these tasks.
Abstract:
The EU Directive harmonising copyright, Directive 2001/29/EC, has been implemented in all META-NORD countries. The licensing schemas of open content/open source and META-SHARE as well as CLARIN are discussed briefly. The status of the licensing of tools and resources available at the consortium partners is outlined. The aim of the article is to compare a set of open content and open source licenses and to provide some guidance on the optimal use of the licenses provided by META-NET and CLARIN for licensing tools and resources for the benefit of the language technology community.
Abstract:
We use parallel weighted finite-state transducers to implement a part-of-speech tagger, which obtains state-of-the-art accuracy when used to tag the Europarl corpora for Finnish, Swedish and English. Our system consists of a weighted lexicon and a guesser combined with a bigram model factored into two weighted transducers. We use both lemmas and tag sequences in the bigram model, which guarantees reliable bigram estimates.
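The combination of a weighted lexicon, a guesser and a bigram model can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain Python dictionaries rather than weighted finite-state transducers, it conditions only on the previous tag (the abstract's model also uses lemmas), and every word, tag and weight below is a hypothetical value chosen for illustration. Weights are treated as costs, as in the tropical semiring, so the best analysis is the one with the smallest total weight.

```python
def viterbi_tag(words, lexicon_w, bigram_w, guesser_w, default_w=10.0):
    """Return the minimum-weight tag sequence for a list of words.

    lexicon_w: word -> {tag: weight}       (stand-in for the weighted lexicon)
    bigram_w:  (prev_tag, tag) -> weight   (stand-in for the bigram model)
    guesser_w: {tag: weight} used for words missing from the lexicon
    default_w: weight charged for tag bigrams never seen (an assumed constant)
    """
    # best maps each possible last tag to (total weight, tag sequence so far).
    best = {"<s>": (0.0, [])}
    for word in words:
        next_best = {}
        for tag, emit_w in lexicon_w.get(word, guesser_w).items():
            candidates = [
                (w + bigram_w.get((prev, tag), default_w) + emit_w, seq + [tag])
                for prev, (w, seq) in best.items()
            ]
            next_best[tag] = min(candidates, key=lambda c: c[0])
        best = next_best
    return min(best.values(), key=lambda c: c[0])[1]


# Hypothetical toy weights, not taken from the paper.
LEXICON_W = {
    "the": {"DET": 0.1},
    "can": {"NOUN": 1.2, "VERB": 0.9, "AUX": 0.4},
    "rusts": {"VERB": 0.3, "NOUN": 2.0},
}
BIGRAM_W = {
    ("<s>", "DET"): 0.2,
    ("DET", "NOUN"): 0.3,
    ("DET", "VERB"): 3.0,
    ("DET", "AUX"): 3.0,
    ("NOUN", "VERB"): 0.5,
    ("NOUN", "NOUN"): 2.0,
    ("AUX", "VERB"): 0.4,
    ("VERB", "VERB"): 2.5,
}
GUESSER_W = {"NOUN": 1.0, "VERB": 2.0}

tags = viterbi_tag(["the", "can", "rusts"], LEXICON_W, BIGRAM_W, GUESSER_W)
```

With these toy weights, the bigram costs outweigh the lexicon's preference for "can" as an auxiliary, so the sequence DET NOUN VERB is selected.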
Abstract:
One of the most challenging tasks in building language resources is copyright license management. There are several reasons for this. First of all, the current European copyright system is designed to a large extent to satisfy commercial actors, e.g. publishers and record companies. This means that the scope and duration of the rights are very extensive, and there are even certain forms of protection that do not exist elsewhere in the world, e.g. the database right. On the other hand, the exceptions for research and teaching are typically very narrow.
Abstract:
Language software applications encounter new words, e.g., acronyms, technical terminology, names or compounds of such words. In order to add new words to a lexicon, we need to indicate their inflectional paradigm. We present a new generally applicable method for creating an entry generator, i.e. a paradigm guesser, for finite-state transducer lexicons. As a guesser tends to produce numerous suggestions, it is important that the correct suggestions be among the first few candidates. We prove some formal properties of the method and evaluate it on Finnish, English and Swedish full-scale transducer lexicons. We use the open-source Helsinki Finite-State Technology to create finite-state transducer lexicons from existing lexical resources and automatically derive guessers for unknown words. The method has a recall of 82-87 % and a precision of 71-76 % for the three test languages. The model needs no external corpus and can therefore serve as a baseline.
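The idea of a paradigm guesser that ranks its suggestions can be sketched as follows. This is a simple suffix-counting stand-in in plain Python, not the finite-state method the abstract describes; the toy lexicon and the paradigm labels (`N_TALO`, `N_KATU`, `N_NAINEN`) are hypothetical. Like the method in the abstract, the sketch learns only from the lexicon itself and needs no external corpus.

```python
from collections import Counter, defaultdict


def train_suffix_guesser(lexicon, max_suffix=4):
    """Count how often each word-final suffix co-occurs with each paradigm."""
    counts = defaultdict(Counter)
    for word, paradigm in lexicon:
        for n in range(1, min(max_suffix, len(word)) + 1):
            counts[word[-n:]][paradigm] += 1
    return counts


def guess_paradigms(word, counts, max_suffix=4):
    """Rank paradigms for an unseen word, trusting longer suffix matches first."""
    ranked = []
    for n in range(min(max_suffix, len(word)), 0, -1):
        # Within one suffix length, more frequent paradigms come first.
        for paradigm, _ in counts.get(word[-n:], Counter()).most_common():
            if paradigm not in ranked:
                ranked.append(paradigm)
    return ranked


# Hypothetical toy lexicon of (word, paradigm label) pairs.
LEXICON = [
    ("talo", "N_TALO"),
    ("valo", "N_TALO"),
    ("katu", "N_KATU"),
    ("nainen", "N_NAINEN"),
    ("hevonen", "N_NAINEN"),
    ("punainen", "N_NAINEN"),
]

counts = train_suffix_guesser(LEXICON)
guesses = guess_paradigms("sininen", counts)
```

Because the correct paradigm should appear among the first few candidates, a guesser of this kind would normally be evaluated on the top-ranked suggestions rather than on the full list.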
Abstract:
Finite-state methods have been adopted widely in computational morphology and related linguistic applications. To enable efficient development of finite-state based linguistic descriptions, these methods should be a freely available resource for academic language research and the language technology industry. The following needs can be identified: (i) a registry that maps the existing approaches, implementations and descriptions, (ii) managing the incompatibilities of the existing tools, (iii) increasing synergy and complementary functionality of the tools, (iv) persistent availability of the tools used to manipulate the archived descriptions, (v) an archive for free finite-state based tools and linguistic descriptions. Addressing these challenges contributes to building a common research infrastructure for advanced language technology.