849 results for Suda lexicon.
Abstract:
This dissertation is a descriptive grammar of Ternate Chabacano, a Spanish-lexifier Creole spoken by 3,000 people in the town of Ternate, Philippines. The dissertation offers an analysis of the phonological, morphological, and syntactic system of the language. It includes an overview of the historical background, the current situation of the speech community and a collection of annotated texts. Ternate Chabacano shares many characteristics with its main adstrate language Tagalog as well as the dialectal varieties of Spanish. At present, English also exerts an influence, mainly on the lexicon. The description offered is based on fieldwork conducted in Ternate. Spoken language collected through thematic interviews forms the main type of material analysed. Information regarding the informants and text types is included in the examples. Ternate Chabacano has a five-vowel system and 17 consonant phonemes. The morphology of the language is largely isolating. Clitics are used extensively for expressing adverbial relations. The verbal system is based on the preverbal markers that express the category of tense, modality and aspect, among which aspect is the main dimension. Complex predicates and verbal chains are used in order to further distinguish aspect and modality, as well as changes of voice and valency. Intransitive verbs express motion, states, and reflexive actions, even though the majority of verbs can occur in both intransitive and transitive clauses. Ternate Chabacano is a nominative-accusative type language but the typological configuration of the Philippine languages influences the marking of its constituents. A case in point is the nominal determination system. The basic constituent order in a clause is VSO. Equative and attributive clauses are formed by juxtaposition while the locative clauses feature a copula. Indefinite terms are expressed through existential constructions.
The negation of existential clauses differs from standard negation but both are intensified in the same way. In spoken discourse, tag-questions are common. Pragmatic elements and social formulas reflect largely the corresponding Tagalog expressions. Coordination and subordination occur typically without overt markers but a variety of markers exists for expressing different relations, especially those made explicit by adverbial clauses. Verbal chains form a continuum from serial verbs to complementation and ultimately to coordination.
Abstract:
The representation of morphologically complex words in the mental lexicon and their neurocognitive processing has been a vigorously debated topic in psycholinguistics and the cognitive neuroscience of language. This thesis investigates the effect of stimulus modality on morphological processing, the spatiotemporal dynamics of the neural processing of inflected (e.g., work+ed) and derived (e.g., work+er) words and their interaction, using the Finnish language. Overall, the results suggest that the constituent morphemes of isolated written and spoken inflected words are accessed separately, whereas spoken derived words activate both their full form and the constituent morphemes. The processing of both spoken and written inflected words elicited larger N400 responses than monomorphemic words (Study I), whereas the responses to spoken derived words did not differ from those to monomorphemic words (Study IV). Spoken inflected words elicited a larger left-lateralized negativity and greater source strengths in the left temporal cortices than derived words (Study IV). Thus, the results suggest different cortical processing for derived and inflected words. Moreover, the neural mechanisms underlying inflection and derivation seem to be not only different, but also independent, as indexed by the linear summation of the responses to derived and inflected stimuli in a combined (derivation+inflection) condition (Study III). Furthermore, the processing of meaningless, spoken derived pseudowords was more difficult than that of existing derived words, as indexed by a larger N400-type effect for the pseudowords. However, no differences were observed between meaningful derived pseudowords and existing derived words (Study II). The results of Study II suggest that semantic compatibility between morphemes has a crucial role in successful morphological analysis.
As a methodological note, time-locking the auditory event-related potentials/fields (ERP/ERF) to the suffix onset revealed the processes related to morphological analysis more precisely (Studies II and IV), which also enables comparison of the neural processes in different modalities (Study I).
Abstract:
Language Documentation and Description as Language Planning: Working with Three Signed Minority Languages. Sign languages are minority languages that typically have a low status in society. Language planning has traditionally been controlled from outside the sign-language community. Even though signed languages lack a written form, dictionaries have played an important role in language description and as tools in foreign language learning. The background to the present study on sign language documentation and description as language planning is empirical research in three dictionary projects in Finland-Swedish Sign Language, Albanian Sign Language, and Kosovar Sign Language. The study consists of an introductory article and five detailed studies which address language planning from different perspectives. The theoretical basis of the study is sociocultural linguistics. The research methods used were participant observation, interviews, focus group discussions, and document analysis. The primary research questions are the following: (1) What is the role of dictionary and lexicographic work in language planning, in research on undocumented signed language, and in relation to the language community as such? (2) What factors are particular challenges in the documentation of a sign language and should therefore be given special attention during lexicographic work? (3) Is a conventional dictionary a valid tool for describing an undocumented sign language? The results indicate that lexicographic work has a central part to play in language documentation, both as part of basic research on undocumented sign languages and for status planning. Existing dictionary work has contributed new knowledge about the languages and the language communities.
The lexicographic work adds to the linguistic advocacy work done by the community itself with the aim of vitalizing the language, empowering the community, receiving governmental recognition for the language, and improving the linguistic (human) rights of the language users. The history of signed languages as low status languages has consequences for language planning and lexicography. One challenge that the study discusses is the relationship between the sign-language community and the hearing sign linguist. In order to make it possible for the community itself to take the lead in a language planning process, raising linguistic awareness within the community is crucial. The results give rise to questions of whether lexicographic work is of more importance for status planning than for corpus planning. A conventional dictionary as a tool for describing an undocumented sign language is criticised. The study discusses differences between signed and spoken/written languages that are challenging for lexicographic presentations. Alternative electronic lexicographic approaches including both lexicon and grammar are also discussed. Keywords: sign language, Finland-Swedish Sign Language, Albanian Sign Language, Kosovar Sign Language, language documentation and description, language planning, lexicography
Abstract:
We use parallel weighted finite-state transducers to implement a part-of-speech tagger, which obtains state-of-the-art accuracy when used to tag the Europarl corpora for Finnish, Swedish and English. Our system consists of a weighted lexicon and a guesser combined with a bigram model factored into two weighted transducers. We use both lemmas and tag sequences in the bigram model, which guarantees reliable bigram estimates.
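The combination of lexicon weights and bigram weights described above can be illustrated with a small sketch. This is not the authors' transducer implementation: it is a plain Viterbi search over hypothetical toy weights (negative log-probabilities, lower is better), it uses tag bigrams only rather than lemmas and tags, and it has no guesser, so every word is assumed to be in the lexicon. The names `LEXICON`, `BIGRAM`, `UNSEEN` and `tag` are illustrative assumptions.

```python
# Hypothetical toy weights: negative log-probabilities, lower is better.
LEXICON = {
    "the": {"DET": 0.1},
    "dog": {"NOUN": 0.3, "VERB": 2.5},
    "barks": {"VERB": 0.7, "NOUN": 1.2},
}
BIGRAM = {
    ("<s>", "DET"): 0.4, ("<s>", "NOUN"): 1.5, ("<s>", "VERB"): 2.5,
    ("DET", "NOUN"): 0.2, ("DET", "VERB"): 3.0,
    ("NOUN", "VERB"): 0.5, ("NOUN", "NOUN"): 1.8,
    ("VERB", "NOUN"): 1.0, ("VERB", "VERB"): 2.5,
}
UNSEEN = 10.0  # assumed penalty for a tag bigram missing from the table


def tag(words):
    """Return the cheapest tag sequence under lexicon + bigram weights."""
    # best[t] = (total weight, tag path) of the cheapest path ending in tag t
    best = {"<s>": (0.0, [])}
    for word in words:
        new_best = {}
        for t, w_lex in LEXICON[word].items():
            # Weights add along a path, as in the tropical semiring.
            candidates = [
                (w_prev + BIGRAM.get((prev, t), UNSEEN) + w_lex, path + [t])
                for prev, (w_prev, path) in best.items()
            ]
            new_best[t] = min(candidates)
        best = new_best
    return min(best.values())[1]


print(tag(["the", "dog", "barks"]))
```

In a weighted finite-state setting the same search would typically be expressed as composing the input with the lexicon and bigram transducers and extracting the lowest-weight path.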
Abstract:
Language software applications encounter new words, e.g., acronyms, technical terminology, names or compounds of such words. In order to add new words to a lexicon, we need to indicate their inflectional paradigm. We present a new generally applicable method for creating an entry generator, i.e. a paradigm guesser, for finite-state transducer lexicons. As a guesser tends to produce numerous suggestions, it is important that the correct suggestions be among the first few candidates. We prove some formal properties of the method and evaluate it on Finnish, English and Swedish full-scale transducer lexicons. We use the open-source Helsinki Finite-State Technology to create finite-state transducer lexicons from existing lexical resources and automatically derive guessers for unknown words. The method has a recall of 82-87% and a precision of 71-76% for the three test languages. The model needs no external corpus and can therefore serve as a baseline.
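To make the idea of a ranked paradigm guesser concrete, here is a minimal sketch. It is a simple longest-suffix heuristic over a hypothetical toy lexicon, not the transducer-based method of the paper; the paradigm labels and the names `TOY_LEXICON`, `build_guesser` and `guess` are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon of (word, paradigm label) pairs.
TOY_LEXICON = [
    ("walk", "V_REGULAR"), ("talk", "V_REGULAR"), ("chalk", "N_S"),
    ("city", "N_Y_IES"), ("party", "N_Y_IES"), ("boy", "N_S"),
]


def build_guesser(lexicon, max_suffix=4):
    """Count how often each word-final suffix occurs with each paradigm."""
    table = defaultdict(Counter)
    for word, paradigm in lexicon:
        for n in range(1, min(max_suffix, len(word)) + 1):
            table[word[-n:]][paradigm] += 1
    return table


def guess(table, word, max_suffix=4):
    """Return candidate paradigms, those backed by longer suffixes first."""
    ranked = []
    for n in range(min(max_suffix, len(word)), 0, -1):
        counts = table.get(word[-n:], Counter())
        for paradigm, _count in counts.most_common():
            if paradigm not in ranked:
                ranked.append(paradigm)
    return ranked


table = build_guesser(TOY_LEXICON)
print(guess(table, "balk"))
print(guess(table, "unity"))
```

Ranking matters here for the reason the abstract gives: the guesser returns several candidates, so the useful measure is whether the correct paradigm appears among the first few.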
Abstract:
N-gram language models and lexicon-based word recognition are popular methods in the literature for improving recognition accuracy on online and offline handwritten data. However, very few works apply these techniques to online Tamil handwritten data. In this paper, we explore methods of developing symbol-level language models and a lexicon from a large Tamil text corpus, and their application to improving symbol and word recognition accuracies. On a test database of around 2000 words, we find that bigram language models improve symbol (3%) and word recognition (8%) accuracies. Lexicon methods offer much greater improvements (30%) in word recognition, but depend heavily on choosing the right lexicon. For comparison with lexicon and language-model based methods, we have also explored re-evaluation techniques that use expert classifiers to improve symbol and word recognition accuracies.
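One common form of lexicon-based correction, shown here only as an illustrative sketch and not as the method used in the paper, maps each recognized string to the closest lexicon entry by edit distance. The example uses Latin-script placeholder words; the function names and the word list are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free on a match
            ))
        prev = cur
    return prev[-1]


def correct(recognized, lexicon):
    """Return the lexicon entry closest to the recognizer output.

    Ties go to the entry that appears first in the lexicon.
    """
    return min(lexicon, key=lambda w: edit_distance(recognized, w))


print(correct("lexikon", ["lexicon", "lesson", "beacon"]))
```

The sketch also shows why the choice of lexicon matters: if the true word is missing from the list, the output is forced onto a wrong entry.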
Abstract:
We have benchmarked the maximum obtainable recognition accuracy on five publicly available standard word image data sets using semi-automated segmentation and a commercial OCR. These images have been cropped from camera-captured scene images, born-digital images (BDI) and street view images. Using a MATLAB-based tool that we developed, we have annotated more than 3600 word images from the five data sets at the pixel level. The word images binarized by the tool, as well as by our own midline analysis and propagation of segmentation (MAPS) algorithm, are recognized using the trial version of Nuance Omnipage OCR, and these two results are compared with the best reported in the literature. The benchmark word recognition rates obtained on the ICDAR 2003, Sign evaluation, Street view, Born-digital and ICDAR 2011 data sets are 83.9%, 89.3%, 79.6%, 88.5% and 86.7%, respectively. The results obtained from MAPS-binarized word images without the use of any lexicon are 64.5% and 71.7% for ICDAR 2003 and 2011, respectively; these values are higher than the best values reported in the literature, 61.1% and 41.2%, respectively. The MAPS result of 82.8% on the BDI 2011 data set matches the performance of the state-of-the-art method based on the power-law transform.
Abstract:
This article presents a theoretical survey of the conceptual definitions of bilingualism and of the main models that seek to explain the relationship and the cognitive processes that come into play between the semantic level and the lexical level in bilingual speakers. The review is presented in order to analyse the degree of influence and interaction that occurs between the lexicon of each language and the semantic network of a bilingual person, illustrated with Russian speakers. The work aims to contribute to social and educational interventions involving bilingual populations, which are increasingly frequent owing to growing migratory movements.
Abstract:
Participants in the Nerthus project: Javier Martín Arista (Universidad de La Rioja, principal investigator), Laboratorio de Documentación Geométrica del Patrimonio (Universidad del País Vasco UPV/EHU). Project website: http://www.nerthusproject.com/
Abstract:
Modern scholars' interest in the specialized lexicon of Latin began as early as the first decades of the twentieth century. However, the systematic treatment of scientific and technical terms, from both a theoretical and a practical standpoint, took more than half a century to reach a certain degree of development, because scholars lacked adequate instruments with which to make progress. The arrival of modern electronic technologies for the mass processing of information, together with the theoretical development of a cognitive science of communication, has given researchers the means to build powerful lexicographic instruments that can largely meet the needs created by the great advances in research across all fields of science over recent decades. The DECOTGREL, as a concordanced dictionary, is a good example of the possibilities and challenges facing the lexicography and terminology of the twenty-first century.
Abstract:
The Greek lexicon relating to marine animals constitutes a very extensive field, to which scholars of antiquity have rarely turned their attention. In this article we offer an approach to one of the groups it comprises, the molluscs, together with an attempt to identify all those species for which identification is possible, starting from the work of Athenaeus of Naucratis and supplementing his information with other data obtained from Aristotle, Aelian, Oppian and Pliny the Elder.
Abstract:
ENGLISH (pgs. 267-283): In the spring of 1963, the senior author, who is a member of the staff of the Nankai Regional Fisheries Research Laboratory, Fisheries Agency, Japanese Government, came to the Institute of Marine Resources of the University of California as a visiting investigator, bringing with him catch statistical data from the fishery in the eastern Pacific, which had been collected at the Nankai Regional Fisheries Research Laboratory (NRFRL) through September 1962, in order to conduct studies of these data in collaboration with the junior author, and with investigators of the Inter-American Tropical Tuna Commission. A general review of the long-line fishery, based on the catch statistics of the commercial fishing fleet, has been published by Suda and Schaefer (1965). In this paper we present an analysis of data respecting the size-composition of yellowfin tuna taken on long-line gear throughout the eastern Pacific between 1958 and 1962, and make some comparisons with data on size-composition of yellowfin tuna taken in the near-surface fishery, by bait boats and purse-seiners, in waters adjacent to the American coast. As has been shown by Suda and Schaefer (1965), the long-line fishery in the eastern Pacific is primarily directed toward the capture of bigeye tuna. However, considerable quantities of yellowfin tuna are also taken on this gear, and, in addition, there are substantial catches of albacore and of several species of spearfishes in some parts of the range of this fishery.
Information respecting the catch rates of yellowfin tuna, and information respecting the size-composition of the stock of yellowfin tuna being exploited by the long-line fishery, is of particular interest, because the yellowfin tuna population of the eastern Pacific is also subject to an intensive fishery by baitboats and purse-seiners which capture this species, together with skipjack, near the surface along the coast of the Americas, and around the outlying islands, in the region of California to northern Chile. SPANISH (pgs. 311-329): En la primavera de 1963, el autor principal, quien es miembro del personal del Nankai Regional Fisheries Research Laboratory, Fisheries Agency del gobierno japonés, vino al Institute of Marine Resources de la Universidad de California en calidad de investigador visitante y trajo consigo datos estadísticos de las capturas de la pesquería en el Pacífico oriental, que habían sido recolectados en el Nankai Regional Fisheries Research Laboratory (NRFRL) hasta septiembre de 1962, con el fin de hacer estudios de esos datos en colaboración con el coautor y con investigadores de la Comisión Interamericana del Atún Tropical. Una revisión general de la pesquería con palangre, basada sobre las estadísticas de captura de la flota pesquera comercial, ha sido publicada por Suda y Schaefer (1965). En este trabajo presentamos un análisis de los datos correspondientes a la composición de tamaños del atún aleta amarilla capturado con equipo palangrero en todo el Pacífico oriental, entre 1958 y 1962, y hacemos algunas comparaciones con los datos sobre la composición de tamaños del atún aleta amarilla cogido en la pesquería superficial cercana, por barcos de carnada y rederos en aguas adyacentes a la costa americana. Como ha sido demostrado por Suda y Schaefer (1965) la pesquería con palangre en el Pacífico oriental tiene como principal objeto la captura del atún ojo grande.
Sin embargo, considerables cantidades de atún aleta amarilla son capturadas también por este equipo y, además, hay también considerables capturas de albacora y de diversas especies de peces-espada en algunas partes de la región que abarca esta pesquería. La información respecto a las tasas de captura del atún aleta amarilla, y la relativa a la composición de tamaños del stock de esta especie que explota la pesquería con palangre, es de particular interés, a causa de que la población de atún aleta amarilla del Pacífico oriental es también objeto de una pesca intensiva por barcos de carnada y rederos que capturan esta especie, junto con el barrilete, cerca de la superficie a lo largo de la costa de las Américas y alrededor de las islas mar afuera, en la región desde California hasta el norte de Chile.
Abstract:
In this article we describe the methodology developed for the semiautomatic annotation of EPEC-RolSem, a Basque corpus labeled at predicate level following the PropBank-VerbNet model. The methodology presented is the product of a detailed theoretical study of the semantic nature of verbs in Basque and of their similarities to and differences from verbs in other languages. As part of the proposed methodology, we are creating a Basque lexicon on the PropBank-VerbNet model that we have named the Basque Verb Index (BVI). Our work thus dovetails with the general trend toward building lexicons from tagged corpora that is evident in work conducted for other languages. EPEC-RolSem and BVI are two important resources for the computational semantic processing of Basque; as far as the authors are aware, they are also the first resources of their kind developed for Basque. In addition, each entry in BVI is linked to the corresponding verb entry in well-known resources such as PropBank, VerbNet, WordNet, Levin's Classification and FrameNet. We have also implemented several automatic processes to aid in creating and annotating the BVI, including processes designed to facilitate the task of manual annotation.