20 results for corpora allata


Relevance:

10.00%

Abstract:

Of the criteria proposed in the literature for identifying verbal periphrases (PV) in Spanish, only some help to distinguish PVs effectively from non-periphrastic constructions (CNP). The aim of this article is to review these criteria and evaluate them in order to determine which are actually valid for identifying PVs. The evaluation was carried out on a group of 15 verb groups. In addition, an experimental corpus study was conducted to determine the productivity of the PVs detected.

Relevance:

10.00%

Abstract:

In this paper we present ClInt (Clinical Interview), a bilingual Spanish-Catalan spoken corpus that contains 15 hours of clinical interviews. It consists of audio files aligned with multiple-level transcriptions comprising orthographic, phonetic and morphological information, as well as linguistic and extralinguistic encoding. This is a previously non-existent resource for these languages and it offers a wide-ranging exploitation potential in a broad variety of disciplines such as Linguistics, Natural Language Processing and related fields.

Relevance:

10.00%

Abstract:

This article introduces EsPal: a Web-accessible repository containing a comprehensive set of properties of Spanish words. EsPal is based on an extensible set of data sources, beginning with a 300 million token written database and a 460 million token subtitle database. Properties available include word frequency, orthographic structure and neighborhoods, phonological structure and neighborhoods, and subjective ratings such as imageability. Subword structure properties are also available in terms of bigrams and trigrams, bi-phones, and bi-syllables. Lemma and part-of-speech information and their corresponding frequencies are also indexed. The website enables users to either upload a set of words to receive their properties, or to receive a set of words matching constraints on the properties. The properties themselves are easily extensible and will be added over time as they become available. It is freely available from the following website: http://www.bcbl.eu/databases/espal
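As a rough illustration of two of the property types listed in this abstract (subword n-grams and orthographic neighborhoods), the following minimal sketch computes them over a toy word list. It does not use or reproduce EsPal itself: the function names, the toy lexicon, and the definition of a neighbor (same length, exactly one letter substituted) are assumptions made for the example, and EsPal's own definitions may differ.

```python
from collections import Counter


def ngrams(word, n):
    """Return the contiguous letter n-grams of a word, in order."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def ngram_counts(words, n):
    """Count letter n-grams over a word list (each listed word counted once)."""
    counts = Counter()
    for w in words:
        counts.update(ngrams(w, n))
    return counts


def orthographic_neighbors(word, lexicon):
    """Assumed definition: same-length words differing in exactly one letter."""
    return sorted(
        other for other in lexicon
        if len(other) == len(word)
        and sum(a != b for a, b in zip(word, other)) == 1
    )


# Toy lexicon, invented for the example (not EsPal data).
lexicon = ["casa", "cosa", "cama", "caso", "mesa", "pasa"]

bigram_freq = ngram_counts(lexicon, 2)
neighbors = orthographic_neighbors("casa", lexicon)
```

Here `neighbors` holds the substitution neighbors of "casa" in the toy list, and `bigram_freq["sa"]` gives the number of listed words containing the bigram "sa". A token-weighted version would multiply each word's n-grams by its corpus frequency.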

Relevance:

10.00%

Abstract:

The aim of this paper is to reflect on the use of computerized corpora. The case we present is linked to an R&D project on the grammaticalization of verbal periphrases (GRAPEVERBA). To carry out this study, we extracted the occurrences from the two Academy corpora, CORDE and CREA. The lack of lemmatization and tagging in both corpora posed a problem that is hard to solve, since the number of examples obtained is excessively high. Another problem concerns the textual editions of the works included in the Academy's corpora, especially CORDE. Quite often, these editions are not contemporary with the original manuscripts, which seriously compromises the conclusions drawn about the grammaticalization of some verbal periphrases, for example tener + (a/de) + infinitive.
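The over-generation problem described in this abstract can be sketched with a plain regular expression over raw text. This is only an illustration under stated assumptions: it does not model the CORDE or CREA query interfaces, the list of inflected forms of "tener" is hypothetical and deliberately incomplete, and the sample sentences are invented. Without lemmas or part-of-speech tags, the only available test for "infinitive" is the word ending, so nouns such as "mujer" are retrieved as well.

```python
import re

# Hypothetical, incomplete list of inflected forms of "tener";
# an untagged corpus forces the searcher to enumerate forms by hand.
TENER_FORMS = ["tengo", "tienes", "tiene", "tenemos", "tienen", "tenía", "tuvo", "tener"]

# form of "tener" + optional "a"/"de" + any word ending in -ar, -er or -ir
PATTERN = re.compile(
    r"\b(?:" + "|".join(TENER_FORMS) + r")\s+(?:(?:a|de)\s+)?(\w+(?:ar|er|ir))\b",
    re.IGNORECASE,
)


def find_candidates(text):
    """Return every surface match, true periphrasis or not."""
    return [m.group(0) for m in PATTERN.finditer(text)]


# Invented sample text (not taken from CORDE or CREA).
sample = "Tiene de hacer penitencia. Tiene mujer e hijos. Tengo a decir verdad."
candidates = find_candidates(sample)
```

On this sample, `candidates` contains "Tiene mujer" alongside the two candidate periphrases, which is the kind of noise that has to be filtered by hand when the corpus offers no lemmatization or tagging.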

Relevance:

10.00%

Abstract:

During language development, one of the most important tasks children face is identifying the grammatical category to which each word in their language belongs. This is essential for forming grammatically correct utterances. How do children classify the words of their language and assign them to the corresponding grammatical category? The present study investigates the usefulness of phonological information for the categorization of nouns in English, given that phonology is the first source of information that might be available to prelinguistic infants, who lack access to semantic information or complex morphosyntactic information. We analyse four different corpora containing linguistic samples of English-speaking mothers addressing their children in order to explore the reliability with which words are represented in mothers’ speech based on several phonological criteria. The results of the analysis confirm the prediction that most of the words to which English-learning infants are exposed during the first two years of life can be accounted for in terms of their phonological resemblance