958 results for computational linguistics
Abstract:
This document contains the text POL2, a "parliamentary session" that is part of the Corpus Oral de Registres (COR). The COR is a component of the Corpus de Català Contemporani de la Universitat de Barcelona (CCCUB), an archive of corpora of contemporary spoken Catalan compiled by the research group Grup d'Estudi de la Variació (GEV) to contribute to the study of dialectal, social, and functional variation in the Catalan language. This and other CCCUB materials can be accessed directly at the Dipòsit Digital de la UB (http://diposit.ub.edu) or through the CCCUB website (http://www.ub.edu/cccub).
Abstract:
This document contains the phonetic transcription, the phono-orthographic transcription, and the sound file of a fragment of free conversation with an informant from Móra d'Ebre, which is part of the Corpus Oral Dialectal (COD). The COD is a component of the Corpus de Català Contemporani de la Universitat de Barcelona (CCCUB), an archive of corpora of contemporary spoken Catalan compiled by the research group Grup d'Estudi de la Variació (GEV) to contribute to the study of dialectal, social, and functional variation in the Catalan language. This and other CCCUB materials can be accessed directly at the Dipòsit UB or through the CCCUB website (http://www.ub.edu/cccub).
Abstract:
This document contains the phonetic transcription, the phono-orthographic transcription, and the sound file of a fragment of free conversation with an informant from Tamarit de Llitera, which is part of the Corpus Oral Dialectal (COD). The COD is a component of the Corpus de Català Contemporani de la Universitat de Barcelona (CCCUB), an archive of corpora of contemporary spoken Catalan compiled by the research group Grup d'Estudi de la Variació (GEV) to contribute to the study of dialectal, social, and functional variation in the Catalan language. This and other CCCUB materials can be accessed directly at the Dipòsit UB or through the CCCUB website (http://www.ub.edu/cccub).
Abstract:
This document contains the phonetic transcription, the phono-orthographic transcription, and the sound file of a fragment of free conversation with an informant from Tàrrega, which is part of the Corpus Oral Dialectal (COD). The COD is a component of the Corpus de Català Contemporani de la Universitat de Barcelona (CCCUB), an archive of corpora of contemporary spoken Catalan compiled by the research group Grup d'Estudi de la Variació (GEV) to contribute to the study of dialectal, social, and functional variation in the Catalan language. This and other CCCUB materials can be accessed directly at the Dipòsit UB or through the CCCUB website (http://www.ub.edu/cccub).
Abstract:
This document contains the text POL3, a "town council plenary session" that is part of the Corpus Oral de Registres (COR). The COR is a component of the Corpus de Català Contemporani de la Universitat de Barcelona (CCCUB), an archive of corpora of contemporary spoken Catalan compiled by the research group Grup d'Estudi de la Variació (GEV) to contribute to the study of dialectal, social, and functional variation in the Catalan language. This and other CCCUB materials can be accessed directly at the Dipòsit Digital de la UB (http://diposit.ub.edu) or through the CCCUB website (http://www.ub.edu/cccub).
Abstract:
In this article, I address epistemological questions regarding the status of linguistic rules and the pervasive--though seldom discussed--tension that arises between theory-driven object perception by linguists on the one hand, and ordinary speakers' possible intuitive knowledge on the other hand. Several issues will be discussed using examples from French verb morphology, based on the 6500 verbs from Le Petit Robert dictionary (2013).
Abstract:
The purpose of this PhD thesis is to investigate a semantic relation present in the connection of sentences (more specifically: propositional units). This relation, which we refer to as contrast, includes the traditional categories of adversatives - predominantly represented by the connector "but" in English and "pero" in Modern Spanish - and concessives, prototypically verbalised through "although" / "aunque". The aim is to describe, analyse and - as far as possible - explain the emergence and evolution of different syntactic schemes marking contrast during the first three centuries of Spanish (also referred to as Castilian) as a literary language, i.e., from the 13th to the 15th century. The starting point of this question is a commonplace in syntax, whereby the semantic and syntactic complexity of clause linkage correlates with the degree of textual elaboration. In historical linguistics, i.e., applied to the phylogeny of a language, it is commonly referred to as the parataxis hypothesis. A crucial part of the thesis is dedicated to the definition of contrast as a semantic relation. Although the label contrast has been used in this sense, mainly in functional grammar and text linguistics, mainstream grammaticography and linguistics remain attached to the traditional categories of adversatives and concessives. In opposition to this traditional view, we present our own model of contrast, based on a pragma-semantic description proposed for the analysis of adversatives by Oswald Ducrot and subsequently adopted by Ekkehard König for the analysis of concessives. 
We refine and further develop this model so that it accommodates all instances of contrast in Spanish, not just the prototypical ones. We argue that the relationship between adversatives and concessives is a marked opposition, i.e., that the higher degree of semantic and syntactic integration of concessives restricts some possible readings that the adversatives may have, but that this difference is almost systematically neutralised by contextual factors, thus justifying the assumption of contrast as a comprehensive onomasiological category. This theoretical focus is complemented by a state-of-the-question overview attempting to account for all relevant forms in which contrast is expressed in Medieval Spanish, with the aid of lexicographic and grammaticographical sources, and by an empirical corpus study investigating the expression and textual functions of contrast in nine Medieval Spanish texts: Cantar de Mio Cid, Libro de Alexandre, Milagros de Nuestra Señora, Estoria de España, Primera Partida, Lapidario, Libro de buen amor, Conde Lucanor, and Corbacho. This corpus is analysed using quantitative and qualitative tools, and the study is accompanied by a series of methodological remarks on how to investigate a pragma-semantic category in historical linguistics. The corpus study shows that the parataxis hypothesis cannot be confirmed from a statistical viewpoint, although a qualitative analysis shows that the use of subordination does increase over time in some particular contexts.
Abstract:
This article is a brief review of the concept of irony from the point of view of rhetoric and of its influence on some twentieth-century linguistic studies. The review begins with the traditional classification into Socratic irony, rhetorical irony and romantic irony in order to focus the analysis on some fundamental elements of the ironic phenomenon, such as opposition, verisimilitude, complicity with the interpreter, and the role played by the context.
Abstract:
In this paper we present ClInt (Clinical Interview), a bilingual Spanish-Catalan spoken corpus that contains 15 hours of clinical interviews. It consists of audio files aligned with multiple-level transcriptions comprising orthographic, phonetic and morphological information, as well as linguistic and extralinguistic encoding. This is a previously non-existent resource for these languages and it offers a wide-ranging exploitation potential in a broad variety of disciplines such as Linguistics, Natural Language Processing and related fields.
Abstract:
CoCo is a collaborative web interface for the compilation of linguistic resources. In this demo we present one of its possible applications: paraphrase acquisition.
Abstract:
This article introduces EsPal: a Web-accessible repository containing a comprehensive set of properties of Spanish words. EsPal is based on an extensible set of data sources, beginning with a 300-million-token written database and a 460-million-token subtitle database. Properties available include word frequency, orthographic structure and neighborhoods, phonological structure and neighborhoods, and subjective ratings such as imageability. Subword structure properties are also available in terms of bigrams and trigrams, bi-phones, and bi-syllables. Lemma and part-of-speech information and their corresponding frequencies are also indexed. The website enables users either to upload a set of words and receive their properties, or to receive a set of words matching constraints on the properties. The properties themselves are easily extensible, and more will be added over time as they become available. EsPal is freely available from the following website: http://www.bcbl.eu/databases/espal
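As a rough illustration of two of the property types mentioned above (subword n-grams and orthographic neighborhoods), the following Python sketch computes them over a toy word list. It is not EsPal's implementation: the function names are hypothetical, the counts are over word types rather than corpus tokens, and the neighborhood definition (same length, exactly one letter substituted) is an assumption, since the abstract does not specify one.

```python
from collections import Counter


def char_ngrams(word, n):
    """Return the contiguous character n-grams of a word, in order."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def ngram_counts(lexicon, n=2):
    """Count n-grams over the word types in a lexicon (not token-weighted)."""
    counts = Counter()
    for word in lexicon:
        counts.update(char_ngrams(word, n))
    return counts


def substitution_neighbors(word, lexicon):
    """Words of the same length that differ from `word` in exactly one letter.

    Assumed neighborhood definition; other definitions (allowing
    insertions or deletions, for instance) would give different sets.
    """
    return sorted(
        other for other in lexicon
        if len(other) == len(word)
        and sum(a != b for a, b in zip(word, other)) == 1
    )


# Toy lexicon, made up for illustration only.
toy_lexicon = ["casa", "cosa", "cama", "caso", "mesa", "sol"]
bigram_table = ngram_counts(toy_lexicon, n=2)
casa_neighbors = substitution_neighbors("casa", toy_lexicon)
```

A real resource would weight such counts by corpus frequency and would typically precompute the neighborhoods rather than scan the lexicon per query.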
Abstract:
This article examines the mainstream categorical definition of coreference as "identity of reference." It argues that coreference is best handled when identity is treated as a continuum, ranging from full identity to non-identity, with room for near-identity relations to explain currently problematic cases. This middle ground is needed to account for those linguistic expressions in real text that stand in relations that are neither full coreference nor non-coreference, a situation that has led to contradictory treatment of cases in previous coreference annotation efforts. We discuss key issues for coreference such as conceptual categorization, individuation, criteria of identity, and the discourse model construct. We redefine coreference as a scalar relation between two (or more) linguistic expressions that refer to discourse entities considered to be at the same granularity level relevant to the linguistic and pragmatic context. We view coreference relations in terms of mental space theory and discuss a large number of real life examples that show near-identity at different degrees.
Abstract:
Lexical diversity measures are notoriously sensitive to variations of sample size and recent approaches to this issue typically involve the computation of the average variety of lexical units in random subsamples of fixed size. This methodology has been further extended to measures of inflectional diversity such as the average number of wordforms per lexeme, also known as the mean size of paradigm (MSP) index. In this contribution we argue that, while random sampling can indeed be used to increase the robustness of inflectional diversity measures, using a fixed subsample size is only justified under the hypothesis that the corpora that we compare have the same degree of lexematic diversity. In the more general case where they may have differing degrees of lexematic diversity, a more sophisticated strategy can and should be adopted. A novel approach to the measurement of inflectional diversity is proposed, aiming to cope not only with variations of sample size, but also with variations of lexematic diversity. The robustness of this new method is empirically assessed and the results show that while there is still room for improvement, the proposed methodology considerably attenuates the impact of lexematic diversity discrepancies on the measurement of inflectional diversity.
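The fixed-size subsampling baseline that this abstract takes as its starting point can be sketched in Python as follows. This is a minimal illustration of the baseline only, not of the novel method the authors propose, which the abstract does not describe. The function names, the `(wordform, lemma)` input format, and the default parameters are assumptions.

```python
import random
from statistics import mean


def msp(tokens):
    """Mean size of paradigm: distinct wordforms per distinct lexeme.

    `tokens` is a sequence of (wordform, lemma) pairs (assumed format).
    """
    forms_by_lemma = {}
    for wordform, lemma in tokens:
        forms_by_lemma.setdefault(lemma, set()).add(wordform)
    return mean(len(forms) for forms in forms_by_lemma.values())


def subsampled_msp(tokens, sample_size, n_samples=100, seed=0):
    """Average MSP over random subsamples of a fixed number of tokens.

    Each subsample is drawn without replacement; the seed makes the
    result reproducible.
    """
    rng = random.Random(seed)
    return mean(
        msp(rng.sample(tokens, sample_size)) for _ in range(n_samples)
    )


# Toy lemmatised corpus, made up for illustration only.
toy_tokens = [
    ("run", "run"), ("runs", "run"), ("ran", "run"),
    ("cat", "cat"), ("cats", "cat"),
    ("the", "the"),
]
full_msp = msp(toy_tokens)
```

Comparing two corpora with `subsampled_msp` at the same `sample_size` removes the direct effect of corpus length, but, as the abstract argues, it does not control for differences in how many distinct lexemes each subsample contains.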
Abstract:
This work aims to broaden the body of studies on cognitive linguistics in the Catalan language, in this case in the experiential domain of television advertising, and to complement existing studies on advertising language and on communication in audiovisual media.