995 resultados para Text Corpus
Resumo:
Aquest document conté "conversa 05", un "xerrada al menjador després d'un partit de futbol" que forma part del Corpus Oral de Conversa Col·loquial (COC). El COC és un component del Corpus de Català Contemporani de la Universitat de Barcelona (CCCUB), un arxiu de corpus de llengua catalana oral contemporània que ha estat confegit pel grup de recerca Grup d¿Estudi de la Variació (GEV) amb la finalitat de contribuir a l'estudi de la variació dialectal, social i funcional en la llengua catalana. Aquest i altres materials del CCCUB són accessibles directament al Dipòsit Digital de la UB (http://diposit.ub.edu) o a través del web del CCCUB (http://www.ub.edu/cccub).
Resumo:
Aquest document conté "conversa 06", un "sopar i cafè al menjador" que formen part del Corpus Oral de Conversa Col·loquial (COC). El COC és un component del Corpus de Català Contemporani de la Universitat de Barcelona (CCCUB), un arxiu de corpus de llengua catalana oral contemporània que ha estat confegit pel grup de recerca Grup d¿Estudi de la Variació (GEV) amb la finalitat de contribuir a l'estudi de la variació dialectal, social i funcional en la llengua catalana. Aquest i altres materials del CCCUB són accessibles directament al Dipòsit Digital de la UB (http://diposit.ub.edu) o a través del web del CCCUB (http://www.ub.edu/cccub).
Resumo:
Aquest document conté "conversa 07", una conversa "abans de dinar al menjador" que forma part del Corpus Oral de Conversa Col·loquial (COC). El COC és un component del Corpus de Català Contemporani de la Universitat de Barcelona (CCCUB), un arxiu de corpus de llengua catalana oral contemporània que ha estat confegit pel grup de recerca Grup d'Estudi de la Variació (GEV) amb la finalitat de contribuir a l'estudi de la variació dialectal, social i funcional en la llengua catalana. Aquest i altres materials del CCCUB són accessibles directament al Dipòsit Digital de la UB (http://diposit.ub.edu) o a través del web del CCCUB (http://www.ub.edu/cccu
Resumo:
Clàudia Pons, Joaquim Viaplana (ed.) Corpus oral dialectal (COD). Textos orals del balear.PROJECTE: Explotació d'un corpus oral dialectal: anàlisi de la variació lingüística i desenvolupament d'aplicacions informàtiques per a la transcripció automatitzada (ECOD) [HUM2007 65531/FILO]
Resumo:
Aquest document conté la transcripció fonètica, la fonoortogràfica i l'arxiu de so d'un fragment de conversa lliure amb un informant de Móra d'Ebre que forma part del Corpus Oral Dialectal (COD). El COD és un component del Corpus de Català Contemporani de la Universitat de Barcelona (CCCUB), un arxiu de corpus de llengua catalana oral contemporània que ha estat confegit pel grup de recerca Grup d'Estudi de la Variació (GEV) amb la finalitat de contribuir a l'estudi de la variació dialectal, social i funcional en la llengua catalana. Aquest i altres materials del CCCUB són accessibles directament al Dipòsit UB o a través del web del CCCUB (http://www.ub.edu/cccub).
Resumo:
Aquest document conté la transcripció fonètica, la fonoortogràfica i l'arxiu de so d'un fragment de conversa lliure amb un informant de Tamarit de Llitera que forma part del Corpus Oral Dialectal (COD). El COD és un component del Corpus de Català Contemporani de la Universitat de Barcelona (CCCUB), un arxiu de corpus de llengua catalana oral contemporània que ha estat confegit pel grup de recerca Grup d'Estudi de la Variació (GEV) amb la finalitat de contribuir a l'estudi de la variació dialectal, social i funcional en la llengua catalana. Aquest i altres materials del CCCUB són accessibles directament al Dipòsit UB o a través del web del CCCUB (http://www.ub.edu/cccub).
Resumo:
Aquest document conté la transcripció fonètica, la fonoortogràfica i l'arxiu de so d'un fragment de conversa lliure amb un informant de Sort que forma part del Corpus Oral Dialectal (COD). El COD és un component del Corpus de Català Contemporani de la Universitat de Barcelona (CCCUB), un arxiu de corpus de llengua catalana oral contemporània que ha estat confegit pel grup de recerca Grup d'Estudi de la Variació (GEV) amb la finalitat de contribuir a l'estudi de la variació dialectal, social i funcional en la llengua catalana. Aquest i altres materials del CCCUB són accessibles directament al Dipòsit UB o a través del web del CCCUB (http://www.ub.edu/cccub).
Resumo:
Aquest document conté la transcripció fonètica, la fonoortogràfica i l'arxiu de so d'un fragment de conversa lliure amb un informant de Tàrrega que forma part del Corpus Oral Dialectal (COD). El COD és un component del Corpus de Català Contemporani de la Universitat de Barcelona (CCCUB), un arxiu de corpus de llengua catalana oral contemporània que ha estat confegit pel grup de recerca Grup d'Estudi de la Variació (GEV) amb la finalitat de contribuir a l'estudi de la variació dialectal, social i funcional en la llengua catalana. Aquest i altres materials del CCCUB són accessibles directament al Dipòsit UB o a través del web del CCCUB (http://www.ub.edu/cccub).
Resumo:
Internet on elektronisen postin perusrakenne ja ollut tärkeä tiedonlähde akateemisille käyttäjille jo pitkään. Siitä on tullut merkittävä tietolähde kaupallisille yrityksille niiden pyrkiessä pitämään yhteyttä asiakkaisiinsa ja seuraamaan kilpailijoitansa. WWW:n kasvu sekä määrällisesti että sen moninaisuus on luonut kasvavan kysynnän kehittyneille tiedonhallintapalveluille. Tällaisia palveluja ovet ryhmittely ja luokittelu, tiedon löytäminen ja suodattaminen sekä lähteiden käytön personointi ja seuranta. Vaikka WWW:stä saatavan tieteellisen ja kaupallisesti arvokkaan tiedon määrä on huomattavasti kasvanut viime vuosina sen etsiminen ja löytyminen on edelleen tavanomaisen Internet hakukoneen varassa. Tietojen hakuun kohdistuvien kasvavien ja muuttuvien tarpeiden tyydyttämisestä on tullut monimutkainen tehtävä Internet hakukoneille. Luokittelu ja indeksointi ovat merkittävä osa luotettavan ja täsmällisen tiedon etsimisessä ja löytämisessä. Tämä diplomityö esittelee luokittelussa ja indeksoinnissa käytettävät yleisimmät menetelmät ja niitä käyttäviä sovelluksia ja projekteja, joissa tiedon hakuun liittyvät ongelmat on pyritty ratkaisemaan.
Resumo:
The purpose of this PhD thesis is to investigate a semantic relation present in the connection of sentences (more specifically: propositional units). This relation, which we refer to as contrast, includes the traditional categories of adversatives - predominantly represented by the connector but in English and pero in Modern Spanish - and concessives, prototypically verbalised through although / aunque. The aim is to describe, analyse and - as far as possible - to explain the emergence and evolution of different syntactic schemes marking contrast during the first three centuries of Spanish (also referred to as Castilian) as a literary language, i.e., from the 13th to the 15th century. The starting point of this question is a commonplace in syntax, whereby the semantic and syntactic complexity of clause linkage correlates with the degree of textual elaboration. In historical linguistics, i.e., applied to the phylogeny of a language, it is commonly referred to as the parataxis hypothesis A crucial part of the thesis is dedicated by the definition of contrast as a semantic relation. Although the label contrast has been used in this sense, mainly in functional grammar and text linguistics, mainstream grammaticography and linguistics remain attached to the traditional categories adversatives and concessives. In opposition to this traditional view, we present our own model of contrast, based on a pragma-semantic description proposed for the analysis of adversatives by Oswald Ducrot and subsequently adopted by Ekkehard König for the analysis of concessives. We refine and further develop this model in order for it to accommodate all, not just the prototypical instances of contrast in Spanish, arguing that the relationship between adversatives and concessives is a marked opposition, i.e., that the higher degree of semantic and syntactic integration of concessives restricts some possible readings that the adversatives may have, but that this difference is almost systematically neutralised by contextual factors, thus justifying the assumption of contrast as a comprehensive onomasiological category. This theoretical focus is completed by a state-of-the-question overview attempting to account for all relevant forms in which contrast is expressed in Medieval Spanish, with the aid of lexicographic and grammaticographical sources, and an empirical study investigating the expression of corpus in a corpus study on the textual functions of contrast in nine Medieval Spanish texts: Cantar de Mio Cid, Libro de Alexandre, Milagros de Nuestra Sehora, Estoria de Espana, Primera Partida, Lapidario, Libro de buen amor, Conde Lucanor, and Corbacho. This corpus is analysed using quantitative and qualitative tools, and the study is accompanied by a series of methodological remarks on how to investigate a pragma-semantic category in historical linguistics. The corpus study shows that the parataxis hypothesis fails to prove from a statistical viewpoint, although a qualitative analysis shows that the use of subordination does increase over time in some particular contexts.
Resumo:
In this paper we present ClInt (Clinical Interview), a bilingual Spanish-Catalan spoken corpus that contains 15 hours of clinical interviews. It consists of audio files aligned with multiple-level transcriptions comprising orthographic, phonetic and morphological information, as well as linguistic and extralinguistic encoding. This is a previously non-existent resource for these languages and it offers a wide-ranging exploitation potential in a broad variety of disciplines such as Linguistics, Natural Language Processing and related fields.
Resumo:
The advertising poster's main characteristic is the ability to convey a commercial message quickly and publicly thanks to its straightforward image and text. The young people, being the tobacco industry's principal target, are particularly exposed to these messages. This kind of advertisement becomes a mean of counterstroke as soon as the cigarette's harmfulness is acknowledged. Some of the cigarette industry's strategies can be revealed by the historical analysis of a 253 posters corpus selected among the main cigarette manufacturers in Switzerland at the time of the post-war economic boom. With the misuse of sport's theme, the overvaluation of the filter's efficiency, the use of a vocabulary that implies lightness and by erasing the image of smoke in its advertisement, the industry tries to reassure the smoker wrongly.
Resumo:
BACKGROUND: Shared Decision Making (SDM) is increasingly advocated as a model for medical decision making. However, there is still low use of SDM in clinical practice. High impact factor journals might represent an efficient way for its dissemination. We aimed to identify and characterize publication trends of SDM in 15 high impact medical journals. METHODS: We selected the 15 general and internal medicine journals with the highest impact factor publishing original articles, letters and editorials. We retrieved publications from 1996 to 2011 through the full-text search function on each journal website and abstracted bibliometric data. We included publications of any type containing the phrase "shared decision making" or five other variants in their abstract or full text. These were referred to as SDM publications. A polynomial Poisson regression model with logarithmic link function was used to assess the evolution across the period of the number of SDM publications according to publication characteristics. RESULTS: We identified 1285 SDM publications out of 229,179 publications in 15 journals from 1996 to 2011. The absolute number of SDM publications by journal ranged from 2 to 273 over 16 years. SDM publications increased both in absolute and relative numbers per year, from 46 (0.32% relative to all publications from the 15 journals) in 1996 to 165 (1.17%) in 2011. This growth was exponential (P < 0.01). We found fewer research publications (465, 36.2% of all SDM publications) than non-research publications, which included non-systematic reviews, letters, and editorials. The increase of research publications across time was linear. Full-text search retrieved ten times more SDM publications than a similar PubMed search (1285 vs. 119 respectively). CONCLUSION: This review in full-text showed that SDM publications increased exponentially in major medical journals from 1996 to 2011. This growth might reflect an increased dissemination of the SDM concept to the medical community.
Resumo:
Recent advances in machine learning methods enable increasingly the automatic construction of various types of computer assisted methods that have been difficult or laborious to program by human experts. The tasks for which this kind of tools are needed arise in many areas, here especially in the fields of bioinformatics and natural language processing. The machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question. However, their learning performance can often be improved by taking advantage of deeper insight of the application domain or the learning problem at hand. This thesis considers developing kernel-based learning algorithms incorporating this kind of prior knowledge of the task in question in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, the incorporation of prior knowledge is often done by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit to the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take account of the positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suitable for the task of information retrieval and for more general ranking problems than the cost functions designed for regression and classification. We also consider other applications of the kernel-based learning algorithms such as text categorization, and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions. We also design a fast cross-validation algorithm for regularized least-squares type of learning algorithm. Further, an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks is proposed. In summary, we demonstrate that the incorporation of prior knowledge is possible and beneficial, and novel advanced kernels and cost functions can be used in algorithms efficiently.