942 results for New-textual genres
Abstract:
Doctoral thesis with European mention in natural language processing, completed at the Universidad de Alicante by Ester Boldrini under the supervision of Dr. Patricio Martínez-Barco. The thesis defense took place at the Universidad de Alicante on 23 January 2012 before a committee composed of Dr. Manuel Palomar (Universidad de Alicante), Dr. Paloma Moreda (UA), Dr. Mariona Taulé (Universidad de Barcelona), Dr. Horacio Saggion (Universitat Pompeu Fabra) and Dr. Mike Thelwall (University of Wolverhampton). Grade: Sobresaliente Cum Laude, awarded unanimously.
Abstract:
This paper presents the first version of EmotiBlog, an annotation scheme for emotions in non-traditional textual genres such as blogs or forums. We collected a corpus of blog posts in three languages (English, Spanish and Italian) covering three topics of interest. We then annotated the collection, measured inter-annotator agreement and ran a ten-fold cross-validation evaluation, obtaining promising results. The main aim of this research is to provide a finer-grained annotation scheme and annotated data, both of which are essential for evaluations that check the quality of the created resources.
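As an illustration of the inter-annotator agreement step mentioned above, the following is a minimal sketch of Cohen's kappa for two annotators. The abstract does not say which agreement measure was used, so the choice of kappa, the function name `cohen_kappa` and the toy labels are assumptions made only for this example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Hypothetical helper: the agreement measure actually used for
    EmotiBlog is not stated in the abstract.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy polarity labels for four items (invented for the example).
annotator_1 = ["pos", "pos", "neg", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg"]
print(cohen_kappa(annotator_1, annotator_2))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one label dominates the corpus.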
Abstract:
This paper presents a preliminary study in which Machine Learning experiments were applied to Opinion Mining in blogs. We created a blog corpus in Spanish and annotated it using EmotiBlog. We evaluated the utility of the labelled features, first through experiments with combinations of them and then using feature selection techniques. We also deal with several problems, such as the noisy character of the input texts, the small size of the training set, the granularity of the annotation scheme and the language under study, Spanish, which has fewer resources than English. We obtained promising results, considering that this is a preliminary study.
Abstract:
This paper examines the perception of the characteristics of a set of Textual Genres based on Linguistic Analysis, drawing on a Textual Linguistics investigation and offering considerations from Semantics as support. It is an attempt to show how relevant semantic investigations are for extending our perception of the characteristics of Textual Genres, without excluding pragmatic and discourse elements. The texts analyzed in this study were taken from a questionnaire designed to assess freshman and senior university students in Bachelor of Arts in Language courses, in terms of their knowledge of different Textual Genres and their characteristics. The analyses focus on the semantic elements at work in the questions and answers of the questionnaire, which include: a Semantics-Pragmatics interface, considerations of the propositional calculus, theories of verbal tense and aspect, and semantic primitives. On these terms, a relationship is established between the cognitive mechanisms that operate in the production and reception of texts and the functions that organize semantic text processing. The main analysis in this paper concerns the interface between Textual Linguistics, as developed by authors such as Beaugrande and Dressler (1981) and Adam (2011), and semantic investigations of meaning processing.
Abstract:
The importance of new textual genres such as blogs or forum entries is growing in parallel with the evolution of the Social Web. This paper presents two corpora of blog posts, in English and in Spanish, annotated according to the EmotiBlog annotation scheme. Furthermore, we created 20 factual and opinionated questions for each language, together with the Gold Standard for their answers in the corpus. The purpose of our work is to study the challenges involved in a mixed fact and opinion question answering setting by comparing the performance of two Question Answering (QA) systems. The first one is open domain, while the second one is opinion-oriented. We evaluate the two systems separately in both languages and propose possible solutions to improve QA systems that have to process mixed questions.
Abstract:
The development of Web 2.0 led to the birth of new textual genres such as blogs, reviews or forum entries. The increasing number of such texts and the highly diverse topics they discuss make blogs a rich source for analysis. This paper presents a comparative study of open domain and opinion QA systems. A collection of opinion and mixed fact-opinion questions in English is defined, and two Question Answering systems are employed to retrieve the answers to these queries. The first one is generic, while the second is specific to emotions. We comparatively evaluate and analyze the systems’ results, concluding that opinion Question Answering requires the use of specific resources and methods.
Abstract:
EmotiBlog is a corpus labelled with the annotation scheme of the same name, designed for detecting subjectivity in the new textual genres. Preliminary research demonstrated its relevance as a Machine Learning (ML) resource for detecting opinionated data. In this paper we compare EmotiBlog with the JRC corpus in order to check the robustness of the EmotiBlog annotation. For this research we concentrate on its coarse-grained labels. We carry out extensive ML experiments, some of which also include lexical resources. The results are similar to those obtained with the JRC corpus, which demonstrates the validity of EmotiBlog as a resource for the SA task.
Abstract:
Web 2.0 has changed how users consume and interact with information, and has introduced a wide range of new textual genres, such as reviews or microblogs, through which users communicate, exchange, and share opinions. Exploiting all this user-generated content is of great value both for users and for companies, since it can assist them in their decision-making processes. Given this context, automatic methods that help manage online information more quickly are needed. This article therefore proposes and evaluates a novel concept-level approach for ultra-concise abstractive opinion summarization. Our approach integrates syntactic sentence simplification, sentence regeneration and internal concept representation into the summarization process, and is thus able to generate abstractive summaries, which is one of the most challenging issues for this task. To analyze different settings for our approach, the use of the sentence regeneration module was made optional, leading to two versions of the system (one with sentence regeneration and one without). To test them, a corpus of 400 English texts, gathered from reviews and tweets belonging to two different domains, was used. Although both versions were shown to be reliable methods for generating this type of summary, the results indicate that the version without sentence regeneration yielded better results, improving on the results of a number of state-of-the-art systems by 9%, whereas the version with sentence regeneration proved to be more robust to noisy data.
Abstract:
In recent years, studies on the conception of teaching and learning that determines the roles of teacher and student, in the search for coherence between what one believes one is doing and what one actually does, have intensified, making the social role of the school evident. As a result, educational curriculum guidelines have been modified with a view to awakening in teachers the need to update their concepts and reformulate their practices. However, despite an explosion of research and the development of new teaching materials, the teaching of Portuguese in actual classroom practice remains the subject of much reflection regarding the application of concepts and the use of methods. Starting from the principle that understanding language in its actual uses in everyday social life should be a fundamental source of guidance for teaching and learning, bringing teachers closer to the current theories that underpin the interactionist conception of language, we present a proposed curriculum framework for the final years of elementary education (ensino fundamental) in the EJA (Educação de Jovens e Adultos, youth and adult education) modality, based on work with textual genres and providing teachers with guidance and reflections in the format of a didactic sequence. The theoretical foundation draws on studies of language, with theoretical and methodological guidelines developed by Schneuwly and Dolz (2011) and Bronckart (1999 and 2003), as well as the observations of Koch (2002; 2004 and 2013), Bazerman (2005 and 2007), Bonini (2004), Bakhtin (1992) and Marcuschi (2005 and 2008) on the social function of genres and their contribution to the teaching of Portuguese. The work is intended as a concrete proposal for developing the communicative competence of EJA students and as a contribution to their formation as citizens.
Abstract:
New information and communication technologies (ICT) are already part of the reality of a large number of people. Even the less economically advantaged read, write, enjoy content and communicate on the web, including our students. Yet many educators claim that these same students have no interest in reading and writing. The texts these students read and produce in cyberspace are generally disregarded or belittled in the school setting. In this context, our work sets out to understand how the discursive/textual genres read and/or produced by students in cyberspace are configured. Beyond that, it seeks to reflect on ways in which the school might incorporate genres, discourse(s) and language(s) constructed on the Internet, not only through their recontextualization in school, which Bernstein (1996) criticizes, but going further, understanding these genres and all the possibilities they carry as means for interaction and for the production of diverse meanings. Drawing mainly on the considerations of Bernstein (1996), Bakhtin (1992; 2006; 2008) and Fairclough's Critical Discourse Analysis (2001), this dissertation has two objectives: (1) to identify recurring features of the discursive/textual genres with which our students, pupils aged 12 to 19 at two public schools, are in contact on the web; and (2) to contribute to the incorporation of these genres by the school, reflecting on and discussing with the students themselves the paths that can be taken so that the school in some way embraces the genres circulating on the Internet, without setting aside the genres already worked on in the school environment.
Abstract:
Our research aims to determine how textual genres can be exploited in the design of digital work environments in order to facilitate the textual practices of managers and secretaries in a Canadian municipality and a Canadian federal administration. To this end, the first objective is to assess the ability of digital work environments to support the textual practices (reading, writing and handling of texts) of these employees. The second objective is to describe the roles of textual genres during textual practices. Using the example of email, the third objective is to examine how genre can be exploited to assist the carrying out of textual practices in digital work environments. This qualitative research uses a two-stage methodology. The first stage is a close examination of textual practices, the difficulties encountered during them, the role of genre in digital work environments, and the cues drawn on during email management. Three methods of qualitative data collection are used with 17 managers and 17 secretaries from two public administrations: the semi-structured interview, the logbook and the cognitive inquiry. The results are examined using qualitative content analysis strategies. The second stage involves developing an email processing pipeline, intended to support our reflection on textual genre and its exploitation in the design of digital work environments. A corpus of 1703 messages is built from a sample provided by two government managers. The results first provide a general picture of the reading, writing and text-handling practices that are common and specific to managers and secretaries.
The importance of email, which accounts for about 40% of the systems recorded in the logbooks, is highlighted. The difficulties encountered in digital work environments are also described. Next, the roles of genre during textual practices are examined using a matrix that takes into account both its individual and collective dimensions and its three main facets: form, content and function. We then present a framework for analyzing the cues that affect email management, which synthesizes the process by which the recipient interprets messages. A typology of the managers' categorization patterns is also defined, then used in a statistical experiment aimed at the automatic description and categorization of email. At the end of this process, marked linguistic behaviors are observed according to email category. It also turns out that automatic categorization based on the lexicon of the messages performs much better than non-lexical categorization. At the conclusion of this research, we suggest enriching the traditional paradigm of human-computer interaction with a semiotics of genre in digital work environments. The study also offers a reflection on whether email belongs to a genre, drawing on the theoretical concepts of hypergenre, genre and subgenre. The success of the automatic categorization of email according to genre-dependent facets (content, form and function) offers interesting prospects for applying this concept to the design of digital work environments in order to facilitate employees' textual practices.
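To make the idea of lexicon-based email categorization more concrete, here is a minimal sketch of a bag-of-words multinomial Naive Bayes classifier with add-one smoothing. The abstract does not name the statistical method used in the thesis, so the classifier, the function names (`tokenize`, `train`, `classify`), the categories and the sample messages are all assumptions introduced for illustration only.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would need more care.
    return text.lower().split()


def train(messages):
    """Build word counts per category from (text, category) pairs."""
    word_counts = defaultdict(Counter)
    category_counts = Counter()
    vocabulary = set()
    for text, category in messages:
        category_counts[category] += 1
        for token in tokenize(text):
            word_counts[category][token] += 1
            vocabulary.add(token)
    return word_counts, category_counts, vocabulary


def classify(text, model):
    """Return the category with the highest smoothed log-probability."""
    word_counts, category_counts, vocabulary = model
    total_messages = sum(category_counts.values())
    best_category, best_score = None, -math.inf
    for category in category_counts:
        score = math.log(category_counts[category] / total_messages)
        denominator = sum(word_counts[category].values()) + len(vocabulary)
        for token in tokenize(text):
            if token in vocabulary:  # words unseen in training are ignored
                score += math.log((word_counts[category][token] + 1) / denominator)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Invented sample messages and categories (not taken from the thesis corpus).
training_messages = [
    ("meeting agenda tomorrow", "meeting"),
    ("agenda for the council meeting", "meeting"),
    ("please approve the budget request", "request"),
    ("request for approval of invoice", "request"),
]
model = train(training_messages)
print(classify("agenda of the meeting", model))
print(classify("budget request invoice", model))
```

A non-lexical baseline would replace the word counts with features such as message length or the number of recipients; the abstract reports that the lexical variant performed much better in the study.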
Abstract:
The sociocultural changes that led to the genesis of the Romance languages widened the gap between oral and written patterns, which display different discursive and linguistic devices. In early documents, the discursive implicatures connecting propositions were not generally codified, so the reader had to supply the correct interpretation according to his own perception of real facts, something that can still be attested in current oral utterances. Once the Romance languages had undergone several levelling processes that culminated in the first standardizations, implicatures became explicatures and were syntactically codified by means of new, univocal complex conjunctions. The emergence of these new subordination strategies allowed a freer distribution of the information conveyed by utterances. The success of complex structural patterns ran alongside the genesis of new narrative genres and the generalization of a learned rhetoric. Both facts are a spontaneous effect of new approaches to the act of reading. Ancient texts were written to be read to a wide audience, whereas those printed by the end of the 15th century were conceived to be read quietly, in a low voice, by a private reader. The goal of this paper is twofold, since we will show that: a) the development of new complex conjunctions through the history of the Romance languages conforms to four structural patterns that range from parataxis to hypotaxis; b) this development reflects the well-known grammaticalization path from discourse to syntax, which implies the codification of discursive strategies (Givón 1979, Sperber and Wilson 1986, Carston 1988, Grice 1989, Bach 1994, Blakemore 2002, among others).
Abstract:
Diachronic studies marked the first decades of the 20th century in Brazilian linguistics and then passed through a period of ostracism after the 1950s. From the 1990s onward, and especially with the project Para a história do português brasileiro (PHPB), created in 1997, which systematized at the national level the research programme in the area of diachrony, historical studies regained strength and have gradually increased since then. Our work is set in this new scene of Brazilian historical linguistics and is associated with two research programmes: i) the constitution of a diachronic corpus; ii) the diachrony of text and discourse. For the first programme, we worked to constitute a diachronic corpus of official letters about Rio Grande do Norte, which we call cartas oficiais norte-rio-grandenses, written in the 18th, 19th and 20th centuries. We chose bureaucratic letters because they represent a textual category that was very productive in historical contexts, mostly the 18th and 19th centuries, in which command of writing was very limited, and also because they almost always state explicitly where, when, for whom and by whom they were written, as Fonseca (2003) notes. The rules for constituting the corpus were based, although not strictly, on the PHPB guidelines. For the second programme, we drew on Coserian ideas from the studies on discourse traditions (TD) (Koch, 1997; Kabatek, 2006), among them the idea that texts are shaped so as to follow their own tradition (Coseriu, 2007), and we turned to Diplomatics (Belloto, 2002) to characterize the corpus by applying concepts from Diplomatics and TD and by presenting the structures that form those official letters: their textual genres, a kind of TD, with their macrostructures; and some of their formulaic expressions (microstructures), another sort of TD.
This characterization stage will attend, as far as possible, to the dynamic between tradition and innovation in the actualization of those textual structures over the centuries. This work intends to contribute to research in Historical Linguistics in Rio Grande do Norte, more specifically research on the constitution of diachronic corpora and on TD, and to the study of official documents, a textual category about which there are almost no studies (cf. Silveira, 2007).