966 results for Textual data
Abstract:
This paper describes a project called “Development of educational workshops on text reading, interpretation and writing in elementary school”, which took place at the São Paulo State University (UNESP) with financial support from PROEX-UNESP (Pro-Rectorate of Extension). The project aimed to organize and run educational workshops on reading, interpreting, and writing texts of different genres for students at a public elementary school in São José do Rio Preto, São Paulo state. Analysis of the texts produced by the students took the project in two main directions. In the first, it was possible to identify problems and inaccuracies in language usage, which served as the starting point for preparing the mini-courses to be offered. These mini-courses deeply involved undergraduate students in Portuguese Language and Literature in the practice of teaching Portuguese at the school. In the second, the 5,468 texts produced over the four years of the project formed the basis for research aimed at describing processes in which speech and writing are related, grounded in a theoretical framework that values the multiplicity of literacies associated with the social practices the students experience. This extension project thus aimed to link service to the external community (in this case, public school students) with the internal community (undergraduate students in Portuguese Language and Literature).
Abstract:
The Vernacular Discourse of the "Arab Spring" is a project that bridges the divide between East and West by offering new readings of Arab subjectivities. By analyzing the "Arab Spring" through the lens of vernacular discourse, it challenges both the Euro-Americo-centric legacies of Orientalism in Western academia and the new wave of extremism in the Arab world, offering alternative representations of Arab bodies and subjectivities. To offer this new reading of the "Arab Spring," it explores the foundations of critical rhetoric as a theory and a practice and argues for a turn towards critical vernacular discourse. This turn is important because it calls for the analysis of artifacts produced by marginalized groups in order to understand perspectives that have largely been foreclosed in traditional cultural studies research. Building on embodied/performative critical rhetoric, the study examines, through the vernacular discourses of the Arab revolutionary body, forms of knowledge production that are not merely textual, drawing specifically on data gathered at Lhbib Bourguiba, Tunisia. This analysis of the political revolutionary body reveals the complexity underlying discussions of identity, agency, and representation in the Middle East and North Africa. It also calls for a critical study of these issues in the region that goes beyond the binary approach practiced by academics and media analysts. Hence, by analyzing vernacular discourse, this research offers a method for examining and theorizing the dialectic between agency, citizenry, and subjectivity. It does so by studying how power structures are recreated and challenged through the use of the vernacular in revolutionary movements, and how marginalized groups construct their own subjectivities through vernacular discourse.
Highlighting the political significance of evaluating the Arab Spring as vernacular discourse is therefore important for creating new ways of understanding communication in postcolonial/neocolonial settings.
Abstract:
This paper presents the first version of EmotiBlog, an annotation scheme for emotions in non-traditional textual genres such as blogs and forums. We collected a corpus of blog posts in three languages (English, Spanish, and Italian) on three topics of interest. We then annotated the collection, measured inter-annotator agreement, and carried out a ten-fold cross-validation evaluation, obtaining promising results. The main aim of this research is to provide a finer-grained annotation scheme and the annotated data needed for evaluations that check the quality of the resources created.
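Inter-annotator agreement between two annotators is commonly reported with Cohen's kappa. The abstract does not say which agreement measure EmotiBlog used, so the following is only a minimal sketch. It assumes two annotators assign one categorical emotion label to each of the same items; the function name and the toy labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each annotator's label
    distribution.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is 0/0 here,
        # and returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical emotion labels for six blog sentences.
annotator_1 = ["joy", "anger", "joy", "sadness", "joy", "anger"]
annotator_2 = ["joy", "anger", "sadness", "sadness", "joy", "joy"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # → 0.478
```

Here the annotators agree on 4 of 6 items (p_o = 24/36), but chance agreement is 13/36. Kappa is therefore (11/36) / (23/36) = 11/23, which is noticeably lower than raw agreement.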
Abstract:
Doctoral thesis, Linguistics (Educational Linguistics), Universidade de Lisboa, Faculdade de Letras, 2016
Abstract:
This report sheds light on the fundamental questions and underlying tensions between current policy objectives, compliance strategies and global trends in online personal data processing, assessing the existing and future framework in terms of effective regulation and public policy. Based on the discussions among the members of the CEPS Digital Forum and independent research carried out by the rapporteurs, policy conclusions are derived with the aim of making EU data protection policy more fit for purpose in today’s online technological context. This report constructively engages with the EU data protection framework, but does not provide a textual analysis of the EU data protection reform proposal as such.
Abstract:
The present work studies the overall structuring of radio news discourse by investigating three metatextual/interactive functions: (1) Discourse Organizing Elements (DOEs), (2) Attribution, and (3) Sentential and Nominal Background Information (SBI and NBI). An extended corpus of about 73,000 words of BBC and Radio Damascus news is used to study DOEs, and a restricted corpus of 38,000 words is used for Attribution and SBI/NBI. A situational approach is adopted to assess the influence of factors such as medium and audience on these functions and their frequency. First, it is found that: (1) DOEs are organizational, and their frequency is determined by text length; (2) Attribution functions in accordance with the editor's strategy, and its frequency is audience-sensitive; and (3) BI provides background information and is determined by audience and news topic. Secondly, the salient grammatical elements in DOEs are discourse-deictic demonstratives, address pronouns, and nouns referring to 'the news'. Attribution is realized in reporting/reported clauses, and BI in a sentence, a clause, or a nominal group. Thirdly, DOEs establish a hierarchy of (1) news, (2) summary/expansion, and (3) item, the last including topic introduction and details. While Attribution is generally, and SBI solely, a function of detailing, NBI and proper names are generally a function of summary and topic introduction. Being primarily addressed to the audience and referring metatextually, the functions investigated support Sinclair's interactive and autonomous planes of discourse. They also shed light on the part(s) of the linguistic system that realize the metatextual/interactive function. Strictly speaking, 'discourse structure' inevitably involves a rank scale, but news discourse also shows a convention of item 'listing'. Hence textual functions and discourse structure can be studied only within the boundary of a variety (ultimately interpreted across languages and in its situation).
Finally, the study of interlingual variety provides invaluable insight into a level of translation that goes beyond matching grammatical systems or situational factors: an interpretive level that must be described in the linguistic analysis of translation data.
Abstract:
Online Social Network (OSN) services provided by Internet companies bring people together to chat, share information, and enjoy it. Meanwhile, these services (which can be regarded as social media) generate huge amounts of data every day, every hour, even every minute and every second. Researchers are currently interested in analyzing OSN data, extracting interesting patterns from it, and applying those patterns to real-world applications. However, the sheer scale of OSN data makes it difficult to analyze effectively. This dissertation focuses on applying data mining and information retrieval techniques to mine two key components of social media data: users and user-generated content. Specifically, it addresses three problems related to social media users and content: (1) How can users and content be organized? (2) How can textual content be summarized so that users do not have to read every post to grasp the general idea? (3) How can influential users in social media be identified to benefit other applications, e.g., marketing campaigns? The contributions of this dissertation are briefly summarized as follows. (1) It provides a comprehensive and versatile data mining framework for analyzing users and user-generated content from social media. (2) It designs a hierarchical co-clustering algorithm to organize users and content. (3) It proposes multi-document summarization methods to extract core information from social network content. (4) It introduces three important dimensions of social influence and a dynamic influence model for identifying influential users.
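The abstract does not describe how its multi-document summarization methods work. For intuition only, the sketch below swaps in a simplified version of SumBasic, a well-known frequency-based extractive approach, and applies it to a handful of posts. Sentences are scored by the average probability of their words across all posts. After a sentence is chosen, its words' probabilities are squared so that near-duplicate sentences from other posts are penalized. Full SumBasic additionally restricts candidates to sentences containing the currently most probable word; that step is omitted here. The stop-word list, function names, and posts are all hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); real systems use larger ones.
STOPWORDS = {"a", "an", "and", "are", "but", "for", "has", "in", "is",
             "it", "of", "on", "the", "this", "to", "was"}


def tokenize(text):
    """Lower-case word tokens with stop words removed (deliberately naive)."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(posts, max_sentences=2):
    """Simplified SumBasic-style extractive summary of a collection of posts.

    Each sentence is scored by the mean probability of its words across all
    posts. After a sentence is chosen, the probabilities of its words are
    squared, so near-duplicate sentences from other posts become less likely
    to be picked again.
    """
    # Naive sentence split after ., ! or ?; drop sentences with no content words.
    sentences = [s for post in posts
                 for s in re.split(r"(?<=[.!?])\s+", post.strip())
                 if tokenize(s)]
    tokens = [tokenize(s) for s in sentences]
    counts = Counter(w for words in tokens for w in words)
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}

    def score(i):
        # Summing sorted values makes the score independent of word order.
        return sum(sorted(prob[w] for w in tokens[i])) / len(tokens[i])

    chosen, remaining = [], set(range(len(sentences)))
    while remaining and len(chosen) < max_sentences:
        best = max(remaining, key=lambda i: (score(i), -i))  # ties -> earliest
        chosen.append(best)
        remaining.remove(best)
        for w in set(tokens[best]):
            prob[w] **= 2  # redundancy penalty for words already covered
    # Present the summary in the original sentence order.
    return [sentences[i] for i in sorted(chosen)]


posts = [
    "The new phone has a great camera. Battery life is short.",
    "Great camera on the new phone! Shipping was slow.",
    "The camera is great but the battery drains fast.",
]
print(summarize(posts))
# ['The new phone has a great camera.', 'Battery life is short.']
```

Without the squaring step, the second pick would be the near-duplicate "Great camera on the new phone!". With it, the summary covers a second topic instead.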
Abstract:
This doctoral dissertation proposes the description, interpretation, and analysis of the compositional structure of thesis and dissertation abstracts, with regard to the linguistic mechanisms that signal text zones of different typological sequences, such as those of the text plan. Along these lines, the research problem was developed from the notion of compositional structure (sequences and text plans) as one of the levels or plans of text analysis, following the theoretical framework proposed by Jean-Michel Adam (2011a). The main objective of this study was to recognize how the compositional structure of thesis and dissertation abstracts is achieved, with respect to text units and the global organization of this text category. The hypothesis of this research is that abstracts require specific categories of informational text composition in order to represent the original text and the way in which it makes meaning. Accordingly, this study draws on the theoretical and methodological framework of Text Linguistics (TL) and, above all, Textual Discourse Analysis (TDA), as we endeavor to understand the organizational structure of abstracts from both a linguistic and a textual perspective. This structure involves the text plan of abstracts, with respect to their communicative purpose, i.e., the sharing of scientific information in its standard textual form. Thus, from a theoretical and methodological perspective, the study is based on the theoretical and descriptive premises of TDA (ADAM, 2011a, 2012; PASSEGGI et al., 2010) and of TL (BEAUGRANDE; DRESSLER, 2012 [1981]; COSERIU; LAMAS, 2010; MARCUSCHI, 2009 [1983]; FÁVERO; KOCH, 1994; KOCH, 2006; BENTES, 2004; BENTES; LEITE, 2010), within the field of text studies. The methodology relies on empirical, documentary research that is qualitative and adopts a descriptive and interpretive approach.
From the empirical perspective, our objective is to understand the problems pertaining to the textual composition of abstracts and to elucidate them in light of the theoretical and methodological framework mentioned above. The corpus consists of seven abstracts selected for systematic data collection. These texts, written between 2004 and 2011, were selected from the electronic versions of Master's theses and Doctoral dissertations from the graduate program at the Federal University of Rio Grande do Norte. A thorough review of the literature reveals a clear fluctuation in the terminology for the concept ‘abstract’. The analysis revealed that the abstracts in the corpus generally present typological heterogeneity, while the text plan remains fixed. Finally, the knowledge gained in this research contributes to understanding both the compositional structure of abstracts and their production.
Abstract:
This thesis investigates the strategies through which novice researchers in Linguistics materialize the non-assumption of enunciation responsibility and inscribe an authorial voice in scientific articles. The specific focus is to identify, describe, and interpret: i) linguistic marks that assign enunciation responsibility; ii) the positions taken by the first speaker-enunciator (L1/E1) in relation to points of view (PoV) imputed to second enunciators (e2); and iii) the linguistic marks that signal the formulation of the author's own PoV. As a practical application, we discuss how to teach text-discursive strategies related to enunciation responsibility and authorship in academic and scientific texts. Our research corpus consists of eight scientific articles selected from a renowned Linguistics journal that is highly rated by Qualis/CAPES (Brazil Science Agency). The methodology follows the assumptions of qualitative research and has an interpretive basis, although it is also supported by a quantitative approach. Theoretically, the research draws on Textual Discourse Analysis and on linguistic theories of enunciation. The results show two kinds of movement in PoV management: imputation and responsibility. In imputation contexts, the most recurrent linguistic marks were reported speech, indirect speech, reported speech with “that”, and modalization in reported speech (with expressions such as “according to”, “in agreement with”, and “for”). In addition, we observed certain points of non-coincidence in discourse, specifically the non-coincidence of discourse with itself. The way these linguistic marks occur in the texts points to three enunciative positions assumed by L1/E1 in relation to the PoV of e2: agreement, disagreement, and pseudo-neutrality.
Imputation followed by agreement (explicit or not) was clearly recurrent; this strategy uses others' voices to defend a discourse that the author assumes as his or her own. In contexts of responsibility, we observed the formulation of the authors' own PoV. These arise either from theoretical findings reached by the novice researchers (revealing how they interpreted the theory's concepts) or from their research data, and they allow the researchers to express themselves more autonomously, without recourse to the discourse of e2. Based on these data, we can say that in texts by novice researchers, authorship is strongly built upon PoV and dependent on others' words (the theory and the scholars it quotes). This is shown by the many contexts in which we observe a position of agreement; PoV formulated with words taken from e2 and assumed as one's own through syntactic integration; comments on what the other says; the absence of explanations and additions; and data analysis that shows agreement with the theory used to support the work. These results allow us to see how novice researchers dialogue with the theoretical enunciative sources they rely on. They also show how these researchers display their status as research subjects and position themselves as researchers/authors in the scientific field. Reported speech in quotation can be treated as a resource that both assigns enunciation responsibility and makes evident the speaker-enunciator's positions toward reported PoV. This suggests a textual-discursive treatment of quoting in academic and scientific texts, within a teaching context that attends to the development of novice researchers' communication skills and can help students enter and interact in the scientific field.
Abstract:
Visual cluster analysis provides valuable tools that help analysts understand large data sets in terms of representative clusters and the relationships between them. Often, the clusters found must be understood in the context of categorical, numerical, or textual metadata associated with the data elements. Although such metadata are often not part of the clustering process, they play an important role and need to be considered during interactive cluster exploration. Traditionally, linked views allow analysts to relate (or, loosely speaking, correlate) clusters with metadata or other properties of the underlying cluster data. Manually inspecting the distribution of metadata for each cluster in a linked-view approach is tedious, especially for large data sets, where a large search problem arises. A fully interactive search for potentially useful or interesting cluster-to-metadata relationships can be a long and cumbersome process. To remedy this problem, we propose a novel approach for guiding users in discovering interesting relationships between clusters and associated metadata. Its goal is to guide the analyst through the potentially huge search space. In this work we focus on categorical metadata, which can be summarized for a cluster in the form of a histogram. Starting from a given visual cluster representation, we compute measures of interestingness defined on the distribution of metadata categories for the clusters. These measures are used to automatically score and rank the clusters by potential interestingness with respect to the distribution of categorical metadata. Identified interesting relationships are highlighted in the visual cluster representation for easy inspection by the user. We present a system implementing a comprehensive yet extensible set of interestingness scores for categorical metadata, which can also be extended to numerical metadata.
Appropriate visual representations are provided for showing the visual correlations as well as the calculated ranking scores. Focusing on clusters of time series data, we test our approach on a large real-world data set of time-oriented scientific research data. We demonstrate how specific interesting views are identified automatically, supporting the analyst in discovering interesting and visually understandable relationships.
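The abstract does not specify which interestingness measures the system implements. As an illustration only, the sketch below scores each cluster by the Kullback-Leibler divergence between the histogram of its categorical metadata and the histogram over the whole data set, then ranks clusters by that score. A cluster whose category mix matches the data set scores 0, and the score grows as the cluster's distribution departs from the global one. The choice of measure, the function names, and the toy clusters are assumptions.

```python
import math
from collections import Counter


def kl_interestingness(cluster_labels, all_labels):
    """KL divergence D(P_cluster || P_all) between category histograms.

    0 means the cluster's metadata are distributed like the whole data set;
    larger values mean the cluster's distribution departs further from it.
    """
    cluster_counts = Counter(cluster_labels)
    all_counts = Counter(all_labels)
    n_cluster, n_all = len(cluster_labels), len(all_labels)
    score = 0.0
    for category, count in cluster_counts.items():
        p = count / n_cluster
        q = all_counts[category] / n_all  # > 0 because the cluster is part of the data
        score += p * math.log(p / q)
    return score


def rank_clusters(clusters):
    """Return (cluster, score) pairs sorted from most to least interesting."""
    all_labels = [label for labels in clusters.values() for label in labels]
    scores = {name: kl_interestingness(labels, all_labels)
              for name, labels in clusters.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical clusters of time series, each element tagged with a research field.
clusters = {
    "C1": ["physics"] * 4 + ["biology"] * 4,
    "C2": ["physics"] * 6 + ["biology"] * 2,
    "C3": ["biology"] * 6 + ["chemistry"] * 2,
}
for name, score in rank_clusters(clusters):
    print(name, round(score, 3))
# C3 0.579
# C2 0.268
# C1 0.091
```

C3 ranks first because a quarter of its elements carry "chemistry", a category that makes up only 2 of the 24 elements overall. An extended system could compute several such scores per cluster and let the analyst choose which ranking to display.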
Abstract:
This is an exploratory-descriptive study with a qualitative approach. It aimed to analyze the messages about sexual and reproductive health promotion produced by adolescents from public and private schools in the city of Rio Grande. The messages were entries in an essay and song contest promoted in 2007 and 2008 by the Municipal Management Group (Grupo Gestor Municipal, GGM) of the Health and Prevention in Schools Project (Projeto Saúde e Prevenção nas Escolas, SPE). After the GGM authorized the study, the 29 essays and three song lyrics entered in the contests were made available for photocopying. The data were processed using thematic content analysis. Thirty-five adolescents took part (25 girls and ten boys), aged between eleven and seventeen. Regarding schooling, two were in the fifth grade, twelve in the sixth, twelve in the seventh, and nine in the eighth. In their texts, the adolescents revealed both vulnerabilities and strengths related to sexual and reproductive health. Among the many factors that increase individual, social, and programmatic vulnerability, they discussed a lack of information and difficulty in putting knowledge into practice. They also discussed a sense of immunity, family violence, repressive behavior by fathers and mothers, sexual messages conveyed by the media, the need to be accepted by the group, prejudice, and a lack of government actions aimed at adolescents. As for strengths, they know that information is an important ally in promoting sexual and reproductive health, citing public health services, family, and school among the accessible sources. They showed awareness of the alarming spread of the AIDS epidemic among young people and knew the signs and symptoms of the most common STDs and how to prevent them.
The girls stressed the need to share responsibility for prevention with the boys, as well as the need for self-love and mutual respect. Access to health services was also presented as indispensable to a healthy adolescence. The young people showed knowledge about drugs, their effects, and their consequences. They described adolescence as an enjoyable period, full of doubts but also full of potential. Thus, the same components presented as triggers of vulnerability can make them strong and able to overcome the challenges common to this stage of life. Several conditions, among other strengthening strategies, are necessary for this to happen. Adolescents need access to information that they can question and incorporate into their daily lives by adopting protected and protective practices. There must be dialogue in the family environment, free of taboos, censorship, and prejudice. Schools should address sexual and reproductive health as a cross-cutting theme, and health services should have the infrastructure to guarantee the rights set out in the Statute of Children and Adolescents (Estatuto da Criança e do Adolescente).
Abstract:
Increasing the size of the training data has proven very effective in many computer vision tasks. With large-scale image datasets (e.g., ImageNet), simple learning techniques (e.g., linear classifiers) can achieve state-of-the-art performance in object recognition, outperforming sophisticated learning techniques trained on smaller image sets. Semantic search on visual data has become very popular: there are billions of images on the internet, and the number grows every day. Large-scale image sets are inherently demanding to work with. They take so much memory that processing the images with complex algorithms on single-CPU machines is infeasible. Finding an efficient image representation can be key to addressing this problem. Efficiency alone, however, is not enough for image understanding: the representation should also be comprehensive and rich in semantic information. In this proposal we develop an approach to computing binary codes that provide a rich and efficient image representation. We demonstrate several tasks in which binary features can be very effective, and we show how they can speed up large-scale image classification. We present techniques for learning binary features from supervised image sets, with different types of semantic supervision: class labels and textual descriptions. We also identify several problems that are important for finding and using efficient image representations.
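The abstract does not describe how its binary codes are learned. For intuition only, the sketch below uses a different, unsupervised technique: random-hyperplane hashing, a locality-sensitive hashing scheme for cosine similarity. It illustrates what a compact binary representation buys. Each feature vector becomes a short bit string stored in an integer, and comparing two images reduces to an XOR and a bit count (Hamming distance). The feature vectors and function names are hypothetical; a learned method such as the one proposed would fit the projections to supervised data instead of drawing them at random.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in R^dim (fixed seed for repeatability)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def binary_code(features, hyperplanes):
    """Encode a feature vector as an int whose bit k is 1 when the vector lies
    on the non-negative side of hyperplane k (sign of the dot product)."""
    code = 0
    for k, plane in enumerate(hyperplanes):
        if sum(f * w for f, w in zip(features, plane)) >= 0:
            code |= 1 << k
    return code


def hamming(code_a, code_b):
    """Number of bit positions in which two codes differ."""
    return bin(code_a ^ code_b).count("1")


a = [0.5, -1.0, 2.0, 0.25]   # hypothetical image feature vectors
b = [0.6, -0.9, 1.8, 0.30]
planes = make_hyperplanes(dim=4, n_bits=16, seed=42)
code_a, code_b = binary_code(a, planes), binary_code(b, planes)
print(f"{code_a:016b}", f"{code_b:016b}", hamming(code_a, code_b))
```

Each of the 16 codes fits in two bytes, whatever the original feature dimension. For random hyperplanes, the probability that a given bit differs between two vectors is θ/π, where θ is the angle between them. The Hamming distance is therefore a cheap estimate of angular (cosine) distance.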