5 results for Text summarization

in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo


Relevance: 30.00%

Abstract:

The realization that statistical physics methods can be applied to analyze written texts represented as complex networks has led to several developments in natural language processing, including automatic summarization and evaluation of machine translation. Most importantly, so far only a few metrics of complex networks have been used, and therefore there is ample opportunity to enhance statistics-based methods as new measures of network topology and dynamics are created. In this paper, we employ, for the first time, the betweenness, vulnerability and diversity metrics to analyze written texts in Brazilian Portuguese. Using strategies based on diversity metrics, we achieve better performance in automatic summarization than previous work employing complex networks. With an optimized method, the Rouge score (an automatic evaluation method used in summarization) was 0.5089, the best value ever achieved for an extractive summarizer with statistical methods based on complex networks for Brazilian Portuguese. Furthermore, the diversity metric can detect keywords with high precision, which is why we believe it is suitable for producing good summaries. It is also shown that incorporating linguistic knowledge through a syntactic parser does enhance the performance of the automatic summarizers, as expected, but the increase in the Rouge score is only minor. These results reinforce the suitability of complex network methods for improving automatic summarizers in particular, and for treating text in general. (C) 2011 Elsevier B.V. All rights reserved.
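The abstract names betweenness among the network metrics but gives no implementation details. The following is a minimal sketch of the general idea, not the authors' method: it assumes a word-adjacency network (an edge joins two content words that appear next to each other in a sentence after stopword removal), computes betweenness with Brandes' algorithm, and scores each sentence by the mean betweenness of its words. The stopword list, the mean-based scoring and the function names (`tokenize`, `build_adjacency_network`, `betweenness`, `summarize`) are hypothetical choices made for illustration. The diversity metric, which the abstract reports as the best performer, is not reproduced here because its definition is not given.

```python
import re
from collections import deque

# Hypothetical, illustrative Portuguese stopword list (far from complete).
STOPWORDS = {"a", "o", "as", "os", "e", "de", "da", "do", "em", "um", "uma", "que", "para"}


def tokenize(sentence):
    # re's \w is Unicode-aware in Python 3, so accented letters are kept.
    return [w for w in re.findall(r"\w+", sentence.lower()) if w not in STOPWORDS]


def build_adjacency_network(sentences):
    """Undirected word network: nodes are distinct content words, and an edge
    joins two words that are adjacent in some sentence after stopword removal."""
    graph = {}
    for sentence in sentences:
        words = tokenize(sentence)
        for w in words:
            graph.setdefault(w, set())
        for a, b in zip(words, words[1:]):
            if a != b:  # skip self-loops from repeated words
                graph[a].add(b)
                graph[b].add(a)
    return graph


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, preds = [], {v: [] for v in graph}
        sigma, dist = dict.fromkeys(graph, 0), dict.fromkeys(graph, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of non-increasing distance from s.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair of endpoints was counted once from each side.
    return {v: c / 2.0 for v, c in bc.items()}


def summarize(sentences, k):
    """Return the k sentences with the highest mean word betweenness, in original order."""
    bc = betweenness(build_adjacency_network(sentences))

    def score(sentence):
        words = tokenize(sentence)
        return sum(bc[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

Averaging rather than summing word scores keeps long sentences from winning on length alone; whether the original work normalizes this way is not stated in the abstract. Swapping `betweenness` for another node metric leaves the rest of the pipeline unchanged.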

Relevance: 20.00%

Abstract:

We review recent visualization techniques aimed at supporting tasks that require the analysis of text documents, from approaches targeted at visually summarizing the relevant content of a single document to those aimed at assisting exploratory investigation of whole collections of documents. Techniques are organized considering their target input material (either single texts or collections of texts) and their focus, which may be on displaying content, emphasizing relevant relationships, highlighting the temporal evolution of a document or collection, or helping users to handle results from a query posed to a search engine. We describe the approaches adopted by distinct techniques and briefly review the strategies they employ to obtain meaningful text models, discuss how they extract the information required to produce representative visualizations, the tasks they intend to support and the interaction issues involved, and their strengths and limitations. Finally, we show a summary of techniques, highlighting their goals and distinguishing characteristics. We also briefly discuss some open problems and research directions in the fields of visual text mining and text analytics.

Relevance: 20.00%

Abstract:

Assuming that textbooks give literary expression to the cultural and ideological values of a nation or group, we propose an analysis of chemistry textbooks used in Brazilian universities throughout the twentieth century. We analyzed iconographic and textual aspects of 31 textbooks that were widely used in Brazilian universities during that period. As a result of the iconographic analysis, nine categories of images were proposed: (1) laboratory and experimentation, (2) industry and production, (3) graphs and diagrams, (4) illustrations related to daily life, (5) models, (6) illustrations related to the history of science, (7) pictures or diagrams of animal, vegetable or mineral samples, (8) analogies and (9) concepts of physics. The distribution of images among the categories showed different emphases in the presentation of chemical content, reflecting a commitment to different conceptions of chemistry over the period: chemistry was presented as an experimental science in the early twentieth century, the emphasis shifted to the principles of chemistry from the 1950s onward, and the period culminated in a chemistry of undeniable technological influence. The results show that reflecting not only on the history of science but also on the history of science education may be useful for improving science education.

Relevance: 20.00%

Abstract:

Background: In normal aging, a decrease in the syntactic complexity of written production is usually associated with cognitive deficits. This study aimed to analyze the quality of older adults' textual production, as indicated by verbal fluency (number of words) and grammatical complexity (number of ideas), in relation to gender, age, schooling, and cognitive status. Methods: From a probabilistic sample of community-dwelling people aged 65 years and above (n = 900), 577 were selected on the basis of their responses to the Mini-Mental State Examination (MMSE) sentence-writing item, which were submitted to content analysis; 323 were excluded because they left the item blank or gave illegible or meaningless responses. Education-adjusted cut-off scores for the MMSE were used to classify the participants as cognitively impaired or unimpaired. Total and subdomain MMSE scores were computed. Results: 40.56% of the participants whose answers to the MMSE sentence item were excluded from the analyses had cognitive impairment, compared with 13.86% of those whose answers were included. The excluded participants were older and less educated. Women and those older than 80 years had the lowest MMSE scores. There was no statistically significant relationship between gender, age, schooling, and textual performance. There was a modest but significant correlation between the number of words written and the scores in the Language subdomain. Conclusions: Results suggest a strong influence of schooling and age on MMSE sentence performance. Failing to write a sentence may suggest cognitive impairment; however, the instructions for the MMSE sentence item (i.e., to produce a simple sentence) may limit its clinical interpretation.

Relevance: 20.00%

Abstract:

Background: The integration of sequencing and gene interaction data, and the subsequent generation of pathways and networks contained in databases such as KEGG Pathway, is essential for understanding complex biological processes. We noticed the absence of a chart or pathway describing the well-studied preimplantation development stages; furthermore, not all genes involved in the process have entries in KEGG Orthology, information that is important for applying this knowledge to other organisms. Results: In this work we sought to develop the regulatory pathway for the preimplantation development stage using text-mining tools such as Medline Ranker and PESCADOR to reveal biointeractions among the genes involved in this process. The genes present in the resulting pathway were also used as seeds for software developed by our group, called SeedServer, to create clusters of homologous genes. These homologues allowed the last common ancestor of each gene to be determined and revealed that the preimplantation development pathway consists of a conserved, ancient core of genes with the addition of modern elements. Conclusions: The generation of regulatory pathways through text-mining tools allows the integration of data generated by several studies for a more complete visualization of complex biological processes. Using the genes in this pathway as “seeds” for generating clusters of homologues, the pathway can be visualized for other organisms. Clustering homologous genes together with determining their ancestry leads to a better understanding of the evolution of this process.
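The abstract states that homologue clusters were used to determine the last common ancestor of each gene, without describing how. One standard way to reason about this, shown here as a sketch under stated assumptions rather than SeedServer's actual procedure, is to take the species in which a gene has homologues and find their lowest common ancestor in a species tree. The toy taxonomy `PARENT` and the helpers `lineage` and `last_common_ancestor` are hypothetical.

```python
# Hypothetical toy species tree (child -> parent); not SeedServer's taxonomy.
PARENT = {
    "Homo sapiens": "Mammalia",
    "Mus musculus": "Mammalia",
    "Mammalia": "Vertebrata",
    "Danio rerio": "Vertebrata",
    "Vertebrata": "Metazoa",
    "Drosophila melanogaster": "Metazoa",
    "Metazoa": "Eukaryota",
    "Saccharomyces cerevisiae": "Eukaryota",
}


def lineage(taxon, parent):
    """Path from a taxon up to the root of the tree, inclusive."""
    path = [taxon]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def last_common_ancestor(taxa, parent):
    """Deepest node shared by the lineages of all taxa where homologues were found."""
    taxa = list(taxa)
    if not taxa:
        raise ValueError("need at least one taxon")
    shared = set(lineage(taxa[0], parent))
    for taxon in taxa[1:]:
        shared &= set(lineage(taxon, parent))
    # Walking up from any one taxon, the first shared node is the deepest one.
    for node in lineage(taxa[0], parent):
        if node in shared:
            return node
    raise ValueError("taxa do not share a common ancestor in this tree")
```

Under this reading, a gene whose homologues reach yeast would be dated to Eukaryota (part of an ancient core), while one found only in mouse and human would be dated to Mammalia (a more modern element).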