1000 results for summarization evaluation
Abstract:
This dissertation applies statistical methods to the evaluation of automatic summarization using data from the Text Analysis Conferences in 2008-2011. Several aspects of the evaluation framework itself are studied, including the statistical testing used to determine significant differences, the assessors, and the design of the experiment. In addition, a family of evaluation metrics is developed to predict the score an automatically generated summary would receive from a human judge and its results are demonstrated at the Text Analysis Conference. Finally, variations on the evaluation framework are studied and their relative merits considered. An over-arching theme of this dissertation is the application of standard statistical methods to data that does not conform to the usual testing assumptions.
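The statistical testing mentioned above is typically a paired significance test over per-topic scores. A minimal sketch of one common option, a paired bootstrap test, is shown below; the input scores are hypothetical and not the TAC data used in the dissertation.

```python
import random

def paired_bootstrap(scores_a, scores_b, trials=10000, seed=0):
    """Estimate how often system A outscores system B when topics are resampled.

    scores_a, scores_b: per-topic evaluation scores for two systems, aligned by
    topic (hypothetical inputs for illustration).
    """
    rng = random.Random(seed)
    n = len(scores_a)
    observed = sum(a - b for a, b in zip(scores_a, scores_b)) / n
    wins = 0
    for _ in range(trials):
        idx = [rng.randrange(n) for _ in range(n)]
        diff = sum(scores_a[i] - scores_b[i] for i in idx) / n
        if diff > 0:
            wins += 1
    # A fraction near 1.0 (or 0.0) suggests the observed difference is unlikely
    # to be an artifact of which topics happened to be sampled.
    return observed, wins / trials

# Hypothetical per-topic scores for two summarizers
diff, frac = paired_bootstrap([0.42, 0.38, 0.51, 0.47], [0.40, 0.39, 0.45, 0.44])
```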
Abstract:
A difficulty in the design of automated text summarization algorithms lies in their objective evaluation. Viewing summarization as a tradeoff between length and information content, we introduce a technique based on a hierarchy of classifiers to rank, through model selection, different summarization methods. This summary evaluation technique allows for broader comparison of summarization methods than traditional evaluation techniques. We present an empirical study of two simple, albeit widely used, summarization methods that shows the different usages of this automated task-based evaluation system and confirms the results obtained with human-based evaluation methods over smaller corpora.
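One way to read the classifier-based ranking idea: summaries produced by each method are used as input to a downstream text classifier, and methods whose summaries support better held-out accuracy are ranked higher. A minimal sketch with scikit-learn follows; the corpus, labels, and summarizer names are hypothetical, and this is not the paper's actual hierarchy of classifiers.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

def task_based_score(summaries, labels):
    """Score a summarization method by how well a classifier trained on its
    summaries predicts the document labels (proxy task for illustration)."""
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    return cross_val_score(clf, summaries, labels, cv=3).mean()

# Hypothetical usage: summaries_by_method maps each summarizer's name to the
# summaries it produced for a labelled document collection.
# ranking = sorted(summaries_by_method,
#                  key=lambda m: task_based_score(summaries_by_method[m], labels),
#                  reverse=True)
```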
Abstract:
The realization that statistical physics methods can be applied to analyze written texts represented as complex networks has led to several developments in natural language processing, including automatic summarization and evaluation of machine translation. So far, however, only a few complex-network metrics have been used, so there is ample opportunity to enhance these statistics-based methods as new measures of network topology and dynamics are created. In this paper, we employ for the first time the metrics betweenness, vulnerability and diversity to analyze written texts in Brazilian Portuguese. Using strategies based on diversity metrics, better performance in automatic summarization is achieved in comparison to previous work employing complex networks. With an optimized method, the ROUGE score (an automatic evaluation measure used in summarization) was 0.5089, the best value yet achieved for an extractive summarizer using statistical methods based on complex networks for Brazilian Portuguese. Furthermore, the diversity metric can detect keywords with high precision, which is why we believe it is suitable for producing good summaries. It is also shown that incorporating linguistic knowledge through a syntactic parser does enhance the performance of the automatic summarizers, as expected, but the increase in the ROUGE score is only minor. These results reinforce the suitability of complex-network methods for improving automatic summarizers in particular, and for processing text in general.
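A minimal sketch of the general network-based approach, not the paper's diversity metric or its Portuguese preprocessing: sentences become nodes, edges link sentences that share words, and a centrality measure such as betweenness ranks sentences for extraction.

```python
import itertools
import networkx as nx

def extract_summary(sentences, n_select=3):
    """Rank sentences by betweenness centrality in a word-overlap graph and
    return the top-ranked ones in document order (illustrative only)."""
    tokens = [set(s.lower().split()) for s in sentences]
    g = nx.Graph()
    g.add_nodes_from(range(len(sentences)))
    for i, j in itertools.combinations(range(len(sentences)), 2):
        overlap = len(tokens[i] & tokens[j])
        if overlap:
            # betweenness treats edge weight as a distance, so use inverse overlap
            g.add_edge(i, j, weight=1.0 / overlap)
    centrality = nx.betweenness_centrality(g, weight="weight")
    top = sorted(sorted(centrality, key=centrality.get, reverse=True)[:n_select])
    return [sentences[i] for i in top]
```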
Abstract:
Research in psychology has reported that, among the variety of possible assessment methodologies, summary evaluation offers a particularly suitable context for inferring text comprehension and topic understanding. However, grades obtained with this methodology are hard to quantify objectively. Therefore, we carried out an empirical study to analyze the decisions underlying human summary-grading behavior. The task consisted of expert evaluation of summaries produced in critically relevant contexts of summarization development, and the resulting data were modeled by means of Bayesian networks using an application called Elvira, which allows the predictive power (if any) of the resulting variables to be observed graphically. Thus, in this article, we analyze summary-evaluation decision making within a computational framework.
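Elvira itself is a Java tool, but the core modelling step can be illustrated with a small sketch that estimates a conditional probability table for a grade variable from discretised summary features; the feature names and data here are hypothetical, not the study's actual network.

```python
import pandas as pd

# Hypothetical grading data: discretised summary features plus the expert grade.
data = pd.DataFrame({
    "coverage":  ["high", "high", "low", "low", "high", "low"],
    "coherence": ["high", "low",  "low", "high", "high", "low"],
    "grade":     ["pass", "pass", "fail", "pass", "pass", "fail"],
})

# Conditional probability table P(grade | coverage): the kind of quantity a
# Bayesian network node for "grade" would encode given a "coverage" parent.
cpt = pd.crosstab(data["coverage"], data["grade"], normalize="index")
print(cpt)
```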
Abstract:
The exponential increase of subjective, user-generated content since the birth of the Social Web has led to the need for automatic text processing systems able to extract, process and present relevant knowledge. In this paper, we tackle the Opinion Retrieval, Mining and Summarization task by proposing a unified framework composed of three crucial components (information retrieval, opinion mining and text summarization) that allows the retrieval, classification and summarization of subjective information. An extensive analysis is conducted in which different configurations of the framework are suggested and analyzed, in order to determine which is best and under which conditions. The evaluation carried out and the results obtained show the appropriateness of the individual components, as well as of the framework as a whole. By achieving an improvement of over 10% compared to state-of-the-art approaches in the context of blogs, we conclude that subjective text can be dealt with effectively by means of the proposed framework.
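A minimal sketch of how the three components could be chained, with placeholder callables standing in for the retrieval, opinion-mining, and summarization stages; this is not the paper's actual configuration.

```python
from typing import Callable, List

def opinion_pipeline(
    query: str,
    retrieve: Callable[[str], List[str]],   # information retrieval component
    classify: Callable[[str], str],         # opinion mining: "positive" / "negative" / "objective"
    summarize: Callable[[List[str]], str],  # text summarization component
) -> dict:
    """Chain retrieval, opinion classification and summarization for one query."""
    documents = retrieve(query)
    subjective = [d for d in documents if classify(d) != "objective"]
    return {
        "positive": summarize([d for d in subjective if classify(d) == "positive"]),
        "negative": summarize([d for d in subjective if classify(d) == "negative"]),
    }
```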
Abstract:
This article analyzes the appropriateness of a text summarization system, COMPENDIUM, for generating abstracts of biomedical papers. Two approaches are suggested: an extractive one (COMPENDIUM E), which only selects and extracts the most relevant sentences of the documents, and an abstractive-oriented one (COMPENDIUM E–A), which also faces the challenge of abstractive summarization. This novel strategy combines extracted sentences with pieces of information from the article that have previously been compressed or fused. Specifically, in this article we want to study: i) whether COMPENDIUM produces good summaries in the biomedical domain; ii) which summarization approach is more suitable; and iii) the opinion of real users towards automatic summaries. Therefore, two types of evaluation were performed, quantitative and qualitative, to assess both the information contained in the summaries and user satisfaction. Results show that extractive and abstractive-oriented summaries perform similarly with respect to the information they contain, so both approaches are able to keep the relevant information of the source documents, but the latter is more appropriate from a human perspective when user satisfaction is assessed. This also confirms the suitability of our suggested approach for generating summaries following an abstractive-oriented paradigm.
Abstract:
Automatic Text Summarization has been shown to be useful for Natural Language Processing tasks such as Question Answering or Text Classification, and for other related fields of computer science such as Information Retrieval. Since Geographical Information Retrieval can be considered an extension of the Information Retrieval field, the generation of summaries could be integrated into these systems as an intermediate stage, with the purpose of reducing document length. In this manner, the access time for information searching is improved, while relevant documents are still retrieved. Therefore, in this paper we propose the generation of two types of summaries (generic and geographical), applying several compression rates in order to evaluate their effectiveness in the Geographical Information Retrieval task. The evaluation has been carried out using GeoCLEF as the evaluation framework and following an Information Retrieval perspective, without considering the geo-reranking phase commonly used in these systems. Although single-document summarization has not performed well in general, the slight improvements obtained for some types of the proposed summaries, particularly those based on geographical information, lead us to believe that the integration of Text Summarization with Geographical Information Retrieval may be beneficial; consequently, the experimental set-up developed in this work serves as a basis for further investigation in this field.
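The compression rates mentioned here simply bound summary length relative to the source document. A small sketch of applying one is given below, using a leading-sentence baseline only to illustrate the mechanism, not the summarizers evaluated in the paper.

```python
def compress(sentences, rate=0.2):
    """Keep roughly `rate` of the document's sentences (leading-sentence
    baseline used only to show how a compression rate bounds summary length)."""
    keep = max(1, round(len(sentences) * rate))
    return sentences[:keep]

# A 20% compression rate over a 10-sentence document keeps 2 sentences.
summary = compress([f"sentence {i}" for i in range(10)], rate=0.2)
```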
Abstract:
In recent years, Twitter has become one of the most important microblogging services of the Web 2.0. Among its possible uses, it can be employed to communicate and broadcast information in real time. The goal of this research is to analyze the task of automatic tweet generation from a text summarization perspective in the context of the journalism genre. To achieve this, different state-of-the-art summarizers are selected and employed to produce multi-lingual tweets in two languages (English and Spanish). A wide experimental framework is proposed, comprising the creation of a new corpus, the generation of the automatic tweets, and their assessment through a quantitative and a qualitative evaluation, where informativeness, indicativeness and interest are key criteria that should be ensured in the proposed context. From the results obtained, it was observed that although the original tweets were considered model tweets with respect to their informativeness, they were not among the most interesting ones from a human viewpoint. Therefore, relying only on these tweets may not be the ideal way to communicate news through Twitter, especially if a more personalized and catchy style of news reporting is desired. In contrast, we show that recent text summarization techniques may be more appropriate, reflecting a balance between indicativeness and interest, even if their content differs from the tweets delivered by the news providers.
Abstract:
Text summarization has been studied for over half a century, but traditional methods process texts empirically and neglect the fundamental characteristics and principles of language use and understanding. Automatic summarization is a desirable technique for processing big data. This reference summarizes previous text summarization approaches in a multi-dimensional category space, introduces a multi-dimensional methodology for research and development, unveils the basic characteristics and principles of language use and understanding, investigates some fundamental mechanisms of summarization, studies dimensions of representation, and proposes a multi-dimensional evaluation mechanism. The investigation extends to incorporating pictures into summaries and to the summarization of videos, graphs and pictures, and converges on a general summarization method. Further, some basic behaviors of summarization are studied in the complex cyber-physical-social space. Finally, a creative summarization mechanism is proposed as an effort toward the creative summarization of things, which is an open process of interactions among physical objects, data, people, and systems in cyber-physical-social space through a multi-dimensional lens of semantic computing. These insights can inspire research and development in many computing areas.
Abstract:
While news stories are an important traditional medium to broadcast and consume news, microblogging has recently emerged as a place where people can discuss, disseminate, collect or report information about news. However, the massive amount of information in the microblogosphere makes it hard for readers to keep up with these real-time updates. This is especially a problem when it comes to breaking news, where people are more eager to know “what is happening”. Therefore, this dissertation is intended as an exploratory effort to investigate computational methods to augment human effort when monitoring the development of breaking news on a given topic from a microblog stream by extractively summarizing the updates in a timely manner. More specifically, given an interest in a topic, either entered as a query or presented as an initial news report, a microblog temporal summarization system is proposed to filter microblog posts from a stream with three primary concerns: topical relevance, novelty, and salience. Considering the relatively high arrival rate of microblog streams, a cascade framework consisting of three stages is proposed to progressively reduce the quantity of posts. For each step in the cascade, this dissertation studies methods that improve over current baselines. In the relevance filtering stage, query and document expansion techniques are applied to mitigate sparsity and vocabulary mismatch issues. The use of word embeddings as a basis for filtering is also explored, using unsupervised and supervised modeling to characterize lexical and semantic similarity. In the novelty filtering stage, several statistical ways of characterizing novelty are investigated and ensemble learning techniques are used to integrate results from these diverse techniques. These results are compared with a baseline clustering approach using both standard and delay-discounted measures. In the salience filtering stage, because of the real-time prediction requirement, a method of learning verb phrase usage from past relevant news reports is used in conjunction with some standard measures for characterizing writing quality. Following a Cranfield-like evaluation paradigm, this dissertation includes a series of experiments to evaluate the proposed methods for each step, and for the end-to-end system. New microblog novelty and salience judgments are created, building on existing relevance judgments from the TREC Microblog track. The results point to future research directions at the intersection of social media, computational journalism, information retrieval, automatic summarization, and machine learning.
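A minimal sketch of the cascade idea (relevance stage, then novelty stage; the salience stage is omitted), using word-overlap similarity as a crude stand-in for the query-expansion and embedding models described above.

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity, a simple stand-in for the lexical and
    semantic similarities used in the relevance and novelty stages."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def cascade_filter(stream, query, rel_threshold=0.1, novelty_threshold=0.6):
    """Emit posts that are topically relevant to the query and novel with
    respect to everything already emitted (illustrative cascade only)."""
    emitted = []
    for post in stream:
        if jaccard(post, query) < rel_threshold:      # relevance filtering stage
            continue
        if any(jaccard(post, e) >= novelty_threshold  # novelty filtering stage
               for e in emitted):
            continue
        emitted.append(post)
        yield post
```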
Abstract:
The radiopacity of esthetic root canal posts may impair the assessment of their fit to the root canal when using radiographic images. This study determined in vitro the radiographic density of esthetic root canal posts using digital images. Thirty-six roots of human maxillary canines were assigned to six groups (n = 6 per group): Reforpost (RP); Aestheti-Plus (AP); Reforpost MIX (RPM); D.T. Light Post (LP); Reforpost Radiopaque (RPR); and White Post DC (WP). Standardized digital images of the posts were obtained under different conditions: outside the root canal, inside the canal before and after cementation with luting material, and with a tissue simulator. Analysis of variance was used to compare the mean radiopacity values among the posts outside the root canal and among the posts under the other conditions, and the unpaired t test was used to compare the radiopacity between the posts and the dentin, and between the posts and the root canal space. There was no statistically significant difference in radiopacity between RP and RPM, or between LP and WP. AP posts showed radiopacity values significantly lower than those for dentin. No statistically significant difference was found between the posts (RP and AP) and the root canal space. A statistically significant difference was observed between the luted and non-luted posts; additionally, luted posts with and without the tissue simulator showed no significant differences. Most of the cement-luted posts analyzed in this study were distinguishable from the density of adjacent dentin surfaces, allowing radiographic confirmation of the fit of the post in the canal. The success of esthetic root canal posts depends mainly on the fit of the post within the canal,[1] and the radiopacity of a post allows radiographic imaging to be used to determine that fit, an important factor from a clinical perspective.
Abstract:
Endoscopic endonasal transsphenoidal surgery has gained increasing acceptance by otolaryngologists and neurosurgeons. In many centers throughout the world, this technique is now routinely used for the same indications as the conventional microsurgical technique for pituitary tumors. The aim was to present a surgical experience of consecutive endoscopic endonasal transsphenoidal resections of pituitary adenomas. In this study, consecutive patients with pituitary adenomas submitted to endoscopic endonasal pituitary surgery were evaluated regarding the rate of residual tumor, functional remission, symptom relief, complications, and tumor size. Forty-seven consecutive patients were evaluated; 17 had functioning adenomas, of whom seven had GH-producing tumors, five had Cushing's disease, and five had prolactinomas. Of the functioning adenomas, 12 were macroadenomas and five were microadenomas; 30 cases were non-functioning macroadenomas. Of the patients with functioning adenomas, 87% improved, and 85% of the patients with visual deficits related to optic nerve compression progressed over time. Most of the patients with complaints of headaches improved (76%). Surgical complications occurred in 10% of patients, including two carotid lesions, two cerebrospinal fluid leaks, and one death of a patient with a previous history of complications. Endoscopic endonasal pituitary surgery is a feasible technique, yielding good surgical and functional outcomes with low morbidity.
Abstract:
Revascularization outcome depends on microbial elimination because apical repair will not happen in the presence of infected tissues. This study evaluated the microbial composition of traumatized immature teeth and assessed their reduction during different stages of the revascularization procedures performed with 2 intracanal medicaments. Fifteen patients (7-17 years old) with immature teeth were submitted to the revascularization procedures; they were divided into 2 groups according to the intracanal medicament used: TAP group (n = 7), medicated with a triple antibiotic paste, and CHP group (n = 8), dressed with calcium hydroxide + 2% chlorhexidine gel. Samples were taken before any treatment (S1), after irrigation with 6% NaOCl (S2), after irrigation with 2% chlorhexidine (S3), after intracanal dressing (S4), and after 17% EDTA irrigation (S5). Cultivable bacteria recovered from the 5 stages were counted and identified by means of polymerase chain reaction assay (16S rRNA). Both groups had colony-forming unit counts significantly reduced after S2 (P < .05); however, no significant difference was found between the irrigants (S2 and S3, P = .99). No difference in bacteria counts was found between the intracanal medicaments used (P = .95). The most prevalent bacteria detected were Actinomyces naeslundii (66.67%), followed by Porphyromonas endodontalis, Parvimonas micra, and Fusobacterium nucleatum, which were detected in 33.34% of the root canals. An average of 2.13 species per canal was found, and no statistical correlation was observed between bacterial species and clinical/radiographic features. The microbial profile of infected immature teeth is similar to that of primarily infected permanent teeth. The greatest bacterial reduction was promoted by the irrigation solutions. The revascularization protocols that used the tested intracanal medicaments were efficient in reducing viable bacteria in necrotic immature teeth.
Abstract:
The aim of this clinical study was to determine the efficacy of Uncaria tomentosa (cat's claw) against denture stomatitis (DS). Fifty patients with DS were randomly assigned to 3 groups to receive 2% miconazole, placebo, or 2% U tomentosa gel. DS level was recorded immediately, after 1 week of treatment, and 1 week after treatment. The clinical effectiveness of each treatment was measured using Newton's criteria. Mycologic samples from the palatal mucosa and prosthesis were obtained to determine colony-forming units per milliliter (CFU/mL) and for fungal identification at each evaluation period. Candida species were identified with HiCrome Candida and the API 20C AUX biochemical test. DS severity decreased in all groups (P < .05). A significant reduction in the number of CFU/mL after 1 week (P < .05) was observed for all groups and remained after 14 days (P > .05). C albicans was the most prevalent microorganism before treatment, followed by C tropicalis, C glabrata, and C krusei, regardless of the group and time evaluated. U tomentosa gel had the same effect as 2% miconazole gel. U tomentosa gel is an effective topical adjuvant treatment for denture stomatitis.