994 resultados para Text summarization


Relevância:

60.00% 60.00%

Publicador:

Resumo:

En este artículo presentamos COMPENDIUM, una herramienta de generación de resúmenes de textos modular. Esta herramienta se compone de un módulo central con cinco etapas bien diferenciadas: i) análisis lingüístico; ii) detección de redundancia; iii) identificación del tópico; iv) detección de relevancia; y v) generación del resumen, y una serie de módulos adicionales que permiten incrementar las funcionalidades de la herramienta permitiendo la generación de distintos tipos de resúmenes, como por ejemplo orientados a un tema concreto. Realizamos una evaluación exhaustiva en dos dominios distintos (noticias de prensa y documentos sobre lugares turísticos) y analizamos diferentes tipos de resúmenes generados con COMPENDIUM (mono-documento, multi-documento, genéricos y orientados a un tema). Además, comparamos nuestro sistema con otros sistemas de generación de resúmenes actuales. Los resultados que se obtienen demuestran que la herramienta COMPENDIUM es capaz de generar resúmenes competitivos para los distintos tipos de resúmenes propuestos.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper, we present an innovative topic segmentation system based on a new informative similarity measure that takes into account word co-occurrence in order to avoid the accessibility to existing linguistic resources such as electronic dictionaries or lexico-semantic databases such as thesauri or ontology. Topic segmentation is the task of breaking documents into topically coherent multi-paragraph subparts. Topic segmentation has extensively been used in information retrieval and text summarization. In particular, our architecture proposes a language-independent topic segmentation system that solves three main problems evidenced by previous research: systems based uniquely on lexical repetition that show reliability problems, systems based on lexical cohesion using existing linguistic resources that are usually available only for dominating languages and as a consequence do not apply to less favored languages and finally systems that need previously existing harvesting training data. For that purpose, we only use statistics on words and sequences of words based on a set of texts. This solution provides a flexible solution that may narrow the gap between dominating languages and less favored languages thus allowing equivalent access to information.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In the last decade, large numbers of social media services have emerged and been widely used in people's daily life as important information sharing and acquisition tools. With a substantial amount of user-contributed text data on social media, it becomes a necessity to develop methods and tools for text analysis for this emerging data, in order to better utilize it to deliver meaningful information to users. ^ Previous work on text analytics in last several decades is mainly focused on traditional types of text like emails, news and academic literatures, and several critical issues to text data on social media have not been well explored: 1) how to detect sentiment from text on social media; 2) how to make use of social media's real-time nature; 3) how to address information overload for flexible information needs. ^ In this dissertation, we focus on these three problems. First, to detect sentiment of text on social media, we propose a non-negative matrix tri-factorization (tri-NMF) based dual active supervision method to minimize human labeling efforts for the new type of data. Second, to make use of social media's real-time nature, we propose approaches to detect events from text streams on social media. Third, to address information overload for flexible information needs, we propose two summarization framework, dominating set based summarization framework and learning-to-rank based summarization framework. The dominating set based summarization framework can be applied for different types of summarization problems, while the learning-to-rank based summarization framework helps utilize the existing training data to guild the new summarization tasks. In addition, we integrate these techneques in an application study of event summarization for sports games as an example of how to better utilize social media data. ^

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In the last decade, large numbers of social media services have emerged and been widely used in people's daily life as important information sharing and acquisition tools. With a substantial amount of user-contributed text data on social media, it becomes a necessity to develop methods and tools for text analysis for this emerging data, in order to better utilize it to deliver meaningful information to users. Previous work on text analytics in last several decades is mainly focused on traditional types of text like emails, news and academic literatures, and several critical issues to text data on social media have not been well explored: 1) how to detect sentiment from text on social media; 2) how to make use of social media's real-time nature; 3) how to address information overload for flexible information needs. In this dissertation, we focus on these three problems. First, to detect sentiment of text on social media, we propose a non-negative matrix tri-factorization (tri-NMF) based dual active supervision method to minimize human labeling efforts for the new type of data. Second, to make use of social media's real-time nature, we propose approaches to detect events from text streams on social media. Third, to address information overload for flexible information needs, we propose two summarization framework, dominating set based summarization framework and learning-to-rank based summarization framework. The dominating set based summarization framework can be applied for different types of summarization problems, while the learning-to-rank based summarization framework helps utilize the existing training data to guild the new summarization tasks. In addition, we integrate these techneques in an application study of event summarization for sports games as an example of how to better utilize social media data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

To detect and annotate the key events of live sports videos, we need to tackle the semantic gaps of audio-visual information. Previous work has successfully extracted semantic from the time-stamped web match reports, which are synchronized with the video contents. However, web and social media articles with no time-stamps have not been fully leveraged, despite they are increasingly used to complement the coverage of major sporting tournaments. This paper aims to address this limitation using a novel multimodal summarization framework that is based on sentiment analysis and players' popularity. It uses audiovisual contents, web articles, blogs, and commentators' speech to automatically annotate and visualize the key events and key players in a sports tournament coverage. The experimental results demonstrate that the automatically generated video summaries are aligned with the events identified from the official website match reports.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Query focused summarization is the task of producing a compressed text of original set of documents based on a query. Documents can be viewed as graph with sentences as nodes and edges can be added based on sentence similarity. Graph based ranking algorithms which use 'Biased random surfer model' like topic-sensitive LexRank have been successfully applied to query focused summarization. In these algorithms, random walk will be biased towards the sentences which contain query relevant words. Specifically, it is assumed that random surfer knows the query relevance score of the sentence to where he jumps. However, neighbourhood information of the sentence to where he jumps is completely ignored. In this paper, we propose look-ahead version of topic-sensitive LexRank. We assume that random surfer not only knows the query relevance of the sentence to where he jumps but he can also look N-step ahead from that sentence to find query relevance scores of future set of sentences. Using this look ahead information, we figure out the sentences which are indirectly related to the query by looking at number of hops to reach a sentence which has query relevant words. Then we make the random walk biased towards even to the indirect query relevant sentences along with the sentences which have query relevant words. Experimental results show 20.2% increase in ROUGE-2 score compared to topic-sensitive LexRank on DUC 2007 data set. Further, our system outperforms best systems in DUC 2006 and results are comparable to state of the art systems.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Summarization is an essential requirement for achieving a more compact and interesting representation of sports video contents. We propose a framework that integrates highlights into play segments and reveal why we should still retain breaks. Experimental results show that fast detections of whistle sounds, crowd excitement, and text boxes can complement existing techniques for play-breaks and highlights localization.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Summarization of cricket videos is very important because of three reasons: 1) its long duration making manual highlights generation tedious 2) less explored area compared to other sports like soccer 3) huge viewership. We propose a novel summarization scheme for cricket which exploits its contextual semantics. First, we detect the bowling frames based on which the video is temporally segmented into individual deliveries. Then each temporal segment representing a delivery is classified into an interesting or non-interesting segment based on detection of events namely boundaries and wickets. Due to the high frequency of ads and replays in cricket, we have proposed robust algorithms for their removal. Finally, we have proposed a finite state automaton based modeling of the temporal segments to extract key-frames. We have also extended the framework to include text cues and expert choices and also developed a hierarchical summary. We have tested our algorithm on several broadcast cricket videos and obtained good results.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This action research study of twenty students in my sixth grade mathematics classroom examines the implementation of summarization strategies. Students were taught how to summarize concepts and how to explain their thinking in different ways to the teacher and their peers. Through analysis of students’ summaries of concepts from lessons that I taught, tests scores, and student journals and interviews, I discovered that summarizing mathematical concepts offers students an engaging opportunity to better understand those concepts and render that understanding more visible to the teacher. This analysis suggests that non-traditional summarization, such as verbal and written strategies, and strategies involving movement and discussions, can be useful in mathematics classrooms to improve student understanding, engagement in learning tasks, and as a form of formative assessment.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The realization that statistical physics methods can be applied to analyze written texts represented as complex networks has led to several developments in natural language processing, including automatic summarization and evaluation of machine translation. Most importantly, so far only a few metrics of complex networks have been used and therefore there is ample opportunity to enhance the statistics-based methods as new measures of network topology and dynamics are created. In this paper, we employ for the first time the metrics betweenness, vulnerability and diversity to analyze written texts in Brazilian Portuguese. Using strategies based on diversity metrics, a better performance in automatic summarization is achieved in comparison to previous work employing complex networks. With an optimized method the Rouge score (an automatic evaluation method used in summarization) was 0.5089, which is the best value ever achieved for an extractive summarizer with statistical methods based on complex networks for Brazilian Portuguese. Furthermore, the diversity metric can detect keywords with high precision, which is why we believe it is suitable to produce good summaries. It is also shown that incorporating linguistic knowledge through a syntactic parser does enhance the performance of the automatic summarizers, as expected, but the increase in the Rouge score is only minor. These results reinforce the suitability of complex network methods for improving automatic summarizers in particular, and treating text in general. (C) 2011 Elsevier B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

by W. Jett Lauck and Edgar Sydenstricker.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The MESA Puget Sound Project is sponsored by the National Oceanic and Atmospheric Administration and the Environmental Protection Agency.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Online Social Network (OSN) services provided by Internet companies bring people together to chat, share the information, and enjoy the information. Meanwhile, huge amounts of data are generated by those services (they can be regarded as the social media ) every day, every hour, even every minute, and every second. Currently, researchers are interested in analyzing the OSN data, extracting interesting patterns from it, and applying those patterns to real-world applications. However, due to the large-scale property of the OSN data, it is difficult to effectively analyze it. This dissertation focuses on applying data mining and information retrieval techniques to mine two key components in the social media data — users and user-generated contents. Specifically, it aims at addressing three problems related to the social media users and contents: (1) how does one organize the users and the contents? (2) how does one summarize the textual contents so that users do not have to go over every post to capture the general idea? (3) how does one identify the influential users in the social media to benefit other applications, e.g., Marketing Campaign? The contribution of this dissertation is briefly summarized as follows. (1) It provides a comprehensive and versatile data mining framework to analyze the users and user-generated contents from the social media. (2) It designs a hierarchical co-clustering algorithm to organize the users and contents. (3) It proposes multi-document summarization methods to extract core information from the social network contents. (4) It introduces three important dimensions of social influence, and a dynamic influence model for identifying influential users.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Online Social Network (OSN) services provided by Internet companies bring people together to chat, share the information, and enjoy the information. Meanwhile, huge amounts of data are generated by those services (they can be regarded as the social media ) every day, every hour, even every minute, and every second. Currently, researchers are interested in analyzing the OSN data, extracting interesting patterns from it, and applying those patterns to real-world applications. However, due to the large-scale property of the OSN data, it is difficult to effectively analyze it. This dissertation focuses on applying data mining and information retrieval techniques to mine two key components in the social media data — users and user-generated contents. Specifically, it aims at addressing three problems related to the social media users and contents: (1) how does one organize the users and the contents? (2) how does one summarize the textual contents so that users do not have to go over every post to capture the general idea? (3) how does one identify the influential users in the social media to benefit other applications, e.g., Marketing Campaign? The contribution of this dissertation is briefly summarized as follows. (1) It provides a comprehensive and versatile data mining framework to analyze the users and user-generated contents from the social media. (2) It designs a hierarchical co-clustering algorithm to organize the users and contents. (3) It proposes multi-document summarization methods to extract core information from the social network contents. (4) It introduces three important dimensions of social influence, and a dynamic influence model for identifying influential users.