2 resultados para Grid-based clustering approach
em DRUM (Digital Repository at the University of Maryland)
Resumo:
While news stories are an important traditional medium to broadcast and consume news, microblogging has recently emerged as a place where people can dis- cuss, disseminate, collect or report information about news. However, the massive information in the microblogosphere makes it hard for readers to keep up with these real-time updates. This is especially a problem when it comes to breaking news, where people are more eager to know “what is happening”. Therefore, this dis- sertation is intended as an exploratory effort to investigate computational methods to augment human effort when monitoring the development of breaking news on a given topic from a microblog stream by extractively summarizing the updates in a timely manner. More specifically, given an interest in a topic, either entered as a query or presented as an initial news report, a microblog temporal summarization system is proposed to filter microblog posts from a stream with three primary concerns: topical relevance, novelty, and salience. Considering the relatively high arrival rate of microblog streams, a cascade framework consisting of three stages is proposed to progressively reduce quantity of posts. For each step in the cascade, this dissertation studies methods that improve over current baselines. In the relevance filtering stage, query and document expansion techniques are applied to mitigate sparsity and vocabulary mismatch issues. The use of word embedding as a basis for filtering is also explored, using unsupervised and supervised modeling to characterize lexical and semantic similarity. In the novelty filtering stage, several statistical ways of characterizing novelty are investigated and ensemble learning techniques are used to integrate results from these diverse techniques. These results are compared with a baseline clustering approach using both standard and delay-discounted measures. In the salience filtering stage, because of the real-time prediction requirement a method of learning verb phrase usage from past relevant news reports is used in conjunction with some standard measures for characterizing writing quality. Following a Cranfield-like evaluation paradigm, this dissertation includes a se- ries of experiments to evaluate the proposed methods for each step, and for the end- to-end system. New microblog novelty and salience judgments are created, building on existing relevance judgments from the TREC Microblog track. The results point to future research directions at the intersection of social media, computational jour- nalism, information retrieval, automatic summarization, and machine learning.
Resumo:
The occurrence frequency of failure events serve as critical indexes representing the safety status of dam-reservoir systems. Although overtopping is the most common failure mode with significant consequences, this type of event, in most cases, has a small probability. Estimation of such rare event risks for dam-reservoir systems with crude Monte Carlo (CMC) simulation techniques requires a prohibitively large number of trials, where significant computational resources are required to reach the satisfied estimation results. Otherwise, estimation of the disturbances would not be accurate enough. In order to reduce the computation expenses and improve the risk estimation efficiency, an importance sampling (IS) based simulation approach is proposed in this dissertation to address the overtopping risks of dam-reservoir systems. Deliverables of this study mainly include the following five aspects: 1) the reservoir inflow hydrograph model; 2) the dam-reservoir system operation model; 3) the CMC simulation framework; 4) the IS-based Monte Carlo (ISMC) simulation framework; and 5) the overtopping risk estimation comparison of both CMC and ISMC simulation. In a broader sense, this study meets the following three expectations: 1) to address the natural stochastic characteristics of the dam-reservoir system, such as the reservoir inflow rate; 2) to build up the fundamental CMC and ISMC simulation frameworks of the dam-reservoir system in order to estimate the overtopping risks; and 3) to compare the simulation results and the computational performance in order to demonstrate the ISMC simulation advantages. The estimation results of overtopping probability could be used to guide the future dam safety investigations and studies, and to supplement the conventional analyses in decision making on the dam-reservoir system improvements. At the same time, the proposed methodology of ISMC simulation is reasonably robust and proved to improve the overtopping risk estimation. The more accurate estimation, the smaller variance, and the reduced CPU time, expand the application of Monte Carlo (MC) technique on evaluating rare event risks for infrastructures.