86 results for Abstractive summarization


Relevance: 20.00%

Abstract:

While news stories are an important traditional medium to broadcast and consume news, microblogging has recently emerged as a place where people can discuss, disseminate, collect or report information about news. However, the massive information in the microblogosphere makes it hard for readers to keep up with these real-time updates. This is especially a problem when it comes to breaking news, where people are more eager to know “what is happening”. Therefore, this dissertation is intended as an exploratory effort to investigate computational methods to augment human effort when monitoring the development of breaking news on a given topic from a microblog stream by extractively summarizing the updates in a timely manner. More specifically, given an interest in a topic, either entered as a query or presented as an initial news report, a microblog temporal summarization system is proposed to filter microblog posts from a stream with three primary concerns: topical relevance, novelty, and salience. Considering the relatively high arrival rate of microblog streams, a cascade framework consisting of three stages is proposed to progressively reduce the quantity of posts. For each step in the cascade, this dissertation studies methods that improve over current baselines. In the relevance filtering stage, query and document expansion techniques are applied to mitigate sparsity and vocabulary mismatch issues. The use of word embeddings as a basis for filtering is also explored, using unsupervised and supervised modeling to characterize lexical and semantic similarity. In the novelty filtering stage, several statistical ways of characterizing novelty are investigated and ensemble learning techniques are used to integrate results from these diverse techniques. These results are compared with a baseline clustering approach using both standard and delay-discounted measures. In the salience filtering stage, because of the real-time prediction requirement, a method of learning verb phrase usage from past relevant news reports is used in conjunction with some standard measures for characterizing writing quality. Following a Cranfield-like evaluation paradigm, this dissertation includes a series of experiments to evaluate the proposed methods for each step, and for the end-to-end system. New microblog novelty and salience judgments are created, building on existing relevance judgments from the TREC Microblog track. The results point to future research directions at the intersection of social media, computational journalism, information retrieval, automatic summarization, and machine learning.
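
To make the cascade idea concrete, the following is a minimal, hedged sketch of a three-stage relevance/novelty/salience filter over a post stream. It uses TF-IDF cosine similarity and made-up thresholds as stand-ins for the expansion, embedding, and ensemble models actually studied in the dissertation; nothing here is the author's implementation.

```python
# Illustrative three-stage cascade for filtering a microblog stream.
# Relevance, novelty and salience are approximated with simple TF-IDF
# heuristics; thresholds and scoring functions are assumptions for the demo.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def cascade_filter(query, posts, rel_thr=0.2, nov_thr=0.7, sal_thr=0.1):
    vec = TfidfVectorizer().fit([query] + posts)
    q = vec.transform([query])
    selected, selected_vecs = [], []
    for post in posts:
        p = vec.transform([post])
        # Stage 1: topical relevance to the query.
        if cosine_similarity(q, p)[0, 0] < rel_thr:
            continue
        # Stage 2: novelty with respect to already-selected updates.
        if selected_vecs and max(cosine_similarity(p, v)[0, 0] for v in selected_vecs) > nov_thr:
            continue
        # Stage 3: salience, crudely proxied here by lexical density.
        if p.nnz / max(len(post.split()), 1) < sal_thr:
            continue
        selected.append(post)
        selected_vecs.append(p)
    return selected

stream = [
    "Earthquake of magnitude 6.1 hits the coast this morning",
    "Magnitude 6.1 quake hits the coast, officials confirm",
    "I really love coffee",
]
print(cascade_filter("earthquake hits coast", stream))
```

Each stage only sees what survived the previous one, which is the point of the cascade: cheap filters first, so later (more expensive) stages run on far fewer posts.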

Relevance: 20.00%

Abstract:

This dissertation applies statistical methods to the evaluation of automatic summarization using data from the Text Analysis Conferences in 2008-2011. Several aspects of the evaluation framework itself are studied, including the statistical testing used to determine significant differences, the assessors, and the design of the experiment. In addition, a family of evaluation metrics is developed to predict the score an automatically generated summary would receive from a human judge and its results are demonstrated at the Text Analysis Conference. Finally, variations on the evaluation framework are studied and their relative merits considered. An over-arching theme of this dissertation is the application of standard statistical methods to data that does not conform to the usual testing assumptions.
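
As a hedged illustration of the kind of paired significance testing such an evaluation framework relies on, the sketch below compares two hypothetical summarizers on per-topic scores with a Wilcoxon signed-rank test; the scores, systems, and the specific tests studied in the dissertation are not taken from the abstract above.

```python
# Minimal sketch: paired significance test over per-topic evaluation scores
# for two hypothetical summarizers A and B (scores are made up).
from scipy.stats import wilcoxon

scores_a = [0.42, 0.38, 0.51, 0.47, 0.35, 0.44, 0.40, 0.49, 0.37, 0.45]
scores_b = [0.40, 0.36, 0.50, 0.44, 0.36, 0.41, 0.39, 0.47, 0.35, 0.42]

stat, p_value = wilcoxon(scores_a, scores_b)
print(f"Wilcoxon statistic = {stat:.2f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Difference between systems is significant at the 0.05 level")
else:
    print("No significant difference detected")
```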

Relevance: 20.00%

Abstract:

Progress in the field of long document summarization depends entirely on the availability of high-quality public datasets with texts of considerable length. It is therefore problematic that such datasets are often available only in English, which is a significant limitation for lower-resource languages. To this end, LAWSU-IT is proposed, a new judicial dataset for Italian long document summarization. LAWSU-IT is the first Italian summarization dataset to contain long documents and to cover the judicial domain, and it was built by applying data cleaning procedures and targeted instance selection, with the goal of obtaining a high-quality long document summarization dataset. In addition, multiple extractive and abstractive experimental baselines are proposed, using state-of-the-art models and text segmentation approaches. It is hoped that this result will lead to further research and development in Italian long document summarization.
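
As a rough, hedged illustration of an extractive baseline combined with text segmentation for long documents (a toy stand-in, not the baselines actually reported for LAWSU-IT), one might split the text into fixed-size chunks and keep the leading sentences of each chunk:

```python
# Illustrative extractive baseline for long documents: split the text into
# fixed-size segments and keep the first sentence(s) of each segment.
# Segment size and sentence splitting are assumptions for the demo.
import re

def segments(sentences, seg_size=10):
    for i in range(0, len(sentences), seg_size):
        yield sentences[i:i + seg_size]

def lead_per_segment_summary(text, seg_size=10, lead_k=1):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    summary = []
    for seg in segments(sentences, seg_size):
        summary.extend(seg[:lead_k])
    return " ".join(summary)

long_doc = " ".join(f"Sentence number {i} of a very long judicial ruling." for i in range(1, 41))
print(lead_per_segment_summary(long_doc, seg_size=10, lead_k=1))
```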

Relevance: 20.00%

Abstract:

This thesis analyzes the problem of soft labeling applied to multi-document summarization; in particular, several techniques are tested for extracting relevant sentences from the documents under consideration, in order to provide the summarization model with the most salient and informative sentences for the summary to be generated. The problem arises from the limitations of currently available summarization models, which can process only a limited number of sentences; it therefore becomes necessary to filter out the most relevant information when working with long documents. To define the importance metric, syntactic and semantic methods as well as methods based on AMR graph representations are taken as reference. The reference dataset is Multi-LexSum, which includes three granularities of summarization of legal texts. The analysis thus consists of extracting sentences from the documents, measuring the chosen metrics, and passing the result to the state-of-the-art PRIMERA model to generate the summary. The generated text is then compared with the provided target summary, which is considered optimal; working under these conditions, the goal is to define optimal upper-bound thresholds for the accuracy of the metrics, which could extend the work towards more detailed analyses should these thresholds exceed the current state of the art.
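
As a hedged sketch of the sentence-filtering step described above, the snippet below scores source sentences against a reference text with TF-IDF cosine similarity and keeps the top-k to fit a summarizer's input budget. The thesis itself uses syntactic, semantic, and AMR-based metrics together with PRIMERA; this stand-in only illustrates the filtering idea, and all names and data are invented.

```python
# Toy sentence filtering for multi-document summarization: rank source
# sentences by TF-IDF cosine similarity to a reference text and keep the
# top-k. A stand-in for the syntactic/semantic/AMR metrics mentioned above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_top_sentences(sentences, reference, k=2):
    vec = TfidfVectorizer().fit(sentences + [reference])
    sent_vecs = vec.transform(sentences)
    ref_vec = vec.transform([reference])
    scores = cosine_similarity(sent_vecs, ref_vec).ravel()
    ranked = sorted(zip(scores, sentences), reverse=True)
    return [s for _, s in ranked[:k]]

sentences = [
    "The court dismissed the class action on procedural grounds.",
    "The weather on the day of the hearing was sunny.",
    "Plaintiffs alleged systemic discrimination in hiring.",
]
reference = "Summary: class action alleging hiring discrimination dismissed."
print(select_top_sentences(sentences, reference, k=2))
```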

Relevance: 10.00%

Abstract:

Master's in Informatics Engineering - Specialization Area in Architectures, Systems and Networks

Relevance: 10.00%

Abstract:

This report concerns the work carried out during the curricular internship that is part of the second year of the Master's in Civil Engineering at the Instituto Superior de Engenharia do Porto. The internship took place at the company Paviazeméis, in both office and construction-site settings, in fulfillment of the requirements of the DIPRE (Dissertation/Project/Internship) course unit. This document seeks to situate the internship and its importance and to present the company where it took place. Paviazeméis is a company essentially dedicated to public road works. One of the awarded contracts included the rehabilitation of a building, which required reinforcing the team in order to meet the proposed expectations. The internship consisted of joining the team responsible for executing the contract, with the objective of supporting the work in question. In addition to situating the internship and the project, a brief presentation of the intervention proposal is given, describing the various designs, the construction planning, and the execution works, together with construction details and site photographs. The various types of on-site control carried out are also referenced and, finally, the closing of the contract.

Relevance: 10.00%

Abstract:

In this article we present a hybrid approach for automatic summarization of Spanish medical texts. There are many systems for automatic summarization based on statistics or on linguistics, but only a few combine both techniques. Our idea is that producing a good summary requires using the linguistic aspects of texts, while also benefiting from the advantages of statistical techniques. We have integrated the Cortex (Vector Space Model) and Enertex (statistical physics) systems, coupled with the Yate term extractor, and the Disicosum (linguistic) system. We have compared these systems and then integrated them into a hybrid approach. Finally, we have applied this hybrid system to a corpus of medical articles and evaluated its performance, obtaining good results.
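
To illustrate the general idea of combining statistical and linguistic evidence into one hybrid score (a generic sketch, not the actual Cortex/Enertex/Yate/Disicosum integration), one can normalize each system's per-sentence scores and take a weighted average:

```python
# Generic sketch of hybrid sentence scoring: min-max normalize the scores
# produced by different systems (e.g. statistical vs. linguistic) and
# combine them with a weighted average. All scores below are made up.
def min_max(scores):
    lo, hi = min(scores), max(scores)
    return [0.5 if hi == lo else (s - lo) / (hi - lo) for s in scores]

def hybrid_scores(score_lists, weights=None):
    n_systems = len(score_lists)
    weights = weights or [1.0 / n_systems] * n_systems
    normalized = [min_max(s) for s in score_lists]
    return [sum(w * norm[i] for w, norm in zip(weights, normalized))
            for i in range(len(score_lists[0]))]

statistical = [3.2, 1.1, 2.7, 0.4]   # e.g. vector-space scores per sentence
linguistic = [0.9, 0.2, 0.8, 0.5]    # e.g. discourse-based scores per sentence
print(hybrid_scores([statistical, linguistic]))
```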

Relevance: 10.00%

Abstract:

The topic of this Master's thesis was to implement software for collecting signal-quality indicators from exchange terminal equipment belonging to the Nokia GSM transmission network. The collection software complemented the base station controller's statistics system so that signal-quality indicators can now be collected centrally from the transmission network of the entire base station system. The signal-quality indicators are based on bit errors observed on the connection between two transmission devices. The terminal devices from which the indicators are collected may be located either in the base station controller or in a second-generation transcoder submultiplexer; a separate measurement type has been implemented in the statistics system for each case. The base station controller's statistics system consists of a measurement management interface, a centralized measurement part, and a distributed part. The collected data is transferred from the centralized part of the statistics system to the network management system for post-processing. This thesis presents the areas of post-processing, which are: removal of corrupted samples, aggregation, prediction, estimation of missing samples, and presentation of the measurement results.
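
As a hedged illustration of the post-processing steps listed above (discarding corrupted samples, estimating missing ones, and aggregating), the following toy sketch processes a made-up series of per-interval error counters; it is not the implementation described in the thesis.

```python
# Toy post-processing of a series of signal-quality samples (e.g. errored
# intervals per 15-minute period): drop corrupted samples, estimate the
# missing ones by interpolation, and aggregate to a coarser period.
import pandas as pd

samples = pd.Series(
    [3, 5, None, 4, 250, 2, None, 6, 4, 3, 5, 2],
    index=pd.date_range("2024-01-01", periods=12, freq="15min"),
)

cleaned = samples.mask(samples > 100)                    # discard obviously corrupted samples
estimated = cleaned.interpolate(limit_direction="both")  # estimate missing samples
hourly = estimated.resample("1h").mean()                 # aggregate to hourly values

print(hourly)
```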

Relevance: 10.00%

Abstract:

This Master's thesis investigated manufacturing and structural possibilities for optimizing the manufacturing costs of an aluminium boat hull, taking into account the applicable hull scantling directive. No explicit cost-saving target was set for the work; rather, the objective was to identify development targets based on the cost drivers of the hull structure and to develop a new hull structure that fulfils the requirements of the scantling directive. The current and the developed hull structures are compared in terms of cost and manufacturability, and conclusions are drawn. The DFM(A) principles and theory of design for manufacturing and assembly, presented in the theoretical part of the work, were applied to identify the development targets. To support the determination of manufacturing costs, the theoretical part also presents cost models for welding and for part manufacturing (laser cutting, press-brake bending), which were applied in the cost calculations where appropriate. A significant part of the study was also the interpretation of the SFS-EN ISO 12215-5-(6) hull scantling standard and its summarization with respect to the areas relevant to the work. Based on the cost analysis and manufacturability assessment of the current hull structure, structural and manufacturing development measures were chosen that fulfil the structural requirements of the scantling directive. The structural measures included, among others, optimization of material thicknesses, the stiffener structure, and the number of parts. The manufacturing measures related, among others, to improving welding productivity and its cost impact. The structural development measures could yield savings of about 6%, and the development of the manufacturing methods savings of about 17%, of the manufacturing costs.

Relevance: 10.00%

Abstract:

The usage of digital content, such as video clips and images, has increased dramatically during the last decade. Local image features have been applied increasingly in various image and video retrieval applications. This thesis evaluates local features and applies them to image and video processing tasks. The results of the study show that 1) the performance of different local feature detector and descriptor methods varies significantly in object class matching, 2) local features can be applied to image alignment with results superior to the state of the art, 3) the local feature based shot boundary detection method produces promising results, and 4) the local feature based hierarchical video summarization method shows a promising new research direction. In conclusion, this thesis presents local features as a powerful tool for many applications, and future work should concentrate on improving the quality of the local features.
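
As a hedged illustration of how local features are typically detected and matched in such applications, here is a generic OpenCV ORB example; it is not one of the specific detector/descriptor methods evaluated in the thesis, and the synthetic images only stand in for real frames.

```python
# Generic local-feature example: detect ORB keypoints in two images and
# match their descriptors with a brute-force Hamming matcher.
import cv2
import numpy as np

# Two synthetic grayscale images (random noise) stand in for real frames.
rng = np.random.default_rng(0)
img1 = rng.integers(0, 256, size=(240, 320), dtype=np.uint8)
img2 = np.roll(img1, shift=5, axis=1)  # horizontally shifted copy

orb = cv2.ORB_create(nfeatures=500)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
if des1 is None or des2 is None:
    raise SystemExit("no keypoints found in one of the images")

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

print(f"keypoints: {len(kp1)} / {len(kp2)}, matches: {len(matches)}")
```

The sorted match distances are what downstream tasks (alignment, shot boundary detection) typically threshold or feed into a robust estimator such as RANSAC.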

Relevance: 10.00%

Abstract:

Microarray technology remains to this day an important tool for measuring gene expression. Beyond the technology itself, the analysis of microarray data is a complex statistical problem, which explains the myriad of methods proposed for preprocessing and, in particular, for differential expression analysis. However, the absence of calibration data or of an appropriate comparison methodology has prevented the emergence of a consensus on the optimal analysis methods. As a consequence, the analyst's decision to choose one method over another is most often made subjectively, based for example on ease of use, software availability, or popularity. This thesis presents a new approach to the problem of comparing differential expression analysis methods. More than 800 analysis pipelines are applied to more than a hundred experiments on two different Affymetrix platforms. The performance of each pipeline is assessed by computing the average level of co-regulation through enrichment scores for several collections of molecular signatures. The proposed comparative approach thus relies on a varied set of relevant biological data, does not confound reproducibility with accuracy, and can easily be applied to new methods. Among the methods tested, the superiority of the FARMS summarization and of the TREAT differential expression statistic is unequivocal. Moreover, the results obtained for the differential expression statistic corroborate the conclusions of other recent studies regarding the importance of taking into account the magnitude of the change in addition to its statistical significance.
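
As a loose, hedged illustration of scoring co-regulation within a gene set (a toy version of an enrichment-style score on simulated data, not the actual metric, signature collections, or pipelines used in the thesis):

```python
# Toy co-regulation score: compare the average absolute pairwise correlation
# of genes inside a gene set against the background of all genes.
# Simulated expression data; illustrative only.
import numpy as np

def mean_abs_corr(expr):
    corr = np.corrcoef(expr)
    upper = np.triu_indices_from(corr, k=1)
    return float(np.abs(corr[upper]).mean())

rng = np.random.default_rng(1)
n_genes, n_samples = 200, 30
expression = rng.normal(size=(n_genes, n_samples))
# Make the first 20 genes co-regulated by adding a shared hidden factor.
expression[:20] += rng.normal(size=n_samples)

gene_set = np.arange(20)
set_score = mean_abs_corr(expression[gene_set])
background = mean_abs_corr(expression)
print(f"gene-set co-regulation: {set_score:.2f}, background: {background:.2f}")
```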

Relevance: 10.00%

Abstract:

Thesis digitized by the Division de la gestion de documents et des archives de l'Université de Montréal.

Relevance: 10.00%

Abstract:

A difficulty in the design of automated text summarization algorithms is objective evaluation. Viewing summarization as a tradeoff between length and information content, we introduce a technique based on a hierarchy of classifiers to rank, through model selection, different summarization methods. This summary evaluation technique allows for a broader comparison of summarization methods than the traditional techniques of summary evaluation. We present an empirical study of two simple, albeit widely used, summarization methods that shows the different usages of this automated task-based evaluation system and confirms the results obtained with human-based evaluation methods over smaller corpora.
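
A rough, hedged sketch of the task-based idea, on synthetic data: summarize documents with two simple methods, train the same text classifier on each set of summaries, and compare held-out accuracy. The corpus, summarizers, classifier hierarchy, and model-selection procedure in the paper are of course different; everything below is invented for illustration.

```python
# Toy task-based evaluation: compare summarization methods by how well a
# classifier trained on their summaries can still recover document labels.
import random
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

random.seed(0)
docs, labels = [], []
for label, topic in [(0, "football match goal team"), (1, "stock market shares profit")]:
    for _ in range(40):
        words = (topic + " filler words about nothing in particular").split()
        sents = [" ".join(random.choices(words, k=8)) + "." for _ in range(10)]
        docs.append(sents)
        labels.append(label)

def lead_k(sents, k=3):
    return " ".join(sents[:k])

def random_k(sents, k=3):
    return " ".join(random.sample(sents, k))

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
for name, summarize in [("lead-3", lead_k), ("random-3", random_k)]:
    summaries = [summarize(d) for d in docs]
    acc = cross_val_score(clf, summaries, labels, cv=5).mean()
    print(f"{name}: mean classification accuracy = {acc:.2f}")
```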

Relevance: 10.00%

Abstract:

Rising healthcare costs are a central concern for complementary health companies in Brazil. In 2011, these expenses consumed more than 80% of the monthly health insurance premiums in Brazil. When administrative costs are also considered, the companies operating in this market work, on average, at the threshold between profit and loss. This paper presents the results of an investigation of the care-related costs of a health plan company in Brazil, based on the KDD process and exploratory data mining. A diversity of results is presented, such as data summarization, providing compact descriptions of the data and revealing common features and intrinsic observations. Among the key findings, it was observed that a small portion of the population is responsible for most of the demand on the resources devoted to health care.
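
To illustrate the kind of concentration finding mentioned above, here is a hedged toy sketch that computes the share of total spending accounted for by the costliest members; the data is simulated, not the company's.

```python
# Toy cost-concentration check: what share of total healthcare spending is
# attributable to the top 10% most expensive members? Simulated data only.
import numpy as np

rng = np.random.default_rng(42)
costs = rng.lognormal(mean=6.0, sigma=1.5, size=10_000)  # skewed, Pareto-like spending

sorted_costs = np.sort(costs)[::-1]
top_10pct = sorted_costs[: len(sorted_costs) // 10]
share = top_10pct.sum() / sorted_costs.sum()
print(f"Top 10% of members account for {share:.0%} of total cost")
```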