Digital Library

868 results for "Big Stream"

Efficient Data Management with Applications to IoT

Relevance: 60.00%

Abstract:

The Internet of Things (IoT) consists of a worldwide “network of networks,” composed of billions of interconnected heterogeneous devices denoted as things or “Smart Objects” (SOs). Significant research efforts have been dedicated to porting the experience gained in the design of the Internet to the IoT, with the goal of maximizing interoperability, using the Internet Protocol (IP) and designing specific protocols such as the Constrained Application Protocol (CoAP), which have been widely accepted as drivers for the effective evolution of the IoT. This first wave of standardization can be considered successfully concluded, and we can assume that communication with and between SOs is no longer an issue. At this time, to favor the widespread adoption of the IoT, it is crucial to provide mechanisms that facilitate IoT data management and the development of services enabling real interaction with things. Several reference IoT scenarios have real-time or predictable latency requirements, with billions of devices collecting and sending an enormous quantity of data. These features create a need for architectures specifically designed to handle this scenario, here denoted as “Big Stream”. In this thesis, a new Big Stream Listener-based Graph architecture is proposed. Another important step is to build more applications around the Web model, bringing about the Web of Things (WoT). As many IoT testbeds have focused on evaluating lower-layer communication aspects, this thesis proposes a new WoT Testbed that allows developers to work at a high level of abstraction, without worrying about low-level details. Finally, an innovative SO-driven User Interface (UI) generation paradigm for mobile applications in heterogeneous IoT networks is proposed, to simplify interactions between users and things.
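The abstract names a listener-based graph architecture but does not describe its internals. Purely as an illustration, assuming that processing nodes subscribe as listeners to the outputs of upstream nodes and that the graph is acyclic, the Python sketch below pushes each incoming item through every downstream listener. The class `ListenerGraph`, its methods and the temperature example are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


class ListenerGraph:
    """Hypothetical sketch: nodes are processing functions, edges are listener subscriptions."""

    def __init__(self):
        self.nodes = {}                      # node name -> processing function
        self.listeners = defaultdict(list)   # source name -> names of subscribed nodes

    def add_node(self, name, fn):
        self.nodes[name] = fn

    def subscribe(self, listener, source):
        # `listener` is notified every time `source` emits a result
        self.listeners[source].append(listener)

    def publish(self, source, item, sink):
        """Propagate `item` from `source` through all downstream listeners (graph assumed acyclic)."""
        for name in self.listeners[source]:
            result = self.nodes[name](item)
            if result is None:  # a node may filter the item out
                continue
            sink.append((name, result))
            self.publish(name, result, sink)


# Example: raw Fahrenheit readings -> Celsius conversion -> threshold alarm
g = ListenerGraph()
g.add_node("to_celsius", lambda f: round((f - 32) * 5 / 9, 1))
g.add_node("alarm", lambda c: "ALARM" if c > 30 else None)
g.subscribe("to_celsius", "sensors")
g.subscribe("alarm", "to_celsius")
```

Because each node only knows whom to notify, new consumers can be attached to any intermediate stream without changing upstream nodes.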

Big social data and political sentiment: the tweet stream during the UK General Election 2015 campaign

Relevance: 40.00%

Abstract:

The General Election for the 56th United Kingdom Parliament was held on 7 May 2015. Tweets related to UK politics, not only those with the specific hashtag “#GE2015”, were collected between 1 March and 31 May 2015. The resulting dataset contains over 28 million tweets, totalling 118 GB uncompressed or 15 GB compressed. This study describes the method used to collect the tweets, presents some analyses, including a political sentiment index, and outlines interesting research directions on Big Social Data based on Twitter microblogging.

Online Stream Processing of Big Data on Apache Storm for Instant Coupon Applications

Relevance: 40.00%

Abstract:

Big data is the term used to describe a collection of data so large in terms of volume, velocity and variety that it requires specific technologies and analytical methods to extract meaningful value. Many systems are increasingly made up of, and characterized by, huge amounts of data to be managed, originating from highly heterogeneous sources with highly differentiated formats and extremely heterogeneous data quality. Another requirement in these systems may be the time factor: more and more systems need to obtain meaningful data from Big Data as soon as possible, and the input to be handled is increasingly a continuous stream of information. Specific solutions for these cases, called Online Stream Processing, belong to this field. The goal of this thesis is to propose a working prototype that processes Instant Coupon data coming from different sources, with different formats and different information and transmission protocols, and that stores the processed data efficiently in order to provide real-time responses. The information sources can be of two types: XMPP and Eddystone. Once the system receives the input information, it extracts and processes it until it obtains meaningful data that can be used by third parties. The data are stored in Apache Cassandra. The biggest problem to be solved was that Apache Storm does not rebalance resources automatically, whereas in this specific case the distribution of customers during the day is highly variable and full of peaks. The internal rebalancing system relies on innovative technologies such as metrics and, based on throughput and execution latency, decides whether to increase or decrease the number of resources, or simply to do nothing if the statistics are within the desired threshold values.
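The rebalancing rule described above (scale up, scale down, or do nothing depending on whether throughput and latency fall within thresholds) can be sketched as a small decision function. This is a minimal illustration, not the thesis's policy: the threshold names and values, the one-worker step size and the "half of capacity" criterion for scaling down are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    max_latency_ms: float          # above this, the topology is treated as overloaded (assumed)
    min_latency_ms: float          # below this, resources may be released (assumed)
    max_tuples_per_worker: float   # assumed sustainable throughput of one worker (tuples/s)


def decide_workers(latency_ms, throughput, workers, t, min_workers=1, max_workers=16):
    """Return the new worker count: one more, one fewer, or unchanged if metrics are within thresholds."""
    load = throughput / workers  # tuples per second handled by each worker
    overloaded = latency_ms > t.max_latency_ms or load > t.max_tuples_per_worker
    underused = latency_ms < t.min_latency_ms and load < 0.5 * t.max_tuples_per_worker
    if overloaded and workers < max_workers:
        return workers + 1  # scale out during a peak
    if underused and workers > min_workers:
        return workers - 1  # release resources when traffic drops
    return workers          # statistics within the desired thresholds: do nothing
```

In a real deployment the returned count would still have to be applied to the running topology, for example through Storm's `storm rebalance` command with its `-n` (number of workers) option; scaling one step at a time is a simple way to avoid oscillation between consecutive metric samples.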

The stream of corrective experiences in action: Big Bang and Constant Dripping

Relevance: 40.00%

Context-aware PDM (Coll-Stream)

Relevance: 30.00%

Towards a parallel computationally efficient approach to scaling up data stream classification

Relevance: 30.00%

Abstract:

Advances in hardware technologies make it possible to capture and process data in real time, and the resulting high-throughput data streams require novel data mining approaches. The research area of Data Stream Mining (DSM) develops data mining algorithms that allow us to analyse these continuous streams of data in real time. The creation and real-time adaptation of classification models from data streams is one of the most challenging DSM tasks. Current classifiers for streaming data address this problem by using incremental learning algorithms. However, even though these algorithms are fast, they are challenged by high-velocity data streams, where data instances arrive at a fast rate. This is problematic for applications that require little or no delay between changes in the patterns of the stream and the absorption of these patterns by the classifier. The scalability of traditional data mining algorithms for static (non-streaming) datasets to Big Data has been addressed through the development of parallel classifiers. However, there is very little work on the parallelisation of data stream classification techniques. In this paper we investigate K-Nearest Neighbours (KNN) as the basis for a real-time, adaptive and parallel methodology for scalable data stream classification tasks.
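To make the idea of KNN as an adaptive stream classifier concrete, here is a minimal serial sketch, assuming a sliding window of the most recent labelled instances, Euclidean distance and a majority vote. It is not the paper's parallel methodology; the class name and parameters are hypothetical. Adaptation comes from the window itself: old instances are evicted, so the classifier follows recent patterns.

```python
import math
from collections import Counter, deque


class SlidingWindowKNN:
    """KNN classifier over the most recent `window_size` labelled instances (hypothetical sketch)."""

    def __init__(self, k=3, window_size=1000):
        self.k = k
        self.window = deque(maxlen=window_size)  # oldest instances are evicted automatically

    def predict(self, x):
        if not self.window:
            return None  # nothing learned yet
        # Brute-force search for the k nearest stored instances (Euclidean distance)
        nearest = sorted(self.window, key=lambda item: math.dist(item[0], x))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]

    def learn(self, x, y):
        self.window.append((tuple(x), y))
```

In a prequential (test-then-train) loop, each arriving instance is first passed to `predict` and then to `learn`. The brute-force neighbour search is also the natural place to parallelise: one common approach is to partition the window across workers, compute a local top-k on each, and merge the partial results, although the paper's own scheme may differ.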

Big Data Processing: An Application of the Speed Layer of the Lambda Architecture

Relevance: 30.00%

Abstract:

Big Data has given rise to new technologies that improve quality of life by combining heterogeneous representations of data across various disciplines. A system capable of processing data in real time is therefore needed. Such a system is called the speed layer: as the name suggests, it is designed to ensure that new data are returned by query functions as quickly as they arrive. This thesis focuses on building an architecture modelled on the Speed Layer of the Lambda Architecture that can receive weather data published on an MQTT queue, process them in real time, and store them in a database to make them available to Data Scientists. The programming environment is Java; the project was deployed on the Hortonworks platform, which is based on the Hadoop framework and on the Storm computation system, which makes it possible to work with unbounded data streams and process them in real time. Unlike traditional stream-processing approaches based on networks of queues and workers, Storm is fault-tolerant and scalable. The effort devoted to its development by the Apache Software Foundation, its growing production use at major companies, and the support of cloud hosting providers are signs that this technology will increasingly gain ground as a solution for managing distributed, event-oriented computations. To store and analyse these volumes of data, which have always been an insurmountable problem for traditional databases, a non-relational database was used: HBase.
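The core idea of a speed layer is that each new message incrementally updates a real-time view, so queries are answered from that view without recomputing over historical data. The Python sketch below illustrates this with a running mean temperature per weather station. It is an assumption-laden illustration, not the thesis's Java/Storm implementation: the message schema (`station`, `temperature`) is hypothetical, and an in-memory dictionary stands in for HBase.

```python
from collections import defaultdict


class SpeedLayerView:
    """Incrementally maintained real-time view: running mean temperature per station."""

    def __init__(self):
        self.count = defaultdict(int)    # messages seen per station
        self.total = defaultdict(float)  # sum of temperatures per station

    def update(self, message):
        # Each message is assumed to be a decoded MQTT payload such as
        # {"station": "S1", "temperature": 21.5} (hypothetical schema).
        station = message["station"]
        self.count[station] += 1
        self.total[station] += message["temperature"]

    def query(self, station):
        # Answered from the incremental state, without rescanning historical data
        n = self.count.get(station, 0)
        return self.total[station] / n if n else None
```

Storing a count and a sum rather than the mean itself keeps each update constant-time and makes the view easy to merge with a batch-layer result, which is the usual motivation for this design in a Lambda Architecture.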

Nord Stream II - yes or no? - Political decision of a political Commission. EPC Commentary, 14 June 2016

Relevance: 30.00%

Abstract:

When the new European Commission started work in autumn 2014, its president took great pride in calling it a ‘political Commission’ that would be big on big things and small on small things. Whilst the EU is currently dealing with many crises, the reality is that things do not come much bigger than Nord Stream II. Will this be a political Commission that stands by its principles, including respect for liberty, democracy, the rule of law and human rights? Will this Commission have the backbone to make a political assessment of a project that threatens EU unity and its core values, undermines the Union’s commonly agreed commitment to building an Energy Union, and facilitates Russia’s aggression against Ukraine? President Juncker’s controversial visit to Russia and meeting with President Putin on 16-17 June is a test case: will this Commission be ready to defend its commitments and principles when discussing ‘economic issues’?

Technologies for Real-Time Big Data Analysis: A Performance Comparison

Relevance: 30.00%

Abstract:

The aim of this thesis is to analyse, study and compare technologies for real-time Big Data analysis: Apache Spark Streaming, Apache Storm and Apache Flink. To carry out a fair comparison, we built a system for face detection and recognition within a video, so that the required processing could be parallelised by exploiting the capabilities of each architecture. After building realistic prototypes, one for each architecture, we moved on to a testing phase to measure their performance. Using clusters set up specifically for this purpose in local and cloud environments, we measured the characteristics that best captured the differences between the architectures, seeking to demonstrate quantitatively the effectiveness of the algorithms used and the efficiency of the architectures themselves. We therefore chose the maximum sustainable input rate and the latency, measured as the number of nodes varied. This made it possible to observe the scalability of each architecture, analyse its trend, and determine how far one could go while maintaining an acceptable trade-off between the number of nodes and the sustainable input rate. The experiments showed that system performance improves as the number of workers increases, making the systems studied suitable for large-scale use. In addition, substantial differences were found between the frameworks; the pros and cons of each are reported, with the aim of identifying those best suited to the case study.

Geomorphic Impacts of the 2013 Colorado Front Range Flood on Black Canyon Creek and North Fork Big Thompson River

Relevance: 30.00%

Abstract:

In September 2013, the Colorado Front Range experienced a five-day storm that brought record-breaking precipitation to the region. As a consequence, many Front Range streams experienced flooding, leading to erosion, debris flows, bank failures and channel incision. I compare the effects that debris flows and flooding have on channel bar frequency, on the frequency and location of wood accumulations, and on the shape and size of the channel along two flood-impacted reaches located near Estes Park and Glen Haven, Colorado, within Rocky Mountain National Park and Arapaho-Roosevelt National Forest: Black Canyon Creek (BCC) and North Fork Big Thompson River (NFBT). The primary difference between the two study areas is that BCC was inundated by multiple debris flows, whereas NFBT only experienced flooding. Fieldwork consisted of recording the location and size of large wood and channel bars and surveying reaches to produce cross-sections. Additional observations were made on bank failures in NFBT and on the presence of boulders in channel bars in BCC to determine the sediment source. The debris flow scoured and incised BCC, causing long-term alteration. The post-flood channel cross-sectional area is 7 to 23 times larger than the pre-flood channel, owing to erosion of the channel bed down to bedrock and the elimination of riparian vegetation. Large wood was forced out of the stream channel and deposited outside the bankfull channel. Flooding in NFBT caused bank erosion and widening that contributed sediment to channel bars, but produced little streambed scour. As a result, there was relatively little damage to mid-channel and riparian vegetation, and most large wood remained within the wetted channel.

Design and Prototyping of a Data Stream Processing System Based on Apache Storm

Relevance: 30.00%

Abstract:

With the advent of the Internet, the number of users with actual access to the network, and with the ability to share information with the whole world, has grown steadily over the years. Moreover, with the introduction of social media, users are led to transfer a large amount of personal information to the web, making it available to various companies. In addition, the Internet of Things, through which sensors and machines act as agents on the network, gives each user a larger number of devices, directly connected to one another and to the global network. In proportion to these factors, the volume of data generated and stored is increasing dramatically, giving rise to a new concept: Big Data. Consequently, there is a need for new tools that can harness the computing power offered today by more complex architectures that combine, within a single system, a set of hosts used for analysis. Such a vast amount of data, routine when speaking of Big Data, combined with an equally high transmission and transfer rate, makes storing the data cumbersome, all the more so if the storage technology is a traditional DBMS. A classic relational solution would only allow data to be processed on demand, producing delays, significant latencies and the inevitable loss of portions of the dataset. It is therefore necessary to adopt new technologies and tools suited to requirements different from those of classic batch analysis. In particular, this thesis addresses Data Stream Processing, designing and prototyping a system based on Apache Storm, with cyber security as the application domain.

USING INDICATORS OF BIOTIC INTEGRITY FOR ASSESSMENT OF STREAM CONDITION

Relevance: 30.00%

Abstract:

Multiple indices of biotic integrity and biological condition gradient models have been developed and validated to assess ecological integrity in the Laurentian Great Lakes Region. With multiple groups, such as Tribal, Federal, and State agencies as well as scientists and local watershed management or river-focused volunteer groups, collecting data for bioassessment, it is important to determine the comparability of these data and the effectiveness of the indices applied to them for the assessment of natural systems. We evaluated the applicability of macroinvertebrate and fish community indices for assessing site integrity. Site quality (i.e., habitat condition) could be classified differently depending on which index was applied. This highlights the need to better understand the metrics driving index variation, as well as reference conditions, for effective communication and use of indices of biotic integrity in the Upper Midwest. We found that the macroinvertebrate benthic community index for the Northern Lakes and Forests Ecoregion and a coldwater fish index of biotic integrity for the Upper Midwest were the most appropriate for use in the Big Manistee River watershed, based on replicate sampling, ability to track trends over time, and overall performance. Using the most appropriate fish and macroinvertebrate IBIs, we evaluated three sites where improper road-stream crossings (culverts) had been replaced with modern full-span structures. We used a before-after-control-impact paired series analytical design and found mixed results: macroinvertebrate indices showed evidence of improvement in biotic integrity at some sites, while most sites showed no response in index score. Culvert replacements are often developed based on the potential, or the perception, that they will restore ecological integrity. As restoration practitioners, researchers and managers, we need to be transparent in our goals and objectives and monitor specifically for those results.
The results of this research serve as an important model for the broader field of ecosystem restoration and support the argument that while biotic communities can respond to actions undertaken with the goal of overall restoration, practitioners should be realistic in their expectations and claims of predicted benefit, and then effectively evaluate the true impacts of the restoration activities.

Big Manistee River Tributaries as Potential Arctic Grayling Habitat

Relevance: 30.00%

Abstract:

The Big Manistee River was one of the best-known Michigan rivers to historically support a population of Arctic grayling (Thymallus arcticus). Overfishing, competition with introduced fish, and habitat loss due to logging are believed to have caused their decline and ultimate extirpation from the Big Manistee River around 1900 and from the State of Michigan by 1936. Grayling are a species of great cultural importance to the tribal heritage of the Little River Band of Ottawa Indians, and although past attempts to reintroduce Arctic grayling have been unsuccessful, continued interest in their return led to an assessment of environmental conditions in tributaries within a 21-kilometer section of the Big Manistee River to determine whether suitable habitat exists. Although data describing historical conditions in the Big Manistee River are limited, we reviewed the literature to determine abiotic conditions prior to the disappearance of Arctic grayling, as well as habitat conditions in rivers in western and northwestern North America where they currently exist. We assessed abiotic habitat metrics at 23 sites distributed across 8 tributaries within the Manistee River watershed. Data collected included basic water parameters, streambed substrate composition, channel profile and areal measurements of channel geomorphic units, and stream velocity and discharge measurements. These environmental values were compared with literature values, habitat suitability thresholds, and current conditions in rivers with Arctic grayling populations to assess whether the abiotic habitat in Big Manistee River tributaries could support Arctic grayling. Although the historic grayling habitat in the region was disturbed during the era of major logging around the turn of the 20th century, our results indicate that some important abiotic conditions within Big Manistee River tributaries are within the range of conditions that support current and past populations of Arctic grayling.
Seven tributaries contained between 20% and 30% pool habitat by area, which grayling use for refuge. All but two tributaries were composed primarily of pebbles, with the remaining two dominated by fine substrates (sand, silt, clay). For all tributaries, basic water parameters and channel depth were within the ranges found for Arctic grayling populations persisting in Montana, Alaska, and Canada. Based on the metrics analyzed in this study, suitable abiotic grayling habitat does exist in Big Manistee River tributaries.

An Abstraction Framework for Stream Processing in Support of RAM3S

Relevance: 30.00%

Abstract:

Processing ever-growing amounts of data within reasonable time is one of the main technological challenges of the moment. The difficulty lies not only in having efficient processing engines capable of performing coordinated computation over an enormous volume of data, but also in providing the developers of such applications with tools that are intuitive to use and easy to deploy, in order to reduce the time needed to turn an application idea into reality and to lower the barriers to entry of the available software tools. This thesis examines the RAM3S project, whose aim is to simplify the development of data processing applications based on Stream Processing platforms such as Spark, Storm, Flink and Samza, and fulfils its original purpose by providing an abstract, extensible framework for defining stream processing applications that can run unchanged on any of the platforms available on the market.
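The abstract does not show RAM3S's actual interfaces, so the following is only a hypothetical sketch of the general pattern it describes: application logic is written once against an abstract interface, and a separate runtime adapter executes it on a particular engine. All class names here are invented for illustration, and `LocalRuntime` is a trivial in-process stand-in for what would, in a real framework, be one adapter per engine (Spark, Storm, Flink, Samza).

```python
from abc import ABC, abstractmethod


class StreamApplication(ABC):
    """Platform-independent application logic (hypothetical interface)."""

    @abstractmethod
    def process(self, item):
        """Return an iterable of zero or more output items for one input item."""


class Runtime(ABC):
    """Adapter that executes a StreamApplication on a concrete engine (hypothetical)."""

    @abstractmethod
    def run(self, app, source):
        """Feed every item of `source` to `app` and yield its outputs."""


class LocalRuntime(Runtime):
    """In-process stand-in for an engine-specific adapter."""

    def run(self, app, source):
        for item in source:
            yield from app.process(item)


class KeepAbove(StreamApplication):
    """Example application: forward only values above a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def process(self, item):
        return [item] if item > self.threshold else []
```

Under this design, porting `KeepAbove` to another engine would mean writing a new `Runtime` subclass, not touching the application code.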

Data-stream driven Fuzzy-granular approaches for system maintenance

Relevance: 30.00%

Abstract:

Intelligent systems are now an inherent part of society, supporting synergistic human-machine collaboration. Beyond economic and climate factors, energy consumption is strongly affected by the performance of computing systems, and poor software functioning may undermine any attempt at improvement. In addition, data-driven machine learning algorithms are the basis of human-centered applications, and interpretability is one of the most important features of computational systems. Software maintenance is a critical discipline for supporting automatic and lifelong system operation. Since most software records its internal events in logs, log analysis is one approach to keeping systems operational. Logs can be characterized as Big Data arriving in high-volume streams, being unstructured, heterogeneous, imprecise, and uncertain. This thesis addresses fuzzy and neuro-granular methods that provide maintenance solutions for anomaly detection (AD) and log parsing (LP), dealing with data uncertainty and identifying ideal time periods for detailed software analyses. LP provides a deeper semantic interpretation of the anomalous occurrences. The solutions evolve over time and are general-purpose, being highly applicable, scalable, and maintainable. Granular classification models, namely the Fuzzy set-Based evolving Model (FBeM), the evolving Granular Neural Network (eGNN), and the evolving Gaussian Fuzzy Classifier (eGFC), are compared on the AD problem. The evolving Log Parsing (eLP) method is proposed to address the automatic parsing of system logs. All the methods use recursive mechanisms to create, update, merge, and delete information granules according to the behavior of the data. For the first time in the evolving intelligent systems literature, the proposed method, eLP, is able to process streams of words and sentences.
In terms of AD accuracy, FBeM achieved (85.64 ± 3.69)%, eGNN reached (96.17 ± 0.78)%, eGFC obtained (92.48 ± 1.21)%, and eLP reached (96.05 ± 1.04)%. Besides being competitive, eLP generates a log grammar and offers a higher level of model interpretability.
