818 resultados para big data


Relevância:

70.00% 70.00%

Publicador:

Resumo:

Advances in hardware technologies allow to capture and process data in real-time and the resulting high throughput data streams require novel data mining approaches. The research area of Data Stream Mining (DSM) is developing data mining algorithms that allow us to analyse these continuous streams of data in real-time. The creation and real-time adaption of classification models from data streams is one of the most challenging DSM tasks. Current classifiers for streaming data address this problem by using incremental learning algorithms. However, even so these algorithms are fast, they are challenged by high velocity data streams, where data instances are incoming at a fast rate. This is problematic if the applications desire that there is no or only a very little delay between changes in the patterns of the stream and absorption of these patterns by the classifier. Problems of scalability to Big Data of traditional data mining algorithms for static (non streaming) datasets have been addressed through the development of parallel classifiers. However, there is very little work on the parallelisation of data stream classification techniques. In this paper we investigate K-Nearest Neighbours (KNN) as the basis for a real-time adaptive and parallel methodology for scalable data stream classification tasks.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The Environmental Data Abstraction Library provides a modular data management library for bringing new and diverse datatypes together for visualisation within numerous software packages, including the ncWMS viewing service, which already has very wide international uptake. The structure of EDAL is presented along with examples of its use to compare satellite, model and in situ data types within the same visualisation framework. We emphasize the value of this capability for cross calibration of datasets and evaluation of model products against observations, including preparation for data assimilation.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

An important application of Big Data Analytics is the real-time analysis of streaming data. Streaming data imposes unique challenges to data mining algorithms, such as concept drifts, the need to analyse the data on the fly due to unbounded data streams and scalable algorithms due to potentially high throughput of data. Real-time classification algorithms that are adaptive to concept drifts and fast exist, however, most approaches are not naturally parallel and are thus limited in their scalability. This paper presents work on the Micro-Cluster Nearest Neighbour (MC-NN) classifier. MC-NN is based on an adaptive statistical data summary based on Micro-Clusters. MC-NN is very fast and adaptive to concept drift whilst maintaining the parallel properties of the base KNN classifier. Also MC-NN is competitive compared with existing data stream classifiers in terms of accuracy and speed.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Data mining is a relatively new field of research that its objective is to acquire knowledge from large amounts of data. In medical and health care areas, due to regulations and due to the availability of computers, a large amount of data is becoming available [27]. On the one hand, practitioners are expected to use all this data in their work but, at the same time, such a large amount of data cannot be processed by humans in a short time to make diagnosis, prognosis and treatment schedules. A major objective of this thesis is to evaluate data mining tools in medical and health care applications to develop a tool that can help make rather accurate decisions. In this thesis, the goal is finding a pattern among patients who got pneumonia by clustering of lab data values which have been recorded every day. By this pattern we can generalize it to the patients who did not have been diagnosed by this disease whose lab values shows the same trend as pneumonia patients does. There are 10 tables which have been extracted from a big data base of a hospital in Jena for my work .In ICU (intensive care unit), COPRA system which is a patient management system has been used. All the tables and data stored in German Language database.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

With the Big Data development and the growth of cloud computing and Internet of Things, data centers have been multiplying in Brazil and the rest of the world. Designing and running this sites in an efficient way has become a necessary challenge and to do so, it's essential a better understanding of its infrastructure. Thus, this paper presents a bibliography study using technical concepts in order to understand the specific needs related to this environment and the best forms address them. It discusses the data center infrastructure main systems, methods to improve their energy efficiency and their future trends

Relevância:

70.00% 70.00%

Publicador:

Resumo:

With the Big Data development and the growth of cloud computing and Internet of Things, data centers have been multiplying in Brazil and the rest of the world. Designing and running this sites in an efficient way has become a necessary challenge and to do so, it's essential a better understanding of its infrastructure. Thus, this paper presents a bibliography study using technical concepts in order to understand the specific needs related to this environment and the best forms address them. It discusses the data center infrastructure main systems, methods to improve their energy efficiency and their future trends

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Lo scopo del clustering è quindi quello di individuare strutture nei dati significative, ed è proprio dalla seguente definizione che è iniziata questa attività di tesi , fornendo un approccio innovativo ed inesplorato al cluster, ovvero non ricercando la relazione ma ragionando su cosa non lo sia. Osservando un insieme di dati ,cosa rappresenta la non relazione? Una domanda difficile da porsi , che ha intrinsecamente la sua risposta, ovvero l’indipendenza di ogni singolo dato da tutti gli altri. La ricerca quindi dell’indipendenza tra i dati ha portato il nostro pensiero all’approccio statistico ai dati , in quanto essa è ben descritta e dimostrata in statistica. Ogni punto in un dataset, per essere considerato “privo di collegamenti/relazioni” , significa che la stessa probabilità di essere presente in ogni elemento spaziale dell’intero dataset. Matematicamente parlando , ogni punto P in uno spazio S ha la stessa probabilità di cadere in una regione R ; il che vuol dire che tale punto può CASUALMENTE essere all’interno di una qualsiasi regione del dataset. Da questa assunzione inizia il lavoro di tesi, diviso in più parti. Il secondo capitolo analizza lo stato dell’arte del clustering, raffrontato alla crescente problematica della mole di dati, che con l’avvento della diffusione della rete ha visto incrementare esponenzialmente la grandezza delle basi di conoscenza sia in termini di attributi (dimensioni) che in termini di quantità di dati (Big Data). Il terzo capitolo richiama i concetti teorico-statistici utilizzati dagli algoritimi statistici implementati. Nel quarto capitolo vi sono i dettagli relativi all’implementazione degli algoritmi , ove sono descritte le varie fasi di investigazione ,le motivazioni sulle scelte architetturali e le considerazioni che hanno portato all’esclusione di una delle 3 versioni implementate. Nel quinto capitolo gli algoritmi 2 e 3 sono confrontati con alcuni algoritmi presenti in letteratura, per dimostrare le potenzialità e le problematiche dell’algoritmo sviluppato , tali test sono a livello qualitativo , in quanto l’obbiettivo del lavoro di tesi è dimostrare come un approccio statistico può rivelarsi un’arma vincente e non quello di fornire un nuovo algoritmo utilizzabile nelle varie problematiche di clustering. Nel sesto capitolo saranno tratte le conclusioni sul lavoro svolto e saranno elencati i possibili interventi futuri dai quali la ricerca appena iniziata del clustering statistico potrebbe crescere.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

La Tesi tratta i concetti di Privacy e Protezione dei Dati personali, contestualizzandone il quadro normativo e tecnologico con particolare riferimento ai contesti emergenti rappresentati – per un verso – dalla proposta di nuovo Regolamento generale sulla protezione dei dati personali (redatto dal Parlamento Europeo e dal Consiglio dell’Unione Europea), – per un altro – dalla metodologia di progettazione del Privacy by Design e – per entrambi – dalla previsione di un nuovo attore: il responsabile per la protezione dei dati personali (Privacy Officer). L’elaborato si articola su tre parti oltre introduzione, conclusioni e riferimenti bibliografici. La prima parte descrive il concetto di privacy e le relative minacce e contromisure (tradizionali ed emergenti) con riferimento ai contesti di gestione (aziendale e Big Data) e al quadro normativo vigente. La seconda Parte illustra in dettaglio i principi e le prassi del Privacy by Design e la figura del Privacy Officer formalmente riconosciuta dal novellato giuridico. La terza parte illustra il caso di studio nel quale vengono analizzate tramite una tabella comparativa minacce e contromisure rilevabili in un contesto aziendale.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Sviluppo e analisi di un dataset campione, composto da circa 3 mln di entry ed estratto da un data warehouse di informazioni riguardanti il consumo energetico di diverse smart home.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The era of big data opens up new opportunities in personalised medicine, preventive care, chronic disease management and in telemonitoring and managing of patients with implanted devices. The rich data accumulating within online services and internet companies provide a microscope to study human behaviour at scale, and to ask completely new questions about the interplay between behavioural patterns and health. In this paper, we shed light on a particular aspect of data-driven healthcare: autonomous decision-making. We first look at three examples where we can expect data-driven decisions to be taken autonomously by technology, with no or limited human intervention. We then discuss some of the technical and practical challenges that can be expected, and sketch the research agenda to address them.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Sensor networks are increasingly becoming one of the main sources of Big Data on the Web. However, the observations that they produce are made available with heterogeneous schemas, vocabularies and data formats, making it difficult to share and reuse these data for other purposes than those for which they were originally set up. In this thesis we address these challenges, considering how we can transform streaming raw data to rich ontology-based information that is accessible through continuous queries for streaming data. Our main contribution is an ontology-based approach for providing data access and query capabilities to streaming data sources, allowing users to express their needs at a conceptual level, independent of implementation and language-specific details. We introduce novel query rewriting and data translation techniques that rely on mapping definitions relating streaming data models to ontological concepts. Specific contributions include: • The syntax and semantics of the SPARQLStream query language for ontologybased data access, and a query rewriting approach for transforming SPARQLStream queries into streaming algebra expressions. • The design of an ontology-based streaming data access engine that can internally reuse an existing data stream engine, complex event processor or sensor middleware, using R2RML mappings for defining relationships between streaming data models and ontology concepts. Concerning the sensor metadata of such streaming data sources, we have investigated how we can use raw measurements to characterize streaming data, producing enriched data descriptions in terms of ontological models. Our specific contributions are: • A representation of sensor data time series that captures gradient information that is useful to characterize types of sensor data. • A method for classifying sensor data time series and determining the type of data, using data mining techniques, and a method for extracting semantic sensor metadata features from the time series.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The electrical power distribution and commercialization scenario is evolving worldwide, and electricity companies, faced with the challenge of new information requirements, are demanding IT solutions to deal with the smart monitoring of power networks. Two main challenges arise from data management and smart monitoring of power networks: real-time data acquisition and big data processing over short time periods. We present a solution in the form of a system architecture that conveys real time issues and has the capacity for big data management.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The Web of Data currently comprises ? 62 billion triples from more than 2,000 different datasets covering many fields of knowledge3. This volume of structured Linked Data can be seen as a particular case of Big Data, referred to as Big Semantic Data [4]. Obviously, powerful computational configurations are tradi- tionally required to deal with the scalability problems arising to Big Semantic Data. It is not surprising that this ?data revolution? has competed in parallel with the growth of mobile computing. Smartphones and tablets are massively used at the expense of traditional computers but, to date, mobile devices have more limited computation resources. Therefore, one question that we may ask ourselves would be: can (potentially large) semantic datasets be consumed natively on mobile devices? Currently, only a few mobile apps (e.g., [1, 9, 2, 8]) make use of semantic data that they store in the mobile devices, while many others access existing SPARQL endpoints or Linked Data directly. Two main reasons can be considered for this fact. On the one hand, in spite of some initial approaches [6, 3], there are no well-established triplestores for mobile devices. This is an important limitation because any po- tential app must assume both RDF storage and SPARQL resolution. On the other hand, the particular features of these devices (little storage space, less computational power or more limited bandwidths) limit the adoption of seman- tic data for different uses and purposes. This paper introduces our HDTourist mobile application prototype. It con- sumes urban data from DBpedia4 to help tourists visiting a foreign city. Although it is a simple app, its functionality allows illustrating how semantic data can be stored and queried with limited resources. Our prototype is implemented for An- droid, but its foundations, explained in Section 2, can be deployed in any other platform. The app is described in Section 3, and Section 4 concludes about our current achievements and devises the future work.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Over the last few years, the Data Center market has increased exponentially and this tendency continues today. As a direct consequence of this trend, the industry is pushing the development and implementation of different new technologies that would improve the energy consumption efficiency of data centers. An adaptive dashboard would allow the user to monitor the most important parameters of a data center in real time. For that reason, monitoring companies work with IoT big data filtering tools and cloud computing systems to handle the amounts of data obtained from the sensors placed in a data center.Analyzing the market trends in this field we can affirm that the study of predictive algorithms has become an essential area for competitive IT companies. Complex algorithms are used to forecast risk situations based on historical data and warn the user in case of danger. Considering that several different users will interact with this dashboard from IT experts or maintenance staff to accounting managers, it is vital to personalize it automatically. Following that line of though, the dashboard should only show relevant metrics to the user in different formats like overlapped maps or representative graphs among others. These maps will show all the information needed in a visual and easy-to-evaluate way. To sum up, this dashboard will allow the user to visualize and control a wide range of variables. Monitoring essential factors such as average temperature, gradients or hotspots as well as energy and power consumption and savings by rack or building would allow the client to understand how his equipment is behaving, helping him to optimize the energy consumption and efficiency of the racks. It also would help him to prevent possible damages in the equipment with predictive high-tech algorithms.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In September 2015, the UN Member States are expected to commit to an ambitious new set of global goals for a new era of sustainable development. Achieving them will require an unprecedented joint effort on the part of governments at every level, civil society and the private sector, and millions of individual choices and actions. To be realised, the SDGs will require a monitoring and accountability framework and a plan for implementation. A commitment to realise the opportunities of the data revolution should be firmly embedded into the action plan for the SDGs, to support those countries most in need of resources, and to set the world on track for an unprecedented push towards a new world of data for change.