889 results for Big data, Spark, Hadoop


Relevance: 100.00%

Publisher:

Abstract:

66 p.


Paper presented at the 44th SEFI Conference, 12–15 September 2016, Tampere, Finland


While governments are capturing as much data as they can about us for security reasons, private companies are doing the same in a practice that has become known as “Big Data”. Martin Hirst (p.19) explains how the data we generate is leading to a “surveillance economy” in which businesses build data profiles that enable them to target their advertising at people more effectively, while Benjamin Shiller (p.22) explains how your online purchasing decisions will enable merchants to alter their prices to extract the maximum value from individual customers.


The intersection of network function virtualisation (NFV) technologies and big data has the potential to revolutionise today's telecommunication networks, from deployment to operations, resulting in significant reductions in capital expenditure (CAPEX) and operational expenditure, as well as cloud-vendor and additional revenue growth for operators. One contribution of this article is a comparison of the requirements of big data and network virtualisation, together with the formulation of key performance indicators for distributed big data NFVs on an operator's infrastructure. Big data and virtualisation are highly interdependent; their intersections and dependencies are analysed, and the potential optimisation gains resulting from open interfaces between big data and carrier-network NFV functional blocks in an adaptive environment are then discussed. Another contribution of this article is a comprehensive discussion of open-interface recommendations that enable globally collaborative and scalable virtualised big data applications.


Big data analytics for traffic accidents is a hot topic with significant value for smart, safe traffic in the city. Based on massive traffic-accident data from October 2014 to March 2015 in Xiamen, China, we propose a novel accident-occurrence analytics method in both the spatial and temporal dimensions to predict when and where an accident of a specific crash type will occur, and to whom. First, we analyse and visualise accident occurrences from both temporal and spatial views. Second, we illustrate the spatio-temporal visualisation results through two case studies on multiple road segments, and examine the impact of weather on crash types. These findings would not only help the traffic police department make instant personnel assignments among simultaneous accidents, but also inform individual drivers about accident-prone road sections and the time spans that require their greatest attention.
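The core spatio-temporal counting step such an analysis rests on can be sketched in a few lines. The Xiamen dataset and its schema are not public here, so the records, road-segment names, and field layout below are invented for illustration; only the idea of binning accidents by (segment, hour-of-day) cells is taken from the abstract.

```python
from collections import Counter
from datetime import datetime

# Hypothetical mini-dataset: (timestamp, road_segment, crash_type).
records = [
    ("2014-10-01 08:15", "Hubin Rd", "rear-end"),
    ("2014-10-01 08:40", "Hubin Rd", "side-swipe"),
    ("2014-10-02 08:05", "Hubin Rd", "rear-end"),
    ("2014-10-02 18:30", "Jiahe Rd", "rear-end"),
]

def accident_hotspots(records):
    """Count accidents per (road segment, hour-of-day) cell."""
    cells = Counter()
    for ts, segment, _crash_type in records:
        hour = datetime.strptime(ts, "%Y-%m-%d %H:%M").hour
        cells[(segment, hour)] += 1
    return cells

# The busiest cell identifies an accident-prone section and time span.
hotspots = accident_hotspots(records)
print(hotspots.most_common(1))  # [(('Hubin Rd', 8), 3)]
```

Over a real six-month dataset the same counting would be done per crash type and joined with weather data, but the aggregation pattern is the same.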


In the current digital era, thanks to the massive progress of the internet and of online technologies such as big, powerful data servers, we face a huge volume of data every day from many different resources and services that were not available to humankind just a few decades ago. This data comes from the various online resources and services established to serve customers. Services and resources such as sensor networks, cloud storage and social networks produce large volumes of data, which must then be managed, reused and analysed. Although this massive volume of data can be very useful for people and corporations, it can also be problematic; big data has its own deficiencies. It requires large storage, and its sheer volume makes operations such as analysis, processing and retrieval difficult and hugely time-consuming. One way to overcome these problems is to summarise the data so that it needs less storage and far less time to process and retrieve. The summarised data is then a compact yet still informative version of the entire dataset. Data-summarisation techniques therefore aim to produce good-quality summaries. They would benefit everyone from ordinary users to researchers and the corporate world, as they provide an efficient tool for dealing with large data such as news (news summarisation).
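As a minimal illustration of the "compact but still informative" idea, here is a naive extractive summariser that scores each sentence by the document-wide frequency of its words and keeps the top-scoring ones. This is a generic textbook sketch, not any specific method from the abstract.

```python
import re
from collections import Counter

def summarize(text, n_sentences=1):
    """Naive extractive summariser: score each sentence by the
    average document-wide frequency of its words, keep the top ones,
    and emit them in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        toks = re.findall(r"[a-z]+", sentence.lower())
        return sum(freq[t] for t in toks) / max(len(toks), 1)

    kept = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    return " ".join(s for s in sentences if s in kept)
```

For news summarisation at scale the scoring would be far more sophisticated, but the storage/time saving comes from exactly this kind of reduction: the summary replaces the full text for retrieval.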


Virtualisation has brought immense change to modern technology, especially computer networks, over the last decade. The enormity of big data has caused massive graphs to grow exponentially in size in recent years, to the point that standard tools and algorithms struggle to process them. Reducing the size of massive graphs is a big challenge in the current era, and extracting useful information from huge graphs is also problematic. In this paper, we present a concept for designing a virtual graph, vGraph, in a virtual plane above the original plane holding the original massive graph, and propose a novel cumulative similarity measure for vGraph. Using vGraph in place of the massive graph saves both space and time. Our proposed algorithm has two main parts. In the first part, virtual nodes are designed from the original nodes based on the cumulative similarity calculated among them. In the second part, virtual edges are designed to link the virtual nodes based on the similarity calculated among the original edges of the original massive graph. The algorithm is tested on synthetic and real-world datasets, and the results show its efficiency.
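The two-phase structure can be sketched as follows. The paper's cumulative similarity measure is not reproduced here; plain Jaccard similarity on neighbour sets stands in for it, and the greedy grouping is likewise only an assumption made for illustration.

```python
def build_vgraph(edges, threshold=0.5):
    """Sketch of the two-phase idea: (1) merge original nodes with
    similar neighbour sets into virtual nodes, (2) add a virtual edge
    wherever an original edge crosses two distinct virtual nodes."""
    # Adjacency of the original graph.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Phase 1: greedily group nodes by neighbour-set similarity.
    group_of, groups = {}, []
    for node in sorted(adj):
        for gid, members in enumerate(groups):
            a, b = adj[node], adj[members[0]]
            if len(a & b) / len(a | b) >= threshold:
                members.append(node)
                group_of[node] = gid
                break
        else:
            group_of[node] = len(groups)
            groups.append([node])

    # Phase 2: virtual edges between distinct virtual nodes.
    vedges = {tuple(sorted((group_of[u], group_of[v])))
              for u, v in edges if group_of[u] != group_of[v]}
    return groups, sorted(vedges)
```

On a star graph, for example, the leaves (which share the same neighbour) collapse into one virtual node, so the vGraph is far smaller than the original while preserving its connectivity skeleton.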


The demand for data storage and processing is increasing at a rapid speed in the big data era. The management of such a tremendous volume of data is a critical challenge for data storage systems. Firstly, since 60% of stored data is claimed to be redundant, data-deduplication technology becomes an attractive solution for saving storage space and traffic in a big data environment. Secondly, security issues, such as the confidentiality, integrity and privacy of big data, should also be considered in big data storage. To address these problems, convergent encryption is widely used to secure data deduplication for big data storage. Nonetheless, other security issues remain, such as proof of ownership, key management and so on. In this chapter, we first introduce some major cyber attacks on big data storage. Then, we describe the existing fundamental security techniques, whose integration is essential for protecting data from existing and future security attacks. By discussing some interesting open problems, we finally hope to trigger more research efforts in this new research field.
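The reason convergent encryption reconciles encryption with deduplication is that the key is derived from the content itself, so identical plaintexts always produce identical ciphertexts, which the storage server can deduplicate without reading them. A minimal sketch, assuming SHA-256 key derivation; real systems use a proper block cipher (e.g. AES), for which a SHA-256 counter keystream stands in here purely for illustration:

```python
import hashlib

def _keystream(key: bytes, length: int) -> bytes:
    """Illustrative counter-mode keystream derived from the key."""
    stream, counter = b"", 0
    while len(stream) < length:
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return stream[:length]

def convergent_encrypt(plaintext: bytes):
    """Key = hash of the content, so equal plaintexts yield equal
    ciphertexts and can be deduplicated server-side."""
    key = hashlib.sha256(plaintext).digest()
    ciphertext = bytes(p ^ s for p, s in
                       zip(plaintext, _keystream(key, len(plaintext))))
    return key, ciphertext

def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    return bytes(c ^ s for c, s in
                 zip(ciphertext, _keystream(key, len(ciphertext))))
```

The deterministic ciphertext is precisely what makes the scheme vulnerable to the dictionary-style attacks and key-management issues the chapter goes on to discuss.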


Because of big data's strong demand for physical resources, storing and processing big data in clouds is an effective and efficient approach, as cloud computing allows on-demand resource provisioning. With increasing requirements for the resources provisioned by cloud platforms, the Quality of Service (QoS) of cloud services for big data management is becoming significantly important. Big data has the characteristic of sparseness, which leads to frequent data access and processing and thereby causes a huge amount of energy consumption. Energy cost plays a key role in determining the price of a service and should be treated as a first-class citizen alongside other QoS metrics, because energy-saving services can achieve cheaper service prices and environmentally friendly solutions. However, it remains a challenge to efficiently schedule Virtual Machines (VMs) for service QoS enhancement in an energy-aware manner. In this paper, we propose an energy-aware dynamic VM scheduling method for QoS enhancement in clouds over big data to address this challenge. Specifically, the method consists of two main VM-migration phases, in which computation tasks are migrated to servers with lower energy consumption or higher performance to reduce service prices and execution time. Extensive experimental evaluation demonstrates the effectiveness and efficiency of our method.
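A greedy placement pass conveys the energy-aware intuition: prefer the feasible server with the lowest energy cost per unit of load. The paper's two-phase migration method is more elaborate; the server and VM fields below (`capacity`, `watts_per_load`) are invented for this sketch.

```python
def energy_aware_schedule(vms, servers):
    """Greedy sketch: place each VM (largest first) on the feasible
    server with the lowest energy cost per unit of load.
    vms: list of (name, load); servers: list of dicts with
    'name', 'capacity', 'watts_per_load'."""
    placement = {}
    used = {s["name"]: 0 for s in servers}
    for name, load in sorted(vms, key=lambda v: -v[1]):
        # Cheapest-energy servers first.
        for s in sorted(servers, key=lambda s: s["watts_per_load"]):
            if used[s["name"]] + load <= s["capacity"]:
                used[s["name"]] += load
                placement[name] = s["name"]
                break
        else:
            raise RuntimeError(f"no capacity for VM {name}")
    return placement
```

A dynamic version would re-run this placement as loads change and migrate VMs whose current server is no longer the cheapest feasible one, which is the spirit of the paper's migration phases.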


In recent years, big data have become a hot research topic. The increasing amount of big data also increases the chance of breaching the privacy of individuals. Since big data require high computational power and large storage, distributed systems are used. As multiple parties are involved in these systems, the risk of privacy violation is increased. There have been a number of privacy-preserving mechanisms developed for privacy protection at different stages (e.g., data generation, data storage, and data processing) of a big data life cycle. The goal of this paper is to provide a comprehensive overview of the privacy preservation mechanisms in big data and present the challenges for existing mechanisms. In particular, in this paper, we illustrate the infrastructure of big data and the state-of-the-art privacy-preserving mechanisms in each stage of the big data life cycle. Furthermore, we discuss the challenges and future research directions related to privacy preservation in big data.
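As one concrete example of a privacy-preserving mechanism at the data-processing stage, k-anonymity requires every combination of quasi-identifier values to appear in at least k records. The check is simple; the toy records and field names below are hypothetical, and k-anonymity is only one of the many mechanisms such a survey covers.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True iff every quasi-identifier combination occurs in at
    least k rows, so no individual stands out on those fields."""
    combos = Counter(tuple(row[q] for q in quasi_identifiers)
                     for row in rows)
    return all(count >= k for count in combos.values())
```

Generalisation (e.g. coarsening ZIP codes or age ranges) is how a dataset failing the check is usually repaired before release.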


One of the biggest concerns about big data is privacy. However, the study of big data privacy is still at a very early stage. We believe that forthcoming solutions and theories for big data privacy will be rooted in the existing research output of the privacy discipline. Motivated by this, we extensively survey the existing research outputs and achievements of the privacy field from both applied and theoretical angles, aiming to pave solid starting ground for interested readers to address the challenges of the big data case. We first present an overview of the battleground by defining the roles and operations of privacy systems. Second, we review the milestones of the two current major research categories of privacy: data clustering and privacy frameworks. Third, we discuss privacy research from the perspectives of different disciplines. Fourth, we present the mathematical description, measurement and modelling of privacy. We conclude by summarising the challenges and opportunities of this promising topic, hoping to shed light on this exciting and almost uncharted land.


Evolutionary algorithms (EAs) have recently been suggested as candidates for solving big data optimisation problems that involve very large numbers of variables and need to be analysed in a short period of time. However, EAs face scalability issues when dealing with big data problems. Moreover, the performance of EAs hinges critically on the parameter values and operator types used, so it is impossible to design a single EA that outperforms all others on every problem instance. To address these challenges, we propose a heterogeneous framework that integrates a cooperative co-evolution method with various types of memetic algorithms. We use cooperative co-evolution to split the big problem into sub-problems in order to increase the efficiency of the solving process. The sub-problems are then solved using various heterogeneous memetic algorithms. The proposed heterogeneous framework adaptively assigns, for each solution, different operators, parameter values and a local-search algorithm to efficiently explore and exploit the search space of the given problem instance. The performance of the proposed algorithm is assessed on the Big Data 2015 competition benchmark problems, which contain data with and without noise. Experimental results demonstrate that the proposed algorithm performs better with the cooperative co-evolution method than without it. Furthermore, it obtained very competitive results, if not better, on all tested instances when compared with other algorithms, using lower computational time.
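The decomposition idea at the heart of cooperative co-evolution can be sketched minimally: split the decision vector into fixed-size blocks (sub-problems) and improve each block in turn while the others stay fixed. A random-restart hill-climber stands in here for the paper's heterogeneous memetic algorithms, and the sphere function is just a toy objective.

```python
import random

def cooperative_coevolve(objective, dim, block_size=2, rounds=20, seed=0):
    """Minimal cooperative co-evolution sketch: optimise one variable
    block at a time while the rest of the vector is held fixed."""
    rng = random.Random(seed)
    x = [rng.uniform(-5, 5) for _ in range(dim)]
    blocks = [range(i, min(i + block_size, dim))
              for i in range(0, dim, block_size)]
    for _ in range(rounds):
        for block in blocks:
            # Sub-problem: perturb only this block's coordinates.
            for _ in range(50):
                trial = list(x)
                for i in block:
                    trial[i] = x[i] + rng.gauss(0, 0.5)
                if objective(trial) < objective(x):
                    x = trial
    return x

sphere = lambda v: sum(t * t for t in v)
best = cooperative_coevolve(sphere, dim=6)
```

Because each sub-problem is low-dimensional, the search is far more efficient than perturbing all variables at once, which is exactly the scalability argument the framework relies on; the adaptive operator assignment is the part this sketch omits.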


Clustering of big data has received much attention recently. In this paper, we present the new clusiVAT algorithm and compare it with four other popular data-clustering algorithms. Three of the four comparison methods are based on the well-known classical batch k-means model: specifically, we use k-means, single-pass k-means, online k-means, and clustering using representatives (CURE) for numerical comparisons. clusiVAT is based on sampling the data, imaging the reordered distance matrix to visually estimate the number of clusters in the data, clustering the samples using a relative of single linkage (SL), and then non-iteratively extending the labels to the rest of the dataset using the nearest-prototype rule. Previous work has established that clusiVAT produces true SL clusters in compact, separated data. Our experiments show that k-means and its modified algorithms suffer from initialisation issues that cause many failures. clusiVAT, on the other hand, needs no initialisation and almost always finds partitions that accurately match ground-truth labels in labelled data. CURE also finds SL-type partitions but is much slower than the other four algorithms. In our experiments, clusiVAT proves to be the fastest and most accurate of the five algorithms; for example, it recovers 97% of the ground-truth labels in the real-world KDD-99 cup data (4,292,637 samples in 41 dimensions) in 76 s.
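The non-iterative label-extension step is what lets the method scale: only a sample is clustered, and every remaining point just takes the label of its nearest prototype. A sketch, with toy prototypes standing in for the clustered sample:

```python
def nearest_prototype_labels(points, prototypes, proto_labels):
    """Nearest-prototype rule: assign each unsampled point the label
    of its closest prototype (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    labels = []
    for p in points:
        nearest = min(range(len(prototypes)),
                      key=lambda i: dist2(p, prototypes[i]))
        labels.append(proto_labels[nearest])
    return labels
```

Since this pass is a single scan with no iteration, extending labels to millions of points (as in the KDD-99 experiment) costs only O(n × prototypes) distance computations.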


My thesis examines how, through this new product of computing known as big data, one can obtain information and make forecasts about trends in tourism.