100 results for big data analytics

in Deakin Research Online - Australia


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Big data analytics for traffic accidents is a hot topic and has significant value for smart and safe city traffic. Based on massive traffic accident data collected from October 2014 to March 2015 in Xiamen, China, we propose a novel method for analyzing accident occurrences in both the spatial and temporal dimensions, in order to predict when and where an accident of a specific crash type will occur and who is likely to be involved. First, we analyze and visualize accident occurrences from both temporal and spatial views. Second, we illustrate the spatio-temporal visualization results through two case studies on multiple road segments and on the impact of weather on crash types. These findings would not only help the traffic police department make instant personnel assignments among simultaneous accidents, but also inform individual drivers about accident-prone sections and the time spans that require their greatest attention.
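
The temporal and spatial views mentioned in this abstract can be illustrated with a simple aggregation. The following is a minimal sketch, not the authors' method: the record fields (`hour`, `segment`, `crash_type`), the helper `count_by`, and all sample values are hypothetical.

```python
from collections import Counter

# Hypothetical accident records; field names and values are illustrative only.
records = [
    {"hour": 8, "segment": "A", "crash_type": "rear-end"},
    {"hour": 8, "segment": "A", "crash_type": "side-swipe"},
    {"hour": 18, "segment": "B", "crash_type": "rear-end"},
    {"hour": 8, "segment": "B", "crash_type": "rear-end"},
]


def count_by(records, *keys):
    """Count records grouped by the given field names (keys are tuples)."""
    return Counter(tuple(r[k] for k in keys) for r in records)


temporal = count_by(records, "hour")                    # accidents per hour of day
spatial = count_by(records, "segment")                  # accidents per road segment
spatio_temporal = count_by(records, "hour", "segment")  # joint view

# The busiest (hour, segment) cell hints at where and when attention is needed.
hotspot, hotspot_count = spatio_temporal.most_common(1)[0]
```

Grouping by additional fields, such as `crash_type`, works the same way by passing more keys to `count_by`.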

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper is motivated by the vision of integrating the Internet of Things (IoT) with the power of cloud computing and the intelligence of big data analytics. However, the integration of these three cutting-edge technologies is complex to understand. In this research, we first provide a security-centric view of a three-layered approach for understanding the technology, its gaps, and its security issues. Then, through a series of lab experiments on different hardware, we collected performance data from all three layers, combined the data, and finally applied modern machine learning algorithms to distinguish 18 different activities and cyber-attacks. In our experiments, the RandomForest classification algorithm identified 93.9% of attacks and activities in this complex environment. To the best of our knowledge of the existing literature, no one has previously attempted a similar experiment for IoT cyber-attack detection, either with performance data or with a three-layered approach.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Big data analytics has shown great potential in optimizing operations, making decisions, spotting business trends, preventing threats, and capitalizing on new sources of revenue in various fields such as manufacturing, healthcare, finance, insurance, and retail. The management of various networks has become inefficient and difficult because of their high complexity and interdependencies. Big data, in the form of device logs, software logs, media content, and sensed data, provide rich information and facilitate a fundamentally different and novel approach to exploring, designing, and developing reliable and scalable networks. This Special Issue covers the most recent research results that address the challenges of big data for networking. We received 45 submissions, and ultimately nine high-quality papers, organized into two groups, were selected for inclusion in this Special Issue.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

With the advance of the Internet of Things (IoT), more M2M sensors and devices are connected to the Internet. These sensors and devices generate sensor-based big data and bring new business opportunities and demands for creating and developing sensor-oriented big data infrastructures, platforms, and analytics service applications. Big data sensing is becoming a new concept and the next technology trend, based on the connected sensor world that IoT enables. It has a strong impact on many sensor-oriented applications, including smart cities, disaster control and monitoring, healthcare services, environmental protection, and climate change studies. This paper is written as a tutorial that provides informative concepts and a taxonomy of big data sensing and services. The paper not only discusses the motivation, research scope, and features of big data sensing and services, but also examines the required services in big data sensing based on state-of-the-art research work. Moreover, the paper discusses big data sensing challenges, issues, and needs.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In big-data-driven traffic flow prediction systems, the robustness of prediction performance depends on accuracy and timeliness. This paper presents a new MapReduce-based nearest neighbor (NN) approach for traffic flow prediction using correlation analysis (TFPC) on a Hadoop platform. In particular, we develop a real-time prediction system comprising two key modules: offline distributed training (ODT) and online parallel prediction (OPP). Moreover, we build a parallel k-nearest neighbor optimization classifier, which incorporates correlation information among traffic flows into the classification process. Finally, we propose a novel prediction calculation method that combines the current data observed in OPP with the classification results obtained from large-scale historical data in ODT to generate traffic flow predictions in real time. An empirical study on real-world traffic flow big data, using leave-one-out cross-validation, shows that TFPC significantly outperforms four state-of-the-art prediction approaches (autoregressive integrated moving average, Naïve Bayes, multilayer perceptron neural networks, and NN regression) in terms of accuracy, which is improved by up to 90.07% in the best case, with an average mean absolute percent error of 5.53%. In addition, it displays excellent speedup, scaleup, and sizeup.
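
To make the nearest-neighbor idea and the error metric in this abstract concrete, here is a minimal single-machine sketch. It is not the TFPC system: it omits MapReduce, Hadoop, and the correlation analysis, and the function names (`knn_predict`, `mape`) and all flow values are hypothetical.

```python
import math


def knn_predict(history, current, k=3):
    """Predict the next flow value from the k most similar historical windows.

    history: list of (window, next_value) pairs, where window is a list of
             past flow readings.
    current: the most recent window, with the same length as each window.
    """
    def dist(a, b):
        # Euclidean distance between two equally long windows.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    nearest = sorted(history, key=lambda pair: dist(pair[0], current))[:k]
    # Average the values that followed the k closest windows.
    return sum(nxt for _, nxt in nearest) / len(nearest)


def mape(actual, predicted):
    """Mean absolute percent error, in percent (actual values must be nonzero)."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical flow readings (vehicles per interval).
history = [
    ([100, 110, 120], 130),
    ([102, 111, 119], 128),
    ([300, 280, 260], 240),
    ([98, 108, 121], 132),
]
prediction = knn_predict(history, [101, 110, 120], k=3)
error = mape([130], [prediction])
```

In this toy data the three closest windows are the ones with similar readings, so the dissimilar window `[300, 280, 260]` does not influence the prediction.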

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Clustering of big data has received much attention recently. In this paper, we present a new clusiVAT algorithm and compare it with four other popular data clustering algorithms. Three of the four comparison methods are based on the well known, classical batch k-means model. Specifically, we use k-means, single pass k-means, online k-means, and clustering using representatives (CURE) for numerical comparisons. clusiVAT is based on sampling the data, imaging the reordered distance matrix to estimate the number of clusters in the data visually, clustering the samples using a relative of single linkage (SL), and then noniteratively extending the labels to the rest of the data-set using the nearest prototype rule. Previous work has established that clusiVAT produces true SL clusters in compact-separated data. We have performed experiments to show that k-means and its modified algorithms suffer from initialization issues that cause many failures. On the other hand, clusiVAT needs no initialization, and almost always finds partitions that accurately match ground truth labels in labeled data. CURE also finds SL type partitions but is much slower than the other four algorithms. In our experiments, clusiVAT proves to be the fastest and most accurate of the five algorithms; e.g., it recovers 97% of the ground truth labels in the real world KDD-99 cup data (4 292 637 samples in 41 dimensions) in 76 s.
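
The final step described in this abstract, extending labels from the clustered sample to the rest of the data with the nearest prototype rule, can be sketched as follows. This is an illustration only: it assumes the sample has already been clustered, treats each labeled sample point as a prototype, and does not show the sampling or the distance-matrix imaging. The function name `extend_labels` and the points are hypothetical.

```python
import math


def extend_labels(sample_points, sample_labels, other_points):
    """Give each remaining point the label of its nearest labeled sample point."""
    labels = []
    for p in other_points:
        # Index of the sample point closest to p (Euclidean distance).
        nearest = min(
            range(len(sample_points)),
            key=lambda i: math.dist(p, sample_points[i]),
        )
        labels.append(sample_labels[nearest])
    return labels


# Hypothetical 2-D sample that has already been split into two clusters.
sample_points = [(0.0, 0.0), (0.5, 0.2), (10.0, 10.0), (9.5, 10.3)]
sample_labels = [0, 0, 1, 1]

# Points outside the sample receive labels in a single, non-iterative pass.
rest = [(0.2, 0.1), (9.9, 9.8), (1.0, 1.0)]
rest_labels = extend_labels(sample_points, sample_labels, rest)
```

Because each point is labeled in one pass with no iteration, the cost of this step grows linearly with the number of unlabeled points for a fixed sample size.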

Relevance:

100.00% 100.00%

Publisher:

Abstract:

With the arrival of the Big Data era, properly utilizing the power of big data is becoming increasingly essential for the strength and competitiveness of businesses and organizations. Big data poses grand challenges from different perspectives, such as processing, communication, security, and privacy. In this talk, we discuss the big data challenges in network traffic classification and our solutions to them. The significance of the research lies in the fact that network traffic on the current Internet increases exponentially each year. Traffic classification has wide applications in network management, from security monitoring to quality-of-service measurements. Recent research tends to apply machine-learning techniques to classification methods based on flow statistical features. In this talk, we propose a series of novel approaches for traffic classification, which can effectively improve classification performance by incorporating correlated information into the classification process. We analyze the new classification approaches and their performance benefits from both theoretical and empirical perspectives. A large number of experiments are carried out on two real-world traffic datasets to validate the proposed approaches. The results show that traffic classification performance can be improved significantly, even under the extremely difficult circumstance of very few training samples. Our work has a significant impact on security applications.

Relevance:

100.00% 100.00%

Publisher:

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper introduces and investigates large iterative multitier ensemble (LIME) classifiers specifically tailored for big data. These classifiers are very large, but are quite easy to generate and use. They can be so large that it makes sense to use them only for big data. They are generated automatically as a result of several iterations in applying ensemble meta classifiers. They incorporate diverse ensemble meta classifiers into several tiers simultaneously and combine them into one automatically generated iterative system so that many ensemble meta classifiers function as integral parts of other ensemble meta classifiers at higher tiers. In this paper, we carry out a comprehensive investigation of the performance of LIME classifiers for a problem concerning security of big data. Our experiments compare LIME classifiers with various base classifiers and standard ordinary ensemble meta classifiers. The results obtained demonstrate that LIME classifiers can significantly increase the accuracy of classifications. LIME classifiers performed better than the base classifiers and standard ensemble meta classifiers.
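
The tiered structure described in this abstract, where ensembles act as components of higher-tier ensembles, can be sketched in a few lines. This is a structural illustration under loud assumptions: plain majority voting stands in for the ensemble meta classifiers, the base classifiers are hypothetical one-dimensional threshold rules, and nothing here reproduces the LIME classifiers evaluated in the paper.

```python
from collections import Counter


def majority_vote(classifiers):
    """Combine classifiers (callables x -> label) into one by majority vote."""
    def ensemble(x):
        votes = Counter(clf(x) for clf in classifiers)
        return votes.most_common(1)[0][0]
    return ensemble


def threshold(t):
    """Hypothetical base classifier: label 1 if x exceeds t, else 0."""
    return lambda x: 1 if x > t else 0


# Tier 1: each ensemble votes over three base classifiers.
tier1_a = majority_vote([threshold(t) for t in (2, 3, 4)])
tier1_b = majority_vote([threshold(t) for t in (4, 5, 6)])
tier1_c = majority_vote([threshold(t) for t in (3, 5, 7)])

# Tier 2: an ensemble whose members are themselves tier-1 ensembles.
tier2 = majority_vote([tier1_a, tier1_b, tier1_c])
```

Because `majority_vote` returns an ordinary callable, its output can be passed back in as a member of another ensemble, which is what allows tiers to be stacked to any depth.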

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Big data presents a remarkable opportunity for organisations to obtain critical intelligence to drive decisions and obtain insights as never before. However, big data generates high network traffic. Moreover, the continuous growth in the variety of network traffic due to big data variety has made the network one of the key big data challenges. In this article, we present a comprehensive analysis of big data variety and its adverse effects on network performance. We present a taxonomy of big data variety and discuss various dimensions of big data variety features. We also discuss how these features influence the interconnection network requirements. Finally, we discuss some of the challenges that each big data variety dimension presents and possible approaches to addressing them.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Stochastic search techniques such as evolutionary algorithms (EAs) are known to be better explorers of the search space than conventional techniques, including deterministic methods. However, in the era of big data, the suitability of evolutionary algorithms, like that of most other search methods and learning algorithms, is naturally questioned. Big data pose new computational challenges, including very high dimensionality and sparseness of data. The superior exploration skills of evolutionary algorithms should make them promising candidates for handling optimization problems involving big data. High-dimensional problems introduce added complexity to the search space. However, EAs need to be enhanced to ensure that the majority of potential winning solutions get the chance to survive and mature. In this paper, we present an evolutionary algorithm with an enhanced ability to deal with the problems of high dimensionality and sparseness of data. In addition to an informed exploration of the solution space, this technique balances exploration and exploitation using a hierarchical multi-population approach. The proposed model uses informed genetic operators to introduce diversity by expanding the scope of the search process at the expense of redundant, less promising members of the population. The next phase of the algorithm attempts to deal with the problem of high dimensionality by ensuring a broader and more exhaustive search and by preventing the premature death of potential solutions. To achieve this, in addition to the above exploration-controlling mechanism, a multi-tier hierarchical architecture is employed in which, in separate layers, the less fit isolated individuals evolve in dynamic sub-populations that coexist alongside the original, or main, population. Evaluation of the proposed technique on well-known benchmark problems confirms its superior performance. The algorithm has also been successfully applied to a real-world problem of financial portfolio management. Although the proposed method cannot be considered big data-ready, it is certainly a move in the right direction.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Recently, the Big Data paradigm has received considerable attention because it offers a great opportunity to mine knowledge from massive amounts of data. However, the newly mined knowledge will be useless if the data is fake, and sometimes massive amounts of data cannot be collected because of concerns about data abuse. This situation calls for new security solutions. On the other hand, the defining feature of Big Data is that it is "massive", which requires that any security solution for Big Data be "efficient". In this paper, we propose a new identity-based generalized signcryption scheme to solve the above problems. In particular, it has the following two properties to meet the efficiency requirement. (1) It can work as an encryption scheme, a signature scheme, or a signcryption scheme as needed. (2) It does not carry the heavy burden of complicated certificate management that traditional cryptographic schemes do. Furthermore, our proposed scheme can be proven secure in the standard model. © 2014 Elsevier Inc. All rights reserved.

Relevance:

100.00% 100.00%

Publisher: