873 results for big data analytics


Relevance: 100.00%

Publisher:

Abstract:

Big data presents a remarkable opportunity for organisations to obtain critical intelligence to drive decisions and gain insights as never before. However, big data generates high network traffic, and the continuous growth in the variety of network traffic caused by big data variety has made the network one of the key big data challenges. In this article, we present a comprehensive analysis of big data variety and its adverse effects on network performance. We present a taxonomy of big data variety, discuss the various dimensions of its features, and show how those features influence interconnection network requirements. Finally, we discuss some of the challenges each big data variety dimension presents and possible approaches to address them.
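
As a reading aid, here is a minimal sketch of how such a taxonomy might be encoded so that each variety dimension carries its network implications. The dimensions and flags below are invented for illustration and are not the article's actual taxonomy:

```python
from dataclasses import dataclass
from enum import Enum

class VarietyDimension(Enum):
    # Hypothetical dimensions; the article defines its own taxonomy.
    STRUCTURE = "structured vs. semi-structured vs. unstructured"
    FORMAT = "text, image, audio, video, sensor streams"
    SOURCE = "number and heterogeneity of data sources"

@dataclass
class NetworkRequirement:
    dimension: VarietyDimension
    bandwidth_sensitive: bool   # does this dimension inflate traffic volume?
    latency_sensitive: bool     # does it demand low end-to-end delay?
    note: str

requirements = [
    NetworkRequirement(VarietyDimension.FORMAT, True, True,
                       "Mixed media streams need high bandwidth and bounded jitter."),
    NetworkRequirement(VarietyDimension.SOURCE, True, False,
                       "Many small flows from heterogeneous sources stress aggregation links."),
]

for r in requirements:
    print(f"{r.dimension.name}: bandwidth={r.bandwidth_sensitive}, latency={r.latency_sensitive}")
```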

Relevance: 100.00%

Publisher:

Abstract:

Stochastic search techniques such as evolutionary algorithms (EAs) are known to be better explorers of a search space than conventional techniques, including deterministic methods. However, in the era of big data, the suitability of evolutionary algorithms, like that of most other search methods and learning algorithms, is naturally questioned. Big data poses new computational challenges, including very high dimensionality and data sparseness. The superior exploration skills of EAs should make them promising candidates for optimization problems involving big data, but high-dimensional problems add complexity to the search space, and EAs need to be enhanced to ensure that the majority of potential winner solutions get the chance to survive and mature. In this paper we present an evolutionary algorithm with an enhanced ability to deal with high dimensionality and data sparseness. In addition to an informed exploration of the solution space, the technique balances exploration and exploitation using a hierarchical multi-population approach. The proposed model uses informed genetic operators to introduce diversity, expanding the scope of the search at the expense of redundant, less promising members of the population. The next phase of the algorithm addresses high dimensionality by ensuring a broader and more exhaustive search and preventing the premature death of potential solutions. To achieve this, in addition to the exploration-controlling mechanism above, a multi-tier hierarchical architecture is employed in which, in separate layers, less fit isolated individuals evolve in dynamic sub-populations that coexist alongside the original or main population. Evaluation of the proposed technique on well-known benchmark problems confirms its superior performance, and the algorithm has also been successfully applied to a real-world problem of financial portfolio management. Although the proposed method cannot be considered big data-ready, it is certainly a move in the right direction.
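
The abstract does not specify the informed operators or layer dynamics, so the sketch below only illustrates the refuge idea: less fit individuals migrate to a coexisting sub-population instead of being discarded, and its best members can return to the main layer. The benchmark, operators, and parameters are generic assumptions:

```python
import random

def fitness(ind):
    # Sphere function: a standard benchmark stand-in (lower is better).
    return sum(x * x for x in ind)

def mutate(ind, rate=0.1, scale=0.5):
    return [x + random.gauss(0, scale) if random.random() < rate else x for x in ind]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(pop, n_offspring):
    # Tournament selection + crossover + mutation (generic operators,
    # not the paper's "informed" operators).
    def pick():
        return min(random.sample(pop, 3), key=fitness)
    return [mutate(crossover(pick(), pick())) for _ in range(n_offspring)]

def hierarchical_ea(dim=20, pop_size=60, sub_size=15, generations=100):
    main = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    sub = []  # refuge layer for less fit individuals
    for _ in range(generations):
        main = sorted(main + evolve(main, pop_size), key=fitness)
        # Instead of discarding the tail, migrate it to the sub-population.
        main, tail = main[:pop_size], main[pop_size:pop_size + sub_size]
        sub = sorted(sub + tail + evolve(sub or tail, sub_size), key=fitness)[:sub_size]
        # Promote the sub-population's best back into the main layer.
        if sub and fitness(sub[0]) < fitness(main[-1]):
            main[-1] = sub.pop(0)
    return main[0]

best = hierarchical_ea()
print("best fitness:", fitness(best))
```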

Relevance: 100.00%

Publisher:

Abstract:

Recently, the Big Data paradigm has received considerable attention because it offers a great opportunity to mine knowledge from massive amounts of data. However, the newly mined knowledge is useless if the data is fake, and sometimes massive amounts of data cannot be collected at all because of concerns about data abuse. This situation calls for new security solutions. On the other hand, the defining feature of Big Data is that it is massive, which requires any security solution for Big Data to be efficient. In this paper, we propose a new identity-based generalized signcryption scheme to solve the above problems. In particular, it has two properties that fit the efficiency requirement: (1) it can work as an encryption scheme, a signature scheme, or a signcryption scheme as needed; (2) it avoids the heavy burden of complicated certificate management carried by traditional certificate-based cryptographic schemes. Furthermore, the proposed scheme is provably secure in the standard model.
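
The scheme's pairing-based construction does not fit a short sketch, but the "generalized" property, one primitive acting as encryption, signature, or signcryption depending on which identities are supplied, can be illustrated with placeholder callables. Everything below is hypothetical scaffolding, not the paper's algorithm:

```python
from typing import Callable, Optional

# Placeholder primitives standing in for the paper's identity-based
# algorithms; they are NOT the actual pairing-based constructions.
Encrypt = Callable[[bytes, str], bytes]   # (plaintext, receiver identity) -> ciphertext
Sign = Callable[[bytes, str], bytes]      # (message, sender identity) -> signed message

def generalized_signcrypt(msg: bytes,
                          sender: Optional[str],
                          receiver: Optional[str],
                          enc: Encrypt,
                          sig: Sign) -> bytes:
    """Mode dispatch of a generalized signcryption scheme:
    only a sender   -> pure signature mode
    only a receiver -> pure encryption mode
    both            -> signcryption mode
    """
    if sender and receiver:
        # Shown as sign-then-encrypt for readability; a real signcryption
        # scheme fuses both steps into one cheaper operation.
        return enc(sig(msg, sender), receiver)
    if receiver:
        return enc(msg, receiver)
    if sender:
        return sig(msg, sender)
    raise ValueError("at least one identity must be supplied")
```

The point of the single-scheme design is that a deployment handling mixed traffic, where some flows need only confidentiality, some only authenticity, and some both, manages one key-extraction infrastructure instead of three.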

Relevance: 100.00%

Publisher:

Abstract:

Smart grid is a technological innovation that improves the efficiency, reliability, economics, and sustainability of electricity services, and it plays a crucial role in modern energy infrastructure. The main challenges for smart grids, however, are how to efficiently manage the many types of front-end intelligent devices, such as power assets and smart meters, and how to process the huge amount of data received from these devices. Cloud computing, a technology that provides computational resources on demand, is a good candidate for addressing these challenges thanks to properties such as energy saving, cost saving, agility, scalability, and flexibility. In this paper, we propose a secure cloud-computing-based framework for big data information management in smart grids, which we call 'Smart-Frame.' The main idea of the framework is to build a hierarchical structure of cloud computing centers that provide different types of computing services for information management and big data analysis. In addition to this structural framework, we present a security solution based on identity-based encryption, signature, and proxy re-encryption to address the framework's critical security issues.
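
A minimal sketch of the hierarchical idea, assuming a two-level split in which regional centers ingest raw meter data and the top-level center works only with aggregates; the class names and aggregation rule are invented, and the identity-based security layer is omitted:

```python
from collections import defaultdict

class RegionalCenter:
    """Ingests raw readings from front-end devices (e.g. smart meters)."""
    def __init__(self, region):
        self.region = region
        self.readings = defaultdict(list)   # meter_id -> [kWh readings]

    def ingest(self, meter_id, kwh):
        self.readings[meter_id].append(kwh)

    def summary(self):
        # Only the aggregate leaves the region, keeping load off the top tier.
        return sum(sum(values) for values in self.readings.values())

class TopLevelCenter:
    """Coordinates regional centers and runs grid-wide analysis on aggregates."""
    def __init__(self):
        self.regions = []

    def register(self, center):
        self.regions.append(center)

    def grid_load(self):
        return sum(center.summary() for center in self.regions)

top = TopLevelCenter()
east = RegionalCenter("east")
top.register(east)
east.ingest("meter-42", 3.7)
east.ingest("meter-43", 1.2)
print("total grid load (kWh):", top.grid_load())
```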

Relevance: 100.00%

Publisher:

Abstract:

As a leading framework for processing and analyzing big data, MapReduce is leveraged by many enterprises to parallelize data processing on distributed computing systems. Unfortunately, the all-to-all data forwarding from map tasks to reduce tasks in the traditional MapReduce framework generates a large amount of network traffic. The fact that, in many applications, the intermediate data generated by map tasks can be combined with a significant reduction in traffic motivates us to propose a data aggregation scheme for MapReduce jobs in the cloud. Specifically, we design an aggregation architecture under the existing MapReduce framework, with the objective of minimizing data traffic during the shuffle phase, in which aggregators can reside anywhere in the cloud. Experimental results show that our proposal outperforms existing work by significantly reducing network traffic.
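
A minimal sketch of the underlying idea, in the spirit of MapReduce combiners: aggregate map output on the node that produced it before the shuffle. The word-count example and record counts are illustrative, not the paper's architecture, which can place aggregators anywhere in the cloud:

```python
from collections import Counter

# Two compute nodes, each holding some documents (toy stand-ins).
docs_per_node = [
    ["big data needs big pipes".split(), "data about data".split()],
    ["big big big".split()],
]

def map_phase(doc):
    return [(word, 1) for word in doc]

def reduce_phase(pairs):
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return totals

# Without aggregation: every (word, 1) pair crosses the network in the shuffle.
raw = [kv for docs in docs_per_node for doc in docs for kv in map_phase(doc)]

# With node-local aggregation: pairs are pre-summed where they are produced,
# so at most one record per distinct word leaves each node.
combined = []
for docs in docs_per_node:
    local = Counter()
    for doc in docs:
        for word, count in map_phase(doc):
            local[word] += count
    combined.extend(local.items())

print("shuffle records without aggregation:", len(raw))       # 11
print("shuffle records with aggregation:   ", len(combined))  # 6
assert reduce_phase(raw) == reduce_phase(combined)             # same final counts
```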

Relevance: 100.00%

Publisher:

Abstract:

With the explosion of big data, processing large numbers of continuous data streams, i.e., big data stream processing (BDSP), has become a crucial requirement for many scientific and industrial applications in recent years. By offering a pool of computation, communication, and storage resources, public clouds, like Amazon's EC2, are undoubtedly the most efficient platforms for meeting the ever-growing needs of BDSP. Public cloud service providers usually operate a number of geo-distributed datacenters across the globe, and different datacenter pairs incur different inter-datacenter network costs charged by Internet Service Providers (ISPs). Meanwhile, inter-datacenter traffic in BDSP constitutes a large portion of a cloud provider's traffic demand over the Internet and incurs substantial communication cost, which may even become the dominant operational expenditure factor. As datacenter resources are provided in a virtualized way, the virtual machines (VMs) for stream processing tasks can be freely deployed onto any datacenter, provided that the Service Level Agreement (SLA, e.g., quality-of-information) is obeyed. This raises the opportunity, but also the challenge, of exploiting inter-datacenter network cost diversity to optimize both VM placement and load balancing toward network cost minimization with a guaranteed SLA. In this paper, we first propose a general modeling framework that describes all representative inter-task relationship semantics in BDSP. Based on this framework, we formulate the communication cost minimization problem for BDSP as a mixed-integer linear programming (MILP) problem and prove it to be NP-hard. We then propose a computation-efficient solution based on the MILP formulation, whose high efficiency is validated by extensive simulation-based studies.
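
The paper's full model is not reproduced in the abstract; the following is a minimal sketch of the core placement MILP, assuming one VM per task, a traffic volume per communicating task pair, ISP rates between datacenter pairs, and a crude capacity constraint standing in for the SLA. It uses the PuLP library and the standard linearization of the binary product; all data is invented:

```python
# Requires the PuLP modeling library (pip install pulp), which bundles the CBC solver.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, value

tasks = ["src", "filter", "sink"]              # stream processing tasks (one VM each)
dcs = ["dc1", "dc2"]                           # geo-distributed datacenters
traffic = {("src", "filter"): 10, ("filter", "sink"): 4}   # volume per task pair
rate = {("dc1", "dc1"): 0, ("dc2", "dc2"): 0,              # ISP cost per traffic unit
        ("dc1", "dc2"): 3, ("dc2", "dc1"): 3}
capacity = {"dc1": 2, "dc2": 2}                # VM slots per datacenter (SLA stand-in)

prob = LpProblem("bdsp_vm_placement", LpMinimize)
x = LpVariable.dicts("x", (tasks, dcs), cat=LpBinary)      # x[t][d]=1 iff task t on d
# y linearizes the product x[i][d] * x[j][e] so the model stays a MILP.
y = {(p, d, e): LpVariable(f"y_{p[0]}_{p[1]}_{d}_{e}", lowBound=0)
     for p in traffic for d in dcs for e in dcs}

prob += lpSum(traffic[p] * rate[(d, e)] * y[(p, d, e)]
              for p in traffic for d in dcs for e in dcs)

for t in tasks:                                 # each VM is placed exactly once
    prob += lpSum(x[t][d] for d in dcs) == 1
for d in dcs:                                   # respect datacenter capacity
    prob += lpSum(x[t][d] for t in tasks) <= capacity[d]
for (i, j) in traffic:                          # y >= x_i,d + x_j,e - 1 linearization
    for d in dcs:
        for e in dcs:
            prob += y[((i, j), d, e)] >= x[i][d] + x[j][e] - 1

prob.solve()
for t in tasks:
    print(t, "->", next(d for d in dcs if value(x[t][d]) > 0.5))
```

Because the objective is a minimization with non-negative costs, each y settles at max(0, x_i,d + x_j,e - 1), which equals the binary product, so the linearization is exact.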

Relevance: 100.00%

Publisher:

Abstract:

Big data is data so large or complex that it exceeds the processing capacity of conventional data processing systems. This book provides a big picture of this broad research area, covering all the phases of the big data value chain. The authors have attempted to survey the most relevant technologies in each phase. The book is recommended for readers interested in advanced research on big data, as well as for industry practitioners interested in building big data applications. Readers without the necessary technical background may need complementary reading.

Relevance: 100.00%

Publisher:

Abstract:

Big data is an emerging hot research topic due to its pervasive applications in human society, such as government, climate, finance, and science. Currently, most research work on big data falls into data mining, machine learning, and data analysis. However, these amazing top-level killer applications would not be possible without the underlying support of networking, given their extremely large data volumes and computational complexity, especially when real-time or near-real-time processing is demanded.

Relevance: 100.00%

Publisher:

Abstract:

The FGV Projetos study, coordinated by economist Fernando Blumenschein, develops a specific methodological framework for government procurement, based on research conducted for the Fundo Nacional de Desenvolvimento da Educação (FNDE). The study highlights the potential of applying concepts from auction theory, together with "Big Data" analysis methods, to the design of public bidding sessions.