875 results for big data storage


Relevance: 80.00%

Abstract:

In big data analysis, frequent itemset mining plays a key role in mining associations, correlations and causality. Because some traditional frequent itemset mining algorithms cannot handle massive datasets of small files effectively, suffering from high memory cost, high I/O overhead and low computing performance, we propose a novel parallel frequent itemset mining algorithm based on the FP-Growth algorithm and discuss its applications in this paper. First, we introduce a small-file processing strategy for massive small-file datasets to compensate for the low read-write speed and low processing efficiency of Hadoop. Moreover, we use MapReduce to redesign the FP-Growth algorithm for parallel computing, thereby improving the overall performance of frequent itemset mining. Finally, we apply the proposed algorithm to the association analysis of data from the national college entrance examination and admission of China. The experimental results show that the proposed algorithm is feasible and valid, achieving a good speedup and higher mining efficiency, and can meet the actual requirements of frequent itemset mining for massive small-file datasets. © 2014 ISSN 2185-2766.
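The abstract does not describe how FP-Growth is redesigned on MapReduce. As a rough illustration only, the sketch below shows the item-counting pass that parallel FP-Growth variants typically start with, written as plain Python map and reduce functions. The function names, the sharding of transactions and the `min_support` threshold are all assumptions made for this example, not the authors' algorithm.

```python
from collections import Counter
from itertools import chain


def map_phase(transactions):
    """Emit (item, 1) once per distinct item in each transaction."""
    for transaction in transactions:
        for item in set(transaction):
            yield item, 1


def reduce_phase(pairs):
    """Sum the emitted counts per item."""
    counts = Counter()
    for item, n in pairs:
        counts[item] += n
    return counts


def frequent_items(shards, min_support):
    """Run the map phase on every shard, then reduce and filter by support."""
    pairs = chain.from_iterable(map_phase(shard) for shard in shards)
    counts = reduce_phase(pairs)
    return {item: c for item, c in counts.items() if c >= min_support}


# Hypothetical data: two shards of transactions, as if split across mappers.
shards = [
    [["a", "b", "c"], ["a", "c"]],
    [["a", "d"], ["b", "c"]],
]
result = frequent_items(shards, min_support=2)
print(result)
```

In a real deployment each shard would be processed by a separate mapper; here the shards are simply iterated in one process to keep the example self-contained.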

Relevance: 80.00%

Abstract:

With the arrival of the big data era, Internet traffic is growing exponentially. A wide variety of applications have arisen on the Internet, and traffic classification has been introduced to help manage them for security monitoring and quality of service purposes. Many Machine Learning (ML) algorithms have been applied to traffic classification. A significant challenge to classification performance comes from the imbalanced distribution of data in traffic classification systems. In this paper, we propose an Optimised Distance-based Nearest Neighbor (ODNN) approach, which can improve the classification performance on imbalanced traffic data. We analyze the proposed ODNN approach and its performance benefit from both theoretical and empirical perspectives. A large number of experiments were carried out on a real-world traffic dataset. The results show that the performance of “small classes” can be improved significantly even with only a small amount of training data, while the performance of “large classes” remains stable.

Relevance: 80.00%

Abstract:

The Hadoop framework provides a powerful way to handle Big Data. Since Hadoop has inherent defects of high memory overhead and low computing performance when processing massive numbers of small files, we implement three methods and propose two strategies for solving the small files problem in this paper. First, we implement three methods, i.e., Hadoop Archives (HAR), Sequence Files (SF) and CombineFileInputFormat (CFIF), to compensate for the existing defects of Hadoop. Moreover, we propose two strategies for meeting the actual needs of different users. Finally, we evaluate the efficiency of the implemented methods and the validity of the proposed strategies. The experimental results show that our methods and strategies can improve the efficiency of massive small files processing, thereby enhancing the overall performance of Hadoop. © 2014 ISSN 1881-803X.
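The three methods named above share one idea: pack many small files into a few large containers so that per-file overhead is paid once per container. The sketch below imitates that idea in plain Python, loosely in the spirit of a Sequence File keyed by file name. It does not use the Hadoop API, and `pack`, `read` and the sample file names are hypothetical helpers invented for this illustration.

```python
import io


def pack(files):
    """Concatenate small files into one blob; return the blob and an index.

    The index maps each file name to its (offset, length) inside the blob.
    """
    buffer = io.BytesIO()
    index = {}
    for name, data in files.items():
        index[name] = (buffer.tell(), len(data))
        buffer.write(data)
    return buffer.getvalue(), index


def read(blob, index, name):
    """Recover one original file from the packed blob."""
    offset, length = index[name]
    return blob[offset:offset + length]


# Hypothetical small files, held in memory to keep the example self-contained.
small_files = {"a.txt": b"alpha", "b.txt": b"bravo!", "c.txt": b""}
blob, index = pack(small_files)
print(len(blob), index)
print(read(blob, index, "b.txt"))
```

The trade-off is the usual one: the container is cheap to store and scan sequentially, but random access to a single file depends on keeping the index available.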

Relevance: 80.00%

Abstract:

Multidimensional WSNs are deployed in complex environments to sense and collect data relating to multiple attributes (multi-dimensional data). Such networks present unique challenges to data dissemination, data storage and in-network query processing (information discovery). Recent algorithms proposed for such WSNs aim at achieving better energy efficiency and minimizing latency. However, they overuse nodes that lie on the shortest or closest path to the base station or data aggregation points, which creates hotspot nodes and partitions the network area. In this paper, we propose a time-based, multi-dimensional, multi-resolution storage approach for range queries that balances energy consumption by distributing the traffic load as uniformly as possible, thus ensuring a maximum network lifetime. We present simulation results to show that the proposed approach to information discovery offers significant improvements in information discovery latency compared with current approaches. In addition, the results prove that the Quality of Service (QoS) improvements reduce hotspots, resulting in significant network-wide energy saving and an increased network lifetime.

Relevance: 80.00%

Abstract:

Cloud and service computing have started to change the way research in science, in particular biology and medicine, is being carried out. Researchers who have taken advantage of this technology (making use of public and private cloud compute resources) can process large amounts of data (big data) and speed up discovery. However, this requires researchers to acquire solid knowledge and skills in sequential and high performance computing (HPC) development, as well as a background in cloud development and deployment. In response, a technology that exposes HPC applications as services through the development and deployment of a SaaS cloud has been developed, together with its proof of concept, a cloud environment called Uncinus, to give researchers easy access to cloud computing resources. The new technology offers, and Uncinus supports, the development of applications as services and the sharing of compute resources to speed up applications' execution. Users access these cloud resources and services through web interfaces. Using the Uncinus platform, a bio-informatics workflow was executed on private (HPC) cloud, server and public cloud (Amazon EC2) resources, with performance results showing a 3-fold improvement compared to local resources. Biology and medicine specialists with no background in programming or in deploying applications on clouds could run the case study applications with ease.

Relevance: 80.00%

Abstract:

Autonomous wireless sensor networks (WSNs) have sensors that are usually deployed randomly to monitor one or more phenomena. They are attractive for information discovery in large-scale, data-rich environments and can add value to mission-critical applications such as battlefield surveillance and emergency response systems. However, in order to fully exploit these networks for such applications, energy-efficient, load-balanced and scalable solutions for information discovery are essential. Multi-dimensional autonomous WSNs are deployed in complex environments to sense and collect data relating to multiple attributes (multi-dimensional data). Such networks present unique challenges to data dissemination, data storage and in-network information discovery. In this paper, we propose a novel method for information discovery in multi-dimensional autonomous WSNs whose sensors are deployed randomly. The method can significantly increase network lifetime and minimize query processing latency, resulting in quality of service (QoS) improvements that are of immense benefit to mission-critical applications. We present simulation results to show that the proposed approach to information discovery offers significant improvements in query resolution latency compared with current approaches.

Relevance: 80.00%

Abstract:

Considering the rising socio-economic burden of autism spectrum disorder (ASD), timely and evidence-driven public policy decision-making and communication of the latest guidelines pertaining to the treatment and management of the disorder are crucial. Yet evidence suggests that policy makers and medical practitioners do not always have a good understanding of the practices and relevant beliefs of the carers of ASD-afflicted individuals, who often follow questionable recommendations and adopt advice poorly supported by scientific data. The key goal of the present work is to explore the idea that Twitter, as a highly popular platform for information exchange, could be used as a data-mining source to learn about the population affected by ASD: their behaviour, concerns, needs, etc. To this end, using a large data set of over 11 million harvested tweets as the basis for our investigation, we describe a series of experiments which examine a range of linguistic and semantic aspects of messages posted by individuals interested in ASD. Our findings, the first of their kind in the published scientific literature, strongly motivate additional research on this topic and present a methodological basis for further work.

Relevance: 80.00%

Abstract:

Smartphone technology has become more popular and innovative over the last few years, and technology companies are now introducing wearable devices into the market. As these devices emerge and converge with technologies such as Cloud, Internet of Things (IoT) and Virtualization, the requirements on personal sensor devices are immense, and meeting them is essential to support existing networks, e.g. mobile health (mHealth), as well as IoT users. Traditional physiological and biological medical sensors in mHealth provide health data either periodically or on demand. Both of these situations can cause rapid battery consumption, consume significant bandwidth, and raise privacy issues, because these sensors do not consider or understand sensor status when converged together. The aim of this research is to provide a novel approach and solution for managing and controlling personal sensors that can be used in various areas such as health, military, aged care, IoT and sport. This paper presents an inference system that transfers health data collected by personal sensors to other networks efficiently and securely, without adding to the workload of the sensor devices.

Relevance: 80.00%

Abstract:

Because real-world networks contain potentially important information, link prediction has become a focus of interest in different branches of science. Nevertheless, in the "big data" era, link prediction faces significant challenges, such as how to make predictions on massive data efficiently and accurately. In this paper, we propose two novel node-coupling clustering approaches and their extensions for link prediction, which combine the coupling degrees of the common neighbor nodes of a predicted node-pair with the cluster geometries of nodes. We then present an experimental evaluation that compares the prediction accuracy and effectiveness of our approaches with those of representative existing methods on two synthetic datasets and six real-world datasets. The experimental results show that our approaches outperform the existing methods.
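For readers unfamiliar with neighbor-based link prediction, the sketch below shows the plain common-neighbors baseline, which scores an unlinked node pair by how many neighbors the two nodes share. It is not the node-coupling clustering approach proposed in the abstract, whose details are not given here. The function name and the toy edge list are assumptions made for this example.

```python
from itertools import combinations


def common_neighbor_scores(edges):
    """Score every unlinked node pair by its number of shared neighbors."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        if v in neighbors[u]:
            continue  # already linked, nothing to predict
        shared = len(neighbors[u] & neighbors[v])
        if shared:
            scores[(u, v)] = shared
    return scores


# Hypothetical undirected graph.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
scores = common_neighbor_scores(edges)
print(scores)
```

Pairs with higher scores are the more likely candidate links under this baseline. The all-pairs loop is quadratic in the number of nodes, which is one reason scalability becomes a concern on massive networks.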

Relevance: 80.00%

Abstract:

Orange may be the new black, but as I have seen only five minutes of that show, I can’t really use it here. Besides, based on the five minutes I saw, I would assume it is a series written by males. Not since the Victoria’s Secret catalog have I seen so many women wearing fewer clothes, or engaging in so many unmentionable acts. I’ll stop there because my Victorianism is showing, I’m sure.

Relevance: 80.00%

Abstract:

A problem-oriented language for the structural design of buildings and the corresponding data storage structure are presented as the core of the PROADE system. The aim is to allow the structural engineer to describe the problem in everyday engineering terms, with the data received being organized for subsequent analysis and design of the structure. The PROADE problem and its corresponding data are discussed, followed by a description of the system's data storage structures. Next, the PROADE language is defined and, finally, the organization of the PROADE system is presented.

Relevance: 80.00%

Abstract:

Electronic commerce is a subject that involves multiple aspects related to the use of digital infrastructure to support business transactions. There are several ways to exemplify the application of electronic commerce, for example: the use of self-service kiosks to purchase soft drinks, phone cards and coffee, to check bank balances and to pay bills; the storage of customer data for the purpose of implementing a relationship marketing program; the use of networks that interconnect different organizations, internally or externally, to optimize logistics and to allow the flow of data needed to manage the organizations. Despite the various possibilities for adopting electronic commerce practices, one should not expect these practices to be replicable, generically, by all organizations, since organizations differ in their composition with regard to their cultures, structures, strategies and other components. Given the broad nature of the topic, this work takes a conceptual approach to electronic commerce and some of its applications in the areas of marketing, logistics and electronic government; presents some comments on information systems and the communication technologies that support them; characterizes the differences that exist between organizations, using an organizational model that portrays organizations as a set of forces (culture and structure, strategy, people and their roles, and technology) in dynamic equilibrium with one another and embedded in the social, technological, economic and political environment; and, as an example, draws inferences about the possible results that can be expected from adopting the pregão bidding modality, electronic or in-person, within military organizations of the Brazilian Army, with regard to culture, people and their roles, and technology.

Relevance: 80.00%

Abstract:

Brazilians' greater access to the internet and the growing volume of content spread across the web have drawn attention to big data analyses. The market, the press and governments increasingly turn to network analysis techniques to support decisions. But this practice carries risks of manipulation that are rarely considered. DAPP/FGV, a partner of GLOBO in network monitoring, has developed its own filtering mechanisms and identified at least 25% “online junk” in studies conducted over the last two weeks.

Relevance: 80.00%

Abstract:

In the modern Knowledge Economy, in the Big Data Era, correctly understanding the use and management of Information and Communication Technology (ICT), grounded in the academic field of Information Systems (IS) studies, is becoming increasingly relevant and strategic for organizations that intend to remain in business, be able to meet new demands (internal and external) and face complex changes in market competition. This research uses the stages-of-growth theory, founded on the studies of Richard L. Nolan in the 1970s. The academic literature on stages-of-growth models and the context of the IS field of study provide the conceptual basis of this study. The research identifies a model, with its constructs, related to the growth stages of organizational ICT/IS initiatives, starting from Nolan's second-level benchmark variables, and proposes its operationalization through the creation and development of a scale. Exploratory and descriptive in nature, the research makes a theoretical contribution to the paradigm of stages-of-growth theory by adding a new growth process to its conceptual structure. As a result, it provides, in addition to a bilingual scale instrument (Portuguese and English), recommendations and rules for applying a survey-type research instrument in the continuation of this study. As a general implication of this research, it is expected that its use and application in measuring the stage level of ICT/IS in organizations can help two profiles of individuals: academics who study this subject, and professionals who seek answers about their practical actions in the organizations where they work.

Relevance: 80.00%

Abstract:

Our focus is on information in expectation surveys that can now be built on thousands (or millions) of respondents on an almost continuous-time basis (big data) and in continuous macroeconomic surveys with a limited number of respondents. We show that, under standard microeconomic and econometric techniques, survey forecasts are an affine function of the conditional expectation of the target variable. This is true whether or not the survey respondent knows the data-generating process (DGP) of the target variable or the econometrician knows the respondent's individual loss function. If the econometrician has a mean-squared-error risk function, we show that asymptotically efficient forecasts of the target variable can be built using Hansen's (Econometrica, 1982) generalized method of moments in a panel-data context, when N and T diverge or when T diverges with N fixed. Sequential asymptotic results are obtained using Phillips and Moon's (Econometrica, 1999) framework. Possible extensions are also discussed.
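The affine relationship stated above can be written out as follows. The notation is an assumption made for illustration and is not taken from the paper: $f_{i,t}$ is respondent $i$'s forecast made at time $t$, $y_{t+h}$ is the target variable $h$ periods ahead, $\mathcal{F}_t$ is the information available at $t$, and $k_i$ and $\beta_i$ are respondent-specific intercept and slope terms.

```latex
f_{i,t} = k_i + \beta_i \, \mathbb{E}\left[ y_{t+h} \mid \mathcal{F}_t \right],
\qquad
\mathbb{E}\left[ y_{t+h} \mid \mathcal{F}_t \right] = \frac{f_{i,t} - k_i}{\beta_i}
\quad (\beta_i \neq 0).
```

Read this way, the second expression suggests why estimating the intercept and slope terms matters: once they are known, the conditional expectation can be recovered from the observed survey forecasts.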