94 results for big data storage


Relevance: 80.00%

Abstract:

Cloud and service computing has started to change the way research in science, in particular biology and medicine, is carried out. Researchers who take advantage of this technology (making use of public and private cloud compute resources) can process large amounts of data (big data) and speed up discovery. However, this requires researchers to acquire solid knowledge and skills in sequential and high-performance computing (HPC) development, as well as a background in cloud development and deployment. In response, a technology that exposes HPC applications as services through the development and deployment of a SaaS cloud has been designed, and a proof-of-concept cloud environment, Uncinus, has been implemented to give researchers easy access to cloud computing resources. Uncinus supports the development of applications as services and the sharing of compute resources to speed up application execution. Users access these cloud resources and services through web interfaces. Using the Uncinus platform, a bioinformatics workflow was executed on a private (HPC) cloud, a local server, and public cloud (Amazon EC2) resources; performance results show a three-fold improvement over local resources. Biology and medicine specialists with no background in programming or application deployment on clouds could run the case-study applications with ease.

Relevance: 80.00%

Abstract:

Autonomous wireless sensor networks (WSNs) consist of sensors that are usually deployed randomly to monitor one or more phenomena. They are attractive for information discovery in large-scale, data-rich environments and can add value to mission-critical applications such as battlefield surveillance and emergency response systems. However, to fully exploit these networks for such applications, energy-efficient, load-balanced and scalable solutions for information discovery are essential. Multi-dimensional autonomous WSNs are deployed in complex environments to sense and collect data relating to multiple attributes (multi-dimensional data). Such networks present unique challenges to data dissemination, data storage and in-network information discovery. In this paper, we propose a novel method of information discovery for multi-dimensional autonomous WSNs with randomly deployed sensors that can significantly increase network lifetime and minimize query-processing latency, resulting in quality-of-service (QoS) improvements of immense benefit to mission-critical applications. We present simulation results showing that the proposed approach offers significant improvements in query-resolution latency compared with current approaches.

Relevance: 80.00%

Abstract:

Considering the rising socio-economic burden of autism spectrum disorder (ASD), timely, evidence-driven public-policy decision-making and communication of the latest guidelines for the treatment and management of the disorder are crucial. Yet evidence suggests that policy makers and medical practitioners do not always have a good understanding of the practices and beliefs of carers of individuals with ASD, who often follow questionable recommendations and adopt advice poorly supported by scientific data. The key goal of the present work is to explore the idea that Twitter, as a highly popular platform for information exchange, could serve as a data-mining source for learning about the population affected by ASD: their behaviour, concerns, needs, and so on. To this end, using a large data set of over 11 million harvested tweets as the basis for our investigation, we describe a series of experiments that examine a range of linguistic and semantic aspects of messages posted by individuals interested in ASD. Our findings, the first of their nature in the published scientific literature, strongly motivate additional research on this topic and provide a methodological basis for further work.

Relevance: 80.00%

Abstract:

Smartphone technology has become more popular and innovative over the last few years, and technology companies are now introducing wearable devices into the market. As technologies such as cloud computing, the Internet of Things (IoT) and virtualization emerge and converge, the demands on personal sensor devices are immense, and such devices are essential to supporting existing networks, e.g. mobile health (mHealth), as well as IoT users. Traditional physiological and biological medical sensors in mHealth provide health data either periodically or on demand. Both situations can cause rapid battery consumption, consume significant bandwidth, and raise privacy issues, because these sensors do not consider or understand sensor status when converged together. The aim of this research is to provide a novel approach to managing and controlling personal sensors that can be used in areas such as health, the military, aged care, the IoT and sport. This paper presents an inference system that transfers health data collected by personal sensors to other networks efficiently and securely, without burdening the sensor devices with additional workload.

Relevance: 80.00%

Abstract:

Because of the potentially important information in real-world networks, link prediction has become a focus of several branches of science. Nevertheless, in the "big data" era, link prediction faces significant challenges, such as how to make predictions over massive data both efficiently and accurately. In this paper, we propose two novel node-coupling clustering approaches, and extensions of them, for link prediction; they combine the coupling degrees of the common-neighbor nodes of a predicted node pair with the cluster geometries of nodes. We then present an experimental evaluation comparing the prediction accuracy and effectiveness of our approaches against representative existing methods on two synthetic datasets and six real-world datasets. The experimental results show that our approaches outperform the existing methods.
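
The abstract does not spell out the node-coupling formulation itself, but the common-neighbor scores it builds on are a standard starting point; a minimal common-neighbors baseline (of the kind representative existing methods use), run on a hypothetical toy graph, might look like this:

```python
from itertools import combinations

def common_neighbor_scores(adj):
    """Score each unlinked node pair by its number of common neighbors.

    adj: dict mapping node -> set of neighbor nodes (undirected graph).
    Returns a list of ((u, v), score) sorted by descending score.
    """
    scores = []
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked; nothing to predict
        scores.append(((u, v), len(adj[u] & adj[v])))
    scores.sort(key=lambda item: -item[1])
    return scores

# Toy graph: triangle a-b-c plus a node d attached to b and c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
print(common_neighbor_scores(graph))  # (a, d) share two neighbors, b and c
```

The node-coupling approaches of the paper refine such scores with clustering information; this sketch shows only the common-neighbor ingredient.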

Relevance: 80.00%

Abstract:

Distributed-caching-empowered wireless networks can greatly improve the efficiency of data storage and transmission, and thereby the users' quality of experience (QoE). However, how this technology can alleviate network access pressure while ensuring the consistency of content delivery is still an open question, especially when users are in fast motion. In this paper, we therefore investigate the caching issue arising in a forthcoming scenario where vehicular video streaming is performed over cellular networks. Specifically, a QoE-centric distributed caching approach is proposed to fulfil as many user requests as possible, given the limited caching space of base stations and a basic user-experience guarantee. First, a QoE evaluation model is established using verified empirical data, and the mathematical relationship between the streaming bit rate and the actual storage space is derived. The distributed caching management for vehicular video streaming is then formulated as a constrained optimization problem and solved with the generalized reduced gradient method. Simulation results indicate that our approach can improve the users' satisfaction ratio by up to 40%.
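
The abstract does not reproduce its fitted bitrate-storage relationship, but the basic accounting it relies on (storage consumed by caching a stream of a given bit rate and duration) can be sketched as follows. The logarithmic MOS curve below is a common modelling assumption in QoE work, not the paper's empirically fitted model:

```python
import math

def cache_bytes(bitrate_kbps, duration_s):
    """Storage needed to cache a stream: bit rate (kbit/s) x duration / 8."""
    return bitrate_kbps * 1000 * duration_s / 8

def mos(bitrate_kbps, b_min=100.0, b_max=4000.0):
    """Illustrative logarithmic curve mapping bit rate to a 1-5 MOS.

    A logarithmic shape is a common QoE-modelling assumption; the
    paper's own empirically derived model may differ.
    """
    b = min(max(bitrate_kbps, b_min), b_max)
    return 1.0 + 4.0 * math.log(b / b_min) / math.log(b_max / b_min)

print(cache_bytes(2000, 60))  # a 60 s clip at 2 Mbit/s -> 15,000,000 bytes
print(mos(100), mos(4000))    # endpoints of the assumed curve: 1.0 and 5.0
```

With such a pair of functions, the base station's constrained optimization trades the MOS gained by caching a higher bit rate against the storage it consumes.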

Relevance: 80.00%

Abstract:

Big Data technologies are exciting cutting-edge technologies that generate, collect, store and analyse tremendous amounts of data. Like any other IT revolution, Big Data technologies face big challenges that obstruct their adoption by the wider community, or perhaps impede the extraction of value from Big Data with the pace and accuracy they promise. In this paper we first offer an alternative view of the "Big Data Cloud", with the main aim of making this complex technology easy for new researchers to understand and of identifying gaps efficiently. In our lab experiment, we successfully implemented cyber-attacks on Apache Hadoop's management interface, Ambari. Following the thought that "attackers only need one way in", we attacked the interface, successfully brought down all communication between Ambari and the Hadoop ecosystem, and collected performance data from the Ambari virtual machine (VM) and the Big Data Cloud hypervisor. We also detected these cyber-attacks with 94.0187% accuracy using modern machine-learning algorithms. To our knowledge, no existing research has attempted a similar experiment in detecting cyber-attacks on Hadoop using performance data.

Relevance: 80.00%

Abstract:

Conducting research in the rapidly evolving fields constituting the digital social sciences raises challenging ethical and technical issues, especially when the subject matter includes the activities of stigmatised populations. Our study of a dark-web drug-use community provides a 'how to' case example of conducting studies in digital environments where sensitive and illicit activities are discussed. In this paper we present the workflow from our digital ethnography and consider the consequences of particular choices of action for knowledge production. Key considerations our workflow responded to include adapting to volatile field-sites, researcher safety in digital environments, data security and encryption, and ethical-legal challenges. We anticipate that this workflow may assist other researchers to emulate, test and adapt our approach to the diverse range of illicit studies online. We argue that active engagement with stigmatised communities through multi-sited digital ethnography can complement and augment the findings of digital trace analyses.

Relevance: 80.00%

Abstract:

The proliferation of cloud computing allows users to flexibly store, re-compute or transfer large generated datasets across multiple cloud service providers. However, under the pay-as-you-go model, the total cost of using cloud services depends on the consumption of storage, computation and bandwidth, the three key cost factors for IaaS-based cloud resources. To reduce the total cost of data, given cloud service providers with different pricing models, users can choose a cloud service on which to store a generated dataset, or delete it and choose a cloud service on which to regenerate it whenever it is reused. However, finding the minimum cost is a complicated and hitherto unsolved problem. In this paper, we propose a novel algorithm that calculates the minimum cost of storing and regenerating datasets in clouds, i.e. whether each dataset should be stored or deleted, and where to store or regenerate it whenever it is reused. This minimum cost also achieves the best trade-off among computation, storage and bandwidth costs across multiple clouds. Comprehensive analysis and rigorous theorems guarantee the theoretical soundness of the paper, and general (random) simulations conducted with popular cloud service providers' pricing models demonstrate the excellent performance of our approach.
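
The core trade-off the algorithm optimizes can be illustrated for a single dataset: keep paying for storage over its lifetime, or delete it and pay compute plus transfer at each reuse. This is only a sketch of that comparison with hypothetical prices; the paper's algorithm additionally handles provenance chains between datasets and provider choice, which are ignored here:

```python
def cheaper_strategy(storage_cost_per_hour, regen_compute_cost,
                     transfer_cost, reuse_times_h):
    """Compare storing a dataset for its whole lifetime against deleting
    it and regenerating (plus transferring) it at each reuse time.

    reuse_times_h: hours (from generation) at which the dataset is reused.
    Returns ("store" | "regenerate", cost).
    """
    lifetime_h = max(reuse_times_h) if reuse_times_h else 0
    store_cost = storage_cost_per_hour * lifetime_h
    regen_cost = len(reuse_times_h) * (regen_compute_cost + transfer_cost)
    if store_cost <= regen_cost:
        return ("store", store_cost)
    return ("regenerate", regen_cost)

# Hypothetical prices: storing for 720 h at $0.01/h ($7.20) vs. two
# regenerations at $2.50 compute + $0.50 transfer each ($6.00).
print(cheaper_strategy(0.01, 2.50, 0.50, [240, 720]))
```

As reuse frequency rises, regeneration cost grows linearly with the number of reuses while storage cost depends only on the lifetime, which is why the optimal decision flips per dataset.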

Relevance: 80.00%

Abstract:

With multimedia dominating digital content, Device-to-Device (D2D) communication has been proposed as a promising data-offloading solution in the big data era. As quality of experience (QoE) is a major determinant of the success of new multimedia applications, we propose a QoE-driven cooperative content dissemination (QeCS) scheme in this work. Specifically, all users predict the QoE of their potential connections, characterized by the mean opinion score (MOS), and send the results to the content provider (CP). The CP then formulates a weighted directed graph from the network topology and the MOS of each potential connection. To stimulate cooperation among users, the content dissemination mechanism is designed by seeking a 1-factor of the weighted directed graph with maximum weight, thus achieving the maximum total user MOS. Additionally, a debt mechanism is adopted to combat cheating attacks. Furthermore, we extend the proposed QeCS scheme by adding a constraint to the optimization problem to improve fairness. Extensive simulation results demonstrate that the proposed QeCS scheme achieves both efficiency and fairness, especially in large-scale, dense networks.
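
A 1-factor of a directed graph gives every node exactly one outgoing and one incoming arc, so it corresponds to a permutation (cycle cover) of the nodes. The paper's solution method is not given in the abstract; as an illustration only, a brute-force search over permutations on a tiny hypothetical MOS graph looks like this:

```python
from itertools import permutations

def max_weight_one_factor(weights):
    """Brute-force a maximum-weight 1-factor of a small weighted
    directed graph.

    weights: dict (u, v) -> weight of the allowed arc u -> v.
    Returns (best_arcs, total_weight), or (None, -inf) if no 1-factor
    exists. Exponential in the node count: illustration only.
    """
    nodes = sorted({u for u, _ in weights} | {v for _, v in weights})
    best, best_w = None, float("-inf")
    for perm in permutations(nodes):
        arcs = list(zip(nodes, perm))
        if any(u == v or (u, v) not in weights for u, v in arcs):
            continue  # self-loops or missing arcs disqualify this cover
        w = sum(weights[a] for a in arcs)
        if w > best_w:
            best, best_w = arcs, w
    return best, best_w

# Hypothetical predicted MOS values on a 3-user D2D graph.
mos = {(1, 2): 4.0, (2, 1): 3.5, (2, 3): 4.5, (3, 2): 2.0,
       (1, 3): 3.0, (3, 1): 4.2}
print(max_weight_one_factor(mos))  # best cover is the cycle 1->2->3->1
```

At network scale this selection would be solved as an assignment problem rather than by enumeration; the sketch only makes the 1-factor objective concrete.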

Relevance: 80.00%

Abstract:

Scale-free networks are often used to model a wide range of real-world networks, such as social, technological, and biological networks. Understanding the structure of scale-free networks has evolved into a big data problem for business, management, and protein-function prediction. In the past decade, there has been a surge of interest in exploring the properties of scale-free networks. Two properties have attracted particular attention: assortative mixing and community structure. However, these two properties have been studied separately, in either theoretical models or real-world networks. In this paper, we show that the structural features of communities are highly related to assortative mixing in scale-free networks. According to the value of the assortativity coefficient, scale-free networks can be categorized into assortative, disassortative, and neutral networks. We systematically analyze the community structure in these three types of scale-free networks through six metrics: node embeddedness, link density, hub dominance, community compactness, the distribution of community sizes, and the presence of hierarchical communities. We find that the three types of scale-free networks exhibit significant differences across these six metrics. First, assortative networks present high embeddedness: many links lie within communities and few lie between them, which leads to high link density within communities. Second, disassortative networks exhibit prominent hubs in their communities, resulting in highly compact communities in which nodes can reach each other via short paths. Third, in neutral networks a large portion of links act as community bridges, so they display sparse and less compact communities. In addition, we find that (dis)assortative networks show hierarchical community structure with power-law-distributed community sizes, while neutral networks present no hierarchy.
Understanding the structure of communities from the angle of assortative mixing patterns of nodes can provide insights into network structure and guide the modeling of information propagation in the different categories of scale-free networks.
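
The assortativity coefficient that drives this categorization is the Pearson correlation between the degrees found at either end of each edge. A minimal pure-Python version, exercised on a hypothetical star graph (which is maximally disassortative: high-degree hub, degree-one leaves), might look like:

```python
from math import sqrt

def degree_assortativity(edges):
    """Degree assortativity of an undirected graph: the Pearson
    correlation of end-point degrees over all edges. Positive values
    indicate assortative mixing, negative disassortative, and values
    near zero a neutral network.
    """
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    # Each undirected edge contributes both (du, dv) and (dv, du),
    # which makes the two marginals identical.
    pairs = [(deg[u], deg[v]) for u, v in edges]
    pairs += [(y, x) for x, y in pairs]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs) / n
    sx = sqrt(sum((x - mx) ** 2 for x, _ in pairs) / n)
    sy = sqrt(sum((y - my) ** 2 for _, y in pairs) / n)
    return cov / (sx * sy)

star = [(0, i) for i in range(1, 5)]  # hub 0 linked to leaves 1..4
print(degree_assortativity(star))     # maximally disassortative: -1.0
```

Note this is the uncorrected (degree, not remaining-degree) form; on large networks the two agree closely, and libraries such as NetworkX provide the standard definition.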

Relevance: 80.00%

Abstract:

Cloud computing systems and services have become major targets for cyber-attackers. To provide strong protection of cloud platforms, infrastructure, hosted applications, and data stored in the cloud, we need to address the security issue from a range of perspectives: from secure data and application outsourcing, to anonymous communication, to secure multiparty computation. This special issue on cloud security aims to address the importance of protecting and securing cloud platforms, infrastructures, hosted applications, and data storage.

Relevance: 80.00%

Abstract:

BACKGROUND: As more and more researchers turn to big data for new opportunities in biomedical discovery, machine learning models, the backbone of big data analysis, are mentioned increasingly often in biomedical journals. However, owing to the inherent complexity of machine learning methods, they are prone to misuse. Because of the flexibility in specifying machine learning models, results are often insufficiently reported in research articles, hindering reliable assessment of model validity and consistent interpretation of model outputs.

OBJECTIVE: To attain a set of guidelines on the use of machine learning predictive models in clinical settings, to ensure that the models are correctly applied and sufficiently reported so that true discoveries can be distinguished from random coincidence.

METHODS: A multidisciplinary panel of machine learning experts, clinicians, and traditional statisticians was interviewed, using an iterative process in accordance with the Delphi method.

RESULTS: The process produced a set of guidelines that consists of (1) a list of reporting items to be included in a research article and (2) a set of practical sequential steps for developing predictive models.

CONCLUSIONS: A set of guidelines was generated to enable correct application of machine learning models and consistent reporting of model specifications and results in biomedical research. We believe that such guidelines will accelerate the adoption of big data analysis, particularly with machine learning methods, in the biomedical research community.

Relevance: 40.00%

Abstract:

The main problem in data grids is how to provide good and timely access to huge data volumes, given the limited number and size of storage devices and the high latency of the interconnection network. One approach to this problem is to cache files locally so that remote-access overheads are avoided. Caching requires a cache-replacement algorithm, which is the focus of this paper. Specifically, we propose a new replacement policy and compare it with an existing policy using simulations. The simulation results show that the proposed policy performs better than the baseline policy.
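
The abstract does not name either policy, so as an illustration only, here is the shape such a simulation takes with a least-recently-used (LRU) baseline, a common reference policy in caching studies; the trace and capacity are hypothetical:

```python
from collections import OrderedDict

class LRUFileCache:
    """A least-recently-used (LRU) file cache of the kind used as a
    baseline in cache-replacement simulations. A cache miss stands in
    for a costly remote fetch in the data grid."""

    def __init__(self, capacity):
        self.capacity = capacity          # number of files the cache holds
        self.files = OrderedDict()        # file name -> size; order = recency
        self.hits = self.misses = 0

    def access(self, name, size=1):
        if name in self.files:
            self.hits += 1
            self.files.move_to_end(name)  # mark as most recently used
            return
        self.misses += 1                  # remote fetch, then cache locally
        self.files[name] = size
        while len(self.files) > self.capacity:
            self.files.popitem(last=False)  # evict least recently used

cache = LRUFileCache(capacity=2)
for f in ["a", "b", "a", "c", "b"]:
    cache.access(f)
print(cache.hits, cache.misses)  # "a" hits once; "b" was evicted by "c"
```

A proposed policy would replace only the eviction choice in `access`, and the simulation would replay the same access trace through both policies and compare hit ratios.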