72 results for bigdata, data stream processing, dsp, apache storm, cyber security

in Deakin Research Online - Australia


Relevance:

100.00%

Publisher:

Abstract:

With the explosion of big data, processing large numbers of continuous data streams, i.e., big data stream processing (BDSP), has become a crucial requirement for many scientific and industrial applications in recent years. By offering a pool of computation, communication and storage resources, public clouds, like Amazon's EC2, are undoubtedly the most efficient platforms to meet the ever-growing needs of BDSP. Public cloud service providers usually operate a number of geo-distributed datacenters across the globe, and different datacenter pairs incur different inter-datacenter network costs charged by Internet Service Providers (ISPs). Meanwhile, inter-datacenter traffic in BDSP constitutes a large portion of a cloud provider's traffic demand over the Internet and incurs substantial communication cost, which may even become the dominant operational expenditure factor. As datacenter resources are provided in a virtualized way, the virtual machines (VMs) for stream processing tasks can be freely deployed onto any datacenter, provided that the Service Level Agreement (SLA, e.g., quality-of-information) is obeyed. This raises the opportunity, but also a challenge, to exploit the inter-datacenter network cost diversity to optimize both VM placement and load balancing towards network cost minimization with a guaranteed SLA. In this paper, we first propose a general modeling framework that describes all representative inter-task relationship semantics in BDSP. Based on this framework, we formulate the communication cost minimization problem for BDSP as a mixed-integer linear programming (MILP) problem and prove it to be NP-hard. We then propose a computationally efficient solution based on the MILP formulation. The high efficiency of our proposal is validated by extensive simulation-based studies.
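
The abstract does not reproduce the MILP itself. As a rough illustration of what a cost-minimizing placement formulation can look like, here is a toy sketch using the PuLP solver library; the task graph, traffic rates, and inter-datacenter cost matrix are all invented, and SLA constraints are omitted.

```python
# Illustrative toy MILP in the spirit of the paper's problem, not its exact
# formulation. Requires PuLP (pip install pulp); all data below is invented.
import pulp

tasks = ["src", "filter", "agg"]                # stream-processing tasks
dcs = ["dc1", "dc2", "dc3"]                     # geo-distributed datacenters
traffic = {("src", "filter"): 5.0,              # data rate between task pairs
           ("filter", "agg"): 2.0}
cost = {(a, b): 0.0 if a == b else 2.0 for a in dcs for b in dcs}
cost[("dc1", "dc2")] = cost[("dc2", "dc1")] = 0.5   # one cheap ISP link

prob = pulp.LpProblem("bdsp_placement", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (tasks, dcs), cat="Binary")  # task -> DC
# y linearizes the product x[t1][d1] * x[t2][d2] for the objective
y = pulp.LpVariable.dicts("y", (list(traffic), dcs, dcs), lowBound=0)

for t in tasks:                                 # each task on exactly one DC
    prob += pulp.lpSum(x[t][d] for d in dcs) == 1
for p in traffic:
    t1, t2 = p
    for d1 in dcs:
        for d2 in dcs:
            prob += y[p][d1][d2] >= x[t1][d1] + x[t2][d2] - 1

prob += pulp.lpSum(traffic[p] * cost[(d1, d2)] * y[p][d1][d2]
                   for p in traffic for d1 in dcs for d2 in dcs)
prob.solve(pulp.PULP_CBC_CMD(msg=False))
for t in tasks:
    print(t, "->", next(d for d in dcs if x[t][d].value() > 0.5))
```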

Relevance:

100.00%

Publisher:

Abstract:

The Machine-to-Machine (M2M) paradigm enables machines (sensors, actuators, robots, and smart meter readers) to communicate with each other with little or no human intervention, and is a key enabling technology for cyber-physical systems (CPSs). This paper explores CPS beyond the M2M concept and looks at futuristic applications. Our vision is CPS with distributed actuation and in-network processing. We describe a few particular use cases that motivate the development of M2M communication primitives tailored to large-scale CPS. M2M communications have so far been considered in the literature only to a limited extent: the existing work is based on small-scale M2M models and centralized solutions, different sources discuss different primitives, and the few existing decentralized solutions do not scale well. There is a need to design M2M communication primitives that will scale to thousands and even trillions of M2M devices without sacrificing solution quality. The main paradigm shift is to design localized algorithms, where CPS nodes make decisions based on local knowledge. Localized coordination and communication in networked robotics, for matching events with robots, are studied to illustrate the new directions.
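
As a toy illustration of the "localized algorithm" idea (not any specific published primitive), the sketch below has each robot match itself to the nearest event within its own radio range, using no global state; all coordinates and the range are invented.

```python
# Hedged illustration of a localized decision rule: each robot matches
# itself to the nearest event it can hear about within its own radio range.
import math

def local_match(robot, events, radio_range):
    """robot, events: (x, y) tuples; returns nearest in-range event or None."""
    in_range = [e for e in events if math.dist(robot, e) <= radio_range]
    return min(in_range, key=lambda e: math.dist(robot, e), default=None)

robots = [(0, 0), (10, 10)]
events = [(1, 1), (9, 12), (50, 50)]   # last event is out of everyone's range
for r in robots:
    print(r, "->", local_match(r, events, radio_range=5.0))
```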

Relevance:

100.00%

Publisher:

Abstract:

Automated adversarial detection systems can fail when under attack by adversaries. As part of a resilient data stream mining system designed to reduce the possibility of such failure, adaptive spike detection performs attribute ranking and selection without class labels. The first part of adaptive spike detection weights all attributes for spikiness in order to rank them. The second part filters out attributes with extreme weights to choose the best ones for computing each example's suspicion score. Within an identity crime detection domain, adaptive spike detection is validated on a few million real credit applications with adversarial activity. The results are F-measure curves for eleven experiments and a discussion of relative weights for the best experiment. The results reinforce adaptive spike detection's effectiveness for class-label-free attribute ranking and selection.
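
The abstract does not give the exact weighting formula. The sketch below illustrates the general two-part idea with an invented spikiness measure (the ratio of the most common value's count to the mean count); all function names and thresholds are hypothetical.

```python
# Assumption-laden sketch of class-label-free attribute ranking: weight
# attributes by the "spikiness" of their recent value counts, drop attributes
# with extreme weights, then score new examples against the selected ones.
from collections import Counter

def spikiness(values):
    """Ratio of the most common value's count to the mean count."""
    counts = Counter(values)
    mean = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean

def rank_and_select(window, low=0.05, high=0.95):
    """window: list of dicts (attribute -> value); no class labels needed."""
    attrs = window[0].keys()
    w = {a: spikiness([row[a] for row in window]) for a in attrs}
    ranked = sorted(w, key=w.get)
    # filter out attributes whose weights sit at the extremes of the ranking
    keep = ranked[int(low * len(ranked)): int(high * len(ranked)) + 1]
    return {a: w[a] for a in keep}

def suspicion_score(example, weights, window):
    """Higher when the example's values coincide with spikes in the window."""
    score = 0.0
    for a, wa in weights.items():
        seen = [row[a] for row in window]
        score += wa * (seen.count(example[a]) / len(seen))
    return score
```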

Relevance:

100.00%

Publisher:

Abstract:

The objective is to measure the utility of real-time commercial decision making. This is important due to the higher possibility of mistakes in real-time decisions, problems with recording actual occurrences, and significant costs associated with predictions produced by algorithms. The first contribution is to use overall utility and to represent individual utility with a monetary value instead of a prediction. The second is to calculate the benefit from predictions using a utility-based decision threshold. The third is to incorporate the cost of predictions. In the experiments, overall utility is used to evaluate communal and spike detection and their adaptive versions. The overall utility results show that with fewer alerts, communal detection is better than spike detection; with more alerts, adaptive communal and spike detection are better than their static versions. To maximise overall utility with all algorithms, only the highest 1% to 4% of predictions should be alerts.
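
A minimal sketch of the utility-based alerting idea, with invented monetary values: each correct alert earns a benefit, every prediction costs money, and only the top-scored fraction of predictions becomes alerts.

```python
# Illustrative sketch only: the benefit, cost, and scores below are made up,
# and the utility function is a simple stand-in for the paper's measure.

def overall_utility(scored, benefit_per_hit, cost_per_prediction, alert_rate):
    """scored: list of (score, is_fraud) pairs; highest scores are alerted."""
    scored = sorted(scored, key=lambda p: p[0], reverse=True)
    n_alerts = max(1, int(alert_rate * len(scored)))
    benefit = sum(benefit_per_hit
                  for score, is_fraud in scored[:n_alerts] if is_fraud)
    return benefit - cost_per_prediction * len(scored)

data = [(0.95, True), (0.90, False), (0.70, True), (0.40, False),
        (0.30, False), (0.20, False), (0.10, False), (0.05, False)]
# Compare alerting different top fractions of predictions
for rate in (0.01, 0.04, 0.25, 0.50):
    print(rate, overall_utility(data, benefit_per_hit=100.0,
                                cost_per_prediction=0.5, alert_rate=rate))
```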

Relevance:

100.00%

Publisher:

Abstract:

The high-throughput experimental data from the new gene microarray technology has spurred numerous efforts to find effective ways of processing microarray data to reveal real biological relationships among genes. This work proposes an innovative data pre-processing approach to identify noise data in the data sets and to eliminate or reduce the impact of that noise data on gene clustering. With the proposed algorithm, the pre-processed data sets make the clustering results stable across clustering algorithms with different similarity metrics, the important information of genes and features is kept, and the clustering quality is improved. A preliminary evaluation on real microarray data sets has shown the effectiveness of the proposed algorithm.
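
The abstract does not specify the noise-identification criterion. As a generic stand-in, the sketch below flags near-flat expression profiles via a variance threshold before clustering; the quantile cutoff and synthetic data are invented.

```python
# Hedged sketch: one common pre-processing step before gene clustering is to
# flag and remove rows (genes) whose expression profiles look like noise.
# The variance criterion is a generic stand-in, not the paper's algorithm.
import numpy as np

def remove_noise_genes(expr, var_quantile=0.10):
    """expr: genes x samples matrix; drop the flattest-looking rows."""
    variances = expr.var(axis=1)
    threshold = np.quantile(variances, var_quantile)
    keep = variances > threshold
    return expr[keep], keep

rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 6, 20)) * rng.normal(2, 0.2, (80, 1))
noise = rng.normal(0, 0.05, (20, 20))        # near-flat "noise" genes
expr = np.vstack([signal, noise])
cleaned, kept = remove_noise_genes(expr)
print(expr.shape, "->", cleaned.shape)
```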

Relevance:

100.00%

Publisher:

Abstract:

In order to decrease information security threats caused by human-related vulnerabilities, an increased focus on information security awareness and training is necessary. There are numerous information security awareness training delivery methods. The purpose of this study was to determine which delivery method is most successful in providing security awareness training. We conducted security awareness training using various delivery methods, such as text-based and game-based training and a short video presentation, with the aim of determining users' preferred delivery methods. Our study suggests that combined delivery methods are better than any individual security awareness delivery method.

Relevance:

100.00%

Publisher:

Abstract:

Operating systems and programs are better protected these days, and attackers have shifted their attention to human elements to break into organisations' information systems. As the number and frequency of cyber-attacks designed to take advantage of unsuspecting personnel are increasing, the significance of the human factor in information security management cannot be overstated. In order to counter cyber-attacks designed to exploit human factors in the information security chain, information security awareness, with the objective of reducing information security risks that occur due to human-related vulnerabilities, is paramount. This paper discusses and evaluates the effects of various information security awareness delivery methods used in improving end-users' information security awareness and behaviour. There is a wide range of information security awareness delivery methods, such as web-based training materials, contextual training and embedded training. In spite of efforts to increase information security awareness, research is scant regarding effective information security awareness delivery methods. To this end, this study focuses on determining which security awareness delivery method is most successful in providing information security awareness and which is preferred by users. We conducted information security awareness training using text-based, game-based and video-based delivery methods with the aim of determining user preferences. Our study suggests that combined delivery methods are better than any individual security awareness delivery method.

Relevance:

100.00%

Publisher:

Abstract:

Many aspects of our modern society now have either a direct or implicit dependence upon information technology. As such, a compromise of the availability or integrity of these systems (which may encompass such diverse domains as banking, government, health care, and law enforcement) could have dramatic consequences from a societal perspective. These key systems are often referred to as critical infrastructure. Critical infrastructure can consist of corporate information systems or systems that control key industrial processes; the latter are referred to as Industrial Control Systems (ICS). ICS have evolved since the 1960s from standalone systems to networked architectures that communicate across large distances, utilise wireless networks and can be controlled via the Internet. ICS form part of many countries' key critical infrastructure, including Australia's. They are used to remotely monitor and control the delivery of essential services and products, such as electricity, gas, water, waste treatment and transport systems. The need for security measures within these systems was not anticipated in the early development stages, as they were designed to be closed systems rather than open systems accessible via the Internet. We are also seeing these ICS and their supporting systems being integrated into organisational corporate systems.

Relevance:

100.00%

Publisher:

Abstract:

This paper presents a distributed multi-agent scheme for enhancing the cyber security of smart grids, which integrate computational resources, physical processes, and communication capabilities. Smart grid infrastructures are vulnerable to various cyber attacks and noises, whose influence on reliable and secure operation is significant. A distributed agent-based framework is developed to investigate the interactions between physical processes and cyber activities, where attacks are considered as additive sensor fault signals and noises as randomly generated disturbance signals. An innovative physical-process-oriented counter-measure with an abnormal angle-state observer is designed for the detection and mitigation of integrity attacks. Furthermore, this model helps to identify whether observation errors are caused by attacks or by noise.
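
As a hedged illustration of observer-based attack/noise discrimination (with an invented scalar plant, not the paper's smart grid model), the sketch below shows how an additive sensor attack leaves a persistent bias in the observer residual, while random noise averages out.

```python
# Minimal sketch (invented scalar model): a Luenberger-style observer tracks
# a state; an additive sensor attack shows up as a persistently biased
# residual, whereas random noise produces a near-zero-mean residual.
import random

a, c, L = 0.95, 1.0, 0.1          # plant dynamics, output map, observer gain
x, x_hat = 1.0, 0.0               # true state and observer estimate
residuals = []
for k in range(200):
    x = a * x                                 # plant step
    noise = random.gauss(0, 0.01)
    attack = 0.3 if k >= 100 else 0.0         # additive sensor attack at k=100
    y = c * x + noise + attack                # measured output
    y_hat = c * x_hat
    x_hat = a * x_hat + L * (y - y_hat)       # observer correction
    residuals.append(y - y_hat)

def biased(window):
    """Persistent bias in the residual suggests attack rather than noise."""
    return abs(sum(window) / len(window)) > 0.05

print("pre-attack biased? ", biased(residuals[50:100]))   # expect False
print("post-attack biased?", biased(residuals[150:200]))  # expect True
```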

Relevance:

100.00%

Publisher:

Abstract:

Most current web-based application systems suffer from poor performance and costly heterogeneous access. Distributed or replicated strategies can alleviate the problem to some degree, but the distributed or replicated model still has problems of its own, such as data synchronization and load balancing. In this paper, we propose a novel architecture for Internet-based data processing systems based on multicast and anycast protocols. The proposed architecture breaks the functionalities of an existing data processing system, in particular the database functionality, into several agents. These agents communicate with each other using multicast and anycast mechanisms. We show that the proposed architecture provides better scalability, robustness, automatic load balance, and performance than the current distributed architecture for Internet-based data processing.
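
To make the communication substrate concrete, here is a minimal sketch of agents exchanging messages over IP multicast using Python's standard socket API; the group address and port are arbitrary examples, and anycast is omitted because it has no portable socket-level API.

```python
# Sketch of the multicast side of such an agent architecture: any agent that
# has joined the group receives every message published to it.
import socket
import struct

GROUP, PORT = "224.1.1.1", 5007   # arbitrary example group and port

def multicast_send(payload: bytes):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    s.sendto(payload, (GROUP, PORT))

def multicast_listen():
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("", PORT))
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    while True:
        data, addr = s.recvfrom(1024)
        print("agent received", data, "from", addr)
```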

Relevance:

100.00%

Publisher:

Abstract:

For most data stream applications, the volume of data is too large to be stored on permanent devices or to be thoroughly scanned more than once. It is hence recognized that approximate answers are usually sufficient, where a good approximation obtained in a timely manner is often better than an exact answer that is delayed beyond the window of opportunity. Unfortunately, this is not the case for mining frequent patterns over data streams, where algorithms capable of processing data streams online do not strictly conform to a precise error guarantee. Since the quality of approximate answers is as important as their timely delivery, it is necessary to design algorithms that meet both criteria at the same time. In this paper, we propose an algorithm that allows online processing of streaming data while strictly guaranteeing that the support error of frequent patterns stays within a user-specified threshold. Our theoretical and experimental studies show that our algorithm is an effective and reliable method for finding frequent sets in data stream environments when both constraints need to be satisfied.
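
The paper's own algorithm is not reproduced in the abstract. The classic point of reference for a strict, user-specified support-error guarantee is Manku and Motwani's Lossy Counting; a minimal sketch of its frequent-items version follows.

```python
# Not the paper's algorithm: a minimal sketch of Lossy Counting (Manku and
# Motwani), the classic one-pass method whose counts undershoot the true
# frequencies by at most epsilon * N -- a strict, user-specified error bound
# of the kind the abstract argues for.
def lossy_counting(stream, epsilon=0.01, support=0.05):
    width = int(1 / epsilon)                 # bucket width
    counts, deltas, n = {}, {}, 0
    for item in stream:
        n += 1
        bucket = (n - 1) // width + 1        # current bucket id, ceil(n/width)
        if item in counts:
            counts[item] += 1
        else:
            counts[item] = 1
            deltas[item] = bucket - 1        # max undercount for a new entry
        if n % width == 0:                   # prune at bucket boundaries
            for it in list(counts):
                if counts[it] + deltas[it] <= bucket:
                    del counts[it], deltas[it]
    # report items whose counted frequency clears (support - epsilon) * N
    return {it: c for it, c in counts.items() if c >= (support - epsilon) * n}
```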

Relevance:

100.00%

Publisher:

Abstract:

Cluster analysis has played a key role in data stream understanding. The problem is difficult when the clustering task is considered in a sliding window model, in which the requirement of eliminating outdated data must be dealt with properly. We propose the SWEM algorithm, designed on the basis of the Expectation Maximization technique, to address these challenges. SWEM is equipped with the capability to compute clusters incrementally, using a small number of statistics summarized over the stream, and to adapt to changes in the stream's distribution. The feasibility of SWEM has been verified via a number of experiments, and we show that it is superior to the CluStream algorithm on both synthetic and real datasets.
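
SWEM's exact summarized statistics are not given in the abstract. The sketch below illustrates the general mechanism with a 1-D Gaussian mixture: each time slot contributes a small set of sufficient statistics, the window's slots are merged for the M-step, and expired slots can be dropped exactly; all details are simplifications.

```python
# Hedged sketch of the sliding-window idea: per-slot sufficient statistics
# (responsibility sums, weighted sums, weighted squared sums) are kept, so
# the window can be refit by merging live slots and discarding expired ones.
import math

def e_step_stats(data, means, stds, weights):
    """Summarise one slot: per-component [sum_r, sum_rx, sum_rx2]."""
    k = len(means)
    stats = [[0.0, 0.0, 0.0] for _ in range(k)]
    for x in data:
        # the common 1/sqrt(2*pi) factor cancels in the normalisation
        dens = [w * math.exp(-(x - m) ** 2 / (2 * s * s)) / s
                for m, s, w in zip(means, stds, weights)]
        z = sum(dens)
        for j in range(k):
            r = dens[j] / z
            stats[j][0] += r
            stats[j][1] += r * x
            stats[j][2] += r * x * x
    return stats

def m_step(window_stats):
    """Refit parameters from the merged statistics of all live slots."""
    k = len(window_stats[0])
    merged = [[sum(s[j][i] for s in window_stats) for i in range(3)]
              for j in range(k)]
    total = sum(m[0] for m in merged)
    means = [m[1] / m[0] for m in merged]
    stds = [math.sqrt(max(m[2] / m[0] - (m[1] / m[0]) ** 2, 1e-6))
            for m in merged]
    weights = [m[0] / total for m in merged]
    return means, stds, weights
```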

Relevance:

100.00%

Publisher:

Abstract:

The need to estimate a particular quantile of a distribution is an important problem that frequently arises in many computer vision and signal processing applications. For example, our work was motivated by the requirements of many semi-automatic surveillance analytics systems which detect abnormalities in closed-circuit television (CCTV) footage using statistical models of low-level motion features. In this paper we specifically address the problem of estimating the running quantile of a data stream with non-stationary stochasticity when the memory for storing observations is limited. We make several major contributions: (i) we derive an important theoretical result which shows that the change in the quantile of a stream is constrained regardless of the stochastic properties of the data, (ii) we describe a set of high-level design goals for an effective estimation algorithm that emerge as a consequence of our theoretical findings, (iii) we introduce a novel algorithm which implements these design goals by retaining a sample of data values in a manner adaptive to changes in the distribution of the data and by progressively narrowing its focus during periods of quasi-stationary stochasticity, and (iv) we present a comprehensive evaluation of the proposed algorithm, comparing it with existing methods from the literature on both synthetic data sets and three large 'real-world' streams acquired in the course of operation of an existing commercial surveillance system. Our findings convincingly demonstrate that the proposed method is highly successful and vastly outperforms the existing alternatives, especially when the target quantile is high and the available buffer capacity is severely limited.
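
The paper's adaptive algorithm is not reproduced in the abstract. For contrast, here is the kind of crude fixed-memory baseline such a method competes with: keep only the most recent values and read the quantile off a sorted buffer.

```python
# Not the paper's algorithm: a fixed-memory sliding-window baseline for
# running quantile estimation. It adapts to non-stationarity only by
# forgetting old values, which is exactly what adaptive retention improves on.
from collections import deque
import bisect

class WindowQuantile:
    def __init__(self, capacity, q):
        self.window = deque()          # arrival order, for eviction
        self.sorted = []               # same values, kept sorted
        self.capacity, self.q = capacity, q

    def update(self, x):
        self.window.append(x)
        bisect.insort(self.sorted, x)
        if len(self.window) > self.capacity:
            old = self.window.popleft()
            self.sorted.pop(bisect.bisect_left(self.sorted, old))

    def estimate(self):
        return self.sorted[int(self.q * (len(self.sorted) - 1))]
```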