904 resultados para bigdata, data stream processing, dsp, apache storm, cyber security


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Video surveillance systems using Closed Circuit Television (CCTV) cameras, is one of the fastest growing areas in the field of security technologies. However, the existing video surveillance systems are still not at a stage where they can be used for crime prevention. The systems rely heavily on human observers and are therefore limited by factors such as fatigue and monitoring capabilities over long periods of time. This work attempts to address these problems by proposing an automatic suspicious behaviour detection which utilises contextual information. The utilisation of contextual information is done via three main components: a context space model, a data stream clustering algorithm, and an inference algorithm. The utilisation of contextual information is still limited in the domain of suspicious behaviour detection. Furthermore, it is nearly impossible to correctly understand human behaviour without considering the context where it is observed. This work presents experiments using video feeds taken from CAVIAR dataset and a camera mounted on one of the buildings Z-Block) at the Queensland University of Technology, Australia. From these experiments, it is shown that by exploiting contextual information, the proposed system is able to make more accurate detections, especially of those behaviours which are only suspicious in some contexts while being normal in the others. Moreover, this information gives critical feedback to the system designers to refine the system.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Road networks are a national critical infrastructure. The road assets need to be monitored and maintained efficiently as their conditions deteriorate over time. The condition of one of such assets, road pavement, plays a major role in the road network maintenance programmes. Pavement conditions depend upon many factors such as pavement types, traffic and environmental conditions. This paper presents a data analytics case study for assessing the factors affecting the pavement deflection values measured by the traffic speed deflectometer (TSD) device. The analytics process includes acquisition and integration of data from multiple sources, data pre-processing, mining useful information from them and utilising data mining outputs for knowledge deployment. Data mining techniques are able to show how TSD outputs vary in different roads, traffic and environmental conditions. The generated data mining models map the TSD outputs to some classes and define correction factors for each class.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a single pass algorithm for mining discriminative Itemsets in data streams using a novel data structure and the tilted-time window model. Discriminative Itemsets are defined as Itemsets that are frequent in one data stream and their frequency in that stream is much higher than the rest of the streams in the dataset. In order to deal with the data structure size, we propose a pruning process that results in the compact tree structure containing discriminative Itemsets. Empirical analysis shows the sound time and space complexity of the proposed method.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Stochastic modelling is critical in GNSS data processing. Currently, GNSS data processing commonly relies on the empirical stochastic model which may not reflect the actual data quality or noise characteristics. This paper examines the real-time GNSS observation noise estimation methods enabling to determine the observation variance from single receiver data stream. The methods involve three steps: forming linear combination, handling the ionosphere and ambiguity bias and variance estimation. Two distinguished ways are applied to overcome the ionosphere and ambiguity biases, known as the time differenced method and polynomial prediction method respectively. The real time variance estimation methods are compared with the zero-baseline and short-baseline methods. The proposed method only requires single receiver observation, thus applicable to both differenced and un-differenced data processing modes. However, the methods may be subject to the normal ionosphere conditions and low autocorrelation GNSS receivers. Experimental results also indicate the proposed method can result on more realistic parameter precision.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Increasingly larger scale applications are generating an unprecedented amount of data. However, the increasing gap between computation and I/O capacity on High End Computing machines makes a severe bottleneck for data analysis. Instead of moving data from its source to the output storage, in-situ analytics processes output data while simulations are running. However, in-situ data analysis incurs much more computing resource contentions with simulations. Such contentions severely damage the performance of simulation on HPE. Since different data processing strategies have different impact on performance and cost, there is a consequent need for flexibility in the location of data analytics. In this paper, we explore and analyze several potential data-analytics placement strategies along the I/O path. To find out the best strategy to reduce data movement in given situation, we propose a flexible data analytics (FlexAnalytics) framework in this paper. Based on this framework, a FlexAnalytics prototype system is developed for analytics placement. FlexAnalytics system enhances the scalability and flexibility of current I/O stack on HEC platforms and is useful for data pre-processing, runtime data analysis and visualization, as well as for large-scale data transfer. Two use cases – scientific data compression and remote visualization – have been applied in the study to verify the performance of FlexAnalytics. Experimental results demonstrate that FlexAnalytics framework increases data transition bandwidth and improves the application end-to-end transfer performance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Segmentation is a data mining technique yielding simplified representations of sequences of ordered points. A sequence is divided into some number of homogeneous blocks, and all points within a segment are described by a single value. The focus in this thesis is on piecewise-constant segments, where the most likely description for each segment and the most likely segmentation into some number of blocks can be computed efficiently. Representing sequences as segmentations is useful in, e.g., storage and indexing tasks in sequence databases, and segmentation can be used as a tool in learning about the structure of a given sequence. The discussion in this thesis begins with basic questions related to segmentation analysis, such as choosing the number of segments, and evaluating the obtained segmentations. Standard model selection techniques are shown to perform well for the sequence segmentation task. Segmentation evaluation is proposed with respect to a known segmentation structure. Applying segmentation on certain features of a sequence is shown to yield segmentations that are significantly close to the known underlying structure. Two extensions to the basic segmentation framework are introduced: unimodal segmentation and basis segmentation. The former is concerned with segmentations where the segment descriptions first increase and then decrease, and the latter with the interplay between different dimensions and segments in the sequence. These problems are formally defined and algorithms for solving them are provided and analyzed. Practical applications for segmentation techniques include time series and data stream analysis, text analysis, and biological sequence analysis. In this thesis segmentation applications are demonstrated in analyzing genomic sequences.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Emerging embedded applications are based on evolving standards (e.g., MPEG2/4, H.264/265, IEEE802.11a/b/g/n). Since most of these applications run on handheld devices, there is an increasing need for a single chip solution that can dynamically interoperate between different standards and their derivatives. In order to achieve high resource utilization and low power dissipation, we propose REDEFINE, a polymorphic ASIC in which specialized hardware units are replaced with basic hardware units that can create the same functionality by runtime re-composition. It is a ``future-proof'' custom hardware solution for multiple applications and their derivatives in a domain. In this article, we describe a compiler framework and supporting hardware comprising compute, storage, and communication resources. Applications described in high-level language (e.g., C) are compiled into application substructures. For each application substructure, a set of compute elements on the hardware are interconnected during runtime to form a pattern that closely matches the communication pattern of that particular application. The advantage is that the bounded CEs are neither processor cores nor logic elements as in FPGAs. Hence, REDEFINE offers the power and performance advantage of an ASIC and the hardware reconfigurability and programmability of that of an FPGA/instruction set processor. In addition, the hardware supports custom instruction pipelining. Existing instruction-set extensible processors determine a sequence of instructions that repeatedly occur within the application to create custom instructions at design time to speed up the execution of this sequence. We extend this scheme further, where a kernel is compiled into custom instructions that bear strong producer-consumer relationship (and not limited to frequently occurring sequences of instructions). Custom instructions, realized as hardware compositions effected at runtime, allow several instances of the same to be active in parallel. A key distinguishing factor in majority of the emerging embedded applications is stream processing. To reduce the overheads of data transfer between custom instructions, direct communication paths are employed among custom instructions. In this article, we present the overview of the hardware-aware compiler framework, which determines the NoC-aware schedule of transports of the data exchanged between the custom instructions on the interconnect. The results for the FFT kernel indicate a 25% reduction in the number of loads/stores, and throughput improves by log(n) for n-point FFT when compared to sequential implementation. Overall, REDEFINE offers flexibility and a runtime reconfigurability at the expense of 1.16x in power and 8x in area when compared to an ASIC. REDEFINE implementation consumes 0.1x the power of an FPGA implementation. In addition, the configuration overhead of the FPGA implementation is 1,000x more than that of REDEFINE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Scalable stream processing and continuous dataflow systems are gaining traction with the rise of big data due to the need for processing high velocity data in near real time. Unlike batch processing systems such as MapReduce and workflows, static scheduling strategies fall short for continuous dataflows due to the variations in the input data rates and the need for sustained throughput. The elastic resource provisioning of cloud infrastructure is valuable to meet the changing resource needs of such continuous applications. However, multi-tenant cloud resources introduce yet another dimension of performance variability that impacts the application's throughput. In this paper we propose PLAStiCC, an adaptive scheduling algorithm that balances resource cost and application throughput using a prediction-based lookahead approach. It not only addresses variations in the input data rates but also the underlying cloud infrastructure. In addition, we also propose several simpler static scheduling heuristics that operate in the absence of accurate performance prediction model. These static and adaptive heuristics are evaluated through extensive simulations using performance traces obtained from Amazon AWS IaaS public cloud. Our results show an improvement of up to 20% in the overall profit as compared to the reactive adaptation algorithm.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

For a multilayered specimen, the back-scattered signal in frequency-domain optical-coherence tomography (FDOCT) is expressible as a sum of cosines, each corresponding to a change of refractive index in the specimen. Each of the cosines represent a peak in the reconstructed tomogram. We consider a truncated cosine series representation of the signal, with the constraint that the coefficients in the basis expansion be sparse. An l(2) (sum of squared errors) data error is considered with an l(1) (summation of absolute values) constraint on the coefficients. The optimization problem is solved using Weiszfeld's iteratively reweighted least squares (IRLS) algorithm. On real FDOCT data, improved results are obtained over the standard reconstruction technique with lower levels of background measurement noise and artifacts due to a strong l(1) penalty. The previous sparse tomogram reconstruction techniques in the literature proposed collecting sparse samples, necessitating a change in the data capturing process conventionally used in FDOCT. The IRLS-based method proposed in this paper does not suffer from this drawback.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In big data image/video analytics, we encounter the problem of learning an over-complete dictionary for sparse representation from a large training dataset, which cannot be processed at once because of storage and computational constraints. To tackle the problem of dictionary learning in such scenarios, we propose an algorithm that exploits the inherent clustered structure of the training data and make use of a divide-and-conquer approach. The fundamental idea behind the algorithm is to partition the training dataset into smaller clusters, and learn local dictionaries for each cluster. Subsequently, the local dictionaries are merged to form a global dictionary. Merging is done by solving another dictionary learning problem on the atoms of the locally trained dictionaries. This algorithm is referred to as the split-and-merge algorithm. We show that the proposed algorithm is efficient in its usage of memory and computational complexity, and performs on par with the standard learning strategy, which operates on the entire data at a time. As an application, we consider the problem of image denoising. We present a comparative analysis of our algorithm with the standard learning techniques that use the entire database at a time, in terms of training and denoising performance. We observe that the split-and-merge algorithm results in a remarkable reduction of training time, without significantly affecting the denoising performance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Local polynomial approximation of data is an approach towards signal denoising. Savitzky-Golay (SG) filters are finite-impulse-response kernels, which convolve with the data to result in polynomial approximation for a chosen set of filter parameters. In the case of noise following Gaussian statistics, minimization of mean-squared error (MSE) between noisy signal and its polynomial approximation is optimum in the maximum-likelihood (ML) sense but the MSE criterion is not optimal for non-Gaussian noise conditions. In this paper, we robustify the SG filter for applications involving noise following a heavy-tailed distribution. The optimal filtering criterion is achieved by l(1) norm minimization of error through iteratively reweighted least-squares (IRLS) technique. It is interesting to note that at any stage of the iteration, we solve a weighted SG filter by minimizing l(2) norm but the process converges to l(1) minimized output. The results show consistent improvement over the standard SG filter performance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes the design, application, and evaluation of a user friendly, flexible, scalable and inexpensive Advanced Educational Parallel (AdEPar) digital signal processing (DSP) system based on TMS320C25 digital processors to implement DSP algorithms. This system will be used in the DSP laboratory by graduate students to work on advanced topics such as developing parallel DSP algorithms. The graduating senior students who have gained some experience in DSP can also use the system. The DSP laboratory has proved to be a useful tool in the hands of the instructor to teach the mathematically oriented topics of DSP that are often difficult for students to grasp. The DSP laboratory with assigned projects has greatly improved the ability of the students to understand such complex topics as the fast Fourier transform algorithm, linear and circular convolution, the theory and design of infinite impulse response (IIR) and finite impulse response (FIR) filters. The user friendly PC software support of the AdEPar system makes it easy to develop DSP programs for students. This paper gives the architecture of the AdEPar DSP system. The communication between processors and the PC-DSP processor communication are explained. The parallel debugger kernels and the restrictions of the system are described. The programming in the AdEPar is explained, and two benchmarks (parallel FFT and DES) are presented to show the system performance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Advances in silicon technology have been a key development in the realisation of many telecommunication and signal processing systems. In many cases, the development of application-specific digital signal processing (DSP) chips is the most cost-effective solution and provides the highest performance. Advances made in computer-aided design (CAD) tools and design methodologies now allow designers to develop complex chips within months or even weeks. This paper gives an insight into the challenges and design methodologies of implementing advanced highperformance chips for DSP. In particular, the paper reviews some of the techniques used to develop circuit architectures from high-level descriptions and the tools which are then used to realise silicon layout.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we present a methodology for implementing a complete Digital Signal Processing (DSP) system onto a heterogeneous network including Field Programmable Gate Arrays (FPGAs) automatically. The methodology aims to allow design refinement and real time verification at the system level. The DSP application is constructed in the form of a Data Flow Graph (DFG) which provides an entry point to the methodology. The netlist for parts that are mapped onto the FPGA(s) together with the corresponding software and hardware Application Protocol Interface (API) are also generated. Using a set of case studies, we demonstrate that the design and development time can be significantly reduced using the methodology developed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Cyber-attacks against Smart Grids have been found in the real world. Malware such as Havex and BlackEnergy have been found targeting industrial control systems (ICS) and researchers have shown that cyber-attacks can exploit vulnerabilities in widely used Smart Grid communication standards. This paper addresses a deep investigation of attacks against the manufacturing message specification of IEC 61850, which is expected to become one of the most widely used communication services in Smart Grids. We investigate how an attacker can build a custom tool to execute man-in-the-middle attacks, manipulate data, and affect the physical system. Attack capabilities are demonstrated based on NESCOR scenarios to make it possible to thoroughly test these scenarios in a real system. The goal is to help understand the potential for such attacks, and to aid the development and testing of cyber security solutions. An attack use-case is presented that focuses on the standard for power utility automation, IEC 61850 in the context of inverter-based distributed energy resource devices; especially photovoltaic (PV) generators.