75 results for bigdata, data stream processing, dsp, apache storm, cyber security

at Indian Institute of Science - Bangalore - India


Emerging embedded applications are based on evolving standards (e.g., MPEG2/4, H.264/265, IEEE802.11a/b/g/n). Since most of these applications run on handheld devices, there is an increasing need for a single-chip solution that can dynamically interoperate between different standards and their derivatives. To achieve high resource utilization and low power dissipation, we propose REDEFINE, a polymorphic ASIC in which specialized hardware units are replaced with basic hardware units that can create the same functionality through runtime re-composition. It is a "future-proof" custom hardware solution for multiple applications and their derivatives in a domain. In this article, we describe a compiler framework and supporting hardware comprising compute, storage, and communication resources. Applications described in a high-level language (e.g., C) are compiled into application substructures. For each application substructure, a set of compute elements (CEs) on the hardware is interconnected at runtime to form a pattern that closely matches the communication pattern of that particular application. The advantage is that the interconnected CEs are neither processor cores nor logic elements as in FPGAs. Hence, REDEFINE offers the power and performance advantages of an ASIC together with the hardware reconfigurability and programmability of an FPGA or instruction-set processor. In addition, the hardware supports custom instruction pipelining. Existing instruction-set extensible processors identify, at design time, sequences of instructions that occur repeatedly within the application and turn them into custom instructions to speed up the execution of these sequences. We extend this scheme further: a kernel is compiled into custom instructions that bear a strong producer-consumer relationship (rather than being limited to frequently occurring sequences of instructions).
Custom instructions, realized as hardware compositions effected at runtime, allow several instances of the same custom instruction to be active in parallel. A key distinguishing factor in the majority of emerging embedded applications is stream processing. To reduce the overheads of data transfer between custom instructions, direct communication paths are employed among them. In this article, we present an overview of the hardware-aware compiler framework, which determines the NoC-aware schedule of transports of the data exchanged between the custom instructions on the interconnect. The results for the FFT kernel indicate a 25% reduction in the number of loads/stores, and throughput improves by a factor of log(n) for an n-point FFT when compared to a sequential implementation. Overall, REDEFINE offers flexibility and runtime reconfigurability at the expense of 1.16x the power and 8x the area of an ASIC. The REDEFINE implementation consumes 0.1x the power of an FPGA implementation. In addition, the configuration overhead of the FPGA implementation is 1,000x that of REDEFINE.
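
The log(n) throughput figure is consistent with the structure of a radix-2 FFT: an n-point transform is a chain of log2(n) butterfly stages, each consuming the outputs of the previous one. Assuming, purely for illustration, that each stage were mapped to its own runtime composition of CEs with direct paths between stages, up to log2(n) transforms could be in flight at once. The Python sketch below only makes that stage structure explicit; it is an ordinary software FFT for power-of-two lengths, not REDEFINE's compiler output, and `fft_by_stages` is a hypothetical name.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_by_stages(x):
    """Iterative radix-2 decimation-in-time FFT.

    Returns (spectrum, stages). Each pass of the outer loop is one butterfly
    stage; an n-point transform has log2(n) of them.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    a = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    stages = 0
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = a[start + k]
                t = w * a[start + k + half]
                # butterfly: consumes two values produced by the previous stage
                a[start + k] = u + t
                a[start + k + half] = u - t
                w *= w_step
        stages += 1
        size *= 2
    return a, stages
```

For an 8-point input the function reports 3 stages, and for 16 points it reports 4.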


Scalable stream processing and continuous dataflow systems are gaining traction with the rise of big data, driven by the need to process high-velocity data in near real time. Static scheduling strategies, which suit batch processing systems such as MapReduce and workflows, fall short for continuous dataflows because of variations in the input data rates and the need for sustained throughput. The elastic resource provisioning of cloud infrastructure is valuable for meeting the changing resource needs of such continuous applications. However, multi-tenant cloud resources introduce yet another dimension of performance variability that impacts the application's throughput. In this paper, we propose PLAStiCC, an adaptive scheduling algorithm that balances resource cost and application throughput using a prediction-based lookahead approach. It addresses variations not only in the input data rates but also in the underlying cloud infrastructure. We also propose several simpler static scheduling heuristics that operate in the absence of an accurate performance prediction model. These static and adaptive heuristics are evaluated through extensive simulations using performance traces obtained from the Amazon AWS IaaS public cloud. Our results show an improvement of up to 20% in overall profit compared to the reactive adaptation algorithm.
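
PLAStiCC's internals are not given here, so the following is only a generic illustration of prediction-based lookahead under an assumed profit model. Given predicted input rates and predicted per-VM capacity (which varies with multi-tenant interference) over a short horizon, a small dynamic program picks a VM count for each interval that maximizes predicted revenue minus VM cost, charging a hypothetical startup cost whenever VMs are added. The function name, parameters, and cost model are all assumptions.

```python
from functools import lru_cache


def plan_vms(pred_rates, pred_caps, value_per_msg, vm_cost, startup_cost,
             max_vms, start_vms=0):
    """Pick a VM count for every interval of a lookahead horizon (hypothetical model).

    pred_rates[t]: predicted input messages in interval t
    pred_caps[t]:  predicted messages one VM can process in interval t
    Returns (predicted_profit, plan) where plan[t] is the VM count for interval t.
    """
    horizon = len(pred_rates)

    def interval_profit(t, n):
        processed = min(pred_rates[t], n * pred_caps[t])
        return value_per_msg * processed - vm_cost * n

    @lru_cache(maxsize=None)
    def best(t, prev):
        # best achievable profit from interval t onward, given prev VMs running
        if t == horizon:
            return 0.0, ()
        choices = []
        for n in range(max_vms + 1):
            gain = interval_profit(t, n) - startup_cost * max(0, n - prev)
            future, plan = best(t + 1, n)
            choices.append((gain + future, (n,) + plan))
        return max(choices)

    return best(0, start_vms)
```

With predicted rates of 100, 0, and 100 messages, a capacity of 100 per VM, a VM cost of 10, and a startup cost of 80, the lookahead plan keeps one VM through the idle interval (profit 90), whereas a policy that releases it and restarts it later would earn only 20.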


For a multilayered specimen, the back-scattered signal in frequency-domain optical-coherence tomography (FDOCT) is expressible as a sum of cosines, each corresponding to a change of refractive index in the specimen. Each of the cosines represents a peak in the reconstructed tomogram. We consider a truncated cosine series representation of the signal, with the constraint that the coefficients in the basis expansion be sparse. An ℓ2 (sum of squared errors) data error is combined with an ℓ1 (sum of absolute values) constraint on the coefficients. The optimization problem is solved using Weiszfeld's iteratively reweighted least squares (IRLS) algorithm. On real FDOCT data, the method improves on the standard reconstruction technique, giving lower levels of background measurement noise and fewer artifacts owing to the strong ℓ1 penalty. Previous sparse tomogram reconstruction techniques in the literature proposed collecting sparse samples, which necessitates a change in the data-capture process conventionally used in FDOCT. The IRLS-based method proposed in this paper does not suffer from this drawback.
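
A minimal NumPy sketch of the idea follows. It assumes the penalized form 0.5*||y - A c||^2 + lam*||c||_1 (a common stand-in for the constrained problem described above) and a DCT-style cosine dictionary chosen for illustration. Each IRLS step replaces |c_i| by the quadratic majorizer c_i^2 / (2|c_i_old|), so every iteration reduces to a weighted ridge solve. This is a generic IRLS scheme and not necessarily the exact Weiszfeld variant used in the paper; `cosine_dictionary` and `irls_l1` are hypothetical names.

```python
import numpy as np


def cosine_dictionary(n_samples, n_atoms):
    """Columns are sampled cosines cos(pi * k * (t + 0.5) / n_samples), k = 0..n_atoms-1."""
    t = np.arange(n_samples)[:, None]
    k = np.arange(n_atoms)[None, :]
    return np.cos(np.pi * k * (t + 0.5) / n_samples)


def irls_l1(A, y, lam, n_iter=50, eps=1e-8):
    """Minimize 0.5 * ||y - A c||^2 + lam * ||c||_1 by iteratively reweighted least squares."""
    AtA = A.T @ A
    Aty = A.T @ y
    c = np.linalg.lstsq(A, y, rcond=None)[0]      # start from the least-squares solution
    for _ in range(n_iter):
        # |c_i| <= c_i^2 / (2|c_i_old|) + |c_i_old| / 2, so each step is a weighted ridge solve
        weights = 1.0 / (np.abs(c) + eps)
        c = np.linalg.solve(AtA + lam * np.diag(weights), Aty)
    return c
```

Larger values of `lam` drive more coefficients to zero. With an orthogonal dictionary, the fixed point is simply the least-squares coefficients shrunk toward zero, which makes a convenient sanity check.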


In big data image/video analytics, we encounter the problem of learning an over-complete dictionary for sparse representation from a large training dataset that cannot be processed at once because of storage and computational constraints. To tackle dictionary learning in such scenarios, we propose an algorithm that exploits the inherent clustered structure of the training data and makes use of a divide-and-conquer approach. The fundamental idea behind the algorithm is to partition the training dataset into smaller clusters and learn a local dictionary for each cluster. The local dictionaries are then merged to form a global dictionary; merging is done by solving another dictionary learning problem on the atoms of the locally trained dictionaries. We refer to this as the split-and-merge algorithm. We show that the proposed algorithm is efficient in terms of memory usage and computational complexity, and performs on par with the standard learning strategy, which operates on the entire dataset at once. As an application, we consider image denoising. We present a comparative analysis of our algorithm against standard learning techniques that use the entire dataset at once, in terms of both training and denoising performance. We observe that the split-and-merge algorithm substantially reduces training time without significantly affecting denoising performance.
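
The split and merge steps can be sketched in NumPy as follows. As stand-ins (assumptions, not the paper's choices), the split uses plain k-means, and every dictionary-learning step, local or merge, uses K-SVD restricted to 1-sparse codes, where each atom update is the leading singular vector of the signals assigned to it. All function names are hypothetical.

```python
import numpy as np


def kmeans_labels(X, k, rng, n_iter=20):
    """Plain Lloyd k-means on the columns of X (dim x n); returns one label per column."""
    centers = X[:, rng.choice(X.shape[1], k, replace=False)].astype(float)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[:, :, None]) ** 2).sum(axis=0)  # (k, n)
        labels = np.argmin(dists, axis=0)
        for c in range(k):
            if np.any(labels == c):               # leave an empty cluster's centre unchanged
                centers[:, c] = X[:, labels == c].mean(axis=1)
    return labels


def learn_dictionary_1sparse(X, n_atoms, n_iter=20, seed=0):
    """Dictionary learning with 1-sparse codes (K-SVD with sparsity level 1)."""
    rng = np.random.default_rng(seed)
    D = X[:, rng.choice(X.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    for _ in range(n_iter):
        idx = np.argmax(np.abs(D.T @ X), axis=0)  # sparse coding: best single atom per signal
        for k in range(n_atoms):
            members = X[:, idx == k]
            if members.shape[1] == 0:
                continue
            # atom update: leading left singular vector = best rank-1 fit of its signals
            u, _, _ = np.linalg.svd(members, full_matrices=False)
            D[:, k] = u[:, 0]
    return D


def split_and_merge(X, n_clusters, atoms_per_cluster, n_global_atoms, seed=0):
    rng = np.random.default_rng(seed)
    labels = kmeans_labels(X, n_clusters, rng)    # split: partition the training set
    local = []
    for c in range(n_clusters):
        members = X[:, labels == c]
        if members.shape[1] >= atoms_per_cluster:  # skip clusters too small to train on
            local.append(learn_dictionary_1sparse(members, atoms_per_cluster, seed=seed + c + 1))
    atoms = np.hstack(local)                      # pool all locally trained atoms
    # merge: learn the global dictionary using the local atoms as training data
    return learn_dictionary_1sparse(atoms, n_global_atoms, seed=seed)
```

Only one cluster's data and one local dictionary are needed at a time during the split phase, which is where the memory savings described above would come from.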


Local polynomial approximation is one approach to signal denoising. Savitzky-Golay (SG) filters are finite-impulse-response kernels that, when convolved with the data, yield a local polynomial approximation for a chosen set of filter parameters. When the noise is Gaussian, minimizing the mean-squared error (MSE) between the noisy signal and its polynomial approximation is optimal in the maximum-likelihood (ML) sense, but the MSE criterion is not optimal for non-Gaussian noise. In this paper, we robustify the SG filter for applications in which the noise follows a heavy-tailed distribution. The robust filtering criterion, ℓ1-norm minimization of the error, is achieved using the iteratively reweighted least-squares (IRLS) technique. Notably, at each stage of the iteration we solve a weighted SG problem by minimizing an ℓ2 norm, yet the process converges to the ℓ1-minimizing output. The results show consistent improvement over the standard SG filter.
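
The contrast between the two criteria can be sketched in NumPy. For each sample, a polynomial is fitted over a window centred on it and evaluated at the centre. With uniform weights and a single pass this is the ordinary least-squares (SG) fit. With `robust=True`, the weights are repeatedly set to 1/|residual|, the standard IRLS route to an ℓ1 fit. Windows are simply truncated at the signal edges, and the function name and parameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def local_poly_filter(x, half_width, order, robust=False, n_iter=50, eps=1e-8):
    """Local polynomial smoothing: least squares (SG-style) or l1 via IRLS."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    out = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        u = np.arange(lo, hi) - i                     # local coordinate, 0 at the window centre
        V = np.vander(u, order + 1, increasing=True)  # columns 1, u, u^2, ...
        y = x[lo:hi]
        w = np.ones(len(y))
        for _ in range(n_iter if robust else 1):
            sw = np.sqrt(w)
            # weighted least-squares (l2) fit at this iteration
            coef = np.linalg.lstsq(sw[:, None] * V, sw * y, rcond=None)[0]
            # IRLS reweighting for l1: large residuals get small weights
            w = 1.0 / np.maximum(np.abs(y - V @ coef), eps)
        out[i] = coef[0]                              # fitted polynomial evaluated at u = 0
    return out
```

On a straight line corrupted by one large spike, the least-squares fit at the spike is pulled far off the line, while the IRLS (ℓ1) fit stays close to it. This mirrors the statement above that every iteration is an ℓ2 solve even though the limit is the ℓ1 solution.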


Orthogonal Frequency Division Multiplexing (OFDM) is a form of multi-carrier modulation in which the data stream is transmitted over a number of mutually orthogonal carriers; that is, the carrier spacing is selected such that each carrier is located at the zeros of all other carriers in the spectral domain. This paper proposes a novel iterative frequency offset estimation algorithm for OFDM systems that enables the receiver to recover the OFDM data symbols error-free over a noisy channel and achieves frequency synchronization between the transmitter and the receiver. The performance of the algorithm has been studied successfully in AWGN, ADSL, and SUI channels.
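
The iterative algorithm itself is not described in this abstract, so the sketch below instead uses the classical cyclic-prefix (CP) correlation estimator as a swapped-in baseline. A carrier frequency offset of eps subcarrier spacings rotates each sample by exp(2j*pi*eps*n/N). Because the CP repeats the last samples of the symbol, correlating it with its copy N samples later recovers the phase 2*pi*eps, which is unambiguous only for |eps| < 0.5. The symbol generator (QPSK on every subcarrier) and all names are assumptions for illustration.

```python
import numpy as np


def ofdm_symbol(n_fft, n_cp, rng):
    """One OFDM symbol with QPSK on every subcarrier and a cyclic prefix prepended."""
    bits = rng.integers(0, 2, size=(n_fft, 2))
    qpsk = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)
    time = np.fft.ifft(qpsk)                  # subcarriers are orthogonal over n_fft samples
    return np.concatenate([time[-n_cp:], time])


def apply_cfo(signal, eps, n_fft):
    """Rotate the signal by a carrier frequency offset of eps subcarrier spacings."""
    n = np.arange(len(signal))
    return signal * np.exp(2j * np.pi * eps * n / n_fft)


def estimate_cfo_cp(rx, n_fft, n_cp):
    """CP-correlation estimate of the offset (valid for |eps| < 0.5)."""
    corr = np.sum(np.conj(rx[:n_cp]) * rx[n_fft:n_fft + n_cp])
    return np.angle(corr) / (2 * np.pi)
```

In a noise-free channel the estimate equals the applied offset. Once that offset is known, the receiver can de-rotate the samples before the FFT so the subcarriers are orthogonal again.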


Orthogonal Frequency Division Multiplexing (OFDM) is a form of multi-carrier modulation in which the data stream is transmitted over a number of mutually orthogonal carriers; that is, the carrier spacing is selected such that each carrier is located at the zeros of all other carriers in the spectral domain. This paper proposes a novel sampling offset estimation algorithm for OFDM systems that enables the receiver to recover the OFDM data symbols error-free over a noisy channel and achieves fine timing synchronization between the transmitter and the receiver. The performance of the algorithm has been studied successfully in AWGN, ADSL, and SUI channels.
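
The proposed estimator is not spelled out here. The underlying effect can still be sketched, using a generic phase-slope estimator rather than the paper's algorithm: a timing offset of d samples that stays within the cyclic prefix multiplies subcarrier k by exp(-2j*pi*k*d/N) after the FFT. Averaging the phase step between adjacent subcarriers of known (pilot) symbols therefore recovers d, unambiguously for |d| < N/2. The pilot pattern and function name are assumptions.

```python
import numpy as np


def phase_slope_timing_offset(rx_freq, tx_freq, n_fft):
    """Estimate a timing offset (in samples) from the phase ramp across subcarriers."""
    z = rx_freq / tx_freq                     # remove the known pilot symbols
    # average phase step between adjacent subcarriers equals -2*pi*d/n_fft
    step = np.angle(np.sum(np.conj(z[:-1]) * z[1:]))
    return -step * n_fft / (2 * np.pi)
```

In the check below, a circular shift of the time-domain symbol (`np.roll`) stands in for a delay absorbed by the cyclic prefix. Fractional offsets produce the same kind of linear ramp.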


We address the problem of separating a speech signal into its excitation and vocal-tract filter components, which falls within the framework of blind deconvolution. Typically, the excitation in the case of voiced speech is assumed to be sparse and the vocal-tract filter stable. We develop an alternating ℓp-ℓ2 projections algorithm (ALPA) that performs the deconvolution while taking these constraints into account. The algorithm is iterative and alternates between two solution spaces. The initialization is based on the standard linear prediction decomposition of a speech signal into an autoregressive filter and a prediction residue. In every iteration, a sparse excitation is estimated by optimizing an ℓp-norm-based cost, and the vocal-tract filter is derived as the solution to a standard least-squares minimization problem. We validate the algorithm on voiced segments of natural speech signals and show applications to epoch estimation. We also present comparisons with state-of-the-art techniques and show that ALPA gives a sparser, impulse-like excitation in which the impulses directly denote the epochs, or instants of significant excitation.
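
The initialization step mentioned above, linear prediction (LP) analysis, can be sketched as follows. The sketch uses the autocorrelation method: it solves the Toeplitz (Yule-Walker) equations for the predictor and returns the prediction residue. This covers only ALPA's starting point, not the alternating ℓp-ℓ2 iterations. The function name and the choice of the autocorrelation method (rather than, say, the covariance method) are assumptions.

```python
import numpy as np


def lp_analysis(x, order):
    """Autocorrelation-method LP: x[n] ~ sum_i a[i] * x[n-1-i]; returns (a, residual)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])  # Toeplitz
    a = np.linalg.solve(R, r[1:])             # Yule-Walker normal equations
    # prediction residual e[n] = x[n] - sum_i a[i] * x[n-1-i], with x[m] = 0 for m < 0
    pred = np.zeros(n)
    for i, ai in enumerate(a):
        pred[i + 1:] += ai * x[: n - i - 1]
    return a, x - pred
```

For a synthetic "voiced" signal made by driving a stable all-pole filter with a periodic impulse train, the recovered predictor is close to the true filter, and the largest residual samples fall on the impulse instants. That impulse-like residue is what ALPA then refines into a sparser excitation.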


The effect of multiplicative noise on a signal is much larger than that of additive noise. In this paper, we address the problem of suppressing multiplicative noise in one-dimensional signals. To denoise such signals, we propose an algorithm based on minimizing an unbiased estimator (MURE) of the mean-square error (MSE), and we derive an expression for this unbiased estimate. Denoising is performed by soft thresholding in the wavelet domain, with MURE computed in the time domain. The parameters of the thresholding function are obtained by minimizing MURE. We show that the parameters that minimize MURE are very close to those that minimize the oracle MSE. Experiments show that the SNR improvement achieved by the proposed algorithm is competitive with a state-of-the-art method.
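
The MURE expression itself is not reproduced in this abstract, so the sketch below shows only the surrounding framework. It applies soft thresholding to the detail coefficients of an orthonormal Haar wavelet transform and picks the threshold by grid search against the oracle MSE, which requires the clean signal. MURE would replace that oracle with an estimate computed from the noisy signal alone. The Haar wavelet, the grid search, and all names are illustrative assumptions.

```python
import numpy as np


def haar_forward(x, levels):
    """Orthonormal Haar DWT; returns (approximation, [detail_level1, detail_level2, ...])."""
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2))
        a = (even + odd) / np.sqrt(2)
    return a, details


def haar_inverse(a, details):
    for d in reversed(details):
        out = np.empty(2 * len(a))
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        a = out
    return a


def soft(c, t):
    """Soft-thresholding function with threshold t."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)


def denoise(y, t, levels=3):
    a, details = haar_forward(y, levels)
    return haar_inverse(a, [soft(d, t) for d in details])


def oracle_threshold(y, clean, thresholds, levels=3):
    """Threshold minimizing the true (oracle) MSE; MURE would estimate this MSE from y alone."""
    mses = [np.mean((denoise(y, t, levels) - clean) ** 2) for t in thresholds]
    best = int(np.argmin(mses))
    return thresholds[best], mses[best]
```

A threshold of zero returns the noisy signal unchanged, so a grid that includes zero can never do worse than no denoising. The interesting question, which the paper answers for MURE, is how close a data-driven estimate comes to the oracle choice.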


The design of a dual-DSP microprocessor system and its application to parallel FFT and two-dimensional convolution are explained. The system is based on a master-slave configuration: two ADSP-2101s are configured as slave processors and a PC/AT serves as the master. The master acts as a control processor that transfers the program code and data to the DSPs. The system architecture and the algorithms for the two applications, namely FFT and two-dimensional convolution, are discussed.
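
One common way to split a two-dimensional convolution across two slave processors is row-wise partitioning with an overlap ("halo") of kernel-height-minus-one input rows. The paper's actual partitioning is not stated, so this is an assumed, generic scheme. In the NumPy sketch the two "slaves" run sequentially, and `conv2d_two_workers` plays the master, distributing the halves and concatenating the results.

```python
import numpy as np


def conv2d_valid(img, ker):
    """Direct 2-D convolution (kernel flipped), 'valid' output size."""
    kh, kw = ker.shape
    H, W = img.shape
    flipped = ker[::-1, ::-1]
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * flipped)
    return out


def conv2d_two_workers(img, ker):
    """Split output rows between two workers; each gets its input rows plus a halo."""
    kh = ker.shape[0]
    out_rows = img.shape[0] - kh + 1
    split = out_rows // 2
    top = conv2d_valid(img[: split + kh - 1], ker)  # worker 0: output rows [0, split)
    bottom = conv2d_valid(img[split:], ker)         # worker 1: output rows [split, out_rows)
    return np.vstack([top, bottom])
```

The halo rows are the only data duplicated between the two processors, so the extra transfer cost grows with the kernel height, not with the image size.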


The StreamIt programming model has been proposed to exploit parallelism in streaming applications on general-purpose multicore architectures. StreamIt graphs describe task, data, and pipeline parallelism, which can be exploited on accelerators such as Graphics Processing Units (GPUs) or the CellBE that support abundant parallelism in hardware. In this paper, we describe a novel method to orchestrate the execution of a StreamIt program on a multicore platform equipped with an accelerator. The proposed approach uses profiling to identify the relative benefits of executing a task on the superscalar CPU cores and on the accelerator. We formulate the problem of partitioning the work between the CPU cores and the GPU as an integrated Integer Linear Program (ILP) that can then be solved by an ILP solver; the formulation takes into account the latencies of data transfers and the buffer layout transformations required by the partitioning. We also propose an efficient heuristic algorithm for work partitioning between the CPU and the GPU, which yields solutions within 9.05% of the optimal on average across the benchmark suite. The partitioned tasks are then software-pipelined to execute on the multiple CPU cores and the Streaming Multiprocessors (SMs) of the GPU. The software pipelining algorithm orchestrates execution between the CPU cores and the GPU by emitting the code for the CPU and the GPU, as well as the code for the required data transfers. Our experiments on a platform with 8 CPU cores and a GeForce 8800 GTS 512 GPU show a geometric mean speedup of 6.94X, with a maximum of 51.96X, over single-threaded CPU execution across the StreamIt benchmarks. This is an 18.9% improvement over a partitioning strategy that maps onto the CPU only the filters that cannot be executed on the GPU (the filters whose state persists across firings).
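
The trade-off the ILP captures can be illustrated on a toy linear pipeline. The sketch below uses exhaustive search as a stand-in for an ILP solver, and its cost model is an assumption: the steady-state period of the software pipeline is taken as the maximum of the CPU load (spread perfectly over the cores), the GPU load, and the time spent moving data across the CPU/GPU boundary. Stateful filters (marked `gpu=None`) are pinned to the CPU. All names and numbers are hypothetical.

```python
from itertools import product


def steady_state_period(assign, filters, edge_bytes, transfer_cost, n_cpu_cores):
    """Estimated software-pipelined period of a linear filter chain (assumed cost model)."""
    cpu = sum(f["cpu"] for f, d in zip(filters, assign) if d == "cpu") / n_cpu_cores
    gpu = sum(f["gpu"] for f, d in zip(filters, assign) if d == "gpu")
    # data crossing the CPU/GPU boundary must be transferred every period
    xfer = sum(b * transfer_cost
               for b, (d0, d1) in zip(edge_bytes, zip(assign, assign[1:])) if d0 != d1)
    return max(cpu, gpu, xfer)


def best_partition(filters, edge_bytes, transfer_cost, n_cpu_cores=1):
    """Exhaustive search over CPU/GPU assignments; returns (period, assignment)."""
    best = None
    for assign in product(("cpu", "gpu"), repeat=len(filters)):
        if any(d == "gpu" and f["gpu"] is None for f, d in zip(filters, assign)):
            continue                          # stateful filter cannot run on the GPU
        period = steady_state_period(assign, filters, edge_bytes, transfer_cost, n_cpu_cores)
        if best is None or period < best[0]:
            best = (period, assign)
    return best
```

Exhaustive search is exponential in the number of filters, which is why an ILP solver or a heuristic such as the one described above is needed for real stream graphs.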


The StreamIt programming model has been proposed to exploit parallelism in streaming applications on general-purpose multi-core architectures. This model allows programmers to specify the structure of a program as a set of filters that act upon data and a set of communication channels between them. StreamIt graphs describe task, data, and pipeline parallelism, which can be exploited on modern Graphics Processing Units (GPUs), as they support abundant parallelism in hardware. In this paper, we describe the challenges in mapping StreamIt to GPUs and propose an efficient technique to software-pipeline the execution of stream programs on GPUs. We formulate this problem (both the scheduling and the assignment of filters to processors) as an efficient Integer Linear Program (ILP), which is then solved using ILP solvers. We also describe a novel buffer layout technique for GPUs that facilitates exploiting the high memory bandwidth available in GPUs. The proposed scheduling uses both the scalar units in the GPU, to exploit data parallelism, and the multiprocessors, to exploit task and pipeline parallelism. Furthermore, it takes into consideration the synchronization and bandwidth limitations of GPUs, and yields speedups between 1.87X and 36.83X over single-threaded CPU execution.
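
The abstract does not describe its buffer layout, so the following is an assumed, generic illustration of why layout matters on GPUs. Suppose T threads each execute one firing of a filter that pops k items. Storing each firing's inputs contiguously makes neighbouring threads read addresses k apart at every pop. A transposed layout, with item i of thread t at i*T + t, makes them read consecutive addresses instead, which is the pattern GPUs coalesce into wide memory transactions. The helper names are hypothetical.

```python
import numpy as np


def natural_layout(n_threads, pop_rate):
    """addr[i, t]: address thread t reads at its i-th pop when each firing's inputs are contiguous."""
    t = np.arange(n_threads)[None, :]
    i = np.arange(pop_rate)[:, None]
    return t * pop_rate + i


def transposed_layout(n_threads, pop_rate):
    """addr[i, t] when item i of thread t is stored at i * n_threads + t."""
    t = np.arange(n_threads)[None, :]
    i = np.arange(pop_rate)[:, None]
    return i * n_threads + t


def is_coalesced(addr):
    """True if, at every pop step, adjacent threads read adjacent addresses."""
    return bool(np.all(np.diff(addr, axis=1) == 1))


def to_transposed(buffer, n_threads, pop_rate):
    """Reorder a natural-layout buffer into the transposed layout."""
    return np.asarray(buffer).reshape(n_threads, pop_rate).T.reshape(-1)
```

The reordering is a plain matrix transpose of the buffer, so it can be folded into the producer's writes or into the data transfer rather than run as a separate pass.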


With the emergence of large-volume, high-speed streaming data, recent techniques for stream mining of CFIs (closed frequent itemsets) become inefficient. When concept drift occurs at a slow rate in high-speed data streams, the rate of change of information across different sliding windows is negligible. Consequently, sliding the window by multiple transactions at a time does not deprive the user of any change in information. We therefore propose a novel approach for mining CFIs cumulatively with a sliding width (≥ 1) over high-speed data streams. However, mining CFIs cumulatively over a stream is nontrivial, because such growth may lead to the generation of an exponential number of candidates for closure checking. In this study, we develop an efficient algorithm, stream-close, for mining CFIs over a stream by exploiting some interesting properties. Our performance study reveals that stream-close achieves good scalability and promising results.
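
To make the terms concrete, the brute-force sketch below computes the CFIs of each window position while advancing the window by a configurable number of transactions. An itemset is closed if its closure, the set of items shared by every transaction containing it, is the itemset itself. This naive version recomputes everything per window and enumerates all subsets, which is exactly the exponential cost that stream-close is designed to avoid; it is not that algorithm, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def closed_frequent_itemsets(window, min_support):
    """Brute-force CFIs of one window: {itemset: support}."""
    window = [frozenset(tx) for tx in window]
    counts = Counter()
    for tx in window:
        items = sorted(tx)
        for r in range(1, len(items) + 1):
            for combo in combinations(items, r):
                counts[frozenset(combo)] += 1
    closed = {}
    for itemset, support in counts.items():
        if support < min_support:
            continue
        # closure = items shared by every transaction that contains the itemset
        closure = frozenset.intersection(*(tx for tx in window if itemset <= tx))
        if closure == itemset:
            closed[itemset] = support
    return closed


def mine_stream(stream, window_size, slide, min_support):
    """Yield (start, CFIs) for each window position, advancing `slide` transactions at a time."""
    for start in range(0, len(stream) - window_size + 1, slide):
        yield start, closed_frequent_itemsets(stream[start:start + window_size], min_support)
```

For the window {a,b,c}, {a,b}, {a,c}, {a,b} with minimum support 2, the CFIs are {a} (support 4), {a,b} (3), and {a,c} (2). The itemset {b} is frequent but not closed, because every transaction containing b also contains a.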