20 results for Parallel processing (Electronic computers) - Research

at Indian Institute of Science - Bangalore - India


Relevance:

100.00%

Publisher:

Abstract:

In this paper we develop a multithreaded VLSI processor linear array architecture to render complex environments based on the radiosity approach. The processing elements are identical and multithreaded, and they work in Single Program Multiple Data (SPMD) mode. A new algorithm for the radiosity computations, based on the progressive refinement approach [2], is proposed. Simulation results indicate that the architecture is latency tolerant and scalable. It is shown that a linear array of 128 uni-threaded processing elements sustains a throughput close to 0.4 million patches/sec.
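
For illustration, the following Python fragment is a sequential sketch of progressive-refinement radiosity (not the paper's SPMD linear-array algorithm): at each step the patch with the most unshot energy distributes it to the others. The form factors here are toy placeholders; a real renderer would compute them from patch geometry.

```python
import numpy as np

def progressive_radiosity(emission, reflectance, form_factors, iters=100):
    """emission, reflectance: (n,) arrays; form_factors[i, j]: fraction of energy
    leaving patch i that reaches patch j (diagonal assumed zero)."""
    radiosity = emission.copy()        # accumulated radiosity B_i per patch
    unshot = emission.copy()           # energy not yet distributed
    for _ in range(iters):
        i = int(np.argmax(unshot))     # shoot from the patch with most unshot energy
        if unshot[i] <= 1e-9:
            break
        delta = reflectance * form_factors[i] * unshot[i]
        radiosity += delta
        unshot += delta
        unshot[i] = 0.0                # patch i has now shot all its energy
    return radiosity

# Toy usage: one emitting patch, uniform placeholder form factors.
n = 4
F = np.full((n, n), 0.2); np.fill_diagonal(F, 0.0)
print(progressive_radiosity(np.array([1.0, 0, 0, 0]), np.full(n, 0.5), F))
```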

Relevance:

100.00%

Publisher:

Abstract:

Network processors today consist of multiple parallel processors (micro engines) with support for multiple threads to exploit the packet-level parallelism inherent in network workloads. With such concurrency, packet ordering at the output of the network processor cannot be guaranteed. This paper studies the effect of concurrency in network processors on packet ordering. We use a validated Petri net model of a commercial network processor, the Intel IXP 2400, to determine the extent of packet reordering for an IPv4 forwarding application. Our study indicates that, in addition to the parallel processing in the network processor, the allocation scheme for the transmit buffer also adversely impacts packet ordering. In particular, our results reveal that this packet reordering results in a packet retransmission rate of up to 61%. We explore different transmit buffer allocation schemes, namely contiguous, strided, local, and global, which reduce the packet retransmission rate to 24%. We propose an alternative scheme, packet sort, which guarantees complete packet ordering while achieving a throughput of 2.5 Gbps. Further, packet sort outperforms the in-built packet ordering schemes in the IXP processor by up to 35%.
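
The abstract does not spell out how packet sort works internally; the sketch below is only a generic Python illustration, with hypothetical names, of restoring packet order at the output of a parallel packet processor by holding early finishers until their sequence number is due.

```python
import heapq

class PacketSorter:
    def __init__(self):
        self.next_seq = 0          # next sequence number expected on the wire
        self.pending = []          # min-heap of (seq, packet) that finished early

    def release(self, seq, packet):
        """Called when a processing thread finishes a packet; returns packets
        that can now be transmitted in order."""
        heapq.heappush(self.pending, (seq, packet))
        out = []
        while self.pending and self.pending[0][0] == self.next_seq:
            out.append(heapq.heappop(self.pending)[1])
            self.next_seq += 1
        return out

sorter = PacketSorter()
print(sorter.release(1, "pkt1"))   # [] -- held until packet 0 arrives
print(sorter.release(0, "pkt0"))   # ['pkt0', 'pkt1'] -- released in order
```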

Relevance:

100.00%

Publisher:

Abstract:

Modeling the performance behavior of parallel applications to predict their execution times for larger problem sizes and numbers of processors has been an active area of research for several years. The existing curve-fitting strategies for performance modeling utilize data from experiments that are conducted under uniform loading conditions. Hence the accuracy of these models degrades when the load conditions on the machines and network change. In this paper, we analyze a curve-fitting model that attempts to predict execution times for any load conditions that may exist on the systems during application execution. Based on the experiments conducted with the model for a parallel eigenvalue problem, we propose a multi-dimensional curve-fitting model based on rational polynomials for performance prediction of parallel applications in non-dedicated environments. We used the rational-polynomial-based model to predict execution times for two other parallel applications on systems with large load dynamics. In all the cases, the model gave good predictions of execution times, with average percentage prediction errors of less than 20%.
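
A minimal sketch of fitting a rational-polynomial model of execution time against processor count and a load measure with SciPy; the polynomial degrees, feature choice and measurements below are illustrative assumptions, not the paper's model.

```python
import numpy as np
from scipy.optimize import curve_fit

def rational_model(X, a0, a1, a2, b1, b2):
    # T(p, L) as a ratio of low-degree polynomials in processor count p and load L.
    p, L = X
    return (a0 + a1 * p + a2 * L) / (1.0 + b1 * p + b2 * L)

# Synthetic stand-ins for profiled runs: (processors, load, measured time in s).
p = np.array([2, 4, 8, 16, 32, 32], dtype=float)
L = np.array([0.1, 0.5, 0.2, 0.8, 0.4, 0.1])
T = np.array([10.0, 6.1, 3.4, 2.9, 1.8, 1.5])

params, _ = curve_fit(rational_model, (p, L), T, p0=np.ones(5), maxfev=10000)
# Predict the execution time for an unseen configuration.
print(rational_model((np.array([64.0]), np.array([0.3])), *params))
```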

Relevance:

100.00%

Publisher:

Abstract:

Prediction of the queue waiting times of jobs submitted to production parallel batch systems is important for providing overall estimates to users and can also help meta-schedulers make scheduling decisions. In this work, we have developed a framework for predicting ranges of queue waiting times for jobs by employing multi-class classification of similar jobs in history. Our hierarchical prediction strategy first predicts the point wait time of a job using a dynamic k-Nearest Neighbor (k-NN) method. It then performs a multi-class classification using Support Vector Machines (SVMs) among all the classes of the jobs. The probabilities given by the SVM for the class predicted using k-NN and its neighboring classes are used to provide a set of ranges of predicted wait times with associated probabilities. We have used these predictions and probabilities in a meta-scheduling strategy that distributes jobs to different queues/sites in a multi-queue/grid environment to minimize the wait times of the jobs. Experiments with different production supercomputer job traces show that our prediction strategies give correct predictions for about 77-87% of the jobs, and also result in about 12% improved accuracy compared to the next best existing method. Experiments with our meta-scheduling strategy, using different production and synthetic job traces for various system sizes, partitioning schemes and workloads, show that it gives much improved performance compared to existing scheduling policies, reducing the overall average queue waiting time of the jobs by about 47%.
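
A rough scikit-learn sketch of the two-stage idea, a k-NN point estimate followed by SVM class probabilities over wait-time ranges; the features, range boundaries and synthetic data are illustrative assumptions, not the paper's framework.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVC

rng = np.random.default_rng(0)
bounds = np.array([0.0, 600.0, 3600.0, 7200.0, np.inf])   # hypothetical wait-time ranges (s)

def to_class(wait):
    # Map a wait time to the index of the range it falls in.
    return int(np.searchsorted(bounds, wait, side="right")) - 1

# Historical job features (e.g. requested cores, walltime, queue load) -- synthetic here.
X = rng.random((500, 3))
y = rng.exponential(3600.0, 500)                  # synthetic historical wait times
labels = np.array([to_class(w) for w in y])

knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)
svm = SVC(probability=True).fit(X, labels)

job = rng.random((1, 3))
point = knn.predict(job)[0]                       # stage 1: point wait-time estimate
probs = svm.predict_proba(job)[0]                 # stage 2: probability per range
k = to_class(point)
idx = np.where(svm.classes_ == k)[0]
p_k = float(probs[idx[0]]) if idx.size else 0.0
print(f"point estimate ~{point:.0f}s, P(predicted range {k}) = {p_k:.2f}")
```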

Relevance:

100.00%

Publisher:

Abstract:

In this paper, we introduce an analytical technique based on queueing networks and Petri nets for the performance analysis of dataflow computations executed on the Manchester machine. This technique is also applicable to the analysis of parallel computations on multiprocessors. We characterize the parallelism in dataflow computations through four parameters, namely the minimum parallelism, the maximum parallelism, the average parallelism and the variance in parallelism. We observe through detailed investigation of our analytical models that the average parallelism is a good characterization of dataflow computations only as long as the variance in parallelism is small. However, significant differences in performance measures result when the variance in parallelism is comparable to or higher than the average parallelism.
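
The four parameters can be read directly off a parallelism profile, i.e. the number of enabled actors (or busy processors) sampled over the run; a minimal sketch with a synthetic profile:

```python
import numpy as np

profile = np.array([1, 2, 4, 8, 8, 6, 3, 1])   # parallelism over unit time steps (placeholder)

characterization = {
    "minimum parallelism": int(profile.min()),
    "maximum parallelism": int(profile.max()),
    "average parallelism": float(profile.mean()),
    "variance in parallelism": float(profile.var()),
}
print(characterization)
```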

Relevance:

100.00%

Publisher:

Abstract:

Fork-join queueing systems offer a natural modelling paradigm for parallel processing systems and for assembly operations in automated manufacturing. The analysis of fork-join queueing systems has been an important subject of research in recent years. Existing analysis methodologies, both exact and approximate, assume that the servers are failure-free. In this study, we consider fork-join queueing systems in the presence of server failures and compute the cumulative distribution of performability with respect to the response time of such systems. For this, we employ a computational methodology that uses a recent technique based on randomization. We compare the performability of three fork-join queueing models proposed in the literature: the distributed splitting model, the centralized splitting model, and the split-merge model. The numerical results show that the centralized splitting model offers the highest levels of performability, followed by the distributed splitting and split-merge models.
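
For intuition only, the Monte Carlo sketch below (not the randomization-based analytical method used here) estimates a performability-style quantity: the probability that a fork-join response time stays below a threshold when tasks can be delayed by server failures. All rates and the failure model are assumed placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def fork_join_response(K=4, service_rate=1.0, fail_prob=0.05, repair_rate=0.5, runs=100_000):
    # A job forks into K sibling tasks; the join completes when the slowest finishes.
    service = rng.exponential(1.0 / service_rate, size=(runs, K))
    failed = rng.random((runs, K)) < fail_prob                 # task hit by a server failure?
    repair = rng.exponential(1.0 / repair_rate, size=(runs, K)) * failed
    return (service + repair).max(axis=1)

resp = fork_join_response()
t = 5.0
print(f"P(response time <= {t}) ~= {(resp <= t).mean():.3f}")
```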

Relevance:

100.00%

Publisher:

Abstract:

Information forms the basis of modern technology. To meet the ever-increasing demand for information, means have to be devised for a more efficient and better-equipped technology to intelligibly process data. Advances in photonics have made their impact on each of the four key applications in information processing, i.e., the acquisition, transmission, storage and processing of information. The inherent advantages of ultrahigh bandwidth, high speed and low-loss transmission have already established fiber optics as the backbone of communication technology. However, the optics-to-electronics interconversion at the transmitter and receiver ends severely limits both the speed and the bit rate of lightwave communication systems. As the trend towards still faster and higher-capacity systems continues, it has become increasingly necessary to perform more and more signal-processing operations in the optical domain itself, i.e., with all-optical components and devices that possess a high bandwidth and can perform parallel processing functions to eliminate the electronic bottleneck.

Relevance:

100.00%

Publisher:

Abstract:

In this paper we propose a novel technique to model and analyze the performability of parallel and distributed architectures using GSPN-reward models.

Relevance:

100.00%

Publisher:

Abstract:

As power systems grow in size and interconnections, their complexity increases. Rising costs due to inflation and increased environmental concerns have forced transmission as well as generation systems to be operated closer to their design limits. Hence power system voltage stability and voltage control are emerging as major problems in the day-to-day operation of stressed power systems. For secure operation and control of power systems under normal and contingency conditions, it is essential to provide solutions in real time to the operator in the energy control center (ECC). Artificial neural networks (ANNs) are emerging as an artificial intelligence tool that gives fast, approximate, yet acceptable solutions in real time, as they mostly rely on parallel processing for computation. The solutions thus obtained can be used as a guide by the operator in the ECC for power system control. This paper deals with the development of an ANN architecture that provides solutions for monitoring and control of voltage stability in the day-to-day operation of power systems.
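
A minimal, hypothetical sketch of the kind of ANN mapping described: a feed-forward network trained offline on simulated operating points to estimate a voltage-stability index from bus measurements. The features, target definition and network size are illustrative assumptions, not the paper's architecture.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
# Synthetic operating points (e.g. bus voltage magnitudes / loadings) and a
# synthetic stand-in for a voltage-stability index.
X = rng.uniform(0.9, 1.1, size=(1000, 10))
y = 1.0 - np.abs(X - 1.0).sum(axis=1)

ann = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0).fit(X, y)
print(ann.predict(X[:3]))   # fast approximate index estimates for new operating points
```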

Relevance:

100.00%

Publisher:

Abstract:

Many meteorological phenomena occur at different locations simultaneously and vary temporally and spatially. It is essential to track these multiple phenomena for accurate weather prediction. Efficient analysis requires high-resolution simulations, which can be conducted by introducing finer-resolution nested simulations (nests) at the locations of these phenomena. Simultaneous tracking of these multiple weather phenomena requires simultaneous execution of the nests on different subsets of the processors used for the main weather simulation. Dynamic variation in the number of these nests requires efficient processor reallocation strategies. In this paper, we have developed strategies for efficient partitioning and repartitioning of the nests among the processors. As a case study, we consider the application of tracking multiple organized cloud clusters in tropical weather systems. We first present a parallel data analysis algorithm to detect such clouds. We have developed a tree-based hierarchical diffusion method that reallocates processors for the nests such that the redistribution cost is low; we achieve this by a novel tree reorganization approach. We show that our approach exhibits up to 25% lower redistribution cost and 53% fewer hop-bytes than a processor reallocation strategy that does not consider the existing processor allocation.
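
The sketch below is only a naive proportional repartitioning illustration in Python (not the tree-based hierarchical diffusion method itself): each nest gets a processor share roughly proportional to its workload, reusing its current processors first so that redistribution stays small.

```python
def repartition(workloads, current, total_procs):
    """workloads: {nest: work}; current: {nest: set of processor ids}."""
    total_work = sum(workloads.values())
    # Naive rounding of the proportional shares (at least one processor per nest).
    targets = {n: max(1, round(total_procs * w / total_work)) for n, w in workloads.items()}
    used = set().union(*current.values()) if current else set()
    free = set(range(total_procs)) - used

    new_alloc = {}
    for nest, target in targets.items():
        keep = set(list(current.get(nest, set()))[:target])    # reuse existing processors first
        new_alloc[nest] = keep
        free |= current.get(nest, set()) - keep                 # release any surplus
    for nest, target in targets.items():
        while len(new_alloc[nest]) < target and free:
            new_alloc[nest].add(free.pop())                     # top up from the free pool
    return new_alloc

print(repartition({"nest_a": 3.0, "nest_b": 1.0},
                  {"nest_a": {0, 1}, "nest_b": {2, 3}}, total_procs=8))
```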

Relevance:

100.00%

Publisher:

Abstract:

Rapid reconstruction of multidimensional images is crucial for enabling real-time 3D fluorescence imaging, and it becomes a key factor for imaging rapidly occurring events in the cellular environment. To facilitate real-time imaging, we have developed a graphics processing unit (GPU) based real-time maximum a posteriori (MAP) image reconstruction system. The parallel processing capability of the GPU device, which consists of a large number of tiny processing cores, and the adaptability of the image reconstruction algorithm to parallel processing (employing multiple independent computing modules called threads) result in high temporal resolution. Moreover, the proposed quadratic-potential-based MAP algorithm effectively deconvolves the images as well as suppresses the noise. The multi-node multi-threaded GPU and the Compute Unified Device Architecture (CUDA) efficiently execute the iterative image reconstruction algorithm, which is approximately 200-fold faster (for large datasets) than existing CPU-based systems. (C) 2015 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution 3.0 Unported License.
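
A CPU NumPy sketch of a MAP deconvolution iteration with a quadratic penalty, of the general kind described above; the exact potential, update rule and CUDA kernels of the GPU system are not reproduced here, and the step size, penalty weight and test data are assumptions.

```python
import numpy as np

def map_deconvolve(blurred, psf, beta=0.01, step=0.5, iters=200):
    """Gradient-descent MAP estimate: minimize ||h*x - y||^2 + beta*||x||^2
    (Gaussian likelihood with a quadratic prior), computed in the Fourier domain."""
    H = np.fft.fft2(psf, s=blurred.shape)            # PSF transfer function
    Y = np.fft.fft2(blurred)
    X = Y.copy()                                     # start from the blurred image
    for _ in range(iters):
        grad = np.conj(H) * (H * X - Y) + beta * X   # data term + quadratic prior
        X = X - step * grad
    return np.real(np.fft.ifft2(X))

# Usage with a synthetic test image and a smooth placeholder PSF.
img = np.zeros((64, 64)); img[28:36, 28:36] = 1.0
psf = np.outer(np.hanning(9), np.hanning(9)); psf /= psf.sum()
blurred = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(psf, s=img.shape)))
restored = map_deconvolve(blurred, psf)
print(restored.max())
```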

Relevance:

40.00%

Publisher:

Abstract:

Multiresolution synthetic aperture radar (SAR) image formation has been proven to be beneficial in a variety of applications, such as improved imaging and target detection as well as speckle reduction. SAR signal processing, traditionally carried out in the Fourier domain, has inherent limitations in the context of image formation at hierarchical scales. We present a generalized approach to the formation of multiresolution SAR images using the biorthogonal shift-invariant discrete wavelet transform (SIDWT) in both the range and azimuth directions. Particularly in azimuth, the inherent subband decomposition property of the wavelet packet transform is introduced to produce multiscale complex matched filtering without involving any approximations. This generalized approach also includes the formulation of multilook processing within the discrete wavelet transform (DWT) paradigm. The efficiency of the algorithm when executed in parallel to generate hierarchical-scale SAR images is shown. Analytical results and sample imagery of diffuse backscatter are presented to validate the method.
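
As a simplified illustration of the shift-invariant wavelet machinery involved (not the wavelet-domain range/azimuth matched filtering of the paper), the PyWavelets snippet below decomposes an image with an undecimated biorthogonal DWT to obtain same-size coefficient images at several scales; the input is a random placeholder for a focused SAR image.

```python
import numpy as np
import pywt

image = np.random.randn(128, 128)                 # stand-in for a focused SAR image
coeffs = pywt.swt2(image, wavelet="bior2.2", level=3)

# Each coefficient set keeps the full image size because the transform is
# undecimated (shift-invariant), giving a multiresolution view of the scene.
for i, (approx, details) in enumerate(coeffs):
    print(f"coefficient set {i}: approximation shape {approx.shape}")
```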

Relevance:

40.00%

Publisher:

Abstract:

We study the problem of minimizing total completion time on single and parallel batch processing machines. A batch processing machine is one which can process up to B jobs simultaneously. The processing time of a batch is equal to the largest processing time among all jobs in the batch. This problem is motivated by burn-in operations in the final testing stage of semiconductor manufacturing and is expected to occur in other production environments. We provide an exact solution procedure for the single-machine problem and heuristic algorithms for both single and parallel machine problems. While the exact algorithms have limited applicability due to high computational requirements, extensive experiments show that the heuristics are capable of consistently obtaining near-optimal solutions in very reasonable CPU times.
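
A small illustrative heuristic in Python (not necessarily the one proposed in the paper) for the single batch-processing machine: sort jobs by processing time, fill batches of size B, and charge each batch the largest processing time it contains.

```python
def batch_schedule(proc_times, B):
    jobs = sorted(proc_times)                        # shortest-processing-time order
    batches = [jobs[i:i + B] for i in range(0, len(jobs), B)]
    finish, total_completion = 0.0, 0.0
    for batch in batches:
        finish += max(batch)                         # batch length = longest job in it
        total_completion += finish * len(batch)      # all jobs in a batch complete together
    return batches, total_completion

batches, tct = batch_schedule([2, 7, 3, 9, 4, 6], B=2)
print(batches, tct)    # [[2, 3], [4, 6], [7, 9]], total completion time 60.0
```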