Biblioteca Digital

686 resultados para Processors

A generalized Linear Programming based approach to optimal divisible load scheduling

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper we propose a general Linear Programming (LP) based formulation and solution methodology for obtaining optimal solution to the load distribution problem in divisible load scheduling. We exploit the power of the versatile LP formulation to propose algorithms that yield exact solutions to several very general load distribution problems for which either no solutions or only heuristic solutions were available. We consider both star (single-level tree) networks and linear daisy chain networks, having processors equipped with front-ends, that form the generic models for several important network topologies. We consider arbitrary processing node availability or release times and general models for communication delays and computation time that account for constant overheads such as start up times in communication and computation. The optimality of the LP based algorithms is proved rigorously.

A suprathreshold stochastic resonance based pre-processor for direction-of-arrival estimation

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper we propose a nonlinear preprocessor for enhancing the performance of processors used for direction-of-arrival (DOA) estimation in heavy-tailed non-Gaussian noise. The preprocessor based on the phenomenon of suprathreshold stochastic resonance (SSR), provides SNR gain. The preprocessed data is used for DOA estimation by the MUSIC algorithm. Simulation results are presented to show that the SSR preprocessor provides a significant improvement in the performance of MUSIC in heavy-tailed noise at low SNR.

Utilization bounds for RM scheduling on uniform multiprocessors

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Utilization bounds for Earliest Deadline First(EDF) and Rate Monotonic(RM) scheduling are known and well understood for uniprocessor systems. In this paper, we derive limits on similar bounds for the multiprocessor case, when the individual processors need not be identical. Tasks are partitioned among the processors and RM scheduling is assumed to be the policy used in individual processors. A minimum limit on the bounds for a 'greedy' class of algorithms is given and proved, since the actual value of the bound depends on the algorithm that allocates the tasks. We also derive the utilization bound of an algorithm which allocates tasks in decreasing order of utilization factors. Knowledge of such bounds allows us to carry out very fast schedulability tests although we are constrained by the fact that the tests are sufficient but not necessary to ensure schedulability.

Parallelization of an unstructured data based cell centre finite volume code, HIFUN-3D

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This work describes the parallelization of High Resolution flow solver on unstructured meshes, HIFUN-3D, an unstructured data based finite volume solver for 3-D Euler equations. For mesh partitioning, we use METIS, a software based on multilevel graph partitioning. The unstructured graph used for partitioning is associated with weights both on its vertices and edges. The data residing on every processor is split into four layers. Such a novel procedure of handling data helps in maintaining the effectiveness of the serial code. The communication of data across the processors is achieved by explicit message passing using the standard blocking mode feature of Message Passing Interface (MPI). The parallel code is tested on PACE++128 available in CFD Center

Adaptive load control of the central processor in a distributed system with a star topology

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The author presents adaptive control techniques for controlling the flow of real-time jobs from the peripheral processors (PPs) to the central processor (CP) of a distributed system with a star topology. He considers two classes of flow control mechanisms: (1) proportional control, where a certain proportion of the load offered to each PP is sent to the CP, and (2) threshold control, where there is a maximum rate at which each PP can send jobs to the CP. The problem is to obtain good algorithms for dynamically adjusting the control level at each PP in order to prevent overload of the CP, when the load offered by the PPs is unknown and varying. The author formulates the problem approximately as a standard system control problem in which the system has unknown parameters that are subject to change. Using well-known techniques (e.g., naive-feedback-controller and stochastic approximation techniques), he derives adaptive controls for the system control problem. He demonstrates the efficacy of these controls in the original problem by using the control algorithms in simulations of a queuing model of the CP and the load controls.

Parallel circuit partitioning on a reduced array architecture

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The physical design of a VLSI circuit involves circuit partitioning as a subtask. Typically, it is necessary to partition a large electrical circuit into several smaller circuits such that the total cross-wiring is minimized. This problem is a variant of the more general graph partitioning problem, and it is known that there does not exist a polynomial time algorithm to obtain an optimal partition. The heuristic procedure proposed by Kernighan and Lin1,2 requires O(n2 log2n) time to obtain a near-optimal two-way partition of a circuit with n modules. In the VLSI context, due to the large problem size involved, this computational requirement is unacceptably high. This paper is concerned with the hardware acceleration of the Kernighan-Lin procedure on an SIMD architecture. The proposed parallel partitioning algorithm requires O(n) processors, and has a time complexity of O(n log2n). In the proposed scheme, the reduced array architecture is employed with due considerations towards cost effectiveness and VLSI realizability of the architecture.The authors are not aware of any earlier attempts to parallelize a circuit partitioning algorithm in general or the Kernighan-Lin algorithm in particular. The use of the reduced array architecture is novel and opens up the possibilities of using this computing structure for several other applications in electronic design automation.

Distributed nonlinear optimization algorithms for general network problems

Relevância:

10.00% 10.00%

Publicador:

Resumo:

There are a number of large networks which occur in many problems dealing with the flow of power, communication signals, water, gas, transportable goods, etc. Both design and planning of these networks involve optimization problems. The first part of this paper introduces the common characteristics of a nonlinear network (the network may be linear, the objective function may be non linear, or both may be nonlinear). The second part develops a mathematical model trying to put together some important constraints based on the abstraction for a general network. The third part deals with solution procedures; it converts the network to a matrix based system of equations, gives the characteristics of the matrix and suggests two solution procedures, one of them being a new one. The fourth part handles spatially distributed networks and evolves a number of decomposition techniques so that we can solve the problem with the help of a distributed computer system. Algorithms for parallel processors and spatially distributed systems have been described.There are a number of common features that pertain to networks. A network consists of a set of nodes and arcs. In addition at every node, there is a possibility of an input (like power, water, message, goods etc) or an output or none. Normally, the network equations describe the flows amoungst nodes through the arcs. These network equations couple variables associated with nodes. Invariably, variables pertaining to arcs are constants; the result required will be flows through the arcs. To solve the normal base problem, we are given input flows at nodes, output flows at nodes and certain physical constraints on other variables at nodes and we should find out the flows through the network (variables at nodes will be referred to as across variables).The optimization problem involves in selecting inputs at nodes so as to optimise an objective function; the objective may be a cost function based on the inputs to be minimised or a loss function or an efficiency function. The above mathematical model can be solved using Lagrange Multiplier technique since the equalities are strong compared to inequalities. The Lagrange multiplier technique divides the solution procedure into two stages per iteration. Stage one calculates the problem variables % and stage two the multipliers lambda. It is shown that the Jacobian matrix used in stage one (for solving a nonlinear system of necessary conditions) occurs in the stage two also.A second solution procedure has also been imbedded into the first one. This is called total residue approach. It changes the equality constraints so that we can get faster convergence of the iterations.Both solution procedures are found to coverge in 3 to 7 iterations for a sample network.The availability of distributed computer systems — both LAN and WAN — suggest the need for algorithms to solve the optimization problems. Two types of algorithms have been proposed — one based on the physics of the network and the other on the property of the Jacobian matrix. Three algorithms have been deviced, one of them for the local area case. These algorithms are called as regional distributed algorithm, hierarchical regional distributed algorithm (both using the physics properties of the network), and locally distributed algorithm (a multiprocessor based approach with a local area network configuration). The approach used was to define an algorithm that is faster and uses minimum communications. These algorithms are found to converge at the same rate as the non distributed (unitary) case.

A parallel multistep predictor-corrector algorithm for solving ordinary differential equations

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper we give a generalized predictor-corrector algorithm for solving ordinary differential equations with specified initial values. The method uses multiple correction steps which can be carried out in parallel with a prediction step. The proposed method gives a larger stability interval compared to the existing parallel predictor-corrector methods. A method has been suggested to implement the algorithm in multiple processor systems with efficient utilization of all the processors.

Exact bounds for distributed graph colouring

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A distributed system is a collection of networked autonomous processing units which must work in a cooperative manner. Currently, large-scale distributed systems, such as various telecommunication and computer networks, are abundant and used in a multitude of tasks. The field of distributed computing studies what can be computed efficiently in such systems. Distributed systems are usually modelled as graphs where nodes represent the processors and edges denote communication links between processors. This thesis concentrates on the computational complexity of the distributed graph colouring problem. The objective of the graph colouring problem is to assign a colour to each node in such a way that no two nodes connected by an edge share the same colour. In particular, it is often desirable to use only a small number of colours. This task is a fundamental symmetry-breaking primitive in various distributed algorithms. A graph that has been coloured in this manner using at most k different colours is said to be k-coloured. This work examines the synchronous message-passing model of distributed computation: every node runs the same algorithm, and the system operates in discrete synchronous communication rounds. During each round, a node can communicate with its neighbours and perform local computation. In this model, the time complexity of a problem is the number of synchronous communication rounds required to solve the problem. It is known that 3-colouring any k-coloured directed cycle requires at least ½(log* k - 3) communication rounds and is possible in ½(log* k + 7) communication rounds for all k ≥ 3. This work shows that for any k ≥ 3, colouring a k-coloured directed cycle with at most three colours is possible in ½(log* k + 3) rounds. In contrast, it is also shown that for some values of k, colouring a directed cycle with at most three colours requires at least ½(log* k + 1) communication rounds. Furthermore, in the case of directed rooted trees, reducing a k-colouring into a 3-colouring requires at least log* k + 1 rounds for some k and possible in log* k + 3 rounds for all k ≥ 3. The new positive and negative results are derived using computational methods, as the existence of distributed colouring algorithms corresponds to the colourability of so-called neighbourhood graphs. The colourability of these graphs is analysed using Boolean satisfiability (SAT) solvers. Finally, this thesis shows that similar methods are applicable in capturing the existence of distributed algorithms for other graph problems, such as the maximal matching problem.

A FIFO-based multicast network and its use in multicomputers

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We present an implementation of a multicast network of processors. The processors are connected in a fully connected network and it is possible to broadcast data in a single instruction. The network works at the processor-memory speed and therefore provides a fast communication link among processors. A number of interesting architectures are possible using such a network. We show some of these architectures which have been implemented and are functional. We also show the system software calls which allow programming of these machines in parallel mode.

A study of symbolic processing and computational aspects in helicopter dynamics

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Even research models of helicopter dynamics often lead to a large number of equations of motion with periodic coefficients; and Floquet theory is a widely used mathematical tool for dynamic analysis. Presently, three approaches are used in generating the equations of motion. These are (1) general-purpose symbolic processors such as REDUCE and MACSYMA, (2) a special-purpose symbolic processor, DEHIM (Dynamic Equations for Helicopter Interpretive Models), and (3) completely numerical approaches. In this paper, comparative aspects of the first two purely algebraic approaches are studied by applying REDUCE and DEHIM to the same set of problems. These problems range from a linear model with one degree of freedom to a mildly non-linear multi-bladed rotor model with several degrees of freedom. Further, computational issues in applying Floquet theory are also studied, which refer to (1) the equilibrium solution for periodic forced response together with the transition matrix for perturbations about that response and (2) a small number of eigenvalues and eigenvectors of the unsymmetric transition matrix. The study showed the following: (1) compared to REDUCE, DEHIM is far more portable and economical, but it is also less user-friendly, particularly during learning phases; (2) the problems of finding the periodic response and eigenvalues are well conditioned.

Dual-DSP system for signal and image processing

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The design of a dual-DSP microprocessor system and its application for parallel FFT and two-dimensional convolution are explained. The system is based on a master-salve configuration. Two ADSP-2101s are configured as slave processors and a PC/AT serves as the master. The master serves as a control processor to transfer the program code and data to the DSPs. The system architecture and the algorithms for the two applications, viz. FFT and two-dimensional convolutions, are discussed.

Area Optimized H.264 Intra Prediction Architecture for 1080p HD Resolution

Relevância:

10.00% 10.00%

Publicador:

Resumo:

High performance video standards use prediction techniques to achieve high picture quality at low bit rates. The type of prediction decides the bit rates and the image quality. Intra Prediction achieves high video quality with significant reduction in bit rate. This paper present an area optimized architecture for Intra prediction, for H.264 decoding at HDTV resolution with a target of achieving 60 fps. The architecture was validated on Virtex-5 FPGA based platform. The architecture achieves a frame rate of 64 fps. The architecture is based on multi-level memory hierarchy to reduce latency and ensure optimum resources utilization. It removes redundancy by reusing same functional blocks across different modes. The proposed architecture uses only 13% of the total LUTs available on the Xilinx FPGA XC5VLX50T.

Load balancing algorithms for an extended hypercube

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Reduction of the execution time of a job through equitable distribution of work load among the processors in a distributed system is the goal of load balancing. Performance of static and dynamic load balancing algorithms for the extended hypercube, is discussed. Threshold algorithms are very well-known algorithms for dynamic load balancing in distributed systems. An extension of the threshold algorithm, called the multilevel threshold algorithm, has been proposed. The hierarchical interconnection network of the extended hypercube is suitable for implementing the proposed algorithm. The new algorithm has been implemented on a transputer-based system and the performance of the algorithm for an extended hypercube is compared with those for mesh and binary hypercube networks

Performance of a pipelined ring algorithm for Fast Fourier Transform on transputer arrays

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper a pipelined ring algorithm is presented for efficient computation of one and two dimensional Fast Fourier Transform (FFT) on a message passing multiprocessor. The algorithm has been implemented on a transputer based system and experiments reveal that the algorithm is very efficient. A model for analysing the performance of the algorithm is developed from its computation-communication characteristics. Expressions for execution time, speedup and efficiency are obtained and these expressions are validated with experimental results obtained on a four transputer system. The analytical model is then used to estimate the performance of the algorithm for different number of processors, and for different sizes of the input data.

«
1
2
...
8
9
10
11
12
13
14
...
45
46
»