990 resultados para distributed computation


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Three paradigms for distributed-memory parallel computation that free the application programmer from the details of message passing are compared for an archetypal structured scientific computation -- a nonlinear, structured-grid partial differential equation boundary value problem -- using the same algorithm on the same hardware. All of the paradigms -- parallel languages represented by the Portland Group's HPF, (semi-)automated serial-to-parallel source-to-source translation represented by CAP-Tools from the University of Greenwich, and parallel libraries represented by Argonne's PETSc -- are found to be easy to use for this problem class, and all are reasonably effective in exploiting concurrency after a short learning curve. The level of involvement required by the application programmer under any paradigm includes specification of the data partitioning, corresponding to a geometrically simple decomposition of the domain of the PDE. Programming in SPMD style for the PETSc library requires writing only the routines that discretize the PDE and its Jacobian, managing subdomain-to-processor mappings (affine global-to-local index mappings), and interfacing to library solver routines. Programming for HPF requires a complete sequential implementation of the same algorithm as a starting point, introduction of concurrency through subdomain blocking (a task similar to the index mapping), and modest experimentation with rewriting loops to elucidate to the compiler the latent concurrency. Programming with CAPTools involves feeding the same sequential implementation to the CAPTools interactive parallelization system, and guiding the source-to-source code transformation by responding to various queries about quantities knowable only at runtime. Results representative of "the state of the practice" for a scaled sequence of structured grid problems are given on three of the most important contemporary high-performance platforms: the IBM SP, the SGI Origin 2000, and the CRAYY T3E.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Numerical solutions of realistic 2-D and 3-D inverse problems may require a very large amount of computation. A two-level concept on parallelism is often used to solve such problems. The primary level uses the problem partitioning concept which is a decomposition based on the mathematical/physical problem. The secondary level utilizes the widely used data partitioning concept. A theoretical performance model is built based on the two-level parallelism. The observed performance results obtained from a network of general purpose Sun Sparc stations are compared with the theoretical values. Restrictions of the theoretical model are also discussed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The exponential growth in user and application data entails new means for providing fault tolerance and protection against data loss. High Performance Com- puting (HPC) storage systems, which are at the forefront of handling the data del- uge, typically employ hardware RAID at the backend. However, such solutions are costly, do not ensure end-to-end data integrity, and can become a bottleneck during data reconstruction. In this paper, we design an innovative solution to achieve a flex- ible, fault-tolerant, and high-performance RAID-6 solution for a parallel file system (PFS). Our system utilizes low-cost, strategically placed GPUs — both on the client and server sides — to accelerate parity computation. In contrast to hardware-based approaches, we provide full control over the size, length and location of a RAID array on a per file basis, end-to-end data integrity checking, and parallelization of RAID array reconstruction. We have deployed our system in conjunction with the widely-used Lustre PFS, and show that our approach is feasible and imposes ac- ceptable overhead.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Thesis (Master's)--University of Washington, 2014

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Embedded real-time applications increasingly present high computation requirements, which need to be completed within specific deadlines, but that present highly variable patterns, depending on the set of data available in a determined instant. The current trend to provide parallel processing in the embedded domain allows providing higher processing power; however, it does not address the variability in the processing pattern. Dimensioning each device for its worst-case scenario implies lower average utilization, and increased available, but unusable, processing in the overall system. A solution for this problem is to extend the parallel execution of the applications, allowing networked nodes to distribute the workload, on peak situations, to neighbour nodes. In this context, this report proposes a framework to develop parallel and distributed real-time embedded applications, transparently using OpenMP and Message Passing Interface (MPI), within a programming model based on OpenMP. The technical report also devises an integrated timing model, which enables the structured reasoning on the timing behaviour of these hybrid architectures.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In recent years, vehicular cloud computing (VCC) has emerged as a new technology which is being used in wide range of applications in the area of multimedia-based healthcare applications. In VCC, vehicles act as the intelligent machines which can be used to collect and transfer the healthcare data to the local, or global sites for storage, and computation purposes, as vehicles are having comparatively limited storage and computation power for handling the multimedia files. However, due to the dynamic changes in topology, and lack of centralized monitoring points, this information can be altered, or misused. These security breaches can result in disastrous consequences such as-loss of life or financial frauds. Therefore, to address these issues, a learning automata-assisted distributive intrusion detection system is designed based on clustering. Although there exist a number of applications where the proposed scheme can be applied but, we have taken multimedia-based healthcare application for illustration of the proposed scheme. In the proposed scheme, learning automata (LA) are assumed to be stationed on the vehicles which take clustering decisions intelligently and select one of the members of the group as a cluster-head. The cluster-heads then assist in efficient storage and dissemination of information through a cloud-based infrastructure. To secure the proposed scheme from malicious activities, standard cryptographic technique is used in which the auotmaton learns from the environment and takes adaptive decisions for identification of any malicious activity in the network. A reward and penalty is given by the stochastic environment where an automaton performs its actions so that it updates its action probability vector after getting the reinforcement signal from the environment. The proposed scheme was evaluated using extensive simulations on ns-2 with SUMO. The results obtained indicate that the proposed scheme yields an improvement of 10 % in detection rate of malicious nodes when compared with the existing schemes.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this report, we discuss the application of global optimization and Evolutionary Computation to distributed systems. We therefore selected and classified many publications, giving an insight into the wide variety of optimization problems which arise in distributed systems. Some interesting approaches from different areas will be discussed in greater detail with the use of illustrative examples.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A distributed method for mobile robot navigation, spatial learning, and path planning is presented. It is implemented on a sonar-based physical robot, Toto, consisting of three competence layers: 1) Low-level navigation: a collection of reflex-like rules resulting in emergent boundary-tracing. 2) Landmark detection: dynamically extracts landmarks from the robot's motion. 3) Map learning: constructs a distributed map of landmarks. The parallel implementation allows for localization in constant time. Spreading of activation computes both topological and physical shortest paths in linear time. The main issues addressed are: distributed, procedural, and qualitative representation and computation, emergent behaviors, dynamic landmarks, minimized communication.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We consider the often-studied problem of sorting, for a parallel computer. Given an input array distributed evenly over p processors, the task is to compute the sorted output array, also distributed over the p processors. Many existing algorithms take the approach of approximately load-balancing the output, leaving each processor with Θ(n/p) elements. However, in many cases, approximate load-balancing leads to inefficiencies in both the sorting itself and in further uses of the data after sorting. We provide a deterministic parallel sorting algorithm that uses parallel selection to produce any output distribution exactly, particularly one that is perfectly load-balanced. Furthermore, when using a comparison sort, this algorithm is 1-optimal in both computation and communication. We provide an empirical study that illustrates the efficiency of exact data splitting, and shows an improvement over two sample sort algorithms.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Aquest treball proposa una nova arquitectura de control amb coordinació distribuïda per a un robot mòbil (ARMADiCo). La metodologia de coordinació distribuïda consisteix en dos passos: el primer determina quin és l'agent que guanya el recurs basat en el càlcul privat de la utilitat i el segon, com es fa el canvi del recurs per evitar comportaments abruptes del robot. Aquesta arquitectura ha estat concebuda per facilitar la introducció de nous components hardware i software, definint un patró de disseny d'agents que captura les característiques comunes dels agents. Aquest patró ha portat al desenvolupament d'una arquitectura modular dins l'agent que permet la separació dels diferents mètodes utilitzats per aconseguir els objectius, la col·laboració, la competició i la coordinació de recursos. ARMADiCo s'ha provat en un robot Pioneer 2DX de MobileRobots Inc.. S'han fet diversos experiments i els resultats han demostrat que s'han aconseguit les característiques proposades per l'arquitectura.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we study the periodic oscillatory behavior of a class of bidirectional associative memory (BAM) networks with finite distributed delays. A set of criteria are proposed for determining global exponential periodicity of the proposed BAM networks, which assume neither differentiability nor monotonicity of the activation function of each neuron. In addition, our criteria are easily checkable. (c) 2005 Elsevier Inc. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Bayesian analysis is given of an instrumental variable model that allows for heteroscedasticity in both the structural equation and the instrument equation. Specifically, the approach for dealing with heteroscedastic errors in Geweke (1993) is extended to the Bayesian instrumental variable estimator outlined in Rossi et al. (2005). Heteroscedasticity is treated by modelling the variance for each error using a hierarchical prior that is Gamma distributed. The computation is carried out by using a Markov chain Monte Carlo sampling algorithm with an augmented draw for the heteroscedastic case. An example using real data illustrates the approach and shows that ignoring heteroscedasticity in the instrument equation when it exists may lead to biased estimates.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper applies the concepts and methods of complex networks to the development of models and simulations of master-slave distributed real-time systems by introducing an upper bound in the allowable delivery time of the packets with computation results. Two representative interconnection models are taken into account: Uniformly random and scale free (Barabasi-Albert), including the presence of background traffic of packets. The obtained results include the identification of the uniformly random interconnectivity scheme as being largely more efficient than the scale-free counterpart. Also, increased latency tolerance of the application provides no help under congestion.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A parallel technique, for a distributed memory machine, based on domain decomposition for solving the Navier-Stokes equations in cartesian and cylindrical coordinates in two dimensions with free surfaces is described. It is based on the code by Tome and McKee (J. Comp. Phys. 110 (1994) 171-186) and Tome (Ph.D. Thesis, University of Strathclyde, Glasgow, 1993) which in turn is based on the SMAC method by Amsden and Harlow (Report LA-4370, Los Alamos Scientific Laboratory, 1971), which solves the Navier-Stokes equations in three steps: the momentum and Poisson equations and particle movement, These equations are discretized by explicit and 5-point finite differences. The parallelization is performed by splitting the computation domain into vertical panels and assigning each of these panels to a processor. All the computation can then be performed using nearest neighbour communication. Test runs comparing the performance of the parallel with the serial code, and a discussion of the load balancing question are presented. PVM is used for communication between processes. (C) 1999 Elsevier B.V. B.V. All rights reserved.