Biblioteca Digital

988 resultados para parallel implementation

ADAM: A Decentralized Parallel Computer Architecture Featuring Fast Thread and Data Migration and a Uniform Hardware Abstraction

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The furious pace of Moore's Law is driving computer architecture into a realm where the the speed of light is the dominant factor in system latencies. The number of clock cycles to span a chip are increasing, while the number of bits that can be accessed within a clock cycle is decreasing. Hence, it is becoming more difficult to hide latency. One alternative solution is to reduce latency by migrating threads and data, but the overhead of existing implementations has previously made migration an unserviceable solution so far. I present an architecture, implementation, and mechanisms that reduces the overhead of migration to the point where migration is a viable supplement to other latency hiding mechanisms, such as multithreading. The architecture is abstract, and presents programmers with a simple, uniform fine-grained multithreaded parallel programming model with implicit memory management. In other words, the spatial nature and implementation details (such as the number of processors) of a parallel machine are entirely hidden from the programmer. Compiler writers are encouraged to devise programming languages for the machine that guide a programmer to express their ideas in terms of objects, since objects exhibit an inherent physical locality of data and code. The machine implementation can then leverage this locality to automatically distribute data and threads across the physical machine by using a set of high performance migration mechanisms. An implementation of this architecture could migrate a null thread in 66 cycles -- over a factor of 1000 improvement over previous work. Performance also scales well; the time required to move a typical thread is only 4 to 5 times that of a null thread. Data migration performance is similar, and scales linearly with data block size. Since the performance of the migration mechanism is on par with that of an L2 cache, the implementation simulated in my work has no data caches and relies instead on multithreading and the migration mechanism to hide and reduce access latencies.

A novel parallel approach to the likelihood-based estimation of admixture in population genetics

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Inferring population admixture from genetic data and quantifying it is a difficult but crucial task in evolutionary and conservation biology. Unfortunately state-of-the-art probabilistic approaches are computationally demanding. Effectively exploiting the computational power of modern multiprocessor systems can thus have a positive impact to Monte Carlo-based simulation of admixture modeling. A novel parallel approach is briefly described and promising results on its message passing interface (MPI)-based C implementation are reported.

Parallel performance and scalability experiments with the Danish Eulerian Model on the EPCC supercomputers

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Danish Eulerian Model (DEM) is a powerful air pollution model, designed to calculate the concentrations of various dangerous species over a large geographical region (e.g. Europe). It takes into account the main physical and chemical processes between these species, the actual meteorological conditions, emissions, etc.. This is a huge computational task and requires significant resources of storage and CPU time. Parallel computing is essential for the efficient practical use of the model. Some efficient parallel versions of the model were created over the past several years. A suitable parallel version of DEM by using the Message Passing Interface library (AIPI) was implemented on two powerful supercomputers of the EPCC - Edinburgh, available via the HPC-Europa programme for transnational access to research infrastructures in EC: a Sun Fire E15K and an IBM HPCx cluster. Although the implementation is in principal, the same for both supercomputers, few modifications had to be done for successful porting of the code on the IBM HPCx cluster. Performance analysis and parallel optimization was done next. Results from bench marking experiments will be presented in this paper. Another set of experiments was carried out in order to investigate the sensitivity of the model to variation of some chemical rate constants in the chemical submodel. Certain modifications of the code were necessary to be done in accordance with this task. The obtained results will be used for further sensitivity analysis Studies by using Monte Carlo simulation.

An MPI-based implementation of intelligent agents on clusters

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Recent research in multi-agent systems incorporate fault tolerance concepts, but does not explore the extension and implementation of such ideas for large scale parallel computing systems. The work reported in this paper investigates a swarm array computing approach, namely 'Intelligent Agents'. A task to be executed on a parallel computing system is decomposed to sub-tasks and mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information during the event of a predicted core/processor failure and for successfully completing the task. The feasibility of the approach is validated by implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.

Intelligent agents for fault tolerance: from multi-agent simulation to cluster-based implementation

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Recent research in multi-agent systems incorporate fault tolerance concepts, but does not explore the extension and implementation of such ideas for large scale parallel computing systems. The work reported in this paper investigates a swarm array computing approach, namely 'Intelligent Agents'. A task to be executed on a parallel computing system is decomposed to sub-tasks and mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information during the event of a predicted core/processor failure and for successfully completing the task. The feasibility of the approach is validated by simulations on an FPGA using a multi-agent simulator, and implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.

A cluster computing implementation of a Monte Carlo simulation of a particle growth mechanism

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Clusters of computers can be used together to provide a powerful computing resource. Large Monte Carlo simulations, such as those used to model particle growth, are computationally intensive and take considerable time to execute on conventional workstations. By spreading the work of the simulation across a cluster of computers, the elapsed execution time can be greatly reduced. Thus a user has apparently the performance of a supercomputer by using the spare cycles on other workstations.

Implementation of robust hybrid feedback FxLMS active noise cancellation headset

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Instability is a serious problem for acoustic Active Noise Cancellation (ANC) headsets as a result of large errors in estimating the transfer function of the plant. Typically this occurs when, for example, a wearer adjusts the headset. In this paper, the instability problem of adaptive ANC headset is addressed. To ensure stability of the whole system, we propose a hybrid solution consisting of an analog feedback loop parallel to the digital loop, and the role of the analog loop in stabilizing the headset is analyzed theoretically. Finally the methodology of implementing such a hybrid ANC headset is described in detail. The experiments carried out on the headset prototype show that the headset is robust under considerable fluctuations of the plant transfer characteristics, and has very good noise cancellation performance both for narrow-band and wide-band disturbances.

PMCRI: a parallel modular classification rule induction framework

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In a world where massive amounts of data are recorded on a large scale we need data mining technologies to gain knowledge from the data in a reasonable time. The Top Down Induction of Decision Trees (TDIDT) algorithm is a very widely used technology to predict the classification of newly recorded data. However alternative technologies have been derived that often produce better rules but do not scale well on large datasets. Such an alternative to TDIDT is the PrismTCS algorithm. PrismTCS performs particularly well on noisy data but does not scale well on large datasets. In this paper we introduce Prism and investigate its scaling behaviour. We describe how we improved the scalability of the serial version of Prism and investigate its limitations. We then describe our work to overcome these limitations by developing a framework to parallelise algorithms of the Prism family and similar algorithms. We also present the scale up results of a first prototype implementation.

Parallel induction of modular classification rules

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Distributed Rule Induction (DRI) project at the University of Portsmouth is concerned with distributed data mining algorithms for automatically generating rules of all kinds. In this paper we present a system architecture and its implementation for inducing modular classification rules in parallel in a local area network using a distributed blackboard system. We present initial results of a prototype implementation based on the Prism algorithm.

A Parallel Hardware Architecture for Scale and Rotation Invariant Feature Detection

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper proposes a parallel hardware architecture for image feature detection based on the Scale Invariant Feature Transform algorithm and applied to the Simultaneous Localization And Mapping problem. The work also proposes specific hardware optimizations considered fundamental to embed such a robotic control system on-a-chip. The proposed architecture is completely stand-alone; it reads the input data directly from a CMOS image sensor and provides the results via a field-programmable gate array coupled to an embedded processor. The results may either be used directly in an on-chip application or accessed through an Ethernet connection. The system is able to detect features up to 30 frames per second (320 x 240 pixels) and has accuracy similar to a PC-based implementation. The achieved system performance is at least one order of magnitude better than a PC-based solution, a result achieved by investigating the impact of several hardware-orientated optimizations oil performance, area and accuracy.

Improvements in the score matrix calculation method using parallel score estimating algorithm

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The increasing amount of sequences stored in genomic databases has become unfeasible to the sequential analysis. Then, the parallel computing brought its power to the Bioinformatics through parallel algorithms to align and analyze the sequences, providing improvements mainly in the running time of these algorithms. In many situations, the parallel strategy contributes to reducing the computational complexity of the big problems. This work shows some results obtained by an implementation of a parallel score estimating technique for the score matrix calculation stage, which is the first stage of a progressive multiple sequence alignment. The performance and quality of the parallel score estimating are compared with the results of a dynamic programming approach also implemented in parallel. This comparison shows a significant reduction of running time. Moreover, the quality of the final alignment, using the new strategy, is analyzed and compared with the quality of the approach with dynamic programming.

Hyperspectral image compression onboard next-generation satellites: implementation solutions on GPU and FPGAs

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Programa de doctorado: Ingeniería de Telecomunicación Avanzada

Analysis of optimal control problems for the incompressible MHD equations and implementation in a finite element multiphysics code

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis deals with the study of optimal control problems for the incompressible Magnetohydrodynamics (MHD) equations. Particular attention to these problems arises from several applications in science and engineering, such as fission nuclear reactors with liquid metal coolant and aluminum casting in metallurgy. In such applications it is of great interest to achieve the control on the fluid state variables through the action of the magnetic Lorentz force. In this thesis we investigate a class of boundary optimal control problems, in which the flow is controlled through the boundary conditions of the magnetic field. Due to their complexity, these problems present various challenges in the definition of an adequate solution approach, both from a theoretical and from a computational point of view. In this thesis we propose a new boundary control approach, based on lifting functions of the boundary conditions, which yields both theoretical and numerical advantages. With the introduction of lifting functions, boundary control problems can be formulated as extended distributed problems. We consider a systematic mathematical formulation of these problems in terms of the minimization of a cost functional constrained by the MHD equations. The existence of a solution to the flow equations and to the optimal control problem are shown. The Lagrange multiplier technique is used to derive an optimality system from which candidate solutions for the control problem can be obtained. In order to achieve the numerical solution of this system, a finite element approximation is considered for the discretization together with an appropriate gradient-type algorithm. A finite element object-oriented library has been developed to obtain a parallel and multigrid computational implementation of the optimality system based on a multiphysics approach. Numerical results of two- and three-dimensional computations show that a possible minimum for the control problem can be computed in a robust and accurate manner.

A parallel community detection algorithm

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Complex networks analysis is a very popular topic in computer science. Unfortunately this networks, extracted from different contexts, are usually very large and the analysis may be very complicated: computation of metrics on these structures could be very complex. Among all metrics we analyse the extraction of subnetworks called communities: they are groups of nodes that probably play the same role within the whole structure. Communities extraction is an interesting operation in many different fields (biology, economics,...). In this work we present a parallel community detection algorithm that can operate on networks with huge number of nodes and edges. After an introduction to graph theory and high performance computing, we will explain our design strategies and our implementation. Then, we will show some performance evaluation made on a distributed memory architectures i.e. the supercomputer IBM-BlueGene/Q "Fermi" at the CINECA supercomputing center, Italy, and we will comment our results.

Implementation of network entropy algorithms on hpc machines, with application to high-dimensional experimental data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Network Theory is a prolific and lively field, especially when it approaches Biology. New concepts from this theory find application in areas where extensive datasets are already available for analysis, without the need to invest money to collect them. The only tools that are necessary to accomplish an analysis are easily accessible: a computing machine and a good algorithm. As these two tools progress, thanks to technology advancement and human efforts, wider and wider datasets can be analysed. The aim of this paper is twofold. Firstly, to provide an overview of one of these concepts, which originates at the meeting point between Network Theory and Statistical Mechanics: the entropy of a network ensemble. This quantity has been described from different angles in the literature. Our approach tries to be a synthesis of the different points of view. The second part of the work is devoted to presenting a parallel algorithm that can evaluate this quantity over an extensive dataset. Eventually, the algorithm will also be used to analyse high-throughput data coming from biology.

«
1
2
...
5
6
7
8
9
10
11
...
65
66
»