989 results for Parallel computation


Relevance:

70.00%

Publisher:

Abstract:

The central product of the DRAMA (Dynamic Re-Allocation of Meshes for parallel Finite Element Applications) project is a library comprising a variety of tools for dynamic re-partitioning of unstructured Finite Element (FE) applications. The input to the DRAMA library is the computational mesh, and corresponding costs, partitioned into sub-domains. The core library functions then perform a parallel computation of a mesh re-allocation that will re-balance the costs based on the DRAMA cost model. We discuss the basic features of this cost model, which allows a general approach to load identification, modelling and imbalance minimisation. Results from crash simulations are presented which show the necessity for multi-phase/multi-constraint partitioning components.
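
As a rough illustration of what re-balancing sub-domain costs aims at, the sketch below computes a generic load-imbalance ratio (largest sub-domain cost divided by the mean cost). It is not the DRAMA cost model or the DRAMA library interface; the function name `load_imbalance` and the cost values are hypothetical.

```python
def load_imbalance(costs):
    """Return max cost / mean cost over sub-domains (1.0 means perfectly balanced).

    Generic metric for illustration only; not the DRAMA cost model.
    """
    if not costs:
        raise ValueError("need at least one sub-domain cost")
    mean = sum(costs) / len(costs)
    if mean <= 0:
        raise ValueError("mean cost must be positive")
    return max(costs) / mean


# Hypothetical per-sub-domain costs before and after a re-partitioning step.
costs_before = [120.0, 80.0, 100.0, 100.0]
costs_after = [100.0, 100.0, 100.0, 100.0]

imbalance_before = load_imbalance(costs_before)
imbalance_after = load_imbalance(costs_after)
print(imbalance_before, imbalance_after)
```

In a parallel run the slowest sub-domain sets the pace, so a ratio above 1.0 measures how much time the other processes spend waiting.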

Relevance:

70.00%

Publisher:

Abstract:

We describe an approach aimed at addressing the issue of joint exploitation of control (stream) and data parallelism in a skeleton-based parallel programming environment, based on annotations and refactoring. Annotations drive efficient implementation of a parallel computation. Refactoring is used to transform the associated skeleton tree into a more efficient, functionally equivalent skeleton tree. In most cases, cost models are used to drive the refactoring process. We show how sample use case applications/kernels may be optimized and discuss preliminary experiments with FastFlow assessing the theoretical results. © 2013 Springer-Verlag.

Relevance:

70.00%

Publisher:

Abstract:

Recent technological advances and market trends are driving a convergence of the High-Performance Computing (HPC) and Embedded Computing (EC) domains. On one side, new kinds of HPC applications are required by markets that need huge amounts of information to be processed within a bounded amount of time. On the other side, EC systems are increasingly concerned with providing higher performance in real time, challenging the performance capabilities of current architectures. The advent of next-generation many-core embedded platforms offers a chance to meet this converging need for predictable high performance, allowing HPC and EC applications to be executed on efficient and powerful heterogeneous architectures that integrate general-purpose processors with many-core computing fabrics. To this end, it is of paramount importance to develop new techniques for exploiting the massively parallel computation capabilities of such platforms in a predictable way. P-SOCRATES will tackle this challenge by bringing together leading research groups from the HPC and EC communities. The time-criticality and parallelisation challenges common to both areas will be addressed by proposing an integrated framework for executing workload-intensive applications with real-time requirements on top of next-generation commercial-off-the-shelf (COTS) platforms based on many-core accelerated architectures. The project will investigate new HPC techniques that fulfil real-time requirements. The main sources of indeterminism will be identified, and efficient mapping and scheduling algorithms will be proposed, along with the associated timing and schedulability analysis, to guarantee the real-time and performance requirements of the applications.

Relevance:

70.00%

Publisher:

Abstract:

Very large spatially-referenced datasets, for example, those derived from satellite-based sensors which sample across the globe or large monitoring networks of individual sensors, are becoming increasingly common and more widely available for use in environmental decision making. In large or dense sensor networks, huge quantities of data can be collected over small time periods. In many applications the generation of maps, or predictions at specific locations, from the data in (near) real-time is crucial. Geostatistical operations such as interpolation are vital in this map-generation process, and in emergency situations the resulting predictions need to be available almost instantly, so that decision makers can make informed decisions and define risk and evacuation zones. It is also helpful in less time-critical applications, for example when interacting directly with the data for exploratory analysis, that the algorithms are responsive within a reasonable time frame. Performing geostatistical analysis on such large spatial datasets can present a number of problems, particularly where maximum likelihood inference is used. Although the storage requirements scale only linearly with the number of observations in the dataset, the computational complexity in terms of memory and time scales quadratically and cubically, respectively. Most modern commodity hardware has at least two processor cores. Other mechanisms for parallel computation, such as Grid-based systems, are also becoming increasingly available. However, there currently seems to be little interest in exploiting this extra processing power within the context of geostatistics. In this paper we review the existing parallel approaches for geostatistics. By recognising that different natural parallelisms exist and can be exploited depending on whether the dataset is sparsely or densely sampled with respect to the range of variation, we introduce two contrasting novel implementations of parallel algorithms based on approximating the data likelihood, extending the methods of Vecchia [1988] and Tresp [2000]. Using parallel maximum likelihood variogram estimation and parallel prediction algorithms we show that computational time can be significantly reduced. We demonstrate this with both sparsely sampled data and densely sampled data on a variety of architectures ranging from the common dual-core processor, found in many modern desktop computers, to large multi-node supercomputers. To highlight the strengths and weaknesses of the different methods we employ synthetic data sets and go on to show how the methods allow maximum likelihood based inference on the exhaustive Walker Lake data set.

Relevance:

60.00%

Publisher:

Abstract:

This paper compares the performance of two different optimisation techniques for solving inverse problems: the first is the Hierarchical Asynchronous Parallel Evolutionary Algorithms software (HAPEA) and the second is implemented with a game strategy named Nash-EA. The HAPEA software is based on a hierarchical topology and asynchronous parallel computation. The Nash-EA methodology is introduced as a distributed virtual game and consists of splitting the wing design variables (aerofoil sections) among players, each optimising their own strategy. The HAPEA and Nash-EA software methodologies are applied to a single-objective aerodynamic ONERA M6 wing reconstruction. Numerical results from the two approaches are compared in terms of model quality and computational expense, and demonstrate the superiority of the distributed Nash-EA methodology in a parallel environment for a similar design quality.

Relevance:

60.00%

Publisher:

Abstract:

The Node-based Local Mesh Generation (NLMG) algorithm, which is free of mesh inconsistency, is one of the core algorithms in the Node-based Local Finite Element Method (NLFEM). It provides a seamless link between mesh generation and stiffness matrix calculation, and this seamless link helps to improve the parallel efficiency of FEM. The key to ensuring the efficiency and reliability of NLMG is to determine the candidate satellite-node set of a central node quickly and accurately. This paper develops a Fast Local Search Method based on Uniform Bucket (FLSMUB) and a Fast Local Search Method based on Multilayer Bucket (FLSMMB), and applies them successfully to the decisive problem, i.e. presenting the candidate satellite-node set of any central node in the NLMG algorithm. Using FLSMUB or FLSMMB, the NLMG algorithm becomes a practical tool to reduce the parallel computation cost of FEM. Parallel numerical experiments validate that both FLSMUB and FLSMMB are fast, reliable and efficient for the problems they suit, and that they are especially effective for large-scale parallel problems.
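
The idea of a bucket-based local search can be sketched as follows. This is a generic uniform-grid neighbour search in 2D, not the FLSMUB or FLSMMB algorithm from the paper; the names `build_buckets` and `candidate_nodes`, the cell size and the search radius are assumptions made for illustration.

```python
from collections import defaultdict
from math import ceil, floor, hypot


def build_buckets(points, cell):
    """Hash each 2D node index into a square bucket of side `cell`."""
    buckets = defaultdict(list)
    for i, (x, y) in enumerate(points):
        buckets[(floor(x / cell), floor(y / cell))].append(i)
    return buckets


def candidate_nodes(points, buckets, cell, center, radius):
    """Indices of nodes within `radius` of node `center`, excluding itself.

    Only buckets that can overlap the search disc are visited, so the cost
    depends on local node density rather than on the total node count.
    """
    cx, cy = points[center]
    bx, by = floor(cx / cell), floor(cy / cell)
    reach = ceil(radius / cell)  # number of bucket rings to inspect
    found = []
    for dx in range(-reach, reach + 1):
        for dy in range(-reach, reach + 1):
            for i in buckets.get((bx + dx, by + dy), ()):
                x, y = points[i]
                if i != center and hypot(x - cx, y - cy) <= radius:
                    found.append(i)
    return sorted(found)


# Hypothetical node coordinates.
nodes = [(0.0, 0.0), (0.5, 0.0), (1.5, 0.0), (3.0, 3.0), (0.0, 0.9)]
grid = build_buckets(nodes, cell=1.0)
print(candidate_nodes(nodes, grid, cell=1.0, center=0, radius=1.0))
```

Because each central node only reads the shared bucket table, the searches for different central nodes are independent of one another and can be distributed across processes.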

Relevance:

60.00%

Publisher:

Abstract:

The load–frequency control (LFC) problem has been one of the major subjects in power systems. In practice, LFC systems use proportional–integral (PI) controllers. However, since these controllers are designed using a linear model, the non-linearities of the system are not accounted for, and they are incapable of achieving good dynamical performance over a wide range of operating conditions in a multi-area power system. Given the distributed nature of a multi-area power system, a strategy for solving this problem is presented that uses a multi-agent reinforcement learning (MARL) approach. It consists of two agents in each power area: the estimator agent provides the area control error (ACE) signal based on the frequency bias estimation, and the controller agent uses reinforcement learning to control the power system, with genetic algorithm optimisation used to tune its parameters. This method does not depend on any knowledge of the system and it admits considerable flexibility in defining the control objective. Also, by finding the ACE signal based on the frequency bias estimation the LFC performance is improved, and by using MARL, parallel computation is realised, leading to a high degree of scalability. To illustrate the accuracy of the proposed approach, a three-area power system example is given with two scenarios.

Relevance:

60.00%

Publisher:

Abstract:

A considerable amount of research has proposed optimization-based approaches employing various vibration parameters for structural damage diagnosis. The damage detection by these methods is in fact a result of updating the analytical structural model in line with the current physical model. The feasibility of these approaches has been proven, but most of the verification has been done on simple structures, such as beams or plates. When applied to a complex structure, such as a steel truss bridge, a traditional optimization process consumes massive computational resources and converges slowly. This study presents a multi-layer genetic algorithm (ML-GA) to overcome the problem. Unlike the tedious convergence process in a conventional damage optimization process, in each layer the proposed algorithm divides the GA’s population into groups with a smaller number of damage candidates; then, the converged population in each group evolves as an initial population of the next layer, where the groups merge into larger groups. In a damage detection process featuring ML-GA, parallel computation can be implemented, so the optimization performance and computational efficiency can be enhanced. In order to assess the proposed algorithm, the modal strain energy correlation (MSEC) has been considered as the objective function. Several damage scenarios of a complex steel truss bridge’s finite element model have been employed to evaluate the effectiveness and performance of ML-GA against a conventional GA. In both single- and multiple-damage scenarios, the analytical and experimental study shows that the MSEC index has achieved excellent damage indication and efficiency using the proposed ML-GA, whereas the conventional GA only converges at a local solution.

Relevance:

60.00%

Publisher:

Abstract:

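When solving systems of partial differential equations on computational domains with complex boundaries, domain partitioning and parallel computation are often required, and the partitioning method directly determines the degree of parallelism achievable in the numerical computation. In the course of solving the Euler equations with a time operator-splitting method, this paper proposes a partitioning technique that makes parallel computation very easy to implement.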

Relevance:

60.00%

Publisher:

Abstract:

A new algorithm that captures shock waves and contact discontinuities in the flow field and is easy to program for parallel computation is developed to solve the N-S equations for the simulation of R-M instability problems. The method with group velocity control is used to suppress numerical oscillations, and an adaptive non-uniform mesh is used to obtain fine resolution. Numerical results are presented for cylindrical shock-cylindrical interface interaction with a shock Mach number Ms=1.2 and Atwood number A=0.818, 0.961, 0.980 (ratio of the density inside the interface to the outer density ρ1/ρ2 = 10, 50, 100, respectively), and for planar shock-spherical interface interaction with Ms=1.2 and ρ1/ρ2 = 14.28. The effects of the Atwood number and of multi-mode initial perturbations on the R-M instability are studied. Multiple collisions of the reflected shock with the interface are a main cause of the nonlinear development of the interface instability and of the formation of spike-bubble structures. In simulations with double-mode perturbation, vortex merging and a second instability are found. After the second instability, small vortex structures are produced near the interface, which is an important factor for turbulent mixing.
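
The quoted Atwood numbers can be checked against the quoted density ratios. The short sketch below assumes the standard definition A = (ρ1 − ρ2)/(ρ1 + ρ2), which reduces to (r − 1)/(r + 1) for a density ratio r = ρ1/ρ2; the function name `atwood_number` is hypothetical.

```python
def atwood_number(density_ratio):
    """Atwood number for a density ratio r = rho1 / rho2, i.e. (r - 1) / (r + 1)."""
    if density_ratio <= 0:
        raise ValueError("density ratio must be positive")
    return (density_ratio - 1.0) / (density_ratio + 1.0)


# Density ratios quoted in the abstract for the cylindrical cases.
for ratio in (10.0, 50.0, 100.0):
    print(ratio, round(atwood_number(ratio), 3))
```

Under this definition the ratios 10, 50 and 100 give 0.818, 0.961 and 0.980, matching the abstract, and the planar case with a ratio of 14.28 corresponds to an Atwood number of about 0.869.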

Relevance:

60.00%

Publisher:

Abstract:

Optical Coherence Tomography (OCT) is a popular, rapidly growing imaging technique with an increasing number of bio-medical applications due to its noninvasive nature. However, there are three major challenges in understanding and improving an OCT system: (1) Obtaining an OCT image is not easy. It either takes a real medical experiment or requires days of computer simulation. Without much data, it is difficult to study the physical processes underlying OCT imaging of different objects simply because there aren't many imaged objects. (2) Interpretation of an OCT image is also hard. This challenge is more profound than it appears. For instance, it would require a trained expert to tell from an OCT image of human skin whether there is a lesion or not. This is expensive in its own right, but even the expert cannot be sure about the exact size of the lesion or the width of the various skin layers. The take-away message is that analyzing an OCT image even from a high level would usually require a trained expert, and pixel-level interpretation is simply unrealistic. The reason is simple: we have OCT images but not their underlying ground-truth structure, so there is nothing to learn from. (3) The imaging depth of OCT is very limited (millimeter or sub-millimeter on human tissues). While OCT utilizes infrared light for illumination to stay noninvasive, the downside of this is that photons at such long wavelengths can only penetrate a limited depth into the tissue before getting back-scattered. To image a particular region of a tissue, photons first need to reach that region. As a result, OCT signals from deeper regions of the tissue are both weak (since few photons reached there) and distorted (due to multiple scatterings of the contributing photons). This fact alone makes OCT images very hard to interpret.

This thesis addresses the above challenges by successfully developing an advanced Monte Carlo simulation platform which is 10000 times faster than the state-of-the-art simulator in the literature, bringing down the simulation time from 360 hours to a single minute. This powerful simulation tool not only enables us to efficiently generate as many OCT images of objects with arbitrary structure and shape as we want on a common desktop computer, but it also provides us with the underlying ground-truth of the simulated images at the same time, because we dictate them at the beginning of the simulation. This is one of the key contributions of this thesis. What allows us to build such a powerful simulation tool includes a thorough understanding of the signal formation process, clever implementation of the importance sampling/photon splitting procedure, efficient use of a voxel-based mesh system in determining photon-mesh interception, and a parallel computation of the different A-scans that make up a full OCT image, among other programming and mathematical tricks, which will be explained in detail later in the thesis.
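
Computing the A-scans of one image in parallel is possible because each A-scan (one image column) can be simulated independently. The sketch below shows only that structure. `simulate_a_scan` is a hypothetical toy stand-in, not the thesis's Monte Carlo simulator, and a thread pool is used purely to keep the example portable; a CPU-bound simulator would more likely use separate processes or a GPU.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_a_scan(column, depth=8, photons=1000):
    """Toy stand-in for one A-scan: count photons returning from each depth bin."""
    rng = random.Random(column)  # independent, reproducible stream per column
    profile = [0] * depth
    for _ in range(photons):
        # Toy model: deeper bins are exponentially less likely to be reached.
        bin_index = min(int(rng.expovariate(1.0) * 2), depth - 1)
        profile[bin_index] += 1
    return profile


def simulate_image(n_columns, workers=4):
    """Simulate all columns concurrently; `map` returns them in column order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate_a_scan, range(n_columns)))


image = simulate_image(5)
print(len(image), len(image[0]))
```

Seeding each column separately keeps the result independent of how the work is scheduled, so the parallel image is identical to the one produced by a serial loop.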

Next we aim at the inverse problem: given an OCT image, predict/reconstruct its ground-truth structure on a pixel level. By solving this problem we would be able to interpret an OCT image completely and precisely without help from a trained expert. It turns out that we can do much better. For simple structures we are able to reconstruct the ground-truth of an OCT image more than 98% correctly, and for more complicated structures (e.g., a multi-layered brain structure) we are looking at 93%. We achieved this through extensive use of Machine Learning. The success of the Monte Carlo simulation already puts us in a great position by providing us with a great deal of data (effectively unlimited), in the form of (image, truth) pairs. Through a transformation of the high-dimensional response variable, we convert the learning task into a multi-output multi-class classification problem and a multi-output regression problem. We then build a hierarchical architecture of machine learning models (committee of experts) and train different parts of the architecture with specifically designed data sets. In prediction, an unseen OCT image first goes through a classification model to determine its structure (e.g., the number and the types of layers present in the image); then the image is handed to a regression model that is trained specifically for that particular structure to predict the length of the different layers and by doing so reconstruct the ground-truth of the image. We also demonstrate that ideas from Deep Learning can be useful to further improve the performance.

It is worth pointing out that solving the inverse problem automatically improves the imaging depth, since previously the lower half of an OCT image (i.e., greater depth) could hardly be seen but now becomes fully resolved. Interestingly, although the OCT signals making up the lower half of the image are weak, messy, and uninterpretable to human eyes, they still carry enough information that, when fed into a well-trained machine learning model, yields precisely the true structure of the object being imaged. This is just another case where Artificial Intelligence (AI) outperforms humans. To the best knowledge of the author, this thesis is not only a success but also the first attempt to reconstruct an OCT image at a pixel level. Even attempting this kind of task requires fully annotated OCT images, and a lot of them (hundreds or even thousands). This is clearly impossible without a powerful simulation tool like the one developed in this thesis.

Relevance:

60.00%

Publisher:

Abstract:

Techniques based on the Tikhonov functional have been widely used in image processing in recent years. The basic idea is to modify an initial image via a convolution equation and find a parameter that minimizes this functional in order to obtain an approximation of the original image. However, a typical problem with this method is selecting a regularization parameter that suitably balances the accuracy and the stability of the solution. A method developed by researchers at IPRJ and UFRJ working on inverse problems consists of minimizing a residual functional through the Tikhonov regularization parameter. A strategy that searches iteratively for this parameter, aiming to obtain a minimum value of the functional at the next iteration, was recently adopted in a serial restoration algorithm. However, computational cost is a problem when the iterative search method is used. With this in mind, this work presents a C++ implementation that uses parallel computing techniques with MPI (Message Passing Interface) for the functional-minimization strategy with the iterative search method, thereby reducing the execution time required by the algorithm. A modified version of the Jacobi method is considered in two versions of the algorithm, one serial and one parallel. This algorithm is suitable for parallel implementation because it has no data dependencies of the kind found in Gauss-Seidel, which is also shown to converge. As a performance indicator for evaluating the restoration algorithm, in addition to the traditional measures, a new metric based on subjective criteria, called IWMSE (Information Weighted Mean Square Error), is employed. These metrics were introduced into the serial image-processing program and allow the restoration to be analysed at each iteration step. The results obtained with the two versions made it possible to verify the speedup and efficiency of the parallel implementation. The parallel method gave satisfactory results in a shorter processing time and with acceptable performance.
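
The remark about data dependencies can be made concrete with a minimal sketch of the plain Jacobi iteration. This is the textbook method in Python, not the modified Jacobi variant or the C++/MPI implementation described in the abstract; the function names and the small test system are assumptions made for illustration.

```python
def jacobi_step(A, b, x):
    """One Jacobi sweep: every new component is computed from the OLD vector x.

    Because no component depends on another component of the same sweep,
    the entries of the result could each be computed by a different process.
    Gauss-Seidel instead reuses already-updated components within a sweep,
    which introduces a sequential dependency.
    """
    n = len(b)
    return [
        (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
        for i in range(n)
    ]


def jacobi(A, b, iterations=50):
    """Run a fixed number of Jacobi sweeps starting from the zero vector."""
    x = [0.0] * len(b)
    for _ in range(iterations):
        x = jacobi_step(A, b, x)
    return x


# Hypothetical diagonally dominant system whose exact solution is [1, 1, 1].
A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]
solution = jacobi(A, b)
print(solution)
```

In a message-passing setting each process would own a block of rows and exchange the updated vector once per sweep, which is the pattern that the absence of in-sweep dependencies makes possible.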

Relevance:

60.00%

Publisher:

Abstract:

Model Predictive Control (MPC) is increasingly being proposed for application to miniaturized devices, fast and/or embedded systems. A major obstacle to this is its computation time requirement. Continuing our previous studies of implementing constrained MPC on Field Programmable Gate Arrays (FPGA), this paper begins to exploit the possibilities of parallel computation, with the aim of speeding up the MPC implementation. Simulation studies on a realistic example show that it is possible to implement constrained MPC on an FPGA chip with a 25MHz clock and achieve MPC implementation rates comparable to those achievable on a Pentium 3.0 GHz PC. Copyright © 2007 International Federation of Automatic Control All Rights Reserved.

Relevance:

60.00%

Publisher:

Abstract:

We present a video-based system which interactively captures the geometry of a 3D object in the form of a point cloud, then recognizes and registers known objects in this point cloud in a matter of seconds (fig. 1). In order to achieve interactive speed, we exploit both efficient inference algorithms and parallel computation, often on a GPU. The system can be broken down into two distinct phases: geometry capture, and object inference. We now discuss these in further detail. © 2011 IEEE.