962 results for Parallel track model
Abstract:
The development of atherosclerosis in the aorta is associated with low and oscillatory wall shear stress in normal patients. Moreover, localized differences in wall shear stress heterogeneity have been correlated with the presence of complex plaques in the descending aorta. While it is known that coarctation of the aorta can influence indices of wall shear stress, it is unclear how the degree of narrowing influences the resulting patterns. We hypothesized that the degree of coarctation would have a strong influence on focal heterogeneity of wall shear stress. To test this hypothesis, we modeled the fluid dynamics in a patient-specific aorta with varied degrees of coarctation. We first validated a massively parallel computational model against experimental results for the patient geometry and then evaluated local shear stress patterns for a range of degrees of coarctation. Wall shear stress patterns at two cross-sectional slices prone to develop atherosclerotic plaques were evaluated. Levels at different focal regions were compared to the conventional measure of average circumferential shear stress to enable localized quantification of coarctation-induced shear stress alteration. We find that the coarctation degree causes highly heterogeneous changes in wall shear stress.
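The comparison described above, focal wall shear stress levels against the circumferential average at a slice, can be sketched in a few lines of NumPy (the values below are illustrative placeholders, not data from the study):

```python
import numpy as np

# Hypothetical wall shear stress samples (Pa) at points around one
# cross-sectional slice of the aorta; values are illustrative only.
wss = np.array([1.2, 0.9, 0.4, 0.3, 0.8, 1.5, 1.1, 0.6])

# Conventional measure: average circumferential shear stress.
circumferential_mean = wss.mean()

# Relative deviation of each focal region from the circumferential
# average, a simple way to quantify localized shear stress alteration.
relative_deviation = (wss - circumferential_mean) / circumferential_mean
print(circumferential_mean)
print(relative_deviation)
```

A heterogeneous response shows up as large spreads in `relative_deviation` rather than a uniform shift of the mean.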
Abstract:
Processors with large numbers of cores are becoming commonplace. In order to utilise the available resources in such systems, the programming paradigm has to move towards increased parallelism. However, increased parallelism does not necessarily lead to better performance. Parallel programming models have to provide not only flexible ways of defining parallel tasks, but also efficient methods to manage the created tasks. Moreover, in a general-purpose system, applications residing in the system compete for the shared resources. Thread and task scheduling in such a multiprogrammed multithreaded environment is a significant challenge. In this thesis, we introduce a new task-based parallel reduction model, called the Glasgow Parallel Reduction Machine (GPRM). Our main objective is to provide high performance while maintaining ease of programming. GPRM supports native parallelism; it provides a modular way of expressing parallel tasks and the communication patterns between them. Compiling a GPRM program results in an Intermediate Representation (IR) containing useful information about tasks and their dependencies, as well as the initial mapping information. This compile-time information helps reduce the overhead of runtime task scheduling and is key to high performance. Generally speaking, the granularity and the number of tasks are major factors in achieving high performance. These factors are even more important in the case of GPRM, as it is highly dependent on tasks, rather than threads. We use three basic benchmarks to provide a detailed comparison of GPRM with Intel OpenMP, Cilk Plus, and Threading Building Blocks (TBB) on the Intel Xeon Phi, and with GNU OpenMP on the Tilera TILEPro64. GPRM shows superior performance in almost all cases, simply by controlling the number of tasks. GPRM also provides a low-overhead mechanism, called "Global Sharing", which improves performance in multiprogramming situations.
We use OpenMP, the most popular model for shared-memory parallel programming, as the main competitor to GPRM for solving three well-known problems on both platforms: LU Factorisation of Sparse Matrices, Image Convolution, and Linked List Processing. We focus on proposing solutions that best fit into GPRM's model of execution. GPRM outperforms OpenMP in all cases on the TILEPro64. On the Xeon Phi, our solution for the LU Factorisation results in notable performance improvement for sparse matrices with large numbers of small blocks. We investigate the overhead of GPRM's task creation and distribution for very short computations using the Image Convolution benchmark. We show that this overhead can be mitigated by combining smaller tasks into larger ones. As a result, GPRM can outperform OpenMP for convolving large 2D matrices on the Xeon Phi. Finally, we demonstrate that our parallel worksharing construct provides an efficient solution for Linked List Processing and performs better than OpenMP implementations on the Xeon Phi. The results are very promising, as they verify that our parallel programming framework for manycore processors is flexible and scalable, and can provide high performance without sacrificing productivity.
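The granularity point above, amortising per-task creation and scheduling overhead by merging short tasks into larger ones, can be sketched generically in Python (the thesis targets native runtimes, so names and the per-row workload here are purely illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def convolve_row(row):
    # Stand-in for a very short per-row computation (e.g. one row of
    # an image convolution); far too small to be a task on its own.
    return sum(row)

def convolve_chunk(rows):
    # Combining several short tasks into one larger task amortises the
    # per-task creation and distribution overhead.
    return [convolve_row(r) for r in rows]

def run(image, chunk_size):
    # One task per chunk of rows instead of one task per row.
    chunks = [image[i:i + chunk_size] for i in range(0, len(image), chunk_size)]
    with ThreadPoolExecutor() as pool:
        results = pool.map(convolve_chunk, chunks)
    return [y for chunk in results for y in chunk]

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(run(image, chunk_size=2))  # → [6, 15, 24, 33]
```

Tuning `chunk_size` is the knob the abstract alludes to: too small and overhead dominates, too large and load balance suffers.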
Abstract:
Abstract: Hydronium ions (H3O+) are formed at short times, in the spurs or along the tracks, during the radiolysis of water by ionizing radiation of low or high linear energy transfer (LET). This in situ formation of H3O+ renders the spur/track regions of the radiation temporarily more acidic than the surrounding medium. Although experimental evidence of spur acidity has previously been reported, only fragmentary information exists on its magnitude and time dependence. In this work, we determine the H3O+ concentrations and corresponding pH values as a function of time from H3O+ yields calculated using Monte Carlo simulations of the track chemistry. Four incident ions of different LET were selected, and two spur/track models were used: (1) a "spherical" isolated-spur model (low LET) and (2) a "cylindrical" track model (high LET). In all cases studied, a transient, abrupt acidic pH effect, which we call an "acid spike", is observed immediately after irradiation. This effect does not appear to have been explored in water or in a cellular environment subjected to ionizing radiation, particularly at high LET. In this regard, this work raises questions about the possible implications of this effect in radiobiology, some of which are briefly discussed. Our calculations were then extended to study the influence of temperature, from 25 to 350 °C, on the in situ formation of H3O+ ions and the acid-spike effect occurring at short times during the low-LET radiolysis of water. The results show a marked increase in the acid-spike response at high temperatures.
Since many processes occurring in the core of a water-cooled nuclear reactor depend critically on pH, the question here is whether these strong variations in acidity, even though highly localized and transient, contribute to the corrosion and damage of materials.
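The conversion underlying the acid-spike estimate is the standard definition of pH from the H3O+ molar concentration. A minimal sketch (the concentrations below are illustrative, not the simulated yields):

```python
import math

def ph_from_concentration(c_h3o):
    # pH is the negative base-10 logarithm of the H3O+ molar
    # concentration (mol/L); activity corrections are ignored here.
    return -math.log10(c_h3o)

# Illustrative values: neutral water versus a hypothetical transiently
# acidic spur region immediately after irradiation.
print(ph_from_concentration(1e-7))  # neutral water → pH 7.0
print(ph_from_concentration(1e-2))  # hypothetical acidic spur → pH 2.0
```

A local H3O+ concentration several orders of magnitude above the bulk value thus translates directly into a drop of several pH units within the spur.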
Abstract:
This paper discusses the integrated design of parallel manipulators, which exhibit varying dynamics. This characteristic affects machine stability and performance. The design methodology consists of four main steps: (i) system modeling using a flexible multibody technique, (ii) synthesis of reduced-order models suitable for control design, (iii) systematic flexible model-based input signal design, and (iv) evaluation of some possible machine designs. The novelty in this methodology is to take structural flexibilities into consideration during the input signal design, thereby enhancing the standard design process, which mainly considers rigid-body dynamics. The potential of the proposed strategy is exploited for the design evaluation of a two degree-of-freedom high-speed parallel manipulator. The results are experimentally validated.
Abstract:
The 30th ACM/SIGAPP Symposium on Applied Computing (SAC 2015), April 13-17, 2015, Embedded Systems track, Salamanca, Spain.
Abstract:
In this study we propose an evaluation of the angular effects altering the spectral response of land cover across multi-angle remote sensing image acquisitions. The shift in the statistical distribution of the pixels observed in an in-track sequence of WorldView-2 images is analyzed by means of a kernel-based measure of distance between probability distributions. Afterwards, the portability of supervised classifiers across the sequence is investigated by looking at the evolution of the classification accuracy with respect to the changing observation angle. In this context, the efficiency of various physically and statistically based preprocessing methods in obtaining angle-invariant data spaces is compared and possible synergies are discussed.
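The abstract does not name the kernel-based distance it uses; a common choice for comparing pixel distributions in this way is the maximum mean discrepancy (MMD), sketched here with a Gaussian kernel purely as an illustration of the idea:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel values between sample matrices
    # x (n, d) and y (m, d).
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    # Biased estimate of the squared maximum mean discrepancy between
    # the distributions that generated samples x and y.
    return (gaussian_kernel(x, x, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean()
            - 2 * gaussian_kernel(x, y, sigma).mean())

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(200, 4))  # pixels at one view angle
b = rng.normal(0.5, 1.0, size=(200, 4))  # same scene, shifted distribution
print(mmd2(a, a) < mmd2(a, b))           # angular shift raises the distance
```

A growing MMD along the in-track sequence would quantify exactly the kind of angle-induced distribution shift the study analyzes.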
Abstract:
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Abstract:
This thesis defines Pi, a parallel architecture interface that separates model and machine issues, allowing them to be addressed independently. This provides greater flexibility for both the model and machine builder. Pi addresses a set of common parallel model requirements including low latency communication, fast task switching, low cost synchronization, efficient storage management, the ability to exploit locality, and efficient support for sequential code. Since Pi provides generic parallel operations, it can efficiently support many parallel programming models including hybrids of existing models. Pi also forms a basis of comparison for architectural components.
Abstract:
This article describes a novel algorithmic development extending the contour advective semi-Lagrangian model to include nonconservative effects. The Lagrangian contour representation of finescale tracer fields, such as potential vorticity, allows for conservative, nondiffusive treatment of sharp gradients, allowing very high numerical Reynolds numbers. It has been widely employed in accurate geostrophic turbulence and tracer advection simulations. In the present diabatic version of the model, the constraint of conservative dynamics is overcome by including a parallel Eulerian field that absorbs the nonconservative (diabatic) tendencies. The diabatic buildup in this Eulerian field is limited through regular, controlled transfers of this field to the contour representation. This transfer is done with a fast, newly developed contouring algorithm. The model has been implemented for several idealized geometries. In this paper a single-layer doubly periodic geometry is used to demonstrate the validity of the model. The present model converges faster than the analogous semi-Lagrangian models at increased resolutions. At the same nominal spatial resolution the new model is 40 times faster than the analogous semi-Lagrangian model. Results of an orographically forced idealized storm track show nontrivial dependency of storm-track statistics on resolution and on the numerical model employed. If this result is more generally applicable, it may have important consequences for future high-resolution climate modeling.
Abstract:
A high resolution regional atmosphere model is used to investigate the sensitivity of the North Atlantic storm track to the spatial and temporal resolution of the sea surface temperature (SST) data used as a lower boundary condition. The model is run over an unusually large domain covering all of the North Atlantic and Europe, and is shown to produce a very good simulation of the observed storm track structure. The model is forced at the lateral boundaries with 15–20 years of data from the ERA-40 reanalysis, and at the lower boundary by SST data of differing resolution. The impacts of increasing spatial and temporal resolution are assessed separately, and in both cases increasing the resolution leads to subtle, but significant changes in the storm track. In some, but not all cases these changes act to reduce the small storm track biases seen in the model when it is forced with low-resolution SSTs. In addition there are several clear mesoscale responses to increased spatial SST resolution, with surface heat fluxes and convective precipitation increasing by 10–20% along the Gulf Stream SST gradient.
Abstract:
The Danish Eulerian Model (DEM) is a powerful air pollution model, designed to calculate the concentrations of various dangerous species over a large geographical region (e.g. Europe). It takes into account the main physical and chemical processes between these species, the actual meteorological conditions, emissions, etc. This is a huge computational task and requires significant resources of storage and CPU time. Parallel computing is essential for the efficient practical use of the model. Some efficient parallel versions of the model were created over the past several years. A suitable parallel version of DEM using the Message Passing Interface (MPI) library was implemented on two powerful supercomputers at EPCC, Edinburgh, available via the HPC-Europa programme for transnational access to research infrastructures in the EC: a Sun Fire E15K and an IBM HPCx cluster. Although the implementation is, in principle, the same for both supercomputers, a few modifications had to be made for successful porting of the code to the IBM HPCx cluster. Performance analysis and parallel optimization were carried out next. Results from benchmarking experiments are presented in this paper. Another set of experiments was carried out in order to investigate the sensitivity of the model to variation of some chemical rate constants in the chemical submodel. Certain modifications of the code were necessary in accordance with this task. The obtained results will be used for further sensitivity analysis studies using Monte Carlo simulation.
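The Monte Carlo sensitivity idea mentioned above, rerunning the model with perturbed chemical rate constants and observing the spread in the output, can be sketched generically (the toy `model_output` below merely stands in for a full DEM run; it is not the actual chemistry submodel):

```python
import random

def model_output(rate_constants):
    # Stand-in for one model run: in reality this would be the full
    # chemistry submodel; here a toy function of two rate constants.
    k1, k2 = rate_constants
    return k1 / (k1 + k2)

def monte_carlo_sensitivity(base, rel_spread, n_samples, seed=42):
    # Perturb each rate constant uniformly within +/- rel_spread of its
    # nominal value and collect the resulting model outputs.
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        sample = [k * (1 + rel_spread * rng.uniform(-1, 1)) for k in base]
        outputs.append(model_output(sample))
    return outputs

outs = monte_carlo_sensitivity(base=[0.3, 0.7], rel_spread=0.2, n_samples=1000)
print(min(outs), max(outs))  # spread of outputs reflects the sensitivity
```

The width of the output distribution relative to the imposed ±20% input spread is the basic sensitivity measure such a study reports.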
Abstract:
Large-scale air pollution models are powerful tools, designed to meet the increasing demand in different environmental studies. The atmosphere is the most dynamic component of the environment, where pollutants can be transported quickly over long distances. Therefore air pollution modelling must be done over a large computational domain. Moreover, all relevant physical, chemical and photochemical processes must be taken into account. In such complex models, operator splitting is very often applied in order to achieve sufficient accuracy as well as efficiency of the numerical solution. The Danish Eulerian Model (DEM) is one of the most advanced such models. Its space domain (4800 × 4800 km) covers Europe, most of the Mediterranean, and neighbouring parts of Asia and the Atlantic Ocean. Efficient parallelization is crucial for the performance and practical capabilities of this huge computational model. Different splitting schemes, based on the main processes mentioned above, have been implemented and tested with respect to accuracy and performance in the new version of DEM. Some numerical results of these experiments are presented in this paper.
Abstract:
A connection between a fuzzy neural network model and the mixture of experts network (MEN) modelling approach is established. Based on this linkage, two new neuro-fuzzy MEN construction algorithms are proposed to overcome the curse of dimensionality that is inherent in the majority of associative memory networks and/or other rule-based systems. The first construction algorithm employs a function selection manager module in an MEN system. The second construction algorithm is based on a new parallel learning algorithm in which each model rule is trained independently, and the parameter convergence property of the new learning method is established. As with the first approach, an expert selection criterion is utilised in this algorithm. These two construction methods are equally effective in overcoming the curse of dimensionality by reducing the dimensionality of the regression vector, but the latter has the additional computational advantage of parallel processing. The proposed algorithms are analysed for effectiveness, followed by numerical examples that illustrate their efficacy on some difficult data-based modelling problems.
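The parallel learning idea above, each rule (expert) trained independently on its own slice of the data, can be sketched generically; the trivial mean-predictor "expert" below is an illustrative stand-in, not the paper's learning rule:

```python
from concurrent.futures import ThreadPoolExecutor

def train_expert(data):
    # Fit one local (rule-level) model independently of all others;
    # a trivial mean predictor stands in for a real expert here.
    xs, ys = data
    return sum(ys) / len(ys)

# Each expert sees only the samples routed to its local region of the
# input space, which is what reduces the effective dimensionality.
partitions = [
    ([0.1, 0.2], [1.0, 1.2]),
    ([0.8, 0.9], [3.0, 3.4]),
]

# Because the experts share no parameters, they can be trained in
# parallel with no synchronisation between them.
with ThreadPoolExecutor() as pool:
    experts = list(pool.map(train_expert, partitions))
print(experts)  # one parameter set per independently trained rule
```

Independence between rules is what makes the convergence of each expert analysable in isolation and the whole construction embarrassingly parallel.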
Abstract:
We have optimised the atmospheric radiation algorithm of the FAMOUS climate model on several hardware platforms. The optimisation involved translating the Fortran code to C and restructuring the algorithm around the computation of a single air column. Instead of the existing MPI-based domain decomposition, we used a task queue and a thread pool to schedule the computation of individual columns on the available processors. Finally, four air columns are packed together in a single data structure and computed simultaneously using Single Instruction Multiple Data operations. The modified algorithm runs more than 50 times faster on the CELL’s Synergistic Processing Elements than on its main PowerPC processing element. On Intel-compatible processors, the new radiation code runs 4 times faster. On the tested graphics processor, using OpenCL, we find a speed-up of more than 2.5 times as compared to the original code on the main CPU. Because the radiation code takes more than 60% of the total CPU time, FAMOUS executes more than twice as fast. Our version of the algorithm returns bit-wise identical results, which demonstrates the robustness of our approach. We estimate that this project required around two and a half man-years of work.
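The four-columns-at-a-time idea, packing independent air columns into one data structure so each arithmetic operation advances all of them at once, can be sketched with NumPy, whose vectorized array operations play the role of the SIMD instructions (illustrative only; the actual radiation code is far more involved):

```python
import numpy as np

N_LEVELS = 5

def radiation_column(column):
    # Scalar reference: toy per-level "radiative" update for one column.
    return [0.5 * x + 1.0 for x in column]

def radiation_packed(columns4):
    # columns4 has shape (levels, 4): four columns side by side, so a
    # single arithmetic operation applies to all four columns at once,
    # mirroring a 4-wide SIMD lane.
    return 0.5 * columns4 + 1.0

cols = [[float(i + j) for i in range(N_LEVELS)] for j in range(4)]
packed = np.array(cols).T                  # shape (levels, 4)
out_packed = radiation_packed(packed)

# The packed result matches the column-by-column reference exactly,
# echoing the bit-wise reproducibility reported for the real code.
ref = np.array([radiation_column(c) for c in cols]).T
print(np.array_equal(out_packed, ref))
```

Because the per-column arithmetic is identical and order-preserving, vectorizing across columns changes the schedule but not the results, which is what makes bit-wise identical output achievable.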