868 resultados para PARALLEL WORKSTATIONS


Relevância:

20.00% 20.00%

Publicador:

Resumo:

This article describes advances in statistical computation for large-scale data analysis in structured Bayesian mixture models via graphics processing unit (GPU) programming. The developments are partly motivated by computational challenges arising in fitting models of increasing heterogeneity to increasingly large datasets. An example context concerns common biological studies using high-throughput technologies generating many, very large datasets and requiring increasingly high-dimensional mixture models with large numbers of mixture components.We outline important strategies and processes for GPU computation in Bayesian simulation and optimization approaches, give examples of the benefits of GPU implementations in terms of processing speed and scale-up in ability to analyze large datasets, and provide a detailed, tutorial-style exposition that will benefit readers interested in developing GPU-based approaches in other statistical models. Novel, GPU-oriented approaches to modifying existing algorithms software design can lead to vast speed-up and, critically, enable statistical analyses that presently will not be performed due to compute time limitations in traditional computational environments. Supplementalmaterials are provided with all source code, example data, and details that will enable readers to implement and explore the GPU approach in this mixture modeling context. © 2010 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Alewife, Alosa pseudoharengus, populations occur in two discrete life-history variants, an anadromous form and a landlocked (freshwater resident) form. Landlocked populations display a consistent pattern of life-history divergence from anadromous populations, including earlier age at maturity, smaller adult body size, and reduced fecundity. In Connecticut (USA), dams constructed on coastal streams separate anadromous spawning runs from lake-resident landlocked populations. Here, we used sequence data from the mtDNA control region and allele frequency data from five microsatellite loci to ask whether coastal Connecticut landlocked alewife populations are independently evolved from anadromous populations or whether they share a common freshwater ancestor. We then used microsatellite data to estimate the timing of the divergence between anadromous and landlocked populations. Finally, we examined anadromous and landlocked populations for divergence in foraging morphology and used divergence time estimates to calculate the rate of evolution for foraging traits. Our results indicate that landlocked populations have evolved multiple times independently. Tests of population divergence and estimates of gene flow show that landlocked populations are genetically isolated, whereas anadromous populations exchange genes. These results support a 'phylogenetic raceme' model of landlocked alewife divergence, with anadromous populations forming an ancestral core from which landlocked populations independently diverged. Divergence time estimates suggest that landlocked populations diverged from a common anadromous ancestor no longer than 5000 years ago and perhaps as recently as 300 years ago, depending on the microsatellite mutation rate assumed. Examination of foraging traits reveals landlocked populations to have significantly narrower gapes and smaller gill raker spacings than anadromous populations, suggesting that they are adapted to foraging on smaller prey items. Estimates of evolutionary rates (in haldanes) indicate rapid evolution of foraging traits, possibly in response to changes in available resources.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An abstract of this work will be presented at the Compiler, Architecture and Tools Conference (CATC), Intel Development Center, Haifa, Israel November 23, 2015.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The availability of a very accurate dependence graph for a scalar code is the basis for the automatic generation of an efficient parallel implementation. The strategy for this task which is encapsulated in a comprehensive data partitioning code generation algorithm is described. This algorithm involves the data partition, calculation of assignment ranges for partitioned arrays, addition of a comprehensive set of execution control masks, altering loop limits, addition and optimisation of communications for all data. In this context, the development and implementation of strategies to merge communications wherever possible has proved an important feature in producing efficient parallel implementations for numerical mesh based codes. The code generation strategies described here are embedded within the Computer Aided Parallelisation tools (CAPTools) software as a key part of a toolkit for automating as much as possible of the parallelisation process for mesh based numerical codes. The algorithms used enables parallelisation of real computational mechanics codes with only minor user interaction and without any prior manual customisation of the serial code to suit the parallelisation tool.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

User supplied knowledge and interaction is a vital component of a toolkit for producing high quality parallel implementations of scalar FORTRAN numerical code. In this paper we consider the necessary components that such a parallelisation toolkit should possess to provide an effective environment to identify, extract and embed user relevant user knowledge. We also examine to what extent these facilities are available in leading parallelisation tools; in particular we discuss how these issues have been addressed in the development of the user interface of the Computer Aided Parallelisation Tools (CAPTools). The CAPTools environment has been designed to enable user exploration, interaction and insertion of user knowledge to facilitate the automatic generation of very efficient parallel code. A key issue in the user's interaction is control of the volume of information so that the user is focused on only that which is needed. User control over the level and extent of information revealed at any phase is supplied using a wide variety of filters. Another issue is the way in which information is communicated. Dependence analysis and its resulting graphs involve a lot of sophisticated rather abstract concepts unlikely to be familiar to most users of parallelising tools. As such, considerable effort has been made to communicate with the user in terms that they will understand. These features, amongst others, and their use in the parallelisation process are described and their effectiveness discussed.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A parallel method for the dynamic partitioning of unstructured meshes is described. The method introduces a new iterative optimization technique known as relative gain optimization which both balances the workload and attempts to minimize the interprocessor communications overhead. Experiments on a series of adaptively refined meshes indicate that the algorithm provides partitions of an equivalent or higher quality to static partitioners (which do not reuse the existing partition) and much more rapidly. Perhaps more importantly, the algorithm results in only a small fraction of the amount of data migration compared to the static partitioners.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Temperature distributions involved in some metal-cutting or surface-milling processes may be obtained by solving a non-linear inverse problem. A two-level concept on parallelism is introduced to compute such temperature distribution. The primary level is based on a problem-partitioning concept driven by the nature and properties of the non-linear inverse problem. Such partitioning results to a coarse-grained parallel algorithm. A simplified 2-D metal-cutting process is used as an example to illustrate the concept. A secondary level exploitation of further parallel properties based on the concept of domain-data parallelism is explained and implemented using MPI. Some experiments were performed on a network of loosely coupled machines consist of SUN Sparc Classic workstations and a network of tightly coupled processors, namely the Origin 2000.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Parallel computing is now widely used in numerical simulation, particularly for application codes based on finite difference and finite element methods. A popular and successful technique employed to parallelize such codes onto large distributed memory systems is to partition the mesh into sub-domains that are then allocated to processors. The code then executes in parallel, using the SPMD methodology, with message passing for inter-processor interactions. In order to improve the parallel efficiency of an imbalanced structured mesh CFD code, a new dynamic load balancing (DLB) strategy has been developed in which the processor partition range limits of just one of the partitioned dimensions uses non-coincidental limits, as opposed to coincidental limits. The ‘local’ partition limit change allows greater flexibility in obtaining a balanced load distribution, as the workload increase, or decrease, on a processor is no longer restricted by the ‘global’ (coincidental) limit change. The automatic implementation of this generic DLB strategy within an existing parallel code is presented in this chapter, along with some preliminary results.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The demands of the process of engineering design, particularly for structural integrity, have exploited computational modelling techniques and software tools for decades. Frequently, the shape of structural components or assemblies is determined to optimise the flow distribution or heat transfer characteristics, and to ensure that the structural performance in service is adequate. From the perspective of computational modelling these activities are typically separated into: • fluid flow and the associated heat transfer analysis (possibly with chemical reactions), based upon Computational Fluid Dynamics (CFD) technology • structural analysis again possibly with heat transfer, based upon finite element analysis (FEA) techniques.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The parallelization of an industrially important in-house computational fluid dynamics (CFD) code for calculating the airflow over complex aircraft configurations using the Euler or Navier–Stokes equations is presented. The code discussed is the flow solver module of the SAUNA CFD suite. This suite uses a novel grid system that may include block-structured hexahedral or pyramidal grids, unstructured tetrahedral grids or a hybrid combination of both. To assist in the rapid convergence to a solution, a number of convergence acceleration techniques are employed including implicit residual smoothing and a multigrid full approximation storage scheme (FAS). Key features of the parallelization approach are the use of domain decomposition and encapsulated message passing to enable the execution in parallel using a single programme multiple data (SPMD) paradigm. In the case where a hybrid grid is used, a unified grid partitioning scheme is employed to define the decomposition of the mesh. The parallel code has been tested using both structured and hybrid grids on a number of different distributed memory parallel systems and is now routinely used to perform industrial scale aeronautical simulations. Copyright © 2000 John Wiley & Sons, Ltd.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Three paradigms for distributed-memory parallel computation that free the application programmer from the details of message passing are compared for an archetypal structured scientific computation -- a nonlinear, structured-grid partial differential equation boundary value problem -- using the same algorithm on the same hardware. All of the paradigms -- parallel languages represented by the Portland Group's HPF, (semi-)automated serial-to-parallel source-to-source translation represented by CAP-Tools from the University of Greenwich, and parallel libraries represented by Argonne's PETSc -- are found to be easy to use for this problem class, and all are reasonably effective in exploiting concurrency after a short learning curve. The level of involvement required by the application programmer under any paradigm includes specification of the data partitioning, corresponding to a geometrically simple decomposition of the domain of the PDE. Programming in SPMD style for the PETSc library requires writing only the routines that discretize the PDE and its Jacobian, managing subdomain-to-processor mappings (affine global-to-local index mappings), and interfacing to library solver routines. Programming for HPF requires a complete sequential implementation of the same algorithm as a starting point, introduction of concurrency through subdomain blocking (a task similar to the index mapping), and modest experimentation with rewriting loops to elucidate to the compiler the latent concurrency. Programming with CAPTools involves feeding the same sequential implementation to the CAPTools interactive parallelization system, and guiding the source-to-source code transformation by responding to various queries about quantities knowable only at runtime. Results representative of "the state of the practice" for a scaled sequence of structured grid problems are given on three of the most important contemporary high-performance platforms: the IBM SP, the SGI Origin 2000, and the CRAYY T3E.