839 results for Simplex. CPLEX. Parallel Efficiency. Parallel Scalability. Linear Programming


Relevance:

60.00%

Publisher:

Abstract:

Comparative studies with a new 17 m parallel twin-body trawl and a 17 m bulged-belly trawl, conducted off Cochin during 1974-77, are reported. The parallel twin-body trawl showed an increase of 28% in catch over the bulged-belly trawl, with increases of 39.9% and 23.1% for prawns and fishes, respectively. The increase in catch is attributed to the extra-wide mouth opening (26.6%) of the parallel twin-body trawl. The parallel twin-body trawl also had 8.96% lower resistance, which resulted in lower horsepower utilization.

Relevance:

60.00%

Publisher:

Abstract:

In applications of Balancing Domain Decomposition by Constraints (BDDC) to cases with many substructures, solving the coarse problem exactly becomes the bottleneck that spoils the scalability of the solver. However, it is straightforward in BDDC to replace the exact solution of the coarse problem with another step of the BDDC method, with subdomains playing the role of elements. In this way, the three-level BDDC algorithm is obtained. If this approach is applied recursively, the multilevel BDDC method is derived. We present a detailed description of a recently developed parallel implementation of this algorithm. The implementation is applied to an engineering problem of linear elasticity and to a benchmark problem of Stokes flow in a cavity. Results obtained with the multilevel approach are compared to those of the standard (two-level) BDDC method.
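
To make the recursion concrete, the following is a minimal schematic sketch (not the authors' solver; the piecewise-constant aggregation and local block corrections stand in for a true BDDC coarse space) of how an exact coarse solve can be replaced by a further coarse level applied recursively:

```python
import numpy as np

def coarse_solve(K, r, level, max_levels, group_size=4):
    """Schematic multilevel coarse solve: on the last level the coarse problem
    is solved exactly; on intermediate levels it is replaced by a still coarser
    problem whose "subdomains" are agglomerates of the previous level's ones,
    mirroring the recursion described in the abstract.  The piecewise-constant
    aggregation and block corrections are illustrative assumptions only."""
    n = K.shape[0]
    if level == max_levels or n <= group_size:
        return np.linalg.solve(K, r)                     # exact coarse solve
    groups = [list(range(i, min(i + group_size, n))) for i in range(0, n, group_size)]
    R = np.zeros((len(groups), n))                       # piecewise-constant restriction
    for g, idx in enumerate(groups):
        R[g, idx] = 1.0
    u_next = coarse_solve(R @ K @ R.T, R @ r, level + 1, max_levels, group_size)
    u = R.T @ u_next                                     # prolongate next-level correction
    for idx in groups:                                   # add local subdomain corrections
        u[idx] += np.linalg.solve(K[np.ix_(idx, idx)], r[idx] - K[idx] @ u)
    return u
```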

Relevance:

60.00%

Publisher:

Abstract:

The scattering of linear water waves by an infinitely long rectangular structure parallel to a vertical wall in oblique seas is investigated. Analytical expressions for the diffracted potentials are derived using the method of separation of variables. The unknown coefficients in the expressions are determined through the application of the eigenfunction expansion matching method. The expressions for wave forces on the structure are given. The calculated results are compared with those obtained by the boundary element method. In addition, the influences of the wall, the angle of wave incidence, the width of the structure, and the distance between the structure and the wall on wave forces are discussed. The method presented here can be easily extended to the study of the diffraction of obliquely incident waves by multiple rectangular structures.
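
As an illustration of the separation-of-variables form such analyses typically use (generic notation, not the paper's; water depth $h$, angular frequency $\omega$, incidence angle $\theta$), the potential in a constant-depth region is expanded in the depth eigenfunctions

$$
\phi(x,z)=\sum_{n=0}^{\infty} A_n\,Z_n(z)\,X_n(x),\qquad
Z_0(z)=\frac{\cosh k_0(z+h)}{\cosh k_0 h},\quad
Z_n(z)=\frac{\cos k_n(z+h)}{\cos k_n h}\ (n\ge 1),
$$

with wavenumbers fixed by the dispersion relations $\omega^2 = g k_0 \tanh k_0 h$ and $\omega^2 = -g k_n \tan k_n h$. For oblique incidence with alongshore wavenumber $k_y = k_0 \sin\theta$, the $x$-dependence is $e^{\pm i\sqrt{k_0^2-k_y^2}\,x}$ for the propagating mode and $e^{-\sqrt{k_n^2+k_y^2}\,|x|}$ for the evanescent modes. The unknown coefficients $A_n$ are then determined by matching $\phi$ and $\partial\phi/\partial x$ across the boundaries between regions, which is the eigenfunction expansion matching step referred to above.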

Relevance:

60.00%

Publisher:

Abstract:

A three-dimensional MHD solver is described in the paper. The solver simulates reacting flows with nonequilibrium between the translational-rotational, vibrational, and electron translational modes. The conservation equations are discretized with implicit time marching and the second-order modified Steger-Warming scheme, and the resulting system is solved iteratively with a Newton-Krylov-Schwarz method implemented using the PETSc package. The results of convergence tests are plotted, showing good scalability and convergence roughly twice as fast as the DPLR method. Five test runs are then conducted simulating the experiments done at the NASA Ames MHD channel, and the calculated pressures, temperatures, electrical conductivity, back EMF, load factors, and flow accelerations are shown to agree with the experimental data. Our computation shows that the electrical conductivity distribution is not uniform in the powered section of the MHD channel, and that it is important to include Joule heating in order to calculate the correct conductivity and the MHD acceleration.
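
For readers unfamiliar with the solver strategy, the following is a minimal serial sketch of the Newton-Krylov-Schwarz idea using SciPy: an inexact Newton loop whose linearized systems are solved by GMRES with a non-overlapping additive Schwarz (block Jacobi) preconditioner. It is a generic illustration under assumed callbacks and block structure, not the paper's parallel PETSc implementation.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import gmres, splu, LinearOperator

def newton_krylov_schwarz(residual, jacobian, u0, blocks, tol=1e-8, max_newton=20):
    """Inexact Newton iteration; `blocks` is a list of index arrays, one per
    subdomain; `residual` and `jacobian` are user-supplied callbacks."""
    u = u0.copy()
    for _ in range(max_newton):
        r = residual(u)
        if np.linalg.norm(r) < tol:
            break
        J = csr_matrix(jacobian(u))
        lus = [splu(J[b, :][:, b].tocsc()) for b in blocks]   # factor subdomain blocks

        def apply_M(v):                                        # additive Schwarz apply
            v = np.asarray(v).ravel()
            z = np.zeros_like(v)
            for b, lu in zip(blocks, lus):
                z[b] = lu.solve(v[b])                          # local subdomain solves
            return z

        M = LinearOperator(J.shape, matvec=apply_M, dtype=J.dtype)
        du, _ = gmres(J, -r, M=M)
        u = u + du
    return u
```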

Relevance:

60.00%

Publisher:

Abstract:

By incorporating two phosphorescent dyes, namely iridium(III) bis(4,6-difluorophenylpyridinato-N,C2')picolinate (FIrpic) for blue emission and bis(2-(9,9-diethyl-9H-fluoren-2-yl)-1-phenyl-1H-benzoimidazol-N,C3)iridium(acetylacetonate) ((fbi)2Ir(acac)) for orange emission, into a single energy-well-like emissive layer, an extremely high-efficiency white organic light-emitting diode (WOLED) with excellent color stability is demonstrated. This device achieves a peak forward-viewing power efficiency of 42.5 lm W-1, corresponding to an external quantum efficiency (EQE) of 19.3% and a current efficiency of 52.8 cd A-1. Systematic studies of the dopants, the host, and dopant-doped host films in terms of photophysical properties (including absorption, photoluminescence, and excitation spectra), transient photoluminescence, current density-voltage characteristics, and temperature-dependent electroluminescence spectra are subsequently performed, from which it is concluded that the emission of FIrpic arises from host-guest energy transfer, while that of (fbi)2Ir(acac) arises from a direct exciton formation process. These two parallel pathways serve to channel the overall excitons to both dopants, greatly reducing unfavorable energy losses.

Relevance:

60.00%

Publisher:

Abstract:

The proliferation of inexpensive workstations and networks has prompted several researchers to use such distributed systems for parallel computing. Attempts have been made to offer a shared-memory programming model on such distributed memory computers. Most systems provide a shared-memory that is coherent in that all processes that use it agree on the order of all memory events. This dissertation explores the possibility of a significant improvement in the performance of some applications when they use non-coherent memory. First, a new formal model to describe existing non-coherent memories is developed. I use this model to prove that certain problems can be solved using asynchronous iterative algorithms on shared-memory in which the coherence constraints are substantially relaxed. In the course of the development of the model I discovered a new type of non-coherent behavior called Local Consistency. Second, a programming model, Mermera, is proposed. It provides programmers with a choice of hierarchically related non-coherent behaviors along with one coherent behavior. Thus, one can trade-off the ease of programming with coherent memory for improved performance with non-coherent memory. As an example, I present a program to solve a linear system of equations using an asynchronous iterative algorithm. This program uses all the behaviors offered by Mermera. Third, I describe the implementation of Mermera on a BBN Butterfly TC2000 and on a network of workstations. The performance of a version of the equation solving program that uses all the behaviors of Mermera is compared with that of a version that uses coherent behavior only. For a system of 1000 equations the former exhibits at least a 5-fold improvement in convergence time over the latter. The version using coherent behavior only does not benefit from employing more than one workstation to solve the problem while the program using non-coherent behavior continues to achieve improved performance as the number of workstations is increased from 1 to 6. This measurement corroborates our belief that non-coherent shared memory can be a performance boon for some applications.
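
The class of algorithms referred to here can be illustrated with a small sketch (not the dissertation's Mermera program; the thread and sweep counts are assumptions): an asynchronous, or "chaotic", Jacobi iteration in which each worker updates its own block of unknowns using whatever values it currently observes, without synchronization. Tolerating such stale, non-coherent reads is exactly the property exploited above.

```python
import threading
import numpy as np

def async_jacobi(A, b, n_threads=4, sweeps=200):
    """Asynchronous Jacobi iteration over a shared vector x updated without
    locks.  (CPython's GIL serializes the updates, but the unsynchronized
    access pattern is what the sketch is meant to show.)"""
    n = len(b)
    x = np.zeros(n)                                   # shared, updated without locks
    blocks = np.array_split(np.arange(n), n_threads)

    def worker(idx):
        for _ in range(sweeps):
            for i in idx:
                s = A[i] @ x - A[i, i] * x[i]         # may read stale entries of x
                x[i] = (b[i] - s) / A[i, i]

    threads = [threading.Thread(target=worker, args=(blk,)) for blk in blocks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x

# Example: a strongly diagonally dominant system, for which asynchronous
# iterations are known to converge.
rng = np.random.default_rng(0)
A = 4 * np.eye(200) + rng.uniform(-1, 1, (200, 200)) / 100
x = async_jacobi(A, b=np.ones(200))
```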

Relevance:

60.00%

Publisher:

Abstract:

Programmers of parallel processes that communicate through shared, globally distributed data structures (DDS) face a difficult choice. Either they must explicitly program DDS management, by partitioning or replicating it over multiple distributed memory modules, or be content with a high-latency coherent (sequentially consistent) memory abstraction that hides the DDS's distribution. We present Mermera, a new formalism and system that enable a smooth spectrum of noncoherent shared-memory behaviors to coexist between the above two extremes. Our approach allows us to define known noncoherent memories in a new, simple way, to identify new memory behaviors, and to characterize generic mixed-behavior computations. The latter are useful for programming with multiple behaviors that complement each other's advantages. On the practical side, we show that the large class of programs that use asynchronous iterative methods (AIM) can run correctly on slow memory, one of the weakest, and hence most efficient and fault-tolerant, noncoherence conditions. An example AIM program to solve linear equations is developed to illustrate: (1) the need for concurrently mixing memory behaviors, and (2) the performance gains attainable via noncoherence. Other program classes tolerate weak memory consistency by synchronizing in such a way as to yield executions indistinguishable from coherent ones. AIM computations on noncoherent memory yield noncoherent, yet correct, computations. We report performance data that exemplify the potential benefits of noncoherence, in terms of raw memory performance as well as application speed.

Relevance:

60.00%

Publisher:

Abstract:

This article describes advances in statistical computation for large-scale data analysis in structured Bayesian mixture models via graphics processing unit (GPU) programming. The developments are partly motivated by computational challenges arising in fitting models of increasing heterogeneity to increasingly large datasets. An example context concerns common biological studies using high-throughput technologies that generate many, very large datasets and require increasingly high-dimensional mixture models with large numbers of mixture components. We outline important strategies and processes for GPU computation in Bayesian simulation and optimization approaches, give examples of the benefits of GPU implementations in terms of processing speed and scale-up in the ability to analyze large datasets, and provide a detailed, tutorial-style exposition that will benefit readers interested in developing GPU-based approaches in other statistical models. Novel, GPU-oriented approaches to modifying existing algorithms and software design can lead to vast speed-ups and, critically, enable statistical analyses that presently are not performed due to compute-time limitations in traditional computational environments. Supplemental materials are provided with all source code, example data, and details that will enable readers to implement and explore the GPU approach in this mixture-modeling context. © 2010 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.
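
As a small illustration of the data-parallel structure such GPU implementations exploit (a sketch only, not the authors' code; the function, array shapes, and the CuPy substitution are assumptions), the E-step of a Gaussian mixture computes the component responsibilities of every observation independently, so NumPy-style code of this kind can be moved to a GPU, for example via CuPy, essentially unchanged:

```python
import numpy as np
# For GPU execution one could swap in CuPy, which mirrors the NumPy API:
#   import cupy as np

def responsibilities(X, means, covs, weights):
    """E-step of a Gaussian mixture: posterior probability of each of the
    k components for each of the n observations (X: n x d, means: k x d,
    covs: k x d x d, weights: k).  Each observation is independent, which
    is the data-parallel structure a GPU exploits."""
    n, d = X.shape
    k = len(weights)
    log_p = np.empty((n, k))
    for j in range(k):                                    # vectorized over all points
        diff = X - means[j]
        maha = np.einsum('ni,ij,nj->n', diff, np.linalg.inv(covs[j]), diff)
        _, logdet = np.linalg.slogdet(covs[j])
        log_p[:, j] = np.log(weights[j]) - 0.5 * (maha + logdet + d * np.log(2 * np.pi))
    log_p -= log_p.max(axis=1, keepdims=True)             # numerical stabilization
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)
```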

Relevance:

60.00%

Publisher:

Abstract:

An abstract of this work will be presented at the Compiler, Architecture and Tools Conference (CATC), Intel Development Center, Haifa, Israel, on November 23, 2015.

Relevance:

60.00%

Publisher:

Abstract:

Three paradigms for distributed-memory parallel computation that free the application programmer from the details of message passing are compared for an archetypal structured scientific computation -- a nonlinear, structured-grid partial differential equation boundary value problem -- using the same algorithm on the same hardware. All of the paradigms -- parallel languages, represented by the Portland Group's HPF; (semi-)automated serial-to-parallel source-to-source translation, represented by CAPTools from the University of Greenwich; and parallel libraries, represented by Argonne's PETSc -- are found to be easy to use for this problem class, and all are reasonably effective in exploiting concurrency after a short learning curve. The level of involvement required of the application programmer under any paradigm includes specification of the data partitioning, corresponding to a geometrically simple decomposition of the domain of the PDE. Programming in SPMD style for the PETSc library requires writing only the routines that discretize the PDE and its Jacobian, managing subdomain-to-processor mappings (affine global-to-local index mappings), and interfacing to library solver routines. Programming for HPF requires a complete sequential implementation of the same algorithm as a starting point, introduction of concurrency through subdomain blocking (a task similar to the index mapping), and modest experimentation with rewriting loops to expose the latent concurrency to the compiler. Programming with CAPTools involves feeding the same sequential implementation to the CAPTools interactive parallelization system and guiding the source-to-source code transformation by responding to various queries about quantities knowable only at runtime. Results representative of "the state of the practice" for a scaled sequence of structured-grid problems are given on three of the most important contemporary high-performance platforms: the IBM SP, the SGI Origin 2000, and the Cray T3E.
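
To make concrete what "writing only the routines that discretize the PDE and its Jacobian" amounts to, here is a minimal sketch for a Bratu-type nonlinear Poisson problem on a structured grid; the problem, grid handling, and parameter lambda are assumptions for illustration, not the paper's PDE or its PETSc/HPF/CAPTools code:

```python
import numpy as np
import scipy.sparse as sp

def bratu_residual(u, nx, ny, lam=6.0):
    """Residual of a 2D Bratu-type problem  -Laplace(u) = lam * exp(u)  on a
    structured grid with homogeneous Dirichlet boundaries (5-point stencil).
    Illustrative of the discretization routine a library solver expects."""
    h2 = (1.0 / (nx - 1)) ** 2
    U = u.reshape(ny, nx)
    R = U.copy()                                           # boundary rows: u = 0
    R[1:-1, 1:-1] = (4 * U[1:-1, 1:-1] - U[:-2, 1:-1] - U[2:, 1:-1]
                     - U[1:-1, :-2] - U[1:-1, 2:]) / h2 - lam * np.exp(U[1:-1, 1:-1])
    return R.ravel()

def bratu_jacobian(u, nx, ny, lam=6.0):
    """Sparse Jacobian of the residual above (the second user routine)."""
    h2 = (1.0 / (nx - 1)) ** 2
    J = sp.lil_matrix((nx * ny, nx * ny))
    for j in range(ny):
        for i in range(nx):
            k = j * nx + i
            if i in (0, nx - 1) or j in (0, ny - 1):
                J[k, k] = 1.0                              # boundary: identity row
            else:
                J[k, k] = 4.0 / h2 - lam * np.exp(u[k])
                for kk in (k - 1, k + 1, k - nx, k + nx):
                    J[k, kk] = -1.0 / h2
    return J.tocsr()
```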

Relevance:

60.00%

Publisher:

Abstract:

In this paper, we first demonstrate that the classical Purcell vector method, when combined with row pivoting, yields a consistently small growth factor in comparison with the well-known Gauss elimination method, the Gauss–Jordan method, and the Gauss–Huard method with partial pivoting. We then present six parallel algorithms based on the Purcell method that may be used for the direct solution of linear systems. The algorithms differ in their ways of pivoting and load balancing. We recommend algorithms V and VI for their reliability, and algorithms III and IV for good load balance if local pivoting is acceptable. Some numerical results are presented.
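
A minimal serial sketch of the Purcell vector method with row pivoting may help fix ideas; the parallel variants (algorithms I-VI) and their pivoting and load-balancing strategies are not reproduced here:

```python
import numpy as np

def purcell_solve(A, b):
    """Purcell's vector method with row pivoting (serial sketch).
    Work in R^(n+1) with the augmented rows r_i = [a_i, -b_i].  Start from the
    n+1 unit vectors; for each row, pick as pivot the remaining vector with the
    largest |r_i . v| (row pivoting) and update the others so they become
    orthogonal to r_i.  The single surviving vector, scaled so that its last
    component is 1, carries the solution in its first n components."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(b)
    V = np.eye(n + 1)                      # columns are the candidate vectors
    for i in range(n):
        r = np.append(A[i], -b[i])         # augmented row
        dots = r @ V
        p = np.argmax(np.abs(dots))        # row pivoting: largest projection
        vp, dp = V[:, p], dots[p]
        V = np.delete(V, p, axis=1)
        dots = np.delete(dots, p)
        V = V - np.outer(vp, dots / dp)    # make remaining vectors orthogonal to r
    v = V[:, 0]
    return v[:n] / v[n]

# Quick check against NumPy's direct solver.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5)); b = rng.standard_normal(5)
assert np.allclose(purcell_solve(A, b), np.linalg.solve(A, b))
```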

Relevance:

60.00%

Publisher:

Abstract:

We consider a problem of scheduling jobs on m parallel machines. The machines are dedicated, i.e., for each job the processing machine is known in advance. We mainly concentrate on the model in which, at any time, there is one unit of an additional resource. Any job may be assigned the resource, and this reduces its processing time. A job that is given the resource uses it at each time of its processing. No two jobs are allowed to use the resource simultaneously. The objective is to minimize the makespan. We prove that the two-machine problem is NP-hard in the ordinary sense, describe a pseudopolynomial dynamic programming algorithm, and convert it into an FPTAS. For the problem with an arbitrary number of machines we present an algorithm with a worst-case ratio close to 3/2, and close to 3 if a job can be given several units of the resource. For the problem with a fixed number of machines we give a PTAS. Virtually all algorithms rely on a certain variant of the linear knapsack problem (maximization, minimization, multiple-choice, bicriteria). © 2008 Wiley Periodicals, Inc. Naval Research Logistics, 2008
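
For illustration, the simplest member of that family, the linear (fractional) knapsack problem, is solved greedily by value density; this is only meant to show the kind of subproblem involved, not the specific minimization, multiple-choice, or bicriteria variants the algorithms actually use:

```python
def linear_knapsack(values, weights, capacity):
    """Greedy solution of the linear (fractional) knapsack problem: items may
    be taken fractionally, so sorting by value density is optimal."""
    order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
    total, take = 0.0, [0.0] * len(values)
    for i in order:
        if capacity <= 0:
            break
        x = min(1.0, capacity / weights[i])    # fraction of item i taken
        take[i] = x
        total += x * values[i]
        capacity -= x * weights[i]
    return total, take

# Example: capacity 10, three items -> (220.0, [1.0, 1.0, 0.5])
print(linear_knapsack([60, 100, 120], [2, 5, 6], 10))
```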