193 results for parallelization


Relevance:

20.00%

Publisher:

Abstract:

[EN] Longest edge (nested) algorithms for triangulation refinement in two dimensions are able to produce hierarchies of quality, nested, irregular triangulations as needed both for adaptive finite element methods and for multigrid methods. They can be formulated in terms of the longest edge propagation path (Lepp) and terminal edge concepts, to refine the target triangles and some related neighbors. We discuss a parallel multithreaded algorithm in which every thread is in charge of refining a triangle t and its associated Lepp neighbors. The thread manages a changing Lepp(t) (an ordered set of triangles with increasing longest edges), both to find a last longest (terminal) edge and to refine the pair of triangles sharing this edge...
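
To make the traversal concrete, here is a minimal Python sketch of a Lepp computation; the mesh helpers `longest_edge` and `neighbor_across` are hypothetical placeholders, not part of the paper.

```python
# A sketch of the Lepp traversal: follow longest edges from triangle t until
# reaching a terminal edge (an edge that is the longest edge of both triangles
# sharing it, or a boundary longest edge). Helpers are assumed, not real APIs.

def lepp(t, longest_edge, neighbor_across):
    """Return the ordered list of triangles Lepp(t)."""
    path = [t]
    while True:
        e = longest_edge(path[-1])           # longest edge of current triangle
        nb = neighbor_across(path[-1], e)    # triangle on the other side of e
        if nb is None:                       # e lies on the boundary: terminal
            return path
        if longest_edge(nb) == e:            # longest for both: terminal edge
            path.append(nb)
            return path
        path.append(nb)                      # keep propagating
```

In the multithreaded algorithm, each thread would run such a traversal for its own target triangle and then refine the pair of triangles sharing the terminal edge.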

Relevance:

20.00%

Publisher:

Abstract:

The aim of my thesis is to parallelize the Weighted Histogram Analysis Method (WHAM), a popular algorithm used to calculate the free energy of a molecular system in molecular dynamics simulations. WHAM works in post-processing in cooperation with another algorithm called Umbrella Sampling, whose purpose is to add a bias to the potential energy of the system in order to force the system to sample a specific region of configurational space. N independent simulations are performed in order to sample the entire region of interest. Subsequently, the WHAM algorithm is used to estimate the original system energy starting from the N atomic trajectories. The parallelization of WHAM has been performed with CUDA, a language that allows programs to run on the GPUs of NVIDIA graphics cards, which have a parallel architecture. The parallel implementation can appreciably speed up WHAM execution compared with previous serial CPU implementations; the CPU code, in particular, becomes time-critical for very large numbers of iterations. The algorithm has been written in C++ and executed on UNIX systems equipped with NVIDIA graphics cards. The results were satisfactory, showing a performance increase when the model was executed on graphics cards of higher compute capability. Nonetheless, the GPU used to test the algorithm is quite old and not designed for scientific computing. It is likely that a further performance increase would be obtained if the algorithm were executed on GPU clusters with high computational efficiency. The thesis is organized as follows: I first describe the mathematical formulation of Umbrella Sampling and the WHAM algorithm, with their applications to the study of ionic channels and to molecular docking (Chapter 1); I then present the CUDA architectures used to implement the model (Chapter 2); finally, the results obtained on model systems are presented (Chapter 3).
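
For reference, the core of WHAM is a self-consistent iteration over the biased histograms; below is a minimal serial Python/NumPy sketch of that iteration (array names, shapes, and the convergence test are illustrative assumptions, not the thesis code). The per-bin sums are independent, which is what makes the method amenable to the GPU parallelization described above.

```python
# A minimal serial sketch of the WHAM self-consistent equations, assuming
# hist[i, b] is the biased histogram of window i over bins b, bias[i, b] the
# umbrella bias energy w_i at each bin, and n[i] the sample count of window i.
import numpy as np

def wham(hist, bias, n, beta, tol=1e-7, max_iter=10000):
    f = np.zeros(len(n))                     # free-energy shifts f_i per window
    num = hist.sum(axis=0)                   # total counts per bin
    for _ in range(max_iter):
        # denominator: sum_j N_j * exp(beta * (f_j - w_j(x))) for each bin
        denom = (n[:, None] * np.exp(beta * (f[:, None] - bias))).sum(axis=0)
        p = num / denom                      # unbiased probability estimate
        f_new = -np.log((p[None, :] * np.exp(-beta * bias)).sum(axis=1)) / beta
        f_new -= f_new[0]                    # fix the gauge: f_0 = 0
        delta = np.max(np.abs(f_new - f))
        f = f_new
        if delta < tol:                      # stop when the f_i are stable
            break
    return p / p.sum(), f                    # normalized P(x) and the f_i
```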

Relevance:

20.00%

Publisher:

Abstract:

Long-term electrocardiogram (ECG) recordings often suffer from relevant noise. Baseline wander in particular is pronounced in ECG recordings using dry or esophageal electrodes, which are dedicated to prolonged registration. While analog high-pass filters introduce phase distortions, reliable offline filtering of the baseline wander implies a computational burden that has to be put in relation to the increase in signal-to-baseline ratio (SBR). Here we present a graphics processing unit (GPU) based parallelization method to speed up offline baseline wander filter algorithms, namely the wavelet, finite impulse response, infinite impulse response, moving-mean, and moving-median filters. Individual filter parameters were optimized with respect to the SBR increase based on ECGs from the Physionet database superimposed on auto-regressive modeled, real baseline wander. A Monte Carlo simulation showed that for low input SBR the moving-median filter outperforms any other method but negatively affects ECG wave detection. In contrast, the infinite impulse response filter is preferred in case of high input SBR. However, the parallelized wavelet filter is processed 500 and 4 times faster than these two algorithms on the GPU, respectively, and offers superior baseline wander suppression in low-SBR situations. Using a signal segment of 64 megasamples that is filtered as an entire unit, wavelet filtering of a 7-day high-resolution ECG is computed in less than 3 seconds. Taking the high filtering speed into account, the GPU wavelet filter is the most efficient method to remove baseline wander present in long-term ECGs, strongly reducing the computational burden.
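
As an illustration of the simplest of the compared filters, the following Python sketch removes baseline wander with a moving median; the window length is an illustrative assumption, not a value from the study.

```python
# A minimal sketch of moving-median baseline wander removal: estimate the
# slowly varying baseline with a median filter and subtract it from the ECG.
from scipy.signal import medfilt

def remove_baseline(ecg, fs, win_seconds=0.6):
    """Subtract a moving-median baseline estimate from an ECG signal array."""
    k = max(3, int(win_seconds * fs)) | 1    # kernel length must be odd
    baseline = medfilt(ecg, kernel_size=k)   # slowly varying baseline estimate
    return ecg - baseline
```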

Relevance:

20.00%

Publisher:

Abstract:

In this paper we show how the efficiency of MBS simulations can be improved in two different ways, considering both an explicit and an implicit semi-recursive formulation. The explicit method is based on a double velocity transformation that involves the solution of a redundant but compatible system of equations. The high computational cost of this operation has been drastically reduced by taking into account the sparsity pattern of the system; to this end, the method introduces MA48, a high-performance mathematical library provided by the Harwell Subroutine Library. The second method proposed in this paper is motivated by the observation that, depending on the case, between 70% and 85% of the computation time is devoted to the evaluation of force derivatives with respect to the relative position and velocity vectors. Since the evaluation of these derivatives can be decomposed into concurrent tasks, the main goal of this paper is a successful and straightforward parallel implementation that has led to a substantial improvement: a speedup of 3.2 on a quad-core processor, achieved by keeping all cores busy and distributing the workload among them, thus obtaining a large time reduction through near-ideal CPU usage.
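
A minimal Python sketch of the kind of task decomposition described (the `derivative_block` function and the partition of the derivative evaluation into blocks are hypothetical, standing in for the paper's implementation):

```python
# Independent force-derivative blocks can be evaluated concurrently, keeping
# all cores of a quad-core processor busy.
from concurrent.futures import ProcessPoolExecutor

def eval_derivatives(blocks, derivative_block, workers=4):
    """Evaluate independent force-derivative blocks on a pool of workers."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # each block is independent, so they can be mapped in any order
        return list(pool.map(derivative_block, blocks))
```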

Relevance:

20.00%

Publisher:

Abstract:

We report on a detailed study of the application and effectiveness of program analysis based on abstract interpretation in automatic program parallelization. We study the case of parallelizing logic programs using the notion of strict independence. We first propose, and prove correct, a methodology for applying the information inferred by abstract interpretation to the parallelization task, using a parametric domain. The methodology is generic in the sense of allowing the use of different analysis domains. A number of well-known approximation domains are then studied, and their transformation into the parametric domain is defined. The transformation directly illustrates the relevance and applicability of each abstract domain for the application. Both local and global analyzers are then built using these domains and embedded in a complete parallelizing compiler. Then, the performance of the domains in this context is assessed through a number of experiments. A comparatively wide range of aspects is studied, from the resources needed by the analyzers in terms of time and memory to the actual benefits obtained from the information inferred. Such benefits are evaluated both in terms of the characteristics of the parallelized code and of the actual speedups obtained from it. The results show that data-flow analysis plays an important role in achieving efficient parallelizations, and that the cost of such analysis can be reasonable even for quite sophisticated abstract domains. Furthermore, the results also offer significant insight into the characteristics of the domains, the demands of the application, and the trade-offs involved.
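
To fix ideas, the run-time condition that strict independence imposes, and that the analysis tries to prove statically so the test can be removed, can be sketched in Python as follows (goals are reduced to the sets of variables they reach; a deliberate simplification).

```python
# Two goals are strictly independent when every variable they share is ground,
# i.e. they have no unbound variables in common at run time.

def strictly_independent(vars_g1, vars_g2, ground_vars):
    """True if the goals share no variable that is still unbound."""
    shared = vars_g1 & vars_g2
    return shared <= ground_vars             # every shared variable is ground
```

Groundness and sharing domains approximate `ground_vars` and the variable sets at compile time, so in many cases the test can be decided without any run-time cost.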

Relevance:

20.00%

Publisher:

Abstract:

Program specialization optimizes programs for known values of the input. It is often the case that the set of possible input values is unknown, or this set is infinite. However, a form of specialization can still be performed in such cases by means of abstract interpretation, specialization then being with respect to abstract values (substitutions) rather than concrete ones. We study the multiple specialization of logic programs based on abstract interpretation. This involves, in principle and based on information from global analysis, generating several versions of a program predicate for different uses of such predicate, optimizing these versions, and, finally, producing a new, "multiply specialized" program. While multiple specialization has received theoretical attention, little previous evidence exists on its practicality. In this paper we report on the incorporation of multiple specialization in a parallelizing compiler and quantify its effects. A novel approach to the design and implementation of the specialization system is proposed. The proposed implementation techniques yield specializations identical to those of the best previously proposed techniques but require little or no modification of some existing abstract interpreters. Our results show that, using the proposed techniques, the resulting "abstract multiple specialization" is indeed a relevant technique in practice. In particular, in the parallelizing compiler application, a good number of run-time tests are eliminated and invariants extracted automatically from loops, resulting generally in lower overheads and in several cases in increased speedups.
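
A minimal Python sketch of the underlying idea, abstract executability: run-time tests that the inferred abstract value already decides are removed from the specialized version (the string/tuple encoding is purely illustrative, not the system's representation).

```python
# Tests are pairs like ("ground", "X"); env maps variables to abstract values
# inferred by global analysis for one version of the predicate.
GROUND, FREE, ANY = "ground", "free", "any"

def simplify_test(test, env):
    """Abstractly execute a groundness test under inferred abstract values."""
    kind, var = test
    if kind == "ground" and env.get(var, ANY) == GROUND:
        return "true"                        # always succeeds: removable
    if kind == "ground" and env.get(var, ANY) == FREE:
        return "fail"                        # always fails
    return test                              # undecided: keep run-time test

def specialize(tests, env):
    """One specialized version: drop the tests decided to succeed."""
    return [t for t in (simplify_test(t, env) for t in tests) if t != "true"]
```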

Relevance:

20.00%

Publisher:

Abstract:

A framework for the automatic parallelization of (constraint) logic programs is proposed and proved correct. Intuitively, the parallelization process replaces conjunctions of literals with parallel expressions. Such expressions trigger at run time the exploitation of restricted, goal-level, independent and-parallelism. The parallelization process performs two steps. The first one builds a conditional dependency graph (which can be simplified using compile-time analysis information), while the second transforms the resulting graph into linear conditional expressions, the parallel expressions of the &-Prolog language. Several heuristic algorithms for the latter ("annotation") process are proposed and proved correct. Algorithms are also given which determine if there is any loss of parallelism in the linearization process with respect to a proposed notion of maximal parallelism. Finally, a system is presented which implements the proposed approach. The performance of the different annotation algorithms is compared experimentally in this system by studying the time spent in parallelization and the effectiveness of the results in terms of speedups.
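
The two steps can be sketched in Python as follows (goals reduced to name/variable-set pairs, with independence approximated by variable disjointness, as under strict independence; a simplification of the actual framework).

```python
# Step 1: dependency graph over the literals of a clause body.
def build_dep_graph(goals):
    """Edge (i, j) when goal i shares a variable with a later goal j."""
    edges = set()
    for i, (_, vi) in enumerate(goals):
        for j in range(i + 1, len(goals)):
            if vi & goals[j][1]:
                edges.add((i, j))
    return edges

# Step 2: greedy linearization into an &-Prolog-style expression, where '&'
# joins goals that may run in parallel and ',' sequences the groups.
def annotate(goals, edges):
    out, group = [], [0]
    for j in range(1, len(goals)):
        if any((i, j) in edges for i in group):
            out.append(" & ".join(goals[i][0] for i in group))
            group = [j]
        else:
            group.append(j)
    out.append(" & ".join(goals[i][0] for i in group))
    return ", ".join(out)
```

For goals p(X), q(Y), r(X,Y) this produces the expression "p(X) & q(Y), r(X,Y)": the first two goals run in parallel, and r waits for both.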

Relevance:

20.00%

Publisher:

Abstract:

Program specialization optimizes programs for known values of the input. It is often the case that the set of possible input values is unknown, or this set is infinite. However, a form of specialization can still be performed in such cases by means of abstract interpretation, specialization then being with respect to abstract values (substitutions), rather than concrete ones. This paper reports on the application of abstract multiple specialization to automatic program parallelization in the &-Prolog compiler. Abstract executability, the main concept underlying abstract specialization, is formalized, the design of the specialization system presented, and a non-trivial example of specialization in automatic parallelization is given.

Relevance:

20.00%

Publisher:

Abstract:

This paper presents a study of the effectiveness of global analysis in the parallelization of logic programs using strict independence. A number of well-known approximation domains are selected and their usefulness for the application at hand is explained. Also, methods for using the information provided by such domains to improve parallelization are proposed. Local and global analyses are built using these domains, and such analyses are embedded in a complete parallelizing compiler. Then, the performance of the domains (and the system in general) is assessed for this application through a number of experiments. We argue that the results offer significant insight into the characteristics of these domains, the demands of the application, and the trade-offs involved.

Relevance:

20.00%

Publisher:

Abstract:

This paper presents a study of the effectiveness of three different algorithms for the parallelization of logic programs based on compile-time detection of independence among goals. The algorithms are embedded in a complete parallelizing compiler, which incorporates different abstract interpretation-based program analyses. The complete system shows the task of automatic program parallelization to be practical. The trade-offs involved in using each of the algorithms in this task are studied experimentally, their weaknesses identified, and possible improvements discussed.

Relevance:

20.00%

Publisher:

Abstract:

There has been significant interest in parallel execution models for logic programs which exploit Independent And-Parallelism (IAP). In these models, it is necessary to determine which goals are independent and therefore eligible for parallel execution, and which goals have to wait for which others during execution. Although this can be done at run time, it can imply a very heavy overhead. In this paper, we present three algorithms for automatic compile-time parallelization of logic programs using IAP. This is done by converting a clause into a graph-based computational form and then transforming this graph into linear expressions based on &-Prolog, a language for IAP. We also present an algorithm which, given a clause, determines if there is any loss of parallelism due to linearization, for the case in which only unconditional parallelism is desired. Finally, the performance of these annotation algorithms is discussed for some benchmark programs.
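
When independence cannot be established at compile time, the linear expression keeps a run-time test; below is a minimal Python sketch of emitting such a conditional &-Prolog-style expression (the concrete syntax emitted is indicative only).

```python
def conditional_parallel(g1, g2, var_pairs):
    """Emit an &-Prolog-style expression for two goals.

    var_pairs: variable pairs whose independence must be checked at run time
    (empty when analysis already proved the goals independent).
    """
    if not var_pairs:
        return f"{g1} & {g2}"                # unconditional parallelism
    test = ", ".join(f"indep({a},{b})" for a, b in var_pairs)
    # run in parallel only if the run-time independence test succeeds
    return f"( {test} -> {g1} & {g2} ; {g1}, {g2} )"
```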

Relevance:

20.00%

Publisher:

Abstract:

Irregular computations pose some of the most interesting and challenging problems in automatic parallelization. Irregularity appears in certain kinds of numerical problems and is pervasive in symbolic applications. Such computations often use dynamic data structures which make heavy use of pointers. This complicates all the steps of a parallelizing compiler, from independence detection to task partitioning and placement. In the past decade there has been significant progress in the development of parallelizing compilers for logic programming and, more recently, constraint programming. The typical applications of these paradigms frequently involve irregular computations, which arguably makes the techniques used in these compilers potentially interesting. In this paper we introduce in a tutorial way some of the problems faced by parallelizing compilers for logic and constraint programs. These include the need for inter-procedural pointer aliasing analysis for independence detection and having to manage speculative and irregular computations through task granularity control and dynamic task allocation. We also provide pointers to some of the progress made in these areas. In the associated talk we demonstrate representatives of several generations of these parallelizing compilers.
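
As one concrete instance of the granularity control mentioned above, a scheduler can compare a cost estimate against a spawn threshold before creating a parallel task; a minimal Python sketch (the threshold and the cost model are illustrative assumptions):

```python
# Spawn a thread only when the estimated task cost amortizes the spawn
# overhead; otherwise run the task sequentially in the current thread.
import threading

THRESHOLD = 1000                             # tune to the spawn overhead

def maybe_spawn(task, cost_estimate, results, idx):
    """Run `task` inline if it is too small to pay for a thread spawn."""
    if cost_estimate < THRESHOLD:
        results[idx] = task()                # run sequentially, no overhead
        return None
    t = threading.Thread(target=lambda: results.__setitem__(idx, task()))
    t.start()                                # large enough: run in parallel
    return t
```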

Relevance:

20.00%

Publisher:

Abstract:

This paper presents a conditional parallelization process for and-parallelism based on the notion of non-strict independence, a more relaxed notion than the traditional one of strict independence. By using this notion, a parallelism annotator can extract more parallelism from programs. On the other hand, the intrinsic complexity of non-strict independence poses new challenges for this task. We report here on our implementation of an annotator for non-strict independence, capable of producing both static and dynamic execution graphs. This implementation, along with the independence checker we also implemented and their integration in our system, has resulted in what is, to the best of our knowledge, the first parallelizing compiler based on non-strict independence which produces dynamic execution graphs. The paper also presents a preliminary assessment of the implemented tools, comparing them with the existing ones for strict independence, which shows encouraging results.
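
A much-simplified Python sketch of the relaxed condition: goals may share variables as long as at most one of them can bind each shared variable. The `binders` map would come from sharing/freeness analysis in a real system, and actual non-strict independence involves further conditions omitted here.

```python
def non_strictly_independent(vars_g1, vars_g2, binders):
    """Goals may share variables if at most one of them can bind each one."""
    for v in vars_g1 & vars_g2:              # every shared variable
        if len(binders.get(v, set()) & {"g1", "g2"}) > 1:
            return False                     # both goals may bind v: dependent
    return True
```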

Relevance:

20.00%

Publisher:

Abstract:

The concept of independence has recently been generalized to the constraint logic programming (CLP) paradigm. Also, several abstract domains specifically designed for CLP languages, and whose information can be used to detect the generalized independence conditions, have recently been defined. As a result we are now in a position where automatic parallelization of CLP programs is feasible. In this paper we study the task of automatically parallelizing CLP programs based on such analyses, by transforming them to explicitly concurrent programs in our parallel CC platform (CIAO) as well as to AKL. We describe the analysis and transformation process and study its efficiency, accuracy, and effectiveness in program parallelization. The information gathered by the analyzers is evaluated not only in terms of its accuracy, i.e., its ability to determine the actual dependencies among the program variables, but also in terms of its effectiveness, measured by the code reduction in the resulting parallelized programs. Given that only a few abstract domains have so far been defined for CLP, and that none of them were specifically designed for dependency detection, the aim of the evaluation is not only to assess the effectiveness of the available domains, but also to study what additional information it would be desirable to infer, and what domains would be appropriate for further improving the parallelization process.