Digital Library

92 results for Gzip OpenMP

Seafloor brightness map of the Great Barrier Reef, Australia, derived from biodiversity data

Simulated climatically disturbed emergence of agricultures in Western Eurasia 8500-3000 BC

Abstract:

Was the spread of agropastoralism from the Fertile Crescent throughout Europe influenced by rapid climatic shifts? We here generate idealized climate events using palaeoclimate records. In a mathematical model of regional sociocultural development, these events disturb the subsistence base of simulated forager and farmer societies. We evaluate the regional simulated transition timings and durations against a large published set of radiocarbon dates for western Eurasia; the model is able to realistically hindcast much of the inhomogeneous space-time evolution of regional Neolithic transitions. Our study shows that the inclusion of climate events improves the simulation of typical lags between cultural complexes, but that the overall difference from a model without climate events is not significant. Climate events may not have been as important for early sociocultural dynamics as endogenous factors.

Simulated transition to agropastoralism in the Indus valley 7500-3000 BC

Abstract:

The Indus Valley Civilization (IVC) was one of the first great civilizations in prehistory. This Bronze Age civilization flourished from the end of the fourth millennium BC. It disintegrated during the second millennium BC; despite much research effort, this decline is not well understood. Less research has been devoted to the emergence of the IVC, which shows continuous cultural precursors since at least the seventh millennium BC. To understand the decline, we believe it is necessary to investigate the rise of the IVC, i.e., the establishment of agriculture and livestock, dense populations and technological developments 7000-3000 BC. Although much archaeologically typed information is available, our capability to investigate the system is hindered by poorly resolved chronology, and by a lack of field work in the intermediate areas between the Indus valley and Mesopotamia. We thus employ a complementary numerical simulation to develop a consistent picture of technology, agropastoralism and population developments in the IVC domain. Results from this Global Land Use and technological Evolution Simulator show that there is (1) fair agreement between the simulated timing of the agricultural transition and radiocarbon dates from early agricultural sites, but the transition is simulated first in India then Pakistan; (2) an independent agropastoralism developing on the Indian subcontinent; and (3) a positive relationship between archeological artifact richness and simulated population density which remains to be quantified.

Bathymetric map of Heron Reef, Australia, derived from airborne hyperspectral data at 1 m resolution

Abstract:

A simple method for efficient inversion of arbitrary radiative transfer models for image analysis is presented. The method operates by representing the shape of the function that maps model parameters to spectral reflectance by an adaptive look-up tree (ALUT) that evenly distributes the discretization error of tabulated reflectances in spectral space. A post-processing step organizes the data into a binary space partitioning tree that facilitates an efficient inversion search algorithm. In an example shallow water remote sensing application, the method performs faster than an implementation of previously published methodology and has the same accuracy in bathymetric retrievals. The method has no user configuration parameters requiring expert knowledge and minimizes the number of forward model runs required, making it highly suitable for routine operational implementation of image analysis methods. For the research community, straightforward and robust inversion allows research to focus on improving the radiative transfer models themselves without the added complication of devising an inversion strategy.

On the automatic integration of hardware accelerators into FPGA-based embedded systems

Abstract:

This paper proposes an automatic framework for the seamless integration of hardware accelerators, starting from an OpenMP-based application and an XML file describing the HW/SW partitioning. It extends a fully software architecture by generating and integrating the cores, along with the proper interfaces, and the code for scheduling and synchronization. Experimental results show that it is possible to validate different solutions only by varying the input code.

Simulación en tiempo real de vehículos industriales con modelos multicuerpo de gran complejidad [Real-time simulation of industrial vehicles with highly complex multibody models]

Abstract:

The objective of this dissertation was to achieve real-time simulations of industrial vehicles modeled as complex multibody systems made up of rigid bodies. Developing a simulation program involves four main aspects: the modeling of the multibody system (types of coordinates, ideal joints or joints imposed by means of forces); the formulation used to set up the differential equations of motion (dependent or independent coordinates, global or topological methods, ways of imposing the constraint equations); the numerical integration method used to solve these equations in time (explicit or implicit integrators); and the details of the implementation (programming language, mathematical libraries, parallelization techniques). These four stages are interrelated, and all of them are part of this work. They cover the generation of models of a van and a semitrailer truck, the use of three different dynamic formulations, the integration of the differential equations of motion through explicit and implicit methods, the use of BLAS functions and sparse matrix techniques, and the introduction of parallelization to exploit the different processor cores.

The work is organized in eight chapters, the first being the Introduction. Chapter 2 presents two different semi-recursive formulations. The first is based on a double velocity transformation, which yields the differential equations of motion as a function of the independent relative accelerations; these equations are integrated numerically with the explicit fourth-order Runge-Kutta method. The second formulation is based on dependent relative coordinates, imposing the constraints by means of position penalty coefficients and correcting the velocities and accelerations by projection methods. In this second case the equations of motion are integrated with the implicit HHT (Hilber, Hughes and Taylor) integrator, which belongs to the Newmark family of structural integrators.

Chapter 3 introduces the third formulation. Here the joints between the bodies of the system are modeled as flexible joints, which requires imposing the joint conditions by means of forces. This kind of joint prevents working with relative coordinates, so the position of the system and the equations of motion are expressed in Cartesian coordinates and Euler parameters. In this global formulation the constraints are introduced through forces (with an approach similar to that of the penalty coefficients), and the numerical integration is likewise stabilized by velocity and acceleration projections.

Chapter 4 reviews the main tools and strategies used to increase the efficiency of the implementations of the different algorithms. It first gives some basic considerations for increasing numerical efficiency. It then describes the main characteristics of the code analyzers used and the mathematical libraries used to solve linear algebra problems with both dense and sparse matrices. Finally, it develops in some detail the topic of parallelization on current multicore processors, describing the pattern employed and the most important characteristics of the two tools proposed, OpenMP and Intel TBB. It should be noted that the characteristics of multibody systems (small problem size, frequent use of recursion, and intensive repetition of the calculations in time with strong dependence on previous results) make parallelization extraordinarily difficult compared with other areas of computational mechanics, such as finite element analysis.

Building on the concepts of Chapter 4, Chapter 5 is divided into three sections, one for each formulation proposed in this dissertation. Each section details how the implementations of each algorithm were carried out and which tools were used. The first section shows the use of numerical libraries for dense and sparse matrices in the semi-recursive topological formulation based on the double velocity transformation. The second describes the use of parallelization with OpenMP and TBB in the semi-recursive formulation with penalties and projections. The last describes the use of sparse matrix techniques and parallelization in the global formulation with flexible joints and Euler parameters.

Chapter 6 describes the results obtained with the formulations and implementations described above. It begins with a description of the modeling and topology of the two vehicles studied. The first model is a two-axle chassis-cab or van-type vehicle, belonging to the range of medium-duty cargo vehicles. The second is a five-axle vehicle corresponding to a truck or tractor with semitrailer, belonging to the category of heavy industrial vehicles. The chapter also compares the simulations of these vehicles under each of the formulations used and presents quantitatively the effects of the improvements achieved with the different strategies proposed. To make conclusions easier to draw and to evaluate the improvements more objectively, all results in this chapter were obtained on the same computer, which was at the top of the Intel Xeon range in 2007 but is now somewhat obsolete.

Finally, Chapters 7 and 8 are devoted to the final conclusions and to the future lines of research that may follow from this work. The objective of performing real-time simulations of highly complex industrial vehicles was achieved with several of the formulations and implementations developed.
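
The abstract above names the explicit fourth-order Runge-Kutta method as one of its integrators. The following is a minimal sketch of a single RK4 step, applied to a unit harmonic oscillator rather than to a vehicle model; the function names and the test problem are illustrative assumptions, not taken from the thesis.

```python
import math


def rk4_step(f, t, y, h):
    """Advance y' = f(t, y) by one explicit fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def oscillator(t, y):
    # State is [position, velocity]; unit mass and stiffness (assumed toy system).
    return [y[1], -y[0]]


def integrate(f, y0, t_end, steps):
    """Integrate from t = 0 to t_end with a fixed number of RK4 steps."""
    h = t_end / steps
    t, y = 0.0, list(y0)
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


if __name__ == "__main__":
    # After half a period the oscillator should be near position -1, velocity 0.
    print(integrate(oscillator, [1.0, 0.0], math.pi, 1000))
```

Each step depends on the result of the previous one, which is the time-stepping dependency the abstract cites as an obstacle to parallelization.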

Efficient parallelization of a regional ocean model for the western Mediterranean Sea

Abstract:

This paper focuses on the parallelization of an ocean model applying current multicore processor-based cluster architectures to an irregular computational mesh. The aim is to maximize the efficiency of the computational resources used. To make the best use of the resources offered by these architectures, this parallelization has been addressed at all the hardware levels of modern supercomputers: firstly, exploiting the internal parallelism of the CPU through vectorization; secondly, taking advantage of the multiple cores of each node using OpenMP; and finally, using the cluster nodes to distribute the computational mesh, using MPI for communication within the nodes. The speedup obtained with each parallelization technique as well as the combined overall speedup have been measured for the western Mediterranean Sea for different cluster configurations, achieving a speedup factor of 73.3 using 256 processors. The results also show the efficiency achieved in the different cluster nodes and the advantages obtained by combining OpenMP and MPI versus using only OpenMP or MPI. Finally, the scalability of the model has been analysed by examining computation and communication times as well as the communication and synchronization overhead due to parallelization.

A parallel method for impulsive image noise removal on hybrid CPU/GPU systems

Abstract:

A parallel algorithm for image noise removal is proposed. The algorithm is based on the peer group concept and uses a fuzzy metric. An optimization study on the use of the CUDA platform to remove impulsive noise using this algorithm is presented. Moreover, an implementation of the algorithm on multi-core platforms using OpenMP is presented. Performance is evaluated in terms of execution time, and the implementations parallelised on multi-core CPUs, on GPUs, and on the combination of both are compared. A performance analysis with large images is conducted in order to identify the number of pixels to allocate to the CPU and to the GPU. The observed times show that both devices must be given work, with most of it assigned to the GPU. Results show that parallel implementations of denoising filters on GPUs and multi-cores are very advisable, and they open the door to using such algorithms for real-time processing.

Parallel relaxed and extrapolated algorithms for computing PageRank

Abstract:

In this paper, parallel Relaxed and Extrapolated algorithms based on the Power method for accelerating the PageRank computation are presented. Different parallel implementations of the Power method and the proposed variants are analyzed using different data distribution strategies. The reported experiments show the behavior and effectiveness of the designed algorithms for realistic test data using either OpenMP, MPI or a hybrid OpenMP/MPI approach to exploit the benefits of shared memory inside the nodes of current SMP supercomputers.
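
For readers unfamiliar with the baseline, here is a minimal serial sketch of the plain Power method for PageRank. It does not include the relaxed or extrapolated variants the paper proposes, and the damping factor 0.85, the tolerance and the function name are conventional or hypothetical choices rather than values from the paper.

```python
def pagerank_power(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Plain power iteration; links[i] lists the pages that page i links to."""
    n = len(links)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by dangling pages (no out-links) is spread uniformly.
        dangling = sum(rank[i] for i in range(n) if not links[i])
        new = [(1.0 - damping) / n + damping * dangling / n] * n
        for i, outs in enumerate(links):
            if outs:
                share = damping * rank[i] / len(outs)
                for j in outs:
                    new[j] += share
        # Stop when the L1 change between iterates falls below the tolerance.
        if sum(abs(a - b) for a, b in zip(new, rank)) < tol:
            return new
        rank = new
    return rank


if __name__ == "__main__":
    # Page 0 links to 1 and 2, page 1 links to 2, page 2 links to 0.
    print(pagerank_power([[1, 2], [2], [0]]))
```

Every iteration amounts to one sparse matrix-vector product, so the way the link structure is distributed among workers largely determines parallel performance.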

Image Noise Removal on Heterogeneous CPU-GPU Configurations

Abstract:

A parallel algorithm to remove impulsive noise in digital images using heterogeneous CPU/GPU computing is proposed. The parallel denoising algorithm is based on the peer group concept and uses a Euclidean metric. In order to identify the number of pixels to be allocated to the multi-core CPU and to the GPUs, a performance analysis using large images is presented. A comparison of the parallel implementation on multi-core, on GPUs and on a combination of both is performed. Performance has been evaluated in terms of execution time and Megapixels/second. We present several optimization strategies especially effective for the multi-core environment, and demonstrate significant performance improvements. The main advantage of the proposed noise removal methodology is its computational speed, which enables efficient filtering of color images in real-time applications.
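
As a rough illustration of the peer group idea with a Euclidean metric, the serial sketch below flags a pixel as an impulse when too few of its 3 x 3 neighbours lie within a colour distance `d`, then replaces it with the neighbourhood mean. The thresholds `d` and `m`, the replacement rule and the function name are assumptions made for illustration; the paper's own detection and correction steps may differ.

```python
import math


def peer_group_filter(image, d=45.0, m=2):
    """image is a list of rows, each a list of (r, g, b) tuples."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            center = image[y][x]
            # Neighbours in the 3 x 3 window, clipped at the image border.
            neighbours = [image[j][i]
                          for j in range(max(0, y - 1), min(height, y + 2))
                          for i in range(max(0, x - 1), min(width, x + 2))
                          if (j, i) != (y, x)]
            # Peers are neighbours within Euclidean colour distance d.
            peers = sum(1 for p in neighbours if math.dist(center, p) <= d)
            if peers < m:
                # Too few similar neighbours: treat the pixel as an impulse
                # and replace it with the channel-wise neighbourhood mean.
                n = len(neighbours)
                out[y][x] = tuple(sum(p[c] for p in neighbours) / n
                                  for c in range(3))
    return out


if __name__ == "__main__":
    grey = (100, 100, 100)
    noisy = [[grey, grey, grey], [grey, (255, 0, 0), grey], [grey, grey, grey]]
    print(peer_group_filter(noisy)[1][1])
```

Each output pixel is computed only from the input image, so the two loops over `y` and `x` can be split freely between CPU threads and GPU threads, which is the property a CPU/GPU pixel partition relies on.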

Parallel implementation of stochastic simulation for large scale cellular processes

Abstract:

Experimental and theoretical studies have shown the importance of stochastic processes in genetic regulatory networks and cellular processes. Cellular networks and genetic circuits often involve small numbers of key proteins such as transcriptional factors and signaling proteins. In recent years stochastic models have been used successfully for studying noise in biological pathways, and stochastic modelling of biological systems has become a very important research field in computational biology. One of the challenging problems in this field is the reduction of the huge computing time in stochastic simulations. Based on the system of the mitogen-activated protein kinase cascade that is activated by epidermal growth factor, this work gives a parallel implementation using OpenMP and parallelism across the simulation. Special attention is paid to the independence of the generated random numbers in parallel computing, which is a key criterion for the success of stochastic simulations. Numerical results indicate that parallel computers can be used as an efficient tool for simulating the dynamics of large-scale genetic regulatory networks and cellular processes.
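
The sketch below illustrates "parallelism across the simulation" under simple assumptions: each replica of a stochastic simulation gets its own random number generator, so replicas never share a stream and can be run in any order or on any worker. The birth-death model, the rate constants and the seeding scheme are hypothetical stand-ins, not the MAP kinase system or the generator used in the paper.

```python
import random


def birth_death_ssa(rng, birth=10.0, death=1.0, x0=0, t_end=5.0):
    """Gillespie simulation of 0 -> X (rate birth) and X -> 0 (rate death * x)."""
    t, x = 0.0, x0
    while True:
        a_birth, a_death = birth, death * x
        total = a_birth + a_death
        t += rng.expovariate(total)  # waiting time to the next reaction
        if t > t_end:
            return x
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * total < a_birth:
            x += 1
        else:
            x -= 1


def run_replicas(n_replicas, base_seed=12345):
    """Run independent realizations, one private generator per replica."""
    results = []
    for r in range(n_replicas):
        # Distinct integer seeds are an illustrative shortcut; they do not by
        # themselves guarantee statistically independent streams.
        rng = random.Random(base_seed + r)
        results.append(birth_death_ssa(rng))
    return results


if __name__ == "__main__":
    print(run_replicas(8))
```

The loop in `run_replicas` is serial here, but its iterations share no state, so it is the natural place to distribute work. Production code would typically use a generator designed for parallel streams instead of consecutive seeds.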

Um algoritmo paralelo eficiente de migração reversa no tempo (rtm) 3d com granularidade fina [An efficient fine-grained parallel algorithm for 3D reverse time migration (RTM)]

Abstract:

The reverse time migration (RTM) algorithm has been widely used in the seismic industry to generate images of the subsurface and thus reduce the risk of oil and gas exploration. Its widespread use is due to its high quality in subsurface imaging. RTM is also known for its high computational cost. Therefore, parallel computing techniques have been used in its implementations. In general, parallel approaches for RTM use a coarse granularity by distributing the processing of a subset of seismic shots among nodes of distributed systems. Parallel approaches with coarse granularity for RTM have been shown to be very efficient since the processing of each seismic shot can be performed independently. For this reason, RTM algorithm performance can be considerably improved by using a parallel approach with finer granularity for the processing assigned to each node. This work presents an efficient parallel algorithm for 3D reverse time migration with fine granularity using OpenMP. The 3D acoustic wave propagation algorithm makes up much of the RTM. Different load-balancing strategies were analyzed in order to minimize possible losses in parallel performance at this stage. The results served as a basis for the implementation of the other phases of RTM: backpropagation and imaging condition. The proposed algorithm was tested with synthetic data representing some of the possible subsurface structures. Metrics such as speedup and efficiency were used to analyze its parallel performance. The migrated sections show that the algorithm obtained satisfactory performance in identifying subsurface structures. As for the parallel performance, the analysis clearly demonstrates the scalability of the algorithm, achieving a speedup of 22.46 for the propagation of the wave and 16.95 for the RTM, both with 24 threads.
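
The speedups quoted in this abstract can be turned into parallel efficiency, defined as speedup divided by the number of threads. The short sketch below does that arithmetic; the helper name is hypothetical and only the figures 22.46, 16.95 and 24 come from the abstract.

```python
def parallel_efficiency(speedup, workers):
    """Efficiency is the fraction of ideal linear speedup actually achieved."""
    return speedup / workers


# Figures reported in the abstract, both measured with 24 threads.
propagation = parallel_efficiency(22.46, 24)
full_rtm = parallel_efficiency(16.95, 24)

if __name__ == "__main__":
    print(f"wave propagation: {propagation:.3f}")
    print(f"complete RTM:     {full_rtm:.3f}")
```

This gives roughly 0.94 for wave propagation alone and roughly 0.71 for the complete RTM, so the phases other than propagation account for most of the lost efficiency.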

Ein Programm für die Parallelisierung dynamisch adaptiver Mehrgitterverfahren [A program for the parallelization of dynamically adaptive multigrid methods]

Abstract:

This work parallelizes dynamically adaptive multigrid methods. In a dynamically adaptive multigrid method, a domain is covered with a grid, and computation on this grid uses neighboring grid points to determine the value at the next time step. Coarser and finer grids are then generated and used, with the finer grids concentrating on subdomains. These subdomains change over time. The use of the additional grids improves the numerical properties. Such methods are usually parallelized by bisection. In the present work, the redistribution of the subdomains is realized by sending sets of individual grid points. This is a scheduling approach. The multigrid structures are built so that almost arbitrary distributions of grid points can exist on the grid levels. The structures are created once and changed only when needed, so that no memory allocations are necessary during the iterations. Besides the grid, additional structures are required, such as the boundary structures. A structure called Farbenfeld ("color field") records on which core an outer boundary point is located. In the parallel adaptive refinement, 5 x 5 point coverings are applied to individual grid points selected by a decision criterion. For this purpose, the available decision information is used to determine more complex structures. As a result, the refinement grid does not have to be completely torn down and rebuilt; only the changes to the grid have to be made. This saves a great deal of computation time. The last step is load balancing. First, the load transfer values are determined, which specify how many grid points are to be sent from where to where. This is done with a method called PLB, or a variant of it. PLB has so far been used mainly for combinatorial problems. Next, the grid points to be sent are selected, using a strategy that decides which points of a core are to be transferred to which neighboring cores. Finally, the selected points are migrated; all grid point structures are rebuilt, and the information must be packed so that the receiver can rebuild its own grid point structures. Besides the grid point structures, the structures for the parallel adaptive refinement must also be changed. It must be possible to forward grid points when load is sent over the load edges in several rounds. During load balancing, further work is carried out by a structure called Zwischenkorrektur ("intermediate correction"), which makes it possible to keep the Farbenfeld intact when neighboring grid points are sent at the same time.

GPRM: a high performance programming framework for manycore processors

Abstract:

Processors with large numbers of cores are becoming commonplace. In order to utilise the available resources in such systems, the programming paradigm has to move towards increased parallelism. However, increased parallelism does not necessarily lead to better performance. Parallel programming models have to provide not only flexible ways of defining parallel tasks, but also efficient methods to manage the created tasks. Moreover, in a general-purpose system, applications residing in the system compete for the shared resources. Thread and task scheduling in such a multiprogrammed multithreaded environment is a significant challenge. In this thesis, we introduce a new task-based parallel reduction model, called the Glasgow Parallel Reduction Machine (GPRM). Our main objective is to provide high performance while maintaining ease of programming. GPRM supports native parallelism; it provides a modular way of expressing parallel tasks and the communication patterns between them. Compiling a GPRM program results in an Intermediate Representation (IR) containing useful information about tasks, their dependencies, as well as the initial mapping information. This compile-time information helps reduce the overhead of runtime task scheduling and is key to high performance. Generally speaking, the granularity and the number of tasks are major factors in achieving high performance. These factors are even more important in the case of GPRM, as it is highly dependent on tasks, rather than threads. We use three basic benchmarks to provide a detailed comparison of GPRM with Intel OpenMP, Cilk Plus, and Threading Building Blocks (TBB) on the Intel Xeon Phi, and with GNU OpenMP on the Tilera TILEPro64. GPRM shows superior performance in almost all cases, only by controlling the number of tasks. GPRM also provides a low-overhead mechanism, called “Global Sharing”, which improves performance in multiprogramming situations. 
We use OpenMP, the most popular model for shared-memory parallel programming, as the main GPRM competitor for solving three well-known problems on both platforms: LU factorisation of Sparse Matrices, Image Convolution, and Linked List Processing. We focus on proposing solutions that best fit into the GPRM’s model of execution. GPRM outperforms OpenMP in all cases on the TILEPro64. On the Xeon Phi, our solution for the LU Factorisation results in notable performance improvement for sparse matrices with large numbers of small blocks. We investigate the overhead of GPRM’s task creation and distribution for very short computations using the Image Convolution benchmark. We show that this overhead can be mitigated by combining smaller tasks into larger ones. As a result, GPRM can outperform OpenMP for convolving large 2D matrices on the Xeon Phi. Finally, we demonstrate that our parallel worksharing construct provides an efficient solution for Linked List processing and performs better than OpenMP implementations on the Xeon Phi. The results are very promising, as they verify that our parallel programming framework for manycore processors is flexible and scalable, and can provide high performance without sacrificing productivity.

Problemas de corte : métodos exactos y aproximados para formulaciones mono y multi-objetivo [Cutting problems: exact and approximate methods for single- and multi-objective formulations]

Abstract:

Cutting and packing problems are a family of combinatorial optimization problems that have been widely studied in many areas of industry and research because of their relevance to an enormous variety of real applications. They arise in many production industries where a material or available space must be subdivided into smaller parts. There is a wide variety of methods for solving this type of optimization problem. When proposing a solution method for an optimization problem, it is advisable to take into account the approach and the requirements attached to the problem and its solution. Exact approaches find the optimal solution, but are only viable for very small problem instances. Heuristics use problem-specific knowledge to obtain high-quality solutions without excessive computational effort. Metaheuristics go one step further, since they are able to solve a very general class of computational problems. Finally, hyperheuristics try to automate, usually by incorporating learning techniques, the process of selecting, combining, generating or adapting simpler heuristics to solve optimization problems efficiently. Getting the most out of these methods requires knowing, in addition to the type of optimization (single- or multi-objective) and the size of the problem, the computational resources available, since parallel machines and implementations can considerably reduce the time needed to obtain a solution. In real industrial applications of cutting and packing problems, the difference between using a quickly obtained solution and using more sophisticated proposals to find the optimal solution can determine the survival of the company. However, developing more sophisticated and effective proposals usually involves a large computational effort, which in real applications can slow down the production process. Designing proposals that are both effective and efficient is therefore fundamental. For this reason, the main objective of this work is the design and implementation of effective and efficient methods for solving different cutting and packing problems. Moreover, if these methods are defined as schemes that are as general as possible, they can be applied to different cutting and packing problems without many changes to adapt them to each one. Thus, taking into account the wide range of methodologies for solving optimization problems and the techniques available for increasing their efficiency, several methods have been designed and implemented to solve various cutting and packing problems, seeking to improve on the proposals existing in the literature. The problems addressed are the Two-Dimensional Cutting Stock Problem, the Two-Dimensional Strip Packing Problem, and the Container Loading Problem. For each of these problems, a broad and thorough literature review was carried out, and the chosen variants were solved with different methods: exact single-objective methods and their parallelizations, and approximate multi-objective methods and their parallelizations. The exact single-objective methods are based on tree search techniques. As approximate multi-objective methods, multi-objective metaheuristics, namely MOEAs, were selected. In addition, the individuals used by these methods are represented by direct encodings in postfix notation and by encodings that use placement heuristics and hyperheuristics. Some of these methodologies were improved with parallel schemes using the OpenMP and MPI programming tools.
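
To make the idea of a direct encoding in postfix notation concrete, the sketch below evaluates a layout written as piece identifiers and two combining operators using a stack. The operator names "H" (side by side) and "V" (one above the other) and their exact semantics are assumptions for illustration; the thesis's own encoding may differ.

```python
def evaluate_layout(postfix, pieces):
    """Return (width, height) of the bounding rectangle of a postfix layout.

    postfix is a sequence of integer piece ids and the operators "H" and "V".
    pieces maps each piece id to its (width, height).
    """
    stack = []
    for token in postfix:
        if token in ("H", "V"):
            w2, h2 = stack.pop()
            w1, h1 = stack.pop()
            if token == "H":  # place the two blocks side by side
                stack.append((w1 + w2, max(h1, h2)))
            else:             # place one block above the other
                stack.append((max(w1, w2), h1 + h2))
        else:
            stack.append(pieces[token])
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


def wasted_area(postfix, pieces):
    """Bounding-rectangle area minus the total area of the pieces placed."""
    width, height = evaluate_layout(postfix, pieces)
    used = sum(pieces[t][0] * pieces[t][1] for t in postfix
               if t not in ("H", "V"))
    return width * height - used


if __name__ == "__main__":
    pieces = {1: (2, 3), 2: (4, 1), 3: (5, 2)}
    layout = [1, 2, "H", 3, "V"]
    print(evaluate_layout(layout, pieces), wasted_area(layout, pieces))
```

Because a layout is a flat token sequence, an evolutionary algorithm can apply crossover and mutation to it directly and score each individual with a function such as `wasted_area`.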
