999 resultados para data-parallelism


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Abstract is not available.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Dissertação para obtenção do Grau de Mestre em Engenharia Informática

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The past few decades have seen a considerable increase in the number of parallel and distributed systems. With the development of more complex applications, the need for more powerful systems has emerged and various parallel and distributed environments have been designed and implemented. Each of the environments, including hardware and software, has unique strengths and weaknesses. There is no single parallel environment that can be identified as the best environment for all applications with respect to hardware and software properties. The main goal of this thesis is to provide a novel way of performing data-parallel computation in parallel and distributed environments by utilizing the best characteristics of difference aspects of parallel computing. For the purpose of this thesis, three aspects of parallel computing were identified and studied. First, three parallel environments (shared memory, distributed memory, and a network of workstations) are evaluated to quantify theirsuitability for different parallel applications. Due to the parallel and distributed nature of the environments, networks connecting the processors in these environments were investigated with respect to their performance characteristics. Second, scheduling algorithms are studied in order to make them more efficient and effective. A concept of application-specific information scheduling is introduced. The application- specific information is data about the workload extractedfrom an application, which is provided to a scheduling algorithm. Three scheduling algorithms are enhanced to utilize the application-specific information to further refine their scheduling properties. A more accurate description of the workload is especially important in cases where the workunits are heterogeneous and the parallel environment is heterogeneous and/or non-dedicated. The results obtained show that the additional information regarding the workload has a positive impact on the performance of applications. Third, a programming paradigm for networks of symmetric multiprocessor (SMP) workstations is introduced. The MPIT programming paradigm incorporates the Message Passing Interface (MPI) with threads to provide a methodology to write parallel applications that efficiently utilize the available resources and minimize the overhead. The MPIT allows for communication and computation to overlap by deploying a dedicated thread for communication. Furthermore, the programming paradigm implements an application-specific scheduling algorithm. The scheduling algorithm is executed by the communication thread. Thus, the scheduling does not affect the execution of the parallel application. Performance results achieved from the MPIT show that considerable improvements over conventional MPI applications are achieved.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Much work has been done in the áreas of and-parallelism and data parallelism in Logic Programs. Such work has proceeded to a certain extent in an independent fashion. Both types of parallelism offer advantages and disadvantages. Traditional (and-) parallel models offer generality, being able to exploit parallelism in a large class of programs (including that exploited by data parallelism techniques). Data parallelism techniques on the other hand offer increased performance for a restricted class of programs. The thesis of this paper is that these two forms of parallelism are not fundamentally different and that relating them opens the possibility of obtaining the advantages of both within the same system. Some relevant issues are discussed and solutions proposed. The discussion is illustrated through visualizations of actual parallel executions implementing the ideas proposed.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Este documento refleja el estudio de investigación para la detección de factores que afectan al rendimiento en entornos multicore. Debido a la gran diversidad de arquitecturas multicore se ha definido un marco de trabajo, que consiste en la adopción de una arquitectura específica, un modelo de programación basado en paralelismo de datos, y aplicaciones del tipo Single Program Multiple Data. Una vez definido el marco de trabajo, se han evaluado los factores de rendimiento con especial atención al modelo de programación. Por este motivo, se ha analizado la librería de threads y la API OpenMP para detectar aquellas funciones sensibles de ser sintonizadas al permitir un comportamiento adaptativo de la aplicación al entorno, y que dependiendo de su adecuada utilización han de mejorar el rendimiento de la aplicación.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We propose a bridge between two important parallel programming paradigms: data parallelism and communicating sequential processes (CSP). Data parallel pipelined architectures obtained with the Alpha language can be embedded in a control intensive application expressed in CSP-based Handel formalism. The interface is formally defined from the semantics of the languages Alpha and Handel. This work will ease the design of compute intensive applications on FPGAs.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

L’obiettivo di questa tesi `e l’estensione della conoscenza di un argomento già ampliamente conosciuto e ricercato. Questo lavoro focalizza la propria attenzione su una nicchia dell’ampio mondo della virtualizzazione, del machine learning e delle tecniche di apprendimento parallelo. Nella prima parte verranno spiegati alcuni concetti teorici chiave per la virtualizzazione, ponendo una maggior attenzione verso argomenti di maggior importanza per questo lavoro. La seconda parte si propone di illustrare, in modo teorico, le tecniche usate nelle fasi di training di reti neurali. La terza parte, attraverso una parte progettuale, analizza le diverse tecniche individuate applicandole ad un ambiente containerizzato.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Introduction: In premolar extraction cases, root parallelism is recommended to preserve the stability of space closures. The influence of the degree of root parallelism on relapse of tooth extraction spaces has been a controversial topic in the literature. The aim of this study was to compare the angle between the long axes of the canine and the second premolarin patients with and without stability of extraction-space closures. Methods: A sample of 56 patients, treated with 4 premolar extractions, was divided into 2 groups: group 1, consisting of 25 patients with reopening of extraction spaces; and group 2, consisting of 31 patients without reopening of extraction spaces. Panoramic radiographs of each patient were analyzed at the posttreatment and 1-year posttreatment stages. The data were statistically analyzed by using chi-square tests, t tests, analysis of variance (ANOVA), and Pearson correlation coefficients. Results: The results showed that the groups did not differ regarding the angle between the canine and the second premolar, and there was no correlation between angular changes and reopening of extraction spaces, showing that dental angular changes are not determining factors for relapse, and other factors should be investigated. Conclusions: The final angle and the posttreatment changes observed in the angle between the long axes of the canine and the second premolar showed no influence on the relapse of extraction spaces. (Am J Orthod Dentofacial Orthop 2011; 139: e505-e510)

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Recently major processor manufacturers have announced a dramatic shift in their paradigm to increase computing power over the coming years. Instead of focusing on faster clock speeds and more powerful single core CPUs, the trend clearly goes towards multi core systems. This will also result in a paradigm shift for the development of algorithms for computationally expensive tasks, such as data mining applications. Obviously, work on parallel algorithms is not new per se but concentrated efforts in the many application domains are still missing. Multi-core systems, but also clusters of workstations and even large-scale distributed computing infrastructures provide new opportunities and pose new challenges for the design of parallel and distributed algorithms. Since data mining and machine learning systems rely on high performance computing systems, research on the corresponding algorithms must be on the forefront of parallel algorithm research in order to keep pushing data mining and machine learning applications to be more powerful and, especially for the former, interactive. To bring together researchers and practitioners working in this exciting field, a workshop on parallel data mining was organized as part of PKDD/ECML 2006 (Berlin, Germany). The six contributions selected for the program describe various aspects of data mining and machine learning approaches featuring low to high degrees of parallelism: The first contribution focuses the classic problem of distributed association rule mining and focuses on communication efficiency to improve the state of the art. After this a parallelization technique for speeding up decision tree construction by means of thread-level parallelism for shared memory systems is presented. The next paper discusses the design of a parallel approach for dis- tributed memory systems of the frequent subgraphs mining problem. This approach is based on a hierarchical communication topology to solve issues related to multi-domain computational envi- ronments. The forth paper describes the combined use and the customization of software packages to facilitate a top down parallelism in the tuning of Support Vector Machines (SVM) and the next contribution presents an interesting idea concerning parallel training of Conditional Random Fields (CRFs) and motivates their use in labeling sequential data. The last contribution finally focuses on very efficient feature selection. It describes a parallel algorithm for feature selection from random subsets. Selecting the papers included in this volume would not have been possible without the help of an international Program Committee that has provided detailed reviews for each paper. We would like to also thank Matthew Otey who helped with publicity for the workshop.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The simulated annealing approach to crystal structure determination from powder diffraction data, as implemented in the DASH program, is readily amenable to parallelization at the individual run level. Very large scale increases in speed of execution can be achieved by distributing individual DASH runs over a network of computers. The CDASH program delivers this by using scalable on-demand computing clusters built on the Amazon Elastic Compute Cloud service. By way of example, a 360 vCPU cluster returned the crystal structure of racemic ornidazole (Z0 = 3, 30 degrees of freedom) ca 40 times faster than a typical modern quad-core desktop CPU. Whilst used here specifically for DASH, this approach is of general applicability to other packages that are amenable to coarse-grained parallelism strategies.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this article we explore the NVIDIA graphical processing units (GPU) computational power in cryptography using CUDA (Compute Unified Device Architecture) technology. CUDA makes the general purpose computing easy using the parallel processing presents in GPUs. To do this, the NVIDIA GPUs architectures and CUDA are presented, besides cryptography concepts. Furthermore, we do the comparison between the versions executed in CPU with the parallel version of the cryptography algorithms Advanced Encryption Standard (AES) and Message-digest Algorithm 5 (MD5) wrote in CUDA. © 2011 AISTI.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Current scientific applications have been producing large amounts of data. The processing, handling and analysis of such data require large-scale computing infrastructures such as clusters and grids. In this area, studies aim at improving the performance of data-intensive applications by optimizing data accesses. In order to achieve this goal, distributed storage systems have been considering techniques of data replication, migration, distribution, and access parallelism. However, the main drawback of those studies is that they do not take into account application behavior to perform data access optimization. This limitation motivated this paper which applies strategies to support the online prediction of application behavior in order to optimize data access operations on distributed systems, without requiring any information on past executions. In order to accomplish such a goal, this approach organizes application behaviors as time series and, then, analyzes and classifies those series according to their properties. By knowing properties, the approach selects modeling techniques to represent series and perform predictions, which are, later on, used to optimize data access operations. This new approach was implemented and evaluated using the OptorSim simulator, sponsored by the LHC-CERN project and widely employed by the scientific community. Experiments confirm this new approach reduces application execution time in about 50 percent, specially when handling large amounts of data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Although studies of a number of parallel implementations of logic programming languages are now available, their results are difficult to interpret due to the multiplicity of factors involved, the effect of each of which is difficult to sepárate. In this paper we present the results of a high-level simulation study of or- and independent and-parallelism with a wide selection of Prolog programs that aims to determine the intrinsic amount of parallelism, independently of implementation factors, thus facilitating this separation. We expect this study will be instrumental in better understanding and comparing results from actual implementations, as shown by some examples provided in the paper. In addition, the paper examines some of the issues and tradeoffs associated with the combination of and- and or-parallelism and proposes reasonable solutions based on the simulation data obtained.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The &-Prolog system, a practical implementation of a parallel execution niodel for Prolog exploiting strict and non-strict independent and-parallelism, is described. Both automatic and manual parallelization of programs is supported. This description includes a summary of the system's language and architecture, some details of its execution model (based on the RAP-WAM model), and data on its performance on sequential workstations and shared memory multiprocessors, which is compared to that of current Prolog systems. The results to date show significant speed advantages over state-of-the-art sequential systems.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Although studies of a number of parallel implementations of logic programming languages are now available, the results are difficult to interpret due to the multiplicity of factors involved, the effect of each of which is difficult to sepárate. In this paper we present the results of a highlevel simulation study of or- and independent and-parallelism with a wide selection of Prolog programs that aims to facilítate this separation. We hope this study will be instrumental in better understanding and comparing results from actual implementations, as shown by an example in the paper. In addition, the paper examines some of the issues and tradeoffs associated with the combination of and- and or-parallelism and proposes reasonable solutions based on the simulation data.