503 resultados para algorithmic skeletons
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Informática
Resumo:
The Graphics Processing Unit (GPU) is present in almost every modern day personal computer. Despite its specific purpose design, they have been increasingly used for general computations with very good results. Hence, there is a growing effort from the community to seamlessly integrate this kind of devices in everyday computing. However, to fully exploit the potential of a system comprising GPUs and CPUs, these devices should be presented to the programmer as a single platform. The efficient combination of the power of CPU and GPU devices is highly dependent on each device’s characteristics, resulting in platform specific applications that cannot be ported to different systems. Also, the most efficient work balance among devices is highly dependable on the computations to be performed and respective data sizes. In this work, we propose a solution for heterogeneous environments based on the abstraction level provided by algorithmic skeletons. Our goal is to take full advantage of the power of all CPU and GPU devices present in a system, without the need for different kernel implementations nor explicit work-distribution.To that end, we extended Marrow, an algorithmic skeleton framework for multi-GPUs, to support CPU computations and efficiently balance the work-load between devices. Our approach is based on an offline training execution that identifies the ideal work balance and platform configurations for a given application and input data size. The evaluation of this work shows that the combination of CPU and GPU devices can significantly boost the performance of our benchmarks in the tested environments, when compared to GPU-only executions.
Resumo:
The Intel R Xeon PhiTM is the first processor based on Intel’s MIC (Many Integrated Cores) architecture. It is a co-processor specially tailored for data-parallel computations, whose basic architectural design is similar to the ones of GPUs (Graphics Processing Units), leveraging the use of many integrated low computational cores to perform parallel computations. The main novelty of the MIC architecture, relatively to GPUs, is its compatibility with the Intel x86 architecture. This enables the use of many of the tools commonly available for the parallel programming of x86-based architectures, which may lead to a smaller learning curve. However, programming the Xeon Phi still entails aspects intrinsic to accelerator-based computing, in general, and to the MIC architecture, in particular. In this thesis we advocate the use of algorithmic skeletons for programming the Xeon Phi. Algorithmic skeletons abstract the complexity inherent to parallel programming, hiding details such as resource management, parallel decomposition, inter-execution flow communication, thus removing these concerns from the programmer’s mind. In this context, the goal of the thesis is to lay the foundations for the development of a simple but powerful and efficient skeleton framework for the programming of the Xeon Phi processor. For this purpose we build upon Marrow, an existing framework for the orchestration of OpenCLTM computations in multi-GPU and CPU environments. We extend Marrow to execute both OpenCL and C++ parallel computations on the Xeon Phi. We evaluate the newly developed framework, several well-known benchmarks, like Saxpy and N-Body, will be used to compare, not only its performance to the existing framework when executing on the co-processor, but also to assess the performance on the Xeon Phi versus a multi-GPU environment.
Resumo:
A Digital Breast Tomosynthesis (DBT) é uma técnica que permite obter imagens mamárias 3D de alta qualidade, que só podem ser obtidas através de métodos de re-construção. Os métodos de reconstrução mais rápidos são os iterativos, sendo no en-tanto computacionalmente exigentes, necessitando de sofrer muitas optimizações. Exis-tem optimizações que usam computação paralela através da implementação em GPUs usando CUDA. Como é sabido, o desenvolvimento de programas eficientes que usam GPUs é ainda uma tarefa demorada, dado que os modelos de programação disponíveis são de baixo nível, e a portabilidade do código para outras arquitecturas não é imedia-ta. É uma mais valia poder criar programas paralelos de forma rápida, com possibili-dade de serem usados em diferentes arquitecturas, sem exigir muitos conhecimentos sobre a arquitectura subjacente e sobre os modelos de programação de baixo nível. Para resolver este problema, propomos a utilização de soluções existentes que reduzam o esforço de paralelização, permitindo a sua portabilidade, garantindo ao mesmo tempo um desempenho aceitável. Para tal, vamos utilizar um framework (FastFlow) com suporte para Algorithmic Skeletons, que tiram partido da programação paralela estruturada, capturando esquemas/padrões recorrentes que são comuns na programação paralela. O trabalho realizado centrou-se na paralelização de uma das fases de reconstru-ção da imagem 3D – geração da matriz de sistema – que é uma das mais demoradas do processo de reconstrução; esse trabalho incluiu um método de ordenação modificado em relação ao existente. Foram realizadas diferentes implementações em CPU e GPU (usando OpenMP, CUDA e FastFlow) o que permitiu comparar estes ambientes de programação em termos de facilidade de desenvolvimento e eficiência da solução. A comparação feita permite concluir que o desempenho das soluções baseadas no FastFlow não é muito diferente das tradicionais o que sugere que ferramentas deste tipo podem simplificar e agilizar a implementação de um algoritmos na área de recons-trução de imagens 3D, mantendo um bom desempenho.
Resumo:
Structured parallel programming, and in particular programming models using the algorithmic skeleton or parallel design pattern concepts, are increasingly considered to be the only viable means of supporting effective development of scalable and efficient parallel programs. Structured parallel programming models have been assessed in a number of works in the context of performance. In this paper we consider how the use of structured parallel programming models allows knowledge of the parallel patterns present to be harnessed to address both performance and energy consumption. We consider different features of structured parallel programming that may be leveraged to impact the performance/energy trade-off and we discuss a preliminary set of experiments validating our claims.
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Informática
Resumo:
Background: Several plasma membrane transporters have been shown to mediate the cellular influx and/or efflux of iodothyronines, including the sodium-independent organic anion co-transporting polypeptide 1 (OATP1), the sodium taurocholate co-transporting polypeptide (NTCP), the L-type amino acid transporter 1 (LAT1) and 2 (LAT2), and the monocarboxylate transporter 8 (MCT8). The aim of this study was to investigate if the mRNAs of these transporters were expressed and regulated by thyroid hormone (TH) in mouse calvaria-derived osteoblastic MC3T3-E1 cells and in the fetal and postnatal bones of mice. Methods: The mRNA expression of the iodothyronine transporters was investigated with real-time polymerase chain reaction analysis in euthyroid and hypothyroid fetuses and litters of mice and in MC3T3-E1 cells treated with increasing doses of triiodothyronine (T(3); 10(-10) to 10(-6) M) or with 10(-8) M T(3) for 1-9 days. Results: MCT8, LAT1, and LAT2 mRNAs were detected in fetal and postnatal femurs and in MC3T3-E1 cells, while OATP1 and NTCP mRNAs were not. LAT1 and LAT2 mRNAs were not affected by TH status in vivo or in vitro or by the stage of bone development or osteoblast maturation (analyzed by the expression of osteocalcin and alkaline phosphatase, which are key markers of osteoblastic differentiation). In contrast, the femoral mRNA expression of MCT8 decreased significantly during post-natal development, whereas MCT8 mRNA expression increased as MC3T3-E1 cells differentiated. We also showed that MCT8 mRNA was up-regulated in the femur of hypothyroid animals, and that it was down-regulated by treatment with T(3) in MC3T3-E1 cells. Conclusions: This is the first study to demonstrate the mRNA expression of LAT1, LAT2, and MCT8 in the bone tissue of mice and in osteoblast-like cells. In addition, the pattern of MCT8 expression observed in vivo and in vitro suggests that MCT8 may be important to modulate TH effects on osteoblast differentiation and on bone development and metabolism.
Resumo:
An (n, d)-expander is a graph G = (V, E) such that for every X subset of V with vertical bar X vertical bar <= 2n - 2 we have vertical bar Gamma(G)(X) vertical bar >= (d + 1) vertical bar X vertical bar. A tree T is small if it has at most n vertices and has maximum degree at most d. Friedman and Pippenger (1987) proved that any ( n; d)- expander contains every small tree. However, their elegant proof does not seem to yield an efficient algorithm for obtaining the tree. In this paper, we give an alternative result that does admit a polynomial time algorithm for finding the immersion of any small tree in subgraphs G of (N, D, lambda)-graphs Lambda, as long as G contains a positive fraction of the edges of Lambda and lambda/D is small enough. In several applications of the Friedman-Pippenger theorem, including the ones in the original paper of those authors, the (n, d)-expander G is a subgraph of an (N, D, lambda)-graph as above. Therefore, our result suffices to provide efficient algorithms for such previously non-constructive applications. As an example, we discuss a recent result of Alon, Krivelevich, and Sudakov (2007) concerning embedding nearly spanning bounded degree trees, the proof of which makes use of the Friedman-Pippenger theorem. We shall also show a construction inspired on Wigderson-Zuckerman expander graphs for which any sufficiently dense subgraph contains all trees of sizes and maximum degrees achieving essentially optimal parameters. Our algorithmic approach is based on a reduction of the tree embedding problem to a certain on-line matching problem for bipartite graphs, solved by Aggarwal et al. (1996).
Resumo:
For more than 30 million years, in early Mesozoic Pangea, ""rauisuchian"" archosaurs were the apex predators in most terrestrial ecosystems, but their biology and evolutionary history remain poorly understood. We describe a new ""rauisuchian"" based on ten individuals found in a single locality from the Middle Triassic (Ladinian) Santa Maria Formation of southern Brazil. Nine articulated and associated skeletons were discovered, three of which have nearly complete skulls. Along with sedimentological and taphonomic data, this suggests that those highly successful predators exhibited some kind of intraspecific interaction. Other monotaxic assemblages of Triassic archosaurs are Late Triassic (Norian-Rhaetian) in age, approximately 10 million years younger than the material described here. Indeed, the studied assemblage may represent the earliest evidence of gregariousness among archosaurs, adding to our knowledge on the origin of a behavior pattern typical of extant taxa.
Resumo:
IEEE International Symposium on Circuits and Systems, pp. 220 – 223, Seattle, EUA
Resumo:
For a given self-map f of M, a closed smooth connected and simply-connected manifold of dimension m ≥ 4, we provide an algorithm for estimating the values of the topological invariant Dm r [f], which equals the minimal number of r-periodic points in the smooth homotopy class of f. Our results are based on the combinatorial scheme for computing Dm r [f] introduced by G. Graff and J. Jezierski [J. Fixed Point Theory Appl. 13 (2013), 63–84]. An open-source implementation of the algorithm programmed in C++ is publicly available at http://www.pawelpilarczyk.com/combtop/.
Resumo:
Magdeburg, Univ., Fak. für Maschinenbau, Diss., 2013
Resumo:
This article presents a feasibility study with the objective of investigating the potential of multi-detector computed tomography (MDCT) to estimate the bone age and sex of deceased persons. To obtain virtual skeletons, the bodies of 22 deceased persons with known age at death were scanned by MDCT using a special protocol that consisted of high-resolution imaging of the skull, shoulder girdle (including the upper half of the humeri), the symphysis pubis and the upper halves of the femora. Bone and soft-tissue reconstructions were performed in two and three dimensions. The resulting data were investigated by three anthropologists with different professional experience. Sex was determined by investigating three-dimensional models of the skull and pelvis. As a basic orientation for the age estimation, the complex method according to Nemeskéri and co-workers was applied. The final estimation was effected using additional parameters like the state of dentition, degeneration of the spine, etc., which where chosen individually by the three observers according to their experience. The results of the study show that the estimation of sex and age is possible by the use of MDCT. Virtual skeletons present an ideal collection for anthropological studies, because they are obtained in a non-invasive way and can be investigated ad infinitum.
Resumo:
We present building blocks for algorithms for the efficient reduction of square factor, i.e. direct repetitions in strings. So the basic problem is this: given a string, compute all strings that can be obtained by reducing factors of the form zz to z. Two types of algorithms are treated: an offline algorithm is one that can compute a data structure on the given string in advance before the actual search for the square begins; in contrast, online algorithms receive all input only at the time when a request is made. For offline algorithms we treat the following problem: Let u and w be two strings such that w is obtained from u by reducing a square factor zz to only z. If we further are given the suffix table of u, how can we derive the suffix table for w without computing it from scratch? As the suffix table plays a key role in online algorithms for the detection of squares in a string, this derivation can make the iterated reduction of squares more efficient. On the other hand, we also show how a suffix array, used for the offline detection of squares, can be adapted to the new string resulting from the deletion of a square. Because the deletion is a very local change, this adaption is more eficient than the computation of the new suffix array from scratch.
Resumo:
A mobile ad hoc network (MANET) is a decentralized and infrastructure-less network. This thesis aims to provide support at the system-level for developers of applications or protocols in such networks. To do this, we propose contributions in both the algorithmic realm and in the practical realm. In the algorithmic realm, we contribute to the field by proposing different context-aware broadcast and multicast algorithms in MANETs, namely six-shot broadcast, six-shot multicast, PLAN-B and ageneric algorithmic approach to optimize the power consumption of existing algorithms. For each algorithm we propose, we compare it to existing algorithms that are either probabilistic or context-aware, and then we evaluate their performance based on simulations. We demonstrate that in some cases, context-aware information, such as location or signal-strength, can improve the effciency. In the practical realm, we propose a testbed framework, namely ManetLab, to implement and to deploy MANET-specific protocols, and to evaluate their performance. This testbed framework aims to increase the accuracy of performance evaluation compared to simulations, while keeping the ease of use offered by the simulators to reproduce a performance evaluation. By evaluating the performance of different probabilistic algorithms with ManetLab, we observe that both simulations and testbeds should be used in a complementary way. In addition to the above original contributions, we also provide two surveys about system-level support for ad hoc communications in order to establish a state of the art. The first is about existing broadcast algorithms and the second is about existing middleware solutions and the way they deal with privacy and especially with location privacy. - Un réseau mobile ad hoc (MANET) est un réseau avec une architecture décentralisée et sans infrastructure. Cette thèse vise à fournir un support adéquat, au niveau système, aux développeurs d'applications ou de protocoles dans de tels réseaux. Dans ce but, nous proposons des contributions à la fois dans le domaine de l'algorithmique et dans celui de la pratique. Nous contribuons au domaine algorithmique en proposant différents algorithmes de diffusion dans les MANETs, algorithmes qui sont sensibles au contexte, à savoir six-shot broadcast,six-shot multicast, PLAN-B ainsi qu'une approche générique permettant d'optimiser la consommation d'énergie de ces algorithmes. Pour chaque algorithme que nous proposons, nous le comparons à des algorithmes existants qui sont soit probabilistes, soit sensibles au contexte, puis nous évaluons leurs performances sur la base de simulations. Nous montrons que, dans certains cas, des informations liées au contexte, telles que la localisation ou l'intensité du signal, peuvent améliorer l'efficience de ces algorithmes. Sur le plan pratique, nous proposons une plateforme logicielle pour la création de bancs d'essai, intitulé ManetLab, permettant d'implémenter, et de déployer des protocoles spécifiques aux MANETs, de sorte à évaluer leur performance. Cet outil logiciel vise à accroître la précision desévaluations de performance comparativement à celles fournies par des simulations, tout en conservant la facilité d'utilisation offerte par les simulateurs pour reproduire uneévaluation de performance. En évaluant les performances de différents algorithmes probabilistes avec ManetLab, nous observons que simulateurs et bancs d'essai doivent être utilisés de manière complémentaire. En plus de ces contributions principales, nous fournissons également deux états de l'art au sujet du support nécessaire pour les communications ad hoc. Le premier porte sur les algorithmes de diffusion existants et le second sur les solutions de type middleware existantes et la façon dont elles traitent de la confidentialité, en particulier celle de la localisation.