933 resultados para Parallel computing


Relevância:

60.00% 60.00%

Publicador:

Resumo:

The scale down of transistor technology allows microelectronics manufacturers such as Intel and IBM to build always more sophisticated systems on a single microchip. The classical interconnection solutions based on shared buses or direct connections between the modules of the chip are becoming obsolete as they struggle to sustain the increasing tight bandwidth and latency constraints that these systems demand. The most promising solution for the future chip interconnects are the Networks on Chip (NoC). NoCs are network composed by routers and channels used to inter- connect the different components installed on the single microchip. Examples of advanced processors based on NoC interconnects are the IBM Cell processor, composed by eight CPUs that is installed on the Sony Playstation III and the Intel Teraflops pro ject composed by 80 independent (simple) microprocessors. On chip integration is becoming popular not only in the Chip Multi Processor (CMP) research area but also in the wider and more heterogeneous world of Systems on Chip (SoC). SoC comprehend all the electronic devices that surround us such as cell-phones, smart-phones, house embedded systems, automotive systems, set-top boxes etc... SoC manufacturers such as ST Microelectronics , Samsung, Philips and also Universities such as Bologna University, M.I.T., Berkeley and more are all proposing proprietary frameworks based on NoC interconnects. These frameworks help engineers in the switch of design methodology and speed up the development of new NoC-based systems on chip. In this Thesis we propose an introduction of CMP and SoC interconnection networks. Then focusing on SoC systems we propose: • a detailed analysis based on simulation of the Spidergon NoC, a ST Microelectronics solution for SoC interconnects. The Spidergon NoC differs from many classical solutions inherited from the parallel computing world. Here we propose a detailed analysis of this NoC topology and routing algorithms. Furthermore we propose aEqualized a new routing algorithm designed to optimize the use of the resources of the network while also increasing its performance; • a methodology flow based on modified publicly available tools that combined can be used to design, model and analyze any kind of System on Chip; • a detailed analysis of a ST Microelectronics-proprietary transport-level protocol that the author of this Thesis helped developing; • a simulation-based comprehensive comparison of different network interface designs proposed by the author and the researchers at AST lab, in order to integrate shared-memory and message-passing based components on a single System on Chip; • a powerful and flexible solution to address the time closure exception issue in the design of synchronous Networks on Chip. Our solution is based on relay stations repeaters and allows to reduce the power and area demands of NoC interconnects while also reducing its buffer needs; • a solution to simplify the design of the NoC by also increasing their performance and reducing their power and area consumption. We propose to replace complex and slow virtual channel-based routers with multiple and flexible small Multi Plane ones. This solution allows us to reduce the area and power dissipation of any NoC while also increasing its performance especially when the resources are reduced. This Thesis has been written in collaboration with the Advanced System Technology laboratory in Grenoble France, and the Computer Science Department at Columbia University in the city of New York.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Ultrasound imaging is widely used in medical diagnostics as it is the fastest, least invasive, and least expensive imaging modality. However, ultrasound images are intrinsically difficult to be interpreted. In this scenario, Computer Aided Detection (CAD) systems can be used to support physicians during diagnosis providing them a second opinion. This thesis discusses efficient ultrasound processing techniques for computer aided medical diagnostics, focusing on two major topics: (i) Ultrasound Tissue Characterization (UTC), aimed at characterizing and differentiating between healthy and diseased tissue; (ii) Ultrasound Image Segmentation (UIS), aimed at detecting the boundaries of anatomical structures to automatically measure organ dimensions and compute clinically relevant functional indices. Research on UTC produced a CAD tool for Prostate Cancer detection to improve the biopsy protocol. In particular, this thesis contributes with: (i) the development of a robust classification system; (ii) the exploitation of parallel computing on GPU for real-time performance; (iii) the introduction of both an innovative Semi-Supervised Learning algorithm and a novel supervised/semi-supervised learning scheme for CAD system training that improve system performance reducing data collection effort and avoiding collected data wasting. The tool provides physicians a risk map highlighting suspect tissue areas, allowing them to perform a lesion-directed biopsy. Clinical validation demonstrated the system validity as a diagnostic support tool and its effectiveness at reducing the number of biopsy cores requested for an accurate diagnosis. For UIS the research developed a heart disease diagnostic tool based on Real-Time 3D Echocardiography. Thesis contributions to this application are: (i) the development of an automated GPU based level-set segmentation framework for 3D images; (ii) the application of this framework to the myocardium segmentation. Experimental results showed the high efficiency and flexibility of the proposed framework. Its effectiveness as a tool for quantitative analysis of 3D cardiac morphology and function was demonstrated through clinical validation.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The Vrancea region, at the south-eastern bend of the Carpathian Mountains in Romania, represents one of the most puzzling seismically active zones of Europe. Beside some shallow seismicity spread across the whole Romanian territory, Vrancea is the place of an intense seismicity with the presence of a cluster of intermediate-depth foci placed in a narrow nearly vertical volume. Although large-scale mantle seismic tomographic studies have revealed the presence of a narrow, almost vertical, high-velocity body in the upper mantle, the nature and the geodynamic of this deep intra-continental seismicity is still questioned. High-resolution seismic tomography could help to reveal more details in the subcrustal structure of Vrancea. Recent developments in computational seismology as well as the availability of parallel computing now allow to potentially retrieve more information out of seismic waveforms and to reach such high-resolution models. This study was aimed to evaluate the application of a full waveform inversion tomography at regional scale for the Vrancea lithosphere using data from the 1999 six months temporary local network CALIXTO. Starting from a detailed 3D Vp, Vs and density model, built on classical travel-time tomography together with gravity data, I evaluated the improvements obtained with the full waveform inversion approach. The latter proved to be highly problem dependent and highly computational expensive. The model retrieved after the first two iterations does not show large variations with respect to the initial model but remains in agreement with previous tomographic models. It presents a well-defined downgoing slab shape high velocity anomaly, composed of a N-S horizontal anomaly in the depths between 40 and 70km linked to a nearly vertical NE-SW anomaly from 70 to 180km.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

During the last few decades an unprecedented technological growth has been at the center of the embedded systems design paramount, with Moore’s Law being the leading factor of this trend. Today in fact an ever increasing number of cores can be integrated on the same die, marking the transition from state-of-the-art multi-core chips to the new many-core design paradigm. Despite the extraordinarily high computing power, the complexity of many-core chips opens the door to several challenges. As a result of the increased silicon density of modern Systems-on-a-Chip (SoC), the design space exploration needed to find the best design has exploded and hardware designers are in fact facing the problem of a huge design space. Virtual Platforms have always been used to enable hardware-software co-design, but today they are facing with the huge complexity of both hardware and software systems. In this thesis two different research works on Virtual Platforms are presented: the first one is intended for the hardware developer, to easily allow complex cycle accurate simulations of many-core SoCs. The second work exploits the parallel computing power of off-the-shelf General Purpose Graphics Processing Units (GPGPUs), with the goal of an increased simulation speed. The term Virtualization can be used in the context of many-core systems not only to refer to the aforementioned hardware emulation tools (Virtual Platforms), but also for two other main purposes: 1) to help the programmer to achieve the maximum possible performance of an application, by hiding the complexity of the underlying hardware. 2) to efficiently exploit the high parallel hardware of many-core chips in environments with multiple active Virtual Machines. This thesis is focused on virtualization techniques with the goal to mitigate, and overtake when possible, some of the challenges introduced by the many-core design paradigm.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In questa tesi sono stati apportati due importanti contributi nel campo degli acceleratori embedded many-core. Abbiamo implementato un runtime OpenMP ottimizzato per la gestione del tasking model per sistemi a processori strettamente accoppiati in cluster e poi interconnessi attraverso una network on chip. Ci siamo focalizzati sulla loro scalabilità e sul supporto di task di granularità fine, come è tipico nelle applicazioni embedded. Il secondo contributo di questa tesi è stata proporre una estensione del runtime di OpenMP che cerca di prevedere la manifestazione di errori dati da fenomeni di variability tramite una schedulazione efficiente del carico di lavoro.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Uno dei principali settori di studio nell’ambito della visione artificiale riguarda lo sviluppo e la continua ricerca di tecniche e metodologie atte alla ricostruzione di ambienti 3D. Una di queste è il Kinect Fusion, la quale utilizza il dispositivo Kinect per catturare ed elaborare informazioni provenienti da mappe di profondità relative a una particolare scena, per creare un modello 3D dell’ambiente individuato dal sensore. Il funzionamento generale del sistema “Kinect Fusion” consiste nella ricostruzione di superfici dense attraverso l’integrazione delle informazioni di profondità dei vari frame all’interno di un cubo virtuale, che a sua volta viene partizionato in piccoli volumi denominati voxel, e che rappresenta il volume della scena che si intende ricostruire. Per ognuno di tali voxel viene memorizzata la distanza (TSDF) rispetto alla superficie più vicina. Durante lo svolgimento di questo lavoro di tesi ci si è concentrati innanzitutto nell’analisi dell’algoritmo Voxel Hashing, una tecnica che mira a rendere l'algoritmo Kinect Fusion scalabile, attraverso una migliore gestione della struttura dati dei voxel allocando questi ultimi all'interno di una tabella di hash solo se strettamente necessario (TSDF inferiore a una soglia). In una prima fase del progetto si è quindi studiato in dettaglio il funzionamento di suddetta tecnica, fino a giungere alla fase della sua implementazione all’interno di un framework di ricostruzione 3D, basato su Kinect Fusion; si è quindi reso il sistema realizzato più robusto tramite l’applicazione di diverse migliorie. In una fase successiva sono stati effettuati test quantitativi e qualitativi per valutarne l'efficienza e la robustezza. Nella parte finale del progetto sono stati delineati i possibili sviluppi di future applicazioni.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Irregular computations pose sorne of the most interesting and challenging problems in automatic parallelization. Irregularity appears in certain kinds of numerical problems and is pervasive in symbolic applications. Such computations often use dynamic data structures, which make heavy use of pointers. This complicates all the steps of a parallelizing compiler, from independence detection to task partitioning and placement. Starting in the mid 80s there has been significant progress in the development of parallelizing compilers for logic pro­gramming (and more recently, constraint programming) resulting in quite capable paralle­lizers. The typical applications of these paradigms frequently involve irregular computations, and make heavy use of dynamic data structures with pointers, since logical variables represent in practice a well-behaved form of pointers. This arguably makes the techniques used in these compilers potentially interesting. In this paper, we introduce in a tutoríal way, sorne of the problems faced by parallelizing compilers for logic and constraint programs and provide pointers to sorne of the significant progress made in the area. In particular, this work has resulted in a series of achievements in the areas of inter-procedural pointer aliasing analysis for independence detection, cost models and cost analysis, cactus-stack memory management, techniques for managing speculative and irregular computations through task granularity control and dynamic task allocation such as work-stealing schedulers), etc.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Modeling the evolution of the state of program memory during program execution is critical to many parallehzation techniques. Current memory analysis techniques either provide very accurate information but run prohibitively slowly or produce very conservative results. An approach based on abstract interpretation is presented for analyzing programs at compile time, which can accurately determine many important program properties such as aliasing, logical data structures and shape. These properties are known to be critical for transforming a single threaded program into a versión that can be run on múltiple execution units in parallel. The analysis is shown to be of polynomial complexity in the size of the memory heap. Experimental results for benchmarks in the Jolden suite are given. These results show that in practice the analysis method is efflcient and is capable of accurately determining shape information in programs that créate and manipúlate complex data structures.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Tissue P systems generalize the membrane structure tree usual in original models of P systems to an arbitrary graph. Basic opera- tions in these systems are communication rules, enriched in some variants with cell division or cell separation. Several variants of tissue P systems were recently studied, together with the concept of uniform families of these systems. Their computational power was shown to range between P and NP ? co-NP , thus characterizing some interesting borderlines between tractability and intractability. In this paper we show that com- putational power of these uniform families in polynomial time is limited by the class PSPACE . This class characterizes the power of many clas- sical parallel computing models

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Reproducible research in scientic work ows is often addressed by tracking the provenance of the produced results. While this approach allows inspecting intermediate and nal results, improves understanding, and permits replaying a work ow execution, it does not ensure that the computational environment is available for subsequent executions to reproduce the experiment. In this work, we propose describing the resources involved in the execution of an experiment using a set of semantic vocabularies, so as to conserve the computational environment. We dene a process for documenting the work ow application, management system, and their dependencies based on 4 domain ontologies. We then conduct an experimental evaluation sing a real work ow application on an academic and a public Cloud platform. Results show that our approach can reproduce an equivalent execution environment of a predened virtual machine image on both computing platforms.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

El tiempo de concentración de una cuenca sigue siendo relativamente desconocido para los ingenieros. El procedimiento habitual en un estudio hidrológico es calcularlo según varias fórmulas escogidas entre las existentes para después emplear el valor medio obtenido. De esta media se derivan los demás resultados hidrológicos, resultados que influirán en el futuro dimensionamiento de las infraestructuras. Este trabajo de investigación comenzó con el deseo de conseguir un método más fiable y objetivo que permitiera obtener el tiempo de concentración. Dada la imposibilidad de poner en práctica ensayos hidrológicos en una cuenca física real, ya que no resulta viable monitorizar perfectamente la precipitación ni los caudales de salida, se planteó llevar a cabo los ensayos de forma simulada, con el empleo de modelos hidráulicos bidimensionales de lluvia directa sobre malla 2D de volúmenes finitos. De entre todos los disponibles, se escogió InfoWorks ICM, por su rapidez y facilidad de uso. En una primera fase se efectuó la validación del modelo hidráulico elegido, contrastando los resultados de varias simulaciones con la formulación analítica existente. Posteriormente, se comprobaron los valores de los tiempos de concentración obtenidos con las expresiones referenciadas en la bibliografía, consiguiéndose resultados muy satisfactorios. Una vez verificado, se ejecutaron 690 simulaciones de cuencas tanto naturales como sintéticas, incorporando variaciones de área, pendiente, rugosidad, intensidad y duración de las precipitaciones, a fin de obtener sus tiempos de concentración y retardo. Esta labor se realizó con ayuda de la aceleración del cálculo vectorial que ofrece la tecnología CUDA (Arquitectura Unificada de Dispositivos de Cálculo). Basándose en el análisis dimensional, se agruparon los resultados del tiempo de concentración en monomios adimensionales. Utilizando regresión lineal múltiple, se obtuvo una nueva formulación para el tiempo de concentración. La nueva expresión se contrastó con las formulaciones clásicas, habiéndose obtenido resultados equivalentes. Con la exposición de esta nueva metodología se pretende ayudar al ingeniero en la realización de estudios hidrológicos. Primero porque proporciona datos de manera sencilla y objetiva que se pueden emplear en modelos globales como HEC-HMS. Y segundo porque en sí misma se ha comprobado como una alternativa realmente válida a la metodología hidrológica habitual. Time of concentration remains still fairly imprecise to engineers. A normal hydrological study goes through several formulae, obtaining concentration time as the median value. Most of the remaining hydrologic results will be derived from this value. Those results will determine how future infrastructures will be designed. This research began with the aim to acquire a more reliable and objective method to estimate concentration times. Given the impossibility of carrying out hydrological tests in a real watershed, due to the difficulties related to accurate monitoring of rainfall and derived outflows, a model-based approach was proposed using bidimensional hydraulic simulations of direct rainfall over a 2D finite-volume mesh. Amongst all of the available software packages, InfoWorks ICM was chosen for its speed and ease of use. As a preliminary phase, the selected hydraulic model was validated, checking the outcomes of several simulations over existing analytical formulae. Next, concentration time values were compared to those resulting from expressions referenced in the technical literature. They proved highly satisfactory. Once the model was properly verified, 690 simulations of both natural and synthetic basins were performed, incorporating variations of area, slope, roughness, intensity and duration of rainfall, in order to obtain their concentration and lag times. This job was carried out in a reasonable time lapse with the aid of the parallel computing platform technology CUDA (Compute Unified Device Architecture). Performing dimensional analysis, concentration time results were isolated in dimensionless monomials. Afterwards, a new formulation for the time of concentration was obtained using multiple linear regression. This new expression was checked against classical formulations, obtaining equivalent results. The publication of this new methodology intends to further assist the engineer while carrying out hydrological studies. It is effective to provide global parameters that will feed global models as HEC-HMS on a simple and objective way. It has also been proven as a solid alternative to usual hydrology methodology.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Debido al gran incremento de datos digitales que ha tenido lugar en los últimos años, ha surgido un nuevo paradigma de computación paralela para el procesamiento eficiente de grandes volúmenes de datos. Muchos de los sistemas basados en este paradigma, también llamados sistemas de computación intensiva de datos, siguen el modelo de programación de Google MapReduce. La principal ventaja de los sistemas MapReduce es que se basan en la idea de enviar la computación donde residen los datos, tratando de proporcionar escalabilidad y eficiencia. En escenarios libres de fallo, estos sistemas generalmente logran buenos resultados. Sin embargo, la mayoría de escenarios donde se utilizan, se caracterizan por la existencia de fallos. Por tanto, estas plataformas suelen incorporar características de tolerancia a fallos y fiabilidad. Por otro lado, es reconocido que las mejoras en confiabilidad vienen asociadas a costes adicionales en recursos. Esto es razonable y los proveedores que ofrecen este tipo de infraestructuras son conscientes de ello. No obstante, no todos los enfoques proporcionan la misma solución de compromiso entre las capacidades de tolerancia a fallo (o de manera general, las capacidades de fiabilidad) y su coste. Esta tesis ha tratado la problemática de la coexistencia entre fiabilidad y eficiencia de los recursos en los sistemas basados en el paradigma MapReduce, a través de metodologías que introducen el mínimo coste, garantizando un nivel adecuado de fiabilidad. Para lograr esto, se ha propuesto: (i) la formalización de una abstracción de detección de fallos; (ii) una solución alternativa a los puntos únicos de fallo de estas plataformas, y, finalmente, (iii) un nuevo sistema de asignación de recursos basado en retroalimentación a nivel de contenedores. Estas contribuciones genéricas han sido evaluadas tomando como referencia la arquitectura Hadoop YARN, que, hoy en día, es la plataforma de referencia en la comunidad de los sistemas de computación intensiva de datos. En la tesis se demuestra cómo todas las contribuciones de la misma superan a Hadoop YARN tanto en fiabilidad como en eficiencia de los recursos utilizados. ABSTRACT Due to the increase of huge data volumes, a new parallel computing paradigm to process big data in an efficient way has arisen. Many of these systems, called dataintensive computing systems, follow the Google MapReduce programming model. The main advantage of these systems is based on the idea of sending the computation where the data resides, trying to provide scalability and efficiency. In failure-free scenarios, these frameworks usually achieve good results. However, these ones are not realistic scenarios. Consequently, these frameworks exhibit some fault tolerance and dependability techniques as built-in features. On the other hand, dependability improvements are known to imply additional resource costs. This is reasonable and providers offering these infrastructures are aware of this. Nevertheless, not all the approaches provide the same tradeoff between fault tolerant capabilities (or more generally, reliability capabilities) and cost. In this thesis, we have addressed the coexistence between reliability and resource efficiency in MapReduce-based systems, looking for methodologies that introduce the minimal cost and guarantee an appropriate level of reliability. In order to achieve this, we have proposed: (i) a formalization of a failure detector abstraction; (ii) an alternative solution to single points of failure of these frameworks, and finally (iii) a novel feedback-based resource allocation system at the container level. Finally, our generic contributions have been instantiated for the Hadoop YARN architecture, which is the state-of-the-art framework in the data-intensive computing systems community nowadays. The thesis demonstrates how all our approaches outperform Hadoop YARN in terms of reliability and resource efficiency.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A simple evolutionary process can discover sophisticated methods for emergent information processing in decentralized spatially extended systems. The mechanisms underlying the resulting emergent computation are explicated by a technique for analyzing particle-based logic embedded in pattern-forming systems. Understanding how globally coordinated computation can emerge in evolution is relevant both for the scientific understanding of natural information processing and for engineering new forms of parallel computing systems.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Devido às tendências de crescimento da quantidade de dados processados e a crescente necessidade por computação de alto desempenho, mudanças significativas estão acontecendo no projeto de arquiteturas de computadores. Com isso, tem-se migrado do paradigma sequencial para o paralelo, com centenas ou milhares de núcleos de processamento em um mesmo chip. Dentro desse contexto, o gerenciamento de energia torna-se cada vez mais importante, principalmente em sistemas embarcados, que geralmente são alimentados por baterias. De acordo com a Lei de Moore, o desempenho de um processador dobra a cada 18 meses, porém a capacidade das baterias dobra somente a cada 10 anos. Esta situação provoca uma enorme lacuna, que pode ser amenizada com a utilização de arquiteturas multi-cores heterogêneas. Um desafio fundamental que permanece em aberto para estas arquiteturas é realizar a integração entre desenvolvimento de código embarcado, escalonamento e hardware para gerenciamento de energia. O objetivo geral deste trabalho de doutorado é investigar técnicas para otimização da relação desempenho/consumo de energia em arquiteturas multi-cores heterogêneas single-ISA implementadas em FPGA. Nesse sentido, buscou-se por soluções que obtivessem o melhor desempenho possível a um consumo de energia ótimo. Isto foi feito por meio da combinação de mineração de dados para a análise de softwares baseados em threads aliadas às técnicas tradicionais para gerenciamento de energia, como way-shutdown dinâmico, e uma nova política de escalonamento heterogeneity-aware. Como principais contribuições pode-se citar a combinação de técnicas de gerenciamento de energia em diversos níveis como o nível do hardware, do escalonamento e da compilação; e uma política de escalonamento integrada com uma arquitetura multi-core heterogênea em relação ao tamanho da memória cache L1.