19 resultados para Parallel Architectures


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Breast cancer is the most common cancer among women, being a major public health problem. Worldwide, X-ray mammography is the current gold-standard for medical imaging of breast cancer. However, it has associated some well-known limitations. The false-negative rates, up to 66% in symptomatic women, and the false-positive rates, up to 60%, are a continued source of concern and debate. These drawbacks prompt the development of other imaging techniques for breast cancer detection, in which Digital Breast Tomosynthesis (DBT) is included. DBT is a 3D radiographic technique that reduces the obscuring effect of tissue overlap and appears to address both issues of false-negative and false-positive rates. The 3D images in DBT are only achieved through image reconstruction methods. These methods play an important role in a clinical setting since there is a need to implement a reconstruction process that is both accurate and fast. This dissertation deals with the optimization of iterative algorithms, with parallel computing through an implementation on Graphics Processing Units (GPUs) to make the 3D reconstruction faster using Compute Unified Device Architecture (CUDA). Iterative algorithms have shown to produce the highest quality DBT images, but since they are computationally intensive, their clinical use is currently rejected. These algorithms have the potential to reduce patient dose in DBT scans. A method of integrating CUDA in Interactive Data Language (IDL) is proposed in order to accelerate the DBT image reconstructions. This method has never been attempted before for DBT. In this work the system matrix calculation, the most computationally expensive part of iterative algorithms, is accelerated. A speedup of 1.6 is achieved proving the fact that GPUs can accelerate the IDL implementation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The Intel R Xeon PhiTM is the first processor based on Intel’s MIC (Many Integrated Cores) architecture. It is a co-processor specially tailored for data-parallel computations, whose basic architectural design is similar to the ones of GPUs (Graphics Processing Units), leveraging the use of many integrated low computational cores to perform parallel computations. The main novelty of the MIC architecture, relatively to GPUs, is its compatibility with the Intel x86 architecture. This enables the use of many of the tools commonly available for the parallel programming of x86-based architectures, which may lead to a smaller learning curve. However, programming the Xeon Phi still entails aspects intrinsic to accelerator-based computing, in general, and to the MIC architecture, in particular. In this thesis we advocate the use of algorithmic skeletons for programming the Xeon Phi. Algorithmic skeletons abstract the complexity inherent to parallel programming, hiding details such as resource management, parallel decomposition, inter-execution flow communication, thus removing these concerns from the programmer’s mind. In this context, the goal of the thesis is to lay the foundations for the development of a simple but powerful and efficient skeleton framework for the programming of the Xeon Phi processor. For this purpose we build upon Marrow, an existing framework for the orchestration of OpenCLTM computations in multi-GPU and CPU environments. We extend Marrow to execute both OpenCL and C++ parallel computations on the Xeon Phi. We evaluate the newly developed framework, several well-known benchmarks, like Saxpy and N-Body, will be used to compare, not only its performance to the existing framework when executing on the co-processor, but also to assess the performance on the Xeon Phi versus a multi-GPU environment.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Combinatorial Optimization Problems occur in a wide variety of contexts and generally are NP-hard problems. At a corporate level solving this problems is of great importance since they contribute to the optimization of operational costs. In this thesis we propose to solve the Public Transport Bus Assignment problem considering an heterogeneous fleet and line exchanges, a variant of the Multi-Depot Vehicle Scheduling Problem in which additional constraints are enforced to model a real life scenario. The number of constraints involved and the large number of variables makes impracticable solving to optimality using complete search techniques. Therefore, we explore metaheuristics, that sacrifice optimality to produce solutions in feasible time. More concretely, we focus on the development of algorithms based on a sophisticated metaheuristic, Ant-Colony Optimization (ACO), which is based on a stochastic learning mechanism. For complex problems with a considerable number of constraints, sophisticated metaheuristics may fail to produce quality solutions in a reasonable amount of time. Thus, we developed parallel shared-memory (SM) synchronous ACO algorithms, however, synchronism originates the straggler problem. Therefore, we proposed three SM asynchronous algorithms that break the original algorithm semantics and differ on the degree of concurrency allowed while manipulating the learned information. Our results show that our sequential ACO algorithms produced better solutions than a Restarts metaheuristic, the ACO algorithms were able to learn and better solutions were achieved by increasing the amount of cooperation (number of search agents). Regarding parallel algorithms, our asynchronous ACO algorithms outperformed synchronous ones in terms of speedup and solution quality, achieving speedups of 17.6x. The cooperation scheme imposed by asynchronism also achieved a better learning rate than the original one.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Modern fully integrated transceivers architectures, require circuits with low area, low cost, low power, and high efficiency. A key block in modern transceivers is the power amplifier, which is deeply studied in this thesis. First, we study the implementation of a classical Class-A amplifier, describing the basic operation of an RF power amplifier, and analysing the influence of the real models of the reactive components in its operation. Secondly, the Class-E amplifier is deeply studied. The different types of implementations are reviewed and theoretical equations are derived and compared with simulations. There were selected four modes of operation for the Class-E amplifier, in order to perform the implementation of the output stage, and the subsequent comparison of results. This led to the selection of the mode with the best trade-off between efficiency and harmonics distortion, lower power consumption and higher output power. The optimal choice was a parallel circuit containing an inductor with a finite value. To complete the implementation of the PA in switching mode, a driver was implemented. The final block (output stage together with the driver) got 20 % total efficiency (PAE) transmitting 8 dBm output power to a 50 W load with a total harmonic distortion (THD) of 3 % and a total consumption of 28 mW. All implementations are designed using standard 130 nm CMOS technology. The operating frequency is 2.4 GHz and it was considered an 1.2 V DC power supply. The proposed circuit is intended to be used in a Bluetooth transmitter, however, it has a wider range of applications.