988 resultados para parallel implementation


Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, a new parallel method for sparse spectral unmixing of remotely sensed hyperspectral data on commodity graphics processing units (GPUs) is presented. A semi-supervised approach is adopted, which relies on the increasing availability of spectral libraries of materials measured on the ground instead of resorting to endmember extraction methods. This method is based on the spectral unmixing by splitting and augmented Lagrangian (SUNSAL) that estimates the material's abundance fractions. The parallel method is performed in a pixel-by-pixel fashion and its implementation properly exploits the GPU architecture at low level, thus taking full advantage of the computational power of GPUs. Experimental results obtained for simulated and real hyperspectral datasets reveal significant speedup factors, up to 1 64 times, with regards to optimized serial implementation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Face à estagnação da tecnologia uniprocessador registada na passada década, aos principais fabricantes de microprocessadores encontraram na tecnologia multi-core a resposta `as crescentes necessidades de processamento do mercado. Durante anos, os desenvolvedores de software viram as suas aplicações acompanhar os ganhos de performance conferidos por cada nova geração de processadores sequenciais, mas `a medida que a capacidade de processamento escala em função do número de processadores, a computação sequencial tem de ser decomposta em várias partes concorrentes que possam executar em paralelo, para que possam utilizar as unidades de processamento adicionais e completar mais rapidamente. A programação paralela implica um paradigma completamente distinto da programação sequencial. Ao contrário dos computadores sequenciais tipificados no modelo de Von Neumann, a heterogeneidade de arquiteturas paralelas requer modelos de programação paralela que abstraiam os programadores dos detalhes da arquitectura e simplifiquem o desenvolvimento de aplicações concorrentes. Os modelos de programação paralela mais populares incitam os programadores a identificar instruções concorrentes na sua lógica de programação, e a especificá-las sob a forma de tarefas que possam ser atribuídas a processadores distintos para executarem em simultâneo. Estas tarefas são tipicamente lançadas durante a execução, e atribuídas aos processadores pelo motor de execução subjacente. Como os requisitos de processamento costumam ser variáveis, e não são conhecidos a priori, o mapeamento de tarefas para processadores tem de ser determinado dinamicamente, em resposta a alterações imprevisíveis dos requisitos de execução. `A medida que o volume da computação cresce, torna-se cada vez menos viável garantir as suas restrições temporais em plataformas uniprocessador. Enquanto os sistemas de tempo real se começam a adaptar ao paradigma de computação paralela, há uma crescente aposta em integrar execuções de tempo real com aplicações interativas no mesmo hardware, num mundo em que a tecnologia se torna cada vez mais pequena, leve, ubíqua, e portável. Esta integração requer soluções de escalonamento que simultaneamente garantam os requisitos temporais das tarefas de tempo real e mantenham um nível aceitável de QoS para as restantes execuções. Para tal, torna-se imperativo que as aplicações de tempo real paralelizem, de forma a minimizar os seus tempos de resposta e maximizar a utilização dos recursos de processamento. Isto introduz uma nova dimensão ao problema do escalonamento, que tem de responder de forma correcta a novos requisitos de execução imprevisíveis e rapidamente conjeturar o mapeamento de tarefas que melhor beneficie os critérios de performance do sistema. A técnica de escalonamento baseado em servidores permite reservar uma fração da capacidade de processamento para a execução de tarefas de tempo real, e assegurar que os efeitos de latência na sua execução não afectam as reservas estipuladas para outras execuções. No caso de tarefas escalonadas pelo tempo de execução máximo, ou tarefas com tempos de execução variáveis, torna-se provável que a largura de banda estipulada não seja consumida por completo. Para melhorar a utilização do sistema, os algoritmos de partilha de largura de banda (capacity-sharing) doam a capacidade não utilizada para a execução de outras tarefas, mantendo as garantias de isolamento entre servidores. Com eficiência comprovada em termos de espaço, tempo, e comunicação, o mecanismo de work-stealing tem vindo a ganhar popularidade como metodologia para o escalonamento de tarefas com paralelismo dinâmico e irregular. O algoritmo p-CSWS combina escalonamento baseado em servidores com capacity-sharing e work-stealing para cobrir as necessidades de escalonamento dos sistemas abertos de tempo real. Enquanto o escalonamento em servidores permite partilhar os recursos de processamento sem interferências a nível dos atrasos, uma nova política de work-stealing que opera sobre o mecanismo de capacity-sharing aplica uma exploração de paralelismo que melhora os tempos de resposta das aplicações e melhora a utilização do sistema. Esta tese propõe uma implementação do algoritmo p-CSWS para o Linux. Em concordância com a estrutura modular do escalonador do Linux, ´e definida uma nova classe de escalonamento que visa avaliar a aplicabilidade da heurística p-CSWS em circunstâncias reais. Ultrapassados os obstáculos intrínsecos `a programação da kernel do Linux, os extensos testes experimentais provam que o p-CSWS ´e mais do que um conceito teórico atrativo, e que a exploração heurística de paralelismo proposta pelo algoritmo beneficia os tempos de resposta das aplicações de tempo real, bem como a performance e eficiência da plataforma multiprocessador.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Breast cancer is the most common cancer among women, being a major public health problem. Worldwide, X-ray mammography is the current gold-standard for medical imaging of breast cancer. However, it has associated some well-known limitations. The false-negative rates, up to 66% in symptomatic women, and the false-positive rates, up to 60%, are a continued source of concern and debate. These drawbacks prompt the development of other imaging techniques for breast cancer detection, in which Digital Breast Tomosynthesis (DBT) is included. DBT is a 3D radiographic technique that reduces the obscuring effect of tissue overlap and appears to address both issues of false-negative and false-positive rates. The 3D images in DBT are only achieved through image reconstruction methods. These methods play an important role in a clinical setting since there is a need to implement a reconstruction process that is both accurate and fast. This dissertation deals with the optimization of iterative algorithms, with parallel computing through an implementation on Graphics Processing Units (GPUs) to make the 3D reconstruction faster using Compute Unified Device Architecture (CUDA). Iterative algorithms have shown to produce the highest quality DBT images, but since they are computationally intensive, their clinical use is currently rejected. These algorithms have the potential to reduce patient dose in DBT scans. A method of integrating CUDA in Interactive Data Language (IDL) is proposed in order to accelerate the DBT image reconstructions. This method has never been attempted before for DBT. In this work the system matrix calculation, the most computationally expensive part of iterative algorithms, is accelerated. A speedup of 1.6 is achieved proving the fact that GPUs can accelerate the IDL implementation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Closest Vector Problem (CVP) and the Shortest Vector Problem (SVP) are prime problems in lattice-based cryptanalysis, since they underpin the security of many lattice-based cryptosystems. Despite the importance of these problems, there are only a few CVP-solvers publicly available, and their scalability was never studied. This paper presents a scalable implementation of an enumeration-based CVP-solver for multi-cores, which can be easily adapted to solve the SVP. In particular, it achieves super-linear speedups in some instances on up to 8 cores and almost linear speedups on 16 cores when solving the CVP on a 50-dimensional lattice. Our results show that enumeration-based CVP-solvers can be parallelized as effectively as enumeration-based solvers for the SVP, based on a comparison with a state of the art SVP-solver. In addition, we show that we can optimize the SVP variant of our solver in such a way that it becomes 35%-60% faster than the fastest enumeration-based SVP-solver to date.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Nowadays a huge attention of the academia and research teams is attracted to the potential of the usage of the 60 GHz frequency band in the wireless communications. The use of the 60GHz frequency band offers great possibilities for wide variety of applications that are yet to be implemented. These applications also imply huge implementation challenges. Such example is building a high data rate transceiver which at the same time would have very low power consumption. In this paper we present a prototype of Single Carrier -SC transceiver system, illustrating a brief overview of the baseband design, emphasizing the most important decisions that need to be done. A brief overview of the possible approaches when implementing the equalizer, as the most complex module in the SC transceiver, is also presented. The main focus of this paper is to suggest a parallel architecture for the receiver in a Single Carrier communication system. This would provide higher data rates that the communication system canachieve, for a price of higher power consumption. The suggested architecture of such receiver is illustrated in this paper,giving the results of its implementation in comparison with its corresponding serial implementation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Advances in computer memory technology justify research towards new and different views on computer organization. This paper proposes a novel memory-centric computing architecture with the goal to merge memory and processing elements in order to provide better conditions for parallelization and performance. The paper introduces the architectural concepts and afterwards shows the design and implementation of a corresponding assembler and simulator.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper shows how a high level matrix programming language may be used to perform Monte Carlo simulation, bootstrapping, estimation by maximum likelihood and GMM, and kernel regression in parallel on symmetric multiprocessor computers or clusters of workstations. The implementation of parallelization is done in a way such that an investigator may use the programs without any knowledge of parallel programming. A bootable CD that allows rapid creation of a cluster for parallel computing is introduced. Examples show that parallelization can lead to important reductions in computational time. Detailed discussion of how the Monte Carlo problem was parallelized is included as an example for learning to write parallel programs for Octave.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Pulsewidth-modulated (PWM) rectifier technology is increasingly used in industrial applications like variable-speed motor drives, since it offers several desired features such as sinusoidal input currents, controllable power factor, bidirectional power flow and high quality DC output voltage. To achieve these features,however, an effective control system with fast and accurate current and DC voltage responses is required. From various control strategies proposed to meet these control objectives, in most cases the commonly known principle of the synchronous-frame current vector control along with some space-vector PWM scheme have been applied. Recently, however, new control approaches analogous to the well-established direct torque control (DTC) method for electrical machines have also emerged to implement a high-performance PWM rectifier. In this thesis the concepts of classical synchronous-frame current control and DTC-based PWM rectifier control are combined and a new converter-flux-based current control (CFCC) scheme is introduced. To achieve sufficient dynamic performance and to ensure a stable operation, the proposed control system is thoroughly analysed and simple rules for the controller design are suggested. Special attention is paid to the estimationof the converter flux, which is the key element of converter-flux-based control. Discrete-time implementation is also discussed. Line-voltage-sensorless reactive reactive power control methods for the L- and LCL-type line filters are presented. For the L-filter an open-loop control law for the d-axis current referenceis proposed. In the case of the LCL-filter the combined open-loop control and feedback control is proposed. The influence of the erroneous filter parameter estimates on the accuracy of the developed control schemes is also discussed. A newzero vector selection rule for suppressing the zero-sequence current in parallel-connected PWM rectifiers is proposed. With this method a truly standalone and independent control of the converter units is allowed and traditional transformer isolation and synchronised-control-based solutions are avoided. The implementation requires only one additional current sensor. The proposed schemes are evaluated by the simulations and laboratory experiments. A satisfactory performance and good agreement between the theory and practice are demonstrated.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The pumping processes requiring wide range of flow are often equipped with parallelconnected centrifugal pumps. In parallel pumping systems, the use of variable speed control allows that the required output for the process can be delivered with a varying number of operated pump units and selected rotational speed references. However, the optimization of the parallel-connected rotational speed controlled pump units often requires adaptive modelling of both parallel pump characteristics and the surrounding system in varying operation conditions. The available information required for the system modelling in typical parallel pumping applications such as waste water treatment and various cooling and water delivery pumping tasks can be limited, and the lack of real-time operation point monitoring often sets limits for accurate energy efficiency optimization. Hence, alternatives for easily implementable control strategies which can be adopted with minimum system data are necessary. This doctoral thesis concentrates on the methods that allow the energy efficient use of variable speed controlled parallel pumps in system scenarios in which the parallel pump units consist of a centrifugal pump, an electric motor, and a frequency converter. Firstly, the suitable operation conditions for variable speed controlled parallel pumps are studied. Secondly, methods for determining the output of each parallel pump unit using characteristic curve-based operation point estimation with frequency converter are discussed. Thirdly, the implementation of the control strategy based on real-time pump operation point estimation and sub-optimization of each parallel pump unit is studied. The findings of the thesis support the idea that the energy efficiency of the pumping can be increased without the installation of new, more efficient components in the systems by simply adopting suitable control strategies. An easily implementable and adaptive control strategy for variable speed controlled parallel pumping systems can be created by utilizing the pump operation point estimation available in modern frequency converters. Hence, additional real-time flow metering, start-up measurements, and detailed system model are unnecessary, and the pumping task can be fulfilled by determining a speed reference for each parallel-pump unit which suggests the energy efficient operation of the pumping system.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Due to various advantages such as flexibility, scalability and updatability, software intensive systems are increasingly embedded in everyday life. The constantly growing number of functions executed by these systems requires a high level of performance from the underlying platform. The main approach to incrementing performance has been the increase of operating frequency of a chip. However, this has led to the problem of power dissipation, which has shifted the focus of research to parallel and distributed computing. Parallel many-core platforms can provide the required level of computational power along with low power consumption. On the one hand, this enables parallel execution of highly intensive applications. With their computational power, these platforms are likely to be used in various application domains: from home use electronics (e.g., video processing) to complex critical control systems. On the other hand, the utilization of the resources has to be efficient in terms of performance and power consumption. However, the high level of on-chip integration results in the increase of the probability of various faults and creation of hotspots leading to thermal problems. Additionally, radiation, which is frequent in space but becomes an issue also at the ground level, can cause transient faults. This can eventually induce a faulty execution of applications. Therefore, it is crucial to develop methods that enable efficient as well as resilient execution of applications. The main objective of the thesis is to propose an approach to design agentbased systems for many-core platforms in a rigorous manner. When designing such a system, we explore and integrate various dynamic reconfiguration mechanisms into agents functionality. The use of these mechanisms enhances resilience of the underlying platform whilst maintaining performance at an acceptable level. The design of the system proceeds according to a formal refinement approach which allows us to ensure correct behaviour of the system with respect to postulated properties. To enable analysis of the proposed system in terms of area overhead as well as performance, we explore an approach, where the developed rigorous models are transformed into a high-level implementation language. Specifically, we investigate methods for deriving fault-free implementations from these models into, e.g., a hardware description language, namely VHDL.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The whole research of the current Master Thesis project is related to Big Data transfer over Parallel Data Link and my main objective is to assist the Saint-Petersburg National Research University ITMO research team to accomplish this project and apply Green IT methods for the data transfer system. The goal of the team is to transfer Big Data by using parallel data links with SDN Openflow approach. My task as a team member was to compare existing data transfer applications in case to verify which results the highest data transfer speed in which occasions and explain the reasons. In the context of this thesis work a comparison between 5 different utilities was done, which including Fast Data Transfer (FDT), BBCP, BBFTP, GridFTP, and FTS3. A number of scripts where developed which consist of creating random binary data to be incompressible to have fair comparison between utilities, execute the Utilities with specified parameters, create log files, results, system parameters, and plot graphs to compare the results. Transferring such an enormous variety of data can take a long time, and hence, the necessity appears to reduce the energy consumption to make them greener. In the context of Green IT approach, our team used Cloud Computing infrastructure called OpenStack. It’s more efficient to allocated specific amount of hardware resources to test different scenarios rather than using the whole resources from our testbed. Testing our implementation with OpenStack infrastructure results that the virtual channel does not consist of any traffic and we can achieve the highest possible throughput. After receiving the final results we are in place to identify which utilities produce faster data transfer in different scenarios with specific TCP parameters and we can use them in real network data links.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Animportant step in the residue number system(RNS) based signal processing is the conversion of signal into residue domain. Many implementations of this conversion have been proposed for various goals, and one of the implementations is by a direct conversion from an analogue input. A novel approach for analogue-to-residue conversion is proposed in this research using the most popular Sigma–Delta analogue-to-digital converter (SD-ADC). In this approach, the front end is the same as in traditional SD-ADC that uses Sigma–Delta (SD) modulator with appropriate dynamic range, but the filtering is doneby a filter implemented usingRNSarithmetic. Hence, the natural output of the filter is an RNS representation of the input signal. The resolution, conversion speed, hardware complexity and cost of implementation of the proposed SD based analogue-to-residue converter are compared with the existing analogue-to-residue converters based on Nyquist rate ADCs

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The rapid growth of the optical communication branches and the enormous demand for more bandwidth require novel networks such as dense wavelength division multiplexing (DWDM). These networks enable higher bitrate transmission using the existing optical fibers. Micromechanically tunable optical microcavity devices like VCSELs, Fabry-Pérot filters and photodetectors are core components of these novel DWDM systems. Several air-gap based tunable devices were successfully implemented in the last years. Even though these concepts are very promising, two main disadvantages are still remaining. On the one hand, the high fabrication and integration cost and on the other hand the undesired adverse buckling of the suspended membranes. This thesis addresses these two problems and consists of two main parts: • PECVD dielectric material investigation and stress control resulting in membranes shape engineering. • Implementation and characterization of novel tunable optical devices with tailored shapes of the suspended membranes. For this purposes, low-cost PECVD technology is investigated and developed in detail. The macro- and microstress of silicon nitride and silicon dioxide are controlled over a wide range. Furthermore, the effect of stress on the optical and mechanical properties of the suspended membranes and on the microcavities is evaluated. Various membrane shapes (concave, convex and planar) with several radii of curvature are fabricated. Using this resonator shape engineering, microcavity devices such as non tunable and tunable Fabry-Pérot filters, VCSELs and PIN photodetectors are succesfully implemented. The fabricated Fabry-Pérot filters cover a spectral range of over 200nm and show resonance linewidths down to 1.5nm. By varying the stress distribution across the vertical direction within a DBR, the shape and the radius of curvature of the top membrane are explicitely tailored. By adjusting the incoming light beam waist to the curvature, the fundamental resonant mode is supported and the higher order ones are suppressed. For instance, a tunable VCSEL with 26 nm tuning range, 400µW maximal output power, 47nm free spectral range and over 57dB side mode suppresion ratio (SMSR) is demonstrated. Other technologies, such as introducing light emitting organic materials in microcavities are also investigated.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The process of developing software that takes advantage of multiple processors is commonly referred to as parallel programming. For various reasons, this process is much harder than the sequential case. For decades, parallel programming has been a problem for a small niche only: engineers working on parallelizing mostly numerical applications in High Performance Computing. This has changed with the advent of multi-core processors in mainstream computer architectures. Parallel programming in our days becomes a problem for a much larger group of developers. The main objective of this thesis was to find ways to make parallel programming easier for them. Different aims were identified in order to reach the objective: research the state of the art of parallel programming today, improve the education of software developers about the topic, and provide programmers with powerful abstractions to make their work easier. To reach these aims, several key steps were taken. To start with, a survey was conducted among parallel programmers to find out about the state of the art. More than 250 people participated, yielding results about the parallel programming systems and languages in use, as well as about common problems with these systems. Furthermore, a study was conducted in university classes on parallel programming. It resulted in a list of frequently made mistakes that were analyzed and used to create a programmers' checklist to avoid them in the future. For programmers' education, an online resource was setup to collect experiences and knowledge in the field of parallel programming - called the Parawiki. Another key step in this direction was the creation of the Thinking Parallel weblog, where more than 50.000 readers to date have read essays on the topic. For the third aim (powerful abstractions), it was decided to concentrate on one parallel programming system: OpenMP. Its ease of use and high level of abstraction were the most important reasons for this decision. Two different research directions were pursued. The first one resulted in a parallel library called AthenaMP. It contains so-called generic components, derived from design patterns for parallel programming. These include functionality to enhance the locks provided by OpenMP, to perform operations on large amounts of data (data-parallel programming), and to enable the implementation of irregular algorithms using task pools. AthenaMP itself serves a triple role: the components are well-documented and can be used directly in programs, it enables developers to study the source code and learn from it, and it is possible for compiler writers to use it as a testing ground for their OpenMP compilers. The second research direction was targeted at changing the OpenMP specification to make the system more powerful. The main contributions here were a proposal to enable thread-cancellation and a proposal to avoid busy waiting. Both were implemented in a research compiler, shown to be useful in example applications, and proposed to the OpenMP Language Committee.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Land use is a crucial link between human activities and the natural environment and one of the main driving forces of global environmental change. Large parts of the terrestrial land surface are used for agriculture, forestry, settlements and infrastructure. Given the importance of land use, it is essential to understand the multitude of influential factors and resulting land use patterns. An essential methodology to study and quantify such interactions is provided by the adoption of land-use models. By the application of land-use models, it is possible to analyze the complex structure of linkages and feedbacks and to also determine the relevance of driving forces. Modeling land use and land use changes has a long-term tradition. In particular on the regional scale, a variety of models for different regions and research questions has been created. Modeling capabilities grow with steady advances in computer technology, which on the one hand are driven by increasing computing power on the other hand by new methods in software development, e.g. object- and component-oriented architectures. In this thesis, SITE (Simulation of Terrestrial Environments), a novel framework for integrated regional sland-use modeling, will be introduced and discussed. Particular features of SITE are the notably extended capability to integrate models and the strict separation of application and implementation. These features enable efficient development, test and usage of integrated land-use models. On its system side, SITE provides generic data structures (grid, grid cells, attributes etc.) and takes over the responsibility for their administration. By means of a scripting language (Python) that has been extended by language features specific for land-use modeling, these data structures can be utilized and manipulated by modeling applications. The scripting language interpreter is embedded in SITE. The integration of sub models can be achieved via the scripting language or by usage of a generic interface provided by SITE. Furthermore, functionalities important for land-use modeling like model calibration, model tests and analysis support of simulation results have been integrated into the generic framework. During the implementation of SITE, specific emphasis was laid on expandability, maintainability and usability. Along with the modeling framework a land use model for the analysis of the stability of tropical rainforest margins was developed in the context of the collaborative research project STORMA (SFB 552). In a research area in Central Sulawesi, Indonesia, socio-environmental impacts of land-use changes were examined. SITE was used to simulate land-use dynamics in the historical period of 1981 to 2002. Analogous to that, a scenario that did not consider migration in the population dynamics, was analyzed. For the calculation of crop yields and trace gas emissions, the DAYCENT agro-ecosystem model was integrated. In this case study, it could be shown that land-use changes in the Indonesian research area could mainly be characterized by the expansion of agricultural areas at the expense of natural forest. For this reason, the situation had to be interpreted as unsustainable even though increased agricultural use implied economic improvements and higher farmers' incomes. Due to the importance of model calibration, it was explicitly addressed in the SITE architecture through the introduction of a specific component. The calibration functionality can be used by all SITE applications and enables largely automated model calibration. Calibration in SITE is understood as a process that finds an optimal or at least adequate solution for a set of arbitrarily selectable model parameters with respect to an objective function. In SITE, an objective function typically is a map comparison algorithm capable of comparing a simulation result to a reference map. Several map optimization and map comparison methodologies are available and can be combined. The STORMA land-use model was calibrated using a genetic algorithm for optimization and the figure of merit map comparison measure as objective function. The time period for the calibration ranged from 1981 to 2002. For this period, respective reference land-use maps were compiled. It could be shown, that an efficient automated model calibration with SITE is possible. Nevertheless, the selection of the calibration parameters required detailed knowledge about the underlying land-use model and cannot be automated. In another case study decreases in crop yields and resulting losses in income from coffee cultivation were analyzed and quantified under the assumption of four different deforestation scenarios. For this task, an empirical model, describing the dependence of bee pollination and resulting coffee fruit set from the distance to the closest natural forest, was integrated. Land-use simulations showed, that depending on the magnitude and location of ongoing forest conversion, pollination services are expected to decline continuously. This results in a reduction of coffee yields of up to 18% and a loss of net revenues per hectare of up to 14%. However, the study also showed that ecological and economic values can be preserved if patches of natural vegetation are conservated in the agricultural landscape. -----------------------------------------------------------------------