991 results for Compute Unified Device Architecture (CUDA)
Abstract:
A parallel formulation of an algorithm for computing the histogram of n data items is developed, using on-the-fly data decomposition and a novel quantum-like representation (QR). The QR transformation separates multiple data-read operations from multiple bin-update operations, making it easier to bind data items to their corresponding histogram bins. Under this model, the histogram is computed in n/s + t steps, where s is a speedup factor and t accounts for pipeline latency. We show that an overall speedup factor s of up to eight is achievable, i.e., an eightfold acceleration. Our evaluation also shows that each of the cells implementing the QR requires lower area/time complexity than similar proposals found in the literature.
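For context, the standard GPU counterpart of parallel histogramming privatizes one histogram copy per thread block in shared memory and merges the partial counts atomically. The CUDA sketch below illustrates that common baseline, assuming 8-bit data and 256 bins; it is not the QR pipeline itself.

    #include <cuda_runtime.h>

    #define NUM_BINS 256

    // Baseline parallel histogram: each block accumulates a private copy in
    // shared memory, then merges it into the global histogram atomically.
    __global__ void histogram256(const unsigned char *data, int n,
                                 unsigned int *bins)
    {
        __shared__ unsigned int local[NUM_BINS];
        for (int i = threadIdx.x; i < NUM_BINS; i += blockDim.x)
            local[i] = 0;                      // clear the private histogram
        __syncthreads();

        // Grid-stride loop: bind every data item to its bin.
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += gridDim.x * blockDim.x)
            atomicAdd(&local[data[i]], 1u);
        __syncthreads();

        for (int i = threadIdx.x; i < NUM_BINS; i += blockDim.x)
            atomicAdd(&bins[i], local[i]);     // merge partial counts
    }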
Abstract:
The core processing step of the median filter, a noise-reduction technique, is finding the median within a window of integers. A four-step method to compute the running median of the last N W-bit integers of a stream is proposed, showing area and time benefits. The method slices the integers into B-bit groups using a pipeline of W/B blocks. From the method, an architecture is developed that gives the designer the flexibility to trade area gains for a higher operating frequency, or vice versa, by adjusting the N, W and B parameter values. FPGA circuit implementations show gains in area of around 40%, or in operating frequency of around 20%, compared with the latest methods in the literature.
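To make the bit-sliced idea concrete, here is a minimal software reference (a sketch under assumed details, not the proposed hardware architecture): the rank-k element of N W-bit values is resolved B bits at a time, one digit per pass, which is the kind of selection a pipeline of W/B blocks can implement.

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Resolve the k-th smallest W-bit value B bits at a time (W/B passes).
    static uint32_t radix_select(const std::vector<uint32_t> &window,
                                 size_t k, int W, int B)
    {
        uint32_t prefix = 0;                       // answer bits fixed so far
        for (int shift = W - B; shift >= 0; shift -= B) {
            uint32_t prefix_mask =
                (shift + B < W) ? ~((1u << (shift + B)) - 1) : 0u;
            std::vector<size_t> count(1u << B, 0);
            for (uint32_t v : window)              // histogram the next B bits
                if ((v & prefix_mask) == prefix)   // ...of values matching prefix
                    ++count[(v >> shift) & ((1u << B) - 1)];
            uint32_t d = 0;
            while (k >= count[d]) { k -= count[d]; ++d; }  // digit holding rank k
            prefix |= d << shift;
        }
        return prefix;
    }

    int main()
    {
        std::vector<uint32_t> window = {9, 3, 27, 14, 5};  // N = 5 samples
        // Median = rank N/2; prints 9 for this window (W = 8, B = 4).
        printf("median = %u\n",
               (unsigned)radix_select(window, window.size() / 2, 8, 4));
    }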
Abstract:
During the last decade, Internet usage has grown at an enormous rate, accompanied by the development of network applications (e.g., video conferencing, audio/video streaming, e-learning, e-commerce and real-time applications) carrying several types of information, including data, voice, pictures and media streams. While end-users demand very high quality of service (QoS) from their service providers, the network undergoes complex traffic that leads to transmission bottlenecks. Considerable effort has been made to study the characteristics and behavior of the Internet. Simulation modeling of computer network congestion is a profitable and effective technique that fulfills the requirements for evaluating the performance and QoS of networks. To simulate a single congested link, the simulation is run with a single load generator; for larger simulations with complex traffic, where nodes are spread across different geographical locations, generating distributed artificial loads is indispensable. One solution is to build a load-generation system based on a master/slave architecture.
Abstract:
Techniques of image combination, with extraction of objects to compose a final scene, are widely used in applications ranging from photo montages to cinematographic productions. These techniques are called digital matting. With them it is possible to reduce production costs, because the actor does not need to be filmed in the location where the final scene takes place. This also favors their use in programs made for digital television, which demands high-quality images. Many digital matting algorithms use markings made on the images to demarcate the foreground, the background and the uncertainty areas. This marking is called a trimap, a triple map containing these three pieces of information. The trimap is typically created from manual markings. In this project, methods were developed that can be used in digital matting algorithms under time constraints and without human interaction, that is, an algorithm that generates the trimap automatically. The trimap can be generated from the difference between the color of an arbitrary background and the foreground, or by using a depth map. A matting method was also created, based on Geodesic Matting (BAI; SAPIRO, 2009), with a lower processing time than the original. To improve the performance of the applications that generate the trimap and of the algorithms that generate the alpha map (a map associating a transparency value with each pixel of the image), allowing their use in applications with time constraints, the CUDA architecture was used, taking advantage of the computational power and features of the GPGPU, which is massively parallel.
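As an illustration of the depth-map route, trimap generation is embarrassingly parallel: one CUDA thread per pixel can classify foreground, background and the uncertainty band. The kernel below is a hedged sketch; the thresholds, label values and names are assumptions, not the project's actual code.

    // Classify each pixel of a depth map into a trimap:
    // 255 = foreground, 0 = background, 128 = uncertainty region.
    __global__ void trimap_from_depth(const float *depth, unsigned char *trimap,
                                      int w, int h, float near_t, float far_t)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= w || y >= h) return;

        float d = depth[y * w + x];
        if (d < near_t)      trimap[y * w + x] = 255;  // close: foreground
        else if (d > far_t)  trimap[y * w + x] = 0;    // distant: background
        else                 trimap[y * w + x] = 128;  // in between: unknown
    }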
Abstract:
Due to the lack of optical random-access memory, optical fiber delay lines (FDLs) are currently the only way to implement optical buffering. Feed-forward and feedback are the two kinds of FDL structures in optical buffering, each with advantages and disadvantages. In this paper, we propose a more effective hybrid FDL architecture that combines the merits of both schemes. The core of this switch is the arrayed waveguide grating (AWG) and the tunable wavelength converter (TWC). It requires smaller optical device sizes and fewer wavelengths and has less noise than the feedback architecture. At the same time, it can facilitate preemptive priority routing, which the feed-forward architecture cannot support. Our numerical results show that the new switch architecture significantly reduces the packet loss probability.
Abstract:
Triple-gate devices are considered a promising solution for the sub-20 nm era. Strain engineering has also been recognized as an alternative due to the increase in carrier mobility it provides. The simulation of strained devices has the major drawback of stress non-uniformity, which cannot easily be accounted for in a TCAD device simulation without a coupled process simulation, a time-consuming and cumbersome task. However, accurate device simulation, with good correlation with experimental results of strained devices, is mandatory to allow in-depth physical insight as well as prediction of the stress impact on the device's electrical characteristics. This work proposes the use of an analytic function, based on the literature, to accurately describe the dependence of the strain on both channel length and fin width, in order to adequately simulate strained triple-gate devices. The maximum transconductance and the threshold voltage are used as the key parameters for comparing simulated and experimental data. The results show the agreement of the proposed analytic function with the experimental results. An analysis of the threshold voltage variation is also carried out, showing that the stress affects the dependence of the threshold voltage on temperature.
Abstract:
The efficient emulation of a many-core architecture is a challenging task: each core could be emulated by a dedicated thread, with such threads interleaved on either a single-core or a multi-core processor, but the high number of context switches would result in unacceptable performance. To support this kind of application, the computational power of the GPU is exploited to schedule the emulation threads on the GPU cores. This presents a non-trivial divergence issue, since GPU computational power is offered through SIMD processing elements that are forced to synchronously execute the same instruction on different memory portions. Thus, a new emulation technique is introduced to overcome this limitation: instead of providing a routine for each ISA opcode, the emulator mimics the behavior of the microarchitecture level, where instructions are data that a single routine takes as input. Our new technique has been implemented and compared with the classic emulation approach, in order to investigate the feasibility of a hybrid solution.
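A minimal sketch of the "instructions as data" idea follows, with a hypothetical four-opcode ISA (not the emulator's actual one). Every GPU thread emulates one core by running the same interpreter routine, and the execute step selects among precomputed candidate results instead of branching per opcode, so the SIMD lanes of a warp stay convergent.

    // Instructions are plain data consumed by a single interpreter routine.
    struct Insn { int op, dst, src; };   // op: 0=ADD 1=SUB 2=AND 3=OR

    __global__ void emulate(const Insn *prog, int n_insns,
                            int *regs, int n_regs)
    {
        int core = blockIdx.x * blockDim.x + threadIdx.x;
        int *r = regs + core * n_regs;          // this core's register file

        for (int pc = 0; pc < n_insns; ++pc) {
            Insn in = prog[pc];
            int a = r[in.dst], b = r[in.src];
            int res[4] = { a + b, a - b, a & b, a | b };  // all candidates
            r[in.dst] = res[in.op];   // select by opcode: no divergent branch
        }
    }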
Abstract:
The aim of my thesis is to parallelize the Weighted Histogram Analysis Method (WHAM), a popular algorithm used to calculate the free energy of a molecular system in Molecular Dynamics simulations. WHAM works in post-processing, in cooperation with another algorithm called Umbrella Sampling. Umbrella Sampling adds a bias to the potential energy of the system in order to force it to sample a specific region of configurational space. N independent simulations are performed in order to sample the whole region of interest. Subsequently, the WHAM algorithm is used to estimate the original system energy starting from the N atomic trajectories. The parallelization of WHAM has been performed with CUDA, a platform for programming the GPUs of NVIDIA graphics cards, which have a parallel architecture. The parallel implementation can considerably speed up WHAM execution compared with previous serial CPU implementations; the WHAM CPU code, in particular, exhibits critical run times for very high numbers of interactions. The algorithm has been written in C++ and executed on UNIX systems equipped with NVIDIA graphics cards. The results were satisfactory, showing a performance increase when the model was executed on graphics cards of higher compute capability. Nonetheless, the GPUs used to test the algorithm are quite old and not designed for scientific computation. It is likely that a further performance increase would be obtained if the algorithm were executed on clusters of GPUs with a high level of computational efficiency. The thesis is organized as follows: I first describe the mathematical formulation of the Umbrella Sampling and WHAM algorithms, with their applications to the study of ion channels and to Molecular Docking (Chapter 1); then I present the CUDA architectures used to implement the model (Chapter 2); finally, the results obtained on model systems are presented (Chapter 3).
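The data-parallel core of WHAM maps naturally onto the GPU: in a binned formulation, one thread per histogram bin evaluates the standard estimate P(x_b) = sum_w n_w(b) / sum_w N_w exp(beta (f_w - U_w(x_b))). The kernel below is a sketch with illustrative names, not the thesis code; the host then recomputes the offsets f_w from P and iterates to self-consistency.

    // One WHAM self-consistency step: one thread per bin, looping over the
    // umbrella-sampling windows.
    __global__ void wham_step(const float *hist,    // hist[w*n_bins+b]: counts
                              const float *bias,    // bias[w*n_bins+b]: U_w(x_b)
                              const float *f,       // free-energy offsets f_w
                              const int *samples,   // samples[w]: N_w
                              float *p,             // out: unnormalized P(x_b)
                              int n_windows, int n_bins, float beta)
    {
        int b = blockIdx.x * blockDim.x + threadIdx.x;
        if (b >= n_bins) return;

        float num = 0.f, den = 0.f;
        for (int w = 0; w < n_windows; ++w) {
            num += hist[w * n_bins + b];
            den += samples[w] * expf(beta * (f[w] - bias[w * n_bins + b]));
        }
        p[b] = num / den;
    }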
Abstract:
Background: Retraction, atrophy and fatty infiltration are signs subsequent to chronic rotator cuff tendon tears. They are associated with an increased pennation angle and a shortening of the muscle fibers in series. These deleterious changes of the muscular architecture are not reversible with current repair techniques and are the main factors in failed rotator cuff tendon repair. Whereas fast stretching of the retracted musculotendinous unit results in proliferation of non-contractile fibrous tissue, slow stretching may lead to muscle regeneration in terms of sarcomerogenesis. To slowly stretch the retracted musculotendinous unit in a sheep model, the two tensioning devices described here were developed and mounted on the scapular spine of the sheep using an expandable threaded rod interposed between the retracted tendon end and the original insertion site at the humeral head. Traction is transmitted in line with the musculotendinous unit by sutures knotted on the expandable threaded rod. The threaded rod of the tensioner is driven within the body through a rotating axis, which enters the body on the opposite side. The tendon end, which had previously (16 weeks earlier) been released from its insertion site with a bone chip, was elongated at a velocity of 1 mm/day.
Results: After several technical improvements, the tensioner proved capable of actively stretching the retracted and degenerated muscle back to its original length and of withstanding the external forces acting on it.
Conclusion: This technical report describes the experimental technique for continuous elongation of the musculotendinous unit and reversal of the length of chronically shortened muscle.
Abstract:
In this paper, the software architecture of a framework that simplifies the development of applications in the area of Virtual and Augmented Reality is presented. It is based on VRML/X3D to enable rendering of audio-visual information. We extended our VRML rendering system with a device management system based on the concept of a data-flow graph. The aim of the system is to create Mixed Reality (MR) applications simply by plugging together small prefabricated software components, instead of compiling monolithic C++ applications. The flexibility and advantages of the presented framework are explained on the basis of an exemplary implementation of a classic Augmented Reality application and its extension to a collaborative remote-expert scenario.
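The plug-together approach can be pictured as nodes of a data-flow graph wired at runtime; below is a minimal C++ sketch of the concept (hypothetical node types, not the framework's actual API), in which a device source feeds a filter that feeds a renderer.

    #include <cstdio>
    #include <functional>
    #include <vector>

    // A node applies its transformation and pushes the result downstream.
    struct Node {
        std::vector<Node *> outs;          // downstream connections
        std::function<float(float)> fn;    // this node's processing step
        void push(float v) {
            float r = fn(v);
            for (Node *n : outs) n->push(r);
        }
    };

    int main()
    {
        Node tracker{{}, [](float v) { return v; }};           // device source
        Node smoother{{}, [](float v) { return 0.5f * v; }};   // filter stage
        Node renderer{{}, [](float v) { printf("pose %.2f\n", v); return v; }};
        tracker.outs = {&smoother};        // wire the graph at runtime
        smoother.outs = {&renderer};
        tracker.push(1.0f);                // one sample flows through
    }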
Abstract:
Kriging-based optimization relying on noisy evaluations of complex systems has recently motivated contributions from various research communities. Five strategies have been implemented in the DiceOptim package. The corresponding functions constitute a user-friendly tool for solving expensive noisy optimization problems in a sequential framework, while offering some flexibility to advanced users. Moreover, the implementation is done in a unified environment, making this package a useful tool for studying the relative performance of existing approaches depending on the experimental setup. An overview of the package structure and interface is provided, as well as a description of the strategies and some insight into the implementation challenges and the proposed solutions. The strategies are compared to some existing optimization packages on analytical test functions and show promising performance.
Abstract:
The Future Communication Architecture for Mobile Cloud Services: Mobile Cloud Networking (MCN) is an EU FP7 Large-scale Integrating Project (IP) funded by the European Commission. The MCN project was launched in November 2012 for a period of 36 months. In total, 19 top-tier partners from industry and academia have committed to jointly establishing the vision of Mobile Cloud Networking and to developing a fully cloud-based mobile communication and application platform.
Abstract:
A web service is a collection of industry standards that enable reusability of services and interoperability of heterogeneous applications. The UMLS Knowledge Source (UMLSKS) Server provides remote access to the UMLSKS and related resources. We propose a Web Services Architecture that encapsulates the UMLSKS API and makes it available in distributed and heterogeneous environments. This is a first step towards intelligent and automatic UMLS service discovery and invocation by computer systems in distributed environments such as the Web.
Abstract:
Software architecture consists of a set of design choices that can be partially expressed in the form of rules that the implementation must conform to. Architectural rules are intended to ensure properties that fulfill fundamental non-functional requirements. Verifying architectural rules is often a non-trivial activity: available tools are often not very usable and support only a narrow subset of the rules commonly specified by practitioners. In this paper, we present a new, highly readable declarative language for specifying architectural rules. With our approach, users can specify a wide variety of rules using a single uniform notation. Rules can be tested by third-party tools by conforming to predefined specification templates. Practitioners can thus take advantage of the capabilities of a growing number of testing tools without dealing with them directly.