953 resultados para General-purpose computing
Resumo:
Graph analytics is an important and computationally demanding class of data analytics. It is essential to balance scalability, ease-of-use and high performance in large scale graph analytics. As such, it is necessary to hide the complexity of parallelism, data distribution and memory locality behind an abstract interface. The aim of this work is to build a scalable graph analytics framework that does not demand significant parallel programming experience based on NUMA-awareness.
The realization of such a system faces two key problems:
(i)~how to develop a scale-free parallel programming framework that scales efficiently across NUMA domains; (ii)~how to efficiently apply graph partitioning in order to create separate and largely independent work items that can be distributed among threads.
Resumo:
Vista la crescente necessità di svolgere progetti d'innovazione guidati dalle nuove tecnologie, la presente ricerca risponde all'esigenza di trovare una metodologia efficace per poterli affrontare proficuamente. In un primo momento, l'elaborato propone una metodologia che permette di elaborare una classificazione dei progetti d'innovazione technology-driven in classi ed un pacchetto di tools funzionali al riconoscimento dell'appartenenza alle stesse. In un secondo momento, giunti a comprensione del fatto che ad ognuna delle classi corrisponde uno specifico fine raggiungibile con una specifica metodologia, lo studio descrive la metodologia seguita per raggiungere una efficace e ripetibile elaborazione di principi progettuali, buone pratiche e strumenti che permettano ad una Organizzazione di appropriarsi del valore di una tecnologia General Purpose (GPT) attraverso l'ideazione di soluzioni innovativa. La progettazione è figlia di un approccio di Design Thinking (DT), sia poichè esso è stato usato nello svolgimento stesso della ricerca, sia perchè la metodologia DT è alla base della modellazione del processo proposto per la classe.
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Biomédica
Resumo:
In this article we explore the NVIDIA graphical processing units (GPU) computational power in cryptography using CUDA (Compute Unified Device Architecture) technology. CUDA makes the general purpose computing easy using the parallel processing presents in GPUs. To do this, the NVIDIA GPUs architectures and CUDA are presented, besides cryptography concepts. Furthermore, we do the comparison between the versions executed in CPU with the parallel version of the cryptography algorithms Advanced Encryption Standard (AES) and Message-digest Algorithm 5 (MD5) wrote in CUDA. © 2011 AISTI.
Resumo:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Resumo:
The research described in this thesis was motivated by the need of a robust model capable of representing 3D data obtained with 3D sensors, which are inherently noisy. In addition, time constraints have to be considered as these sensors are capable of providing a 3D data stream in real time. This thesis proposed the use of Self-Organizing Maps (SOMs) as a 3D representation model. In particular, we proposed the use of the Growing Neural Gas (GNG) network, which has been successfully used for clustering, pattern recognition and topology representation of multi-dimensional data. Until now, Self-Organizing Maps have been primarily computed offline and their application in 3D data has mainly focused on free noise models, without considering time constraints. It is proposed a hardware implementation leveraging the computing power of modern GPUs, which takes advantage of a new paradigm coined as General-Purpose Computing on Graphics Processing Units (GPGPU). The proposed methods were applied to different problem and applications in the area of computer vision such as the recognition and localization of objects, visual surveillance or 3D reconstruction.
Resumo:
Nowadays, new computers generation provides a high performance that enables to build computationally expensive computer vision applications applied to mobile robotics. Building a map of the environment is a common task of a robot and is an essential part to allow the robots to move through these environments. Traditionally, mobile robots used a combination of several sensors from different technologies. Lasers, sonars and contact sensors have been typically used in any mobile robotic architecture, however color cameras are an important sensor due to we want the robots to use the same information that humans to sense and move through the different environments. Color cameras are cheap and flexible but a lot of work need to be done to give robots enough visual understanding of the scenes. Computer vision algorithms are computational complex problems but nowadays robots have access to different and powerful architectures that can be used for mobile robotics purposes. The advent of low-cost RGB-D sensors like Microsoft Kinect which provide 3D colored point clouds at high frame rates made the computer vision even more relevant in the mobile robotics field. The combination of visual and 3D data allows the systems to use both computer vision and 3D processing and therefore to be aware of more details of the surrounding environment. The research described in this thesis was motivated by the need of scene mapping. Being aware of the surrounding environment is a key feature in many mobile robotics applications from simple robotic navigation to complex surveillance applications. In addition, the acquisition of a 3D model of the scenes is useful in many areas as video games scene modeling where well-known places are reconstructed and added to game systems or advertising where once you get the 3D model of one room the system can add furniture pieces using augmented reality techniques. In this thesis we perform an experimental study of the state-of-the-art registration methods to find which one fits better to our scene mapping purposes. Different methods are tested and analyzed on different scene distributions of visual and geometry appearance. In addition, this thesis proposes two methods for 3d data compression and representation of 3D maps. Our 3D representation proposal is based on the use of Growing Neural Gas (GNG) method. This Self-Organizing Maps (SOMs) has been successfully used for clustering, pattern recognition and topology representation of various kind of data. Until now, Self-Organizing Maps have been primarily computed offline and their application in 3D data has mainly focused on free noise models without considering time constraints. Self-organising neural models have the ability to provide a good representation of the input space. In particular, the Growing Neural Gas (GNG) is a suitable model because of its flexibility, rapid adaptation and excellent quality of representation. However, this type of learning is time consuming, specially for high-dimensional input data. Since real applications often work under time constraints, it is necessary to adapt the learning process in order to complete it in a predefined time. This thesis proposes a hardware implementation leveraging the computing power of modern GPUs which takes advantage of a new paradigm coined as General-Purpose Computing on Graphics Processing Units (GPGPU). Our proposed geometrical 3D compression method seeks to reduce the 3D information using plane detection as basic structure to compress the data. This is due to our target environments are man-made and therefore there are a lot of points that belong to a plane surface. Our proposed method is able to get good compression results in those man-made scenarios. The detected and compressed planes can be also used in other applications as surface reconstruction or plane-based registration algorithms. Finally, we have also demonstrated the goodness of the GPU technologies getting a high performance implementation of a CAD/CAM common technique called Virtual Digitizing.
Resumo:
Floating-point computing with more than one TFLOP of peak performance is already a reality in recent Field-Programmable Gate Arrays (FPGA). General-Purpose Graphics Processing Units (GPGPU) and recent many-core CPUs have also taken advantage of the recent technological innovations in integrated circuit (IC) design and had also dramatically improved their peak performances. In this paper, we compare the trends of these computing architectures for high-performance computing and survey these platforms in the execution of algorithms belonging to different scientific application domains. Trends in peak performance, power consumption and sustained performances, for particular applications, show that FPGAs are increasing the gap to GPUs and many-core CPUs moving them away from high-performance computing with intensive floating-point calculations. FPGAs become competitive for custom floating-point or fixed-point representations, for smaller input sizes of certain algorithms, for combinational logic problems and parallel map-reduce problems. © 2014 Technical University of Munich (TUM).
Resumo:
Single processor architectures are unable to provide the required performance of high performance embedded systems. Parallel processing based on general-purpose processors can achieve these performances with a considerable increase of required resources. However, in many cases, simplified optimized parallel cores can be used instead of general-purpose processors achieving better performance at lower resource utilization. In this paper, we propose a configurable many-core architecture to serve as a co-processor for high-performance embedded computing on Field-Programmable Gate Arrays. The architecture consists of an array of configurable simple cores with support for floating-point operations interconnected with a configurable interconnection network. For each core it is possible to configure the size of the internal memory, the supported operations and number of interfacing ports. The architecture was tested in a ZYNQ-7020 FPGA in the execution of several parallel algorithms. The results show that the proposed many-core architecture achieves better performance than that achieved with a parallel generalpurpose processor and that up to 32 floating-point cores can be implemented in a ZYNQ-7020 SoC FPGA.
Resumo:
We present Tethered Monte Carlo, a simple, general purpose method of computing the effective potential of the order parameter (Helmholtz free energy). This formalism is based on a new statistical ensemble, closely related to the micromagnetic one, but with an extended configuration space (through Creutz-like demons). Canonical averages for arbitrary values of the external magnetic field are computed without additional simulations. The method is put to work in the two-dimensional Ising model, where the existence of exact results enables us to perform high precision checks. A rather peculiar feature of our implementation, which employs a local Metropolis algorithm, is the total absence, within errors, of critical slowing down for magnetic observables. Indeed, high accuracy results are presented for lattices as large as L = 1024.
Resumo:
We present Tethered Monte Carlo, a simple, general purpose method of computing the effective potential of the order parameter (Helmholtz free energy). This formalism is based on a new statistical ensemble, closely related to the micromagnetic one, but with an extended configuration space (through Creutz-like demons). Canonical averages for arbitrary values of the external magnetic field are computed without additional simulations. The method is put to work in the two-dimensional Ising model, where the existence of exact results enables us to perform high precision checks. A rather peculiar feature of our implementation, which employs a local Metropolis algorithm, is the total absence, within errors, of critical slowing down for magnetic observables. Indeed, high accuracy results are presented for lattices as large as L = 1024.
Resumo:
This thesis presents the formal definition of a novel Mobile Cloud Computing (MCC) extension of the Networked Autonomic Machine (NAM) framework, a general-purpose conceptual tool which describes large-scale distributed autonomic systems. The introduction of autonomic policies in the MCC paradigm has proved to be an effective technique to increase the robustness and flexibility of MCC systems. In particular, autonomic policies based on continuous resource and connectivity monitoring help automate context-aware decisions for computation offloading. We have also provided NAM with a formalization in terms of a transformational operational semantics in order to fill the gap between its existing Java implementation NAM4J and its conceptual definition. Moreover, we have extended NAM4J by adding several components with the purpose of managing large scale autonomic distributed environments. In particular, the middleware allows for the implementation of peer-to-peer (P2P) networks of NAM nodes. Moreover, NAM mobility actions have been implemented to enable the migration of code, execution state and data. Within NAM4J, we have designed and developed a component, denoted as context bus, which is particularly useful in collaborative applications in that, if replicated on each peer, it instantiates a virtual shared channel allowing nodes to notify and get notified about context events. Regarding the autonomic policies management, we have provided NAM4J with a rule engine, whose purpose is to allow a system to autonomously determine when offloading is convenient. We have also provided NAM4J with trust and reputation management mechanisms to make the middleware suitable for applications in which such aspects are of great interest. To this purpose, we have designed and implemented a distributed framework, denoted as DARTSense, where no central server is required, as reputation values are stored and updated by participants in a subjective fashion. We have also investigated the literature regarding MCC systems. The analysis pointed out that all MCC models focus on mobile devices, and consider the Cloud as a system with unlimited resources. To contribute in filling this gap, we defined a modeling and simulation framework for the design and analysis of MCC systems, encompassing both their sides. We have also implemented a modular and reusable simulator of the model. We have applied the NAM principles to two different application scenarios. First, we have defined a hybrid P2P/cloud approach where components and protocols are autonomically configured according to specific target goals, such as cost-effectiveness, reliability and availability. Merging P2P and cloud paradigms brings together the advantages of both: high availability, provided by the Cloud presence, and low cost, by exploiting inexpensive peers resources. As an example, we have shown how the proposed approach can be used to design NAM-based collaborative storage systems based on an autonomic policy to decide how to distribute data chunks among peers and Cloud, according to cost minimization and data availability goals. As a second application, we have defined an autonomic architecture for decentralized urban participatory sensing (UPS) which bridges sensor networks and mobile systems to improve effectiveness and efficiency. The developed application allows users to retrieve and publish different types of sensed information by using the features provided by NAM4J's context bus. Trust and reputation is managed through the application of DARTSense mechanisms. Also, the application includes an autonomic policy that detects areas characterized by few contributors, and tries to recruit new providers by migrating code necessary to sensing, through NAM mobility actions.
Resumo:
WorldFIP is standardised as European Norm EN 50170 - General Purpose Field Communication System. Field communication systems (fieldbuses) started to be widely used as the communication support for distributed computer-controlled systems (DCCS), and are being used in all sorts of process control and manufacturing applications within different types of industries. There are several advantages in using fieldbuses as a replacement of for the traditional point-to-point links between sensors/actuators and computer-based control systems. Indeed they concern economical ones (cable savings) but, importantly, fieldbuses allow an increased decentralisation and distribution of the processing power over the field. Typically DCCS have real-time requirements that must be fulfilled. By this, we mean that process data must be transferred between network computing nodes within a maximum admissible time span. WorldFIP has very interesting mechanisms to schedule data transfers. It explicit distinguishes to types of traffic: periodic and aperiodic. In this paper we describe how WorldFIP handles these two types of traffic, and more importantly, we provide a comprehensive analysis for guaranteeing the real-time requirements of both types of traffic. A major contribution is made in the analysis of worst-case response time of aperiodic transfer requests.
Resumo:
Graphics processors were originally developed for rendering graphics but have recently evolved towards being an architecture for general-purpose computations. They are also expected to become important parts of embedded systems hardware -- not just for graphics. However, this necessitates the development of appropriate timing analysis techniques which would be required because techniques developed for CPU scheduling are not applicable. The reason is that we are not interested in how long it takes for any given GPU thread to complete, but rather how long it takes for all of them to complete. We therefore develop a simple method for finding an upper bound on the makespan of a group of GPU threads executing the same program and competing for the resources of a single streaming multiprocessor (whose architecture is based on NVIDIA Fermi, with some simplifying assunptions). We then build upon this method to formulate the derivation of the exact worst-case makespan (and corresponding schedule) as an optimization problem. Addressing the issue of tractability, we also present a technique for efficiently computing a safe estimate of the worstcase makespan with minimal pessimism, which may be used when finding an exact value would take too long.
Resumo:
Dissertação apresentada para obtenção do Grau de Doutor em Informática Pela Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia