878 resultados para Parallel processing (Electronic computers)
Resumo:
Abstract not available
Resumo:
The difficulties encountered in implementing large scale CM codes on multiprocessor systems are now fairly well understood. Despite the claims of shared memory architecture manufacturers to provide effective parallelizing compilers, these have not proved to be adequate for large or complex programs. Significant programmer effort is usually required to achieve reasonable parallel efficiencies on significant numbers of processors. The paradigm of Single Program Multi Data (SPMD) domain decomposition with message passing, where each processor runs the same code on a subdomain of the problem, communicating through exchange of messages, has for some time been demonstrated to provide the required level of efficiency, scalability, and portability across both shared and distributed memory systems, without the need to re-author the code into a new language or even to support differing message passing implementations. Extension of the methods into three dimensions has been enabled through the engineering of PHYSICA, a framework for supporting 3D, unstructured mesh and continuum mechanics modeling. In PHYSICA, six inspectors are used. Part of the challenge for automation of parallelization is being able to prove the equivalence of inspectors so that they can be merged into as few as possible.
Resumo:
This chapter describes a parallel optimization technique that incorporates a distributed load-balancing algorithm and provides an extremely fast solution to the problem of load-balancing adaptive unstructured meshes. Moreover, a parallel graph contraction technique can be employed to enhance the partition quality and the resulting strategy outperforms or matches results from existing state-of-the-art static mesh partitioning algorithms. The strategy can also be applied to static partitioning problems. Dynamic procedures have been found to be much faster than static techniques, to provide partitions of similar or higher quality and, in comparison, involve the migration of a fraction of the data. The method employs a new iterative optimization technique that balances the workload and attempts to minimize the interprocessor communications overhead. Experiments on a series of adaptively refined meshes indicate that the algorithm provides partitions of an equivalent or higher quality to static partitioners (which do not reuse the existing partition) and much more quickly. The dynamic evolution of load has three major influences on possible partitioning techniques; cost, reuse, and parallelism. The unstructured mesh may be modified every few time-steps and so the load-balancing must have a low cost relative to that of the solution algorithm in between remeshing.
Resumo:
Recent advances in the massively parallel computational abilities of graphical processing units (GPUs) have increased their use for general purpose computation, as companies look to take advantage of big data processing techniques. This has given rise to the potential for malicious software targeting GPUs, which is of interest to forensic investigators examining the operation of software. The ability to carry out reverse-engineering of software is of great importance within the security and forensics elds, particularly when investigating malicious software or carrying out forensic analysis following a successful security breach. Due to the complexity of the Nvidia CUDA (Compute Uni ed Device Architecture) framework, it is not clear how best to approach the reverse engineering of a piece of CUDA software. We carry out a review of the di erent binary output formats which may be encountered from the CUDA compiler, and their implications on reverse engineering. We then demonstrate the process of carrying out disassembly of an example CUDA application, to establish the various techniques available to forensic investigators carrying out black-box disassembly and reverse engineering of CUDA binaries. We show that the Nvidia compiler, using default settings, leaks useful information. Finally, we demonstrate techniques to better protect intellectual property in CUDA algorithm implementations from reverse engineering.
Resumo:
A method is outlined for optimising graph partitions which arise in mapping un- structured mesh calculations to parallel computers. The method employs a combination of iterative techniques to both evenly balance the workload and minimise the number and volume of interprocessor communications. They are designed to work efficiently in parallel as well as sequentially and when combined with a fast direct partitioning technique (such as the Greedy algorithm) to give an initial partition, the resulting two-stage process proves itself to be both a powerful and flexible solution to the static graph-partitioning problem. The algorithms can also be used for dynamic load-balancing and a clustering technique can additionally be employed to speed up the whole process. Experiments indicate that the resulting parallel code can provide high quality partitions, independent of the initial partition, within a few seconds.
Resumo:
Die Nützlichkeit des Einsatzes von Computern in Schule und Ausbildung ist schon seit einigen Jahren unbestritten. Uneinigkeit herrscht gegenwärtig allerdings darüber, welche Aufgaben von Computern eigenständig wahrgenommen werden können. Bewertet man die Übernahme von Lehrfunktionen durch computerbasierte Lehrsysteme, müssen häufig Mängel festgestellt werden. Das Ziel der vorliegenden Arbeit ist es, ausgehend von aktuellen Praxisrealisierungen computerbasierter Lehrsysteme unterschiedliche Klassen von zentralen Lehrkompetenzen (Schülermodellierung, Fachwissen und instruktionale Aktivitäten im engeren Sinne) zu bestimmen. Innerhalb jeder Klasse werden globale Leistungen der Lehrsysteme und notwendige, in komplementärer Relation stehende Tätigkeiten menschlicher Tutoren bestimmt. Das dabei entstandene Klassifikationsschema erlaubt sowohl die Einordnung typischer Lehrsysteme als auch die Feststellung von spezifischen Kompetenzen, die in der Lehrer- bzw. Trainerausbildung zukünftig vermehrt berücksichtigt werden sollten. (DIPF/Orig.)
Resumo:
A method is outlined for optimising graph partitions which arise in mapping unstructured mesh calculations to parallel computers. The method employs a relative gain iterative technique to both evenly balance the workload and minimise the number and volume of interprocessor communications. A parallel graph reduction technique is also briefly described and can be used to give a global perspective to the optimisation. The algorithms work efficiently in parallel as well as sequentially and when combined with a fast direct partitioning technique (such as the Greedy algorithm) to give an initial partition, the resulting two-stage process proves itself to be both a powerful and flexible solution to the static graph-partitioning problem. Experiments indicate that the resulting parallel code can provide high quality partitions, independent of the initial partition, within a few seconds. The algorithms can also be used for dynamic load-balancing, reusing existing partitions and in this case the procedures are much faster than static techniques, provide partitions of similar or higher quality and, in comparison, involve the migration of a fraction of the data.
Resumo:
The use of the term "Electronic Publishing" transcends any notions of the paperless office and of a purely electronic transfer and dissemination of information over networks. It now encompasses all computer-assisted methods for the production of documents and includes the imaging of a document on paper as one of the options to be provided by an integrated processing scheme. Electronic publishing draws heavily on techniques from computer science and information technology but technical, legal, financial and organisational problems have to be overcome before it can replace traditional publication mechanisms. These problems are illustrated with reference to the publication arrangements for the journal `Electronic Publishing Origination, Dissemination and Design'. The authors of this paper are the co-editors of this journal, which appears in traditional form and relies on a wide variety of support from electronic technologies in the pre-publication phase.
Resumo:
Electronic Publishing -- Origination, Dissemination and Design (EP-odd) is an academic journal which publishes refereed papers in the subject area of electronic publishing. The authors of the present paper are, respectively, editor-in-chief, system software consultant and senior production manager for the journal. EP-odd's policy is that editors, authors, referees and production staff will work closely together using electronic mail. Authors are also encouraged to originate their papers using one of the approved text-processing packages together with the appropriate set of macros which enforce the layout style for the journal. This same software will then be used by the publisher in the production phase. Our experiences with these strategies are presented, and two recently developed suites of software are described: one of these makes the macro sets available over electronic mail and the other automates the flow of papers through the refereeing process. The decision to produce EP-odd in this way means that the publisher has to adopt production procedures which differ markedly from those employed for a conventional journal.
Resumo:
Virtual Screening (VS) methods can considerably aid clinical research, predicting how ligands interact with drug targets. Most VS methods suppose a unique binding site for the target, but it has been demonstrated that diverse ligands interact with unrelated parts of the target and many VS methods do not take into account this relevant fact. This problem is circumvented by a novel VS methodology named BINDSURF that scans the whole protein surface to find new hotspots, where ligands might potentially interact with, and which is implemented in massively parallel Graphics Processing Units, allowing fast processing of large ligand databases. BINDSURF can thus be used in drug discovery, drug design, drug repurposing and therefore helps considerably in clinical research. However, the accuracy of most VS methods is constrained by limitations in the scoring function that describes biomolecular interactions, and even nowadays these uncertainties are not completely understood. In order to solve this problem, we propose a novel approach where neural networks are trained with databases of known active (drugs) and inactive compounds, and later used to improve VS predictions.
Resumo:
Nowadays, new computers generation provides a high performance that enables to build computationally expensive computer vision applications applied to mobile robotics. Building a map of the environment is a common task of a robot and is an essential part to allow the robots to move through these environments. Traditionally, mobile robots used a combination of several sensors from different technologies. Lasers, sonars and contact sensors have been typically used in any mobile robotic architecture, however color cameras are an important sensor due to we want the robots to use the same information that humans to sense and move through the different environments. Color cameras are cheap and flexible but a lot of work need to be done to give robots enough visual understanding of the scenes. Computer vision algorithms are computational complex problems but nowadays robots have access to different and powerful architectures that can be used for mobile robotics purposes. The advent of low-cost RGB-D sensors like Microsoft Kinect which provide 3D colored point clouds at high frame rates made the computer vision even more relevant in the mobile robotics field. The combination of visual and 3D data allows the systems to use both computer vision and 3D processing and therefore to be aware of more details of the surrounding environment. The research described in this thesis was motivated by the need of scene mapping. Being aware of the surrounding environment is a key feature in many mobile robotics applications from simple robotic navigation to complex surveillance applications. In addition, the acquisition of a 3D model of the scenes is useful in many areas as video games scene modeling where well-known places are reconstructed and added to game systems or advertising where once you get the 3D model of one room the system can add furniture pieces using augmented reality techniques. In this thesis we perform an experimental study of the state-of-the-art registration methods to find which one fits better to our scene mapping purposes. Different methods are tested and analyzed on different scene distributions of visual and geometry appearance. In addition, this thesis proposes two methods for 3d data compression and representation of 3D maps. Our 3D representation proposal is based on the use of Growing Neural Gas (GNG) method. This Self-Organizing Maps (SOMs) has been successfully used for clustering, pattern recognition and topology representation of various kind of data. Until now, Self-Organizing Maps have been primarily computed offline and their application in 3D data has mainly focused on free noise models without considering time constraints. Self-organising neural models have the ability to provide a good representation of the input space. In particular, the Growing Neural Gas (GNG) is a suitable model because of its flexibility, rapid adaptation and excellent quality of representation. However, this type of learning is time consuming, specially for high-dimensional input data. Since real applications often work under time constraints, it is necessary to adapt the learning process in order to complete it in a predefined time. This thesis proposes a hardware implementation leveraging the computing power of modern GPUs which takes advantage of a new paradigm coined as General-Purpose Computing on Graphics Processing Units (GPGPU). Our proposed geometrical 3D compression method seeks to reduce the 3D information using plane detection as basic structure to compress the data. This is due to our target environments are man-made and therefore there are a lot of points that belong to a plane surface. Our proposed method is able to get good compression results in those man-made scenarios. The detected and compressed planes can be also used in other applications as surface reconstruction or plane-based registration algorithms. Finally, we have also demonstrated the goodness of the GPU technologies getting a high performance implementation of a CAD/CAM common technique called Virtual Digitizing.
Resumo:
Dissertação (mestrado)–Universidade de Brasília, Universidade UnB de Planaltina, Programa de Pós-Graduação em Ciência de Materiais, 2015.
Resumo:
Due to the sensitive nature of patient data, the secondary use of electronic health records (EHR) is restricted in scientific research and product development. Such restrictions pursue to preserve the privacy of respective patients by limiting the availability and variety of sensitive patient data. Current limitations do not correspond with the actual needs requested by the potential secondary users. In this thesis, the secondary use of Finnish and Swedish EHR data is explored for the purpose of enhancing the availability of such data for clinical research and product development. Involved EHR-related procedures and technologies are analysed to identify the issues limiting the secondary use of patient data. Successful secondary use of patient data increases the data value. To explore the identified circumstances, a case study of potential secondary users and use intentions regarding EHR data was carried out in Finland and Sweden. The data collection for the conducted case study was performed using semi-structured interviews. In total, 14 Finnish and Swedish experts representing scientific research, health management, and business were interviewed. The motivation for the corresponding interviews was to evaluate the protection of EHR data used for secondary purposes. The efficiency of implemented procedures and technologies was analysed in terms of data availability and privacy preserving. The results of the conducted case study show that the factors affecting EHR availability are divided to three categories: management of patient data, preservation of patients' privacy, and potential secondary users. Identified issues regarding data management included laborious and inconsistent data request procedures and the role and effect of external service providers. Based on the study findings, two secondary use approaches enabling the secondary use of EHR data are identified: data alteration and protected processing environment. Data alteration increases the availability of relevant EHR data, further decreasing the value of such data. Protected processing approach restricts the amount of potential users and use intentions while providing more valuable data content.
Resumo:
Integrated circuit scaling has enabled a huge growth in processing capability, which necessitates a corresponding increase in inter-chip communication bandwidth. As bandwidth requirements for chip-to-chip interconnection scale, deficiencies of electrical channels become more apparent. Optical links present a viable alternative due to their low frequency-dependent loss and higher bandwidth density in the form of wavelength division multiplexing. As integrated photonics and bonding technologies are maturing, commercialization of hybrid-integrated optical links are becoming a reality. Increasing silicon integration leads to better performance in optical links but necessitates a corresponding co-design strategy in both electronics and photonics. In this light, holistic design of high-speed optical links with an in-depth understanding of photonics and state-of-the-art electronics brings their performance to unprecedented levels. This thesis presents developments in high-speed optical links by co-designing and co-integrating the primary elements of an optical link: receiver, transmitter, and clocking.
In the first part of this thesis a 3D-integrated CMOS/Silicon-photonic receiver will be presented. The electronic chip features a novel design that employs a low-bandwidth TIA front-end, double-sampling and equalization through dynamic offset modulation. Measured results show -14.9dBm of sensitivity and energy efficiency of 170fJ/b at 25Gb/s. The same receiver front-end is also used to implement source-synchronous 4-channel WDM-based parallel optical receiver. Quadrature ILO-based clocking is employed for synchronization and a novel frequency-tracking method that exploits the dynamics of IL in a quadrature ring oscillator to increase the effective locking range. An adaptive body-biasing circuit is designed to maintain the per-bit-energy consumption constant across wide data-rates. The prototype measurements indicate a record-low power consumption of 153fJ/b at 32Gb/s. The receiver sensitivity is measured to be -8.8dBm at 32Gb/s.
Next, on the optical transmitter side, three new techniques will be presented. First one is a differential ring modulator that breaks the optical bandwidth/quality factor trade-off known to limit the speed of high-Q ring modulators. This structure maintains a constant energy in the ring to avoid pattern-dependent power droop. As a first proof of concept, a prototype has been fabricated and measured up to 10Gb/s. The second technique is thermal stabilization of micro-ring resonator modulators through direct measurement of temperature using a monolithic PTAT temperature sensor. The measured temperature is used in a feedback loop to adjust the thermal tuner of the ring. A prototype is fabricated and a closed-loop feedback system is demonstrated to operate at 20Gb/s in the presence of temperature fluctuations. The third technique is a switched-capacitor based pre-emphasis technique designed to extend the inherently low bandwidth of carrier injection micro-ring modulators. A measured prototype of the optical transmitter achieves energy efficiency of 342fJ/bit at 10Gb/s and the wavelength stabilization circuit based on the monolithic PTAT sensor consumes 0.29mW.
Lastly, a first-order frequency synthesizer that is suitable for high-speed on-chip clock generation will be discussed. The proposed design features an architecture combining an LC quadrature VCO, two sample-and-holds, a PI, digital coarse-tuning, and rotational frequency detection for fine-tuning. In addition to an electrical reference clock, as an extra feature, the prototype chip is capable of receiving a low jitter optical reference clock generated by a high-repetition-rate mode-locked laser. The output clock at 8GHz has an integrated RMS jitter of 490fs, peak-to-peak periodic jitter of 2.06ps, and total RMS jitter of 680fs. The reference spurs are measured to be –64.3dB below the carrier frequency. At 8GHz the system consumes 2.49mW from a 1V supply.
Resumo:
Physiological signals, which are controlled by the autonomic nervous system (ANS), could be used to detect the affective state of computer users and therefore find applications in medicine and engineering. The Pupil Diameter (PD) seems to provide a strong indication of the affective state, as found by previous research, but it has not been investigated fully yet. In this study, new approaches based on monitoring and processing the PD signal for off-line and on-line affective assessment (“relaxation” vs. “stress”) are proposed. Wavelet denoising and Kalman filtering methods are first used to remove abrupt changes in the raw Pupil Diameter (PD) signal. Then three features (PDmean, PDmax and PDWalsh) are extracted from the preprocessed PD signal for the affective state classification. In order to select more relevant and reliable physiological data for further analysis, two types of data selection methods are applied, which are based on the paired t-test and subject self-evaluation, respectively. In addition, five different kinds of the classifiers are implemented on the selected data, which achieve average accuracies up to 86.43% and 87.20%, respectively. Finally, the receiver operating characteristic (ROC) curve is utilized to investigate the discriminating potential of each individual feature by evaluation of the area under the ROC curve, which reaches values above 0.90. For the on-line affective assessment, a hard threshold is implemented first in order to remove the eye blinks from the PD signal and then a moving average window is utilized to obtain the representative value PDr for every one-second time interval of PD. There are three main steps for the on-line affective assessment algorithm, which are preparation, feature-based decision voting and affective determination. The final results show that the accuracies are 72.30% and 73.55% for the data subsets, which were respectively chosen using two types of data selection methods (paired t-test and subject self-evaluation). In order to further analyze the efficiency of affective recognition through the PD signal, the Galvanic Skin Response (GSR) was also monitored and processed. The highest affective assessment classification rate obtained from GSR processing is only 63.57% (based on the off-line processing algorithm). The overall results confirm that the PD signal should be considered as one of the most powerful physiological signals to involve in future automated real-time affective recognition systems, especially for detecting the “relaxation” vs. “stress” states.