937 resultados para Computing performance
Resumo:
Monimutkaisen tietokonejärjestelmän suorituskykyoptimointi edellyttää järjestelmän ajonaikaisen käyttäytymisen ymmärtämistä. Ohjelmiston koon ja monimutkaisuuden kasvun myötä suorituskykyoptimointi tulee yhä tärkeämmäksi osaksi tuotekehitysprosessia. Tehokkaampien prosessorien käytön myötä myös energiankulutus ja lämmöntuotto ovat nousseet yhä suuremmiksi ongelmiksi, erityisesti pienissä, kannettavissa laitteissa. Lämpö- ja energiaongelmien rajoittamiseksi on kehitetty suorituskyvyn skaalausmenetelmiä, jotka edelleen lisäävät järjestelmän kompleksisuutta ja suorituskykyoptimoinnin tarvetta. Tässä työssä kehitettiin visualisointi- ja analysointityökalu ajonaikaisen käyttäytymisen ymmärtämisen helpottamiseksi. Lisäksi kehitettiin suorituskyvyn mitta, joka mahdollistaa erilaisten skaalausmenetelmien vertailun ja arvioimisen suoritusympäristöstä riippumatta, perustuen joko suoritustallenteen tai teoreettiseen analyysiin. Työkalu esittää ajonaikaisesti kerätyn tallenteen helposti ymmärrettävällä tavalla. Se näyttää mm. prosessit, prosessorikuorman, skaalausmenetelmien toiminnan sekä energiankulutuksen kolmiulotteista grafiikkaa käyttäen. Työkalu tuottaa myös käyttäjän valitsemasta osasta suorituskuvaa numeerista tietoa, joka sisältää useita oleellisia suorituskykyarvoja ja tilastotietoa. Työkalun sovellettavuutta tarkasteltiin todellisesta laitteesta saatua suoritustallennetta sekä suorituskyvyn skaalauksen simulointia analysoimalla. Skaalausmekanismin parametrien vaikutus simuloidun laitteen suorituskykyyn analysoitiin.
Resumo:
Gaia is the most ambitious space astrometry mission currently envisaged and is a technological challenge in all its aspects. We describe a proposal for the payload data handling system of Gaia, as an example of a high-performance, real-time, concurrent, and pipelined data system. This proposal includes the front-end systems for the instrumentation, the data acquisition and management modules, the star data processing modules, and the payload data handling unit. We also review other payload and service module elements and we illustrate a data flux proposal.
Resumo:
Our efforts are directed towards the understanding of the coscheduling mechanism in a NOW system when a parallel job is executed jointly with local workloads, balancing parallel performance against the local interactive response. Explicit and implicit coscheduling techniques in a PVM-Linux NOW (or cluster) have been implemented. Furthermore, dynamic coscheduling remains an open question when parallel jobs are executed in a non-dedicated Cluster. A basis model for dynamic coscheduling in Cluster systems is presented in this paper. Also, one dynamic coscheduling algorithm for this model is proposed. The applicability of this algorithm has been proved and its performance analyzed by simulation. Finally, a new tool (named Monito) for monitoring the different queues of messages in such an environments is presented. The main aim of implementing this facility is to provide a mean of capturing the bottlenecks and overheads of the communication system in a PVM-Linux cluster.
Resumo:
We consider the numerical treatment of the optical flow problem by evaluating the performance of the trust region method versus the line search method. To the best of our knowledge, the trust region method is studied here for the first time for variational optical flow computation. Four different optical flow models are used to test the performance of the proposed algorithm combining linear and nonlinear data terms with quadratic and TV regularization. We show that trust region often performs better than line search; especially in the presence of non-linearity and non-convexity in the model.
Resumo:
Peer-reviewed
Resumo:
In accordance with the Moore's law, the increasing number of on-chip integrated transistors has enabled modern computing platforms with not only higher processing power but also more affordable prices. As a result, these platforms, including portable devices, work stations and data centres, are becoming an inevitable part of the human society. However, with the demand for portability and raising cost of power, energy efficiency has emerged to be a major concern for modern computing platforms. As the complexity of on-chip systems increases, Network-on-Chip (NoC) has been proved as an efficient communication architecture which can further improve system performances and scalability while reducing the design cost. Therefore, in this thesis, we study and propose energy optimization approaches based on NoC architecture, with special focuses on the following aspects. As the architectural trend of future computing platforms, 3D systems have many bene ts including higher integration density, smaller footprint, heterogeneous integration, etc. Moreover, 3D technology can signi cantly improve the network communication and effectively avoid long wirings, and therefore, provide higher system performance and energy efficiency. With the dynamic nature of on-chip communication in large scale NoC based systems, run-time system optimization is of crucial importance in order to achieve higher system reliability and essentially energy efficiency. In this thesis, we propose an agent based system design approach where agents are on-chip components which monitor and control system parameters such as supply voltage, operating frequency, etc. With this approach, we have analysed the implementation alternatives for dynamic voltage and frequency scaling and power gating techniques at different granularity, which reduce both dynamic and leakage energy consumption. Topologies, being one of the key factors for NoCs, are also explored for energy saving purpose. A Honeycomb NoC architecture is proposed in this thesis with turn-model based deadlock-free routing algorithms. Our analysis and simulation based evaluation show that Honeycomb NoCs outperform their Mesh based counterparts in terms of network cost, system performance as well as energy efficiency.
Resumo:
Variations in different types of genomes have been found to be responsible for a large degree of physical diversity such as appearance and susceptibility to disease. Identification of genomic variations is difficult and can be facilitated through computational analysis of DNA sequences. Newly available technologies are able to sequence billions of DNA base pairs relatively quickly. These sequences can be used to identify variations within their specific genome but must be mapped to a reference sequence first. In order to align these sequences to a reference sequence, we require mapping algorithms that make use of approximate string matching and string indexing methods. To date, few mapping algorithms have been tailored to handle the massive amounts of output generated by newly available sequencing technologies. In otrder to handle this large amount of data, we modified the popular mapping software BWA to run in parallel using OpenMPI. Parallel BWA matches the efficiency of multithreaded BWA functions while providing efficient parallelism for BWA functions that do not currently support multithreading. Parallel BWA shows significant wall time speedup in comparison to multithreaded BWA on high-performance computing clusters, and will thus facilitate the analysis of genome sequencing data.
Resumo:
Systems which employ underwater acoustic energy for observation or communication are called sonar systems. The active and passive sonars are the two types of systems used for the detection and localisation of targets in underwater. Active sonar involves the transmission of an acoustic signal which, when reflected from a target, provides the sonar receiver with a basis for the detection and estimation. Passive sonar bases its detection and estimation on sounds which emanate from the target itself--Machinery noise, flow noise, transmission from its own active sonar etc.Electroacoustic transducers are used in sonar systems for the transmission and detection of acoustic energy. The transducer which is used for the transmission of acoustic energy is called projector and the one used for reception is called hydrophone. Since a single transducer is not sufficient enough for long range and directional transmission, a properly distributed array of transducers are to be used [9-11].The need and requirement for spatial processing to generate the most favourable directivity patterns for transducer systems used in underwater applications have already been analysed by several investigators [12-21].The desired directivity pattern can be either generated by the use of suitable focussing techniques or by an array of non-directional sensor elements, whose arrangements, spacing and the mode of excitation provide the required radiation pattern or by the combination of these.While computing that the directivity pattern, it is assumed strength of the elements are unaffected by the the source acoustic pressure at each source. However, in closely packed a r r a y s , the acoustic interaction effects experienced among the elements will modify the behaviour of individual elements and in turn will reduce the acoust ic source leve 1 wi t h respect to the maximum t heoret i cal va 1ue a s well as degrade the beam pa t tern. Th i s ef fect shou 1d be reduced in systems that are intended to generate high acoustic power output and unperturbed beam patterns [2,22-31].The work herein presented includes an approach for designing efficient and well behaved underwater transd~cer arrays, taking into account the acoustic interaction effect experienced among the closely packed multielement arrays.Architectural modifications reducing the interaction effect different radiating apertures.
Resumo:
The proliferation of wireless sensor networks in a large spectrum of applications had been spurered by the rapid advances in MEMS(micro-electro mechanical systems )based sensor technology coupled with low power,Low cost digital signal processors and radio frequency circuits.A sensor network is composed of thousands of low cost and portable devices bearing large sensing computing and wireless communication capabilities. This large collection of tiny sensors can form a robust data computing and communication distributed system for automated information gathering and distributed sensing.The main attractive feature is that such a sensor network can be deployed in remote areas.Since the sensor node is battery powered,all the sensor nodes should collaborate together to form a fault tolerant network so as toprovide an efficient utilization of precious network resources like wireless channel,memory and battery capacity.The most crucial constraint is the energy consumption which has become the prime challenge for the design of long lived sensor nodes.
Resumo:
Decimal multiplication is an integral part of financial, commercial, and internet-based computations. This paper presents a novel double digit decimal multiplication (DDDM) technique that offers low latency and high throughput. This design performs two digit multiplications simultaneously in one clock cycle. Double digit fixed point decimal multipliers for 7digit, 16 digit and 34 digit are simulated using Leonardo Spectrum from Mentor Graphics Corporation using ASIC Library. The paper also presents area and delay comparisons for these fixed point multipliers on Xilinx, Altera, Actel and Quick logic FPGAs. This multiplier design can be extended to support decimal floating point multiplication for IEEE 754- 2008 standard.
Resumo:
General-purpose computing devices allow us to (1) customize computation after fabrication and (2) conserve area by reusing expensive active circuitry for different functions in time. We define RP-space, a restricted domain of the general-purpose architectural space focussed on reconfigurable computing architectures. Two dominant features differentiate reconfigurable from special-purpose architectures and account for most of the area overhead associated with RP devices: (1) instructions which tell the device how to behave, and (2) flexible interconnect which supports task dependent dataflow between operations. We can characterize RP-space by the allocation and structure of these resources and compare the efficiencies of architectural points across broad application characteristics. Conventional FPGAs fall at one extreme end of this space and their efficiency ranges over two orders of magnitude across the space of application characteristics. Understanding RP-space and its consequences allows us to pick the best architecture for a task and to search for more robust design points in the space. Our DPGA, a fine- grained computing device which adds small, on-chip instruction memories to FPGAs is one such design point. For typical logic applications and finite- state machines, a DPGA can implement tasks in one-third the area of a traditional FPGA. TSFPGA, a variant of the DPGA which focuses on heavily time-switched interconnect, achieves circuit densities close to the DPGA, while reducing typical physical mapping times from hours to seconds. Rigid, fabrication-time organization of instruction resources significantly narrows the range of efficiency for conventional architectures. To avoid this performance brittleness, we developed MATRIX, the first architecture to defer the binding of instruction resources until run-time, allowing the application to organize resources according to its needs. Our focus MATRIX design point is based on an array of 8-bit ALU and register-file building blocks interconnected via a byte-wide network. With today's silicon, a single chip MATRIX array can deliver over 10 Gop/s (8-bit ops). On sample image processing tasks, we show that MATRIX yields 10-20x the computational density of conventional processors. Understanding the cost structure of RP-space helps us identify these intermediate architectural points and may provide useful insight more broadly in guiding our continual search for robust and efficient general-purpose computing structures.
Resumo:
The Java language first came to public attention in 1995. Within a year, it was being speculated that Java may be a good language for parallel and distributed computing. Its core features, including being objected oriented and platform independence, as well as having built-in network support and threads, has encouraged this view. Today, Java is being used in almost every type of computer-based system, ranging from sensor networks to high performance computing platforms, and from enterprise applications through to complex research-based.simulations. In this paper the key features that make Java a good language for parallel and distributed computing are first discussed. Two Java-based middleware systems, namely MPJ Express, an MPI-like Java messaging system, and Tycho, a wide-area asynchronous messaging framework with an integrated virtual registry are then discussed. The paper concludes by highlighting the advantages of using Java as middleware to support distributed applications.
Resumo:
This paper describes a region-based algorithm for deriving a concise description of a first order optical flow field. The algorithm described achieves performance improvements over existing algorithms without compromising the accuracy of the flow field values calculated. These improvements are brought about by not computing the entire flow field between two consecutive images, but by considering only the flow vectors of a selected subset of the images. The algorithm is presented in the context of a project to balance a bipedal robot using visual information.