910 results for Run-Time
Abstract:
QR decomposition (QRD) is a widely used Numerical Linear Algebra (NLA) kernel with applications ranging from SONAR beamforming to wireless MIMO receivers. In this paper, we propose a novel Givens Rotation (GR) based QRD (GR QRD) in which we reduce the computational complexity of GR and exploit a higher degree of parallelism. This low-complexity Column-wise GR (CGR) can annihilate multiple elements of a column of a matrix simultaneously. The algorithm is first realized on a Two-Dimensional (2-D) systolic array and then implemented on REDEFINE, which is a Coarse Grained run-time Reconfigurable Architecture (CGRA). We benchmark the proposed implementation against state-of-the-art implementations and report better throughput, convergence and scalability.
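For orientation, the classical Givens-rotation QR that the abstract takes as its starting point can be sketched as follows. This is the textbook scheme that annihilates one element per rotation, not the column-wise CGR variant proposed in the paper; the function name `givens_qr` and the list-of-lists matrix layout are assumptions made for illustration.

```python
import math


def givens_qr(a):
    """Return (q, r) with a = q * r, annihilating one entry per Givens rotation."""
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        # Work upward from the bottom of column j, zeroing r[i][j] against r[i-1][j].
        for i in range(m - 1, j, -1):
            x, y = r[i - 1][j], r[i][j]
            h = math.hypot(x, y)
            if h == 0.0:
                continue  # both entries are already zero
            c, s = x / h, y / h
            # Apply the rotation to rows i-1 and i of r.
            for k in range(n):
                u, v = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * u + s * v
                r[i][k] = -s * u + c * v
            # Accumulate the transposed rotation into columns i-1 and i of q.
            for k in range(m):
                u, v = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * u + s * v
                q[k][i] = -s * u + c * v
            r[i][j] = 0.0  # remove round-off in the annihilated entry
    return q, r
```

Each inner iteration touches only two rows, which is why rotations on disjoint row pairs can run in parallel on a systolic array; the sketch above executes them sequentially.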
Abstract:
In this paper, we present Bi-Modal Cache - a flexible stacked DRAM cache organization which simultaneously achieves several objectives: (i) improved cache hit ratio, (ii) moving the tag storage overhead to DRAM, (iii) lower cache hit latency than tags-in-SRAM, and (iv) reduction in off-chip bandwidth wastage. The Bi-Modal Cache addresses the miss rate versus off-chip bandwidth dilemma by organizing the data in a bi-modal fashion - blocks with high spatial locality are organized as large blocks and those with little spatial locality as small blocks. By adaptively selecting the right granularity of storage for individual blocks at run-time, the proposed DRAM cache organization is able to make judicious use of the available DRAM cache capacity as well as reduce the off-chip memory bandwidth consumption. The Bi-Modal Cache improves cache hit latency despite moving the metadata to DRAM by means of a small SRAM-based Way Locator. Further, by leveraging the tremendous internal bandwidth and capacity that stacked DRAM organizations provide, the Bi-Modal Cache enables efficient concurrent accesses to tags and data to reduce hit time. Through detailed simulations, we demonstrate that the Bi-Modal Cache achieves overall performance improvement (in terms of Average Normalized Turnaround Time (ANTT)) of 10.8%, 13.8% and 14.0% in 4-core, 8-core and 16-core workloads respectively.
Abstract:
It was demonstrated in earlier work that, by approximating its range kernel using shiftable functions, the nonlinear bilateral filter can be computed using a series of fast convolutions. Previous approaches based on shiftable approximation have, however, been restricted to Gaussian range kernels. In this work, we propose a novel approximation that can be applied to any range kernel, provided it has a pointwise-convergent Fourier series. More specifically, we propose to approximate the Gaussian range kernel of the bilateral filter using a Fourier basis, where the coefficients of the basis are obtained by solving a series of least-squares problems. The coefficients can be efficiently computed using a recursive form of the QR decomposition. By controlling the cardinality of the Fourier basis, we can obtain a good tradeoff between the run-time and the filtering accuracy. In particular, we are able to guarantee subpixel accuracy for the overall filtering, which is not provided by most existing methods for fast bilateral filtering. We present simulation results to demonstrate the speed and accuracy of the proposed algorithm.
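The tradeoff between basis size and accuracy can be illustrated with a small least-squares fit of a Gaussian range kernel by cosines. This is only a sketch under stated assumptions: the intensity range `T = 255`, the cosine-only basis, and the names `fit_cosines` and `solve` are illustrative choices, and the coefficients are found by solving the normal equations directly rather than by the recursive QR decomposition mentioned in the abstract.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[piv] = m[piv], m[col]
        for row in range(col + 1, n):
            f = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= f * m[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (m[row][n] - s) / m[row][row]
    return x


def fit_cosines(sigma, T, K):
    """Least-squares fit of exp(-t^2 / (2 sigma^2)) on t = -T..T by K cosines.

    Returns (coefficients, maximum absolute error on the sample grid).
    """
    ts = range(-T, T + 1)
    target = [math.exp(-t * t / (2.0 * sigma * sigma)) for t in ts]
    basis = [[math.cos(k * math.pi * t / T) for t in ts] for k in range(K)]
    # Normal equations: (B^T B) c = B^T g.
    gram = [[sum(p * q for p, q in zip(bi, bj)) for bj in basis] for bi in basis]
    rhs = [sum(p * q for p, q in zip(bi, target)) for bi in basis]
    coeffs = solve(gram, rhs)
    approx = [sum(c * bk[i] for c, bk in zip(coeffs, basis))
              for i in range(len(target))]
    err = max(abs(u - v) for u, v in zip(target, approx))
    return coeffs, err
```

Calling `fit_cosines(40.0, 255, 4)` and `fit_cosines(40.0, 255, 12)` and comparing the returned errors shows the effect of the basis cardinality: more terms cost more convolutions in the filter but reduce the kernel approximation error.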
Abstract:
Transient test facilities offer the potential for the simultaneous study of turbine aerodynamic performance, unsteady flow phenomena and the heat transfer characteristics of a turbine stage. This paper describes the development of aerodynamic performance measurement techniques in the Oxford Rotor Facility (ORF). The solutions to the technological issues involved with transient testing presented in this paper are expected to achieve levels of precision uncertainty comparable with traditional steady flow test rigs. The theoretical background to the measurement of aerodynamic performance is presented together with a comprehensive pre-test uncertainty analysis. The instrumentation scheme for the measurement of stage mass flow rate is discussed in detail; the measurements of shaft power, total inlet enthalpy, and stage pressure ratio are also outlined. The current working section features a 62% scale, 1-1/2 stage, high-pressure shroudless transonic turbine. The required inlet flow conditions are provided by an Isentropic Light Piston Tunnel (ILPT) with a quasi-steady state run time of approximately 70 ms. The testing is conducted at engine representative specific speed, pressure ratio, gas-to-wall temperature ratio, Mach number and Reynolds number.
Abstract:
A new three-dimensional Navier-Stokes solver for flows in turbomachines has been developed. The new solver is based on the latest version of the Denton codes, but has been implemented to run on Graphics Processing Units (GPUs) instead of the traditional Central Processing Unit (CPU). The change in processor enables an order-of-magnitude reduction in run-time due to the higher performance of the GPU. Scaling results for a 16-node GPU cluster are also presented, showing almost linear scaling for typical turbomachinery cases. For validation purposes, a test case consisting of a three-stage turbine with complete hub and casing leakage paths is described. Good agreement is obtained with previously published experimental results. The simulation runs in less than 10 minutes on a cluster with four GPUs. Copyright © 2009 by ASME.
Abstract:
A type checking method for the functional language LFC is presented. A distinct feature of LFC is that it uses Context-Free (CF) languages as data types to represent compound data structures. This makes LFC a dynamically typed language. To improve efficiency, a practical type checking method is presented, which consists of both static and dynamic type checking. Although the inclusion relation of CF languages is not decidable, a special subset of the relation is decidable, i.e., the sentential form relation, which can be statically checked. Moreover, most of the expressions in actual LFC programs appear to satisfy this relation according to experimental statistics. So, although the static type checking is not complete, it undertakes most of the type checking task. Consequently, the run-time efficiency is effectively improved. Another feature of the type checking is that it converts expressions with implicit structures to a structured representation. A structure reconstruction technique is presented.
Abstract:
We consider the problem of detecting a large number of different classes of objects in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, at multiple locations and scales. This can be slow and can require a lot of training data, since each classifier requires the computation of many different image features. In particular, for independently trained detectors, the (run-time) computational complexity and the (training-time) sample complexity scale linearly with the number of classes to be detected. It seems unlikely that such an approach will scale up to allow recognition of hundreds or thousands of objects. We present a multi-class boosting procedure (joint boosting) that reduces the computational and sample complexity by finding common features that can be shared across the classes (and/or views). The detectors for each class are trained jointly, rather than independently. For a given performance level, the total number of features required, and therefore the computational cost, is observed to scale approximately logarithmically with the number of classes. The jointly selected features are closer to edges and generic features typical of many natural structures than to specific object parts. Those generic features generalize better and considerably reduce the computational cost of an algorithm for multi-class object detection.
Abstract:
We have developed a compiler for the lexically-scoped dialect of LISP known as SCHEME. The compiler knows relatively little about specific data manipulation primitives such as arithmetic operators, but concentrates on general issues of environment and control. Rather than having specialized knowledge about a large variety of control and environment constructs, the compiler handles only a small basis set which reflects the semantics of lambda-calculus. All of the traditional imperative constructs, such as sequencing, assignment, looping, GOTO, as well as many standard LISP constructs such as AND, OR, and COND, are expressed in macros in terms of the applicative basis set. A small number of optimization techniques, coupled with the treatment of function calls as GOTO statements, serve to produce code as good as that produced by more traditional compilers. The macro approach enables speedy implementation of new constructs as desired without sacrificing efficiency in the generated code. A fair amount of analysis is devoted to determining whether environments may be stack-allocated or must be heap-allocated. Heap-allocated environments are necessary in general because SCHEME (unlike Algol 60 and Algol 68, for example) allows procedures with free lexically scoped variables to be returned as the values of other procedures; the Algol stack-allocation environment strategy does not suffice. The methods used here indicate that a heap-allocating generalization of the "display" technique leads to an efficient implementation of such "upward funargs". Moreover, compile-time optimization and analysis can eliminate many "funargs" entirely, and so far fewer environment structures need be allocated at run time than might be expected. A subset of SCHEME (rather than triples, for example) serves as the representation intermediate between the optimized SCHEME code and the final output code; code is expressed in this subset in the so-called continuation-passing style. 
As a subset of SCHEME, it enjoys the same theoretical properties; one could even apply the same optimizer used on the input code to the intermediate code. However, the subset is so chosen that all temporary quantities are made manifest as variables, and no control stack is needed to evaluate it. As a result, this apparently applicative representation admits an imperative interpretation which permits easy transcription to final imperative machine code. These qualities suggest that an applicative language like SCHEME is a better candidate for an UNCOL than the more imperative candidates proposed to date.
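The continuation-passing style mentioned in this abstract can be illustrated with a small sketch. It is written in Python rather than SCHEME and is not taken from the compiler described above; the names `fact_direct` and `fact_cps` are hypothetical. Note that Python does not eliminate tail calls, so unlike the compiler's intermediate code this sketch still consumes interpreter stack.

```python
def fact_direct(n):
    """Direct style: the pending multiplication lives implicitly on the control stack."""
    return 1 if n == 0 else n * fact_direct(n - 1)


def fact_cps(n, k):
    """Continuation-passing style: k receives the result instead of it being returned.

    Every intermediate value is bound to a named variable (v), and every call
    is a tail call, so a compiler can treat each call as a jump.
    """
    if n == 0:
        return k(1)
    return fact_cps(n - 1, lambda v: k(n * v))


# The identity continuation extracts the final value.
result = fact_cps(5, lambda v: v)
```

In the CPS version the work that remains after the recursive call is packaged explicitly in the continuation, which is the property that lets the abstract describe function calls as GOTO statements.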
Abstract:
Generic object-oriented programming languages combine parametric polymorphism and nominal subtype polymorphism, thereby providing better data abstraction, greater code reuse, and fewer run-time errors. However, most generic object-oriented languages provide a straightforward combination of the two kinds of polymorphism, which prevents the expression of advanced type relationships. Furthermore, most generic object-oriented languages have a type-erasure semantics: instantiations of type parameters are not available at run time, and thus may not be used by type-dependent operations. This dissertation shows that two features, which allow the expression of many advanced type relationships, can be added to a generic object-oriented programming language without type erasure: 1. type variables that are not parameters of the class that declares them, and 2. extension that is dependent on the satisfiability of one or more constraints. We refer to the first feature as hidden type variables and the second feature as conditional extension. Hidden type variables allow: covariance and contravariance without variance annotations or special type arguments such as wildcards; a single type to extend, and inherit methods from, infinitely many instantiations of another type; a limited capacity to augment the set of superclasses after that class is defined; and the omission of redundant type arguments. Conditional extension allows the properties of a collection type to be dependent on the properties of its element type. This dissertation describes the semantics and implementation of hidden type variables and conditional extension. A sound type system is presented. In addition, a sound and terminating type checking algorithm is presented. Although designed for the Fortress programming language, hidden type variables and conditional extension can be incorporated into other generic object-oriented languages. 
Many of the same problems would arise, and solutions analogous to those we present would apply.
Abstract:
As the commoditization of sensing, actuation and communication hardware increases, so does the potential for dynamically tasked sense and respond networked systems (i.e., Sensor Networks or SNs) to replace existing disjoint and inflexible special-purpose deployments (closed-circuit security video, anti-theft sensors, etc.). While various solutions have emerged to many individual SN-centric challenges (e.g., power management, communication protocols, role assignment), perhaps the largest remaining obstacle to widespread SN deployment is that those who wish to deploy, utilize, and maintain a programmable Sensor Network lack the programming and systems expertise to do so. The contributions of this thesis center on the design, development and deployment of the SN Workbench (snBench). snBench embodies an accessible, modular programming platform coupled with a flexible and extensible run-time system that, together, support the entire life-cycle of distributed sensory services. As it is impossible to find a one-size-fits-all programming interface, this work advocates the use of tiered layers of abstraction that enable a variety of high-level, domain-specific languages to be compiled to a common (thin-waist) tasking language; this common tasking language is statically verified and can be subsequently re-translated, if needed, for execution on a wide variety of hardware platforms. 
snBench provides: (1) a common sensory tasking language (Instruction Set Architecture) powerful enough to express complex SN services, yet simple enough to be executed by highly constrained resources with soft, real-time constraints, (2) a prototype high-level language (and corresponding compiler) to illustrate the utility of the common tasking language and the tiered programming approach in this domain, (3) an execution environment and a run-time support infrastructure that abstract a collection of heterogeneous resources into a single virtual Sensor Network, tasked via this common tasking language, and (4) novel formal methods (i.e., static analysis techniques) that verify safety properties and infer implicit resource constraints to facilitate resource allocation for new services. This thesis presents these components in detail, as well as two specific case-studies: the use of snBench to integrate physical and wireless network security, and the use of snBench as the foundation for semester-long student projects in a graduate-level Software Engineering course.
Abstract:
The SNBENCH is a general-purpose programming environment and run-time system targeted towards a variety of sensor applications such as environmental sensing, location sensing, video sensing, etc. In its current structure, the run-time engine of the SNBENCH, namely the Sensorium Execution Environment (SXE), processes the entities of execution in a single thread of operation. In order to effectively support applications that are time-sensitive and need priority, it is imperative to process the tasks discretely so that specific policies can be applied at a much more granular level. The goal of this project was to modify the SXE to enable efficient use of system resources by way of multi-tasking the individual components. Additionally, the transformed SXE offers the ability to classify the individual tasks and apply different processing schemes to them.
Abstract:
We present two algorithms for computing distances along a non-convex polyhedral surface. The first algorithm computes exact minimal-geodesic distances and the second algorithm combines these distances to compute exact shortest-path distances along the surface. Both algorithms have been extended to compute the exact minimal-geodesic paths and shortest paths. These algorithms have been implemented and validated on surfaces for which the correct solutions are known, in order to verify the accuracy and to measure the run-time performance, which is cubic or less for each algorithm. The exact-distance computations carried out by these algorithms are feasible for large-scale surfaces containing tens of thousands of vertices, and are a necessary component of near-isometric surface flattening methods that accurately transform curved manifolds into flat representations.
Abstract:
Existing Building/Energy Management Systems (BMS/EMS) fail to convey holistic performance to the building manager. A 20% reduction in energy consumption can be achieved by efficiently operated buildings compared with current practice. However, in the majority of buildings, occupant comfort and energy consumption analysis is primarily restricted by available sensor and meter data. Installation of a continuous monitoring process can significantly improve the building systems’ performance. We present WSN-BMDS, an IP-based wireless sensor network building monitoring and diagnostic system. The main focus of WSN-BMDS is to obtain a much higher degree of information about the building operation than current BMSs are able to provide. Our system integrates a heterogeneous set of wireless sensor nodes with IEEE 802.11 backbone routers and the Global Sensor Network (GSN) web server. Sensing data is stored in a database at the back office via UDP and can be accessed over the Internet using GSN. Through this demonstration, we show that WSN-BMDS provides accurate measurements of air temperature, air humidity, light, and energy consumption for particular rooms in our target building. Our interactive graphical user interface provides a user-friendly environment that shows the live network topology, monitors network statistics, and supports run-time management actions on the network. We also demonstrate actuation by changing the artificial light level in one of the rooms.
Abstract:
The manual effort required to convert sequential computational mechanics programs into a useful, scalable parallel form is considerable. Tools that can assist in the conversion process are clearly required. Computer aided parallelisation tools (CAPTools) have been developed to generate efficient parallel code for real world structured grid application codes such as Computational Fluid Dynamics. Automatable single-program multi-data (SPMD) overlapping domain decomposition (DD) techniques established for structured grid codes have been adapted by the authors to manually parallelise unstructured mesh applications. Inspector loops have been used to provide generic techniques for the run-time support necessary to extend the capabilities of CAPTools to automatic implementation of SPMD DD techniques in the parallelisation of unstructured mesh codes. Copyright © 1999 John Wiley & Sons, Ltd.
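The inspector loop idea referred to in this abstract can be sketched generically: before the main loop runs, an inspector scans the indirection array of the unstructured mesh to discover which off-partition values each partition will need, and the executor then runs the loop on locally available data. The sketch below is not CAPTools output; the function names, the two-partition split, and the `gather` stand-in for message passing are all assumptions made for illustration.

```python
def inspector(local_edges, owned):
    """Scan the indirection data once and list the off-partition nodes it references."""
    owned_set = set(owned)
    return sorted({n for edge in local_edges for n in edge if n not in owned_set})


def gather(global_x, nodes):
    """Stand-in for communication: fetch the values of the requested nodes."""
    return {n: global_x[n] for n in nodes}


def executor(local_edges, values):
    """Run the edge loop using only values held locally (owned plus halo)."""
    return sum(values[i] * values[j] for i, j in local_edges)


# Hypothetical ring mesh of six nodes split into two partitions.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
parts = {0: [0, 1, 2], 1: [3, 4, 5]}

schedules = {}
total = 0.0
for p, owned in parts.items():
    # Each partition processes the edges whose first node it owns.
    local_edges = [e for e in edges if e[0] in owned]
    schedules[p] = inspector(local_edges, owned)  # built once, reusable per time step
    values = gather(x, owned + schedules[p])
    total += executor(local_edges, values)

# Reference result from the unpartitioned loop.
serial = sum(x[i] * x[j] for i, j in edges)
```

Because the mesh connectivity is only known at run time, the schedule cannot be derived by static analysis alone; it can, however, be computed once and reused for as long as the mesh is unchanged.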
Abstract:
Virtual manufacturing and design assessment increasingly involve the simulation of interacting phenomena, i.e., multi-physics, an activity which is very computationally intensive. This chapter describes an attempt to address the parallel issues associated with a multi-physics simulation approach based upon a range of compatible procedures operating on one mesh using a single database - the distinct physics solvers can operate separately or coupled on sub-domains of the whole geometric space. Moreover, the finite volume unstructured mesh solvers use different discretization schemes (and, particularly, different ‘nodal’ locations and control volumes). A two-level approach to the parallelization of this simulation software is described: the code is restructured into parallel form on the basis of the mesh partitioning alone, that is, without regard to the physics. However, at run time, the mesh is partitioned to achieve a load balance, by considering the load per node/element across the whole domain. The latter, of course, is determined by the problem-specific physics at a particular location.