932 results for inertial transformations
Abstract:
Each new generation of GPUs vastly increases the resources available to GPGPU programs. GPU programming models (like CUDA) were designed to scale to use these resources. However, we find that CUDA programs do not actually scale to utilize all available resources, with over 30% of resources going unused on average for the programs of the Parboil2 suite that we used in our work. Current GPUs therefore allow concurrent execution of kernels to improve utilization. In this work, we study concurrent execution of GPU kernels using multiprogram workloads on current NVIDIA Fermi GPUs. On two-program workloads from the Parboil2 benchmark suite, we find that concurrent execution is often no better than serialized execution. We identify the lack of control over resource allocation to kernels as a major serialization bottleneck. We propose transformations that convert CUDA kernels into elastic kernels, which permit fine-grained control over their resource usage. We then propose several elastic-kernel-aware concurrency policies that offer significantly better performance and concurrency than the current CUDA policy. We evaluate our proposals on real hardware using multiprogrammed workloads constructed from benchmarks in the Parboil2 suite. On average, our proposals increase system throughput (STP) by 1.21x and improve the average normalized turnaround time (ANTT) by 3.73x for two-program workloads when compared to the current CUDA concurrency implementation.
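The abstract reports STP and ANTT without defining them. The sketch below uses the definitions commonly adopted for multiprogram workloads (STP as the sum of per-program normalized progress, ANTT as the mean per-program slowdown); whether the paper uses exactly these forms is an assumption, and the function names and run times are hypothetical.

```python
from typing import Sequence


def system_throughput(alone: Sequence[float], shared: Sequence[float]) -> float:
    """STP: sum over programs of (time alone / time when co-run). Higher is better."""
    return sum(a / s for a, s in zip(alone, shared))


def avg_normalized_turnaround(alone: Sequence[float], shared: Sequence[float]) -> float:
    """ANTT: mean over programs of (time when co-run / time alone). Lower is better."""
    return sum(s / a for a, s in zip(alone, shared)) / len(alone)


# Hypothetical run times (seconds) for a two-program workload.
alone = [10.0, 20.0]    # each kernel running by itself
shared = [20.0, 25.0]   # each kernel when run concurrently with the other

stp = system_throughput(alone, shared)
antt = avg_normalized_turnaround(alone, shared)
print(f"STP = {stp:.3f}, ANTT = {antt:.3f}")
```

Under these definitions, an STP above 1 for a two-program workload means the pair makes more combined progress than a single program running alone would.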
Abstract:
Porous fungus-like ZnO nanostructures have been synthesized by simple thermal annealing of the hydrothermally synthesized sheet-like ZnS(en)0.5 complex precursor in air at 600 °C. Structural and morphological changes occurring during the ZnS(en)0.5 → ZnS → ZnO transformations have been observed closely by annealing the as-synthesized precursor at 100-600 °C. Wurtzite ZnS nanosheets and ZnS-ZnO composites are obtained at temperatures of 400 °C and 500 °C, respectively. Thermal decomposition and oxidation of the ZnS(en)0.5 nanosheets have been confirmed by differential scanning calorimetry and thermogravimetric analysis. The visible-light-driven photocatalytic degradation of methylene blue dye has been demonstrated in the synthesized samples. The ZnS-ZnO composite shows the highest dye degradation efficiency, 74%, due to the formation of a surface complex as well as higher visible light absorption resulting from band-gap narrowing. The porous ZnO nanostructures show efficient visible photoluminescence (PL) emission with a colour coordinate of (0.29, 0.35), which is close to that of white light (0.33, 0.33). The efficient visible PL emission and the visible-light-driven photocatalytic activity of the materials synthesized in the present work might make them very attractive for applications in future optoelectronic devices, including white-light-emitting devices.
Abstract:
In this report, we investigate the problem of applying a range constraint in order to reduce the systematic heading drift in a foot-mounted inertial navigation system (INS) used for motion tracking. We make use of two foot-mounted INS, one on each foot, which are aided by zero-velocity detectors. A novel algorithm is proposed to reduce the systematic heading drift. The proposed algorithm is based on the idea that the separation between the two feet at any given instant must always lie within a sphere whose radius equals the maximum possible spatial separation between the two feet. A Kalman filter, getting one measurement update and two observation updates, is used in this algorithm.
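The geometric idea behind the range constraint can be illustrated with a plain projection step. This is not the Kalman-filter formulation the abstract describes: it is a simplified sketch that assumes both position estimates are equally uncertain and pulls them together symmetrically whenever their separation exceeds the allowed radius. The function name and the numbers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def apply_range_constraint(
    p_left: Sequence[float], p_right: Sequence[float], max_sep: float
) -> Tuple[List[float], List[float]]:
    """Project two foot positions so that their separation is at most max_sep.

    Assumption: both estimates are equally trusted, so each foot absorbs
    half of the excess distance along the line joining them.
    """
    diff = [r - l for l, r in zip(p_left, p_right)]
    dist = math.sqrt(sum(c * c for c in diff))
    if dist <= max_sep:
        # Constraint already satisfied: leave the estimates untouched.
        return list(p_left), list(p_right)
    unit = [c / dist for c in diff]
    half_excess = (dist - max_sep) / 2.0
    new_left = [l + half_excess * u for l, u in zip(p_left, unit)]
    new_right = [r - half_excess * u for r, u in zip(p_right, unit)]
    return new_left, new_right


# Hypothetical drifted estimates 2 m apart, with a 1 m maximum foot separation.
left, right = apply_range_constraint([0.0, 0.0, 0.0], [2.0, 0.0, 0.0], 1.0)
print(left, right)
```

A filter-based version would weight the correction by each foot's estimated covariance instead of splitting it evenly.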
Abstract:
Cu2SnS3 films have been processed by the sol-gel route. A Differential Scanning Calorimetry (DSC) study was done to observe the phase transformations and to ascertain the deposition temperature. X-ray diffraction (XRD) confirms the phase formation of Cu2SnS3. The texture coefficient analysis shows the preferential orientation of the (112) facet. Scanning electron microscopy reveals the morphology of the film. Energy Dispersive Spectroscopy (EDS) was used for compositional studies. The Raman spectrum shows the peaks corresponding to the tetragonal phase of Cu2SnS3.
Abstract:
Experimental and simulation studies have uncovered at least two anomalous concentration regimes in the water-dimethyl sulfoxide (DMSO) binary mixture whose precise origin has remained a subject of debate. In order to facilitate time-domain experimental investigation of the dynamics of such binary mixtures, we explore the strength or extent of influence of these anomalies in dipolar solvation dynamics by carrying out long molecular dynamics simulations over a wide range of DMSO concentrations. The solvation time correlation function so calculated indeed displays strong composition-dependent anomalies, reflected in pronounced non-exponential kinetics and a non-monotonic composition dependence of the average solvation time constant. In particular, we find a remarkable slow-down in the solvation dynamics around 10%-20% and 35%-50% mole percentage. We investigate the microscopic origin of these two anomalies. The population distribution analyses of different structural morphologies elucidate that these two slow-downs are reflections of intriguing structural transformations in the water-DMSO mixture. The structural transformations themselves can be explained in terms of a change in the relative coordination number of DMSO and water molecules, from 1DMSO:2H2O to 1H2O:1DMSO and 1H2O:2DMSO complex formation. Thus, while the emergence of the first slow-down (at 15% DMSO mole percentage) is due to percolation among DMSO molecules supported by the water molecules (whose percolating network remains largely unaffected), the second anomaly (centered on 40%-50%) is due to the formation of a network structure in which the 1DMSO:1H2O and 2DMSO:1H2O units dominate, giving rise to rich dynamical features. Through an analysis of partial solvation dynamics, an interesting negative cross-correlation between water and DMSO is observed that makes an important contribution to relaxation at intermediate to longer times.
Abstract:
A droplet residing on a vibrating surface and in the pressure antinode of an asymmetric standing wave can spread radially outward and atomize. In this work, proper orthogonal decomposition through high speed imaging is shown to predict the likelihood of atomization for various viscous fluids based on prior information in the droplet spreading phase. Capillary instabilities are seen to affect ligament rupture. Viscous dissipation plays an important role in determining the wavelength of the most unstable mode during the inception phase of the ligaments. However, the highest ligament capillary number achieved was less than 1, and the influence of viscosity in the ligament growth and breakup phases is quite minimal. It is inferred from the data that the growth of a typical ligament is governed by a balance between the inertial force obtained from the inception phase and capillary forces. By including the effect of acoustic pressure field around the droplet, the dynamics of the ligament growth phase is revealed and the ligament growth profiles for different fluids are shown to collapse on a straight line using a new characteristic time scale.
Abstract:
Fröhlich, Morchio and Strocchi long ago proved that Lorentz invariance is spontaneously broken in QED because of infrared effects. We develop a simple model where the consequences of this breakdown can be explicitly and easily calculated. For this purpose, the superselected U(1) charge group of QED is extended to a superselected "Sky" group containing direction-dependent gauge transformations at infinity. It is the analog of the Spi group of gravity. As Lorentz transformations do not commute with Sky, they are spontaneously broken. These Abelian considerations and model are extended to non-Abelian gauge symmetries. Basic issues regarding the observability of twisted non-Abelian gauge symmetries and of the asymptotic ADM symmetries of quantum gravity are raised.
Abstract:
Software transactional memory (STM) is a promising programming paradigm for shared-memory multithreaded programs. While STM offers the promise of being less error-prone and more programmer-friendly than traditional lock-based synchronization, it also needs to be competitive in performance in order to be adopted in mainstream software. A major source of performance overhead in STM is transactional aborts. Conflict resolution and aborting typically happen at the transaction level, which has the advantage of being automatic and application-agnostic. However, it has a substantial disadvantage: STM declares the entire transaction as conflicting and hence aborts and fully re-executes it, instead of partially re-executing only those parts of the transaction that have been affected by the conflict. This "Re-execute Everything" approach has a significant adverse impact on STM performance. In order to mitigate the abort overheads, we propose a compiler-aided Selective Reconciliation STM (SR-STM) scheme, wherein certain transactional conflicts can be reconciled by performing partial re-execution of the transaction. Ours is a selective hybrid approach that uses compiler analysis to identify those data accesses which are legal and profitable candidates for reconciliation and applies partial re-execution only to these candidates, while other conflicting data accesses are handled by the default STM approach of abort and full re-execution. We describe the compiler analysis and code transformations required for supporting selective reconciliation. We find that SR-STM is effective in reducing the transactional abort overheads, improving the performance of a set of five STAMP benchmarks by 12.58% on average and by up to 22.34%.
Abstract:
Rapid advancements in multi-core processor architectures coupled with low-cost, low-latency, high-bandwidth interconnects have made clusters of multi-core machines a common computing resource. Unfortunately, writing good parallel programs that efficiently utilize all the resources in such a cluster is still a major challenge. Various programming languages have been proposed as a solution to this problem, but are yet to be adopted widely to run performance-critical code mainly due to the relatively immature software framework and the effort involved in re-writing existing code in the new language. In this paper, we motivate and describe our initial study in exploring CUDA as a programming language for a cluster of multi-cores. We develop CUDA-For-Clusters (CFC), a framework that transparently orchestrates execution of CUDA kernels on a cluster of multi-core machines. The well-structured nature of a CUDA kernel, the growing popularity, support and stability of the CUDA software stack collectively make CUDA a good candidate to be considered as a programming language for a cluster. CFC uses a mixture of source-to-source compiler transformations, a work distribution runtime and a light-weight software distributed shared memory to manage parallel executions. Initial results on running several standard CUDA benchmark programs achieve impressive speedups of up to 7.5X on a cluster with 8 nodes, thereby opening up an interesting direction of research for further investigation.
Abstract:
In this paper we present an approach to build a prototype model of a first-responder localization system intended for disaster relief operations. This system is useful for monitoring and tracking the positions of first responders in an indoor environment, where GPS is not available. Each member of the first-responder team is equipped with two zero-velocity-update-aided inertial navigation systems, one on each foot, a camera mounted on a helmet, and a processing platform strapped around the waist, which fuses the data from the different sensors. The fusion algorithm runs in real time on the processing platform. The video is also processed using the DSP core of the computing machine. The processed data, consisting of position, velocity, and heading information along with video streams, is transmitted to the command and control system via a local infrastructure WiFi network. A centralized cooperative localization algorithm, utilizing the information from Ultra Wideband based inter-agent ranging devices combined with the position estimates and uncertainties of each first responder, has also been implemented.
Abstract:
The basic requirement for an autopilot is fast response and minimum steady-state error for better guidance performance. The highly nonlinear nature of the missile dynamics, due to the severe kinematic and inertial coupling of the missile airframe as well as the aerodynamics, has been a challenge for an autopilot that is required to perform satisfactorily in all flight conditions in probable engagements. Dynamic inversion is a very popular nonlinear controller for this kind of scenario, but its drawback is that it is sensitive to parameter perturbation. To overcome this problem, a neural network has been used to capture the parameter uncertainty online. The choice of basis function plays the major role in capturing the unknown dynamics. In this paper, many basis functions have been studied for approximation of the unknown dynamics. The cosine basis function has yielded the best response compared to any other basis function for capturing the unknown dynamics. A neural network with the cosine basis function has improved the autopilot performance as well as robustness compared to dynamic inversion without a neural network.
Abstract:
Hit-to-kill interception of a high-velocity spiraling target requires accurate state estimation of the relative kinematic parameters describing the spiraling motion. In this paper, spiraling target motion is captured by representing the target acceleration through a sinusoidal function in the inertial frame. A nine-state unscented Kalman filter (UKF) formulation is presented here with three relative positions, three relative velocities, the spiraling frequency of the target, the inverse of the ballistic coefficient, and the maneuvering coefficient. A key advantage of the target model presented here is that it is generic and can capture spiraling as well as pure ballistic motions without any change of tuning parameters. Extensive six-DOF simulation experiments, which include a modified PN guidance and a dynamic-inversion-based autopilot, show that near hit-to-kill performance can be obtained with noisy RF seeker measurements of gimbal angles, gimbal angle rates, range, and range rate.
Abstract:
Electrophilic halogen-induced reactions of unactivated olefins are an important class of transformations, whose catalytic enantioselective variants have surfaced during the past few years as effective means of olefin heterodifunctionalization. This article covers important developments in the area of enantioselective halocyclizations, specifically in the context of the synthesis of nitrogenous heterocycles.
Abstract:
Various leg exercises have been recommended to prevent deep vein thrombosis (DVT), a condition where a blood clot forms in the deep veins, especially during long-haul flights. Assessing the benefit of each of these exercises in avoiding DVT, which can be fatal, is important for suggesting the correct and most beneficial exercises. The present work aims at demonstrating a fiber Bragg grating (FBG)-based sensing methodology for measuring surface strains generated on the skin of the calf muscle, in order to evaluate the exercises suggested by airlines to avoid DVT. As the dataset in the experiment involves multiple subjects performing these exercises, an inertial measurement unit has been used to validate the repeatability of each of the exercises. The surface strain on the calf muscle obtained using the FBG sensor, which is a measure of the calf muscle deformation, has been compared against the variation of blood velocity in the femoral vein of the thigh measured using a commercial electronic phased-array color Doppler ultrasound system. Apart from analyzing the effectiveness of the suggested exercises, a new exercise that is more effective in terms of the strain generated to avoid DVT is proposed and evaluated. (C) 2013 Society of Photo-Optical Instrumentation Engineers (SPIE)
Abstract:
Multi-GPU machines are being increasingly used in high-performance computing. Each GPU in such a machine has its own memory and does not share the address space either with the host CPU or with other GPUs. Hence, applications utilizing multiple GPUs have to manually allocate and manage data on each GPU. Existing works that propose to automate data allocations for GPUs have limitations and inefficiencies in terms of allocation sizes, exploiting reuse, transfer costs, and scalability. We propose a scalable and fully automatic data allocation and buffer management scheme for affine loop nests on multi-GPU machines. We call it the Bounding-Box-based Memory Manager (BBMM). At runtime, BBMM can perform standard set operations such as union, intersection, and difference, and find subset and superset relations, on hyperrectangular regions of array data (bounding boxes). It uses these operations along with some compiler assistance to identify, allocate, and manage data required by applications in terms of disjoint bounding boxes. This allows it to (1) allocate exactly or nearly as much data as is required by computations running on each GPU, (2) efficiently track buffer allocations and hence maximize data reuse across tiles and minimize data transfer overhead, and (3) as a result, maximize utilization of the combined memory on multi-GPU machines. BBMM can work with any choice of parallelizing transformations, computation placement, and scheduling schemes, whether static or dynamic. Experiments run on a four-GPU machine with various scientific programs showed that BBMM reduces data allocations on each GPU by up to 75% compared to current allocation schemes, yields performance of at least 88% of manually written code, and allows excellent weak scaling.
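The kind of set operations on bounding boxes mentioned above can be sketched for axis-aligned hyperrectangles as follows. This is only an illustration under stated assumptions, not BBMM's implementation: the `BoundingBox` class, its method names, and the inclusive-corner convention are hypothetical, and the union is approximated here by the enclosing box (hull) rather than by a set of disjoint boxes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned hyperrectangle over array indices (corners are inclusive)."""
    lo: Tuple[int, ...]
    hi: Tuple[int, ...]

    def intersect(self, other: "BoundingBox") -> Optional["BoundingBox"]:
        """Return the overlapping region, or None if the boxes are disjoint."""
        lo = tuple(max(a, b) for a, b in zip(self.lo, other.lo))
        hi = tuple(min(a, b) for a, b in zip(self.hi, other.hi))
        if any(l > h for l, h in zip(lo, hi)):
            return None
        return BoundingBox(lo, hi)

    def is_subset_of(self, other: "BoundingBox") -> bool:
        """True if every index in this box also lies inside the other box."""
        return all(ol <= sl for sl, ol in zip(self.lo, other.lo)) and all(
            sh <= oh for sh, oh in zip(self.hi, other.hi)
        )

    def hull(self, other: "BoundingBox") -> "BoundingBox":
        """Smallest single box enclosing both (an over-approximation of the union)."""
        lo = tuple(min(a, b) for a, b in zip(self.lo, other.lo))
        hi = tuple(max(a, b) for a, b in zip(self.hi, other.hi))
        return BoundingBox(lo, hi)


# Two hypothetical 2-D tiles' data footprints that partly overlap.
a = BoundingBox((0, 0), (7, 7))
b = BoundingBox((4, 4), (11, 11))

overlap = a.intersect(b)   # region both tiles touch, a candidate for reuse
enclosing = a.hull(b)      # one allocation that would cover both footprints
print(overlap, enclosing, overlap.is_subset_of(a))
```

Tracking the overlap separately from the enclosing box is what makes it possible to reason about reuse: data in the overlap needs to be transferred only once if both tiles run on the same GPU.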