945 results for "execution traces"
Abstract:
Software transactional memory (STM) is a promising programming paradigm for shared-memory multithreaded programs. For STMs to be adopted widely for performance-critical software, understanding and improving the cache performance of applications running on STM becomes increasingly crucial as the performance gap between processor and memory continues to grow. In this paper, we present the most detailed experimental evaluation to date of the cache behavior of STM applications and quantify the impact of the different STM factors on the cache misses experienced by the applications. We find that STMs are not cache friendly, with data cache stall cycles contributing more than 50% of the execution cycles in a majority of the benchmarks. We find that, on average, misses occurring inside the STM account for 62% of the total data cache miss latency cycles experienced by the applications, and that cache performance is adversely impacted by certain inherent characteristics of the STM itself. These observations motivate us to propose a set of compiler transformations targeted at making STMs cache friendly. We find that the STM's fine-grained, application-unaware locking is a major contributor to its poor cache behavior, and hence propose selective Lock-Data Co-location (LDC) and Redundant Lock Access Removal (RLAR) to address lock access misses. We find that even transactions that are completely disjoint-access parallel suffer from costly coherence misses caused by centralized global timestamp updates, and hence propose the Selective Per-Partition Time Stamp (SPTS) transformation to address this. We show that our transformations are effective in improving the cache behavior of STM applications, reducing data cache miss latency by 20.15% to 37.14% and improving execution time by 18.32% to 33.12% in five of the eight STAMP applications.
Abstract:
This paper presents post-shock-heating structural and morphological studies of a chromium film coated on a hypersonic test model as a passive drag reduction element. The structural changes and the phase composition of chromium due to shock heating (2850 K) are characterized using X-ray diffraction. Surface morphology changes of the chromium coating have been studied using scanning electron microscopy (SEM) before and after shock heating. A significant amount of chromium ablation and sublimation from the model surface is evident in the SEM micrographs. Traces of randomly oriented chromium oxides formed along the coated surface confirm the surface reaction of chromium with the oxygen present behind the shock. Large traces of amorphous chromium oxide phases are also observed.
Abstract:
A small quantity of energetic material coated on the inner wall of a polymer tube is proposed as a new method to generate micro-shock waves in the laboratory. These micro-shock waves have been harnessed to develop a novel method of delivering dry particles and liquid jets into a target. We have generated micro-shock waves using a polymer tube coated with a reactive explosive compound [high melting explosive (octahydro-1,3,5,7-tetranitro-1,3,5,7-tetrazocine) and traces of aluminium], utilising 9 J of energy. The detonation process is initiated electrically from one end of the tube, while the micro-shock wave followed by the products of detonation escapes from the open end of the polymer tube. The energy available at the open end of the polymer tube is used to accelerate tungsten micro-particles coated on the other side of a diaphragm or to force a liquid jet out of a small cavity filled with the liquid. The micro-particles deposited on a thin metal diaphragm (typically 100 μm thick) were accelerated to high velocity using micro-shock waves to penetrate the target. Tungsten particles of 0.7 μm diameter have been successfully delivered into agarose gel targets of various strengths (0.6-1.0%). The device has been tested by delivering micro-particles into potato tuber and Arachis hypogaea Linnaeus (ground nut) stem tissue. Along similar lines, liquid jets of diameter 200-250 μm (methylene blue, water and oils) have been successfully delivered into agarose gel targets of various strengths. Successful vaccination against murine salmonellosis was demonstrated as a biological application of this device. The penetration depths achieved in the experimental targets are very encouraging for developing a future device for biological and biomedical applications.
Abstract:
This paper presents a decentralized, peer-to-peer parallel version of the vector evaluated particle swarm optimization (VEPSO) algorithm for multi-objective design optimization of laminated composite plates using the message passing interface (MPI). The design optimization of laminated composite plates, being a combinatorially explosive constrained non-linear optimization problem (CNOP) with many design variables and a vast solution space, warrants the use of non-parametric, heuristic optimization algorithms like PSO. The optimization requires minimizing both the weight and the cost of these composite plates simultaneously, which renders the problem multi-objective; hence VEPSO, a multi-objective variant of the PSO algorithm, is used. Despite the use of such a heuristic, the application problem, being computationally intensive, suffers from long execution times under sequential computation. Hence, a parallel version of the PSO algorithm has been developed to run on several nodes of an IBM P720 cluster. The proposed parallel algorithm, using MPI's collective communication directives, establishes a peer-to-peer relationship between the constituent parallel processes, deviating from the more common master-slave approach, and achieves a reduction of computation time by a factor of up to 10. Finally, we show the effectiveness of the proposed parallel algorithm by comparing it with a serial implementation of VEPSO and a parallel implementation of the vector evaluated genetic algorithm (VEGA) for the same design problem. (c) 2012 Elsevier Ltd. All rights reserved.
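The VEPSO scheme described above (each swarm optimizes one objective but takes its social attractor from the other swarm's best, which is what couples the objectives) can be sketched in a few lines. This is a minimal single-variable illustration with made-up objectives, bounds and PSO constants, not the laminate design problem or the MPI-parallel implementation from the paper:

```python
import random

random.seed(1)
f1 = lambda x: x * x             # stand-in first objective (e.g. weight)
f2 = lambda x: (x - 2.0) ** 2    # stand-in second objective (e.g. cost)

def vepso(iters=60, n=10, w=0.7, c1=1.5, c2=1.5):
    # one swarm per objective; positions drawn from an illustrative range
    swarms = []
    for f in (f1, f2):
        xs = [random.uniform(-5.0, 5.0) for _ in range(n)]
        swarms.append({"f": f, "x": xs[:], "v": [0.0] * n,
                       "pb": xs[:], "gb": min(xs, key=f)})
    for _ in range(iters):
        # key VEPSO step: each swarm's social term uses the OTHER swarm's best
        for s, other in ((swarms[0], swarms[1]), (swarms[1], swarms[0])):
            for i in range(n):
                s["v"][i] = (w * s["v"][i]
                             + c1 * random.random() * (s["pb"][i] - s["x"][i])
                             + c2 * random.random() * (other["gb"] - s["x"][i]))
                s["x"][i] += s["v"][i]
                if s["f"](s["x"][i]) < s["f"](s["pb"][i]):
                    s["pb"][i] = s["x"][i]
            s["gb"] = min(s["pb"], key=s["f"])
    return swarms[0]["gb"], swarms[1]["gb"]

best1, best2 = vepso()
print(round(best1, 2), round(best2, 2))  # compromise points between the optima
```

In the MPI version described in the abstract, the `other["gb"]` lookup would become a collective exchange of swarm bests between peer processes rather than a master collecting and redistributing them.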
Abstract:
Muscle development is a multistep process that includes myoblast diversification, proliferation, migration, fusion, differentiation and growth. A hierarchical expression of myogenic factors is important for the dexterous execution of progressive events in muscle formation. EWG (erect wing) is a transcription factor known to have a role in indirect flight muscle (IFM) development in Drosophila. We marked out the precise spatio-temporal expression profile of EWG in the myoblasts and in the developing muscles. Mutant adult flies null for EWG in myoblasts show a variable number of IFMs, suggesting that EWG is required for patterning of the IFM. The remnant muscle found in the EWG null flies shows proper assembly of the structural proteins, which implies that some myoblasts manage to fuse, develop and differentiate normally, indicating that EWG is not required for the differentiation program per se. However, when EWG expression is extended beyond its normal expression window in a wild-type background, muscle thinning is observed, implying a function for EWG in protein synthesis inhibition. Mis-expression studies in wing disc myoblasts hinted at a role in myoblast proliferation. We thus conclude that EWG is important for regulating fusion events, which in turn decide the IFM pattern. IFMs in EWG null mutants also show clumps containing broken fibres and an altered mitochondrial morphology. The vertebrate homolog of EWG is nuclear respiratory factor 1 (NRF1), which is known to function in mitochondrial biogenesis and protection against oxidative stress. Expression of the inner mitochondrial membrane protein gene Opa1-like was found to be absent in these mutants. These flies were also more sensitive to oxidative stress, indicating compromised mitochondrial functioning. Our results therefore demonstrate that EWG functions in maintaining muscle structural integrity by ensuring proper mitochondrial activity.
Abstract:
Knowledge of a program's worst case execution time (WCET) is essential in validating real-time systems and helps in effective scheduling. One popular approach used in industry is to measure the execution time of program components on the target architecture and combine the measurements using static analysis of the program. Measurements need to be taken in the least intrusive way in order to avoid affecting the accuracy of the estimated WCET. Several programs exhibit phase behavior, wherein the program's dynamic execution is observed to be composed of phases. Each phase, distinct from the others, exhibits homogeneous behavior with respect to cycles per instruction (CPI), data cache misses, etc. In this paper, we show that phase behavior has important implications for timing analysis. We make use of the homogeneity of a phase to reduce instrumentation overhead while ensuring that the accuracy of the WCET is not largely affected. We propose a model for estimating WCET using static worst case instruction counts of individual phases and a function of measured average CPI. We describe a WCET analyzer built on this model which targets two different architectures. The WCET analyzer is observed to give safe estimates for most benchmarks considered in this paper. The tightness of the WCET estimates is observed to improve for most benchmarks compared to Chronos, a well-known static WCET analyzer.
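The phase-based model combines a static per-phase worst-case instruction count with a measured average CPI per phase. A minimal sketch of that combination, with invented phase data and the simplest plausible combining rule (a per-phase product, which may differ in detail from the paper's actual model):

```python
def estimate_wcet(phases):
    # WCET estimate: sum over phases of the static worst-case instruction
    # count times that phase's measured average CPI (simplest plausible rule)
    return sum(p["worst_case_ic"] * p["avg_cpi"] for p in phases)

# hypothetical phase profile for a program with three detected phases
phases = [
    {"name": "init",    "worst_case_ic": 10_000,  "avg_cpi": 1.2},
    {"name": "compute", "worst_case_ic": 500_000, "avg_cpi": 0.9},
    {"name": "output",  "worst_case_ic": 20_000,  "avg_cpi": 1.5},
]
print(estimate_wcet(phases))  # estimated worst-case cycle count
```

Because each phase's CPI is homogeneous, one measurement per phase suffices, which is what lets the instrumentation overhead drop without losing much accuracy.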
Abstract:
Most existing WCET estimation methods directly estimate the execution time, ET, in cycles. We propose to study ET as a product of two factors, ET = IC * CPI, where IC is the instruction count and CPI is cycles per instruction. Directly estimating ET may lead to a highly pessimistic estimate, since these methods may implicitly combine the worst case IC with the worst case CPI. We hypothesize that there exists a functional relationship between CPI and IC such that CPI = f(IC). This is ascertained by computing the covariance matrix and studying scatter plots of CPI versus IC. IC and CPI values are obtained by running benchmarks with a large number of inputs using the cycle-accurate architectural simulator SimpleScalar on two different architectures. It is shown that the benchmarks can be grouped into different classes based on the CPI versus IC relationship. For some benchmarks, like FFT and FIR, both IC and CPI are almost constant irrespective of the input. Other benchmarks exhibit a direct or an inverse relationship between CPI and IC; in such cases, one can predict CPI for a given IC as CPI = f(IC). We derive the theoretical worst case IC for a program, denoted SWIC, using integer linear programming (ILP) and estimate WCET as SWIC * f(SWIC). However, if CPI decreases sharply with IC, then the measured maximum cycle count is observed to be a better estimate. For certain other benchmarks, the CPI versus IC relationship is either random or CPI remains constant with varying IC; in such cases, WCET is estimated as the product of SWIC and the measured maximum CPI. It is observed that the proposed method results in tighter WCET estimates than Chronos, a static WCET analyzer, for most benchmarks on the two architectures considered in this paper.
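The WCET = SWIC * f(SWIC) recipe can be illustrated with a least-squares fit standing in for f. The profile points, the linear form of f and the SWIC value below are all invented for illustration; the paper obtains SWIC from an ILP formulation and f from simulator measurements:

```python
def fit_linear(xs, ys):
    # ordinary least-squares fit of CPI = a*IC + b
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# invented profile: (IC, CPI) pairs measured over many inputs, showing an
# inverse relationship (CPI falls as IC grows)
ic  = [1000.0, 2000.0, 3000.0, 4000.0]
cpi = [1.50, 1.40, 1.30, 1.20]

a, b = fit_linear(ic, cpi)
swic = 5000.0                 # worst-case IC, from the ILP step in the paper
wcet = swic * (a * swic + b)  # WCET = SWIC * f(SWIC)
print(round(wcet))            # 5500
```

Note how the inverse relationship pays off: pairing SWIC with the worst observed CPI of 1.50 would give 7500 cycles, while using f(SWIC) = 1.10 yields the tighter 5500.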
Abstract:
Pervasive use of pointers in large-scale real-world applications continues to make points-to analysis an important optimization enabler. The rapid growth of software systems demands a scalable pointer analysis algorithm. A typical inclusion-based points-to analysis iteratively evaluates constraints and computes a points-to solution until a fixpoint is reached. In each iteration, (i) points-to information is propagated across directed edges in a constraint graph G and (ii) more edges are added by processing the points-to constraints. We observe that prioritizing the order in which the information is processed within each of these two steps can lead to efficient execution of the points-to analysis. While earlier work in the literature focuses only on the propagation order, we argue that the other dimension, prioritizing the constraint processing, can lead to even higher improvements in how fast the fixpoint of the points-to algorithm is reached. This becomes especially important, as we prove that finding an optimal sequence for processing the points-to constraints is NP-complete. The prioritization scheme proposed in this paper is general enough to be applied to any of the existing points-to analyses. Using the prioritization framework developed in this paper, we implement prioritized versions of Andersen's analysis, Deep Propagation, Hardekopf and Lin's Lazy Cycle Detection, and Bloom filter based points-to analysis. In each case, we report significant improvements in the analysis times (33%, 47%, 44% and 20%, respectively) as well as in the memory requirements for a large suite of programs, including the SPEC 2000 benchmarks and five large open source programs.
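A toy inclusion-based solver shows where constraint-processing order enters. The priority used here (process copy constraints with larger source points-to sets first) is only a stand-in; the abstract does not specify the paper's actual prioritization function:

```python
def andersen(base, copy_edges):
    # Tiny inclusion-based points-to solver.
    #   base:       (var, obj) facts, i.e. obj is in pts(var)
    #   copy_edges: (dst, src) constraints, i.e. pts(dst) must contain pts(src)
    pts = {}
    for v, o in base:
        pts.setdefault(v, set()).add(o)
    for d, s in copy_edges:
        pts.setdefault(d, set())
        pts.setdefault(s, set())
    changed = True
    while changed:
        changed = False
        # prioritized constraint processing: largest source sets first
        # (a stand-in priority; the paper's scheme is not given in the abstract)
        for d, s in sorted(copy_edges, key=lambda e: -len(pts[e[1]])):
            if not pts[s] <= pts[d]:
                pts[d] |= pts[s]
                changed = True
    return pts

pts = andersen([("p", "A"), ("q", "B")],
               [("r", "p"), ("r", "q"), ("s", "r")])
print(sorted(pts["s"]))  # ['A', 'B']
```

A good ordering processes "upstream" constraints before the edges that consume their results, so fewer fixpoint rounds are needed; the NP-completeness result says an optimal such ordering cannot be computed cheaply, which motivates heuristic priorities.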
Abstract:
Advances in technology have increased the number of cores and the size of caches present on chip multicore platforms (CMPs). As a result, leakage power consumption of on-chip caches has already become a major power-consuming component of the memory subsystem. We propose to reduce leakage power consumption in a static non-uniform cache architecture (SNUCA) on a tiled CMP by dynamically varying the number of cache slices used and switching off unused cache slices. A cache slice in a tile includes all cache banks present in that tile. Switched-off cache slices are remapped considering the communication costs, to reduce cache usage with minimal impact on execution time. This saves leakage power in the switched-off L2 cache slices. On average, the remap policy achieves 41% and 49% higher EDP savings compared to the static and dynamic NUCA (DNUCA) cache policies, respectively, on a scalable tiled CMP.
Abstract:
Adaptive Gaussian Mixture Models (GMMs) have been one of the most popular and successful approaches to foreground segmentation on multimodal background scenes. However, the good accuracy of the GMM algorithm comes at a high computational cost. An improved GMM technique was proposed by Zivkovic to reduce the computational cost by minimizing the number of modes adaptively. In this paper, we propose a modification to his adaptive GMM algorithm that further reduces execution time by replacing expensive floating point computations with low cost integer operations. To maintain accuracy, we derive a heuristic that computes periodic floating point updates for the GMM weight parameter using the value of an integer counter. Experiments show speedups in the range of 1.33 to 1.44 on standard video datasets in which a large fraction of pixels are multimodal.
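The weight-update heuristic can be sketched as follows. In the standard adaptive GMM, each mode's weight takes a per-frame floating point update w += alpha*(o - w), where o is 1 if the pixel matched the mode. The sketch below counts matches in an integer and applies one batched floating point update per period; the batching formula is our approximation of the idea, not the paper's exact derivation:

```python
ALPHA, PERIOD = 0.01, 8   # illustrative learning rate and update period

def exact(w, matches):
    # baseline: per-frame floating point update of one mode's weight
    for o in matches:                 # o = 1 if the pixel matched the mode
        w += ALPHA * (o - w)
    return w

def periodic(w, matches):
    # integer counting per frame, one batched floating point update per period
    # (the batching approximates the exact recurrence; frames left over at the
    # end of the sequence are ignored in this sketch)
    count = 0
    for i, o in enumerate(matches, 1):
        count += o                    # cheap integer work in the inner loop
        if i % PERIOD == 0:
            w = w * (1 - ALPHA) ** PERIOD + ALPHA * count
            count = 0
    return w

demo = [1, 0, 1, 1, 0, 0, 1, 0]
print(round(exact(0.5, demo), 4), round(periodic(0.5, demo), 4))
```

For small alpha the decay factors inside a period are close to 1, so the batched update tracks the exact one closely while keeping the per-pixel, per-frame inner loop in integer arithmetic.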
Abstract:
Mobile WiMAX is a burgeoning network technology with diverse applications, one of which is vehicular ad hoc networks (VANETs). Performance metrics such as mean throughput and packet loss ratio for VANETs adopting 802.16e are computed through simulation. We then evaluate the same performance metrics for VANETs employing 802.11p, also known as WAVE (Wireless Access in Vehicular Environments). The proposed simulation model is close to reality, as we generate mobility traces for both cases using a traffic simulator (SUMO) and feed them into a network simulator (NS2), based on their operation in a typical urban VANET scenario. Subsequently, a VANET application called `Street Congestion Alert' is developed to assess the performance of these two technologies. For this application, TraCI is used to couple SUMO and NS2 in a feedback loop, setting up a realistic simulation scenario. Our results show that Mobile WiMAX performs better than WAVE for larger network sizes.
Abstract:
A YAlO3:Ni2+ (0.1 mol%) doped nanophosphor was synthesised by a low temperature solution combustion method. Powder X-ray diffraction (PXRD) confirms the orthorhombic phase of yttrium aluminate (YAlO3), along with traces of Y3Al5O12. Scanning electron microscopy (SEM) shows that the powder particles appear spherical in shape with large agglomeration. The average crystallite sizes are in the range 45-90 nm, as confirmed by transmission electron microscopy (TEM) and Williamson-Hall (W-H) plots. Electron paramagnetic resonance (EPR) and photoluminescence (PL) studies reveal that the Ni2+ ions are in octahedral coordination. The thermoluminescence (TL) glow curve, recorded for samples gamma-irradiated in the range 0.2-15 kGy, consists of two peaks: a main peak at ~224 °C and a shoulder at 285 °C. The TL intensity was found to increase linearly for the 224 °C and 285 °C peaks up to 1 kGy, and thereafter shows sub-linear (up to 8 kGy) and then saturation behavior. The trap parameters, namely the activation energy (E), order of kinetics (b) and frequency factor (s), at different gamma doses were determined using Chen's glow peak shape and Lushchik's methods, and the results are discussed in detail. Owing to its simple glow peak structure, the 224 °C peak in the YAlO3:Ni2+ nanophosphor can be used in personal dosimetry. (C) 2012 Elsevier B.V. All rights reserved.
Abstract:
A combined 3D finite element simulation and experimental study of the interaction between a notch and cylindrical voids ahead of it in single edge notch (tension) aluminum single crystal specimens is undertaken in this work. Two lattice orientations are considered in which the notch front is parallel to the crystallographic [101̄] direction. The flat surface of the notch coincides with the (010) plane in one orientation and with the (11̄1) plane in the other. Three equally spaced cylindrical voids are placed directly ahead of the notch tip. The predicted load-displacement curves, slip traces, lattice rotation and void growth from the finite element analysis are found to be in good agreement with the experimental observations for both orientations. The finite element results show considerable through-thickness variation in both the hydrostatic stress and the equivalent plastic slip, which additionally depends on the lattice orientation. The through-thickness variation in these quantities affects the void growth rate and causes it to differ from the center plane to the free surface of the specimen. (c) 2012 Elsevier Ltd. All rights reserved.
Abstract:
This paper addresses experiments and modeling studies on the use of producer gas, a bio-derived low energy content fuel, in a spark-ignited engine. Producer gas, generated in situ, has thermo-physical properties different from those of fossil fuels. Experiments on naturally aspirated and turbocharged engine operation, and subsequent analysis of the cylinder pressure traces, reveal significant differences in the heat release pattern within the cylinder compared with a typical fossil fuel. The heat release patterns for gasoline and producer gas compare well over the initial 50%, but beyond this, producer gas combustion tends to be sluggish, leading to an overall increase in the combustion duration. This is rather unexpected considering that producer gas, with nearly 20% hydrogen, has higher flame speeds than gasoline. The influence of hydrogen on the initial flame kernel development period and the combustion duration, and hence on the overall heat release pattern, is addressed. The significant deviations in the heat release profiles between conventional fuels and producer gas necessitate the estimation of producer-gas-specific Wiebe coefficients. The experimental heat release profiles are used for estimating the Wiebe coefficients, together with experimental evidence of lower fuel conversion efficiency based on chemical and thermal analysis of the engine exhaust gas. The efficiency factor a is found to be 2.4, while the shape factor m is estimated at 0.7 for the 2% to 90% burn duration. The standard Wiebe coefficients for conventional fuels and the fuel-specific coefficients for producer gas are used in a zero-D model to predict the performance of a 6-cylinder gas engine under naturally aspirated and turbocharged conditions. While simulations with the standard Wiebe coefficients deviate excessively from the experimental results, an excellent match is observed when the producer-gas-specific coefficients are used. Predictions using the same coefficients on a 3-cylinder gas engine with a different geometry and compression ratio indicate a close match with the experimental traces, highlighting the versatility of the coefficients.
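The Wiebe function behind these coefficients gives the mass fraction burned as x_b = 1 - exp(-a*((theta - theta0)/dtheta)^(m+1)). The sketch below plugs in the producer-gas coefficients reported above (a = 2.4, m = 0.7); the start of combustion and burn duration are illustrative values, not taken from the paper:

```python
import math

def wiebe(theta, theta0=340.0, dtheta=60.0, a=2.4, m=0.7):
    # mass fraction burned at crank angle theta (degrees); a and m are the
    # producer-gas coefficients reported above, while theta0 and dtheta
    # (start of combustion and burn duration) are illustrative values
    if theta <= theta0:
        return 0.0
    return 1.0 - math.exp(-a * ((theta - theta0) / dtheta) ** (m + 1))

# at the end of the nominal burn duration the burned fraction is 1 - e**-a
print(round(wiebe(400.0), 3))  # 0.909
```

The low shape factor m = 0.7 front-loads the burn relative to gasoline's typical m ≈ 2, which is consistent with the sluggish late combustion described above.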
Abstract:
We address the problem of mining targeted association rules over multidimensional market-basket data. Here each transaction has, in addition to the set of purchased items, ancillary dimension attributes associated with it. Based on these dimensions, transactions can be visualized as distributed over the cells of an n-dimensional cube. In this framework, a targeted association rule is of the form {X -> Y} R, where R is a convex region in the cube and X -> Y is a traditional association rule within region R. We first describe the TOARM algorithm, based on classical techniques, for identifying targeted association rules. Then, we discuss the concepts of bottom-up aggregation and cubing, leading to the CellUnion technique. This approach is further extended, using notions of cube-count interleaving and credit-based pruning, to derive the IceCube algorithm. Our experiments demonstrate that IceCube consistently provides the best execution time performance, especially for large and complex data cubes.
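The baseline evaluation of a targeted rule follows directly from the definition: restrict the transactions to those whose dimension values fall inside R, then score X -> Y within that slice. Field names and the axis-aligned box standing in for the convex region R are our illustration, not the TOARM implementation:

```python
def rule_stats(transactions, region, X, Y):
    # keep only transactions whose dimension attributes lie inside region R
    # (R is modelled here as an axis-aligned box of (lo, hi) per dimension)
    in_r = [t["items"] for t in transactions
            if all(lo <= t["dims"][d] <= hi for d, (lo, hi) in region.items())]
    n_x  = sum(1 for items in in_r if X <= items)          # X occurs
    n_xy = sum(1 for items in in_r if (X | Y) <= items)    # X and Y occur
    support    = n_xy / len(in_r) if in_r else 0.0
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence

# hypothetical market-basket data with one dimension attribute, "month"
txns = [
    {"dims": {"month": 1}, "items": {"bread", "milk"}},
    {"dims": {"month": 2}, "items": {"bread", "milk", "eggs"}},
    {"dims": {"month": 7}, "items": {"bread"}},
]
print(rule_stats(txns, {"month": (1, 6)}, {"bread"}, {"milk"}))  # (1.0, 1.0)
```

This per-region rescan is what the CellUnion and IceCube techniques avoid: they aggregate cell-level counts bottom-up so many candidate regions can be scored without re-reading the transactions.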