942 results for Research performance
Analyzing Cache Performance Bottlenecks of STM Applications and addressing them with Compiler's help
Abstract:
Software transactional memory (STM) is a promising programming paradigm for shared memory multithreaded programs as an alternative to traditional lock-based synchronization. However, adoption of STM in mainstream software has been quite low due to its considerable overheads and its poor cache/memory performance. In this paper, we perform a detailed study of the cache behavior of STM applications and quantify the impact of different STM factors on the cache misses experienced by the applications. Based on our analysis, we propose a compiler-driven Lock-Data Colocation (LDC) technique targeted at reducing the cache overheads of STM. We show that LDC is effective in improving the cache behavior of STM applications by reducing the dcache miss latency and improving execution time performance.
Abstract:
A parallel matrix multiplication algorithm is presented, and studies of its performance and estimation are discussed. The algorithm is implemented on a network of transputers connected in a ring topology. An efficient scheme for partitioning the input matrices is introduced which enables overlapping computation with communication. This makes the algorithm achieve near-ideal speed-up for reasonably large matrices. Analytical expressions for the execution time of the algorithm have been derived by analysing its computation and communication characteristics. These expressions are validated by comparing the theoretical results of the performance with the experimental values obtained on a four-transputer network for both square and irregular matrices. The analytical model is also used to estimate the performance of the algorithm for a varying number of transputers and varying problem sizes. Although the algorithm is implemented on transputers, the methodology and the partitioning scheme presented in this paper are quite general and can be implemented on other processors which have the capability of overlapping computation with communication. The equations for performance prediction can also be extended to other multiprocessor systems.
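The abstract above does not spell out the partitioning scheme, so the following is only a minimal sketch of one common ring formulation, simulated sequentially in Python. It assumes each worker keeps a block of rows of the first matrix and that blocks of columns of the second matrix rotate around the ring; the names `split` and `ring_matmul` are hypothetical. On real hardware, forwarding a column block to the neighbour would overlap with the multiplication of the block currently held.

```python
def split(n, parts):
    """Return `parts` contiguous index ranges that together cover range(n)."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def ring_matmul(A, B, workers):
    """Multiply A (n x k) by B (k x m) with a simulated ring of workers."""
    n, k, m = len(A), len(B), len(B[0])
    row_blocks = split(n, workers)   # worker w owns rows row_blocks[w] of A
    col_blocks = split(m, workers)   # column blocks of B travel around the ring
    C = [[0] * m for _ in range(n)]
    held = list(range(workers))      # held[w]: column block currently on worker w
    for _ in range(workers):
        for w in range(workers):
            # Worker w multiplies its rows by the column block it holds now.
            for i in row_blocks[w]:
                for j in col_blocks[held[w]]:
                    C[i][j] = sum(A[i][t] * B[t][j] for t in range(k))
        # Ring shift: every worker receives the block held by its neighbour.
        held = [held[(w + 1) % workers] for w in range(workers)]
    return C


A = [[1, 2], [3, 4], [5, 6]]              # irregular (non-square) example
B = [[1, 0, 2, 1, 3], [0, 1, 1, 2, 1]]
C = ring_matmul(A, B, workers=4)
```

After as many shifts as there are workers, every worker has seen every column block, so the full product is assembled. The example uses four workers only to echo the four-transputer network mentioned in the abstract.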
Abstract:
A SAW matched filter is commonly used in spread spectrum communication receivers to maximize the SNR prior to detection. At times the receiver is mobile while the signal is processed at the IF level. In that case, frequency deviations due to Doppler shift or the temperature dependence of the acoustic medium used for the SAW device would severely affect its performance. The impact of these errors on the receiver performance is analyzed on a generalised basis.
Abstract:
Real gas effects dominate the hypersonic flow fields encountered by modern-day hypersonic space vehicles. Measurement of aerodynamic data for the design applications of such aerospace vehicles calls for special kinds of wind tunnels capable of faithfully simulating real gas effects. A shock tunnel is an established facility commonly used along with special instrumentation for acquiring the data for this purpose within a short time period. The hypersonic shock tunnel (HST1), established at the Indian Institute of Science (IISc) in the early 1970s, has been extensively used to measure the aerodynamic data of various bodies of interest at hypersonic Mach numbers in the range 4 to 13. Details of some important measurements made during the period 1975-1995 along with the performance capabilities of the HST1 are presented in this review. In view of the re-emergence of interest in hypersonics across the globe in recent times, the present review highlights the suitability of the hypersonic shock tunnel at the IISc for future space application studies in India.
Abstract:
Wave pipelining is a design technique for increasing the throughput of a digital circuit or system without introducing pipelining registers between adjacent combinational logic blocks in the circuit/system. However, this requires balancing of the delays along all the paths from the input to the output, which comes in the way of its implementation. Static CMOS is inherently susceptible to delay variation with input data, and hence receives a low priority for wave pipelined digital design. On the other hand, ECL and CML, which are amenable to wave pipelining, lack the compactness and low power attributes of CMOS. In this paper we attempt to exploit wave pipelining in CMOS technology. We use a single generic building block in Normal Process Complementary Pass Transistor Logic (NPCPL), modeled after CPL, to achieve equal delay along all the propagation paths in the logic structure. An 8×8 b multiplier is designed using this logic in a 0.8 μm technology. The carry-save multiplier architecture is modified suitably to support wave pipelining, viz., the logic depth of all the paths is made identical. The 1 mm×0.6 mm multiplier core supports a throughput of 400 MHz and dissipates a total power of 0.6 W. We develop simple enhancements to the NPCPL building blocks that allow the multiplier to sustain throughputs in excess of 600 MHz. The methodology can be extended to introduce wave pipelining in other circuits as well.
Abstract:
Low-pressure MOCVD, with tris(2,4-pentanedionato)aluminum(III) as the precursor, was used in the present investigation to coat alumina onto cemented carbide cutting tools. To evaluate the MOCVD process, the efficiency in cutting operations of MOCVD-coated tools was compared with that of tools coated using the industry-standard CVD process. Three multilayer cemented carbide cutting tool inserts, viz., TiN/TiC/WC, CVD-coated Al2O3 on TiN/TiC/WC, and MOCVD-coated Al2O3 on TiN/TiC/WC, were compared in the dry turning of mild steel. Turning tests were conducted for cutting speeds ranging from 14 to 47 m/min, for a depth of cut from 0.25 to 1 mm, at the constant feed rate of 0.2 mm/min. The axial, tangential, and radial forces were measured using a lathe tool dynamometer for different cutting parameters, and the machined work pieces were tested for surface roughness. The results indicate that, in most of the cases examined, the MOCVD-coated inserts produced a smoother surface finish while requiring lower cutting forces, indicating that MOCVD produces the best-performing insert, followed by the CVD-coated one. The superior performance of MOCVD-alumina is attributed to the co-deposition of carbon with the oxide, due to the very nature of the precursor used, leading to enhanced mechanical properties for cutting applications in harsh environments.
Abstract:
Computational grids with multiple batch systems (batch grids) can be powerful infrastructures for executing long-running multicomponent parallel applications. In this paper, we have constructed a middleware framework for executing such long-running applications spanning multiple submissions to the queues on multiple batch systems. We have used our framework to execute a prominent long-running multicomponent application for climate modeling, the Community Climate System Model (CCSM). Our framework coordinates the distribution, execution, migration and restart of the components of CCSM on the multiple queues, where the component jobs of the different queues can have different queue waiting and startup times.
Abstract:
The conventional metal oxide semiconductor field effect transistor (MOSFET) may not be suitable for future low standby power (LSTP) applications due to its high off-state current, as the sub-threshold swing is theoretically limited to 60 mV/decade. The tunnel field effect transistor (TFET), based on gate-controlled band-to-band tunneling, has attracted attention for such applications due to its extremely small sub-threshold swing (much less than 60 mV/decade). This paper takes a simulation approach to gain some insight into its electrostatics and the carrier transport mechanism. Using 2D device simulations, a thorough study and analysis of the electrical parameters of the planar double gate TFET is performed. Due to excellent sub-threshold characteristics and a reverse biased structure, it offers orders of magnitude less leakage current compared to the conventional MOSFET. In this work, it is shown that the device can be scaled down to channel lengths as small as 30 nm without affecting its performance. Also, it is observed that the bulk region of the device plays a major role in determining the sub-threshold characteristics of the device, and considerable improvement in performance (in terms of I_ON/I_OFF ratio) can be achieved if the thickness of the device is reduced. An I_ON/I_OFF ratio of 2×10^12 and a minimum point sub-threshold swing of 22 mV/decade are obtained.
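The 60 mV/decade figure quoted above is the room-temperature value of the standard thermionic limit ln(10)·kT/q. The sketch below simply evaluates that expression; the function name `thermal_swing_limit_mv` and the choice of 300 K are illustrative assumptions, not details taken from the paper.

```python
import math

BOLTZMANN = 1.380649e-23             # J/K (exact SI value)
ELEMENTARY_CHARGE = 1.602176634e-19  # C (exact SI value)


def thermal_swing_limit_mv(temperature_k):
    """Thermionic lower bound on sub-threshold swing, in mV per decade."""
    thermal_voltage = BOLTZMANN * temperature_k / ELEMENTARY_CHARGE  # kT/q in volts
    return math.log(10) * thermal_voltage * 1e3


limit_300k = thermal_swing_limit_mv(300.0)  # assumed room temperature
print(f"Limit at 300 K: {limit_300k:.1f} mV/decade")
```

A device that switches by band-to-band tunneling is not bound by this thermionic expression, which is why a swing such as the reported 22 mV/decade is possible.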
Abstract:
We consider a time-varying wireless fading channel, equalized by an LMS linear equalizer. We study how well this equalizer tracks the optimal Wiener equalizer. We model the channel by an auto-regressive (AR) process. Then the LMS equalizer and the AR process are jointly approximated by the solution of a system of ODEs (ordinary differential equations). Using these ODEs, the error between the LMS equalizer and the instantaneous Wiener filter is shown to decay exponentially/polynomially to zero unless the channel is marginally stable, in which case the convergence may not hold. Using the same ODEs, we also show that the corresponding Mean Square Error (MSE) converges towards the minimum MSE (MMSE) at the same rate for a stable channel. We further show that the difference between the MSE and the MMSE does not explode with time even when the channel is unstable. Finally, we obtain an optimum step size for the linear equalizer in terms of the AR parameters, whenever the error decay is exponential.
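For readers unfamiliar with LMS, the sketch below shows the basic tap update, w ← w + μ·e·x, on a toy problem. It is not the paper's setup: it identifies a fixed two-tap response from noise-free data rather than equalizing a fading AR channel, and the step size `mu`, the target taps `h`, and the function name `lms_filter` are all assumptions chosen for illustration.

```python
import random


def lms_filter(inputs, desired, n_taps, mu):
    """Run the LMS recursion and return the final taps and the error history."""
    w = [0.0] * n_taps
    errors = []
    for k in range(n_taps - 1, len(inputs)):
        x = [inputs[k - i] for i in range(n_taps)]      # most recent sample first
        y = sum(wi * xi for wi, xi in zip(w, x))        # filter output
        e = desired[k] - y                              # instantaneous error
        w = [wi + mu * e * xi for wi, xi in zip(w, x)]  # LMS tap update
        errors.append(e)
    return w, errors


rng = random.Random(0)                       # fixed seed for repeatability
h = [0.5, -0.3]                              # hypothetical target response
u = [rng.gauss(0.0, 1.0) for _ in range(3000)]
# desired[0] is a placeholder; the filter starts at k = n_taps - 1 = 1.
d = [0.0] + [h[0] * u[k] + h[1] * u[k - 1] for k in range(1, len(u))]
w, errors = lms_filter(u, d, n_taps=2, mu=0.05)
print("estimated taps:", w)
```

With a fixed target the tap error shrinks geometrically. Letting `h` evolve over time as an AR process would turn this into the tracking problem that the abstract analyses.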
Abstract:
Modeling the performance behavior of parallel applications to predict the execution times of the applications for larger problem sizes and numbers of processors has been an active area of research for several years. The existing curve fitting strategies for performance modeling utilize data from experiments that are conducted under uniform loading conditions. Hence the accuracy of these models degrades when the load conditions on the machines and network change. In this paper, we analyze a curve fitting model that attempts to predict execution times for any load conditions that may exist on the systems during application execution. Based on the experiments conducted with the model for a parallel eigenvalue problem, we propose a multi-dimensional curve-fitting model based on rational polynomials for performance predictions of parallel applications in non-dedicated environments. We used the rational polynomial based model to predict execution times for two other parallel applications on systems with large load dynamics. In all the cases, the model gave good predictions of execution times, with average percentage prediction errors of less than 20%.
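The abstract does not give the form of its multi-dimensional model, so the following is only a one-dimensional sketch of what fitting a rational polynomial can look like. It assumes the form y ≈ (a0 + a1·x)/(1 + b1·x), which is linearised to y = a0 + a1·x − b1·x·y and solved by least squares. The helper names and the synthetic data are hypothetical.

```python
def solve_linear(M, rhs):
    """Solve M p = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    p = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * p[c] for c in range(r + 1, n))
        p[r] = (aug[r][n] - tail) / aug[r][r]
    return p


def fit_rational(xs, ys):
    """Least-squares fit of y ~ (a0 + a1*x) / (1 + b1*x); returns [a0, a1, b1]."""
    rows = [[1.0, x, -x * y] for x, y in zip(xs, ys)]   # linearised design matrix
    # Normal equations: (R^T R) p = R^T y
    M = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve_linear(M, rhs)


# Synthetic "measurements" generated from a known rational curve.
sizes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
times = [(2.0 + 3.0 * x) / (1.0 + 0.5 * x) for x in sizes]
a0, a1, b1 = fit_rational(sizes, times)
predicted = (a0 + a1 * 8.0) / (1.0 + b1 * 8.0)   # extrapolate to x = 8
```

A model in the spirit of the abstract would add further inputs, such as processor count and load, as extra terms in the numerator and denominator.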
Abstract:
Beginning with the ‘frog-leg experiment’ by Galvani (1786), followed by the demonstrations of the Volta pile by Volta (1792) and the lead-acid accumulator by Planté (1859), several battery chemistries have been developed and realized commercially. The development of the lithium-ion rechargeable battery in the early 1990s was a breakthrough in the science and technology of batteries. Owing to its high energy density and high operating voltage, the Li-ion battery has become the battery of choice for various portable applications such as notebook computers, cellular telephones, camcorders, etc. Considerable efforts are underway to develop large-size batteries for electric vehicle applications. The origin of the lithium-ion battery lies in the discovery that Li+ ions can reversibly be intercalated into/de-intercalated from the van der Waals gap between graphene sheets of carbon materials at a potential close to the Li/Li+ electrode. By employing carbon as the negative electrode material in rechargeable lithium-ion batteries, the problems associated with metallic lithium in rechargeable lithium batteries have been mitigated. Complementary investigations on intercalation compounds based on transition metals have resulted in establishing LiCoO2 as a promising cathode material. By employing carbon and LiCoO2, respectively, as the negative and positive electrodes in a non-aqueous lithium-salt electrolyte, a Li-ion cell with a voltage value of about 3.5 V has resulted. Subsequent to commercialization of Li-ion batteries, a number of research activities concerning various aspects of the battery components began in several laboratories across the globe.
Regarding the positive electrode materials, research priorities have been to develop different kinds of active materials concerning various aspects such as safety, high capacity, low cost, high stability with long cycle-life, environmental compatibility, and understanding the relationships between crystallographic and electrochemical properties. The present review discusses the published literature on different positive electrode materials of Li-ion batteries, with a focus on the effect of particle size on electrochemical performance.
Abstract:
This paper proposes a Petri net model for a commercial network processor (Intel IXP architecture), which is a multithreaded multiprocessor architecture. We consider and model three different applications, viz., IPv4 forwarding, network address translation, and IP security, running on the IXP 2400/2850. A salient feature of the Petri net model is its ability to model the application, the architecture and their interaction in great detail. The model is validated using the Intel proprietary tool (SDK 3.51 for IXP architecture) over a range of configurations. We conduct a detailed performance evaluation, identify the bottleneck resource, and propose a few architectural extensions and evaluate them in detail.
Abstract:
This study deals with tailoring of the surface morphology, microstructure, and electrochemical properties of Sn thin films deposited by magnetron sputtering with different deposition rates. Scanning electron microscopy and atomic force microscopy are used to characterize the film surface morphology. Electrochemical properties of the Sn thin films are measured and compared by cyclic voltammetry and charge-discharge cycle data at a constant current density. The Sn thin film fabricated with a higher deposition rate exhibited an initial discharge capacity of 798 mAh g^-1, which fell to 94 mAh g^-1 by the 30th cycle. The film deposited with a lower deposition rate delivered 770 mAh g^-1 during the 1st cycle, with improved capacity retention of 521 mAh g^-1 on the 30th cycle. Comparison of the electrochemical performances of these films has revealed important distinctions, which are associated with the surface morphology and hence with the rate of deposition. © 2012 Elsevier Ltd. All rights reserved.
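The capacity figures above are easier to compare as retention percentages. The short sketch below does that arithmetic using only the four capacities quoted in the abstract; the function name `capacity_retention` is a hypothetical helper.

```python
def capacity_retention(initial_mah_g, later_mah_g):
    """Percentage of the initial discharge capacity that remains."""
    return 100.0 * later_mah_g / initial_mah_g


high_rate = capacity_retention(798, 94)   # film sputtered at the higher rate
low_rate = capacity_retention(770, 521)   # film sputtered at the lower rate
print(f"higher deposition rate: {high_rate:.1f}% retained after 30 cycles")
print(f"lower deposition rate:  {low_rate:.1f}% retained after 30 cycles")
```

Expressed this way, the film deposited at the lower rate retains roughly two thirds of its first-cycle capacity, against roughly one eighth for the film deposited at the higher rate.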
Abstract:
The impact of gate-to-source/drain overlap length on the performance and variability of 65 nm CMOS is presented. The device and circuit variability is investigated as a function of three significant process parameters, namely gate length, gate oxide thickness, and halo dose. The comparison is made with three different values of gate-to-source/drain overlap length, namely 5 nm, 0 nm, and -5 nm, and at two different leakage currents of 10 nA and 100 nA. The Worst-Case-Analysis approach is used to study the inverter delay fluctuations at the process corners. The drive current of the device, for device robustness, and the stage delay of an inverter, for circuit robustness, are taken as performance metrics. The design trade-off between performance and variability is demonstrated at both the device level and the circuit level. It is shown that a larger overlap length leads to better performance, while a smaller overlap length results in better variability, so performance trades off against variability as the overlap length is varied. An optimal overlap length of 0 nm is recommended at 65 nm gate length, for a reasonable combination of performance and variability.