120 resultados para Evolving Object-Oriented Compiler


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Software transactional memory (STM) has been proposed as a promising programming paradigm for shared memory multi-threaded programs as an alternative to conventional lock based synchronization primitives. Typical STM implementations employ a conflict detection scheme, which works with uniform access granularity, tracking shared data accesses either at word/cache line or at object level. It is well known that a single fixed access tracking granularity cannot meet the conflicting goals of reducing false conflicts without impacting concurrency adversely. A fine grained granularity while improving concurrency can have an adverse impact on performance due to lock aliasing, lock validation overheads, and additional cache pressure. On the other hand, a coarse grained granularity can impact performance due to reduced concurrency. Thus, in general, a fixed or uniform granularity access tracking (UGAT) scheme is application-unaware and rarely matches the access patterns of individual application or parts of an application, leading to sub-optimal performance for different parts of the application(s). In order to mitigate the disadvantages associated with UGAT scheme, we propose a Variable Granularity Access Tracking (VGAT) scheme in this paper. We propose a compiler based approach wherein the compiler uses inter-procedural whole program static analysis to select the access tracking granularity for different shared data structures of the application based on the application's data access pattern. We describe our prototype VGAT scheme, using TL2 as our STM implementation. Our experimental results reveal that VGAT-STM scheme can improve the application performance of STAMP benchmarks from 1.87% to up to 21.2%.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Zn1−xMgxO (x = 0.3) thin films have been fabricated on Pt/TiO2/SiO2/Si substrates using multimagnetron sputtering technique. The films with wurtzite structure showed a (002) preferred orientation. Ferroelectricity in Zn1−xMgxO films was established from the temperature dependent dielectric constant and the polarization hysteresis loop. The temperature dependent study of dielectric constant at different frequencies exhibited a dielectric anomaly at 110 °C. The resistivity versus temperature characteristics showed an anomalous increase in the vicinity of the dielectric transition temperature. The Zn1−xMgxO thin films exhibit well-defined polarization hysteresis loop, with a remanent polarization of 0.2 μC/cm2 and coercive field of 8 kV/cm at room temperature.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Highly (110) preferred orientated antiferroelectric PbZrO3 (PZ) and La-modified PZ thin films have been fabricated on Pt/Ti/SiO2/Si substrates using sol-gel process. Dielectric properties, electric field induced ferroelectric polarization, and the temperature dependence of the dielectric response have been explored as a function of composition. The Tc has been observed to decrease by ∼ 17 °C per 1 mol % of La doping. Double hysteresis loops were seen with zero remnant polarization and with coercive fields in between 176 and 193 kV/cm at 80 °C for antiferroelectric to ferroelectric phase transformation. These slim loops have been explained by the high orientation of the films along the polar direction of the antiparallel dipoles of a tetragonal primitive cell and by the strong electrostatic interaction between La ions and oxygen ions in an ABO3 perovskite unit cell. High quality films exhibited very low loss factor less than 0.015 at room temperature and pure PZ; 1 and 2 mol % La doped PZs have shown the room temperature dielectric constant of 135, 219, and 142 at the frequency of 10 kHz. The passive layer effects in these films have been explained by Curie constants and Curie temperatures. The ac conductivity and the corresponding Arrhenius plots have been shown and explained in terms of doping effect and electrode resistance.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper describes the simulation of a control scheme using the principle of field orientation for the control of a voltage source inverter-fed induction motor. The control principle is explained, followed by an algorithm to simulate various components of the system in the digital computer. The dynamic response of the system for the load disturbance and set-point variations have been studied. Also, the results of the simulation showing the behavior of field coordinates for such disturbances are given.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper describes the method of field orientation of the stator current vector with respect to the stator, mutual, and rotor flux vectors, for the control of an induction motor fed from a current source inverter (CSI). A control scheme using this principle is described for orienting the stator current with respect to the rotor flux, as this gives natural decoupling between the current coordinates. A dedicated micro-computer system developed for implementing this scheme has been described. The experimental results are also presented.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Some experimental results on the recognition of three-dimensional wire-frame objects are presented. In order to overcome the limitations of a recent model, which employs radial basis functions-based neural networks, we have proposed a hybrid learning system for object recognition, featuring: an optimization strategy (simulated annealing) in order to avoid local minima of an energy functional; and an appropriate choice of centers of the units. Further, in an attempt to achieve improved generalization ability, and to reduce the time for training, we invoke the principle of self-organization which utilises an unsupervised learning algorithm.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present an algorithm for tracking objects in a video sequence, based on a novel approach for motion detection. We do not estimate the velocity �eld. In-stead we detect only the direction of motion at edge points and thus isolate sets of points which are moving coherently. We use a Hausdor� distance based matching algorithm to match point sets in local neighborhood and thus track objects in a video sequence. We show through some examples the e�ectiveness of the algo- rithm.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Although clustering helps by improving the clock speed, reducing the energy consumption of the logic, and making the design simpler, it introduces extra overheads by way of inter-cluster communication. This communication happens over long global wires having high load capacitance which leads to delay in execution and significantly high energy consumption. Inter-cluster communication also introduces many short idle cycles, thereby significantly increasing the overall leakage energy consumption in the functional units. The trend towards miniaturization of devices (and associated reduction in threshold voltage) makes energy consumption in interconnects and functional units even worse, and limits the usability of clustered architectures in smaller technologies. However, technological advancements now permit the design of interconnects and functional units with varying performance and power modes. In this paper, we propose scheduling algorithms that aggregate the scheduling slack of instructions and communication slack of data values to exploit the low-power modes of functional units and interconnects. Finally, we present a synergistic combination of these algorithms that simultaneously saves energy in functional units and interconnects to improves the usability of clustered architectures by achieving better overall energy-performance trade-offs. Even with conservative estimates of the contribution of the functional units and interconnects to the overall processor energy consumption, the proposed combined scheme obtains on average 8% and 10% improvement in overall energy-delay product with 3.5% and 2% performance degradation for a 2-clustered and a 4-clustered machine, respectively. We present a detailed experimental evaluation of the proposed schemes. Our test bed uses the Trimaran compiler infrastructure. (C) 2012 Elsevier Inc. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Software transactional memory (STM) is a promising programming paradigm for shared memory multithreaded programs. In order for STMs to be adopted widely for performance critical software, understanding and improving the cache performance of applications running on STM becomes increasingly crucial, as the performance gap between processor and memory continues to grow. In this paper, we present the most detailed experimental evaluation to date, of the cache behavior of STM applications and quantify the impact of the different STM factors on the cache misses experienced by the applications. We find that STMs are not cache friendly, with the data cache stall cycles contributing to more than 50% of the execution cycles in a majority of the benchmarks. We find that on an average, misses occurring inside the STM account for 62% of total data cache miss latency cycles experienced by the applications and the cache performance is impacted adversely due to certain inherent characteristics of the STM itself. The above observations motivate us to propose a set of specific compiler transformations targeted at making the STMs cache friendly. We find that STM's fine grained and application unaware locking is a major contributor to its poor cache behavior. Hence we propose selective Lock Data co-location (LDC) and Redundant Lock Access Removal (RLAR) to address the lock access misses. We find that even transactions that are completely disjoint access parallel, suffer from costly coherence misses caused by the centralized global time stamp updates and hence we propose the Selective Per-Partition Time Stamp (SPTS) transformation to address this. We show that our transformations are effective in improving the cache behavior of STM applications by reducing the data cache miss latency by 20.15% to 37.14% and improving execution time by 18.32% to 33.12% in five of the 8 STAMP applications.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper reports optical and nanomechanical properties of predominantly a-axis oriented AlN thin films. These films were deposited by reactive DC magnetron sputtering technique at an optimal target to substrate distance of 180 mm. X-ray rocking curve (FWHM = 52 arcsec) studies confirmed the preferred orientation. Spectroscopic ellipsometry revealed a refractive index of 1.93 at a wavelength of 546 nm. The hardness and elastic modulus of these films were 17 and 190 GPa, respectively, which are much higher than those reported earlier can be useful for piezoelectric films in bulk acoustic wave resonators. (C) 2012 American Institute of Physics. http://dx.doi.org/10.1063/1.4772204]

Relevância:

20.00% 20.00%

Publicador:

Resumo:

With the rapid scaling down of the semiconductor process technology, the process variation aware circuit design has become essential today. Several statistical models have been proposed to deal with the process variation. We propose an accurate BSIM model for handling variability in 45nm CMOS technology. The MOSFET is designed to meet the specification of low standby power technology of International Technology Roadmap for Semiconductors (ITRS).The process parameters variation of annealing temperature, oxide thickness, halo dose and title angle of halo implant are considered for the model development. One parameter variation at a time is considered for developing the model. The model validation is done by performance matching with device simulation results and reported error is less than 10%.© (2012) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper describes a new method of color text localization from generic scene images containing text of different scripts and with arbitrary orientations. A representative set of colors is first identified using the edge information to initiate an unsupervised clustering algorithm. Text components are identified from each color layer using a combination of a support vector machine and a neural network classifier trained on a set of low-level features derived from the geometric, boundary, stroke and gradient information. Experiments on camera-captured images that contain variable fonts, size, color, irregular layout, non-uniform illumination and multiple scripts illustrate the robustness of the method. The proposed method yields precision and recall of 0.8 and 0.86 respectively on a database of 100 images. The method is also compared with others in the literature using the ICDAR 2003 robust reading competition dataset.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

MATLAB is an array language, initially popular for rapid prototyping, but is now being increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control flow dominated scalar regions that have an impact on the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput oriented accelerators such as graphics processing units(GPUs). Thus, an approach that maps the control flow dominated regions to the CPU and the data parallel regions to the GPU can significantly improve program performance. In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or GPU so that kernel execution on the CPU and the GPU happens synergistically and the amount of data transfer needed is minimized. In order to ensure required data movement for dependencies across basic blocks, we propose a data flow analysis and edge splitting strategy. Thus our compiler automatically handles composition of kernels, mapping of kernels to CPU and GPU, scheduling and insertion of required data transfer. The proposed compiler was implemented and experimental evaluation using a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data parallel benchmarks over native execution of MATLAB.