120 results for Evolving Object-Oriented Compiler


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Structural, optical and nanomechanical properties of nanocrystalline zinc telluride (ZnTe) films of thickness up to 10 microns, deposited at room temperature on borosilicate glass substrates, are reported. X-ray diffraction patterns reveal that the films were preferentially oriented along the (1 1 1) direction. The maximum refractive index of the films was 2.74 at a wavelength of 2000 nm. The optical band gap showed strong thickness dependence. The average film hardness and Young's modulus, obtained from load-displacement curves analyzed by the Oliver-Pharr method, were 4 and 70 GPa respectively. The hardness of (1 1 1) oriented ZnTe thin films was almost 5 times higher than that of bulk ZnTe. The studies show clearly that the hardness increases with decreasing indentation size for indents between 30 and 300 nm in depth, indicating the existence of an indentation size effect. The coefficient of friction for these films, as obtained from the nanoscratch test, was ~0.4.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

During a lightning strike to a tall grounded object (TGO), reflections of current waves are known to occur at either end of the TGO. These reflections modify the channel current and hence the lightning electromagnetic fields. This study aims to identify the possible contributing factors to reflection at a TGO-channel junction for the current waves ascending on the TGO. The possible sources of reflection identified are the corona sheath and discontinuities of resistance and radius. To analyze the contribution of the corona sheath and the discontinuity of resistance at the junction, a macroscopic physical model for the return stroke developed in our earlier work is employed. NEC-2D is used for assessing the contribution of the abrupt change in radii at a TGO-channel junction. The wire-cage model adopted for this purpose is validated using laboratory experiments. Detailed investigation revealed the following. The main contributor to reflection at a TGO-channel junction is the difference between the TGO and channel core radii. Also, the discontinuity of resistance at a TGO-channel junction can be of some relevance only for the first microsecond regime. Further, the corona sheath does not play any significant role in the reflection.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Multiple Clock Domain (MCD) processors provide an attractive solution to the increasingly challenging problems of clock distribution and power dissipation. They allow their chips to be partitioned into different clock domains, and each domain's frequency (voltage) to be independently configured. This flexibility adds new dimensions to the Dynamic Voltage and Frequency Scaling (DVFS) problem, while providing better scope for saving energy and meeting performance demands. In this paper, we propose a compiler-directed approach for MCD-DVFS. We build a formal Petri net based program performance model, parameterized by settings of microarchitectural components and resource configurations, and integrate it with our compiler passes for frequency selection. Our model estimates the performance impact of a frequency setting, unlike the existing best techniques, which rely on weaker indicators of domain performance such as queue occupancies (used by online methods) and slack manifestation for a particular frequency setting (software-based methods). We evaluate our method with subsets of the SPECFP2000, Mediabench and Mibench benchmarks. Our mean energy savings are 60.39% (versus 33.91% for the best software technique) in a memory-constrained system for cache-miss-dominated benchmarks, and we meet the performance demands. Our ED2 improves by 22.11% (versus 18.34%) for other benchmarks. For a CPU with restricted frequency settings, our energy consumption is within 4.69% of the optimal.
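The abstract above reports an ED2 improvement without defining the metric. Assuming ED2 denotes the energy-delay-squared product (energy multiplied by the square of execution time), which is how the term is commonly used, the comparison can be sketched as below. The function names and the sample numbers are illustrative assumptions and do not come from the paper.

```python
def energy_delay_squared(energy_joules, delay_seconds):
    """Energy-delay-squared product, E * D^2 (lower is better)."""
    return energy_joules * delay_seconds ** 2


def percent_improvement(baseline, candidate):
    """Relative reduction of a lower-is-better metric, in percent."""
    return 100.0 * (baseline - candidate) / baseline


if __name__ == "__main__":
    # Made-up numbers: scaling frequency saves energy but lengthens the run.
    baseline = energy_delay_squared(10.0, 2.0)  # 10 J over 2.0 s
    scaled = energy_delay_squared(6.0, 2.2)     # 6 J over 2.2 s
    print(f"ED2 improvement: {percent_improvement(baseline, scaled):.1f}%")
```

Because delay enters squared, a configuration that saves energy can still lose on ED2 if it slows execution too much, which is why the metric is typically quoted alongside raw energy savings.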

Relevance:

20.00% 20.00%

Publisher:

Abstract:

MATLAB is an array language that was initially popular for rapid prototyping but is now increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control-flow-dominated scalar regions that have an impact on the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput-oriented accelerators such as graphics processing units (GPUs). Thus, an approach that maps the control-flow-dominated regions to the CPU and the data-parallel regions to the GPU can significantly improve program performance. In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data-parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data-parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or the GPU so that kernel execution on the CPU and the GPU happens synergistically and the amount of data transfer needed is minimized. To ensure the required data movement for dependencies across basic blocks, we propose a data flow analysis and edge splitting strategy. Thus our compiler automatically handles composition of kernels, mapping of kernels to CPU and GPU, scheduling, and insertion of the required data transfers. The proposed compiler was implemented, and experimental evaluation using a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data-parallel benchmarks over native execution of MATLAB.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

NMR spectra of molecules oriented in a liquid-crystalline matrix provide information on the structure and orientation of the molecules. Thermotropic liquid crystals used as orienting media result in spectra of spins that are generally strongly coupled. The number of allowed transitions increases rapidly with the number of interacting spins. Furthermore, the number of single quantum transitions required for analysis is highly redundant. In the present study, we have demonstrated that it is possible to separate the subspectra of a homonuclear dipolar coupled spin system on the basis of the spin states of the coupled heteronuclei by multiple quantum (MQ)−single quantum (SQ) correlation experiments. This significantly reduces the number of redundant transitions, thereby simplifying the analysis of the complex spectrum. The methodology has been demonstrated on doubly 13C-labeled acetonitrile aligned in a liquid-crystal matrix and has been applied to analyze the complex spectrum of an oriented six-spin system.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Energy consumption has become a major constraint in providing increased functionality for devices with small form factors. Dynamic voltage and frequency scaling has been identified as an effective approach for reducing the energy consumption of embedded systems. Earlier works on dynamic voltage scaling focused mainly on performing voltage scaling when the CPU is waiting for the memory subsystem, or concentrated chiefly on loop nests and/or subroutine calls having a sufficient number of dynamic instructions. This paper concentrates on coarser program regions and, for the first time, uses program phase behavior for performing dynamic voltage scaling. Program phases are annotated at compile time with mode switch instructions. Further, we relate the Dynamic Voltage Scaling Problem to the Multiple Choice Knapsack Problem, and use well-known heuristics to solve it efficiently. We also develop a simple integer linear program (ILP) formulation for this problem. Experimental evaluation on a set of media applications reveals that our heuristic method obtains a 38% reduction in energy consumption on average with a performance degradation of 1%, and up to 45% reduction in energy with a performance degradation of 5%. Further, the energy consumed by the heuristic solution is within 1% of the optimal solution obtained from the ILP approach.
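The mapping from voltage scaling to the Multiple Choice Knapsack Problem in the abstract above can be illustrated with a small sketch. It assumes each program phase offers several voltage/frequency modes, each with a known (time, energy) cost, and that exactly one mode must be picked per phase so that total time stays within a deadline while energy is minimized. The greedy rule used here (repeatedly take the slowdown that saves the most energy per unit of added time) is a generic heuristic chosen for illustration, not necessarily the one used in the paper, and all names and numbers are hypothetical.

```python
from itertools import product


def greedy_mode_selection(phases, deadline):
    """Pick one mode index per phase.

    phases: list of phases, each a list of (time, energy) tuples, one per mode.
    Returns a list of chosen mode indices, or None if even the fastest
    modes miss the deadline.
    """
    # Start with the fastest mode of every phase.
    choice = [min(range(len(modes)), key=lambda m: modes[m][0]) for modes in phases]
    total_time = sum(phases[p][m][0] for p, m in enumerate(choice))
    if total_time > deadline:
        return None
    while True:
        best = None  # (energy saved per unit of extra time, phase, mode, extra time)
        for p, modes in enumerate(phases):
            cur_time, cur_energy = modes[choice[p]]
            for m, (t, e) in enumerate(modes):
                extra = t - cur_time
                saved = cur_energy - e
                if saved <= 0 or total_time + extra > deadline:
                    continue  # no energy gain, or the deadline would be missed
                ratio = saved / extra if extra > 0 else float("inf")
                if best is None or ratio > best[0]:
                    best = (ratio, p, m, extra)
        if best is None:
            return choice
        _, p, m, extra = best
        choice[p] = m
        total_time += extra


def optimal_mode_selection(phases, deadline):
    """Exhaustive search, usable only for tiny instances, as a reference."""
    best = None
    for combo in product(*(range(len(modes)) for modes in phases)):
        t = sum(phases[p][m][0] for p, m in enumerate(combo))
        e = sum(phases[p][m][1] for p, m in enumerate(combo))
        if t <= deadline and (best is None or e < best[0]):
            best = (e, list(combo))
    return None if best is None else best[1]


def total_energy(phases, choice):
    return sum(phases[p][m][1] for p, m in enumerate(choice))


if __name__ == "__main__":
    # Two phases; modes are listed from fastest (most energy) to slowest.
    phases = [[(10, 100), (14, 60), (20, 40)], [(5, 50), (7, 30)]]
    picked = greedy_mode_selection(phases, deadline=22)
    print(picked, total_energy(phases, picked))
```

The exhaustive search plays the role that the ILP formulation plays in the abstract: it gives an optimal reference against which the heuristic's energy can be compared on small inputs.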

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Miniaturization of devices and the ensuing decrease in the threshold voltage have led to a substantial increase in the leakage component of the total processor energy consumption. Because of the relatively simple issue logic and the large number of functional units in VLIW and clustered VLIW architectures, a large fraction of this leakage energy is consumed in the functional units. However, functional units are not fully utilized in VLIW architectures because of the inherent variations in the ILP of programs. This underutilization is even more pronounced in clustered VLIW architectures because of contention for the limited number of slow intercluster communication channels, which leads to many short idle cycles. In the past, some architectural schemes have been proposed to obtain leakage energy benefits by aggressively exploiting the idleness of functional units. However, the presence of many short idle cycles causes frequent transitions from the active mode to the sleep mode and vice versa, and adversely affects the energy benefits of a purely hardware-based scheme. In this paper, we propose and evaluate a compiler instruction scheduling algorithm that assists such a hardware-based scheme in the context of VLIW and clustered VLIW architectures. The proposed scheme exploits the scheduling slacks of instructions to orchestrate the functional unit mapping with the objective of reducing the number of transitions in functional units, thereby keeping them off for a longer duration. The proposed compiler-assisted scheme obtains a further 12% reduction in the energy consumption of functional units, with negligible performance degradation, over a hardware-only scheme for a VLIW architecture. The benefits are 15% and 17% for a 2-clustered and a 4-clustered VLIW architecture respectively. Our test bed uses the Trimaran compiler infrastructure.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We consider evolving exponential random geometric graphs (RGGs) in one dimension and characterize the time-dependent behavior of some of their topological properties. We consider two evolution models and study one of them in detail while providing a summary of the results for the other. In the first model, the inter-nodal gaps evolve according to an exponential AR(1) process that makes the stationary distribution of the node locations exponential. For this model we obtain the one-step conditional connectivity probabilities and extend them to the k-step case. Finite and asymptotic analyses are given. We then obtain the k-step connectivity probability conditioned on the network being disconnected. We also derive the pmf of the first passage time for a connected network to become disconnected. We then describe a random birth-death model where, at each instant, the node locations evolve according to an AR(1) process. In addition, a random node is allowed to die while giving birth to a node at another location. We derive properties similar to those above.
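The first evolution model in the abstract above can be explored numerically with a short simulation. This sketch rests on several assumptions that are not stated in the abstract: gaps are independent and exponentially distributed with rate `lam` at stationarity, two neighbouring nodes are linked when their gap is at most a range `r` (so the network is connected exactly when every gap is at most `r`), and the AR(1) update is the classical exponential AR(1) recursion, in which the innovation is zero with probability `rho` and a fresh exponential draw otherwise. The paper's actual process may differ, and all function names are hypothetical.

```python
import math
import random


def is_connected(gaps, r):
    """A 1-D network is connected when no inter-nodal gap exceeds the range r."""
    return all(g <= r for g in gaps)


def static_connectivity_probability(n_nodes, lam, r):
    """P(connected) for n_nodes with i.i.d. Exp(lam) gaps: (1 - e^{-lam r})^(n-1)."""
    return (1.0 - math.exp(-lam * r)) ** (n_nodes - 1)


def ear1_step(gaps, rho, lam, rng):
    """One step of the assumed exponential AR(1) update, applied to each gap.

    new_gap = rho * gap + innovation, where the innovation is 0 with
    probability rho and Exp(lam) otherwise; this keeps Exp(lam) stationary.
    """
    new_gaps = []
    for g in gaps:
        innovation = 0.0 if rng.random() < rho else rng.expovariate(lam)
        new_gaps.append(rho * g + innovation)
    return new_gaps


def estimate_one_step_connectivity(n_nodes, lam, r, rho, trials=10000, seed=0):
    """Monte Carlo estimate of P(connected at t+1 | connected at t)."""
    rng = random.Random(seed)
    connected_now = 0
    connected_next = 0
    for _ in range(trials):
        gaps = [rng.expovariate(lam) for _ in range(n_nodes - 1)]
        if not is_connected(gaps, r):
            continue
        connected_now += 1
        if is_connected(ear1_step(gaps, rho, lam, rng), r):
            connected_next += 1
    return connected_next / connected_now if connected_now else float("nan")


if __name__ == "__main__":
    print("static:", static_connectivity_probability(5, 1.0, 2.0))
    print("one-step conditional:", estimate_one_step_connectivity(5, 1.0, 2.0, 0.5))
```

A simulation like this only gives estimates; the closed-form one-step and k-step probabilities are what the paper itself derives.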

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Due to the importance of collective communications in scientific parallel applications, many strategies have been devised for optimizing collective communications for different kinds of parallel environments. There has been increasing interest in evolving efficient broadcast algorithms for computational grids. In this paper, we present application-oriented adaptive techniques that take into account resource characteristics as well as the application's usage of broadcasts for deriving efficient broadcast trees. In particular, we consider two broadcast parameters used in the application, namely the broadcast message sizes and the time interval between broadcasts. The results indicate that our adaptive strategies can provide a 20% average improvement in performance over the popular MPICH-G2 MPI_Bcast implementation under loaded network conditions.