Biblioteca Digital

864 resultados para High performance processors

Synergistic crystal facet engineering and structural control of WO3 films exhibiting unprecedented photoelectrochemical performance

Relevância:

90.00% 90.00%

Publicador:

Resumo:

WO3 nanoplate arrays with (002) oriented facets grown on fluorine doped SnO2 (FTO) glass substrates are tailored by tuning the precursor solution via a facile hydrothermal method. A 2-step hydrothermal method leads to the preferential growth of WO3 film with enriched (002) facets, which exhibits extraordinary photoelectrochemical (PEC) performance with a remarkable photocurrent density of 3.7 mA cm–2 at 1.23 V vs. revisable hydrogen electrode (RHE) under AM 1.5 G illumination without the use of any cocatalyst, corresponding to ~93% of the theoretical photocurrent of WO3. Density functional theory (DFT) calculations together with experimental studies reveal that the enhanced photocatalytic activity and better photo-stability of the WO3 films are attributed to the synergistic effect of highly reactive (002) facet and nanoplate structure which facilitates the photo–induced charge carrier separation and suppresses the formation of peroxo-species. Without the use of oxygen evolution cocatalysts, the excellent PEC performance, demonstrated in this work, by simply tuning crystal facets and nanostructure of pristine WO3 films may open up new opportunities in designing high performance photoanodes for PEC water splitting.

A cost effective pipelined divider for double precision floating point number

Relevância:

90.00% 90.00%

Publicador:

Resumo:

he growth of high-performance application in computer graphics, signal processing and scientific computing is a key driver for high performance, fixed latency; pipelined floating point dividers. Solutions available in the literature use large lookup table for double precision floating point operations.In this paper, we propose a cost effective, fixed latency pipelined divider using modified Taylor-series expansion for double precision floating point operations. We reduce chip area by using a smaller lookup table. We show that the latency of the proposed divider is 49.4 times the latency of a full-adder. The proposed divider reduces chip area by about 81% than the pipelined divider in [9] which is based on modified Taylor-series.

Construction and use of linear regression models for processor performance analysis

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Processor architects have a challenging task of evaluating a large design space consisting of several interacting parameters and optimizations. In order to assist architects in making crucial design decisions, we build linear regression models that relate Processor performance to micro-architecture parameters, using simulation based experiments. We obtain good approximate models using an iterative process in which Akaike's information criteria is used to extract a good linear model from a small set of simulations, and limited further simulation is guided by the model using D-optimal experimental designs. The iterative process is repeated until desired error bounds are achieved. We used this procedure to establish the relationship of the CPI performance response to 26 key micro-architectural parameters using a detailed cycle-by-cycle superscalar processor simulator The resulting models provide a significance ordering on all micro-architectural parameters and their interactions, and explain the performance variations of micro-architectural techniques.

Area Optimized H.264 Intra Prediction Architecture for 1080p HD Resolution

Relevância:

90.00% 90.00%

Publicador:

Resumo:

High performance video standards use prediction techniques to achieve high picture quality at low bit rates. The type of prediction decides the bit rates and the image quality. Intra Prediction achieves high video quality with significant reduction in bit rate. This paper present an area optimized architecture for Intra prediction, for H.264 decoding at HDTV resolution with a target of achieving 60 fps. The architecture was validated on Virtex-5 FPGA based platform. The architecture achieves a frame rate of 64 fps. The architecture is based on multi-level memory hierarchy to reduce latency and ensure optimum resources utilization. It removes redundancy by reusing same functional blocks across different modes. The proposed architecture uses only 13% of the total LUTs available on the Xilinx FPGA XC5VLX50T.

Executing Long-running Multi-component Applications on Batch Grids

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Computational grids are increasingly being used for executing large multi-component scientific applications. The most widely reported advantages of application execution on grids are the performance benefits, in terms of speeds, problem sizes or quality of solutions, due to increased number of processors. We explore the possibility of improved performance on grids without increasing the application’s processor space. For this, we consider grids with multiple batch systems. We explore the challenges involved in and the advantages of executing long-running multi-component applications on multiple batch sites with a popular multi-component climate simulation application, CCSM, as the motivation.We have performed extensive simulation studies to estimate the single and multi-site execution rates of the applications for different system characteristics.Our experiments show that in many cases, multiple batch executions can have better execution rates than a single site execution.

Adaptive Filtering Technique and DSP Based Implementation for High-Speed Distance Protection

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The paper presents an adaptive Fourier filtering technique and a relaying scheme based on a combination of a digital band-pass filter along with a three-sample algorithm, for applications in high-speed numerical distance protection. To enhance the performance of above-mentioned technique, a high-speed fault detector has been used. MATLAB based simulation studies show that the adaptive Fourier filtering technique provides fast tripping for near faults and security for farther faults. The digital relaying scheme based on a combination of digital band-pass filter along with three-sample data window algorithm also provides accurate and high-speed detection of faults. The paper also proposes a high performance 16-bit fixed point DSP (Texas Instruments TMS320LF2407A) processor based hardware scheme suitable for implementation of the above techniques. To evaluate the performance of the proposed relaying scheme under steady state and transient conditions, PC based menu driven relay test procedures are developed using National Instruments LabVIEW software. The test signals are generated in real time using LabVIEW compatible analog output modules. The results obtained from the simulation studies as well as hardware implementations are also presented.

Limits of Data Level Parallelism

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Abstract—A new breed of processors like the Cell Broadband Engine, the Imagine stream processor and the various GPU processors emphasize data-level parallelism (DLP) and threadlevel parallelism (TLP) as opposed to traditional instructionlevel parallelism (ILP). This allows them to achieve order-ofmagnitude improvements over conventional superscalar processors for many workloads. However, it is unclear as to how much parallelism of these types exists in current programs. Most earlier studies have largely concentrated on the amount of ILP in a program, without differentiating DLP or TLP. In this study, we investigate the extent of data-level parallelism available in programs in the MediaBench suite. By packing instructions in a SIMD fashion, we observe reductions of up to 91 % (84 % on average) in the number of dynamic instructions, indicating a very high degree of DLP in several applications. I.

Exploiting Programmable Network Interfaces for Parallel Query Execution in Workstation Clusters

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Workstation clusters equipped with high performance interconnect having programmable network processors facilitate interesting opportunities to enhance the performance of parallel application run on them. In this paper, we propose schemes where certain application level processing in parallel database query execution is performed on the network processor. We evaluate the performance of TPC-H queries executing on a high end cluster where all tuple processing is done on the host processor, using a timed Petri net model, and find that tuple processing costs on the host processor dominate the execution time. These results are validated using a small cluster. We therefore propose 4 schemes where certain tuple processing activity is offloaded to the network processor. The first 2 schemes offload the tuple splitting activity - computation to identify the node on which to process the tuples, resulting in an execution time speedup of 1.09 relative to the base scheme, but with I/O bus becoming the bottleneck resource. In the 3rd scheme in addition to offloading tuple processing activity, the disk and network interface are combined to avoid the I/O bus bottleneck, which results in speedups up to 1.16, but with high host processor utilization. Our 4th scheme where the network processor also performs apart of join operation along with the host processor, gives a speedup of 1.47 along with balanced system resource utilizations. Further we observe that the proposed schemes perform equally well even in a scaled architecture i.e., when the number of processors is increased from 2 to 64

Integrated parallelization of computations and visualization for large-scale applications

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Critical applications like cyclone tracking and earthquake modeling require simultaneous high-performance simulations and online visualization for timely analysis. Faster simulations and simultaneous visualization enable scientists provide real-time guidance to decision makers. In this work, we have developed an integrated user-driven and automated steering framework that simultaneously performs numerical simulations and efficient online remote visualization of critical weather applications in resource-constrained environments. It considers application dynamics like the criticality of the application and resource dynamics like the storage space, network bandwidth and available number of processors to adapt various application and resource parameters like simulation resolution, simulation rate and the frequency of visualization. We formulate the problem of finding an optimal set of simulation parameters as a linear programming problem. This leads to 30% higher simulation rate and 25-50% lesser storage consumption than a naive greedy approach. The framework also provides the user control over various application parameters like region of interest and simulation resolution. We have also devised an adaptive algorithm to reduce the lag between the simulation and visualization times. Using experiments with different network bandwidths, we find that our adaptive algorithm is able to reduce lag as well as visualize the most representative frames.

Extended histories: improving regularity and performance in correlation prefetchers

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Data Prefetchers identify and make use of any regularity present in the history/training stream to predict future references and prefetch them into the cache. The training information used is typically the primary misses seen at a particular cache level, which is a filtered version of the accesses seen by the cache. In this work we demonstrate that extending the training information to include secondary misses and hits along with primary misses helps improve the performance of prefetchers. In addition to empirical evaluation, we use the information theoretic metric entropy, to quantify the regularity present in extended histories. Entropy measurements indicate that extended histories are more regular than the default primary miss only training stream. Entropy measurements also help corroborate our empirical findings. With extended histories, further benefits can be achieved by triggering prefetches during secondary misses also. In this paper we explore the design space of extended prefetch histories and alternative prefetch trigger points for delta correlation prefetchers. We observe that different prefetch schemes benefit to a different extent with extended histories and alternative trigger points. Also the best performing design point varies on a per-benchmark basis. To meet these requirements, we propose a simple adaptive scheme that identifies the best performing design point for a benchmark-prefetcher combination at runtime. In SPEC2000 benchmarks, using all the L2 accesses as history for prefetcher improves the performance in terms of both IPC and misses reduced over techniques that use only primary misses as history. The adaptive scheme improves the performance of CZone prefetcher over Baseline by 4.6% on an average. These performance gains are accompanied by a moderate reduction in the memory traffic requirements.

Optimization of HfO2 films for high transconductance back gated graphene transistors

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Hafnium dioxide (HfO2) films, deposited using electron beam evaporation, are optimized for high performance back-gated graphene transistors. Bilayer graphene is identified on HfO2/Si substrate using optical microscope and subsequently confirmed with Raman spectroscopy. Back-gated graphene transistor, with 32 nm thick HfO2 gate dielectric, has been fabricated with very high transconductance value of 60 mu S. From the hysteresis of the current-voltage characteristics, we estimate the trap density in HfO2 to be in the mid 10(11)/cm(2) range, comparable to SiO2.

A divide and conquer strategy for scaling weather simulations with multiple regions of interest

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Accurate and timely prediction of weather phenomena, such as hurricanes and flash floods, require high-fidelity compute intensive simulations of multiple finer regions of interest within a coarse simulation domain. Current weather applications execute these nested simulations sequentially using all the available processors, which is sub-optimal due to their sub-linear scalability. In this work, we present a strategy for parallel execution of multiple nested domain simulations based on partitioning the 2-D processor grid into disjoint rectangular regions associated with each domain. We propose a novel combination of performance prediction, processor allocation methods and topology-aware mapping of the regions on torus interconnects. Experiments on IBM Blue Gene systems using WRF show that the proposed strategies result in performance improvement of up to 33% with topology-oblivious mapping and up to additional 7% with topology-aware mapping over the default sequential strategy.

Room-temperature bandlike transport and Hall effect in a high-mobility ambipolar polymer

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The advent of a new class of high-mobility semiconducting polymers opens up a window to address fundamental issues in electrical transport mechanism such as transport between localized states versus extended state conduction. Here, we investigate the origin of the ultralow degree of disorder (E-a similar to 16 meV) and the ``bandlike'' negative temperature (T) coefficient of the field effect electron mobility: mu(e)(FET) (T) in a high performance (mu(e)(FET) > 2.5 cm(2) V-1 s(-1)) diketopyrrolopyrrole based semiconducting polymer. Models based on the framework of mobility edge with exponential density of states are invoked to explain the trends in transport. The temperature window over which the system demonstrates delocalized transport was tuned by a systematic introduction of disorder at the transport interface. Additionally, the Hall mobility (mu(e)(Hall)) extracted from Hall voltage measurements in these devices was found to be comparable to field effect mobility (mu(e)(FET)) in the high T bandlike regime. Comprehensive studies with different combinations of dielectrics and semiconductors demonstrate the effectiveness of rationale molecular design, which emphasizes uniform-energetic landscape and low reorganization energy.

Zinc oxide nanostructures and high electron mobility nanocomposite thin film transistors

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper reports on the synthesis of zinc oxide (ZnO) nanostructures and examines the performance of nanocomposite thin-film transistors (TFTs) fabricated using ZnO dispersed in both n- and p-type polymer host matrices. The ZnO nanostructures considered here comprise nanowires and tetrapods and were synthesized using vapor phase deposition techniques involving the carbothermal reduction of solid-phase zinc-containing compounds. Measurement results of nanocomposite TFTs based on dispersion of ZnO nanorods in an n-type organic semiconductor ([6, 6]-phenyl-C61-butyric acid methyl ester) show electron field-effect mobilities in the range 0.3-0.6 cm2V-1 s-1. representing an approximate enhancement by as much as a factor of 40 from the pristine state. The on/off current ratio of the nanocomposite TFTs approach 106 at saturation with off-currents on the order of 10 pA. The results presented here, although preliminary, show a highly promising enhancement for realization of high-performance solution-processable n-type organic TFTs. © 2008 IEEE.

High temperature direct double side cooled inverter module for hybrid electric vehicle application

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In this paper a novel approach to the design and fabrication of a high temperature inverter module for hybrid electrical vehicles is presented. Firstly, SiC power electronic devices are considered in place of the conventional Si devices. Use of SiC raises the maximum practical operating junction temperature to well over 200°C, giving much greater thermal headroom between the chips and the coolant. In the first fabrication, a SiC Schottky barrier diode (SBD) replaces the Si pin diode and is paired with a Si-IGBT. Secondly, double-sided cooling is employed, in which the semiconductor chips are sandwiched between two substrate tiles. The tiles provide electrical connections to the top and the bottom of the chips, thus replacing the conventional wire bonded interconnect. Each tile assembly supports two IGBTs and two SBDs in a half-bridge configuration. Both sides of the assembly are cooled directly using a high-performance liquid impingement system. Specific features of the design ensure that thermo-mechanical stresses are controlled so as to achieve long thermal cycling life. A prototype 10 kW inverter module is described incorporating three half-bridge sandwich assemblies, gate drives, dc-link capacitance and two heat-exchangers. This achieves a volumetric power density of 30W/cm3.

«
1
2
...
45
46
47
48
49
50
51
...
57
58
»