134 results for Hardware Transactional Memory


Relevance:

20.00%

Publisher:

Abstract:

The StreamIt programming model has been proposed to exploit parallelism in streaming applications on general-purpose multi-core architectures. This model allows programmers to specify the structure of a program as a set of filters that act upon data, and a set of communication channels between them. StreamIt graphs describe task, data and pipeline parallelism, which can be exploited on modern Graphics Processing Units (GPUs), as they support abundant parallelism in hardware. In this paper, we describe the challenges in mapping StreamIt to GPUs and propose an efficient technique to software pipeline the execution of stream programs on GPUs. We formulate this problem, both scheduling and assignment of filters to processors, as an efficient Integer Linear Program (ILP), which is then solved using ILP solvers. We also describe a novel buffer layout technique for GPUs which facilitates exploiting the high memory bandwidth available in GPUs. The proposed scheduling utilizes both the scalar units in the GPU, to exploit data parallelism, and the multiprocessors, to exploit task and pipeline parallelism. Further, it takes into consideration the synchronization and bandwidth limitations of GPUs, and yields speedups between 1.87X and 36.83X over a single-threaded CPU.

Relevance:

20.00%

Publisher:

Abstract:

Serine hydroxymethyltransferase (SHMT) belongs to the alpha-family of pyridoxal 5'-phosphate-dependent enzymes and catalyzes the reversible conversion of L-Ser and tetrahydrofolate to Gly and 5,10-methylene tetrahydrofolate. 5,10-Methylene tetrahydrofolate serves as a source of one-carbon fragments in many biological processes. SHMT also catalyzes the tetrahydrofolate-independent conversion of L-allo-Thr to Gly and acetaldehyde. The crystal structure of Bacillus stearothermophilus SHMT (bsSHMT) suggested that E53 interacts with the substrates, L-Ser and tetrahydrofolate. To elucidate the role of E53, it was mutated to Q, and structural and biochemical studies were carried out with the mutant enzyme. The internal aldimine structure of E53QbsSHMT was similar to that of the wild-type enzyme, except for significant changes at Q53, Y60 and Y61. The carboxyl of Gly and the side chain of L-Ser were in two conformations in the respective external aldimine structures. The mutant enzyme was completely inactive for tetrahydrofolate-dependent cleavage of L-Ser, whereas there was a 1.5-fold increase in the rate of the tetrahydrofolate-independent reaction with L-allo-Thr. The results obtained from these studies suggest that E53 plays an essential role in tetrahydrofolate/5-formyl tetrahydrofolate binding and in the proper positioning of C beta of L-Ser for direct attack by N5 of tetrahydrofolate. Most interestingly, the structure of the complex obtained by cocrystallization of E53QbsSHMT with Gly and 5-formyl tetrahydrofolate revealed the gem-diamine form of pyridoxal 5'-phosphate bound to Gly and the active site Lys. However, density for 5-formyl tetrahydrofolate was not observed. The Gly carboxylate was in a single conformation, whereas pyridoxal 5'-phosphate had two distinct conformations. The differences between the structures of this complex and the Gly external aldimine suggest that the changes induced by initial binding of 5-formyl tetrahydrofolate are retained even though 5-formyl tetrahydrofolate is absent in the final structure. Spectral studies carried out with this mutant enzyme also suggest that 5-formyl tetrahydrofolate binds to the E53QbsSHMT-Gly complex, forming a quinonoid intermediate, and falls off within 4 h of dialysis, leaving behind the mutant enzyme in the gem-diamine form. This is the first report to provide direct evidence for enzyme memory based on the crystal structure of enzyme complexes.

Relevance:

20.00%

Publisher:

Abstract:

The fluctuation of the distance between a fluorescein-tyrosine pair within a single protein complex was directly monitored in real time by photoinduced electron transfer and found to be a stationary, time-reversible, and non-Markovian Gaussian process. Within the generalized Langevin equation formalism, we experimentally determine the memory kernel $K(t)$, which is proportional to the autocorrelation function of the random fluctuating force. $K(t)$ is a power-law decay, $t^{-0.51 \pm 0.07}$, over a broad range of time scales ($10^{-3}$ to $10$ s). Such a long-time memory effect could have implications for protein functions.

Relevance:

20.00%

Publisher:

Abstract:

Grid-connected systems, when put to use at the site, experience scenarios such as voltage sag, voltage swell, frequency deviations and unbalance, which are common in the real-world grid. When these systems are tested in the laboratory, these scenarios do not exist, and an almost stiff voltage source is what is usually seen. To qualify grid-connected systems for operation at the site, however, it becomes essential to test them under the grid conditions mentioned above. The grid simulator is a piece of hardware that can be programmed to generate some of the typical conditions experienced by grid-connected systems at the site. It is an inverter that is controlled to act like a voltage source in series with a grid impedance. The series grid impedance is emulated virtually within the inverter control rather than through physical components, thus avoiding the losses and the need for bulky reactive components. This paper describes the design of a grid simulator. Control implementation issues are highlighted in the experimental results.

Relevance:

20.00%

Publisher:

Abstract:

Template matching is concerned with measuring the similarity between patterns of two objects. This paper proposes a memory-based reasoning approach for pattern recognition of binary images with a large template set. Memory-based reasoning intrinsically requires a large database. Moreover, some binary image recognition problems inherently need large template sets, such as the recognition of Chinese characters, which needs thousands of templates. The proposed algorithm is based on the Connection Machine, which is the most massively parallel machine to date, and uses a multiresolution method to search for the matching template. The approach uses the pyramid data structure for the multiresolution representation of the templates and the input image pattern. For a given binary image, it scans the template pyramid searching for the match. A binary image of N × N pixels can be matched in O(log N) time by our algorithm, independent of the number of templates. Implementation of the proposed scheme is described in detail.

Relevance:

20.00%

Publisher:

Abstract:

We have performed a series of magnetic aging experiments on single crystals of Dy0.5Sr0.5MnO3. The results demonstrate striking memory and chaos-like effects in this insulating half-doped perovskite manganite and suggest the existence of strong magnetic relaxation mechanisms of a clustered magnetic state. The spin-glass-like state established below a temperature $T_{sg} \approx 34$ K originates from quenched disorder arising due to the ionic-radii mismatch at the rare earth site. However, deviations from the typical behavior seen in canonical spin glass materials are observed, which indicate that the glassy magnetic properties are due to cooperative and frustrated dynamics in a heterogeneous or clustered magnetic state. In particular, the microscopic spin flip time obtained from dynamical scaling near the spin glass freezing temperature is four orders of magnitude larger than microscopic times found in atomic spin glasses. The magnetic viscosity deduced from the time dependence of the zero-field-cooled magnetization exhibits a peak at a temperature $T < T_{sg}$ and displays a marked dependence on waiting time in zero field.

Relevance:

20.00%

Publisher:

Abstract:

Massively parallel SIMD computing is applied to obtain an order of magnitude improvement in the execution speed of an important algorithm in VLSI design automation. The physical design of a VLSI circuit involves logic module placement as a subtask. The paper is concerned with accelerating the well-known Min-cut placement technique for logic cell placement. The inherent parallelism of the Min-cut algorithm is identified, and it is shown that a parallel machine based on the efficient execution of the placement procedure.

Relevance:

20.00%

Publisher:

Abstract:

The effect of thermal cycling on the load-controlled tension-tension fatigue behavior of a Ni-Ti-Fe shape memory alloy (SMA) at room temperature was studied. Considerable strain accumulation was observed to occur in this alloy under both quasi-static and cyclic loading conditions. Though steady state is reached within the first 50-100 cycles in all cases, the accumulated steady-state strain, $\epsilon_{p,ss}$, is much smaller in the thermally cycled alloy. As a result, the fatigue performance of the thermally cycled alloy was found to be significantly enhanced vis-a-vis the as-solutionized alloy. Furthermore, under load-controlled conditions, the fatigue life of Ni-Ti-Fe alloys was found to be exclusively dependent on $\epsilon_{p,ss}$. Observations made by profilometry and differential scanning calorimetry (DSC) indicate that the 200-500% enhancement in fatigue life of the thermally cycled alloy is due to the homogeneous distribution of the accumulated fatigue strain. (C) 2010 Elsevier B.V. All rights reserved.

Relevance:

20.00%

Publisher:

Abstract:

This paper discusses implementation details of efficient schemes for lenient execution and concurrent execution of re-entrant routines in a data flow model. The proposed schemes require no extra hardware support and effectively utilise existing hardware resources, such as the Matching Unit and the Memory Network Interface, to achieve these goals.

Relevance:

20.00%

Publisher:

Abstract:

An associative memory with parallel architecture is presented. The neurons are modelled by perceptrons having only binary, rather than continuous-valued, input. To store m elements each having n features, m neurons each with n connections are needed. The n features are coded as an n-bit binary vector. The weights of the n connections that store the n features of an element have only two values, -1 and 1, corresponding to the absence or presence of a feature. This makes the learning very simple and straightforward. For an input corrupted by binary noise, the associative memory indicates the element that is closest (in terms of Hamming distance) to the noisy input. In the case where the noisy input is equidistant from two or more stored vectors, the associative memory indicates two or more elements simultaneously. From some simple experiments performed on the human memory and also on the associative memory, it can be concluded that the associative memory presented in this paper is in some respects more akin to a human memory than a Hopfield model.
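
The storage and recall rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `train` and `recall` are hypothetical, and the sketch assumes the binary input is mapped to -1/+1 before the dot product, so that each neuron's score equals n minus twice the Hamming distance to its stored element.

```python
def train(patterns):
    """Store each n-bit element as one neuron's weight row.

    A feature that is present (1) gets weight +1, an absent one (0) gets -1.
    """
    return [[1 if bit else -1 for bit in p] for p in patterns]


def recall(weights, x):
    """Return the indices of the stored elements closest to x in Hamming distance.

    Assumption: the 0/1 input is converted to -1/+1, so each neuron's
    score is n - 2 * (Hamming distance to its stored element).
    Ties return several indices, matching the behaviour in the abstract.
    """
    bipolar = [1 if bit else -1 for bit in x]
    scores = [sum(w * b for w, b in zip(row, bipolar)) for row in weights]
    best = max(scores)
    return [i for i, s in enumerate(scores) if s == best]


if __name__ == "__main__":
    stored = [[1, 0, 1, 1], [0, 1, 0, 0], [1, 1, 0, 0]]
    weights = train(stored)
    # One bit of the first element is flipped; the first neuron still wins.
    print(recall(weights, [1, 0, 1, 0]))
```

Learning here is a single pass that copies each element into a weight row, which is the sense in which the abstract calls it simple.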

Relevance:

20.00%

Publisher:

Abstract:

NiTi thin films deposited by DC magnetron sputtering of an alloy (Ni/Ti:45/55) target at different deposition rates and substrate temperatures were analyzed for their structure and mechanical properties. The crystalline structure, phase transformation and mechanical response were characterized by X-ray diffraction (XRD), Differential Scanning Calorimetry (DSC) and nano-indentation techniques, respectively. The films were deposited on silicon substrates maintained at temperatures in the range 300 to 500 degrees C and post-annealed at 600 degrees C for four hours to ensure film crystallinity. Films deposited at 300 degrees C and annealed at 600 degrees C exhibited crystalline behavior with the Austenite phase as the prominent phase. Deposition onto substrates held at higher deposition temperatures (400 and 500 degrees C) resulted in the co-existence of the Austenite phase along with the Martensite phase. The increase in deposition rate corresponding to an increase in cathode current from 250 to 350 mA also resulted in the appearance of the Martensite phase as well as an improvement in crystallinity. XRD analysis revealed that the crystalline film structure is strongly influenced by process parameters such as substrate temperature and deposition rate. DSC results indicate that the film deposited at 300 degrees C had its crystallization temperature at 445 degrees C in the first thermal cycle, which is further confirmed by the stress-temperature response. In the second thermal cycle, the Austenite and Martensite transitions were observed at 75 and 60 degrees C, respectively. However, the films deposited at 500 degrees C had the Austenite and Martensite transitions at 73 and 58 degrees C, respectively. Elastic modulus and hardness values increased from 93 to 145 GPa and 7.2 to 12.6 GPa, respectively, with increase in deposition rates. These results are explained on the basis of change in film composition and crystallization. (C) 2010 Published by Elsevier Ltd

Relevance:

20.00%

Publisher:

Abstract:

The performance of a program will ultimately be limited by its serial (scalar) portion, as pointed out by Amdahl's Law. Reported studies thus far of instruction-level parallelism have mixed data-parallel program portions with scalar program portions, often leading to contradictory and controversial results. We report an instruction-level behavioral characterization of scalar code containing minimal data-parallelism, extracted from highly vectorized programs of the PERFECT benchmark suite running on a Cray Y-MP system. We classify scalar basic blocks according to their instruction mix, characterize the data dependencies seen in each class, and, as a first step, measure the maximum intrablock instruction-level parallelism available. We observe skewed rather than balanced instruction distributions in scalar code and in individual basic block classes of scalar code; nonuniform distribution of parallelism across instruction classes; and, as expected, limited available intrablock parallelism. We identify frequently occurring data-dependence patterns and discuss new instructions to reduce latency. Toward effective scalar hardware, we study latency-pipelining trade-offs and restricted multiple instruction issue mechanisms.
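
As a brief illustration of the Amdahl's Law limit that this abstract opens with, the sketch below computes the overall speedup when only the non-serial portion of a program is accelerated. The function name `amdahl_speedup` and the example numbers are illustrative assumptions, not figures from the paper.

```python
def amdahl_speedup(serial_fraction, parallel_speedup):
    """Overall speedup when only the non-serial part runs faster.

    serial_fraction: share of run time spent in serial (scalar) code, 0..1.
    parallel_speedup: factor by which the remaining portion is accelerated.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if parallel_speedup <= 0:
        raise ValueError("parallel_speedup must be positive")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / parallel_speedup)


if __name__ == "__main__":
    # Hypothetical program with 10% scalar code: the speedup approaches
    # but never exceeds 1 / 0.10 = 10, however fast the vector part gets.
    for factor in (10, 100, 1000):
        print(factor, round(amdahl_speedup(0.10, factor), 2))
```

The bound of 1 / serial_fraction is why the scalar portion, rather than the vectorized portion, ends up determining achievable performance.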

Relevance:

20.00%

Publisher:

Abstract:

A large external memory bandwidth requirement leads to increased system power dissipation and cost in video coding applications. The majority of the external memory traffic in a video encoder is due to reference data accesses. We describe a lossy reference frame compression technique that can be used in video coding with minimal impact on quality while significantly reducing power and bandwidth requirements. The low-cost transformless compression technique uses a lossy reference for motion estimation to reduce memory traffic, and a lossless reference for motion compensation (MC) to avoid drift. Thus, it is compatible with all existing video standards. We calculate the quantization error bound and show that by storing the quantization error separately, the bandwidth overhead due to MC can be reduced significantly. The technique meets key requirements specific to the video encoding application. A 24-39% reduction in peak bandwidth and a 23-31% reduction in total average power consumption are observed for IBBP sequences.
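
The idea of keeping a lossy reference for motion estimation while storing the quantization error separately for motion compensation can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the quantizer, so the sketch assumes that the low `drop_bits` bits of each 8-bit pixel are truncated, and the names `split_reference` and `reconstruct` are hypothetical.

```python
def split_reference(pixels, drop_bits=2):
    """Split 8-bit pixels into a lossy reference and a separate error plane.

    Assumed quantizer (not from the paper): zero the low `drop_bits` bits.
    The error then always lies in [0, 2**drop_bits - 1], which bounds
    the quantization error of the lossy reference.
    """
    lossy = [(p >> drop_bits) << drop_bits for p in pixels]
    error = [p - q for p, q in zip(pixels, lossy)]
    return lossy, error


def reconstruct(lossy, error):
    """Rebuild the lossless reference needed for drift-free motion compensation."""
    return [q + e for q, e in zip(lossy, error)]


if __name__ == "__main__":
    frame = [0, 5, 130, 255]
    lossy, error = split_reference(frame, drop_bits=2)
    print(lossy)   # coarse values, sufficient for motion estimation
    print(error)   # small residuals, fetched only for motion compensation
    print(reconstruct(lossy, error) == frame)
```

Under this assumption, motion estimation reads only the coarse plane, and the small error plane is fetched only for the blocks that motion compensation actually uses.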

Relevance:

20.00%

Publisher:

Abstract:

Polycrystalline strontium titanate (SrTiO3) films were prepared by a pulsed laser deposition technique on p-type silicon and platinum-coated silicon substrates. The films exhibited good structural and dielectric properties which were sensitive to the processing conditions. The small signal dielectric constant and dissipation factor at a frequency of 100 kHz were about 225 and 0.03, respectively. The capacitance-voltage (C-V) characteristics in metal-insulator-semiconductor structures exhibited anomalous frequency dispersion behavior and a hysteresis effect. The hysteresis in the C-V curve was found to be about 1 V and of a charge injection type. The density of interface states was about $1.79 \times 10^{12}$ cm$^{-2}$. The charge storage density was found to be 40 fC $\mu$m$^{-2}$ at an applied electric field of 200 kV cm$^{-1}$. Studies on current-voltage characteristics indicated an ohmic nature at lower voltages and space charge conduction at higher voltages. The films also exhibited excellent time-dependent dielectric breakdown behavior.