182 results for Compute Unified Device Architecture (CUDA)
Abstract:
Cardiac arrhythmias, such as ventricular tachycardia (VT) and ventricular fibrillation (VF), are among the leading causes of death in the industrialized world. These are associated with the formation of spiral and scroll waves of electrical activation in cardiac tissue; single spiral and scroll waves are believed to be associated with VT, whereas their turbulent analogs are associated with VF. Thus, the study of these waves is an important biophysical problem. We present a systematic study of the combined effects of muscle-fiber rotation and inhomogeneities on scroll-wave dynamics in the TNNP (ten Tusscher-Noble-Noble-Panfilov) model for human cardiac tissue. In particular, we use the three-dimensional TNNP model with fiber rotation and consider both conduction and ionic inhomogeneities. We find that, in addition to displaying a sensitive dependence on the positions, sizes, and types of inhomogeneities, scroll-wave dynamics also depends delicately upon the degree of fiber rotation. We find that the tendency of scroll waves to anchor to cylindrical conduction inhomogeneities increases with the radius of the inhomogeneity. Furthermore, the filament of the scroll wave can exhibit drift or meandering, transmural bending, twisting, and break-up. If the scroll-wave filament exhibits weak meandering, then there is a fine balance between the anchoring of this wave at the inhomogeneity and a disruption of wave-pinning by fiber rotation. If this filament displays strong meandering, then again the anchoring is suppressed by fiber rotation; also, the scroll wave can be eliminated from most of the layers only to be regenerated by a seed wave. Ionic inhomogeneities can also lead to an anchoring of the scroll wave; scroll waves can now enter the region inside an ionic inhomogeneity and can display a coexistence of spatiotemporal chaos and quasi-periodic behavior in different parts of the simulation domain. We discuss the experimental implications of our study.
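As a rough illustration of the kind of simulation involved, the following minimal Python sketch evolves a scroll wave in a layered excitable medium with transmural fiber rotation. It substitutes the two-variable Barkley model for the much larger TNNP ionic model, uses periodic boundaries for brevity, and all parameter values are illustrative rather than taken from the paper.

import numpy as np

nx = ny = 64; nz = 8
dx, dt = 0.5, 0.02
D_par, D_perp, D_z = 1.0, 0.2, 0.5        # assumed anisotropic diffusivities
theta = np.linspace(0.0, np.pi/3, nz)     # fiber angle rotates 60 deg across depth
a, b, eps = 0.75, 0.06, 0.02              # Barkley excitability parameters (assumed)

u = np.zeros((nz, ny, nx)); v = np.zeros_like(u)
u[:, :, :nx//2] = 1.0                     # broken-wave initial condition
v[:, ny//2:, :] = 0.4                     #   -> curls up into a scroll wave

def lap_aniso(f, th):
    # In-plane anisotropic diffusion for fiber angle th:
    # D = R(th) diag(D_par, D_perp) R(th)^T; periodic boundaries via roll.
    c, s = np.cos(th), np.sin(th)
    Dxx = D_par*c*c + D_perp*s*s
    Dyy = D_par*s*s + D_perp*c*c
    Dxy = (D_par - D_perp)*c*s
    fxx = (np.roll(f, -1, 1) - 2*f + np.roll(f, 1, 1)) / dx**2
    fyy = (np.roll(f, -1, 0) - 2*f + np.roll(f, 1, 0)) / dx**2
    fxy = (np.roll(np.roll(f, -1, 0), -1, 1) - np.roll(np.roll(f, -1, 0), 1, 1)
         - np.roll(np.roll(f, 1, 0), -1, 1) + np.roll(np.roll(f, 1, 0), 1, 1)) / (4*dx**2)
    return Dxx*fxx + 2*Dxy*fxy + Dyy*fyy

for step in range(4000):
    lap = np.stack([lap_aniso(u[k], theta[k]) for k in range(nz)])
    lap += D_z*(np.roll(u, -1, 0) - 2*u + np.roll(u, 1, 0)) / dx**2  # transmural coupling
    u = u + dt*(u*(1 - u)*(u - (v + b)/a)/eps + lap)                 # excitation variable
    v = v + dt*(u - v)                                               # recovery variable

A conduction inhomogeneity could be modeled by zeroing the diffusivities inside a cylindrical region; the scroll-wave filament's response to such an obstacle is what the paper studies.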
Abstract:
We describe the design of a directory-based shared-memory architecture on a hierarchical network of hypercubes. The distributed directory scheme comprises two separate hierarchical networks for handling cache requests and transfers. Further, the scheme assumes a single address space, with each processing element viewing the entire network as a contiguous memory space. The size of the individual directory stored at each node is the same throughout the network. Although this directory size grows with the network size, the architecture remains scalable. The results of our analytical studies demonstrate superior performance characteristics of our scheme compared with those of other schemes.
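To make the directory mechanics concrete, here is a hedged Python sketch of flat directory-based coherence logic (presence bits plus a state per block). The paper's hierarchical two-network organization and hypercube topology are not reproduced; the send callback merely stands in for the request network.

from dataclasses import dataclass, field
from enum import Enum

class State(Enum):
    UNCACHED = 0
    SHARED = 1
    EXCLUSIVE = 2

@dataclass
class DirEntry:
    state: State = State.UNCACHED
    sharers: set = field(default_factory=set)   # presence bits, one per node

class Directory:
    def __init__(self, send):
        self.entries = {}    # block address -> DirEntry
        self.send = send     # stands in for the request/transfer networks

    def read_miss(self, addr, node):
        e = self.entries.setdefault(addr, DirEntry())
        if e.state is State.EXCLUSIVE:
            owner = next(iter(e.sharers))
            self.send(owner, ("write_back", addr))   # fetch the dirty copy first
        e.sharers.add(node)
        e.state = State.SHARED

    def write_miss(self, addr, node):
        e = self.entries.setdefault(addr, DirEntry())
        for s in e.sharers - {node}:
            self.send(s, ("invalidate", addr))       # invalidate other sharers
        e.sharers = {node}
        e.state = State.EXCLUSIVE

# Usage: d = Directory(print); d.read_miss(0x40, 1); d.write_miss(0x40, 2)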
Abstract:
Monoclonal antibodies have been used as probes to study the architecture of several plant viruses over the past decade. These studies complement the information obtained through X-ray crystallography and help in delineating epitopes on the surface of the virus. The monoclonal antibodies that recognize distinct epitopes also aid in unravelling the mechanisms of assembly/disassembly of virus particles. Group-specific and strain-specific monoclonal antibodies are widely used in the classification of viruses. The significant developments made in this emerging area are reviewed here with specific examples.
Abstract:
An approach to the constraint-counting theory of glasses is applied to many glass systems, including an oxide, a chalcohalide, and chalcogenides. Within this approach, the shift of the percolation threshold due to noncovalent bonding interactions in a basically covalent network, and other recent extensions of the theory, appear natural. The analysis is particularly insightful in revealing that the chemical threshold signifies another structural transition alongside the rigidity-percolation threshold, thus unifying these two seemingly disparate topological concepts.
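For readers unfamiliar with constraint counting, the short Python sketch below applies standard Phillips-Thorpe mean-field counting (not the paper's noncovalent extension): an atom of coordination r contributes r/2 bond-stretching and 2r - 3 bond-bending constraints, so the floppy-mode count per atom, f = 3 - r/2 - (2r - 3), vanishes at the rigidity threshold <r> = 2.4.

def floppy_modes(r_mean):
    # f = 3 - (r/2 + 2r - 3) = 6 - 5r/2; zero at <r> = 2.4
    return 3.0 - (r_mean/2.0 + 2.0*r_mean - 3.0)

# Ge_x Se_(1-x): Ge is 4-coordinated, Se is 2-coordinated.
# x = 1/3 is the chemical threshold (stoichiometric GeSe2).
for x in (0.10, 0.20, 1.0/3.0):
    r_mean = 4*x + 2*(1 - x)
    print(f"x={x:.3f}  <r>={r_mean:.3f}  f={floppy_modes(r_mean):+.3f}")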
Abstract:
A nondimensional number that is constant in two-dimensional, incompressible, constant-pressure laminar and fully turbulent boundary-layer flows has been proposed. An extension of this result to constant-pressure transitional flow is discussed.
Abstract:
Depth measures the extent of atom/residue burial within a protein. It correlates with properties such as protein stability, hydrogen-exchange rate, protein-protein interaction hot spots, post-translational modification sites, and sequence variability. Our server, DEPTH, accurately computes depth and solvent-accessible surface area (SASA) values. We show that depth can be used to predict small-molecule ligand-binding cavities in proteins. Often, some of the residues lining a ligand-binding cavity are both deep and solvent exposed. Using the depth-SASA pair values for a residue, its likelihood to form part of a small-molecule binding cavity is estimated. The parameters of the method were calibrated over a training set of 900 high-resolution X-ray crystal structures of single-domain proteins bound to small molecules (molecular weight < 1.5 kDa). The prediction accuracy of DEPTH is comparable to that of other geometry-based prediction methods, including LIGSITE, SURFNET and Pocket-Finder (all with Matthews correlation coefficients of about 0.4), over a testing set of 225 single- and multi-chain protein structures. Users can tune several parameters to detect cavities of different sizes, for example, geometrically flat binding sites. The input to the server is a protein 3D structure in PDB format; four parameters associated with the computation of residue depth and the prediction of binding cavities can be adjusted. The computed depths, SASA and binding-cavity predictions are displayed in 2D plots and mapped onto 3D representations of the protein structure using Jmol. Links are provided to download the outputs. Our server is useful for all structural analyses based on residue depth and SASA, such as guiding site-directed mutagenesis experiments and small-molecule docking exercises, in the context of protein functional annotation and drug discovery.
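A hedged Python sketch of the underlying idea follows. Depth is approximated here as the distance from each atom to the nearest solvent-exposed atom (DEPTH itself solvates the structure and measures the distance to the nearest bulk-water molecule), and the depth-SASA pairing is then evaluated per residue; the cutoff values are illustrative, not the server's calibrated parameters.

import numpy as np
from scipy.spatial import cKDTree

def atom_depth(coords, sasa, surface_cutoff=1.0):
    # Crude depth proxy: distance (Angstroms) from each atom to the nearest
    # solvent-exposed atom (per-atom SASA above the cutoff, in A^2).
    surface = coords[sasa > surface_cutoff]
    return cKDTree(surface).query(coords)[0]

def residue_depth_sasa(coords, sasa, res_ids):
    # Aggregate per residue: (mean atom depth, total residue SASA).
    # Residues scoring high on BOTH -- deep yet solvent exposed -- are the
    # cavity-lining candidates the abstract describes.
    depth = atom_depth(coords, sasa)
    out = {}
    for r in np.unique(res_ids):
        m = res_ids == r
        out[r] = (depth[m].mean(), sasa[m].sum())
    return out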
Abstract:
Molecular complexes of melamine with hydroxy- and dihydroxybenzoic acids have been analyzed to assess the collective role of the hydroxyl (OH) and carboxyl (COOH) functionalities in the recognition process. In most cases, solvents of crystallization play a major role in self-assembly and structure stabilization. Hydrated compounds generate linear chains of melamine molecules with pendant acid molecules, resulting in a zipper architecture. However, anhydrous and solvated compounds generate tetrameric units consisting of melamine dimers together with acid molecules. These tetramers in turn interweave to form a Lincoln log arrangement in the crystal. Salt/co-crystal formation in these complexes cannot be predicted a priori on the basis of ΔpKa values, as there exists a salt-to-co-crystal continuum.
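The ΔpKa rule of thumb invoked at the end can be stated in a few lines of Python. The pKa values in the example are approximate literature values, and the >3 / <0 thresholds are the conventional heuristic, not values from this paper.

def delta_pka_rule(pka_protonated_base, pka_acid):
    # Rule of thumb: dpKa = pKa(protonated base) - pKa(acid).
    # dpKa > ~3 usually yields a salt, dpKa < ~0 a co-crystal; the window
    # in between is the salt-to-co-crystal continuum, where the outcome
    # cannot be predicted a priori.
    d = pka_protonated_base - pka_acid
    if d > 3:
        return d, "salt (likely)"
    if d < 0:
        return d, "co-crystal (likely)"
    return d, "continuum: not predictable from dpKa alone"

# Approximate values: melamine (~5.0) with salicylic acid (~2.98)
print(delta_pka_rule(5.0, 2.98))   # dpKa ~ 2.0 -> falls in the continuum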
Abstract:
We present a real-time haptics-aided injection technique for biological cells using miniature compliant mechanisms. Our system consists of a haptic robot operated by a human hand, an XYZ stage for micro-positioning, a camera for image capture, and a polydimethylsiloxane (PDMS) miniature compliant device that serves the dual purpose of injecting tool and force sensor. In contrast to existing haptics-based micromanipulation techniques, where an external force sensor is used, we use visually captured displacements of the compliant mechanism to compute the applied and reaction forces. The human hand feels the magnified manipulation force through the haptic device in real time while the motion of the hand is replicated on the mechanism side. Images are captured at 30 frames per second to extract displacement data, from which forces are computed at 30 Hz; the computed force is sent to the haptic device at 1000 Hz to ensure stable haptic interaction. The haptic cell-manipulation system was tested by injecting into a zebrafish egg cell, after first validating the technique at a scale larger than that of the cell.
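A minimal Python sketch of the rate-bridging scheme follows, assuming a hypothetical device.render_force interface and illustrative stiffness and scaling constants: vision updates the force estimate at ~30 Hz, while the haptic loop renders at 1000 Hz using a zero-order hold on the latest estimate.

import time

K_MECHANISM = 0.8      # N/mm; assumed stiffness of the PDMS compliant mechanism
FORCE_SCALE = 50.0     # assumed magnification so the hand feels tiny forces

latest_force = 0.0

def vision_update(displacement_mm):
    # Called at ~30 Hz when a new camera frame is processed: the measured
    # deflection of the compliant mechanism is converted to a force estimate.
    global latest_force
    latest_force = K_MECHANISM * displacement_mm

def haptic_loop(device):
    # Runs at 1000 Hz for stable haptic rendering; between 30 Hz vision
    # updates, the most recent force estimate is held (zero-order hold).
    # `device.render_force` is a hypothetical haptic-device API.
    period = 1.0 / 1000.0
    while device.running:
        device.render_force(FORCE_SCALE * latest_force)
        time.sleep(period)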
Abstract:
In this work, we evaluate the performance of a real-world image processing application that uses a cross-correlation algorithm to compare a given image with a reference one. The algorithm processes individual images represented as 2-dimensional matrices of single-precision floating-point values using O(n^4) operations involving dot-products and additions. We implement this algorithm on an nVidia GTX 285 GPU using CUDA, and also parallelize it for the Intel Xeon (Nehalem) and IBM Power7 processors, using both manual and automatic techniques. Pthreads and OpenMP with SSE and VSX vector intrinsics are used for the manually parallelized version, while a state-of-the-art optimization framework based on the polyhedral model is used for automatic compiler parallelization and optimization. The performance of this algorithm on the nVidia GPU suffers from: (1) a smaller shared memory, (2) unaligned device memory access patterns, (3) expensive atomic operations, and (4) weaker single-thread performance. On commodity multi-core processors, the application dataset is small enough to fit in caches, and when parallelized using a combination of task and short-vector data parallelism (via SSE/VSX) or through fully automatic optimization from the compiler, the application matches or beats the performance of the GPU version. The primary reasons for better multi-core performance include larger and faster caches, higher clock frequency, higher on-chip memory bandwidth, and better compiler optimization and support for parallelization. The best-performing versions on the Power7, Nehalem, and GTX 285 run in 1.02 s, 1.82 s, and 1.75 s, respectively. These results conclusively demonstrate that, under certain conditions, it is possible for a FLOP-intensive structured application running on a multi-core processor to match or even beat the performance of an equivalent GPU version.
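For reference, the core computation is plain 2-D cross-correlation. The single-threaded NumPy sketch below shows the O(n^4) sliding dot-product structure; it is an illustrative reference, not the paper's CUDA, SSE, or VSX code.

import numpy as np

def xcorr2d(image, ref):
    # Direct 2-D cross-correlation: slide `ref` over `image` and take a
    # dot product at each offset. For n x n inputs this is O(n^4) work.
    H, W = image.shape
    h, w = ref.shape
    out = np.empty((H - h + 1, W - w + 1), dtype=np.float32)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.dot(image[i:i+h, j:j+w].ravel(), ref.ravel())
    return out

A parallel version distributes the (i, j) output offsets across threads or GPU blocks; the paper compares exactly such mappings across the three platforms.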
Abstract:
In this paper, analytical expressions for the optimal Vdd and Vth that minimize energy under a given speed constraint are derived. These expressions are based on the EKV transistor model and are valid in both the strong-inversion and subthreshold regions. The effect of gate leakage on the optimal Vdd and Vth is analyzed. A new gradient-based algorithm for controlling Vdd and Vth from delay- and power-monitoring results is proposed. A Vdd-Vth controller that uses this algorithm to dynamically control the supply and threshold voltages of a representative logic block (the sum-of-absolute-differences computation of an MPEG decoder) is designed. Simulation results using 65 nm predictive technology models are given.
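The controller idea can be sketched as follows, substituting the simple alpha-power delay model for the paper's EKV-based expressions; all constants are illustrative, and the delay budget enters as a soft penalty, mimicking a controller that monitors power and delay and nudges Vdd and Vth.

import math

C_EFF, ALPHA, N_VT = 1.0, 1.3, 0.035   # assumed model constants
I0, T_MAX = 0.05, 4.0                  # assumed leakage scale and delay budget

def delay(vdd, vth):
    # Alpha-power-law delay; valid only while vdd > vth (clamped for safety).
    return vdd / max(vdd - vth, 1e-3)**ALPHA

def energy(vdd, vth):
    dyn = C_EFF * vdd**2                                  # switching energy
    leak = I0 * math.exp(-vth / N_VT) * vdd * delay(vdd, vth)  # leakage energy
    return dyn + leak

def step(vdd, vth, lr=1e-3, penalty=10.0):
    # Numerical gradient of energy plus a delay-constraint penalty.
    def cost(v, t):
        return energy(v, t) + penalty * max(0.0, delay(v, t) - T_MAX)**2
    h = 1e-4
    g_vdd = (cost(vdd + h, vth) - cost(vdd - h, vth)) / (2*h)
    g_vth = (cost(vdd, vth + h) - cost(vdd, vth - h)) / (2*h)
    return vdd - lr*g_vdd, vth - lr*g_vth

vdd, vth = 1.0, 0.3
for _ in range(5000):
    vdd, vth = step(vdd, vth)
print(vdd, vth, delay(vdd, vth), energy(vdd, vth))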
Abstract:
Regular expressions are generic representations for a string or a collection of strings. This paper focuses on the implementation of a regular-expression matching architecture on reconfigurable fabric such as FPGAs. We present a Nondeterministic Finite Automaton (NFA)-based implementation with an extended regular-expression syntax set compared to previous approaches. We also describe a dynamically reconfigurable generic block that implements the supported regular-expression syntax. This enables the formation of the regular-expression hardware by a simple cascade of generic blocks, as well as the possibility of reconfiguring the generic blocks to change the regular expression being matched. Further, we have developed an HDL code generator to obtain the VHDL description of the hardware for any regular-expression set. Our optimized regular-expression engine achieves a throughput of 2.45 Gbps. Our dynamically reconfigurable engine achieves a throughput of 0.8 Gbps using 12 FPGA slices per generic block on a Xilinx Virtex2Pro FPGA.
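The NFA simulation that such hardware parallelizes can be mimicked in Python for a tiny syntax subset (literals, '.', postfix '*'); each compiled unit plays the role of one generic block, and the active-state set plays the role of the one-hot state registers updated each clock. This is an illustration only, not the paper's extended syntax set.

def compile_units(pattern):
    # Each (char, star) unit is the software analogue of one generic block.
    units, i = [], 0
    while i < len(pattern):
        c = pattern[i]
        star = i + 1 < len(pattern) and pattern[i + 1] == '*'
        units.append((c, star))
        i += 2 if star else 1
    return units

def nfa_match(pattern, text):
    units = compile_units(pattern)
    n = len(units)

    def eps_closure(states):
        # A starred unit may be skipped: activating state i also activates i+1.
        out, changed = set(states), True
        while changed:
            changed = False
            for s in list(out):
                if s < n and units[s][1] and s + 1 not in out:
                    out.add(s + 1); changed = True
        return out

    active = eps_closure({0})
    for ch in text:                      # one "clock cycle" per input character
        nxt = set()
        for s in active:
            if s == n:
                continue
            c, star = units[s]
            if c == '.' or c == ch:
                nxt.add(s if star else s + 1)
                if star:
                    nxt.add(s + 1)
        active = eps_closure(nxt)
    return n in active                   # accept state reached on full input

assert nfa_match("ab*c", "abbbc")
assert not nfa_match("ab*c", "abd")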
Abstract:
Today's feature-rich multimedia products require embedded-system solutions with complex Systems-on-Chip (SoCs) to meet market expectations of high performance at low cost and low energy consumption. The memory architecture of an embedded system strongly influences critical design objectives such as area, power, and performance. Hence the embedded-system designer performs a complete memory-architecture exploration to custom-design a memory architecture for a given set of applications, and is further interested in multiple optimal design points to address various market segments. However, tight time-to-market constraints enforce short design-cycle times. In this paper we address the multi-level, multi-objective memory-architecture exploration problem through a combination of exhaustive-search-based memory exploration at the outer level and a two-step integrated data layout for SPRAM-cache-based hybrid architectures at the inner level: the first step partitions data between the SPRAM and the cache, and the second step performs a cache-conscious data layout. We formulate the cache-conscious data layout as a graph-partitioning problem and show that our approach gives up to 34% improvement over an existing approach while also optimizing the off-chip memory address space. We evaluated our approach on three embedded multimedia applications; for each application it explores several hundred memory configurations, yielding several optimal design points in a few hours of computation on a standard desktop.
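A hypothetical greedy sketch of the two-step layout follows: step 1 partitions objects between SPRAM and cache by access density, and step 2 spreads conflicting cached objects across cache sets, a simplified stand-in for the paper's graph-partitioning formulation; all names and heuristics here are illustrative.

def partition_data(objects, spram_size):
    # Step 1: objects is a list of (name, size, access_count); place the
    # hottest objects (accesses per byte) into SPRAM until it is full.
    spram, cached, used = [], [], 0
    for name, size, acc in sorted(objects, key=lambda o: o[2] / o[1], reverse=True):
        if used + size <= spram_size:
            spram.append(name); used += size
        else:
            cached.append(name)
    return spram, cached

def place_in_cache(cached, conflicts, n_sets):
    # Step 2 (cache-conscious layout): greedily assign each cached object
    # to the cache set where its interleaved-access conflict weight with
    # already-placed objects is lowest, approximating graph partitioning.
    placement = {}
    for name in cached:
        load = [0] * n_sets
        for other, w in conflicts.get(name, {}).items():
            if other in placement:
                load[placement[other]] += w
        placement[name] = min(range(n_sets), key=lambda s: load[s])
    return placement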
Abstract:
This paper presents the design of an area-optimized integer two-dimensional discrete cosine transform (2-D DCT) used in H.264/AVC codecs. The 2-D DCT is computed by exploiting the separability property: it is divided into two 1-D DCT calculations that are joined through a common memory. Owing to its area-optimized approach, the design will find application in mobile devices. Verilog hardware description language (HDL) in the Cadence environment has been used for the design, compilation, simulation, and synthesis of the transform block in 0.18 μm TSMC technology.
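The separability property the design exploits is easy to demonstrate in Python with the standard H.264/AVC 4x4 forward integer-transform matrix: the 2-D transform Y = C X C^T factors into two 1-D passes joined by an intermediate result, mirroring the two 1-D DCT units joined through a common memory.

import numpy as np

# H.264/AVC 4x4 forward integer (core) transform matrix.
C = np.array([[1,  1,  1,  1],
              [2,  1, -1, -2],
              [1, -1, -1,  1],
              [1, -2,  2, -1]])

def dct2d_separable(X):
    # First 1-D pass transforms along one dimension; the intermediate
    # result (the "common memory" contents) is then transformed along
    # the other dimension by the second 1-D pass.
    intermediate = C @ X
    return intermediate @ C.T

X = np.arange(16).reshape(4, 4)
Y = dct2d_separable(X)
assert np.array_equal(Y, C @ X @ C.T)   # identical to the direct 2-D form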