Digital Library

1000 results for Supercomputer Education

The fully implicit stochastic-alpha method for stiff stochastic differential equations

Relevance: 60.00%

Abstract:

A fully implicit integration method for stochastic differential equations with significant multiplicative noise and stiffness in both the drift and diffusion coefficients has been constructed, analyzed and illustrated with numerical examples in this work. The method has strong order 1.0 consistency and has user-selectable parameters that allow the user to expand the stability region of the method to cover almost the entire drift-diffusion stability plane. The large stability region enables the method to take computationally efficient time steps. A system of chemical Langevin equations simulated with the method illustrates its computational efficiency.
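
As a rough illustration of why implicit treatment helps with stiffness, the sketch below applies a drift-implicit Euler-Maruyama step to the linear test equation dX = lam*X dt + mu*X dW. This is a simpler, well-known scheme and not the stochastic-alpha method of the abstract; the function name and all parameters are hypothetical.

```python
import random

def drift_implicit_euler(x0, lam, mu, h, steps, rng=None):
    """Integrate dX = lam*X dt + mu*X dW with the drift treated implicitly.

    Illustrative only: this is drift-implicit Euler-Maruyama, not the
    fully implicit stochastic-alpha method described in the abstract.
    Requires lam * h != 1.
    """
    rng = rng or random.Random(0)
    x = x0
    path = [x]
    for _ in range(steps):
        dw = rng.gauss(0.0, h ** 0.5)  # Brownian increment, variance h
        # Solve x_new = x + lam * x_new * h + mu * x * dw for x_new.
        x = x * (1.0 + mu * dw) / (1.0 - lam * h)
        path.append(x)
    return path
```

With a stiff drift such as lam = -1000 and a step h = 0.1, an explicit step would multiply the state by 1 + lam*h = -99 each time, whereas the implicit update above divides it by 101 and stays bounded.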

Redefine: Runtime Reconfigurable Polymorphic ASIC

Relevance: 60.00%

Abstract:

Emerging embedded applications are based on evolving standards (e.g., MPEG2/4, H.264/265, IEEE802.11a/b/g/n). Since most of these applications run on handheld devices, there is an increasing need for a single chip solution that can dynamically interoperate between different standards and their derivatives. In order to achieve high resource utilization and low power dissipation, we propose REDEFINE, a polymorphic ASIC in which specialized hardware units are replaced with basic hardware units that can create the same functionality by runtime re-composition. It is a "future-proof" custom hardware solution for multiple applications and their derivatives in a domain. In this article, we describe a compiler framework and supporting hardware comprising compute, storage, and communication resources. Applications described in a high-level language (e.g., C) are compiled into application substructures. For each application substructure, a set of compute elements (CEs) on the hardware is interconnected during runtime to form a pattern that closely matches the communication pattern of that particular application. The advantage is that the bounded CEs are neither processor cores nor logic elements as in FPGAs. Hence, REDEFINE offers the power and performance advantage of an ASIC and the hardware reconfigurability and programmability of an FPGA/instruction set processor. In addition, the hardware supports custom instruction pipelining. Existing instruction-set extensible processors determine a sequence of instructions that repeatedly occur within the application to create custom instructions at design time to speed up the execution of this sequence. We extend this scheme further: a kernel is compiled into custom instructions that bear a strong producer-consumer relationship (and are not limited to frequently occurring sequences of instructions).
Custom instructions, realized as hardware compositions effected at runtime, allow several instances of the same to be active in parallel. A key distinguishing factor in the majority of the emerging embedded applications is stream processing. To reduce the overheads of data transfer between custom instructions, direct communication paths are employed among custom instructions. In this article, we present an overview of the hardware-aware compiler framework, which determines the NoC-aware schedule of transports of the data exchanged between the custom instructions on the interconnect. The results for the FFT kernel indicate a 25% reduction in the number of loads/stores, and throughput improves by log(n) for an n-point FFT when compared to a sequential implementation. Overall, REDEFINE offers flexibility and runtime reconfigurability at the expense of 1.16x in power and 8x in area when compared to an ASIC. The REDEFINE implementation consumes 0.1x the power of an FPGA implementation. In addition, the configuration overhead of the FPGA implementation is 1,000x more than that of REDEFINE.

An algorithm to find similar internal sequence repeats

Relevance: 60.00%

Abstract:

In recent years, identification of sequence patterns has been given immense importance to understand better their significance with respect to genomic organization and evolutionary processes. To this end, an algorithm has been derived to identify all similar sequence repeats present in a protein sequence. The proposed algorithm is useful to correlate the three-dimensional structure of various similar sequence repeats available in the Protein Data Bank against the same sequence repeats present in other databases like SWISS-PROT, PIR and Genome databases.
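
The abstract does not describe how the repeats are found, so the following is only a brute-force sketch of the general idea: compare every pair of non-overlapping windows in one sequence and report those that differ in at most a few positions. The function name, the fixed window length, and the mismatch threshold are all assumptions made for illustration.

```python
def similar_repeats(seq, length, max_mismatches=1):
    """Return (i, j, window_i, window_j) for similar, non-overlapping windows.

    Hypothetical brute-force sketch, O(n^2 * length); similarity here is
    plain Hamming distance, which may differ from the published algorithm.
    """
    hits = []
    n = len(seq)
    for i in range(n - length + 1):
        # Start j at i + length so the two windows never overlap.
        for j in range(i + length, n - length + 1):
            a, b = seq[i:i + length], seq[j:j + length]
            mismatches = sum(x != y for x, y in zip(a, b))
            if mismatches <= max_mismatches:
                hits.append((i, j, a, b))
    return hits
```

For example, `similar_repeats("GAVLKGAVIK", 4)` reports the window pairs starting at positions (0, 5) and (1, 6), each differing by a single residue.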

Strategies for efficient disruption of metabolism in Mycobacterium tuberculosis from network analysis

Relevance: 60.00%

Abstract:

Tuberculosis continues to be a major health challenge, warranting newer strategies for therapeutic intervention and newer approaches to discover them. Here, we report the identification of efficient metabolism disruption strategies by analysis of a reactome network. Protein-protein dependencies at a genome scale are derived from the curated metabolic network, from which insights into the nature and extent of inter-protein and inter-pathway dependencies have been obtained. A functional distance matrix and a subsequent nearness index derived from this information help in understanding how the influence of a given protein can pervade the metabolic network. Thus, the nearness index can be viewed as a metabolic disruptability index, which suggests possible strategies for achieving maximal metabolic disruption by inhibition of the least number of proteins. A greedy approach has been used to identify the most influential singleton, and its combination with the other most pervasive proteins to obtain highly influential pairs, triplets and quadruplets. The effect of deletion of these combinations on cellular metabolism has been studied by flux balance analysis. An obvious outcome of this study is a rational identification of drug targets, to efficiently bring down mycobacterial metabolism.
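
The greedy construction of singletons, pairs, triplets and quadruplets can be sketched roughly as follows. The sketch assumes, purely for illustration, that each protein's influence is given as a set of affected proteins and that a combination is scored by how many distinct proteins it reaches; the abstract's nearness index is not reproduced here, and all names are hypothetical.

```python
def greedy_combination(influence, size):
    """Greedily pick `size` proteins that together influence the most others.

    influence: dict mapping a protein name to the set of proteins it affects
    (a stand-in for the nearness index, which the abstract does not define).
    Returns (chosen proteins in pick order, set of proteins covered).
    """
    chosen, covered = [], set()
    for _ in range(min(size, len(influence))):
        # Pick the protein adding the most not-yet-covered proteins;
        # iterating in sorted order breaks ties by name.
        best = max(
            (p for p in sorted(influence) if p not in chosen),
            key=lambda p: len(influence[p] - covered),
        )
        chosen.append(best)
        covered |= influence[best]
    return chosen, covered
```

Calling it with `size=1` gives the most influential singleton, and larger sizes extend that singleton step by step, which mirrors the order of construction described above.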

An Input Triggered Polymorphic ASIC for H.264 Decoding

Relevance: 60.00%

Abstract:

This paper reports the design of an input-triggered polymorphic ASIC for an H.264 baseline decoder. Hardware polymorphism is achieved by selectively reusing hardware resources at the system and module levels. The complete design is done using ESL design tools, following a methodology that maintains consistency in testing and verification throughout the design flow. The proposed design can support frame sizes from QCIF to 1080p.

Grids with multiple batch systems for performance enhancement of multi-component and parameter sweep parallel applications

Relevance: 60.00%

Abstract:

In this work, we evaluate the benefits of using Grids with multiple batch systems to improve the performance of multi-component and parameter sweep parallel applications by reducing queue waiting times. Using job traces with different loads, job distributions and queue waiting times corresponding to three different queuing policies (FCFS, conservative and EASY backfilling), we conducted a large number of experiments using simulators of two important classes of applications. The first simulator models the Community Climate System Model (CCSM), a prominent multi-component application, and the second simulator models parameter sweep applications. We compare the performance of the applications when executed on multiple batch systems and on a single batch system for different system and application configurations. We show that there are a large number of configurations for which application execution using multiple batch systems can give improved performance over execution on a single system.

Singular value decomposition based computationally efficient algorithm for rapid dynamic near-infrared diffuse optical tomography

Relevance: 60.00%

Abstract:

Purpose: A computationally efficient algorithm (linear iterative type) based on singular value decomposition (SVD) of the Jacobian has been developed that can be used in rapid dynamic near-infrared (NIR) diffuse optical tomography. Methods: Numerical and experimental studies have been conducted to prove the computational efficacy of this SVD-based algorithm over conventional optical image reconstruction algorithms. Results: These studies indicate that the performance of linear iterative algorithms in terms of contrast recovery (quantitation of optical images) is better compared to nonlinear iterative (conventional) algorithms, provided the initial guess is close to the actual solution. The nonlinear algorithms can provide better quality images compared to the linear iterative type algorithms. Moreover, the analytical and numerical equivalence of the SVD-based algorithm to linear iterative algorithms was also established as a part of this work. It is also demonstrated that the SVD-based image reconstruction typically requires O(NN^2) operations per iteration, as contrasted with linear and nonlinear iterative methods that, respectively, require O(NN^3) and O(NN^6) operations, with NN being the number of unknown parameters in the optical image reconstruction procedure. Conclusions: This SVD-based computationally efficient algorithm can make the integration of the image reconstruction procedure with the data acquisition feasible, in turn making rapid dynamic NIR tomography viable in the clinic to continuously monitor hemodynamic changes in the tissue pathophysiology.

A strategy for scheduling tightly coupled parallel applications on clusters

Relevance: 60.00%

Abstract:

Although various strategies have been developed for scheduling parallel applications with independent tasks, very little work exists for scheduling tightly coupled parallel applications on cluster environments. In this paper, we compare four different strategies based on performance models of tightly coupled parallel applications for scheduling the applications on clusters. In addition to algorithms based on existing popular optimization techniques, we also propose a new algorithm called Box Elimination that searches the space of performance model parameters to determine the best schedule of machines. By means of real and simulation experiments, we evaluated the algorithms on single cluster and multi-cluster setups. We show that our Box Elimination algorithm generates schedules that are up to 80% more efficient than those of the other algorithms. We also show that the execution times of the schedules produced by our algorithm are more robust against performance modeling errors.

The alpha method for solving differential algebraic inequality (DAI) systems

Relevance: 60.00%

Abstract:

This paper describes an algorithm for "direct numerical integration" of initial value Differential-Algebraic Inequalities (DAI) in a time stepping fashion, using a sequential quadratic programming (SQP) solver for detecting and satisfying active path constraints at each time step. The activation of a path constraint generally increases the condition number of the active discretized differential algebraic equation (DAE) Jacobian, and this difficulty is addressed by a regularization property of the alpha method. The algorithm is locally stable when index 1 and index 2 path constraints and bounds are active. Subject to available regularization, it is seen to be stable for active index 3 path constraints in the numerical examples. For the high index active path constraints, the algorithm uses a user-selectable parameter to perturb the smaller singular values of the Jacobian with a view to reducing the condition number so that the simulation can proceed. The algorithm can be used as a relatively cheap estimation tool for trajectory and control planning and in the context of model predictive control solutions. It can also be used to generate initial guess values of optimization variables used as input to inequality path constrained dynamic optimization problems. The method is illustrated with examples from space vehicle trajectory and robot path planning.

Analysis of DNA sequence transformations on grids

Relevance: 60.00%

Abstract:

Study of the evolution of species or organisms is essential for various biological applications. Evolution is typically studied at the molecular level by analyzing the mutations of DNA sequences of organisms. Techniques have been developed for building phylogenetic or evolutionary trees for a set of sequences. Though phylogenetic trees capture the overall evolutionary relationships among the sequences, they do not reveal fine-level details of the evolution. In this work, we attempt to resolve various fine-level sequence transformation details associated with a phylogenetic tree using cellular automata. In particular, our work tries to determine the cellular automata rules for neighbor-dependent mutations of segments of DNA sequences. We also determine the number of time steps needed for evolution of a progeny from an ancestor and the unknown segments of the intermediate sequences in the phylogenetic tree. Due to the existence of a vast number of cellular automata rules, we have developed a grid system that performs parallel guided explorations of the rules on grid resources. We demonstrate our techniques by conducting experiments on a grid comprising machines in three countries and obtaining potentially useful statistics regarding evolutions in three HIV sequences. In particular, our work is able to verify the phenomenon of neighbor-dependent mutations and find that certain combinations of neighbor-dependent mutations, defined by a cellular automata rule, occur with greater than 90% probability. We also find the average number of time steps for mutations for some branches of the phylogenetic tree over a large number of possible transformations, with standard deviations less than 2.
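
To make the idea of a neighbor-dependent mutation rule concrete, here is a minimal sketch of a one-dimensional cellular automaton over DNA bases. It assumes a radius-1 neighborhood, a rule stored as a lookup table, and end bases that act as their own neighbors; none of these details come from the abstract, and the function names are hypothetical.

```python
def ca_step(seq, rule):
    """Apply one cellular-automaton step to a DNA string.

    rule maps (left, center, right) triples to the mutated center base;
    triples missing from the rule leave the base unchanged (an assumption).
    """
    if not seq:
        return seq
    padded = seq[0] + seq + seq[-1]  # boundary choice: repeat the end bases
    return "".join(
        rule.get((padded[i - 1], padded[i], padded[i + 1]), padded[i])
        for i in range(1, len(padded) - 1)
    )

def steps_to_reach(ancestor, progeny, rule, max_steps=100):
    """Count steps until the ancestor becomes the progeny, or return None."""
    current = ancestor
    for step in range(max_steps + 1):
        if current == progeny:
            return step
        current = ca_step(current, rule)
    return None
```

With the single rule `{("A", "C", "G"): "T"}`, one step turns `"ACGACG"` into `"ATGATG"`, so `steps_to_reach("ACGACG", "ATGATG", rule)` returns 1. Searching over many candidate rules in this way is the part the abstract distributes across grid resources.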

A Method to Find Sequentially Separated Motifs in Biological Sequences (SSMBS)

Relevance: 60.00%

Abstract:

Sequence motifs occurring in a particular order in proteins or DNA have been proved to be of biological interest. In this paper, a new method to locate the occurrences of up to five user-defined motifs in a specified order in large proteins and in nucleotide sequence databases is proposed. It has been designed using the concept of quantifiers in regular expressions and linked lists for data storage. The application of this method includes the extraction of relevant consensus regions from biological sequences. This might be useful in clustering of protein families as well as to study the correlation between positions of motifs and their functional sites in DNA sequences.
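
A search for motifs in a specified order using regular-expression quantifiers might look like the sketch below. It is not the SSMBS implementation: motifs are treated as literal strings, the gap between consecutive motifs is a bounded lazy quantifier, and the function name and gap limits are assumptions.

```python
import re

def find_ordered_motifs(sequence, motifs, min_gap=0, max_gap=50):
    """Find regions where up to five motifs occur in the given order.

    Returns a list of (start, end, matched_text) tuples. The gap limits
    are hypothetical parameters, not taken from the abstract.
    """
    if not 1 <= len(motifs) <= 5:
        raise ValueError("expected between one and five motifs")
    # Lazy bounded quantifier: between min_gap and max_gap arbitrary residues.
    gap = ".{%d,%d}?" % (min_gap, max_gap)
    pattern = gap.join("(%s)" % re.escape(m) for m in motifs)
    return [(m.start(), m.end(), m.group(0)) for m in re.finditer(pattern, sequence)]
```

For instance, searching `"MKTAYIAKQRQISFVKSHFSRQ"` for `["KTA", "QRQ", "SHF"]` yields one region spanning positions 1 to 19, while tightening `max_gap` to 3 yields no match because the first two motifs are four residues apart.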

Architecture of a polymorphic ASIC for interoperability across multi-mode H.264 decoders

Relevance: 60.00%

Abstract:

Run-time interoperability between different applications based on H.264/AVC is an emerging need in networked infotainment, where media delivery must match the desired resolution and quality of the end terminals. In this paper, we describe the architecture and design of a polymorphic ASIC to support this. The H.264 decoding flow is partitioned into modules, such that the polymorphic ASIC meets the design goals of low power, low area, high flexibility, high throughput and fast interoperability between different profiles and levels of H.264. We demonstrate the idea with a multi-mode decoder that can decode baseline, main and high profile H.264 streams and can interoperate at run-time across these profiles. The decoder is capable of processing frame sizes of up to 1024 × 768 at 30 fps. The design, synthesized with UMC 0.13 μm technology, occupies 250 k gates and runs at 100 MHz.

A Combinatorial Optimization Problem for High Order PODs with Few Sensors

Relevance: 60.00%

Abstract:

Experimental characterization of high dimensional dynamic systems sometimes uses the proper orthogonal decomposition (POD). If there are many measurement locations and relatively fewer sensors, then steady-state behavior can still be studied by sequentially taking several sets of simultaneous measurements. The required number of such measurement sets can be minimized if we solve a combinatorial optimization problem. We aim to bring this problem to the attention of engineering audiences, summarize some known mathematical results about this problem, and present a heuristic (suboptimal) calculation that gives reasonable, if not stellar, results.
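
One plausible reading of the combinatorial problem, stated here as an assumption, is that every pair of measurement locations must be observed simultaneously at least once, so that each measurement set of k sensors covers k(k-1)/2 pairs. Under that reading, the sketch below gives a simple counting lower bound and a greedy heuristic for small cases. It is not the heuristic from the paper, and the function names are hypothetical.

```python
from itertools import combinations
from math import comb

def lower_bound(n, k):
    """Counting bound: all C(n, 2) pairs, at most C(k, 2) covered per set."""
    return -(-comb(n, 2) // comb(k, 2))  # ceiling division

def greedy_cover(n, k):
    """Greedily pick k-subsets of n locations until every pair is covered.

    Exhaustive over all k-subsets at each step, so only suitable for
    small n; requires 2 <= k <= n.
    """
    uncovered = set(combinations(range(n), 2))
    chosen = []
    while uncovered:
        # Choose the sensor placement covering the most uncovered pairs.
        best = max(
            combinations(range(n), k),
            key=lambda s: sum(p in uncovered for p in combinations(s, 2)),
        )
        chosen.append(best)
        uncovered -= set(combinations(best, 2))
    return chosen
```

Comparing `len(greedy_cover(n, k))` against `lower_bound(n, k)` gives a quick sense of how far a suboptimal heuristic may be from the best possible number of measurement sets.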

CsrA interacting small RNAs in Haemophilus spp genomes: a theoretical analysis

Relevance: 60.00%

Abstract:

The csrA gene is a carbon storage regulator gene that encodes a protein with multiple RNA interaction sites. Bacterial non-coding small RNAs like csrB, csrC and their counterparts in diverse bacterial genera have been identified as controlling the regulatory activities of CsrA and its orthologs. An attempt has been made in this study to identify 'novel' non-coding small RNAs that are involved in the regulatory activities of the csrA gene. All CsrA-interacting small RNAs are computationally fingerprinted to have multiple occurrences of 7-nucleotide CsrA-interacting repeats [CAGGA(U/A/C)G] along with an 18-nucleotide upstream binding site. However, in several genomes, such as those of Haemophilus spp, the upstream binding site is not identified. The current methodology overcomes this difficulty by identifying small RNA-specific orphan transcriptional units within the intergenic regions of the genome. The method identified all known CsrA-interacting small RNAs in the E. coli, Vibrio cholerae and Pseudomonas aeruginosa genomes and additionally picked six new possible CsrA-interacting small RNA regions in E. coli. Our computational analysis indicates that the known rygD and rprA sRNAs in E. coli could possibly interact with CsrA proteins. The study was extended to three Haemophilus genomes, in which it identified seven new possible CsrA-interacting small RNAs.
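
The 7-nucleotide repeat CAGGA(U/A/C)G translates directly into a regular expression. The sketch below only locates that repeat and checks for multiple occurrences; the upstream binding site and the orphan transcriptional unit analysis are not modeled, and the function names and the threshold of two occurrences are assumptions.

```python
import re

# CAGGA followed by U, A or C, then G (written for RNA).
CSRA_REPEAT = re.compile(r"CAGGA[UAC]G")

def csra_repeat_positions(sequence):
    """Return start positions of the CsrA-interacting repeat.

    Accepts RNA or DNA text in any case; T is converted to U first.
    """
    rna = sequence.upper().replace("T", "U")
    return [m.start() for m in CSRA_REPEAT.finditer(rna)]

def has_multiple_repeats(sequence, minimum=2):
    """True if the repeat occurs at least `minimum` times (hypothetical cutoff)."""
    return len(csra_repeat_positions(sequence)) >= minimum
```

For example, `"GGCAGGAUGAAACAGGACGUUCAGGAGG"` contains the repeat at positions 2 and 12, while the trailing `CAGGAGG` is rejected because G is not allowed in the sixth position.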

A Petri net model for evaluating packet buffering strategies in a network processor

Relevance: 60.00%

Abstract:

Previous studies have shown that buffering packets in DRAM is a performance bottleneck. In order to understand the impediments in accessing the DRAM, we developed a detailed Petri net model of an IP forwarding application on the IXP2400 that models the different levels of the memory hierarchy. The cell-based interface used to receive and transmit packets in a network processor leads to some small DRAM accesses. Such narrow accesses to the DRAM expose the bank access latency, reducing the bandwidth that can be realized. With real traces, up to 30% of the accesses are smaller than the cell size, resulting in a 7.7% reduction in DRAM bandwidth. To overcome this problem, we propose buffering these small chunks of data in the on-chip scratchpad memory. This scheme also exploits a greater degree of parallelism between different levels of the memory hierarchy. Using real traces from the internet, we show that the transmit rate can be improved by an average of 21% over the base scheme without the use of additional hardware. Further, the impact of different traffic patterns on the network processor resources is studied. Under real traffic conditions, we show that the data bus that connects the off-chip packet buffer to the micro-engines is the obstacle to achieving higher throughput.
