443 results for Supercomputer
Abstract:
The altered spontaneous emission of an emitter near an arbitrary body can be elucidated using an energy balance of the electromagnetic field. From a classical point of view it is trivial to show that the field scattered back from any body should alter the emission of the source. But it is not at all apparent that the total radiative and non-radiative decay in an arbitrary body can add to the vacuum decay rate of the emitter, i.e., an increase of emission that is just as much as the body absorbs and radiates in all directions. This gives us an opportunity to revisit two other elegant classical ideas of the past, the optical theorem and the Wheeler-Feynman absorber theory of radiation. It also provides alternative perspectives on the Purcell effect and generalizes many of its manifestations, both enhancement and inhibition of emission. When the optical density of states of a body or a material is difficult to resolve (in a complex geometry or a highly inhomogeneous volume), such a generalization offers new directions toward solutions. (c) 2012 Elsevier Ltd. All rights reserved.
Abstract:
The Morse-Smale complex is a topological structure that captures the behavior of the gradient of a scalar function on a manifold. This paper discusses scalable techniques to compute the Morse-Smale complex of scalar functions defined on large three-dimensional structured grids. Computing the Morse-Smale complex of three-dimensional domains is more challenging than for two-dimensional domains because of the non-trivial structure introduced by the two types of saddle criticalities. We present a parallel shared-memory algorithm to compute the Morse-Smale complex based on Forman's discrete Morse theory. The algorithm achieves scalability via synergistic use of the CPU and the GPU. We first prove that the discrete gradient on the domain can be computed independently for each cell and hence can be implemented on the GPU. Second, we describe a two-step graph traversal algorithm to compute the 1-saddle-2-saddle connections efficiently and in parallel on the CPU. Simultaneously, the extremum-saddle connections are computed using a tree traversal algorithm on the GPU.
Abstract:
Online remote visualization and steering of critical weather applications such as cyclone tracking are essential for effective and timely analysis by the geographically distributed climate science community. A steering framework for controlling high-performance simulations of critical weather events needs to take into account both the steering inputs of the scientists and the criticality needs of the application, including a minimum progress rate of the simulations and continuous visualization of significant events. In this work, we have developed INST, an integrated user-driven and automated steering framework for simulations, online remote visualization, and analysis of critical weather applications. INST gives the user control over various application parameters, including the region of interest, the resolution of the simulation, and the frequency of data for visualization. Unlike existing efforts, our framework considers both the steering inputs and the criticality of the application, namely the minimum progress rate needed for the application, as well as various resource constraints including storage space and network bandwidth, to decide the best possible parameter values for simulations and visualization.
Abstract:
Over the past two decades, many ingenious efforts have been made in protein remote homology detection. Because homologous proteins often diversify extensively in sequence, it is challenging to demonstrate such relatedness through entirely sequence-driven searches. Here, we describe a computational method for the generation of 'protein-like' sequences that serves to bridge gaps in protein sequence space. Sequence profile information, as embodied in a position-specific scoring matrix of multiply aligned sequences of bona fide family members, serves as the starting point of this algorithm. The observed amino acid propensity and the selection of a random number dictate the choice of a residue for each position in the sequence. In a systematic manner, and by applying a 'roulette-wheel' selection approach at each position, we generate parent family-like sequences and thus facilitate an enlargement of sequence space around the family. When generated for a large number of families, we demonstrate that they expand the utility of natural intermediately related sequences in linking distant proteins. In 91% of the assessed examples, inclusion of designed sequences improved fold coverage by 5-10% over searches made in their absence. Furthermore, with several examples of proteins adopting folds such as TIM, globin, lipocalin and others, we demonstrate that including designed sequences in a database improved the sensitivity of methods such as PSI-BLAST and Cascade PSI-BLAST, offering a promising opportunity for greatly improved remote homology recognition using sequence information alone.
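The 'roulette-wheel' residue selection described above is straightforward to prototype. The Python sketch below (illustrative only; the function names and the toy three-position profile are assumptions, not material from the paper) samples one 'protein-like' sequence from per-position amino-acid propensities, as would be derived from a normalized position-specific scoring matrix.

```python
# Illustrative sketch (not the authors' code): roulette-wheel sampling of a
# 'protein-like' sequence from per-position amino-acid propensities.
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def sample_sequence(propensities, rng=random.Random(0)):
    """propensities: one dict per alignment position mapping amino acid ->
    probability (rows of a normalized PSSM)."""
    sequence = []
    for column in propensities:
        spin = rng.random()          # the 'roulette wheel' spin
        cumulative = 0.0
        chosen = AMINO_ACIDS[-1]     # fallback if rounding leaves a tiny gap
        for aa in AMINO_ACIDS:
            cumulative += column.get(aa, 0.0)
            if spin <= cumulative:
                chosen = aa
                break
        sequence.append(chosen)
    return "".join(sequence)

# Toy example: a 3-position profile strongly favouring G, P, K
profile = [{"G": 0.8, "A": 0.2}, {"P": 0.9, "S": 0.1}, {"K": 0.7, "R": 0.3}]
print(sample_sequence(profile))
```

In practice the propensities would come from the family's multiple sequence alignment, and many such sequences would be generated per family to enlarge the sequence space around it.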
Abstract:
Software transactional memory (STM) is a promising programming paradigm for shared-memory multithreaded programs. For STMs to be adopted widely for performance-critical software, understanding and improving the cache performance of applications running on STM becomes increasingly crucial, as the performance gap between processor and memory continues to grow. In this paper, we present the most detailed experimental evaluation to date of the cache behavior of STM applications and quantify the impact of the different STM factors on the cache misses experienced by the applications. We find that STMs are not cache friendly, with data cache stall cycles contributing more than 50% of the execution cycles in a majority of the benchmarks. We find that, on average, misses occurring inside the STM account for 62% of the total data cache miss latency cycles experienced by the applications, and that cache performance is impacted adversely by certain inherent characteristics of the STM itself. These observations motivate us to propose a set of specific compiler transformations targeted at making STMs cache friendly. We find that the STM's fine-grained and application-unaware locking is a major contributor to its poor cache behavior. Hence we propose selective Lock Data co-location (LDC) and Redundant Lock Access Removal (RLAR) to address the lock access misses. We find that even transactions that are completely disjoint-access parallel suffer from costly coherence misses caused by the centralized global time stamp updates, and hence we propose the Selective Per-Partition Time Stamp (SPTS) transformation to address this. We show that our transformations are effective in improving the cache behavior of STM applications, reducing the data cache miss latency by 20.15% to 37.14% and improving execution time by 18.32% to 33.12% in five of the eight STAMP applications.
Abstract:
The spatial search problem on regular lattice structures in an integer number of dimensions d >= 2 has been studied extensively, using both coined and coinless quantum walks. The relativistic Dirac operator has been a crucial ingredient in these studies. Here, we investigate the spatial search problem on fractals of non-integer dimension. Although the Dirac operator cannot be defined on a fractal, we construct the quantum walk on a fractal using the flip-flop operator that incorporates a Klein-Gordon mode. We find that the scaling behavior of the spatial search is determined by the spectral (and not the fractal) dimension. Our numerical results have been obtained on the well-known Sierpinski gaskets in two and three dimensions.
Abstract:
Experiments have shown strong effects of some substrates on the localized plasmons of metallic nanoparticles, but they are inconclusive about which parameters are responsible. Here, we have used the discrete dipole approximation in conjunction with Sommerfeld integral relations to explain the effect of the substrate as a function of the parameters of the incident radiation. The radiative coupling can both quench and enhance the resonance, and its dependence on the angle and polarization of the incident radiation with respect to the surface is shown. Non-radiative interaction with the substrate enhances the plasmon resonance of the particles and can shift the resonances significantly from their free-space energies. The non-radiative interaction of the substrate is sensitive to the shape of the particles and the polarization of the incident radiation with respect to the substrate. Our results show that the plasmon resonances in coupled and single particles can be significantly altered from their free-space resonances and are quenched or enhanced by the choice of substrate and the polarization of the incident radiation. (C) 2012 American Institute of Physics. [http://dx.doi.org/10.1063/1.4736544]
Abstract:
We investigate the evolution of quantum correlations in ensembles of two-qubit nuclear spin systems via nuclear magnetic resonance techniques. We use discord as a measure of quantum correlations and the Werner state as an explicit example. We first introduce different ways of measuring discord and geometric discord in two-qubit systems and then describe the following experimental studies: (a) We quantitatively measure discord for Werner-like states prepared using an entangling pulse sequence. An initial thermal state with zero discord is gradually and periodically transformed into a mixed state with maximum discord. The experimental and simulated behavior of the rise and fall of discord agree fairly well. (b) We examine the efficiency of dynamical decoupling sequences in preserving quantum correlations. In our experimental setup, the dynamical decoupling sequences preserved the traceless parts of the density matrices with high fidelity, but they could not maintain the purity of the quantum states and so were unable to keep the discord from decaying. (c) We observe the evolution of discord for a singlet-triplet mixed state during a radio-frequency spin-lock. A simple relaxation model describes the evolution of discord, and the accompanying evolution of the fidelity of the long-lived singlet state, reasonably well.
Abstract:
Diffusion equation-based modeling of near-infrared light propagation in tissue is achieved by using a finite-element mesh for imaging real tissue types, such as breast and brain. The finite-element mesh size (number of nodes) dictates the parameter space in optical tomographic imaging. Most commonly used finite-element meshing algorithms do not provide the flexibility of distinct nodal spacing in different regions of the imaging domain to take the sensitivity of the problem into consideration. This study presents a computationally efficient mesh simplification method that can be used as a preprocessing step to iterative image reconstruction, in which the finite-element mesh is simplified using an edge collapsing algorithm to reduce the parameter space in regions where the sensitivity of the problem is relatively low. It is shown, using simulations and experimental phantom data for simple meshes/domains, that a significant reduction in parameter space can be achieved without compromising the reconstructed image quality. The maximum errors observed when using the simplified meshes were less than 0.27% in the forward problem and 5% in the inverse problem.
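For readers unfamiliar with the edge-collapse primitive that underlies such mesh simplification, the minimal Python sketch below (purely illustrative; it is not the preprocessing code used in the study and ignores sensitivity-based weighting) shows how collapsing an edge merges one node into another and discards elements that become degenerate.

```python
# Minimal illustration of the edge-collapse primitive used in mesh
# simplification (a sketch, not the reconstruction code from the paper).
def collapse_edge(nodes, elements, keep, remove):
    """Collapse the edge (keep, remove): move 'keep' to the edge midpoint,
    redirect references to 'remove', and drop degenerate elements.
    The 'remove' node simply becomes unreferenced."""
    nodes[keep] = [(a + b) / 2.0 for a, b in zip(nodes[keep], nodes[remove])]
    new_elements = []
    for element in elements:
        element = [keep if n == remove else n for n in element]
        if len(set(element)) == len(element):   # skip collapsed (degenerate) elements
            new_elements.append(element)
    return nodes, new_elements

# Toy 2D triangle fan around node 0; collapse the edge (2, 3)
nodes = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [1.0, 1.0], 3: [0.5, 1.5], 4: [-0.5, 1.0]}
elements = [[0, 1, 2], [0, 2, 3], [0, 3, 4]]
nodes, elements = collapse_edge(nodes, elements, keep=2, remove=3)
print(elements)   # [[0, 1, 2], [0, 2, 4]]  -- the middle triangle degenerated away
```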
Abstract:
Exascale systems of the future are predicted to have a mean time between failures (MTBF) of less than one hour. Malleable applications, where the number of processors on which an application executes can be changed during execution, can make use of their malleability to better tolerate high failure rates. We present AdFT, an adaptive fault tolerance framework for long-running malleable applications that maximizes application performance in the presence of failures. The AdFT framework includes cost models for evaluating the benefits of various fault tolerance actions, including checkpointing, live migration and rescheduling, and runtime decisions for dynamically selecting the fault tolerance actions at different points of application execution to maximize performance. Simulations with real and synthetic failure traces show that our approach outperforms existing fault tolerance mechanisms for malleable applications, yielding up to 23% improvement in application performance, and is effective even for petascale systems and beyond.
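To make the role of such cost models concrete, here is a hedged Python sketch of the kind of comparison an adaptive fault-tolerance runtime could make when a failure is predicted. The action names, threshold and cost terms are illustrative assumptions, not AdFT's actual models.

```python
# Hedged sketch of a fault-tolerance action selector; the constants and
# formulas below are illustrative assumptions, not the actual AdFT cost models.
def choose_action(time_to_failure, checkpoint_cost, migration_cost,
                  reschedule_cost, lost_work_since_checkpoint):
    """Return the action with the lowest expected overhead before the
    predicted failure, or 'continue' if the failure is far enough away."""
    if time_to_failure > 10 * max(checkpoint_cost, migration_cost, reschedule_cost):
        return "continue"                       # failure too distant to act on now
    costs = {
        # Take a checkpoint now: pay only its cost.
        "checkpoint": checkpoint_cost,
        # Migrate off the failing node: pay migration time, keep all work.
        "live-migrate": migration_cost,
        # Reschedule (shrink/expand a malleable job): pay rescheduling plus
        # redoing the work done since the last checkpoint.
        "reschedule": reschedule_cost + lost_work_since_checkpoint,
    }
    return min(costs, key=costs.get)

print(choose_action(time_to_failure=300, checkpoint_cost=60,
                    migration_cost=45, reschedule_cost=30,
                    lost_work_since_checkpoint=120))   # -> 'live-migrate'
```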
Abstract:
Purpose: To optimize the data-collection strategy for diffuse optical tomography and to obtain a set of independent measurements among the total measurements using the model-based data-resolution matrix characteristics. Methods: The data-resolution matrix is computed based on the sensitivity matrix and the regularization scheme used in the reconstruction procedure by matching the predicted data with the actual data. The diagonal values of the data-resolution matrix show the importance of a particular measurement, and the magnitude of the off-diagonal entries shows the dependence among measurements. Based on the closeness of the diagonal value magnitude to the off-diagonal entries, the choice of independent measurements is made. The reconstruction results obtained using all measurements were compared to those obtained using only the independent measurements in both numerical and experimental phantom cases. The traditional singular value analysis was also performed for comparison with the proposed method. Results: The results indicate that choosing only independent measurements based on the data-resolution matrix characteristics for image reconstruction does not compromise the reconstructed image quality significantly, and in turn reduces the data-collection time associated with the procedure. When the same number of measurements (equivalent to the independent ones) were chosen at random, the reconstruction results had poor quality, with major boundary artifacts. The number of independent measurements obtained using the data-resolution matrix analysis is much higher than that obtained using singular value analysis. Conclusions: The data-resolution matrix analysis is able to provide the high level of optimization needed for effective data collection in diffuse optical imaging. The analysis itself is independent of the noise characteristics in the data, resulting in a universal framework to characterize and optimize a given data-collection strategy. (C) 2012 American Association of Physicists in Medicine. [http://dx.doi.org/10.1118/1.4736820]
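For a linearized forward model d ≈ J m reconstructed with Tikhonov regularization, the data-resolution matrix takes the standard form N = J (J^T J + lambda*I)^(-1) J^T, so that the predicted data are d_pred = N d. The Python sketch below illustrates this computation together with a simplified diagonal-versus-off-diagonal selection rule; it is an illustration of the idea, not the exact criterion or code used in the paper.

```python
# Sketch of the data-resolution matrix for a Tikhonov-regularized linear
# forward model d ~ J m; the selection rule is a simplified stand-in for
# the criterion described in the abstract.
import numpy as np

def data_resolution_matrix(J, lam):
    """N maps measured data to predicted data: d_pred = J (J^T J + lam I)^-1 J^T d."""
    n_params = J.shape[1]
    inverse = np.linalg.inv(J.T @ J + lam * np.eye(n_params))
    return J @ inverse @ J.T

def pick_independent_measurements(J, lam=1e-2):
    N = data_resolution_matrix(J, lam)
    diag = np.diag(N)
    # Keep a measurement if its diagonal entry exceeds the largest
    # off-diagonal magnitude in its row (a simplified independence test).
    off_diag = np.abs(N - np.diag(diag)).max(axis=1)
    return np.where(diag > off_diag)[0]

rng = np.random.default_rng(0)
J = rng.standard_normal((40, 15))        # 40 measurements, 15 parameters
J[1] = J[0]                              # duplicate a row to create redundancy
print(pick_independent_measurements(J))
```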
Abstract:
Computational grids with multiple batch systems (batch grids) can be powerful infrastructures for executing long-running multi-component parallel applications. In this paper, we evaluate the potential improvements in throughput of long-running multi-component applications when the different components of the applications are executed on multiple batch systems of batch grids. We compare the multiple-batch executions with executions of the components on a single batch system, without increasing the number of processors used for the executions. We perform our analysis with a prominent long-running multi-component application for climate modeling, the Community Climate System Model (CCSM). We have built a robust simulator that models the characteristics of both the multi-component application and the batch systems. By conducting a large number of simulations with different workload characteristics and queuing policies of the systems, processor allocations to components of the application, distributions of the components to the batch systems, and inter-cluster bandwidths, we show that multiple-batch executions lead to a 55% average increase in throughput over single-batch executions for long-running CCSM. We also conducted real experiments with a practical middleware infrastructure and showed that multi-site executions lead to effective utilization of batch systems for executions of CCSM and give higher simulation throughput than single-site executions. Copyright (c) 2011 John Wiley & Sons, Ltd.
Abstract:
Background and Purpose: Withanolides are naturally occurring chemical compounds. They are secondary metabolites produced via oxidation of steroids and structurally consist of a steroid backbone bound to a lactone or its derivatives. They are known to protect plants against herbivores and have medicinal value, including anti-inflammatory, anti-cancer, adaptogenic and anti-oxidant effects. Withaferin A (Wi-A) and Withanone (Wi-N) are two structurally similar withanolides isolated from Withania somnifera, also known as Ashwagandha in Indian Ayurvedic medicine. Ashwagandha alcoholic leaf extract (i-Extract), rich in Wi-N, was shown to kill cancer cells selectively. Furthermore, the two closely related purified phytochemicals, Wi-A and Wi-N, showed differential activity in normal and cancer human cells in vitro and in vivo. We had earlier identified several genes involved in the cytotoxicity of i-Extract in human cancer cells by loss-of-function assays using either siRNA or a randomized ribozyme library. Methodology/Principal Findings: In the present study, we have employed bioinformatics tools on four genes, i.e., mortalin, p53, p21 and Nrf2, identified by the loss-of-function screenings. We examined the docking efficacy of Wi-N and Wi-A against each of the four targets and found that the two closely related phytochemicals have differential binding properties to the selected cellular targets that can potentially give rise to differential molecular effects. We validated these findings by undertaking parallel experiments on specific gene responses to either Wi-N or Wi-A in human normal and cancer cells. We demonstrate that Wi-A, which binds strongly to the selected targets, acts as a strong cytotoxic agent for both normal and cancer cells. Wi-N, on the other hand, binds weakly to the targets; it showed milder cytotoxicity towards cancer cells and was safe for normal cells. The present molecular docking analyses and experimental evidence provide important insights into the use of Wi-A and Wi-N for cancer treatment and the development of new anti-cancer phytochemical cocktails.
Abstract:
Effective sharing of the last-level cache has a significant influence on the overall performance of a multicore system. We observe that existing solutions control cache occupancy at a coarse granularity, do not scale well to large core counts, and in some cases lack the flexibility to support a variety of performance goals. In this paper, we propose Probabilistic Shared Cache Management (PriSM), a framework that manages the cache occupancy of different cores at cache-block granularity by controlling their eviction probabilities. The proposed framework requires only simple hardware changes to implement, can scale to larger core counts, and is flexible enough to support a variety of performance goals. We demonstrate the flexibility of PriSM by computing the eviction probabilities needed to achieve goals such as hit maximization, fairness and QOS. PriSM-HitMax improves performance by 18.7% over LRU and 11.8% over previously proposed schemes on a sixteen-core machine. PriSM-Fairness improves fairness over existing solutions by 23.3%, along with a performance improvement of 19.0%. PriSM-QOS successfully achieves the desired QOS targets.
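As a rough illustration of occupancy control through eviction probabilities, the Python sketch below picks a victim core in proportion to how far each core's current occupancy exceeds its target share. This simple rule is an assumed stand-in for the idea, not the eviction-probability computation actually used in PriSM.

```python
# Simplified sketch of occupancy control via eviction probabilities; the
# "evict from over-occupied cores" rule below is illustrative only.
import random

def eviction_probabilities(current, target):
    """current, target: dicts core -> fraction of cache occupied / desired.
    Returns the probability of choosing each core's block as the victim."""
    excess = {c: max(0.0, current[c] - target[c]) for c in current}
    total = sum(excess.values())
    if total == 0.0:                               # nobody over target:
        return dict(current)                       # evict in proportion to occupancy
    return {c: e / total for c, e in excess.items()}

def pick_victim_core(current, target, rng=random.Random(1)):
    probs = eviction_probabilities(current, target)
    cores, weights = zip(*probs.items())
    return rng.choices(cores, weights=weights)[0]

current = {"core0": 0.50, "core1": 0.30, "core2": 0.20}
target  = {"core0": 0.25, "core1": 0.25, "core2": 0.50}
print(pick_victim_core(current, target))   # most likely 'core0'
```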
Abstract:
Ensuring reliable operation over an extended period of time is one of the biggest challenges facing present-day electronic systems. The increased vulnerability of components to atmospheric particle strikes poses a major threat to attaining the reliability required for various mission-critical applications. Various soft error mitigation methodologies exist to address this reliability challenge. A general solution to this problem is to arrive at a soft error mitigation methodology with an acceptable implementation overhead and error tolerance level. This implementation overhead can then be reduced by taking advantage of various derating effects, such as logical derating, electrical derating and timing window derating, and/or by making use of application redundancy, e.g., redundancy in the firmware/software executing on the robust hardware so designed. In this paper, we analyze the impact of various derating factors and show how they can be profitably employed to reduce the hardware overhead needed to implement a given level of soft error robustness. This analysis is performed on a set of benchmark circuits using the delayed capture methodology. Experimental results show up to 23% reduction in hardware overhead when considering individual and combined derating factors.
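A quick way to see why derating matters is the common simplification of treating the derating factors as independent multiplicative masking probabilities applied to the raw soft-error rate. The Python snippet below uses made-up numbers and is only a back-of-the-envelope illustration, not the paper's analysis.

```python
# Back-of-the-envelope sketch: combining derating (masking) factors to
# estimate how much of the raw soft-error rate actually matters. Treating
# the factors as independent and multiplicative is a common simplification.
def effective_ser(raw_ser_fit, logical_derating, electrical_derating,
                  timing_window_derating):
    """Each derating factor is the fraction of raw faults that survives
    that masking mechanism (1.0 = no masking)."""
    return (raw_ser_fit * logical_derating * electrical_derating
            * timing_window_derating)

raw = 1000.0                       # raw SER in FIT (failures per 1e9 hours)
print(effective_ser(raw, logical_derating=0.4,
                    electrical_derating=0.6,
                    timing_window_derating=0.5))   # -> 120.0 FIT
```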