166 resultados para execution traces


Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this work, the fracture behavior of magnesium single crystals is studied by conducting experiments with notched three point bend specimens of three crystallographic orientations. In the first and second orientations, the c-axis is along the normal to the flat surface of the notch, while in the third it is aligned with the notch front. For all the orientations, in situ electron back scattered diffraction observations made around the notch root show profuse tensile twinning of {10 (1) over bar2} type. Further, in the first two orientations basal and prismatic slip traces are identified from optical metallography. The width of the most prominent twin saturates at around 120-150 mu m, while twins continue to nucleate farther away to accommodate plastic deformation. In all the orientations, crack initiation occurs before the attainment of peak load and the crack grows stably along twin-matrix interface before deflecting at twin-twin intersections. Results show that profuse tensile twinning is an important energy dissipating mechanism that enhances the fracture toughness. (C) 2013 Elsevier B.V. All rights reserved.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Many meteorological phenomena occur at different locations simultaneously. These phenomena vary temporally and spatially. It is essential to track these multiple phenomena for accurate weather prediction. Efficient analysis require high-resolution simulations which can be conducted by introducing finer resolution nested simulations, nests at the locations of these phenomena. Simultaneous tracking of these multiple weather phenomena requires simultaneous execution of the nests on different subsets of the maximum number of processors for the main weather simulation. Dynamic variation in the number of these nests require efficient processor reallocation strategies. In this paper, we have developed strategies for efficient partitioning and repartitioning of the nests among the processors. As a case study, we consider an application of tracking multiple organized cloud clusters in tropical weather systems. We first present a parallel data analysis algorithm to detect such clouds. We have developed a tree-based hierarchical diffusion method which reallocates processors for the nests such that the redistribution cost is less. We achieve this by a novel tree reorganization approach. We show that our approach exhibits up to 25% lower redistribution cost and 53% lesser hop-bytes than the processor reallocation strategy that does not consider the existing processor allocation.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Using a recently developed strong-coupling method, we present a comprehensive theory for doublon production processes in modulation spectroscopy of a three-dimensional system of ultracold fermionic atoms in an optical lattice with a trap. The theoretical predictions compare well to the experimental time traces of doublon production. For experimentally feasible conditions, we provide a quantitative prediction for the presence of a nonlinear ``two-photon'' excitation at strong modulation amplitudes.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Knowledge of protein-ligand interactions is essential to understand several biological processes and important for applications ranging from understanding protein function to drug discovery and protein engineering. Here, we describe an algorithm for the comparison of three-dimensional ligand-binding sites in protein structures. A previously described algorithm, PocketMatch (version 1.0) is optimised, expanded, and MPI-enabled for parallel execution. PocketMatch (version 2.0) rapidly quantifies binding-site similarity based on structural descriptors such as residue nature and interatomic distances. Atomic-scale alignments may also be obtained from amino acid residue pairings generated. It allows an end-user to compute database-wide, all-to-all comparisons in a matter of hours. The use of our algorithm on a sample dataset, performance-analysis, and annotated source code is also included.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In celebrating Professor C. N. R. Rao's 80th birthday, this article recalls his singular contributions to solid state and materials chemistry for about sixty years. In so doing, the article also traces the growth of the field as a central domain of research in chemical sciences from its early origins in Europe. Although Rao's major work lies in solid state and materials chemistry - a field which he started and nurtured in India while its importance was being recognized internationally - his contributions to other areas of chemistry (and physics), viz., molecular spectroscopy, phase transitions, fullerenes, graphene, nanomaterials and multiferroics are equally significant. Illustrative examples of his work devoted to rare earth and transition metal oxides, defects and nonstoichiometry, metal-insulator transitions, investigation of crystal and electronic structures of a variety of solids by means of electron microscopies and photoelectron spectroscopy, superconducting cuprates, magnetoresistive manganites, multiferroic metal oxides of various structures and, last but not the least, development of new strategies for chemical synthesis of a wide variety of solids including nanomaterials and framework solids in different dimensionalities, are highlighted. The article also captures his exemplary role as a science teacher, science educationist and institution builder in post-Independence India.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Programming for parallel architectures that do not have a shared address space is extremely difficult due to the need for explicit communication between memories of different compute devices. A heterogeneous system with CPUs and multiple GPUs, or a distributed-memory cluster are examples of such systems. Past works that try to automate data movement for distributed-memory architectures can lead to excessive redundant communication. In this paper, we propose an automatic data movement scheme that minimizes the volume of communication between compute devices in heterogeneous and distributed-memory systems. We show that by partitioning data dependences in a particular non-trivial way, one can generate data movement code that results in the minimum volume for a vast majority of cases. The techniques are applicable to any sequence of affine loop nests and works on top of any choice of loop transformations, parallelization, and computation placement. The data movement code generated minimizes the volume of communication for a particular configuration of these. We use a combination of powerful static analyses relying on the polyhedral compiler framework and lightweight runtime routines they generate, to build a source-to-source transformation tool that automatically generates communication code. We demonstrate that the tool is scalable and leads to substantial gains in efficiency. On a heterogeneous system, the communication volume is reduced by a factor of 11X to 83X over state-of-the-art, translating into a mean execution time speedup of 1.53X. On a distributed-memory cluster, our scheme reduces the communication volume by a factor of 1.4X to 63.5X over state-of-the-art, resulting in a mean speedup of 1.55X. In addition, our scheme yields a mean speedup of 2.19X over hand-optimized UPC codes.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Precise pointer analysis is a problem of interest to both the compiler and the program verification community. Flow-sensitivity is an important dimension of pointer analysis that affects the precision of the final result computed. Scaling flow-sensitive pointer analysis to millions of lines of code is a major challenge. Recently, staged flow-sensitive pointer analysis has been proposed, which exploits a sparse representation of program code created by staged analysis. In this paper we formulate the staged flow-sensitive pointer analysis as a graph-rewriting problem. Graph-rewriting has already been used for flow-insensitive analysis. However, formulating flow-sensitive pointer analysis as a graph-rewriting problem adds additional challenges due to the nature of flow-sensitivity. We implement our parallel algorithm using Intel Threading Building Blocks and demonstrate considerable scaling (upto 2.6x) for 8 threads on a set of 10 benchmarks. Compared to the sequential implementation of staged flow-sensitive analysis, a single threaded execution of our implementation performs better in 8 of the benchmarks.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper we present a framework for realizing arbitrary instruction set extensions (IE) that are identified post-silicon. The proposed framework has two components viz., an IE synthesis methodology and the architecture of a reconfigurable data-path for realization of the such IEs. The IE synthesis methodology ensures maximal utilization of resources on the reconfigurable data-path. In this context we present the techniques used to realize IEs for applications that demand high throughput or those that must process data streams. The reconfigurable hardware called HyperCell comprises a reconfigurable execution fabric. The fabric is a collection of interconnected compute units. A typical use case of HyperCell is where it acts as a co-processor with a host and accelerates execution of IEs that are defined post-silicon. We demonstrate the effectiveness of our approach by evaluating the performance of some well-known integer kernels that are realized as IEs on HyperCell. Our methodology for realizing IEs through HyperCells permits overlapping of potentially all memory transactions with computations. We show significant improvement in performance for streaming applications over general purpose processor based solutions, by fully pipelining the data-path. (C) 2014 Elsevier B.V. All rights reserved.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A new representation of spatio-temporal random processes is proposed in this work. In practical applications, such processes are used to model velocity fields, temperature distributions, response of vibrating systems, to name a few. Finding an efficient representation for any random process leads to encapsulation of information which makes it more convenient for a practical implementations, for instance, in a computational mechanics problem. For a single-parameter process such as spatial or temporal process, the eigenvalue decomposition of the covariance matrix leads to the well-known Karhunen-Loeve (KL) decomposition. However, for multiparameter processes such as a spatio-temporal process, the covariance function itself can be defined in multiple ways. Here the process is assumed to be measured at a finite set of spatial locations and a finite number of time instants. Then the spatial covariance matrix at different time instants are considered to define the covariance of the process. This set of square, symmetric, positive semi-definite matrices is then represented as a third-order tensor. A suitable decomposition of this tensor can identify the dominant components of the process, and these components are then used to define a closed-form representation of the process. The procedure is analogous to the KL decomposition for a single-parameter process, however, the decompositions and interpretations vary significantly. The tensor decompositions are successfully applied on (i) a heat conduction problem, (ii) a vibration problem, and (iii) a covariance function taken from the literature that was fitted to model a measured wind velocity data. It is observed that the proposed representation provides an efficient approximation to some processes. Furthermore, a comparison with KL decomposition showed that the proposed method is computationally cheaper than the KL, both in terms of computer memory and execution time.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Polyhedral techniques for program transformation are now used in several proprietary and open source compilers. However, most of the research on polyhedral compilation has focused on imperative languages such as C, where the computation is specified in terms of statements with zero or more nested loops and other control structures around them. Graphical dataflow languages, where there is no notion of statements or a schedule specifying their relative execution order, have so far not been studied using a powerful transformation or optimization approach. The execution semantics and referential transparency of dataflow languages impose a different set of challenges. In this paper, we attempt to bridge this gap by presenting techniques that can be used to extract polyhedral representation from dataflow programs and to synthesize them from their equivalent polyhedral representation. We then describe PolyGLoT, a framework for automatic transformation of dataflow programs which we built using our techniques and other popular research tools such as Clan and Pluto. For the purpose of experimental evaluation, we used our tools to compile LabVIEW, one of the most widely used dataflow programming languages. Results show that dataflow programs transformed using our framework are able to outperform those compiled otherwise by up to a factor of seventeen, with a mean speed-up of 2.30x while running on an 8-core Intel system.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The 2004 earthquake left several traces of coseismic land deformation and tsunami deposits, both on the islands along the plate boundary and distant shores of the Indian Ocean rim countries. Researchers are now exploring these sites to develop a chronology of past events. Where the coastal regions are also inundated by storm surges, there is an additional challenge to discriminate between the deposits formed by these two processes. Paleo-tsunami research relies largely on finding deposits where preservation potential is high and storm surge origin can be excluded. During the past decade of our work along the Andaman and Nicobar Islands and the east coast of India, we have observed that the 2004 tsunami deposits are best preserved in lagoons, inland streams and also on elevated terraces. Chronological evidence for older events obtained from such sites is better correlated with those from Thailand, Sri Lanka and Indonesia, reiterating their usefulness in tsunami geology studies. (C) 2014 Elsevier Ltd. All rights reserved.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Visual tracking is an important task in various computer vision applications including visual surveillance, human computer interaction, event detection, video indexing and retrieval. Recent state of the art sparse representation (SR) based trackers show better robustness than many of the other existing trackers. One of the issues with these SR trackers is low execution speed. The particle filter framework is one of the major aspects responsible for slow execution, and is common to most of the existing SR trackers. In this paper,(1) we propose a robust interest point based tracker in l(1) minimization framework that runs at real-time with performance comparable to the state of the art trackers. In the proposed tracker, the target dictionary is obtained from the patches around target interest points. Next, the interest points from the candidate window of the current frame are obtained. The correspondence between target and candidate points is obtained via solving the proposed l(1) minimization problem. In order to prune the noisy matches, a robust matching criterion is proposed, where only the reliable candidate points that mutually match with target and candidate dictionary elements are considered for tracking. The object is localized by measuring the displacement of these interest points. The reliable candidate patches are used for updating the target dictionary. The performance and accuracy of the proposed tracker is benchmarked with several complex video sequences. The tracker is found to be considerably fast as compared to the reported state of the art trackers. The proposed tracker is further evaluated for various local patch sizes, number of interest points and regularization parameters. The performance of the tracker for various challenges including illumination change, occlusion, and background clutter has been quantified with a benchmark dataset containing 50 videos. (C) 2014 Elsevier B.V. All rights reserved.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Task-parallel languages are increasingly popular. Many of them provide expressive mechanisms for intertask synchronization. For example, OpenMP 4.0 will integrate data-driven execution semantics derived from the StarSs research language. Compared to the more restrictive data-parallel and fork-join concurrency models, the advanced features being introduced into task-parallelmodels in turn enable improved scalability through load balancing, memory latency hiding, mitigation of the pressure on memory bandwidth, and, as a side effect, reduced power consumption. In this article, we develop a systematic approach to compile loop nests into concurrent, dynamically constructed graphs of dependent tasks. We propose a simple and effective heuristic that selects the most profitable parallelization idiom for every dependence type and communication pattern. This heuristic enables the extraction of interband parallelism (cross-barrier parallelism) in a number of numerical computations that range from linear algebra to structured grids and image processing. The proposed static analysis and code generation alleviates the burden of a full-blown dependence resolver to track the readiness of tasks at runtime. We evaluate our approach and algorithms in the PPCG compiler, targeting OpenStream, a representative dataflow task-parallel language with explicit intertask dependences and a lightweight runtime. Experimental results demonstrate the effectiveness of the approach.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Matroidal networks were introduced by Dougherty et al. and have been well studied in the recent past. It was shown that a network has a scalar linear network coding solution if and only if it is matroidal associated with a representable matroid. A particularly interesting feature of this development is the ability to construct (scalar and vector) linearly solvable networks using certain classes of matroids. Furthermore, it was shown through the connection between network coding and matroid theory that linear network coding is not always sufficient for general network coding scenarios. The current work attempts to establish a connection between matroid theory and network-error correcting and detecting codes. In a similar vein to the theory connecting matroids and network coding, we abstract the essential aspects of linear network-error detecting codes to arrive at the definition of a matroidal error detecting network (and similarly, a matroidal error correcting network abstracting from network-error correcting codes). An acyclic network (with arbitrary sink demands) is then shown to possess a scalar linear error detecting (correcting) network code if and only if it is a matroidal error detecting (correcting) network associated with a representable matroid. Therefore, constructing such network-error correcting and detecting codes implies the construction of certain representable matroids that satisfy some special conditions, and vice versa. We then present algorithms that enable the construction of matroidal error detecting and correcting networks with a specified capability of network-error correction. Using these construction algorithms, a large class of hitherto unknown scalar linearly solvable networks with multisource, multicast, and multiple-unicast network-error correcting codes is made available for theoretical use and practical implementation, with parameters, such as number of information symbols, number of sinks, number of coding nodes, error correcting capability, and so on, being arbitrary but for computing power (for the execution of the algorithms). The complexity of the construction of these networks is shown to be comparable with the complexity of existing algorithms that design multicast scalar linear network-error correcting codes. Finally, we also show that linear network coding is not sufficient for the general network-error correction (detection) problem with arbitrary demands. In particular, for the same number of network errors, we show a network for which there is a nonlinear network-error detecting code satisfying the demands at the sinks, whereas there are no linear network-error detecting codes that do the same.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Designing and implementing thread-safe multithreaded libraries can be a daunting task as developers of these libraries need to ensure that their implementations are free from concurrency bugs, including deadlocks. The usual practice involves employing software testing and/or dynamic analysis to detect. deadlocks. Their effectiveness is dependent on well-designed multithreaded test cases. Unsurprisingly, developing multithreaded tests is significantly harder than developing sequential tests for obvious reasons. In this paper, we address the problem of automatically synthesizing multithreaded tests that can induce deadlocks. The key insight to our approach is that a subset of the properties observed when a deadlock manifests in a concurrent execution can also be observed in a single threaded execution. We design a novel, automatic, scalable and directed approach that identifies these properties and synthesizes a deadlock revealing multithreaded test. The input to our approach is the library implementation under consideration and the output is a set of deadlock revealing multithreaded tests. We have implemented our approach as part of a tool, named OMEN1. OMEN is able to synthesize multithreaded tests on many multithreaded Java libraries. Applying a dynamic deadlock detector on the execution of the synthesized tests results in the detection of a number of deadlocks, including 35 real deadlocks in classes documented as thread-safe. Moreover, our experimental results show that dynamic analysis on multithreaded tests that are either synthesized randomly or developed by third-party programmers are ineffective in detecting the deadlocks.