965 results for Survey Programs.
Abstract:
Dynamic Voltage and Frequency Scaling (DVFS) offers huge potential for designing trade-offs involving energy, power, temperature and performance of computing systems. In this paper, we evaluate three different DVFS schemes (our extension to stream programs of a Petri net performance model based DVFS method for sequential programs, a simple profile-based Linear Scaling method, and an existing hardware-based DVFS method for multithreaded applications) using multithreaded stream applications in a full-system Chip Multiprocessor (CMP) simulator. From our evaluation, we find that the software-based methods achieve significant Energy/Throughput² (ET⁻²) improvements. The hardware-based scheme degrades performance heavily and suffers ET⁻² loss. Our results indicate that the simple profile-based scheme achieves the benefits of the complex Petri net based scheme for stream programs, and present a strong case for independent voltage/frequency control for different cores of CMPs, which is lacking in most state-of-the-art CMPs. This is in contrast to the conclusions of a recent evaluation of per-core DVFS schemes for multithreaded applications on CMPs.
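To make the Energy/Throughput² metric named in this abstract concrete, the following is a minimal sketch of how it could be computed and compared between two configurations. The function names and the sample numbers are illustrative assumptions, not values or code from the paper; lower ET⁻² is taken to be better.

```python
def et2(energy_joules: float, throughput: float) -> float:
    """Energy divided by throughput squared (ET^-2); lower is better."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return energy_joules / throughput ** 2


def et2_improvement(baseline: float, scaled: float) -> float:
    """Fractional reduction in ET^-2 relative to the baseline."""
    return 1.0 - scaled / baseline


# Hypothetical measurements: a baseline run and a run with DVFS applied.
baseline = et2(energy_joules=100.0, throughput=10.0)
with_dvfs = et2(energy_joules=70.0, throughput=9.5)
print(f"ET^-2 improvement: {et2_improvement(baseline, with_dvfs):.1%}")
```

Because throughput is squared, a scheme that saves energy but loses a lot of throughput can still come out worse on this metric, which is consistent with the loss reported for the hardware-based scheme.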
Abstract:
Memory models for shared-memory concurrent programming languages typically guarantee sequential consistency (SC) semantics for datarace-free (DRF) programs, while providing very weak or no guarantees for non-DRF programs. In effect, programmers are expected to write only DRF programs, which are then executed with SC semantics. With this in mind, we propose a novel scalable solution for dataflow analysis of concurrent programs, which is proved to be sound for DRF programs with SC semantics. We use the synchronization structure of the program to propagate dataflow information among threads without having to consider all interleavings explicitly. Given a dataflow analysis that is sound for sequential programs and meets certain criteria, our technique automatically converts it to an analysis for concurrent programs.
Abstract:
Many applications, such as software for processing customer records in telecom, patient records in hospitals, or a single email in a mailbox, need to access a single record in a database consisting of millions of records. A basic feature of these applications is that they need to access data sets which are very large but simple. Cloud computing meets the computing requirements of this new generation of applications involving very large data sets, which cannot be handled efficiently using traditional computing infrastructure. In this paper, we describe the storage services provided by three well-known cloud service providers and compare their features, with a view to characterizing the storage requirements of very large data sets. We hope that this will act as a catalyst for the design of storage services for very large data sets in future. We also give a brief overview of other kinds of storage that have emerged in the recent past for cloud computing.
Abstract:
MATLAB is an array language that was initially popular for rapid prototyping but is now increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control flow dominated scalar regions that have an impact on the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput oriented accelerators such as graphics processing units (GPUs). Thus, an approach that maps the control flow dominated regions to the CPU and the data parallel regions to the GPU can significantly improve program performance. In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or GPU so that kernel execution on the CPU and the GPU happens synergistically and the amount of data transfer needed is minimized. In order to ensure required data movement for dependencies across basic blocks, we propose a data flow analysis and edge splitting strategy. Thus our compiler automatically handles composition of kernels, mapping of kernels to CPU and GPU, scheduling and insertion of required data transfer. The proposed compiler was implemented and experimental evaluation using a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data parallel benchmarks over native execution of MATLAB.
Abstract:
With the proliferation of chip multicores (CMPs) on desktops and embedded platforms, multi-threaded programs have become ubiquitous. The existence of multiple threads may cause contention for resources such as the on-chip shared cache and interconnects, depending upon how the threads access those resources. Hence, we propose a tool, Thread Contention Predictor (TCP), to help quantify the number of threads sharing data and their sharing pattern. We demonstrate its use to predict a more profitable shared, last level on-chip cache (LLC) access policy on CMPs. Our cache configuration predictor is 2.2 times faster than cycle-accurate simulations. We also demonstrate its use for identifying hot data structures in a program which may cause performance degradation due to false data sharing. We fix the layout of such data structures and show up to 10% and 18% improvement in execution time and energy-delay product (EDP), respectively.
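The false data sharing mentioned in this abstract can be illustrated with a small layout check. This is only a sketch under stated assumptions: a 64-byte cache line, a hypothetical description of fields as (name, byte offset, owning thread), and a naive padding fix. It is not the TCP tool or the authors' layout transformation.

```python
CACHE_LINE = 64  # bytes; an assumed, typical line size


def cache_line_of(offset: int, line_size: int = CACHE_LINE) -> int:
    """Index of the cache line that holds the byte at `offset`."""
    return offset // line_size


def falsely_shared(fields, line_size: int = CACHE_LINE):
    """Return pairs of field names that sit on the same cache line
    but are owned (written) by different threads."""
    pairs = []
    for i, (name_a, off_a, thread_a) in enumerate(fields):
        for name_b, off_b, thread_b in fields[i + 1:]:
            same_line = cache_line_of(off_a, line_size) == cache_line_of(off_b, line_size)
            if same_line and thread_a != thread_b:
                pairs.append((name_a, name_b))
    return pairs


def pad_layout(fields, line_size: int = CACHE_LINE):
    """Naive fix: place every field at the start of its own cache line."""
    return [(name, i * line_size, thread)
            for i, (name, _, thread) in enumerate(fields)]


# Two per-thread counters packed 8 bytes apart (hypothetical layout).
packed = [("counter_a", 0, 0), ("counter_b", 8, 1)]
print(falsely_shared(packed))
print(falsely_shared(pad_layout(packed)))
```

Padding trades memory for isolation; a real fix would usually pad or regroup only the hot fields identified by profiling.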
Abstract:
Large software systems are developed by composing multiple programs. If the programs manipulate and exchange complex data, such as network packets or files, it is essential to establish that they follow compatible data formats. Most of the complexity of data formats is associated with the headers. In this paper, we address compatibility of programs operating over headers of network packets, files, images, etc. As format specifications are rarely available, we infer the format associated with headers by a program as a set of guarded layouts. In terms of these formats, we define and check compatibility of (a) producer-consumer programs and (b) different versions of producer (or consumer) programs. A compatible producer-consumer pair is free of type mismatches and logical incompatibilities such as the consumer rejecting valid outputs generated by the producer. A backward compatible producer (resp. consumer) is guaranteed to be compatible with consumers (resp. producers) that were compatible with its older version. With our prototype tool, we identified 5 known bugs and 1 potential bug in (a) sender-receiver modules of Linux network drivers of 3 vendors and (b) different versions of a TIFF image library.
Abstract:
We present a study of the environments of extended radio sources in the Australia Telescope Low-Brightness Survey (ATLBS). The radio sources were selected from the ATLBS Extended Source Sample, which is a well defined sample containing the most extended radio sources in the ATLBS sky survey regions. The environments were analysed using 4-m Cerro-Tololo Inter-American Observatory Blanco telescope observations carried out for ATLBS fields in the Sloan Digital Sky Survey r′ band. We have estimated the properties of the environments using smoothed density maps derived from galaxy catalogues constructed using these optical imaging data. The angular distribution of galaxy density relative to the axes of the radio sources has been quantified by defining anisotropy parameters that are estimated using a new method presented here. Examining the anisotropy parameters for a subsample of extended double radio sources that includes all sources with pronounced asymmetry in lobe extents, we find good evidence for environmental anisotropy being the dominant cause for lobe asymmetry in that higher galaxy density occurs almost always on the side of the shorter lobe, and this validates the usefulness of the method proposed and adopted here. The environmental anisotropy parameters have been used to examine and compare the environments of Fanaroff-Riley Class I (FRI) and Fanaroff-Riley Class II (FRII) radio sources in two redshift regimes (z < 0.5 and z > 0.5). Wide-angle tail sources and head-tail sources lie in the most overdense environments. The head-tail source environments (for the HT sources in our sample) display dipolar anisotropy in that higher galaxy density appears to lie in the direction of the tails. Excluding the head-tail and wide-angle tail sources, subsamples of FRI and FRII sources from the ATLBS appear to lie in similar moderately overdense environments, with no evidence for redshift evolution in the regimes studied herein.
Abstract:
Culturally protected forest patches, or sacred groves, have been an integral part of many traditional societies. This age-old tradition is a classic instance of community-driven nature conservation, sheltering native biodiversity and supporting various ecosystem functions, particularly hydrology. The current work in the Central Western Ghats of Karnataka, India, highlights that even small sacred groves amidst humanised landscapes serve as tiny islands of biodiversity, especially of rare and endemic species. Temporal analysis of land use dynamics reveals the changing pattern of the studied landscape. Forest cover has declined rapidly (from 15.14% to 11.02%) in the last 20 years to meet the demand for agricultural land and plantation programs. A thorough survey and assessment of woody endemic species distribution in the 25 km² study area documented the presence of 19 endemic species. The distribution of these species is highly skewed towards the culturally protected patches in comparison to other land use elements. Among the 19 woody endemic species, those with greater ecological amplitude are widely distributed in the studied landscape, in groves as well as other land use forms, whereas natural populations of the sensitive endemics are largely restricted to the sacred grove fragments. The recent degradation of the sacred grove system is perhaps due to the weakening of traditional belief systems and associated laxity in grove protection, leading to biotic disturbances. Revitalisation of traditional practices related to the conservation of sacred groves can go a long way in strengthening the natural ecological systems of fragile humid tropical landscapes.
Abstract:
With the renewed interest in vector-like fermion extensions of the Standard Model, we present here a study of multiple vector-like theories and their phenomenological implications. Our focus is mostly on minimal flavor conserving theories that couple the vector-like fermions to the SM gauge fields and mix only weakly with SM fermions so as to avoid flavor problems. We present calculations for precision electroweak and vector-like state decays, which are needed to investigate compatibility with currently known data. We investigate the impact of vector-like fermions on Higgs boson production and decay, including loop contributions, in a wide variety of vector-like extensions and their parameter spaces.
Abstract:
We propose a new approach for producing precise constrained slices of programs in a language such as C. We build upon a previous approach to this problem that is based on term-rewriting, primarily targets loop-free fragments, and is fully precise in this setting. We incorporate abstract interpretation into term-rewriting, using a given arbitrary abstract lattice, resulting in a novel technique for slicing loops whose precision is linked to the power of the given abstract lattice. We address pointers in a first-class manner, including when they are used within loops to traverse and update recursive data structures. Finally, we illustrate the comparative precision of our slices over those of previous approaches using representative examples.
Abstract:
Task-parallel languages are increasingly popular. Many of them provide expressive mechanisms for intertask synchronization. For example, OpenMP 4.0 will integrate data-driven execution semantics derived from the StarSs research language. Compared to the more restrictive data-parallel and fork-join concurrency models, the advanced features being introduced into task-parallel models in turn enable improved scalability through load balancing, memory latency hiding, mitigation of the pressure on memory bandwidth, and, as a side effect, reduced power consumption. In this article, we develop a systematic approach to compile loop nests into concurrent, dynamically constructed graphs of dependent tasks. We propose a simple and effective heuristic that selects the most profitable parallelization idiom for every dependence type and communication pattern. This heuristic enables the extraction of interband parallelism (cross-barrier parallelism) in a number of numerical computations that range from linear algebra to structured grids and image processing. The proposed static analysis and code generation alleviates the burden of a full-blown dependence resolver to track the readiness of tasks at runtime. We evaluate our approach and algorithms in the PPCG compiler, targeting OpenStream, a representative dataflow task-parallel language with explicit intertask dependences and a lightweight runtime. Experimental results demonstrate the effectiveness of the approach.
Abstract:
Clock synchronization in wireless sensor networks (WSNs) assures that sensor nodes have the same reference clock time. This is necessary not only for various WSN applications but also for many system level protocols for WSNs such as MAC protocols, and protocols for sleep scheduling of sensor nodes. Clock value of a node at a particular instant of time depends on its initial value and the frequency of the crystal oscillator used in the sensor node. The frequency of the crystal oscillator varies from node to node, and may also change over time depending upon many factors like temperature, humidity, etc. As a result, clock values of different sensor nodes diverge from each other and also from the real time clock, and hence, there is a requirement for clock synchronization in WSNs. Consequently, many clock synchronization protocols for WSNs have been proposed in the recent past. These protocols differ from each other considerably, and so, there is a need to understand them using a common platform. Towards this goal, this survey paper categorizes the features of clock synchronization protocols for WSNs into three types, viz, structural features, technical features, and global objective features. Each of these categories has different options to further segregate the features for better understanding. The features of clock synchronization protocols that have been used in this survey include all the features which have been used in existing surveys as well as new features such as how the clock value is propagated, when the clock value is propagated, and when the physical clock is updated, which are required for better understanding of the clock synchronization protocols in WSNs in a systematic way. This paper also gives a brief description of a few basic clock synchronization protocols for WSNs, and shows how these protocols fit into the above classification criteria. 
In addition, the recent clock synchronization protocols for WSNs, which are based on the above basic clock synchronization protocols, are also given alongside the corresponding basic clock synchronization protocols. Indeed, the proposed model for characterizing the clock synchronization protocols in WSNs can be used not only for analyzing the existing protocols but also for designing new clock synchronization protocols. © 2014 Elsevier B.V. All rights reserved.
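The abstract's statement that a node's clock value depends on its initial value and on its oscillator frequency can be sketched as a simple linear clock model. The class and field names below, and the 40 ppm drift figure, are illustrative assumptions and are not taken from the survey.

```python
from dataclasses import dataclass


@dataclass
class NodeClock:
    initial_value: float  # clock reading at real time t = 0, in seconds
    rate: float           # actual oscillator frequency / nominal frequency

    def read(self, t: float) -> float:
        """Clock value at real time t under a constant-rate (linear) model."""
        return self.initial_value + self.rate * t


def divergence(a: NodeClock, b: NodeClock, t: float) -> float:
    """Absolute difference between two node clocks at real time t."""
    return abs(a.read(t) - b.read(t))


# Two nodes whose crystals are off by +40 ppm and -40 ppm (assumed values).
fast = NodeClock(initial_value=0.0, rate=1.0 + 40e-6)
slow = NodeClock(initial_value=0.0, rate=1.0 - 40e-6)
print(f"divergence after 1000 s: {divergence(fast, slow, 1000.0) * 1e3:.1f} ms")
```

In practice the rate itself varies with temperature and other factors, as the abstract notes, so a constant-rate model is only a first approximation of why periodic resynchronization is needed.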
Abstract:
This paper lists some references that could in some way be relevant in the context of the real-time computational simulation of biological organs, the research area being defined in a very broad sense. This paper contains 198 references.
Abstract:
Among the various types of α-peptide folding motifs, the δ-turn, which requires a central cis-amide disposition, has been one of the least extensively investigated. In particular, this main-chain reversal topology has been studied in depth neither in linear/cyclic peptides nor in proteins. This Minireview article assembles and critically analyzes relevant data from a literature survey on the δ-turn conformation in those compounds. Unpublished results from recent conformational energy calculations and a preliminary solution-state analysis on a small model peptide, currently ongoing in our laboratories, are also briefly outlined.
Abstract:
Graph algorithms have been shown to possess enough parallelism to keep several computing resources busy, even hundreds of cores on a GPU. Unfortunately, tuning their implementation for efficient execution on a particular hardware configuration of heterogeneous systems consisting of multicore CPUs and GPUs is challenging, time consuming, and error prone. To address these issues, we propose a domain-specific language (DSL), Falcon, for implementing graph algorithms that (i) abstracts the hardware, (ii) provides constructs to write explicitly parallel programs at a higher level, and (iii) can work with general algorithms that may change the graph structure (morph algorithms). We illustrate the usage of our DSL to implement local computation algorithms (that do not change the graph structure) and morph algorithms such as Delaunay mesh refinement, survey propagation, and dynamic SSSP on GPU and multicore CPUs. Using a set of benchmark graphs, we illustrate that the generated code performs close to the state-of-the-art hand-tuned implementations.
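As an illustration of what a local computation algorithm (one that does not change the graph structure) looks like, here is a minimal sequential sketch of worklist-based single-source shortest paths (SSSP) by edge relaxation. It is plain Python, not Falcon code, and the function name and edge-list format are assumptions made for this example; it assumes non-negative edge weights.

```python
import math
from collections import deque


def sssp(num_nodes, edges, source):
    """Worklist-based SSSP by repeated edge relaxation.

    edges: iterable of (u, v, weight) directed edges, weights >= 0.
    Returns a list of shortest distances from `source` (inf if unreachable).
    """
    adjacency = [[] for _ in range(num_nodes)]
    for u, v, w in edges:
        adjacency[u].append((v, w))

    dist = [math.inf] * num_nodes
    dist[source] = 0
    worklist = deque([source])

    while worklist:
        u = worklist.popleft()
        for v, w in adjacency[u]:
            # Relax edge (u, v): keep the shorter of the known and new path.
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                worklist.append(v)  # v's neighbours may now improve too
    return dist


example_edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 5)]
print(sssp(4, example_edges, source=0))  # [0, 3, 1, 8]
```

The relaxations of different worklist items are largely independent, which is the kind of parallelism a GPU or multicore implementation would try to exploit; doing so safely requires atomic minimum updates on `dist`, which this sequential sketch omits.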