29 results for Concurrent programs
Abstract:
We have previously reported that both Ca2+ and staurosporine-sensitive protein kinase(s) are involved in the induction of cucumber chitinase activity and protein content by the cytokinin zeatin (Barwe et al. 2001). To further characterize the signal transduction events involved in this cytokinin induction of chitinase gene expression, Northern hybridizations of total RNA prepared from excised, dark-grown cucumber cotyledons treated with cytokinins and/or various agonists and antagonists of signal transduction components were carried out using a cucumber acidic chitinase (CACHT) cDNA probe (Metraux et al. 1989). CACHT mRNA increased approximately 5- to 6-fold in response to exogenous zeatin (Z), zeatin riboside (ZR), and benzyladenine (BA) treatment, but failed to accumulate in response to kinetin (K). Among the cytokinins tested, Z was the most effective. The Z-induced accumulation of CACHT mRNA was inhibited by the plasma membrane Ca2+ channel blocker verapamil, while treatment of cotyledons with exogenous CaCl2 and the calcium ionophore A23187, in the presence and absence of cytokinin, enhanced CACHT mRNA accumulation. These two observations suggest the participation of extracellular calcium in signaling Z induction. Furthermore, the presence of staurosporine (a protein kinase inhibitor) during Z treatment reduced CACHT mRNA, suggesting the involvement of phosphorylation of one or more cellular proteins. In addition, we provide evidence that Z induction of CACHT mRNA is blocked by treatment with the protein synthesis inhibitor cycloheximide. Taken together, these results suggest that Ca2+ influx from the extracellular space, protein phosphorylation, and concurrent protein synthesis participate in cytokinin signaling during Z-induced CACHT transcript accumulation.
Abstract:
MATLAB is an array language, initially popular for rapid prototyping but now increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism; they also have control-flow-dominated scalar regions that affect execution time. Today's computer systems offer tremendous computing power in the form of traditional CPU cores and throughput-oriented accelerators such as graphics processing units (GPUs). An approach that maps the control-flow-dominated regions to the CPU and the data-parallel regions to the GPU can therefore significantly improve program performance. In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and requires no programmer input for identifying data-parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data-parallel regions of the program and composes them into kernels; the problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or the GPU so that kernel execution on the two devices happens synergistically and the amount of data transfer needed is minimized. To ensure the data movement required by dependences across basic blocks, we propose a data flow analysis and edge splitting strategy. Our compiler thus automatically handles kernel composition, mapping of kernels to the CPU and GPU, scheduling, and insertion of required data transfers. Experimental evaluation of the implemented compiler on a set of MATLAB benchmarks shows a geometric mean speedup of 19.8X for data-parallel benchmarks over native MATLAB execution.
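The abstract does not spell out MEGHA's clustering and mapping algorithms, but the flavor of a transfer-aware CPU/GPU mapping heuristic can be sketched. The cost model, names, and numbers below are illustrative assumptions, not the paper's formulation:

```python
# Minimal sketch of a greedy CPU/GPU kernel-mapping heuristic in the spirit
# of the approach above: each kernel goes to the device that minimizes its
# estimated time plus the cost of transferring inputs produced on the other
# device. All costs here are hypothetical.

def map_kernels(kernels, deps, cpu_time, gpu_time, transfer_cost):
    """kernels: kernel ids in topological order.
    deps: dict kernel -> set of kernels whose outputs it consumes.
    cpu_time/gpu_time: dict kernel -> estimated time on that device.
    transfer_cost: dict kernel -> cost of moving its outputs across the bus."""
    placement = {}
    for k in kernels:
        cpu_cost = cpu_time[k] + sum(transfer_cost[d] for d in deps[k]
                                     if placement[d] == "gpu")
        gpu_cost = gpu_time[k] + sum(transfer_cost[d] for d in deps[k]
                                     if placement[d] == "cpu")
        placement[k] = "cpu" if cpu_cost <= gpu_cost else "gpu"
    return placement

# Example: a small three-kernel pipeline with one GPU-friendly kernel.
print(map_kernels(
    kernels=["k1", "k2", "k3"],
    deps={"k1": set(), "k2": {"k1"}, "k3": {"k2"}},
    cpu_time={"k1": 5.0, "k2": 40.0, "k3": 6.0},
    gpu_time={"k1": 8.0, "k2": 4.0, "k3": 7.0},
    transfer_cost={"k1": 1.0, "k2": 1.0, "k3": 1.0},
))  # -> {'k1': 'cpu', 'k2': 'gpu', 'k3': 'cpu'}
```

A greedy pass like this ignores the global clustering constraints the paper formulates, but it shows why transfer costs can pull a kernel onto the "slower" device.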
Abstract:
Advertisements (ads) are the main revenue earner for television (TV) broadcasters. As TV reaches a large audience, it acts as the best medium for advertising products and services. With the emergence of digital TV, it is important for broadcasters to provide an intelligent service along dimensions such as program features, ad features, viewers' interest, and sponsors' preference. We present an automatic ad recommendation algorithm that selects a set of ads by considering these dimensions and semantically matches them with programs. Features of the ad video are captured in terms of annotations, and they are grouped into a number of predefined semantic categories using a categorization technique. A fuzzy categorical-data clustering technique is then applied to the categorized data to select ads better suited to a particular program. Since the same ad can be recommended for more than one program depending on multiple parameters, fuzzy clustering is well suited to ad recommendation. The relative fuzzy score, called "degree of membership", calculated for each ad indicates the membership of that ad in different program clusters. Subjective evaluation of the algorithm by 10 different people yielded a high success score.
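The "degree of membership" score has a standard reading in fuzzy clustering. A minimal sketch, assuming fuzzy c-means-style weights over a generic ad-to-cluster distance (the paper's actual categorical distance and fuzzifier are not given in the abstract):

```python
# Fuzzy c-means membership weights: an ad's membership in cluster j falls
# off with its relative distance to j versus all other clusters. The
# fuzzifier m and the distances are assumptions for illustration.

def memberships(distances, m=2.0):
    """distances: distances from one ad to each program cluster.
    Returns the ad's fuzzy membership in every cluster (sums to 1)."""
    out = []
    for d_j in distances:
        if d_j == 0.0:  # ad coincides with a cluster center
            return [1.0 if d == 0.0 else 0.0 for d in distances]
        s = sum((d_j / d_k) ** (2.0 / (m - 1.0)) for d_k in distances)
        out.append(1.0 / s)
    return out

# An ad close to the "sports" cluster but not far from "news":
print(memberships([0.4, 0.9, 2.0]))  # -> [~0.81, ~0.16, ~0.03]
```

Because the memberships are graded rather than exclusive, one ad can legitimately rank highly for several program clusters, which is the property the abstract leans on.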
Abstract:
Innate immunity recognizes and resists various pathogens; however, the mechanisms regulating pathogen versus non-pathogen discrimination are still imprecisely understood. Here, we demonstrate that pathogen-specific activation of TLR2 upon infection with Mycobacterium bovis BCG, in comparison with other pathogenic microbes, including Salmonella typhimurium and Staphylococcus aureus, programs macrophages for robust up-regulation of signaling cohorts of Wnt-beta-catenin signaling. Signaling perturbations and genetic approaches suggest that infection-mediated stimulation of Wnt-beta-catenin is vital for activation of Notch1 signaling. Interestingly, inducible NOS (iNOS) activity is pivotal for TLR2-mediated activation of Wnt-beta-catenin signaling, as iNOS(-/-) mice demonstrated a compromised ability to trigger activation of Wnt-beta-catenin signaling as well as Notch1-mediated cellular responses. Intriguingly, TLR2-driven integration of iNOS/NO, Wnt-beta-catenin, and Notch1 signaling contributes to its capacity to regulate the battery of genes associated with Treg cell lineage commitment. These findings reveal a role for differential stimulation of TLR2 in deciding the strength of Wnt-beta-catenin signaling, which together with signals from Notch1 contributes to the modulation of a defined set of effector functions in macrophages, and thus establish a conceptual framework for the development of novel therapeutics.
Abstract:
Dynamic Voltage and Frequency Scaling (DVFS) is a very effective tool for designing trade-offs between energy and performance. In this paper, we use a formal Petri net based program performance model, which directly captures both application and system properties, to find energy-efficient DVFS settings for SPMD multithreaded programs on CMP systems that satisfy a given performance constraint. Experimental evaluation shows that we achieve significant energy savings while meeting the performance constraints.
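The search this implies can be sketched in a few lines: pick the lowest frequency whose predicted runtime still meets the constraint. The predictor below is a toy analytical stand-in for the paper's Petri net model, and the frequency levels are hypothetical:

```python
# Minimal sketch of constraint-driven DVFS selection. `predict_runtime`
# stands in for the Petri net based performance model; the real model
# captures application and system structure, not this toy formula.

FREQS_GHZ = [1.0, 1.4, 1.8, 2.2, 2.6]  # hypothetical CMP frequency levels

def predict_runtime(freq_ghz, cpu_work=10.0, mem_time=2.0):
    # Compute-bound work scales with frequency; memory time does not.
    return cpu_work / freq_ghz + mem_time

def pick_dvfs_setting(max_runtime):
    """Lowest (most energy-frugal) frequency meeting the constraint."""
    for f in sorted(FREQS_GHZ):
        if predict_runtime(f) <= max_runtime:
            return f
    return max(FREQS_GHZ)  # constraint infeasible: run flat out

print(pick_dvfs_setting(max_runtime=8.0))  # -> 1.8
```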
Abstract:
Dynamic Voltage and Frequency Scaling (DVFS) offers huge potential for designing trade-offs involving the energy, power, temperature, and performance of computing systems. In this paper, we evaluate three different DVFS schemes in a full-system Chip Multiprocessor (CMP) simulator using multithreaded stream applications: our extension of a Petri net performance-model-based DVFS method from sequential programs to stream programs, a simple profile-based linear scaling method, and an existing hardware-based DVFS method for multithreaded applications. From our evaluation, we find that the software-based methods achieve significant Energy/Throughput^2 (ET^-2) improvements, while the hardware-based scheme degrades performance heavily and suffers an ET^-2 loss. Our results indicate that the simple profile-based scheme achieves the benefits of the complex Petri net based scheme for stream programs, and they present a strong case for independent voltage/frequency control for the different cores of a CMP, which is lacking in most state-of-the-art CMPs. This is in contrast to the conclusions of a recent evaluation of per-core DVFS schemes for multithreaded applications on CMPs.
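For reference, the metric can be written out explicitly. The exact normalization used in the paper is not shown in the abstract; a standard reading is:

```latex
% ET^{-2}: energy divided by throughput squared (lower is better), the
% throughput analogue of the energy-delay-squared product. A configuration
% c improves on a baseline b when the ratio falls below 1.
\[
  ET^{-2} \;=\; \frac{E}{\Theta^{2}},
  \qquad
  \text{improvement ratio} \;=\; \frac{E_c/\Theta_c^{2}}{E_b/\Theta_b^{2}} \;<\; 1,
\]
% where E is the energy consumed and Theta the application throughput.
```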
Abstract:
Most stencil computations allow tile-wise concurrent start: there always exists a face of the iteration space and a set of tiling hyperplanes such that all tiles along that face can be started concurrently. This provides load balance and maximizes parallelism. However, existing automatic tiling frameworks often choose hyperplanes that lead to pipelined start-up and load imbalance. We address this issue with a new tiling technique that ensures concurrent start-up as well as perfect load balance whenever possible. We first provide necessary and sufficient conditions on tiling hyperplanes to enable concurrent start for programs with affine data accesses, and then provide an approach to find such hyperplanes. Experimental evaluation on a 12-core Intel Westmere shows that our code outperforms a tuned domain-specific stencil code generator by 4% to 27%, and previous compiler techniques by factors of 2x to 10.14x.
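The condition the abstract alludes to is commonly stated as the face lying strictly inside the cone of the tiling hyperplane normals, i.e., the face vector is a strictly positive combination of the hyperplanes. Under that reading (an assumption here, since the abstract does not state the condition), a small feasibility check can be sketched:

```python
# Sketch of a concurrent-start check: tiles along face f can start together
# when f = sum_i lambda_i * h_i with every lambda_i > 0. We maximize the
# smallest lambda via an LP; a strictly positive optimum means f lies in the
# strict interior of the cone. An illustration, not the paper's formulation.
import numpy as np
from scipy.optimize import linprog

def concurrent_start(hyperplanes, face, eps=1e-9):
    H = np.asarray(hyperplanes, dtype=float)  # one normal per row
    f = np.asarray(face, dtype=float)
    k = H.shape[0]
    # Variables [lambda_1..lambda_k, t]: maximize t subject to
    # H^T lambda = f, lambda_i >= t, t >= 0.
    c = np.zeros(k + 1); c[-1] = -1.0
    A_eq = np.hstack([H.T, np.zeros((H.shape[1], 1))])
    A_ub = np.hstack([-np.eye(k), np.ones((k, 1))])  # t - lambda_i <= 0
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(k), A_eq=A_eq, b_eq=f,
                  bounds=[(None, None)] * k + [(0, None)])
    return res.success and -res.fun > eps

# 2D time-space example: hyperplanes (1,1) and (1,-1) allow concurrent
# start along the space face (1,0); (1,0) and (1,1) do not along (0,1).
print(concurrent_start([[1, 1], [1, -1]], [1, 0]))  # True
print(concurrent_start([[1, 0], [1, 1]], [0, 1]))   # False
```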
Abstract:
How the brain converts parallel representations of movement goals into sequential movements is not known. We tested the role of the basal ganglia (BG) in the temporal control of movement sequences by a convergent approach: inactivating the BG of monkeys with muscimol injections into the caudate nucleus, and assessing the behavior of Parkinson's disease patients, in both cases using a modified double-step saccade task. We tested a critical prediction of a class of competitive queuing models that explain serial behavior as the outcome of a selection among concurrently activated goals. In congruence with these models, we found that inactivation or impairment of the BG unmasked the parallel nature of goal representations, such that significantly more averaged saccades, curved saccades, and saccade sequence errors were observed. These results suggest that the BG perform a form of competitive queuing, holding the second movement plan in abeyance while the first movement is being executed, allowing proper temporal control of movement sequences.
Abstract:
Each new generation of GPUs vastly increases the resources available to GPGPU programs. GPU programming models (like CUDA) were designed to scale to use these resources. However, we find that CUDA programs actually do not scale to utilize all available resources: over 30% of resources go unused on average for the Parboil2 programs used in our work. Current GPUs therefore allow concurrent execution of kernels to improve utilization. In this work, we study concurrent execution of GPU kernels using multiprogram workloads on current NVIDIA Fermi GPUs. On two-program workloads from the Parboil2 benchmark suite, we find that concurrent execution is often no better than serialized execution, and we identify the lack of control over resource allocation to kernels as a major serialization bottleneck. We propose transformations that convert CUDA kernels into elastic kernels, which permit fine-grained control over their resource usage. We then propose several elastic-kernel-aware concurrency policies that offer significantly better performance and concurrency than the current CUDA policy. We evaluate our proposals on real hardware using multiprogrammed workloads constructed from the Parboil2 benchmarks. On average, our proposals increase system throughput (STP) by 1.21x and improve average normalized turnaround time (ANTT) by 3.73x for two-program workloads compared to the current CUDA concurrency implementation.
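The core of the elastic-kernel idea is decoupling the logical grid a kernel was written for from the physical grid actually launched, so the scheduler can cap a kernel's resource footprint. A minimal sketch of that block-remapping arithmetic (names illustrative; the paper's transformations operate on real CUDA kernels):

```python
# A small fixed "physical grid" of launched blocks strides over all logical
# block indices, so the same kernel body runs correctly no matter how few
# physical blocks the concurrency policy grants it.

def run_elastic(logical_blocks, physical_blocks, kernel_body):
    """Each physical block executes every logical block congruent to its id."""
    for phys_id in range(physical_blocks):           # blocks actually launched
        for logical_id in range(phys_id, logical_blocks, physical_blocks):
            kernel_body(logical_id)                  # original per-block work

# A kernel written for 10 logical blocks, squeezed onto 3 physical blocks:
run_elastic(10, 3, lambda b: print("block", b, end=" "))
```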
Abstract:
With the proliferation of chip multiprocessors (CMPs) on desktops and embedded platforms, multithreaded programs have become ubiquitous. The existence of multiple threads may cause resource contention, such as in the on-chip shared cache and interconnect, depending on how the threads access resources. Hence, we propose a tool, Thread Contention Predictor (TCP), to help quantify the number of threads sharing data and their sharing pattern. We demonstrate its use to predict a more profitable shared last-level on-chip cache (LLC) access policy on CMPs. Our cache configuration predictor is 2.2 times faster than cycle-accurate simulation. We also demonstrate its use for identifying hot data structures in a program that may cause performance degradation due to false data sharing. Fixing the layout of such data structures yields up to 10% and 18% improvements in execution time and energy-delay product (EDP), respectively.
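An illustrative sketch of the kind of analysis such a tool performs: group a memory-access trace by cache line, count the distinct threads touching each line, and flag lines where threads touch different words as likely false sharing. The trace format and 64-byte line size are assumptions, not TCP's actual interface:

```python
from collections import defaultdict

LINE = 64  # assumed bytes per cache line

def sharing_report(trace):
    """trace: iterable of (thread_id, byte_address) pairs."""
    lines = defaultdict(lambda: defaultdict(set))  # line -> thread -> offsets
    for tid, addr in trace:
        lines[addr // LINE][tid].add(addr % LINE)
    for line, by_thread in lines.items():
        if len(by_thread) < 2:
            continue  # line is private to one thread: no contention
        offsets = list(by_thread.values())
        disjoint = all(a.isdisjoint(b) for i, a in enumerate(offsets)
                       for b in offsets[i + 1:])
        kind = "false sharing" if disjoint else "true sharing"
        print(f"line {line}: {len(by_thread)} threads, {kind}")

# Two threads ping-ponging adjacent counters in one line -> false sharing;
# two threads hitting the same word in another line -> true sharing.
sharing_report([(0, 0), (1, 8), (0, 0), (1, 8), (2, 128), (3, 128)])
```

Padding or reordering the fields of a falsely shared structure so that each thread's data lands on its own line is the layout fix the abstract refers to.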
Abstract:
Large software systems are developed by composing multiple programs. If the programs manipulate and exchange complex data, such as network packets or files, it is essential to establish that they follow compatible data formats. Most of the complexity of data formats is associated with the headers. In this paper, we address compatibility of programs operating over headers of network packets, files, images, etc. As format specifications are rarely available, we infer the format associated with headers by a program as a set of guarded layouts. In terms of these formats, we define and check compatibility of (a) producer-consumer programs and (b) different versions of producer (or consumer) programs. A compatible producer-consumer pair is free of type mismatches and logical incompatibilities such as the consumer rejecting valid outputs generated by the producer. A backward compatible producer (resp. consumer) is guaranteed to be compatible with consumers (resp. producers) that were compatible with its older version. With our prototype tool, we identified 5 known bugs and 1 potential bug in (a) sender-receiver modules of Linux network drivers of 3 vendors and (b) different versions of a TIFF image library.
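A toy rendering of the guarded-layout idea and the producer-consumer check: each format is a set of (guard, field layout) pairs, and the producer is compatible with the consumer if everything it can emit is accepted with an identical layout under the same guard. The inferred formats in the paper are far richer; the names below are hypothetical:

```python
# Each format maps a guard (a predicate over header fields, kept here as a
# string) to the field layout emitted/accepted under that guard.

def compatible(producer, consumer):
    """producer/consumer: dict guard -> list of (field_name, type) tuples."""
    for guard, layout in producer.items():
        if guard not in consumer:
            return False, f"consumer rejects packets where {guard}"
        if consumer[guard] != layout:
            return False, f"layout mismatch under {guard}"
    return True, "compatible"

v4 = {"version == 4": [("header_len", "u8"), ("payload", "bytes")]}
v6 = {"version == 6": [("flow", "u32"), ("payload", "bytes")]}

print(compatible(v4, v4))            # (True, 'compatible')
print(compatible({**v4, **v6}, v4))  # producer may emit v6: incompatible
```

The second call illustrates the "logical incompatibility" of the abstract: every individual layout type-checks, yet the consumer still rejects a valid producer output.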
Abstract:
We propose a new approach for producing precise constrained slices of programs in a language such as C. We build on a previous term-rewriting-based approach to this problem, which primarily targets loop-free fragments and is fully precise in that setting. We incorporate abstract interpretation into term rewriting, using a given arbitrary abstract lattice, resulting in a novel technique for slicing loops whose precision is tied to the power of the given lattice. We address pointers in a first-class manner, including when they are used within loops to traverse and update recursive data structures. Finally, we illustrate the comparative precision of our slices over those of previous approaches using representative examples.
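For readers unfamiliar with slicing, the baseline operation being made precise is a backward slice: keep only the statements the criterion variable transitively depends on. A minimal sketch over straight-line assignments (the constraint handling, term rewriting, and abstract interpretation of the actual approach are not modeled here):

```python
# Backward slice over straight-line code: walk statements in reverse,
# keeping each one that defines a currently "needed" variable and adding
# the variables it uses to the needed set.

def backward_slice(stmts, criterion):
    """stmts: list of (lhs, set_of_rhs_vars) in program order."""
    needed, kept = {criterion}, []
    for lhs, uses in reversed(stmts):
        if lhs in needed:
            kept.append((lhs, uses))
            needed.discard(lhs)
            needed |= uses
    return list(reversed(kept))

prog = [("a", {"x"}), ("b", {"y"}), ("c", {"a"}), ("d", {"b", "c"})]
print(backward_slice(prog, "c"))  # keeps only a = f(x); c = g(a)
```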
Abstract:
The concurrent planning of sequential saccades offers a simple model to study the nature of visuomotor transformations since the second saccade vector needs to be remapped to foveate the second target following the first saccade. Remapping is thought to occur through egocentric mechanisms involving an efference copy of the first saccade that is available around the time of its onset. In contrast, an exocentric representation of the second target relative to the first target, if available, can be used to directly code the second saccade vector. While human volunteers performed a modified double-step task, we examined the role of exocentric encoding in concurrent saccade planning by shifting the first target location well before the efference copy could be used by the oculomotor system. The impact of the first target shift on concurrent processing was tested by examining the end-points of second saccades following a shift of the second target during the first saccade. The frequency of second saccades to the old versus new location of the second target, as well as the propagation of first saccade localization errors, both indices of concurrent processing, were found to be significantly reduced in trials with the first target shift compared to those without it. A similar decrease in concurrent processing was obtained when we shifted the first target but kept constant the second saccade vector. Overall, these results suggest that the brain can use relatively stable visual landmarks, independent of efference copy-based egocentric mechanisms, for concurrent planning of sequential saccades.