17 resultados para Concert programs.
em Indian Institute of Science - Bangalore - Índia
Resumo:
The research in software science has so far been concentrated on three measures of program complexity: (a) software effort; (b) cyclomatic complexity; and (c) program knots. In this paper we propose a measure of the logical complexity of programs in terms of the variable dependency of sequence of computations, inductive effort in writing loops and complexity of data structures. The proposed complexity mensure is described with the aid of a graph which exhibits diagrammatically the dependence of a computation at a node upon the computation of other (earlier) nodes. Complexity measures of several example programs have been computed and the related issues have been discussed. The paper also describes the role played by data structures in deciding the program complexity.
Resumo:
The StreamIt programming model has been proposed to exploit parallelism in streaming applications oil general purpose multicore architectures. The StreamIt graphs describe task, data and pipeline parallelism which can be exploited on accelerators such as Graphics Processing Units (GPUs) or CellBE which support abundant parallelism in hardware. In this paper, we describe a novel method to orchestrate the execution of if StreamIt program oil a multicore platform equipped with an accelerator. The proposed approach identifies, using profiling, the relative benefits of executing a task oil the superscalar CPU cores and the accelerator. We formulate the problem of partitioning the work between the CPU cores and the GPU, taking into account the latencies for data transfers and the required buffer layout transformations associated with the partitioning, as all integrated Integer Linear Program (ILP) which can then be solved by an ILP solver. We also propose an efficient heuristic algorithm for the work-partitioning between the CPU and the GPU, which provides solutions which are within 9.05% of the optimal solution on an average across the benchmark Suite. The partitioned tasks are then software pipelined to execute oil the multiple CPU cores and the Streaming Multiprocessors (SMs) of the GPU. The software pipelining algorithm orchestrates the execution between CPU cores and the GPU by emitting the code for the CPU and the GPU, and the code for the required data transfers. Our experiments on a platform with 8 CPU cores and a GeForce 8800 GTS 512 GPU show a geometric mean speedup of 6.94X with it maximum of 51.96X over it single threaded CPU execution across the StreamIt benchmarks. This is a 18.9% improvement over it partitioning strategy that maps only the filters that cannot be executed oil the GPU - the filters with state that is persistent across firings - onto the CPU.
Resumo:
The StreamIt programming model has been proposed to exploit parallelism in streaming applications on general purpose multi-core architectures. This model allows programmers to specify the structure of a program as a set of filters that act upon data, and a set of communication channels between them. The StreamIt graphs describe task, data and pipeline parallelism which can be exploited on modern Graphics Processing Units (GPUs), as they support abundant parallelism in hardware. In this paper, we describe the challenges in mapping StreamIt to GPUs and propose an efficient technique to software pipeline the execution of stream programs on GPUs. We formulate this problem - both scheduling and assignment of filters to processors - as an efficient Integer Linear Program (ILP), which is then solved using ILP solvers. We also describe a novel buffer layout technique for GPUs which facilitates exploiting the high memory bandwidth available in GPUs. The proposed scheduling utilizes both the scalar units in GPU, to exploit data parallelism, and multiprocessors, to exploit task and pipelin parallelism. Further it takes into consideration the synchronization and bandwidth limitations of GPUs, and yields speedups between 1.87X and 36.83X over a single threaded CPU.
Resumo:
We study the problem of finding a set of constraints of minimum cardinality which when relaxed in an infeasible linear program, make it feasible. We show the problem is NP-hard even when the constraint matrix is totally unimodular and prove polynomial-time solvability when the constraint matrix and the right-hand-side together form a totally unimodular matrix.
Resumo:
Due to large scale afforestation programs and forest conservation legislations, India's total forest area seems to have stabilized or even increased. In spite of such efforts, forest fragmentation and degradation continues, with forests being subject to increased pressure due to anthropogenic factors. Such fragmentation and degradation is leading to the forest cover to change from very dense to moderately dense and open forest and 253 km(2) of very dense forest has been converted to moderately dense forest, open forest, scrub and non-forest (during 2005-2007). Similarly, there has been a degradation of 4,120 km(2) of moderately dense forest to open forest, scrub and non-forest resulting in a net loss of 936 km(2) of moderately dense forest. Additionally, 4,335 km(2) of open forest have degraded to scrub and non-forest. Coupled with pressure due to anthropogenic factors, climate change is likely to be an added stress on forests. Forest sector programs and policies are major factors that determine the status of forests and potentially resilience to projected impacts of climate change. An attempt is made to review the forest policies and programs and their implications for the status of forests and for vulnerability of forests to projected climate change. The study concludes that forest conservation and development policies and programs need to be oriented to incorporate climate change impacts, vulnerability and adaptation.
Resumo:
Intracellular pathogen sensor, NOD2, has been implicated in regulation of wide range of anti-inflammatory responses critical during development of a diverse array of inflammatory diseases; however, underlying molecular details are still imprecisely understood. In this study, we demonstrate that NOD2 programs macrophages to trigger Notch1 signaling. Signaling perturbations or genetic approaches suggest signaling integration through cross-talk between Notch1-PI3K during the NOD2-triggered expression of a multitude of immunological parameters including COX-2/PGE(2) and IL-10. NOD2 stimulation enhanced active recruitment of CSL/RBP-Jk on the COX-2 promoter in vivo. Intriguingly, nitric oxide assumes critical importance in NOD2-mediated activation of Notch1 signaling as iNOS(-/-) macrophages exhibited compromised ability to execute NOD2-triggered Notch1 signaling responses. Correlative evidence demonstrates that this mechanism operates in vivo in brain and splenocytes derived from wild type, but not from iNOS(-/-) mice. Importantly, NOD2-driven activation of the Notch1-PI3K signaling axis contributes to its capacity to impart survival of macrophages against TNF-alpha or IFN-gamma-mediated apoptosis and resolution of inflammation. Current investigation identifies Notch1-PI3K as signaling cohorts involved in the NOD2-triggered expression of a battery of genes associated with anti-inflammatory functions. These findings serve as a paradigm to understand the pathogenesis of NOD2-associated inflammatory diseases and clearly pave a way toward development of novel therapeutics.
Resumo:
MATLAB is an array language, initially popular for rapid prototyping, but is now being increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control flow dominated scalar regions that have an impact on the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput oriented accelerators such as graphics processing units(GPUs). Thus, an approach that maps the control flow dominated regions to the CPU and the data parallel regions to the GPU can significantly improve program performance. In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or GPU so that kernel execution on the CPU and the GPU happens synergistically and the amount of data transfer needed is minimized. In order to ensure required data movement for dependencies across basic blocks, we propose a data flow analysis and edge splitting strategy. Thus our compiler automatically handles composition of kernels, mapping of kernels to CPU and GPU, scheduling and insertion of required data transfer. The proposed compiler was implemented and experimental evaluation using a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data parallel benchmarks over native execution of MATLAB.
Resumo:
Advertisements(Ads) are the main revenue earner for Television (TV) broadcasters. As TV reaches a large audience, it acts as the best media for advertisements of products and services. With the emergence of digital TV, it is important for the broadcasters to provide an intelligent service according to the various dimensions like program features, ad features, viewers’ interest and sponsors’ preference. We present an automatic ad recommendation algorithm that selects a set of ads by considering these dimensions and semantically match them with programs. Features of the ad video are captured interms of annotations and they are grouped into number of predefined semantic categories by using a categorization technique. Fuzzy categorical data clustering technique is applied on categorized data for selecting better suited ads for a particular program. Since the same ad can be recommended for more than one program depending upon multiple parameters, fuzzy clustering acts as the best suited method for ad recommendation. The relative fuzzy score called “degree of membership” calculated for each ad indicates the membership of a particular ad to different program clusters. Subjective evaluation of the algorithm is done by 10 different people and rated with a high success score.
Resumo:
Innate immunity recognizes and resists various pathogens; however, the mechanisms regulating pathogen versus non-pathogen discrimination are still imprecisely understood. Here, we demonstrate that pathogen-specific activation of TLR2 upon infection with Mycobacterium bovis BCG, in comparison with other pathogenic microbes, including Salmonella typhimurium and Staphylococcus aureus, programs macrophages for robust up-regulation of signaling cohorts of Wnt-beta-catenin signaling. Signaling perturbations or genetic approaches suggest that infection-mediated stimulation of Wnt-beta-catenin is vital for activation of Notch1 signaling. Interestingly, inducible NOS (iNOS) activity is pivotal for TLR2-mediated activation of Wnt-beta-catenin signaling as iNOS(-/-) mice demonstrated compromised ability to trigger activation of Wnt-beta-catenin signaling as well as Notch1-mediated cellular responses. Intriguingly, TLR2-driven integration of iNOS/NO, Wnt-beta-catenin, and Notch1 signaling contributes to its capacity to regulate the battery of genes associated with T(Reg) cell lineage commitment. These findings reveal a role for differential stimulation of TLR2 in deciding the strength of Wnt-beta-catenin signaling, which together with signals from Notch1 contributes toward the modulation of a defined set of effector functions in macrophages and thus establishes a conceptual framework for the development of novel therapeutics.
Resumo:
Dynamic Voltage and Frequency Scaling (DVFS) is a very effective tool for designing trade-offs between energy and performance. In this paper, we use a formal Petri net based program performance model that directly captures both the application and system properties, to find energy efficient DVFS settings for CMP systems, that satisfy a given performance constraint, for SPMD multithreaded programs. Experimental evaluation shows that we achieve significant energy savings, while meeting the performance constraints.
Resumo:
Dynamic Voltage and Frequency Scaling (DVFS) offers a huge potential for designing trade-offs involving energy, power, temperature and performance of computing systems. In this paper, we evaluate three different DVFS schemes - our enhancement of a Petri net performance model based DVFS method for sequential programs to stream programs, a simple profile based Linear Scaling method, and an existing hardware based DVFS method for multithreaded applications - using multithreaded stream applications, in a full system Chip Multiprocessor (CMP) simulator. From our evaluation, we find that the software based methods achieve significant Energy/Throughput2(ET−2) improvements. The hardware based scheme degrades performance heavily and suffers ET−2 loss. Our results indicate that the simple profile based scheme achieves the benefits of the complex Petri net based scheme for stream programs, and present a strong case for the need for independent voltage/frequency control for different cores of CMPs, which is lacking in most of the state-of-the-art CMPs. This is in contrast to the conclusions of a recent evaluation of per-core DVFS schemes for multithreaded applications for CMPs.
Resumo:
Memory models for shared-memory concurrent programming languages typically guarantee sequential consistency (SC) semantics for datarace-free (DRF) programs, while providing very weak or no guarantees for non-DRF programs. In effect programmers are expected to write only DRF programs, which are then executed with SC semantics. With this in mind, we propose a novel scalable solution for dataflow analysis of concurrent programs, which is proved to be sound for DRF programs with SC semantics. We use the synchronization structure of the program to propagate dataflow information among threads without requiring to consider all interleavings explicitly. Given a dataflow analysis that is sound for sequential programs and meets certain criteria, our technique automatically converts it to an analysis for concurrent programs.
Resumo:
MATLAB is an array language, initially popular for rapid prototyping, but is now being increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control flow dominated scalar regions that have an impact on the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput oriented accelerators such as graphics processing units(GPUs). Thus, an approach that maps the control flow dominated regions to the CPU and the data parallel regions to the GPU can significantly improve program performance. In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or GPU so that kernel execution on the CPU and the GPU happens synergistically and the amount of data transfer needed is minimized. In order to ensure required data movement for dependencies across basic blocks, we propose a data flow analysis and edge splitting strategy. Thus our compiler automatically handles composition of kernels, mapping of kernels to CPU and GPU, scheduling and insertion of required data transfer. The proposed compiler was implemented and experimental evaluation using a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data parallel benchmarks over native execution of MATLAB.
Resumo:
With proliferation of chip multicores (CMPs) on desktops and embedded platforms, multi-threaded programs have become ubiquitous. Existence of multiple threads may cause resource contention, such as, in on-chip shared cache and interconnects, depending upon how they access resources. Hence, we propose a tool - Thread Contention Predictor (TCP) to help quantify the number of threads sharing data and their sharing pattern. We demonstrate its use to predict a more profitable shared, last level on-chip cache (LLC) access policy on CMPs. Our cache configuration predictor is 2.2 times faster compared to the cycle-accurate simulations. We also demonstrate its use for identifying hot data structures in a program which may cause performance degradation due to false data sharing. We fix layout of such data structures and show up-to 10% and 18% improvement in execution time and energy-delay product (EDP), respectively.
Resumo:
Large software systems are developed by composing multiple programs. If the programs manip-ulate and exchange complex data, such as network packets or files, it is essential to establish that they follow compatible data formats. Most of the complexity of data formats is associated with the headers. In this paper, we address compatibility of programs operating over headers of network packets, files, images, etc. As format specifications are rarely available, we infer the format associated with headers by a program as a set of guarded layouts. In terms of these formats, we define and check compatibility of (a) producer-consumer programs and (b) different versions of producer (or consumer) programs. A compatible producer-consumer pair is free of type mismatches and logical incompatibilities such as the consumer rejecting valid outputs gen-erated by the producer. A backward compatible producer (resp. consumer) is guaranteed to be compatible with consumers (resp. producers) that were compatible with its older version. With our prototype tool, we identified 5 known bugs and 1 potential bug in (a) sender-receiver modules of Linux network drivers of 3 vendors and (b) different versions of a TIFF image library.