25 resultados para benefits programs

em Indian Institute of Science - Bangalore - Índia


Relevância:

30.00% 30.00%

Publicador:

Resumo:

The StreamIt programming model has been proposed to exploit parallelism in streaming applications oil general purpose multicore architectures. The StreamIt graphs describe task, data and pipeline parallelism which can be exploited on accelerators such as Graphics Processing Units (GPUs) or CellBE which support abundant parallelism in hardware. In this paper, we describe a novel method to orchestrate the execution of if StreamIt program oil a multicore platform equipped with an accelerator. The proposed approach identifies, using profiling, the relative benefits of executing a task oil the superscalar CPU cores and the accelerator. We formulate the problem of partitioning the work between the CPU cores and the GPU, taking into account the latencies for data transfers and the required buffer layout transformations associated with the partitioning, as all integrated Integer Linear Program (ILP) which can then be solved by an ILP solver. We also propose an efficient heuristic algorithm for the work-partitioning between the CPU and the GPU, which provides solutions which are within 9.05% of the optimal solution on an average across the benchmark Suite. The partitioned tasks are then software pipelined to execute oil the multiple CPU cores and the Streaming Multiprocessors (SMs) of the GPU. The software pipelining algorithm orchestrates the execution between CPU cores and the GPU by emitting the code for the CPU and the GPU, and the code for the required data transfers. Our experiments on a platform with 8 CPU cores and a GeForce 8800 GTS 512 GPU show a geometric mean speedup of 6.94X with it maximum of 51.96X over it single threaded CPU execution across the StreamIt benchmarks. This is a 18.9% improvement over it partitioning strategy that maps only the filters that cannot be executed oil the GPU - the filters with state that is persistent across firings - onto the CPU.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Dynamic Voltage and Frequency Scaling (DVFS) offers a huge potential for designing trade-offs involving energy, power, temperature and performance of computing systems. In this paper, we evaluate three different DVFS schemes - our enhancement of a Petri net performance model based DVFS method for sequential programs to stream programs, a simple profile based Linear Scaling method, and an existing hardware based DVFS method for multithreaded applications - using multithreaded stream applications, in a full system Chip Multiprocessor (CMP) simulator. From our evaluation, we find that the software based methods achieve significant Energy/Throughput2(ET−2) improvements. The hardware based scheme degrades performance heavily and suffers ET−2 loss. Our results indicate that the simple profile based scheme achieves the benefits of the complex Petri net based scheme for stream programs, and present a strong case for the need for independent voltage/frequency control for different cores of CMPs, which is lacking in most of the state-of-the-art CMPs. This is in contrast to the conclusions of a recent evaluation of per-core DVFS schemes for multithreaded applications for CMPs.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The benefits that accrue from the use of design database include (i) reduced costs of preparing data for application programs and of producing the final specification, and (ii) possibility of later usage of data stored in the database for other applications related to Computer Aided Engineering (CAE). An INTEractive Relational GRAphics Database (INTERGRAD) based on relational models has been developed to create, store, retrieve and update the data related to two dimensional drawings. INTERGRAD provides two languages, Picture Definition Language (PDL) and Picture Manipulation Language (PML). The software package has been implemented on a PDP 11/35 system under the RSX-11M version 3.1 operating system and uses the graphics facility consisting of a VT-11 graphics terminal, the DECgraphic 11 software and an input device, a lightpen.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The research in software science has so far been concentrated on three measures of program complexity: (a) software effort; (b) cyclomatic complexity; and (c) program knots. In this paper we propose a measure of the logical complexity of programs in terms of the variable dependency of sequence of computations, inductive effort in writing loops and complexity of data structures. The proposed complexity mensure is described with the aid of a graph which exhibits diagrammatically the dependence of a computation at a node upon the computation of other (earlier) nodes. Complexity measures of several example programs have been computed and the related issues have been discussed. The paper also describes the role played by data structures in deciding the program complexity.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The StreamIt programming model has been proposed to exploit parallelism in streaming applications on general purpose multi-core architectures. This model allows programmers to specify the structure of a program as a set of filters that act upon data, and a set of communication channels between them. The StreamIt graphs describe task, data and pipeline parallelism which can be exploited on modern Graphics Processing Units (GPUs), as they support abundant parallelism in hardware. In this paper, we describe the challenges in mapping StreamIt to GPUs and propose an efficient technique to software pipeline the execution of stream programs on GPUs. We formulate this problem - both scheduling and assignment of filters to processors - as an efficient Integer Linear Program (ILP), which is then solved using ILP solvers. We also describe a novel buffer layout technique for GPUs which facilitates exploiting the high memory bandwidth available in GPUs. The proposed scheduling utilizes both the scalar units in GPU, to exploit data parallelism, and multiprocessors, to exploit task and pipelin parallelism. Further it takes into consideration the synchronization and bandwidth limitations of GPUs, and yields speedups between 1.87X and 36.83X over a single threaded CPU.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We study the problem of finding a set of constraints of minimum cardinality which when relaxed in an infeasible linear program, make it feasible. We show the problem is NP-hard even when the constraint matrix is totally unimodular and prove polynomial-time solvability when the constraint matrix and the right-hand-side together form a totally unimodular matrix.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Due to large scale afforestation programs and forest conservation legislations, India's total forest area seems to have stabilized or even increased. In spite of such efforts, forest fragmentation and degradation continues, with forests being subject to increased pressure due to anthropogenic factors. Such fragmentation and degradation is leading to the forest cover to change from very dense to moderately dense and open forest and 253 km(2) of very dense forest has been converted to moderately dense forest, open forest, scrub and non-forest (during 2005-2007). Similarly, there has been a degradation of 4,120 km(2) of moderately dense forest to open forest, scrub and non-forest resulting in a net loss of 936 km(2) of moderately dense forest. Additionally, 4,335 km(2) of open forest have degraded to scrub and non-forest. Coupled with pressure due to anthropogenic factors, climate change is likely to be an added stress on forests. Forest sector programs and policies are major factors that determine the status of forests and potentially resilience to projected impacts of climate change. An attempt is made to review the forest policies and programs and their implications for the status of forests and for vulnerability of forests to projected climate change. The study concludes that forest conservation and development policies and programs need to be oriented to incorporate climate change impacts, vulnerability and adaptation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Intracellular pathogen sensor, NOD2, has been implicated in regulation of wide range of anti-inflammatory responses critical during development of a diverse array of inflammatory diseases; however, underlying molecular details are still imprecisely understood. In this study, we demonstrate that NOD2 programs macrophages to trigger Notch1 signaling. Signaling perturbations or genetic approaches suggest signaling integration through cross-talk between Notch1-PI3K during the NOD2-triggered expression of a multitude of immunological parameters including COX-2/PGE(2) and IL-10. NOD2 stimulation enhanced active recruitment of CSL/RBP-Jk on the COX-2 promoter in vivo. Intriguingly, nitric oxide assumes critical importance in NOD2-mediated activation of Notch1 signaling as iNOS(-/-) macrophages exhibited compromised ability to execute NOD2-triggered Notch1 signaling responses. Correlative evidence demonstrates that this mechanism operates in vivo in brain and splenocytes derived from wild type, but not from iNOS(-/-) mice. Importantly, NOD2-driven activation of the Notch1-PI3K signaling axis contributes to its capacity to impart survival of macrophages against TNF-alpha or IFN-gamma-mediated apoptosis and resolution of inflammation. Current investigation identifies Notch1-PI3K as signaling cohorts involved in the NOD2-triggered expression of a battery of genes associated with anti-inflammatory functions. These findings serve as a paradigm to understand the pathogenesis of NOD2-associated inflammatory diseases and clearly pave a way toward development of novel therapeutics.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

MATLAB is an array language, initially popular for rapid prototyping, but is now being increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control flow dominated scalar regions that have an impact on the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput oriented accelerators such as graphics processing units(GPUs). Thus, an approach that maps the control flow dominated regions to the CPU and the data parallel regions to the GPU can significantly improve program performance. In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or GPU so that kernel execution on the CPU and the GPU happens synergistically and the amount of data transfer needed is minimized. In order to ensure required data movement for dependencies across basic blocks, we propose a data flow analysis and edge splitting strategy. Thus our compiler automatically handles composition of kernels, mapping of kernels to CPU and GPU, scheduling and insertion of required data transfer. The proposed compiler was implemented and experimental evaluation using a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data parallel benchmarks over native execution of MATLAB.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Advertisements(Ads) are the main revenue earner for Television (TV) broadcasters. As TV reaches a large audience, it acts as the best media for advertisements of products and services. With the emergence of digital TV, it is important for the broadcasters to provide an intelligent service according to the various dimensions like program features, ad features, viewers’ interest and sponsors’ preference. We present an automatic ad recommendation algorithm that selects a set of ads by considering these dimensions and semantically match them with programs. Features of the ad video are captured interms of annotations and they are grouped into number of predefined semantic categories by using a categorization technique. Fuzzy categorical data clustering technique is applied on categorized data for selecting better suited ads for a particular program. Since the same ad can be recommended for more than one program depending upon multiple parameters, fuzzy clustering acts as the best suited method for ad recommendation. The relative fuzzy score called “degree of membership” calculated for each ad indicates the membership of a particular ad to different program clusters. Subjective evaluation of the algorithm is done by 10 different people and rated with a high success score.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Miniaturization of devices and the ensuing decrease in the threshold voltage has led to a substantial increase in the leakage component of the total processor energy consumption. Relatively simpler issue logic and the presence of a large number of function units in the VLIW and the clustered VLIW architectures attribute a large fraction of this leakage energy consumption in the functional units. However, functional units are not fully utilized in the VLIW architectures because of the inherent variations in the ILP of the programs. This underutilization is even more pronounced in the context of clustered VLIW architectures because of the contentions for the limited number of slow intercluster communication channels which lead to many short idle cycles.In the past, some architectural schemes have been proposed to obtain leakage energy bene .ts by aggressively exploiting the idleness of functional units. However, presence of many short idle cycles cause frequent transitions from the active mode to the sleep mode and vice-versa and adversely a ffects the energy benefits of a purely hardware based scheme. In this paper, we propose and evaluate a compiler instruction scheduling algorithm that assist such a hardware based scheme in the context of VLIW and clustered VLIW architectures. The proposed scheme exploits the scheduling slacks of instructions to orchestrate the functional unit mapping with the objective of reducing the number of transitions in functional units thereby keeping them off for a longer duration. The proposed compiler-assisted scheme obtains a further 12% reduction of energy consumption of functional units with negligible performance degradation over a hardware-only scheme for a VLIW architecture. The benefits are 15% and 17% in the context of a 2-clustered and a 4-clustered VLIW architecture respectively. Our test bed uses the Trimaran compiler infrastructure.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Innate immunity recognizes and resists various pathogens; however, the mechanisms regulating pathogen versus non-pathogen discrimination are still imprecisely understood. Here, we demonstrate that pathogen-specific activation of TLR2 upon infection with Mycobacterium bovis BCG, in comparison with other pathogenic microbes, including Salmonella typhimurium and Staphylococcus aureus, programs macrophages for robust up-regulation of signaling cohorts of Wnt-beta-catenin signaling. Signaling perturbations or genetic approaches suggest that infection-mediated stimulation of Wnt-beta-catenin is vital for activation of Notch1 signaling. Interestingly, inducible NOS (iNOS) activity is pivotal for TLR2-mediated activation of Wnt-beta-catenin signaling as iNOS(-/-) mice demonstrated compromised ability to trigger activation of Wnt-beta-catenin signaling as well as Notch1-mediated cellular responses. Intriguingly, TLR2-driven integration of iNOS/NO, Wnt-beta-catenin, and Notch1 signaling contributes to its capacity to regulate the battery of genes associated with T(Reg) cell lineage commitment. These findings reveal a role for differential stimulation of TLR2 in deciding the strength of Wnt-beta-catenin signaling, which together with signals from Notch1 contributes toward the modulation of a defined set of effector functions in macrophages and thus establishes a conceptual framework for the development of novel therapeutics.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The demand for high power density lithium-ion batteries (LIBs) for diverse applications ranging from mobile electronics to electric vehicles have resulted in an upsurge in the development of nanostructured electrode materials worldwide. Graphite has been the anode of choice in commercial LiBs. Due to several detrimental electrochemical and environmental issues, efforts are now on to develop alternative non-carbonaceous anodes which are safe, nontoxic and cost effective and at the same time exhibit high lithium storage capacity and rate capability. Titania (TiO2) and tin (Sn) based systems have gained much attention as alternative anode materials. Nanostructuring of TiO2 and SnO2 have resulted in enhancement of structural stability and electrochemical performances. Additionally, electronic wiring of mesoporous materials using carbon also effectively enhanced electronic conductivity of mesoporous electrode materials. We discuss in this article the beneficial influence of structural spacers and electronic wiring in anatase titania (TiO2) and tin dioxide (SnO2).

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Dynamic Voltage and Frequency Scaling (DVFS) is a very effective tool for designing trade-offs between energy and performance. In this paper, we use a formal Petri net based program performance model that directly captures both the application and system properties, to find energy efficient DVFS settings for CMP systems, that satisfy a given performance constraint, for SPMD multithreaded programs. Experimental evaluation shows that we achieve significant energy savings, while meeting the performance constraints.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Memory models for shared-memory concurrent programming languages typically guarantee sequential consistency (SC) semantics for datarace-free (DRF) programs, while providing very weak or no guarantees for non-DRF programs. In effect programmers are expected to write only DRF programs, which are then executed with SC semantics. With this in mind, we propose a novel scalable solution for dataflow analysis of concurrent programs, which is proved to be sound for DRF programs with SC semantics. We use the synchronization structure of the program to propagate dataflow information among threads without requiring to consider all interleavings explicitly. Given a dataflow analysis that is sound for sequential programs and meets certain criteria, our technique automatically converts it to an analysis for concurrent programs.