57 results for Improvement programs
at Indian Institute of Science - Bangalore - India
Abstract:
The StreamIt programming model has been proposed to exploit parallelism in streaming applications on general-purpose multicore architectures. StreamIt graphs describe task, data and pipeline parallelism, which can be exploited on accelerators such as Graphics Processing Units (GPUs) or the Cell BE that support abundant parallelism in hardware. In this paper, we describe a novel method to orchestrate the execution of a StreamIt program on a multicore platform equipped with an accelerator. The proposed approach identifies, using profiling, the relative benefits of executing a task on the superscalar CPU cores and the accelerator. We formulate the problem of partitioning the work between the CPU cores and the GPU, taking into account the latencies for data transfers and the required buffer layout transformations associated with the partitioning, as an integrated Integer Linear Program (ILP) which can then be solved by an ILP solver. We also propose an efficient heuristic algorithm for the work partitioning between the CPU and the GPU, which provides solutions within 9.05% of the optimal solution on average across the benchmark suite. The partitioned tasks are then software pipelined to execute on the multiple CPU cores and the Streaming Multiprocessors (SMs) of the GPU. The software pipelining algorithm orchestrates the execution between the CPU cores and the GPU by emitting the code for the CPU and the GPU, and the code for the required data transfers. Our experiments on a platform with 8 CPU cores and a GeForce 8800 GTS 512 GPU show a geometric mean speedup of 6.94X, with a maximum of 51.96X, over single-threaded CPU execution across the StreamIt benchmarks. This is an 18.9% improvement over a partitioning strategy that maps only the filters that cannot be executed on the GPU - the filters with state that is persistent across firings - onto the CPU.
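The CPU/GPU work-partitioning objective described above can be sketched in miniature. The following toy uses invented task timings, a fixed per-edge transfer cost, and brute-force search standing in for the ILP solver; it is not the paper's actual formulation, only an illustration of the trade-off between device load and data-transfer latency:

```python
from itertools import product

# Toy tasks: (cpu_time, gpu_time) per firing; all values are made up.
tasks = [(4.0, 1.0), (2.0, 6.0), (5.0, 1.5), (3.0, 3.0)]
edges = [(0, 1), (1, 2), (2, 3)]   # producer -> consumer channels
transfer_cost = 2.0                # cost when an edge crosses the CPU/GPU boundary

def cost(assign):
    # Steady-state load on each device plus DMA cost for crossing edges.
    cpu = sum(t[0] for t, a in zip(tasks, assign) if a == "cpu")
    gpu = sum(t[1] for t, a in zip(tasks, assign) if a == "gpu")
    dma = sum(transfer_cost for u, v in edges if assign[u] != assign[v])
    return max(cpu, gpu) + dma     # crude pipeline initiation-interval estimate

best = min(product(["cpu", "gpu"], repeat=len(tasks)), key=cost)
print(best, cost(best))            # -> ('cpu', 'cpu', 'gpu', 'gpu') 8.0
```

At real benchmark sizes the 2^n enumeration is replaced by the ILP solver or the heuristic the abstract mentions.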
Abstract:
With the proliferation of chip multiprocessors (CMPs) on desktops and embedded platforms, multi-threaded programs have become ubiquitous. The existence of multiple threads may cause resource contention, such as in the on-chip shared cache and interconnects, depending upon how they access resources. Hence, we propose a tool - Thread Contention Predictor (TCP) - to help quantify the number of threads sharing data and their sharing pattern. We demonstrate its use to predict a more profitable shared, last-level on-chip cache (LLC) access policy on CMPs. Our cache configuration predictor is 2.2 times faster compared to cycle-accurate simulations. We also demonstrate its use for identifying hot data structures in a program which may cause performance degradation due to false data sharing. We fix the layout of such data structures and show up to 10% and 18% improvement in execution time and energy-delay product (EDP), respectively.
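A minimal sketch of the kind of sharing analysis such a tool performs: map each thread's accessed addresses to cache lines and count how many threads touch each line. The thread traces and the 64-byte line size here are assumptions, not TCP's actual mechanism:

```python
from collections import defaultdict

LINE = 64  # assumed cache-line size in bytes

# Hypothetical per-thread traces of accessed byte addresses.
traces = {
    0: [0, 8, 128],
    1: [16, 192],
    2: [130, 200],
}

sharers = defaultdict(set)
for tid, addrs in traces.items():
    for a in addrs:
        sharers[a // LINE].add(tid)    # map each address to its cache line

# Sharing degree per line: >1 means the line is shared between threads
# (either true sharing of the same data, or false sharing of neighbours).
degree = {line: len(tids) for line, tids in sharers.items()}
print(degree)                          # -> {0: 2, 2: 2, 3: 2}
```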
Abstract:
Research in software science has so far concentrated on three measures of program complexity: (a) software effort; (b) cyclomatic complexity; and (c) program knots. In this paper we propose a measure of the logical complexity of programs in terms of the variable dependency of sequences of computations, the inductive effort in writing loops, and the complexity of data structures. The proposed complexity measure is described with the aid of a graph which exhibits diagrammatically the dependence of a computation at a node upon the computations of other (earlier) nodes. Complexity measures of several example programs have been computed and the related issues discussed. The paper also describes the role played by data structures in deciding program complexity.
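One way to picture such a variable-dependency graph: each statement becomes a node, with an edge from the node that last defined a variable to each later node that uses it. The statement list and the plain edge count below are illustrative only, not the paper's exact definition:

```python
# Each statement: (defined_var, used_vars). A toy straight-line program:
#   a = ...; b = f(a); c = g(a, b); d = h(c)
stmts = [("a", set()), ("b", {"a"}), ("c", {"a", "b"}), ("d", {"c"})]

edges = []
last_def = {}
for i, (var, uses) in enumerate(stmts):
    for u in uses:
        edges.append((last_def[u], i))   # node i depends on the defining node
    last_def[var] = i

complexity = len(edges)   # one crude count of variable-dependency arcs
print(complexity)          # -> 4
```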
Abstract:
Brooks' theorem says that if, for a graph G, Δ(G)=n, then G is n-colourable, unless (1) n=2 and G has an odd cycle as a component, or (2) n>2 and Kn+1 is a component of G. In this paper we prove that if a graph G has none of three graphs (K1,3, K5−e and H) as an induced subgraph, and if Δ(G) ≥ 6 and d(G) < Δ(G), then χ(G) < Δ(G). We also give examples to show that the hypothesis Δ(G) ≥ 6 cannot be non-trivially relaxed, and that the graph K5−e cannot be removed from the hypothesis. Moreover, for a graph G with none of K1,3, K5−e and H as an induced subgraph, we verify Borodin and Kostochka's conjecture that if Δ(G) ≥ 9 and d(G) < Δ(G), then χ(G) < Δ(G).
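Greedy colouring makes Brooks' exceptional case (1) concrete: on the odd cycle C5, Δ(G)=2 yet three colours are required. This snippet is a standard illustration (fixed vertex order, smallest free colour), not part of the paper's proof:

```python
def greedy_colouring(adj):
    """Colour vertices in dict order with the smallest colour not used by a neighbour."""
    colour = {}
    for v in adj:
        used = {colour[u] for u in adj[v] if u in colour}
        colour[v] = next(c for c in range(len(adj)) if c not in used)
    return colour

# C5, the 5-cycle: an odd cycle with Δ = 2, so Brooks' exception (1) applies.
c5 = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
cols = greedy_colouring(c5)
print(max(cols.values()) + 1)   # -> 3 colours, exceeding Δ(C5) = 2
```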
Abstract:
The spreadability of SAE-30 oil on Al-12 Si base (LM-13) alloy containing dispersed graphite particles of about 50 μm average size in its matrix is found to be greater than on either LM-13 with no graphite or brass. It is also found that the spreadability on LM-13 base alloys increases with increasing volume of graphite dispersed in the matrix of these alloys. Further increases in the spreadability of oil on machined LM-13-graphite particle composite test surfaces occur if these are rubbed initially against control discs of either LM-13 or grey cast iron. The formation of a triboinduced graphite-rich layer, confirmed by ESCA, appears to be responsible for the improved oil spreadability on the rubbed test surfaces of LM-13 base alloys as compared to the as-machined test surfaces prior to rubbing. The triboinduced layer of graphite is apparently responsible for the observed reduction in the friction, wear and seizing tendency of triboelements made from aluminium alloy-graphite particle composites.
Abstract:
Various intrusion detection systems (IDSs) reported in the literature have shown distinct preferences for detecting a certain class of attack with improved accuracy, while performing moderately on the other classes. In view of the enormous computing power available in present-day processors, deploying multiple IDSs in the same network to obtain best-of-breed solutions has been attempted earlier. The paper presented here addresses the problem of optimizing the performance of IDSs using sensor fusion with multiple sensors. The trade-off between the detection rate and false alarms with multiple sensors is highlighted. It is illustrated that the performance of the detector is better when the fusion threshold is determined according to the Chebyshev inequality. In the proposed data-dependent decision (DD) fusion method, the performance optimization of individual IDSs is first addressed. A neural network supervised learner has been designed to determine the weights of individual IDSs depending on their reliability in detecting a certain attack. The final stage of this DD fusion architecture is a sensor fusion unit which performs the weighted aggregation in order to make an appropriate decision. This paper theoretically models the fusion of IDSs for the purpose of demonstrating the improvement in performance, supplemented with empirical evaluation.
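The Chebyshev-based threshold choice can be sketched as follows. Chebyshev's inequality bounds P(|X − μ| ≥ kσ) ≤ 1/k² for any distribution, so a threshold can be set to cap the false-alarm probability without assuming normality; the score statistics and the false-alarm bound below are invented for illustration:

```python
import math

# Hypothetical statistics of the fused anomaly score under normal traffic.
mu, sigma = 0.20, 0.05
alpha = 0.01                      # acceptable false-alarm probability bound

# Chebyshev: P(|X - mu| >= k*sigma) <= 1/k**2.  Picking k = 1/sqrt(alpha)
# guarantees the false-alarm rate is at most alpha, for any distribution.
k = 1.0 / math.sqrt(alpha)
threshold = mu + k * sigma
print(round(threshold, 3))        # -> 0.7
```

The bound is conservative (it uses the two-sided inequality for a one-sided alarm), which is the price of being distribution-free.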
Abstract:
Technological forecasting, defined as quantified probabilistic prediction of the timings and degree of change in technological parameters, capabilities, desirability or needs at different times in the future, is applied to birth control technology (BCT) as a means of revealing the paths of most promising research through identifying the necessary points for breakthroughs. The present status of BCT in the areas of pills and the IUD, male contraceptives, immunological approaches, post-coital pills, abortion, sterilization, luteolytic agents, laser technologies, and control of the sex of the child is summarized and evaluated in turn. Fine mapping is done to identify the most potentially promising areas of BCT. These include efforts to make oral contraception easier, improvement of the design of the IUD, clinical evaluation of the male contraceptive danazol, the effecting of biochemical changes in the seminal fluid, and research into immunological approaches and the effects of other new drugs such as prostaglandins. The areas that require immediate and large research inputs are oral contraception and the IUD. On the basis of population and technological forecasts, it is deduced that research efforts could most effectively aid countries like India through the immediate production of an oral contraceptive pill or IUD with long-lasting effects. Development of a pill for males or an immunization against pregnancy would also have a significant impact. However, the major impediment to birth control programs to date is attitudes, which must be changed through education.
Abstract:
It is well known that the use of a series of resistors, connected between the equipotential rings of a Van de Graaff generator, improves the axial voltage grading of the generator. The work reported in this paper shows how the resistor chain also improves the radial voltage gradient. The electrolytic field mapping technique was adopted in the present work.
Abstract:
In this paper an attempt is made to study accurately the field distribution for various types of porcelain/ceramic insulators used for high-voltage transmission. The surface charge simulation method is employed for the field computation. Novel field reduction electrodes are developed to reduce the maximum field around the pin region. In order to experimentally scrutinize the performance of discs with field reduction electrodes, a special artificial pollution test facility was built and utilized. The experimental results show a marked improvement in the pollution flashover performance of string insulators.
Abstract:
The StreamIt programming model has been proposed to exploit parallelism in streaming applications on general-purpose multi-core architectures. This model allows programmers to specify the structure of a program as a set of filters that act upon data, and a set of communication channels between them. The StreamIt graphs describe task, data and pipeline parallelism which can be exploited on modern Graphics Processing Units (GPUs), as they support abundant parallelism in hardware. In this paper, we describe the challenges in mapping StreamIt to GPUs and propose an efficient technique to software pipeline the execution of stream programs on GPUs. We formulate this problem - both scheduling and assignment of filters to processors - as an efficient Integer Linear Program (ILP), which is then solved using ILP solvers. We also describe a novel buffer layout technique for GPUs which facilitates exploiting the high memory bandwidth available in GPUs. The proposed scheduling utilizes both the scalar units in the GPU, to exploit data parallelism, and multiprocessors, to exploit task and pipeline parallelism. Further, it takes into consideration the synchronization and bandwidth limitations of GPUs, and yields speedups between 1.87X and 36.83X over single-threaded CPU execution.
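The buffer layout idea, placing the same logical data item of all parallel filter instances contiguously so that GPU threads executing the same step access consecutive addresses (coalesced), can be sketched as a toy re-indexing. This is an illustration of the general array-of-structures to structure-of-arrays transposition, not the paper's actual transformation:

```python
# Toy stream buffer: 4 filter instances, 3 data items each, in AoS order
# (all items of instance 0, then all items of instance 1, ...).
n_instances, n_items = 4, 3
aos = [f"f{i}_d{j}" for i in range(n_instances) for j in range(n_items)]

# SoA layout: item j of ALL instances is contiguous, so GPU threads running
# the same filter step in lockstep touch consecutive addresses (coalesced).
soa = [aos[i * n_items + j] for j in range(n_items) for i in range(n_instances)]
print(soa[:4])   # -> ['f0_d0', 'f1_d0', 'f2_d0', 'f3_d0']
```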
Abstract:
Pt2+ ions dispersed in CeO2, Ce1-xTixO2-δ and TiO2 have been tested for the preferential oxidation of carbon monoxide (PROX) in a hydrogen-rich stream. It is found that Pt2+ substituted in CeO2 and Ce1-xTixO2-δ in the form of the solid solutions Ce0.98Pt0.02O2-δ and Ce0.83Ti0.15Pt0.02O2-δ gives highly CO-selective low-temperature PROX catalysts in a hydrogen-rich stream. Just 15% Ti substitution in CeO2 improves the overall PROX activity.
Abstract:
Modifications made in a solar air collector inlet duct to achieve uniform velocity of air in the absorber duct are described. Measurements of temperature and pressure at various points in the duct gave information on the distribution of air in the absorber duct. A thermal performance test conducted on the collector with a vaned diffuser showed some significant improvement compared with a diffuser without vanes.
Abstract:
We study the problem of finding a set of constraints of minimum cardinality which when relaxed in an infeasible linear program, make it feasible. We show the problem is NP-hard even when the constraint matrix is totally unimodular and prove polynomial-time solvability when the constraint matrix and the right-hand-side together form a totally unimodular matrix.
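A brute-force sketch of the problem statement (not the paper's algorithm): for a toy two-variable infeasible system Ax ≤ b, try dropping ever-larger subsets of constraints until the remainder is feasible. Feasibility is checked only at pairwise constraint intersections, which suffices here because the bound constraints keep any nonempty feasible region bounded, hence having a vertex:

```python
from itertools import combinations

# Toy infeasible system A x <= b in two variables.
A = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1)]
b = [4, 0, 4, 0, 2, -6]     # x<=4, x>=0, y<=4, y>=0, x+y<=2, x+y>=6

def feasible(idx):
    rows = [(A[i], b[i]) for i in idx]
    # A nonempty bounded 2-D polyhedron has a vertex where two constraints
    # hold with equality; test every such intersection point.
    for (a1, b1_), (a2, b2_) in combinations(rows, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if det == 0:
            continue                       # parallel boundaries, no vertex
        x = (b1_ * a2[1] - b2_ * a1[1]) / det
        y = (a1[0] * b2_ - a2[0] * b1_) / det
        if all(a[0] * x + a[1] * y <= bb + 1e-9 for a, bb in rows):
            return True
    return False

# Smallest set of constraints whose removal makes the rest feasible.
n = len(A)
for k in range(n + 1):
    hit = next((set(drop) for drop in combinations(range(n), k)
                if feasible([i for i in range(n) if i not in drop])), None)
    if hit is not None:
        print(k, hit)     # -> 1 {4}: dropping x+y<=2 restores feasibility
        break
```

The abstract's NP-hardness result is why anything beyond toy sizes needs more than this exponential search; the totally unimodular case admits a polynomial-time method.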