893 results for running reward
Abstract:
The ability to perform strong updates is the main contributor to the precision of flow-sensitive pointer analysis algorithms. Traditional flow-sensitive pointer analyses cannot strongly update pointers residing in the heap. This is a severe restriction for Java programs. In this paper, we propose a new flow-sensitive pointer analysis algorithm for Java that can perform strong updates on heap-based pointers effectively. Instead of points-to graphs, we represent our points-to information as maps from access paths to sets of abstract objects. We have implemented our analysis and run it on several large Java benchmarks. The results show considerable improvement in precision over the points-to graph based flow-insensitive and flow-sensitive analyses, with reasonable running time.
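The difference between a strong and a weak update on a map from access paths to abstract objects can be sketched as follows. This is a minimal illustration only: the `PointsTo` type, the string encoding of access paths such as `"x.f"`, and both function names are assumptions made here, not the paper's implementation.

```python
from typing import Dict, Set

# Hypothetical representation: access path (e.g. "x.f") -> set of abstract objects.
PointsTo = Dict[str, Set[str]]


def strong_update(state: PointsTo, path: str, targets: Set[str]) -> PointsTo:
    """Overwrite the targets of `path`; the previous targets are killed."""
    new_state = {p: set(objs) for p, objs in state.items()}
    new_state[path] = set(targets)
    return new_state


def weak_update(state: PointsTo, path: str, targets: Set[str]) -> PointsTo:
    """Add to the targets of `path`; the previous targets are kept."""
    new_state = {p: set(objs) for p, objs in state.items()}
    new_state[path] = new_state.get(path, set()) | set(targets)
    return new_state


# Before the assignment x.f = new O2, the path x.f points to o1.
state: PointsTo = {"x.f": {"o1"}}
strong = strong_update(state, "x.f", {"o2"})
weak = weak_update(state, "x.f", {"o2"})
```

After the strong update, `x.f` maps only to `o2`; after the weak update it maps to both `o1` and `o2`, which is the loss of precision the abstract refers to.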
Abstract:
Rapid advancements in multi-core processor architectures, coupled with low-cost, low-latency, high-bandwidth interconnects, have made clusters of multi-core machines a common computing resource. Unfortunately, writing good parallel programs that efficiently utilize all the resources in such a cluster is still a major challenge. Various programming languages have been proposed as a solution to this problem, but they have yet to be widely adopted for performance-critical code, mainly due to the relatively immature software framework and the effort involved in rewriting existing code in the new language. In this paper, we motivate and describe our initial study in exploring CUDA as a programming language for a cluster of multi-cores. We develop CUDA-For-Clusters (CFC), a framework that transparently orchestrates execution of CUDA kernels on a cluster of multi-core machines. The well-structured nature of a CUDA kernel and the growing popularity, support, and stability of the CUDA software stack collectively make CUDA a good candidate to be considered as a programming language for a cluster. CFC uses a mixture of source-to-source compiler transformations, a work distribution runtime, and a lightweight software distributed shared memory to manage parallel executions. In initial experiments, several standard CUDA benchmark programs achieve speedups of up to 7.5X on a cluster with 8 nodes, thereby opening up an interesting direction of research for further investigation.
Abstract:
This paper addresses the problem of finding optimal power control policies for wireless energy harvesting sensor (EHS) nodes with automatic repeat request (ARQ)-based packet transmissions. The EHS harvests energy from the environment according to a Bernoulli process; and it is required to operate within the constraint of energy neutrality. The EHS obtains partial channel state information (CSI) at the transmitter through the link-layer ARQ protocol, via the ACK/NACK feedback messages, and uses it to adapt the transmission power for the packet (re)transmission attempts. The underlying wireless fading channel is modeled as a finite state Markov chain with known transition probabilities. Thus, the goal of the power management policy is to determine the best power setting for the current packet transmission attempt, so as to maximize a long-run expected reward such as the expected outage probability. The problem is addressed in a decision-theoretic framework by casting it as a partially observable Markov decision process (POMDP). Due to the large size of the state-space, the exact solution to the POMDP is computationally expensive. Hence, two popular approximate solutions are considered, which yield good power management policies for the transmission attempts. Monte Carlo simulation results illustrate the efficacy of the approach and show that the approximate solutions significantly outperform conventional approaches.
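The way ACK/NACK feedback yields partial channel state information can be sketched as a Bayesian belief update over the states of the Markov channel. This is a minimal sketch under stated assumptions: the two-state channel, the transition matrix, and the per-state success probabilities below are made-up illustrative numbers, and the ordering (observe the feedback, then let the channel transition) is one possible modelling choice rather than the paper's exact formulation.

```python
from typing import List


def belief_update(
    belief: List[float],
    trans: List[List[float]],
    p_success: List[float],
    ack: bool,
) -> List[float]:
    """Return the belief over channel states for the next transmission attempt.

    belief[s]    : current probability that the channel is in state s
    trans[i][j]  : probability of moving from state i to state j
    p_success[s] : probability of an ACK in state s at the chosen power level
    ack          : True for ACK, False for NACK
    """
    # Bayes step: weight each state by the likelihood of the observed feedback.
    likelihood = [p if ack else 1.0 - p for p in p_success]
    unnorm = [b * l for b, l in zip(belief, likelihood)]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    posterior = [u / total for u in unnorm]

    # Prediction step: propagate the posterior through the Markov chain.
    n = len(belief)
    return [sum(posterior[i] * trans[i][j] for i in range(n)) for j in range(n)]


# Illustrative two-state channel (state 0 = bad, state 1 = good).
trans = [[0.9, 0.1], [0.2, 0.8]]
p_success = [0.1, 0.9]
next_belief = belief_update([0.5, 0.5], trans, p_success, ack=True)
```

An ACK shifts the belief toward the good state, and a power control policy would then choose the next power level as a function of this belief vector.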
Abstract:
Moore's Law has driven the semiconductor revolution enabling over four decades of scaling in frequency, size, complexity, and power. However, the limits of physics are preventing further scaling of speed, forcing a paradigm shift towards multicore computing and parallelization. In effect, the system is taking over the role that the single CPU was playing: high-speed signals running through chips but also packages and boards connect ever more complex systems. High-speed signals making their way through the entire system cause new challenges in the design of computing hardware. Inductance, phase shifts and velocity of light effects, material resonances, and wave behavior become not only prevalent but need to be calculated accurately and rapidly to enable short design cycle times. In essence, to continue scaling with Moore's Law requires the incorporation of Maxwell's equations in the design process. Incorporating Maxwell's equations into the design flow is only possible through the combined power that new algorithms, parallelization and high-speed computing provide. At the same time, incorporation of Maxwell-based models into circuit and system-level simulation presents a massive accuracy, passivity, and scalability challenge. In this tutorial, we navigate through the often confusing terminology and concepts behind field solvers, show how advances in field solvers enable integration into EDA flows, present novel methods for model generation and passivity assurance in large systems, and demonstrate the power of cloud computing in enabling the next generation of scalable Maxwell solvers and the next generation of Moore's Law scaling of systems. We intend to show the truly symbiotic growing relationship between Maxwell and Moore!
Abstract:
A ubiquitous network plays a critical role in providing services to nodes running ubiquitous applications. To provide appropriate resources, the nodes need to be monitored continuously. Monitoring a node in a ubiquitous network is challenging because of the dynamicity and heterogeneity of the network. The network monitor has to monitor resource parameters, such as data rate, delay, and throughput, as well as events such as node failure, network failure, and faults in the system, in order to prevent system failure. In this paper, we propose a method to develop a ubiquitous system monitoring protocol using agents. Earlier works on network monitoring using agents assume that the agents are designed for a particular network, whereas our work takes the heterogeneity of the network into account. We show that the nodes' behaviour can be easily monitored by using agents (both static and mobile). The past behaviour of the application and network, and the past history of the Unode and the predecessor, are taken into consideration to help the SA take appropriate decisions in emergency situations, such as unavailability of resources at the local administration, and to predict the migration of the Unode based on the previous node history. The simulation results reflect the effectiveness of the technique.
Abstract:
The magnetic structure and properties of sodium iron fluorophosphate Na2FePO4F (space group Pbcn), a cathode material for rechargeable batteries, were studied using magnetometry and neutron powder diffraction. The material, which can be described as a quasi-layered structure with zigzag Fe-octahedral chains, develops a long-range antiferromagnetic order below ~3.4 K. The magnetic structure is rationalized as a super-exchange-driven ferromagnetic ordering of chains running along the a-axis, coupled antiferromagnetically by super-super-exchange via phosphate groups along the c-axis, with ordering along the b-axis likely due to the contribution of dipole-dipole interactions.
Abstract:
Multi-GPU machines are being increasingly used in high-performance computing. Each GPU in such a machine has its own memory and does not share the address space either with the host CPU or other GPUs. Hence, applications utilizing multiple GPUs have to manually allocate and manage data on each GPU. Existing works that propose to automate data allocations for GPUs have limitations and inefficiencies in terms of allocation sizes, exploiting reuse, transfer costs, and scalability. We propose a scalable and fully automatic data allocation and buffer management scheme for affine loop nests on multi-GPU machines. We call it the Bounding-Box-based Memory Manager (BBMM). At runtime, BBMM can perform standard set operations such as union, intersection, and difference, and find subset and superset relations, on hyperrectangular regions of array data (bounding boxes). It uses these operations along with some compiler assistance to identify, allocate, and manage data required by applications in terms of disjoint bounding boxes. This allows it to (1) allocate exactly or nearly as much data as is required by computations running on each GPU, (2) efficiently track buffer allocations and hence maximize data reuse across tiles and minimize data transfer overhead, and (3) as a result, maximize utilization of the combined memory on multi-GPU machines. BBMM can work with any choice of parallelizing transformations, computation placement, and scheduling schemes, whether static or dynamic. Experiments run on a four-GPU machine with various scientific programs showed that BBMM reduces data allocations on each GPU by up to 75% compared to current allocation schemes, yields performance of at least 88% of manually written code, and allows excellent weak scaling.
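The kind of set operations on bounding boxes mentioned above can be sketched as follows. This is an illustrative sketch, not BBMM's actual data structures: the `Box` type (inclusive per-dimension bounds) and the function names are assumptions made here, and `hull` returns the smallest enclosing box, which may over-approximate the true union of two boxes.

```python
from typing import Optional, Tuple

# Hypothetical box type: one inclusive (lo, hi) index range per array dimension.
Box = Tuple[Tuple[int, int], ...]


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Return the overlapping box, or None if the boxes are disjoint."""
    out = []
    for (alo, ahi), (blo, bhi) in zip(a, b):
        lo, hi = max(alo, blo), min(ahi, bhi)
        if lo > hi:
            return None
        out.append((lo, hi))
    return tuple(out)


def is_subset(a: Box, b: Box) -> bool:
    """True if every element of box a also lies in box b."""
    return all(blo <= alo and ahi <= bhi for (alo, ahi), (blo, bhi) in zip(a, b))


def hull(a: Box, b: Box) -> Box:
    """Smallest single box enclosing both a and b (an over-approximate union)."""
    return tuple(
        (min(alo, blo), max(ahi, bhi)) for (alo, ahi), (blo, bhi) in zip(a, b)
    )


# Two regions of a 2-D array touched by different tiles.
tile_a: Box = ((0, 3), (0, 3))
tile_b: Box = ((2, 5), (1, 2))
shared = intersect(tile_a, tile_b)
enclosing = hull(tile_a, tile_b)
```

A buffer manager could use `is_subset` to detect that a region requested by a tile is already resident, and `intersect` to find the part of a buffer that can be reused instead of transferred again.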
Abstract:
Tuberculosis continues to kill 1.4 million people annually. During the past 5 years, an alarming increase in the number of patients with multidrug-resistant tuberculosis and extensively drug-resistant tuberculosis has been noted, particularly in eastern Europe, Asia, and southern Africa. Treatment outcomes with available treatment regimens for drug-resistant tuberculosis are poor. Although substantial progress in drug development for tuberculosis has been made, scientific progress towards development of interventions for prevention and improvement of drug treatment outcomes has lagged behind. Innovative interventions are therefore needed to combat the growing pandemic of multidrug-resistant and extensively drug-resistant tuberculosis. Novel adjunct treatments are needed to accomplish improved cure rates for multidrug-resistant and extensively drug-resistant tuberculosis. A novel, safe, widely applicable, and more effective vaccine against tuberculosis is also desperately sought to achieve disease control. The quest to develop a universally protective vaccine for tuberculosis continues. So far, research and development of tuberculosis vaccines has resulted in almost 20 candidates at different stages of the clinical trial pipeline. Host-directed therapies are now being developed to refocus the anti-Mycobacterium tuberculosis-directed immune responses towards the host, a strategy that could be especially beneficial for patients with multidrug-resistant tuberculosis or extensively drug-resistant tuberculosis. As we are running short of canonical tuberculosis drugs, more attention should be given to host-directed preventive and therapeutic intervention measures.
Abstract:
Different medium access control (MAC) layer protocols, for example the IEEE 802.11 series and others, are used in wireless local area networks. They have limitations in handling bulk data transfer applications such as video-on-demand and videoconferencing. To address this problem, a cooperative MAC protocol environment has been introduced, which enables the MAC protocol of a node to use the MAC protocols of its nearby nodes as and when required. We have found on various occasions that the specified cooperative MAC establishes cooperative transmissions to send the specified data to the destination. In this paper we propose the cooperative MAC priority (CoopMACPri) protocol, which exploits the priority value given by the upper layers to select different paths to nodes running heterogeneous applications in a wireless ad hoc network environment. The CoopMACPri protocol improves the system throughput and minimizes energy consumption. Using a Markov chain model, we analyse the performance of the CoopMACPri protocol and derive closed-form expressions for saturated system throughput and energy consumption. Performance evaluations validate the accuracy of the theoretical analysis and also show that the performance of the CoopMACPri protocol varies with the number of nodes. We observed that the simulation results and analysis reflect the effectiveness of the proposed protocol as per the specifications.
Abstract:
The present article describes a beautiful contribution of Alan Turing to our understanding of how animal coat patterns form. The question that Turing posed was the following. A collection of identical cells (or processors for that matter), all running the exact same program, and all communicating with each other in the exact same way, should always be in the same state. Yet they produce nonhomogeneous periodic patterns, like those seen on animal coats. How does this happen? Turing gave an elegant explanation for this phenomenon, namely that differences between the cells due to small amounts of random noise can actually be amplified into structured periodic patterns. We attempt to describe his core conceptual contribution below.
Abstract:
Content Distribution Networks (CDNs) are widely used to distribute data to a large number of users. Traditionally, content is replicated among a number of surrogate servers, leading to high operational costs. In this context, Peer-to-Peer (P2P) CDNs have emerged as a viable alternative. An issue of concern in P2P networks is that of free riders, i.e., selfish peers who download files and leave without uploading anything in return. Free riding must be discouraged. In this paper, we propose a criterion, the Give-and-Take (G&T) criterion, that disallows free riders. Incorporating the G&T criterion in our model, we study a problem that arises naturally when a new peer enters the system, namely the problem of downloading a 'universe' of segments, scattered among other peers, at low cost. We analyse this hard problem and characterize the optimal download cost under the G&T criterion. We propose an optimal algorithm, and provide a sub-optimal algorithm that is nearly optimal but runs much more quickly; this provides an attractive balance between running time and performance. Finally, we compare the performance of our algorithms with that of a few existing P2P downloading strategies in use. We also study the computation time for prescribing the strategy for initial segment and peer selection for the newly arrived peer for various existing and proposed algorithms, and quantify cost-computation time trade-offs.
Abstract:
Polyhedral techniques for program transformation are now used in several proprietary and open source compilers. However, most of the research on polyhedral compilation has focused on imperative languages such as C, where the computation is specified in terms of statements with zero or more nested loops and other control structures around them. Graphical dataflow languages, where there is no notion of statements or a schedule specifying their relative execution order, have so far not been studied using a powerful transformation or optimization approach. The execution semantics and referential transparency of dataflow languages impose a different set of challenges. In this paper, we attempt to bridge this gap by presenting techniques that can be used to extract polyhedral representation from dataflow programs and to synthesize them from their equivalent polyhedral representation. We then describe PolyGLoT, a framework for automatic transformation of dataflow programs which we built using our techniques and other popular research tools such as Clan and Pluto. For the purpose of experimental evaluation, we used our tools to compile LabVIEW, one of the most widely used dataflow programming languages. Results show that dataflow programs transformed using our framework are able to outperform those compiled otherwise by up to a factor of seventeen, with a mean speed-up of 2.30x while running on an 8-core Intel system.
Abstract:
We address the parameterized complexity of Max Colorable Induced Subgraph on perfect graphs. The problem asks for a maximum-sized $q$-colorable induced subgraph of an input graph $G$. Yannakakis and Gavril [IPL 1987] showed that this problem is NP-complete even on split graphs if $q$ is part of the input, but gave an $n^{O(q)}$ algorithm on chordal graphs. We first observe that the problem is W[2]-hard parameterized by $q$, even on split graphs. However, when parameterized by $\ell$, the number of vertices in the solution, we give two fixed-parameter tractable algorithms. The first algorithm runs in time $5.44^{\ell} (n + \#\alpha(G))^{O(1)}$, where $\#\alpha(G)$ is the number of maximal independent sets of the input graph. The second algorithm runs in time $q^{\ell + o(\ell)} n^{O(1)} T_\alpha$, where $T_\alpha$ is the time required to find a maximum independent set in any induced subgraph of $G$. The first algorithm is efficient when the input graph contains only polynomially many maximal independent sets, for example split graphs and co-chordal graphs. The running time of the second algorithm is FPT in $\ell$ alone (whenever $T_\alpha$ is a polynomial in $n$), since $q \le \ell$ for all non-trivial situations. Finally, we show that (under standard complexity-theoretic assumptions) the problem does not admit a polynomial kernel on split and perfect graphs in the following sense: (a) On split graphs, we do not expect a polynomial kernel if $q$ is a part of the input. (b) On perfect graphs, we do not expect a polynomial kernel even for fixed values of $q \ge 2$.
Abstract:
The problem of finding an optimal vertex cover in a graph is a classic NP-complete problem, and is a special case of the hitting set question. On the other hand, the hitting set problem, when asked in the context of induced geometric objects, often turns out to be exactly the vertex cover problem on restricted classes of graphs. In this work we explore a particular instance of such a phenomenon. We consider the problem of hitting all axis-parallel slabs induced by a point set $P$, and show that it is equivalent to the problem of finding a vertex cover on a graph whose edge set is the union of two Hamiltonian paths. We show the latter problem to be NP-complete, and also give an algorithm to find a vertex cover of size at most $k$, on graphs of maximum degree four, whose running time is $1.2637^{k} n^{O(1)}$.
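For readers unfamiliar with running times of the form $c^{k} n^{O(1)}$, the textbook bounded search tree for vertex cover is sketched below. It is not the paper's algorithm: this simple version branches on an arbitrary edge and explores at most $2^{k}$ branches, whereas the abstract reports a faster bound for graphs of maximum degree four. The function name and edge-list representation are assumptions made for illustration.

```python
from typing import List, Optional, Set, Tuple

Edge = Tuple[int, int]


def vertex_cover(edges: List[Edge], k: int) -> Optional[Set[int]]:
    """Return a vertex cover of size at most k, or None if none exists."""
    if not edges:
        return set()  # nothing left to cover
    if k == 0:
        return None  # edges remain but the budget is spent

    # Any cover must contain an endpoint of the first edge: branch on both.
    u, v = edges[0]
    for pick in (u, v):
        remaining = [e for e in edges if pick not in e]
        sub = vertex_cover(remaining, k - 1)
        if sub is not None:
            return sub | {pick}
    return None


# Path on four vertices: 0 - 1 - 2 - 3.
path = [(0, 1), (1, 2), (2, 3)]
cover = vertex_cover(path, 2)
too_small = vertex_cover(path, 1)
```

Each recursive call reduces the budget by one and does polynomial work, which is where the exponential dependence on $k$ alone comes from.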
Abstract:
In this paper we consider polynomial representability of functions defined over $\mathbb{Z}_{p^n}$, where $p$ is a prime and $n$ is a positive integer. Our aim is to provide an algorithmic characterization that (i) answers the decision problem of determining whether a given function over $\mathbb{Z}_{p^n}$ is polynomially representable, and (ii) finds the polynomial if it is polynomially representable. The previous characterizations given by Kempner (Trans. Am. Math. Soc. 22(2):240-266, 1921) and Carlitz (Acta Arith. 9(1), 67-78, 1964) are existential in nature and only lead to an exhaustive search method, i.e., an algorithm with complexity exponential in the size of the input. Our characterization leads to an algorithm whose running time is linear in the size of the input. We also extend our result to the multivariate case.