55 results for heterogeneous delays
Abstract:
MATLAB is an array language, initially popular for rapid prototyping, that is now increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control-flow-dominated scalar regions that affect the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput-oriented accelerators such as graphics processing units (GPUs). Thus, an approach that maps the control-flow-dominated regions to the CPU and the data-parallel regions to the GPU can significantly improve program performance. In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data-parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data-parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or the GPU so that kernel execution on the CPU and the GPU happens synergistically and the amount of data transfer needed is minimized. To ensure the required data movement for dependencies across basic blocks, we propose a data flow analysis and edge splitting strategy. Thus our compiler automatically handles composition of kernels, mapping of kernels to CPU and GPU, scheduling, and insertion of required data transfers. The proposed compiler was implemented, and experimental evaluation using a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data-parallel benchmarks over native execution of MATLAB.
Abstract:
Orthogonal frequency-division multiple access (OFDMA) systems divide the available bandwidth into orthogonal subchannels and exploit multiuser diversity and frequency selectivity to achieve high spectral efficiencies. However, they require a significant amount of channel state feedback for scheduling and rate adaptation and are sensitive to feedback delays. We develop a comprehensive analysis for OFDMA system throughput in the presence of feedback delays as a function of the feedback scheme, frequency-domain scheduler, and rate adaptation rule. Also derived are expressions for the outage probability, which captures the inability of a subchannel to successfully carry data due to the feedback scheme or feedback delays. Our model encompasses the popular best-n and threshold-based feedback schemes and the greedy, proportional fair, and round-robin schedulers that cover a wide range of throughput versus fairness tradeoff. It helps quantify the different robustness of the schedulers to feedback overhead and delays. Even at low vehicular speeds, it shows that small feedback delays markedly degrade the throughput and increase the outage probability. Further, given the feedback delay, the throughput degradation depends primarily on the feedback overhead and not on the feedback scheme itself. We also show how to optimize the rate adaptation thresholds as a function of feedback delay.
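As a rough illustration of the best-n feedback scheme combined with a greedy scheduler, the sketch below lets each user report only its n strongest subchannels and then gives every subchannel to the reporting user with the highest SNR. The function names, the SNR values, and the treatment of an unreported subchannel as unassigned are assumptions made for illustration; the abstract does not give the paper's exact model, and feedback delay is not simulated here.

```python
def best_n_feedback(snr, n):
    """For each user, report only the n subchannels with the highest SNR.

    snr[u][k] is the SNR of user u on subchannel k (hypothetical values).
    Returns one dict per user mapping subchannel index -> reported SNR.
    """
    reports = []
    for user_snr in snr:
        order = sorted(range(len(user_snr)), key=lambda k: user_snr[k], reverse=True)
        reports.append({k: user_snr[k] for k in order[:n]})
    return reports


def greedy_schedule(reports, num_subchannels):
    """Assign each subchannel to the reporting user with the highest SNR.

    A subchannel that no user reported stays unassigned (None).
    """
    assignment = []
    for k in range(num_subchannels):
        candidates = [(report[k], user) for user, report in enumerate(reports) if k in report]
        assignment.append(max(candidates)[1] if candidates else None)
    return assignment


# Two users, four subchannels, each user feeds back its two best subchannels.
snr = [[3.0, 1.0, 0.5, 2.0],
       [2.5, 4.0, 0.2, 1.0]]
reports = best_n_feedback(snr, n=2)
assignment = greedy_schedule(reports, num_subchannels=4)
```

In this toy run subchannel 2 is reported by nobody, which shows how reduced feedback alone can leave a subchannel unused even before any delay is considered.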
Abstract:
GPUs have been used for parallel execution of DOALL loops. However, loops with indirect array references can potentially cause cross iteration dependences which are hard to detect using existing compilation techniques. Applications with such loops cannot easily use the GPU and hence do not benefit from the tremendous compute capabilities of GPUs. In this paper, we present an algorithm to compute at runtime the cross iteration dependences in such loops. The algorithm uses both the CPU and the GPU to compute the dependences. Specifically, it effectively uses the compute capabilities of the GPU to quickly collect the memory accesses performed by the iterations by executing the slice functions generated for the indirect array accesses. Using the dependence information, the loop iterations are levelized such that each level contains independent iterations which can be executed in parallel. Another interesting aspect of the proposed solution is that it pipelines the dependence computation of the future level with the actual computation of the current level to effectively utilize the resources available in the GPU. We use NVIDIA Tesla C2070 to evaluate our implementation using benchmarks from Polybench suite and some synthetic benchmarks. Our experiments show that the proposed technique can achieve an average speedup of 6.4x on loops with a reasonable number of cross iteration dependences.
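The levelization step can be sketched sequentially as follows, assuming the read and write sets of every iteration have already been collected. This is a minimal CPU-only illustration, not the paper's GPU implementation; the function name `levelize` and the input format are hypothetical.

```python
def levelize(accesses):
    """Group loop iterations into levels of mutually independent iterations.

    accesses[i] is a (reads, writes) pair of sets holding the array
    locations touched by iteration i, in original program order.
    Returns a list of levels; each level lists iteration numbers that
    can run in parallel once all earlier levels have finished.
    """
    last_write = {}  # location -> level of the latest iteration writing it
    last_read = {}   # location -> highest level of any iteration reading it
    levels = []
    for it, (reads, writes) in enumerate(accesses):
        level = 0
        for loc in reads:
            if loc in last_write:  # flow dependence (read after write)
                level = max(level, last_write[loc] + 1)
        for loc in writes:
            if loc in last_write:  # output dependence (write after write)
                level = max(level, last_write[loc] + 1)
            if loc in last_read:   # anti dependence (write after read)
                level = max(level, last_read[loc] + 1)
        for loc in reads:
            last_read[loc] = max(last_read.get(loc, -1), level)
        for loc in writes:
            last_write[loc] = level
        while len(levels) <= level:
            levels.append([])
        levels[level].append(it)
    return levels


# Loop body a[idx[i]] += 1: each iteration reads and writes a[idx[i]].
idx = [0, 1, 0, 2, 1, 0]
schedule = levelize([({i}, {i}) for i in idx])
```

Here iterations 0, 1 and 3 touch distinct elements and form the first level, while the repeated accesses to elements 0 and 1 push iterations 2, 4 and 5 into later levels.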
Abstract:
In this paper a generalisation of the Voronoi partition is used for locational optimisation of facilities having different service capabilities and limited range or reach. The facilities can be stationary, such as base stations in a cellular network, hospitals, schools, etc., or mobile units, such as multiple unmanned aerial vehicles, automated guided vehicles, etc., carrying sensors, or mobile units carrying relief personnel and materials. An objective function for optimal deployment of the facilities is formulated, and its critical points are determined. The locally optimal deployment is shown to be a generalised centroidal Voronoi configuration in which the facilities are located at the centroids of the corresponding generalised Voronoi cells. The problem is formulated for more general mobile facilities, and formal results on the stability, convergence and spatial distribution of the proposed control laws responsible for the motion of the agents carrying facilities, under some constraints on the agents' speed and limit on the sensor range, are provided. The theoretical results are supported with illustrative simulation results.
Abstract:
An electron rich porous metal-organic framework (MOF) has been synthesized, which acts as an effective heterogeneous catalyst for Diels-Alder reactions through encapsulation of the reactants in confined nano-channels of the framework.
Abstract:
Orthogonal frequency division multiple access (OFDMA) systems exploit multiuser diversity and frequency-selectivity to achieve high spectral efficiencies. However, they require considerable feedback for scheduling and rate adaptation, and are sensitive to feedback delays. We develop a comprehensive analysis of the OFDMA system throughput as a function of the feedback scheme, frequency-domain scheduler, and discrete rate adaptation rule in the presence of feedback delays. We analyze the popular best-n and threshold-based feedback schemes. We show that for both the greedy and round-robin schedulers, the throughput degradation, given a feedback delay, depends primarily on the fraction of feedback reduced by the feedback scheme and not the feedback scheme itself. Even small feedback delays at low vehicular speeds are shown to significantly degrade the throughput. We also show that optimizing the link adaptation thresholds as a function of the feedback delay can effectively counteract the detrimental effect of delays.
Abstract:
A transform approach to network coding was introduced by Bavirisetti et al. (arXiv:1103.3882v3 [cs.IT]) as a tool to view wireline networks with delays as k-instantaneous networks (for some large k). When the local encoding kernels (LEKs) of the network are varied with every time block of length k > 1, the network is said to use block time varying LEKs. In this work, we propose a Precoding Based Network Alignment (PBNA) scheme based on the transform approach and block time varying LEKs for the three-source three-destination multiple unicast network with delays (3-S3-D MUN-D). In a recent work, Meng et al. (arXiv:1202.3405v1 [cs.IT]) reduced the infinite set of sufficient conditions for feasibility of PBNA in a three-source three-destination instantaneous multiple unicast network, as given by Das et al. (arXiv:1008.0235v1 [cs.IT]), to a finite set and also showed that the conditions are necessary. We show that the conditions of Meng et al. are also necessary and sufficient for feasibility of PBNA based on the transform approach and block time varying LEKs for 3-S3-D MUN-D.
Abstract:
In this paper, we study the collective motion of individually controlled planar particles when they are coupled through heterogeneous controller gains. Two types of collective formations, synchronization and balancing, are described and analyzed under the influence of these heterogeneous controller gains. These formations are characterized by the motion of the centroid of the group of particles. In synchronized formation, the particles and their centroid move in a common direction, while in balanced formation the particles move such that the location of the centroid remains fixed. We show that, by selecting suitable controller gains, these formations can be controlled to obtain not only a desired direction of motion but also a desired location of the centroid. We present the results for N particles in synchronized formation, while in balanced formation our analysis is confined to two and three particles.
Abstract:
In the present study an analytical model has been presented to describe the transient temperature distribution and advancement of the thermal front generated due to the reinjection of heat depleted water in a heterogeneous geothermal reservoir. One dimensional heat transport equation in porous media with advection and longitudinal heat conduction has been solved analytically using Laplace transform technique in a semi infinite medium. The heterogeneity of the porous medium is expressed by the spatial variation of the flow velocity and the longitudinal effective thermal conductivity of the medium. A simpler solution is also derived afterwards neglecting the longitudinal conduction depending on the situation where the contribution to the transient heat transport phenomenon in the porous media is negligible. Solution for a homogeneous aquifer with constant values of the rock and fluid parameters is also derived with an aim to compare the results with that of the heterogeneous one. The effect of some of the parameters involved, on the transient heat transport phenomenon is assessed by observing the variation of the results with different magnitudes of those parameters. Results prove the heterogeneity of the medium, the flow velocity and the longitudinal conductivity to have great influence and porosity to have negligible effect on the transient temperature distribution. (C) 2013 Elsevier Inc. All rights reserved.
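For orientation, a generic one-dimensional heat transport equation with advection and longitudinal conduction in a porous medium can be written as below. This is the standard textbook form, not necessarily the exact equation solved in the paper; the symbols (bulk volumetric heat capacity $(\rho c)_m$, water volumetric heat capacity $(\rho c)_w$, Darcy velocity $q(x)$, longitudinal effective thermal conductivity $\lambda(x)$) are assumed here for illustration.

```latex
(\rho c)_m \frac{\partial T}{\partial t}
  + (\rho c)_w \, q(x) \, \frac{\partial T}{\partial x}
  = \frac{\partial}{\partial x}\!\left( \lambda(x) \, \frac{\partial T}{\partial x} \right)
```

Setting $\lambda(x) = 0$ gives the purely advective case that corresponds to the simpler solution mentioned in the abstract, and constant $q$ and $\lambda$ give the homogeneous case used for comparison.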
Minimizing total weighted tardiness on heterogeneous batch processors with incompatible job families
Abstract:
In this paper, we address a scheduling problem for minimizing total weighted tardiness. The background for the paper is derived from the automobile gear manufacturing process. We consider the bottleneck operation of heat treatment stage of gear manufacturing. Real-life scenarios like unequal release times, incompatible job families, nonidentical job sizes, heterogeneous batch processors, and allowance for job splitting have been considered. We have developed a mathematical model which takes into account dynamic starting conditions. The problem considered in this study is NP-hard in nature, and hence heuristic algorithms have been proposed to address it. For real-life large-size problems, the performance of the proposed heuristic algorithms is evaluated using the method of estimated optimal solution available in literature. Extensive computational analyses reveal that the proposed heuristic algorithms are capable of consistently obtaining near-optimal statistically estimated solutions in very reasonable computational time.
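The objective named in the title, total weighted tardiness, is the sum over jobs of weight times max(0, completion time minus due date). A minimal sketch of evaluating it for a given schedule follows; the job tuples and their values are hypothetical, and the paper's heuristics for building the schedule are not reproduced here.

```python
def total_weighted_tardiness(jobs):
    """Sum of weight * max(0, completion - due_date) over all jobs.

    jobs is an iterable of (completion_time, due_date, weight) tuples.
    """
    return sum(weight * max(0, completion - due)
               for completion, due, weight in jobs)


# Three jobs: the first is 1 unit late, the second is early, the third is 3 units late.
jobs = [(5, 4, 2), (3, 6, 1), (10, 7, 3)]
twt = total_weighted_tardiness(jobs)
```

Early completion earns no credit, so only the late jobs contribute to the total.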
Abstract:
Programming for parallel architectures that do not have a shared address space is extremely difficult due to the need for explicit communication between memories of different compute devices. A heterogeneous system with CPUs and multiple GPUs and a distributed-memory cluster are examples of such systems. Past works that try to automate data movement for distributed-memory architectures can lead to excessive redundant communication. In this paper, we propose an automatic data movement scheme that minimizes the volume of communication between compute devices in heterogeneous and distributed-memory systems. We show that by partitioning data dependences in a particular non-trivial way, one can generate data movement code that results in the minimum volume for a vast majority of cases. The techniques are applicable to any sequence of affine loop nests and work on top of any choice of loop transformations, parallelization, and computation placement. The data movement code generated minimizes the volume of communication for a particular configuration of these. We use a combination of powerful static analyses relying on the polyhedral compiler framework and the lightweight runtime routines they generate to build a source-to-source transformation tool that automatically generates communication code. We demonstrate that the tool is scalable and leads to substantial gains in efficiency. On a heterogeneous system, the communication volume is reduced by a factor of 11X to 83X over state-of-the-art, translating into a mean execution time speedup of 1.53X. On a distributed-memory cluster, our scheme reduces the communication volume by a factor of 1.4X to 63.5X over state-of-the-art, resulting in a mean speedup of 1.55X. In addition, our scheme yields a mean speedup of 2.19X over hand-optimized UPC codes.
Abstract:
This article describes the determination of the internal structure of heterogeneous nanoparticle systems, including inverted core-shell (CdS core and CdSe shell) and alloyed (CdSeS) quantum dots, using depth-resolved, variable-energy X-ray photoelectron spectroscopy (XPS). A unique feature of this work is the combination of photoelectron spectroscopy performed at lower X-ray energies (400-700 eV), to achieve surface sensitivity, with bulk-sensitive measurements at high photon energies (>2000 eV), thereby providing detailed information about the whole nanoparticle structure with great accuracy. The use of high photon energies furthermore allows us to investigate nanoparticles much larger than those studied thus far. This capability is a consequence of the much-increased mean free path of the photoelectron achieved at high excitation energies. Our results show that the actual structures of the synthesized nanoparticles are considerably different from the nominal, targeted structures, which can be post facto rationalized in terms of the reactivity of different constituents.
Abstract:
Streams are periodically disturbed due to flooding, act as edges between habitats and also facilitate the dispersal of propagules, thus being potentially more vulnerable to invasions than adjoining regions. We used a landscape-wide transect-based sampling strategy and a mixed effects modelling approach to understand the effects of distance from stream, a rainfall gradient, light availability and fire history on the distribution of the invasive shrub Lantana camara L. (lantana) in the tropical dry forests of Mudumalai in southern India. The area occupied by lantana thickets and lantana stem abundance were both found to be highest closest to streams across this landscape with a rainfall gradient. There was no advantage in terms of increased abundance or area occupied by lantana when it grew closer to streams in drier areas as compared to moister areas. On average, the area covered by lantana increased with increasing annual rainfall. Areas that experienced a greater number of fires during 1989-2010 had lower lantana stem abundance irrespective of distance from streams. In this landscape, total light availability did not affect lantana abundance. Understanding the spatially variable environmental factors in a heterogeneous landscape influencing the distribution of lantana would aid in making informed management decisions at this scale.
Abstract:
As the beneficial effects of curcumin have often been reported to be limited to its small concentrations, we have undertaken a study to find the aggregation properties of curcumin in water by varying the number of monomers. Our molecular dynamics simulation results show that the equilibrated structure is always an aggregated state with remarkable structural rearrangements as we vary the number of curcumin monomers from 4 to 16. We find that the curcumin monomers form clusters in a very definite pattern where they tend to aggregate in both parallel and anti-parallel orientations of the phenyl rings, as often seen in the formation of beta-sheets in proteins. A considerable enhancement in the population of parallel alignments is observed on increasing the system size from 12 to 16 curcumin monomers. Due to the prevalence of such parallel alignment for large system sizes, a more closely packed cluster is formed with the maximum number of hydrophobic contacts. We also follow the pathway of cluster growth, in particular the transition from the initial segregated to the final aggregated state. We find the existence of a metastable structural intermediate involving a number of intermediate-sized clusters dispersed in the solution. We have constructed a free energy landscape of aggregation where the metastable state has been identified. The course of aggregation bears similarity to nucleation and growth in a highly metastable state. The final aggregated form remains stable with the total exclusion of water from its sequestered hydrophobic core. We also investigate the structure of water near the cluster surface along with the orientation of the water molecules. We find that water molecules form a distorted tetrahedral geometry in the first solvation layer of the cluster, interacting rather strongly with the hydrophilic groups at the surface of the curcumin. The dynamics of such quasi-bound water molecules near the surface of the curcumin cluster is considerably slower than in the bulk, signifying a restricted motion as often found in protein hydration layers. (C) 2014 AIP Publishing LLC.
Abstract:
The grain size of monolayer large-area graphene is key to its performance. Microstructural design for the desired grain size requires a fundamental understanding of graphene nucleation and growth. The two levers that can be used to control these aspects are the defect density, whose population can be controlled by annealing, and the gas-phase supersaturation for activation of nucleation at the defect sites. We observe that defects on the copper surface, namely dislocations, grain boundaries, triple points, and rolling marks, initiate nucleation of graphene. We show that among these defects dislocations are the most potent nucleation sites, as they get activated at the lowest supersaturation. As an illustration, we tailor the defect density and supersaturation to change the domain size of graphene from <1 μm² to >100 μm². Growth data reported in the literature have been summarized on a supersaturation plot, and a regime for defect-dominated growth has been identified. In this growth regime, we demonstrate spatial control over nucleation at intentionally introduced defects, paving the way for patterned growth of graphene. Our results provide a unified framework for understanding the role of defects in graphene nucleation and can be used as a guideline for controlled growth of graphene.