918 resultados para parallel execution


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Laminar separation bubbles are thought to be highly non-parallel, and hence global stability studies start from this premise. However, experimentalists have always realized that the flow is more parallel than is commonly believed, for pressure-gradient-induced bubbles, and this is why linear parallel stability theory has been successful in describing their early stages of transition. The present experimental/numerical study re-examines this important issue and finds that the base flow in such a separation bubble becomes nearly parallel due to a strong-interaction process between the separated boundary layer and the outer potential flow. The so-called dead-air region or the region of constant pressure is a simple consequence of this strong interaction. We use triple-deck theory to qualitatively explain these features. Next, the implications of global analysis for the linear stability of separation bubbles are considered. In particular we show that in the initial portion of the bubble, where the flow is nearly parallel, local stability analysis is sufficient to capture the essential physics. It appears that the real utility of the global analysis is perhaps in the rear portion of the bubble, where the flow is highly non-parallel, and where the secondary/nonlinear instability stages are likely to dominate the dynamics.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we address a scheduling problem for minimizing total weighted flowtime, observed in automobile gear manufacturing. Specifically, the bottleneck operation of the pre-heat treatment stage of gear manufacturing process has been dealt with in scheduling. Many real-life scenarios like unequal release times, sequence dependent setup times, and machine eligibility restrictions have been considered. A mathematical model taking into account dynamic starting conditions has been proposed. The problem is derived to be NP-hard. To approach the problem, a few heuristic algorithms have been proposed. Based on planned computational experiments, the performance of the proposed heuristic algorithms is evaluated: (a) in comparison with optimal solution for small-size problem instances and (b) in comparison with the estimated optimal solution for large-size problem instances. Extensive computational analyses reveal that the proposed heuristic algorithms are capable of consistently yielding near-statistically estimated optimal solutions in a reasonable computational time.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Unending quest for performance improvement coupled with the advancements in integrated circuit technology have led to the development of new architectural paradigm. Speculative multithreaded architecture (SpMT) philosophy relies on aggressive speculative execution for improved performance. However, aggressive speculative execution comes with a mixed flavor of improving performance, when successful, and adversely affecting the energy consumption (and performance) because of useless computation in the event of mis-speculation. Dynamic instruction criticality information can be usefully applied to control and guide such an aggressive speculative execution. In this paper, we present a model of micro-execution for SpMT architecture that we have developed to determine the dynamic instruction criticality. We have also developed two novel techniques utilizing the criticality information namely delaying the non-critical loads and the criticality based thread-prediction for reducing useless computations and energy consumption. Experimental results showing break-up of critical instructions and effectiveness of proposed techniques in reducing energy consumption are presented in the context of multiscalar processor that implements SpMT architecture. Our experiments show 17.7% and 11.6% reduction in dynamic energy for criticality based thread prediction and criticality based delayed load scheme respectively while the improvement in dynamic energy delay product is 13.9% and 5.5%, respectively. (c) 2012 Published by Elsevier B.V.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The Morse-Smale complex is a topological structure that captures the behavior of the gradient of a scalar function on a manifold. This paper discusses scalable techniques to compute the Morse-Smale complex of scalar functions defined on large three-dimensional structured grids. Computing the Morse-Smale complex of three-dimensional domains is challenging as compared to two-dimensional domains because of the non-trivial structure introduced by the two types of saddle criticalities. We present a parallel shared-memory algorithm to compute the Morse-Smale complex based on Forman's discrete Morse theory. The algorithm achieves scalability via synergistic use of the CPU and the GPU. We first prove that the discrete gradient on the domain can be computed independently for each cell and hence can be implemented on the GPU. Second, we describe a two-step graph traversal algorithm to compute the 1-saddle-2-saddle connections efficiently and in parallel on the CPU. Simultaneously, the extremasaddle connections are computed using a tree traversal algorithm on the GPU.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Although clustering helps by improving the clock speed, reducing the energy consumption of the logic, and making the design simpler, it introduces extra overheads by way of inter-cluster communication. This communication happens over long global wires having high load capacitance which leads to delay in execution and significantly high energy consumption. Inter-cluster communication also introduces many short idle cycles, thereby significantly increasing the overall leakage energy consumption in the functional units. The trend towards miniaturization of devices (and associated reduction in threshold voltage) makes energy consumption in interconnects and functional units even worse, and limits the usability of clustered architectures in smaller technologies. However, technological advancements now permit the design of interconnects and functional units with varying performance and power modes. In this paper, we propose scheduling algorithms that aggregate the scheduling slack of instructions and communication slack of data values to exploit the low-power modes of functional units and interconnects. Finally, we present a synergistic combination of these algorithms that simultaneously saves energy in functional units and interconnects to improves the usability of clustered architectures by achieving better overall energy-performance trade-offs. Even with conservative estimates of the contribution of the functional units and interconnects to the overall processor energy consumption, the proposed combined scheme obtains on average 8% and 10% improvement in overall energy-delay product with 3.5% and 2% performance degradation for a 2-clustered and a 4-clustered machine, respectively. We present a detailed experimental evaluation of the proposed schemes. Our test bed uses the Trimaran compiler infrastructure. (C) 2012 Elsevier Inc. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Software transactional memory (STM) is a promising programming paradigm for shared memory multithreaded programs. In order for STMs to be adopted widely for performance critical software, understanding and improving the cache performance of applications running on STM becomes increasingly crucial, as the performance gap between processor and memory continues to grow. In this paper, we present the most detailed experimental evaluation to date, of the cache behavior of STM applications and quantify the impact of the different STM factors on the cache misses experienced by the applications. We find that STMs are not cache friendly, with the data cache stall cycles contributing to more than 50% of the execution cycles in a majority of the benchmarks. We find that on an average, misses occurring inside the STM account for 62% of total data cache miss latency cycles experienced by the applications and the cache performance is impacted adversely due to certain inherent characteristics of the STM itself. The above observations motivate us to propose a set of specific compiler transformations targeted at making the STMs cache friendly. We find that STM's fine grained and application unaware locking is a major contributor to its poor cache behavior. Hence we propose selective Lock Data co-location (LDC) and Redundant Lock Access Removal (RLAR) to address the lock access misses. We find that even transactions that are completely disjoint access parallel, suffer from costly coherence misses caused by the centralized global time stamp updates and hence we propose the Selective Per-Partition Time Stamp (SPTS) transformation to address this. We show that our transformations are effective in improving the cache behavior of STM applications by reducing the data cache miss latency by 20.15% to 37.14% and improving execution time by 18.32% to 33.12% in five of the 8 STAMP applications.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Identical parallel-connected converters with unequal load sharing have unequal terminal voltages. The difference in terminal voltages is more pronounced in case of back-to-back connected converters, operated in power-circulation mode for the purpose of endurance tests. In this paper, a synchronous reference frame based analysis is presented to estimate the grid current distortion in interleaved, grid-connected converters with unequal terminal voltages. Influence of carrier interleaving angle on rms grid current ripple is studied theoretically as well as experimentally. Optimum interleaving angle to minimize the rms grid current ripple is investigated for different applications of parallel converters. The applications include unity power factor rectifiers, inverters for renewable energy sources, reactive power compensators, and circulating-power test set-up used for thermal testing of high-power converters. Optimum interleaving angle is shown to be a strong function of the average of the modulation indices of the two converters, irrespective of the application. The findings are verified experimentally on two parallel-connected converters, circulating reactive power of up to 150 kVA between them.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We show that every graph of maximum degree 3 can be represented as the intersection graph of axis parallel boxes in three dimensions, that is, every vertex can be mapped to an axis parallel box such that two boxes intersect if and only if their corresponding vertices are adjacent. In fact, we construct a representation in which any two intersecting boxes just touch at their boundaries. Further, this construction can be realized in linear time.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we study the diversity-multiplexing-gain tradeoff (DMT) of wireless relay networks under the half-duplex constraint. It is often unclear what penalty if any, is imposed by the half-duplex constraint on the DMT of such networks. We study two classes of networks; the first class, called KPP(I) networks, is the class of networks with the relays organized in K parallel paths between the source and the destination. While we assume that there is no direct source-destination path, the K relaying paths can interfere with each other. The second class, termed as layered networks, is comprised of relays organized in layers, where links exist only between adjacent layers. We present a communication scheme based on static schedules and amplify-and-forward relaying for these networks. We also show that for KPP(I) networks with K >= 3, the proposed schemes can achieve full-duplex DMT performance, thus demonstrating that there is no performance hit on the DMT due to the half-duplex constraint. We also show that, for layered networks, a linear DMT of d(max)(1 - r)(+) between the maximum diversity d(max) and the maximum MG, r(max) = 1 is achievable. We adapt existing DMT optimal coding schemes to these networks, thus specifying the end-to-end communication strategy explicitly.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The contour tree is a topological abstraction of a scalar field that captures evolution in level set connectivity. It is an effective representation for visual exploration and analysis of scientific data. We describe a work-efficient, output sensitive, and scalable parallel algorithm for computing the contour tree of a scalar field defined on a domain that is represented using either an unstructured mesh or a structured grid. A hybrid implementation of the algorithm using the GPU and multi-core CPU can compute the contour tree of an input containing 16 million vertices in less than ten seconds with a speedup factor of upto 13. Experiments based on an implementation in a multi-core CPU environment show near-linear speedup for large data sets.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we consider the inference for the component and system lifetime distribution of a k-unit parallel system with independent components based on system data. The components are assumed to have identical Weibull distribution. We obtain the maximum likelihood estimates of the unknown parameters based on system data. The Fisher information matrix has been derived. We propose -expectation tolerance interval and -content -level tolerance interval for the life distribution of the system. Performance of the estimators and tolerance intervals is investigated via simulation study. A simulated dataset is analyzed for illustration.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An exciting application of crowdsourcing is to use social networks in complex task execution. In this paper, we address the problem of a planner who needs to incentivize agents within a network in order to seek their help in executing an atomic task as well as in recruiting other agents to execute the task. We study this mechanism design problem under two natural resource optimization settings: (1) cost critical tasks, where the planner's goal is to minimize the total cost, and (2) time critical tasks, where the goal is to minimize the total time elapsed before the task is executed. We identify a set of desirable properties that should ideally be satisfied by a crowdsourcing mechanism. In particular, sybil-proofness and collapse-proofness are two complementary properties in our desiderata. We prove that no mechanism can satisfy all the desirable properties simultaneously. This leads us naturally to explore approximate versions of the critical properties. We focus our attention on approximate sybil-proofness and our exploration leads to a parametrized family of payment mechanisms which satisfy collapse-proofness. We characterize the approximate versions of the desirable properties in cost critical and time critical domain.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents a study of the nature of the degrees-of-freedom of spatial manipulators based on the concept of partition of degrees-of-freedom. In particular, the partitioning of degrees-of-freedom is studied in five lower-mobility spatial parallel manipulators possessing different combinations of degrees-of-freedom. An extension of the existing theory is introduced so as to analyse the nature of the gained degree(s)-of-freedom at a gain-type singularity. The gain of one- and two-degrees-of-freedom is analysed in several well-studied, as well as newly developed manipulators. The formulations also present a basis for the analysis of the velocity kinematics of manipulators of any architecture. (C) 2013 Elsevier Ltd. All rights reserved.