204 results for scalable parallel programming
Abstract:
In this paper, we study the diversity-multiplexing-gain tradeoff (DMT) of wireless relay networks under the half-duplex constraint. It is often unclear what penalty, if any, the half-duplex constraint imposes on the DMT of such networks. We study two classes of networks. The first class, called KPP(I) networks, consists of networks whose relays are organized in K parallel paths between the source and the destination. While we assume that there is no direct source-destination path, the K relaying paths can interfere with each other. The second class, termed layered networks, comprises relays organized in layers, where links exist only between adjacent layers. We present a communication scheme based on static schedules and amplify-and-forward relaying for these networks. We also show that for KPP(I) networks with K ≥ 3, the proposed schemes can achieve full-duplex DMT performance, thus demonstrating that the half-duplex constraint imposes no performance penalty on the DMT. We also show that, for layered networks, a linear DMT of $d_{\max}(1 - r)^{+}$ between the maximum diversity $d_{\max}$ and the maximum multiplexing gain $r_{\max} = 1$ is achievable. We adapt existing DMT-optimal coding schemes to these networks, thus specifying the end-to-end communication strategy explicitly.
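The linear tradeoff stated for layered networks can be written out as a small function. This is only a sketch of the formula d_max(1 - r)^+; the name `linear_dmt` and the value of `d_max` used below are illustrative assumptions, not taken from the paper.

```python
def linear_dmt(r: float, d_max: float) -> float:
    """Diversity gain d(r) = d_max * (1 - r)^+ at multiplexing gain r.

    (x)^+ denotes max(x, 0), so the diversity gain is zero for r >= 1.
    """
    if r < 0:
        raise ValueError("multiplexing gain must be non-negative")
    return d_max * max(0.0, 1.0 - r)


# Illustrative values only: a hypothetical network with d_max = 4.
for r in (0.0, 0.5, 1.0):
    print(r, linear_dmt(r, d_max=4.0))
```

The curve falls linearly from d_max at r = 0 to zero at the maximum multiplexing gain r = 1.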
Abstract:
In this paper, we consider inference for the component and system lifetime distributions of a k-unit parallel system with independent components, based on system data. The components are assumed to have identical Weibull distributions. We obtain the maximum likelihood estimates of the unknown parameters based on system data and derive the Fisher information matrix. We propose a β-expectation tolerance interval and a β-content γ-level tolerance interval for the life distribution of the system. The performance of the estimators and tolerance intervals is investigated via a simulation study. A simulated dataset is analyzed for illustration.
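A parallel system fails only when its last component fails, so with k independent, identically distributed components the system lifetime CDF is the component CDF raised to the power k. The sketch below writes this out for Weibull components, together with the log-likelihood of observed system lifetimes. The function names and the shape/scale parameterization are assumptions for illustration, and the paper's own estimation procedure is not reproduced here.

```python
import math
import random


def weibull_cdf(t, shape, scale):
    """Component CDF F(t) = 1 - exp(-(t / scale) ** shape) for t > 0."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / scale) ** shape))


def system_cdf(t, k, shape, scale):
    """CDF of the maximum of k i.i.d. Weibull lifetimes: F(t) ** k."""
    return weibull_cdf(t, shape, scale) ** k


def system_log_likelihood(times, k, shape, scale):
    """Log-likelihood of system lifetimes; density is k * F^(k-1) * f."""
    total = 0.0
    for t in times:
        z = (t / scale) ** shape
        log_f = math.log(shape / scale) + (shape - 1) * math.log(t / scale) - z
        log_F = math.log1p(-math.exp(-z))  # log(1 - exp(-z))
        total += math.log(k) + (k - 1) * log_F + log_f
    return total


def sample_system_lifetime(k, shape, scale, rng):
    """Draw one system lifetime as the largest of k component lifetimes."""
    # random.weibullvariate takes (scale, shape) in that order.
    return max(rng.weibullvariate(scale, shape) for _ in range(k))


rng = random.Random(0)
data = [sample_system_lifetime(3, 2.0, 1.0, rng) for _ in range(5)]
print(system_log_likelihood(data, k=3, shape=2.0, scale=1.0))
```

Maximizing `system_log_likelihood` over `shape` and `scale` with a numerical optimizer would give maximum likelihood estimates from system data alone.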
Abstract:
Progress towards combining miniaturization with improved functionality in RFICs has stalled for lack of high-performance integrated inductors. To meet this challenge, integration of magnetic material with high permeability as well as low conductivity is a must. Ferrite films are excellent candidates for RF devices due to their low cost, high resistivity, and low eddy current losses. Unlike its bulk counterpart, nanocrystalline zinc ferrite, because of partial inversion in the spinel structure, exhibits novel magnetic properties suitable for RF applications. However, most scalable ferrite film deposition processes require high temperature, expensive equipment, or both. We report a novel low-temperature (< 200 °C) solution-based deposition process for rapidly obtaining high-quality, polycrystalline zinc ferrite thin films (ZFTF) on Si (100) and on CMOS-foundry-fabricated spiral inductor structures, using safe solvents and precursors. The deposited ZFTF enhanced the inductance of a fabricated device by up to 20% at 5 GHz. Substantial inductance enhancement requires sufficiently thick films, and our reported process is capable of depositing smooth, uniform films as thick as ~20 μm just by altering the solution composition. The method is capable of depositing film conformally on a surface with complex geometry. As it requires neither a vacuum system nor any post-deposition processing, the method reported here has a low thermal budget, making it compatible with modern CMOS process flow.
Abstract:
In social choice theory, preference aggregation refers to computing an aggregate preference over a set of alternatives given the individual preferences of all the agents. In real-world scenarios, it may not be feasible to gather preferences from all the agents. Moreover, determining the aggregate preference is computationally intensive. In this paper, we show that the aggregate preference of the agents in a social network can be computed efficiently and with sufficient accuracy using preferences elicited from a small subset of critical nodes in the network. Our methodology uses a model developed from real-world data obtained through a survey of human subjects, and exploits network structure and homophily of relationships. Our approach guarantees good performance for aggregation rules that satisfy a property which we call expected weak insensitivity. We demonstrate empirically that many practically relevant aggregation rules satisfy this property. We also show that two natural objective functions in this context satisfy certain properties, which makes our methodology attractive for scalable preference aggregation over large-scale social networks. We conclude that our approach is superior to random polling when aggregating preferences related to individualistic metrics, whereas random polling is acceptable in the case of social metrics.
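To make the idea of aggregating from a subset concrete, the sketch below aggregates rankings with the Borda rule and measures how far the aggregate from a subset is from the aggregate over all agents. The Borda rule, the Kendall tau distance, and the hand-picked subset are assumptions chosen for illustration; the abstract does not say which aggregation rules or node-selection method the paper uses.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Aggregate rankings (most preferred first) with the Borda rule."""
    scores = defaultdict(int)
    for ranking in rankings:
        m = len(ranking)
        for position, alternative in enumerate(ranking):
            scores[alternative] += m - 1 - position
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda a: (-scores[a], a))


def kendall_tau_distance(r1, r2):
    """Number of pairs of alternatives that the two rankings order differently."""
    pos = {a: i for i, a in enumerate(r2)}
    distance = 0
    for i in range(len(r1)):
        for j in range(i + 1, len(r1)):
            if pos[r1[i]] > pos[r1[j]]:
                distance += 1
    return distance


all_agents = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
subset = all_agents[:1]  # hypothetical subset, picked by hand
full = borda_aggregate(all_agents)
approx = borda_aggregate(subset)
print(full, approx, kendall_tau_distance(full, approx))
```

A distance of zero means the subset reproduces the full aggregate ranking exactly.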
Abstract:
The ability to perform strong updates is the main contributor to the precision of flow-sensitive pointer analysis algorithms. Traditional flow-sensitive pointer analyses cannot strongly update pointers residing in the heap. This is a severe restriction for Java programs. In this paper, we propose a new flow-sensitive pointer analysis algorithm for Java that can perform strong updates on heap-based pointers effectively. Instead of points-to graphs, we represent our points-to information as maps from access paths to sets of abstract objects. We have implemented our analysis and run it on several large Java benchmarks. The results show considerable improvement in precision over the points-to graph based flow-insensitive and flow-sensitive analyses, with reasonable running time.
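The difference between a strong and a weak update can be sketched on a points-to map keyed by access paths. This is a minimal illustration only: representing access paths as strings such as `"x.f"`, the function names, and the object labels are assumptions, and the paper's actual algorithm is not reproduced here.

```python
def strong_update(points_to, path, targets):
    """Overwrite the set for `path`; old targets are killed."""
    updated = dict(points_to)
    updated[path] = frozenset(targets)
    return updated


def weak_update(points_to, path, targets):
    """Add to the set for `path`; old targets are kept."""
    updated = dict(points_to)
    updated[path] = updated.get(path, frozenset()) | frozenset(targets)
    return updated


# Hypothetical state: the access path x.f may point to abstract object o1.
state = {"x.f": frozenset({"o1"})}
print(sorted(strong_update(state, "x.f", {"o2"})["x.f"]))  # ['o2']
print(sorted(weak_update(state, "x.f", {"o2"})["x.f"]))    # ['o1', 'o2']
```

A strong update is sound only when the access path is known to name a single concrete location. Otherwise the analysis must fall back to the weak form, which loses precision by keeping stale targets.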
Abstract:
With the proliferation of chip multicores (CMPs) on desktops and embedded platforms, multi-threaded programs have become ubiquitous. Multiple threads may contend for resources such as on-chip shared caches and interconnects, depending upon how they access those resources. Hence, we propose a tool, Thread Contention Predictor (TCP), to help quantify the number of threads sharing data and their sharing pattern. We demonstrate its use to predict a more profitable shared, last-level on-chip cache (LLC) access policy on CMPs. Our cache configuration predictor is 2.2 times faster than cycle-accurate simulation. We also demonstrate its use for identifying hot data structures in a program which may cause performance degradation due to false data sharing. We fix the layout of such data structures and show up to 10% and 18% improvement in execution time and energy-delay product (EDP), respectively.
Abstract:
This paper presents a study of the nature of the degrees-of-freedom of spatial manipulators based on the concept of partition of degrees-of-freedom. In particular, the partitioning of degrees-of-freedom is studied in five lower-mobility spatial parallel manipulators possessing different combinations of degrees-of-freedom. An extension of the existing theory is introduced so as to analyse the nature of the gained degree(s)-of-freedom at a gain-type singularity. The gain of one- and two-degrees-of-freedom is analysed in several well-studied, as well as newly developed manipulators. The formulations also present a basis for the analysis of the velocity kinematics of manipulators of any architecture. (C) 2013 Elsevier Ltd. All rights reserved.
Abstract:
Rapid advancements in multi-core processor architectures coupled with low-cost, low-latency, high-bandwidth interconnects have made clusters of multi-core machines a common computing resource. Unfortunately, writing good parallel programs that efficiently utilize all the resources in such a cluster is still a major challenge. Various programming languages have been proposed as a solution to this problem, but are yet to be adopted widely to run performance-critical code mainly due to the relatively immature software framework and the effort involved in re-writing existing code in the new language. In this paper, we motivate and describe our initial study in exploring CUDA as a programming language for a cluster of multi-cores. We develop CUDA-For-Clusters (CFC), a framework that transparently orchestrates execution of CUDA kernels on a cluster of multi-core machines. The well-structured nature of a CUDA kernel, the growing popularity, support and stability of the CUDA software stack collectively make CUDA a good candidate to be considered as a programming language for a cluster. CFC uses a mixture of source-to-source compiler transformations, a work distribution runtime and a light-weight software distributed shared memory to manage parallel executions. Initial results on running several standard CUDA benchmark programs achieve impressive speedups of up to 7.5X on a cluster with 8 nodes, thereby opening up an interesting direction of research for further investigation.
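One simple way to distribute the thread blocks of a kernel launch across cluster nodes is to give each node a contiguous, near-equal range of block indices. The sketch below shows only that partitioning step. It is a hypothetical scheme for illustration, and the abstract does not describe how CFC's work distribution runtime actually assigns blocks.

```python
def partition_blocks(num_blocks, num_nodes):
    """Split block indices 0..num_blocks-1 into contiguous, near-equal ranges."""
    base, extra = divmod(num_blocks, num_nodes)
    ranges = []
    start = 0
    for node in range(num_nodes):
        # The first `extra` nodes take one additional block each.
        count = base + (1 if node < extra else 0)
        ranges.append(range(start, start + count))
        start += count
    return ranges


# Hypothetical launch: 10 thread blocks spread over 4 nodes.
for node, blocks in enumerate(partition_blocks(10, 4)):
    print(node, list(blocks))
```

For reference, the reported speedup of 7.5X on 8 nodes corresponds to a parallel efficiency of 7.5 / 8, or about 94%.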
Abstract:
A power-scalable receiver architecture is presented for low-data-rate Wireless Sensor Network (WSN) applications in 130 nm RF-CMOS technology. The power-scalable design is motivated by the ability to exploit lower run-time performance requirements to save power. The proposed receiver is able to switch power settings based on available signal and interference levels while maintaining the requisite BER. The low-IF receiver consists of a variable noise and linearity LNA, IQ mixers, a VGA, a variable-order complex bandpass filter, and a variable gain and bandwidth amplifier (VGBWA) capable of driving a variable-sampling-rate ADC. The blocks have independent power scaling controls depending on their noise, gain, and interference rejection (IR) requirements. The receiver is designed for constant-envelope QPSK-type modulation with a 2.4 GHz RF input, 3 MHz IF, and 2 MHz bandwidth. The chip operates at 1 V Vdd with current scalable from 4.5 mA to 1.3 mA and a chip area of 0.65 mm².
Abstract:
A new "generalized model predictive static programming (G-MPSP)" technique is presented in this paper in the continuous-time framework for rapidly solving a class of finite-horizon nonlinear optimal control problems with hard terminal constraints. A key feature of the technique is the backward propagation of a small-dimensional weight matrix dynamics, which is used to update the control history. This feature, together with the fact that it leads to a static optimization problem, accounts for its high computational efficiency. It is shown that, under Euler integration, the technique is equivalent to the existing model predictive static programming technique, which operates on a discrete-time approximation of the problem. Performance of the proposed technique is demonstrated by solving a challenging three-dimensional impact-angle-constrained missile guidance problem. The problem demands that the missile meet constraints on both azimuth and elevation angles in addition to achieving near-zero miss distance, while minimizing the lateral acceleration demand throughout its flight path. Both stationary and maneuvering ground targets are considered in the simulation studies. Effectiveness of the proposed guidance has been verified by considering first-order autopilot lag as well as various target maneuvers.
Abstract:
The stability of two long, unsupported, circular parallel tunnels aligned horizontally in fully cohesive and cohesive-frictional soils has been determined. An upper bound limit analysis in combination with finite elements and linear programming is employed to perform the analysis. For different clear spacings (S) between the tunnels, the stability of the tunnels is expressed in terms of a non-dimensional stability number, $\gamma_{\max} H / c$, where H is the tunnel cover, c is the soil cohesion, and $\gamma_{\max}$ is the maximum unit weight of the soil mass that the tunnels can bear without collapse. The variation of the stability number with the tunnel spacing has been established for different combinations of H/D, m and $\phi$, where D is the diameter of each tunnel, $\phi$ is the internal friction angle of the soil, and m accounts for the rate at which the cohesion increases linearly with depth. The stability number reduces continuously with a decrease in the spacing between the tunnels. The optimum spacing ($S_{\mathrm{opt}}$) between the two tunnels required to eliminate the interference effect increases with (i) an increase in H/D and (ii) a decrease in the values of both m and $\phi$. The value of $S_{\mathrm{opt}}$ lies approximately in the range 1.5D to 3.5D for H/D = 1 and 7D to 12D for H/D = 7. The results from the analysis compare reasonably well with the different solutions reported in the literature. (C) 2013 Elsevier Ltd. All rights reserved.
Abstract:
In this paper we propose a framework for determining the optimum steering input of all-wheel steer vehicles (AWSV) on rough terrain. The framework computes the steering input that minimizes the tracking error for a given trajectory. Unlike previous methodologies for computing steering inputs of car-like vehicles, the proposed methodology depends explicitly on the vehicle dynamics and can be extended to vehicles having an arbitrary number of steering inputs. A fully generic framework has been used to derive the vehicle dynamics, and a non-linear programming based constrained optimization approach has been used to compute the steering input considering the instantaneous vehicle dynamics and the no-slip and contact constraints of the vehicle. All-wheel steer vehicles have a special parallel steering ability in which the instantaneous centre of rotation (ICR) is at infinity. The proposed framework automatically enables the vehicle to choose between parallel steer and normal operation depending on the error with respect to the desired trajectory. The efficacy of the proposed framework is demonstrated by extensive uneven-terrain simulations, for trajectories with continuous or discontinuous velocity profiles.
Abstract:
A robust suboptimal reentry guidance scheme is presented for a reusable launch vehicle using the recently developed, computationally efficient model predictive static programming. The formulation uses the nonlinear vehicle dynamics with a spherical and rotating Earth, hard constraints for desired terminal conditions, and an innovative cost function having several components with associated weighting factors that can account for path and control constraints in a soft constraint manner, thereby leading to smooth solutions of the guidance parameters. The proposed guidance essentially shapes the trajectory of the vehicle by computing the necessary angle of attack and bank angle that the vehicle should execute. The path constraints are the structural load constraint, thermal load constraint, bounds on the angle of attack, and bounds on the bank angle. In addition, the terminal constraints include the three-dimensional position and velocity vector components at the end of the reentry. Whereas the angle-of-attack command is generated directly, the bank angle command is generated by first generating the required heading angle history and then using it in a dynamic inversion loop considering the heading angle dynamics. Such a two-loop synthesis of bank angle leads to better management of the vehicle trajectory and avoids mathematical complexity as well. Moreover, all bank angle maneuvers have been confined to the middle of the trajectory and the vehicle ends the reentry segment with near-zero bank angle, which is quite desirable. It has also been demonstrated that the proposed guidance has sufficient robustness for state perturbations as well as parametric uncertainties in the model.
Abstract:
Knowledge of protein-ligand interactions is essential to understanding several biological processes and is important for applications ranging from understanding protein function to drug discovery and protein engineering. Here, we describe an algorithm for the comparison of three-dimensional ligand-binding sites in protein structures. A previously described algorithm, PocketMatch (version 1.0), is optimised, expanded, and MPI-enabled for parallel execution. PocketMatch (version 2.0) rapidly quantifies binding-site similarity based on structural descriptors such as residue nature and interatomic distances. Atomic-scale alignments may also be obtained from the amino acid residue pairings generated. It allows an end-user to compute database-wide, all-to-all comparisons in a matter of hours. The use of our algorithm on a sample dataset, a performance analysis, and annotated source code are also included.
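An all-to-all comparison over n binding sites involves n(n - 1)/2 unordered pairs, which can be dealt out to parallel workers independently. The sketch below shows one hypothetical round-robin assignment of pairs to ranks. It does not use MPI or PocketMatch itself, and the abstract does not describe how PocketMatch 2.0 actually divides the work.

```python
from itertools import combinations


def pairs_for_rank(num_sites, rank, size):
    """Unordered pairs (i, j) with i < j, dealt round-robin to `size` workers."""
    return [
        pair
        for index, pair in enumerate(combinations(range(num_sites), 2))
        if index % size == rank
    ]


# Hypothetical run: 5 binding sites compared by 3 workers.
for rank in range(3):
    print(rank, pairs_for_rank(5, rank, 3))
```

Each worker would score only its own pairs, and the per-pair similarity results could then be gathered on a single rank.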