135 results for Distributed
Abstract:
Programming for parallel architectures that do not have a shared address space is extremely difficult due to the need for explicit communication between memories of different compute devices. A heterogeneous system with CPUs and multiple GPUs and a distributed-memory cluster are examples of such systems. Past works that try to automate data movement for distributed-memory architectures can lead to excessive redundant communication. In this paper, we propose an automatic data movement scheme that minimizes the volume of communication between compute devices in heterogeneous and distributed-memory systems. We show that by partitioning data dependences in a particular non-trivial way, one can generate data movement code that results in the minimum volume for a vast majority of cases. The techniques are applicable to any sequence of affine loop nests and work on top of any choice of loop transformations, parallelization, and computation placement. The data movement code generated minimizes the volume of communication for a particular configuration of these. We use a combination of powerful static analyses relying on the polyhedral compiler framework and the lightweight runtime routines they generate to build a source-to-source transformation tool that automatically generates communication code. We demonstrate that the tool is scalable and leads to substantial gains in efficiency. On a heterogeneous system, the communication volume is reduced by a factor of 11X to 83X over state-of-the-art, translating into a mean execution time speedup of 1.53X. On a distributed-memory cluster, our scheme reduces the communication volume by a factor of 1.4X to 63.5X over state-of-the-art, resulting in a mean speedup of 1.55X. In addition, our scheme yields a mean speedup of 2.19X over hand-optimized UPC codes.
Abstract:
This paper considers cooperative spectrum sensing algorithms for Cognitive Radios which focus on reducing the number of samples needed to make a reliable detection. We propose algorithms based on decentralized sequential hypothesis testing in which the Cognitive Radios sequentially collect the observations, make local decisions and send them to the fusion center for further processing to make a final decision on spectrum usage. The reporting channel between the Cognitive Radios and the fusion center is modeled, more realistically, as a Multiple Access Channel (MAC) with receiver noise. Furthermore, the communication for reporting is limited, thereby reducing the communication cost. We start with an algorithm where the fusion center uses an SPRT-like (Sequential Probability Ratio Test) procedure and theoretically analyze its performance. Asymptotically, its performance is close to the optimal centralized test without fusion center noise. We further modify this algorithm to improve its performance at practical operating points. Later we generalize these algorithms to handle uncertainties in SNR and fading. (C) 2014 Elsevier B.V. All rights reserved.
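As a rough illustration of the sequential test that the SPRT-like procedure builds on, the sketch below implements the classical Wald SPRT for a single detector. It is not the decentralized algorithm of the paper: the Gaussian observation model, the function name `sprt`, and the parameters `mu0`, `mu1`, `sigma`, `alpha` and `beta` are all assumptions made for the example.

```python
import math


def sprt(samples, mu0=0.0, mu1=1.0, sigma=1.0, alpha=0.01, beta=0.01):
    """Classical Wald SPRT for a Gaussian mean: H0 (mean mu0) vs H1 (mean mu1).

    Assumed model for illustration only. Returns (decision, samples_used).
    """
    upper = math.log((1 - beta) / alpha)  # crossing this accepts H1
    lower = math.log(beta / (1 - alpha))  # crossing this accepts H0
    llr = 0.0
    n = 0
    for n, x in enumerate(samples, start=1):
        # Log-likelihood ratio increment for one Gaussian observation.
        llr += (mu1 - mu0) * (x - (mu0 + mu1) / 2) / sigma ** 2
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", n
```

The test stops as soon as the accumulated log-likelihood ratio leaves the interval between the two thresholds, which is what lets a sequential detector use fewer samples on average than a fixed-sample-size test at the same error targets.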
Abstract:
In order to explore the potential use of fly ash and plastic waste in bulk quantities in civil engineering applications, it is necessary to understand the behavior of fly ash and fly ash mixed with plastic waste. These materials are considered wastes, and this study shows that the combination of fly ash and plastic waste is very useful. In this regard, various tests such as classification tests, unconfined compressive strength and compressibility tests, consolidated undrained tests, and California bearing ratio (CBR) tests were conducted. The results indicated that the inclusion of plastic waste in fly ash is effective in improving the engineering properties of fly ash in terms of compressive strength, shear strength parameters, and CBR values. In order to understand the effect of sample size on the shear strength parameters of fly ash and fly ash mixed with plastic waste, consolidated undrained tests were conducted with sample sizes of 38 x 76 mm and 50 x 100 mm. The results of the tests indicate that the shear strength increases with the increase in sample size. The implication of the use of fly ash mixed with plastic waste in unpaved roads is presented in terms of reduction of the carbon footprint.
Abstract:
Experimental studies and atomistic simulations have shown that brittle metallic glasses fail by a cavitation mechanism whose origin has been traced to the presence of intrinsic atomic density fluctuations which give rise to weak zones with reduced yield strength. It has been shown recently through continuum analysis that the presence of these zones can lower the cavitation stress considerably under equibiaxial loading. The objective of the present work is to study the effect of the applied stress state on the cavitation behavior of such a heterogeneous plastic solid with distributed weak zones. To this end, 2D plane strain finite element simulations are performed by subjecting a unit cell containing a weak zone to different (biaxiality) stress ratios. The volume fraction and yield strength of the weak zone are varied over a wide range. The results show that, unlike in a homogeneous plastic solid, the cavitation stress of the heterogeneous aggregate does not reduce appreciably as the stress ratio decreases from unity when the yield strength of the weak zone is low. It is found that a non-dimensional parameter characterizing the stress state prevailing in the weak zone and its yield properties uniquely controls the cavitation stress. The nature of cavitation bifurcation may change from unstable bifurcation to the left at sufficiently low stress ratio to one involving snap cavitation at high stress ratio. (C) 2014 Elsevier Ltd. All rights reserved.
Abstract:
In this paper, a C^0 interior penalty method is proposed and analyzed for distributed optimal control problems governed by the biharmonic operator. The state and adjoint variables are discretized using continuous piecewise quadratic finite elements while the control variable is discretized using piecewise constant approximations. A priori and a posteriori error estimates are derived for the state, adjoint and control variables under minimal regularity assumptions. Numerical results justify the theoretical results obtained. The a posteriori error estimators are useful in adaptive finite element approximation and the numerical results indicate that the sharp error estimators work efficiently in guiding the mesh refinement. (C) 2014 Elsevier Ltd. All rights reserved.
Abstract:
The taxonomy of the Hanuman langur (Semnopithecus spp.), a widely distributed Asian colobine monkey, has been in flux for a long time due to much disagreement between various classification schemes. However, results from a recent field-based morphological study were consistent with Hill's (Ceylon J Sci 21:277-305, 1939) species-level classification scheme. Here we tested the validity of S. hypoleucos and S. priam, the two South Indian species recognized by Hill. To this end, one mitochondrial and four nuclear markers were sequenced from over 72 non-invasive samples of Hanuman langurs and S. johnii collected from across India. The molecular data were subjected to various tree building methods. The nuclear data were also used in a Bayesian structure analysis and to determine the genealogical sorting index of each hypothesized species. Results from nuclear data suggest that the South Indian population of Hanuman langur consists of two units that correspond to the species recognized by Hill. However, in the mitochondrial tree S. johnii and S. priam were polyphyletic, probably due to retention of ancestral polymorphism and/or low levels of hybridization. Implications of these results for the conservation of Hanuman langurs are also discussed.
Abstract:
In WSNs the communication traffic is often correlated in time and space, with multiple nodes in proximity starting to transmit simultaneously. Such a situation is known as spatially correlated contention. The random access method to resolve such contention suffers from a high collision rate, whereas the traditional distributed TDMA scheduling techniques primarily try to improve the network capacity by reducing the schedule length. Usually, the situation of spatially correlated contention persists only for a short duration, and therefore generating an optimal or suboptimal schedule is not very useful. Additionally, if an algorithm takes a very long time to schedule, it will not only introduce additional delay in the data transfer but also consume more energy. In this paper, we present a distributed TDMA slot scheduling (DTSS) algorithm, which considerably reduces the time required to perform scheduling, while restricting the schedule length to the maximum degree of the interference graph. The DTSS algorithm supports unicast, multicast, and broadcast scheduling simultaneously, without any modification to the protocol. We have analyzed the protocol for average-case performance and also simulated it using the Castalia simulator to evaluate its runtime performance. Both analytical and simulation results show that our protocol is able to considerably reduce the time required for scheduling.
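To make the link between schedule length and the degree of the interference graph concrete, here is a minimal centralized sketch of greedy slot assignment. It is a stand-in for illustration, not the DTSS protocol itself, which is distributed and message-based; the function name `greedy_slots` and the dictionary representation of the interference graph are assumptions. Greedy assignment of this kind never needs more than (maximum degree + 1) slots.

```python
def greedy_slots(interference):
    """Assign each node the lowest TDMA slot unused by its interfering nodes.

    interference: dict mapping node -> set of nodes that must not share
    its slot (assumed symmetric). Centralized sketch, not DTSS.
    """
    slots = {}
    for node in sorted(interference):
        # Slots already claimed by neighbors that were scheduled earlier.
        taken = {slots[nbr] for nbr in interference[node] if nbr in slots}
        slot = 0
        while slot in taken:
            slot += 1
        slots[node] = slot
    return slots


graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
schedule = greedy_slots(graph)
```

In this example the three mutually interfering nodes take three distinct slots, and node `d` reuses a slot held by a node it does not interfere with.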
Abstract:
Opportunistic selection in multi-node wireless systems improves system performance by selecting the "best" node and by using it for data transmission. In these systems, each node has a real-valued local metric, which is a measure of its ability to improve system performance. Our goal is to identify the best node, which has the largest metric. We propose, analyze, and optimize a new distributed, yet simple, node selection scheme that combines the timer scheme with power control. In it, each node sets a timer and transmit power level as a function of its metric. The power control is designed such that the best node is captured even if [...] other nodes simultaneously transmit with it. We develop several structural properties about the optimal metric-to-timer-and-power mapping, which maximizes the probability of selecting the best node. These significantly reduce the computational complexity of finding the optimal mapping and yield valuable insights about it. We show that the proposed scheme is scalable and significantly outperforms the conventional timer scheme. We investigate the effect of [...] and the number of receive power levels. Furthermore, we find that the practical peak power constraint has a negligible impact on the performance of the scheme.
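The conventional timer scheme that the abstract uses as a baseline can be sketched as follows. This is only the timer part, without the power control proposed in the paper. The linear metric-to-timer mapping, the normalization of metrics to [0, 1], and the names `timer_select`, `t_max` and `guard` are assumptions for the example.

```python
def timer_select(metrics, t_max=1.0, guard=0.01):
    """Conventional timer-based selection (no power control).

    Each node starts a timer that decreases with its metric (assumed to lie
    in [0, 1]); the first timer to expire wins. If the runner-up expires
    within `guard` of the winner, the transmissions collide and the
    selection fails. Returns the winning index, or None on failure.
    """
    if not metrics:
        return None
    timers = [t_max * (1.0 - m) for m in metrics]
    order = sorted(range(len(timers)), key=timers.__getitem__)
    first = order[0]
    if len(order) > 1 and timers[order[1]] - timers[first] < guard:
        return None  # collision: two timers expired too close together
    return first
```

A collision here means the best node is lost whenever its timer expires too close to another one, which is the failure mode that adding power control is meant to reduce.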
Abstract:
A temperature compensation method is proposed for CNT-composite strain sensors. CNT-composite sensors are fabricated on an elastic polymer substrate with known thermo-mechanical properties in order to introduce thermo-mechanical strain and to calibrate the sensor. Strain is induced on the sensor by bending the substrate in a cantilever configuration. The response of the sensor is measured using a bridge circuit method. The induced strain in the beam is determined using beam theory. The sensors are characterized for different CNT concentrations and at various temperatures. A model-based temperature compensation scheme is proposed and verified experimentally. The result demonstrates that CNT-nanocomposite strain sensors can be used in applications with varying temperature. A method is proposed to determine the strain and temperature simultaneously. The CNT sensors are simple to fabricate in complex patterns with excellent repeatability and do not require a bonding layer.
Abstract:
This paper studies a pilot-assisted physical layer data fusion technique known as Distributed Co-Phasing (DCP). In this two-phase scheme, the sensors first estimate the channel to the fusion center (FC) using pilots sent by the latter; and then they simultaneously transmit their common data by pre-rotating them by the estimated channel phase, thereby achieving physical layer data fusion. First, by analyzing the symmetric mutual information of the system, it is shown that the use of higher order constellations (HOC) can improve the throughput of DCP compared to the binary signaling considered heretofore. Using an HOC in the DCP setting requires the estimation of the composite DCP channel at the FC for data decoding. To this end, two blind algorithms are proposed: 1) power method, and 2) modified K-means algorithm. The latter algorithm is shown to be computationally efficient and converges significantly faster than the conventional K-means algorithm. Analytical expressions for the probability of error are derived, and it is found that even at moderate to low SNRs, the modified K-means algorithm achieves a probability of error comparable to that achievable with a perfect channel estimate at the FC, while requiring no pilot symbols to be transmitted from the sensor nodes. Also, the problem of signal corruption due to imperfect DCP is investigated, and constellation shaping to minimize the probability of signal corruption is proposed and analyzed. The analysis is validated, and the promising performance of DCP for energy-efficient physical layer data fusion is illustrated, using Monte Carlo simulations.
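The co-phasing step described above can be illustrated with a small noiseless sketch: each sensor multiplies its symbol by the conjugate of its estimated channel phase, so the contributions add coherently at the fusion center. The function name `cophased_sum`, the noiseless channel model and the example channel values are assumptions for the illustration.

```python
import cmath


def cophased_sum(channels, phase_estimates, symbol=1 + 0j):
    """Noiseless signal at the fusion center under distributed co-phasing.

    Each sensor pre-rotates the common symbol by minus its estimated
    channel phase before transmitting over its complex channel gain.
    """
    return sum(
        h * cmath.exp(-1j * phi) * symbol
        for h, phi in zip(channels, phase_estimates)
    )


# Three hypothetical channels with gains 1.0, 2.0, 0.5 and different phases.
channels = [cmath.rect(1.0, 0.3), cmath.rect(2.0, -1.2), cmath.rect(0.5, 2.5)]
perfect = [cmath.phase(h) for h in channels]
received = cophased_sum(channels, perfect)   # gains add coherently
unrotated = sum(channels)                    # no pre-rotation, for comparison
```

With perfect phase estimates the received amplitude equals the sum of the channel magnitudes (3.5 in this example), whereas without pre-rotation the contributions partially cancel.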
Abstract:
Time division multiple access (TDMA) based channel access mechanisms perform better than contention-based channel access mechanisms in terms of channel utilization, reliability and power consumption, especially for high data rate applications in wireless sensor networks (WSNs). Most of the existing distributed TDMA scheduling techniques can be classified as either static or dynamic. The primary purpose of static TDMA scheduling algorithms is to improve the channel utilization by generating a schedule of smaller length. However, they usually take a longer time to schedule, and hence are not suitable for WSNs in which the network topology changes dynamically. On the other hand, dynamic TDMA scheduling algorithms generate a schedule quickly, but they are not efficient in terms of generated schedule length. In this paper, we propose a novel scheme for TDMA scheduling in WSNs, which can generate a compact schedule similar to static scheduling algorithms, while its runtime performance matches that of dynamic scheduling algorithms. Furthermore, the proposed distributed TDMA scheduling algorithm can trade off schedule length against the time required to generate the schedule. This allows the developers of WSNs to tune the performance according to the requirements of prevalent WSN applications and the need to perform re-scheduling. Finally, the proposed TDMA scheduling is tolerant to packet loss caused by an erroneous wireless channel. The algorithm has been simulated using the Castalia simulator to compare its performance with that of others in terms of generated schedule length and the time required to generate the TDMA schedule. Simulation results show that the proposed algorithm generates a compact schedule in very little time.
Abstract:
Coarse Grained Reconfigurable Architectures (CGRA) are emerging as embedded application processing units in computing platforms for Exascale computing. Such CGRAs are distributed-memory multi-core compute elements on a chip that communicate over a Network-on-Chip (NoC). Numerical Linear Algebra (NLA) kernels are key to several high performance computing applications. In this paper we propose a systematic methodology to obtain the specification of Compute Elements (CE) for such CGRAs. We analyze block Matrix Multiplication and block LU Decomposition algorithms in the context of a CGRA, and obtain theoretical bounds on communication requirements and memory sizes for a CE. Support for high performance custom computations common to NLA kernels is provided through custom function units (CFUs) in the CEs. We present results to justify the merits of such CFUs.
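For readers unfamiliar with the block formulation, the following is a minimal sketch of block matrix multiplication on plain Python lists. It only shows the tiling of the loops; it does not model a CGRA, its compute elements, or the NoC. The function name `block_matmul` and the `block` parameter are assumptions for the example.

```python
def block_matmul(a, b, block=2):
    """Multiply matrices a (n x k) and b (k x m) tile by tile.

    Each (i0, j0, k0) iteration touches only one block of a, b and c,
    which is the unit of work and of data movement in a blocked scheme.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for k0 in range(0, k, block):
                # Accumulate the product of one a-block and one b-block.
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + block, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c
```

The block size determines how much of each operand must be resident at once, which is why analyses of this kind relate block dimensions to per-element memory size and communication volume.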
Abstract:
A distributed system uses many servers to increase service availability and to provide fault tolerance. Balancing the load among these servers is an important task for achieving better performance. Various hardware-based and software-based load balancing solutions are available. However, there is always an overhead on the servers and the load balancer when they communicate with each other and share their availability and current load status information. The load balancer is always busy listening to clients' requests and redirecting them. It also needs to collect the servers' availability status frequently to keep itself up to date. Servers are busy not only providing service to clients but also sharing their current load information with load balancing algorithms. In this paper we propose and discuss the concept and system model for a software-based load balancer with an Availability-Checker and Load Reporters (LB-ACLRs), which reduces the overhead on the servers and the load balancer. We also describe the architectural components with their roles and responsibilities. We present a detailed analysis to show how our proposed Availability Checker significantly increases the performance of the system.
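One way to picture the separation of concerns described above is a dispatcher that only reads status collected by other components. The sketch below is hypothetical and is not the LB-ACLR design: it assumes that some availability checker supplies the set `available` and that load reporters supply the dictionary `load_reports`, and the function name `pick_server` is invented for the example.

```python
def pick_server(load_reports, available):
    """Choose the least-loaded server among those currently available.

    load_reports: dict server -> most recently reported load (hypothetical
    output of load reporters). available: set of servers marked as up
    (hypothetical output of an availability checker).
    Returns a server name, or None if no reported server is available.
    """
    candidates = {s: load for s, load in load_reports.items() if s in available}
    if not candidates:
        return None
    # Break ties by server name so the choice is deterministic.
    return min(candidates, key=lambda s: (candidates[s], s))
```

Because the dispatcher only consults already-collected status, it does not have to poll the servers itself on every request.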
Abstract:
In wireless sensor networks (WSNs), contention occurs when two or more nodes in proximity simultaneously try to access the channel. The contention causes collisions, which are very likely to occur when traffic is correlated. Excessive collisions affect not only the reliability and the QoS of the application, but also the lifetime of the network. It is well known that random access mechanisms do not handle correlated contention efficiently, and therefore suffer from a high collision rate. Most of the existing TDMA scheduling techniques try to find an optimal or a sub-optimal schedule. Usually, the situation of correlated contention persists only for a short duration, and therefore it is not worthwhile to take a long time to generate an optimal or a sub-optimal schedule. We propose a randomized distributed TDMA scheduling (RD-TDMA) algorithm to quickly generate a feasible schedule (not necessarily optimal) to handle correlated contention in WSNs. In RD-TDMA, a node in the network negotiates a slot with its neighbors using a message exchange mechanism. The proposed protocol has been simulated using the Castalia simulator to evaluate its runtime performance. Simulation results show that the RD-TDMA algorithm considerably reduces the time required to schedule.
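The idea of quickly reaching a feasible, not necessarily optimal, schedule through randomized negotiation can be sketched with a round-based simulation. This is an illustrative stand-in and not the RD-TDMA message exchange: the round structure, the rule that conflicting proposals are both withdrawn, and the names `randomized_slots`, `num_slots` and `max_rounds` are assumptions.

```python
import random


def randomized_slots(neighbors, num_slots, rng=None, max_rounds=1000):
    """Round-based randomized slot selection (simulation sketch).

    neighbors: dict node -> set of conflicting nodes (assumed symmetric).
    In each round every unscheduled node proposes a random slot not held
    by a scheduled neighbor, and keeps it only if no neighbor proposed
    the same slot in that round.
    """
    rng = rng or random.Random()
    final = {}
    pending = set(neighbors)
    for _ in range(max_rounds):
        if not pending:
            break
        proposal = {}
        for node in sorted(pending):  # sorted for reproducible draws
            free = [s for s in range(num_slots)
                    if all(final.get(n) != s for n in neighbors[node])]
            if free:
                proposal[node] = rng.choice(free)
        for node, slot in proposal.items():
            if all(proposal.get(n) != slot for n in neighbors[node]):
                final[node] = slot
        pending -= set(final)
    return final
```

If `num_slots` is at least the maximum node degree plus one, every pending node always has a free slot to propose, so the process keeps making progress with positive probability in every round.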
Abstract:
We propose a distributed sequential algorithm for quick detection of spectral holes in a Cognitive Radio setup. Two or more local nodes make decisions and inform the fusion centre (FC) over a reporting Multiple Access Channel (MAC), which then makes the final decision. The local nodes use energy detection and the FC uses mean detection in the presence of fading, heavy-tailed electromagnetic interference (EMI) and outliers. The statistics of the primary signal, channel gain and the EMI are not known. Different nonparametric sequential algorithms are compared to choose appropriate algorithms to be used at the local nodes and the FC. A modification of a recently developed random walk test is selected for energy detection at the local nodes as well as for mean detection at the fusion centre. We show via simulations and analysis that the nonparametric distributed algorithm developed performs well in the presence of fading, EMI and outliers. The algorithm is iterative in nature, making the computation and storage requirements minimal.