143 results for distributed computing
Abstract:
We consider the problem of compressing a non-Abelian source. This is motivated by the problem of distributed function computation, where it is known that if one is only interested in computing a function of several sources, one can often improve upon the compression rate required by the Slepian-Wolf bound. Let G be a non-Abelian group with center Z(G). We show that it is impossible to compress a source with symbols drawn from G when Z(G) is trivial if one employs a homomorphic encoder and a typical-set decoder. We provide achievable upper bounds on the minimum rate required to compress a source over a non-Abelian group with non-trivial center. Also, in a two-source setting, we provide achievable upper bounds for the compression of a source over any non-Abelian group, using a non-homomorphic encoder.
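The result above hinges on whether the center Z(G) is trivial. As a small illustration (not the paper's method), the sketch below computes the center of two example groups chosen here for illustration: the symmetric group S3, whose center is trivial, and the dihedral group of order 8 (symmetries of a square), whose center has two elements. Groups are represented as sets of permutation tuples; the helper names are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Return p∘q (apply q first, then p); permutations are tuples."""
    return tuple(p[q[i]] for i in range(len(q)))

def generate(gens):
    """Close a list of permutation generators under composition."""
    identity = tuple(range(len(gens[0])))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for h in gens:
            gh = compose(g, h)
            if gh not in group:
                group.add(gh)
                frontier.append(gh)
    return group

def center(group):
    """Z(G): the elements that commute with every element of G."""
    return {z for z in group if all(compose(z, g) == compose(g, z) for g in group)}

S3 = set(permutations(range(3)))
D4 = generate([(1, 2, 3, 0), (0, 3, 2, 1)])  # rotation and reflection of a square

print(len(center(S3)))          # 1: trivial center
print(len(D4), len(center(D4)))  # 8 2: identity and the 180-degree rotation
```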
Abstract:
The setting considered in this paper is one of distributed function computation. More specifically, there is a collection of N sources possessing correlated information and a destination that would like to acquire a specific linear combination of the N sources. We address both the case when the common alphabet of the sources is a finite field and the case when it is a finite, commutative principal ideal ring with identity. The goal is to minimize the total amount of information the N sources need to transmit while enabling reliable recovery, at the destination, of the desired linear combination. One way to achieve this goal is for each source to compress all the information it possesses and transmit it to the receiver. The Slepian-Wolf theorem of information theory governs the minimum rate at which each source must transmit while enabling all data to be reliably recovered at the receiver. However, recovering all the data at the destination is often wasteful of resources, since the destination is only interested in computing a specific linear combination. An alternative explored here is one in which each source is compressed using a common linear mapping and then transmitted to the destination, which then uses linearity to directly recover the needed linear combination. The article is partly a review and partly presents new results: the portion dealing with finite fields covers previously known material, while that dealing with rings is mostly new. Attempting to find the best linear map that enables function computation forces us to consider the linear compression of a source. While in the finite-field case it is known that a source can be linearly compressed down to its entropy, it turns out that the same does not hold in the case of rings. An explanation for this curious interplay between algebra and information theory is also provided.
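The common-linear-encoder idea can be sketched in a toy finite-field case, assuming GF(2), two sources that differ in at most one of 7 positions, and the parity-check matrix of the (7,4) Hamming code as the shared linear map (all of these are choices made here for illustration). Each source sends a 3-bit syndrome instead of 7 bits; by linearity, the sum of the syndromes is the syndrome of x XOR y, which the destination decodes by a brute-force minimum-weight search.

```python
from itertools import product

# Parity-check matrix of the (7,4) Hamming code: column j is the binary form of j+1.
H = [[((j + 1) >> b) & 1 for j in range(7)] for b in range(3)]

def syndrome(x):
    """Common linear encoder: x -> H x over GF(2) (7 bits in, 3 bits out)."""
    return tuple(sum(h & xi for h, xi in zip(row, x)) % 2 for row in H)

def decode_sum(s):
    """Return the minimum-weight z with H z = s (brute force over 2^7 words)."""
    candidates = [z for z in product((0, 1), repeat=7) if syndrome(z) == s]
    return min(candidates, key=sum)

x = (1, 0, 1, 1, 0, 0, 1)
y = (1, 0, 1, 0, 0, 0, 1)                     # correlated with x: differs in one position
sx, sy = syndrome(x), syndrome(y)             # each source sends 3 bits, not 7
s_sum = tuple(a ^ b for a, b in zip(sx, sy))  # H x + H y = H (x + y)
print(decode_sum(s_sum))                      # (0, 0, 0, 1, 0, 0, 0), i.e. x XOR y
```

The destination never learns x or y individually, only their sum, which is exactly the function it wants.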
Abstract:
This paper considers sequential hypothesis testing in a decentralized framework. We start with two simple decentralized sequential hypothesis testing algorithms, one of which is later proved to be asymptotically Bayes optimal. We also consider composite versions of decentralized sequential hypothesis testing. A novel nonparametric version of decentralized sequential hypothesis testing based on universal source coding theory is developed. Finally, we design a simple decentralized multihypothesis sequential detection algorithm.
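The building block behind most sequential tests is Wald's sequential probability ratio test (SPRT). The sketch below is the standard centralized SPRT for Bernoulli observations, included only to illustrate the accumulate-and-threshold structure; it is not one of the decentralized algorithms of the paper, and the error targets alpha and beta are illustrative parameters.

```python
import math

def sprt_bernoulli(samples, p0, p1, alpha=0.01, beta=0.01):
    """Wald's SPRT for H0: p = p0 vs H1: p = p1 on a stream of 0/1 samples.

    Returns (decision, samples_used); decision is None if the stream ends
    before either threshold is crossed.
    """
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    llr = 0.0
    n = 0
    for n, x in enumerate(samples, start=1):
        # Log-likelihood ratio increment for one Bernoulli observation.
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return None, n

print(sprt_bernoulli([1] * 20, 0.3, 0.7))  # ('H1', 6)
```

In a decentralized setting, each sensor could run such a statistic locally and send quantized decisions to a fusion center; how that is done optimally is the subject of the paper.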
Abstract:
In this paper, we consider a distributed function computation setting in which there are $m$ distributed but correlated sources $X_1, \ldots, X_m$ and a receiver interested in computing an $s$-dimensional subspace generated by $[X_1, \ldots, X_m]\Gamma$ for some $m \times s$ matrix $\Gamma$ of rank $s$. We construct a scheme based on nested linear codes and characterize the rates achievable with it. The proposed nested-linear-code approach performs at least as well as the Slepian-Wolf scheme in terms of sum rate for all subspaces and source distributions. In addition, for a large class of distributions and subspaces, the scheme improves upon the Slepian-Wolf approach. The nested-linear-code scheme may be viewed as uniting under a common framework both the Körner-Marton approach of using a common linear encoder and the Slepian-Wolf approach of employing different encoders at each source. Along the way, we prove an interesting and fundamental structural result on the nature of subspaces of an $m$-dimensional vector space $V$ with respect to a normalized measure of entropy. Here, each element of $V$ corresponds to a distinct linear combination of a set $\{X_i\}_{i=1}^{m}$ of $m$ random variables whose joint probability distribution is given.
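The simplest instance of this setting is $m = 2$, $s = 1$, $\Gamma = [1, 1]^T$ over GF(2), i.e. the receiver wants $X_1 \oplus X_2$. For a doubly symmetric binary source (uniform $X_1$, and $X_2$ equal to $X_1$ flipped with probability p, an example chosen here), the sketch compares the Slepian-Wolf sum rate $H(X_1, X_2) = 1 + h(p)$ with the Körner-Marton sum rate $2h(p)$, where $h$ is the binary entropy.

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def sum_rates(p):
    """Sum rates (bits/symbol) for computing X1 XOR X2 from a DSBS with flip probability p."""
    slepian_wolf = 1 + h2(p)   # recover both sources: H(X1, X2)
    korner_marton = 2 * h2(p)  # common linear encoder: each sends H(X1 XOR X2)
    return slepian_wolf, korner_marton

for p in (0.01, 0.1, 0.3, 0.5):
    sw, km = sum_rates(p)
    print(f"p={p}: Slepian-Wolf {sw:.3f}, Korner-Marton {km:.3f}")
```

Since h(p) < 1 for p ≠ 1/2, the common linear encoder strictly wins here; the paper's nested-linear-code scheme covers general subspaces and distributions where neither extreme is best.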
Abstract:
We consider information-theoretic secret key (SK) agreement and secure function computation by multiple parties observing correlated data, with access to an interactive public communication channel. Our main result is an upper bound on the SK length, derived using a reduction of binary hypothesis testing to multiparty SK agreement. Building on this basic result, we derive new converses for multiparty SK agreement. Furthermore, we derive converse results for the oblivious transfer and bit commitment problems by relating them to SK agreement. Finally, we derive a necessary condition for the feasibility of secure computation by trusted parties that seek to compute a function of their collective data using interactive public communication that by itself does not give away the value of the function. In many cases, we strengthen and improve upon previously known converse bounds. Our results are single-shot and use only the given joint distribution of the correlated observations. For the case when the correlated observations consist of independent and identically distributed (in time) sequences, we derive strong versions of previously known converses.
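For context, a classical result states that two parties observing i.i.d. sequences and communicating over an unlimited public channel can agree on a secret key at rate at most I(X;Y), and that this rate is achievable. The sketch below only computes that two-party benchmark from a joint pmf; the single-shot, multiparty bounds of the paper are not reproduced. The doubly symmetric binary source is an example chosen here.

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits for a joint pmf given as a dict {(x, y): prob}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

def dsbs(p):
    """Doubly symmetric binary source: uniform X, Y = X flipped with probability p."""
    return {(0, 0): (1 - p) / 2, (1, 1): (1 - p) / 2,
            (0, 1): p / 2, (1, 0): p / 2}

print(mutual_information(dsbs(0.1)))  # two-party SK capacity benchmark, 1 - h(0.1)
```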
Abstract:
Coarse-Grained Reconfigurable Architectures (CGRAs) are emerging as embedded application-processing units in computing platforms for exascale computing. Such CGRAs are distributed-memory, multicore compute elements on a chip that communicate over a Network-on-Chip (NoC). Numerical Linear Algebra (NLA) kernels are key to several high-performance computing applications. In this paper, we propose a systematic methodology to obtain the specification of Compute Elements (CEs) for such CGRAs. We analyze block matrix multiplication and block LU decomposition algorithms in the context of a CGRA, and obtain theoretical bounds on the communication requirements and memory sizes of a CE. Support for the high-performance custom computations common to NLA kernels is provided through custom function units (CFUs) in the CEs. We present results to justify the merits of such CFUs.
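The trade-off between block size, local memory and communication can be seen in a plain blocked matrix multiplication. The sketch below, a naive model assumed here rather than the paper's analysis, counts how many b-by-b blocks are fetched: 2(n/b)^3 blocks, i.e. 2n^3/b words, while a compute element holding one block each of A, B and C needs about 3b^2 words of local memory.

```python
def blocked_matmul(A, B, b):
    """C = A B computed block by block with block size b (b must divide n).

    Also returns the number of b-by-b blocks of A and B fetched, a crude
    proxy for the data a compute element must receive over the NoC.
    """
    n = len(A)
    assert n % b == 0, "sketch assumes the block size divides n"
    C = [[0.0] * n for _ in range(n)]
    blocks_fetched = 0
    for I in range(0, n, b):
        for J in range(0, n, b):
            for K in range(0, n, b):
                blocks_fetched += 2  # one block of A and one block of B
                for i in range(I, I + b):
                    for k in range(K, K + b):
                        a = A[i][k]
                        for j in range(J, J + b):
                            C[i][j] += a * B[k][j]
    return C, blocks_fetched

n = 4
identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
B = [[i * n + j for j in range(n)] for i in range(n)]
C, fetched = blocked_matmul(identity, B, 2)
print(fetched)  # 16 = 2 * (4 / 2) ** 3
```

Doubling b halves the traffic but quadruples the local buffer, which is the kind of bound that fixes CE memory size.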
Abstract:
In this paper, we present an improved load distribution strategy for arbitrarily divisible processing loads that minimizes the processing time in a distributed linear network of communicating processors through efficient utilization of their front-ends. Closed-form solutions are derived for processing loads originating both at the boundary and in the interior of the network, under some important conditions on the arrangement of processors and links. Asymptotic analysis is carried out to explore the ultimate performance limits of such networks. Two important theorems are stated regarding the optimal load sequence and the optimal load origination point. A comparative study of this new strategy with an earlier strategy is also presented.
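As a baseline, the sketch below solves the standard divisible-load model for a linear daisy chain with front-ends and the load originating at the boundary processor P1. The assumptions are stated here, not taken from the paper: store-and-forward links, P_{i+1} starts computing once it has received everything destined for P_{i+1}..P_N, and all processors finish simultaneously. Equal finish times give α_i w_i T_cp = (Σ_{j>i} α_j) z_i T_cm + α_{i+1} w_{i+1} T_cp, which can be solved backward from P_N and then normalized. The paper's improved strategy is not reproduced.

```python
def linear_network_fractions(w, z, Tcp=1.0, Tcm=1.0):
    """Load fractions for a linear chain with front-ends, load starting at P1.

    w[i]: inverse computing speed of processor i; z[i]: inverse speed of the
    link from processor i to i+1 (len(z) == len(w) - 1).
    Returns (alpha, finish_time) with sum(alpha) == 1.
    """
    N = len(w)
    alpha = [0.0] * N
    alpha[-1] = 1.0   # unnormalized; the recursion is linear, so we rescale at the end
    tail = alpha[-1]  # sum of alpha[j] for j > i
    for i in range(N - 2, -1, -1):
        # Equal finish time of P_i and P_{i+1}.
        alpha[i] = (tail * z[i] * Tcm + alpha[i + 1] * w[i + 1] * Tcp) / (w[i] * Tcp)
        tail += alpha[i]
    total = sum(alpha)
    alpha = [a / total for a in alpha]
    finish_time = alpha[0] * w[0] * Tcp  # P1 computes from t = 0 while its front-end sends
    return alpha, finish_time

print(linear_network_fractions([1.0, 1.0], [1.0]))  # P1 gets 2/3, P2 gets 1/3, T = 2/3
```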
Abstract:
For point-to-point multiple-input multiple-output (MIMO) systems, Dayal, Brehler, and Varanasi have proved that training codes achieve the same diversity order as the underlying coherent space-time block code (STBC) if a simple minimum mean squared error (MMSE) estimate of the channel, formed using the training part, is employed for coherent detection of the underlying STBC. In this letter, a similar strategy, combining training, channel estimation, and detection with existing coherent distributed STBCs, is proposed for noncoherent communication in Amplify-and-Forward (AF) relay networks. Simulation results show that the proposed simple strategy outperforms distributed differential space-time coding for AF relay networks. Finally, the proposed strategy is extended to asynchronous relay networks using orthogonal frequency division multiplexing (OFDM).
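The "simple MMSE estimate formed using the training part" can be illustrated for a single scalar channel. Assuming (for this sketch only) y = pilot · h + n with h ~ CN(0, σ_h²) and n ~ CN(0, N0), the MMSE estimate is ĥ = σ_h² · conj(pilot) · y / (σ_h² |pilot|² + N0), which is then plugged into a coherent detector. The BPSK symbol and noise level below are illustrative; the distributed STBC and relay processing of the letter are not modeled.

```python
import random

def mmse_channel_estimate(y, pilot, noise_var, channel_var=1.0):
    """MMSE estimate of a scalar channel h from one training observation y = pilot*h + n."""
    return channel_var * pilot.conjugate() * y / (channel_var * abs(pilot) ** 2 + noise_var)

def cn(var):
    """Circularly symmetric complex Gaussian sample with the given variance."""
    s = (var / 2) ** 0.5
    return complex(random.gauss(0, s), random.gauss(0, s))

random.seed(0)
h, pilot, noise_var = cn(1.0), 1.0 + 0j, 0.01
h_hat = mmse_channel_estimate(pilot * h + cn(noise_var), pilot, noise_var)

# Coherent detection of a BPSK data symbol using the estimate instead of the true h.
s = -1
r = h * s + cn(noise_var)
s_hat = 1 if (h_hat.conjugate() * r).real >= 0 else -1
print(abs(h - h_hat), s_hat)
```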
Abstract:
High-end network security applications demand high-speed operation and support for large rule sets. Packet classification is the core functionality that demands high throughput in such applications. This paper proposes a packet classification architecture to meet such throughput requirements. We have implemented a firewall with this architecture in reconfigurable hardware. We propose an extension to the Distributed Crossproducting of Field Labels (DCFL) technique to achieve a scalable, high-performance architecture. The implemented firewall takes advantage of the inherent structure and redundancy of the rule set by using our DCFL Extended (DCFLE) algorithm. The DCFLE algorithm yields improvements in both speed and area when implemented in hardware. Although we restrict ourselves to standard 5-tuple matching, the architecture supports additional fields. High-throughput classification invariably uses Ternary Content Addressable Memory (TCAM) for prefix matching, though TCAM fares poorly in terms of area and power efficiency. Using TCAM for port range matching is expensive, as range-to-prefix conversion results in a large number of prefixes, leading to storage inefficiency. Extended TCAM (ETCAM) is fast and the most storage-efficient solution for range matching. We present, for the first time, a reconfigurable hardware implementation of ETCAM. We have implemented our firewall as an embedded system on a Virtex-II Pro FPGA-based platform running Linux, with packet classification in hardware. The firewall was tested in real time with a 1 Gbps Ethernet link and 128 sample rules. The packet classification hardware uses a quarter of the logic resources and slightly over one third of the memory resources of the XC2VP30 FPGA. It achieves a maximum classification throughput of 50 million packets/s, corresponding to a 16 Gbps link rate for the worst-case packet size. Firewall rule updates involve only memory re-initialization in software, without any hardware change.
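Why range-to-prefix conversion is costly for TCAMs can be shown with the usual greedy splitting of a port range into aligned power-of-two blocks, each of which becomes one ternary prefix. For 16-bit ports, the range [1, 65534] is the well-known worst case and expands to 30 prefixes. The function name is hypothetical, and this illustrates plain TCAM storage only, not the ETCAM or DCFLE designs.

```python
def range_to_prefixes(lo, hi, width=16):
    """Split [lo, hi] into the minimal list of ternary prefixes ('0'/'1' then '*')."""
    prefixes = []
    while lo <= hi:
        k = 0
        # Grow the block while it stays aligned at lo and inside the range.
        while k < width and lo % (1 << (k + 1)) == 0 and lo + (1 << (k + 1)) - 1 <= hi:
            k += 1
        prefix_len = width - k
        bits = format(lo >> k, f"0{prefix_len}b") if prefix_len else ""
        prefixes.append(bits + "*" * k)
        lo += 1 << k
    return prefixes

print(range_to_prefixes(0, 3, width=4))  # ['00**']
print(len(range_to_prefixes(1, 65534)))  # 30 TCAM entries for a single range
```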
Abstract:
Distributed space-time coding for wireless relay networks in which the source, the destination, and the relays have multiple antennas has been studied by Jing and Hassibi. In this set-up, the transmit and receive signals at different antennas of the same relay are processed and designed independently, even though the antennas are colocated. In this paper, a wireless relay network with a single antenna at the source and the destination and two antennas at each of the $R$ relays is considered. A new class of distributed space-time block codes, called Co-ordinate Interleaved Distributed Space-Time Codes (CIDSTCs), is introduced in which, in the first phase, the source transmits a $T$-length complex vector to all the relays, and in the second phase, at each relay, the in-phase and quadrature component vectors of the complex vectors received at the two antennas are interleaved and processed before being forwarded to the destination. Compared to the scheme proposed by Jing and Hassibi, for $T \geq 4R$, the CIDSTC scheme is shown to provide asymptotic coding gain, while offering the same asymptotic diversity order of $2R$, at the cost of a negligible increase in processing complexity at the relays. However, for moderate and large values of $P$, the CIDSTC scheme is shown to provide more diversity than the scheme proposed by Jing and Hassibi. CIDSTCs are shown to be fully diverse provided the information symbols take values from an appropriate multidimensional signal set.
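The interleaving step at each two-antenna relay can be pictured with the generic coordinate swap used in coordinate-interleaved designs: the real parts stay in place and the imaginary parts are exchanged between the two received vectors. This is an assumption for illustration; the abstract does not give the exact interleaving or the subsequent relay processing, which are not modeled here.

```python
def coordinate_interleave(u, v):
    """Swap quadrature components between two equal-length complex vectors.

    Returns (u', v') with u'[k] = Re(u[k]) + j Im(v[k]) and
    v'[k] = Re(v[k]) + j Im(u[k]).
    """
    u_new = [complex(a.real, b.imag) for a, b in zip(u, v)]
    v_new = [complex(b.real, a.imag) for a, b in zip(u, v)]
    return u_new, v_new

# Vectors received at the relay's two antennas (illustrative values).
r1 = [1 + 2j, -0.5 + 0.25j]
r2 = [3 + 4j, 0.75 - 1j]
print(coordinate_interleave(r1, r2))
```

The swap is its own inverse, so no information is lost; what it changes is which channel each in-phase and quadrature component experiences, which is the source of the extra diversity in coordinate-interleaved schemes.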
Abstract:
Simultaneous consideration of both performance and reliability is important in choosing computer architectures for real-time aerospace applications. One requirement for such a fault-tolerant computer system is graceful degradation. A computing system with shared and replicated resources represents such an architecture. In this paper, a combinatorial model is used to evaluate the instruction execution rate of a degradable, replicated-resource computing system such as a modular multiprocessor system. Next, a method is presented to evaluate the computation reliability of such a system using a reliability graph model and the instruction execution rate. Finally, this computation reliability measure, which simultaneously describes both performance and reliability, is applied as a constraint in an architecture optimization model for such computing systems. Index Terms: Architecture optimization, computation
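A combined performance-reliability measure can be sketched with a deliberately simple combinatorial model, assumed here rather than taken from the paper: n identical processors fail independently, each is up with probability r, and with k processors up the system executes k · rate_per_processor instructions per second. Under that hypothetical model, the expected execution rate and the probability of sustaining a required rate follow from the binomial distribution. The paper's reliability-graph method is more general and is not reproduced.

```python
from math import comb

def p_up(n, k, r):
    """Probability that exactly k of n independent processors are up."""
    return comb(n, k) * r**k * (1 - r) ** (n - k)

def expected_execution_rate(n, r, rate_per_processor):
    """Expected instructions/s of a degradable system under the linear-rate assumption."""
    return sum(p_up(n, k, r) * k * rate_per_processor for k in range(n + 1))

def computation_reliability(n, r, rate_per_processor, required_rate):
    """P(delivered rate >= required_rate) under the same hypothetical model."""
    return sum(p_up(n, k, r) for k in range(n + 1)
               if k * rate_per_processor >= required_rate)

print(expected_execution_rate(4, 0.9, 10.0))        # about 36 = 4 * 0.9 * 10
print(computation_reliability(4, 0.9, 10.0, 30.0))  # P(at least 3 of 4 up)
```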
Abstract:
A relay network with $N$ relays and a single source-destination pair is called a partially-coherent relay channel (PCRC) if the destination has perfect channel state information (CSI) of all the channels and the relays have only the phase information of the source-to-relay channels. In this paper, first, a new set of necessary and sufficient conditions for a space-time block code (STBC) to be single-symbol decodable (SSD) for colocated multiple-antenna communication is obtained. This is then extended to a set of necessary and sufficient conditions for a distributed STBC (DSTBC) to be SSD for a PCRC. Using this, several SSD DSTBCs for PCRC are identified. It is proved that even if an SSD STBC for a colocated MIMO channel does not satisfy the additional conditions for the code to be SSD for a PCRC, single-symbol decoding of it in a PCRC gives full diversity and only coding gain is lost. It is shown that when a DSTBC is SSD for a PCRC, arbitrary coordinate interleaving of the in-phase and quadrature-phase components of the variables does not disturb its SSD property for PCRC. Finally, it is shown that the possibility of a channel phase compensation operation at the relay nodes, using partial CSI at the relays, increases the possible rate of SSD DSTBCs from $2/N$ when the relays do not have CSI to $1/2$, which is independent of $N$.
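For the colocated case, the Alamouti code is the standard example of an SSD STBC. One sufficient property, checked in the sketch below, is that X^H X is a scaled identity, (|x1|² + |x2|²) I, for every codeword X, so the maximum-likelihood metric splits into one term per symbol. The necessary and sufficient conditions derived in the paper, and their PCRC extension, are more general than this check.

```python
def alamouti(x1, x2):
    """2x2 Alamouti codeword: rows are time slots, columns are antennas."""
    x1, x2 = complex(x1), complex(x2)
    return [[x1, x2],
            [-x2.conjugate(), x1.conjugate()]]

def gram(X):
    """X^H X for a small complex matrix given as nested lists."""
    cols = len(X[0])
    return [[sum(X[t][i].conjugate() * X[t][j] for t in range(len(X)))
             for j in range(cols)] for i in range(cols)]

G = gram(alamouti(1 + 2j, 3 - 1j))
print(G)  # diagonal entries 15 = |1+2j|^2 + |3-1j|^2, off-diagonal entries 0
```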
Abstract:
The Reeb graph tracks topology changes in the level sets of a scalar function and finds applications in scientific visualization and geometric modeling. We describe an algorithm that constructs the Reeb graph of a Morse function defined on a 3-manifold. Our algorithm maintains connected components of the two-dimensional level sets as a dynamic graph and constructs the Reeb graph in $O(n \log n + n \log g (\log \log g)^3)$ time, where $n$ is the number of triangles in the tetrahedral mesh representing the 3-manifold and $g$ is the maximum genus over all level sets of the function. We extend this algorithm to construct Reeb graphs of $d$-manifolds in $O(n \log n (\log \log n)^3)$ time, where $n$ is the number of triangles in the simplicial complex that represents the $d$-manifold. Our result is a significant improvement over the previously known $O(n^2)$ algorithm. Finally, we present experimental results of our implementation and demonstrate that our algorithm for 3-manifolds performs efficiently in practice.
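The sweep idea behind tracking level-set components can be illustrated with a much simpler related computation: sweeping the vertices of a graph in increasing function value and using union-find to record when sublevel-set components are born and when they merge (the join tree). This is not the paper's dynamic-graph Reeb graph algorithm; it ignores splits and works on a plain graph rather than a tetrahedral mesh, and the function name is hypothetical.

```python
def join_tree_events(values, edges):
    """Sweep vertices by increasing value; report component births and merges."""
    parent = {}

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    adj = {v: [] for v in range(len(values))}
    for u, w in edges:
        adj[u].append(w)
        adj[w].append(u)

    events = []
    for v in sorted(range(len(values)), key=lambda i: values[i]):
        parent[v] = v
        roots = {find(u) for u in adj[v] if u in parent}  # already-swept neighbors
        if not roots:
            events.append(("birth", v))   # local minimum: new component
        elif len(roots) > 1:
            events.append(("merge", v))   # saddle: components join
        for r in roots:
            parent[r] = v
    return events

# Path graph 0 - 1 - 2 with a peak in the middle.
print(join_tree_events([0.0, 2.0, 1.0], [(0, 1), (1, 2)]))
# [('birth', 0), ('birth', 2), ('merge', 1)]
```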
Abstract:
Concurrency control (CC) algorithms are important in distributed database systems to ensure database consistency. A number of such algorithms are available in the literature. The performance evaluation of these algorithms has been recognized as important; however, only a few studies have been carried out in this direction. This paper evaluates the performance of a CC algorithm proposed by Rosenkrantz et al. through a detailed simulation study. In doing so, the algorithm has been modified so that it can itself handle redundancy in the database. The influence of various system parameters and of the transaction profile on the response time and on the degree of conflict is considered. The entire study was carried out using the programming language SIMULA on a DEC-1090 system.
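Rosenkrantz, Stearns and Lewis are widely credited with the timestamp-based wait-die and wound-wait rules for distributed concurrency control. The abstract does not say which variant was simulated, so the sketch below is an assumption: it shows only the conflict-resolution decision of each rule, where a smaller timestamp means an older transaction. A transaction aborted by either rule is typically restarted with its original timestamp, so it eventually becomes the oldest and cannot starve.

```python
def wait_die(requester_ts, holder_ts):
    """Wait-die: an older requester waits; a younger requester is aborted (dies)."""
    return "wait" if requester_ts < holder_ts else "abort requester"

def wound_wait(requester_ts, holder_ts):
    """Wound-wait: an older requester preempts (wounds) the holder; a younger one waits."""
    return "abort holder" if requester_ts < holder_ts else "wait"

# Transaction with timestamp 1 (older) and timestamp 5 (younger) conflict on a lock.
print(wait_die(1, 5), "|", wait_die(5, 1))      # wait | abort requester
print(wound_wait(1, 5), "|", wound_wait(5, 1))  # abort holder | wait
```

Both rules prevent deadlock because waits only ever go in one timestamp direction, so no cycle of waiting transactions can form.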