427 resultados para linear complexity
em Indian Institute of Science - Bangalore - Índia
Resumo:
In this paper, we look at the problem of scheduling expression trees with reusable registers on delayed load architectures. Reusable registers come into the picture when the compiler has a data-flow analyzer which is able to estimate the extent of use of the registers. Earlier work considered the same problem without allowing for register variables. Subsequently, Venugopal considered non-reusable registers in the tree. We further extend these efforts to consider a much more general form of the tree. We describe an approximate algorithm for the problem. We formally prove that the code schedule produced by this algorithm will, in the worst case, generate one interlock and use just one more register than that used by the optimal schedule. Spilling is minimized. The approximate algorithm is simple and has linear complexity.
Resumo:
A Field Programmable Gate Array (FPGA) based hardware accelerator for multi-conductor parasitic capacitance extraction, using Method of Moments (MoM), is presented in this paper. Due to the prohibitive cost of solving a dense algebraic system formed by MoM, linear complexity fast solver algorithms have been developed in the past to expedite the matrix-vector product computation in a Krylov sub-space based iterative solver framework. However, as the number of conductors in a system increases leading to a corresponding increase in the number of right-hand-side (RHS) vectors, the computational cost for multiple matrix-vector products present a time bottleneck, especially for ill-conditioned system matrices. In this work, an FPGA based hardware implementation is proposed to parallelize the iterative matrix solution for multiple RHS vectors in a low-rank compression based fast solver scheme. The method is applied to accelerate electrostatic parasitic capacitance extraction of multiple conductors in a Ball Grid Array (BGA) package. Speed-ups up to 13x over equivalent software implementation on an Intel Core i5 processor for dense matrix-vector products and 12x for QR compressed matrix-vector products is achieved using a Virtex-6 XC6VLX240T FPGA on Xilinx's ML605 board.
Resumo:
3-D full-wave method of moments (MoM) based electromagnetic analysis is a popular means toward accurate solution of Maxwell's equations. The time and memory bottlenecks associated with such a solution have been addressed over the last two decades by linear complexity fast solver algorithms. However, the accurate solution of 3-D full-wave MoM on an arbitrary mesh of a package-board structure does not guarantee accuracy, since the discretization may not be fine enough to capture spatial changes in the solution variable. At the same time, uniform over-meshing on the entire structure generates a large number of solution variables and therefore requires an unnecessarily large matrix solution. In this paper, different refinement criteria are studied in an adaptive mesh refinement platform. Consequently, the most suitable conductor mesh refinement criterion for MoM-based electromagnetic package-board extraction is identified and the advantages of this adaptive strategy are demonstrated from both accuracy and speed perspectives. The results are also compared with those of the recently reported integral equation-based h-refinement strategy. Finally, a new methodology to expedite each adaptive refinement pass is proposed.
Resumo:
In this article, a Field Programmable Gate Array (FPGA)-based hardware accelerator for 3D electromagnetic extraction, using Method of Moments (MoM) is presented. As the number of nets or ports in a system increases, leading to a corresponding increase in the number of right-hand-side (RHS) vectors, the computational cost for multiple matrix-vector products presents a time bottleneck in a linear-complexity fast solver framework. In this work, an FPGA-based hardware implementation is proposed toward a two-level parallelization scheme: (i) matrix level parallelization for single RHS and (ii) pipelining for multiple-RHS. The method is applied to accelerate electrostatic parasitic capacitance extraction of multiple nets in a Ball Grid Array (BGA) package. The acceleration is shown to be linearly scalable with FPGA resources and speed-ups over 10x against equivalent software implementation on a 2.4GHz Intel Core i5 processor is achieved using a Virtex-6 XC6VLX240T FPGA on Xilinx's ML605 board with the implemented design operating at 200MHz clock frequency. (c) 2016 Wiley Periodicals, Inc. Microwave Opt Technol Lett 58:776-783, 2016
Resumo:
We consider the problem of deciding whether the output of a boolean circuit is determined by a partial assignment to its inputs. This problem is easily shown to be hard, i.e., co-Image Image -complete. However, many of the consequences of a partial input assignment may be determined in linear time, by iterating the following step: if we know the values of some inputs to a gate, we can deduce the values of some outputs of that gate. This process of iteratively deducing some of the consequences of a partial assignment is called propagation. This paper explores the parallel complexity of propagation, i.e., the complexity of determining whether the output of a given boolean circuit is determined by propagating a given partial input assignment. We give a complete classification of the problem into those cases that are Image -complete and those that are unlikely to be Image complete.
Resumo:
In this paper, we generalize the existing rate-one space frequency (SF) and space-time frequency (STF) code constructions. The objective of this exercise is to provide a systematic design of full-diversity STF codes with high coding gain. Under this generalization, STF codes are formulated as linear transformations of data. Conditions on these linear transforms are then derived so that the resulting STF codes achieve full diversity and high coding gain with a moderate decoding complexity. Many of these conditions involve channel parameters like delay profile (DP) and temporal correlation. When these quantities are not available at the transmitter, design of codes that exploit full diversity on channels with arbitrary DIP and temporal correlation is considered. Complete characterization of a class of such robust codes is provided and their bit error rate (BER) performance is evaluated. On the other hand, when channel DIP and temporal correlation are available at the transmitter, linear transforms are optimized to maximize the coding gain of full-diversity STF codes. BER performance of such optimized codes is shown to be better than those of existing codes.
Resumo:
An efficient geometrical design rule checker is proposed, based on operations on quadtrees, which represent VLSI mask layouts. The time complexity of the design rule checker is O(N), where N is the number of polygons in the mask. A pseudoPascal description is provided of all the important algorithms for geometrical design rule verification.
Resumo:
Partitional clustering algorithms, which partition the dataset into a pre-defined number of clusters, can be broadly classified into two types: algorithms which explicitly take the number of clusters as input and algorithms that take the expected size of a cluster as input. In this paper, we propose a variant of the k-means algorithm and prove that it is more efficient than standard k-means algorithms. An important contribution of this paper is the establishment of a relation between the number of clusters and the size of the clusters in a dataset through the analysis of our algorithm. We also demonstrate that the integration of this algorithm as a pre-processing step in classification algorithms reduces their running-time complexity.
Resumo:
A half-duplex constrained non-orthogonal cooperative multiple access (NCMA) protocol suitable for transmission of information from N users to a single destination in a wireless fading channel is proposed. Transmission in this protocol comprises of a broadcast phase and a cooperation phase. In the broadcast phase, each user takes turn broadcasting its data to all other users and the destination in an orthogonal fashion in time. In the cooperation phase, each user transmits a linear function of what it received from all other users as well as its own data. In contrast to the orthogonal extension of cooperative relay protocols to the cooperative multiple access channels wherein at any point of time, only one user is considered as a source and all the other users behave as relays and do not transmit their own data, the NCMA protocol relaxes the orthogonality built into the protocols and hence allows for a more spectrally efficient usage of resources. Code design criteria for achieving full diversity of N in the NCMA protocol is derived using pair wise error probability (PEP) analysis and it is shown that this can be achieved with a minimum total time duration of 2N - 1 channel uses. Explicit construction of full diversity codes is then provided for arbitrary number of users. Since the Maximum Likelihood decoding complexity grows exponentially with the number of users, the notion of g-group decodable codes is introduced for our setup and a set of necesary and sufficient conditions is also obtained.
Resumo:
Support Vector Machines(SVMs) are hyperplane classifiers defined in a kernel induced feature space. The data size dependent training time complexity of SVMs usually prohibits its use in applications involving more than a few thousands of data points. In this paper we propose a novel kernel based incremental data clustering approach and its use for scaling Non-linear Support Vector Machines to handle large data sets. The clustering method introduced can find cluster abstractions of the training data in a kernel induced feature space. These cluster abstractions are then used for selective sampling based training of Support Vector Machines to reduce the training time without compromising the generalization performance. Experiments done with real world datasets show that this approach gives good generalization performance at reasonable computational expense.
Resumo:
The problem of designing high rate, full diversity noncoherent space-time block codes (STBCs) with low encoding and decoding complexity is addressed. First, the notion of g-group encodable and g-group decodable linear STBCs is introduced. Then for a known class of rate-1 linear designs, an explicit construction of fully-diverse signal sets that lead to four-group encodable and four-group decodable differential scaled unitary STBCs for any power of two number of antennas is provided. Previous works on differential STBCs either sacrifice decoding complexity for higher rate or sacrifice rate for lower decoding complexity.
Resumo:
The problem of designing high rate, full diversity noncoherent space-time block codes (STBCs) with low encoding and decoding complexity is addressed. First, the notion of g-group encodable and g-group decodable linear STBCs is introduced. Then for a known class of rate-1 linear designs, an explicit construction of fully-diverse signal sets that lead to four-group encodable and four-group decodable differential scaled unitary STBCs for any power of two number of antennas is provided. Previous works on differential STBCs either sacrifice decoding complexity for higher rate or sacrifice rate for lower decoding complexity.
Resumo:
The problem of determining whether a Tanner graph for a linear block code has a stopping set of a given size is shown to be NT-complete.
Resumo:
Space-Time Block Codes (STBCs) from Complex Orthogonal Designs (CODs) are single-symbol decodable/symbol-by-symbol decodable (SSD); however, SSD codes are obtainable from designs that are not CODs. Recently, two such classes of SSD codes have been studied: (i) Coordinate Interleaved Orthogonal Designs (CIODs) and (ii) Minimum-Decoding-Complexity (MDC) STBCs from Quasi-ODs (QODs). The class of CIODs have non-unitary weight matrices when written as a Linear Dispersion Code (LDC) proposed by Hassibi and Hochwald, whereas the other class of SSD codes including CODs have unitary weight matrices. In this paper, we construct a large class of SSD codes with nonunitary weight matrices. Also, we show that the class of CIODs is a special class of our construction.
Resumo:
A linear time approximate maximum likelihood decoding algorithm on tail-biting trellises is presented, that requires exactly two rounds on the trellis. This is an adaptation of an algorithm proposed earlier with the advantage that it reduces the time complexity from O(m log m) to O(m) where m is the number of nodes in the tail-biting trellis. A necessary condition for the output of the algorithm to differ from the output of the ideal ML decoder is deduced and simulation results on an AWGN channel using tail-biting trellises for two rate 1/2 convolutional codes with memory 4 and 6 respectively, are reported.