975 results for Fast foods
Abstract:
Opportunistic selection is a practically appealing technique that is used in multi-node wireless systems to maximize throughput, implement proportional fairness, etc. However, selection is challenging since the information about a node's channel gains is often available only locally at each node and not centrally. We propose a novel multiple access-based distributed selection scheme that generalizes the best features of the timer scheme, which requires minimal feedback but does not always guarantee successful selection, and the fast splitting scheme, which requires more feedback but guarantees successful selection. The proposed scheme's design explicitly accounts for feedback time overheads unlike the conventional splitting scheme and guarantees selection of the user with the highest metric unlike the timer scheme. We analyze and minimize the average time including feedback required by the scheme to select. With feedback overheads, the proposed scheme is scalable and considerably faster than several schemes proposed in the literature. Furthermore, the gains increase as the feedback overhead increases.
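As an illustration of why the timer scheme "does not always guarantee successful selection", the following is a minimal sketch of a generic timer-based selection round. It is not the scheme proposed in the abstract. The linear metric-to-timer mapping, the function name `timer_selection`, and the parameters `metric_max`, `t_max`, and `window` (the interval within which two expiring timers are assumed to collide) are all assumptions made for this example.

```python
def timer_selection(metrics, metric_max=1.0, t_max=1.0, window=0.05):
    """Return the index of the selected node, or None if selection fails.

    Assumed model: each node starts a timer that decreases linearly with
    its local metric, so the best node expires first. If the runner-up
    expires less than `window` after the best node, the two transmissions
    are treated as a collision and nobody is selected.
    """
    if not metrics:
        return None
    # Higher metric -> shorter timer (hypothetical linear mapping).
    timers = [t_max * (1.0 - m / metric_max) for m in metrics]
    order = sorted(range(len(timers)), key=timers.__getitem__)
    best = order[0]
    if len(order) > 1 and timers[order[1]] - timers[best] < window:
        return None  # collision: the two best timers are too close together
    return best
```

Under these assumptions, `timer_selection([0.2, 0.9, 0.5])` picks node 1, while `timer_selection([0.9, 0.88, 0.1])` fails because the two best timers expire within the collision window.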
Abstract:
Acoustic modeling using mixtures of multivariate Gaussians is the prevalent approach for many speech processing problems. Computing likelihoods against a large set of Gaussians is required as a part of many speech processing systems and it is the computationally dominant phase for LVCSR systems. We express the likelihood computation as a multiplication of matrices representing augmented feature vectors and Gaussian parameters. The computational gain of this approach over traditional methods comes from exploiting the structure of these matrices and implementing their multiplication efficiently. In particular, we explore direct low-rank approximation of the Gaussian parameter matrix and indirect derivation of low-rank factors of the Gaussian parameter matrix by optimum approximation of the likelihood matrix. We show that both methods lead to similar speedups, but the latter has far less impact on the recognition accuracy. Experiments on a 1138-word vocabulary RM1 task using the Sphinx 3.7 system show that, for a typical case, the matrix multiplication approach leads to an overall speedup of 46%. Both low-rank approximation methods increase the speedup to around 60%, with the former method increasing the word error rate (WER) from 3.2% to 6.6%, while the latter increases the WER from 3.2% to 3.5%.
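The idea of writing likelihood computation as a matrix product can be sketched as follows. The sketch assumes diagonal-covariance Gaussians, which the abstract does not state, and uses plain Python lists rather than an optimized matrix library. The function names (`augment`, `gaussian_column`, `loglik_matrix`, `loglik_direct`) are hypothetical. Each frame x is augmented to [x_d^2, x_d, 1], each Gaussian becomes one parameter column, and their inner product is the log-likelihood.

```python
import math

def augment(x):
    """Augmented feature vector [x_1^2..x_D^2, x_1..x_D, 1]."""
    return [v * v for v in x] + list(x) + [1.0]

def gaussian_column(mean, var):
    """Parameter column of one diagonal-covariance Gaussian (an assumption)."""
    quad = [-0.5 / s for s in var]                 # multiplies x_d^2
    lin = [m / s for m, s in zip(mean, var)]       # multiplies x_d
    const = -0.5 * sum(m * m / s + math.log(2 * math.pi * s)
                       for m, s in zip(mean, var)) # multiplies 1
    return quad + lin + [const]

def loglik_matrix(frames, gaussians):
    """Log-likelihoods of all frames against all Gaussians as F times G."""
    feats = [augment(x) for x in frames]
    cols = [gaussian_column(m, v) for m, v in gaussians]
    return [[sum(a * b for a, b in zip(f, c)) for c in cols] for f in feats]

def loglik_direct(x, mean, var):
    """Reference: textbook per-dimension evaluation of the same quantity."""
    return -0.5 * sum((xi - m) ** 2 / s + math.log(2 * math.pi * s)
                      for xi, m, s in zip(x, mean, var))
```

Both routes give the same values up to rounding. The matrix form is the one that can be handed to a fast multiplication routine or approximated with low-rank factors.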
Abstract:
Decoding of linear space-time block codes (STBCs) with sphere decoding (SD) is well known. A fast version of SD known as fast sphere decoding (FSD) has been recently studied by Biglieri, Hong and Viterbo. Viewing a linear STBC as a vector space spanned by its defining weight matrices over the real number field, we define a quadratic form (QF), called the Hurwitz-Radon QF (HRQF), on this vector space and give a QF interpretation of the FSD complexity of a linear STBC. It is shown that the FSD complexity is only a function of the weight matrices defining the code and their ordering, and not of the channel realization (even though the equivalent channel when SD is used depends on the channel realization) or the number of receive antennas. It is also shown that the FSD complexity is completely captured by a single matrix obtained from the HRQF. Moreover, for a given set of weight matrices, an algorithm to obtain a best ordering of them leading to the least FSD complexity is presented. The well-known classes of low FSD complexity codes (multi-group decodable codes, fast decodable codes and fast group decodable codes) are presented in the framework of HRQF.
Abstract:
Channel-aware assignment of sub-channels to users in the downlink of an OFDMA system demands extensive feedback of channel state information (CSI) to the base station. Since the feedback bandwidth is often very scarce, schemes that limit feedback are necessary. We develop a novel, low feedback splitting-based algorithm for assigning each sub-channel to its best user, i.e., the user with the highest gain for that sub-channel among all users. The key idea behind the algorithm is that, at any time, each user contends for the sub-channel on which it has the largest channel gain among the unallocated sub-channels. Unlike other existing schemes, the algorithm explicitly handles multiple access control aspects associated with the feedback of CSI. A tractable asymptotic analysis of a system with a large number of users helps design the algorithm. It yields 50% to 65% throughput gains compared to an asymptotically optimal one-bit feedback scheme, when the number of users is as small as 10 or as large as 1000. The algorithm is fast and distributed, and scales with the number of users.
Abstract:
In this paper, we present a hardware-software hybrid technique for modular multiplication over large binary fields. The technique involves application of the Karatsuba-Ofman algorithm for polynomial multiplication and a novel technique for reduction. The proposed reduction technique is based on the popular repeated multiplication technique and Barrett reduction. We propose a new design of a parallel polynomial multiplier that serves as a hardware accelerator for large field multiplications. We show that the proposed reduction technique, accelerated using the modified polynomial multiplier, achieves significantly higher performance than a purely software technique and other hybrid techniques. We also show that the hybrid accelerated approach to modular field multiplication is significantly faster than the integrated multiplication approach based on the Montgomery algorithm.
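For readers unfamiliar with Karatsuba-Ofman multiplication over binary fields, here is a minimal software-only sketch. Polynomials over GF(2) are stored as Python integers (bit i is the coefficient of x^i), so addition is XOR. The reduction step shown is plain shift-and-XOR long division, swapped in for simplicity. It is not the Barrett-based reduction or the hardware multiplier proposed in the abstract. All function names and the `threshold` parameter are assumptions for this example.

```python
def clmul_schoolbook(a, b):
    """Carry-less product of two GF(2)[x] polynomials by shift-and-XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def clmul_karatsuba(a, b, threshold=8):
    """Karatsuba-Ofman product in GF(2)[x]: three half-size products."""
    n = max(a.bit_length(), b.bit_length())
    if n <= threshold:
        return clmul_schoolbook(a, b)
    half = n // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    lo = clmul_karatsuba(a_lo, b_lo, threshold)
    hi = clmul_karatsuba(a_hi, b_hi, threshold)
    # (a_lo + a_hi)(b_lo + b_hi) - lo - hi; in GF(2), + and - are both XOR.
    mid = clmul_karatsuba(a_lo ^ a_hi, b_lo ^ b_hi, threshold) ^ lo ^ hi
    return lo ^ (mid << half) ^ (hi << (2 * half))

def poly_mod(p, modulus):
    """Remainder of p modulo `modulus` in GF(2)[x] (simple long division)."""
    deg = modulus.bit_length() - 1
    while p.bit_length() > deg:
        p ^= modulus << (p.bit_length() - 1 - deg)
    return p
```

As a small check in GF(2^8) with the modulus x^8 + x^4 + x^3 + x + 1 (0x11B), `poly_mod(clmul_karatsuba(0x57, 0x83), 0x11B)` evaluates to `0xC1`.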
Abstract:
A dynamical instability is observed in experimental studies on micro-channels of rectangular cross-section with smallest dimension 100 and 160 μm in which one of the walls is made of soft gel. There is a spontaneous transition from an ordered, laminar flow to a chaotic and highly mixed flow state when the Reynolds number increases beyond a critical value. The critical Reynolds number, which decreases as the elasticity modulus of the soft wall is reduced, is as low as 200 for the softest wall used here (in contrast to 1200 for a rigid-walled channel). The instability onset is observed by the breakup of a dye-stream introduced in the centre of the micro-channel, as well as by the onset of wall oscillations detected by laser scattering from fluorescent beads embedded in the wall of the channel. The mixing time across a channel of width 1.5 mm, measured by dye-stream and outlet conductance experiments, is smaller by a factor of 10^5 than that for a laminar flow. The increased mixing rate comes at very little cost, because the pressure drop (energy requirement to drive the flow) increases continuously and modestly at transition. The deformed shape is reconstructed numerically, and computational fluid dynamics (CFD) simulations are carried out to obtain the pressure gradient and the velocity fields for different flow rates. The pressure difference across the channel predicted by simulations is in agreement with the experiments (within experimental errors) for flow rates where the dye stream is laminar, but the experimental pressure difference is higher than the simulation prediction after dye-stream breakup. A linear stability analysis is carried out using the parallel-flow approximation, in which the wall is modelled as a neo-Hookean elastic solid, and the mean velocity and pressure gradient from the CFD simulations are used as inputs.
The stability analysis accurately predicts the Reynolds number (based on flow rate) at which an instability is observed in the dye stream, and it also predicts that the instability first takes place at the downstream converging section of the channel, and not at the upstream diverging section. The stability analysis also indicates that the destabilization is due to the modification of the flow and the local pressure gradient due to the wall deformation; if we assume a parabolic velocity profile with the pressure gradient given by the plane Poiseuille law, the flow is always found to be stable.
Abstract:
Using a genetic algorithm, a global optimization method inspired by nature's evolutionary process, we have improved the quantitative refocused constant-time INEPT experiment (Q-INEPT-CT) of Makela et al. (JMR 204 (2010) 124-130) with various optimization constraints. The improved 'average polarization transfer' and 'min-max difference' of the new delay sets effectively reduce the experimental time by a factor of two (compared with Q-INEPT-CT, Makela et al.) without compromising on accuracy. We also discuss a quantitative spectral editing technique based on average polarization transfer. © 2013 Elsevier Inc. All rights reserved.
Abstract:
We propose an eigenvalue-based technique to solve the Homogeneous Quadratic Constrained Quadratic Programming (HQCQP) problem with at most three constraints, which arises in many signal processing problems. Semi-Definite Relaxation (SDR) is the only known approach and is computationally intensive. We study the performance of the proposed fast eigen approach through simulations in the context of MIMO relays and show that its solution converges to the solution obtained using the SDR approach with a significant reduction in complexity.
Abstract:
An efficient parallelization algorithm for the Fast Multipole Method is presented, which aims to alleviate the parallelization bottleneck arising from the lower job count near the root levels. An electrostatic problem of 12 million non-uniformly distributed mesh elements is solved with 80-85% parallel efficiency in matrix setup and matrix-vector product, using 60 GB of memory and 16 threads on a shared-memory architecture.
Abstract:
The problem of finding a satisfying assignment that minimizes the number of variables that are set to 1 is NP-complete even for a satisfiable 2-SAT formula. We call this problem MIN ONES 2-SAT. It generalizes the well-studied problem of finding the smallest vertex cover of a graph, which can be modeled using a 2-SAT formula with no negative literals. The natural parameterized version of the problem asks for a satisfying assignment of weight at most k. In this paper, we present a polynomial-time reduction from MIN ONES 2-SAT to VERTEX COVER without increasing the parameter and ensuring that the number of vertices in the reduced instance is equal to the number of variables of the input formula. Consequently, we conclude that this problem also has a simple 2-approximation algorithm and a (2k - c log k)-variable kernel, subsuming (or, in the case of kernels, improving) the results known earlier. Further, the problem admits algorithms for the parameterized and optimization versions whose runtimes will always match the runtimes of the best-known algorithms for the corresponding versions of VERTEX COVER. Finally, we show that the optimum value of the LP relaxation of MIN ONES 2-SAT and that of the corresponding VERTEX COVER are the same. This implies that the (recent) results for the VERTEX COVER version parameterized above the optimum value of the LP relaxation of VERTEX COVER carry over to the MIN ONES 2-SAT version parameterized above the optimum of the LP relaxation of MIN ONES 2-SAT. © 2013 Elsevier B.V. All rights reserved.
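The remark that vertex cover is a 2-SAT formula with no negative literals can be made concrete with a small sketch. It shows only the easy direction (VERTEX COVER written as MIN ONES 2-SAT) plus an exponential brute-force solver for tiny instances. It does not implement the paper's polynomial-time reduction from MIN ONES 2-SAT to VERTEX COVER. The clause encoding (signed integers, +i for x_i and -i for its negation) and all function names are assumptions for this example.

```python
from itertools import combinations

def satisfies(clauses, true_vars):
    """True if setting exactly `true_vars` to 1 satisfies every clause."""
    return all(
        any((lit > 0) == (abs(lit) in true_vars) for lit in clause)
        for clause in clauses
    )

def min_ones(clauses, num_vars):
    """Brute force: a satisfying assignment with the fewest 1s, or None."""
    for k in range(num_vars + 1):
        for subset in combinations(range(1, num_vars + 1), k):
            if satisfies(clauses, set(subset)):
                return set(subset)
    return None  # formula is unsatisfiable

def vertex_cover_to_2sat(edges):
    """Each edge (u, v) becomes the positive clause (x_u OR x_v)."""
    return [(u, v) for u, v in edges]
```

For the path 1-2-3-4, `min_ones(vertex_cover_to_2sat([(1, 2), (2, 3), (3, 4)]), 4)` returns a set of size 2, which is the size of a minimum vertex cover of that path.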
Abstract:
Exploiting the performance potential of GPUs requires managing the data transfers to and from them efficiently, which is an error-prone and tedious task. In this paper, we develop a software coherence mechanism to fully automate all data transfers between the CPU and GPU without any assistance from the programmer. Our mechanism uses compiler analysis to identify potential stale accesses and uses a runtime to initiate transfers as necessary. This allows us to avoid the redundant transfers that are exhibited by all other existing automatic memory management proposals. We integrate our automatic memory manager into the X10 compiler and runtime, and find that it not only results in smaller and simpler programs, but also eliminates redundant memory transfers. Tested on eight programs ported from the Rodinia benchmark suite, it achieves (i) a 1.06x speedup over hand-tuned manual memory management, and (ii) a 1.29x speedup over another recently proposed compiler-runtime automatic memory management system. Compared to other existing runtime-only and compiler-only proposals, it also transfers 2.2x to 13.3x less data on average.
Abstract:
Decoding of linear space-time block codes (STBCs) with sphere decoding (SD) is well known. A fast version of SD known as fast sphere decoding (FSD) was introduced by Biglieri, Hong and Viterbo. Viewing a linear STBC as a vector space spanned by its defining weight matrices over the real number field, we define a quadratic form (QF), called the Hurwitz-Radon QF (HRQF), on this vector space and give a QF interpretation of the FSD complexity of a linear STBC. It is shown that the FSD complexity is only a function of the weight matrices defining the code and their ordering, and not of the channel realization (even though the equivalent channel when SD is used depends on the channel realization) or the number of receive antennas. It is also shown that the FSD complexity is completely captured by a single matrix obtained from the HRQF. Moreover, for a given set of weight matrices, an algorithm to obtain an optimal ordering of them leading to the least FSD complexity is presented. The well-known classes of low FSD complexity codes (multi-group decodable codes, fast decodable codes and fast group decodable codes) are presented in the framework of HRQF.
Abstract:
In this paper, a new method is proposed to obtain full-diversity, rate-2 (rate of two complex symbols per channel use) space-time block codes (STBCs) that are full-rate for multiple input double output (MIDO) systems. Using this method, rate-2 STBCs for 4 x 2, 6 x 2, 8 x 2, and 12 x 2 systems are constructed. These STBCs are fast ML-decodable and have large coding gains, and STBC-schemes consisting of these STBCs have a non-vanishing determinant (NVD), so that they are DMT-optimal for their respective MIDO systems. It is also shown that the Srinath-Rajan code for the 4 x 2 system, which has the lowest ML-decoding complexity among known rate-2 STBCs for the 4 x 2 MIDO system with a large coding gain for 4-/16-QAM, has the same algebraic structure as the STBC constructed in this paper for the 4 x 2 system. This also settles in the affirmative a previous conjecture that the STBC-scheme based on the Srinath-Rajan code has the NVD property and hence is DMT-optimal for the 4 x 2 system.
Abstract:
Opportunistic selection selects the node that improves the overall system performance the most. Selecting the best node is challenging as the nodes are geographically distributed and have only local knowledge. Yet, selection must be fast to allow more time to be spent on data transmission, which exploits the selected node's services. We analyze the impact of imperfect power control on a fast, distributed, splitting based selection scheme that exploits the capture effect by allowing the transmitting nodes to have different target receive powers and uses information about the total received power to speed up selection. Imperfect power control makes the received power deviate from the target and, hence, affects performance. Our analysis quantifies how it changes the selection probability, reduces the selection speed, and leads to the selection of no node or a wrong node. We show that the effect of imperfect power control is primarily driven by the ratio of target receive powers. Furthermore, we quantify its effect on the net system throughput.
Abstract:
A nearly constant switching frequency current hysteresis controller for a 2-level inverter-fed induction motor drive is proposed in this paper. The salient features of this controller are fast dynamics for the current, inherent protection against overloads, and less switching frequency variation. The large variation of switching frequency seen in the conventional hysteresis controller is avoided by defining a current-error boundary, which is obtained from the current-error trajectory of the standard space vector PWM. The current-error boundary is computed at every sampling interval based on the induction machine parameters and the estimated fundamental stator voltage. The stator currents are always monitored, and when the current error exceeds the boundary, the voltage space vector is switched to reduce the current error. The proposed boundary computation algorithm is applicable in the linear and over-modulation regions, and it is simple to implement in any standard digital signal processor. Detailed experimental verification is done using a 7.5 kW induction motor, and the results are given to show the performance of the drive at various operating conditions and to validate the proposed advantages.