1000 results for fast decryption


Relevance:

20.00%

Publisher:

Abstract:

Edge-preserving smoothing is widely used in image processing, and bilateral filtering is one way to achieve it. The bilateral filter is a nonlinear combination of a domain and a range filter. Implementing the classical bilateral filter is computationally intensive, owing to the nonlinearity of the range filter. In the standard form, the domain and range filters are Gaussian functions, and the performance depends on the choice of the filter parameters. Recently, a constant-time implementation of the bilateral filter based on a raised-cosine approximation to the Gaussian has been proposed. We address the problem of determining the optimal parameters for this raised-cosine-based constant-time implementation. To determine the optimal parameters, we propose the use of Stein's unbiased risk estimator (SURE). The fast bilateral filter in turn accelerates the search for the optimal parameters, since the SURE cost can be optimized more quickly. Experimental results show that the SURE-optimal raised-cosine-based bilateral filter has nearly the same performance as the SURE-optimal standard Gaussian bilateral filter and the oracle mean squared error (MSE)-based optimal bilateral filter.
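
To make the baseline concrete, here is a brute-force bilateral filter for a 1D signal, illustrating the nonlinear combination of domain and range filters described above; this direct form is exactly the computationally intensive implementation that constant-time methods such as the raised-cosine approximation avoid. The 1D setting and all parameter values are illustrative choices, not the paper's.

```python
import numpy as np

def bilateral_filter_1d(f, sigma_s=3.0, sigma_r=0.1, radius=9):
    """Direct bilateral filter: O(radius) work per sample."""
    n = len(f)
    out = np.empty(n)
    offsets = np.arange(-radius, radius + 1)
    domain = np.exp(-offsets**2 / (2 * sigma_s**2))           # spatial (domain) kernel
    for i in range(n):
        idx = np.clip(i + offsets, 0, n - 1)                  # replicate boundary
        rng = np.exp(-(f[idx] - f[i])**2 / (2 * sigma_r**2))  # range kernel
        w = domain * rng                                      # nonlinear combination
        out[i] = np.dot(w, f[idx]) / w.sum()
    return out

# Example: smooth a noisy step while preserving the edge.
x = np.r_[np.zeros(50), np.ones(50)] + 0.05 * np.random.randn(100)
y = bilateral_filter_1d(x)
```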

Relevance:

20.00%

Publisher:

Abstract:

Acoustic modeling using mixtures of multivariate Gaussians is the prevalent approach for many speech processing problems. Computing likelihoods against a large set of Gaussians is required as a part of many speech processing systems, and it is the computationally dominant phase for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. We express the likelihood computation as a multiplication of matrices representing augmented feature vectors and Gaussian parameters. The computational gain of this approach over traditional methods comes from exploiting the structure of these matrices and from an efficient implementation of their multiplication. In particular, we explore direct low-rank approximation of the Gaussian parameter matrix and indirect derivation of low-rank factors of the Gaussian parameter matrix by optimum approximation of the likelihood matrix. We show that both methods lead to similar speedups, but the latter has far less impact on recognition accuracy. Experiments on a 1,138-word vocabulary RM1 task and a 6,224-word vocabulary TIMIT task using the Sphinx 3.7 system show that, for a typical case, the matrix-multiplication-based approach leads to an overall speedup of 46% on the RM1 task and 115% on the TIMIT task. Our low-rank approximation methods provide a way of trading off recognition accuracy for a further increase in computational performance, extending the overall speedups up to 61% for RM1 and 119% for TIMIT, for an increase in word error rate (WER) from 3.2% to 3.5% on RM1 and no increase in WER on TIMIT. We also express the pairwise Euclidean distance computation phase of Dynamic Time Warping (DTW) in terms of matrix multiplication, leading to savings in computational operations. In our experiments using an efficient implementation of matrix multiplication, this leads to a speedup of 5.6 in computing the pairwise Euclidean distances and an overall speedup of up to 3.25 for DTW.
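
A minimal sketch of the central reformulation for diagonal-covariance Gaussians: the log-likelihoods of T frames against M Gaussians reduce to a single product of an augmented feature matrix with a Gaussian-parameter matrix. The shapes and variable names below are my own; the paper's exact construction may differ.

```python
import numpy as np

T, D, M = 200, 39, 1024                      # frames, feature dim, Gaussians
X = np.random.randn(T, D)                    # feature vectors
mu = np.random.randn(M, D)                   # Gaussian means
var = 0.5 + np.random.rand(M, D)             # diagonal variances

# log N(x; mu, var) = c + (mu/var).x + (-0.5/var).x^2, with all the
# x-independent terms folded into the constant c.
c = -0.5 * (np.log(2 * np.pi * var) + mu**2 / var).sum(axis=1)   # (M,)
W = np.vstack([c, (mu / var).T, (-0.5 / var).T])                 # (2D+1, M)
Phi = np.hstack([np.ones((T, 1)), X, X**2])                      # (T, 2D+1)

loglik = Phi @ W                             # (T, M): one matrix multiplication

# Sanity check against the direct evaluation for one frame/Gaussian pair.
t, m = 0, 0
direct = -0.5 * (np.log(2 * np.pi * var[m])
                 + (X[t] - mu[m])**2 / var[m]).sum()
assert np.isclose(loglik[t, m], direct)
```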

Relevance:

20.00%

Publisher:

Abstract:

The compatibility of the fast-tachocline scenario with a flux-transport dynamo model is explored. We employ a flux-transport dynamo model coupled with simple feedback formulae relating the thickness of the tachocline to the amplitude of the magnetic field or to the Maxwell stress. The dynamo model is found to be robust against the nonlinearity introduced by this simplified fast-tachocline mechanism. Solar-like butterfly diagrams are found to persist and, even without any parameter fitting, the overall thickness of the tachocline is well within the range admitted by helioseismic constraints. In the most realistic case of a time- and latitude-dependent tachocline thickness linked to the value of the Maxwell stress, both the thickness and its latitudinal dependence are in excellent agreement with seismic results. In nonparametric models, cycle-related temporal variations in tachocline thickness are somewhat larger than admitted by helioseismic constraints; we find, however, that introducing a further parameter into our feedback formula readily allows further fine-tuning of the thickness variations.

Relevance:

20.00%

Publisher:

Abstract:

Opportunistic selection is a practically appealing technique used in multi-node wireless systems to maximize throughput, implement proportional fairness, etc. However, selection is challenging since the information about a node's channel gains is often available only locally at each node and not centrally. We propose a novel multiple-access-based distributed selection scheme that generalizes the best features of the timer scheme, which requires minimal feedback but does not always guarantee successful selection, and the fast splitting scheme, which requires more feedback but guarantees successful selection. The proposed scheme's design explicitly accounts for feedback time overheads, unlike the conventional splitting scheme, and guarantees selection of the user with the highest metric, unlike the timer scheme. We analyze and minimize the average time, including feedback, that the scheme requires to make a selection. Even with feedback overheads accounted for, the proposed scheme is scalable and considerably faster than several schemes proposed in the literature. Furthermore, the gains increase as the feedback overhead increases.
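
As a point of reference, here is a toy simulation of the timer scheme that the proposed scheme generalizes: each node maps its locally known metric through a decreasing function to a timer, so the best node's timer expires first, and selection fails when the two earliest expiries fall within a vulnerability window. The mapping and all parameter values are illustrative, not from the paper.

```python
import random

def timer_select(metrics, T_max=10.0, delta=0.1):
    """Return the index of the selected node, or None on collision."""
    # Decreasing map: a higher metric yields an earlier expiry.
    timers = sorted((T_max / (1.0 + m), i) for i, m in enumerate(metrics))
    if len(timers) > 1 and timers[1][0] - timers[0][0] < delta:
        return None                     # two expiries too close: collision
    return timers[0][1]

# Estimate the success probability with 20 nodes and exponential metrics.
trials = 10000
ok = sum(
    timer_select([random.expovariate(1.0) for _ in range(20)]) is not None
    for _ in range(trials)
)
print(f"success probability ~ {ok / trials:.3f}")   # < 1: no guarantee
```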

Relevance:

20.00%

Publisher:

Abstract:

Acoustic modeling using mixtures of multivariate Gaussians is the prevalent approach for many speech processing problems. Computing likelihoods against a large set of Gaussians is required as a part of many speech processing systems, and it is the computationally dominant phase for LVCSR systems. We express the likelihood computation as a multiplication of matrices representing augmented feature vectors and Gaussian parameters. The computational gain of this approach over traditional methods comes from exploiting the structure of these matrices and from an efficient implementation of their multiplication. In particular, we explore direct low-rank approximation of the Gaussian parameter matrix and indirect derivation of low-rank factors of the Gaussian parameter matrix by optimum approximation of the likelihood matrix. We show that both methods lead to similar speedups, but the latter has far less impact on recognition accuracy. Experiments on a 1,138-word vocabulary RM1 task using the Sphinx 3.7 system show that, for a typical case, the matrix multiplication approach leads to an overall speedup of 46%. Both low-rank approximation methods increase the speedup to around 60%, with the former increasing the word error rate (WER) from 3.2% to 6.6%, while the latter increases the WER only from 3.2% to 3.5%.
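
The direct low-rank route admits a compact sketch: factor the Gaussian-parameter matrix W (from the matrix formulation sketched earlier) as a rank-r product, so that Phi @ W becomes two thin multiplications. The truncated SVD below is a stand-in for whatever factorization the paper actually uses; shapes are illustrative.

```python
import numpy as np

def low_rank_factors(W, r):
    """Best rank-r factors (in Frobenius norm) via truncated SVD: W ~ L @ R."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :r] * s[:r], Vt[:r]          # L: (2D+1, r), R: (r, M)

W = np.random.randn(79, 1024)                # e.g. 2*39+1 rows, 1024 Gaussians
L, R = low_rank_factors(W, r=20)
Phi = np.random.randn(200, 79)               # augmented features

fast = (Phi @ L) @ R                         # two thin multiplications
exact = Phi @ W
print(np.linalg.norm(fast - exact) / np.linalg.norm(exact))  # relative error
```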

Relevance:

20.00%

Publisher:

Abstract:

Decoding of linear space-time block codes (STBCs) with sphere decoding (SD) is well known. A fast version of SD, known as fast sphere decoding (FSD), has recently been studied by Biglieri, Hong and Viterbo. Viewing a linear STBC as a vector space spanned by its defining weight matrices over the real number field, we define a quadratic form (QF), called the Hurwitz-Radon QF (HRQF), on this vector space and give a QF interpretation of the FSD complexity of a linear STBC. It is shown that the FSD complexity is only a function of the weight matrices defining the code and their ordering, and not of the channel realization (even though the equivalent channel when SD is used depends on the channel realization) or the number of receive antennas. It is also shown that the FSD complexity is completely captured in a single matrix obtained from the HRQF. Moreover, for a given set of weight matrices, an algorithm to obtain an ordering of them that leads to the least FSD complexity is presented. The well-known classes of low-FSD-complexity codes (multi-group decodable codes, fast decodable codes and fast group decodable codes) are presented in the framework of the HRQF.
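
To illustrate the kind of matrix such a quadratic form yields, the sketch below forms m_ij = ||A_i A_j^H + A_j A_i^H||_F for the Alamouti code's four weight matrices; the Frobenius norm here is my stand-in for the paper's exact definition. The all-zero off-diagonal pattern reflects pairwise Hurwitz-Radon orthogonality, the structure that the low-FSD-complexity code classes exploit.

```python
import numpy as np

I = np.eye(2, dtype=complex)
A = [I,                                              # Alamouti weight matrices:
     np.diag([1j, -1j]),                             # S = s1*A0 + s2*A1 + ...
     np.array([[0, 1], [-1, 0]], dtype=complex),
     np.array([[0, 1j], [1j, 0]], dtype=complex)]

H = lambda M: M.conj().T
m = np.array([[np.linalg.norm(Ai @ H(Aj) + Aj @ H(Ai)) for Aj in A]
              for Ai in A])
print(np.round(m, 6))   # off-diagonal zeros: pairwise HR-orthogonal weights
```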

Relevance:

20.00%

Publisher:

Abstract:

Channel-aware assignment of sub-channels to users in the downlink of an OFDMA system demands extensive feedback of channel state information (CSI) to the base station. Since feedback bandwidth is often very scarce, schemes that limit feedback are necessary. We develop a novel, low-feedback, splitting-based algorithm for assigning each sub-channel to its best user, i.e., the user with the highest gain for that sub-channel among all users. The key idea behind the algorithm is that, at any time, each user contends for the sub-channel on which it has the largest channel gain among the unallocated sub-channels. Unlike other existing schemes, the algorithm explicitly handles the multiple access control aspects associated with the feedback of CSI. A tractable asymptotic analysis of a system with a large number of users helps design the algorithm. It yields 50% to 65% throughput gains compared to an asymptotically optimal one-bit feedback scheme, whether the number of users is as small as 10 or as large as 1000. The algorithm is fast and distributed, and scales with the number of users.
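
A toy, centralized rendition of the stated contention rule: in each round every user bids for the unallocated sub-channel on which its gain is largest, and each contended sub-channel goes to the best of its contenders. The real algorithm resolves each contention with distributed splitting over feedback slots; this greedy loop only illustrates the bidding rule, not the protocol or its guarantees.

```python
import numpy as np

def assign(G):
    """G[u, c] = gain of user u on sub-channel c; returns {sub-channel: user}."""
    U, C = G.shape
    alloc, free = {}, set(range(C))
    while free:
        bids = {}
        for u in range(U):
            c = max(free, key=lambda ch: G[u, ch])        # my best free sub-channel
            bids.setdefault(c, []).append(u)
        for c, users in bids.items():
            alloc[c] = max(users, key=lambda v: G[v, c])  # best contender wins
            free.discard(c)
    return alloc

G = np.random.rayleigh(size=(10, 16))                     # Rayleigh-fading gains
print(assign(G))
```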

Relevance:

20.00%

Publisher:

Abstract:

In this paper we present a hardware-software hybrid technique for modular multiplication over large binary fields. The technique involves application of the Karatsuba-Ofman algorithm for polynomial multiplication and a novel technique for reduction. The proposed reduction technique is based on the popular repeated multiplication technique and on Barrett reduction. We propose a new design of a parallel polynomial multiplier that serves as a hardware accelerator for large field multiplications. We show that the proposed reduction technique, accelerated using the modified polynomial multiplier, achieves significantly higher performance than a purely software technique and other hybrid techniques. We also show that the hybrid accelerated approach to modular field multiplication is significantly faster than the Montgomery-algorithm-based integrated multiplication approach.
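
For reference, here is a minimal software Karatsuba-Ofman multiplier for GF(2)[x], with polynomials encoded as Python ints (bit i holds the coefficient of x^i); addition in GF(2) is XOR, so the classic three-multiplication recursion applies unchanged. This shows only the polynomial-multiplication half; the repeated-multiplication/Barrett reduction step is not sketched here.

```python
import random

def gf2_mul(a, b):
    """Karatsuba carry-less multiplication in GF(2)[x]; '+' is XOR here."""
    if a < 2 or b < 2:                               # degree-0 operand
        return a * b
    n = (max(a.bit_length(), b.bit_length()) + 1) // 2
    mask = (1 << n) - 1
    a0, a1 = a & mask, a >> n                        # a = a1*x^n + a0
    b0, b1 = b & mask, b >> n
    z0 = gf2_mul(a0, b0)
    z2 = gf2_mul(a1, b1)
    z1 = gf2_mul(a0 ^ a1, b0 ^ b1) ^ z0 ^ z2         # middle term: 3 multiplies
    return (z2 << (2 * n)) ^ (z1 << n) ^ z0

def gf2_mul_school(a, b):
    """Schoolbook carry-less multiplication, for checking."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

for _ in range(100):
    a, b = random.getrandbits(256), random.getrandbits(256)
    assert gf2_mul(a, b) == gf2_mul_school(a, b)
```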

Relevance:

20.00%

Publisher:

Abstract:

A dynamical instability is observed in experimental studies on micro-channels of rectangular cross-section, with smallest dimensions of 100 and 160 μm, in which one of the walls is made of soft gel. There is a spontaneous transition from an ordered, laminar flow to a chaotic and highly mixed flow state when the Reynolds number increases beyond a critical value. The critical Reynolds number, which decreases as the elasticity modulus of the soft wall is reduced, is as low as 200 for the softest wall used here (in contrast to 1200 for a rigid-walled channel). The instability onset is observed through the breakup of a dye stream introduced in the centre of the micro-channel, as well as through the onset of wall oscillations detected by laser scattering from fluorescent beads embedded in the wall of the channel. The mixing time across a channel of width 1.5 mm, measured by dye-stream and outlet-conductance experiments, is smaller by a factor of 10^5 than that for a laminar flow. The increased mixing rate comes at very little cost, because the pressure drop (the energy requirement to drive the flow) increases continuously and modestly at transition. The deformed channel shape is reconstructed numerically, and computational fluid dynamics (CFD) simulations are carried out to obtain the pressure gradient and the velocity fields for different flow rates. The pressure difference across the channel predicted by the simulations agrees with the experiments (within experimental errors) for flow rates at which the dye stream is laminar, but the experimental pressure difference is higher than the simulation prediction after dye-stream breakup. A linear stability analysis is carried out using the parallel-flow approximation, in which the wall is modelled as a neo-Hookean elastic solid, with the mean velocity and pressure gradient from the CFD simulations used as inputs. The stability analysis accurately predicts the Reynolds number (based on flow rate) at which an instability is observed in the dye stream, and it also predicts that the instability first takes place at the downstream converging section of the channel, not at the upstream diverging section. The stability analysis further indicates that the destabilization is due to the modification of the flow and of the local pressure gradient by the wall deformation; if a parabolic velocity profile with the pressure gradient given by the plane Poiseuille law is assumed, the flow is always found to be stable.

Relevance:

20.00%

Publisher:

Abstract:

Using a genetic algorithm, a global optimization method inspired by nature's evolutionary process, we have improved the quantitative refocused constant-time INEPT experiment (Q-INEPT-CT) of Makela et al. (JMR 204 (2010) 124-130) under various optimization constraints. The improved 'average polarization transfer' and 'min-max difference' of the new delay sets effectively reduce the experimental time by a factor of two (compared with the Q-INEPT-CT of Makela et al.) without compromising accuracy. We also discuss a quantitative spectral editing technique based on average polarization transfer.
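
The genetic-algorithm machinery itself is generic, as in the bare-bones skeleton below (truncation selection, one-point crossover, Gaussian mutation). The fitness function is a deliberate placeholder: the actual Q-INEPT-CT objective, built from the average polarization transfer and min-max difference over the relevant coupling range, would take its place.

```python
import random

def fitness(delays):
    # Placeholder objective: pretend the ideal delay value is 0.25 everywhere.
    # The real fitness would score an NMR delay set, which is not modeled here.
    return -max(abs(d - 0.25) for d in delays)

def evolve(n_delays=4, pop=50, gens=200, p_mut=0.2):
    P = [[random.uniform(0, 1) for _ in range(n_delays)] for _ in range(pop)]
    for _ in range(gens):
        P.sort(key=fitness, reverse=True)
        elite = P[: pop // 4]                       # selection
        children = []
        while len(elite) + len(children) < pop:
            a, b = random.sample(elite, 2)
            cut = random.randrange(1, n_delays)     # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < p_mut:             # Gaussian mutation, clipped
                i = random.randrange(n_delays)
                child[i] = min(1, max(0, child[i] + random.gauss(0, 0.05)))
            children.append(child)
        P = elite + children
    return max(P, key=fitness)

print(evolve())
```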

Relevance:

20.00%

Publisher:

Abstract:

We propose an eigenvalue-based technique to solve the homogeneous quadratically constrained quadratic programming (HQCQP) problem with at most three constraints, which arises in many signal processing problems. Semi-definite relaxation (SDR) is the only known approach and is computationally intensive. We study the performance of the proposed fast eigen approach through simulations in the context of MIMO relays and show that its solution converges to the one obtained using the SDR approach, with a significant reduction in complexity.
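
The flavor of an eigen approach is easiest to see in the single-constraint special case, where maximizing x^T A x subject to x^T B x = 1 (B positive definite) is solved exactly by the top generalized eigenvector of (A, B). This textbook case is only an illustration; the paper's contribution concerns problems with up to three constraints.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n)); A = A + A.T                  # symmetric objective
C = rng.standard_normal((n, n)); B = C @ C.T + n * np.eye(n)  # PD constraint matrix

w, V = eigh(A, B)          # generalized eigenpairs, eigenvalues ascending
x = V[:, -1]               # eigh returns B-orthonormal vectors, so x^T B x = 1
print(x @ A @ x, w[-1])    # optimum value equals the top eigenvalue
print(x @ B @ x)           # constraint satisfied (= 1)
```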

Relevance:

20.00%

Publisher:

Abstract:

An efficient parallelization algorithm for the Fast Multipole Method, which aims to alleviate the parallelization bottleneck arising from the lower job count closer to the root levels of the tree, is presented. An electrostatic problem with 12 million non-uniformly distributed mesh elements is solved with 80-85% parallel efficiency in the matrix setup and the matrix-vector product, using 60 GB of memory and 16 threads on a shared-memory architecture.

Relevance:

20.00%

Publisher:

Abstract:

The problem of finding a satisfying assignment that minimizes the number of variables set to 1 is NP-complete even for a satisfiable 2-SAT formula. We call this problem MIN ONES 2-SAT. It generalizes the well-studied problem of finding the smallest vertex cover of a graph, which can be modeled using a 2-SAT formula with no negative literals. The natural parameterized version of the problem asks for a satisfying assignment of weight at most k. In this paper, we present a polynomial-time reduction from MIN ONES 2-SAT to VERTEX COVER that does not increase the parameter and that ensures that the number of vertices in the reduced instance equals the number of variables of the input formula. Consequently, we conclude that this problem also has a simple 2-approximation algorithm and a (2k - c log k)-variable kernel, subsuming (or, in the case of kernels, improving) the results known earlier. Further, the problem admits algorithms for the parameterized and optimization versions whose runtimes will always match the runtimes of the best-known algorithms for the corresponding versions of VERTEX COVER. Finally, we show that the optimum value of the LP relaxation of MIN ONES 2-SAT and that of the corresponding VERTEX COVER are the same. This implies that the (recent) results for VERTEX COVER parameterized above the optimum value of its LP relaxation carry over to MIN ONES 2-SAT parameterized above the optimum of its LP relaxation.
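
The containment mentioned above is easy to make concrete: a vertex cover instance is exactly a MIN ONES 2-SAT instance with no negative literals, with one clause (u OR v) per edge and "variable set to 1" meaning "vertex in the cover". The brute-force solver below is exponential and purely illustrative, and the graph is an arbitrary small example.

```python
from itertools import product

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]     # small example graph
n = 4
clauses = [(u, v) for u, v in edges]                  # positive 2-CNF: (u OR v)

def min_ones(clauses, n):
    """Minimum-weight satisfying assignment by exhaustive search."""
    best = None
    for bits in product((0, 1), repeat=n):
        if all(bits[u] or bits[v] for u, v in clauses):   # every edge covered
            if best is None or sum(bits) < sum(best):
                best = bits
    return best

cover = min_ones(clauses, n)
print(cover, "-> cover vertices:", [v for v in range(n) if cover[v]])
```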

Relevance:

20.00%

Publisher:

Abstract:

Exploiting the performance potential of GPUs requires managing the data transfers to and from them efficiently, which is an error-prone and tedious task. In this paper, we develop a software coherence mechanism that fully automates all data transfers between the CPU and GPU without any assistance from the programmer. Our mechanism uses compiler analysis to identify potentially stale accesses and uses a runtime to initiate transfers as necessary. This allows us to avoid the redundant transfers exhibited by all other existing automatic memory management proposals. We integrate our automatic memory manager into the X10 compiler and runtime, and find that it not only results in smaller and simpler programs, but also eliminates redundant memory transfers. Tested on eight programs ported from the Rodinia benchmark suite, it achieves (i) a 1.06x speedup over hand-tuned manual memory management, and (ii) a 1.29x speedup over another recently proposed compiler-runtime automatic memory management system. Compared to other existing runtime-only and compiler-only proposals, it also transfers 2.2x to 13.3x less data on average.
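
A toy model of the runtime half of such a mechanism: each shared array carries a coherence state, and a copy happens only when an access would read a stale replica. The class below is my own illustration (the transfer methods are stand-ins for real DMA calls); the compiler analysis that removes provably unnecessary checks, which is where the X10 work gets its advantage, is not modeled.

```python
class Coherent:
    """States: 'shared' (both copies valid), 'cpu' (GPU copy stale),
    'gpu' (CPU copy stale)."""

    def __init__(self, host_array):
        self.host, self.dev, self.state = host_array, None, 'cpu'

    def read_cpu(self):
        if self.state == 'gpu':                 # CPU copy is stale: transfer
            self.host = self._copy_from_device()
            self.state = 'shared'
        return self.host

    def write_cpu(self, new):
        self.host, self.state = new, 'cpu'      # invalidate device copy

    def read_gpu(self):
        if self.state == 'cpu':                 # device copy is stale: transfer
            self.dev = self._copy_to_device()
            self.state = 'shared'
        return self.dev

    def write_gpu(self, new):
        self.dev, self.state = new, 'gpu'       # invalidate host copy

    # Stand-ins for real host<->device DMA transfers:
    def _copy_to_device(self):   return list(self.host)
    def _copy_from_device(self): return list(self.dev)

buf = Coherent([1, 2, 3])
buf.write_gpu([4, 5, 6])       # GPU result; host copy now stale
print(buf.read_cpu())          # triggers the single device-to-host transfer
```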

Relevance:

20.00%

Publisher:

Abstract:

Decoding of linear space-time block codes (STBCs) with sphere decoding (SD) is well known. A fast version of SD, known as fast sphere decoding (FSD), was introduced by Biglieri, Hong and Viterbo. Viewing a linear STBC as a vector space spanned by its defining weight matrices over the real number field, we define a quadratic form (QF), called the Hurwitz-Radon QF (HRQF), on this vector space and give a QF interpretation of the FSD complexity of a linear STBC. It is shown that the FSD complexity is only a function of the weight matrices defining the code and their ordering, and not of the channel realization (even though the equivalent channel when SD is used depends on the channel realization) or the number of receive antennas. It is also shown that the FSD complexity is completely captured in a single matrix obtained from the HRQF. Moreover, for a given set of weight matrices, an algorithm to obtain an optimal ordering of them leading to the least FSD complexity is presented. The well-known classes of low-FSD-complexity codes (multi-group decodable codes, fast decodable codes and fast group decodable codes) are presented in the framework of the HRQF.