360 results for Fast Algorithm
Abstract:
The contour tree is a topological abstraction of a scalar field that captures evolution in level set connectivity. It is an effective representation for visual exploration and analysis of scientific data. We describe a work-efficient, output-sensitive, and scalable parallel algorithm for computing the contour tree of a scalar field defined on a domain that is represented using either an unstructured mesh or a structured grid. A hybrid implementation of the algorithm using the GPU and multi-core CPU can compute the contour tree of an input containing 16 million vertices in less than ten seconds with a speedup factor of up to 13. Experiments based on an implementation in a multi-core CPU environment show near-linear speedup for large data sets.
Abstract:
We propose an eigenvalue-based technique to solve the Homogeneous Quadratic Constrained Quadratic Programming problem (HQCQP) with at most three constraints, which arises in many signal processing problems. Semi-Definite Relaxation (SDR) is the only known approach and is computationally intensive. We study the performance of the proposed fast eigen approach through simulations in the context of MIMO relays and show that the solution converges to the solution obtained using the SDR approach with a significant reduction in complexity.
Abstract:
Motivated by the observation that communities in real-world social networks form due to the actions of rational individuals in networks, we propose a novel game-theory-inspired algorithm to determine communities in networks. The algorithm is decentralized and uses only local information at each node. We show the efficacy of the proposed algorithm through extensive experimentation on several real-world social network data sets.
Abstract:
We consider the problem of developing privacy-preserving machine learning algorithms in a distributed multiparty setting. Here different parties own different parts of a data set, and the goal is to learn a classifier from the entire data set without any party revealing any information about the individual data points it owns. Pathak et al. [7] recently proposed a solution to this problem in which each party learns a local classifier from its own data, and a third party then aggregates these classifiers in a privacy-preserving manner using a cryptographic scheme. The generalization performance of their algorithm is sensitive to the number of parties and the relative fractions of data owned by the different parties. In this paper, we describe a new differentially private algorithm for the multiparty setting that uses a stochastic gradient descent based procedure to directly optimize the overall multiparty objective rather than combining classifiers learned from optimizing local objectives. The algorithm achieves a slightly weaker form of differential privacy than that of [7], but provides improved generalization guarantees that do not depend on the number of parties or the relative sizes of the individual data sets. Experimental results corroborate our theoretical findings.
Abstract:
Exploiting the performance potential of GPUs requires managing the data transfers to and from them efficiently, which is an error-prone and tedious task. In this paper, we develop a software coherence mechanism to fully automate all data transfers between the CPU and GPU without any assistance from the programmer. Our mechanism uses compiler analysis to identify potential stale accesses and uses a runtime to initiate transfers as necessary. This allows us to avoid redundant transfers that are exhibited by all other existing automatic memory management proposals. We integrate our automatic memory manager into the X10 compiler and runtime, and find that it not only results in smaller and simpler programs, but also eliminates redundant memory transfers. Tested on eight programs ported from the Rodinia benchmark suite, it achieves (i) a 1.06x speedup over hand-tuned manual memory management, and (ii) a 1.29x speedup over another recently proposed compiler-runtime automatic memory management system. Compared to other existing runtime-only and compiler-only proposals, it also transfers 2.2x to 13.3x less data on average.
Abstract:
The problem of designing good Space-Time Block Codes (STBCs) with low maximum-likelihood (ML) decoding complexity has gathered much attention in the literature. All the known low ML decoding complexity techniques utilize the same approach of exploiting either the multigroup decodable or the fast-decodable (conditionally multigroup decodable) structure of a code. We refer to this well-known technique of decoding STBCs as Conditional ML (CML) decoding. In [1], we introduced a framework to construct ML decoders for STBCs based on the Generalized Distributive Law (GDL) and the Factor-graph based Sum-Product Algorithm, and showed that for two specific families of STBCs, the Toeplitz codes and the Overlapped Alamouti Codes (OACs), the GDL based ML decoders have strictly less complexity than the CML decoders. In this paper, we introduce a 'traceback' step to the GDL decoding algorithm of STBCs, which enables a roughly 4-fold reduction in the complexity of the GDL decoders proposed in [1]. Utilizing this complexity reduction from 'traceback', we then show that for any STBC (not just the Toeplitz and Overlapped Alamouti Codes), the GDL decoding complexity is strictly less than the CML decoding complexity. For instance, for any STBC obtained from Cyclic Division Algebras that is not multigroup or conditionally multigroup decodable, the GDL decoder provides an approximately 12-fold reduction in complexity compared to the CML decoder. Similarly, for the Golden code, which is conditionally multigroup decodable, the GDL decoder is only about half as complex as the CML decoder.
Abstract:
In Orthogonal Frequency Division Multiplexing and Discrete Multitone transceivers, a guard interval called Cyclic Prefix (CP) is inserted to avoid inter-symbol interference. The length of the CP is usually greater than the impulse response of the channel resulting in a loss of useful data carriers. In order to avoid long CP, a time domain equalizer is used to shorten the channel. In this paper, we propose a method to include a delay in the zero-forcing equalizer and obtain an optimal value of the delay, based on the location of zeros of the channel. The performance of the algorithms is studied using numerical simulations.
Abstract:
In wireless sensor networks (WSNs), the communication traffic is often correlated in time and space, with multiple nodes in proximity starting to transmit at the same time. Such a situation is known as spatially correlated contention. The random access methods used to resolve such contention suffer from a high collision rate, whereas the traditional distributed TDMA scheduling techniques primarily try to improve the network capacity by reducing the schedule length. Usually, the situation of spatially correlated contention persists only for a short duration, and therefore generating an optimal or sub-optimal schedule is not very useful. On the other hand, if the algorithm takes a very long time to schedule, it will not only introduce additional delay in the data transfer but also consume more energy. To efficiently handle spatially correlated contention in WSNs, we present a distributed TDMA slot scheduling algorithm, called the DTSS algorithm. The DTSS algorithm is designed with the primary objective of reducing the time required to perform scheduling, while restricting the schedule length to the maximum degree of the interference graph. The algorithm uses randomized TDMA channel access as the mechanism to transmit protocol messages, which bounds the message delay and therefore reduces the time required to get a feasible schedule. The DTSS algorithm supports unicast, multicast, and broadcast scheduling simultaneously, without any modification to the protocol. The protocol has been simulated using the Castalia simulator to evaluate its run-time performance. Simulation results show that our protocol is able to considerably reduce the time required to schedule.
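To make the idea of a fast, degree-bounded schedule concrete, the following is a minimal Python sketch of a generic randomized slot assignment on an interference graph. It is not the DTSS protocol: the function name `randomized_slot_schedule`, the round structure, and the bound of (maximum degree + 1) slots are assumptions chosen for illustration, and message exchange is not modeled.

```python
import random

def randomized_slot_schedule(neighbors, seed=0, max_rounds=1000):
    """Assign TDMA slots so that no two interfering nodes share a slot.

    neighbors: dict mapping each node to the set of nodes it interferes with
               (the relation is assumed to be symmetric).
    Returns (slot, num_slots); slot maps node -> slot index.
    """
    rng = random.Random(seed)
    max_degree = max(len(nbrs) for nbrs in neighbors.values())
    num_slots = max_degree + 1  # assumed bound; always leaves a free slot
    slot = {}                   # finalized assignments
    for _ in range(max_rounds):
        pending = [v for v in neighbors if v not in slot]
        if not pending:
            break
        # Each pending node proposes a random slot not held by a finalized neighbor.
        proposal = {}
        for v in pending:
            taken = {slot[u] for u in neighbors[v] if u in slot}
            free = [s for s in range(num_slots) if s not in taken]
            proposal[v] = rng.choice(free)
        # A node keeps its proposal only if no pending neighbor proposed the same slot.
        for v in pending:
            if all(proposal.get(u) != proposal[v] for u in neighbors[v]):
                slot[v] = proposal[v]
    return slot, num_slots

# Hypothetical 5-node interference graph: a ring with one chord.
graph = {
    0: {1, 4, 2},
    1: {0, 2},
    2: {1, 3, 0},
    3: {2, 4},
    4: {3, 0},
}
schedule, frame_length = randomized_slot_schedule(graph)
```

Each round here needs only one-hop information, which is the property that makes schemes of this kind quick to converge; the schedule it produces is feasible but not necessarily the shortest.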
Abstract:
In this paper, we propose a low-complexity algorithm based on the Markov chain Monte Carlo (MCMC) technique for signal detection on the uplink in large-scale multiuser multiple input multiple output (MIMO) systems with tens to hundreds of antennas at the base station (BS) and a similar number of uplink users. The algorithm employs a randomized sampling method (which makes a probabilistic choice between Gibbs sampling and random sampling in each iteration) for detection. The proposed algorithm alleviates the stalling problem encountered at high SNRs in the conventional MCMC algorithm and achieves near-optimal performance in large systems with M-QAM. A novel ingredient in the algorithm that is responsible for achieving near-optimal performance at low complexities is the joint use of a randomized MCMC (R-MCMC) strategy coupled with a multiple restart strategy with an efficient restart criterion. Near-optimal detection performance is demonstrated for large numbers of BS antennas and users (e.g., 64, 128, 256 BS antennas/users).
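The probabilistic choice between Gibbs sampling and random sampling can be illustrated with a small Python sketch. It is a simplified illustration under stated assumptions, not the authors' detector: it uses real-valued BPSK symbols instead of M-QAM, a mixing probability of 1/n, no restart strategy, and hypothetical names (`residual_cost`, `randomized_mcmc_detect`).

```python
import math
import random

def residual_cost(H, y, x):
    """Squared distance ||y - Hx||^2, the maximum-likelihood cost."""
    return sum((yi - sum(h * xj for h, xj in zip(row, x))) ** 2
               for row, yi in zip(H, y))

def randomized_mcmc_detect(H, y, noise_var, sweeps=200, seed=0):
    """Return (best_vector, best_cost) found by a mixed Gibbs/random sampler."""
    rng = random.Random(seed)
    n = len(H[0])
    q = 1.0 / n  # assumed probability of taking a purely random move
    x = [rng.choice((-1, 1)) for _ in range(n)]
    best, best_cost = x[:], residual_cost(H, y, x)
    for _ in range(sweeps):
        for i in range(n):
            if rng.random() < q:
                # Random-sampling move: ignore the cost, which helps escape stalls.
                x[i] = rng.choice((-1, 1))
            else:
                # Gibbs move: sample x[i] from its conditional distribution.
                x[i] = 1
                cost_plus = residual_cost(H, y, x)
                x[i] = -1
                cost_minus = residual_cost(H, y, x)
                z = (cost_plus - cost_minus) / (2.0 * noise_var)
                z = max(-60.0, min(60.0, z))  # avoid overflow in exp
                p_plus = 1.0 / (1.0 + math.exp(z))
                x[i] = 1 if rng.random() < p_plus else -1
            cost = residual_cost(H, y, x)
            if cost < best_cost:
                best, best_cost = x[:], cost
    return best, best_cost

# Tiny noiseless example with a hypothetical 3x2 channel matrix.
H = [[1.0, 0.2], [0.1, 1.0], [0.3, -0.4]]
x_true = [1, -1]
y = [sum(h * s for h, s in zip(row, x_true)) for row in H]
detected, cost = randomized_mcmc_detect(H, y, noise_var=0.5)
```

The random moves are what distinguish this from a plain Gibbs sampler: at high SNR the conditional probabilities become nearly deterministic, so a pure Gibbs chain can stay at a poor local solution for a long time.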
Abstract:
In this paper, a new method is proposed to obtain full-diversity, rate-2 (rate of two complex symbols per channel use) space-time block codes (STBCs) that are full-rate for multiple input double output (MIDO) systems. Using this method, rate-2 STBCs for 4 x 2, 6 x 2, 8 x 2, and 12 x 2 systems are constructed. These STBCs are fast ML-decodable and have large coding gains, and STBC-schemes consisting of these STBCs have a non-vanishing determinant (NVD), so that they are DMT-optimal for their respective MIDO systems. It is also shown that the Srinath-Rajan code for the 4 x 2 system, which has the lowest ML-decoding complexity among known rate-2 STBCs for the 4 x 2 MIDO system with a large coding gain for 4-/16-QAM, has the same algebraic structure as the STBC constructed in this paper for the 4 x 2 system. This also settles in the affirmative a previous conjecture that the STBC-scheme that is based on the Srinath-Rajan code has the NVD property and hence is DMT-optimal for the 4 x 2 system.
Abstract:
Opportunistic selection selects the node that improves the overall system performance the most. Selecting the best node is challenging as the nodes are geographically distributed and have only local knowledge. Yet, selection must be fast to allow more time to be spent on data transmission, which exploits the selected node's services. We analyze the impact of imperfect power control on a fast, distributed, splitting based selection scheme that exploits the capture effect by allowing the transmitting nodes to have different target receive powers and uses information about the total received power to speed up selection. Imperfect power control makes the received power deviate from the target and, hence, affects performance. Our analysis quantifies how it changes the selection probability, reduces the selection speed, and leads to the selection of no node or a wrong node. We show that the effect of imperfect power control is primarily driven by the ratio of target receive powers. Furthermore, we quantify its effect on the net system throughput.
Abstract:
The yaw rate of a vehicle is highly influenced by the lateral forces generated at the tire contact patch to attain the desired lateral acceleration, and/or by external disturbances resulting from factors such as crosswinds, a flat tire, or split-μ braking. The presence of the latter and the insufficiency of the former may lead to undesired yaw motion of a vehicle. This paper proposes a steer-by-wire system based on fuzzy logic as a yaw-stability controller for a four-wheeled road vehicle with active front steering. The dynamics governing the yaw behavior of the vehicle have been modeled in MATLAB/Simulink. The fuzzy controller receives the yaw rate error of the vehicle and the steering signal given by the driver as inputs and generates an additional steering angle as output, which provides the corrective yaw moment. The results of simulations with various drive input signals show that the yaw stability controller using fuzzy logic proposed in the current study performs well in situations involving unexpected yaw motion. The yaw rate errors of a vehicle having the proposed controller are notably smaller than an uncontrolled vehicle's, and the vehicle having the yaw stability controller recovers lateral distance and desired yaw rate more quickly than the uncontrolled vehicle.
Abstract:
Spatial resolution in photoacoustic and thermoacoustic tomography is ultrasound transducer (detector) bandwidth limited. For a circular scanning geometry the axial (radial) resolution is not affected by the detector aperture, but the tangential (lateral) resolution is highly dependent on the aperture size, and it is also spatially varying (depending on the location relative to the scanning center). Several approaches have been reported to counter this problem by physically attaching a negative acoustic lens in front of the nonfocused transducer or by using virtual point detectors. Here, we have implemented a modified delay-and-sum reconstruction method, which takes into account the large aperture of the detector, leading to more than fivefold improvement in the tangential resolution in photoacoustic (and thermoacoustic) tomography. Three different types of numerical phantoms were used to validate our reconstruction method. It is also shown that we were able to preserve the shape of the reconstructed objects with the modified algorithm. (C) 2014 Optical Society of America
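For orientation, the conventional delay-and-sum step that the abstract's method modifies can be sketched in Python as follows. This sketch treats every detector as an ideal point on a circle, so it is the baseline rather than the large-aperture version described above; the speed of sound, sampling rate, geometry, and all names are assumptions made for the example.

```python
import math

SPEED_OF_SOUND = 1500.0  # m/s, assumed
SAMPLING_RATE = 10e6     # Hz, assumed

def sample_index(p, q):
    """Time of flight between points p and q, expressed as a sample index."""
    return round(math.dist(p, q) / SPEED_OF_SOUND * SAMPLING_RATE)

def delay_and_sum(signals, detectors, pixels):
    """For each pixel, sum every detector's signal at the matching delay."""
    image = {}
    for pixel in pixels:
        total = 0.0
        for sig, det in zip(signals, detectors):
            k = sample_index(pixel, det)
            if k < len(sig):
                total += sig[k]
        image[pixel] = total
    return image

# Synthetic data: 36 point detectors on a 5 cm circle, one impulsive point source.
num_det, radius, n_samples = 36, 0.05, 1024
detectors = [(radius * math.cos(2 * math.pi * i / num_det),
              radius * math.sin(2 * math.pi * i / num_det))
             for i in range(num_det)]
source = (0.01, 0.0)
signals = []
for det in detectors:
    sig = [0.0] * n_samples
    sig[sample_index(source, det)] = 1.0  # idealized arrival of the pressure wave
    signals.append(sig)

# Reconstruct on a coarse 5 x 5 grid with 1 cm spacing.
grid = [i * 0.01 for i in range(-2, 3)]
pixels = [(gx, gy) for gx in grid for gy in grid]
image = delay_and_sum(signals, detectors, pixels)
brightest = max(image, key=image.get)
```

With point detectors the delays are exact, so all contributions add coherently at the source pixel. A detector with a finite aperture receives the wave over a range of path lengths, which is the effect a large-aperture reconstruction has to account for.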
Abstract:
We present a new method for rapid NMR data acquisition and assignments applicable to unlabeled (C-12) or C-13-labeled biomolecules/organic molecules in general and metabolomics in particular. The method involves the acquisition of three two-dimensional (2D) NMR spectra simultaneously using a dual receiver system. The three spectra, namely (1) G-matrix Fourier transform (GFT) (3,2)D [C-13, H-1] HSQC-TOCSY, (2) 2D H-1-H-1 TOCSY, and (3) 2D C-13-H-1 HETCOR, are acquired in a single experiment and provide mutually complementary information to completely assign individual metabolites in a mixture. The GFT (3,2)D [C-13, H-1] HSQC-TOCSY provides 3D correlations in a reduced-dimensionality manner, facilitating high resolution and unambiguous assignments. The experiments were applied for complete H-1 and C-13 assignments of a mixture of 21 unlabeled metabolites corresponding to a medium used in assisted reproductive technology. Taken together, the experiments provide a time gain of orders of magnitude compared to the conventional data acquisition methods and can be combined with other fast NMR techniques such as non-uniform sampling and covariance spectroscopy. This provides new avenues for using multiple receivers and projection NMR techniques for high-throughput approaches in metabolomics.
Abstract:
The direct and accurate determination of heteronuclear ((n)J(HX), X = F-19, P-31) couplings from the one dimensional H-1-NMR spectrum is severely hampered due to the simultaneous presence of large numbers of (n)J(HH). The present study demonstrates the utility of the pure shift NMR approach for spectral simplification, and precise and direct measurement of heteronuclear couplings. As a consequence of refocusing of homonuclear couplings ((n)J(HH)) by the pure shift NMR, only heteronuclear couplings ((n)J(HX)) appear as simple multiplets at the resonance position of each chemically non-equivalent proton, enabling their direct measurement from the 1D-H-1 spectrum. The experiment is demonstrated on a number of molecules containing either F-19 or P-31, where (n)J(HF) and (n)J(HP) could be precisely measured in a straightforward manner. The distinct advantage of the experiment is demonstrated on molecules containing more than one fluorine atom, where most of the available NMR experiments fail or have restricted utility.