123 resultados para kernel estimators
Resumo:
MATLAB is an array language, initially popular for rapid prototyping, but is now being increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control flow dominated scalar regions that have an impact on the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput oriented accelerators such as graphics processing units(GPUs). Thus, an approach that maps the control flow dominated regions to the CPU and the data parallel regions to the GPU can significantly improve program performance. In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or GPU so that kernel execution on the CPU and the GPU happens synergistically and the amount of data transfer needed is minimized. In order to ensure required data movement for dependencies across basic blocks, we propose a data flow analysis and edge splitting strategy. Thus our compiler automatically handles composition of kernels, mapping of kernels to CPU and GPU, scheduling and insertion of required data transfer. The proposed compiler was implemented and experimental evaluation using a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data parallel benchmarks over native execution of MATLAB.
Resumo:
The paper addresses experiments and modeling studies on the use of producer gas, a bio-derived low energy content fuel in a spark-ignited engine. Producer gas, generated in situ, has thermo-physical properties different from those of fossil fuel(s). Experiments on naturally aspirated and turbo-charged engine operation and subsequent analysis of the cylinder pressure traces reveal significant differences in the heat release pattern within the cylinder compared with a typical fossil fuel. The heat release patterns for gasoline and producer gas compare well in the initial 50% but beyond this, producer gas combustion tends to be sluggish leading to an overall increase in the combustion duration. This is rather unexpected considering that producer gas with nearly 20% hydrogen has higher flame speeds than gasoline. The influence of hydrogen on the initial flame kernel development period and the combustion duration and hence on the overall heat release pattern is addressed. The significant deviations in the heat release profiles between conventional fuels and producer gas necessitates the estimation of producer gas-specific Wiebe coefficients. The experimental heat release profiles are used for estimating the Wiebe coefficients. Experimental evidence of lower fuel conversion efficiency based on the chemical and thermal analysis of the engine exhaust gas is used to arrive at the Wiebe coefficients. The efficiency factor a is found to be 2.4 while the shape factor m is estimated at 0.7 for 2% to 90% burn duration. The standard Wiebe coefficients for conventional fuels and fuel-specific coefficients for producer gas are used in a zero D model to predict the performance of a 6-cylinder gas engine under naturally aspirated and turbo-charged conditions. While simulation results with standard Wiebe coefficients result in excessive deviations from the experimental results, excellent match is observed when producer gas-specific coefficients are used. Predictions using the same coefficients on a 3-cylinder gas engine having different geometry and compression ratio(s) indicate close match with the experimental traces highlighting the versatility of the coefficients.
Resumo:
Medical image segmentation finds application in computer-aided diagnosis, computer-guided surgery, measuring tissue volumes, locating tumors, and pathologies. One approach to segmentation is to use active contours or snakes. Active contours start from an initialization (often manually specified) and are guided by image-dependent forces to the object boundary. Snakes may also be guided by gradient vector fields associated with an image. The first main result in this direction is that of Xu and Prince, who proposed the notion of gradient vector flow (GVF), which is computed iteratively. We propose a new formalism to compute the vector flow based on the notion of bilateral filtering of the gradient field associated with the edge map - we refer to it as the bilateral vector flow (BVF). The range kernel definition that we employ is different from the one employed in the standard Gaussian bilateral filter. The advantage of the BVF formalism is that smooth gradient vector flow fields with enhanced edge information can be computed noniteratively. The quality of image segmentation turned out to be on par with that obtained using the GVF and in some cases better than the GVF.
Resumo:
We address the problem of speech enhancement using a risk- estimation approach. In particular, we propose the use the Stein’s unbiased risk estimator (SURE) for solving the problem. The need for a suitable finite-sample risk estimator arises because the actual risks invariably depend on the unknown ground truth. We consider the popular mean-squared error (MSE) criterion first, and then compare it against the perceptually-motivated Itakura-Saito (IS) distortion, by deriving unbiased estimators of the corresponding risks. We use a generalized SURE (GSURE) development, recently proposed by Eldar for MSE. We consider dependent observation models from the exponential family with an additive noise model,and derive an unbiased estimator for the risk corresponding to the IS distortion, which is non-quadratic. This serves to address the speech enhancement problem in a more general setting. Experimental results illustrate that the IS metric is efficient in suppressing musical noise, which affects the MSE-enhanced speech. However, in terms of global signal-to-noise ratio (SNR), the minimum MSE solution gives better results.
Resumo:
Let be a noncompact symmetric space of higher rank. We consider two types of averages of functions: one, over level sets of the heat kernel on and the other, over geodesic spheres. We prove injectivity results for functions in which extend the results in Pati and Sitaram (Sankya Ser A 62:419-424, 2000).
Resumo:
We address the problem of sampling and reconstruction of two-dimensional (2-D) finite-rate-of-innovation (FRI) signals. We propose a three-channel sampling method for efficiently solving the problem. We consider the sampling of a stream of 2-D Dirac impulses and a sum of 2-D unit-step functions. We propose a 2-D causal exponential function as the sampling kernel. By causality in 2-D, we mean that the function has its support restricted to the first quadrant. The advantage of using a multichannel sampling method with causal exponential sampling kernel is that standard annihilating filter or root-finding algorithms are not required. Further, the proposed method has inexpensive hardware implementation and is numerically stable as the number of Dirac impulses increases.
Resumo:
We consider nonparametric or universal sequential hypothesis testing when the distribution under the null hypothesis is fully known but the alternate hypothesis corresponds to some other unknown distribution. These algorithms are primarily motivated from spectrum sensing in Cognitive Radios and intruder detection in wireless sensor networks. We use easily implementable universal lossless source codes to propose simple algorithms for such a setup. The algorithms are first proposed for discrete alphabet. Their performance and asymptotic properties are studied theoretically. Later these are extended to continuous alphabets. Their performance with two well known universal source codes, Lempel-Ziv code and KT-estimator with Arithmetic Encoder are compared. These algorithms are also compared with the tests using various other nonparametric estimators. Finally a decentralized version utilizing spatial diversity is also proposed and analysed.
Resumo:
Each new generation of GPUs vastly increases the resources available to GPGPU programs. GPU programming models (like CUDA) were designed to scale to use these resources. However, we find that CUDA programs actually do not scale to utilize all available resources, with over 30% of resources going unused on average for programs of the Parboil2 suite that we used in our work. Current GPUs therefore allow concurrent execution of kernels to improve utilization. In this work, we study concurrent execution of GPU kernels using multiprogram workloads on current NVIDIA Fermi GPUs. On two-program workloads from the Parboil2 benchmark suite we find concurrent execution is often no better than serialized execution. We identify that the lack of control over resource allocation to kernels is a major serialization bottleneck. We propose transformations that convert CUDA kernels into elastic kernels which permit fine-grained control over their resource usage. We then propose several elastic-kernel aware concurrency policies that offer significantly better performance and concurrency compared to the current CUDA policy. We evaluate our proposals on real hardware using multiprogrammed workloads constructed from benchmarks in the Parboil 2 suite. On average, our proposals increase system throughput (STP) by 1.21x and improve the average normalized turnaround time (ANTT) by 3.73x for two-program workloads when compared to the current CUDA concurrency implementation.
Resumo:
Let M be the completion of the polynomial ring C(z) under bar] with respect to some inner product, and for any ideal I subset of C (z) under bar], let I] be the closure of I in M. For a homogeneous ideal I, the joint kernel of the submodule I] subset of M is shown, after imposing some mild conditions on M, to be the linear span of the set of vectors {p(i)(partial derivative/partial derivative(w) over bar (1),...,partial derivative/partial derivative(w) over bar (m)) K-I] (., w)vertical bar(w=0), 1 <= i <= t}, where K-I] is the reproducing kernel for the submodule 2] and p(1),..., p(t) is some minimal ``canonical set of generators'' for the ideal I. The proof includes an algorithm for constructing this canonical set of generators, which is determined uniquely modulo linear relations, for homogeneous ideals. A short proof of the ``Rigidity Theorem'' using the sheaf model for Hilbert modules over polynomial rings is given. We describe, via the monoidal transformation, the construction of a Hermitian holomorphic line bundle for a large class of Hilbert modules of the form I]. We show that the curvature, or even its restriction to the exceptional set, of this line bundle is an invariant for the unitary equivalence class of I]. Several examples are given to illustrate the explicit computation of these invariants.
Resumo:
In this paper, we consider the inference for the component and system lifetime distribution of a k-unit parallel system with independent components based on system data. The components are assumed to have identical Weibull distribution. We obtain the maximum likelihood estimates of the unknown parameters based on system data. The Fisher information matrix has been derived. We propose -expectation tolerance interval and -content -level tolerance interval for the life distribution of the system. Performance of the estimators and tolerance intervals is investigated via simulation study. A simulated dataset is analyzed for illustration.
Resumo:
This paper presents the formulation and performance analysis of four techniques for detection of a narrowband acoustic source in a shallow range-independent ocean using an acoustic vector sensor (AVS) array. The array signal vector is not known due to the unknown location of the source. Hence all detectors are based on a generalized likelihood ratio test (GLRT) which involves estimation of the array signal vector. One non-parametric and three parametric (model-based) signal estimators are presented. It is shown that there is a strong correlation between the detector performance and the mean-square signal estimation error. Theoretical expressions for probability of false alarm and probability of detection are derived for all the detectors, and the theoretical predictions are compared with simulation results. It is shown that the detection performance of an AVS array with a certain number of sensors is equal to or slightly better than that of a conventional acoustic pressure sensor array with thrice as many sensors.
Resumo:
The problem of finding a satisfying assignment that minimizes the number of variables that are set to 1 is NP-complete even for a satisfiable 2-SAT formula. We call this problem MIN ONES 2-SAT. It generalizes the well-studied problem of finding the smallest vertex cover of a graph, which can be modeled using a 2-SAT formula with no negative literals. The natural parameterized version of the problem asks for a satisfying assignment of weight at most k. In this paper, we present a polynomial-time reduction from MIN ONES 2-SAT to VERTEX COVER without increasing the parameter and ensuring that the number of vertices in the reduced instance is equal to the number of variables of the input formula. Consequently, we conclude that this problem also has a simple 2-approximation algorithm and a 2k - c logk-variable kernel subsuming (or, in the case of kernels, improving) the results known earlier. Further, the problem admits algorithms for the parameterized and optimization versions whose runtimes will always match the runtimes of the best-known algorithms for the corresponding versions of vertex cover. Finally we show that the optimum value of the LP relaxation of the MIN ONES 2-SAT and that of the corresponding VERTEX COVER are the same. This implies that the (recent) results of VERTEX COVER version parameterized above the optimum value of the LP relaxation of VERTEX COVER carry over to the MIN ONES 2-SAT version parameterized above the optimum of the LP relaxation of MIN ONES 2-SAT. (C) 2013 Elsevier B.V. All rights reserved.
Resumo:
The Lovasz θ function of a graph, is a fundamental tool in combinatorial optimization and approximation algorithms. Computing θ involves solving a SDP and is extremely expensive even for moderately sized graphs. In this paper we establish that the Lovasz θ function is equivalent to a kernel learning problem related to one class SVM. This interesting connection opens up many opportunities bridging graph theoretic algorithms and machine learning. We show that there exist graphs, which we call SVM−θ graphs, on which the Lovasz θ function can be approximated well by a one-class SVM. This leads to a novel use of SVM techniques to solve algorithmic problems in large graphs e.g. identifying a planted clique of size Θ(n√) in a random graph G(n,12). A classic approach for this problem involves computing the θ function, however it is not scalable due to SDP computation. We show that the random graph with a planted clique is an example of SVM−θ graph, and as a consequence a SVM based approach easily identifies the clique in large graphs and is competitive with the state-of-the-art. Further, we introduce the notion of a ''common orthogonal labeling'' which extends the notion of a ''orthogonal labelling of a single graph (used in defining the θ function) to multiple graphs. The problem of finding the optimal common orthogonal labelling is cast as a Multiple Kernel Learning problem and is used to identify a large common dense region in multiple graphs. The proposed algorithm achieves an order of magnitude scalability compared to the state of the art.
Resumo:
Rapid advancements in multi-core processor architectures coupled with low-cost, low-latency, high-bandwidth interconnects have made clusters of multi-core machines a common computing resource. Unfortunately, writing good parallel programs that efficiently utilize all the resources in such a cluster is still a major challenge. Various programming languages have been proposed as a solution to this problem, but are yet to be adopted widely to run performance-critical code mainly due to the relatively immature software framework and the effort involved in re-writing existing code in the new language. In this paper, we motivate and describe our initial study in exploring CUDA as a programming language for a cluster of multi-cores. We develop CUDA-For-Clusters (CFC), a framework that transparently orchestrates execution of CUDA kernels on a cluster of multi-core machines. The well-structured nature of a CUDA kernel, the growing popularity, support and stability of the CUDA software stack collectively make CUDA a good candidate to be considered as a programming language for a cluster. CFC uses a mixture of source-to-source compiler transformations, a work distribution runtime and a light-weight software distributed shared memory to manage parallel executions. Initial results on running several standard CUDA benchmark programs achieve impressive speedups of up to 7.5X on a cluster with 8 nodes, thereby opening up an interesting direction of research for further investigation.
Resumo:
We propose to employ bilateral filters to solve the problem of edge detection. The proposed methodology presents an efficient and noise robust method for detecting edges. Classical bilateral filters smooth images without distorting edges. In this paper, we modify the bilateral filter to perform edge detection, which is the opposite of bilateral smoothing. The Gaussian domain kernel of the bilateral filter is replaced with an edge detection mask, and Gaussian range kernel is replaced with an inverted Gaussian kernel. The modified range kernel serves to emphasize dissimilar regions. The resulting approach effectively adapts the detection mask according as the pixel intensity differences. The results of the proposed algorithm are compared with those of standard edge detection masks. Comparisons of the bilateral edge detector with Canny edge detection algorithm, both after non-maximal suppression, are also provided. The results of our technique are observed to be better and noise-robust than those offered by methods employing masks alone, and are also comparable to the results from Canny edge detector, outperforming it in certain cases.