942 results for "Subpixel precision"


Relevance:

10.00%

Abstract:

In this work, we evaluate the performance of a real-world image processing application that uses a cross-correlation algorithm to compare a given image with a reference image. The algorithm processes individual images represented as two-dimensional matrices of single-precision floating-point values using O(n^4) operations involving dot products and additions. We implement this algorithm on an nVidia GTX 285 GPU using CUDA, and also parallelize it for the Intel Xeon (Nehalem) and IBM Power7 processors, using both manual and automatic techniques. Pthreads and OpenMP with SSE and VSX vector intrinsics are used for the manually parallelized version, while a state-of-the-art optimization framework based on the polyhedral model is used for automatic compiler parallelization and optimization. The performance of this algorithm on the nVidia GPU suffers from (1) a smaller shared memory, (2) unaligned device memory access patterns, (3) expensive atomic operations, and (4) weaker single-thread performance. On commodity multi-core processors, the application dataset is small enough to fit in caches. When parallelized using a combination of task and short-vector data parallelism (via SSE/VSX), or through fully automatic optimization by the compiler, the application matches or beats the performance of the GPU version. The primary reasons for better multi-core performance include larger and faster caches, higher clock frequency, higher on-chip memory bandwidth, and better compiler optimization and support for parallelization. The best-performing versions on the Power7, Nehalem, and GTX 285 run in 1.02 s, 1.82 s, and 1.75 s, respectively. These results conclusively demonstrate that, under certain conditions, it is possible for a FLOP-intensive structured application running on a multi-core processor to match or even beat the performance of an equivalent GPU version.
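
To make the O(n^4) cost concrete, here is a minimal sequential sketch of a full 2-D cross-correlation between two n x n images. It assumes that "compare with a reference" means scoring every relative shift. The function names `cross_correlate` and `best_shift` are hypothetical, and Python floats are double rather than single precision. This is not the authors' CUDA or OpenMP code.

```python
def cross_correlate(image, ref):
    """Full 2-D cross-correlation of two n x n matrices (lists of lists).

    out[dy + n - 1][dx + n - 1] = sum of image[y][x] * ref[y - dy][x - dx]
    over all (y, x) where both indices are in bounds. There are (2n - 1)^2
    shifts and up to n^2 products per shift, hence O(n^4) work overall.
    """
    n = len(image)
    size = 2 * n - 1
    out = [[0.0] * size for _ in range(size)]
    for dy in range(-(n - 1), n):
        for dx in range(-(n - 1), n):
            acc = 0.0
            # Restrict (y, x) so that (y - dy, x - dx) stays inside ref.
            for y in range(max(0, dy), min(n, n + dy)):
                for x in range(max(0, dx), min(n, n + dx)):
                    acc += image[y][x] * ref[y - dy][x - dx]
            out[dy + n - 1][dx + n - 1] = acc
    return out


def best_shift(image, ref):
    """Return the (dy, dx) shift with the largest correlation score."""
    n = len(image)
    out = cross_correlate(image, ref)
    _, dy, dx = max((v, i - (n - 1), j - (n - 1))
                    for i, row in enumerate(out) for j, v in enumerate(row))
    return dy, dx
```

The four nested loops are the part that the GPU, SSE/VSX, and polyhedral versions would each parallelize and vectorize in their own way.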

Relevance:

10.00%

Abstract:

Learning to rank from relevance judgments is an active research area. Itemwise score regression, pairwise preference satisfaction, and listwise structured learning are the major techniques in use. Listwise structured learning has recently been applied to optimize important non-decomposable ranking criteria such as AUC (area under the ROC curve) and MAP (mean average precision). We propose new, almost-linear-time algorithms to optimize for two other criteria widely used to evaluate search systems, MRR (mean reciprocal rank) and NDCG (normalized discounted cumulative gain), in the max-margin structured learning framework. We also demonstrate that different ranking criteria may require different feature maps. Search applications should not be optimized in favor of a single criterion, because they need to cater to a variety of queries; e.g., MRR is best for navigational queries, while NDCG is best for informational queries. A key contribution of this paper is to fold multiple ranking loss functions into a multi-criteria max-margin optimization. The result is a single, robust ranking model that is close to the best accuracy of learners trained on individual criteria. In fact, experiments over the popular LETOR and TREC data sets show that, contrary to conventional wisdom, a test criterion is often not best served by training with the same individual criterion.
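
For reference, the two evaluation criteria named above can be computed as follows. This is a minimal sketch. It assumes graded relevance labels listed in ranked order and the common NDCG variant with gain 2^rel - 1 and a log2(position + 1) discount; other NDCG variants exist. It shows only the metrics, not the paper's max-margin training algorithms.

```python
import math


def reciprocal_rank(relevances):
    """1 / (1-based position of the first relevant item), or 0 if none."""
    for i, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / i
    return 0.0


def mean_reciprocal_rank(rankings):
    """Average reciprocal rank over a list of queries' ranked relevance lists."""
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)


def dcg(relevances, k):
    # Graded gain 2^rel - 1, discounted by log2(position + 1).
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(relevances[:k], start=1))


def ndcg(relevances, k):
    """DCG@k normalized by the DCG@k of the ideal (sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0
```

MRR depends only on the first relevant result, which is why it suits navigational queries. NDCG rewards the whole graded ordering.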

Relevance:

10.00%

Abstract:

Theoretical approaches are of fundamental importance for predicting the potential impact of waste disposal facilities on groundwater contamination. Appropriate design parameters are generally estimated by fitting theoretical models to field monitoring or laboratory experimental data. Double-reservoir diffusion (transient through-diffusion) experiments are generally conducted in the laboratory to estimate the mass transport parameters of the proposed barrier material. These design parameters are estimated by manual parameter-adjustment techniques (also called eye-fitting) with tools such as Pollute. In this work, an automated inverse model is developed to estimate the mass transport parameters from transient through-diffusion experimental data. The proposed inverse model uses the particle swarm optimization (PSO) algorithm, which is inspired by the social behaviour of animals searching for food sources. A finite-difference numerical solution of the transient through-diffusion mathematical model is integrated with the PSO algorithm to solve the inverse problem of parameter estimation. The working principle of the new solver is demonstrated by estimating mass transport parameters from published transient through-diffusion experimental data. The estimated values are compared with the values obtained by the existing procedure. The present technique is robust and efficient: the mass transport parameters are obtained with very good precision in less time.
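
The PSO search loop can be sketched as a basic global-best particle swarm minimizing an objective over box bounds. In the inverse model described above, `objective` would run the finite-difference through-diffusion model for a candidate parameter vector and return its misfit to the measured data. Here a quadratic stand-in is assumed instead. The inertia weight `w` and acceleration coefficients `c1`, `c2` are common textbook choices, not values from the paper.

```python
import random


def pso(objective, bounds, n_particles=30, n_iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                  # personal best positions
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]      # swarm-wide best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to box
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val
```

For example, `pso(lambda p: (p[0] - 2) ** 2 + (p[1] + 1) ** 2, [(-5, 5), (-5, 5)])` should return a position close to `(2, -1)`.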

Relevance:

10.00%

Abstract:

Over the past few years, studies of cultured neuronal networks have opened up avenues for understanding the ion channels, receptor molecules, and synaptic plasticity that may form the basis of learning and memory. Hippocampal neurons from rats are dissociated and cultured on a surface containing a grid of 64 electrodes. The signals from these 64 electrodes are acquired using a fast data acquisition system, the MED64 (Alpha MED Sciences, Japan), at a sampling rate of 20 k samples per second with a precision of 16 bits per sample. A few minutes of acquired data runs into a few hundred megabytes. Data processing for the neural analysis is highly compute-intensive because the volume of data is huge. The major processing requirements are noise removal, pattern recovery, pattern matching, clustering, and so on. To interface a neuronal colony to the physical world, these computations need to be performed in real time. A single processor such as a desktop computer may not be adequate to meet these computational requirements. Parallel computing is one way to satisfy the real-time computational requirements of a neuronal system that interacts with the external world, while also increasing the flexibility and scalability of the application. In this work, we developed a parallel neuronal system using a multi-node digital signal processing (DSP) system. With 8 processors, the system is able to compute and map incoming signals, segmented over periods of 200 ms, into an action in a trained cluster system in real time.
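
The data-volume claim can be checked with simple arithmetic. The sketch below assumes that "20 k samples" means 20,000 samples per second per channel and that each 16-bit sample occupies 2 bytes. Under those assumptions, 64 channels produce 2.56 MB/s, so three minutes is about 461 MB, and one 200 ms segment holds 256,000 samples.

```python
CHANNELS = 64
SAMPLE_RATE_HZ = 20_000   # assumption: 20 k samples per second, per channel
BYTES_PER_SAMPLE = 2      # 16-bit samples


def bytes_per_second():
    """Raw acquisition rate across all electrodes."""
    return CHANNELS * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE


def megabytes_for(minutes):
    """Raw data volume (in 10^6 bytes) for a recording of the given length."""
    return bytes_per_second() * 60 * minutes / 1e6


def samples_per_segment(segment_ms=200):
    """Samples (all channels) in one processing segment of `segment_ms` ms."""
    return CHANNELS * SAMPLE_RATE_HZ * segment_ms // 1000
```

Each 200 ms segment must be fully processed within 200 ms to keep up in real time. That is the budget the 8-processor system has to meet.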

Relevance:

10.00%

Abstract:

Context-sensitive points-to analysis is critical for several program optimizations. However, as the number of contexts grows exponentially, storage requirements for the analysis increase tremendously for large programs, making the analysis non-scalable. We propose a scalable flow-insensitive, context-sensitive, inclusion-based points-to analysis that uses a specially designed multi-dimensional Bloom filter to store the points-to information. Two key observations motivate our proposal: (i) points-to information (pointer-to-object and pointer-to-pointer) is sparse, and (ii) moving from an exact to an approximate representation of points-to information only reduces precision, without affecting the correctness of the (may-points-to) analysis. By using an approximate representation, a multi-dimensional Bloom filter can significantly reduce memory requirements with a probabilistic bound on the loss in precision. Experimental evaluation on SPEC 2000 benchmarks and two large open-source programs reveals that, with an average storage requirement of 4 MB, our approach achieves almost the same precision (98.6%) as the exact implementation. By increasing the average memory to 27 MB, it achieves precision of up to 99.7% for these benchmarks. Using Mod/Ref analysis as the client, we find that the client analysis is rarely affected even when there is some loss of precision in the points-to representation. The NoModRef percentage is within 2% of that of the exact analysis, while requiring 4 MB (maximum 15 MB) of memory and less than 4 minutes on average for the points-to analysis. Another major advantage of our technique is that it allows precision to be traded off against the memory usage of the analysis.
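
The soundness argument (approximation loses precision but never correctness) follows from a basic property of Bloom filters: they can report false positives but never false negatives. The sketch below is an ordinary one-dimensional Bloom filter storing (pointer, object) facts. It is not the paper's multi-dimensional design. Deriving the k bit positions from salted SHA-256 digests is an illustrative choice.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: membership tests may give false positives, never false negatives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests of the item's repr.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))
```

A false positive such as `might_contain(("q", "o2"))` returning `True` only adds a spurious may-points-to fact. A real fact is never dropped. Enlarging `num_bits` lowers the false-positive rate, which mirrors the memory-for-precision trade-off reported above.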

Relevance:

10.00%

Abstract:

We propose a new abstract domain for static analysis of executable code. Concrete states are abstracted using circular linear progressions (CLPs). CLPs model computations using a finite word length, as in any real-life processor. The finite abstraction allows overflow scenarios to be handled in a natural and straightforward manner. Abstract transfer functions have been defined for a wide range of operations, which makes this domain easily applicable to code for a wide range of ISAs. CLPs combine the scalability of interval domains with the discreteness of linear congruence domains. We also present a novel, lightweight method to track linear equality relations between static objects, which the analysis uses to improve precision. The analysis is efficient, with total space and time overhead quadratic in the number of static objects being tracked.
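
The wrap-around idea can be illustrated with a hypothetical (base, stride, count) encoding of a progression modulo 2^w. This encoding is assumed for illustration and may differ from the paper's exact CLP representation. The example shows concretization and one sound abstract addition whose result stride is the gcd of the input strides. The 8-bit word length is also an assumption.

```python
from math import gcd

WIDTH = 8            # assumed machine word length in bits
MOD = 1 << WIDTH


def concretize(clp):
    """Values {(base + i * stride) mod 2^WIDTH : 0 <= i <= count}."""
    base, stride, count = clp
    return {(base + i * stride) % MOD for i in range(count + 1)}


def add(a, b):
    """Sound abstract addition: every concrete sum lies in the result."""
    (b1, s1, n1), (b2, s2, n2) = a, b
    s1 = s1 if n1 else 0          # a singleton's stride is irrelevant
    s2 = s2 if n2 else 0
    g = gcd(s1, s2)               # all offsets i*s1 + j*s2 are multiples of g
    span = n1 * s1 + n2 * s2      # largest combined offset
    count = span // g if g else 0
    return ((b1 + b2) % MOD, g, count)
```

For instance, `concretize((250, 2, 3))` is `{250, 252, 254, 0}`: the last element overflows and wraps to 0, just as an 8-bit register would.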

Relevance:

10.00%

Abstract:

This paper describes techniques to estimate the worst-case execution time of executable code on architectures with data caches. The underlying mechanism is abstract interpretation, which is used for the dual purposes of tracking address computations and cache behavior. A simultaneous numeric and pointer analysis, using an abstraction for discrete sets of values, computes safe approximations of access addresses, which are then used to predict cache behavior using Must Analysis. A heuristic is also proposed that generates likely worst-case estimates. It can be used in soft real-time systems and also for reasoning about the tightness of the safe estimate. The analysis methods can handle programs with non-affine access patterns, for which conventional Presburger Arithmetic formulations or Cache Miss Equations do not apply. The precision of the estimates is user-controlled and can be traded off against analysis time. Executables are analyzed directly, which, apart from enhancing precision, renders the method language-independent.
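
Must Analysis is commonly formulated for LRU caches as a map from memory block to an upper bound on its age. The sketch below implements that textbook update and join for a single fully associative LRU set. It assumes that each access touches one known block. The paper's analysis, by contrast, works from computed sets of possible addresses.

```python
def must_update(state, block, assoc):
    """Abstract LRU 'must' update; state maps block -> upper bound on its age."""
    old = state.get(block, assoc)          # absent block: treat its age as assoc
    new = {}
    for b, age in state.items():
        if b == block:
            continue
        # Only blocks younger than the accessed one get older.
        aged = age + 1 if age < old else age
        if aged < assoc:                   # aged out: no longer guaranteed cached
            new[b] = aged
    new[block] = 0
    return new


def must_join(s1, s2):
    """Merge at control-flow joins: keep common blocks with the larger age."""
    return {b: max(s1[b], s2[b]) for b in s1.keys() & s2.keys()}


def always_hit(state, block):
    """An access is a guaranteed hit if the block is in the must-cache."""
    return block in state
```

Because the join keeps only blocks present on both paths, `always_hit` never claims a hit that could miss. This is what makes the resulting execution-time estimate safe.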

Relevance:

10.00%

Abstract:

Compiler optimizations need precise and scalable analyses to discover program properties. We propose a partially flow-sensitive framework that tries to draw on the scalability of flow-insensitive algorithms while providing more precision at some specific program points. Given a set of critical nodes (basic blocks at which more precise information is desired), our partially flow-sensitive algorithm computes a reduced control-flow graph by collapsing some sets of non-critical nodes. The algorithm is more scalable than a fully flow-sensitive one because, assuming that the number of critical nodes is small, the reduced flow graph is much smaller than the original flow graph. At the same time, much more precise information is obtained at certain program points than would have been obtained from a flow-insensitive algorithm.
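
The abstract does not say exactly which sets of non-critical nodes are collapsed. The sketch below assumes one simple rule: merge each weakly connected group of non-critical nodes into a single node and keep critical nodes as they are. The result is the quotient graph on which a flow-sensitive pass would run. The function name `reduce_cfg` and the `("group", root)` node labels are hypothetical.

```python
def reduce_cfg(edges, critical):
    """Collapse each weakly connected group of non-critical nodes into one node.

    edges: iterable of (src, dst) pairs; critical: set of nodes kept as-is.
    Returns (mapping from original node to reduced node, set of reduced edges).
    """
    edges = list(edges)
    nodes = {n for e in edges for n in e}
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for u, v in edges:
        if u not in critical and v not in critical:
            parent[find(u)] = find(v)

    rep = {n: n if n in critical else ("group", find(n)) for n in nodes}
    # Quotient graph: map every edge onto representatives (self-loops kept).
    reduced = {(rep[u], rep[v]) for u, v in edges}
    return rep, reduced
```

With `edges = [("entry", "a"), ("a", "b"), ("b", "c"), ("c", "d"), ("d", "exit")]` and critical nodes `{"entry", "c", "exit"}`, blocks `a` and `b` merge into one node and the graph shrinks from 6 to 5 nodes. Flow-sensitive results remain available at `c`.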

Relevance:

10.00%

Abstract:

An all-digital on-chip clock skew measurement system based on subsampling is presented. The clock nodes are subsampled with a near-frequency asynchronous sampling clock, producing beat signals that are skewed in the same proportion but on a larger time scale. The beat signals are then suitably masked to extract only the skews of the rising edges of the clock signals. We propose a histogram of the arithmetic difference of the beat signals, which decouples clock jitter from the minimum measurable skew. Unlike the conventional XOR-based histogram approach, this allows skews arbitrarily close to zero to be measured with a precision limited largely by measurement time. We also show analytically that the proposed approach leads to an unbiased estimate of skew. Measured results from a 65 nm delay-measurement front-end indicate that, for an input skew range of ±1 fan-out-of-4 (FO4) delay, a ±3σ resolution of 0.84 ps can be obtained with an integral error of 0.65 ps. We also demonstrate experimentally that frequency modulation of the sampling clock maintains precision, indicating the robustness of the technique to jitter. Finally, we show how frequency modulation helps restore precision when the clocks are rationally related.
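
The "larger time scale" follows from first-order subsampling arithmetic. A clock at f_clk sampled at a nearby f_sample produces a beat at |f_clk - f_sample|. A skew that is a given fraction of a clock period therefore becomes the same fraction of a beat period, which stretches it by f_clk / |f_clk - f_sample|. The sketch assumes ideal, jitter-free sampling. The 1 GHz / 0.999 GHz figures are illustrative and not from the paper.

```python
def beat_frequency(f_clk, f_sample):
    """Frequency (Hz) of the beat produced by subsampling f_clk at f_sample."""
    return abs(f_clk - f_sample)


def magnification(f_clk, f_sample):
    """Factor by which a clock-edge skew is stretched in the beat signal."""
    return f_clk / beat_frequency(f_clk, f_sample)


def skew_from_beat(beat_skew_s, f_clk, f_sample):
    """Convert a skew measured between beat signals back to the clock domain."""
    return beat_skew_s / magnification(f_clk, f_sample)
```

With `f_clk = 1e9` and `f_sample = 0.999e9`, the beat is 1 MHz and the magnification is 1000. A 1 ns offset between beat signals then corresponds to a 1 ps clock skew.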

Relevance:

10.00%

Abstract:

The standard Gibbs energy of formation of Rh2O3 at high temperature has recently been determined with high precision. The new data are significantly different from those given in thermodynamic compilations. Accurate values for the enthalpy and entropy of formation at 298.15 K could not be evaluated from the new data, because reliable values for the heat capacity of Rh2O3 were not available. In this article, a new measurement of the high-temperature heat capacity of Rh2O3 using differential scanning calorimetry (DSC) is presented. The new values for heat capacity also differ significantly from those given in compilations. The heat capacity information is coupled with the standard Gibbs energy of formation to evaluate the standard enthalpy and entropy of formation at 298.15 K using a multivariate analysis. The results suggest a major revision of the thermodynamic data for Rh2O3. For example, it is recommended that the standard entropy of Rh2O3 at 298.15 K be changed from 106.27 J mol⁻¹ K⁻¹, given in the compilations of Barin and of Knacke et al., to 75.69 J mol⁻¹ K⁻¹. The recommended revision of the standard enthalpy of formation is from −355.64 kJ mol⁻¹ to −405.53 kJ mol⁻¹.
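
The standard thermodynamic relation below shows why heat capacity data are needed to extract 298.15 K values from high-temperature Gibbs energies. It is written for the formation reaction 2 Rh(s) + 3/2 O2(g) → Rh2O3(s). Once ΔCp is known, the two unknowns ΔfH° and ΔfS° at 298.15 K enter linearly, so they can be fitted to ΔfG°(T) measured at several temperatures. This illustrates the coupling only and is not necessarily the paper's exact multivariate procedure.

```latex
% Formation reaction: 2 Rh(s) + 3/2 O2(g) -> Rh2O3(s)
\Delta C_p(T') = C_p(\mathrm{Rh_2O_3}) - 2\,C_p(\mathrm{Rh}) - \tfrac{3}{2}\,C_p(\mathrm{O_2})

\Delta_f G^\circ(T) = \Delta_f H^\circ_{298.15}
  + \int_{298.15}^{T} \Delta C_p(T')\,dT'
  - T\left[\Delta_f S^\circ_{298.15}
  + \int_{298.15}^{T} \frac{\Delta C_p(T')}{T'}\,dT'\right]
```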

Relevance:

10.00%

Abstract:

The standard Gibbs energy of formation of ReO2 in the temperature range from 900 to 1200 K has been determined with high precision using a novel apparatus incorporating a buffer electrode between the reference and working electrodes. The role of the buffer electrode was to absorb the electrochemical flux of oxygen through the solid electrolyte from the electrode with higher oxygen chemical potential to the electrode with lower oxygen potential. It prevented polarization of the measuring electrode and ensured accurate data. The Re+ReO2 working electrode was placed in a closed stabilized-zirconia crucible to prevent continuous vaporization of Re2O7 at high temperatures. The standard Gibbs energy of formation of ReO2 can be represented by the equation [equation not reproduced]. Accurate values of the low- and high-temperature heat capacity of ReO2 are available in the literature. The thermal data are coupled with the standard Gibbs energy of formation obtained in this study to evaluate the standard enthalpy of formation of ReO2 at 298.15 K by the "third law" method. The value of the standard enthalpy of formation at 298.15 K is ΔfH°298.15(ReO2) = −445.1 (±0.2) kJ mol⁻¹. The uncertainty estimate includes both random (2σ) and systematic errors.
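
The "third law" method referred to above is usually written with the free-energy function (fef) as shown below. Each temperature at which ΔfG° is measured yields an independent estimate of ΔfH° at 298.15 K. A lack of systematic drift of these estimates with temperature is the customary consistency check. The equations are the standard form, stated here for Re(s) + O2(g) → ReO2(s).

```latex
% Third-law evaluation for Re(s) + O2(g) -> ReO2(s)
\mathrm{fef}(T) = \frac{G^\circ_T - H^\circ_{298.15}}{T}

\Delta\,\mathrm{fef}(T) = \mathrm{fef}(\mathrm{ReO_2}) - \mathrm{fef}(\mathrm{Re}) - \mathrm{fef}(\mathrm{O_2})

\Delta_f H^\circ_{298.15} = \Delta_f G^\circ_T - T\,\Delta\,\mathrm{fef}(T)
```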

Relevance:

10.00%

Abstract:

Two-axis micromanipulators, whose tip orientation and position can be controlled in real time in the scanning plane, enable versatile probing systems for 2.5-D nanometrology. The key to achieving high-precision probing systems is to accurately control the interaction point of the manipulator tip when its orientation is changed. This paper presents the development of a probing system in which the deviation of the end point due to large orientation changes is controlled to within 10 nm. To achieve this, a novel micromanipulator design is first proposed, in which the end point of the tip is located on the axis of rotation. Next, the residual tip motion caused by fabrication error and actuation crosstalk is modeled, and a systematic method to compensate for it is presented. The manipulator is fabricated, and the performance of the developed scheme for controlling tip position during orientation changes is validated experimentally. Finally, the two-axis probing system is demonstrated by scanning the full top surface of a micropipette down to a diameter of 300 nm.
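
The benefit of placing the tip end point on the axis of rotation follows from elementary geometry, idealized here as a single in-plane rotation. An end point at offset r from the axis moves by a chord of length 2|r| sin(θ/2) when the tip rotates by θ. This displacement vanishes only when r = 0. The 90° case below is a hypothetical illustration of how small the residual offset must be to stay within the 10 nm target.

```latex
% End point at offset vector r from the rotation axis, rotated by angle theta
\lVert R(\theta)\,\mathbf{r} - \mathbf{r} \rVert
  = 2\,\lVert \mathbf{r} \rVert\,\bigl|\sin(\theta/2)\bigr|
  \approx \lVert \mathbf{r} \rVert\,|\theta| \quad (\text{small } \theta)

% Hypothetical example: deviation <= 10 nm at theta = 90 degrees requires
\lVert \mathbf{r} \rVert \le \frac{10\ \mathrm{nm}}{2\sin 45^\circ} \approx 7.07\ \mathrm{nm}
```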

Relevance:

10.00%

Abstract:

Null dereferences are a bane of programming in languages such as Java. In this paper, we propose a sound, demand-driven, inter-procedurally context-sensitive dataflow analysis technique to verify a given dereference as safe or potentially unsafe. Our analysis uses an abstract lattice of formulas to find a precondition at the entry of the program such that a null dereference can occur only if the initial state of the program satisfies this precondition. We use a simplified domain of formulas, abstracting out integer arithmetic as well as unbounded access paths due to recursive data structures. For the sake of precision, we model aliasing relationships explicitly in our abstract lattice, enable strong updates, and use a limited notion of path sensitivity. For the sake of scalability, we prune formulas continually as they are propagated, reducing to true the conjuncts that are less likely to be useful in validating or invalidating the formula. We have implemented our approach and present an evaluation of it on a set of ten real Java programs. Our results show that the set of design features we have incorporated enables the analysis to (a) explore long, inter-procedural paths to verify each dereference, with (b) reasonable accuracy, and (c) very quick response time per dereference, making it suitable for use in desktop development environments.
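
The core idea of pushing a "may be null" condition backward to a program-entry precondition can be illustrated on straight-line copy statements. This is a deliberately tiny sketch, and the statement encoding `(lhs, rhs)` with `rhs` a variable name, `"null"`, or `"new"` is hypothetical. It ignores aliasing, fields, branches, and calls, all of which the paper's analysis handles.

```python
def null_precondition(stmts, var):
    """Propagate the condition `var == null` backward through straight-line code.

    stmts: list of (lhs, rhs) assignments; rhs is a variable name, "null" or "new".
    Returns "true", "false", or "<v> == null" for an entry variable v.
    """
    target = var
    for lhs, rhs in reversed(stmts):
        if lhs != target:
            continue                 # assignment to another variable: no effect
        if rhs == "null":
            return "true"            # definitely null at the dereference
        if rhs == "new":
            return "false"           # freshly allocated: never null
        target = rhs                 # copy: the source variable must be null
    return f"{target} == null"
```

A result of `"false"` proves the dereference safe. `"true"` shows that it always fails, and any other formula is the entry precondition under which it may fail.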

Relevance:

10.00%

Abstract:

Reaction wheel assemblies (RWAs) are momentum exchange devices used in fine pointing control of spacecraft. Even though the spinning rotor of a reaction wheel is precisely balanced to minimize the vibration emitted due to static and dynamic imbalances, precision instrument payloads placed nearby can still be severely affected by the residual vibration forces emitted by RWAs. The vibration level at sensitive payloads can be reduced by placing the RWA on appropriate mountings. A low-frequency flexible space platform consisting of folded continuous beams has been designed to serve as a mount for isolating a disturbance source in spacecraft equipped with precision payloads. Analytical and experimental investigations have been carried out to test the usefulness of the low-frequency flexible platform as a vibration isolator for RWAs. Measurements and tests have been conducted at varying wheel speeds to quantify and characterize the isolation achieved against reaction-wheel-generated vibration. These tests are further extended to other variants of similar design in order to identify the best isolation for given disturbance loads. Both time- and frequency-domain analyses of the test data show that the flexible beam platform is quite effective as a mount for reaction wheels and can be used in spacecraft for passive vibration control. © 2011 Elsevier Ltd. All rights reserved.
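
Why a low-frequency mount isolates can be seen from the classic single-degree-of-freedom force transmissibility. Transmitted force falls below the input only when the excitation frequency exceeds √2 times the mount's natural frequency. A low-frequency platform therefore pushes wheel-speed harmonics into the isolation region. This is an idealized textbook model assumed for illustration, not the paper's folded-beam platform model.

```python
import math


def transmissibility(freq_ratio, damping_ratio):
    """Force transmissibility of a single-DOF isolator.

    freq_ratio: excitation frequency / natural frequency of the mount.
    damping_ratio: viscous damping ratio (zeta).
    """
    r, z = freq_ratio, damping_ratio
    num = 1 + (2 * z * r) ** 2
    den = (1 - r ** 2) ** 2 + (2 * z * r) ** 2
    return math.sqrt(num / den)
```

At `freq_ratio = 3` with no damping, only 1/8 of the disturbance force is transmitted. Below √2 the mount amplifies rather than isolates.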