942 results for real-effort task
Abstract:
In this work, we evaluate the performance of a real-world image processing application that uses a cross-correlation algorithm to compare a given image with a reference one. The algorithm processes individual images represented as 2-dimensional matrices of single-precision floating-point values using O(n^4) operations involving dot-products and additions. We implement this algorithm on an NVIDIA GTX 285 GPU using CUDA, and also parallelize it for the Intel Xeon (Nehalem) and IBM Power7 processors, using both manual and automatic techniques. Pthreads and OpenMP with SSE and VSX vector intrinsics are used for the manually parallelized version, while a state-of-the-art optimization framework based on the polyhedral model is used for automatic compiler parallelization and optimization. The performance of this algorithm on the NVIDIA GPU suffers from: (1) a smaller shared memory, (2) unaligned device memory access patterns, (3) expensive atomic operations, and (4) weaker single-thread performance. On commodity multi-core processors, the application dataset is small enough to fit in caches, and when parallelized using a combination of task and short-vector data parallelism (via SSE/VSX) or through fully automatic optimization by the compiler, the application matches or beats the performance of the GPU version. The primary reasons for better multi-core performance include larger and faster caches, higher clock frequency, higher on-chip memory bandwidth, and better compiler optimization and support for parallelization. The best-performing versions on the Power7, Nehalem, and GTX 285 run in 1.02 s, 1.82 s, and 1.75 s, respectively. These results conclusively demonstrate that, under certain conditions, it is possible for a FLOP-intensive structured application running on a multi-core processor to match or even beat the performance of an equivalent GPU version.
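For illustration only (not the authors' CUDA or vectorized implementation), the following is a minimal sketch of the naive O(n^4) 2D cross-correlation described above, written in Python/NumPy with single-precision data; the function name and the image/reference shapes are assumptions:

    import numpy as np

    def cross_correlate(image, ref):
        # Naive O(n^4) cross-correlation: for every offset of the reference
        # over the image, accumulate the dot-product of the overlapping region.
        H, W = image.shape
        h, w = ref.shape
        out = np.zeros((H - h + 1, W - w + 1), dtype=np.float32)
        for i in range(H - h + 1):
            for j in range(W - w + 1):
                out[i, j] = np.sum(image[i:i + h, j:j + w] * ref)
        return out

    # example: compare a 256x256 single-precision image against a 64x64 reference
    img = np.random.rand(256, 256).astype(np.float32)
    ref = np.random.rand(64, 64).astype(np.float32)
    scores = cross_correlate(img, ref)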
Abstract:
The primary objective of the paper is to make use of a statistical digital human model to better understand the nature of the reach probability of points in the task space. The concept of a task-dependent boundary manikin is introduced to geometrically characterize the extreme individuals in a given population who would accomplish the task. For a given point of interest and task, the map of the acceptable variation in anthropometric parameters is superimposed on the distribution of the same parameters in the given population to identify the extreme individuals. To illustrate the concept, the task-space mapping is done for the reach probability of human arms. Unlike boundary manikins, which are completely defined by the population, the dimensions of these manikins vary with the task, say, a point to be reached, as in the present case. Hence they are referred to here as task-dependent boundary manikins. Simulations with these manikins would help designers visualize how differently the extreme individuals would perform the task. Reach probability at the points of a 3D grid in the operational space is computed; for objects overlaid on this grid, approximate probabilities are derived from the grid for rendering them with colors indicating the reach probability. The method may also help in providing a rational basis for the selection of personnel for a given task.
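As a rough, hypothetical illustration of the grid-based reach-probability idea (not the paper's manikin model), the sketch below estimates reach probability at 3D grid points by Monte Carlo sampling of two arm-segment lengths from an assumed population distribution; every distribution, dimension, and name here is an assumption:

    import numpy as np

    def reach_probability(point, shoulder, n_samples=10000, seed=0):
        # Toy two-segment arm (upper arm + forearm) with normally distributed
        # segment lengths (metres); a sampled individual reaches the point if
        # it lies within the annulus [|l1 - l2|, l1 + l2] around the shoulder.
        rng = np.random.default_rng(seed)
        upper = rng.normal(0.33, 0.02, n_samples)   # upper-arm length samples
        fore = rng.normal(0.27, 0.02, n_samples)    # forearm length samples
        dist = np.linalg.norm(np.asarray(point) - np.asarray(shoulder))
        reachable = (dist <= upper + fore) & (dist >= np.abs(upper - fore))
        return reachable.mean()

    # reach-probability map over a coarse 3D grid around the shoulder
    grid = [(x, y, z) for x in np.linspace(0.0, 0.8, 5)
                      for y in np.linspace(-0.4, 0.4, 5)
                      for z in np.linspace(-0.4, 0.4, 5)]
    probs = [reach_probability(p, shoulder=(0.0, 0.0, 0.0)) for p in grid]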
Abstract:
Over the past few years, studies of cultured neuronal networks have opened up avenues for understanding the ion channels, receptor molecules, and synaptic plasticity that may form the basis of learning and memory. Hippocampal neurons from rats are dissociated and cultured on a surface containing a grid of 64 electrodes. The signals from these 64 electrodes are acquired using a fast data acquisition system, MED64 (Alpha MED Sciences, Japan), at a sampling rate of 20 K samples per second with a precision of 16 bits per sample. A few minutes of acquired data runs into a few hundred megabytes. The data processing for the neural analysis is highly compute-intensive because the volume of data is huge. The major processing requirements are noise removal, pattern recovery, pattern matching, clustering, and so on. In order to interface a neuronal colony to the physical world, these computations need to be performed in real time. A single processor, such as a desktop computer, may not be adequate to meet these computational requirements. Parallel computing is a method used to satisfy the real-time computational requirements of a neuronal system that interacts with an external world while increasing the flexibility and scalability of the application. In this work, we developed a parallel neuronal system using a multi-node digital signal processing (DSP) system. With 8 processors, the system is able to compute and map incoming signals, segmented over a period of 200 ms, into an action in a trained cluster system in real time.
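The authors' system runs on a multi-node DSP platform; purely as an illustration of the data decomposition (not their implementation), the sketch below splits a 64-channel, 20 K samples/s recording into 200 ms windows and farms them out to 8 worker processes, with a placeholder analysis step standing in for the actual noise removal and pattern matching:

    import numpy as np
    from multiprocessing import Pool

    FS = 20_000          # samples per second per channel (from the abstract)
    WINDOW = FS // 5     # 200 ms window -> 4000 samples per channel
    CHANNELS = 64

    def analyse_window(window):
        # placeholder for noise removal / spike detection / pattern matching:
        # count samples exceeding 4 standard deviations on each channel
        return int((np.abs(window) > 4 * window.std(axis=1, keepdims=True)).sum())

    def process_recording(data, n_workers=8):
        # split the 64-channel recording into consecutive 200 ms windows
        windows = [data[:, i:i + WINDOW]
                   for i in range(0, data.shape[1] - WINDOW + 1, WINDOW)]
        with Pool(n_workers) as pool:
            return pool.map(analyse_window, windows)

    if __name__ == "__main__":
        fake = np.random.randn(CHANNELS, FS * 2).astype(np.float32)  # 2 s of synthetic data
        print(process_recording(fake))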
Abstract:
Real-time simulation of deformable solids is essential for applications such as biological organ simulation in surgical simulators. In this work, deformable solids are approximated as linear elastic, and an easy and straightforward numerical technique, the Finite Point Method (FPM), is used to model three-dimensional linear elastostatics. A Graphics Processing Unit (GPU) is used to accelerate the computations. Results show that the Finite Point Method, together with the GPU, can compute three-dimensional linear elastostatic responses of solids at rates suitable for real-time graphics, for solids represented by a reasonable number of points.
Abstract:
To realistically simulate the motion of flexible objects such as ropes, strings, snakes, or human hair, one strategy is to discretise the object into a large number of small rigid links connected by rotary or spherical joints. The discretised system is highly redundant, and the rotations at the joints (or the motion of the other links) for a desired Cartesian motion of the end of a link cannot be solved for uniquely. In this paper, we propose a novel strategy to resolve the redundancy in such hyper-redundant systems. We make use of the classical tractrix curve and its attractive features. For a desired Cartesian motion of the `head' of a link, the `tail' of the link is moved according to a tractrix, and recursively all links of the discretised object are moved along different tractrix curves. We show that the use of a tractrix curve leads to a more `natural' motion of the entire object, since the motion is distributed uniformly along the entire object with the displacements tending to diminish from the `head' to the `tail'. We also show that the computation of the motion of the links can be done in real time, since it involves evaluation of simple algebraic, trigonometric, and hyperbolic functions. The strategy is illustrated by simulations of a snake, the tying of knots with a rope, and a solution of the inverse kinematics of a planar hyper-redundant manipulator.
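The paper's closed-form tractrix solution involves hyperbolic functions; the sketch below is only a first-order discrete approximation of the same idea (the tail of each link moves along the line joining it to its head, keeping the link length constant), with all names and the example chain assumed:

    import numpy as np

    def tractrix_step(head_new, tail_old, link_len):
        # First-order tractrix update: the tail moves along the line joining
        # it to the new head position, preserving the link length.
        d = head_new - tail_old
        return head_new - link_len * d / np.linalg.norm(d)

    def move_chain(joints, head_target, link_len):
        # Propagate a desired head motion down a chain of equal-length links:
        # each joint's new position becomes the 'head' for the link below it.
        new = [np.asarray(head_target, dtype=float)]
        for tail in joints[1:]:
            new.append(tractrix_step(new[-1], np.asarray(tail, dtype=float), link_len))
        return new

    # example: a 10-link 'rope' hanging along the y-axis, head pulled in +x
    chain = [np.array([0.0, -float(i)]) for i in range(11)]
    for _ in range(50):                      # 50 small head displacements
        chain = move_chain(chain, chain[0] + np.array([0.02, 0.0]), link_len=1.0)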
Abstract:
Image and video filtering is a key image-processing task in computer vision, especially in noisy environments. In most cases the noise source is unknown, which poses a major difficulty for the filtering operation. In this paper we present an error-correction-based learning approach to iterative filtering. A new FIR filter is designed in which the filter coefficients are updated based on the Widrow-Hoff rule. Unlike a standard filter, the proposed filter has the ability to remove noise without a priori knowledge of the noise. Experimental results show that the proposed filter efficiently removes noise and preserves the edges in the image. We demonstrate the capability of the proposed algorithm by testing it on standard images corrupted by Gaussian noise and on a real-time video containing inherent noise. Experimental results show that the proposed filter is better than some of the existing standard filters.
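The abstract does not give the filter structure in detail; the following sketch shows the underlying Widrow-Hoff (LMS) coefficient update on a 1D signal, with a clean reference standing in for whatever error signal the paper actually derives, and with all parameter values chosen purely for illustration:

    import numpy as np

    def lms_filter(noisy, desired, n_taps=9, mu=0.01):
        # Adaptive FIR filter with Widrow-Hoff (LMS) updates:
        # w <- w + mu * error * input_window
        w = np.zeros(n_taps)
        out = np.zeros_like(noisy)
        for i in range(n_taps, len(noisy)):
            x = noisy[i - n_taps:i][::-1]     # most recent samples first
            out[i] = w @ x                    # filter output
            e = desired[i] - out[i]           # error against the reference
            w += mu * e * x                   # Widrow-Hoff coefficient update
        return out, w

    # toy demo: recover a sinusoid buried in Gaussian noise
    t = np.linspace(0, 1, 2000)
    clean = np.sin(2 * np.pi * 5 * t)
    noisy = clean + 0.5 * np.random.randn(t.size)
    filtered, coeffs = lms_filter(noisy, clean)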
Abstract:
In this paper we discuss recent progress in spectral finite element modeling of complex structures and its application in a real-time structural health monitoring (SHM) system based on a sensor-actuator network and near real-time computation of the Damage Force Indicator (DFI) vector. A waveguide network formalism is developed by mapping the original variational problem into a variational problem involving product spaces of 1D waveguides. Numerical convergence is studied using an h(λ)-refinement scheme, where λ is the wavelength of interest. Computational issues towards successful implementation of this method within an SHM system are discussed.
Abstract:
We consider the problem of scheduling semiconductor burn-in operations, where burn-in ovens are modelled as batch processing machines. Most studies assume that the ready times and due dates of jobs are agreeable (i.e., r_i < r_j implies d_i ≤ d_j). In many real-world applications, the agreeable-property assumption does not hold. Therefore, in this paper, the scheduling of a single burn-in oven with non-agreeable release times and due dates, non-identical job sizes, and non-identical processing times is formulated as a Non-Linear (0-1) Integer Programming optimisation problem. The objective of the problem is to minimise the maximum completion time (makespan) of all jobs. Due to computational intractability, we propose four variants of a two-phase greedy heuristic algorithm. Computational experiments indicate that two of the four proposed algorithms have excellent average performance and are also capable of solving large-scale real-life problems with relatively low computational effort on a Pentium IV computer.
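The paper's four two-phase heuristic variants are not specified in the abstract; as one plausible and purely illustrative greedy baseline for the same problem, the sketch below packs release-time-ordered jobs into capacity-limited batches, starts a batch once the oven and all of its jobs are available, and charges each batch the longest processing time among its jobs:

    import collections

    Job = collections.namedtuple("Job", "id release due size ptime")

    def greedy_batch_makespan(jobs, capacity):
        # Greedy single burn-in-oven schedule: sort by release time, fill a
        # batch up to the oven capacity, then run it; batch processing time
        # equals the maximum job processing time in the batch.
        jobs = sorted(jobs, key=lambda j: j.release)

        def run(batch, t):
            start = max(t, max(j.release for j in batch))
            return start + max(j.ptime for j in batch)

        t, batch, used = 0.0, [], 0
        for j in jobs:
            if batch and used + j.size > capacity:
                t = run(batch, t)
                batch, used = [], 0
            batch.append(j)
            used += j.size
        if batch:
            t = run(batch, t)
        return t   # makespan

    jobs = [Job(1, 0, 10, 3, 4), Job(2, 1, 12, 4, 6), Job(3, 5, 20, 5, 5)]
    print(greedy_batch_makespan(jobs, capacity=8))   # -> 12.0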
Abstract:
A multiple-UAV search and attack mission in a battlefield involves allocating UAVs to different target tasks efficiently. This task allocation becomes difficult when there is no communication among the UAVs and the UAVs' sensors have limited range to detect the targets and neighbouring UAVs and to assess target status. In this paper, we propose a team theoretic approach to efficiently allocate UAVs to the targets under the constraint that the UAVs do not communicate among themselves and have limited sensor range. We study the performance of the team theoretic approach to task allocation on a battlefield scenario. The performance obtained through team theory is compared with two other methods, namely, limited sensor range but with communication among all the UAVs, and a greedy strategy with limited sensor range and no communication. It is found that the team theoretic strategy performs the best even though it assumes limited sensor range and no communication.
Abstract:
It has been shown recently that the maximum rate of 2-real-symbol (single-complex-symbol) maximum likelihood (ML) decodable, square space-time block codes (STBCs) with unitary weight matrices is 2a/2^a complex symbols per channel use (cspcu) for 2^a transmit antennas [1]. These STBCs are obtained from Unitary Weight Designs (UWDs). In this paper, we show that the maximum rates for 3- and 4-real-symbol (2-complex-symbol) ML decodable square STBCs from UWDs, for 2^a transmit antennas, are 3(a-1)/2^a and 4(a-1)/2^a cspcu, respectively. STBCs achieving this maximum rate are constructed. A set of sufficient conditions on the signal set, required for these codes to achieve full diversity, is derived, along with expressions for their coding gain.
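As a quick arithmetic check of the rate expressions above (not a result from the paper, and with R_2, R_3, R_4 introduced here merely as shorthand for the 2-, 3-, and 4-real-symbol decodable maximum rates), take a = 3, i.e., 2^3 = 8 transmit antennas:

\[
R_2 = \frac{2a}{2^a} = \frac{6}{8} = 0.75, \qquad
R_3 = \frac{3(a-1)}{2^a} = \frac{6}{8} = 0.75, \qquad
R_4 = \frac{4(a-1)}{2^a} = \frac{8}{8} = 1 \ \text{cspcu}.
\]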