5 resultados para Video Processing


Relevância:

70.00% 70.00%

Publicador:

Resumo:

FPGAs and GPUs are often used when real-time performance in video processing is required. An accelerated processor is chosen based on task-specific priorities (power consumption, processing time and detection accuracy), and this decision is normally made once at design time. All three characteristics are important, particularly in battery-powered systems. Here we propose a method for moving selection of processing platform from a single design-time choice to a continuous run time one.We implement Histogram of Oriented Gradients (HOG) detectors for cars and people and Mixture of Gaussians (MoG) motion detectors running across FPGA, GPU and CPU in a heterogeneous system. We use this to detect illegally parked vehicles in urban scenes. Power, time and accuracy information for each detector is characterised. An anomaly measure is assigned to each detected object based on its trajectory and location, when compared to learned contextual movement patterns. This drives processor and implementation selection, so that scenes with high behavioural anomalies are processed with faster but more power hungry implementations, but routine or static time periods are processed with power-optimised, less accurate, slower versions. Real-time performance is evaluated on video datasets including i-LIDS. Compared to power-optimised static selection, automatic dynamic implementation mapping is 10% more accurate but draws 12W extra power in our testbed desktop system.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Field-programmable gate arrays are ideal hosts to custom accelerators for signal, image, and data processing but de- mand manual register transfer level design if high performance and low cost are desired. High-level synthesis reduces this design burden but requires manual design of complex on-chip and off-chip memory architectures, a major limitation in applications such as video processing. This paper presents an approach to resolve this shortcoming. A constructive process is described that can derive such accelerators, including on- and off-chip memory storage from a C description such that a user-defined throughput constraint is met. By employing a novel statement-oriented approach, dataflow intermediate models are derived and used to support simple ap- proaches for on-/off-chip buffer partitioning, derivation of custom on-chip memory hierarchies and architecture transformation to ensure user-defined throughput constraints are met with minimum cost. When applied to accelerators for full search motion estima- tion, matrix multiplication, Sobel edge detection, and fast Fourier transform, it is shown how real-time performance up to an order of magnitude in advance of existing commercial HLS tools is enabled whilst including all requisite memory infrastructure. Further, op- timizations are presented that reduce the on-chip buffer capacity and physical resource cost by up to 96% and 75%, respectively, whilst maintaining real-time performance.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Situational awareness is achieved naturally by the human senses of sight and hearing in combination. Automatic scene understanding aims at replicating this human ability using microphones and cameras in cooperation. In this paper, audio and video signals are fused and integrated at different levels of semantic abstractions. We detect and track a speaker who is relatively unconstrained, i.e., free to move indoors within an area larger than the comparable reported work, which is usually limited to round table meetings. The system is relatively simple: consisting of just 4 microphone pairs and a single camera. Results show that the overall multimodal tracker is more reliable than single modality systems, tolerating large occlusions and cross-talk. System evaluation is performed on both single and multi-modality tracking. The performance improvement given by the audio–video integration and fusion is quantified in terms of tracking precision and accuracy as well as speaker diarisation error rate and precision–recall (recognition). Improvements vs. the closest works are evaluated: 56% sound source localisation computational cost over an audio only system, 8% speaker diarisation error rate over an audio only speaker recognition unit and 36% on the precision–recall metric over an audio–video dominant speaker recognition method.