973 resultados para Motion recognition
Resumo:
Many surveillance applications (object tracking, abandoned object detection) rely on detecting changes in a scene. Foreground segmentation is an effective way to extract the foreground from the scene, but these techniques cannot discriminate between objects that have temporarily stopped and those that are moving. We propose a series of modifications to an existing foreground segmentation system\cite{Butler2003} so that the foreground is further segmented into two or more layers. This yields an active layer of objects currently in motion and a passive layer of objects that have temporarily ceased motion which can itself be decomposed into multiple static layers. We also propose a variable threshold to cope with variable illumination, a feedback mechanism that allows an external process (i.e. surveillance system) to alter the motion detectors state, and a lighting compensation process and a shadow detector to reduce errors caused by lighting inconsistencies. The technique is demonstrated using outdoor surveillance footage, and is shown to be able to effectively deal with real world lighting conditions and overlapping objects.
Resumo:
Abandoned object detection (AOD) systems are required to run in high traffic situations, with high levels of occlusion. Systems rely on background segmentation techniques to locate abandoned objects, by detecting areas of motion that have stopped. This is often achieved by using a medium term motion detection routine to detect long term changes in the background. When AOD systems are integrated into person tracking system, this often results in two separate motion detectors being used to handle the different requirements. We propose a motion detection system that is capable of detecting medium term motion as well as regular motion. Multiple layers of medium term (static) motion can be detected and segmented. We demonstrate the performance of this motion detection system and as part of an abandoned object detection system.
Resumo:
This paper presents an object tracking system that utilises a hybrid multi-layer motion segmentation and optical flow algorithm. While many tracking systems seek to combine multiple modalities such as motion and depth or multiple inputs within a fusion system to improve tracking robustness, current systems have avoided the combination of motion and optical flow. This combination allows the use of multiple modes within the object detection stage. Consequently, different categories of objects, within motion or stationary, can be effectively detected utilising either optical flow, static foreground or active foreground information. The proposed system is evaluated using the ETISEO database and evaluation metrics and compared to a baseline system utilising a single mode foreground segmentation technique. Results demonstrate a significant improvement in tracking results can be made through the incorporation of the additional motion information.
Resumo:
Object tracking systems require accurate segmentation of the objects from the background for effective tracking. Motion segmentation or optical flow can be used to segment incoming images. Whilst optical flow allows multiple moving targets to be separated based on their individual velocities, optical flow techniques are prone to errors caused by changing lighting and occlusions, both common in a surveillance environment. Motion segmentation techniques are more robust to fluctuating lighting and occlusions, but don't provide information on the direction of the motion. In this paper we propose a combined motion segmentation/optical flow algorithm for use in object tracking. The proposed algorithm uses the motion segmentation results to inform the optical flow calculations and ensure that optical flow is only calculated in regions of motion, and improve the performance of the optical flow around the edge of moving objects. Optical flow is calculated at pixel resolution and tracking of flow vectors is employed to improve performance and detect discontinuities, which can indicate the location of overlaps between objects. The algorithm is evaluated by attempting to extract a moving target within the flow images, given expected horizontal and vertical movement (i.e. the algorithms intended use for object tracking). Results show that the proposed algorithm outperforms other widely used optical flow techniques for this surveillance application.
Resumo:
In an automotive environment, the performance of a speech recognition system is affected by environmental noise if the speech signal is acquired directly from a microphone. Speech enhancement techniques are therefore necessary to improve the speech recognition performance. In this paper, a field-programmable gate array (FPGA) implementation of dual-microphone delay-and-sum beamforming (DASB) for speech enhancement is presented. As the first step towards a cost-effective solution, the implementation described in this paper uses a relatively high-end FPGA device to facilitate the verification of various design strategies and parameters. Experimental results show that the proposed design can produce output waveforms close to those generated by a theoretical (floating-point) model with modest usage of FPGA resources. Speech recognition experiments are also conducted on enhanced in-car speech waveforms produced by the FPGA in order to compare recognition performance with the floating-point representation running on a PC.
Resumo:
3D Motion capture is a fast evolving field and recent inertial technology may expand the artistic possibilities for its use in live performance. Inertial motion capture has three attributes that make it suitable for use with live performance; it is portable, easy to use and can operate in real-time. Using four projects, this paper discusses the suitability of inertial motion capture to live performance with a particular emphasis on dance. Dance is an artistic application of human movement and motion capture is the means to record human movement as digital data. As such, dance is clearly a field in which the use of real-time motion capture is likely to become more common, particularly as projected visual effects including real-time video are already often used in dance performances. Understandably, animation generated in real-time using motion capture is not as extensive or as clean as the highly mediated animation used in movies and games, but the quality is still impressive and the ‘liveness’ of the animation has compensating features that offer new ways of communicating with an audience.
Resumo:
This paper presents an implementation of an aircraft pose and motion estimator using visual systems as the principal sensor for controlling an Unmanned Aerial Vehicle (UAV) or as a redundant system for an Inertial Measure Unit (IMU) and gyros sensors. First, we explore the applications of the unified theory for central catadioptric cameras for attitude and heading estimation, explaining how the skyline is projected on the catadioptric image and how it is segmented and used to calculate the UAV’s attitude. Then we use appearance images to obtain a visual compass, and we calculate the relative rotation and heading of the aerial vehicle. Additionally, we show the use of a stereo system to calculate the aircraft height and to measure the UAV’s motion. Finally, we present a visual tracking system based on Fuzzy controllers working in both a UAV and a camera pan and tilt platform. Every part is tested using the UAV COLIBRI platform to validate the different approaches, which include comparison of the estimated data with the inertial values measured onboard the helicopter platform and the validation of the tracking schemes on real flights.
Resumo:
Acquiring accurate silhouettes has many applications in computer vision. This is usually done through motion detection, or a simple background subtraction under highly controlled environments (i.e. chroma-key backgrounds). Lighting and contrast issues in typical outdoor or office environments make accurate segmentation very difficult in these scenes. In this paper, gradients are used in conjunction with intensity and colour to provide a robust segmentation of motion, after which graph cuts are utilised to refine the segmentation. The results presented using the ETISEO database demonstrate that an improved segmentation is achieved through the combined use of motion detection and graph cuts, particularly in complex scenes.
Resumo:
The purpose of this chapter is to describe the use of caricatured contrasting scenarios (Bødker, 2000) and how they can be used to consider potential designs for disruptive technologies. The disruptive technology in this case is Automatic Speech Recognition (ASR) software in workplace settings. The particular workplace is the Magistrates Court of the Australian Capital Territory.----- Caricatured contrasting scenarios are ideally suited to exploring how ASR might be implemented in a particular setting because they allow potential implementations to be “sketched” quickly and with little effort. This sketching of potential interactions and the emphasis of both positive and negative outcomes allows the benefits and pitfalls of design decisions to become apparent.----- A brief description of the Court is given, describing the reasons for choosing the Court for this case study. The work of the Court is framed as taking place in two modes: Front of house, where the courtroom itself is, and backstage, where documents are processed and the business of the court is recorded and encoded into various systems.----- Caricatured contrasting scenarios describing the introduction of ASR to the front of house are presented and then analysed. These scenarios show that the introduction of ASR to the court would be highly problematic.----- The final section describes how ASR could be re-imagined in order to make it useful for the court. A final scenario is presented that describes how this re-imagined ASR could be integrated into both the front of house and backstage of the court in a way that could strengthen both processes.
Resumo:
Identifying an individual from surveillance video is a difficult, time consuming and labour intensive process. The proposed system aims to streamline this process by filtering out unwanted scenes and enhancing an individual's face through super-resolution. An automatic face recognition system is then used to identify the subject or present the human operator with likely matches from a database. A person tracker is used to speed up the subject detection and super-resolution process by tracking moving subjects and cropping a region of interest around the subject's face to reduce the number and size of the image frames to be super-resolved respectively. In this paper, experiments have been conducted to demonstrate how the optical flow super-resolution method used improves surveillance imagery for visual inspection as well as automatic face recognition on an Eigenface and Elastic Bunch Graph Matching system. The optical flow based method has also been benchmarked against the ``hallucination'' algorithm, interpolation methods and the original low-resolution images. Results show that both super-resolution algorithms improved recognition rates significantly. Although the hallucination method resulted in slightly higher recognition rates, the optical flow method produced less artifacts and more visually correct images suitable for human consumption.
Resumo:
Automatic Speech Recognition (ASR) has matured into a technology which is becoming more common in our everyday lives, and is emerging as a necessity to minimise driver distraction when operating in-car systems such as navigation and infotainment. In “noise-free” environments, word recognition performance of these systems has been shown to approach 100%, however this performance degrades rapidly as the level of background noise is increased. Speech enhancement is a popular method for making ASR systems more ro- bust. Single-channel spectral subtraction was originally designed to improve hu- man speech intelligibility and many attempts have been made to optimise this algorithm in terms of signal-based metrics such as maximised Signal-to-Noise Ratio (SNR) or minimised speech distortion. Such metrics are used to assess en- hancement performance for intelligibility not speech recognition, therefore mak- ing them sub-optimal ASR applications. This research investigates two methods for closely coupling subtractive-type enhancement algorithms with ASR: (a) a computationally-efficient Mel-filterbank noise subtraction technique based on likelihood-maximisation (LIMA), and (b) in- troducing phase spectrum information to enable spectral subtraction in the com- plex frequency domain. Likelihood-maximisation uses gradient-descent to optimise parameters of the enhancement algorithm to best fit the acoustic speech model given a word se- quence known a priori. Whilst this technique is shown to improve the ASR word accuracy performance, it is also identified to be particularly sensitive to non-noise mismatches between the training and testing data. Phase information has long been ignored in spectral subtraction as it is deemed to have little effect on human intelligibility. In this work it is shown that phase information is important in obtaining highly accurate estimates of clean speech magnitudes which are typically used in ASR feature extraction. Phase Estimation via Delay Projection is proposed based on the stationarity of sinusoidal signals, and demonstrates the potential to produce improvements in ASR word accuracy in a wide range of SNR. Throughout the dissertation, consideration is given to practical implemen- tation in vehicular environments which resulted in two novel contributions – a LIMA framework which takes advantage of the grounding procedure common to speech dialogue systems, and a resource-saving formulation of frequency-domain spectral subtraction for realisation in field-programmable gate array hardware. The techniques proposed in this dissertation were evaluated using the Aus- tralian English In-Car Speech Corpus which was collected as part of this work. This database is the first of its kind within Australia and captures real in-car speech of 50 native Australian speakers in seven driving conditions common to Australian environments.
Resumo:
While spatial determinants of emmetropization have been examined extensively in animal models and spatial processing of human myopes has also been studied, there have been few studies investigating temporal aspects of emmetropization and temporal processing in human myopia. The influence of temporal light modulation on eye growth and refractive compensation has been observed in animal models and there is evidence of temporal visual processing deficits in individuals with high myopia or other pathologies. Given this, the aims of this work were to examine the relationships between myopia (i.e. degree of myopia and progression status) and temporal visual performance and to consider any temporal processing deficits in terms of the parallel retinocortical pathways. Three psychophysical studies investigating temporal processing performance were conducted in young adult myopes and non-myopes: (1) backward visual masking, (2) dot motion perception and (3) phantom contour. For each experiment there were approximately 30 young emmetropes, 30 low myopes (myopia less than 5 D) and 30 high myopes (5 to 12 D). In the backward visual masking experiment, myopes were also classified according to their progression status (30 stable myopes and 30 progressing myopes). The first study was based on the observation that the visibility of a target is reduced by a second target, termed the mask, presented quickly after the first target. Myopes were more affected by the mask when the task was biased towards the magnocellular pathway; myopes had a 25% mean reduction in performance compared with emmetropes. However, there was no difference in the effect of the mask when the task was biased towards the parvocellular system. For all test conditions, there was no significant correlation between backward visual masking task performance and either the degree of myopia or myopia progression status. The dot motion perception study measured detection thresholds for the minimum displacement of moving dots, the maximum displacement of moving dots and degree of motion coherence required to correctly determine the direction of motion. The visual processing of these tasks is dominated by the magnocellular pathway. Compared with emmetropes, high myopes had reduced ability to detect the minimum displacement of moving dots for stimuli presented at the fovea (20% higher mean threshold) and possibly at the inferior nasal retina. The minimum displacement threshold was significantly and positively correlated to myopia magnitude and axial length, and significantly and negatively correlated with retinal thickness for the inferior nasal retina. The performance of emmetropes and myopes for all the other dot motion perception tasks were similar. In the phantom contour study, the highest temporal frequency of the flickering phantom pattern at which the contour was visible was determined. Myopes had significantly lower flicker detection limits (21.8 ± 7.1 Hz) than emmetropes (25.6 ± 8.8 Hz) for tasks biased towards the magnocellular pathway for both high (99%) and low (5%) contrast stimuli. There was no difference in flicker limits for a phantom contour task biased towards the parvocellular pathway. For all phantom contour tasks, there was no significant correlation between flicker detection thresholds and magnitude of myopia. Of the psychophysical temporal tasks studied here those primarily involving processing by the magnocellular pathway revealed differences in performance of the refractive error groups. While there are a number of interpretations for this data, this suggests that there may be a temporal processing deficit in some myopes that is selective for the magnocellular system. The minimum displacement dot motion perception task appears the most sensitive test, of those studied, for investigating changes in visual temporal processing in myopia. Data from the visual masking and phantom contour tasks suggest that the alterations to temporal processing occur at an early stage of myopia development. In addition, the link between increased minimum displacement threshold and decreasing retinal thickness suggests that there is a retinal component to the observed modifications in temporal processing.