841 resultados para visual object detection
Resumo:
Object tracking systems require accurate segmentation of the objects from the background for effective tracking. Motion segmentation or optical flow can be used to segment incoming images. Whilst optical flow allows multiple moving targets to be separated based on their individual velocities, optical flow techniques are prone to errors caused by changing lighting and occlusions, both common in a surveillance environment. Motion segmentation techniques are more robust to fluctuating lighting and occlusions, but don't provide information on the direction of the motion. In this paper we propose a combined motion segmentation/optical flow algorithm for use in object tracking. The proposed algorithm uses the motion segmentation results to inform the optical flow calculations and ensure that optical flow is only calculated in regions of motion, and improve the performance of the optical flow around the edge of moving objects. Optical flow is calculated at pixel resolution and tracking of flow vectors is employed to improve performance and detect discontinuities, which can indicate the location of overlaps between objects. The algorithm is evaluated by attempting to extract a moving target within the flow images, given expected horizontal and vertical movement (i.e. the algorithms intended use for object tracking). Results show that the proposed algorithm outperforms other widely used optical flow techniques for this surveillance application.
Resumo:
Acoustically, car cabins are extremely noisy and as a consequence audio-only, in-car voice recognition systems perform poorly. As the visual modality is immune to acoustic noise, using the visual lip information from the driver is seen as a viable strategy in circumventing this problem by using audio visual automatic speech recognition (AVASR). However, implementing AVASR requires a system being able to accurately locate and track the drivers face and lip area in real-time. In this paper we present such an approach using the Viola-Jones algorithm. Using the AVICAR [1] in-car database, we show that the Viola- Jones approach is a suitable method of locating and tracking the driver’s lips despite the visual variability of illumination and head pose for audio-visual speech recognition system.
Resumo:
Gabor representations have been widely used in facial analysis (face recognition, face detection and facial expression detection) due to their biological relevance and computational properties. Two popular Gabor representations used in literature are: 1) Log-Gabor and 2) Gabor energy filters. Even though these representations are somewhat similar, they also have distinct differences as the Log-Gabor filters mimic the simple cells in the visual cortex while the Gabor energy filters emulate the complex cells, which causes subtle differences in the responses. In this paper, we analyze the difference between these two Gabor representations and quantify these differences on the task of facial action unit (AU) detection. In our experiments conducted on the Cohn-Kanade dataset, we report an average area underneath the ROC curve (A`) of 92.60% across 17 AUs for the Gabor energy filters, while the Log-Gabor representation achieved an average A` of 96.11%. This result suggests that small spatial differences that the Log-Gabor filters pick up on are more useful for AU detection than the differences in contours and edges that the Gabor energy filters extract.
Resumo:
The detection of voice activity is a challenging problem, especially when the level of acoustic noise is high. Most current approaches only utilise the audio signal, making them susceptible to acoustic noise. An obvious approach to overcome this is to use the visual modality. The current state-of-the-art visual feature extraction technique is one that uses a cascade of visual features (i.e. 2D-DCT, feature mean normalisation, interstep LDA). In this paper, we investigate the effectiveness of this technique for the task of visual voice activity detection (VAD), and analyse each stage of the cascade and quantify the relative improvement in performance gained by each successive stage. The experiments were conducted on the CUAVE database and our results highlight that the dynamics of the visual modality can be used to good effect to improve visual voice activity detection performance.
Resumo:
Visual recording devices such as video cameras, CCTVs, or webcams have been broadly used to facilitate work progress or safety monitoring on construction sites. Without human intervention, however, both real-time reasoning about captured scenes and interpretation of recorded images are challenging tasks. This article presents an exploratory method for automated object identification using standard video cameras on construction sites. The proposed method supports real-time detection and classification of mobile heavy equipment and workers. The background subtraction algorithm extracts motion pixels from an image sequence, the pixels are then grouped into regions to represent moving objects, and finally the regions are identified as a certain object using classifiers. For evaluating the method, the formulated computer-aided process was implemented on actual construction sites, and promising results were obtained. This article is expected to contribute to future applications of automated monitoring systems of work zone safety or productivity.
Resumo:
Autonomous development of sensorimotor coordination enables a robot to adapt and change its action choices to interact with the world throughout its lifetime. The Experience Network is a structure that rapidly learns coordination between visual and haptic inputs and motor action. This paper presents methods which handle the high dimensionality of the network state-space which occurs due to the simultaneous detection of multiple sensory features. The methods provide no significant increase in the complexity of the underlying representations and also allow emergent, task-specific, semantic information to inform action selection. Experimental results show rapid learning in a real robot, beginning with no sensorimotor mappings, to a mobile robot capable of wall avoidance and target acquisition.
Resumo:
This article presents a visual servoing system to follow a 3D moving object by a Micro Unmanned Aerial Vehicle (MUAV). The presented control strategy is based only on the visual information given by an adaptive tracking method based on the colour information. A visual fuzzy system has been developed for servoing the camera situated on a rotary wing MAUV, that also considers its own dynamics. This system is focused on continuously following of an aerial moving target object, maintaining it with a fixed safe distance and centred on the image plane. The algorithm is validated on real flights on outdoors scenarios, showing the robustness of the proposed systems against winds perturbations, illumination and weather changes among others. The obtained results indicate that the proposed algorithms is suitable for complex controls task, such object following and pursuit, flying in formation, as well as their use for indoor navigation
Resumo:
Purpose: To investigate the correlations of the global flash multifocal electroretinogram (MOFO mfERG) with common clinical visual assessments – Humphrey perimetry and Stratus circumpapillary retinal nerve fiber layer (RNFL) thickness measurement in type II diabetic patients. Methods: Forty-two diabetic patients participated in the study: ten were free from diabetic retinopathy (DR) while the remainder suffered from mild to moderate non-proliferative diabetic retinopathy (NPDR). Fourteen age-matched controls were recruited for comparison. MOFO mfERG measurements were made under high and low contrast conditions. Humphrey central 30-2 perimetry and Stratus OCT circumpapillary RNFL thickness measurements were also performed. Correlations between local values of implicit time and amplitude of the mfERG components (direct component (DC) and induced component (IC)), and perimetric sensitivity and RNFL thickness were evaluated by mapping the localized responses for the three subject groups. Results: MOFO mfERG was superior to perimetry and RNFL assessments in showing differences between the diabetic groups (with and without DR) and the controls. All the MOFO mfERG amplitudes (except IC amplitude at high contrast) correlated better with perimetry findings (Pearson’s r ranged from 0.23 to 0.36, p<0.01) than did the mfERG implicit time at both high and low contrasts across all subject groups. No consistent correlation was found between the mfERG and RNFL assessments for any group or contrast conditions. The responses of the local MOFO mfERG correlated with local perimetric sensitivity but not with RNFL thickness. Conclusion: Early functional changes in the diabetic retina seem to occur before morphological changes in the RNFL.
Resumo:
The increasingly widespread use of large-scale 3D virtual environments has translated into an increasing effort required from designers, developers and testers. While considerable research has been conducted into assisting the design of virtual world content and mechanics, to date, only limited contributions have been made regarding the automatic testing of the underpinning graphics software and hardware. In the work presented in this paper, two novel neural network-based approaches are presented to predict the correct visualization of 3D content. Multilayer perceptrons and self-organizing maps are trained to learn the normal geometric and color appearance of objects from validated frames and then used to detect novel or anomalous renderings in new images. Our approach is general, for the appearance of the object is learned rather than explicitly represented. Experiments were conducted on a game engine to determine the applicability and effectiveness of our algorithms. The results show that the neural network technology can be effectively used to address the problem of automatic and reliable visual testing of 3D virtual environments.
Resumo:
A robust visual tracking system requires an object appearance model that is able to handle occlusion, pose, and illumination variations in the video stream. This can be difficult to accomplish when the model is trained using only a single image. In this paper, we first propose a tracking approach based on affine subspaces (constructed from several images) which are able to accommodate the abovementioned variations. We use affine subspaces not only to represent the object, but also the candidate areas that the object may occupy. We furthermore propose a novel approach to measure affine subspace-to-subspace distance via the use of non-Euclidean geometry of Grassmann manifolds. The tracking problem is then considered as an inference task in a Markov Chain Monte Carlo framework via particle filtering. Quantitative evaluation on challenging video sequences indicates that the proposed approach obtains considerably better performance than several recent state-of-the-art methods such as Tracking-Learning-Detection and MILtrack.
Resumo:
We investigated memories of room-sized spatial layouts learned by sequentially or simultaneously viewing objects from a stationary position. In three experiments, sequential viewing (one or two objects at a time) yielded subsequent memory performance that was equivalent or superior to simultaneous viewing of all objects, even though sequential viewing lacked direct access to the entire layout. This finding was replicated by replacing sequential viewing with directed viewing in which all objects were presented simultaneously and participants’ attention was externally focused on each object sequentially, indicating that the advantage of sequential viewing over simultaneous viewing may have originated from focal attention to individual object locations. These results suggest that memory representation of object-to-object relations can be constructed efficiently by encoding each object location separately, when those locations are defined within a single spatial reference system. These findings highlight the importance of considering object presentation procedures when studying spatial learning mechanisms.
Resumo:
Previous behavioral studies reported a robust effect of increased naming latencies when objects to be named were blocked within semantic category, compared to items blocked between category. This semantic context effect has been attributed to various mechanisms including inhibition or excitation of lexico-semantic representations and incremental learning of associations between semantic features and names, and is hypothesized to increase demands on verbal self-monitoring during speech production. Objects within categories also share many visual structural features, introducing a potential confound when interpreting the level at which the context effect might occur. Consistent with previous findings, we report a significant increase in response latencies when naming categorically related objects within blocks, an effect associated with increased perfusion fMRI signal bilaterally in the hippocampus and in the left middle to posterior superior temporal cortex. No perfusion changes were observed in the middle section of the left middle temporal cortex, a region associated with retrieval of lexical-semantic information in previous object naming studies. Although a manipulation of visual feature similarity did not influence naming latencies, we observed perfusion increases in the perirhinal cortex for naming objects with similar visual features that interacted with the semantic context in which objects were named. These results provide support for the view that the semantic context effect in object naming occurs due to an incremental learning mechanism, and involves increased demands on verbal self-monitoring.
Resumo:
This paper provides a preliminary analysis of an autonomous uncooperative collision avoidance strategy for unmanned aircraft using image-based visual control. Assuming target detection, the approach consists of three parts. First, a novel decision strategy is used to determine appropriate reference image features to track for safe avoidance. This is achieved by considering the current rules of the air (regulations), the properties of spiral motion and the expected visual tracking errors. Second, a spherical visual predictive control (VPC) scheme is used to guide the aircraft along a safe spiral-like trajectory about the object. Lastly, a stopping decision based on thresholding a cost function is used to determine when to stop the avoidance behaviour. The approach does not require estimation of range or time to collision, and instead relies on tuning two mutually exclusive decision thresholds to ensure satisfactory performance.