13 results for Visual Tracking
Abstract:
Handling appearance variations is a very challenging problem for visual tracking. Existing methods usually solve this problem by relying on an effective appearance model with two features: (1) being capable of discriminating the tracked target from its background, (2) being robust to the target's appearance variations during tracking. Instead of integrating the two requirements into the appearance model, in this paper, we propose a tracking method that deals with these problems separately based on sparse representation in a particle filter framework. Each target candidate defined by a particle is linearly represented by the target and background templates with an additive representation error. Discriminating the target from its background is achieved by activating the target templates or the background templates in the linear system in a competitive manner. The target's appearance variations are directly modeled as the representation error. An online algorithm is used to learn the basis functions that sparsely span the representation error. The linear system is solved via ℓ1 minimization. The candidate with the smallest reconstruction error using the target templates is selected as the tracking result. We test the proposed approach using four sequences with heavy occlusions, large pose variations, drastic illumination changes and low foreground-background contrast. The proposed approach shows excellent performance in comparison with two recent state-of-the-art trackers.
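The candidate-selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes unit-norm template columns, omits the online-learned basis for the representation error, and swaps in plain iterative soft thresholding (ISTA) as the ℓ1 solver. The names `ista_l1`, `target_reconstruction_error`, `select_candidate` and the parameter `lam` are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_l1(D, y, lam=0.01, n_iter=500):
    """Approximately minimise 0.5 * ||y - D c||^2 + lam * ||c||_1 with ISTA."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ c - y)
        c = soft_threshold(c - step * grad, step * lam)
    return c

def target_reconstruction_error(y, T, B, lam=0.01):
    """Code y over [target | background] templates, then reconstruct with targets only."""
    D = np.hstack([T, B])
    c = ista_l1(D, y, lam)
    c_target = c[: T.shape[1]]
    return float(np.linalg.norm(y - T @ c_target))

def select_candidate(candidates, T, B, lam=0.01):
    """Return the index of the candidate with the smallest target-only error."""
    errors = [target_reconstruction_error(y, T, B, lam) for y in candidates]
    return int(np.argmin(errors)), errors
```

Because target and background templates compete for the same sparse coefficients, a candidate that resembles the background is explained mostly by the background columns and therefore keeps a large target-only reconstruction error.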
Abstract:
In this paper, we propose a novel visual tracking framework based on a decision-theoretic online learning algorithm, namely NormalHedge. To make NormalHedge more robust against noise, we propose an adaptive NormalHedge algorithm, which exploits the historical information of each expert to make more accurate predictions than the standard NormalHedge. Technically, we use a set of weighted experts to predict the state of the target to be tracked over time. The weight of each expert is learned online by pushing the cumulative regret of the learner towards that of the expert. Our simulation experiments demonstrate the effectiveness of the proposed adaptive NormalHedge compared to the standard NormalHedge method. Furthermore, experimental results on several challenging video sequences show that the proposed tracking method outperforms several state-of-the-art methods.
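For orientation, the weight update of the standard NormalHedge algorithm can be sketched as follows. This covers only the standard algorithm as commonly stated (weights proportional to the derivative of the potential exp(R²/2c), with the scale c chosen so that the average potential equals e); the adaptive variant proposed in the abstract is not specified there and is not reproduced. The function names are hypothetical.

```python
import math

def normalhedge_weights(regrets):
    """Standard NormalHedge weights from the cumulative regrets to each expert."""
    n = len(regrets)
    plus = [max(r, 0.0) for r in regrets]  # only positive regret counts
    r_max = max(plus)
    if r_max == 0.0:
        return [1.0 / n] * n  # no expert is ahead of the learner: stay uniform
    # Find c with mean(exp(r^2 / (2c))) = e by bisection.
    # The bracket below guarantees mean >= e at lo and mean <= e at hi.
    lo = r_max ** 2 / (2.0 * (1.0 + math.log(n)))
    hi = r_max ** 2 / 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        avg = sum(math.exp(r * r / (2.0 * mid)) for r in plus) / n
        if avg > math.e:
            lo = mid
        else:
            hi = mid
    c = hi
    raw = [(r / c) * math.exp(r * r / (2.0 * c)) for r in plus]
    total = sum(raw)
    return [w / total for w in raw]

def update_regrets(regrets, weights, losses):
    """Add (learner loss - expert loss) to each expert's cumulative regret."""
    learner_loss = sum(w * l for w, l in zip(weights, losses))
    return [r + learner_loss - l for r, l in zip(regrets, losses)]
```

In a tracking setting each expert would propose a target state and the loss would measure how poorly that state matches the current frame; that mapping is an assumption here, since the abstract does not define it.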
Abstract:
Sparse representation based visual tracking approaches have attracted increasing interest in the community in recent years. The main idea is to linearly represent each target candidate using a set of target and trivial templates while imposing a sparsity constraint on the representation coefficients. After the coefficients are obtained using L1-norm minimization methods, the candidate with the lowest error, when reconstructed using only the target templates and the associated coefficients, is taken as the tracking result. In spite of the promising performance widely reported, it is unclear whether the performance of these trackers can be maximised. In addition, the computational complexity caused by the dimensionality of the feature space limits the use of these algorithms in real-time applications. In this paper, we propose a real-time visual tracking method based on structurally random projection and weighted least squares techniques. In particular, to enhance the discriminative capability of the tracker, we introduce background templates to the linear representation framework. To handle appearance variations over time, we relax the sparsity constraint using a weighted least squares (WLS) method to obtain the representation coefficients. To further reduce the computational complexity, structurally random projection is used to reduce the dimensionality of the feature space while preserving the pairwise distances between the data points in the feature space. Experimental results show that the proposed approach outperforms several state-of-the-art tracking methods.
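The two ingredients named in this abstract, dimensionality reduction by random projection and a weighted least squares solve, can be sketched roughly as below. Two assumptions are worth stating: a dense Gaussian projection is swapped in for the structurally random projection (which the abstract does not detail), and the weights are taken to be one positive value per feature dimension. `gaussian_projection` and `weighted_least_squares` are hypothetical names.

```python
import numpy as np

def gaussian_projection(n_in, n_out, seed=0):
    """Dense Gaussian random projection, a stand-in for a structured one.
    Scaling by 1/sqrt(n_out) roughly preserves pairwise distances."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_out, n_in)) / np.sqrt(n_out)

def weighted_least_squares(D, y, w):
    """Minimise sum_i w[i] * (y[i] - (D c)[i])**2 over c.
    Implemented by scaling rows with sqrt(w) and calling ordinary least squares."""
    sw = np.sqrt(w)
    c, *_ = np.linalg.lstsq(D * sw[:, None], y * sw, rcond=None)
    return c
```

A typical use would project both the template matrix and the candidate with the same matrix `P` and then solve `weighted_least_squares(P @ D, P @ y, w)` in the reduced space, which replaces the iterative ℓ1 solver with a single linear solve.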
Abstract:
In this work, we propose a biologically inspired appearance model for robust visual tracking. Motivated in part by the success of the hierarchical organization of the primary visual cortex (area V1), we establish an architecture consisting of five layers: whitening, rectification, normalization, coding and pooling. The first three layers stem from the models developed for object recognition. In this paper, our attention focuses on the coding and pooling layers. In particular, we use a discriminative sparse coding method in the coding layer along with spatial pyramid representation in the pooling layer, which makes it easier to distinguish the target to be tracked from its background in the presence of appearance variations. An extensive experimental study shows that the proposed method has higher tracking accuracy than several state-of-the-art trackers.
Abstract:
In this paper, a novel framework for visual tracking of human body parts is introduced. The approach presented demonstrates the feasibility of recovering human poses with data from a single uncalibrated camera by using a limb-tracking system based on a 2-D articulated model and a double-tracking strategy. Its key contribution is that the 2-D model is only constrained by biomechanical knowledge about human bipedal motion, instead of relying on constraints that are linked to a specific activity or camera view. These characteristics make our approach suitable for real visual surveillance applications. Experiments on a set of indoor and outdoor sequences demonstrate the effectiveness of our method in tracking human lower body parts. Moreover, a detailed comparison with current tracking methods is presented.
Abstract:
Objective
Based on the theory of incentive sensitization, the aim of this study was to investigate differences in attentional processing of food-related visual cues between normal-weight and overweight/obese males and females.
Methods
Twenty-six normal-weight (14M, 12F) and 26 overweight/obese (14M, 12F) adults completed a visual probe task and an eye-tracking paradigm. Reaction times and eye movements to food and control images were collected during both a fasted and fed condition in a counterbalanced design.
Results
Participants had greater visual attention towards high-energy-density food images compared to low-energy-density food images regardless of hunger condition. This was most pronounced in overweight/obese males, who had significantly greater maintained attention towards high-energy-density food images than their normal-weight counterparts; however, no between-weight-group differences were observed for female participants.
Conclusions
High-energy-density food images appear to capture visual attention more readily than low-energy-density food images. Results also suggest the possibility of an altered visual food cue-associated reward system in overweight/obese males. Attentional processing of food cues may play a role in eating behaviors and thus should be taken into consideration as part of an integrated approach to curbing obesity.
Abstract:
Primary Objective: To investigate the utility of a new method of assessment for deficits in selective visual attention (SVA). Methods and Procedures: An independent groups design compared six participants with brain injuries with six participants from a non-brain-injured control group. The Sensomotoric Instruments Eye Movement system with remote eye-tracking device (eye camera) and two sets of eight stimuli were employed to determine whether the camera would be a sensitive discriminator of SVA in these groups. Main Outcomes and Results: The attention profile displayed by the brain-injured group showed that they were slower, made more errors, were less accurate, and were more indecisive than the control group. Conclusions: The utility of eye movement analysis as an assessment method was established, with implications for rehabilitation requiring further development. Key words: selective visual attention, eye movement analysis, brain injury
Abstract:
We present a multimodal detection and tracking algorithm for sensors composed of a camera mounted between two microphones. Target localization is performed on color-based change detection in the video modality and on time difference of arrival (TDOA) estimation between the two microphones in the audio modality. The TDOA is computed by multiband generalized cross correlation (GCC) analysis. The estimated directions of arrival are then postprocessed using a Riccati Kalman filter. The visual and audio estimates are finally integrated, at the likelihood level, into a particle filter (PF) that uses a zero-order motion model, and a weighted probabilistic data association (WPDA) scheme. We demonstrate that the Kalman filtering (KF) improves the accuracy of the audio source localization and that the WPDA helps to enhance the tracking performance of sensor fusion in reverberant scenarios. The combination of multiband GCC, KF, and WPDA within the particle filtering framework improves the performance of the algorithm in noisy scenarios. We also show how the proposed audiovisual tracker summarizes the observed scene by generating metadata that can be transmitted to other network nodes instead of transmitting the raw images and can be used for very low bit rate communication. Moreover, the generated metadata can also be used to detect and monitor events of interest.
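The audio side of this pipeline starts from a TDOA estimate between the two microphones. As a rough illustration, the sketch below uses single-band GCC with PHAT weighting, which is a simplification of the multiband GCC analysis named in the abstract, and converts the delay to a direction of arrival under a far-field assumption. The Kalman filtering, particle filter and WPDA stages are not shown. Function names, the speed of sound default and the microphone spacing parameter are assumptions.

```python
import numpy as np

def gcc_phat_delay(x, y, fs):
    """Estimate the delay of y relative to x in seconds (positive if y lags x)."""
    n = len(x) + len(y)  # zero-pad so the correlation is linear, not circular
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    # Reorder so that index max_shift corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs

def tdoa_to_angle(tau, mic_distance, speed_of_sound=343.0):
    """Far-field direction of arrival in radians from a TDOA in seconds."""
    s = np.clip(speed_of_sound * tau / mic_distance, -1.0, 1.0)
    return float(np.arcsin(s))
```

The per-frame angles produced this way are noisy in reverberant rooms, which is the motivation the abstract gives for smoothing them before fusing them with the visual estimates.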
Abstract:
This paper proposes a two-level 3D human pose tracking method for a specific action captured by several cameras. The generation of pose estimates relies on fitting a 3D articulated model to a Visual Hull generated from the input images. First, an initial pose estimate is constrained by a low-dimensional manifold learnt by Temporal Laplacian Eigenmaps. Then, an improved global pose is calculated by refining individual limb poses. The validation of our method uses a public standard dataset and demonstrates its accuracy and computational efficiency. © 2011 IEEE.
Abstract:
Background and objectives: Cognitive models suggest that attentional biases are integral to the maintenance of obsessive-compulsive symptoms (OCS). Such biases have been established experimentally in anxiety disorders; however, the evidence is unclear in obsessive-compulsive disorder (OCD). In the present study, an eye-tracking methodology was employed to explore attentional biases in relation to OCS.
Methods: A convenience sample of 85 community volunteers was assessed on OCS using the Yale-Brown Obsessive Compulsive Scale-self report. Participants completed an eye-tracking paradigm where they were exposed to OCD, Aversive and Neutral visual stimuli. Indices of attentional bias were derived from the eye-tracking data.
Results: Simple linear regressions were performed with OCS severity as the predictor and eye-tracking measures of the different attentional biases for each of the three stimulus types as the criterion variables. Findings revealed that OCS severity moderately predicted greater frequency and duration of fixations on OCD stimuli, which reflect the maintenance attentional bias. No significant results were found in support of other biases.
Limitations: Interpretations based on a non-clinical sample limit the generalisability of the conclusions, although findings from such samples in OCD research have been found to be comparable to those from clinical populations. Future research should include both clinical and sub-clinical participants.
Conclusions: Results provide some support for the theory of maintained attention in OCD attentional biases, as opposed to vigilance theory. Individuals with greater OCS do not orient to OCD stimuli any faster than individuals with lower OCS, but once a threat is identified, these individuals allocate more attention to OCS-relevant stimuli.
Abstract:
Objective
Pedestrian detection under video surveillance systems has always been a hot topic in computer vision research. These systems are widely used in train stations, airports, large commercial plazas, and other public places. However, pedestrian detection remains difficult because of complex backgrounds. Given its development in recent years, the visual attention mechanism has attracted increasing attention in object detection and tracking research, and previous studies have achieved substantial progress and breakthroughs. We propose a novel pedestrian detection method based on the semantic features under the visual attention mechanism.
Method
The proposed semantic feature-based visual attention model is a spatial-temporal model that consists of two parts: the static visual attention model and the motion visual attention model. The static visual attention model in the spatial domain is constructed by combining bottom-up with top-down attention guidance. Based on the characteristics of pedestrians, the bottom-up visual attention model of Itti is improved by intensifying the orientation vectors of elementary visual features to make the visual saliency map suitable for pedestrian detection. In terms of pedestrian attributes, skin color is selected as a semantic feature for pedestrian detection. The regional and Gaussian models are adopted to construct the skin color model. Skin feature-based visual attention guidance is then proposed to complete the top-down process. The bottom-up and top-down visual attention maps are linearly combined using weights obtained from experiments to construct the static visual attention model in the spatial domain. The spatial-temporal visual attention model is then constructed via the motion features in the temporal domain. Based on the static visual attention model in the spatial domain, the frame difference method is combined with optical flow to detect motion vectors. Filtering is applied to process the field of motion vectors. The saliency of motion vectors can be evaluated via motion entropy to make the selected motion feature more suitable for the spatial-temporal visual attention model.
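Two of the simpler steps in this method, the linear combination of bottom-up and top-down maps and the frame difference, can be sketched as follows. This is an illustration only: the weight `alpha` and the `threshold` are hypothetical placeholders (the abstract says the weights were obtained experimentally but does not report them), and the Itti model, skin color model, optical flow and motion entropy stages are not shown.

```python
import numpy as np

def normalise(m):
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    m = m.astype(float)
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)

def static_saliency(bottom_up, top_down, alpha=0.5):
    """Linear combination of normalised bottom-up and top-down attention maps."""
    return alpha * normalise(bottom_up) + (1.0 - alpha) * normalise(top_down)

def frame_difference(prev_frame, frame, threshold=0.1):
    """Boolean motion mask: pixels whose intensity changed by more than threshold."""
    diff = np.abs(frame.astype(float) - prev_frame.astype(float))
    return diff > threshold
```

Normalising each map before combining keeps one cue from dominating simply because of its numeric range, so `alpha` alone controls the balance between the two.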
Result
Standard datasets and practical videos are selected for the experiments. The experiments are performed on a MATLAB R2012a platform. The experimental results show that our spatial-temporal visual attention model demonstrates favorable robustness under various scenes, including indoor train station surveillance videos and outdoor scenes with swaying leaves. Our proposed model outperforms the visual attention model of Itti, the graph-based visual saliency model, the phase spectrum of quaternion Fourier transform model, and the motion channel model of Liu in terms of pedestrian detection. The proposed model achieves a 93% accuracy rate on the test video.
Conclusion
This paper proposes a novel pedestrian detection method based on the visual attention mechanism. A spatial-temporal visual attention model that uses low-level and semantic features is proposed to calculate the saliency map. Based on this model, pedestrian targets can be detected through focus-of-attention shifts. The experimental results verify the effectiveness of the proposed attention model for detecting pedestrians.