12 results for Tracker
Abstract:
We present a multimodal detection and tracking algorithm for sensors composed of a camera mounted between two microphones. Target localization is performed on color-based change detection in the video modality and on time difference of arrival (TDOA) estimation between the two microphones in the audio modality. The TDOA is computed by multiband generalized cross correlation (GCC) analysis. The estimated directions of arrival are then postprocessed using a Riccati Kalman filter. The visual and audio estimates are finally integrated, at the likelihood level, into a particle filter (PF) that uses a zero-order motion model, and a weighted probabilistic data association (WPDA) scheme. We demonstrate that the Kalman filtering (KF) improves the accuracy of the audio source localization and that the WPDA helps to enhance the tracking performance of sensor fusion in reverberant scenarios. The combination of multiband GCC, KF, and WPDA within the particle filtering framework improves the performance of the algorithm in noisy scenarios. We also show how the proposed audiovisual tracker summarizes the observed scene by generating metadata that can be transmitted to other network nodes instead of transmitting the raw images and can be used for very low bit rate communication. Moreover, the generated metadata can also be used to detect and monitor events of interest.
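The audio step described above (a time difference of arrival between two microphones, converted into a direction of arrival) can be illustrated with a minimal sketch. This is not the authors' multiband GCC: it swaps in a plain time-domain cross-correlation over a single band, and it assumes a far-field source and a speed of sound of 343 m/s. The function names `estimate_tdoa` and `tdoa_to_angle` are hypothetical.

```python
import math


def estimate_tdoa(left, right, sample_rate, max_lag):
    """Return the delay in seconds of `right` relative to `left`.

    Plain cross-correlation (an assumption; the abstract uses multiband GCC).
    A positive result means the sound reached the right microphone later.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, sample in enumerate(left):
            j = i + lag
            if 0 <= j < len(right):
                score += sample * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


def tdoa_to_angle(tdoa, mic_distance, speed_of_sound=343.0):
    """Convert a TDOA to a direction of arrival in radians (far-field model)."""
    ratio = speed_of_sound * tdoa / mic_distance
    ratio = max(-1.0, min(1.0, ratio))  # guard against noise pushing |ratio| > 1
    return math.asin(ratio)


if __name__ == "__main__":
    left = [0.0] * 32
    right = [0.0] * 32
    left[10] = 1.0   # impulse at the left microphone
    right[13] = 1.0  # same impulse, three samples later at the right one
    delay = estimate_tdoa(left, right, sample_rate=16000, max_lag=8)
    print(delay, tdoa_to_angle(delay, mic_distance=0.2))
```

In a full system the per-frame angles would then be smoothed (the abstract uses a Kalman filter) before being fused with the video estimates.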
Abstract:
In this paper, we present an inertial-sensor-based monitoring system for measuring the movement of human upper limbs. Two wearable inertial sensors are placed near the wrist and elbow joints, respectively. The measurement drift in segment orientation is dramatically reduced after a Kalman filter is applied to estimate inclinations using accelerations and turning rates from gyroscopes. Using premeasured lengths of the upper and lower arms, we compute the position of the wrist and elbow joints via a proposed kinematic model. Experimental results demonstrate that this new motion capture system, in comparison to an optical motion tracker, possesses an RMS position error of less than 0.009 m, with a drift of less than 0.005 m/s in five daily activities. In addition, the RMS angle error is less than 3°. This indicates that the proposed approach has performed well in terms of accuracy and reliability.
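The idea of recovering joint positions from segment orientations and premeasured segment lengths can be sketched as a two-link chain. The abstract does not give the kinematic model itself, so the following is a simplified planar illustration under stated assumptions: the shoulder is fixed at the origin, both angles are measured from the horizontal axis, and the function name `arm_positions` is hypothetical.

```python
import math


def arm_positions(upper_len, fore_len, upper_angle, fore_angle):
    """Return ((elbow_x, elbow_y), (wrist_x, wrist_y)) in metres.

    Planar two-link chain (an assumption, not the paper's model):
    the shoulder sits at the origin and each angle is the absolute
    inclination of its segment, in radians, from the horizontal axis.
    """
    elbow = (upper_len * math.cos(upper_angle),
             upper_len * math.sin(upper_angle))
    wrist = (elbow[0] + fore_len * math.cos(fore_angle),
             elbow[1] + fore_len * math.sin(fore_angle))
    return elbow, wrist


if __name__ == "__main__":
    # Upper arm horizontal, forearm pointing straight up.
    elbow, wrist = arm_positions(0.30, 0.25, 0.0, math.pi / 2)
    print(elbow, wrist)
```

Because the wrist position is built on top of the elbow position, any orientation drift in the upper-arm estimate propagates to the wrist, which is why drift reduction in the inclination estimates matters for this kind of model.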
Abstract:
We present a Spatio-temporal 2D Models Framework (STMF) for 2D-Pose tracking. Space and time are discretized and a mixture of probabilistic "local models" is learnt associating 2D Shapes and 2D Stick Figures. Those spatio-temporal models generalize well for a particular viewpoint and state of the tracked action but some spatio-temporal discontinuities can appear along a sequence, as a direct consequence of the discretization. To overcome the problem, we propose to apply a Rao-Blackwellized Particle Filter (RBPF) in the 2D-Pose eigenspace, thus interpolating unseen data between view-based clusters. The fitness to the images of the predicted 2D-Poses is evaluated combining our STMF with spatio-temporal constraints. A robust, fast and smooth human motion tracker is obtained by tracking only the few most important dimensions of the state space and by refining deterministically with our STMF.
Abstract:
In this paper, we consider the problem of tracking similar objects. We show how a mean field approach can be used to deal with interacting targets and we compare it with Markov Chain Monte Carlo (MCMC). Two mean field implementations are presented. The first one is more general and uses particle filtering. We discuss some simplifications of the base algorithm that reduce the computation time. The second one is based on suitable Gaussian approximations of probability densities that lead to a set of self-consistent equations for the means and covariances. These equations give the Kalman solution if there is no interaction. Experiments have been performed on two kinds of sequences. The first kind is composed of a single long sequence of twenty roaming ants and was previously analysed using MCMC. In this case, our mean field algorithms obtain substantially better results. The second kind corresponds to selected sequences of a football match in which the interaction avoids tracker coalescence in situations where independent trackers fail.
Abstract:
The application of Eye Tracking (ET) to the study of social functioning in Asperger Syndrome (AS) provides a unique perspective into social attention and cognition in this atypical neurodevelopmental group. Research in this area has shown how ET can capture social attention atypicalities within this group, such as diminished fixations to the eye region when viewing still images and movie clips; increased fixation to the mouth region; reduced face gaze. Issues exist, however, within the literature, where the type (static/dynamic) and the content (ecological validity) of stimuli used appear to affect the nature of the gaze patterns reported. Objectives: Our research aims were: using the same group of adolescents with AS, to compare their viewing patterns to age and IQ matched typically developing (TD) adolescents using stimuli considered to represent a hierarchy of ecological validity, building from static facial images; through a non-verbal movie clip; through verbal footage from real-life conversation; to eye tracking during real-life conversation. Methods: Eleven participants with AS were compared to 11 TD adolescents, matched for age and IQ. In Study 1, participants were shown 2 sets of static facial images (emotion faces, still images taken from the dynamic clips). In Study 2, three dynamic clips were presented (1 non-verbal movie clip, 2 verbal footage from real-life conversation). Study 3 was an exploratory study of eye tracking during a real-life conversation. Eye movements were recorded via a HiSpeed (240 Hz) SMI eye tracker fitted with chin and forehead rests. Various methods of analysis were used, including a paradigm for temporal analysis of the eye movement data. Results: Results from these studies confirmed that the atypical nature of social attention in AS was successfully captured by this paradigm. While results differed across stimulus sets, collectively they demonstrated how individuals with AS failed to focus on the most socially relevant aspects of the various stimuli presented. There was also evidence that the eye movements of the AS group were atypically affected by the presence of motion and verbal information. Discriminant Function Analysis demonstrated that the ecological validity of stimuli was an important factor in identifying atypicalities associated with AS, with more accurate classifications of AS and TD groups occurring for more naturalistic stimuli (dynamic rather than static). Graphical analysis of temporal sequences of eye movements revealed the atypical manner in which AS participants followed interactions within the dynamic stimuli. Taken together with data on the order of gaze patterns, more subtle atypicalities were detected in the gaze behaviour of AS individuals towards more socially pertinent regions of the dynamic stimuli. Conclusions: These results have potentially important implications for our understanding of deficits in Asperger Syndrome, as they show that, with more naturalistic stimuli, subtle differences in social attention can be detected that
Abstract:
We address the problem of multi-target tracking in realistic crowded conditions by introducing a novel dual-stage online tracking algorithm. The problem of data association between tracks and detections, based on appearance, is often complicated by partial occlusion. In the first stage, we address the issue of occlusion with a novel method of robust data association that can be used to compute the appearance similarity between tracks and detections without the need for explicit knowledge of the occluded regions. In the second stage, broken tracks are linked based on motion and appearance, using an online-learned linking model. The online-learned motion model for track linking uses the confident tracks from the first-stage tracker as training examples. The new approach has been tested on the Town Centre dataset and has performance comparable with the present state of the art.
Abstract:
Sparse-representation-based visual tracking approaches have attracted increasing interest in the community in recent years. The main idea is to linearly represent each target candidate using a set of target and trivial templates while imposing a sparsity constraint on the representation coefficients. After we obtain the coefficients using L1-norm minimization methods, the candidate with the lowest error, when it is reconstructed using only the target templates and the associated coefficients, is considered as the tracking result. In spite of the promising system performance widely reported, it is unclear whether the performance of these trackers can be maximised. In addition, the computational complexity caused by the dimensionality of the feature space limits these algorithms in real-time applications. In this paper, we propose a real-time visual tracking method based on structurally random projection and weighted least squares techniques. In particular, to enhance the discriminative capability of the tracker, we introduce background templates to the linear representation framework. To handle appearance variations over time, we relax the sparsity constraint using a weighted least squares (WLS) method to obtain the representation coefficients. To further reduce the computational complexity, structurally random projection is used to reduce the dimensionality of the feature space while preserving the pairwise distances between the data points in the feature space. Experimental results show that the proposed approach outperforms several state-of-the-art tracking methods.
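The weighted least squares step mentioned above (representing a candidate as a linear combination of templates) can be illustrated with a small sketch. It solves the weighted normal equations by Gaussian elimination in pure Python and then measures the reconstruction error. The choice of weights, the template contents, and the function names `weighted_least_squares` and `reconstruction_error` are assumptions for illustration; the abstract does not specify them, and the random projection step is omitted.

```python
def weighted_least_squares(templates, target, weights):
    """Return coefficients c minimising sum_i w_i * (y_i - sum_j c_j * T_j[i])**2.

    `templates` is a list of k vectors of length n, assumed linearly
    independent so that the k-by-k normal equations are solvable.
    """
    k, n = len(templates), len(target)
    # Build the normal equations A c = b with A = T W T^T and b = T W y.
    a = [[sum(weights[i] * templates[r][i] * templates[c][i] for i in range(n))
          for c in range(k)] for r in range(k)]
    b = [sum(weights[i] * templates[r][i] * target[i] for i in range(n))
         for r in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * k
    for r in range(k - 1, -1, -1):
        tail = sum(a[r][c] * coeffs[c] for c in range(r + 1, k))
        coeffs[r] = (b[r] - tail) / a[r][r]
    return coeffs


def reconstruction_error(templates, target, coeffs):
    """Squared error between the target and its template reconstruction."""
    return sum(
        (target[i] - sum(c * t[i] for c, t in zip(coeffs, templates))) ** 2
        for i in range(len(target))
    )


if __name__ == "__main__":
    templates = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
    target = [2.0, 3.0, 5.0]  # exactly 2 * templates[0] + 3 * templates[1]
    coeffs = weighted_least_squares(templates, target, [1.0, 2.0, 1.0])
    print(coeffs, reconstruction_error(templates, target, coeffs))
```

In a tracker of this kind, the same computation would be repeated for each candidate region, and the candidate with the lowest reconstruction error would be kept as the tracking result.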
Abstract:
In this paper we extend the minimum-cost network flow approach to multi-target tracking by incorporating a motion model, allowing the tracker to better cope with long-term occlusions and missed detections. In our new method, the tracking problem is solved iteratively: first, an initial tracking solution is found without the help of motion information. Given this initial set of tracklets, the motion at each detection is estimated and used to refine the tracking solution. Finally, special edges are added to the tracking graph, allowing a further revised tracking solution to be found, where distant tracklets may be linked based on motion similarity. Our system has been tested on the PETS S2.L1 and Oxford town-center sequences, outperforming the baseline system and achieving results comparable with the current state of the art.
Abstract:
Rationale
Previous research on attention bias in nondependent social drinkers has focused on adult samples with limited focus on the presence of attention bias for alcohol cues in adolescent social drinkers.
Objectives
The aim of this study was to examine the presence of alcohol attention bias in adolescents and the relationship of this cognitive bias to alcohol use and alcohol-related expectancies.
Methods
Attention bias in adolescent social drinkers and abstainers was measured using an eye tracker during exposure to alcohol and neutral cues. Questionnaires measured alcohol use and explicit alcohol expectancies.
Results
Adolescent social drinkers spent significantly more time fixating on alcohol stimuli compared to controls. Total fixation time on alcohol stimuli varied in accordance with level of alcohol consumption and was significantly associated with more positive alcohol expectancies. No evidence for automatic orienting to alcohol stimuli was found in adolescent social drinkers.
Conclusion
Attention bias in adolescent social drinkers appears to be underpinned by controlled attention suggesting that whilst participants in this study displayed alcohol attention bias comparable to that reported in adult studies, the bias has not developed to the point of automaticity. Initial fixations appeared to be driven by alternative attentional processes which are discussed further.
Abstract:
PURPOSE:
This study explored the gaze patterns of fully sighted and visually impaired subjects during the high-risk activity of crossing the street.
METHODS:
Gaze behavior of 12 fully sighted subjects, nine with visual impairment resulting from age-related macular degeneration and 12 with impairment resulting from glaucoma, was monitored using a portable eye tracker as they crossed at two unfamiliar intersections.
RESULTS:
All subject groups fixated primarily on vehicles and crossing elements but changed their fixation behavior as they moved from "walking to the curb" to "standing at the curb" and to "crossing the street." A comparison of where subjects fixated in the 4-second time period before crossing showed that the fully sighted who waited for the light to change fixated on the light, whereas the fully sighted who crossed early fixated primarily on vehicles. Visually impaired subjects, whether crossing early or waiting for the light, fixated primarily on vehicles.
CONCLUSIONS:
Vision status affects fixation allocation while performing the high-risk activity of street crossing. Crossing decision-making strategy corresponds to fixation behavior only for the fully sighted subjects.
Abstract:
PURPOSE: Subjects with significant peripheral field loss (PFL) self-report difficulty in street crossing. In this study, we compared the traffic gap judgment ability of fully sighted and PFL subjects to determine whether accuracy in identifying crossable gaps was adversely affected because of field loss. Moreover, we explored the contribution of visual and nonvisual factors to traffic gap judgment ability. METHODS: Eight subjects with significant PFL as a result of advanced retinitis pigmentosa or glaucoma with binocular visual field <20 degrees and five age-matched subjects with normal vision (NV) were recruited. All subjects were required to judge when they perceived it was safe to cross at a 2-way 4-lane street while they stood on the curb. Eye movements were recorded by an eye tracker as the subjects performed the decision task. Movies of the eye-on-scene were made offline and fixation patterns were classified as either relevant or irrelevant. Subjects' street-crossing behavior, habitual approach to street crossing, and perceived difficulties were assessed. RESULTS: Compared with NV subjects, the PFL subjects identified 12% fewer crossable gaps while making 23% more errors by identifying a gap as crossable when it was too short (p < 0.05). The differences in traffic gap judgment ability of the PFL subjects might be explained by the significantly smaller fixation area (p = 0.006) and fewer fixations distributed to the relevant tasks (p = 0.001). The subjects' habitual approach to street crossing and perceived difficulties in street crossing (r > 0.60) were significantly correlated with traffic gap judgment performance. CONCLUSIONS: As a consequence of significant field loss, limited visual information about the traffic environment can be acquired, resulting in significantly reduced performance in judging safe crossable gaps. This poor traffic gap judgment ability in the PFL subjects raises important concerns for their safety when attempting to cross the street.
Abstract:
Situational awareness is achieved naturally by the human senses of sight and hearing in combination. Automatic scene understanding aims at replicating this human ability using microphones and cameras in cooperation. In this paper, audio and video signals are fused and integrated at different levels of semantic abstraction. We detect and track a speaker who is relatively unconstrained, i.e., free to move indoors within an area larger than in comparable reported work, which is usually limited to round table meetings. The system is relatively simple, consisting of just 4 microphone pairs and a single camera. Results show that the overall multimodal tracker is more reliable than single modality systems, tolerating large occlusions and cross-talk. System evaluation is performed on both single and multi-modality tracking. The performance improvement given by the audio–video integration and fusion is quantified in terms of tracking precision and accuracy as well as speaker diarisation error rate and precision–recall (recognition). Improvements vs. the closest works are evaluated: 56% sound source localisation computational cost over an audio only system, 8% speaker diarisation error rate over an audio only speaker recognition unit and 36% on the precision–recall metric over an audio–video dominant speaker recognition method.