998 resultados para interactive tracking
Resumo:
We present a user supported tracking framework that combines automatic tracking with extended user input to create error free tracking results that are suitable for interactive video production. The goal of our approach is to keep the necessary user input as small as possible. In our framework, the user can select between different tracking algorithms - existing ones and new ones that are described in this paper. Furthermore, the user can automatically fuse the results of different tracking algorithms with our robust fusion approach. The tracked object can be marked in more than one frame, which can significantly improve the tracking result. After tracking, the user can validate the results in an easy way, thanks to the support of a powerful interpolation technique. The tracking results are iteratively improved until the complete track has been found. After the iterative editing process the tracking result of each object is stored in an interactive video file that can be loaded by our player for interactive videos.
Resumo:
Fully articulated hand tracking promises to enable fundamentally new interactions with virtual and augmented worlds, but the limited accuracy and efficiency of current systems has prevented widespread adoption. Today's dominant paradigm uses machine learning for initialization and recovery followed by iterative model-fitting optimization to achieve a detailed pose fit. We follow this paradigm, but make several changes to the model-fitting, namely using: (1) a more discriminative objective function; (2) a smooth-surface model that provides gradients for non-linear optimization; and (3) joint optimization over both the model pose and the correspondences between observed data points and the model surface. While each of these changes may actually increase the cost per fitting iteration, we find a compensating decrease in the number of iterations. Further, the wide basin of convergence means that fewer starting points are needed for successful model fitting. Our system runs in real-time on CPU only, which frees up the commonly over-burdened GPU for experience designers. The hand tracker is efficient enough to run on low-power devices such as tablets. We can track up to several meters from the camera to provide a large working volume for interaction, even using the noisy data from current-generation depth cameras. Quantitative assessments on standard datasets show that the new approach exceeds the state of the art in accuracy. Qualitative results take the form of live recordings of a range of interactive experiences enabled by this new approach.
Resumo:
Segmentation of novel or dynamic objects in a scene, often referred to as background sub- traction or foreground segmentation, is critical for robust high level computer vision applica- tions such as object tracking, object classifca- tion and recognition. However, automatic real- time segmentation for robotics still poses chal- lenges including global illumination changes, shadows, inter-re ections, colour similarity of foreground to background, and cluttered back- grounds. This paper introduces depth cues provided by structure from motion (SFM) for interactive segmentation to alleviate some of these challenges. In this paper, two prevailing interactive segmentation algorithms are com- pared; Lazysnapping [Li et al., 2004] and Grab- cut [Rother et al., 2004], both based on graph- cut optimisation [Boykov and Jolly, 2001]. The algorithms are extended to include depth cues rather than colour only as in the original pa- pers. Results show interactive segmentation based on colour and depth cues enhances the performance of segmentation with a lower er- ror with respect to ground truth.
Resumo:
This project investigates machine listening and improvisation in interactive music systems with the goal of improvising musically appropriate accompaniment to an audio stream in real-time. The input audio may be from a live musical ensemble, or playback of a recording for use by a DJ. I present a collection of robust techniques for machine listening in the context of Western popular dance music genres, and strategies of improvisation to allow for intuitive and musically salient interaction in live performance. The findings are embodied in a computational agent – the Jambot – capable of real-time musical improvisation in an ensemble setting. Conceptually the agent’s functionality is split into three domains: reception, analysis and generation. The project has resulted in novel techniques for addressing a range of issues in each of these domains. In the reception domain I present a novel suite of onset detection algorithms for real-time detection and classification of percussive onsets. This suite achieves reasonable discrimination between the kick, snare and hi-hat attacks of a standard drum-kit, with sufficiently low-latency to allow perceptually simultaneous triggering of accompaniment notes. The onset detection algorithms are designed to operate in the context of complex polyphonic audio. In the analysis domain I present novel beat-tracking and metre-induction algorithms that operate in real-time and are responsive to change in a live setting. I also present a novel analytic model of rhythm, based on musically salient features. This model informs the generation process, affording intuitive parametric control and allowing for the creation of a broad range of interesting rhythms. In the generation domain I present a novel improvisatory architecture drawing on theories of music perception, which provides a mechanism for the real-time generation of complementary accompaniment in an ensemble setting. All of these innovations have been combined into a computational agent – the Jambot, which is capable of producing improvised percussive musical accompaniment to an audio stream in real-time. I situate the architectural philosophy of the Jambot within contemporary debate regarding the nature of cognition and artificial intelligence, and argue for an approach to algorithmic improvisation that privileges the minimisation of cognitive dissonance in human-computer interaction. This thesis contains extensive written discussions of the Jambot and its component algorithms, along with some comparative analyses of aspects of its operation and aesthetic evaluations of its output. The accompanying CD contains the Jambot software, along with video documentation of experiments and performances conducted during the project.
Resumo:
In this paper, we propose an approach which attempts to solve the problem of surveillance event detection, assuming that we know the definition of the events. To facilitate the discussion, we first define two concepts. The event of interest refers to the event that the user requests the system to detect; and the background activities are any other events in the video corpus. This is an unsolved problem due to many factors as listed below: 1) Occlusions and clustering: The surveillance scenes which are of significant interest at locations such as airports, railway stations, shopping centers are often crowded, where occlusions and clustering of people are frequently encountered. This significantly affects the feature extraction step, and for instance, trajectories generated by object tracking algorithms are usually not robust under such a situation. 2) The requirement for real time detection: The system should process the video fast enough in both of the feature extraction and the detection step to facilitate real time operation. 3) Massive size of the training data set: Suppose there is an event that lasts for 1 minute in a video with a frame rate of 25fps, the number of frames for this events is 60X25 = 1500. If we want to have a training data set with many positive instances of the event, the video is likely to be very large in size (i.e. hundreds of thousands of frames or more). How to handle such a large data set is a problem frequently encountered in this application. 4) Difficulty in separating the event of interest from background activities: The events of interest often co-exist with a set of background activities. Temporal groundtruth typically very ambiguous, as it does not distinguish the event of interest from a wide range of co-existing background activities. However, it is not practical to annotate the locations of the events in large amounts of video data. This problem becomes more serious in the detection of multi-agent interactions, since the location of these events can often not be constrained to within a bounding box. 5) Challenges in determining the temporal boundaries of the events: An event can occur at any arbitrary time with an arbitrary duration. The temporal segmentation of events is difficult and ambiguous, and also affected by other factors such as occlusions.
Resumo:
This paper studies the development of a real-time stereovision system to track multiple infrared markers attached to a surgical instrument. Multiple stages of pipeline in field-programmable gate array (FPGA) are developed to recognize the targets in both left and right image planes and to give each target a unique label. The pipeline architecture includes a smoothing filter, an adaptive threshold module, a connected component labeling operation, and a centroid extraction process. A parallel distortion correction method is proposed and implemented in a dual-core DSP. A suitable kinematic model is established for the moving targets, and a novel set of parallel and interactive computation mechanisms is proposed to position and track the targets, which are carried out by a cross-computation method in a dual-core DSP. The proposed tracking system can track the 3-D coordinate, velocity, and acceleration of four infrared markers with a delay of 9.18 ms. Furthermore, it is capable of tracking a maximum of 110 infrared markers without frame dropping at a frame rate of 60 f/s. The accuracy of the proposed system can reach the scale of 0.37 mm RMS along the x- and y-directions and 0.45 mm RMS along the depth direction (the depth is from 0.8 to 0.45 m). The performance of the proposed system can meet the requirements of applications such as surgical navigation, which needs high real time and accuracy capability.
Resumo:
This paper studies the development of a real-time stereovision system to track multiple infrared markers attached to a surgical instrument. Multiple stages of pipeline in field-programmable gate array (FPGA) are developed to recognize the targets in both left and right image planes and to give each target a unique label. The pipeline architecture includes a smoothing filter, an adaptive threshold module, a connected component labeling operation, and a centroid extraction process. A parallel distortion correction method is proposed and implemented in a dual-core DSP. A suitable kinematic model is established for the moving targets, and a novel set of parallel and interactive computation mechanisms is proposed to position and track the targets, which are carried out by a cross-computation method in a dual-core DSP. The proposed tracking system can track the 3-D coordinate, velocity, and acceleration of four infrared markers with a delay of 9.18 ms. Furthermore, it is capable of tracking a maximum of 110 infrared markers without frame dropping at a frame rate of 60 f/s. The accuracy of the proposed system can reach the scale of 0.37 mm RMS along the x- and y-directions and 0.45 mm RMS along the depth direction (the depth is from 0.8 to 0.45 m). The performance of the proposed system can meet the requirements of applications such as surgical navigation, which needs high real time and accuracy capability.
Resumo:
Research in the field of sports performance is constantly developing new technology to help extract meaningful data to aid in understanding in a multitude of areas such as improving technical or motor performance. Video playback has previously been extensively used for exploring anticipatory behaviour. However, when using such systems, perception is not active. This loses key information that only emerges from the dynamics of the action unfolding over time and the active perception of the observer. Virtual reality (VR) may be used to overcome such issues. This paper presents the architecture and initial implementation of a novel VR cricket simulator, utilising state of the art motion capture technology (21 Vicon cameras capturing kinematic profile of elite bowlers) and emerging VR technology (Intersense IS-900 tracking combined with Qualisys Motion capture cameras with visual display via Sony Head Mounted Display HMZ-T1), applied in a cricket scenario to examine varying components of decision and action for cricket batters. This provided an experience with a high level of presence allowing for a real-time egocentric view-point to be presented to participants. Cyclical user-testing was carried out, utilisng both qualitative and quantitative approaches, with users reporting a positive experience in use of the system.
Resumo:
In this text, we present two stereo-based head tracking techniques along with a fast 3D model acquisition system. The first tracking technique is a robust implementation of stereo-based head tracking designed for interactive environments with uncontrolled lighting. We integrate fast face detection and drift reduction algorithms with a gradient-based stereo rigid motion tracking technique. Our system can automatically segment and track a user's head under large rotation and illumination variations. Precision and usability of this approach are compared with previous tracking methods for cursor control and target selection in both desktop and interactive room environments. The second tracking technique is designed to improve the robustness of head pose tracking for fast movements. Our iterative hybrid tracker combines constraints from the ICP (Iterative Closest Point) algorithm and normal flow constraint. This new technique is more precise for small movements and noisy depth than ICP alone, and more robust for large movements than the normal flow constraint alone. We present experiments which test the accuracy of our approach on sequences of real and synthetic stereo images. The 3D model acquisition system we present quickly aligns intensity and depth images, and reconstructs a textured 3D mesh. 3D views are registered with shape alignment based on our iterative hybrid tracker. We reconstruct the 3D model using a new Cubic Ray Projection merging algorithm which takes advantage of a novel data structure: the linked voxel space. We present experiments to test the accuracy of our approach on 3D face modelling using real-time stereo images.
Resumo:
The application of augmented reality (AR) technology for assembly guidance is a novel approach in the traditional manufacturing domain. In this paper, we propose an AR approach for assembly guidance using a virtual interactive tool that is intuitive and easy to use. The virtual interactive tool, termed the Virtual Interaction Panel (VirIP), involves two tasks: the design of the VirIPs and the real-time tracking of an interaction pen using a Restricted Coulomb Energy (RCE) neural network. The VirIP includes virtual buttons, which have meaningful assembly information that can be activated by an interaction pen during the assembly process. A visual assembly tree structure (VATS) is used for information management and assembly instructions retrieval in this AR environment. VATS is a hierarchical tree structure that can be easily maintained via a visual interface. This paper describes a typical scenario for assembly guidance using VirIP and VATS. The main characteristic of the proposed AR system is the intuitive way in which an assembly operator can easily step through a pre-defined assembly plan/sequence without the need of any sensor schemes or markers attached on the assembly components.
Resumo:
An aggregated farm-level index, the Agri-environmental Footprint Index (AFI), based on multiple criteria methods and representing a harmonised approach to evaluation of EU agri-environmental schemes is described. The index uses a common framework for the design and evaluation of policy that can be customised to locally relevant agri-environmental issues and circumstances. Evaluation can be strictly policy-focused, or broader and more holistic in that context-relevant assessment criteria that are not necessarily considered in the evaluated policy can nevertheless be incorporated. The Index structure is flexible, and can respond to diverse local needs. The process of Index construction is interactive, engaging farmers and other relevant stakeholders in a transparent decision-making process that can ensure acceptance of the outcome, help to forge an improved understanding of local agri-environmental priorities and potentially increase awareness of the critical role of farmers in environmental management. The structure of the AFI facilitates post-evaluation analysis of relative performance in different dimensions of the agri-environment, permitting identification of current strengths and weaknesses, and enabling future improvement in policy design. Quantification of the environmental impact of agriculture beyond the stated aims of policy using an 'unweighted' form of the AFI has potential as the basis of an ongoing system of environmental audit within a specified agricultural context. (C) 2009 Elsevier Ltd. All rights reserved.
Resumo:
We introduce a classification-based approach to finding occluding texture boundaries. The classifier is composed of a set of weak learners, which operate on image intensity discriminative features that are defined on small patches and are fast to compute. A database that is designed to simulate digitized occluding contours of textured objects in natural images is used to train the weak learners. The trained classifier score is then used to obtain a probabilistic model for the presence of texture transitions, which can readily be used for line search texture boundary detection in the direction normal to an initial boundary estimate. This method is fast and therefore suitable for real-time and interactive applications. It works as a robust estimator, which requires a ribbon-like search region and can handle complex texture structures without requiring a large number of observations. We demonstrate results both in the context of interactive 2D delineation and of fast 3D tracking and compare its performance with other existing methods for line search boundary detection.
Resumo:
Visual telepresence seeks to extend existing teleoperative capability by supplying the operator with a 3D interactive view of the remote environment. This is achieved through the use of a stereo camera platform which, through appropriate 3D display devices, provides a distinct image to each eye of the operator, and which is slaved directly from the operator's head and eye movements. However, the resolution within current head mounted displays remains poor, thereby reducing the operator's visual acuity. This paper reports on the feasibility of incorporation of eye tracking to increase resolution and investigates the stability and control issues for such a system. Continuous domain and discrete simulations are presented which indicates that eye tracking provides a stable feedback loop for tracking applications, though some empirical testing (currently being initiated) of such a system will be required to overcome indicated stability problems associated with micro saccades of the human operator.
Resumo:
The incorporation of numerical weather predictions (NWP) into a flood warning system can increase forecast lead times from a few hours to a few days. A single NWP forecast from a single forecast centre, however, is insufficient as it involves considerable non-predictable uncertainties and can lead to a high number of false or missed warnings. Weather forecasts using multiple NWPs from various weather centres implemented on catchment hydrology can provide significantly improved early flood warning. The availability of global ensemble weather prediction systems through the ‘THORPEX Interactive Grand Global Ensemble’ (TIGGE) offers a new opportunity for the development of state-of-the-art early flood forecasting systems. This paper presents a case study using the TIGGE database for flood warning on a meso-scale catchment (4062 km2) located in the Midlands region of England. For the first time, a research attempt is made to set up a coupled atmospheric-hydrologic-hydraulic cascade system driven by the TIGGE ensemble forecasts. A probabilistic discharge and flood inundation forecast is provided as the end product to study the potential benefits of using the TIGGE database. The study shows that precipitation input uncertainties dominate and propagate through the cascade chain. The current NWPs fall short of representing the spatial precipitation variability on such a comparatively small catchment, which indicates need to improve NWPs resolution and/or disaggregating techniques to narrow down the spatial gap between meteorology and hydrology. The spread of discharge forecasts varies from centre to centre, but it is generally large and implies a significant level of uncertainties. Nevertheless, the results show the TIGGE database is a promising tool to forecast flood inundation, comparable with that driven by raingauge observation.