843 results for scene
Abstract:
A desktop tool for replay and analysis of gaze-enhanced multiparty virtual collaborative sessions is described. We linked three CAVE(TM)-like environments, creating a multiparty collaborative virtual space where avatars are animated with 3D gaze as well as head and hand motions in real time. Log files are recorded for subsequent playback and analysis using the proposed software tool. During replay, the user can rotate the viewpoint and navigate in the simulated 3D scene. The playback mechanism relies on multiple distributed log files captured at every site. This structure enables an observer to experience latencies of movement and information transfer for every site, which is important for conversation analysis. Playback uses an event-replay algorithm, modified to allow fast traversal of the scene by selective rendering of nodes, and to simulate fast random access. The tool's analysis module can show each participant's 3D gaze points and areas where gaze has been concentrated.
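The playback idea above (several per-site logs replayed as one timeline) can be illustrated with a minimal Python sketch. The record layout `(timestamp, site, node, pose)` and the function names `merge_logs` and `state_at` are assumptions made for illustration; the abstract does not describe the tool's actual log format or its modified event-replay algorithm.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

Pose = Tuple[float, float, float]
# Hypothetical record layout: (timestamp in seconds, site, scene node, pose).
Event = Tuple[float, str, str, Pose]


def merge_logs(logs: Iterable[List[Event]]) -> List[Event]:
    """Merge per-site logs (each already sorted by time) into one timeline."""
    return list(heapq.merge(*logs, key=lambda event: event[0]))


def state_at(timeline: List[Event], t: float) -> Dict[Tuple[str, str], Pose]:
    """Replay events up to time t, keeping only the latest pose per node."""
    state: Dict[Tuple[str, str], Pose] = {}
    for timestamp, site, node, pose in timeline:
        if timestamp > t:
            break
        state[(site, node)] = pose
    return state


# Illustrative data, not taken from the paper.
site_a = [(0.0, "A", "head", (0.0, 0.0, 0.0)), (2.0, "A", "head", (1.0, 0.0, 0.0))]
site_b = [(1.0, "B", "head", (5.0, 5.0, 5.0))]
timeline = merge_logs([site_a, site_b])
```

Seeking to a time by replaying from the start is the simplest form of random access; a real tool would presumably add snapshots or indexing to make this fast.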
Abstract:
Our eyes are input sensors which provide our brains with streams of visual data. They have evolved to be extremely efficient, and they constantly dart to and fro to rapidly build up a picture of the salient entities in a viewed scene. These actions are almost subconscious. However, they can provide telling signs of how the brain is decoding the visuals and can indicate emotional responses before the viewer becomes aware of them. In this paper we discuss a method of tracking a user's eye movements and using these to calculate their gaze within an immersive virtual environment. We investigate how these gaze patterns can be captured and used to identify viewed virtual objects, and discuss how this can be used as a natural method of interacting with the virtual environment. We describe a flexible tool that has been developed to achieve this, and detail initial validating applications that prove the concept.
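One common way to identify a viewed virtual object from a gaze direction is to cast a ray and take the nearest object it hits. The Python sketch below assumes each object is approximated by a bounding sphere; the function name `first_hit` and the scene data are hypothetical, and the abstract does not state which intersection test the authors use.

```python
import math
from typing import List, Optional, Tuple

Vec = Tuple[float, float, float]
Sphere = Tuple[str, Vec, float]  # (object name, centre, radius), an assumed layout


def first_hit(origin: Vec, direction: Vec, spheres: List[Sphere]) -> Optional[str]:
    """Return the name of the nearest sphere hit by the gaze ray, or None."""
    norm = math.sqrt(sum(c * c for c in direction))
    d = tuple(c / norm for c in direction)
    best_name, best_t = None, math.inf
    for name, centre, radius in spheres:
        oc = tuple(c - o for c, o in zip(centre, origin))
        # Distance along the ray to the point closest to the sphere centre.
        t_mid = sum(a * b for a, b in zip(oc, d))
        # Squared distance from the sphere centre to the ray.
        dist2 = sum(a * a for a in oc) - t_mid * t_mid
        if t_mid < 0 or dist2 > radius * radius:
            continue  # sphere centre is behind the viewer, or the ray misses
        t_hit = t_mid - math.sqrt(radius * radius - dist2)
        if t_hit < best_t:
            best_name, best_t = name, t_hit
    return best_name


# Illustrative scene, not taken from the paper.
scene = [
    ("lamp", (0.0, 0.0, 5.0), 1.0),
    ("door", (0.0, 0.0, 10.0), 1.0),
    ("chair", (5.0, 0.0, 5.0), 1.0),
]
```

Looking straight along the z axis in this scene selects the lamp, because it is hit before the door behind it.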
Abstract:
In collaborative situations, eye gaze is a critical element of behavior which supports and fulfills many activities and roles. In current computer-supported collaboration systems, eye gaze is poorly supported. Even in a state-of-the-art video conferencing system such as the Access Grid, although one can see the face of the user, much of the communicative power of eye gaze is lost. This article gives an overview of some preliminary work that looks towards integrating eye gaze into an immersive collaborative virtual environment and assessing the impact that this would have on interaction between the users of such a system. Three experiments were conducted to assess the efficacy of eye gaze within immersive virtual environments. In each experiment, subjects observed on a large screen the eye-gaze behavior of an avatar. The eye-gaze behavior of that avatar had previously been recorded from a user with a head-mounted eye tracker. The first experiment assessed the difference between users' abilities to judge what objects an avatar is looking at when only head gaze is displayed and when both eye- and head-gaze data are displayed. The results from the experiment show that eye gaze is of vital importance to the subjects correctly identifying what a person is looking at in an immersive virtual environment. The second experiment examined whether a monocular or binocular eye tracker would be required. This was examined by testing subjects' ability to identify where an avatar was looking from its eye direction alone, or from eye direction combined with convergence. This experiment showed that convergence had a significant impact on the subjects' ability to identify where the avatar was looking. The final experiment looked at the effects of stereo and mono viewing of the scene, with the subjects being asked to identify where the avatar was looking. This experiment showed that there was no difference in the subjects' ability to detect where the avatar was gazing. This is followed by a description of how the eye-tracking system has been integrated into an immersive collaborative virtual environment and some preliminary results from the use of such a system.
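Convergence, as used in the second experiment, means that the two eyes' lines of sight meet at the point being looked at. A rough Python sketch of recovering that point from binocular data follows: it takes the midpoint of the shortest segment between the two gaze lines. The function name `gaze_point` and the eye positions are illustrative assumptions, not the method described in the article.

```python
from typing import Optional, Tuple

Vec = Tuple[float, float, float]


def _dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def gaze_point(left_eye: Vec, left_dir: Vec,
               right_eye: Vec, right_dir: Vec,
               eps: float = 1e-12) -> Optional[Vec]:
    """Midpoint of the shortest segment between two gaze lines.

    Returns None when the lines are (near) parallel, i.e. no convergence.
    """
    w = tuple(l - r for l, r in zip(left_eye, right_eye))
    a = _dot(left_dir, left_dir)
    b = _dot(left_dir, right_dir)
    c = _dot(right_dir, right_dir)
    d = _dot(left_dir, w)
    e = _dot(right_dir, w)
    denom = a * c - b * b
    if denom < eps:
        return None
    s = (b * e - c * d) / denom  # parameter along the left gaze line
    t = (a * e - b * d) / denom  # parameter along the right gaze line
    p_left = tuple(o + s * v for o, v in zip(left_eye, left_dir))
    p_right = tuple(o + t * v for o, v in zip(right_eye, right_dir))
    return tuple((x + y) / 2.0 for x, y in zip(p_left, p_right))


# Illustrative setup: eyes 6 cm apart, both looking at a point 1 m ahead.
point = gaze_point((-0.03, 0.0, 0.0), (0.03, 0.0, 1.0),
                   (0.03, 0.0, 0.0), (-0.03, 0.0, 1.0))
```

A monocular tracker yields only one of the two lines, so depth along the gaze direction cannot be recovered this way.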
Abstract:
This paper describes a real-time multi-camera surveillance system that can be applied to a range of application domains. This integrated system is designed to observe crowded scenes and has mechanisms to improve tracking of objects that are in close proximity. The four component modules described in this paper are (i) motion detection using a layered background model, (ii) object tracking based on local appearance, (iii) hierarchical object recognition, and (iv) fused multisensor object tracking using multiple features and geometric constraints. This integrated approach to complex scene tracking is validated against a number of representative real-world scenarios to show that robust, real-time analysis can be performed. Copyright (C) 2007 Hindawi Publishing Corporation. All rights reserved.
Abstract:
A new class of shape features for region classification and high-level recognition is introduced. The novel Randomised Region Ray (RRR) features can be used to train binary decision trees for object category classification using an abstract representation of the scene. In particular, we address the problem of human detection using an over-segmented input image. We therefore do not rely on pixel values for training; instead, we design and train specialised classifiers on the sparse set of semantic regions which compose the image. Thanks to the abstract nature of the input, the trained classifier has the potential to be fast and applicable to extreme imagery conditions. We demonstrate and evaluate its performance in people detection using a pedestrian dataset.
Abstract:
Calibrated cameras are an extremely useful resource for computer vision scenarios. Typically, cameras are calibrated through calibration targets, measurements of the observed scene, or self-calibrated through features matched between cameras with overlapping fields of view. This paper considers an approach to camera calibration based on observations of a pedestrian and compares the resulting calibration to a commonly used approach requiring that measurements be made of the scene.
Abstract:
This work presents a new method for activity extraction and reporting from video based on the aggregation of fuzzy relations. Trajectory clustering is first employed, mainly to discover the points of entry and exit of mobiles appearing in the scene. In a second step, proximity relations between the resulting clusters of detected mobiles and contextual elements from the scene are modeled using fuzzy relations. These can then be aggregated using typical soft-computing algebra. A clustering algorithm based on calculating the transitive closure of the fuzzy relations builds the structure of the scene and characterises the different ongoing activities in the scene. Discovered activity zones can be reported as activity maps with different granularities thanks to the analysis of the transitive closure matrix. Taking advantage of the soft relation properties, activity zones and related activities can be labeled in a more human-like language. We present results obtained on real videos corresponding to apron monitoring at Toulouse airport in France.
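The link between a transitive closure matrix and maps of different granularity can be illustrated with a small Python sketch. It assumes the standard max-min composition and a reflexive, symmetric fuzzy relation; the abstract does not say which composition or aggregation operators the authors use, and the function names and the 3x3 relation are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def max_min_compose(r: Matrix, s: Matrix) -> Matrix:
    """Max-min composition of two square fuzzy relations."""
    n = len(r)
    return [[max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(r: Matrix) -> Matrix:
    """Iterate R <- max(R, R o R) until the relation stops changing."""
    closure = [row[:] for row in r]
    while True:
        composed = max_min_compose(closure, closure)
        updated = [[max(a, b) for a, b in zip(row_c, row_n)]
                   for row_c, row_n in zip(closure, composed)]
        if updated == closure:
            return closure
        closure = updated


def alpha_clusters(closure: Matrix, alpha: float) -> List[List[int]]:
    """Group indices whose closure membership is at least alpha."""
    unassigned = list(range(len(closure)))
    clusters = []
    while unassigned:
        seed = unassigned[0]
        members = [seed] + [j for j in unassigned[1:] if closure[seed][j] >= alpha]
        clusters.append(members)
        unassigned = [j for j in unassigned if j not in members]
    return clusters


# Illustrative proximity relation between three zones (not data from the paper).
relation = [[1.0, 0.8, 0.1],
            [0.8, 1.0, 0.2],
            [0.1, 0.2, 1.0]]
closure = transitive_closure(relation)
```

Lowering the threshold `alpha` merges clusters, which is one way to read "different granularities": a high threshold keeps zones 0 and 1 together and zone 2 apart, while a low threshold groups all three.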
Abstract:
The Rank Forum on Vitamin D was held on 2nd and 3rd July 2009 at the University of Surrey, Guildford, UK. The workshop consisted of a series of scene-setting presentations to address the current issues and challenges concerning vitamin D and health, and included an open discussion focusing on the identification of the concentrations of serum 25-hydroxyvitamin D (25(OH)D) (a marker of vitamin D status) that may be regarded as optimal, and the implications this process may have in the setting of future dietary reference values for vitamin D in the UK. The Forum agreed that it is desirable for all of the population to have a serum 25(OH)D concentration above 25 nmol/l, but it discussed some uncertainty about the strength of evidence for the need to aim for substantially higher concentrations (25(OH)D concentrations > 75 nmol/l). Any discussion of ‘optimal’ concentration of serum 25(OH)D needs to define ‘optimal’ with care since it is important to consider the normal distribution of requirements and the vitamin D needs for a wide range of outcomes. Current UK reference values concentrate on the requirements of particular subgroups of the population; this differs from the approaches used in other European countries where a wider range of age groups tends to be covered. With the re-emergence of rickets and the public health burden of low vitamin D status being already apparent, there is a need for urgent action from policy makers and risk managers. The Forum highlighted concerns regarding the failure of implementation of existing strategies in the UK for achieving current vitamin D recommendations.
Abstract:
In this paper we report the degree of reliability of image sequences taken by off-the-shelf TV cameras for modeling camera rotation and reconstructing 3D structure using computer vision techniques. This is done in spite of the fact that computer vision systems usually use imaging devices that are specifically designed for human vision. Our scenario consists of a static scene and a mobile camera moving through the scene. The scene is any long axial building dominated by features along the three principal orientations and with at least one wall containing prominent repetitive planar features such as doors, windows, bricks, etc. The camera is an ordinary commercial camcorder moving along the axis of the scene and is allowed to rotate freely within the range +/- 10 degrees in all directions. This makes it possible for the camera to be held by a walking non-professional cameraman with a normal gait, or to be mounted on a mobile robot. The system has been tested successfully on sequences of images of a variety of structured, but fairly cluttered, scenes taken by different walking cameramen. The potential application areas of the system include medicine, robotics and photogrammetry.
Abstract:
This paper presents an enhanced hypothesis verification strategy for 3D object recognition. A new learning methodology is presented which integrates the traditional dichotomic object-centred and appearance-based representations in computer vision giving improved hypothesis verification under iconic matching. The "appearance" of a 3D object is learnt using an eigenspace representation obtained as it is tracked through a scene. The feature representation implicitly models the background and the objects observed enabling the segmentation of the objects from the background. The method is shown to enhance model-based tracking, particularly in the presence of clutter and occlusion, and to provide a basis for identification. The unified approach is discussed in the context of the traffic surveillance domain. The approach is demonstrated on real-world image sequences and compared to previous (edge-based) iconic evaluation techniques.
Abstract:
This paper presents recent developments to a vision-based traffic surveillance system which relies extensively on the use of geometrical and scene context. Firstly, a highly parametrised 3-D model is reported, able to adopt the shape of a wide variety of different classes of vehicle (e.g. cars, vans, buses, etc.), and its subsequent specialisation to a generic car class which accounts for commonly encountered types of car (including saloon, hatchback and estate cars). Sample data collected from video images, by means of an interactive tool, have been subjected to principal component analysis (PCA) to define a deformable model having 6 degrees of freedom. Secondly, a new pose refinement technique using “active” models is described, able to recover both the pose of a rigid object and the structure of a deformable model; its performance is assessed in comparison with previously reported “passive” model-based techniques in the context of traffic surveillance. The new method is more stable, and requires fewer iterations, especially when the number of free parameters increases, but shows somewhat poorer convergence. Typical applications for this work include robot surveillance and navigation tasks.
Abstract:
Retinal blurring resulting from the human eye's depth of focus has been shown to assist visual perception. Infinite focal depth within stereoscopically displayed virtual environments may cause undesirable effects; for instance, objects positioned at a distance in front of or behind the observer's fixation point will be perceived in sharp focus with large disparities, thereby causing diplopia. Although published research on incorporation of synthetically generated Depth of Field (DoF) suggests that this might act as an enhancement to perceived image quality, no quantitative evidence of perceptual performance gains exists. This may be due to the difficulty of dynamic generation of synthetic DoF where focal distance is actively linked to fixation distance. In this paper, such a system is described. A desktop stereographic display is used to project a virtual scene in which synthetically generated DoF is actively controlled from vergence-derived distance. A performance evaluation experiment was undertaken on this system, in which subjects carried out observations in a spatially complex virtual environment. The virtual environment consisted of components interconnected by pipes on a distractive background. The subject was tasked with making an observation based on the connectivity of the components. The effects of focal depth variation in static and actively controlled focal distance conditions were investigated. Results and analysis are presented which show that performance gains may be achieved by addition of synthetic DoF. The merits of the application of synthetic DoF are discussed.
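To make "DoF actively controlled from vergence-derived distance" concrete, the Python sketch below derives a fixation distance from the vergence angle by simple triangulation and then computes a blur-circle diameter with the thin-lens circle-of-confusion formula. Both the symmetric-fixation geometry and the thin-lens model are assumptions chosen for illustration, as are the function names and the eye-like parameter values; the abstract does not state how the described system generates its blur.

```python
import math


def fixation_distance(ipd_m: float, vergence_rad: float) -> float:
    """Distance to the fixation point for symmetric vergence (assumed geometry)."""
    return (ipd_m / 2.0) / math.tan(vergence_rad / 2.0)


def blur_diameter(aperture_m: float, focal_length_m: float,
                  focus_dist_m: float, object_dist_m: float) -> float:
    """Thin-lens circle of confusion for an object away from the focus distance."""
    return (aperture_m * focal_length_m * abs(object_dist_m - focus_dist_m)
            / (object_dist_m * (focus_dist_m - focal_length_m)))


# Illustrative values only: 64 mm interpupillary distance, 4 mm pupil,
# 17 mm focal length, observer fixating a point 1 m away.
ipd = 0.064
vergence = 2.0 * math.atan((ipd / 2.0) / 1.0)
focus = fixation_distance(ipd, vergence)
blur_near = blur_diameter(0.004, 0.017, focus, 2.0)
blur_far = blur_diameter(0.004, 0.017, focus, 4.0)
```

Objects at the fixation distance get zero blur, and blur grows as an object moves further from it, which is the behaviour a synthetic DoF renderer would need to reproduce per frame as the vergence angle changes.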
Abstract:
The objective of a Visual Telepresence System is to provide the operator with a high fidelity image from a remote stereo camera pair linked to a pan/tilt device such that the operator may reorient the camera position by use of head movement. Systems such as these, which utilise virtual reality style helmet mounted displays, have a number of limitations. The geometry of the camera positions and of the displays is generally fixed and is most suitable only for viewing elements of a scene at a particular distance. To address such limitations, a prototype system has been developed where the geometry of the displays and cameras is dynamically controlled by the eye movement of the operator. This paper explores why it is necessary to actively adjust the display system as well as the cameras and justifies the use of mechanical adjustment of the displays as an alternative to adjustment by electronic or image processing methods. The electronic and mechanical design is described, including optical arrangements and control algorithms. The performance and accuracy of the system are assessed with respect to eye movement.
Abstract:
A near real-time flood detection algorithm giving a synoptic overview of the extent of flooding in both urban and rural areas, and capable of working during night-time and day-time even if cloud was present, could be a useful tool for operational flood relief management. The paper describes an automatic algorithm using high resolution Synthetic Aperture Radar (SAR) satellite data that builds on existing approaches, including the use of image segmentation techniques prior to object classification to cope with the very large number of pixels in these scenes. Flood detection in urban areas is guided by the flood extent derived in adjacent rural areas. The algorithm assumes that high resolution topographic height data are available for at least the urban areas of the scene, in order that a SAR simulator may be used to estimate areas of radar shadow and layover. The algorithm proved capable of detecting flooding in rural areas using TerraSAR-X with good accuracy, and in urban areas with reasonable accuracy. The accuracy was reduced in urban areas partly because of TerraSAR-X’s restricted visibility of the ground surface due to radar shadow and layover.
Abstract:
A near real-time flood detection algorithm giving a synoptic overview of the extent of flooding in both urban and rural areas, and capable of working during night-time and day-time even if cloud was present, could be a useful tool for operational flood relief management and flood forecasting. The paper describes an automatic algorithm using high resolution Synthetic Aperture Radar (SAR) satellite data that assumes that high resolution topographic height data are available for at least the urban areas of the scene, in order that a SAR simulator may be used to estimate areas of radar shadow and layover. The algorithm proved capable of detecting flooding in rural areas using TerraSAR-X with good accuracy, and in urban areas with reasonable accuracy.