941 results for Enunciation scene
Abstract:
This thesis investigates the problem of estimating the three-dimensional structure of a scene from a sequence of images. Structure information is recovered from images continuously using shading, motion or other visual mechanisms. A Kalman filter represents structure in a dense depth map. With each new image, the filter first updates the current depth map by a minimum variance estimate that best fits the new image data and the previous estimate. Then the structure estimate is predicted for the next time step by a transformation that accounts for relative camera motion. Experimental evaluation shows the significant improvement in quality and computation time that can be achieved using this technique.
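Per pixel, the update step described above reduces to a standard minimum-variance (Kalman) fusion of the predicted depth with the new measurement. The sketch below shows only that core arithmetic under an assumed scalar variance per pixel; the names are illustrative, and the prediction/warping step driven by camera motion is omitted.

```python
import numpy as np

def fuse_depth(depth_pred, var_pred, depth_meas, var_meas):
    """Minimum-variance (Kalman) fusion of a predicted depth map with a new
    per-pixel depth measurement. All arrays share one shape; variances are
    assumed scalar per pixel, a simplification of the dense-map filter."""
    gain = var_pred / (var_pred + var_meas)      # Kalman gain per pixel
    depth_post = depth_pred + gain * (depth_meas - depth_pred)
    var_post = (1.0 - gain) * var_pred           # posterior uncertainty shrinks
    return depth_post, var_post
```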
Abstract:
Techniques for robust 2D model-based object recognition in the presence of sensor error, suitable for parallel implementation, are studied. Models and scene data are represented as local geometric features, and the robust hypothesis of feature matchings and transformations is considered. Bounds on the error in image feature geometry are assumed, constraining the possible matchings and transformations. Transformation sampling is introduced as a simple, robust, polynomial-time, and highly parallel method of searching the space of transformations to hypothesize feature matchings. Key to the approach is that error in image feature measurement is explicitly accounted for. A Connection Machine implementation and experiments on real images are presented.
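As a rough illustration of transformation sampling, the hedged sketch below scores a grid of 2D rigid transformations by how many model features land within an assumed error bound of some scene feature. The grid, names, and scoring rule are illustrative, not the paper's actual sampling scheme.

```python
import itertools
import numpy as np

def transformation_sampling(model_pts, scene_pts, angles, shifts, tol):
    """Sample rotation/translation hypotheses on a grid; score each by the
    number of model features falling within `tol` of a scene feature.
    model_pts, scene_pts: (N, 2) and (M, 2) arrays of feature positions."""
    best_score, best_pose = -1, None
    for theta, (tx, ty) in itertools.product(angles, shifts):
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])
        warped = model_pts @ R.T + np.array([tx, ty])
        # pairwise distances between warped model features and scene features
        d = np.linalg.norm(warped[:, None, :] - scene_pts[None, :, :], axis=2)
        score = int((d.min(axis=1) <= tol).sum())
        if score > best_score:
            best_score, best_pose = score, (theta, tx, ty)
    return best_pose, best_score
```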
Abstract:
The problems under consideration center on the interpretation of binocular stereo disparity. In particular, the goal is to establish a set of mappings from stereo disparity to the corresponding three-dimensional scene geometry. An analysis has been developed that shows how disparity information can be interpreted in terms of three-dimensional scene properties, such as surface depth, discontinuities, and orientation. These theoretical developments have been embodied in a set of computer algorithms for the recovery of scene geometry from input stereo disparity. The results of applying these algorithms to several disparity maps are presented. Comparisons are made to the interpretation of stereo disparity by biological systems.
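The most basic of these mappings, from disparity to surface depth, follows from pinhole stereo geometry, Z = fB/d. A minimal sketch follows; the parameter names and the crude discontinuity heuristic are illustrative assumptions, not the thesis's algorithms.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Textbook pinhole mapping from stereo disparity to depth, Z = f*B/d.
    `eps` guards against division by zero at unmatched pixels."""
    return focal_px * baseline_m / np.maximum(disparity, eps)

def depth_discontinuities(depth, jump=0.5):
    """Flag pixels where depth changes faster than `jump` per pixel -- a
    crude stand-in for the discontinuity mapping discussed above."""
    gy, gx = np.gradient(depth)
    return np.hypot(gx, gy) > jump
```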
Abstract:
A computer may gather a great deal of information from its environment optically or graphically. A scene, as seen for instance by a TV camera or in a picture, can be transformed into a symbolic description of points and lines or surfaces. This thesis describes several programs, written in the language CONVERT, for the analysis of such descriptions in order to recognize, differentiate, and identify desired objects or classes of objects in the scene. Examples are given in each case. Although the recognition may involve projections of 2-dim and 3-dim objects, we do not deal with stereoscopic information. One of our programs (Polybrick) identifies parallelepipeds in a scene which may contain partially hidden bodies and non-parallelepipedic objects. The program TD works mainly with 2-dimensional figures, although under certain conditions it successfully identifies 3-dim objects. Overlapping objects are identified when they are transparent. A third program, DT, works with 3-dim and 2-dim objects, but does not identify objects which are not completely seen. Important restrictions and assumptions are: (a) the input is assumed perfect (noiseless) and in a symbolic format; (b) no perspective deformation is considered. A portion of this thesis is devoted to the study of models (symbolic representations) of the objects we want to identify; different schemes, some of them already in use, are discussed. Focusing on the more general problem of identifying objects that substantially overlap, we propose some schemes for their recognition and analyze some of the problems that arise.
Abstract:
What are the characteristics of the process by which an intent is transformed into a plan and then a program? How is a program debugged? This paper analyzes these questions in the context of understanding simple turtle programs. To understand and debug a program, a description of its intent is required. For turtle programs, this is a model of the desired geometric picture; a picture language is provided for this purpose. Annotation is necessary for documenting the performance of a program in such a way that the system can examine the procedure's behavior as well as consider hypothetical lines of development due to tentative debugging edits. A descriptive framework representing both causality and teleology is developed. To understand the relation between program and model, the plan must be known. The plan is a description of the methodology for accomplishing the model. Concepts are explicated for translating the global intent of a declarative model into the local imperative code of a program. Given the plan, model, and program, the system can interpret the picture and recognize inconsistencies. The description of the discrepancies between the picture actually produced by the program and the intended scene is the input to a debugging system. Repair of the program is based on a combination of general debugging techniques and specific fixing knowledge associated with the geometric model primitives. In both analyzing the plan and repairing the bugs, the system exhibits an interesting style of analysis: it is capable of debugging itself and reformulating its analysis of a plan or bug in response to self-criticism. In this fashion, it can qualitatively reformulate its theory of the program or error to account for surprises or anomalies.
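To make the model/program/discrepancy loop concrete, here is a hedged toy in Python rather than Logo: a turtle program intended to draw a closed equilateral triangle, carrying the classic interior-angle bug that a debugger of the kind described above would flag by checking the produced picture against the declarative model ("the figure is closed"). Everything here is an illustrative reconstruction, not the thesis system.

```python
import math

def triangle(turn_deg, side=100):
    """Toy turtle program meant to draw an equilateral triangle; returns
    the sequence of pen positions instead of drawing."""
    x, y, heading, pts = 0.0, 0.0, 0.0, [(0.0, 0.0)]
    for _ in range(3):
        x += side * math.cos(math.radians(heading))
        y += side * math.sin(math.radians(heading))
        pts.append((round(x, 6), round(y, 6)))
        heading += turn_deg
    return pts

# Model (intent): the figure should be closed. Comparing picture to model
# exposes the discrepancy and suggests the repair (exterior turn of 120).
buggy = triangle(60)    # interior angle used where the exterior turn belongs
fixed = triangle(120)
print("closed?", buggy[0] == buggy[-1], fixed[0] == fixed[-1])  # False True
```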
Abstract:
A system for visual recognition is described, with implications for the general problem of representation of knowledge to assist control. The immediate objective is a computer system that will recognize objects in a visual scene, specifically hammers. The computer receives an array of light intensities from a device like a television camera. It is to locate and identify the hammer if one is present. The computer must produce from the numerical "sensory data" a symbolic description that constitutes its perception of the scene. Of primary concern is the control of the recognition process. Control decisions should be guided by the partial results obtained on the scene. If a hammer handle is observed this should suggest that the handle is part of a hammer and advise where to look for the hammer head. The particular knowledge that a handle has been found combines with general knowledge about hammers to influence the recognition process. This use of knowledge to direct control is denoted here by the term "active knowledge". A descriptive formalism is presented for visual knowledge which identifies the relationships relevant to the active use of the knowledge. A control structure is provided which can apply knowledge organized in this fashion actively to the processing of a given scene.
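A hedged toy sketch of the "active knowledge" idea, in which a recognized part both supports a hypothesis and directs where to look next. The rule table, fields, and function names are purely illustrative, not the formalism of the paper.

```python
# Toy rule base: finding a part yields evidence for a whole and an advised
# next action, so partial results steer the recognition process.
knowledge = {
    "handle": {"part_of": "hammer", "expect": "head", "look": "at either end"},
    "head":   {"part_of": "hammer", "expect": "handle", "look": "adjacent"},
}

def on_found(feature, agenda):
    """React to a partial result: post a hypothesis and a directed action."""
    rule = knowledge.get(feature)
    if rule is None:
        return "no hypothesis"
    agenda.append((rule["expect"], rule["look"]))   # where to look next
    return f"evidence for {rule['part_of']}"

agenda = []
print(on_found("handle", agenda), agenda)
```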
Abstract:
The research reported here concerns the principles used to automatically generate three-dimensional representations from line drawings of scenes. The computer programs involved look at scenes which consist of polyhedra and which may contain shadows and various kinds of coincidentally aligned scene features. Each generated description includes information about edge shape (convex, concave, occluding, shadow, etc.), about the type of illumination for each region (illuminated, projected shadow, or oriented away from the light source), and about the spatial orientation of regions. The methods used are based on the labeling schemes of Huffman and Clowes; this research provides a considerable extension to their work and also gives theoretical explanations for the heuristic scene analysis work of Guzman, Winston, and others.
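In Huffman-Clowes-style labeling, each line receives one of a small set of edge labels and each junction type admits only certain combinations. The sketch below filters labelings against a deliberately tiny, made-up junction catalogue; the real dictionaries, especially once shadow labels and coincidental alignments are added as in this work, are far larger.

```python
from itertools import product

# Illustrative labels: '+' convex, '-' concave, '>' / '<' occluding (by
# direction). The catalogue below is a toy subset, not the real dictionary.
L_JUNCTION = {('>', '>'), ('<', '<'), ('+', '>'), ('<', '+')}

def consistent_labelings(n_edges, catalogue):
    """Enumerate edge-label assignments the junction catalogue permits."""
    return [lab for lab in product('+-><', repeat=n_edges)
            if lab in catalogue]

print(consistent_labelings(2, L_JUNCTION))
```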
Abstract:
This paper describes an experiment developed to study the performance of virtual agent animated cues within digital interfaces. Increasingly, agents are used in virtual environments as part of the branding process and to guide user interaction. However, the level of agent detail required to establish and enhance efficient allocation of attention remains unclear. Although complex agent motion is now possible, it is costly to implement and so should only be routinely implemented if a clear benefit can be shown. Previous methods of assessing the effect of gaze-cueing as a solution to scene complexity have relied principally on two-dimensional static scenes and manual peripheral inputs. Two experiments were run to address the question of agent cues on human-computer interfaces. Both experiments measured the efficiency of agent cues, analyzing participant responses either by gaze or by touch, respectively. In the first experiment, an eye-movement recorder was used to directly assess the immediate overt allocation of attention by capturing the participant's eye fixations following presentation of a cueing stimulus. We found that a fully animated agent could speed up user interaction with the interface. When user attention was directed using a fully animated agent cue, users responded 35% faster when compared with stepped 2-image agent cues, and 42% faster when compared with a static 1-image cue. The second experiment recorded participant responses on a touch screen using the same agent cues. Analysis of touch inputs confirmed the results of the gaze experiment: the fully animated agent again yielded the shortest response times, with slightly smaller differences between conditions. Responses to the fully animated agent were 17% and 20% faster when compared with the 2-image and 1-image cues, respectively. These results inform techniques aimed at engaging users' attention in complex scenes, such as computer games and digital transactions within public or social interaction contexts, by demonstrating the benefits of dynamic gaze and head cueing directly on users' eye movements and touch responses.
Abstract:
Huelse, M., Barr, D. R. W., Dudek, P.: Cellular Automata and non-static image processing for embodied robot systems on a massively parallel processor array. In: Adamatzky, A., et al. (eds.) AUTOMATA 2008: Theory and Applications of Cellular Automata, pp. 504-510. Luniver Press (2008). Sponsorship: EPSRC.
Abstract:
Space carving has emerged as a powerful method for multiview scene reconstruction. Although a wide variety of methods have been proposed, the quality of the reconstruction remains highly dependent on the photometric consistency measure and the threshold used to carve away voxels. In this paper, we present a novel photo-consistency measure that is motivated by a multiset variant of the chamfer distance. The new measure is robust to high amounts of within-view color variance and also takes into account the projection angles of back-projected pixels. Another critical issue in space carving is the selection of the photo-consistency threshold used to determine which surface voxels are kept or carved away. In this paper, a reliable threshold selection technique is proposed that examines the photo-consistency values at contour generator points. Contour generators are points that lie on both the surface of the object and the visual hull. To determine the threshold, a percentile ranking of the photo-consistency values of these generator points is used. This improved technique is applicable to a wide variety of photo-consistency measures, including the new measure presented in this paper. Also presented is a method to choose between photo-consistency measures and voxel array resolutions prior to carving, using receiver operating characteristic (ROC) curves.
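A minimal sketch of the percentile threshold rule described above, assuming lower values mean more photo-consistent (plausible for a chamfer-style distance, though the paper's exact convention is not given here); index names and the percentile value are illustrative.

```python
import numpy as np

def select_carving_threshold(photo_consistency, contour_idx, pct=90.0):
    """Rank photo-consistency values at contour-generator voxels (points on
    both the object surface and the visual hull) and return the chosen
    percentile as the carving threshold."""
    return np.percentile(photo_consistency[contour_idx], pct)

def keep_mask(photo_consistency, threshold):
    """Voxels above the threshold are carved away; the rest are kept."""
    return photo_consistency <= threshold
```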
Abstract:
Partial occlusions are commonplace in a variety of real world computer vision applications: surveillance, intelligent environments, assistive robotics, autonomous navigation, etc. While occlusion-handling methods have been proposed, most tend to break down when confronted with numerous occluders in a scene. In this paper, a layered image-plane representation for tracking people through substantial occlusions is proposed. An image-plane representation of motion around an object is associated with a pre-computed graphical model, which can be instantiated efficiently during online tracking. A global state and observation space is obtained by linking transitions between layers. A Reversible Jump Markov Chain Monte Carlo approach is used to infer the number of people and track them online. The method outperforms two state-of-the-art methods for tracking over extended occlusions on videos of a parking lot with numerous vehicles and a laboratory with many desks and workstations.
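A minimal sketch of the generic Reversible-Jump MCMC loop such a tracker relies on, assuming hypothetical `log_lik` and `propose_move` callables (the latter returning a fresh candidate state plus the log proposal/Jacobian ratio for dimension-changing birth/death moves); this shows the mechanism only, not the paper's layered observation model.

```python
import math
import random

def rjmcmc_track(log_lik, propose_move, n_iter=5000, seed=0):
    """Generic RJMCMC loop. The state is a list of person hypotheses;
    propose_move(state, rng) returns a fresh candidate (birth, death, or
    update of one person) and its log proposal/Jacobian ratio; log_lik
    scores a state against the image. Both callables are assumptions."""
    rng = random.Random(seed)
    state, cur_ll = [], log_lik([])          # start with zero people
    best, best_ll = state, cur_ll
    for _ in range(n_iter):
        cand, log_q = propose_move(state, rng)
        cand_ll = log_lik(cand)
        # Metropolis-Hastings-Green acceptance test, in log space
        if math.log(rng.random() + 1e-300) < cand_ll - cur_ll + log_q:
            state, cur_ll = cand, cand_ll
            if cur_ll > best_ll:
                best, best_ll = state, cur_ll
    return best, best_ll
```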
Abstract:
Locating hands in sign language video is challenging due to a number of factors. Hand appearance varies widely across signers due to anthropometric variations and varying levels of signer proficiency. Video can be captured under varying illumination, camera resolutions, and levels of scene clutter, e.g., high-res video captured in a studio vs. low-res video gathered by a webcam in a user's home. Moreover, the signers' clothing varies, e.g., skin-toned vs. contrasting clothing, short-sleeved vs. long-sleeved shirts, etc. In this work, the hand detection problem is addressed in an appearance matching framework. The Histogram of Oriented Gradient (HOG) based matching score function is reformulated to allow non-rigid alignment between pairs of images, to account for hand shape variation. The resulting alignment score is used within a Support Vector Machine hand/not-hand classifier for hand detection. The new matching score function yields improved performance (in ROC area and hand detection rate) over the Vocabulary Guided Pyramid Match Kernel (VGPMK) and the traditional rigid HOG distance on American Sign Language video signed by expert signers. The proposed match score function is computationally less expensive (for training and testing), has fewer parameters, and is less sensitive to parameter settings than VGPMK. The proposed detector works well on test sequences from an inexpert signer in a non-studio setting with a cluttered background.
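The reformulated score can be pictured as letting each HOG cell search a small spatial window for its best counterpart instead of insisting on rigid cell-to-cell correspondence. The sketch below is an illustrative reading of that idea, not the paper's exact formulation; the array shapes and window radius are assumptions.

```python
import numpy as np

def nonrigid_hog_score(hog_a, hog_b, radius=1):
    """Match two HOG grids of shape (H, W, bins): each cell of A pairs with
    the closest cell of B within `radius` cells, relaxing rigid alignment
    to tolerate hand shape variation. Higher score = better match."""
    H, W, _ = hog_a.shape
    total = 0.0
    for i in range(H):
        for j in range(W):
            best = np.inf
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < H and 0 <= jj < W:
                        d = np.linalg.norm(hog_a[i, j] - hog_b[ii, jj])
                        best = min(best, d)
            total += best
    return -total
```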
Abstract:
We introduce a method for recovering the spatial and temporal alignment between two or more views of objects moving over a ground plane. Existing approaches either assume that the streams are globally synchronized, so that only the spatial alignment needs to be solved, or that the temporal misalignment is small enough for exhaustive search to be performed. In contrast, our approach can recover both the spatial and temporal alignment. We compute for each trajectory a number of interesting segments, and we use their descriptions to form putative matches between trajectories. Each pair of corresponding interesting segments induces a temporal alignment and defines an interval of common support across two views of an object that is used to recover the spatial alignment. Interesting segments and their descriptors are defined using algebraic projective invariants measured along the trajectories. Similarity between interesting segments is computed taking into account the statistics of such invariants. Candidate alignment parameters are verified by checking the consistency, in terms of the symmetric transfer error, of all the putative pairs of corresponding interesting segments. Experiments are conducted with two different sets of data: one with two views of an outdoor scene featuring moving people and cars, and one with four views of a laboratory sequence featuring moving radio-controlled cars.
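The verification step uses the standard symmetric transfer error. A minimal sketch for a homography relating two ground-plane views follows; its use as a consistency score over putative segment pairs matches the abstract, while the names and shapes are illustrative.

```python
import numpy as np

def symmetric_transfer_error(H, pts_a, pts_b):
    """Symmetric transfer error for a 3x3 homography H mapping view A to B:
    mean of the forward reprojection error in B plus the inverse-map error
    back in A. pts_a, pts_b: (N, 2) corresponding points."""
    def apply(M, p):
        q = np.c_[p, np.ones(len(p))] @ M.T     # homogeneous transform
        return q[:, :2] / q[:, 2:3]             # back to inhomogeneous
    fwd = np.linalg.norm(apply(H, pts_a) - pts_b, axis=1)
    bwd = np.linalg.norm(apply(np.linalg.inv(H), pts_b) - pts_a, axis=1)
    return float(np.mean(fwd ** 2 + bwd ** 2))
```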