839 results for Moving Objects
Abstract:
The human visual ability to perceive depth looks like a puzzle. We perceive three-dimensional spatial information quickly and efficiently by using the binocular stereopsis of our eyes and, more importantly, the knowledge of common objects that we acquire through experience. Modelling the behaviour of the brain is currently out of reach, which is why the huge problem of 3D perception and subsequent interpretation is split into a sequence of easier problems. A lot of robot vision research aims to obtain 3D information about the surrounding scene. Most of this research is based on modelling human stereopsis by using two cameras as if they were two eyes. This method is known as stereo vision; it has been widely studied in the past, is still being studied, and will surely remain an active topic in the future, which makes it one of the most interesting topics in computer vision. The stereo vision principle is based on obtaining the three-dimensional position of an object point from the positions of its projections in both camera image planes. However, before inferring 3D information, the mathematical models of both cameras have to be known. This step is known as camera calibration and is described at length in the thesis. Perhaps the most important problem in stereo vision is the determination of pairs of homologous points in the two images, known as the correspondence problem; it is also one of the most difficult problems and is currently investigated by many researchers. Epipolar geometry allows us to reduce the correspondence problem, and an approach to epipolar geometry is described in the thesis. Nevertheless, it does not solve the problem completely, as many other considerations have to be taken into account. For example, we have to consider points without correspondence due to a surface occlusion or simply due to a projection outside the camera's field of view.
The thesis focuses on structured light, which is considered one of the most frequently used techniques for reducing the problems related to stereo vision. Structured light is based on the relationship between a projected light pattern, its projection onto the scene, and an image sensor. The deformations between the pattern projected onto the scene and the one captured by the camera make it possible to obtain three-dimensional information about the illuminated scene. This technique has been widely used in applications such as 3D object reconstruction, robot navigation and quality control. Although the projection of regular patterns solves the problem of points without a match, it does not solve the problem of multiple matching, which forces us to use computationally expensive algorithms to search for the correct matches. In recent years, another structured light technique has grown in importance. This technique is based on coding the light projected onto the scene so that it can be used as a tool to obtain a unique match. Each token of light is imaged by the camera, and we have to read its label (decode the pattern) in order to solve the correspondence problem. The advantages and disadvantages of stereo vision compared with structured light are discussed, and a survey of coded structured light is presented. The work carried out in this thesis has led to a new coded structured light pattern which solves the correspondence problem uniquely and robustly: uniquely, as each token of light is coded by a different word, which removes the problem of multiple matching; robustly, since the pattern has been coded using the position of each token of light with respect to both co-ordinate axes. Algorithms and experimental results are included in the thesis. The reader can see examples of 3D measurement of static objects and of the more complicated measurement of moving objects.
The technique can be used in both cases because the pattern is coded in a single projection shot, so it can be used in several robot vision applications. Our interest is focused on the mathematical study of the camera and pattern projector models. We are also interested in how these models can be obtained by calibration, and how they can be used to obtain three-dimensional information from two corresponding points. Furthermore, we have studied structured light and coded structured light, and we have presented a new coded structured light pattern. However, in this thesis we started from the assumption that the corresponding points could be well segmented from the captured image. Computer vision constitutes a huge problem and a lot of work is being done at all levels of human vision modelling, starting from (a) image acquisition; (b) further image enhancement, filtering and processing; and (c) image segmentation, which involves thresholding, thinning, contour detection, texture and colour analysis, and so on. The interest of this thesis starts at the next step, usually known as depth perception or 3D measurement.
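As a rough illustration of the stereo vision principle mentioned in this abstract (recovering a 3D point from its projections in two image planes), the sketch below handles only the simplest case. It assumes two identical, already calibrated pinhole cameras with parallel optical axes (a rectified pair) and an already solved correspondence; the function name and all parameters are hypothetical and are not taken from the thesis.

```python
def triangulate_rectified(x_left, x_right, y, focal_px, baseline_m, cx=0.0, cy=0.0):
    """Return (X, Y, Z) in metres, in the left camera frame.

    Assumptions (not from the thesis): rectified pinhole cameras,
    focal length in pixels, baseline along the image x axis, and
    principal point (cx, cy) shared by both cameras.
    """
    # Disparity: horizontal shift of the same scene point between the images.
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    # Depth follows from similar triangles: Z = f * B / d.
    z = focal_px * baseline_m / disparity
    # Back-project the left-image pixel to metric X and Y at that depth.
    x = (x_left - cx) * z / focal_px
    y_metric = (y - cy) * z / focal_px
    return x, y_metric, z


if __name__ == "__main__":
    print(triangulate_rectified(420.0, 380.0, 260.0,
                                focal_px=800.0, baseline_m=0.1,
                                cx=320.0, cy=240.0))
```

In a structured light setup the second camera is replaced by the pattern projector, so the same kind of triangulation would apply once the decoded pattern supplies the match.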
Abstract:
Model-based vision allows prior knowledge of the shape and appearance of specific objects to be used in the interpretation of a visual scene; it provides a powerful and natural way to enforce the view consistency constraint. A model-based vision system has been developed within ESPRIT VIEWS: P2152 which is able to classify and track moving objects (cars and other vehicles) in complex, cluttered traffic scenes. The fundamental basis of the method has been previously reported. This paper presents recent developments which have extended the scope of the system to include (i) multiple cameras, (ii) variable camera geometry, and (iii) articulated objects. All three enhancements have easily been accommodated within the original model-based approach.
Abstract:
Several pixel-based people counting methods have been developed over the years. Among these, the product of scale-weighted pixel sums and a linear correlation coefficient is a popular people counting approach. However, most approaches have paid little attention to resolving the true background and instead take all foreground pixels into account. With large crowds moving at varying speeds and with the presence of other moving objects such as vehicles, this approach is prone to problems. In this paper we present a method which concentrates on determining the true foreground, i.e. human-image pixels only. To do this we have proposed, implemented and comparatively evaluated a human detection layer to make people counting more robust in the presence of noise and when empty background sequences are lacking. We show the effect of combining human detection with a pixel-map based algorithm to (i) count only human-classified pixels and (ii) prevent foreground pixels belonging to humans from being absorbed into the background model. We evaluate the performance of this approach on the PETS 2009 dataset using various configurations of the proposed methods. Our evaluation demonstrates that the basic benchmark method we implemented can achieve an accuracy of up to 87% on sequence "S1.L1 13-57 View 001" and our proposed approach can achieve up to 82% on sequence "S1.L3 14-33 View 001", where the crowd stops and the benchmark accuracy falls to 64%.
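The idea of counting only human-classified foreground pixels can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the masks are plain nested lists of 0/1 values, `weights` is a hypothetical per-pixel scale weight that compensates for perspective, and `k` is a hypothetical calibration coefficient that turns the weighted sum into a head count.

```python
def weighted_human_pixel_count(foreground, human, weights, k=1.0):
    """Estimate a people count from scale-weighted foreground pixels.

    foreground, human: 2D lists of 0/1 masks with the same shape.
    weights: 2D list of per-pixel scale weights (assumed, e.g. larger
             for pixels that are farther from the camera).
    k: assumed linear coefficient mapping weighted pixels to people.
    """
    total = 0.0
    for fg_row, human_row, w_row in zip(foreground, human, weights):
        for fg, is_human, w in zip(fg_row, human_row, w_row):
            # Only pixels that are both foreground and classified as
            # human contribute; vehicles and noise are ignored.
            if fg and is_human:
                total += w
    return k * total


if __name__ == "__main__":
    foreground = [[1, 1, 0], [0, 1, 1]]
    human = [[1, 0, 0], [0, 1, 1]]
    weights = [[2.0, 2.0, 2.0], [1.0, 1.0, 1.0]]
    print(weighted_human_pixel_count(foreground, human, weights, k=0.5))
```

The same human mask could also be used to skip the background-model update at those pixels, which is one way to keep a stationary crowd from being absorbed into the background.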
Abstract:
Within the context of active vision, scant attention has been paid to the execution of motion saccades—rapid re-adjustments of the direction of gaze to attend to moving objects. In this paper we first develop a methodology for, and give real-time demonstrations of, the use of motion detection and segmentation processes to initiate capture saccades towards a moving object. The saccade is driven by both position and velocity of the moving target under the assumption of constant target velocity, using prediction to overcome the delay introduced by visual processing. We next demonstrate the use of a first order approximation to the segmented motion field to compute bounds on the time-to-contact in the presence of looming motion. If the bound falls below a safe limit, a panic saccade is fired, moving the camera away from the approaching object. We then describe the use of image motion to realize smooth pursuit, tracking using velocity information alone, where the camera is moved so as to null a single constant image motion fitted within a central image region. Finally, we glue together capture saccades with smooth pursuit, thus effecting changes in both what is being attended to and how it is being attended to. To couple the different visual activities of waiting, saccading, pursuing and panicking, we use a finite state machine which provides inherent robustness outside of visual processing and provides a means of making repeated exploration. We demonstrate in repeated trials that the transition from saccadic motion to tracking is more likely to succeed using position and velocity control, than when using position alone.
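The prediction step behind a capture saccade can be sketched as below, assuming constant target velocity as stated in the abstract. The function name, the units and the latency value are hypothetical; the sketch only shows why using velocity as well as position compensates for the visual processing delay.

```python
def predict_gaze_target(position, velocity, latency_s):
    """Predict where a constant-velocity target will be after the delay.

    position: (x, y) of the target when the image was captured.
    velocity: (vx, vy) in position units per second.
    latency_s: assumed delay between image capture and camera motion.
    """
    return tuple(p + v * latency_s for p, v in zip(position, velocity))


if __name__ == "__main__":
    measured = (10.0, -4.0)   # target direction at capture time (degrees)
    velocity = (20.0, 5.0)    # target velocity (degrees per second)
    latency = 0.1             # assumed processing delay (seconds)

    # Position-only control aims at the stale measurement and lags
    # behind the target by velocity * latency.
    position_only = measured
    # Position-and-velocity control aims at the predicted location.
    predicted = predict_gaze_target(measured, velocity, latency)
    print(position_only, predicted)
```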
Abstract:
Several accounts put forth to explain the flash-lag effect (FLE) rely mainly on either spatial or temporal mechanisms. Here we investigated the relationship between these mechanisms by psychophysical and theoretical approaches. In a first experiment we assessed the magnitudes of the FLE and temporal-order judgments performed under identical visual stimulation. The results were interpreted by means of simulations of an artificial neural network, which was also employed to make predictions concerning the FLE. The model predicted that a spatio-temporal mislocalisation would emerge from two moving stimuli, one continuous and one with abrupt onset. Additionally, a straightforward prediction of the model revealed that the magnitude of this mislocalisation should be task-dependent, increasing when the use of the abrupt-onset moving stimulus switches from a temporal marker only to both temporal and spatial markers. Our findings confirmed the model's predictions and point to an indissoluble interplay between spatial facilitation and processing delays in the FLE.
Abstract:
Automated virtual camera control has been widely used in animation and interactive virtual environments. We have developed a prototype free-view video system based on multiple sparse cameras that allows users to control the position and orientation of a virtual camera, enabling the observation of a real scene in three dimensions (3D) from any desired viewpoint. Automatic camera control can be activated to follow objects selected by the user. Our method combines a simple geometric model of the scene composed of planes (the virtual environment), augmented with visual information from the cameras, and pre-computed tracking information of moving targets to generate novel perspective-corrected 3D views of the virtual camera and moving objects. To achieve real-time rendering performance, view-dependent texture-mapped billboards are used to render the moving objects at their correct locations, and foreground masks are used to remove the moving objects from the projected video streams. The current prototype runs on a PC with a common graphics card and can generate virtual 2D views from three cameras of resolution 768 x 576 with several moving objects at about 11 fps. (C) 2011 Elsevier Ltd. All rights reserved.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
The problem of dynamic camera calibration considering moving objects in close-range environments, using straight lines as references, is addressed. A mathematical model for the correspondence of a straight line in the object and image spaces is discussed. This model is based on the equivalence between the vector normal to the interpretation plane in the image space and the vector normal to the rotated interpretation plane in the object space. In order to solve the dynamic camera calibration, Kalman Filtering is applied; an iterative process based on the recursive property of the Kalman Filter is defined, in which the sequentially estimated camera orientation parameters are fed back into the feature extraction process in the image. For the dynamic case, e.g. an image sequence of a moving object, a state prediction and a covariance matrix for the next instant are obtained using the available estimates and the system model. Filtered state estimates of good quality can then be computed from these predictions, using the Kalman Filtering approach and the system model parameters, for each instant of an image sequence. The proposed approach was tested with simulated and real data. Experiments with real data were carried out in a controlled environment, considering a sequence of images of a moving cube in a linear trajectory over a flat surface.
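The predict-then-filter cycle described in this abstract can be illustrated with a scalar Kalman filter. This is a deliberately reduced sketch, not the paper's model: the real state would hold the camera orientation parameters and the measurement model would come from the straight-line correspondence. Here the state is a single number, the system model is a hypothetical scalar gain `a`, and `q` and `r` are assumed process and measurement noise variances.

```python
def kalman_predict(x, p, q, a=1.0):
    """Propagate the state estimate x and its variance p to the next instant."""
    x_pred = a * x
    p_pred = a * p * a + q
    return x_pred, p_pred


def kalman_update(x_pred, p_pred, z, r):
    """Correct the prediction with a direct measurement z of variance r."""
    gain = p_pred / (p_pred + r)            # Kalman gain
    x = x_pred + gain * (z - x_pred)        # filtered state estimate
    p = (1.0 - gain) * p_pred               # filtered variance
    return x, p


if __name__ == "__main__":
    x, p = 0.0, 1.0
    # One measurement per image in the sequence (made-up values).
    for z in [2.0, 2.2, 1.9]:
        x, p = kalman_predict(x, p, q=0.01)
        x, p = kalman_update(x, p, z, r=1.0)
        print(round(x, 4), round(p, 4))
```

In a feedback scheme like the one the abstract describes, the predicted state would also be used to narrow the search for line features in the next image before the update is applied.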
Evaluation of a technique for generating digital surface models using multiple images
Abstract:
The efficient generation of digital surface models (DSMs) from optical images has been explored for many years, and the results depend on the project characteristics (image resolution and size of overlap between images, among others), on the image matching techniques and on the computing capabilities available for image processing. The points generated by image matching have a direct impact on the quality of the DSM and, consequently, influence the need for the costly step of editing. This work aims to assess experimentally a technique for DSM generation by simultaneously matching multiple images (two or more) using the vertical line locus (VLL) method. The experiments were performed with six images of the urban area of Presidente Prudente/SP, with a ground sample distance (GSD) of approximately 7 cm. DSMs of a small area with homogeneous texture, repetitive patterns, and moving objects, including shadows and trees, were generated to assess the quality of the developed procedure. The DSM obtained was compared with a point cloud acquired by LASER (Light Amplification by Stimulated Emission of Radiation) scanning as well as with a DSM generated by the Leica Photogrammetric Suite (LPS) software. The results showed that the DSM generated by the implemented technique has a geometric quality compatible with the reference models.
Abstract:
Máster Universitario en Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería (SIANI)
Abstract:
People remember moving objects as having moved farther along in their path of motion than is actually the case; this is known as representational momentum (RM). Some authors have argued that RM is an internalization of environmental properties such as physical momentum and gravity. Five experiments demonstrated that a similar memory bias could not have been learned from the environment. For right-handed Ss, objects apparently moving to the right engendered a larger memory bias in the direction of motion than did those moving to the left. This effect, clearly not derived from real-world lateral asymmetries, was relatively insensitive to changes in apparent velocity and the type of object used, and it may be confined to objects in the left half of visual space. The left–right effect may be an intrinsic property of the visual operating system, which may in turn have affected certain cultural conventions of left and right in art and other domains.
Abstract:
Previous studies on motion perception revealed motion-processing brain areas sensitive to changes in luminance and texture (low-level) and changes in salience (high-level). The present functional magnetic resonance imaging (fMRI) study focused on motion standstill. This phenomenon, which occurs when visual moving objects presented at fast frequencies are perceived as static, has not previously been explored by neuroimaging techniques. Thirteen subjects were investigated while perceiving apparent motion at 4 Hz, at 30 Hz (motion standstill), isoluminant static and flickering stimuli, a fixation cross, and a blank screen, presented randomly and balanced for a rapid event-related fMRI design. The blood oxygenation level-dependent (BOLD) signal in the occipito-temporal brain region MT/V5 increased during apparent motion perception. Here we demonstrate that brain areas such as the posterior part of the right inferior parietal lobule (IPL) showed a higher BOLD signal during motion standstill. These findings suggest that the activation of higher-order motion areas is elicited by apparent motion at high presentation rates (motion standstill). We interpret this observation as a manifestation of an orienting reaction in IPL towards stimulus motion that might be detected but not resolved by other motion-processing areas (i.e., MT/V5).
Abstract:
Population ageing has recently become a pressing issue for modern societies around the world. Two important problems remain to be solved. One is how to continuously monitor the movements of people who have suffered a stroke in their natural living environment, in order to provide more valuable feedback to guide clinical interventions. The other is how to guide elderly people effectively when they are at home or inside other buildings, making their lives easier and more convenient. Human motion tracking and navigation have therefore become active research fields as the number of elderly people increases. However, it has been extremely challenging to take motion capture beyond laboratory environments and obtain accurate measurements of human physical activity, especially in free-living environments. Navigation in free-living environments also poses problems, such as denied GPS signals and the moving objects commonly present in such environments. This thesis seeks to develop new technologies to enable accurate motion tracking and positioning in free-living environments. It comprises three specific goals, using our developed IMU board and the camera from the imaging source company: (1) to develop a robust, real-time orientation algorithm using only the measurements from the IMU; (2) to develop robust distance estimation in static free-living environments in order to estimate people’s position and navigate them, while solving the scale ambiguity problem that usually appears in monocular camera tracking by integrating the data from the visual and inertial sensors; (3) when moving objects are viewed by the camera in free-living environments, to first design a robust scene segmentation algorithm and then separately estimate the motion of the vIMU system and of the moving objects.
To achieve real-time orientation tracking, an Adaptive-Gain Orientation Filter (AGOF) is proposed in this thesis, based on the basic theory of the deterministic and frequency-based approaches and using only measurements from the newly developed MARG (Magnetic, Angular Rate, and Gravity) sensors. To obtain robust positioning, an adaptive frame-rate vision-aided IMU system is proposed in order to develop and implement fast vIMU ego-motion estimation algorithms, in which the orientation is first estimated in real time from the MARG sensors and then used to estimate the position from the visual and inertial data. For the case in which the camera views moving objects in free-living environments, a robust scene segmentation algorithm is first proposed, which yields the position estimate and, simultaneously, the 3D motion of the moving objects. Finally, corresponding simulations and experiments have been carried out.
Abstract:
Information-Centric Networking (ICN), an emerging paradigm for the Future Internet, initially focused on bandwidth savings in wired networks, but it may also have significant potential to support communication in mobile wireless networks and in opportunistic network scenarios, where end systems have spontaneous but time-limited contact to exchange data. This chapter explains why ICN has an important role in mobile and opportunistic networks by identifying several challenges in mobile and opportunistic Information-Centric Networks and discussing appropriate solutions for them. In particular, it discusses the issues of receiver and source mobility. Source mobility needs special attention. Solutions based on routing protocol extensions, indirection, and separation of name resolution and data transfer are discussed. Moreover, the chapter presents solutions for problems in opportunistic Information-Centric Networks. Among those are mechanisms for efficient content discovery in neighbour nodes, resume mechanisms to recover from intermittent connectivity disruptions, a novel agent delegation mechanism to offload content discovery and delivery to mobile agent nodes, and the exploitation of overhearing to populate the routing tables of mobile nodes. Some preliminary performance evaluation results for these mechanisms are provided.
Abstract:
In sports games, it is often necessary to perceive a large number of moving objects (e.g., the ball and players). In this context, the role of peripheral vision in processing motion information is often discussed, especially when motor responses are required. In an attempt to test the basic functionality of peripheral vision in such sports-game situations, a Multiple Object Tracking (MOT) task, which requires tracking a certain number of targets amidst distractors, was chosen. Participants’ primary task was to recall four targets (out of 10 rectangular stimuli) after six seconds of quasi-random motion. As a second task, a button had to be pressed if a target change occurred (Exp 1: stop vs. form change to a diamond for 0.5 s; Exp 2: stop vs. slowdown for 0.5 s). While the eccentricities of the changes (5-10° vs. 15-20°) were manipulated, decision accuracy (recall and button press correct), motor response time and saccadic reaction time were calculated as dependent variables. Results show that participants indeed used peripheral vision to detect changes, because either no saccades or very late saccades to the changed target were executed in correct trials. Moreover, a saccade was executed more often when eccentricities were small. Response accuracies were higher and response times were lower in the stop conditions of both experiments, while larger eccentricities led to higher response times in all conditions. In summary, the results show that monitoring targets and detecting changes can be accomplished by peripheral vision alone, and that a monitoring strategy based on peripheral vision may be the optimal one, as saccades may incur certain costs. Further research is planned to address the question of whether this functionality is also evident in sports tasks.