988 resultados para dynamic scenes
Resumo:
One of the most widely used techniques in computer vision for foreground detection is to model each background pixel as a Mixture of Gaussians (MoG). While this is effective for a static camera with a fixed or a slowly varying background, it fails to handle any fast, dynamic movement in the background. In this paper, we propose a generalised framework, called region-based MoG (RMoG), that takes into consideration neighbouring pixels while generating the model of the observed scene. The model equations are derived from Expectation Maximisation theory for batch mode, and stochastic approximation is used for online mode updates. We evaluate our region-based approach against ten sequences containing dynamic backgrounds, and show that the region-based approach provides a performance improvement over the traditional single pixel MoG. For feature and region sizes that are equal, the effect of increasing the learning rate is to reduce both true and false positives. Comparison with four state-of-the art approaches shows that RMoG outperforms the others in reducing false positives whilst still maintaining reasonable foreground definition. Lastly, using the ChangeDetection (CDNet 2014) benchmark, we evaluated RMoG against numerous surveillance scenes and found it to amongst the leading performers for dynamic background scenes, whilst providing comparable performance for other commonly occurring surveillance scenes.
Resumo:
Interactive ray tracing of non-trivial scenes is just becoming feasible on single graphics processing units (GPU). Recent work in this area focuses on building effective acceleration structures, which work well under the constraints of current GPUs. Most approaches are targeted at static scenes and only allow navigation in the virtual scene. So far support for dynamic scenes has not been considered for GPU implementations. We have developed a GPU-based ray tracing system for dynamic scenes consisting of a set of individual objects. Each object may independently move around, but its geometry and topology are static.
Resumo:
For broadcasting purposes MIXED REALITY, the combination of real and virtual scene content, has become ubiquitous nowadays. Mixed Reality recording still requires expensive studio setups and is often limited to simple color keying. We present a system for Mixed Reality applications which uses depth keying and provides threedimensional mixing of real and artificial content. It features enhanced realism through automatic shadow computation which we consider a core issue to obtain realism and a convincing visual perception, besides the correct alignment of the two modalities and correct occlusion handling. Furthermore we present a possibility to support placement of virtual content in the scene. Core feature of our system is the incorporation of a TIME-OF-FLIGHT (TOF)-camera device. This device delivers real-time depth images of the environment at a reasonable resolution and quality. This camera is used to build a static environment model and it also allows correct handling of mutual occlusions between real and virtual content, shadow computation and enhanced content planning. The presented system is inexpensive, compact, mobile, flexible and provides convenient calibration procedures. Chroma-keying is replaced by depth-keying which is efficiently performed on the GRAPHICS PROCESSING UNIT (GPU) by the usage of an environment model and the current ToF-camera image. Automatic extraction and tracking of dynamic scene content is herewith performed and this information is used for planning and alignment of virtual content. An additional sustainable feature is that depth maps of the mixed content are available in real-time, which makes the approach suitable for future 3DTV productions. The presented paper gives an overview of the whole system approach including camera calibration, environment model generation, real-time keying and mixing of virtual and real content, shadowing for virtual content and dynamic object tracking for content planning.
Resumo:
Exposure Fusion and other HDR techniques generate well-exposed images from a bracketed image sequence while reproducing a large dynamic range that far exceeds the dynamic range of a single exposure. Common to all these techniques is the problem that the smallest movements in the captured images generate artefacts (ghosting) that dramatically affect the quality of the final images. This limits the use of HDR and Exposure Fusion techniques because common scenes of interest are usually dynamic. We present a method that adapts Exposure Fusion, as well as standard HDR techniques, to allow for dynamic scene without introducing artefacts. Our method detects clusters of moving pixels within a bracketed exposure sequence with simple binary operations. We show that the proposed technique is able to deal with a large amount of movement in the scene and different movement configurations. The result is a ghost-free and highly detailed exposure fused image at a low computational cost.
Resumo:
Multi-view head-pose estimation in low-resolution, dynamic scenes is difficult due to blurred facial appearance and perspective changes as targets move around freely in the environment. Under these conditions, acquiring sufficient training examples to learn the dynamic relationship between position, face appearance and head-pose can be very expensive. Instead, a transfer learning approach is proposed in this work. Upon learning a weighted-distance function from many examples where the target position is fixed, we adapt these weights to the scenario where target positions are varying. The adaptation framework incorporates reliability of the different face regions for pose estimation under positional variation, by transforming the target appearance to a canonical appearance corresponding to a reference scene location. Experimental results confirm effectiveness of the proposed approach, which outperforms state-of-the-art by 9.5% under relevant conditions. To aid further research on this topic, we also make DPOSE- a dynamic, multi-view head-pose dataset with ground-truth publicly available with this paper.
Resumo:
Recovering the motion of a non-rigid body from a set of monocular images permits the analysis of dynamic scenes in uncontrolled environments. However, the extension of factorisation algorithms for rigid structure from motion to the low-rank non-rigid case has proved challenging. This stems from the comparatively hard problem of finding a linear “corrective transform” which recovers the projection and structure matrices from an ambiguous factorisation. We elucidate that this greater difficulty is due to the need to find multiple solutions to a non-trivial problem, casting a number of previous approaches as alleviating this issue by either a) introducing constraints on the basis, making the problems nonidentical, or b) incorporating heuristics to encourage a diverse set of solutions, making the problems inter-dependent. While it has previously been recognised that finding a single solution to this problem is sufficient to estimate cameras, we show that it is possible to bootstrap this partial solution to find the complete transform in closed-form. However, we acknowledge that our method minimises an algebraic error and is thus inherently sensitive to deviation from the low-rank model. We compare our closed-form solution for non-rigid structure with known cameras to the closed-form solution of Dai et al. [1], which we find to produce only coplanar reconstructions. We therefore make the recommendation that 3D reconstruction error always be measured relative to a trivial reconstruction such as a planar one.
Resumo:
The aim of this research is to present, interpret and analyze the phenomenon of pilgrimage in a contemporary, suburban Greek nunnery, and to elucidate the different functions that the present-day convent has for its pilgrims. The scope of the study is limited to a case nunnery, the convent of the Dormition of the Virgin, which is situated in Northern Greece. The main corpus of data utilized for this work consists of 25 interviews and field diary material, which was collected in the convent mainly during the academic year 2002-2003 and summer 2005 by means of participant observation and unstructured thematic interviewing. It must be noted that most Greek nunneries are not really communities of hermits but institutions that operate in complex interaction with the surrounding society. Thus, the main interest in this study is in the interaction between pilgrims and nuns. Pilgrimage is seen here as a significant and concrete form of interaction, which in fact makes the contemporary nunneries dynamic scenes of religious, social and sometimes even political life. The focus of the analysis is on the pilgrims’ experiences, reflected upon on the levels of the individual, the Church institution, and society in general. This study shows that pilgrimage in a suburban nunnery, such as the convent of the Dormition, can be seen as part of everyday religiosity. Many pilgrims visit the convent regularly and the visitation is a lifestyle the pilgrims have chosen and wish to maintain. Pilgrimage to a contemporary Greek nunnery should not be ennobled, but seen as part of a popular religious sentiment. The visits offer pilgrims various tools for reflecting on their personal life situations and on questions of identity. For them the full round of liturgical worship is a very good reason for going to the convent, and many see it as a way of maintaining their faith and of feeling close to God. Despite cultural developments such as secularization and globalization, pilgrims are quite loyal to the convent they visit. It represents the positive values of ‘Greekness’ and therefore they also trust the nuns’ approach to various matters, both personal and political. The coalition of Orthodoxy and nationalism is also visible in their attitudes towards the convent, which they see as a guardian of Hellenism and as nurturing Greek values both now and in the future.
Resumo:
提出了一项光线跟踪新技术,能有效提高光线在空白区域的行进速度.该技术首先用一种新方法创建均匀空间网格,然后用较少的空盒自适应聚集空的空间网格,以加快光线跟踪的计算.新加速结构的创建时间复杂度和空间复杂度均是O(n),而相应的光线跟踪计算的时间复杂度为O(logn),与kd树结构相当.当该结构与已有的一些加速结构结合后,能很好地处理大规模动态场景.比如,光线逐根跟踪且计算二次衍生光线时,新技术可在普通PC机上高真实感地交互绘制包含6G三角面片的多Buddha动态场景.
Resumo:
Cette thèse s'intéresse à des aspects du tournage, de la projection et de la perception du cinéma stéréo panoramique, appelé aussi cinéma omnistéréo. Elle s'inscrit en grande partie dans le domaine de la vision par ordinateur, mais elle touche aussi aux domaines de l'infographie et de la perception visuelle humaine. Le cinéma omnistéréo projette sur des écrans immersifs des vidéos qui fournissent de l'information sur la profondeur de la scène tout autour des spectateurs. Ce type de cinéma comporte des défis liés notamment au tournage de vidéos omnistéréo de scènes dynamiques, à la projection polarisée sur écrans très réfléchissants rendant difficile l'estimation de leur forme par reconstruction active, aux distorsions introduites par l'omnistéréo pouvant fausser la perception des profondeurs de la scène. Notre thèse a tenté de relever ces défis en apportant trois contributions majeures. Premièrement, nous avons développé la toute première méthode de création de vidéos omnistéréo par assemblage d'images pour des mouvements stochastiques et localisés. Nous avons mis au point une expérience psychophysique qui montre l'efficacité de la méthode pour des scènes sans structure isolée, comme des courants d'eau. Nous proposons aussi une méthode de tournage qui ajoute à ces vidéos des mouvements moins contraints, comme ceux d'acteurs. Deuxièmement, nous avons introduit de nouveaux motifs lumineux qui permettent à une caméra et un projecteur de retrouver la forme d'objets susceptibles de produire des interréflexions. Ces motifs sont assez généraux pour reconstruire non seulement les écrans omnistéréo, mais aussi des objets très complexes qui comportent des discontinuités de profondeur du point de vue de la caméra. Troisièmement, nous avons montré que les distorsions omnistéréo sont négligeables pour un spectateur placé au centre d'un écran cylindrique, puisqu'elles se situent à la périphérie du champ visuel où l'acuité devient moins précise.
Resumo:
The authors present an active vision system which performs a surveillance task in everyday dynamic scenes. The system is based around simple, rapid motion processors and a control strategy which uses both position and velocity information. The surveillance task is defined in terms of two separate behavioral subsystems, saccade and smooth pursuit, which are demonstrated individually on the system. It is shown how these and other elementary responses to 2D motion can be built up into behavior sequences, and how judicious close cooperation between vision and control results in smooth transitions between the behaviors. These ideas are demonstrated by an implementation of a saccade to smooth pursuit surveillance system on a high-performance robotic hand/eye platform.
Resumo:
We present a new approach to diffuse reflectance estimation for dynamic scenes. Non-parametric image statistics are used to transfer reflectance properties from a static example set to a dynamic image sequence. The approach allows diffuse reflectance estimation for surface materials with inhomogeneous appearance, such as those which commonly occur with patterned or textured clothing. Material editing is also possible by transferring edited reflectance properties. Material reflectance properties are initially estimated from static images of the subject under multiple directional illuminations using photometric stereo. The estimated reflectance together with the corresponding image under uniform ambient illumination form a prior set of reference material observations. Material reflectance properties are then estimated for video sequences of a moving person captured under uniform ambient illumination by matching the observed local image statistics to the reference observations. Results demonstrate that the transfer of reflectance properties enables estimation of the dynamic surface normals and subsequent relighting combined with material editing. This approach overcomes limitations of previous work on material transfer and relighting of dynamic scenes which was limited to surfaces with regions of homogeneous reflectance. We evaluate our approach for relighting 3D model sequences reconstructed from multiple view video. Comparison to previous model relighting demonstrates improved reproduction of detailed texture and shape dynamics.
Resumo:
Unusual event detection in crowded scenes remains challenging because of the diversity of events and noise. In this paper, we present a novel approach for unusual event detection via sparse reconstruction of dynamic textures over an overcomplete basis set, with the dynamic texture described by local binary patterns from three orthogonal planes (LBPTOP). The overcomplete basis set is learnt from the training data where only the normal items observed. In the detection process, given a new observation, we compute the sparse coefficients using the Dantzig Selector algorithm which was proposed in the literature of compressed sensing. Then the reconstruction errors are computed, based on which we detect the abnormal items. Our application can be used to detect both local and global abnormal events. We evaluate our algorithm on UCSD Abnormality Datasets for local anomaly detection, which is shown to outperform current state-of-the-art approaches, and we also get promising results for rapid escape detection using the PETS2009 dataset.
Resumo:
As a social species in a constantly changing environment, humans rely heavily on the informational richness and communicative capacity of the face. Thus, understanding how the brain processes information about faces in real-time is of paramount importance. The N170 is a high temporal resolution electrophysiological index of the brain's early response to visual stimuli that is reliably elicited in carefully controlled laboratory-based studies. Although the N170 has often been reported to be of greatest amplitude to faces, there has been debate regarding whether this effect might be an artifact of certain aspects of the controlled experimental stimulation schedules and materials. To investigate whether the N170 can be identified in more realistic conditions with highly variable and cluttered visual images and accompanying auditory stimuli we recorded EEG 'in the wild', while participants watched pop videos. Scene-cuts to faces generated a clear N170 response, and this was larger than the N170 to transitions where the videos cut to non-face stimuli. Within participants, wild-type face N170 amplitudes were moderately correlated to those observed in a typical laboratory experiment. Thus, we demonstrate that the face N170 is a robust and ecologically valid phenomenon and not an artifact arising as an unintended consequence of some property of the more typical laboratory paradigm.