903 resultados para night vision system
Resumo:
Calibration of a camera system is a necessary step in any stereo metric process. It correlates all cameras to a common coordinate system by measuring the intrinsic and extrinsic parameters of each camera. Currently, manual calibration of a camera system is the only way to achieve calibration in civil engineering operations that require stereo metric processes (photogrammetry, videogrammetry, vision based asset tracking, etc). This type of calibration however is time-consuming and labor-intensive. Furthermore, in civil engineering operations, camera systems are exposed to open, busy sites. In these conditions, the position of presumably stationary cameras can easily be changed due to external factors such as wind, vibrations or due to an unintentional push/touch from personnel on site. In such cases manual calibration must be repeated. In order to address this issue, several self-calibration algorithms have been proposed. These algorithms use Projective Geometry, Absolute Conic and Kruppa Equations and variations of these to produce processes that achieve calibration. However, most of these methods do not consider all constraints of a camera system such as camera intrinsic constraints, scene constraints, camera motion or varying camera intrinsic properties. This paper presents a novel method that takes all constraints into consideration to auto-calibrate cameras using an image alignment algorithm originally meant for vision based tracking. In this method, image frames are taken from cameras. These frames are used to calculate the fundamental matrix that gives epipolar constraints. Intrinsic and extrinsic properties of cameras are acquired from this calculation. Test results are presented in this paper with recommendations for further improvement.
Resumo:
Vision trackers have been proposed as a promising alternative for tracking at large-scale, congested construction sites. They provide the location of a large number of entities in a camera view across frames. However, vision trackers provide only two-dimensional (2D) pixel coordinates, which are not adequate for construction applications. This paper proposes and validates a method that overcomes this limitation by employing stereo cameras and converting 2D pixel coordinates to three-dimensional (3D) metric coordinates. The proposed method consists of four steps: camera calibration, camera pose estimation, 2D tracking, and triangulation. Given that the method employs fixed, calibrated stereo cameras with a long baseline, appropriate algorithms are selected for each step. Once the first two steps reveal camera system parameters, the third step determines 2D pixel coordinates of entities in subsequent frames. The 2D coordinates are triangulated on the basis of the camera system parameters to obtain 3D coordinates. The methodology presented in this paper has been implemented and tested with data collected from a construction site. The results demonstrate the suitability of this method for on-site tracking purposes.
Resumo:
Advances in the development of computer vision, miniature Micro-Electro-Mechanical Systems (MEMS) and Wireless Sensor Network (WSN) offer intriguing possibilities that can radically alter the paradigms underlying existing methods of condition assessment and monitoring of ageing civil engineering infrastructure. This paper describes some of the outcomes of the European Science Foundation project "Micro-Measurement and Monitoring System for Ageing Underground Infrastructures (Underground M3)". The main aim of the project was to develop a system that uses a tiered approach to monitor the degree and rate of tunnel deterioration. The system comprises of (1) Tier 1: Micro-detection using advances in computer vision and (2) Tier 2: Micro-monitoring and communication using advances in MEMS and WSN. These potentially low-cost technologies will be able to reduce costs associated with end-of-life structures, which is essential to the viability of rehabilitation, repair and reuse. The paper describes the actual deployment and testing of these innovative monitoring tools in tunnels of London Underground, Prague Metro and Barcelona Metro. © 2012 Taylor & Francis Group.
Resumo:
This paper describes an interactive system for quickly modelling 3D body shapes from a single image. It provides the user with a convenient way to obtain their 3D body shapes so as to try on virtual garments online. For the ease of use, we first introduce a novel interface for users to conveniently extract anthropometric measurements from a single photo, while using readily available scene cues for automatic image rectification. Then, we propose a unified probabilistic framework using Gaussian processes, which predict the body parameters from input measurements while correcting the aspect ratio ambiguity resulting from photo rectification. Extensive experiments and user studies have supported the efficacy of our system. This system is now being exploited commercially online1. © 2011. The copyright of this document resides with its authors.
Resumo:
This paper presents a novel architecture of vision chip for fast traffic lane detection (FTLD). The architecture consists of a 32*32 SIMD processing element (PE) array processor and a dual-core RISC processor. The PE array processor performs low-level pixel-parallel image processing at high speed and outputs image features for high-level image processing without I/O bottleneck. The dual-core processor carries out high-level image processing. A parallel fast lane detection algorithm for this architecture is developed. The FPGA system with a CMOS image sensor is used to implement the architecture. Experiment results show that the system can perform the fast traffic lane detection at 50fps rate. It is much faster than previous works and has good robustness that can operate in various intensity of light. The novel architecture of vision chip is able to meet the demand of real-time lane departure warning system.
Resumo:
For night remote surveillance, we present a method, the range-gated laser stroboscopic imaging(RGLSI), which uses a new kind of time delay integration mode to integrate target signals so that night remote surveillance can be realized by a low-energy illuminated laser. The time delay integration in this method has no influence on the video frame rate. Compared with the traditional range-gated laser imaging, RGLSI can reduce scintillation and target speckle effects and significantly improve the image signal-to-noise ratio analyzed. Even under low light level and low visibility conditions, the RGLSI system can effectively work. In a preliminary experiment, we have detected and recognized a railway bridge one kilometer away under a visibility of six kilometers, when the effective illuminated energy is 29.5 mu J.
Resumo:
A portable 3D laser scanning system has been designed and built for robot vision. By tilting the charge coupled device (CCD) plane of portable 3D scanning system according to the Scheimpflug condition, the depth-of-view is successfully extended from less than 40 to 100 mm. Based on the tilted camera model, the traditional two-step camera calibration method is modified by introducing the angle factor. Meanwhile, a novel segmental calibration approach, i.e., dividing the whole work range into two parts and calibrating, respectively, with corresponding system parameters, is proposed to effectively improve the measurement accuracy of the large depth-of-view 3D laser scanner. In the process of 3D reconstruction, different calibration parameters are used to transform the 2D coordinates into 3D coordinates according to the different positions of the image in the CCD plane, and the measurement accuracy of 60 mu m is obtained experimentally. Finally, the experiment of scanning a lamina by the large depth-of-view portable 3D laser scanner used by an industrial robot IRB 4400 is also employed to demonstrate the effectiveness and high measurement accuracy of our scanning system. (C) 2007 Elsevier Ltd. All rights reserved.
Resumo:
This thesis describes Sonja, a system which uses instructions in the course of visually-guided activity. The thesis explores an integration of research in vision, activity, and natural language pragmatics. Sonja's visual system demonstrates the use of several intermediate visual processes, particularly visual search and routines, previously proposed on psychophysical grounds. The computations Sonja performs are compatible with the constraints imposed by neuroscientifically plausible hardware. Although Sonja can operate autonomously, it can also make flexible use of instructions provided by a human advisor. The system grounds its understanding of these instructions in perception and action.
Resumo:
Barnes, D. P., Lee, M. H., Hardy, N. W. (1983). A control and monitoring system for multiple-sensor industrial robots. In Proc. 3rd. Int. Conf. Robot Vision and Sensory Controls, Cambridge, MA. USA., 471-479.
Resumo:
A vision based technique for non-rigid control is presented that can be used for animation and video game applications. The user grasps a soft, squishable object in front of a camera that can be moved and deformed in order to specify motion. Active Blobs, a non-rigid tracking technique is used to recover the position, rotation and non-rigid deformations of the object. The resulting transformations can be applied to a texture mapped mesh, thus allowing the user to control it interactively. Our use of texture mapping hardware allows us to make the system responsive enough for interactive animation and video game character control.
Resumo:
Many people suffer from conditions that lead to deterioration of motor control and makes access to the computer using traditional input devices difficult. In particular, they may loose control of hand movement to the extent that the standard mouse cannot be used as a pointing device. Most current alternatives use markers or specialized hardware to track and translate a user's movement to pointer movement. These approaches may be perceived as intrusive, for example, wearable devices. Camera-based assistive systems that use visual tracking of features on the user's body often require cumbersome manual adjustment. This paper introduces an enhanced computer vision based strategy where features, for example on a user's face, viewed through an inexpensive USB camera, are tracked and translated to pointer movement. The main contributions of this paper are (1) enhancing a video based interface with a mechanism for mapping feature movement to pointer movement, which allows users to navigate to all areas of the screen even with very limited physical movement, and (2) providing a customizable, hierarchical navigation framework for human computer interaction (HCI). This framework provides effective use of the vision-based interface system for accessing multiple applications in an autonomous setting. Experiments with several users show the effectiveness of the mapping strategy and its usage within the application framework as a practical tool for desktop users with disabilities.
Resumo:
How do reactive and planned behaviors interact in real time? How are sequences of such behaviors released at appropriate times during autonomous navigation to realize valued goals? Controllers for both animals and mobile robots, or animats, need reactive mechanisms for exploration, and learned plans to reach goal objects once an environment becomes familiar. The SOVEREIGN (Self-Organizing, Vision, Expectation, Recognition, Emotion, Intelligent, Goaloriented Navigation) animat model embodies these capabilities, and is tested in a 3D virtual reality environment. SOVEREIGN includes several interacting subsystems which model complementary properties of cortical What and Where processing streams and which clarify similarities between mechanisms for navigation and arm movement control. As the animat explores an environment, visual inputs are processed by networks that are sensitive to visual form and motion in the What and Where streams, respectively. Position-invariant and sizeinvariant recognition categories are learned by real-time incremental learning in the What stream. Estimates of target position relative to the animat are computed in the Where stream, and can activate approach movements toward the target. Motion cues from animat locomotion can elicit head-orienting movements to bring a new target into view. Approach and orienting movements are alternately performed during animat navigation. Cumulative estimates of each movement are derived from interacting proprioceptive and visual cues. Movement sequences are stored within a motor working memory. Sequences of visual categories are stored in a sensory working memory. These working memories trigger learning of sensory and motor sequence categories, or plans, which together control planned movements. Predictively effective chunk combinations are selectively enhanced via reinforcement learning when the animat is rewarded. Selected planning chunks effect a gradual transition from variable reactive exploratory movements to efficient goal-oriented planned movement sequences. Volitional signals gate interactions between model subsystems and the release of overt behaviors. The model can control different motor sequences under different motivational states and learns more efficient sequences to rewarded goals as exploration proceeds.
Resumo:
CONFIGR (CONtour FIgure GRound) is a computational model based on principles of biological vision that completes sparse and noisy image figures. Within an integrated vision/recognition system, CONFIGR posits an initial recognition stage which identifies figure pixels from spatially local input information. The resulting, and typically incomplete, figure is fed back to the “early vision” stage for long-range completion via filling-in. The reconstructed image is then re-presented to the recognition system for global functions such as object recognition. In the CONFIGR algorithm, the smallest independent image unit is the visible pixel, whose size defines a computational spatial scale. Once pixel size is fixed, the entire algorithm is fully determined, with no additional parameter choices. Multi-scale simulations illustrate the vision/recognition system. Open-source CONFIGR code is available online, but all examples can be derived analytically, and the design principles applied at each step are transparent. The model balances filling-in as figure against complementary filling-in as ground, which blocks spurious figure completions. Lobe computations occur on a subpixel spatial scale. Originally designed to fill-in missing contours in an incomplete image such as a dashed line, the same CONFIGR system connects and segments sparse dots, and unifies occluded objects from pieces locally identified as figure in the initial recognition stage. The model self-scales its completion distances, filling-in across gaps of any length, where unimpeded, while limiting connections among dense image-figure pixel groups that already have intrinsic form. Long-range image completion promises to play an important role in adaptive processors that reconstruct images from highly compressed video and still camera images.
Resumo:
— Consideration of how people respond to the question What is this? has suggested new problem frontiers for pattern recognition and information fusion, as well as neural systems that embody the cognitive transformation of declarative information into relational knowledge. In contrast to traditional classification methods, which aim to find the single correct label for each exemplar (This is a car), the new approach discovers rules that embody coherent relationships among labels which would otherwise appear contradictory to a learning system (This is a car, that is a vehicle, over there is a sedan). This talk will describe how an individual who experiences exemplars in real time, with each exemplar trained on at most one category label, can autonomously discover a hierarchy of cognitive rules, thereby converting local information into global knowledge. Computational examples are based on the observation that sensors working at different times, locations, and spatial scales, and experts with different goals, languages, and situations, may produce apparently inconsistent image labels, which are reconciled by implicit underlying relationships that the network’s learning process discovers. The ARTMAP information fusion system can, moreover, integrate multiple separate knowledge hierarchies, by fusing independent domains into a unified structure. In the process, the system discovers cross-domain rules, inferring multilevel relationships among groups of output classes, without any supervised labeling of these relationships. In order to self-organize its expert system, the ARTMAP information fusion network features distributed code representations which exploit the model’s intrinsic capacity for one-to-many learning (This is a car and a vehicle and a sedan) as well as many-to-one learning (Each of those vehicles is a car). Fusion system software, testbed datasets, and articles are available from http://cns.bu.edu/techlab.
Resumo:
An improved Boundary Contour System (BCS) and Feature Contour System (FCS) neural network model of preattentive vision is applied to large images containing range data gathered by a synthetic aperture radar (SAR) sensor. The goal of processing is to make structures such as motor vehicles, roads, or buildings more salient and more interpretable to human observers than they are in the original imagery. Early processing by shunting center-surround networks compresses signal dynamic range and performs local contrast enhancement. Subsequent processing by filters sensitive to oriented contrast, including short-range competition and long-range cooperation, segments the image into regions. The segmentation is performed by three "copies" of the BCS and FCS, of small, medium, and large scales, wherein the "short-range" and "long-range" interactions within each scale occur over smaller or larger distances, corresponding to the size of the early filters of each scale. A diffusive filling-in operation within the segmented regions at each scale produces coherent surface representations. The combination of BCS and FCS helps to locate and enhance structure over regions of many pixels, without the resulting blur characteristic of approaches based on low spatial frequency filtering alone.