321 results for Computer vision teaching
Abstract:
In this paper we describe cooperative control algorithms for robots and sensor nodes in an underwater environment. Cooperative navigation is defined as the ability of a coupled system of autonomous robots to pool their resources to achieve long-distance navigation and a larger controllability space. Other types of useful cooperation in underwater environments include: exchange of information such as data download and retasking; cooperative localization and tracking; and physical connection (docking) for tasks such as deployment of underwater sensor networks, collection of nodes and rescue of damaged robots. We present experimental results obtained with an underwater system that consists of two very different robots and a number of sensor network modules. We present the hardware and software architecture of this underwater system. We then describe various interactions between the robots and sensor nodes and between the two robots, including cooperative navigation. Finally, we describe our experiments with this underwater system and present data.
Abstract:
This paper presents an investigation into event detection in crowded scenes, where the event of interest co-occurs with other activities and only binary labels at the clip level are available. The proposed approach incorporates a fast feature descriptor from the MPEG domain, and a novel multiple instance learning (MIL) algorithm using sparse approximation and random sensing. MPEG motion vectors are used to build particle trajectories that represent the motion of objects in uniform video clips, and the MPEG DCT coefficients are used to compute a foreground map to remove background particles. Trajectories are transformed into the Fourier domain, and the Fourier representations are quantized into visual words using the K-Means algorithm. The proposed MIL algorithm models the scene as a linear combination of independent events, where each event is a distribution of visual words. Experimental results show that the proposed approaches achieve promising results for event detection compared to the state-of-the-art.
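The quantization step described above (Fourier-domain trajectory descriptors assigned to K-Means visual words) can be sketched in numpy. This is our simplified illustration, not the authors' implementation: the trajectory format, descriptor normalization, and function names are assumptions, and the codebook is taken as already learned offline.

```python
import numpy as np

def trajectory_words(trajectories, codebook):
    """Quantize Fourier-domain trajectory descriptors into visual words.

    trajectories: list of (T, 2) arrays of particle positions (hypothetical
    input format). codebook: (K, D) array of K-Means centroids learned
    offline. Returns one visual-word index per trajectory.
    """
    words = []
    for traj in trajectories:
        # Represent the trajectory by the magnitude of its Fourier
        # coefficients, a shift-invariant descriptor of the motion pattern.
        desc = np.abs(np.fft.rfft(traj, axis=0)).ravel()
        desc = desc / (np.linalg.norm(desc) + 1e-8)
        # Vector quantization: assign to the nearest codebook centroid.
        words.append(int(np.argmin(np.linalg.norm(codebook - desc, axis=1))))
    return words
```

The resulting word histograms per clip would then feed the MIL model, which treats each event as a distribution over these words.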
Abstract:
Non-rigid face alignment is a very important task in a wide range of applications, but existing tracking-based non-rigid face alignment methods are either inaccurate or require a person-specific model. This dissertation develops simultaneous alignment algorithms that overcome these constraints and provide alignment with high accuracy, efficiency, and robustness to varying image conditions, while requiring only a generic model.
Abstract:
In this paper, we present SMART (Sequence Matching Across Route Traversals): a vision-based place recognition system that uses whole-image matching techniques and odometry information to improve the precision-recall performance, latency and general applicability of the SeqSLAM algorithm. We evaluate the system's performance on challenging day and night journeys over several kilometres at widely varying vehicle velocities from 0 to 60 km/h, compare performance to the current state-of-the-art SeqSLAM algorithm, and provide parameter studies that evaluate the effectiveness of each system component. Using 30-metre sequences, SMART achieves place recognition performance of 81% recall at 100% precision, outperforming SeqSLAM, and is robust to significant degradations in odometry.
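The core of sequence-based place recognition of this kind is scoring a candidate database location by summing frame-difference costs along a whole sequence rather than matching single frames. The sketch below is a deliberately minimal illustration of that idea, not the SMART or SeqSLAM implementation: it omits odometry-based frame sampling, contrast enhancement, and velocity search, and the function names and scalar descriptors are our assumptions.

```python
import numpy as np

def sequence_match_score(query_descs, db_descs, start):
    """Score a candidate match by summing absolute frame-descriptor
    differences along a sequence starting at `start` in the database."""
    L = len(query_descs)
    seg = db_descs[start:start + L]
    return float(np.sum(np.abs(np.asarray(query_descs) - np.asarray(seg))))

def best_match(query_descs, db_descs):
    """Return the database start index whose sequence best matches the
    query sequence (lowest summed difference)."""
    L, N = len(query_descs), len(db_descs)
    scores = [sequence_match_score(query_descs, db_descs, s)
              for s in range(N - L + 1)]
    return int(np.argmin(scores))
```

Odometry, as used in SMART, would replace the fixed one-frame step with sampling at fixed travelled distances, making the matching robust to velocity changes.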
Abstract:
The ability to automate forced landings in an emergency such as engine failure is essential to improving the safety of Unmanned Aerial Vehicles operating in General Aviation airspace. By using active vision to detect safe landing zones below the aircraft, and thereby gathering up-to-the-minute information about the ground environment, the reliability and safety of such systems is vastly improved. This paper presents the Site Detection System, a methodology utilising a downward-facing camera to analyse the ground environment in both 2D and 3D, detect safe landing sites, and characterise them according to size, shape, slope and nearby obstacles. A methodology is presented showing the fusion of landing-site detection from 2D imagery with a coarse Digital Elevation Map and dense 3D reconstructions using INS-aided Structure-from-Motion to improve accuracy. Results are presented from an experimental flight showing the precision/recall of detected landing sites against a hand-classified ground truth, and improved performance with the integration of 3D analysis from visual Structure-from-Motion.
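One of the site characteristics named above, slope, can be derived directly from a Digital Elevation Map. The numpy sketch below is our illustration of that step under assumed conventions (square cells, elevation in the same units as cell size); the function names and the 5-degree threshold are hypothetical, not values from the paper.

```python
import numpy as np

def slope_map(dem, cell_size):
    """Per-cell terrain slope in degrees from a Digital Elevation Map.
    dem: 2-D array of elevations; cell_size: ground distance per cell."""
    # np.gradient returns per-axis derivatives; combine into a magnitude.
    dzdy, dzdx = np.gradient(dem, cell_size)
    return np.degrees(np.arctan(np.hypot(dzdx, dzdy)))

def flat_enough(dem, cell_size, max_slope_deg=5.0):
    """Boolean mask of cells flat enough to be landing-site candidates."""
    return slope_map(dem, cell_size) <= max_slope_deg
```

In a full pipeline along the paper's lines, this mask would be intersected with the 2D-imagery site detections and refined with the dense Structure-from-Motion reconstruction.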
Abstract:
The huge amount of CCTV footage available makes it very burdensome to process these videos manually through human operators, making automated processing of video footage through computer vision technologies necessary. During the past several years, there has been a large effort to detect abnormal activities through computer vision techniques. Typically, the problem is formulated as a novelty detection task, where the system is trained on normal data and is required to detect events which do not fit the learned 'normal' model. There is no precise definition of an abnormal activity; it depends on the context of the scene. Hence, different feature sets are required to detect different kinds of abnormal activities. In this work we evaluate the performance of different state-of-the-art features for detecting the presence of abnormal objects in the scene. These include optical flow vectors to detect motion-related anomalies, and textures of optical flow and image textures to detect the presence of abnormal objects. These extracted features, in different combinations, are modeled using different state-of-the-art models such as the Gaussian mixture model (GMM) and the Semi-2D Hidden Markov model (HMM) to analyse their performance. Further, we apply perspective normalization to the extracted features to compensate for perspective distortion due to the distance between the camera and the objects under consideration. The proposed approach is evaluated using the publicly available UCSD datasets, and we demonstrate improved performance compared to other state-of-the-art methods.
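The novelty-detection formulation described above amounts to scoring features under a model of normality and flagging low-likelihood frames. The sketch below illustrates this for a diagonal-covariance GMM with parameters assumed to be learned already on normal data (the paper would fit them with EM); function names and the thresholding convention are our assumptions.

```python
import numpy as np

def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of feature vectors under a diagonal-covariance GMM.
    weights: list of mixture weights; means/variances: lists of 1-D arrays."""
    x = np.atleast_2d(x)
    comps = []
    for w, mu, var in zip(weights, means, variances):
        d = x.shape[1]
        log_norm = -0.5 * (d * np.log(2 * np.pi) + np.sum(np.log(var)))
        log_p = log_norm - 0.5 * np.sum((x - mu) ** 2 / var, axis=1)
        comps.append(np.log(w) + log_p)
    # Log-sum-exp over mixture components, done stably.
    return np.logaddexp.reduce(comps, axis=0)

def is_abnormal(x, gmm_params, threshold):
    """Flag frames whose features score below a threshold learned on
    normal data (novelty detection)."""
    return gmm_loglik(x, *gmm_params) < threshold
```

The same scoring interface applies whether the features are optical flow vectors, optical-flow textures, or image textures, which is what makes comparing feature sets under one model straightforward.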
Abstract:
A security system based on the recognition of the iris of human eyes using the wavelet transform is presented. The zero-crossings of the wavelet transform are used to extract the unique features obtained from the grey-level profiles of the iris. The recognition process is performed in two stages. The first stage consists of building a one-dimensional representation of the grey-level profiles of the iris, followed by obtaining the wavelet transform zero-crossings of the resulting representation. The second stage is the matching procedure for iris recognition. The proposed approach uses only a few selected intermediate resolution levels for matching, thus making it computationally efficient as well as less sensitive to noise and quantisation errors. A normalisation process is implemented to compensate for size variations due to the possible changes in the camera-to-face distance. The technique has been tested on real images in both noise-free and noisy conditions. The technique is being investigated for real-time implementation, as a stand-alone system, for access control to high-security areas.
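The two feature-extraction ingredients above, a wavelet transform of the 1-D grey-level profile and the zero-crossings of its output, can be illustrated minimally in numpy. The Haar detail signal here is only a stand-in for whichever wavelet and resolution levels the paper actually uses, and the function names are ours.

```python
import numpy as np

def haar_detail(profile):
    """One level of a Haar wavelet detail signal (scaled difference of
    adjacent sample pairs), a minimal stand-in for the paper's wavelet
    transform of a 1-D iris grey-level profile."""
    p = np.asarray(profile, dtype=float)
    pairs = p[: len(p) // 2 * 2].reshape(-1, 2)
    return (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)

def zero_crossings(signal):
    """Indices where a signal changes sign, i.e. the zero-crossing
    positions used as the iris feature."""
    s = np.sign(np.asarray(signal, dtype=float))
    s[s == 0] = 1  # treat exact zeros as positive to avoid double counts
    return np.where(np.diff(s) != 0)[0]
```

Matching would then compare zero-crossing positions (and, in the paper, associated magnitudes) between a probe profile and enrolled templates at a few intermediate resolution levels.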
Abstract:
Recent modelling of socio-economic costs by the Australian railway industry in 2010 estimated the cost of level crossing accidents to exceed AU$116 million annually. To better understand causal factors that contribute to these accidents, the Cooperative Research Centre for Rail Innovation is running a project entitled Baseline Level Crossing Video. The project aims to improve the recording of level crossing safety data by developing an intelligent system capable of detecting near-miss incidents and capturing quantitative data around these incidents. To detect near-miss events at railway level crossings, a video analytics module is being developed to analyse video footage obtained from forward-facing cameras installed on trains. This paper presents a vision-based approach for the detection of these near-miss events. The video analytics module comprises object detectors and a rail detection algorithm, allowing the distance between a detected object and the rail to be determined. An existing publicly available Histograms of Oriented Gradients (HOG) based object detector is used to detect various types of vehicles in each video frame. As vehicles are usually seen from a sideways view from the cabin's perspective, the results of the vehicle detector are verified using an algorithm that can detect the wheels of each detected vehicle. Rail detection is facilitated using a projective transformation of the video, such that the forward-facing view becomes a bird's eye view. A Line Segment Detector is employed as the feature extractor, and a sliding window approach is developed to track a pair of rails. Localisation of the vehicles is done by projecting the results of the vehicle and rail detectors onto the ground plane, allowing the distance between the vehicle and rail to be calculated. The resultant vehicle positions and distances are logged to a database for further analysis.
We present preliminary results regarding the performance of a prototype video analytics module on a data set of videos containing more than 30 different railway level crossings. The video data is captured from a journey of a train that has passed through these level crossings.
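The projective transformation step above, mapping forward-facing image points onto the ground plane so that vehicle-to-rail distances become measurable, boils down to applying a 3x3 homography in homogeneous coordinates. This numpy sketch is our generic illustration of that operation; the homography itself would come from a calibration the paper does not detail, and the function name is an assumption.

```python
import numpy as np

def warp_points(H, pts):
    """Map 2-D image points through a 3x3 homography H, as in the
    projective transformation used to obtain a bird's-eye view.

    pts: (N, 2) array-like of pixel coordinates.
    Returns the (N, 2) transformed coordinates.
    """
    pts = np.asarray(pts, dtype=float)
    # Lift to homogeneous coordinates, apply H, then de-homogenize.
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

After warping both the detected vehicle footprints and the tracked rail points onto the ground plane, the lateral vehicle-to-rail distance is an ordinary Euclidean distance in metric ground coordinates.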
Abstract:
This work aims at developing a planetary rover capable of acting as an assistant astrobiologist: making a preliminary analysis of the collected visual images that will help make better use of the scientists' time by pointing out the most interesting pieces of data. This paper focuses on the problem of detecting and recognising particular types of stromatolites. Inspired by the processes actual astrobiologists go through in the field when identifying stromatolites, the processes we investigate focus on recognising characteristics associated with biogenicity. The extraction of these characteristics is based on the analysis of geometrical structure, enhanced by passing the images of stromatolites through an edge-detection filter and computing their Fourier transform, revealing typical spatial frequency patterns. The proposed analysis is performed on both simulated images of stromatolite structures and images of real stromatolites taken in the field by astrobiologists.
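The pipeline above (edge-detection filter followed by a Fourier analysis of the resulting structure) can be sketched with numpy. Both pieces here are simplified stand-ins, not the authors' filters: a gradient-magnitude edge map in place of whichever edge detector they use, and a 1-D dominant-frequency measure of the kind that would highlight the regular layering of a stromatolite. All names are hypothetical.

```python
import numpy as np

def edge_map(img):
    """Gradient-magnitude edge filter over a 2-D intensity image,
    a simple stand-in for the paper's edge-detection stage."""
    gy, gx = np.gradient(np.asarray(img, dtype=float))
    return np.hypot(gx, gy)

def dominant_frequency(signal):
    """Index of the strongest non-DC Fourier component of a 1-D profile,
    the kind of spatial-frequency cue that flags periodic layering."""
    mag = np.abs(np.fft.rfft(signal - np.mean(signal)))
    return int(np.argmax(mag[1:]) + 1)
```

A profile sampled across a layered stromatolite would show a pronounced peak at the layering frequency, whereas an unstructured rock face would not, which is the discriminative signal the analysis relies on.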
Abstract:
Camera-laser calibration is necessary for many robotics and computer vision applications. However, existing calibration toolboxes still require laborious effort from the operator in order to achieve reliable and accurate results. This paper proposes algorithms that augment two existing trusted calibration methods with automatic extraction of the calibration object from the sensor data. The result is a complete procedure that allows for automatic camera-laser calibration. The first stage of the procedure is automatic camera calibration, which is useful in its own right for many applications. The chessboard extraction algorithm it provides is shown to outperform openly available techniques. The second stage completes the procedure by providing automatic camera-laser calibration. The procedure has been verified by extensive experimental tests, with the proposed algorithms providing a major reduction in the time required from an operator in comparison to manual methods.
Abstract:
Object classification is plagued by the issue of session variation. Session variation describes any variation that makes one instance of an object look different to another, for instance due to pose or illumination changes. Recent work in the challenging task of face verification has shown that session variability modelling provides a mechanism to overcome some of these limitations. However, for computer vision purposes, it has only been applied in the limited setting of face verification. In this paper we propose a local region based intersession variability (ISV) modelling approach, termed Local ISV, so that local session variations can be modelled, and apply it to challenging real-world data. We demonstrate the efficacy of this technique on a challenging real-world fish image database which includes images taken underwater, providing significant real-world session variations. This Local ISV approach provides a relative performance improvement of, on average, 23% on the challenging MOBIO, Multi-PIE and SCface face databases. It also provides a relative performance improvement of 35% on our challenging fish image dataset.
Abstract:
Computer vision is increasingly interested in the rapid estimation of object detectors. The canonical strategy of using Hard Negative Mining to train a Support Vector Machine is slow, since the large negative set must be traversed at least once per detector. Recent work has demonstrated that, with an assumption of signal stationarity, Linear Discriminant Analysis is able to learn comparable detectors without ever revisiting the negative set. Even with this insight, the time to learn a detector can still be on the order of minutes. Correlation filters, on the other hand, can produce a detector in under a second. However, this involves the unnatural assumption that the statistics are periodic, and requires the negative set to be re-sampled per detector size. These two methods differ chiefly in the structure which they impose on the covariance matrix of all examples. This paper is a comparative study which develops techniques (i) to assume periodic statistics without needing to revisit the negative set and (ii) to accelerate the estimation of detectors with aperiodic statistics. It is experimentally verified that periodicity is detrimental.
Abstract:
This paper presents a novel framework for the unsupervised alignment of an ensemble of temporal sequences. The approach draws inspiration from the axiom that an ensemble of temporal signals stemming from the same source/class should have lower rank when "aligned" rather than "misaligned". Our approach shares similarities with recent state-of-the-art methods for unsupervised image ensemble alignment (e.g. RASL), which break the problem into a set of image alignment problems with well-known solutions (i.e. the Lucas-Kanade algorithm). Similarly, we propose a strategy for decomposing the problem of temporal ensemble alignment into a set of independent sequence alignment problems, which we claim can be solved reliably through Dynamic Time Warping (DTW). We demonstrate the utility of our method on the Cohn-Kanade+ dataset, aligning expression onset across multiple sequences, which allows us to automate the rapid discovery of event annotations.
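The pairwise primitive the decomposition relies on, Dynamic Time Warping, has a compact standard form. The sketch below is the textbook DTW recurrence for 1-D sequences, not the paper's full ensemble procedure (which iterates such alignments under a rank objective); the function name and the absolute-difference local cost are our choices.

```python
import numpy as np

def dtw(a, b):
    """Dynamic Time Warping cost between two 1-D sequences.

    Fills the cumulative-cost table D where D[i, j] is the cheapest
    alignment of a[:i] with b[:j], allowing repeats (warping) of samples.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor paths.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Backtracking through the same table yields the warping path itself, which is what an ensemble alignment would apply to each sequence.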
Abstract:
This paper describes a novel obstacle detection system for autonomous robots in agricultural field environments that uses a novelty detector to inform stereo matching. Stereo vision alone erroneously detects obstacles in environments with ambiguous appearance and ground plane, such as broad-acre crop fields with harvested crop residue. The novelty detector estimates the probability density in image descriptor space and incorporates image-space positional understanding to identify potential regions for obstacle detection using dense stereo matching. The results demonstrate that the system is able to detect obstacles typical of a farm by day and night. This system was successfully used as the sole means of obstacle detection for an autonomous robot performing a long-term, two-hour coverage task, travelling 8.5 km.
Abstract:
In this paper, the problem of moving object detection in aerial video is addressed. While motion cues have been extensively exploited in the literature, how to use spatial information is still an open problem. To deal with this issue, we propose a novel hierarchical moving target detection method based on spatiotemporal saliency. Temporal saliency is used to get a coarse segmentation, and spatial saliency is extracted to obtain the object’s appearance details in candidate motion regions. Finally, by combining temporal and spatial saliency information, we can get refined detection results. Additionally, in order to give a full description of the object distribution, spatial saliency is detected in both pixel and region levels based on local contrast. Experiments conducted on the VIVID dataset show that the proposed method is efficient and accurate.
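The coarse temporal-saliency stage described above can be illustrated with the simplest possible temporal cue: normalized inter-frame differences thresholded into candidate motion regions. This numpy sketch is our stand-in, not the paper's saliency measure (which also handles camera motion in aerial video and adds pixel- and region-level spatial saliency on top); the function names and the 0.5 threshold are assumptions.

```python
import numpy as np

def temporal_saliency(frames):
    """Normalized absolute difference between consecutive frames as a
    coarse temporal-saliency map. frames: (T, H, W) array of intensities."""
    frames = np.asarray(frames, dtype=float)
    diff = np.abs(np.diff(frames, axis=0))
    peak = diff.max()
    return diff / peak if peak > 0 else diff

def motion_mask(frames, thresh=0.5):
    """Candidate motion regions: pixels whose temporal saliency exceeds
    the threshold in any consecutive frame pair."""
    return (temporal_saliency(frames) > thresh).any(axis=0)
```

In the hierarchical scheme of the paper, spatial saliency would then be computed only inside these candidate regions to recover the object's appearance details before the two cues are fused.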