316 results for 280208 Computer Vision


Relevance: 80.00%

Abstract:

Public buildings and large infrastructure are typically monitored by tens or hundreds of cameras, all capturing different physical spaces and observing different types of interactions and behaviours. However, to date, in large part due to limited data availability, crowd monitoring and operational surveillance research has focused on single-camera scenarios which are not representative of real-world applications. In this paper we present a new, publicly available database for large-scale crowd surveillance. Footage from 12 cameras covering the main floor of a busy university campus building for a full working day is provided, including an internal and an external foyer, elevator foyers, and the main external approach, alongside annotations for crowd counting (single- or multi-camera) at 10 sites and pedestrian flow analysis at 6 sites. We describe how this large dataset can be used to perform distributed monitoring of building utilisation, and demonstrate its potential for understanding and learning the relationship between different areas of a building.
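
The paper itself does not include code; as a rough illustration of the distributed-monitoring idea, the sketch below (assuming hypothetical per-camera count annotations loaded as arrays, not the dataset's actual format) aggregates single-camera counts into a building-level occupancy view per time step.

```python
import numpy as np

# Hypothetical per-camera crowd counts: rows are time steps, columns are cameras.
# In the real dataset these would come from the provided counting annotations.
counts = np.array([
    [ 3,  5, 12,  0,  7],
    [ 4,  6, 15,  1,  9],
    [ 2,  8, 20,  0, 11],
])

# Simple distributed-monitoring view: total occupancy of the monitored floor
# per time step, and the busiest camera at each step.
total_occupancy = counts.sum(axis=1)
busiest_camera = counts.argmax(axis=1)

for t, (total, cam) in enumerate(zip(total_occupancy, busiest_camera)):
    print(f"t={t}: ~{total} people visible, busiest view = camera {cam}")
```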

Relevance: 80.00%

Abstract:

It is not uncommon to hear a person of interest described by their height, build, and clothing (i.e. type and colour). These semantic descriptions are commonly used by people to describe others, as they are quick to communicate and easy to understand. However, such queries are not easily utilised within intelligent video surveillance systems, as they are difficult to transform into a representation that can be used by computer vision algorithms. In this paper we propose a novel approach that transforms such a semantic query into an avatar, in the form of a channel representation that is searchable within a video stream. We show how spatial, colour and prior information (person shape) can be incorporated into the channel representation to locate a target using a particle-filter-like approach. We demonstrate state-of-the-art performance for locating a subject in video based on a description, achieving a relative performance improvement of 46.7% over the baseline. We also apply this approach to person re-detection, and show that it can be used to re-detect a person in a video stream without the use of person detection.
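
As a loose illustration (not the authors' channel representation), the sketch below weights particle hypotheses by how well a hypothetical colour-histogram and height model of the described person matches a candidate image region, in the spirit of the particle-filter-like search; all names and numbers are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def colour_likelihood(region_hist, query_hist):
    # Bhattacharyya-style similarity between normalised colour histograms.
    return float(np.sum(np.sqrt(region_hist * query_hist)))

def height_likelihood(region_height_px, expected_height_px, sigma=15.0):
    return float(np.exp(-0.5 * ((region_height_px - expected_height_px) / sigma) ** 2))

# Hypothetical query: "tall person wearing red" encoded as a colour histogram
# (mostly red bins) and an expected bounding-box height in pixels.
query_hist = np.array([0.7, 0.1, 0.1, 0.1])
expected_height = 180.0

# Particles are candidate image regions (here faked with random summaries).
n_particles = 100
particle_hists = rng.dirichlet(np.ones(4), size=n_particles)
particle_heights = rng.normal(160, 30, size=n_particles)

weights = np.array([
    colour_likelihood(h, query_hist) * height_likelihood(ht, expected_height)
    for h, ht in zip(particle_hists, particle_heights)
])
weights /= weights.sum()

best = int(np.argmax(weights))
print("highest-weighted candidate:", best, "weight:", round(float(weights[best]), 3))
```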

Relevance: 80.00%

Abstract:

Diffusion-weighted magnetic resonance (MR) imaging is a powerful tool that can be employed to study white matter microstructure by examining the 3D displacement profile of water molecules in brain tissue. By applying diffusion-sensitized gradients along a minimum of 6 directions, second-order tensors can be computed to model dominant diffusion processes. However, conventional diffusion tensor imaging (DTI) is not sufficient to resolve crossing fiber tracts. Recently, a number of high-angular-resolution schemes with more than 6 gradient directions have been employed to address this issue. In this paper, we introduce the Tensor Distribution Function (TDF), a probability function defined on the space of symmetric positive definite matrices. Here, fiber crossing is modeled as an ensemble of Gaussian diffusion processes with weights specified by the TDF. Once the optimal TDF is determined, the diffusion orientation distribution function (ODF) can easily be computed by analytic integration of the resulting displacement probability function.
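
As a simplified numerical sketch of the mixture-of-Gaussians idea behind the TDF (not the paper's estimation procedure), the code below synthesises a diffusion-weighted signal from two crossing Gaussian diffusion tensors with given weights; the tensor values, b-value and gradient set are illustrative only.

```python
import numpy as np

def diffusion_signal(bval, gradients, tensors, weights):
    """Signal from a weighted mixture of Gaussian diffusion processes:
    S(g) = sum_k w_k * exp(-b * g^T D_k g)."""
    signal = np.zeros(len(gradients))
    for D, w in zip(tensors, weights):
        signal += w * np.exp(-bval * np.einsum('ij,jk,ik->i', gradients, D, gradients))
    return signal

# Two crossing fibre populations modelled as anisotropic tensors (units: mm^2/s).
D1 = np.diag([1.7e-3, 0.3e-3, 0.3e-3])   # fibre aligned with x
D2 = np.diag([0.3e-3, 1.7e-3, 0.3e-3])   # fibre aligned with y
weights = [0.5, 0.5]

# A few unit gradient directions (a real protocol would use many more than 6).
g = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
              [1, 1, 0], [1, 0, 1], [0, 1, 1]], dtype=float)
g /= np.linalg.norm(g, axis=1, keepdims=True)

print(diffusion_signal(bval=1000.0, gradients=g, tensors=[D1, D2], weights=weights))
```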

Relevance: 80.00%

Abstract:

This paper presents a novel vision-based underwater robotic system for the identification and control of Crown-Of-Thorns starfish (COTS) in coral reef environments. COTS have been identified as one of the most significant threats to Australia's Great Barrier Reef. These starfish feed on coral, impacting large areas of reef and the marine ecosystem that depends on it. Evidence suggests that land-based nutrient runoff has accelerated recent outbreaks of COTS, requiring extensive use of divers to manually inject biological agents into the starfish in an attempt to control population numbers. Facilitating this control program using robotics is the goal of our research. In this paper we introduce a vision-based COTS detection and tracking system based on a Random Forest Classifier (RFC) trained on images from underwater footage. To track COTS with a moving camera, we embed the RFC in a particle filter detector and tracker, where the predicted class probability of the RFC is used as an observation probability to weight the particles, and sparse optical flow estimation is used for the prediction step of the filter. The system is experimentally evaluated in a realistic laboratory setup using a robotic arm that moves a camera at different speeds and heights over a range of real-size images of COTS in a reef environment.
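
A minimal sketch of the detector-in-a-particle-filter idea described above, assuming a classifier that returns a COTS probability for an image patch; the scoring function, patch size and motion model here are placeholders, not the paper's trained Random Forest or its optical-flow estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

def cots_probability(frame, x, y, patch=64):
    # Placeholder for a trained classifier (e.g. a random forest) that scores
    # the patch centred at (x, y); here we just use a fake brightness score.
    h, w = frame.shape
    x0 = int(np.clip(x, patch // 2, w - patch // 2))
    y0 = int(np.clip(y, patch // 2, h - patch // 2))
    region = frame[y0 - patch // 2:y0 + patch // 2, x0 - patch // 2:x0 + patch // 2]
    return float(region.mean())

def particle_filter_step(particles, frame, flow=(0.0, 0.0), noise=5.0):
    # Predict: shift particles by an estimated optical-flow displacement plus noise.
    particles = particles + np.array(flow) + rng.normal(0, noise, particles.shape)
    # Update: weight each particle by the classifier's predicted class probability.
    weights = np.array([cots_probability(frame, px, py) for px, py in particles])
    weights = weights / weights.sum()
    # Resample proportionally to the weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], weights

frame = rng.random((480, 640))
particles = np.column_stack([rng.uniform(0, 640, 200), rng.uniform(0, 480, 200)])
particles, weights = particle_filter_step(particles, frame, flow=(2.0, -1.0))
print("estimated target position:", particles.mean(axis=0))
```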

Relevance: 80.00%

Abstract:

Automated digital recordings are useful for large-scale temporal and spatial environmental monitoring. An important research effort has been the automated classification of calling bird species. In this paper we examine a related task: retrieval, from a database of audio recordings, of birdcalls similar to a user-supplied query call. Such a retrieval task can sometimes be more useful than an automated classifier. We compare three approaches to similarity-based birdcall retrieval using spectral ridge features and two kinds of gradient features, the structure tensor and the histogram of oriented gradients. The retrieval accuracy of our spectral ridge method is 94%, compared to 82% for the structure tensor method and 90% for the histogram of gradients method. Additionally, this approach potentially offers a more compact representation and is more computationally efficient.
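
A small sketch of similarity-based retrieval over precomputed feature vectors; cosine similarity stands in for the paper's ridge-feature distance, which is not specified here, and the database is synthetic.

```python
import numpy as np

def retrieve(query_vec, database_vecs, top_k=5):
    """Rank database recordings by cosine similarity to the query feature vector."""
    q = query_vec / np.linalg.norm(query_vec)
    db = database_vecs / np.linalg.norm(database_vecs, axis=1, keepdims=True)
    scores = db @ q
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]

rng = np.random.default_rng(2)
database = rng.random((1000, 128))                    # hypothetical spectral-ridge feature vectors
query = database[42] + 0.05 * rng.normal(size=128)    # a noisy copy of entry 42

idx, scores = retrieve(query, database)
print("top matches:", idx, "scores:", np.round(scores, 3))
```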

Relevance: 80.00%

Abstract:

As critical infrastructure such as transportation hubs continues to grow in complexity, greater importance is placed on monitoring these facilities to ensure their secure and efficient operation. To achieve these goals, technology continues to evolve in response to the needs of different types of infrastructure. To date, however, surveillance technology has focused primarily on security, and little attention has been paid to assisting operations and monitoring performance in real time. Consequently, solutions have emerged to provide real-time measurements of queues and crowding in spaces, but these have been installed as system add-ons (rather than making better use of existing infrastructure), resulting in expensive infrastructure outlay for the owner/operator and an overload of surveillance systems which in itself creates further complexity. Given that many critical infrastructure facilities already have camera networks installed, it is much more desirable to better utilise these networks to address operational monitoring as well as security needs. Recently, a growing number of approaches have been proposed to monitor operational aspects such as pedestrian throughput, crowd size and dwell times. In this paper, we explore how these techniques relate to and complement the more commonly seen security analytics, and demonstrate the value that operational analytics can add by evaluating their performance on airport surveillance data. We explore how multiple analytics and systems can be combined to better leverage the large amount of data that is available, and we discuss the applicability and resulting benefits of the proposed framework for the ongoing operation of airports and airport networks.

Relevance: 80.00%

Abstract:

Even though crashes between trains and road users are rare events at railway level crossings, they are one of the major safety concerns for the Australian railway industry. Near-miss events at level crossings occur more frequently, and can provide more information about the factors leading to level crossing incidents. In this paper we introduce a video analytics approach for automatically detecting and localizing vehicles in footage from train-mounted cameras, in order to detect near-miss events. To detect and localize vehicles at level crossings, we extract patches from an image and classify each patch as containing a vehicle or not. We developed a region proposal algorithm for generating patches, and we use a Convolutional Neural Network (CNN) to classify each patch. To localize vehicles in images, we combine the patches that are classified as vehicles according to their CNN scores and positions. We compared our system with the Deformable Part Models (DPM) and Regions with CNN features (R-CNN) object detectors. Experimental results on a railway dataset show that the recall rate of our proposed system is 29% higher than what can be achieved with the DPM or R-CNN detectors.
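
A rough sketch of the patch-classification-and-merging step: patches are generated over a grid, scored, and confident detections are merged into a single localisation via a score-weighted centroid. The scoring function is a random placeholder, not the paper's trained CNN, and merging everything into one centroid assumes at most one vehicle in view.

```python
import numpy as np

rng = np.random.default_rng(3)

def score_patch(patch):
    # Placeholder for a CNN that returns P(vehicle | patch).
    return float(rng.random())

def generate_patches(image, size=64, stride=32):
    h, w = image.shape[:2]
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            yield (x, y, size), image[y:y + size, x:x + size]

def detect_vehicles(image, threshold=0.8):
    # Keep patches the classifier is confident about, then merge them into a
    # single localisation (a simplification assuming one vehicle at most).
    hits = [(x, y, s, p) for (x, y, s), patch in generate_patches(image)
            if (p := score_patch(patch)) > threshold]
    if not hits:
        return None
    xs, ys, sizes, ps = map(np.array, zip(*hits))
    cx = np.average(xs + sizes / 2, weights=ps)
    cy = np.average(ys + sizes / 2, weights=ps)
    return cx, cy, len(hits)

image = rng.random((480, 640))
print(detect_vehicles(image))
```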

Relevance: 80.00%

Abstract:

The latest generation of Deep Convolutional Neural Networks (DCNNs) has dramatically advanced challenging computer vision tasks, especially object detection and object classification, achieving state-of-the-art performance in areas including text recognition, sign recognition, face recognition and scene understanding. The depth of these supervised networks has enabled the learning of deeper, hierarchical feature representations. In parallel, unsupervised deep learning methods such as the Convolutional Deep Belief Network (CDBN) have also achieved state-of-the-art results in many computer vision tasks. However, there is very limited research on jointly exploiting the strengths of these two approaches. In this paper, we investigate the learning capability of both methods. We compare the outputs of individual layers and show that many learnt filters, and the outputs of the corresponding layers, are very similar for both approaches. Stacking the DCNN on top of unsupervised layers, or replacing layers in the DCNN with the corresponding learnt layers of the CDBN, can improve recognition/classification accuracy and reduce the computational expense of training. We demonstrate the validity of the proposal on the ImageNet dataset.
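
A toy illustration (numpy only, not the paper's networks or data) of comparing two learnt filter banks by best-match cosine similarity, the kind of layer-wise comparison described above; the "supervised" and "unsupervised" filter banks here are synthetic.

```python
import numpy as np

def filter_similarity(bank_a, bank_b):
    """Best-match cosine similarity between two banks of convolution filters.
    bank_a, bank_b: arrays of shape (num_filters, k, k)."""
    a = bank_a.reshape(len(bank_a), -1)
    b = bank_b.reshape(len(bank_b), -1)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = a @ b.T            # pairwise cosine similarities
    return sim.max(axis=1)   # best match in bank_b for each filter in bank_a

rng = np.random.default_rng(4)
supervised_filters = rng.normal(size=(32, 7, 7))                                  # stand-in for a DCNN layer
unsupervised_filters = supervised_filters + 0.1 * rng.normal(size=(32, 7, 7))     # stand-in for a CDBN layer

print("mean best-match similarity:",
      float(filter_similarity(supervised_filters, unsupervised_filters).mean()))
```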

Relevance: 80.00%

Abstract:

Reconstructing 3D motion data is highly under-constrained due to several common sources of data loss during measurement, such as projection, occlusion, or miscorrespondence. We present a statistical model of 3D motion data, based on the Kronecker structure of the spatiotemporal covariance of natural motion, as a prior on 3D motion. This prior is expressed as a matrix normal distribution, composed of separable and compact row and column covariances. We relate the marginals of the distribution to the shape, trajectory, and shape-trajectory models of prior art. When the marginal shape distribution is not available from training data, we show how placing a hierarchical prior over shapes results in a convex MAP solution in terms of the trace-norm. The matrix normal distribution, fit to a single sequence, outperforms state-of-the-art methods at reconstructing 3D motion data in the presence of significant data loss, while providing covariance estimates of the imputed points.
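
A small numerical sketch of the matrix normal idea: the spatiotemporal covariance is the Kronecker product of a compact row (temporal) covariance and column (spatial) covariance, so samples can be drawn efficiently from the two small Cholesky factors. This is generic matrix-normal sampling under illustrative covariances, not the paper's fitted model.

```python
import numpy as np

rng = np.random.default_rng(5)

def sample_matrix_normal(mean, row_cov, col_cov):
    """Draw X ~ MN(mean, row_cov, col_cov), i.e. vec(X) has covariance
    col_cov (Kronecker) row_cov, using Cholesky factors of the two covariances."""
    A = np.linalg.cholesky(row_cov)
    B = np.linalg.cholesky(col_cov)
    Z = rng.normal(size=mean.shape)
    return mean + A @ Z @ B.T

T, P = 50, 12   # e.g. frames x (3 * number of points)
row_cov = np.exp(-0.1 * np.abs(np.subtract.outer(np.arange(T), np.arange(T))))  # smooth in time
col_cov = 0.5 * np.eye(P) + 0.5                                                  # correlated coordinates

X = sample_matrix_normal(np.zeros((T, P)), row_cov, col_cov)
print(X.shape)
```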

Relevance: 80.00%

Abstract:

Aerial surveys conducted using manned or unmanned aircraft with customized camera payloads can generate a large number of images. Manual review of these images to extract data is prohibitive in terms of time and financial resources, thus providing a strong incentive to automate this process using computer vision systems. There are potential applications for these automated systems in areas such as surveillance and monitoring, precision agriculture, law enforcement, asset inspection, and wildlife assessment. In this paper, we present an efficient machine learning system for automating the detection of marine species in aerial imagery. The effectiveness of our approach can be credited to the combination of a well-suited region proposal method and the use of Deep Convolutional Neural Networks (DCNNs). In comparison to previous algorithms designed for the same purpose, we have been able to dramatically improve recall to more than 80% and precision to 27% by using DCNNs as the core approach.
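
As a minimal sketch of how the detection recall and precision figures quoted above are typically computed from predicted and ground-truth boxes via an IoU threshold; this is generic evaluation code with made-up boxes, not the paper's pipeline.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(predictions, ground_truth, iou_thresh=0.5):
    matched, tp = set(), 0
    for pred in predictions:
        for i, gt in enumerate(ground_truth):
            if i not in matched and iou(pred, gt) >= iou_thresh:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(predictions) if predictions else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall

preds = [(10, 10, 50, 50), (200, 200, 240, 260), (400, 100, 430, 140)]
truth = [(12, 8, 52, 48), (205, 198, 242, 255)]
print(precision_recall(preds, truth))
```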

Relevance: 80.00%

Abstract:

This paper presents a system to analyze long field recordings with low signal-to-noise ratio (SNR) for bio-acoustic monitoring. A method based on spectral peak tracks, Shannon entropy, harmonic structure and oscillation structure is proposed to automatically detect anuran (frog) calling activity. A Gaussian mixture model (GMM) is introduced for modelling these features. Four anuran species widespread in Queensland, Australia, are selected to evaluate the proposed system. A visualization method based on the extracted indices is employed for the detection of anuran calling activity, which achieves high accuracy.
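
A small sketch of one of the features mentioned above, per-frame spectral Shannon entropy, followed by fitting a two-component Gaussian mixture over the frame features (numpy/scikit-learn). The frame size, synthetic audio and single-feature GMM are illustrative choices, not the paper's configuration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def spectral_entropy(signal, frame_len=512, hop=256):
    """Per-frame Shannon entropy of the normalised power spectrum."""
    entropies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * np.hanning(frame_len)
        power = np.abs(np.fft.rfft(frame)) ** 2
        p = power / (power.sum() + 1e-12)
        entropies.append(-np.sum(p * np.log2(p + 1e-12)))
    return np.array(entropies)

# Synthetic example: noise with a tonal "call" in the middle second.
fs = 22050
t = np.arange(fs * 3) / fs
audio = 0.05 * np.random.default_rng(6).normal(size=len(t))
audio[fs:2 * fs] += 0.5 * np.sin(2 * np.pi * 1200 * t[fs:2 * fs])

features = spectral_entropy(audio).reshape(-1, 1)
gmm = GaussianMixture(n_components=2, random_state=0).fit(features)
labels = gmm.predict(features)   # frames split into low/high entropy clusters
tonal_label = labels[features.argmin()]
print("frames flagged as tonal (low entropy):", int((labels == tonal_label).sum()))
```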

Relevance: 80.00%

Abstract:

Bioacoustic data can be used for monitoring animal species diversity. The deployment of acoustic sensors enables acoustic monitoring at large temporal and spatial scales. We describe a content-based birdcall retrieval algorithm for the exploration of large databases of acoustic recordings. In the algorithm, an event-based searching scheme and compact features are developed. In detail, ridge events are detected in audio files using event detection on spectral ridges. Event alignment is then used to search through audio files to locate candidate instances. A similarity measure is then applied to dimension-reduced spectral ridge feature vectors. The event-based searching method processes a smaller list of instances, enabling faster retrieval. The experimental results demonstrate that our features achieve a better success rate than existing methods while greatly reducing the feature dimension.
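
A toy illustration of the event-detection step: finding local maxima ("ridge events") in a spectrogram that exceed an adaptive per-frequency threshold. This is a generic peak-picking sketch on a synthetic spectrogram, not the paper's ridge detector.

```python
import numpy as np

def detect_ridge_events(spectrogram, k=3.0):
    """Return (time, frequency) bins whose magnitude exceeds the per-frequency
    mean by k standard deviations and is a local maximum along time."""
    mu = spectrogram.mean(axis=1, keepdims=True)
    sd = spectrogram.std(axis=1, keepdims=True) + 1e-12
    strong = spectrogram > mu + k * sd
    # Local maxima along the time axis.
    left = np.pad(spectrogram[:, :-1], ((0, 0), (1, 0)), constant_values=-np.inf)
    right = np.pad(spectrogram[:, 1:], ((0, 0), (0, 1)), constant_values=-np.inf)
    peaks = (spectrogram > left) & (spectrogram > right)
    freqs, times = np.nonzero(strong & peaks)
    return list(zip(times.tolist(), freqs.tolist()))

rng = np.random.default_rng(7)
spec = rng.random((128, 400))   # frequency bins x time frames
spec[40, 150] += 10.0           # an artificial strong event
print(detect_ridge_events(spec)[:5])
```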

Relevance: 80.00%

Abstract:

The world is rich with information, such as signage and maps, that assists humans to navigate. We present a method to extract topological spatial information from a generic bitmap floor plan and build a topometric graph that can be used by a mobile robot for tasks such as path planning and guided exploration. The algorithm first detects and extracts text in an image of the floor plan. Using the locations of the extracted text, flood fill is used to find the rooms and hallways. Doors are found by matching SURF features, and these form the connections between rooms, which are the edges of the topological graph. Our system is able to automatically detect doors and differentiate between hallways and rooms, which is important for effective navigation. We show that our method can extract a topometric graph from a floor plan and is robust against ambiguous cases commonly seen in floor plans, including elevators and stairwells.
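
A rough sketch of the final graph-building stage once rooms and doors are known: rooms become nodes, detected doors become edges weighted by the distance between room centroids (networkx, with hypothetical room/door data; the text detection, flood fill and SURF matching stages are omitted here).

```python
import math
import networkx as nx

# Hypothetical output of the segmentation stage: room label -> centroid (x, y) in pixels.
rooms = {
    "Hallway": (200, 50),
    "Room 101": (100, 150),
    "Room 102": (300, 150),
    "Stairwell": (400, 60),
}

# Hypothetical output of the door-detection stage: pairs of rooms joined by a door.
doors = [("Hallway", "Room 101"), ("Hallway", "Room 102"), ("Hallway", "Stairwell")]

graph = nx.Graph()
for name, centroid in rooms.items():
    graph.add_node(name, pos=centroid)
for a, b in doors:
    graph.add_edge(a, b, weight=math.dist(rooms[a], rooms[b]))

# The topometric graph can then be used for path planning, e.g. shortest path by metric length.
print(nx.shortest_path(graph, "Room 101", "Stairwell", weight="weight"))
```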

Relevance: 80.00%

Abstract:

Recovering the motion of a non-rigid body from a set of monocular images permits the analysis of dynamic scenes in uncontrolled environments. However, the extension of factorisation algorithms for rigid structure from motion to the low-rank non-rigid case has proved challenging. This stems from the comparatively hard problem of finding a linear “corrective transform” which recovers the projection and structure matrices from an ambiguous factorisation. We elucidate that this greater difficulty is due to the need to find multiple solutions to a non-trivial problem, and cast a number of previous approaches as alleviating this issue either by (a) introducing constraints on the basis, making the problems non-identical, or (b) incorporating heuristics to encourage a diverse set of solutions, making the problems inter-dependent. While it has previously been recognised that finding a single solution to this problem is sufficient to estimate cameras, we show that it is possible to bootstrap this partial solution to find the complete transform in closed form. However, we acknowledge that our method minimises an algebraic error and is thus inherently sensitive to deviation from the low-rank model. We compare our closed-form solution for non-rigid structure with known cameras to the closed-form solution of Dai et al. [1], which we find to produce only coplanar reconstructions. We therefore recommend that 3D reconstruction error always be measured relative to a trivial reconstruction such as a planar one.
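
A small numerical illustration of the factorisation ambiguity discussed above: an SVD-based rank-3K factorisation of a synthetic measurement matrix reproduces the measurements exactly, but recovers the motion and basis factors only up to an unknown invertible corrective transform G. This is a generic demonstration of the ambiguity, not the paper's closed-form solution.

```python
import numpy as np

rng = np.random.default_rng(8)

# Synthetic low-rank non-rigid measurements: W (2F x P) = M (2F x 3K) @ B (3K x P).
F, P, K = 20, 30, 2
M_true = rng.normal(size=(2 * F, 3 * K))
B_true = rng.normal(size=(3 * K, P))
W = M_true @ B_true

# Rank-3K factorisation via truncated SVD.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
M_hat = U[:, :3 * K] * np.sqrt(s[:3 * K])
B_hat = np.sqrt(s[:3 * K])[:, None] * Vt[:3 * K]

# The factorisation reproduces W, but M_hat differs from M_true by an unknown
# invertible corrective transform G: M_true = M_hat @ G, B_true = inv(G) @ B_hat.
G = np.linalg.lstsq(M_hat, M_true, rcond=None)[0]
print("reconstruction error:", np.linalg.norm(M_hat @ B_hat - W))
print("corrective-transform fit error:", np.linalg.norm(M_hat @ G - M_true))
```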

Relevance: 80.00%

Abstract:

Domain-invariant representations are key to addressing the domain shift problem, where the training and test examples follow different distributions. Existing techniques that have attempted to match the distributions of the source and target domains typically compare these distributions in the original feature space. This space, however, may not be directly suitable for such a comparison, since some of the features may have been distorted by the domain shift, or may be domain specific. In this paper, we introduce a Domain Invariant Projection approach: an unsupervised domain adaptation method that overcomes this issue by extracting the information that is invariant across the source and target domains. More specifically, we learn a projection of the data to a low-dimensional latent space where the distance between the empirical distributions of the source and target examples is minimized. We demonstrate the effectiveness of our approach on the task of visual object recognition and show that it outperforms state-of-the-art methods on a standard domain adaptation benchmark dataset.
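
A minimal sketch of the core idea of learning a projection that reduces the distance between source and target feature distributions, using a simple mean-difference objective (a linear-kernel stand-in for maximum mean discrepancy) and gradient steps with QR re-orthonormalisation. This is an illustrative toy on synthetic data, not the authors' kernel choice or their optimisation on the Grassmann manifold.

```python
import numpy as np

rng = np.random.default_rng(9)

def projected_mean_discrepancy(source, target, W):
    """Squared distance between projected source and target means
    (a linear-kernel stand-in for maximum mean discrepancy)."""
    diff = (source.mean(axis=0) - target.mean(axis=0)) @ W
    return float(diff @ diff)

# Synthetic domain shift: same feature structure, shifted target distribution.
d, k = 20, 3
source = rng.normal(size=(200, d))
target = rng.normal(size=(200, d)) + rng.normal(size=d)

# Learn an orthonormal projection W (d x k) that minimises the projected mean
# discrepancy: gradient steps followed by QR re-orthonormalisation.
delta = source.mean(axis=0) - target.mean(axis=0)
lr = 0.4 / float(delta @ delta)
W = np.linalg.qr(rng.normal(size=(d, k)))[0]
for _ in range(100):
    grad = 2.0 * np.outer(delta, delta @ W)   # gradient of ||delta^T W||^2 w.r.t. W
    W = np.linalg.qr(W - lr * grad)[0]

print("before:", round(projected_mean_discrepancy(source, target, np.eye(d)), 3))
print("after :", round(projected_mean_discrepancy(source, target, W), 6))
```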