991 resultados para computer vision


Relevância:

60.00% 60.00%

Publicador:

Resumo:

This thesis investigates the fusion of 3D visual information with 2D image cues to provide 3D semantic maps of large-scale environments in which a robot traverses for robotic applications. A major theme of this thesis was to exploit the availability of 3D information acquired from robot sensors to improve upon 2D object classification alone. The proposed methods have been evaluated on several indoor and outdoor datasets collected from mobile robotic platforms including a quadcopter and ground vehicle covering several kilometres of urban roads.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Multi-touch interfaces across a wide range of hardware platforms are becoming pervasive. This is due to the adoption of smart phones and tablets in both the consumer and corporate market place. This paper proposes a human-machine interface to interact with unmanned aerial systems based on the philosophy of multi-touch hardware-independent high-level interaction with multiple systems simultaneously. Our approach incorporates emerging development methods for multi-touch interfaces on mobile platforms. A framework is defined for supporting multiple protocols. An open source solution is presented that demonstrates: architecture supporting different communications hardware; an extensible approach for supporting multiple protocols; and the ability to monitor and interact with multiple UAVs from multiple clients simultaneously. Validation tests were conducted to assess the performance, scalability and impact on packet latency under different client configurations.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper, we present an approach for image-based surface classification using multi-class Support Vector Machine (SVM). Classifying surfaces in aerial images is an important step towards an increased aircraft autonomy in emergency landing situations. We design a one-vs-all SVM classifier and conduct experiments on five data sets. Results demonstrate consistent overall performance figures over 88% and approximately 8% more accurate to those published on multi-class SVM on the KTH TIPS data set. We also show per-class performance values by using normalised confusion matrices. Our approach is designed to be executed online using a minimum set of feature attributes representing a feasible and ready-to-deploy system for onboard execution.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Aggressive behavior at the steering wheel has been indicated as a contributing factor in a majority of crashes and anger has been compared to alcohol impairment in terms of probability to cause a crash. It has been shown that being in a state of anger or excitement while driving can decrease the drivers’ performances. . This paper reports the evaluation of 6 novel design alternatives of In-Vehicle Information Systems (IVIS) aimed at mitigating driver aggression. Each application presented was designed to tackle the following contributing factors to driver aggression: competitiveness, anonymity, territoriality, stress as well as social and emotional isolation. The 6 applications were simulated using computer vision algorithm to automatically overlay the real traffic conditions with ‘Head-Up Display’ visualizations. Two applications emerged over the others from participant’s evaluation: shared music combined the known calming effect of music with the sense of sympathy and intimacy caused by hearing other drivers’ music. The Shared Snapshot application provided an immediate gratification and was evaluated as a potential prevention of roadside quarrels. The paper presents Theoretical foundation, participant’s evaluations, implications and limitations of the study.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Texture information in the iris image is not uniform in discriminatory information content for biometric identity verification. The bits in an iris code obtained from the image differ in their consistency from one sample to another for the same identity. In this work, errors in bit strings are systematically analysed in order to investigate the effect of light-induced and drug-induced pupil dilation and constriction on the consistency of iris texture information. The statistics of bit errors are computed for client and impostor distributions as functions of radius and angle. Under normal conditions, a V-shaped radial trend of decreasing bit errors towards the central region of the iris is obtained for client matching, and it is observed that the distribution of errors as a function of angle is uniform. When iris images are affected by pupil dilation or constriction the radial distribution of bit errors is altered. A decreasing trend from the pupil outwards is observed for constriction, whereas a more uniform trend is observed for dilation. The main increase in bit errors occurs closer to the pupil in both cases.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In elite sports, nearly all performances are captured on video. Despite the massive amounts of video that has been captured in this domain over the last 10-15 years, most of it remains in an 'unstructured' or 'raw' form, meaning it can only be viewed or manually annotated/tagged with higher-level event labels which is time consuming and subjective. As such, depending on the detail or depth of annotation, the value of the collected repositories of archived data is minimal as it does not lend itself to large-scale analysis and retrieval. One such example is swimming, where each race of a swimmer is captured on a camcorder and in-addition to the split-times (i.e., the time it takes for each lap), stroke rate and stroke-lengths are manually annotated. In this paper, we propose a vision-based system which effectively 'digitizes' a large collection of archived swimming races by estimating the location of the swimmer in each frame, as well as detecting the stroke rate. As the videos are captured from moving hand-held cameras which are located at different positions and angles, we show our hierarchical-based approach to tracking the swimmer and their different parts is robust to these issues and allows us to accurately estimate the swimmer location and stroke rates.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We propose a method of representing audience behavior through facial and body motions from a single video stream, and use these features to predict the rating for feature-length movies. This is a very challenging problem as: i) the movie viewing environment is dark and contains views of people at different scales and viewpoints; ii) the duration of feature-length movies is long (80-120 mins) so tracking people uninterrupted for this length of time is still an unsolved problem, and; iii) expressions and motions of audience members are subtle, short and sparse making labeling of activities unreliable. To circumvent these issues, we use an infrared illuminated test-bed to obtain a visually uniform input. We then utilize motion-history features which capture the subtle movements of a person within a pre-defined volume, and then form a group representation of the audience by a histogram of pair-wise correlations over a small-window of time. Using this group representation, we learn our movie rating classifier from crowd-sourced ratings collected by rottentomatoes.com and show our prediction capability on audiences from 30 movies across 250 subjects (> 50 hrs).

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We propose a topological localization method based on optical flow information. We analyse the statistical characteristics of the optical flow signal and demonstrate that the flow vectors can be used to identify and describe key locations in the environment. The key locations (nodes) correspond to significant scene changes and depth discontinuities. Since optical flow vectors contain position, magnitude and angle information, for each node, we extract low and high order statistical moments of the vectors and use them as descriptors for that node. Once a database of nodes and their corresponding optical flow features is created, the robot can perform topological localization by using the Mahalanobis distance between the current frame and the database. This is supported by field trials, which illustrate the repeatability of the proposed method for detecting and describing key locations in indoor and outdoor environments in challenging and diverse lighting conditions.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A method for calculating visual odometry for ground vehicles with car-like kinematic motion constraints similar to Ackerman's steering model is presented. By taking advantage of this non-holonomic driving constraint we show a simple and practical solution to the odometry calculation by clever placement of a single camera. The method has been implemented successfully on a large industrial forklift and a Toyota Prado SUV. Results from our industrial test site is presented demonstrating the applicability of this method as a replacement for wheel encoder-based odometry for these vehicles.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Existing crowd counting algorithms rely on holistic, local or histogram based features to capture crowd properties. Regression is then employed to estimate the crowd size. Insufficient testing across multiple datasets has made it difficult to compare and contrast different methodologies. This paper presents an evaluation across multiple datasets to compare holistic, local and histogram based methods, and to compare various image features and regression models. A K-fold cross validation protocol is followed to evaluate the performance across five public datasets: UCSD, PETS 2009, Fudan, Mall and Grand Central datasets. Image features are categorised into five types: size, shape, edges, keypoints and textures. The regression models evaluated are: Gaussian process regression (GPR), linear regression, K nearest neighbours (KNN) and neural networks (NN). The results demonstrate that local features outperform equivalent holistic and histogram based features; optimal performance is observed using all image features except for textures; and that GPR outperforms linear, KNN and NN regression

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Novel computer vision techniques have been developed to automatically detect unusual events in crowded scenes from video feeds of surveillance cameras. The research is useful in the design of the next generation intelligent video surveillance systems. Two major contributions are the construction of a novel machine learning model for multiple instance learning through compressive sensing, and the design of novel feature descriptors in the compressed video domain.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Recent modelling of socio-economic costs by the Australian railway industry in 2010 has estimated the cost of level crossing accidents to exceed AU$116 million annually. To better understand the causal factors of these accidents, a video analytics application is being developed to automatically detect near-miss incidents using forward facing videos from trains. As near-miss events occur more frequently than collisions, by detecting these occurrences there will be more safety data available for analysis. The application that is being developed will improve the objectivity of near-miss reporting by providing quantitative data about the position of vehicles at level crossings through the automatic analysis of video footage. In this paper we present a novel method for detecting near-miss occurrences at railway level crossings from video data of trains. Our system detects and localizes vehicles at railway level crossings. It also detects the position of railways to calculate the distance of the detected vehicles to the railway centerline. The system logs the information about the position of the vehicles and railway centerline into a database for further analysis by the safety data recording and analysis system, to determine whether or not the event is a near-miss. We present preliminary results of our system on a dataset of videos taken from a train that passed through 14 railway level crossings. We demonstrate the robustness of our system by showing the results of our system on day and night videos.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper introduces a new method to automate the detection of marine species in aerial imagery using a Machine Learning approach. Our proposed system has at its core, a convolutional neural network. We compare this trainable classifier to a handcrafted classifier based on color features, entropy and shape analysis. Experiments demonstrate that the convolutional neural network outperforms the handcrafted solution. We also introduce a negative training example-selection method for situations where the original training set consists of a collection of labeled images in which the objects of interest (positive examples) have been marked by a bounding box. We show that picking random rectangles from the background is not necessarily the best way to generate useful negative examples with respect to learning.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Representation of facial expressions using continuous dimensions has shown to be inherently more expressive and psychologically meaningful than using categorized emotions, and thus has gained increasing attention over recent years. Many sub-problems have arisen in this new field that remain only partially understood. A comparison of the regression performance of different texture and geometric features and investigation of the correlations between continuous dimensional axes and basic categorized emotions are two of these. This paper presents empirical studies addressing these problems, and it reports results from an evaluation of different methods for detecting spontaneous facial expressions within the arousal-valence dimensional space (AV). The evaluation compares the performance of texture features (SIFT, Gabor, LBP) against geometric features (FAP-based distances), and the fusion of the two. It also compares the prediction of arousal and valence, obtained using the best fusion method, to the corresponding ground truths. Spatial distribution, shift, similarity, and correlation are considered for the six basic categorized emotions (i.e. anger, disgust, fear, happiness, sadness, surprise). Using the NVIE database, results show that the fusion of LBP and FAP features performs the best. The results from the NVIE and FEEDTUM databases reveal novel findings about the correlations of arousal and valence dimensions to each of six basic emotion categories.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Due to the popularity of security cameras in public places, it is of interest to design an intelligent system that can efficiently detect events automatically. This paper proposes a novel algorithm for multi-person event detection. To ensure greater than real-time performance, features are extracted directly from compressed MPEG video. A novel histogram-based feature descriptor that captures the angles between extracted particle trajectories is proposed, which allows us to capture motion patterns of multi-person events in the video. To alleviate the need for fine-grained annotation, we propose the use of Labelled Latent Dirichlet Allocation, a “weakly supervised” method that allows the use of coarse temporal annotations which are much simpler to obtain. This novel system is able to run at approximately ten times real-time, while preserving state-of-theart detection performance for multi-person events on a 100-hour real-world surveillance dataset (TRECVid SED).