952 results for Appearance-based Navigation
Abstract:
Wall and terrain following is a challenging problem for small, fast, and fragile robot vehicles. This paper presents a robust algorithm based on wide-field integration of optic flow. Solutions for two-dimensional and three-dimensional wall following are provided for vehicles with non-holonomic velocity constraints that ensure that the focus of expansion of the flow field is known. The potential of the proposed algorithm is demonstrated in a simulation environment.
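As a rough illustration of the wide-field integration idea, the sketch below projects a sampled tangential flow signal onto sinusoidal weighting functions and feeds the outputs to a proportional steering law; the weightings, gains, and synthetic flow are illustrative assumptions, not the paper's calibrated design.

```python
import numpy as np

# Illustrative sketch of wide-field integration (WFI) of optic flow:
# tangential flow sampled around a ring of viewing directions is
# projected onto cos/sin weighting functions, yielding signals that
# correlate with lateral offset and heading relative to a wall.
# Weightings and gains here are assumptions, not the paper's values.

def wfi_outputs(flow, angles):
    """Inner products of the flow signal with two weighting functions."""
    d = angles[1] - angles[0]
    a = np.sum(flow * np.cos(angles)) * d   # ~ lateral-offset cue
    b = np.sum(flow * np.sin(angles)) * d   # ~ heading cue
    return a, b

def steering_command(flow, angles, k_a=1.0, k_b=0.5):
    """Proportional steering law on the WFI outputs (gains assumed)."""
    a, b = wfi_outputs(flow, angles)
    return -k_a * a - k_b * b

angles = np.linspace(-np.pi, np.pi, 360)
flow = np.sin(angles) + 0.2 * np.cos(angles)  # synthetic flow pattern
print(steering_command(flow, angles))
```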
Abstract:
Micro aerial vehicles (MAVs) are a rapidly growing area of research and development in robotics. For autonomous robot operations, localization has typically been calculated using GPS, external camera arrays, or onboard range or vision sensing. In cluttered indoor or outdoor environments, onboard sensing is the only viable option. In this paper we present an appearance-based approach to visual SLAM on a flying MAV using only low-quality vision. Our approach consists of a visual place recognition algorithm that operates on 1000-pixel images, a lightweight visual odometry algorithm, and a visual expectation algorithm that improves the recall of place sequences and the precision with which they are recalled as the robot flies along a similar path. Using data gathered from outdoor datasets, we show that the system is able to perform visual recognition with low-quality, intermittent visual sensory data. By combining the visual algorithms with the RatSLAM system, we also demonstrate how the algorithms enable successful SLAM.
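To make the low-resolution matching idea concrete, here is a minimal sketch of appearance-based place recognition on tiny image templates using sum-of-absolute-differences matching; the template size, match threshold, and block-average downsampling are assumptions for illustration, not the paper's pipeline.

```python
import numpy as np

# Minimal sketch of place recognition on ~1000-pixel templates: each
# stored place is a small grayscale image and recall is the best
# sum-of-absolute-differences (SAD) match below a novelty threshold.

def to_template(img, size=(25, 40)):
    """Downsample by block averaging to a ~1000-pixel template."""
    h, w = size
    H, W = img.shape
    crop = img[:H - H % h, :W - W % w]
    return crop.reshape(h, H // h, w, W // w).mean(axis=(1, 3))

def best_match(query, stored, threshold=0.1):
    """Return index of best-matching stored place, or None if novel."""
    sads = [np.mean(np.abs(query - t)) for t in stored]
    i = int(np.argmin(sads))
    return i if sads[i] < threshold else None

rng = np.random.default_rng(0)
img = rng.random((250, 400))
stored = [to_template(img)]
print(best_match(to_template(img), stored))  # 0: place re-recognised
```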
Abstract:
Gait recognition approaches continue to struggle with challenges including view-invariance, low-resolution data, robustness to unconstrained environments, and fluctuating gait patterns due to subjects carrying goods or wearing different clothes. Although computationally expensive, model-based techniques offer promise over appearance-based techniques for these challenges, as they gather gait features and interpret gait dynamics in skeleton form. In this paper, we propose a fast 3D ellipsoid-based gait recognition algorithm using a 3D voxel model derived from multi-view silhouette images. This approach directly addresses the limitations of view dependency and self-occlusion in existing ellipse-fitting model-based approaches. Voxel models are segmented into four components (left and right legs, above and below the knee), and ellipsoids are fitted to each region using eigenvalue decomposition. Features derived from the ellipsoid parameters are modeled using a Fourier representation to retain the temporal dynamic pattern for classification. We demonstrate the proposed approach using the CMU MoBo database and show that an improvement of 15-20% can be achieved over a 2D ellipse-fitting baseline.
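As a sketch of the ellipsoid-fitting step, one common formulation (assumed here) fits an ellipsoid to each voxel segment via an eigendecomposition of the coordinate covariance; the factor relating eigenvalues to semi-axis lengths is an illustrative choice rather than the paper's.

```python
import numpy as np

# Sketch of fitting an ellipsoid to one voxel segment (e.g. a lower-leg
# region) by eigenvalue decomposition of the coordinate covariance.

def fit_ellipsoid(voxels):
    """voxels: (N, 3) array of occupied voxel coordinates."""
    centroid = voxels.mean(axis=0)
    cov = np.cov((voxels - centroid).T)    # 3x3 covariance matrix
    evals, evecs = np.linalg.eigh(cov)     # ascending eigenvalues
    semi_axes = 2.0 * np.sqrt(evals)       # ~2 std devs per axis (assumed)
    return centroid, semi_axes, evecs      # position, shape, orientation

pts = np.random.randn(500, 3) * [1.0, 2.0, 4.0]  # synthetic segment
c, axes, R = fit_ellipsoid(pts)
print(axes)  # semi-axis lengths, smallest to largest
```

Per-frame ellipsoid parameters like these can then be summarised over a gait cycle, for example with the Fourier representation the abstract mentions.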
Abstract:
Gait energy images (GEIs) and their variants form the basis of many recent appearance-based gait recognition systems. The GEI combines good recognition performance with a simple implementation, though it suffers from problems inherent to appearance-based approaches, such as being highly view-dependent. In this paper, we extend the concept of the GEI to 3D, to create what we call the gait energy volume, or GEV. A basic GEV implementation is tested on the CMU MoBo database, showing improvements over both the GEI baseline and a fused multi-view GEI approach. We also demonstrate the efficacy of this approach on partial volume reconstructions created from frontal depth images, which can be more practically acquired, for example, in biometric portals implemented with stereo cameras, or other depth acquisition systems. Experiments on frontal depth images are evaluated on an in-house developed database captured using the Microsoft Kinect, and demonstrate the validity of the proposed approach.
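Concretely, a GEV can be formed as the voxel-wise mean of binary volumes over a gait cycle, the direct 3D analogue of averaging silhouettes into a GEI; the sketch below assumes the volumes are already centred and aligned.

```python
import numpy as np

# Sketch of a gait energy volume (GEV): the voxel-wise mean of aligned
# binary volumes over one gait cycle. Alignment is assumed done upstream.

def gait_energy_volume(volumes):
    """volumes: iterable of binary (X, Y, Z) voxel arrays, one per frame."""
    acc, n = None, 0
    for v in volumes:
        acc = v.astype(float) if acc is None else acc + v
        n += 1
    return acc / n  # values in [0, 1]: per-voxel occupancy "energy"

frames = [np.random.rand(32, 32, 32) > 0.5 for _ in range(20)]
gev = gait_energy_volume(frames)
print(gev.shape, gev.min(), gev.max())
```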
Abstract:
Virtual environments can provide, through digital games and online social interfaces, extremely exciting forms of interactive entertainment. Because of their capability to display and manipulate information in natural and intuitive ways, such environments have found extensive applications in decision support, education and training in the health and science domains, amongst others. Currently, the burden of validating both the interactive functionality and the visual consistency of virtual environment content falls entirely on developers and play-testers. While considerable research has been conducted into assisting the design of virtual world content and mechanics, to date only limited contributions have been made regarding the automatic testing of the underpinning graphics software and hardware. The aim of this thesis is to determine whether the correctness of the images generated by a virtual environment can be quantitatively defined, and automatically measured, in order to facilitate the validation of the content. In an attempt to provide an environment-independent definition of visual consistency, a number of classification approaches were developed. First, a novel model-based object description was proposed in order to enable reasoning about the color and geometry change of virtual entities during a play-session. From such an analysis, two view-based connectionist approaches were developed to map from geometry and color spaces to a single, environment-independent, geometric transformation space; we used such a mapping to predict the correct visualization of the scene. Finally, an appearance-based aliasing detector was developed to show how incorrectness, too, can be quantified for debugging purposes. Since computer games heavily rely on the use of highly complex and interactive virtual worlds, they provide an excellent test bed against which to develop, calibrate and validate our techniques. Experiments were conducted on a game engine and other virtual world prototypes to determine the applicability and effectiveness of our algorithms. The results show that quantifying visual correctness in virtual scenes is a feasible enterprise, and that effective automatic bug detection can be performed through the techniques we have developed. We expect these techniques to find application in large 3D games and virtual world studios that require a scalable solution to testing their virtual world software and digital content.
In the pursuit of effective affective computing: the relationship between features and registration
Abstract:
For facial expression recognition systems to be applicable in the real world, they need to be able to detect and track a previously unseen person's face and its facial movements accurately in realistic environments. A highly plausible solution involves performing a "dense" form of alignment, where 60-70 fiducial facial points are tracked with high accuracy. The problem is that, in practice, this type of dense alignment has so far been impossible to achieve in a generic sense, mainly due to poor reliability and robustness. Instead, many expression detection methods have opted for a "coarse" form of face alignment, followed by an application of a biologically inspired appearance descriptor such as the histogram of oriented gradients or Gabor magnitudes. Encouragingly, recent advances to a number of dense alignment algorithms have demonstrated both high reliability and accuracy for unseen subjects [e.g., constrained local models (CLMs)]. This begs the question: aside from countering against illumination variation, what do these appearance descriptors do that standard pixel representations do not? In this paper, we show that, when close to perfect alignment is obtained, there is no real benefit in employing these different appearance-based representations (under consistent illumination conditions). In fact, when misalignment does occur, we show that these appearance descriptors do work well by encoding robustness to alignment error. For this work, we compared two popular methods for dense alignment, subject-dependent active appearance models versus subject-independent CLMs, on the task of action-unit detection. These comparisons were conducted through a battery of experiments across various publicly available data sets (i.e., CK+, Pain, M3, and GEMEP-FERA). We also report our performance in the recent 2011 Facial Expression Recognition and Analysis Challenge for the subject-independent task.
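A toy version of the paper's question can be posed in a few lines: how much does a descriptor change under a small alignment error? The sketch below, which assumes scikit-image's hog function and uses a random patch as a stand-in for a cropped face region, compares the relative change of raw-pixel and HOG features under a pixel shift.

```python
import numpy as np
from skimage.feature import hog  # requires scikit-image

# Toy probe of descriptor sensitivity to misalignment: shift a patch by
# a few pixels and measure the relative change in raw-pixel vs HOG
# features. Patch content and shift size are illustrative assumptions.

def descriptor_shift_sensitivity(patch, dx=3):
    shifted = np.roll(patch, dx, axis=1)  # simulated alignment error
    pix_change = (np.linalg.norm(patch - shifted)
                  / np.linalg.norm(patch))
    h0 = hog(patch, orientations=8, pixels_per_cell=(8, 8))
    h1 = hog(shifted, orientations=8, pixels_per_cell=(8, 8))
    hog_change = np.linalg.norm(h0 - h1) / np.linalg.norm(h0)
    return pix_change, hog_change

patch = np.random.rand(64, 64)  # stand-in for an aligned face region
print(descriptor_shift_sensitivity(patch))
```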
The backfilled GEI: a cross-capture modality gait feature for frontal and side-view gait recognition
Abstract:
In this paper, we propose a novel direction for gait recognition research: a new capture-modality-independent, appearance-based feature which we call the Back-filled Gait Energy Image (BGEI). It can be constructed from both frontal depth images and the more commonly used side-view silhouettes, allowing the feature to be applied across these two differing capturing systems using the same enrolled database. To evaluate this new feature, a frontally captured depth-based gait dataset was created containing 37 unique subjects, a subset of which also contained sequences captured from the side. The results demonstrate that the BGEI can effectively be used to identify subjects through their gait across these two differing input devices, achieving a rank-1 match rate of 100% in our experiments. We also compare the BGEI against the GEI and GEV in their respective domains, using the CASIA dataset and our depth dataset, showing that it compares favourably against them. The experiments were performed using a sparse representation based classifier with a locally discriminating input feature space, which shows a significant improvement in performance over other classifiers used in gait recognition literature, achieving state-of-the-art results with the GEI on the CASIA dataset.
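Since the classification here rests on a sparse representation based classifier (SRC), a minimal sketch may help: a probe feature is coded as a sparse combination of all gallery features, and the identity with the smallest class-wise reconstruction residual is chosen. The l1 solver and regularisation weight below are assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.linear_model import Lasso  # l1-regularised coding

# Sketch of SRC: code the probe over the whole gallery dictionary, then
# classify by the class whose coefficients best reconstruct the probe.

def src_classify(probe, gallery, labels, alpha=0.01):
    """gallery: (d, n) matrix of unit-norm gallery features; labels: (n,)."""
    x = Lasso(alpha=alpha, fit_intercept=False,
              max_iter=5000).fit(gallery, probe).coef_
    best, best_res = None, np.inf
    for c in np.unique(labels):
        xc = np.where(labels == c, x, 0.0)  # keep class-c coefficients
        res = np.linalg.norm(probe - gallery @ xc)
        if res < best_res:
            best, best_res = c, res
    return best

rng = np.random.default_rng(1)
gallery = rng.standard_normal((64, 30))
gallery /= np.linalg.norm(gallery, axis=0)   # unit-norm columns
labels = np.repeat(np.arange(10), 3)         # 10 subjects x 3 samples
probe = gallery[:, 4] + 0.01 * rng.standard_normal(64)
print(src_classify(probe, gallery, labels))  # expect subject 1
```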
Abstract:
In this paper, we explore the effectiveness of patch-based gradient feature extraction methods when applied to appearance-based gait recognition. Extending existing popular feature extraction methods such as HOG and LDP, we propose a novel technique which we term the Histogram of Weighted Local Directions (HWLD). These three methods are applied to gait recognition using the GEI feature, with classification performed using SRC. Evaluations on the CASIA and OULP datasets show significant improvements using these patch-based methods over existing implementations, with the proposed method achieving the highest recognition rate for the respective datasets. In addition, the HWLD can easily be extended to 3D, which we demonstrate using the GEV feature on the DGD dataset, observing improvements in performance.
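For orientation, the sketch below computes a simplified patch-based, magnitude-weighted histogram of gradient directions over a GEI, the family to which HOG, LDP, and HWLD belong; the paper's specific HWLD weighting scheme is not reproduced here.

```python
import numpy as np

# Simplified patch-based gradient-direction feature for a GEI: tile the
# image into patches and build a magnitude-weighted histogram of
# unsigned gradient directions per patch. Patch and bin counts assumed.

def patch_direction_histograms(gei, patch=8, bins=8):
    gy, gx = np.gradient(gei.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned directions
    feats = []
    H, W = gei.shape
    for i in range(0, H - patch + 1, patch):
        for j in range(0, W - patch + 1, patch):
            h, _ = np.histogram(ang[i:i + patch, j:j + patch],
                                bins=bins, range=(0, np.pi),
                                weights=mag[i:i + patch, j:j + patch])
            feats.append(h / (h.sum() + 1e-8))  # per-patch normalisation
    return np.concatenate(feats)

print(patch_direction_histograms(np.random.rand(128, 88)).shape)
```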
Abstract:
After first observing a person, the task of person re-identification involves recognising an individual at different locations across a network of cameras at a later time. Traditionally, this task has been performed by first extracting appearance features of an individual and then matching these features to the previous observation. However, identifying an individual based solely on appearance can be ambiguous, particularly when people wear similar clothing (i.e. people dressed in uniforms in sporting and school settings). This task is made more difficult when the resolution of the input image is small, as is typically the case in multi-camera networks. To circumvent these issues, we need to use other contextual cues. In this paper, we use "group" information as our contextual feature to aid in the re-identification of a person, which is heavily motivated by the fact that people generally move together as a collective group. To encode group context, we learn a linear mapping function to assign each person to a "role" or position within the group structure. We then combine the appearance and group context cues using a weighted summation. We demonstrate how this improves performance of person re-identification in a sports environment over appearance-based features.
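The fusion step itself is a weighted summation of the two cues; a toy sketch follows, in which the similarity functions, features, and weight are placeholders, and the learned mapping to group "roles" is not modelled.

```python
import numpy as np

# Toy weighted-summation fusion of an appearance cue and a group-context
# cue for re-identification. Weight and similarity measures are assumed.

def fused_score(app_probe, app_gallery, grp_probe, grp_gallery, w=0.7):
    s_app = -np.linalg.norm(app_probe - app_gallery)  # appearance cue
    s_grp = -np.linalg.norm(grp_probe - grp_gallery)  # group-context cue
    return w * s_app + (1.0 - w) * s_grp

def reidentify(probe, gallery):
    """probe: (app_feat, grp_feat); gallery: list of (app, grp, identity)."""
    scores = [fused_score(probe[0], a, probe[1], g) for a, g, _ in gallery]
    return gallery[int(np.argmax(scores))][2]
```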
Abstract:
In this paper we describe cooperative control algorithms for robots and sensor nodes in an underwater environment. Cooperative navigation is defined as the ability of a coupled system of autonomous robots to pool their resources to achieve long-distance navigation and a larger controllability space. Other types of useful cooperation in underwater environments include: exchange of information such as data download and retasking; cooperative localization and tracking; and physical connection (docking) for tasks such as deployment of underwater sensor networks, collection of nodes and rescue of damaged robots. We present experimental results obtained with an underwater system that consists of two very different robots and a number of sensor network modules. We present the hardware and software architecture of this underwater system. We then describe various interactions between the robots and sensor nodes and between the two robots, including cooperative navigation. Finally, we describe our experiments with this underwater system and present data.
Abstract:
The relationship between coronal knee laxity and the restraining properties of the collateral ligaments remains unknown. This study investigated correlations between the structural properties of the collateral ligaments and stress angles used in computer-assisted total knee arthroplasty (TKA), measured with an optically based navigation system. Ten fresh-frozen cadaveric knees (mean age: 81 ± 11 years) were dissected to leave the menisci, cruciate ligaments, posterior joint capsule and collateral ligaments. The resected femur and tibia were rigidly secured within a test system which permitted kinematic registration of the knee using a commercially available image-free navigation system. Frontal plane knee alignment and varus-valgus stress angles were acquired. The force applied during varus-valgus testing was quantified. Medial and lateral bone-collateral ligament-bone specimens were then prepared, mounted within a uni-axial materials testing machine, and extended to failure. Force and displacement data were used to calculate the principal structural properties of the ligaments. The mean varus laxity was 4 ± 1° and the mean valgus laxity was 4 ± 2°. The corresponding mean manual force applied was 10 ± 3 N and 11 ± 4 N, respectively. While measures of knee laxity were independent of the ultimate tensile strength and stiffness of the collateral ligaments, there was a significant correlation between the force applied during stress testing and the instantaneous stiffness of the medial (r = 0.91, p = 0.001) and lateral (r = 0.68, p = 0.04) collateral ligaments. These findings suggest that clinicians may perceive a rate of change of ligament stiffness as the end-point during assessment of collateral knee laxity.
Abstract:
We present a pole inspection system for outdoor environments comprising a high-speed camera on a vertical take-off and landing (VTOL) aerial platform. The pole inspection task requires a vehicle to fly close to a structure while maintaining a fixed stand-off distance from it; however, typical GPS errors make GPS-based navigation unsuitable for this task. When flying outdoors a vehicle is also affected by aerodynamic disturbances such as wind gusts, so the onboard controller must be robust to these disturbances in order to maintain the stand-off distance. Two problems must therefore be addressed: fast and accurate state estimation without GPS, and the design of a robust controller. We resolve these problems by (a) performing visual + inertial relative state estimation and (b) using a robust line tracker and a nested controller design. Our state estimation exploits high-speed camera images (100 Hz) and 70 Hz IMU data fused in an extended Kalman filter (EKF). We demonstrate results from outdoor experiments for pole-relative hovering, and pole circumnavigation where the operator provides only yaw commands. Lastly, we show results for image-based 3D reconstruction and texture mapping of a pole to demonstrate the usefulness for inspection tasks.
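To indicate the shape of such a filter, here is a toy EKF skeleton fusing high-rate inertial predictions with camera-based position updates; the real system estimates full pole-relative pose, whereas this sketch uses a 1-D position/velocity state with illustrative noise levels.

```python
import numpy as np

# Toy EKF skeleton: IMU-driven predictions at one rate, camera-based
# position updates at another. State is [position, velocity]; noise
# parameters are illustrative, not the paper's tuning.

class SimpleEKF:
    def __init__(self):
        self.x = np.zeros(2)             # [position, velocity]
        self.P = np.eye(2)               # state covariance

    def predict(self, accel, dt):        # called at the IMU rate (~70 Hz)
        F = np.array([[1.0, dt], [0.0, 1.0]])
        self.x = F @ self.x + np.array([0.5 * dt**2, dt]) * accel
        self.P = F @ self.P @ F.T + 1e-3 * np.eye(2)

    def update(self, pos_meas, r=1e-2):  # called at the camera rate (~100 Hz)
        H = np.array([[1.0, 0.0]])       # we observe position only
        y = pos_meas - H @ self.x        # innovation
        S = H @ self.P @ H.T + r
        K = self.P @ H.T / S             # Kalman gain
        self.x = self.x + (K * y).ravel()
        self.P = (np.eye(2) - K @ H) @ self.P

ekf = SimpleEKF()
ekf.predict(accel=0.1, dt=1 / 70)
ekf.update(pos_meas=0.05)
print(ekf.x)
```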
Abstract:
Person re-identification is particularly challenging due to significant appearance changes across separate camera views. In order to re-identify people, a representative human signature should effectively handle differences in illumination, pose and camera parameters. While general appearance-based methods are modelled in Euclidean spaces, it has been argued that some applications in image and video analysis are better modelled via non-Euclidean manifold geometry. To this end, recent approaches represent images as covariance matrices, and interpret such matrices as points on Riemannian manifolds. As direct classification on such manifolds can be difficult, in this paper we propose to represent each manifold point as a vector of similarities to class representers, via a recently introduced form of Bregman matrix divergence known as the Stein divergence. This is followed by using a discriminative mapping of similarity vectors for final classification. The use of similarity vectors is in contrast to the traditional approach of embedding manifolds into tangent spaces, which can suffer from representing the manifold structure inaccurately. Comparative evaluations on benchmark ETHZ and iLIDS datasets for the person re-identification task show that the proposed approach obtains better performance than recent techniques such as Histogram Plus Epitome, Partial Least Squares, and Symmetry-Driven Accumulation of Local Features.
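The key quantity is the Stein divergence between symmetric positive definite (covariance) matrices, S(A, B) = log det((A + B)/2) - (1/2) log det(AB); the sketch below computes it and forms a similarity vector against class representers, with the kernel bandwidth as an illustrative assumption.

```python
import numpy as np

# Stein (S-) divergence between SPD matrices, plus the similarity-vector
# representation described above: a covariance descriptor is mapped to
# its similarities against a set of class representers.

def stein_divergence(A, B):
    """S(A, B) = log det((A+B)/2) - 0.5 * log det(A B)."""
    _, ld_mid = np.linalg.slogdet(0.5 * (A + B))
    _, ld_a = np.linalg.slogdet(A)
    _, ld_b = np.linalg.slogdet(B)
    return ld_mid - 0.5 * (ld_a + ld_b)

def similarity_vector(C, representers, sigma=1.0):
    """Map covariance C to similarities against class representers."""
    return np.array([np.exp(-stein_divergence(C, R) / sigma)
                     for R in representers])

A, B = np.eye(3), 2.0 * np.eye(3)
print(stein_divergence(A, B))  # small positive value
```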
Abstract:
Facial expression recognition (FER) has developed dramatically in recent years, thanks to advancements in related fields, especially machine learning, image processing and human recognition. Accordingly, the impact and potential usage of automatic FER have been growing in a wide range of applications, including human-computer interaction, robot control and driver state surveillance. However, to date, robust recognition of facial expressions from images and videos is still a challenging task due to the difficulty in accurately extracting the useful emotional features. These features are often represented in different forms, such as static, dynamic, point-based geometric or region-based appearance. Facial movement features, which include feature position and shape changes, are generally caused by the movements of facial elements and muscles during the course of emotional expression. The facial elements, especially key elements, constantly change their positions when subjects are expressing emotions. As a consequence, the same feature in different images usually has different positions. In some cases, the shape of the feature may also be distorted due to subtle facial muscle movements. Therefore, for any feature representing a certain emotion, the geometric-based position and appearance-based shape normally change from one image to another in image databases, as well as in videos. These movement features represent a rich pool of both static and dynamic characteristics of expressions, which play a critical role in FER. The vast majority of past work on FER does not take the dynamics of facial expressions into account. Some efforts have been made to capture and utilize facial movement features, and almost all of them are static based. These efforts adopt either geometric features of tracked facial points, appearance differences between holistic facial regions in consecutive frames, or texture and motion changes in local facial regions. Although these approaches have achieved promising results, they often require accurate location and tracking of facial points, which remains problematic.