889 resultados para computer vision, facial expression recognition, swig, red5, actionscript, ruby on rails, html5
Resumo:
This work explores the use of statistical methods in describing and estimating camera poses, as well as the information feedback loop between camera pose and object detection. Surging development in robotics and computer vision has pushed the need for algorithms that infer, understand, and utilize information about the position and orientation of the sensor platforms when observing and/or interacting with their environment.
The first contribution of this thesis is the development of a set of statistical tools for representing and estimating the uncertainty in object poses. A distribution for representing the joint uncertainty over multiple object positions and orientations is described, called the mirrored normal-Bingham distribution. This distribution generalizes both the normal distribution in Euclidean space, and the Bingham distribution on the unit hypersphere. It is shown to inherit many of the convenient properties of these special cases: it is the maximum-entropy distribution with fixed second moment, and there is a generalized Laplace approximation whose result is the mirrored normal-Bingham distribution. This distribution and approximation method are demonstrated by deriving the analytical approximation to the wrapped-normal distribution. Further, it is shown how these tools can be used to represent the uncertainty in the result of a bundle adjustment problem.
Another application of these methods is illustrated as part of a novel camera pose estimation algorithm based on object detections. The autocalibration task is formulated as a bundle adjustment problem using prior distributions over the 3D points to enforce the objects' structure and their relationship with the scene geometry. This framework is very flexible and enables the use of off-the-shelf computational tools to solve specialized autocalibration problems. Its performance is evaluated using a pedestrian detector to provide head and foot location observations, and it proves much faster and potentially more accurate than existing methods.
Finally, the information feedback loop between object detection and camera pose estimation is closed by utilizing camera pose information to improve object detection in scenarios with significant perspective warping. Methods are presented that allow the inverse perspective mapping traditionally applied to images to be applied instead to features computed from those images. For the special case of HOG-like features, which are used by many modern object detection systems, these methods are shown to provide substantial performance benefits over unadapted detectors while achieving real-time frame rates, orders of magnitude faster than comparable image warping methods.
The statistical tools and algorithms presented here are especially promising for mobile cameras, providing the ability to autocalibrate and adapt to the camera pose in real time. In addition, these methods have wide-ranging potential applications in diverse areas of computer vision, robotics, and imaging.
Resumo:
This paper is an overview of the development and application of Computer Vision for the Structural Health
Monitoring (SHM) of Bridges. A brief explanation of SHM is provided, followed by a breakdown of the stages of computer
vision techniques separated into laboratory and field trials. Qualitative evaluations and comparison of these methods have been
provided along with the proposal of guidelines for new vision-based SHM systems.
Resumo:
Much of the bridge stock on major transport links in North America and Europe was constructed in the 1950s and 1960s and has since deteriorated or is carrying loads far in excess of the original design loads. Structural Health Monitoring Systems (SHM) can provide valuable information on the bridge capacity but the application of such systems is currently limited by access and bridge type. This paper investigates the use of computer vision systems for SHM. A series of field tests have been carried out to test the accuracy of displacement measurements using contactless methods. A video image of each test was processed using a modified version of the optical flow tracking method to track displacement. These results have been validated with an established measurement method using linear variable differential transformers (LVDTs). The results obtained from the algorithm provided an accurate comparison with the validation measurements. The calculated displacements agree within 2% of the verified LVDT measurements, a number of post processing methods were then applied to attempt to reduce this error.
Resumo:
Much of the bridge stock on major transport links in North America and Europe was constructed in the 1950’s and 1960’s and has since deteriorated or is carrying loads far in excess of the original design loads. Structural Health Monitoring Systems (SHM) can provide valuable information on the bridge capacity but the application of such systems is currently limited by access and system cost. This paper investigates the development of a low cost portable SHM system using commercially available cameras and computer vision techniques. A series of laboratory tests have been carried out to test the accuracy of displacement measurements using contactless methods. The results from each of the tests have been validated with established measurement methods, such as linear variable differential transformers (LVDTs). A video image of each test was processed using two different digital image correlation programs. The results obtained from the digital image correlation methods provided an accurate comparison with the validation measurements. The calculated displacements agree within 4% of the verified measurements LVDT measurements in most cases confirming the suitability full camera based SHM systems
Resumo:
Strawberries harvested for processing as frozen fruits are currently de-calyxed manually in the field. This process requires the removal of the stem cap with green leaves (i.e. the calyx) and incurs many disadvantages when performed by hand. Not only does it necessitate the need to maintain cutting tool sanitation, but it also increases labor time and exposure of the de-capped strawberries before in-plant processing. This leads to labor inefficiency and decreased harvest yield. By moving the calyx removal process from the fields to the processing plants, this new practice would reduce field labor and improve management and logistics, while increasing annual yield. As labor prices continue to increase, the strawberry industry has shown great interest in the development and implementation of an automated calyx removal system. In response, this dissertation describes the design, operation, and performance of a full-scale automatic vision-guided intelligent de-calyxing (AVID) prototype machine. The AVID machine utilizes commercially available equipment to produce a relatively low cost automated de-calyxing system that can be retrofitted into existing food processing facilities. This dissertation is broken up into five sections. The first two sections include a machine overview and a 12-week processing plant pilot study. Results of the pilot study indicate the AVID machine is able to de-calyx grade-1-with-cap conical strawberries at roughly 66 percent output weight yield at a throughput of 10,000 pounds per hour. The remaining three sections describe in detail the three main components of the machine: a strawberry loading and orientation conveyor, a machine vision system for calyx identification, and a synchronized multi-waterjet knife calyx removal system. In short, the loading system utilizes rotational energy to orient conical strawberries. The machine vision system determines cut locations through RGB real-time feature extraction. The high-speed multi-waterjet knife system uses direct drive actuation to locate 30,000 psi cutting streams to precise coordinates for calyx removal. Based on the observations and studies performed within this dissertation, the AVID machine is seen to be a viable option for automated high-throughput strawberry calyx removal. A summary of future tasks and further improvements is discussed at the end.
Resumo:
In the past few years, human facial age estimation has drawn a lot of attention in the computer vision and pattern recognition communities because of its important applications in age-based image retrieval, security control and surveillance, biomet- rics, human-computer interaction (HCI) and social robotics. In connection with these investigations, estimating the age of a person from the numerical analysis of his/her face image is a relatively new topic. Also, in problems such as Image Classification the Deep Neural Networks have given the best results in some areas including age estimation. In this work we use three hand-crafted features as well as five deep features that can be obtained from pre-trained deep convolutional neural networks. We do a comparative study of the obtained age estimation results with these features.
Resumo:
In the conceptual framework of affective neuroscience, this thesis intends to advance the understanding of the plasticity mechanisms of other’s emotional facial expression representations. Chapter 1 outlines a description of the neurophysiological bases of Hebbian plasticity, reviews influential studies that adopted paired associative stimulation procedures, and introduces new lines of research where the impact of cortico-cortical paired associative stimulation protocols on higher order cognitive functions is investigated. The experiments in Chapter 2 aimed to test the modulatory influence of a perceptual-motor training, based on the execution of emotional expressions, on the subsequent emotion intensity judgements of others’ high (i.e., full visible) and low-intensity (i.e., masked) emotional expressions. As a result of the training-induced learning, participants showed a significant congruence effect, as indicated by relatively higher expression intensity ratings for the same emotion as the one that was previously trained. Interestingly, although judged as overall less emotionally intense, surgical facemasks did not prevent the emotion-specific effects of the training to occur, suggesting that covering the lower part of other’s face do not interact with the training-induced congruence effect. In Chapter 3 it was implemented a transcranial magnetic stimulation study targeting neural pathways involving re-entrant input from higher order brain regions into lower levels of the visual processing hierarchy. We focused on cortical visual networks within the temporo-occipital stream underpinning the processing of emotional faces and susceptible to plastic adaptations. Importantly, we tested the plasticity-induced effects in a state dependent manner, by administering ccPAS while presenting different facial expressions yet afferent to a specific emotion. Results indicated that the discrimination accuracy of emotion-specific expressions is enhanced following the ccPAS treatment, suggesting that a multi-coil TMS intervention might represent a suitable tool to drive brain remodeling at a neural network level, and consequently influence a specific behavior.
Resumo:
One of the most visionary goals of Artificial Intelligence is to create a system able to mimic and eventually surpass the intelligence observed in biological systems including, ambitiously, the one observed in humans. The main distinctive strength of humans is their ability to build a deep understanding of the world by learning continuously and drawing from their experiences. This ability, which is found in various degrees in all intelligent biological beings, allows them to adapt and properly react to changes by incrementally expanding and refining their knowledge. Arguably, achieving this ability is one of the main goals of Artificial Intelligence and a cornerstone towards the creation of intelligent artificial agents. Modern Deep Learning approaches allowed researchers and industries to achieve great advancements towards the resolution of many long-standing problems in areas like Computer Vision and Natural Language Processing. However, while this current age of renewed interest in AI allowed for the creation of extremely useful applications, a concerningly limited effort is being directed towards the design of systems able to learn continuously. The biggest problem that hinders an AI system from learning incrementally is the catastrophic forgetting phenomenon. This phenomenon, which was discovered in the 90s, naturally occurs in Deep Learning architectures where classic learning paradigms are applied when learning incrementally from a stream of experiences. This dissertation revolves around the Continual Learning field, a sub-field of Machine Learning research that has recently made a comeback following the renewed interest in Deep Learning approaches. This work will focus on a comprehensive view of continual learning by considering algorithmic, benchmarking, and applicative aspects of this field. This dissertation will also touch on community aspects such as the design and creation of research tools aimed at supporting Continual Learning research, and the theoretical and practical aspects concerning public competitions in this field.
Resumo:
Vision systems are powerful tools playing an increasingly important role in modern industry, to detect errors and maintain product standards. With the enlarged availability of affordable industrial cameras, computer vision algorithms have been increasingly applied in industrial manufacturing processes monitoring. Until a few years ago, industrial computer vision applications relied only on ad-hoc algorithms designed for the specific object and acquisition setup being monitored, with a strong focus on co-designing the acquisition and processing pipeline. Deep learning has overcome these limits providing greater flexibility and faster re-configuration. In this work, the process to be inspected consists in vials’ pack formation entering a freeze-dryer, which is a common scenario in pharmaceutical active ingredient packaging lines. To ensure that the machine produces proper packs, a vision system is installed at the entrance of the freeze-dryer to detect eventual anomalies with execution times compatible with the production specifications. Other constraints come from sterility and safety standards required in pharmaceutical manufacturing. This work presents an overview about the production line, with particular focus on the vision system designed, and about all trials conducted to obtain the final performance. Transfer learning, alleviating the requirement for a large number of training data, combined with data augmentation methods, consisting in the generation of synthetic images, were used to effectively increase the performances while reducing the cost of data acquisition and annotation. The proposed vision algorithm is composed by two main subtasks, designed respectively to vials counting and discrepancy detection. The first one was trained on more than 23k vials (about 300 images) and tested on 5k more (about 75 images), whereas 60 training images and 52 testing images were used for the second one.
Resumo:
La présente recherche est constituée de deux études. Dans l’étude 1, il s’agit d’améliorer la validité écologique des travaux sur la reconnaissance émotionnelle faciale (REF) en procédant à la validation de stimuli qui permettront d’étudier cette question en réalité virtuelle. L’étude 2 vise à documenter la relation entre le niveau de psychopathie et la performance à une tâche de REF au sein d’un échantillon de la population générale. Pour ce faire, nous avons créé des personnages virtuels animés de différentes origines ethniques exprimant les six émotions fondamentales à différents niveaux d’intensité. Les stimuli, sous forme statique et dynamique, ont été évalués par des étudiants universitaires. Les résultats de l’étude 1 indiquent que les stimuli virtuels, en plus de comporter plusieurs traits distinctifs, constituent un ensemble valide pour étudier la REF. L’étude 2 a permis de constater qu’un score plus élevé à l’échelle de psychopathie, spécifiquement à la facette de l’affect plat, est associé à une plus grande sensibilité aux expressions émotionnelles, particulièrement pour la tristesse. Inversement, un niveau élevé de tendances criminelles est, pour sa part, associé à une certaine insensibilité générale et à un déficit spécifique pour le dégoût. Ces résultats sont spécifiques aux participants masculins. Les données s’inscrivent dans une perspective évolutive de la psychopathie. L’étude met en évidence l’importance d’étudier l’influence respective des facettes de la personnalité psychopathique, ce même dans des populations non-cliniques. De plus, elle souligne la manifestation différentielle des tendances psychopathiques chez les hommes et chez les femmes.
Resumo:
We present a new approach to model and classify breast parenchymal tissue. Given a mammogram, first, we will discover the distribution of the different tissue densities in an unsupervised manner, and second, we will use this tissue distribution to perform the classification. We achieve this using a classifier based on local descriptors and probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature. We studied the influence of different descriptors like texture and SIFT features at the classification stage showing that textons outperform SIFT in all cases. Moreover we demonstrate that pLSA automatically extracts meaningful latent aspects generating a compact tissue representation based on their densities, useful for discriminating on mammogram classification. We show the results of tissue classification over the MIAS and DDSM datasets. We compare our method with approaches that classified these same datasets showing a better performance of our proposal
Resumo:
Catadioptric sensors are combinations of mirrors and lenses made in order to obtain a wide field of view. In this paper we propose a new sensor that has omnidirectional viewing ability and it also provides depth information about the nearby surrounding. The sensor is based on a conventional camera coupled with a laser emitter and two hyperbolic mirrors. Mathematical formulation and precise specifications of the intrinsic and extrinsic parameters of the sensor are discussed. Our approach overcomes limitations of the existing omni-directional sensors and eventually leads to reduced costs of production
Resumo:
Positioning a robot with respect to objects by using data provided by a camera is a well known technique called visual servoing. In order to perform a task, the object must exhibit visual features which can be extracted from different points of view. Then, visual servoing is object-dependent as it depends on the object appearance. Therefore, performing the positioning task is not possible in presence of nontextured objets or objets for which extracting visual features is too complex or too costly. This paper proposes a solution to tackle this limitation inherent to the current visual servoing techniques. Our proposal is based on the coded structured light approach as a reliable and fast way to solve the correspondence problem. In this case, a coded light pattern is projected providing robust visual features independently of the object appearance
Resumo:
An algorithm for tracking multiple feature positions in a dynamic image sequence is presented. This is achieved using a combination of two trajectory-based methods, with the resulting hybrid algorithm exhibiting the advantages of both. An optimizing exchange algorithm is described which enables short feature paths to be tracked without prior knowledge of the motion being studied. The resulting partial trajectories are then used to initialize a fast predictor algorithm which is capable of rapidly tracking multiple feature paths. As this predictor algorithm becomes tuned to the feature positions being tracked, it is shown how the location of occluded or poorly detected features can be predicted. The results of applying this tracking algorithm to data obtained from real-world scenes are then presented.
Resumo:
The 3D shape of an object and its 3D location have traditionally thought of as very separate entities, although both can be described within a single 3D coordinate frame. Here, 3D shape and location are considered as two aspects of a view-based approach to representing depth, avoiding the use of 3D coordinate frames.