884 resultados para Computer Vision, Object Alignment, Lucas-Kanade, Inverse-Compositional, Gradient-Decent


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Tesis en inglés. Eliminadas las páginas en blanco del pdf

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In recent years, Deep Learning techniques have shown to perform well on a large variety of problems both in Computer Vision and Natural Language Processing, reaching and often surpassing the state of the art on many tasks. The rise of deep learning is also revolutionizing the entire field of Machine Learning and Pattern Recognition pushing forward the concepts of automatic feature extraction and unsupervised learning in general. However, despite the strong success both in science and business, deep learning has its own limitations. It is often questioned if such techniques are only some kind of brute-force statistical approaches and if they can only work in the context of High Performance Computing with tons of data. Another important question is whether they are really biologically inspired, as claimed in certain cases, and if they can scale well in terms of "intelligence". The dissertation is focused on trying to answer these key questions in the context of Computer Vision and, in particular, Object Recognition, a task that has been heavily revolutionized by recent advances in the field. Practically speaking, these answers are based on an exhaustive comparison between two, very different, deep learning techniques on the aforementioned task: Convolutional Neural Network (CNN) and Hierarchical Temporal memory (HTM). They stand for two different approaches and points of view within the big hat of deep learning and are the best choices to understand and point out strengths and weaknesses of each of them. CNN is considered one of the most classic and powerful supervised methods used today in machine learning and pattern recognition, especially in object recognition. CNNs are well received and accepted by the scientific community and are already deployed in large corporation like Google and Facebook for solving face recognition and image auto-tagging problems. HTM, on the other hand, is known as a new emerging paradigm and a new meanly-unsupervised method, that is more biologically inspired. It tries to gain more insights from the computational neuroscience community in order to incorporate concepts like time, context and attention during the learning process which are typical of the human brain. In the end, the thesis is supposed to prove that in certain cases, with a lower quantity of data, HTM can outperform CNN.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This work explores the use of statistical methods in describing and estimating camera poses, as well as the information feedback loop between camera pose and object detection. Surging development in robotics and computer vision has pushed the need for algorithms that infer, understand, and utilize information about the position and orientation of the sensor platforms when observing and/or interacting with their environment.

The first contribution of this thesis is the development of a set of statistical tools for representing and estimating the uncertainty in object poses. A distribution for representing the joint uncertainty over multiple object positions and orientations is described, called the mirrored normal-Bingham distribution. This distribution generalizes both the normal distribution in Euclidean space, and the Bingham distribution on the unit hypersphere. It is shown to inherit many of the convenient properties of these special cases: it is the maximum-entropy distribution with fixed second moment, and there is a generalized Laplace approximation whose result is the mirrored normal-Bingham distribution. This distribution and approximation method are demonstrated by deriving the analytical approximation to the wrapped-normal distribution. Further, it is shown how these tools can be used to represent the uncertainty in the result of a bundle adjustment problem.

Another application of these methods is illustrated as part of a novel camera pose estimation algorithm based on object detections. The autocalibration task is formulated as a bundle adjustment problem using prior distributions over the 3D points to enforce the objects' structure and their relationship with the scene geometry. This framework is very flexible and enables the use of off-the-shelf computational tools to solve specialized autocalibration problems. Its performance is evaluated using a pedestrian detector to provide head and foot location observations, and it proves much faster and potentially more accurate than existing methods.

Finally, the information feedback loop between object detection and camera pose estimation is closed by utilizing camera pose information to improve object detection in scenarios with significant perspective warping. Methods are presented that allow the inverse perspective mapping traditionally applied to images to be applied instead to features computed from those images. For the special case of HOG-like features, which are used by many modern object detection systems, these methods are shown to provide substantial performance benefits over unadapted detectors while achieving real-time frame rates, orders of magnitude faster than comparable image warping methods.

The statistical tools and algorithms presented here are especially promising for mobile cameras, providing the ability to autocalibrate and adapt to the camera pose in real time. In addition, these methods have wide-ranging potential applications in diverse areas of computer vision, robotics, and imaging.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a pattern recognition method focused on paintings images. The purpose is construct a system able to recognize authors or art styles based on common elements of his work (here called patterns). The method is based on comparing images that contain the same or similar patterns. It uses different computer vision techniques, like SIFT and SURF, to describe the patterns in descriptors, K-Means to classify and simplify these descriptors, and RANSAC to determine and detect good results. The method are good to find patterns of known images but not so good if they are not.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

L’objectif de cette thèse par articles est de présenter modestement quelques étapes du parcours qui mènera (on espère) à une solution générale du problème de l’intelligence artificielle. Cette thèse contient quatre articles qui présentent chacun une différente nouvelle méthode d’inférence perceptive en utilisant l’apprentissage machine et, plus particulièrement, les réseaux neuronaux profonds. Chacun de ces documents met en évidence l’utilité de sa méthode proposée dans le cadre d’une tâche de vision par ordinateur. Ces méthodes sont applicables dans un contexte plus général, et dans certains cas elles on tété appliquées ailleurs, mais ceci ne sera pas abordé dans le contexte de cette de thèse. Dans le premier article, nous présentons deux nouveaux algorithmes d’inférence variationelle pour le modèle génératif d’images appelé codage parcimonieux “spike- and-slab” (CPSS). Ces méthodes d’inférence plus rapides nous permettent d’utiliser des modèles CPSS de tailles beaucoup plus grandes qu’auparavant. Nous démontrons qu’elles sont meilleures pour extraire des détecteur de caractéristiques quand très peu d’exemples étiquetés sont disponibles pour l’entraînement. Partant d’un modèle CPSS, nous construisons ensuite une architecture profonde, la machine de Boltzmann profonde partiellement dirigée (MBP-PD). Ce modèle a été conçu de manière à simplifier d’entraînement des machines de Boltzmann profondes qui nécessitent normalement une phase de pré-entraînement glouton pour chaque couche. Ce problème est réglé dans une certaine mesure, mais le coût d’inférence dans le nouveau modèle est relativement trop élevé pour permettre de l’utiliser de manière pratique. Dans le deuxième article, nous revenons au problème d’entraînement joint de machines de Boltzmann profondes. Cette fois, au lieu de changer de famille de modèles, nous introduisons un nouveau critère d’entraînement qui donne naissance aux machines de Boltzmann profondes à multiples prédictions (MBP-MP). Les MBP-MP sont entraînables en une seule étape et ont un meilleur taux de succès en classification que les MBP classiques. Elles s’entraînent aussi avec des méthodes variationelles standard au lieu de nécessiter un classificateur discriminant pour obtenir un bon taux de succès en classification. Par contre, un des inconvénients de tels modèles est leur incapacité de générer deséchantillons, mais ceci n’est pas trop grave puisque la performance de classification des machines de Boltzmann profondes n’est plus une priorité étant donné les dernières avancées en apprentissage supervisé. Malgré cela, les MBP-MP demeurent intéressantes parce qu’elles sont capable d’accomplir certaines tâches que des modèles purement supervisés ne peuvent pas faire, telles que celle de classifier des données incomplètes ou encore celle de combler intelligemment l’information manquante dans ces données incomplètes. Le travail présenté dans cette thèse s’est déroulé au milieu d’une période de transformations importantes du domaine de l’apprentissage à réseaux neuronaux profonds qui a été déclenchée par la découverte de l’algorithme de “dropout” par Geoffrey Hinton. Dropout rend possible un entraînement purement supervisé d’architectures de propagation unidirectionnel sans être exposé au danger de sur- entraînement. Le troisième article présenté dans cette thèse introduit une nouvelle fonction d’activation spécialement con ̧cue pour aller avec l’algorithme de Dropout. Cette fonction d’activation, appelée maxout, permet l’utilisation de aggrégation multi-canal dans un contexte d’apprentissage purement supervisé. Nous démontrons comment plusieurs tâches de reconnaissance d’objets sont mieux accomplies par l’utilisation de maxout. Pour terminer, sont présentons un vrai cas d’utilisation dans l’industrie pour la transcription d’adresses de maisons à plusieurs chiffres. En combinant maxout avec une nouvelle sorte de couche de sortie pour des réseaux neuronaux de convolution, nous démontrons qu’il est possible d’atteindre un taux de succès comparable à celui des humains sur un ensemble de données coriace constitué de photos prises par les voitures de Google. Ce système a été déployé avec succès chez Google pour lire environ cent million d’adresses de maisons.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The rapid development of data transfer through internet made it easier to send the data accurate and faster to the destination. There are many transmission media to transfer the data to destination like e-mails; at the same time it is may be easier to modify and misuse the valuable information through hacking. So, in order to transfer the data securely to the destination without any modifications, there are many approaches like cryptography and steganography. This paper deals with the image steganography as well as with the different security issues, general overview of cryptography, steganography and digital watermarking approaches.  The problem of copyright violation of multimedia data has increased due to the enormous growth of computer networks that provides fast and error free transmission of any unauthorized duplicate and possibly manipulated copy of multimedia information. In order to be effective for copyright protection, digital watermark must be robust which are difficult to remove from the object in which they are embedded despite a variety of possible attacks. The message to be send safe and secure, we use watermarking. We use invisible watermarking to embed the message using LSB (Least Significant Bit) steganographic technique. The standard LSB technique embed the message in every pixel, but my contribution for this proposed watermarking, works with the hint for embedding the message only on the image edges alone. If the hacker knows that the system uses LSB technique also, it cannot decrypt correct message. To make my system robust and secure, we added cryptography algorithm as Vigenere square. Whereas the message is transmitted in cipher text and its added advantage to the proposed system. The standard Vigenere square algorithm works with either lower case or upper case. The proposed cryptography algorithm is Vigenere square with extension of numbers also. We can keep the crypto key with combination of characters and numbers. So by using these modifications and updating in this existing algorithm and combination of cryptography and steganography method we develop a secure and strong watermarking method. Performance of this watermarking scheme has been analyzed by evaluating the robustness of the algorithm with PSNR (Peak Signal to Noise Ratio) and MSE (Mean Square Error) against the quality of the image for large amount of data. While coming to see results of the proposed encryption, higher value of 89dB of PSNR with small value of MSE is 0.0017. Then it seems the proposed watermarking system is secure and robust for hiding secure information in any digital system, because this system collect the properties of both steganography and cryptography sciences.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The current trend in the evolution of sensor systems seeks ways to provide more accuracy and resolution, while at the same time decreasing the size and power consumption. The use of Field Programmable Gate Arrays (FPGAs) provides specific reprogrammable hardware technology that can be properly exploited to obtain a reconfigurable sensor system. This adaptation capability enables the implementation of complex applications using the partial reconfigurability at a very low-power consumption. For highly demanding tasks FPGAs have been favored due to the high efficiency provided by their architectural flexibility (parallelism, on-chip memory, etc.), reconfigurability and superb performance in the development of algorithms. FPGAs have improved the performance of sensor systems and have triggered a clear increase in their use in new fields of application. A new generation of smarter, reconfigurable and lower power consumption sensors is being developed in Spain based on FPGAs. In this paper, a review of these developments is presented, describing as well the FPGA technologies employed by the different research groups and providing an overview of future research within this field.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis deals with the challenging problem of designing systems able to perceive objects in underwater environments. In the last few decades research activities in robotics have advanced the state of art regarding intervention capabilities of autonomous systems. State of art in fields such as localization and navigation, real time perception and cognition, safe action and manipulation capabilities, applied to ground environments (both indoor and outdoor) has now reached such a readiness level that it allows high level autonomous operations. On the opposite side, the underwater environment remains a very difficult one for autonomous robots. Water influences the mechanical and electrical design of systems, interferes with sensors by limiting their capabilities, heavily impacts on data transmissions, and generally requires systems with low power consumption in order to enable reasonable mission duration. Interest in underwater applications is driven by needs of exploring and intervening in environments in which human capabilities are very limited. Nowadays, most underwater field operations are carried out by manned or remotely operated vehicles, deployed for explorations and limited intervention missions. Manned vehicles, directly on-board controlled, expose human operators to risks related to the stay in field of the mission, within a hostile environment. Remotely Operated Vehicles (ROV) currently represent the most advanced technology for underwater intervention services available on the market. These vehicles can be remotely operated for long time but they need support from an oceanographic vessel with multiple teams of highly specialized pilots. Vehicles equipped with multiple state-of-art sensors and capable to autonomously plan missions have been deployed in the last ten years and exploited as observers for underwater fauna, seabed, ship wrecks, and so on. On the other hand, underwater operations like object recovery and equipment maintenance are still challenging tasks to be conducted without human supervision since they require object perception and localization with much higher accuracy and robustness, to a degree seldom available in Autonomous Underwater Vehicles (AUV). This thesis reports the study, from design to deployment and evaluation, of a general purpose and configurable platform dedicated to stereo-vision perception in underwater environments. Several aspects related to the peculiar environment characteristics have been taken into account during all stages of system design and evaluation: depth of operation and light conditions, together with water turbidity and external weather, heavily impact on perception capabilities. The vision platform proposed in this work is a modular system comprising off-the-shelf components for both the imaging sensors and the computational unit, linked by a high performance ethernet network bus. The adopted design philosophy aims at achieving high flexibility in terms of feasible perception applications, that should not be as limited as in case of a special-purpose and dedicated hardware. Flexibility is required by the variability of underwater environments, with water conditions ranging from clear to turbid, light backscattering varying with daylight and depth, strong color distortion, and other environmental factors. Furthermore, the proposed modular design ensures an easier maintenance and update of the system over time. Performance of the proposed system, in terms of perception capabilities, has been evaluated in several underwater contexts taking advantage of the opportunity offered by the MARIS national project. Design issues like energy power consumption, heat dissipation and network capabilities have been evaluated in different scenarios. Finally, real-world experiments, conducted in multiple and variable underwater contexts, including open sea waters, have led to the collection of several datasets that have been publicly released to the scientific community. The vision system has been integrated in a state of the art AUV equipped with a robotic arm and gripper, and has been exploited in the robot control loop to successfully perform underwater grasping operations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dissertação de Mestrado, Engenharia Informática, Faculdade de Ciências e Tecnologia, Universidade do Algarve, 2014

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Vision systems are powerful tools playing an increasingly important role in modern industry, to detect errors and maintain product standards. With the enlarged availability of affordable industrial cameras, computer vision algorithms have been increasingly applied in industrial manufacturing processes monitoring. Until a few years ago, industrial computer vision applications relied only on ad-hoc algorithms designed for the specific object and acquisition setup being monitored, with a strong focus on co-designing the acquisition and processing pipeline. Deep learning has overcome these limits providing greater flexibility and faster re-configuration. In this work, the process to be inspected consists in vials’ pack formation entering a freeze-dryer, which is a common scenario in pharmaceutical active ingredient packaging lines. To ensure that the machine produces proper packs, a vision system is installed at the entrance of the freeze-dryer to detect eventual anomalies with execution times compatible with the production specifications. Other constraints come from sterility and safety standards required in pharmaceutical manufacturing. This work presents an overview about the production line, with particular focus on the vision system designed, and about all trials conducted to obtain the final performance. Transfer learning, alleviating the requirement for a large number of training data, combined with data augmentation methods, consisting in the generation of synthetic images, were used to effectively increase the performances while reducing the cost of data acquisition and annotation. The proposed vision algorithm is composed by two main subtasks, designed respectively to vials counting and discrepancy detection. The first one was trained on more than 23k vials (about 300 images) and tested on 5k more (about 75 images), whereas 60 training images and 52 testing images were used for the second one.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nowadays despite improvements in usability and intuitiveness users have to adapt to the proposed systems to satisfy their needs. For instance, they must learn how to achieve tasks, how to interact with the system, and fulfill system's specifications. This paper proposes an approach to improve this situation enabling graphical user interface redefinition through virtualization and computer vision with the aim of increasing the system's usability. To achieve this goal the approach is based on enriched task models, virtualization and picture-driven computing.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Sendo uma forma natural de interação homem-máquina, o reconhecimento de gestos implica uma forte componente de investigação em áreas como a visão por computador e a aprendizagem computacional. O reconhecimento gestual é uma área com aplicações muito diversas, fornecendo aos utilizadores uma forma mais natural e mais simples de comunicar com sistemas baseados em computador, sem a necessidade de utilização de dispositivos extras. Assim, o objectivo principal da investigação na área de reconhecimento de gestos aplicada à interacção homemmáquina é o da criação de sistemas, que possam identificar gestos específicos e usálos para transmitir informações ou para controlar dispositivos. Para isso as interfaces baseados em visão para o reconhecimento de gestos, necessitam de detectar a mão de forma rápida e robusta e de serem capazes de efetuar o reconhecimento de gestos em tempo real. Hoje em dia, os sistemas de reconhecimento de gestos baseados em visão são capazes de trabalhar com soluções específicas, construídos para resolver um determinado problema e configurados para trabalhar de uma forma particular. Este projeto de investigação estudou e implementou soluções, suficientemente genéricas, com o recurso a algoritmos de aprendizagem computacional, permitindo a sua aplicação num conjunto alargado de sistemas de interface homem-máquina, para reconhecimento de gestos em tempo real. A solução proposta, Gesture Learning Module Architecture (GeLMA), permite de forma simples definir um conjunto de comandos que pode ser baseado em gestos estáticos e dinâmicos e que pode ser facilmente integrado e configurado para ser utilizado numa série de aplicações. É um sistema de baixo custo e fácil de treinar e usar, e uma vez que é construído unicamente com bibliotecas de código. As experiências realizadas permitiram mostrar que o sistema atingiu uma precisão de 99,2% em termos de reconhecimento de gestos estáticos e uma precisão média de 93,7% em termos de reconhecimento de gestos dinâmicos. Para validar a solução proposta, foram implementados dois sistemas completos. O primeiro é um sistema em tempo real capaz de ajudar um árbitro a arbitrar um jogo de futebol robótico. A solução proposta combina um sistema de reconhecimento de gestos baseada em visão com a definição de uma linguagem formal, o CommLang Referee, à qual demos a designação de Referee Command Language Interface System (ReCLIS). O sistema identifica os comandos baseados num conjunto de gestos estáticos e dinâmicos executados pelo árbitro, sendo este posteriormente enviado para um interface de computador que transmite a respectiva informação para os robôs. O segundo é um sistema em tempo real capaz de interpretar um subconjunto da Linguagem Gestual Portuguesa. As experiências demonstraram que o sistema foi capaz de reconhecer as vogais em tempo real de forma fiável. Embora a solução implementada apenas tenha sido treinada para reconhecer as cinco vogais, o sistema é facilmente extensível para reconhecer o resto do alfabeto. As experiências também permitiram mostrar que a base dos sistemas de interação baseados em visão pode ser a mesma para todas as aplicações e, deste modo facilitar a sua implementação. A solução proposta tem ainda a vantagem de ser suficientemente genérica e uma base sólida para o desenvolvimento de sistemas baseados em reconhecimento gestual que podem ser facilmente integrados com qualquer aplicação de interface homem-máquina. A linguagem formal de definição da interface pode ser redefinida e o sistema pode ser facilmente configurado e treinado com um conjunto de gestos diferentes de forma a serem integrados na solução final.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

"Lecture notes in computational vision and biomechanics series, ISSN 2212-9391, vol. 19"

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Tese de Doutoramento em Engenharia de Eletrónica e de Computadores