903 resultados para Computer vision industry
Resumo:
The main objective of this work was to enable the recognition of human gestures through the development of a computer program. The program created captures the gestures executed by the user through a camera attached to the computer and sends it to the robot command referring to the gesture. They were interpreted in total ve gestures made by human hand. The software (developed in C ++) widely used the computer vision concepts and open source library OpenCV that directly impact the overall e ciency of the control of mobile robots. The computer vision concepts take into account the use of lters to smooth/blur the image noise reduction, color space to better suit the developer's desktop as well as useful information for manipulating digital images. The OpenCV library was essential in creating the project because it was possible to use various functions/procedures for complete control lters, image borders, image area, the geometric center of borders, exchange of color spaces, convex hull and convexity defect, plus all the necessary means for the characterization of imaged features. During the development of the software was the appearance of several problems, as false positives (noise), underperforming the insertion of various lters with sizes oversized masks, as well as problems arising from the choice of color space for processing human skin tones. However, after the development of seven versions of the control software, it was possible to minimize the occurrence of false positives due to a better use of lters combined with a well-dimensioned mask size (tested at run time) all associated with a programming logic that has been perfected over the construction of the seven versions. After all the development is managed software that met the established requirements. After the completion of the control software, it was observed that the overall e ectiveness of the various programs, highlighting in particular the V programs: 84.75 %, with VI: 93.00 % and VII with: 94.67 % showed that the nal program performed well in interpreting gestures, proving that it was possible the mobile robot control through human gestures without the need for external accessories to give it a better mobility and cost savings for maintain such a system. The great merit of the program was to assist capacity in demystifying the man set/machine therefore uses an easy and intuitive interface for control of mobile robots. Another important feature observed is that to control the mobile robot is not necessary to be close to the same, as to control the equipment is necessary to receive only the address that the Robotino passes to the program via network or Wi-Fi.
Resumo:
In this thesis, we introduce DeReEs-4v, an algorithm for unsupervised and automatic registration of two video frames captured depth-sensing cameras. DeReEs-4V receives two RGBD video streams from two depth-sensing cameras arbitrary located in an indoor space that share a minimum amount of 25% overlap between their captured scenes. The motivation of this research is to employ multiple depth-sensing cameras to enlarge the field of view and acquire a more complete and accurate 3D information of the environment. A typical way to combine multiple views from different cameras is through manual calibration. However, this process is time-consuming and may require some technical knowledge. Moreover, calibration has to be repeated when the location or position of the cameras change. In this research, we demonstrate how DeReEs-4V registration can be used to find the transformation of the view of one camera with respect to the other at interactive rates. Our algorithm automatically finds the 3D transformation to match the views from two cameras, requires no human interference, and is robust to camera movements while capturing. To validate this approach, a thorough examination of the system performance under different scenarios is presented. The system presented here supports any application that might benefit from the wider field-of-view provided by the combined scene from both cameras, including applications in 3D telepresence, gaming, people tracking, videoconferencing and computer vision.
Resumo:
Nell'elaborato viene introdotto l'ambito della Computer Vision e come l'algoritmo SIFT si inserisce nel suo panorama. Viene inoltre descritto SIFT stesso, le varie fasi di cui si compone e un'applicazione al problema dell'object recognition. Infine viene presentata un'implementazione di SIFT in linguaggio Python creata per ottenere un'applicazione didattica interattiva e vengono mostrati esempi di questa applicazione.
Resumo:
Lo scopo della tesi è creare un’architettura in FPGA in grado di ricavare informazioni 3D da una coppia di sensori stereo. La pipeline è stata realizzata utilizzando il System-on-Chip Zynq, che permette una stretta interazione tra la parte hardware realizzata in FPGA e la CPU. Dopo uno studio preliminare degli strumenti hardware e software, è stata realizzata l’architettura base per la scrittura e la lettura di immagini nella memoria DDR dello Zynq. In seguito l’attenzione si è spostata sull’implementazione di algoritmi stereo (rettificazione e stereo matching) su FPGA e nella realizzazione di una pipeline in grado di ricavare accurate mappe di disparità in tempo reale acquisendo le immagini da una camera stereo.
Resumo:
Questa tesi si occupa dell’estensione di un framework software finalizzato all'individuazione e al tracciamento di persone in una scena ripresa da telecamera stereoscopica. In primo luogo è rimossa la necessità di una calibrazione manuale offline del sistema sfruttando algoritmi che consentono di individuare, a partire da un fotogramma acquisito dalla camera, il piano su cui i soggetti tracciati si muovono. Inoltre, è introdotto un modulo software basato su deep learning con lo scopo di migliorare la precisione del tracciamento. Questo componente, che è in grado di individuare le teste presenti in un fotogramma, consente ridurre i dati analizzati al solo intorno della posizione effettiva di una persona, escludendo oggetti che l’algoritmo di tracciamento sarebbe portato a individuare come persone.
Resumo:
Acquiring 3D shape from images is a classic problem in Computer Vision occupying researchers for at least 20 years. Only recently however have these ideas matured enough to provide highly accurate results. We present a complete algorithm to reconstruct 3D objects from images using the stereo correspondence cue. The technique can be described as a pipeline of four basic building blocks: camera calibration, image segmentation, photo-consistency estimation from images, and surface extraction from photo-consistency. In this Chapter we will put more emphasis on the latter two: namely how to extract geometric information from a set of photographs without explicit camera visibility, and how to combine different geometry estimates in an optimal way. © 2010 Springer-Verlag Berlin Heidelberg.
Resumo:
Photometric Stereo is a powerful image based 3D reconstruction technique that has recently been used to obtain very high quality reconstructions. However, in its classic form, Photometric Stereo suffers from two main limitations: Firstly, one needs to obtain images of the 3D scene under multiple different illuminations. As a result the 3D scene needs to remain static during illumination changes, which prohibits the reconstruction of deforming objects. Secondly, the images obtained must be from a single viewpoint. This leads to depth-map based 2.5 reconstructions, instead of full 3D surfaces. The aim of this Chapter is to show how these limitations can be alleviated, leading to the derivation of two practical 3D acquisition systems: The first one, based on the powerful Coloured Light Photometric Stereo method can be used to reconstruct moving objects such as cloth or human faces. The second, permits the complete 3D reconstruction of challenging objects such as porcelain vases. In addition to algorithmic details, the Chapter pays attention to practical issues such as setup calibration, detection and correction of self and cast shadows. We provide several evaluation experiments as well as reconstruction results. © 2010 Springer-Verlag Berlin Heidelberg.
Resumo:
This work presents the design of a real-time system to model visual objects with the use of self-organising networks. The architecture of the system addresses multiple computer vision tasks such as image segmentation, optimal parameter estimation and object representation. We first develop a framework for building non-rigid shapes using the growth mechanism of the self-organising maps, and then we define an optimal number of nodes without overfitting or underfitting the network based on the knowledge obtained from information-theoretic considerations. We present experimental results for hands and faces, and we quantitatively evaluate the matching capabilities of the proposed method with the topographic product. The proposed method is easily extensible to 3D objects, as it offers similar features for efficient mesh reconstruction.
Resumo:
Shape-based registration methods frequently encounters in the domains of computer vision, image processing and medical imaging. The registration problem is to find an optimal transformation/mapping between sets of rigid or nonrigid objects and to automatically solve for correspondences. In this paper we present a comparison of two different probabilistic methods, the entropy and the growing neural gas network (GNG), as general feature-based registration algorithms. Using entropy shape modelling is performed by connecting the point sets with the highest probability of curvature information, while with GNG the points sets are connected using nearest-neighbour relationships derived from competitive hebbian learning. In order to compare performances we use different levels of shape deformation starting with a simple shape 2D MRI brain ventricles and moving to more complicated shapes like hands. Results both quantitatively and qualitatively are given for both sets.
Resumo:
This paper describes a substantial effort to build a real-time interactive multimodal dialogue system with a focus on emotional and non-verbal interaction capabilities. The work is motivated by the aim to provide technology with competences in perceiving and producing the emotional and non-verbal behaviours required to sustain a conversational dialogue. We present the Sensitive Artificial Listener (SAL) scenario as a setting which seems particularly suited for the study of emotional and non-verbal behaviour, since it requires only very limited verbal understanding on the part of the machine. This scenario allows us to concentrate on non-verbal capabilities without having to address at the same time the challenges of spoken language understanding, task modeling etc. We first summarise three prototype versions of the SAL scenario, in which the behaviour of the Sensitive Artificial Listener characters was determined by a human operator. These prototypes served the purpose of verifying the effectiveness of the SAL scenario and allowed us to collect data required for building system components for analysing and synthesising the respective behaviours. We then describe the fully autonomous integrated real-time system we created, which combines incremental analysis of user behaviour, dialogue management, and synthesis of speaker and listener behaviour of a SAL character displayed as a virtual agent. We discuss principles that should underlie the evaluation of SAL-type systems. Since the system is designed for modularity and reuse, and since it is publicly available, the SAL system has potential as a joint research tool in the affective computing research community.
Resumo:
This theoretical paper attempts to define some of the key components and challenges required to create embodied conversational agents that can be genuinely interesting conversational partners. Wittgenstein's argument concerning talking lions emphasizes the importance of having a shared common ground as a basis for conversational interactions. Virtual bats suggests that-for some people at least-it is important that there be a feeling of authenticity concerning a subjectively experiencing entity that can convey what it is like to be that entity. Electric sheep reminds us of the importance of empathy in human conversational interaction and that we should provide a full communicative repertoire of both verbal and non-verbal components if we are to create genuinely engaging interactions. Also we may be making the task more difficult rather than easy if we leave out non-verbal aspects of communication. Finally, analogical peacocks highlights the importance of between minds alignment and establishes a longer term goal of being interesting, creative, and humorous if an embodied conversational agent is to be truly an engaging conversational partner. Some potential directions and solutions to addressing these issues are suggested.