859 resultados para Computer vision teaching
Resumo:
Machine vision is an important subject in computer science and engineering degrees. For laboratory experimentation, it is desirable to have a complete and easy-to-use tool. In this work we present a Java library, oriented to teaching computer vision. We have designed and built the library from the scratch with enfasis on readability and understanding rather than on efficiency. However, the library can also be used for research purposes. JavaVis is an open source Java library, oriented to the teaching of Computer Vision. It consists of a framework with several features that meet its demands. It has been designed to be easy to use: the user does not have to deal with internal structures or graphical interface, and should the student need to add a new algorithm it can be done simply enough. Once we sketch the library, we focus on the experience the student gets using this library in several computer vision courses. Our main goal is to find out whether the students understand what they are doing, that is, find out how much the library helps the student in grasping the basic concepts of computer vision. In the last four years we have conducted surveys to assess how much the students have improved their skills by using this library.
Resumo:
In this article, we present a new framework oriented to teach Computer Vision related subjects called JavaVis. It is a computer vision library divided in three main areas: 2D package is featured for classical computer vision processing; 3D package, which includes a complete 3D geometric toolset, is used for 3D vision computing; Desktop package comprises a tool for graphic designing and testing of new algorithms. JavaVis is designed to be easy to use, both for launching and testing existing algorithms and for developing new ones.
Resumo:
Nowadays despite improvements in usability and intuitiveness users have to adapt to the proposed systems to satisfy their needs. For instance, they must learn how to achieve tasks, how to interact with the system, and fulfill system's specifications. This paper proposes an approach to improve this situation enabling graphical user interface redefinition through virtualization and computer vision with the aim of increasing the system's usability. To achieve this goal the approach is based on enriched task models, virtualization and picture-driven computing.
Resumo:
Sendo uma forma natural de interação homem-máquina, o reconhecimento de gestos implica uma forte componente de investigação em áreas como a visão por computador e a aprendizagem computacional. O reconhecimento gestual é uma área com aplicações muito diversas, fornecendo aos utilizadores uma forma mais natural e mais simples de comunicar com sistemas baseados em computador, sem a necessidade de utilização de dispositivos extras. Assim, o objectivo principal da investigação na área de reconhecimento de gestos aplicada à interacção homemmáquina é o da criação de sistemas, que possam identificar gestos específicos e usálos para transmitir informações ou para controlar dispositivos. Para isso as interfaces baseados em visão para o reconhecimento de gestos, necessitam de detectar a mão de forma rápida e robusta e de serem capazes de efetuar o reconhecimento de gestos em tempo real. Hoje em dia, os sistemas de reconhecimento de gestos baseados em visão são capazes de trabalhar com soluções específicas, construídos para resolver um determinado problema e configurados para trabalhar de uma forma particular. Este projeto de investigação estudou e implementou soluções, suficientemente genéricas, com o recurso a algoritmos de aprendizagem computacional, permitindo a sua aplicação num conjunto alargado de sistemas de interface homem-máquina, para reconhecimento de gestos em tempo real. A solução proposta, Gesture Learning Module Architecture (GeLMA), permite de forma simples definir um conjunto de comandos que pode ser baseado em gestos estáticos e dinâmicos e que pode ser facilmente integrado e configurado para ser utilizado numa série de aplicações. É um sistema de baixo custo e fácil de treinar e usar, e uma vez que é construído unicamente com bibliotecas de código. As experiências realizadas permitiram mostrar que o sistema atingiu uma precisão de 99,2% em termos de reconhecimento de gestos estáticos e uma precisão média de 93,7% em termos de reconhecimento de gestos dinâmicos. Para validar a solução proposta, foram implementados dois sistemas completos. O primeiro é um sistema em tempo real capaz de ajudar um árbitro a arbitrar um jogo de futebol robótico. A solução proposta combina um sistema de reconhecimento de gestos baseada em visão com a definição de uma linguagem formal, o CommLang Referee, à qual demos a designação de Referee Command Language Interface System (ReCLIS). O sistema identifica os comandos baseados num conjunto de gestos estáticos e dinâmicos executados pelo árbitro, sendo este posteriormente enviado para um interface de computador que transmite a respectiva informação para os robôs. O segundo é um sistema em tempo real capaz de interpretar um subconjunto da Linguagem Gestual Portuguesa. As experiências demonstraram que o sistema foi capaz de reconhecer as vogais em tempo real de forma fiável. Embora a solução implementada apenas tenha sido treinada para reconhecer as cinco vogais, o sistema é facilmente extensível para reconhecer o resto do alfabeto. As experiências também permitiram mostrar que a base dos sistemas de interação baseados em visão pode ser a mesma para todas as aplicações e, deste modo facilitar a sua implementação. A solução proposta tem ainda a vantagem de ser suficientemente genérica e uma base sólida para o desenvolvimento de sistemas baseados em reconhecimento gestual que podem ser facilmente integrados com qualquer aplicação de interface homem-máquina. A linguagem formal de definição da interface pode ser redefinida e o sistema pode ser facilmente configurado e treinado com um conjunto de gestos diferentes de forma a serem integrados na solução final.
Resumo:
"Lecture notes in computational vision and biomechanics series, ISSN 2212-9391, vol. 19"
Resumo:
Tese de Doutoramento em Engenharia de Eletrónica e de Computadores
Resumo:
This paper presents a pattern recognition method focused on paintings images. The purpose is construct a system able to recognize authors or art styles based on common elements of his work (here called patterns). The method is based on comparing images that contain the same or similar patterns. It uses different computer vision techniques, like SIFT and SURF, to describe the patterns in descriptors, K-Means to classify and simplify these descriptors, and RANSAC to determine and detect good results. The method are good to find patterns of known images but not so good if they are not.
Resumo:
Following their detection and seizure by police and border guard authorities, false identity and travel documents are usually scanned, producing digital images. This research investigates the potential of these images to classify false identity documents, highlight links between documents produced by a same modus operandi or same source, and thus support forensic intelligence efforts. Inspired by previous research work about digital images of Ecstasy tablets, a systematic and complete method has been developed to acquire, collect, process and compare images of false identity documents. This first part of the article highlights the critical steps of the method and the development of a prototype that processes regions of interest extracted from images. Acquisition conditions have been fine-tuned in order to optimise reproducibility and comparability of images. Different filters and comparison metrics have been evaluated and the performance of the method has been assessed using two calibration and validation sets of documents, made up of 101 Italian driving licenses and 96 Portuguese passports seized in Switzerland, among which some were known to come from common sources. Results indicate that the use of Hue and Edge filters or their combination to extract profiles from images, and then the comparison of profiles with a Canberra distance-based metric provides the most accurate classification of documents. The method appears also to be quick, efficient and inexpensive. It can be easily operated from remote locations and shared amongst different organisations, which makes it very convenient for future operational applications. The method could serve as a first fast triage method that may help target more resource-intensive profiling methods (based on a visual, physical or chemical examination of documents for instance). Its contribution to forensic intelligence and its application to several sets of false identity documents seized by police and border guards will be developed in a forthcoming article (part II).
Resumo:
This work presents the implementation and comparison of three different techniques of three-dimensional computer vision as follows: • Stereo vision - correlation between two 2D images • Sensorial fusion - use of different sensors: camera 2D + ultrasound sensor (1D); • Structured light The computer vision techniques herein presented took into consideration the following characteristics: • Computational effort ( elapsed time for obtain the 3D information); • Influence of environmental conditions (noise due to a non uniform lighting, overlighting and shades); • The cost of the infrastructure for each technique; • Analysis of uncertainties, precision and accuracy. The option of using the Matlab software, version 5.1, for algorithm implementation of the three techniques was due to the simplicity of their commands, programming and debugging. Besides, this software is well known and used by the academic community, allowing the results of this work to be obtained and verified. Examples of three-dimensional vision applied to robotic assembling tasks ("pick-and-place") are presented.
Resumo:
L’objectif de cette thèse par articles est de présenter modestement quelques étapes du parcours qui mènera (on espère) à une solution générale du problème de l’intelligence artificielle. Cette thèse contient quatre articles qui présentent chacun une différente nouvelle méthode d’inférence perceptive en utilisant l’apprentissage machine et, plus particulièrement, les réseaux neuronaux profonds. Chacun de ces documents met en évidence l’utilité de sa méthode proposée dans le cadre d’une tâche de vision par ordinateur. Ces méthodes sont applicables dans un contexte plus général, et dans certains cas elles on tété appliquées ailleurs, mais ceci ne sera pas abordé dans le contexte de cette de thèse. Dans le premier article, nous présentons deux nouveaux algorithmes d’inférence variationelle pour le modèle génératif d’images appelé codage parcimonieux “spike- and-slab” (CPSS). Ces méthodes d’inférence plus rapides nous permettent d’utiliser des modèles CPSS de tailles beaucoup plus grandes qu’auparavant. Nous démontrons qu’elles sont meilleures pour extraire des détecteur de caractéristiques quand très peu d’exemples étiquetés sont disponibles pour l’entraînement. Partant d’un modèle CPSS, nous construisons ensuite une architecture profonde, la machine de Boltzmann profonde partiellement dirigée (MBP-PD). Ce modèle a été conçu de manière à simplifier d’entraînement des machines de Boltzmann profondes qui nécessitent normalement une phase de pré-entraînement glouton pour chaque couche. Ce problème est réglé dans une certaine mesure, mais le coût d’inférence dans le nouveau modèle est relativement trop élevé pour permettre de l’utiliser de manière pratique. Dans le deuxième article, nous revenons au problème d’entraînement joint de machines de Boltzmann profondes. Cette fois, au lieu de changer de famille de modèles, nous introduisons un nouveau critère d’entraînement qui donne naissance aux machines de Boltzmann profondes à multiples prédictions (MBP-MP). Les MBP-MP sont entraînables en une seule étape et ont un meilleur taux de succès en classification que les MBP classiques. Elles s’entraînent aussi avec des méthodes variationelles standard au lieu de nécessiter un classificateur discriminant pour obtenir un bon taux de succès en classification. Par contre, un des inconvénients de tels modèles est leur incapacité de générer deséchantillons, mais ceci n’est pas trop grave puisque la performance de classification des machines de Boltzmann profondes n’est plus une priorité étant donné les dernières avancées en apprentissage supervisé. Malgré cela, les MBP-MP demeurent intéressantes parce qu’elles sont capable d’accomplir certaines tâches que des modèles purement supervisés ne peuvent pas faire, telles que celle de classifier des données incomplètes ou encore celle de combler intelligemment l’information manquante dans ces données incomplètes. Le travail présenté dans cette thèse s’est déroulé au milieu d’une période de transformations importantes du domaine de l’apprentissage à réseaux neuronaux profonds qui a été déclenchée par la découverte de l’algorithme de “dropout” par Geoffrey Hinton. Dropout rend possible un entraînement purement supervisé d’architectures de propagation unidirectionnel sans être exposé au danger de sur- entraînement. Le troisième article présenté dans cette thèse introduit une nouvelle fonction d’activation spécialement con ̧cue pour aller avec l’algorithme de Dropout. Cette fonction d’activation, appelée maxout, permet l’utilisation de aggrégation multi-canal dans un contexte d’apprentissage purement supervisé. Nous démontrons comment plusieurs tâches de reconnaissance d’objets sont mieux accomplies par l’utilisation de maxout. Pour terminer, sont présentons un vrai cas d’utilisation dans l’industrie pour la transcription d’adresses de maisons à plusieurs chiffres. En combinant maxout avec une nouvelle sorte de couche de sortie pour des réseaux neuronaux de convolution, nous démontrons qu’il est possible d’atteindre un taux de succès comparable à celui des humains sur un ensemble de données coriace constitué de photos prises par les voitures de Google. Ce système a été déployé avec succès chez Google pour lire environ cent million d’adresses de maisons.
Resumo:
The application of computer vision based quality control has been slowly but steadily gaining importance mainly due to its speed in achieving results and also greatly due to its non- destnictive nature of testing. Besides, in food applications it also does not contribute to contamination. However, computer vision applications in quality control needs the application of an appropriate software for image analysis. Eventhough computer vision based quality control has several advantages, its application has limitations as to the type of work to be done, particularly so in the food industries. Selective applications, however, can be highly advantageous and very accurate.Computer vision based image analysis could be used in morphometric measurements of fish with the same accuracy as the existing conventional method. The method is non-destructive and non-contaminating thus providing anadvantage in seafood processing.The images could be stored in archives and retrieved at anytime to carry out morphometric studies for biologists.Computer vision and subsequent image analysis could be used in measurements of various food products to assess uniformity of size. One product namely cutlet and product ingredients namely coating materials such as bread crumbs and rava were selected for the study. Computer vision based image analysis was used in the measurements of length, width and area of cutlets. Also the width of coating materials like bread crumbs was measured.Computer imaging and subsequent image analysis can be very effectively used in quality evaluations of product ingredients in food processing. Measurement of width of coating materials could establish uniformity of particles or the lack of it. The application of image analysis in bacteriological work was also done
Resumo:
Relaciona la enseñanza de la literatura infantil en el aula con la era digital. Los autores describen los modos en que la literatura infantil está desarrollando nuevas dimensiones: la publicación de libros infantiles en CD ROM , los recursos de Web para trabajar con textos literarios, y niños que, cada vez más, comunican sus experiencias a través del correo electrónico y diversas formas de foros electrónicos y de chat. El texto ofrece orientaciones prácticas para profesores con poca experiencia con las TIC y muestra cómo el uso de ordenadores en la enseñanza del inglés puede mejorar y ampliar la participación de los niños. Describe y analiza la realización de actividades para ampliar los enfoques tradicionales de textos literarios y aprovechar las ventajas de la tecnología disponible.
Resumo:
This paper discusses and compares the use of vision based and non-vision based technologies in developing intelligent environments. By reviewing the related projects that use vision based techniques in intelligent environment design, the achieved functions, technical issues and drawbacks of those projects are discussed and summarized, and the potential solutions for future improvement are proposed, which leads to the prospective direction of my PhD research.