920 results for Computer vision teaching
Abstract:
This paper investigated using lip movements as a behavioural biometric for person authentication. The system was trained, evaluated and tested using the XM2VTS dataset, following Configuration II of the Lausanne Protocol. Features were selected from the DCT coefficients of the greyscale lip image. The paper investigated the number of DCT coefficients selected, the selection process, and combinations of static and dynamic features. Using a Gaussian Mixture Model-Universal Background Model (GMM-UBM) framework, an Equal Error Rate (EER) of 2.20% was achieved during evaluation, and on an unseen test set a False Acceptance Rate (FAR) of 1.7% and a False Rejection Rate (FRR) of 3.0% were achieved. This compares favourably with face authentication results on the same dataset, whilst not being susceptible to spoofing attacks.
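The EER is the operating point at which FAR equals FRR. The following is a minimal sketch of how FAR, FRR and an approximate EER can be computed from verification scores; it assumes that higher scores mean "more likely the claimed client" (as with GMM-UBM log-likelihood ratios), and the score lists are hypothetical. Under the Lausanne Protocol the threshold chosen on the evaluation set is then applied unchanged to the test set, which is consistent with the test results above being reported as a separate FAR and FRR.

```python
from typing import Sequence, Tuple


def far_frr(genuine: Sequence[float], impostor: Sequence[float],
            threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) when a claim is accepted iff score >= threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def approximate_eer(genuine: Sequence[float],
                    impostor: Sequence[float]) -> Tuple[float, float]:
    """Try every observed score as a threshold and return (EER, threshold)
    where |FAR - FRR| is smallest, taking the EER as the mean of the two."""
    best_gap, best_eer, best_t = float("inf"), 1.0, 0.0
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        if abs(far - frr) < best_gap:
            best_gap, best_eer, best_t = abs(far - frr), (far + frr) / 2, t
    return best_eer, best_t


if __name__ == "__main__":
    genuine = [2.1, 1.8, 0.9, 2.5, 1.2]      # hypothetical client scores
    impostor = [-1.0, 0.2, -0.5, 1.0, -2.0]  # hypothetical impostor scores
    eer, threshold = approximate_eer(genuine, impostor)
    print(f"EER ~ {eer:.1%} at threshold {threshold}")
    # prints: EER ~ 20.0% at threshold 1.0
```

With real data the score lists would hold thousands of trials, and interpolating between thresholds gives a smoother EER estimate than this discrete sweep.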
Abstract:
Although pattern recognition methods for human behavioral analysis have flourished in the last decade, animal behavioral analysis has been almost neglected. The few existing approaches focus mostly on preserving the economic value of livestock, while attention to the welfare of companion animals such as dogs is now emerging as a social need. In this work, following the analogy with human behavior recognition, we propose a system for recognizing the body parts of dogs kept in pens. We adopt both 2D and 3D features in order to obtain a rich description of the dog model. Images are acquired with a Microsoft Kinect, which captures depth map images of the dog. On these depth maps, a Structural Support Vector Machine (SSVM) is employed to identify the body parts using both 3D features and 2D images. The proposal relies on a kernelized discriminative structural classifier specifically tailored to dogs, independent of size and breed. Classification is performed online using the LaRank optimization technique to achieve real-time performance. Promising results emerged from the experimental evaluation carried out at a dog shelter managed by IZSAM in Teramo, Italy.
Abstract:
This paper presents a method for rational behaviour recognition that combines vision-based pose estimation with knowledge modeling and reasoning. The proposed method consists of two stages. First, RGB-D images are used to estimate body postures. Then, the estimated actions are evaluated to verify that they make sense. This method requires rational behaviour to be exhibited. To comply with this requirement, this work proposes a rational RGB-D dataset with two types of sequences, some for training and some for testing. Preliminary results show that adding knowledge modeling and reasoning leads to a significant increase in recognition accuracy compared with a system based only on computer vision.
Abstract:
In this paper we propose a novel recurrent neural network architecture for video-based person re-identification. Given the video sequence of a person, features are extracted from each frame using a convolutional neural network that incorporates a recurrent final layer, which allows information to flow between time-steps. The features from all time-steps are then combined using temporal pooling to give an overall appearance feature for the complete sequence. The convolutional network, recurrent layer, and temporal pooling layer are jointly trained to act as a feature extractor for video-based re-identification using a Siamese network architecture. Our approach uses colour and optical flow information in order to capture the appearance and motion information that is useful for video re-identification. Experiments conducted on the iLIDS-VID and PRID-2011 datasets show that this approach outperforms existing methods of video-based re-identification.
https://github.com/niallmcl/Recurrent-Convolutional-Video-ReID
Project Source Code
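The temporal pooling and Siamese training described in the abstract above can be illustrated in isolation. This is a minimal sketch, assuming the per-frame features have already been produced by the CNN and recurrent layer (here simply arrays of shape (T, D)); the contrastive hinge loss and its margin of 2.0 are assumptions for illustration and are not taken from the linked repository.

```python
import numpy as np


def temporal_pool(frame_features: np.ndarray, mode: str = "mean") -> np.ndarray:
    """Collapse a (T, D) sequence of per-frame features into one D-dim vector."""
    if mode == "mean":
        return frame_features.mean(axis=0)
    if mode == "max":
        return frame_features.max(axis=0)
    raise ValueError(f"unknown pooling mode: {mode}")


def contrastive_loss(a: np.ndarray, b: np.ndarray, same_person: bool,
                     margin: float = 2.0) -> float:
    """Siamese-style loss on the Euclidean distance between two sequence
    features: pull matching pairs together, push non-matching pairs apart
    until they are at least `margin` away (margin is an assumed value)."""
    d = float(np.linalg.norm(a - b))
    if same_person:
        return 0.5 * d**2
    return 0.5 * max(0.0, margin - d) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_a = rng.normal(size=(16, 128))  # stand-in for 16 frames of features
    seq_b = rng.normal(size=(20, 128))  # sequences may differ in length
    fa, fb = temporal_pool(seq_a), temporal_pool(seq_b)
    print("same-person loss:", contrastive_loss(fa, fb, True))
    print("different-person loss:", contrastive_loss(fa, fb, False))
```

Pooling over time makes the sequence descriptor independent of sequence length, which is what allows two videos of different durations to be compared directly.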
Abstract:
Publicly available, outdoor webcams continuously view the world and share images. These cameras include traffic cams, campus cams, ski-resort cams, etc. The Archive of Many Outdoor Scenes (AMOS) is a project aiming to geolocate, annotate, archive, and visualize these cameras and images to serve as a resource for a wide variety of scientific applications. The AMOS dataset has archived over 750 million images of outdoor environments from 27,000 webcams since 2006. Our goal is to utilize the AMOS image dataset and crowdsourcing to develop reliable and valid tools to improve physical activity assessment via online, outdoor webcam capture of global physical activity patterns and urban built environment characteristics.
This project's large-scale capture of physical activity patterns and built environments is a methodological step forward toward real-time, non-labor-intensive assessment using webcams, crowdsourcing, and eventually machine learning. Combining webcams that capture outdoor scenes every 30 minutes with crowdsourced workers who annotate those scenes allows accelerated public health surveillance of physical activity across numerous built environments. The ultimate goal of this public health and computer vision collaboration is to develop machine learning algorithms that will automatically identify and calculate physical activity patterns.
Abstract:
In this paper we present an improved scheme for line and edge detection in cortical area V1, based on responses of simple and complex cells, which is truly multi-scale and has no free parameters. We illustrate the multi-scale representation for visual reconstruction, and show how object segregation can be achieved with coarse-to-fine-scale groupings. A two-level object categorization scenario is tested in which pre-categorization is based on coarse scales only, and final categorization on coarse plus fine scales. Processing schemes are discussed in the framework of a complete cortical architecture.
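Simple cells are commonly modelled as even (cosine) and odd (sine) Gabor filters, and complex cells as the modulus of that quadrature pair. The 1D, single-scale sketch below shows that textbook model; it is an assumed formulation rather than the paper's exact scheme, and the wavelength and the bandwidth factor sigma = 0.56 * wavelength are assumptions. A line shows up as a peak in the even response and an edge as a peak in the odd response, which is the kind of evidence line/edge detection builds on.

```python
import numpy as np


def gabor_pair(wavelength: float, sigma: float):
    """Even (cosine) and odd (sine) Gabor kernels forming a quadrature pair."""
    half_width = int(3 * sigma)
    x = np.arange(-half_width, half_width + 1, dtype=float)
    envelope = np.exp(-x**2 / (2 * sigma**2))
    even = envelope * np.cos(2 * np.pi * x / wavelength)
    odd = envelope * np.sin(2 * np.pi * x / wavelength)
    even -= even.mean()  # remove the DC component so flat regions give ~0
    return even, odd


def _filter(signal: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Convolve with edge padding so the output has the input's length."""
    pad = len(kernel) // 2
    padded = np.pad(signal, pad, mode="edge")
    return np.convolve(padded, kernel, mode="valid")


def cell_responses(signal: np.ndarray, wavelength: float):
    """Return (even simple, odd simple, complex) responses at one scale."""
    sigma = 0.56 * wavelength  # assumed bandwidth of about one octave
    even_k, odd_k = gabor_pair(wavelength, sigma)
    even = _filter(signal, even_k)          # even simple cell (line-tuned)
    odd = _filter(signal, odd_k)            # odd simple cell (edge-tuned)
    complex_ = np.sqrt(even**2 + odd**2)    # complex cell: modulus of the pair
    return even, odd, complex_


if __name__ == "__main__":
    line = np.zeros(64)
    line[32] = 1.0
    step = np.zeros(64)
    step[32:] = 1.0
    for name, sig in (("line", line), ("step", step)):
        even, odd, cx = cell_responses(sig, wavelength=8.0)
        print(name, "even peak:", int(np.argmax(np.abs(even))),
              "odd peak:", int(np.argmax(np.abs(odd))),
              "complex peak:", int(np.argmax(cx)))
```

For the impulse ("line") at index 32 the even and complex responses peak at 32; for the step starting at index 32 the odd response peaks at the transition (index 31 or 32, since the step falls between two samples). A multi-scale version would repeat this over a range of wavelengths.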
Abstract:
In this paper we present a brief overview of the processing in the primary visual cortex, the multi-scale line/edge and keypoint representations, and a model of brightness perception. This model, which is being extended from 1D to 2D, is based on a symbolic line and edge interpretation: lines are represented by scaled Gaussians and edges by scaled, Gaussian-windowed error functions. We show that this model, in combination with standard techniques from graphics, provides a very fertile basis for non-photorealistic image rendering.
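The symbolic interpretation can be made concrete in 1D by summing line and edge primitives into a brightness profile. This is a minimal sketch under stated assumptions: the primitive parameters are hypothetical, and the edge window width (`window_factor` times the edge scale) is an assumed parameter, since the abstract does not give the windowing details.

```python
import math
from typing import List, Sequence, Tuple

# (position, amplitude, scale) for both kinds of primitive
Primitive = Tuple[float, float, float]


def line_primitive(x: float, x0: float, amplitude: float, scale: float) -> float:
    """A line event at x0, represented as a scaled Gaussian."""
    return amplitude * math.exp(-((x - x0) ** 2) / (2 * scale**2))


def edge_primitive(x: float, x0: float, amplitude: float, scale: float,
                   window_factor: float = 4.0) -> float:
    """An edge event at x0: a scaled error function (a smoothed step from
    -amplitude/2 to +amplitude/2 before windowing) multiplied by a Gaussian
    window. window_factor is a hypothetical parameter."""
    step = 0.5 * amplitude * math.erf((x - x0) / (math.sqrt(2) * scale))
    window = math.exp(-((x - x0) ** 2) / (2 * (window_factor * scale) ** 2))
    return step * window


def reconstruct(xs: Sequence[float], lines: Sequence[Primitive],
                edges: Sequence[Primitive]) -> List[float]:
    """Brightness profile as the sum of all line and edge primitives."""
    return [
        sum(line_primitive(x, *p) for p in lines)
        + sum(edge_primitive(x, *p) for p in edges)
        for x in xs
    ]


if __name__ == "__main__":
    xs = [i * 0.5 for i in range(-20, 21)]
    profile = reconstruct(xs, lines=[(-4.0, 1.0, 1.0)], edges=[(3.0, 2.0, 0.8)])
    print(f"min {min(profile):.3f}, max {max(profile):.3f}")
```

Because each event is a smooth analytic primitive, the same symbolic description can be rendered at any resolution, which is one reason such a representation suits non-photorealistic rendering.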
Abstract:
Computer vision for real-time applications requires tremendous computational power because all images must be processed from the first to the last pixel. Active vision, which probes specific objects on the basis of already acquired context, may lead to a significant reduction in processing. This idea is based on a few concepts from our visual cortex (Rensink, Visual Cogn. 7, 17-42, 2000): (1) our physical surroundings can be seen as memory, i.e. there is no need to construct detailed and complete maps, (2) the bandwidth of the what and where systems is limited, i.e. only one object can be probed at any time, and (3) bottom-up, low-level feature extraction is complemented by top-down hypothesis testing, i.e. there is a rapid convergence of activities in dendritic/axonal connections.
Abstract:
Object recognition requires that templates with canonical views are stored in memory. Such templates must somehow be normalised. In this paper we present a novel method for obtaining 2D translation, rotation and size invariance. Cortical simple, complex and end-stopped cells provide multi-scale maps of lines, edges and keypoints. These maps are combined such that objects are characterised. Dynamic routing in neighbouring neural layers allows feature maps of input objects and stored templates to converge. We illustrate the construction of group templates and the invariance method for object categorisation and recognition in the context of a cortical architecture, which can be applied in computer vision.
Abstract:
The goal of the project "SmartVision: active vision for the blind" is to develop a small and portable but intelligent and reliable system for assisting the blind and visually impaired while navigating autonomously, both outdoor and indoor. In this paper we present an overview of the prototype, design issues, and its different modules which integrate a GIS with GPS, Wi-Fi, RFID tags and computer vision. The prototype addresses global navigation by following known landmarks, local navigation with path tracking and obstacle avoidance, and object recognition. The system does not replace the white cane, but extends it beyond its reach. The user-friendly interface consists of a 4-button hand-held box, a vibration actuator in the handle of the cane, and speech synthesis. A future version may also employ active RFID tags for marking navigation landmarks, and speech recognition may complement speech synthesis.
Abstract:
An increasing number of computer vision applications employ interest points. Algorithms like SIFT and SURF are both based on partial derivatives of images smoothed with Gaussian filter kernels. These algorithms are fast and therefore very popular.
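SIFT approximates the scale-normalised Laplacian of Gaussian with a difference of Gaussians, and SURF approximates the second-order Gaussian derivatives of the Hessian with box filters. The sketch below computes the unapproximated quantity SURF targets, the scale-normalised Hessian determinant, from sampled Gaussian derivative kernels; it is neither SIFT nor SURF, and the scale and test image are arbitrary choices.

```python
import numpy as np


def gaussian_kernels(sigma: float):
    """Sampled 1D Gaussian and its first and second derivatives."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()
    dg = -x / sigma**2 * g                       # first derivative of the Gaussian
    ddg = (x**2 / sigma**4 - 1 / sigma**2) * g   # second derivative of the Gaussian
    return g, dg, ddg


def separable(image: np.ndarray, ky: np.ndarray, kx: np.ndarray) -> np.ndarray:
    """Convolve columns with ky, then rows with kx (zero padding, same size).
    Assumes the image is larger than the kernels along both axes."""
    tmp = np.apply_along_axis(lambda col: np.convolve(col, ky, mode="same"), 0, image)
    return np.apply_along_axis(lambda row: np.convolve(row, kx, mode="same"), 1, tmp)


def hessian_determinant(image: np.ndarray, sigma: float) -> np.ndarray:
    """Scale-normalised sigma^4 * (Lxx*Lyy - Lxy^2); x runs along columns."""
    g, dg, ddg = gaussian_kernels(sigma)
    lxx = separable(image, g, ddg)
    lyy = separable(image, ddg, g)
    lxy = separable(image, dg, dg)
    return sigma**4 * (lxx * lyy - lxy**2)


if __name__ == "__main__":
    n = 41
    yy, xx = np.mgrid[0:n, 0:n]
    blob = np.exp(-((xx - 20) ** 2 + (yy - 20) ** 2) / (2 * 3.0**2))
    response = hessian_determinant(blob, sigma=3.0)
    peak = np.unravel_index(np.argmax(response), response.shape)
    print("strongest blob response at (row, col):", tuple(int(i) for i in peak))
```

Separable filtering keeps each Gaussian derivative at two 1D passes, and the speed of SIFT and SURF comes from approximating exactly these filters more cheaply still.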
Abstract:
This database was generated during the development of a computer vision-based system for safety purposes in nuclear plants. The system aims to detect and track people within a nuclear plant. Further details may be found in the related thesis. The research was developed through a cooperation between the Graduate Electrical Engineering Program of the Federal University of Rio de Janeiro (PEE/COPPE, UFRJ) and the Nuclear Engineering Institute of the National Commission of Nuclear Energy (IEN, CNEN). The experimental part of this research was carried out at Argonauta, a nuclear research reactor belonging to IEN. The database is made available below. All the videos are already rectified. The projection and homography matrices for both cameras are given at the end. Please acknowledge the use of this database in any publication.
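Once the database's matrices are loaded, mapping a point through a homography or projecting a 3D point through a projection matrix is a matrix-vector product followed by division by the homogeneous coordinate. This is a minimal sketch; the actual matrix values and file layout of the database are not reproduced here, so the matrices in the example are placeholders, and which plane each homography refers to should be checked against the thesis.

```python
from typing import Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def apply_homography(H: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map an image point (x, y) through a 3x3 homography H (row-major)."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return u / w, v / w


def project_point(P: Matrix, X: float, Y: float, Z: float) -> Tuple[float, float]:
    """Project a 3D point through a 3x4 camera projection matrix P."""
    homog = (X, Y, Z, 1.0)
    u, v, w = (sum(P[r][c] * homog[c] for c in range(4)) for r in range(3))
    if abs(w) < 1e-12:
        raise ValueError("point lies on the camera's principal plane")
    return u / w, v / w


if __name__ == "__main__":
    # Placeholder matrices, NOT the values shipped with the database.
    H = [[1.0, 0.0, 5.0], [0.0, 1.0, -2.0], [0.0, 0.0, 1.0]]
    P = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    print(apply_homography(H, 1.0, 1.0))    # prints: (6.0, -1.0)
    print(project_point(P, 2.0, 4.0, 2.0))  # prints: (1.0, 2.0)
```

Because homographies and projection matrices are defined only up to scale, multiplying either matrix by a non-zero constant leaves the mapped points unchanged.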
Abstract:
Dissertation submitted for the degree of Master in Electrical Engineering, Automation and Industrial Electronics branch
Abstract:
Dissertation submitted for the degree of Master in Electrical Engineering, Automation and Industrial Electronics branch
Abstract:
In recent years, affordable access to tools for producing, editing and distributing audiovisual content has contributed to an exponential increase in the daily production of this type of content. In this paradigm of an overabundance of multimedia content, a large percentage of video sequences contain explicit material, which calls for stricter control so that it is not easily accessible to minors. The concept of explicit content can be characterized in different ways; the work described in this document focused on the automatic detection of female nudity in video sequences. This process of automatically detecting and classifying adult material can be an important tool in managing a television channel. Hundreds of hours of material may be received daily, which makes a manual quality-control process impractical. The solution created in the context of this dissertation was studied and developed around a specific product in the broadcasting area: the mxfSPEEDRAIL F1000, a solution from the company MOG Technologies. The main objective of the project is the development of a C++ library, accessible during the ingest process, that uses computer vision analysis to detect which frames potentially contain explicit content and to flag them in the signal's metadata. The solution developed uses a set of state-of-the-art techniques adapted to the problem at hand, including algorithms for skin segmentation and object detection in images. Finally, a critical analysis of the solution developed in this dissertation is carried out, so that future development can improve both its resource consumption during analysis and its success rate.
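The abstract mentions skin segmentation without naming the method. As an assumed stand-in, the sketch below uses a widely used explicit rule that thresholds the Cb and Cr chroma channels (Cb in [77, 127], Cr in [133, 173]) after a BT.601 RGB-to-YCbCr conversion, and flags a frame when its skin-pixel fraction exceeds a hypothetical threshold. It is written in Python for brevity, not as the C++ library described, and skin fraction alone is a weak cue, which is why it is combined with object detection in the dissertation.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def rgb_to_cbcr(r: float, g: float, b: float) -> Tuple[float, float]:
    """Full-range ITU-R BT.601 (JPEG-style) chroma components."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def is_skin(pixel: Pixel, cb_range=(77, 127), cr_range=(133, 173)) -> bool:
    """Fixed-threshold skin test in the Cb-Cr plane (assumed thresholds)."""
    cb, cr = rgb_to_cbcr(*pixel)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]


def flag_frame(pixels: Iterable[Pixel], min_skin_fraction: float = 0.3) -> bool:
    """Flag a frame whose skin-pixel fraction reaches a hypothetical threshold."""
    pixels = list(pixels)
    if not pixels:
        return False
    skin = sum(1 for p in pixels if is_skin(p))
    return skin / len(pixels) >= min_skin_fraction


if __name__ == "__main__":
    frame = [(200, 150, 120)] * 6 + [(0, 0, 255)] * 4  # toy 10-pixel "frame"
    print("skin-like pixel:", is_skin((200, 150, 120)))
    print("frame flagged:", flag_frame(frame))
```

Ignoring the luma channel makes the rule somewhat robust to brightness changes, but fixed thresholds misfire on skin-coloured backgrounds, so a production ingest library would likely use a trained skin model and temporal smoothing across frames.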