903 resultados para Computer vision industry


Relevância:

80.00% 80.00%

Publicador:

Resumo:

The research described in this thesis was motivated by the need of a robust model capable of representing 3D data obtained with 3D sensors, which are inherently noisy. In addition, time constraints have to be considered as these sensors are capable of providing a 3D data stream in real time. This thesis proposed the use of Self-Organizing Maps (SOMs) as a 3D representation model. In particular, we proposed the use of the Growing Neural Gas (GNG) network, which has been successfully used for clustering, pattern recognition and topology representation of multi-dimensional data. Until now, Self-Organizing Maps have been primarily computed offline and their application in 3D data has mainly focused on free noise models, without considering time constraints. It is proposed a hardware implementation leveraging the computing power of modern GPUs, which takes advantage of a new paradigm coined as General-Purpose Computing on Graphics Processing Units (GPGPU). The proposed methods were applied to different problem and applications in the area of computer vision such as the recognition and localization of objects, visual surveillance or 3D reconstruction.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Image (Video) retrieval is an interesting problem of retrieving images (videos) similar to the query. Images (Videos) are represented in an input (feature) space and similar images (videos) are obtained by finding nearest neighbors in the input representation space. Numerous input representations both in real valued and binary space have been proposed for conducting faster retrieval. In this thesis, we present techniques that obtain improved input representations for retrieval in both supervised and unsupervised settings for images and videos. Supervised retrieval is a well known problem of retrieving same class images of the query. We address the practical aspects of achieving faster retrieval with binary codes as input representations for the supervised setting in the first part, where binary codes are used as addresses into hash tables. In practice, using binary codes as addresses does not guarantee fast retrieval, as similar images are not mapped to the same binary code (address). We address this problem by presenting an efficient supervised hashing (binary encoding) method that aims to explicitly map all the images of the same class ideally to a unique binary code. We refer to the binary codes of the images as `Semantic Binary Codes' and the unique code for all same class images as `Class Binary Code'. We also propose a new class­ based Hamming metric that dramatically reduces the retrieval times for larger databases, where only hamming distance is computed to the class binary codes. We also propose a Deep semantic binary code model, by replacing the output layer of a popular convolutional Neural Network (AlexNet) with the class binary codes and show that the hashing functions learned in this way outperforms the state­ of ­the art, and at the same time provide fast retrieval times. In the second part, we also address the problem of supervised retrieval by taking into account the relationship between classes. For a given query image, we want to retrieve images that preserve the relative order i.e. we want to retrieve all same class images first and then, the related classes images before different class images. We learn such relationship aware binary codes by minimizing the similarity between inner product of the binary codes and the similarity between the classes. We calculate the similarity between classes using output embedding vectors, which are vector representations of classes. Our method deviates from the other supervised binary encoding schemes as it is the first to use output embeddings for learning hashing functions. We also introduce new performance metrics that take into account the related class retrieval results and show significant gains over the state­ of­ the art. High Dimensional descriptors like Fisher Vectors or Vector of Locally Aggregated Descriptors have shown to improve the performance of many computer vision applications including retrieval. In the third part, we will discuss an unsupervised technique for compressing high dimensional vectors into high dimensional binary codes, to reduce storage complexity. In this approach, we deviate from adopting traditional hyperplane hashing functions and instead learn hyperspherical hashing functions. The proposed method overcomes the computational challenges of directly applying the spherical hashing algorithm that is intractable for compressing high dimensional vectors. A practical hierarchical model that utilizes divide and conquer techniques using the Random Select and Adjust (RSA) procedure to compress such high dimensional vectors is presented. We show that our proposed high dimensional binary codes outperform the binary codes obtained using traditional hyperplane methods for higher compression ratios. In the last part of the thesis, we propose a retrieval based solution to the Zero shot event classification problem - a setting where no training videos are available for the event. To do this, we learn a generic set of concept detectors and represent both videos and query events in the concept space. We then compute similarity between the query event and the video in the concept space and videos similar to the query event are classified as the videos belonging to the event. We show that we significantly boost the performance using concept features from other modalities.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Presentaciones de la asignatura Interfaces para Entornos Inteligentes del Máster en Tecnologías de la Informática/Machine Learning and Data Mining.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

International audience

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A computer vision system that has to interact in natural language needs to understand the visual appearance of interactions between objects along with the appearance of objects themselves. Relationships between objects are frequently mentioned in queries of tasks like semantic image retrieval, image captioning, visual question answering and natural language object detection. Hence, it is essential to model context between objects for solving these tasks. In the first part of this thesis, we present a technique for detecting an object mentioned in a natural language query. Specifically, we work with referring expressions which are sentences that identify a particular object instance in an image. In many referring expressions, an object is described in relation to another object using prepositions, comparative adjectives, action verbs etc. Our proposed technique can identify both the referred object and the context object mentioned in such expressions. Context is also useful for incrementally understanding scenes and videos. In the second part of this thesis, we propose techniques for searching for objects in an image and events in a video. Our proposed incremental algorithms use the context from previously explored regions to prioritize the regions to explore next. The advantage of incremental understanding is restricting the amount of computation time and/or resources spent for various detection tasks. Our first proposed technique shows how to learn context in indoor scenes in an implicit manner and use it for searching for objects. The second technique shows how explicitly written context rules of one-on-one basketball can be used to sequentially detect events in a game.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

International audience

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Nowadays, new computers generation provides a high performance that enables to build computationally expensive computer vision applications applied to mobile robotics. Building a map of the environment is a common task of a robot and is an essential part to allow the robots to move through these environments. Traditionally, mobile robots used a combination of several sensors from different technologies. Lasers, sonars and contact sensors have been typically used in any mobile robotic architecture, however color cameras are an important sensor due to we want the robots to use the same information that humans to sense and move through the different environments. Color cameras are cheap and flexible but a lot of work need to be done to give robots enough visual understanding of the scenes. Computer vision algorithms are computational complex problems but nowadays robots have access to different and powerful architectures that can be used for mobile robotics purposes. The advent of low-cost RGB-D sensors like Microsoft Kinect which provide 3D colored point clouds at high frame rates made the computer vision even more relevant in the mobile robotics field. The combination of visual and 3D data allows the systems to use both computer vision and 3D processing and therefore to be aware of more details of the surrounding environment. The research described in this thesis was motivated by the need of scene mapping. Being aware of the surrounding environment is a key feature in many mobile robotics applications from simple robotic navigation to complex surveillance applications. In addition, the acquisition of a 3D model of the scenes is useful in many areas as video games scene modeling where well-known places are reconstructed and added to game systems or advertising where once you get the 3D model of one room the system can add furniture pieces using augmented reality techniques. In this thesis we perform an experimental study of the state-of-the-art registration methods to find which one fits better to our scene mapping purposes. Different methods are tested and analyzed on different scene distributions of visual and geometry appearance. In addition, this thesis proposes two methods for 3d data compression and representation of 3D maps. Our 3D representation proposal is based on the use of Growing Neural Gas (GNG) method. This Self-Organizing Maps (SOMs) has been successfully used for clustering, pattern recognition and topology representation of various kind of data. Until now, Self-Organizing Maps have been primarily computed offline and their application in 3D data has mainly focused on free noise models without considering time constraints. Self-organising neural models have the ability to provide a good representation of the input space. In particular, the Growing Neural Gas (GNG) is a suitable model because of its flexibility, rapid adaptation and excellent quality of representation. However, this type of learning is time consuming, specially for high-dimensional input data. Since real applications often work under time constraints, it is necessary to adapt the learning process in order to complete it in a predefined time. This thesis proposes a hardware implementation leveraging the computing power of modern GPUs which takes advantage of a new paradigm coined as General-Purpose Computing on Graphics Processing Units (GPGPU). Our proposed geometrical 3D compression method seeks to reduce the 3D information using plane detection as basic structure to compress the data. This is due to our target environments are man-made and therefore there are a lot of points that belong to a plane surface. Our proposed method is able to get good compression results in those man-made scenarios. The detected and compressed planes can be also used in other applications as surface reconstruction or plane-based registration algorithms. Finally, we have also demonstrated the goodness of the GPU technologies getting a high performance implementation of a CAD/CAM common technique called Virtual Digitizing.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Object recognition has long been a core problem in computer vision. To improve object spatial support and speed up object localization for object recognition, generating high-quality category-independent object proposals as the input for object recognition system has drawn attention recently. Given an image, we generate a limited number of high-quality and category-independent object proposals in advance and used as inputs for many computer vision tasks. We present an efficient dictionary-based model for image classification task. We further extend the work to a discriminative dictionary learning method for tensor sparse coding. In the first part, a multi-scale greedy-based object proposal generation approach is presented. Based on the multi-scale nature of objects in images, our approach is built on top of a hierarchical segmentation. We first identify the representative and diverse exemplar clusters within each scale. Object proposals are obtained by selecting a subset from the multi-scale segment pool via maximizing a submodular objective function, which consists of a weighted coverage term, a single-scale diversity term and a multi-scale reward term. The weighted coverage term forces the selected set of object proposals to be representative and compact; the single-scale diversity term encourages choosing segments from different exemplar clusters so that they will cover as many object patterns as possible; the multi-scale reward term encourages the selected proposals to be discriminative and selected from multiple layers generated by the hierarchical image segmentation. The experimental results on the Berkeley Segmentation Dataset and PASCAL VOC2012 segmentation dataset demonstrate the accuracy and efficiency of our object proposal model. Additionally, we validate our object proposals in simultaneous segmentation and detection and outperform the state-of-art performance. To classify the object in the image, we design a discriminative, structural low-rank framework for image classification. We use a supervised learning method to construct a discriminative and reconstructive dictionary. By introducing an ideal regularization term, we perform low-rank matrix recovery for contaminated training data from all categories simultaneously without losing structural information. A discriminative low-rank representation for images with respect to the constructed dictionary is obtained. With semantic structure information and strong identification capability, this representation is good for classification tasks even using a simple linear multi-classifier.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Esta tesis versa sobre el an álisis de la forma de objetos 2D. En visión articial existen numerosos aspectos de los que se pueden extraer información. Uno de los más usados es la forma o el contorno de esos objetos. Esta característica visual de los objetos nos permite, mediante el procesamiento adecuado, extraer información de los objetos, analizar escenas, etc. No obstante el contorno o silueta de los objetos contiene información redundante. Este exceso de datos que no aporta nuevo conocimiento debe ser eliminado, con el objeto de agilizar el procesamiento posterior o de minimizar el tamaño de la representación de ese contorno, para su almacenamiento o transmisión. Esta reducción de datos debe realizarse sin que se produzca una pérdida de información importante para representación del contorno original. Se puede obtener una versión reducida de un contorno eliminando puntos intermedios y uniendo los puntos restantes mediante segmentos. Esta representación reducida de un contorno se conoce como aproximación poligonal. Estas aproximaciones poligonales de contornos representan, por tanto, una versión comprimida de la información original. El principal uso de las mismas es la reducción del volumen de información necesario para representar el contorno de un objeto. No obstante, en los últimos años estas aproximaciones han sido usadas para el reconocimiento de objetos. Para ello los algoritmos de aproximaci ón poligonal se han usado directamente para la extracci ón de los vectores de caracter ísticas empleados en la fase de aprendizaje. Las contribuciones realizadas por tanto en esta tesis se han centrado en diversos aspectos de las aproximaciones poligonales. En la primera contribución se han mejorado varios algoritmos de aproximaciones poligonales, mediante el uso de una fase de preprocesado que acelera estos algoritmos permitiendo incluso mejorar la calidad de las soluciones en un menor tiempo. En la segunda contribución se ha propuesto un nuevo algoritmo de aproximaciones poligonales que obtiene soluciones optimas en un menor espacio de tiempo que el resto de métodos que aparecen en la literatura. En la tercera contribución se ha propuesto un algoritmo de aproximaciones que es capaz de obtener la solución óptima en pocas iteraciones en la mayor parte de los casos. Por último, se ha propuesto una versi ón mejorada del algoritmo óptimo para obtener aproximaciones poligonales que soluciona otro problema de optimización alternativo.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

(Deep) neural networks are increasingly being used for various computer vision and pattern recognition tasks due to their strong ability to learn highly discriminative features. However, quantitative analysis of their classication ability and design philosophies are still nebulous. In this work, we use information theory to analyze the concatenated restricted Boltzmann machines (RBMs) and propose a mutual information-based RBM neural networks (MI-RBM). We develop a novel pretraining algorithm to maximize the mutual information between RBMs. Extensive experimental results on various classication tasks show the eectiveness of the proposed approach.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Dissertação de Mestrado, Engenharia Informática, Faculdade de Ciências e Tecnologia, Universidade do Algarve, 2014

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Electrical Bus Rapid Transit (eBRT) is a charging electrical public transport which brings a clean, high performance, and affordable cost alternative from the conventional traffic vehicles which work with combustion and hybrid technology. These buses charge the battery in every bus stop to arrive at the next station. But, this charging system needs an appropriate infrastructure called pantograph, and it requires a high precision bus location to maintain battery lifetime, energy saving and charging time. To overcome this issue Vicomtech and Datik has planned a project based on computer vision to help to the driver to locate the vehicle in the correct place. In this document, we present a mono camera bus driver guided fast algorithm because these vehicles embedded computers do not support high computation and precision operations. In addition to the frequent lane sign, there are more accurate geometric beacons painted on the road to bring metric information to the vision system. This method uses segmentation to binarize the image discriminating the background space. Besides it detects, tracks and counts different lane mark contours in addition to classify each special painted mark. Besides it does not need any calibration task to calculate longitudinal and cross distances because we know the lane mark sizes.