878 resultados para Depth Estimation,Deep Learning,Disparity Estimation,Computer Vision,Stereo Vision


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Acquiring 3D shape from images is a classic problem in Computer Vision occupying researchers for at least 20 years. Only recently however have these ideas matured enough to provide highly accurate results. We present a complete algorithm to reconstruct 3D objects from images using the stereo correspondence cue. The technique can be described as a pipeline of four basic building blocks: camera calibration, image segmentation, photo-consistency estimation from images, and surface extraction from photo-consistency. In this Chapter we will put more emphasis on the latter two: namely how to extract geometric information from a set of photographs without explicit camera visibility, and how to combine different geometry estimates in an optimal way. © 2010 Springer-Verlag Berlin Heidelberg.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This work presents the design of a real-time system to model visual objects with the use of self-organising networks. The architecture of the system addresses multiple computer vision tasks such as image segmentation, optimal parameter estimation and object representation. We first develop a framework for building non-rigid shapes using the growth mechanism of the self-organising maps, and then we define an optimal number of nodes without overfitting or underfitting the network based on the knowledge obtained from information-theoretic considerations. We present experimental results for hands and faces, and we quantitatively evaluate the matching capabilities of the proposed method with the topographic product. The proposed method is easily extensible to 3D objects, as it offers similar features for efficient mesh reconstruction.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Visual recognition is a fundamental research topic in computer vision. This dissertation explores datasets, features, learning, and models used for visual recognition. In order to train visual models and evaluate different recognition algorithms, this dissertation develops an approach to collect object image datasets on web pages using an analysis of text around the image and of image appearance. This method exploits established online knowledge resources (Wikipedia pages for text; Flickr and Caltech data sets for images). The resources provide rich text and object appearance information. This dissertation describes results on two datasets. The first is Berg’s collection of 10 animal categories; on this dataset, we significantly outperform previous approaches. On an additional set of 5 categories, experimental results show the effectiveness of the method. Images are represented as features for visual recognition. This dissertation introduces a text-based image feature and demonstrates that it consistently improves performance on hard object classification problems. The feature is built using an auxiliary dataset of images annotated with tags, downloaded from the Internet. Image tags are noisy. The method obtains the text features of an unannotated image from the tags of its k-nearest neighbors in this auxiliary collection. A visual classifier presented with an object viewed under novel circumstances (say, a new viewing direction) must rely on its visual examples. This text feature may not change, because the auxiliary dataset likely contains a similar picture. While the tags associated with images are noisy, they are more stable when appearance changes. The performance of this feature is tested using PASCAL VOC 2006 and 2007 datasets. This feature performs well; it consistently improves the performance of visual object classifiers, and is particularly effective when the training dataset is small. With more and more collected training data, computational cost becomes a bottleneck, especially when training sophisticated classifiers such as kernelized SVM. This dissertation proposes a fast training algorithm called Stochastic Intersection Kernel Machine (SIKMA). This proposed training method will be useful for many vision problems, as it can produce a kernel classifier that is more accurate than a linear classifier, and can be trained on tens of thousands of examples in two minutes. It processes training examples one by one in a sequence, so memory cost is no longer the bottleneck to process large scale datasets. This dissertation applies this approach to train classifiers of Flickr groups with many group training examples. The resulting Flickr group prediction scores can be used to measure image similarity between two images. Experimental results on the Corel dataset and a PASCAL VOC dataset show the learned Flickr features perform better on image matching, retrieval, and classification than conventional visual features. Visual models are usually trained to best separate positive and negative training examples. However, when recognizing a large number of object categories, there may not be enough training examples for most objects, due to the intrinsic long-tailed distribution of objects in the real world. This dissertation proposes an approach to use comparative object similarity. The key insight is that, given a set of object categories which are similar and a set of categories which are dissimilar, a good object model should respond more strongly to examples from similar categories than to examples from dissimilar categories. This dissertation develops a regularized kernel machine algorithm to use this category dependent similarity regularization. Experiments on hundreds of categories show that our method can make significant improvement for categories with few or even no positive examples.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The purpose of this work is to demonstrate and to assess a simple algorithm for automatic estimation of the most salient region in an image, that have possible application in computer vision. The algorithm uses the connection between color dissimilarities in the image and the image’s most salient region. The algorithm also avoids using image priors. Pixel dissimilarity is an informal function of the distance of a specific pixel’s color to other pixels’ colors in an image. We examine the relation between pixel color dissimilarity and salient region detection on the MSRA1K image dataset. We propose a simple algorithm for salient region detection through random pixel color dissimilarity. We define dissimilarity by accumulating the distance between each pixel and a sample of n other random pixels, in the CIELAB color space. An important result is that random dissimilarity between each pixel and just another pixel (n = 1) is enough to create adequate saliency maps when combined with median filter, with competitive average performance if compared with other related methods in the saliency detection research field. The assessment was performed by means of precision-recall curves. This idea is inspired on the human attention mechanism that is able to choose few specific regions to focus on, a biological system that the computer vision community aims to emulate. We also review some of the history on this topic of selective attention.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

To date, adult educational research has had a limited focus on lesbian, gay, bisexual and transgendered (LGBT) adults and the learning processes in which they engage across the life course. Adopting a biographical and life history methodology, this study aimed to critically explore the potentially distinctive nature and impact of how, when and where LGBT adults learn to construct their identities over their lives. In-depth, semi-structured interviews, dialogue and discussion with LGBT individuals and groups provided rich narratives that reflect shifting, diverse and multiple ways of identifying and living as LGBT. Participants engage in learning in unique ways that play a significant role in the construction and expression of such identities, that in turn influence how, when and where learning happens. Framed largely by complex heteronormative forces, learning can have a negative, distortive impact that deeply troubles any balanced, positive sense of being LGBT, leading to self- censoring, alienation and in some cases, hopelessness. However, learning is also more positively experiential, critically reflective, inventive and queer in nature. This can transform how participants understand their sexual identities and the lifewide spaces in which they learn, engendering agency and resilience. Intersectional perspectives reveal learning that participants struggle with, but can reconcile the disjuncture between evolving LGBT and other myriad identities as parents, Christians, teachers, nurses, academics, activists and retirees. The study’s main contributions lie in three areas. A focus on LGBT experience can contribute to the creation of new opportunities to develop intergenerational learning processes. The study also extends the possibilities for greater criticality in older adult education theory, research and practice, based on the continued, rich learning in which participants engage post-work and in later life. Combined with this, there is scope to further explore the nature of ‘life-deep learning’ for other societal groups, brought by combined religious, moral, ideological and social learning that guides action, beliefs, values, and expression of identity. The LGBT adults in this study demonstrate engagement in distinct forms of life-deep learning to navigate social and moral opprobrium. From this they gain hope, self-respect, empathy with others, and deeper self-knowledge.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Image (Video) retrieval is an interesting problem of retrieving images (videos) similar to the query. Images (Videos) are represented in an input (feature) space and similar images (videos) are obtained by finding nearest neighbors in the input representation space. Numerous input representations both in real valued and binary space have been proposed for conducting faster retrieval. In this thesis, we present techniques that obtain improved input representations for retrieval in both supervised and unsupervised settings for images and videos. Supervised retrieval is a well known problem of retrieving same class images of the query. We address the practical aspects of achieving faster retrieval with binary codes as input representations for the supervised setting in the first part, where binary codes are used as addresses into hash tables. In practice, using binary codes as addresses does not guarantee fast retrieval, as similar images are not mapped to the same binary code (address). We address this problem by presenting an efficient supervised hashing (binary encoding) method that aims to explicitly map all the images of the same class ideally to a unique binary code. We refer to the binary codes of the images as `Semantic Binary Codes' and the unique code for all same class images as `Class Binary Code'. We also propose a new class­ based Hamming metric that dramatically reduces the retrieval times for larger databases, where only hamming distance is computed to the class binary codes. We also propose a Deep semantic binary code model, by replacing the output layer of a popular convolutional Neural Network (AlexNet) with the class binary codes and show that the hashing functions learned in this way outperforms the state­ of ­the art, and at the same time provide fast retrieval times. In the second part, we also address the problem of supervised retrieval by taking into account the relationship between classes. For a given query image, we want to retrieve images that preserve the relative order i.e. we want to retrieve all same class images first and then, the related classes images before different class images. We learn such relationship aware binary codes by minimizing the similarity between inner product of the binary codes and the similarity between the classes. We calculate the similarity between classes using output embedding vectors, which are vector representations of classes. Our method deviates from the other supervised binary encoding schemes as it is the first to use output embeddings for learning hashing functions. We also introduce new performance metrics that take into account the related class retrieval results and show significant gains over the state­ of­ the art. High Dimensional descriptors like Fisher Vectors or Vector of Locally Aggregated Descriptors have shown to improve the performance of many computer vision applications including retrieval. In the third part, we will discuss an unsupervised technique for compressing high dimensional vectors into high dimensional binary codes, to reduce storage complexity. In this approach, we deviate from adopting traditional hyperplane hashing functions and instead learn hyperspherical hashing functions. The proposed method overcomes the computational challenges of directly applying the spherical hashing algorithm that is intractable for compressing high dimensional vectors. A practical hierarchical model that utilizes divide and conquer techniques using the Random Select and Adjust (RSA) procedure to compress such high dimensional vectors is presented. We show that our proposed high dimensional binary codes outperform the binary codes obtained using traditional hyperplane methods for higher compression ratios. In the last part of the thesis, we propose a retrieval based solution to the Zero shot event classification problem - a setting where no training videos are available for the event. To do this, we learn a generic set of concept detectors and represent both videos and query events in the concept space. We then compute similarity between the query event and the video in the concept space and videos similar to the query event are classified as the videos belonging to the event. We show that we significantly boost the performance using concept features from other modalities.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

(Deep) neural networks are increasingly being used for various computer vision and pattern recognition tasks due to their strong ability to learn highly discriminative features. However, quantitative analysis of their classication ability and design philosophies are still nebulous. In this work, we use information theory to analyze the concatenated restricted Boltzmann machines (RBMs) and propose a mutual information-based RBM neural networks (MI-RBM). We develop a novel pretraining algorithm to maximize the mutual information between RBMs. Extensive experimental results on various classication tasks show the eectiveness of the proposed approach.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Negli ultimi due anni, per via della pandemia generata dal virus Covid19, la vita in ogni angolo del nostro pianeta è drasticamente cambiata. Ad oggi, nel mondo, sono oltre duecentoventi milioni le persone che hanno contratto questo virus e sono quasi cinque milioni le persone decedute. In alcuni periodi si è arrivati ad avere anche un milione di nuovi contagiati al giorno e mediamente, negli ultimi sei mesi, questo dato è stato di più di mezzo milione al giorno. Gli ospedali, soprattutto nei paesi meno sviluppati, hanno subito un grande stress e molte volte hanno avuto una carenza di risorse per fronteggiare questa grave pandemia. Per questo motivo ogni ricerca in questo campo diventa estremamente importante, soprattutto quelle che, con l'ausilio dell'intelligenza artificiale, riescono a dare supporto ai medici. Queste tecnologie una volta sviluppate e approvate possono essere diffuse a costi molto bassi e accessibili a tutti. In questo elaborato sono stati sperimentati e valutati due diversi approcci alla diagnosi del Covid-19 a partire dalle radiografie toraciche dei pazienti: il primo metodo si basa sul transfer learning di una rete convoluzionale inizialmente pensata per la classificazione di immagini. Il secondo approccio utilizza i Vision Transformer (ViT), un'architettura ampiamente diffusa nel campo del Natural Language Processing adattata ai task di Visione Artificiale. La prima soluzione ha ottenuto un’accuratezza di 0.85 mentre la seconda di 0.92, questi risultati, soprattutto il secondo, sono molto incoraggianti soprattutto vista la minima quantità di dati di training necessaria.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The inferior alveolar nerve (IAN) lies within the mandibular canal, named inferior alveolar canal in literature. The detection of this nerve is important during maxillofacial surgeries or for creating dental implants. The poor quality of cone-beam computed tomography (CBCT) and computed tomography (CT) scans and/or bone gaps within the mandible increase the difficulty of this task, posing a challenge to human experts who are going to manually detect it and resulting in a time-consuming task.Therefore this thesis investigates two methods to automatically detect the IAN: a non-data driven technique and a deep-learning method. The latter tracks the IAN position at each frame leveraging detections obtained with the deep neural network CenterNet, fined-tuned for our task, and temporal and spatial information.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the last decades, Artificial Intelligence has witnessed multiple breakthroughs in deep learning. In particular, purely data-driven approaches have opened to a wide variety of successful applications due to the large availability of data. Nonetheless, the integration of prior knowledge is still required to compensate for specific issues like lack of generalization from limited data, fairness, robustness, and biases. In this thesis, we analyze the methodology of integrating knowledge into deep learning models in the field of Natural Language Processing (NLP). We start by remarking on the importance of knowledge integration. We highlight the possible shortcomings of these approaches and investigate the implications of integrating unstructured textual knowledge. We introduce Unstructured Knowledge Integration (UKI) as the process of integrating unstructured knowledge into machine learning models. We discuss UKI in the field of NLP, where knowledge is represented in a natural language format. We identify UKI as a complex process comprised of multiple sub-processes, different knowledge types, and knowledge integration properties to guarantee. We remark on the challenges of integrating unstructured textual knowledge and bridge connections with well-known research areas in NLP. We provide a unified vision of structured knowledge extraction (KE) and UKI by identifying KE as a sub-process of UKI. We investigate some challenging scenarios where structured knowledge is not a feasible prior assumption and formulate each task from the point of view of UKI. We adopt simple yet effective neural architectures and discuss the challenges of such an approach. Finally, we identify KE as a form of symbolic representation. From this perspective, we remark on the need of defining sophisticated UKI processes to verify the validity of knowledge integration. To this end, we foresee frameworks capable of combining symbolic and sub-symbolic representations for learning as a solution.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The study of ancient, undeciphered scripts presents unique challenges, that depend both on the nature of the problem and on the peculiarities of each writing system. In this thesis, I present two computational approaches that are tailored to two different tasks and writing systems. The first of these methods is aimed at the decipherment of the Linear A afraction signs, in order to discover their numerical values. This is achieved with a combination of constraint programming, ad-hoc metrics and paleographic considerations. The second main contribution of this thesis regards the creation of an unsupervised deep learning model which uses drawings of signs from ancient writing system to learn to distinguish different graphemes in the vector space. This system, which is based on techniques used in the field of computer vision, is adapted to the study of ancient writing systems by incorporating information about sequences in the model, mirroring what is often done in natural language processing. In order to develop this model, the Cypriot Greek Syllabary is used as a target, since this is a deciphered writing system. Finally, this unsupervised model is adapted to the undeciphered Cypro-Minoan and it is used to answer open questions about this script. In particular, by reconstructing multiple allographs that are not agreed upon by paleographers, it supports the idea that Cypro-Minoan is a single script and not a collection of three script like it was proposed in the literature. These results on two different tasks shows that computational methods can be applied to undeciphered scripts, despite the relatively low amount of available data, paving the way for further advancement in paleography using these methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Il Deep Learning ha radicalmente trasformato il mondo del Machine Learning migliorando lo stato dell'arte in diversi campi che spaziano dalla computer vision al natural language processing. Non fermandosi a problemi di classificazione, negli ultimi anni, applicazioni di tipo generativo hanno portato alla creazione di immagini realistiche e documenti letterali. Il mondo della musica non è esente da una moltitudine di esperimenti nello stesso campo, con risultati ancora acerbi ma comunque potenzialmente interessanti. In questa tesi verrà discussa l'applicazione di un di modello appartenente alla famiglia del Deep Learning per la generazione di musica simbolica.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Correctness of information gathered in production environments is an essential part of quality assurance processes in many industries, this task is often performed by human resources who visually take annotations in various steps of the production flow. Depending on the performed task the correlation between where exactly the information is gathered and what it represents is more than often lost in the process. The lack of labeled data places a great boundary on the application of deep neural networks aimed at object detection tasks, moreover supervised training of deep models requires a great amount of data to be available. Reaching an adequate large collection of labeled images through classic techniques of data annotations is an exhausting and costly task to perform, not always suitable for every scenario. A possible solution is to generate synthetic data that replicates the real one and use it to fine-tune a deep neural network trained on one or more source domains to a different target domain. The purpose of this thesis is to show a real case scenario where the provided data were both in great scarcity and missing the required annotations. Sequentially a possible approach is presented where synthetic data has been generated to address those issues while standing as a training base of deep neural networks for object detection, capable of working on images taken in production-like environments. Lastly, it compares performance on different types of synthetic data and convolutional neural networks used as backbones for the model.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Photoplethysmography (PPG) sensors allow for noninvasive and comfortable heart-rate (HR) monitoring, suitable for compact wearable devices. However, PPG signals collected from such devices often suffer from corruption caused by motion artifacts. This is typically addressed by combining the PPG signal with acceleration measurements from an inertial sensor. Recently, different energy-efficient deep learning approaches for heart rate estimation have been proposed. To test these new solutions, in this work, we developed a highly wearable platform (42mm x 48 mm x 1.2mm) for PPG signal acquisition and processing, based on GAP9, a parallel ultra low power system-on-chip featuring nine cores RISC-V compute cluster with neural network accelerator and 1 core RISC-V controller. The hardware platform also integrates a commercial complete Optical Biosensing Module and an ARM-Cortex M4 microcontroller unit (MCU) with Bluetooth low-energy connectivity. To demonstrate the capabilities of the system, a deep learning-based approach for PPG-based HR estimation has been deployed. Thanks to the reduced power consumption of the digital computational platform, the total power budget is just 2.67 mW providing up to 5 days of operation (105 mAh battery).