892 results for SIFT, Computer Vision, Python, Object Recognition, Feature Detection, Descriptor Computation


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Deep Learning has radically transformed the field of Machine Learning, advancing the state of the art in areas ranging from computer vision to natural language processing. Going beyond classification problems, generative applications have in recent years led to the creation of realistic images and literary texts. The world of music is no exception: a multitude of experiments have been carried out in the same vein, with results that are still immature but potentially interesting. This thesis discusses the application of a model from the Deep Learning family to the generation of symbolic music.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In Stereo Vision, a branch of Computer Vision, the goal is to reconstruct the depth of a scene from pairs of RGB images. Most algorithms used for this task assume that all surfaces in the scene are Lambertian. When non-Lambertian (reflective or transparent) surfaces are present, existing stereo algorithms predict depth incorrectly. To address this problem, a dataset containing transparent and reflective objects was built during the internship to serve as the basis for training the network. The objects in the scenes are associated with 3D annotations used to train the network. This thesis, in turn, uses RAFT-Stereo [1], a state-of-the-art network for stereo vision, and analyzes how its performance (disparity prediction) changes when a module for semantic segmentation of objects is added to it. This additional layer is introduced because finding the correspondence between two points that belong to non-Lambertian surfaces is very difficult for a standard network. The semantic information is meant to help the network recognize these types of surfaces and thereby improve their disparity. This neural architecture was chosen because, during the internship devoted to creating the Booster dataset [2], it proved to be the best performer on that dataset. The ultimate goal of this work is to determine whether the recognition of non-Lambertian surfaces by the semantic module improves disparity prediction. In stereo vision, reflective and transparent elements are extremely difficult to analyze, but they remain an active subject of study given their many application areas, such as autonomous driving and robotics.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Gaze estimation has gained interest in recent years as an important cue for obtaining information about the internal cognitive state of humans. Whether it targets the 3D gaze vector or the point of gaze (PoG), gaze estimation has been applied in various fields, such as human-robot interaction, augmented reality, medicine, aviation and automotive. In the latter field, as part of Advanced Driver-Assistance Systems (ADAS), it allows the development of cutting-edge systems capable of mitigating road accidents by monitoring driver distraction. Gaze estimation can also be used to enhance the driving experience, for instance in autonomous driving. It can also improve comfort through augmented reality components that the driver commands with their eyes. Although several high-performance real-time inference works already exist, just a few are capable of working with only an RGB camera on computationally constrained devices, such as a microcontroller. This work aims to develop a low-cost, efficient and high-performance embedded system capable of estimating the driver's gaze using deep learning and an RGB camera. The proposed system has achieved near-SOTA performance with about 90% less memory footprint. Its ability to generalize to unseen environments has been evaluated through a live demonstration, where high performance and near real-time inference were obtained using a webcam and a Raspberry Pi 4.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In order to estimate depth through supervised deep learning-based stereo methods, it is necessary to have access to precise ground truth depth data. While the gathering of precise labels is commonly tackled by deploying depth sensors, this is not always a viable solution. For instance, in many applications in the biomedical domain, the choice of sensors capable of sensing depth at small distances with high precision on difficult surfaces (that present non-Lambertian properties) is very limited. It is therefore necessary to find alternative techniques to gather ground truth data without having to rely on external sensors. In this thesis, two different approaches have been tested to produce supervision data for biomedical images. The first aims to obtain input stereo image pairs and disparities through simulation in a virtual environment, while the second relies on a non-learned disparity estimation algorithm to produce noisy disparities, which are then filtered by means of hand-crafted confidence measures to create noisy labels for a subset of pixels. Of the two, the second approach, which is referred to in the literature as proxy-labeling, has shown the best results and has even outperformed the non-learned disparity estimation algorithm used for supervision.
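
The filtering step described in this abstract can be illustrated with a minimal Python sketch. The abstract does not say which confidence measures were used; the sketch assumes a left-right consistency check, one common hand-crafted measure, and the function names and the `max_diff` threshold are hypothetical.

```python
import numpy as np


def left_right_consistency_mask(disp_left, disp_right, max_diff=1.0):
    """Boolean mask of left-image pixels whose disparity is confirmed by the right map.

    Assumption: rectified stereo pair, so pixel (y, x) in the left image
    corresponds to pixel (y, x - d) in the right image.
    """
    h, w = disp_left.shape
    rows, cols = np.indices((h, w))
    # Column of the matching pixel in the right image, rounded to the nearest integer.
    target = np.rint(cols - disp_left).astype(int)
    in_bounds = (target >= 0) & (target < w)
    # Clip only so that indexing is safe; out-of-bounds pixels are rejected below.
    target_clipped = np.clip(target, 0, w - 1)
    disp_right_at_match = disp_right[rows, target_clipped]
    consistent = np.abs(disp_left - disp_right_at_match) <= max_diff
    return in_bounds & consistent


def make_proxy_labels(disp_left, disp_right, max_diff=1.0):
    """Keep only confident disparities; rejected pixels become NaN (no label)."""
    mask = left_right_consistency_mask(disp_left, disp_right, max_diff)
    labels = np.where(mask, disp_left, np.nan)
    return labels, mask
```

In a training loop, the returned mask would typically restrict the loss to the labeled subset of pixels, so rejected disparities contribute no supervision.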

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The fashion world is continuously evolving, not only socially but also technologically. This work studies the possibility of recognizing and segmenting garments in an image using deep neural networks and modern approaches. To this end, networks such as FasterRCNN, MaskRCNN, YOLOv5, FashionPedia and Match-RCNN were analyzed. The training of deep neural networks was then examined in highly parallel settings and on machines with multiple GPUs in order to reduce training times. In addition, the possibility of building a network that predicts whether a given garment may be successful in the future, based simply on past data and an image of the garment, was explored. These tasks also required an in-depth analysis of the existing fashion datasets and of the methods for using them in training. This work was carried out within the FA.RE.TRA. project, for which the University of Bologna provides consultancy on the feasibility study of neural networks able to perform the tasks mentioned.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Depth estimation from images has long been regarded as a preferable alternative to expensive and intrusive active sensors, such as LiDAR and ToF. The topic has attracted the attention of an increasingly wide audience thanks to its large number of application domains, such as autonomous driving, robotic navigation and 3D reconstruction. Among the various techniques employed for depth estimation, stereo matching is one of the most widespread, owing to its robustness, speed and simplicity of setup. Recent developments have been aided by the abundance of annotated stereo images, which has allowed deep learning to thrive in a research area where deep networks can reach state-of-the-art sub-pixel precision in most cases. Despite these findings, stereo matching still presents many open challenges, two of which are finding pixel correspondences in the presence of objects that exhibit non-Lambertian behaviour and processing high-resolution images. Recently, a novel dataset named Booster, which contains high-resolution stereo pairs featuring a large collection of labeled non-Lambertian objects, has been released. That work showed that training state-of-the-art deep neural networks on such data improves their generalization capabilities in the presence of non-Lambertian surfaces as well. Despite being a further step towards tackling the aforementioned challenges, Booster includes a rather small number of annotated images, and thus cannot satisfy the intensive training requirements of deep learning. This thesis investigates novel view synthesis techniques to augment the Booster dataset, with the ultimate goal of improving stereo matching reliability on high-resolution images that display non-Lambertian surfaces.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In recent years, machine learning has gained growing popularity in scientific research and its applications. The aim of this thesis was to study machine learning in its general aspects and apply it to computer vision problems. The thesis tackled the difficulty of explaining, from a theoretical point of view, the algorithms underlying convolutional neural networks, and then addressed two concrete image recognition problems: the MNIST dataset (images of handwritten digits) and a dataset referred to here as the "MELANOMA dataset" (images of melanomas and healthy nevi). Using the techniques explained in the theoretical section, satisfactory results were obtained on both datasets, with an accuracy of 98% on MNIST and 76.8% on the MELANOMA dataset.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Neural scene representation and neural rendering are new computer vision techniques that enable the reconstruction and implicit representation of real 3D scenes from a set of 2D captured images, by fitting a deep neural network. The trained network can then be used to render novel views of the scene. A recent work in this field, Neural Radiance Fields (NeRF), presented a state-of-the-art approach, which uses a simple Multilayer Perceptron (MLP) to generate photo-realistic RGB images of a scene from arbitrary viewpoints. However, NeRF does not model any light interaction with the fitted scene; therefore, despite producing compelling results for the view synthesis task, it does not provide a solution for relighting. In this work, we propose a new architecture to enable relighting capabilities in NeRF-based representations and we introduce a new real-world dataset to train and evaluate such a model. Our method demonstrates the ability to perform realistic rendering of novel views under arbitrary lighting conditions.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Nowadays, some activities, such as taking out an insurance policy or opening a bank account, can be completed through a web page or a downloadable application. Since the user is often "hidden" behind a monitor or a smartphone, a solution that can verify their identity is needed. Companies often require the submission of a "proof-of-identity", which usually consists of a picture of the user's identity document, together with a picture or a brief video of the user. This work describes a system whose purpose is to automate these kinds of verifications.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Artificial Intelligence (AI) has substantially influenced numerous disciplines in recent years. Biology, chemistry, and bioinformatics are among them, with significant advances in protein structure prediction, paratope prediction, protein-protein interactions (PPIs), and antibody-antigen interactions. Understanding PPIs is critical, since they are involved in practically every process in living organisms and have applications in vaccines, cancer, immunology, and inflammatory illnesses. Machine Learning (ML) offers enormous potential for effectively simulating antibody-antigen interactions and improving in-silico optimization of therapeutic antibodies for desired features, including binding activity, stability, and low immunogenicity. This research examines the use of AI algorithms to better understand antibody-antigen interactions, and it discusses several difficulties encountered in the field. Furthermore, we contribute a method that outperforms existing state-of-the-art strategies in paratope prediction from sequence data.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Miniaturized flying robotic platforms, called nano-drones, have the potential to revolutionize the autonomous robots industry sector thanks to their very small form factor. The nano-drones' limited payload only allows for a sub-100mW microcontroller unit for the on-board computations. Therefore, traditional computer vision and control algorithms are too computationally expensive to be executed on board these palm-sized robots, and we are forced to rely on artificial intelligence to trade off accuracy in favor of lightweight pipelines for autonomous tasks. However, relying on deep learning exposes us to the problem of generalization, since the deployment scenario of a convolutional neural network (CNN) often contains visual cues and features that differ from those learned during training, leading to poor inference performance. Our objective is to develop and deploy an adaptation algorithm, based on the concept of latent replays, that allows us to fine-tune a CNN to work in new and diverse deployment scenarios. To do so, we start from an existing model for visual human pose estimation, called PULP-Frontnet, which identifies the pose of a human subject in space through its 4 output variables, and we present the design of our novel adaptation algorithm, which features automatic data gathering and labeling and on-device deployment. We then showcase the ability of our algorithm to adapt PULP-Frontnet to new deployment scenarios, improving the R2 scores of the four network outputs, with respect to an unknown environment, from approximately [−0.2, 0.4, 0.0, −0.7] to [0.25, 0.45, 0.2, 0.1]. Finally, we demonstrate that it is possible to fine-tune our neural network in real time (i.e., under 76 seconds), using the target parallel ultra-low power GAP 8 System-on-Chip on board the nano-drone, and we show that all adaptation operations can take place using less than 2mWh of energy, a small fraction of the available battery power.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This undergraduate thesis aims to formally define aspects of the Quantum Turing Machine, using quantum finite automata as a basis. We introduce the basic concepts of quantum mechanics and quantum computing through principles such as superposition, entanglement of quantum states, quantum bits and algorithms. We demonstrate Bell's teleportation theorem, stated in the form of the Deutsch-Jozsa definition for quantum algorithms. The text as a whole omits the formal aspects of quantum mechanics, to encourage computer scientists to understand the framework of quantum computation. We conclude the thesis by listing the Quantum Turing Machine's main limitations relative to the well-known Classical Turing Machine.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Postprint