917 resultados para COMPUTER-VISION
Resumo:
Depth estimation from images has long been regarded as a preferable alternative compared to expensive and intrusive active sensors, such as LiDAR and ToF. The topic has attracted the attention of an increasingly wide audience thanks to the great amount of application domains, such as autonomous driving, robotic navigation and 3D reconstruction. Among the various techniques employed for depth estimation, stereo matching is one of the most widespread, owing to its robustness, speed and simplicity in setup. Recent developments has been aided by the abundance of annotated stereo images, which granted to deep learning the opportunity to thrive in a research area where deep networks can reach state-of-the-art sub-pixel precision in most cases. Despite the recent findings, stereo matching still begets many open challenges, two among them being finding pixel correspondences in presence of objects that exhibits a non-Lambertian behaviour and processing high-resolution images. Recently, a novel dataset named Booster, which contains high-resolution stereo pairs featuring a large collection of labeled non-Lambertian objects, has been released. The work shown that training state-of-the-art deep neural network on such data improves the generalization capabilities of these networks also in presence of non-Lambertian surfaces. Regardless being a further step to tackle the aforementioned challenge, Booster includes a rather small number of annotated images, and thus cannot satisfy the intensive training requirements of deep learning. This thesis work aims to investigate novel view synthesis techniques to augment the Booster dataset, with ultimate goal of improving stereo matching reliability in presence of high-resolution images that displays non-Lambertian surfaces.
Resumo:
Unmanned Aerial Vehicle (UAVs) equipped with cameras have been fast deployed to a wide range of applications, such as smart cities, agriculture or search and rescue applications. Even though UAV datasets exist, the amount of open and quality UAV datasets is limited. So far, we want to overcome this lack of high quality annotation data by developing a simulation framework for a parametric generation of synthetic data. The framework accepts input via a serializable format. The input specifies which environment preset is used, the objects to be placed in the environment along with their position and orientation as well as additional information such as object color and size. The result is an environment that is able to produce UAV typical data: RGB image from the UAVs camera, altitude, roll, pitch and yawn of the UAV. Beyond the image generation process, we improve the resulting image data photorealism by using Synthetic-To-Real transfer learning methods. Transfer learning focuses on storing knowledge gained while solving one problem and applying it to a different - although related - problem. This approach has been widely researched in other affine fields and results demonstrate it to be an interesing area to investigate. Since simulated images are easy to create and synthetic-to-real translation has shown good quality results, we are able to generate pseudo-realistic images. Furthermore, object labels are inherently given, so we are capable of extending the already existing UAV datasets with realistic quality images and high resolution meta-data. During the development of this thesis we have been able to produce a result of 68.4% on UAVid. This can be considered a new state-of-art result on this dataset.
Resumo:
Nel TCR - Termina container Ravenna, è importante che nel momento di scarico del container sul camion non siano presenti persone nell’area. In questo elaborato si descrive la realizzazione e il funzionamento di un sistema di allarme automatico, in grado di rilevare persone ed eventualmente interrompere la procedura di scarico del container. Tale sistema si basa sulla tecnica della object segmentation tramite rimozione dello sfondo, a cui viene affiancata una classificazione e rimozione delle eventuali ombre con un metodo cromatico. Inoltre viene identificata la possibile testa di una persona e avendo a disposizione due telecamere, si mette in atto una visione binoculare per calcolarne l’altezza. Infine, viene presa in considerazione anche la dinamica del sistema, per cui la classificazione di una persona si può basare sulla grandezza, altezza e velocità dell’oggetto individuato.
Resumo:
Il machine learning negli ultimi anni ha acquisito una crescente popolarità nell’ambito della ricerca scientifica e delle sue applicazioni. Lo scopo di questa tesi è stato quello di studiare il machine learning nei suoi aspetti generali e applicarlo a problemi di computer vision. La tesi ha affrontato le difficoltà del dover spiegare dal punto di vista teorico gli algoritmi alla base delle reti neurali convoluzionali e ha successivamente trattato due problemi concreti di riconoscimento immagini: il dataset MNIST (immagini di cifre scritte a mano) e un dataset che sarà chiamato ”MELANOMA dataset” (immagini di melanomi e nevi sani). Utilizzando le tecniche spiegate nella sezione teorica si sono riusciti ad ottenere risultati soddifacenti per entrambi i dataset ottenendo una precisione del 98% per il MNIST e del 76.8% per il MELANOMA dataset
Resumo:
Neural scene representation and neural rendering are new computer vision techniques that enable the reconstruction and implicit representation of real 3D scenes from a set of 2D captured images, by fitting a deep neural network. The trained network can then be used to render novel views of the scene. A recent work in this field, Neural Radiance Fields (NeRF), presented a state-of-the-art approach, which uses a simple Multilayer Perceptron (MLP) to generate photo-realistic RGB images of a scene from arbitrary viewpoints. However, NeRF does not model any light interaction with the fitted scene; therefore, despite producing compelling results for the view synthesis task, it does not provide a solution for relighting. In this work, we propose a new architecture to enable relighting capabilities in NeRF-based representations and we introduce a new real-world dataset to train and evaluate such a model. Our method demonstrates the ability to perform realistic rendering of novel views under arbitrary lighting conditions.
Resumo:
The usage of Optical Character Recognition’s (OCR, systems is a widely spread technology into the world of Computer Vision and Machine Learning. It is a topic that interest many field, for example the automotive, where becomes a specialized task known as License Plate Recognition, useful for many application from the automation of toll road to intelligent payments. However, OCR systems need to be very accurate and generalizable in order to be able to extract the text of license plates under high variable conditions, from the type of camera used for acquisition to light changes. Such variables compromise the quality of digitalized real scenes causing the presence of noise and degradation of various type, which can be minimized with the application of modern approaches for image iper resolution and noise reduction. Oneclass of them is known as Generative Neural Networks, which are very strong ally for the solution of this popular problem.
Resumo:
Artificial Intelligence (AI) has substantially influenced numerous disciplines in recent years. Biology, chemistry, and bioinformatics are among them, with significant advances in protein structure prediction, paratope prediction, protein-protein interactions (PPIs), and antibody-antigen interactions. Understanding PPIs is critical since they are responsible for practically everything living and have several uses in vaccines, cancer, immunology, and inflammatory illnesses. Machine Learning (ML) offers enormous potential for effectively simulating antibody-antigen interactions and improving in-silico optimization of therapeutic antibodies for desired features, including binding activity, stability, and low immunogenicity. This research looks at the use of AI algorithms to better understand antibody-antigen interactions, and it further expands and explains several difficulties encountered in the field. Furthermore, we contribute by presenting a method that outperforms existing state-of-the-art strategies in paratope prediction from sequence data.
Resumo:
Miniaturized flying robotic platforms, called nano-drones, have the potential to revolutionize the autonomous robots industry sector thanks to their very small form factor. The nano-drones’ limited payload only allows for a sub-100mW microcontroller unit for the on-board computations. Therefore, traditional computer vision and control algorithms are too computationally expensive to be executed on board these palm-sized robots, and we are forced to rely on artificial intelligence to trade off accuracy in favor of lightweight pipelines for autonomous tasks. However, relying on deep learning exposes us to the problem of generalization since the deployment scenario of a convolutional neural network (CNN) is often composed by different visual cues and different features from those learned during training, leading to poor inference performances. Our objective is to develop and deploy and adaptation algorithm, based on the concept of latent replays, that would allow us to fine-tune a CNN to work in new and diverse deployment scenarios. To do so we start from an existing model for visual human pose estimation, called PULPFrontnet, which is used to identify the pose of a human subject in space through its 4 output variables, and we present the design of our novel adaptation algorithm, which features automatic data gathering and labeling and on-device deployment. We therefore showcase the ability of our algorithm to adapt PULP-Frontnet to new deployment scenarios, improving the R2 scores of the four network outputs, with respect to an unknown environment, from approximately [−0.2, 0.4, 0.0,−0.7] to [0.25, 0.45, 0.2, 0.1]. Finally we demonstrate how it is possible to fine-tune our neural network in real time (i.e., under 76 seconds), using the target parallel ultra-low power GAP 8 System-on-Chip on board the nano-drone, and we show how all adaptation operations can take place using less than 2mWh of energy, a small fraction of the available battery power.
Resumo:
Technological advancement has undergone exponential growth in recent years, and this has brought significant improvements in the computational capabilities of computers, which can now perform an enormous amount of calculations per second. Taking advantage of these improvements has made it possible to devise algorithms that are very demanding in terms of the computational resources needed to develop architectures capable of solving the most complex problems: currently the most powerful of these are neural networks and in this thesis I will combine these tecniques with classical computer vision algorithms to improve the speed and accuracy of maintenance in photovoltaic facilities.
Resumo:
The present study aimed at providing conditions for the assessment of color discrimination in children using a modified version of the Cambridge Colour Test (CCT, Cambridge Research Systems Ltd., Rochester, UK). Since the task of indicating the gap of the Landolt C used in that test proved counterintuitive and/or difficult for young children to understand, we changed the target Stimulus to a patch of color approximately the size of the Landolt C gap (about 7 degrees Of Visual angle at 50 cm from the monitor). The modifications were performed for the CCT Trivector test which measures color discrimination for the protan, deutan and tritan confusion lines. Experiment I Sought to evaluate the correspondence between the CCT and the child-friendly adaptation with adult subjects (n = 29) with normal color vision. Results showed good agreement between the two test versions. Experiment 2 tested the child-friendly software with children 2 to 7 years old (n = 25) using operant training techniques for establishing and maintaining the subjects` performance. Color discrimination thresholds were progressively lower as age increased within the age range tested (2 to 30 years old), and the data-including those obtained for children-fell within the range of thresholds previously obtained for adults with the CCT. The protan and deutan thresholds were consistently lower than tritan thresholds, a pattern repeatedly observed in adults tested with the CCT. The results demonstrate that the test is fit for assessment of color discrimination in young children and may be a useful tool for the establishment of color vision thresholds during development.
Resumo:
Dissertation presented at the Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa to obtain the Master degree in Electrical and Computer Engineering.
Resumo:
This paper presents an automatic vision-based system for UUV station keeping. The vehicle is equipped with a down-looking camera, which provides images of the sea-floor. The station keeping system is based on a feature-based motion detection algorithm, which exploits standard correlation and explicit textural analysis to solve the correspondence problem. A visual map of the area surveyed by the vehicle is constructed to increase the flexibility of the system, allowing the vehicle to position itself when it has lost the reference image. The testing platform is the URIS underwater vehicle. Experimental results demonstrating the behavior of the system on a real environment are presented
Resumo:
Report for the scientific sojourn carried out at the School of Computing of the University of Dundee, United Kingdom, from 2010 to 2012. This document is a scientific report of the work done, main results, publications and accomplishment of the objectives of the 2-year post-doctoral research project with reference number BP-A 00239. The project has addressed the topic of older people (60+) and Information and Communication Technologies (ICT), which is a topic of growing social and research interest, from a Human-Computer Interaction perspective. Over a 2-year period (June 2010-June 2012), we have conducted classical ethnography of ICT use in a computer clubhouse in Scotland, addressing interaction barriers and strategies, social sharing practices in Social Network Sites, and ICT learning, and carried out rapid ethnographical studies related to geo-enabled ICT and e-government services towards supporting independent living and active ageing. The main results have provided a much deeper understanding of (i) the everyday use of Computer-Mediated Communication tools, such as video-chats and blogs, and its evolution as older people’s experience with ICT increases over time, (ii) cross-cultural aspects of ICT use in the north and south of Europe, (iii) the relevance of cognition over vision in interacting with geographical information and a wide range of ICT tools, despite common stereotypes (e.g. make things bigger), (iv) the important relationship offline-online to provide older people with socially inclusive and meaningful eservices for independent living and active ageing, (v) how older people carry out social sharing practices in the popular YouTube, (vi) their user experiences and (vii) the challenges they face in ICT learning and the strategies they use to become successful ICT learners over time. The research conducted in this project has been published in 17 papers, 4 in journals – two of which in JCR, 5 in conferences, 4 in workshops and 4 in magazines. Other public output consists of 10 invited talks and seminars.
Resumo:
In this paper the effectiveness of a novel method of computer assisted pedicle screw insertion was studied using testing of hypothesis procedure with a sample size of 48. Pattern recognition based on geometric features of markers on the drill has been performed on real time optical video obtained from orthogonally placed CCD cameras. The study reveals the exactness of the calculated position of the drill using navigation based on CT image of the vertebra and real time optical video of the drill. The significance value is 0.424 at 95% confidence level which indicates good precision with a standard mean error of only 0.00724. The virtual vision method is less hazardous to both patient and the surgeon
Resumo:
This paper presents an automatic vision-based system for UUV station keeping. The vehicle is equipped with a down-looking camera, which provides images of the sea-floor. The station keeping system is based on a feature-based motion detection algorithm, which exploits standard correlation and explicit textural analysis to solve the correspondence problem. A visual map of the area surveyed by the vehicle is constructed to increase the flexibility of the system, allowing the vehicle to position itself when it has lost the reference image. The testing platform is the URIS underwater vehicle. Experimental results demonstrating the behavior of the system on a real environment are presented