8 resultados para Vision-based

em AMS Tesi di Laurea - Alm@DL - Università di Bologna


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Lo scopo di questo studio è l’implementazione di un sistema di navigazione autonomo in grado di calcolare la traiettoria di un mezzo aereo, noti che siano a priori dei punti di posizione detti waypoint. A partire da questa traiettoria, è possibile ottenere la sua rappresentazione in un codice che mette a disposizione immagini satellitari e ricavare le viste del terreno sorvolato in una serie di punti calcolati, in modo da garantire in ogni sequenza la presenza di elementi comuni rispetto a quella precedente. Lo scopo della realizzazione di questa banca dati è rendere possibili futuri sviluppi di algoritmi di navigazione basati su deep learning e reti neurali. Le immagini virtuali ottenute del terreno saranno in futuro applicate alla navigazione autonoma per agricoltura di precisione mediante droni. Per lo studio condotto è stato simulato un generico velivolo, con o senza pilota, dotato di una videocamera fissata su una sospensione cardanica a tre assi (gimbal). La tesi, dunque, introduce ai più comuni metodi di determinazione della posizione dei velivoli e alle più recenti soluzioni basate su algoritmi di Deep Learning e sistemi vision-based con reti neurali e segue in un approfondimento sul metodo di conversione degli angoli e sulla teoria matematica che ne sta alla base. Successivamente, analizza nel dettaglio il processo di simulazione della navigazione autonoma e della determinazione della traiettoria in ambiente software Matlab e Simulink, procedendo nell’analisi di alcuni casi di studio in ambienti realistici. L’elaborato si conclude con un breve riepilogo di quanto svolto e con alcune considerazioni sugli sviluppi futuri.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Vision systems are powerful tools playing an increasingly important role in modern industry, to detect errors and maintain product standards. With the enlarged availability of affordable industrial cameras, computer vision algorithms have been increasingly applied in industrial manufacturing processes monitoring. Until a few years ago, industrial computer vision applications relied only on ad-hoc algorithms designed for the specific object and acquisition setup being monitored, with a strong focus on co-designing the acquisition and processing pipeline. Deep learning has overcome these limits providing greater flexibility and faster re-configuration. In this work, the process to be inspected consists in vials’ pack formation entering a freeze-dryer, which is a common scenario in pharmaceutical active ingredient packaging lines. To ensure that the machine produces proper packs, a vision system is installed at the entrance of the freeze-dryer to detect eventual anomalies with execution times compatible with the production specifications. Other constraints come from sterility and safety standards required in pharmaceutical manufacturing. This work presents an overview about the production line, with particular focus on the vision system designed, and about all trials conducted to obtain the final performance. Transfer learning, alleviating the requirement for a large number of training data, combined with data augmentation methods, consisting in the generation of synthetic images, were used to effectively increase the performances while reducing the cost of data acquisition and annotation. The proposed vision algorithm is composed by two main subtasks, designed respectively to vials counting and discrepancy detection. The first one was trained on more than 23k vials (about 300 images) and tested on 5k more (about 75 images), whereas 60 training images and 52 testing images were used for the second one.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Artificial Intelligence is reshaping the field of fashion industry in different ways. E-commerce retailers exploit their data through AI to enhance their search engines, make outfit suggestions and forecast the success of a specific fashion product. However, it is a challenging endeavour as the data they possess is huge, complex and multi-modal. The most common way to search for fashion products online is by matching keywords with phrases in the product's description which are often cluttered, inadequate and differ across collections and sellers. A customer may also browse an online store's taxonomy, although this is time-consuming and doesn't guarantee relevant items. With the advent of Deep Learning architectures, particularly Vision-Language models, ad-hoc solutions have been proposed to model both the product image and description to solve this problems. However, the suggested solutions do not exploit effectively the semantic or syntactic information of these modalities, and the unique qualities and relations of clothing items. In this work of thesis, a novel approach is proposed to address this issues, which aims to model and process images and text descriptions as graphs in order to exploit the relations inside and between each modality and employs specific techniques to extract syntactic and semantic information. The results obtained show promising performances on different tasks when compared to the present state-of-the-art deep learning architectures.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In recent years, Deep Learning techniques have shown to perform well on a large variety of problems both in Computer Vision and Natural Language Processing, reaching and often surpassing the state of the art on many tasks. The rise of deep learning is also revolutionizing the entire field of Machine Learning and Pattern Recognition pushing forward the concepts of automatic feature extraction and unsupervised learning in general. However, despite the strong success both in science and business, deep learning has its own limitations. It is often questioned if such techniques are only some kind of brute-force statistical approaches and if they can only work in the context of High Performance Computing with tons of data. Another important question is whether they are really biologically inspired, as claimed in certain cases, and if they can scale well in terms of "intelligence". The dissertation is focused on trying to answer these key questions in the context of Computer Vision and, in particular, Object Recognition, a task that has been heavily revolutionized by recent advances in the field. Practically speaking, these answers are based on an exhaustive comparison between two, very different, deep learning techniques on the aforementioned task: Convolutional Neural Network (CNN) and Hierarchical Temporal memory (HTM). They stand for two different approaches and points of view within the big hat of deep learning and are the best choices to understand and point out strengths and weaknesses of each of them. CNN is considered one of the most classic and powerful supervised methods used today in machine learning and pattern recognition, especially in object recognition. CNNs are well received and accepted by the scientific community and are already deployed in large corporation like Google and Facebook for solving face recognition and image auto-tagging problems. HTM, on the other hand, is known as a new emerging paradigm and a new meanly-unsupervised method, that is more biologically inspired. It tries to gain more insights from the computational neuroscience community in order to incorporate concepts like time, context and attention during the learning process which are typical of the human brain. In the end, the thesis is supposed to prove that in certain cases, with a lower quantity of data, HTM can outperform CNN.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Questa tesi si occupa dell’estensione di un framework software finalizzato all'individuazione e al tracciamento di persone in una scena ripresa da telecamera stereoscopica. In primo luogo è rimossa la necessità di una calibrazione manuale offline del sistema sfruttando algoritmi che consentono di individuare, a partire da un fotogramma acquisito dalla camera, il piano su cui i soggetti tracciati si muovono. Inoltre, è introdotto un modulo software basato su deep learning con lo scopo di migliorare la precisione del tracciamento. Questo componente, che è in grado di individuare le teste presenti in un fotogramma, consente ridurre i dati analizzati al solo intorno della posizione effettiva di una persona, escludendo oggetti che l’algoritmo di tracciamento sarebbe portato a individuare come persone.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The job of a historian is to understand what happened in the past, resorting in many cases to written documents as a firsthand source of information. Text, however, does not amount to the only source of knowledge. Pictorial representations, in fact, have also accompanied the main events of the historical timeline. In particular, the opportunity of visually representing circumstances has bloomed since the invention of photography, with the possibility of capturing in real-time the occurrence of a specific events. Thanks to the widespread use of digital technologies (e.g. smartphones and digital cameras), networking capabilities and consequent availability of multimedia content, the academic and industrial research communities have developed artificial intelligence (AI) paradigms with the aim of inferring, transferring and creating new layers of information from images, videos, etc. Now, while AI communities are devoting much of their attention to analyze digital images, from an historical research standpoint more interesting results may be obtained analyzing analog images representing the pre-digital era. Within the aforementioned scenario, the aim of this work is to analyze a collection of analog documentary photographs, building upon state-of-the-art deep learning techniques. In particular, the analysis carried out in this thesis aims at producing two following results: (a) produce the date of an image, and, (b) recognizing its background socio-cultural context,as defined by a group of historical-sociological researchers. Given these premises, the contribution of this work amounts to: (i) the introduction of an historical dataset including images of “Family Album” among all the twentieth century, (ii) the introduction of a new classification task regarding the identification of the socio-cultural context of an image, (iii) the exploitation of different deep learning architectures to perform the image dating and the image socio-cultural context classification.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A differenza di quanto avviene nel commercio tradizionale, in quello online il cliente non ha la possibilità di toccare con mano o provare il prodotto. La decisione di acquisto viene maturata in base ai dati messi a disposizione dal venditore attraverso titolo, descrizioni, immagini e alle recensioni di clienti precedenti. É quindi possibile prevedere quanto un prodotto venderà sulla base di queste informazioni. La maggior parte delle soluzioni attualmente presenti in letteratura effettua previsioni basandosi sulle recensioni, oppure analizzando il linguaggio usato nelle descrizioni per capire come questo influenzi le vendite. Le recensioni, tuttavia, non sono informazioni note ai venditori prima della commercializzazione del prodotto; usando solo dati testuali, inoltre, si tralascia l’influenza delle immagini. L'obiettivo di questa tesi è usare modelli di machine learning per prevedere il successo di vendita di un prodotto a partire dalle informazioni disponibili al venditore prima della commercializzazione. Si fa questo introducendo un modello cross-modale basato su Vision-Language Transformer in grado di effettuare classificazione. Un modello di questo tipo può aiutare i venditori a massimizzare il successo di vendita dei prodotti. A causa della mancanza, in letteratura, di dataset contenenti informazioni relative a prodotti venduti online che includono l’indicazione del successo di vendita, il lavoro svolto comprende la realizzazione di un dataset adatto a testare la soluzione sviluppata. Il dataset contiene un elenco di 78300 prodotti di Moda venduti su Amazon, per ognuno dei quali vengono riportate le principali informazioni messe a disposizione dal venditore e una misura di successo sul mercato. Questa viene ricavata a partire dal gradimento espresso dagli acquirenti e dal posizionamento del prodotto in una graduatoria basata sul numero di esemplari venduti.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Unmanned Aerial Vehicle (UAVs) equipped with cameras have been fast deployed to a wide range of applications, such as smart cities, agriculture or search and rescue applications. Even though UAV datasets exist, the amount of open and quality UAV datasets is limited. So far, we want to overcome this lack of high quality annotation data by developing a simulation framework for a parametric generation of synthetic data. The framework accepts input via a serializable format. The input specifies which environment preset is used, the objects to be placed in the environment along with their position and orientation as well as additional information such as object color and size. The result is an environment that is able to produce UAV typical data: RGB image from the UAVs camera, altitude, roll, pitch and yawn of the UAV. Beyond the image generation process, we improve the resulting image data photorealism by using Synthetic-To-Real transfer learning methods. Transfer learning focuses on storing knowledge gained while solving one problem and applying it to a different - although related - problem. This approach has been widely researched in other affine fields and results demonstrate it to be an interesing area to investigate. Since simulated images are easy to create and synthetic-to-real translation has shown good quality results, we are able to generate pseudo-realistic images. Furthermore, object labels are inherently given, so we are capable of extending the already existing UAV datasets with realistic quality images and high resolution meta-data. During the development of this thesis we have been able to produce a result of 68.4% on UAVid. This can be considered a new state-of-art result on this dataset.