A study on tackling visual odometry by a transformer architecture


Author(s): Wen, Xiaowei
Contributor(s)

Di Stefano, Luigi

De Luigi, Luca

Date(s)

06/10/2022

Abstract

This dissertation presents an in-depth study of the Visual Odometry (VO) problem tackled with transformer architectures. Existing VO algorithms rely heavily on hand-crafted features and do not generalize well to new environments. Training them requires careful fine-tuning of the hyper-parameters and the network architecture. We propose to tackle the VO problem with a transformer because it is a general-purpose architecture and because it was designed to transform sequences of data from one domain to another, which is the case in the VO problem. Our first goal is to create a synthetic dataset using the BlenderProc2 framework to mitigate the problem of dataset scarcity. The second goal is to tackle the VO problem with different versions of the transformer architecture, which are pre-trained on the synthetic dataset and fine-tuned on a real dataset, KITTI. Our approach is defined as follows: a feature extractor extracts feature embeddings from a sequence of images; this sequence of embeddings is then fed to the transformer architecture; finally, an MLP predicts the sequence of camera poses.
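The three-stage approach described in the abstract (feature extractor, transformer, MLP head) can be illustrated with a minimal, untrained sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the function names, the single-head attention layer standing in for "the transformer architecture", the linear projection standing in for the feature extractor, the random weights, and the 6-value pose vector per frame.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def random_matrix(rows, cols, rng):
    """Small random weights standing in for trained parameters (assumption)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def extract_features(frames, w_embed):
    """Hypothetical feature extractor: one linear projection per flattened frame."""
    return matmul(frames, w_embed)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the frame sequence."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    weights = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    attended = matmul(weights, v)
    # Residual connection, as in a standard transformer block.
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, attended)]


def mlp_head(x, w1, w2):
    """Two-layer MLP with ReLU mapping each embedding to a pose vector."""
    hidden = [[max(0.0, v) for v in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)


def init_params(pixels, dim, pose_dim, seed=0):
    """Create random weights for every stage (illustrative, not trained)."""
    rng = random.Random(seed)
    return {
        "embed": random_matrix(pixels, dim, rng),
        "q": random_matrix(dim, dim, rng),
        "k": random_matrix(dim, dim, rng),
        "v": random_matrix(dim, dim, rng),
        "mlp1": random_matrix(dim, 2 * dim, rng),
        "mlp2": random_matrix(2 * dim, pose_dim, rng),
    }


def predict_poses(frames, params):
    """Images -> embeddings -> attention over the sequence -> one pose per frame."""
    emb = extract_features(frames, params["embed"])
    ctx = self_attention(emb, params["q"], params["k"], params["v"])
    return mlp_head(ctx, params["mlp1"], params["mlp2"])


# Toy input: a sequence of 4 "images", each flattened to 16 pixel values.
rng = random.Random(1)
frames = [[rng.random() for _ in range(16)] for _ in range(4)]
params = init_params(pixels=16, dim=8, pose_dim=6)
poses = predict_poses(frames, params)
print(len(poses), len(poses[0]))  # prints: 4 6
```

The sketch only shows how the sequence length is preserved from input frames to output poses; a real model would use a learned image backbone, positional encodings, and several multi-head transformer layers.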

Format

application/pdf

Identifier

http://amslaurea.unibo.it/26920/1/tesi.pdf

Wen, Xiaowei (2022) A study on tackling visual odometry by a transformer architecture. [Master's thesis (Laurea magistrale)], Università di Bologna, Degree Programme in Artificial intelligence [LM-DM270] <http://amslaurea.unibo.it/view/cds/CDS9063/>, restricted-access document.

Language(s)

en

Publisher

Alma Mater Studiorum - Università di Bologna

Relation

http://amslaurea.unibo.it/26920/

Rights

Free to read

Keywords #Visual Odometry,Transformer,Deep learning #Artificial intelligence [LM-DM270]
Type

PeerReviewed

info:eu-repo/semantics/masterThesis