The neural networks customized and tested in this thesis (WaldoNet, FlowNet and PatchNet) are a first exploration of the Template Matching task. There are therefore many possible extensions, and some are proposed below. During my thesis, I analyzed how the classical algorithms work and adapted them using deep learning. The features extracted from both the template and the query images resemble the keypoints of the SIFT algorithm. Then, instead of a similarity function or keypoint matching, WaldoNet and PatchNet use a convolutional layer to compare the features, while FlowNet uses a correlation layer. In addition, I identified the major challenges of the Template Matching task (affine and non-affine transformations, intensity changes...) and addressed them through careful design of the dataset.
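For context, the classical similarity-function approach that the networks replace can be sketched with zero-mean normalized cross-correlation (NCC). This is only an illustrative baseline, not the method of WaldoNet, FlowNet or PatchNet; the function name `ncc_match` and the list-of-lists image format are assumptions made for this example.

```python
import math

def _zero_mean(values):
    """Subtract the mean so the score ignores a constant brightness offset."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def ncc_match(query, template):
    """Slide `template` over `query` (2D lists of numbers) and return
    (row, col, score) of the best zero-mean NCC match. Hypothetical helper."""
    th, tw = len(template), len(template[0])
    qh, qw = len(query), len(query[0])
    t = _zero_mean([v for row in template for v in row])
    t_norm = math.sqrt(sum(v * v for v in t))
    best = (0, 0, float("-inf"))
    for r in range(qh - th + 1):
        for c in range(qw - tw + 1):
            p = _zero_mean([query[r + i][c + j]
                            for i in range(th) for j in range(tw)])
            denom = t_norm * math.sqrt(sum(v * v for v in p))
            if denom == 0.0:
                continue  # flat patch or flat template: correlation undefined
            score = sum(a * b for a, b in zip(p, t)) / denom
            if score > best[2]:
                best = (r, c, score)
    return best
```

Because the score is normalized, a template that differs from the image patch by a gain and an offset in intensity still matches with a score close to 1. Geometric transformations are not handled by this baseline.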
Artificial Intelligence is reshaping the fashion industry in different ways. E-commerce retailers exploit their data through AI to enhance their search engines, make outfit suggestions and forecast the success of a specific fashion product. However, this is a challenging endeavour, as the data they possess is huge, complex and multi-modal. The most common way to search for fashion products online is by matching keywords with phrases in the product descriptions, which are often cluttered and inadequate, and which differ across collections and sellers. A customer may also browse an online store's taxonomy, although this is time-consuming and does not guarantee relevant items. With the advent of Deep Learning architectures, particularly Vision-Language models, ad-hoc solutions have been proposed that model both the product image and its description to solve these problems. However, the suggested solutions do not effectively exploit the semantic or syntactic information of these modalities, nor the unique qualities and relations of clothing items. This thesis proposes a novel approach to address these issues: it models and processes images and text descriptions as graphs in order to exploit the relations within and between the modalities, and it employs specific techniques to extract syntactic and semantic information. The results obtained show promising performance on different tasks when compared with current state-of-the-art deep learning architectures.
In recent years, machine learning has gained growing popularity in scientific research and its applications. The aim of this thesis was to study machine learning in its general aspects and to apply it to computer vision problems. The thesis tackled the challenge of explaining, from a theoretical standpoint, the algorithms underlying convolutional neural networks, and then addressed two concrete image recognition problems: the MNIST dataset (images of handwritten digits) and a dataset referred to here as the "MELANOMA dataset" (images of melanomas and healthy nevi). Using the techniques explained in the theoretical section, satisfactory results were obtained for both datasets, with an accuracy of 98% on MNIST and 76.8% on the MELANOMA dataset.
Neural scene representation and neural rendering are new computer vision techniques that enable the reconstruction and implicit representation of real 3D scenes from a set of 2D captured images, by fitting a deep neural network. The trained network can then be used to render novel views of the scene. A recent work in this field, Neural Radiance Fields (NeRF), presented a state-of-the-art approach, which uses a simple Multilayer Perceptron (MLP) to generate photo-realistic RGB images of a scene from arbitrary viewpoints. However, NeRF does not model any light interaction with the fitted scene; therefore, despite producing compelling results for the view synthesis task, it does not provide a solution for relighting. In this work, we propose a new architecture to enable relighting capabilities in NeRF-based representations and we introduce a new real-world dataset to train and evaluate such a model. Our method demonstrates the ability to perform realistic rendering of novel views under arbitrary lighting conditions.
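As background for the view synthesis step, the standard NeRF compositing rule turns the densities and colors predicted along a camera ray into one pixel color. The sketch below shows only that generic rule, not the relighting architecture proposed in this work; the function name `composite_ray` and the plain-list inputs are assumptions for illustration.

```python
import math

def composite_ray(sigmas, deltas, colors):
    """Alpha-composite samples along one ray, front to back.

    sigmas: volume densities at each sample
    deltas: distances between consecutive samples
    colors: (r, g, b) tuples predicted at each sample
    Returns the rendered RGB and the per-sample weights.
    """
    transmittance = 1.0  # fraction of light that has not been absorbed yet
    rgb = [0.0, 0.0, 0.0]
    weights = []
    for sigma, delta, color in zip(sigmas, deltas, colors):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        w = transmittance * alpha
        weights.append(w)
        rgb = [acc + w * ch for acc, ch in zip(rgb, color)]
        transmittance *= 1.0 - alpha
    return rgb, weights
```

In this formulation the sample color depends only on position and viewing direction, so lighting is baked into the representation; this is the limitation the abstract refers to.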
Distance learners are self-directed learners traditionally taught via study books, collections of readings, and exercises to test understanding of learning packages. Despite advances in e-Learning environments and computer-based teaching interfaces, distance learners still lack opportunities to participate in the exercises and debates available to classroom learners, particularly through non-text-based learning techniques. Effective distance teaching requires flexible learning opportunities. Using arguments developed in the interpretation literature, we argue that effective distance learning must also be Entertaining, Relevant, Organised, Thematic, Involving and Creative, or E.R.O.T.I.C. (after Ham, 1992). We discuss an experiment undertaken with distance learners at The University of Queensland Gatton Campus, where we initiated an E.R.O.T.I.C. external teaching package aimed at engaging distance learners by using multimedia, including but not limited to text-based learning tools. Student responses to non-text media were positive.
Lifelong learning (LLL) has received increasing attention in recent years. It implies that learning should take place at all stages of the “life cycle and it should be life-wide, that is embedded in all life contexts from the school to the work place, the home and the community” (Green, 2002, p. 613). The ‘learning society’ is the vision of a society in which there are recognized opportunities for learning for every person, wherever they are and however old they happen to be. Globalization and the rise of new information technologies are among the driving forces behind the depreciation of specialised competences, whose economic value erodes very quickly; consequently, workers of all skill levels must have the opportunity, throughout their working lives, to update “their technical skills and enhance general skills to keep pace with continuous technological change and new job requirements” (Fahr, 2005, p. 75). It is in this context that LLL tops the policy agenda of international bodies, national governments and non-governmental organizations in the field of education and training, which invoke contemporary employability challenges to justify the need for LLL opportunities. The same context explains the growing need for, and interest in, analysing the behaviour patterns of adult learners over the last few years.
Exploring the underwater environment using computer vision is still a complex process. Vision systems based on stereo vision are generally used; however, this approach has limitations: it is not very accurate and is computationally demanding when operating underwater. These limitations arise mainly in two application scenarios: when lighting is scarce and in operations close to underwater infrastructure. Consequently, the solution lies in using sources of sensory information that are alternative or complementary to the computer vision system. This work proposes the development of an underwater perception system that combines a camera and a single-beam line laser projector, where the structured light projector is used as a source of information. In any computer vision system, and even more so in triangulation-based systems, correct calibration plays a pivotal role in the quality of the measurements obtained. The calibration of the laser vision system was divided into two stages. The first stage concerns the calibration of the camera, in which the intrinsic and extrinsic parameters of this sensor are determined. The second stage defines the relationship between the camera and the laser, and is required to obtain three-dimensional images. Thus, one of the main challenges of this dissertation was to solve the calibration problem inherent to this system. To that end, a tool was developed that requires at least two photos of a chessboard pattern taken from different perspectives. The proposed method was characterized and validated in dry and underwater environments. The results show that the system is accurate and that the depth values obtained have a significantly low error (below 1 mm), even with a reduced baseline (the distance between the optical centre of the camera and the laser's plane of incidence).
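The role of the two calibration stages can be illustrated with a minimal triangulation sketch: the camera intrinsics turn a pixel on the laser stripe into a viewing ray, and the camera-laser relationship, expressed here as a plane in the camera frame, fixes the depth along that ray. This assumes an ideal pinhole camera with no lens distortion and no refraction at the housing, which the actual underwater system would have to account for; the function name and parameters are hypothetical and not taken from the dissertation.

```python
def triangulate_pixel(u, v, fx, fy, cx, cy, plane_normal, plane_offset):
    """Intersect the viewing ray of pixel (u, v) with the laser plane.

    fx, fy, cx, cy: pinhole intrinsics (assumed known from camera calibration)
    plane_normal, plane_offset: laser plane n . X = d in the camera frame
        (assumed known from camera-laser calibration)
    Returns the 3D point (x, y, z) in the camera frame, or None if the ray
    is parallel to the plane or the intersection lies behind the camera.
    """
    # Direction of the ray through the pixel, normalized so that z = 1.
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)
    denom = sum(n * r for n, r in zip(plane_normal, ray))
    if abs(denom) < 1e-12:
        return None  # ray never meets the laser plane
    t = plane_offset / denom
    if t <= 0.0:
        return None  # intersection is behind the camera
    return tuple(t * r for r in ray)
```

With a short baseline the ray and the plane meet at a shallow angle, so small pixel errors translate into larger depth errors; this is why the calibration quality matters so much for this kind of sensor.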
Currently, the teaching-learning process in domains such as computer programming is characterized by extensive curricula and high student enrolment. This places a heavy workload on the faculty and teaching assistants responsible for creating, delivering and assessing student exercises. The main goal of this chapter is to foster practice-based learning in complex domains. This objective is attained with an e-learning framework, called Ensemble, that serves as a conceptual tool to organize and facilitate technical interoperability among services. The Ensemble framework is applied to a specific domain: computer programming. Content issues are tackled with a standard format that describes programming exercises as learning objects. Communication is achieved by extending existing specifications for interoperation with several systems typically found in an e-learning environment. To evaluate the acceptability of the proposed solution, an Ensemble instance was validated in a classroom experiment, with encouraging results.
In recent years, affordable access to tools for producing, editing and distributing audiovisual content has contributed to an exponential increase in the daily production of this type of content. In this paradigm of multimedia overabundance, a large percentage of video sequences contain explicit material, and stricter control is needed so that it is not easily accessible to minors. The concept of explicit content can be characterized in different ways; the work described in this document focused on the automatic detection of female nudity in video sequences. This process of automatically detecting and classifying adult material can be an important tool in managing a television channel. Hundreds of hours of material may be received every day, which makes a manual quality control process impractical. The solution created in the context of this dissertation was studied and developed around a specific product in the broadcasting field: the mxfSPEEDRAIL F1000, a solution from the company MOG Technologies. The main goal of the project is to develop a C++ library, accessible during the ingest process, that uses computer vision analysis to detect the frames that potentially contain explicit content and flag them in the signal's metadata. The developed solution uses a set of state-of-the-art techniques adapted to the problem at hand, including algorithms for skin segmentation and object detection in images. Finally, a critical analysis of the solution developed in this dissertation is carried out, so that future developments can improve its resource consumption during analysis and its success rate.
Massive Open Online Courses (MOOCs) are gaining prominence in transversal teaching-learning strategies. However, many issues are still debated, notably assessment, which is widely recognized as a cornerstone of education. The large number of students involved requires a redefinition of strategies, which often rely on tasks or challenging projects. Under these conditions, assessment is carried out through peer-reviewed assignments and online quizzes. The peer-reviewed assignments are often based on sample answers or topics, which guide students in the task of evaluating their peers. This chapter analyzes grading and evaluation in MOOCs, especially in science and engineering courses, within the context of education and grading methodologies, and discusses possible ways to pursue grading quality in massive e-learning courses.
OER-based learning has the potential to overcome many shortcomings and problems of traditional education. It is not hampered by IP restrictions; it can depend on collaborative, cumulative, iterative refinement of resources; and its digital form provides unprecedented flexibility with respect to configuration and delivery. The OER community is a progressive group of educators and learners with decades of learning research to draw from, who know that we must prepare learners for an evolving and diverse reality. Despite this, OER tends to replicate the unsuccessful characteristics of traditional education. To remedy this, we may need to remember the importance of imperfection, mistakes, problems, disagreement and the incomplete for engaged learning, and relinquish our notions of perfection, acknowledging that learners learn differently and that we need diverse learners. We must stretch our perceptions of quality and provide mechanisms for engaging the incredible pool of educators globally to fulfill the promise of inclusive education.
Catadioptric sensors combine mirrors and lenses in order to obtain a wide field of view. In this paper we propose a new sensor that has omnidirectional viewing ability and also provides depth information about its nearby surroundings. The sensor is based on a conventional camera coupled with a laser emitter and two hyperbolic mirrors. The mathematical formulation and precise specifications of the intrinsic and extrinsic parameters of the sensor are discussed. Our approach overcomes limitations of existing omnidirectional sensors and ultimately leads to reduced production costs.
This paper focuses on the problem of realizing a plane-to-plane virtual link between a camera attached to the end-effector of a robot and a planar object. To make the system independent of the appearance of the object surface, a structured light emitter is linked to the camera so that four laser pointers are projected onto the object. In a previous paper we showed that such a system performs well and has desirable characteristics such as partial decoupling near the desired state and robustness against misalignment of the emitter and the camera (J. Pages et al., 2004). However, no analytical results concerning the global asymptotic stability of the system were obtained, owing to the high complexity of the visual features used. In this work we present a better set of visual features, which improves on the properties of the features in (J. Pages et al., 2004) and for which global asymptotic stability can be proved.
In this paper we address the problem of positioning a camera attached to the end-effector of a robotic manipulator so that it becomes parallel to a planar object. This problem has long been studied in visual servoing. Our approach attaches several laser pointers to the camera, configured so as to produce a suitable set of visual features. Structured light is used not only to ease image processing and to allow low-textured objects to be handled, but also to obtain a control scheme with desirable properties such as decoupling, stability, good conditioning and a good camera trajectory.