907 resultados para Visual Odometry,Transformer,Deep learning
Resumo:
Recent advances in the field of statistical learning have established that learners are able to track regularities of multimodal stimuli, yet it is unknown whether the statistical computations are performed on integrated representations or on separate, unimodal representations. In the present study, we investigated the ability of adults to integrate audio and visual input during statistical learning. We presented learners with a speech stream synchronized with a video of a speaker's face. In the critical condition, the visual (e.g., /gi/) and auditory (e.g., /mi/) signals were occasionally incongruent, which we predicted would produce the McGurk illusion, resulting in the perception of an audiovisual syllable (e.g., /ni/). In this way, we used the McGurk illusion to manipulate the underlying statistical structure of the speech streams, such that perception of these illusory syllables facilitated participants' ability to segment the speech stream. Our results therefore demonstrate that participants can integrate audio and visual input to perceive the McGurk illusion during statistical learning. We interpret our findings as support for modality-interactive accounts of statistical learning.
Resumo:
Perceptual learning can occur when stimuli are only imagined, i.e., without proper stimulus presentation. For example, perceptual learning improved bisection discrimination when only the two outer lines of the bisection stimulus were presented and the central line had to be imagined. Performance improved also with other static stimuli. In non-learning imagery experiments, imagining static stimuli is different from imagining motion stimuli. We hypothesized that those differences also affect imagery perceptual learning. Here, we show that imagery training also improves motion direction discrimination. Learning occurs when no stimulus at all is presented during training, whereas no learning occurs when only noise is presented. The interference between noise and mental imagery possibly hinders learning. For static bisection stimuli, the pattern is just the opposite. Learning occurs when presented with the two outer lines of the bisection stimulus, i.e., with only a part of the visual stimulus, while no learning occurs when no stimulus at all is presented.
Lung Pattern Classification for Interstitial Lung Diseases Using a Deep Convolutional Neural Network
Resumo:
Automated tissue characterization is one of the most crucial components of a computer aided diagnosis (CAD) system for interstitial lung diseases (ILDs). Although much research has been conducted in this field, the problem remains challenging. Deep learning techniques have recently achieved impressive results in a variety of computer vision problems, raising expectations that they might be applied in other domains, such as medical image analysis. In this paper, we propose and evaluate a convolutional neural network (CNN), designed for the classification of ILD patterns. The proposed network consists of 5 convolutional layers with 2×2 kernels and LeakyReLU activations, followed by average pooling with size equal to the size of the final feature maps and three dense layers. The last dense layer has 7 outputs, equivalent to the classes considered: healthy, ground glass opacity (GGO), micronodules, consolidation, reticulation, honeycombing and a combination of GGO/reticulation. To train and evaluate the CNN, we used a dataset of 14696 image patches, derived by 120 CT scans from different scanners and hospitals. To the best of our knowledge, this is the first deep CNN designed for the specific problem. A comparative analysis proved the effectiveness of the proposed CNN against previous methods in a challenging dataset. The classification performance (~85.5%) demonstrated the potential of CNNs in analyzing lung patterns. Future work includes, extending the CNN to three-dimensional data provided by CT volume scans and integrating the proposed method into a CAD system that aims to provide differential diagnosis for ILDs as a supportive tool for radiologists.
Resumo:
El principal objetivo de este trabajo es proporcionar una solución en tiempo real basada en visión estéreo o monocular precisa y robusta para que un vehículo aéreo no tripulado (UAV) sea autónomo en varios tipos de aplicaciones UAV, especialmente en entornos abarrotados sin señal GPS. Este trabajo principalmente consiste en tres temas de investigación de UAV basados en técnicas de visión por computador: (I) visual tracking, proporciona soluciones efectivas para localizar visualmente objetos de interés estáticos o en movimiento durante el tiempo que dura el vuelo del UAV mediante una aproximación adaptativa online y una estrategia de múltiple resolución, de este modo superamos los problemas generados por las diferentes situaciones desafiantes, tales como cambios significativos de aspecto, iluminación del entorno variante, fondo del tracking embarullado, oclusión parcial o total de objetos, variaciones rápidas de posición y vibraciones mecánicas a bordo. La solución ha sido utilizada en aterrizajes autónomos, inspección de plataformas mar adentro o tracking de aviones en pleno vuelo para su detección y evasión; (II) odometría visual: proporciona una solución eficiente al UAV para estimar la posición con 6 grados de libertad (6D) usando únicamente la entrada de una cámara estéreo a bordo del UAV. Un método Semi-Global Blocking Matching (SGBM) eficiente basado en una estrategia grueso-a-fino ha sido implementada para una rápida y profunda estimación del plano. Además, la solución toma provecho eficazmente de la información 2D y 3D para estimar la posición 6D, resolviendo de esta manera la limitación de un punto de referencia fijo en la cámara estéreo. Una robusta aproximación volumétrica de mapping basada en el framework Octomap ha sido utilizada para reconstruir entornos cerrados y al aire libre bastante abarrotados en 3D con memoria y errores correlacionados espacialmente o temporalmente; (III) visual control, ofrece soluciones de control prácticas para la navegación de un UAV usando Fuzzy Logic Controller (FLC) con la estimación visual. Y el framework de Cross-Entropy Optimization (CEO) ha sido usado para optimizar el factor de escala y la función de pertenencia en FLC. Todas las soluciones basadas en visión en este trabajo han sido probadas en test reales. Y los conjuntos de datos de imágenes reales grabados en estos test o disponibles para la comunidad pública han sido utilizados para evaluar el rendimiento de estas soluciones basadas en visión con ground truth. Además, las soluciones de visión presentadas han sido comparadas con algoritmos de visión del estado del arte. Los test reales y los resultados de evaluación muestran que las soluciones basadas en visión proporcionadas han obtenido rendimientos en tiempo real precisos y robustos, o han alcanzado un mejor rendimiento que aquellos algoritmos del estado del arte. La estimación basada en visión ha ganado un rol muy importante en controlar un UAV típico para alcanzar autonomía en aplicaciones UAV. ABSTRACT The main objective of this dissertation is providing real-time accurate robust monocular or stereo vision-based solution for Unmanned Aerial Vehicle (UAV) to achieve the autonomy in various types of UAV applications, especially in GPS-denied dynamic cluttered environments. This dissertation mainly consists of three UAV research topics based on computer vision technique: (I) visual tracking, it supplys effective solutions to visually locate interesting static or moving object over time during UAV flight with on-line adaptivity approach and multiple-resolution strategy, thereby overcoming the problems generated by the different challenging situations, such as significant appearance change, variant surrounding illumination, cluttered tracking background, partial or full object occlusion, rapid pose variation and onboard mechanical vibration. The solutions have been utilized in autonomous landing, offshore floating platform inspection and midair aircraft tracking for sense-and-avoid; (II) visual odometry: it provides the efficient solution for UAV to estimate the 6 Degree-of-freedom (6D) pose using only the input of stereo camera onboard UAV. An efficient Semi-Global Blocking Matching (SGBM) method based on a coarse-to-fine strategy has been implemented for fast depth map estimation. In addition, the solution effectively takes advantage of both 2D and 3D information to estimate the 6D pose, thereby solving the limitation of a fixed small baseline in the stereo camera. A robust volumetric occupancy mapping approach based on the Octomap framework has been utilized to reconstruct indoor and outdoor large-scale cluttered environments in 3D with less temporally or spatially correlated measurement errors and memory; (III) visual control, it offers practical control solutions to navigate UAV using Fuzzy Logic Controller (FLC) with the visual estimation. And the Cross-Entropy Optimization (CEO) framework has been used to optimize the scaling factor and the membership function in FLC. All the vision-based solutions in this dissertation have been tested in real tests. And the real image datasets recorded from these tests or available from public community have been utilized to evaluate the performance of these vision-based solutions with ground truth. Additionally, the presented vision solutions have compared with the state-of-art visual algorithms. Real tests and evaluation results show that the provided vision-based solutions have obtained real-time accurate robust performances, or gained better performance than those state-of-art visual algorithms. The vision-based estimation has played a critically important role for controlling a typical UAV to achieve autonomy in the UAV application.
Resumo:
In this paper we introduce a probabilistic approach to support visual supervision and gesture recognition. Task knowledge is both of geometric and visual nature and it is encoded in parametric eigenspaces. Learning processes for compute modal subspaces (eigenspaces) are the core of tracking and recognition of gestures and tasks. We describe the overall architecture of the system and detail learning processes and gesture design. Finally we show experimental results of tracking and recognition in block-world like assembling tasks and in general human gestures.
Resumo:
La recherche d'informations s'intéresse, entre autres, à répondre à des questions comme: est-ce qu'un document est pertinent à une requête ? Est-ce que deux requêtes ou deux documents sont similaires ? Comment la similarité entre deux requêtes ou documents peut être utilisée pour améliorer l'estimation de la pertinence ? Pour donner réponse à ces questions, il est nécessaire d'associer chaque document et requête à des représentations interprétables par ordinateur. Une fois ces représentations estimées, la similarité peut correspondre, par exemple, à une distance ou une divergence qui opère dans l'espace de représentation. On admet généralement que la qualité d'une représentation a un impact direct sur l'erreur d'estimation par rapport à la vraie pertinence, jugée par un humain. Estimer de bonnes représentations des documents et des requêtes a longtemps été un problème central de la recherche d'informations. Le but de cette thèse est de proposer des nouvelles méthodes pour estimer les représentations des documents et des requêtes, la relation de pertinence entre eux et ainsi modestement avancer l'état de l'art du domaine. Nous présentons quatre articles publiés dans des conférences internationales et un article publié dans un forum d'évaluation. Les deux premiers articles concernent des méthodes qui créent l'espace de représentation selon une connaissance à priori sur les caractéristiques qui sont importantes pour la tâche à accomplir. Ceux-ci nous amènent à présenter un nouveau modèle de recherche d'informations qui diffère des modèles existants sur le plan théorique et de l'efficacité expérimentale. Les deux derniers articles marquent un changement fondamental dans l'approche de construction des représentations. Ils bénéficient notamment de l'intérêt de recherche dont les techniques d'apprentissage profond par réseaux de neurones, ou deep learning, ont fait récemment l'objet. Ces modèles d'apprentissage élicitent automatiquement les caractéristiques importantes pour la tâche demandée à partir d'une quantité importante de données. Nous nous intéressons à la modélisation des relations sémantiques entre documents et requêtes ainsi qu'entre deux ou plusieurs requêtes. Ces derniers articles marquent les premières applications de l'apprentissage de représentations par réseaux de neurones à la recherche d'informations. Les modèles proposés ont aussi produit une performance améliorée sur des collections de test standard. Nos travaux nous mènent à la conclusion générale suivante: la performance en recherche d'informations pourrait drastiquement être améliorée en se basant sur les approches d'apprentissage de représentations.
Resumo:
La recherche d'informations s'intéresse, entre autres, à répondre à des questions comme: est-ce qu'un document est pertinent à une requête ? Est-ce que deux requêtes ou deux documents sont similaires ? Comment la similarité entre deux requêtes ou documents peut être utilisée pour améliorer l'estimation de la pertinence ? Pour donner réponse à ces questions, il est nécessaire d'associer chaque document et requête à des représentations interprétables par ordinateur. Une fois ces représentations estimées, la similarité peut correspondre, par exemple, à une distance ou une divergence qui opère dans l'espace de représentation. On admet généralement que la qualité d'une représentation a un impact direct sur l'erreur d'estimation par rapport à la vraie pertinence, jugée par un humain. Estimer de bonnes représentations des documents et des requêtes a longtemps été un problème central de la recherche d'informations. Le but de cette thèse est de proposer des nouvelles méthodes pour estimer les représentations des documents et des requêtes, la relation de pertinence entre eux et ainsi modestement avancer l'état de l'art du domaine. Nous présentons quatre articles publiés dans des conférences internationales et un article publié dans un forum d'évaluation. Les deux premiers articles concernent des méthodes qui créent l'espace de représentation selon une connaissance à priori sur les caractéristiques qui sont importantes pour la tâche à accomplir. Ceux-ci nous amènent à présenter un nouveau modèle de recherche d'informations qui diffère des modèles existants sur le plan théorique et de l'efficacité expérimentale. Les deux derniers articles marquent un changement fondamental dans l'approche de construction des représentations. Ils bénéficient notamment de l'intérêt de recherche dont les techniques d'apprentissage profond par réseaux de neurones, ou deep learning, ont fait récemment l'objet. Ces modèles d'apprentissage élicitent automatiquement les caractéristiques importantes pour la tâche demandée à partir d'une quantité importante de données. Nous nous intéressons à la modélisation des relations sémantiques entre documents et requêtes ainsi qu'entre deux ou plusieurs requêtes. Ces derniers articles marquent les premières applications de l'apprentissage de représentations par réseaux de neurones à la recherche d'informations. Les modèles proposés ont aussi produit une performance améliorée sur des collections de test standard. Nos travaux nous mènent à la conclusion générale suivante: la performance en recherche d'informations pourrait drastiquement être améliorée en se basant sur les approches d'apprentissage de représentations.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-06
Resumo:
Introduction - Group learning has been used to enhance deep (long-term) learning and promote life skills, such as decision making, communication, and interpersonal skills. However, with increasing multiculturalism in higher education, there is little information available as to the acceptance of this form of learning by Asian students or as to its value to them. Methodology - Group-learning projects, incorporating a seminar presentation, were used in first-year veterinary anatomical science classes over two consecutive years (2003 and 2004) at the School of Veterinary Science, University of Queensland. Responses of Australian and Asian students to survey forms evaluating the learning experience were analyzed and compared. Results - All students responded positively to the group learning, indicating that it was a useful learning experience and a great method for meeting colleagues. There were no significant differences between Asian and Australian students in overall responses to the survey evaluating the learning experience, except where Asian students responded significantly higher than Australian students in identifying specific skills that needed improving. Conclusions - Group learning can be successfully used in multicultural teaching to enhance deep learning. This form of learning helps to remove cultural barriers and establish a platform for continued successful group learning throughout the program.
Resumo:
A critical review of previous research revealed that visual attention tests, such as the Useful Field of View (UFOV) test, provided the best means of detecting age-related changes to the visual system that could potentially increase crash risk. However, the question was raised as to whether the UFOV, which was regarded as a static visual attention test, could be improved by inclusion of kinetic targets that more closely represent the driving task. A computer program was written to provide more information about the derivation of UFOV test scores. Although this investigation succeeded in providing new information, some of the commercially protected UFOV test procedures still remain unknown. Two kinetic visual attention tests (DRTS1 and 2), developed at Aston University to investigate inclusion of kinetic targets in visual attention tests, were introduced. The UFOV was found to be more repeatable than either of the kinetic visual attention tests and learning effects or age did not influence these findings. Determinants of static and kinetic visual attention were explored. Increasing target eccentricity led to reduced performance on the UFOV and DRTS1 tests. The DRTS2 was not affected by eccentricity but this may have been due to the style of presentation of its targets. This might also have explained why only the DRTS2 showed laterality effects (i.e. better performance to targets presented on the left hand side of the road). Radial location, explored using the UFOV test, showed that subjects responded best to targets positioned to the horizontal meridian. Distraction had opposite effects on static and kinetic visual attention. While UFOV test performance declined with distraction, DRTS1 performance increased. Previous research had shown that this striking difference was to be expected. Whereas the detection of static targets is attenuated in the presence of distracting stimuli, distracting stimuli that move in a structured flow field enhances the detection of moving targets. Subjects reacted more slowly to kinetic compared to static targets, longitudinal motion compared to angular motion and to increased self-motion. However, the effects of longitudinal motion, angular motion, self-motion and even target eccentricity were caused by target edge speed variations arising because of optic flow field effects. The UFOV test was more able to detect age-related changes to the visual system than were either of the kinetic visual attention tests. The driving samples investigated were too limited to draw firm conclusions. Nevertheless, the results presented showed that neither the DRTS2 nor the UFOV tests were powerful tools for the identification of drivers prone to crashes or poor driving performance.
Resumo:
Background - The literature is not univocal about the effects of Peer Review (PR) within the context of constructivist learning. Due to the predominant focus on using PR as an assessment tool, rather than a constructivist learning activity, and because most studies implicitly assume that the benefits of PR are limited to the reviewee, little is known about the effects upon students who are required to review their peers. Much of the theoretical debate in the literature is focused on explaining how and why constructivist learning is beneficial. At the same time these discussions are marked by an underlying presupposition of a causal relationship between reviewing and deep learning. Objectives - The purpose of the study is to investigate whether the writing of PR feedback causes students to benefit in terms of: perceived utility about statistics, actual use of statistics, better understanding of statistical concepts and associated methods, changed attitudes towards market risks, and outcomes of decisions that were made. Methods - We conducted a randomized experiment, assigning students randomly to receive PR or non–PR treatments and used two cohorts with a different time span. The paper discusses the experimental design and all the software components that we used to support the learning process: Reproducible Computing technology which allows students to reproduce or re–use statistical results from peers, Collaborative PR, and an AI–enhanced Stock Market Engine. Results - The results establish that the writing of PR feedback messages causes students to experience benefits in terms of Behavior, Non–Rote Learning, and Attitudes, provided the sequence of PR activities are maintained for a period that is sufficiently long.
Resumo:
Il riconoscimento delle gesture è un tema di ricerca che sta acquisendo sempre più popolarità, specialmente negli ultimi anni, grazie ai progressi tecnologici dei dispositivi embedded e dei sensori. Lo scopo di questa tesi è quello di utilizzare alcune tecniche di machine learning per realizzare un sistema in grado di riconoscere e classificare in tempo reale i gesti delle mani, a partire dai segnali mioelettrici (EMG) prodotti dai muscoli. Inoltre, per consentire il riconoscimento di movimenti spaziali complessi, verranno elaborati anche segnali di tipo inerziale, provenienti da una Inertial Measurement Unit (IMU) provvista di accelerometro, giroscopio e magnetometro. La prima parte della tesi, oltre ad offrire una panoramica sui dispositivi wearable e sui sensori, si occuperà di analizzare alcune tecniche per la classificazione di sequenze temporali, evidenziandone vantaggi e svantaggi. In particolare, verranno considerati approcci basati su Dynamic Time Warping (DTW), Hidden Markov Models (HMM), e reti neurali ricorrenti (RNN) di tipo Long Short-Term Memory (LSTM), che rappresentano una delle ultime evoluzioni nel campo del deep learning. La seconda parte, invece, riguarderà il progetto vero e proprio. Verrà impiegato il dispositivo wearable Myo di Thalmic Labs come caso di studio, e saranno applicate nel dettaglio le tecniche basate su DTW e HMM per progettare e realizzare un framework in grado di eseguire il riconoscimento real-time di gesture. Il capitolo finale mostrerà i risultati ottenuti (fornendo anche un confronto tra le tecniche analizzate), sia per la classificazione di gesture isolate che per il riconoscimento in tempo reale.
Resumo:
The efficiency of lecturing or large group teaching has been called into question for many years. An abundance of literature details the components of effective teaching which are not provided in the traditional lecture setting, with many alternative methods of teaching recommended. However, with continued constraints on resources large group teaching is here to stay and student’s expect and are familiar with this method.
Technology Enhanced Learning may be the way forward, to prevent educators from “throwing out the baby with the bath water”. TEL could help Educator’s especially in the area of life sciences which is often taught by lectures to engage and involve students in their learning, provide feedback and incorporate the “quality” of small group teaching, case studies and Enquiry Based Learning into the large group setting thus promoting effective and deep learning.
Resumo:
The development of new learning models has been of great importance throughout recent years, with a focus on creating advances in the area of deep learning. Deep learning was first noted in 2006, and has since become a major area of research in a number of disciplines. This paper will delve into the area of deep learning to present its current limitations and provide a new idea for a fully integrated deep and dynamic probabilistic system. The new model will be applicable to a vast number of areas initially focusing on applications into medical image analysis with an overall goal of utilising this approach for prediction purposes in computer based medical systems.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-08