919 resultados para Vision-based navigation
Resumo:
More information is now readily available to computer users than at any time in human history; however, much of this information is often inaccessible to people with blindness or low-vision, for whom information must be presented non-visually. Currently, screen readers are able to verbalize on-screen text using text-to-speech (TTS) synthesis; however, much of this vocalization is inadequate for browsing the Internet. An auditory interface that incorporates auditory-spatial orientation was created and tested. For information that can be structured as a two-dimensional table, links can be semantically grouped as cells in a row within an auditory table, which provides a consistent structure for auditory navigation. An auditory display prototype was tested.^ Sixteen legally blind subjects participated in this research study. Results demonstrated that stereo panning was an effective technique for audio-spatially orienting non-visual navigation in a five-row, six-column HTML table as compared to a centered, stationary synthesized voice. These results were based on measuring the time- to-target (TTT), or the amount of time elapsed from the first prompting to the selection of each tabular link. Preliminary analysis of the TTT values recorded during the experiment showed that the populations did not conform to the ANOVA requirements of normality and equality of variances. Therefore, the data were transformed using the natural logarithm. The repeated-measures two-factor ANOVA results show that the logarithmically-transformed TTTs were significantly affected by the tonal variation method, F(1,15) = 6.194, p= 0.025. Similarly, the results show that the logarithmically transformed TTTs were marginally affected by the stereo spatialization method, F(1,15) = 4.240, p=0.057. The results show that the logarithmically transformed TTTs were not significantly affected by the interaction of both methods, F(1,15) = 1.381, p=0.258. These results suggest that some confusion may be caused in the subject when employing both of these methods simultaneously. The significant effect of tonal variation indicates that the effect is actually increasing the average TTT. In other words, the presence of preceding tones increases task completion time on average. The marginally-significant effect of stereo spatialization decreases the average log(TTT) from 2.405 to 2.264.^
Resumo:
Abstract : Images acquired from unmanned aerial vehicles (UAVs) can provide data with unprecedented spatial and temporal resolution for three-dimensional (3D) modeling. Solutions developed for this purpose are mainly operating based on photogrammetry concepts, namely UAV-Photogrammetry Systems (UAV-PS). Such systems are used in applications where both geospatial and visual information of the environment is required. These applications include, but are not limited to, natural resource management such as precision agriculture, military and police-related services such as traffic-law enforcement, precision engineering such as infrastructure inspection, and health services such as epidemic emergency management. UAV-photogrammetry systems can be differentiated based on their spatial characteristics in terms of accuracy and resolution. That is some applications, such as precision engineering, require high-resolution and high-accuracy information of the environment (e.g. 3D modeling with less than one centimeter accuracy and resolution). In other applications, lower levels of accuracy might be sufficient, (e.g. wildlife management needing few decimeters of resolution). However, even in those applications, the specific characteristics of UAV-PSs should be well considered in the steps of both system development and application in order to yield satisfying results. In this regard, this thesis presents a comprehensive review of the applications of unmanned aerial imagery, where the objective was to determine the challenges that remote-sensing applications of UAV systems currently face. This review also allowed recognizing the specific characteristics and requirements of UAV-PSs, which are mostly ignored or not thoroughly assessed in recent studies. Accordingly, the focus of the first part of this thesis is on exploring the methodological and experimental aspects of implementing a UAV-PS. The developed system was extensively evaluated for precise modeling of an open-pit gravel mine and performing volumetric-change measurements. This application was selected for two main reasons. Firstly, this case study provided a challenging environment for 3D modeling, in terms of scale changes, terrain relief variations as well as structure and texture diversities. Secondly, open-pit-mine monitoring demands high levels of accuracy, which justifies our efforts to improve the developed UAV-PS to its maximum capacities. The hardware of the system consisted of an electric-powered helicopter, a high-resolution digital camera, and an inertial navigation system. The software of the system included the in-house programs specifically designed for camera calibration, platform calibration, system integration, onboard data acquisition, flight planning and ground control point (GCP) detection. The detailed features of the system are discussed in the thesis, and solutions are proposed in order to enhance the system and its photogrammetric outputs. The accuracy of the results was evaluated under various mapping conditions, including direct georeferencing and indirect georeferencing with different numbers, distributions and types of ground control points. Additionally, the effects of imaging configuration and network stability on modeling accuracy were assessed. The second part of this thesis concentrates on improving the techniques of sparse and dense reconstruction. The proposed solutions are alternatives to traditional aerial photogrammetry techniques, properly adapted to specific characteristics of unmanned, low-altitude imagery. Firstly, a method was developed for robust sparse matching and epipolar-geometry estimation. The main achievement of this method was its capacity to handle a very high percentage of outliers (errors among corresponding points) with remarkable computational efficiency (compared to the state-of-the-art techniques). Secondly, a block bundle adjustment (BBA) strategy was proposed based on the integration of intrinsic camera calibration parameters as pseudo-observations to Gauss-Helmert model. The principal advantage of this strategy was controlling the adverse effect of unstable imaging networks and noisy image observations on the accuracy of self-calibration. The sparse implementation of this strategy was also performed, which allowed its application to data sets containing a lot of tie points. Finally, the concepts of intrinsic curves were revisited for dense stereo matching. The proposed technique could achieve a high level of accuracy and efficiency by searching only through a small fraction of the whole disparity search space as well as internally handling occlusions and matching ambiguities. These photogrammetric solutions were extensively tested using synthetic data, close-range images and the images acquired from the gravel-pit mine. Achieving absolute 3D mapping accuracy of 11±7 mm illustrated the success of this system for high-precision modeling of the environment.
Resumo:
Clouds are important in weather prediction, climate studies and aviation safety. Important parameters include cloud height, type and cover percentage. In this paper, the recent improvements in the development of a low-cost cloud height measurement setup are described. It is based on stereo vision with consumer digital cameras. The cameras positioning is calibrated using the position of stars in the night sky. An experimental uncertainty analysis of the calibration parameters is performed. Cloud height measurement results are presented and compared with LIDAR measurements.
Resumo:
Vision systems are powerful tools playing an increasingly important role in modern industry, to detect errors and maintain product standards. With the enlarged availability of affordable industrial cameras, computer vision algorithms have been increasingly applied in industrial manufacturing processes monitoring. Until a few years ago, industrial computer vision applications relied only on ad-hoc algorithms designed for the specific object and acquisition setup being monitored, with a strong focus on co-designing the acquisition and processing pipeline. Deep learning has overcome these limits providing greater flexibility and faster re-configuration. In this work, the process to be inspected consists in vials’ pack formation entering a freeze-dryer, which is a common scenario in pharmaceutical active ingredient packaging lines. To ensure that the machine produces proper packs, a vision system is installed at the entrance of the freeze-dryer to detect eventual anomalies with execution times compatible with the production specifications. Other constraints come from sterility and safety standards required in pharmaceutical manufacturing. This work presents an overview about the production line, with particular focus on the vision system designed, and about all trials conducted to obtain the final performance. Transfer learning, alleviating the requirement for a large number of training data, combined with data augmentation methods, consisting in the generation of synthetic images, were used to effectively increase the performances while reducing the cost of data acquisition and annotation. The proposed vision algorithm is composed by two main subtasks, designed respectively to vials counting and discrepancy detection. The first one was trained on more than 23k vials (about 300 images) and tested on 5k more (about 75 images), whereas 60 training images and 52 testing images were used for the second one.
Resumo:
Artificial Intelligence is reshaping the field of fashion industry in different ways. E-commerce retailers exploit their data through AI to enhance their search engines, make outfit suggestions and forecast the success of a specific fashion product. However, it is a challenging endeavour as the data they possess is huge, complex and multi-modal. The most common way to search for fashion products online is by matching keywords with phrases in the product's description which are often cluttered, inadequate and differ across collections and sellers. A customer may also browse an online store's taxonomy, although this is time-consuming and doesn't guarantee relevant items. With the advent of Deep Learning architectures, particularly Vision-Language models, ad-hoc solutions have been proposed to model both the product image and description to solve this problems. However, the suggested solutions do not exploit effectively the semantic or syntactic information of these modalities, and the unique qualities and relations of clothing items. In this work of thesis, a novel approach is proposed to address this issues, which aims to model and process images and text descriptions as graphs in order to exploit the relations inside and between each modality and employs specific techniques to extract syntactic and semantic information. The results obtained show promising performances on different tasks when compared to the present state-of-the-art deep learning architectures.
Resumo:
To evaluate the use of optical and nonoptical aids during reading and writing activities in individuals with acquired low vision. This study was performed using descriptive and cross-sectional surveys. The data collection instrument was created with structured questions that were developed from an exploratory study and a previous test based on interviews, and it evaluated the following variables: personal characteristics, use of optical and nonoptical aids, and activities that required the use of optical and nonoptical aids. The study population included 30 subjects with acquired low vision and visual acuities of 20/200-20/400. Most subjects reported the use of some optical aids (60.0%). Of these 60.0%, the majority (83.3%) cited spectacles as the most widely used optical aid. The majority (63.3%) of subjects also reported the use of nonoptical aids, the most frequent ones being letter magnification (68.4%), followed by bringing the objects closer to the eyes (57.8%). Subjects often used more than one nonoptical aid. The majority of participants reported the use of optical and nonoptical aids during reading activities, highlighting the use of spectacles, magnifying glasses, and letter magnification; however, even after the use of these aids, we found that the subjects often needed to read the text more than once to understand it. During writing activities, all subjects reported the use of optical aids, while most stated that they did not use nonoptical aids for such activities.
Resumo:
In the last decades, the air traffic system has been changing to adapt itself to new social demands, mainly the safe growth of worldwide traffic capacity. Those changes are ruled by the Communication, Navigation, Surveillance/Air Traffic Management (CNS/ATM) paradigm, based on digital communication technologies (mainly satellites) as a way of improving communication, surveillance, navigation and air traffic management services. However, CNS/ATM poses new challenges and needs, mainly related to the safety assessment process. In face of these new challenges, and considering the main characteristics of the CNS/ATM, a methodology is proposed at this work by combining ""absolute"" and ""relative"" safety assessment methods adopted by the International Civil Aviation Organization (ICAO) in ICAO Doc.9689 [14], using Fluid Stochastic Petri Nets (FSPN) as the modeling formalism, and compares the safety metrics estimated from the simulation of both the proposed (in analysis) and the legacy system models. To demonstrate its usefulness, the proposed methodology was applied to the ""Automatic Dependent Surveillance-Broadcasting"" (ADS-B) based air traffic control system. As conclusions, the proposed methodology assured to assess CNS/ATM system safety properties, in which FSPN formalism provides important modeling capabilities, and discrete event simulation allowing the estimation of the desired safety metric. (C) 2011 Elsevier Ltd. All rights reserved.
Resumo:
Visual pigments, the molecules in photoreceptors that initiate the process of vision, are inherently dichroic, differentially absorbing light according to its axis of polarization. Many animals have taken advantage of this property to build receptor systems capable of analyzing the polarization of incoming light, as polarized light is abundant in natural scenes (commonly being produced by scattering or reflection). Such polarization sensitivity has long been associated with behavioral tasks like orientation or navigation. However, only recently have we become aware that it can be incorporated into a high-level visual perception akin to color vision, permitting segmentation of a viewed scene into regions that differ in their polarization. By analogy to color vision, we call this capacity polarization vision. It is apparently used for tasks like those that color vision specializes in: contrast enhancement, camouflage breaking, object recognition, and signal detection and discrimination. While color is very useful in terrestrial or shallow-water environments, it is an unreliable cue deeper in water due to the spectral modification of light as it travels through water of various depths or of varying optical quality. Here, polarization vision has special utility and consequently has evolved in numerous marine species, as well as at least one terrestrial animal. In this review, we consider recent findings concerning polarization vision and its significance in biological signaling.
Resumo:
Recent studies have demonstrated that spatial patterns of fMRI BOLD activity distribution over the brain may be used to classify different groups or mental states. These studies are based on the application of advanced pattern recognition approaches and multivariate statistical classifiers. Most published articles in this field are focused on improving the accuracy rates and many approaches have been proposed to accomplish this task. Nevertheless, a point inherent to most machine learning methods (and still relatively unexplored in neuroimaging) is how the discriminative information can be used to characterize groups and their differences. In this work, we introduce the Maximum Uncertainty Linear Discrimination Analysis (MLDA) and show how it can be applied to infer groups` patterns by discriminant hyperplane navigation. In addition, we show that it naturally defines a behavioral score, i.e., an index quantifying the distance between the states of a subject from predefined groups. We validate and illustrate this approach using a motor block design fMRI experiment data with 35 subjects. (C) 2008 Elsevier Inc. All rights reserved.
Resumo:
The process of establishing long-range neuronal connections can be divided into at least three discrete steps. First, axons need to be stimulated to grow and this growth must be towards appropriate targets. Second, after arriving at their target, axons need to be directed to their topographically appropriate position and in some cases, such as in cortical structures, they must grow radially to reach the correct laminar layer Third, axons then arborize and form synaptic connections with only a defined subpopulation of potential post-synaptic partners. Attempts to understand these mechanisms in the visual system have been ongoing since pioneer studies in the 1940s highlighted the specificity of neuronal connections in the retino-tectal pathway. These classical systems-based approaches culminated in the 1990s with the discovery that Eph-ephrin repulsive interactions were involved in topographical mapping. In marked contrast, it was the cloning of the odorant receptor family that quickly led to a better understanding of axon targeting in the olfactory system. The last 10 years have seen the olfactory pathway rise in prominence as a model system for axon guidance. Once considered to be experimentally intractable, it is now providing a wealth of information on all aspects of axon guidance and targeting with implications not only for our understanding of these mechanisms in the olfactory system but also in other regions of the nervous system.
Resumo:
Nowadays despite improvements in usability and intuitiveness users have to adapt to the proposed systems to satisfy their needs. For instance, they must learn how to achieve tasks, how to interact with the system, and fulfill system's specifications. This paper proposes an approach to improve this situation enabling graphical user interface redefinition through virtualization and computer vision with the aim of increasing the system's usability. To achieve this goal the approach is based on enriched task models, virtualization and picture-driven computing.
Resumo:
Background: Precise needle puncture of renal calyces is a challenging and essential step for successful percutaneous nephrolithotomy. This work tests and evaluates, through a clinical trial, a real-time navigation system to plan and guide percutaneous kidney puncture. Methods: A novel system, entitled i3DPuncture, was developed to aid surgeons in establishing the desired puncture site and the best virtual puncture trajectory, by gathering and processing data from a tracked needle with optical passive markers. In order to navigate and superimpose the needle to a preoperative volume, the patient, 3D image data and tracker system were previously registered intraoperatively using seven points that were strategically chosen based on rigid bone structures and nearby kidney area. In addition, relevant anatomical structures for surgical navigation were automatically segmented using a multi-organ segmentation algorithm that clusters volumes based on statistical properties and minimum description length criterion. For each cluster, a rendering transfer function enhanced the visualization of different organs and surrounding tissues. Results: One puncture attempt was sufficient to achieve a successful kidney puncture. The puncture took 265 seconds, and 32 seconds were necessary to plan the puncture trajectory. The virtual puncture path was followed correctively until the needle tip reached the desired kidney calyceal. Conclusions: This new solution provided spatial information regarding the needle inside the body and the possibility to visualize surrounding organs. It may offer a promising and innovative solution for percutaneous punctures.
Resumo:
Introduction and Objectives. Laparoscopic surgery has undeniable advantages, such as reduced postoperative pain, smaller incisions, and faster recovery. However, to improve surgeons’ performance, ergonomic adaptations of the laparoscopic instruments and introduction of robotic technology are needed. The aim of this study was to ascertain the influence of a new hand-held robotic device for laparoscopy (HHRDL) and 3D vision on laparoscopic skills performance of 2 different groups, naïve and expert. Materials and Methods. Each participant performed 3 laparoscopic tasks—Peg transfer, Wire chaser, Knot—in 4 different ways. With random sequencing we assigned the execution order of the tasks based on the first type of visualization and laparoscopic instrument. Time to complete each laparoscopic task was recorded and analyzed with one-way analysis of variance. Results. Eleven experts and 15 naïve participants were included. Three-dimensional video helps the naïve group to get better performance in Peg transfer, Wire chaser 2 hands, and Knot; the new device improved the execution of all laparoscopic tasks (P < .05). For expert group, the 3D video system benefited them in Peg transfer and Wire chaser 1 hand, and the robotic device in Peg transfer, Wire chaser 1 hand, and Wire chaser 2 hands (P < .05). Conclusion. The HHRDL helps the execution of difficult laparoscopic tasks, such as Knot, in the naïve group. Three-dimensional vision makes the laparoscopic performance of the participants without laparoscopic experience easier, unlike those with experience in laparoscopic procedures.
Resumo:
In this work, we present a neural network (NN) based method designed for 3D rigid-body registration of FMRI time series, which relies on a limited number of Fourier coefficients of the images to be aligned. These coefficients, which are comprised in a small cubic neighborhood located at the first octant of a 3D Fourier space (including the DC component), are then fed into six NN during the learning stage. Each NN yields the estimates of a registration parameter. The proposed method was assessed for 3D rigid-body transformations, using DC neighborhoods of different sizes. The mean absolute registration errors are of approximately 0.030 mm in translations and 0.030 deg in rotations, for the typical motion amplitudes encountered in FMRI studies. The construction of the training set and the learning stage are fast requiring, respectively, 90 s and 1 to 12 s, depending on the number of input and hidden units of the NN. We believe that NN-based approaches to the problem of FMRI registration can be of great interest in the future. For instance, NN relying on limited K-space data (possibly in navigation echoes) can be a valid solution to the problem of prospective (in frame) FMRI registration.
Resumo:
Mestrado em Engenharia Electrotécnica e de Computadores