842 results for Image Based Visual Servoing
Abstract:
Sparse-representation-based visual tracking approaches have attracted increasing interest in the community in recent years. The main idea is to linearly represent each target candidate using a set of target and trivial templates while imposing a sparsity constraint on the representation coefficients. After the coefficients are obtained using L1-norm minimization methods, the candidate with the lowest error, when reconstructed using only the target templates and the associated coefficients, is taken as the tracking result. Despite the widely reported promising performance, it is unclear whether the performance of these trackers can be maximised. In addition, the computational complexity caused by the dimensionality of the feature space limits these algorithms in real-time applications. In this paper, we propose a real-time visual tracking method based on structurally random projection and weighted least squares techniques. In particular, to enhance the discriminative capability of the tracker, we introduce background templates into the linear representation framework. To handle appearance variations over time, we relax the sparsity constraint using a weighted least squares (WLS) method to obtain the representation coefficients. To further reduce the computational complexity, structurally random projection is used to reduce the dimensionality of the feature space while preserving the pairwise distances between the data points in the feature space. Experimental results show that the proposed approach outperforms several state-of-the-art tracking methods.
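The distance-preserving property mentioned in this abstract can be illustrated with a small sketch. It uses a plain Gaussian random projection, which is a swapped-in stand-in and not the structurally random projection the authors describe; the dimensions, seed, and function names are assumptions chosen for illustration.

```python
import math
import random

def random_projection_matrix(k, d, rng):
    """Return a k x d matrix with i.i.d. N(0, 1/k) entries (Gaussian projection)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]

def project(matrix, x):
    """Map a d-dimensional vector x to k dimensions."""
    return [sum(r * v for r, v in zip(row, x)) for row in matrix]

def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

rng = random.Random(0)          # fixed seed so the run is repeatable
d, k = 512, 256                 # assumed original and reduced dimensions
x = [rng.random() for _ in range(d)]   # two hypothetical feature vectors
y = [rng.random() for _ in range(d)]
R = random_projection_matrix(k, d, rng)

# Ratio of projected distance to original distance; close to 1 when k is large.
ratio = distance(project(R, x), project(R, y)) / distance(x, y)
print(f"distance ratio after projection: {ratio:.3f}")
```

With this scaling the expected squared distance is unchanged by the projection, so the ratio concentrates around 1 as the reduced dimension grows.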
Abstract:
In computational neuroscience, it has been hypothesized that the visual system, from the retina up to at least the primary visual cortex, continually fits a probabilistic model with latent variables to its stream of percepts. Neither the exact model nor the exact fitting method is known, but existing algorithms for fitting such models require conditional estimates of the latent variables. This may help us understand why the visual system might fit such a model: if the model is appropriate, these conditional estimates can also form an excellent representation, one that makes it possible to analyze the semantic content of the perceived images. The work presented here uses image classification performance (discrimination between common object types) as a basis for comparing models of the visual system, and algorithms for fitting those models (viewed as probability densities) to images. This thesis (a) shows that models based on the complex cells of visual area V1 generalize better from labeled training examples than conventional neural networks, whose hidden units are more similar to V1 simple cells; (b) presents a new interpretation of complex-cell-based models of the visual system as probability distributions, along with new algorithms for fitting them to data; and (c) shows that these models form representations that are better for image classification after having been trained as probability models.
Two additional technical innovations that made this work possible are also described: a random search algorithm for selecting hyperparameters, and a compiler for matrix mathematical expressions, which can optimize these expressions for both the central processor (CPU) and the graphics processor (GPU).
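Random search over hyperparameters, as mentioned at the end of this abstract, can be sketched in a few lines. This is a generic illustration and not the thesis's algorithm: the search space, the toy objective, and all names below are assumptions.

```python
import math
import random

def random_search(objective, space, n_trials, seed=0):
    """Sample n_trials configurations independently and keep the lowest-scoring one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, math.inf
    for _ in range(n_trials):
        # Each hyperparameter has its own sampler that draws from its range.
        cfg = {name: sample(rng) for name, sample in space.items()}
        score = objective(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

def toy_objective(cfg):
    """Hypothetical validation loss, minimal at learning_rate=1e-2, n_hidden=64."""
    lr_term = (math.log10(cfg["learning_rate"]) + 2.0) ** 2
    size_term = ((cfg["n_hidden"] - 64) / 64.0) ** 2
    return lr_term + size_term

# Assumed search space: log-uniform learning rate, categorical layer width.
space = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-5, 0),
    "n_hidden": lambda rng: rng.choice([16, 32, 64, 128, 256]),
}

best_cfg, best_score = random_search(toy_objective, space, n_trials=200)
print(best_cfg, best_score)
```

In practice the objective would train a model and return its validation error; each trial is independent, so trials can run in parallel.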
Abstract:
The objective of this research is to create an online platform for examining individual differences in visual information processing strategies across different face categorization tasks. The purpose of such a platform is to collect data from participants who are geographically dispersed and whose face recognition abilities vary. Indeed, many studies have shown that there is great variability across the spectrum of face recognition abilities, ranging from developmental prosopagnosia (Susilo & Duchaine, 2013), a face recognition disorder in the absence of brain lesion, to super-recognizers, individuals whose face recognition abilities are above average (Russell, Duchaine & Nakayama, 2009). Between these two extremes, face recognition abilities in the normal population vary. To demonstrate the feasibility of creating such a platform for individuals with widely varying abilities, we adapted a celebrity face identity recognition task using the Bubbles method (Gosselin & Schyns, 2001) and recruited 14 control subjects and one subject with developmental prosopagnosia. We were able to highlight the importance of the eyes and mouth in face identification among the "normal" subjects. The best participants, by contrast, seem to rely mostly on the left side of the face (the left eye and the left side of the mouth).
Abstract:
Introduction: Several characteristics can affect visual prognosis after surgical repair of retinal detachment. Some characteristics that cannot be observed by the human eye alone, but can be seen with optical coherence tomography, are related to visual recovery. Objective: To describe the clinical and topographic characteristics in the pre- and postoperative periods of eyes that have suffered rhegmatogenous retinal detachment (RD) with macular involvement, and their relationship to the quality of visual recovery after surgery considered anatomically successful. Materials and methods: Descriptive study comparing selected characteristics at three perioperative time points, one before and two after surgery (3 and 6 months), in 24 eyes with rhegmatogenous RD and macular involvement treated with retinopexy combined with pars plana vitrectomy. Results: Visual recovery better than or equal to logMAR 0.397 (20/50) occurred in 41.7% of eyes, and 16.7% reached a visual acuity of logMAR 0.301 (20/40). Five eyes did not achieve a gain of more than five lines of vision. Absence of submacular fluid was observed in most eyes that recovered more than five lines, as was a preserved ellipsoid. Neuroepithelial regularity and edema in the postoperative period showed no clear pattern with respect to visual recovery, nor did the height of the detachment or the number of quadrants affected. Better visual recovery was more frequent in eyes with less than five weeks of retinal detachment. Conclusions: A delay of less than five weeks in resolving the retinal detachment, preservation of the ellipsoid, and absence of submacular fluid in the postoperative period were observed more frequently in eyes with better visual recovery.
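The Snellen equivalents quoted in the results can be checked from the logMAR values. The sketch below assumes the standard relation in which the Snellen denominator equals the numerator multiplied by 10 raised to the logMAR value; the function name is illustrative and does not come from the study.

```python
def logmar_to_snellen_denominator(logmar, numerator=20):
    """Convert a logMAR acuity to the denominator of a Snellen fraction."""
    return numerator * 10 ** logmar

# Values reported in the abstract.
for logmar in (0.397, 0.301):
    denominator = logmar_to_snellen_denominator(logmar)
    print(f"logMAR {logmar} is about 20/{round(denominator)}")
```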
Abstract:
A novel framework for multimodal semantic-associative collateral image labelling, aiming at associating image regions with textual keywords, is described. Both the primary image and collateral textual modalities are exploited in a cooperative and complementary fashion. The collateral content and context-based knowledge is used to bias the mapping from the low-level region-based visual primitives to the high-level visual concepts defined in a visual vocabulary. We introduce the notion of collateral context, which is represented as a co-occurrence matrix of the visual keywords. A collaborative mapping scheme is devised using statistical methods such as the Gaussian distribution or Euclidean distance, together with a collateral content and context-driven inference mechanism. Finally, we use Self Organising Maps to examine the classification and retrieval effectiveness of the proposed high-level image feature vector model, which is constructed from the image labelling results.
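A co-occurrence matrix of visual keywords, as described in this abstract, can be sketched as pair counts over labelled images. The abstract does not say how the matrix is built, so counting joint occurrences per image is an assumption, and the keywords and function name below are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(labelled_images):
    """Count how often each unordered keyword pair appears in the same image."""
    counts = Counter()
    for keywords in labelled_images:
        # Sort so each pair is stored once, as (smaller, larger).
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical keyword sets, one per image.
images = [
    {"sky", "sea", "sand"},
    {"sky", "sea"},
    {"sky", "building"},
]

counts = cooccurrence(images)
print(counts[("sea", "sky")])       # pairs are keyed in alphabetical order
```

Dividing each count by the number of images containing one of the two keywords would turn the counts into conditional frequencies, which is one way such a matrix could bias a labelling decision.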
Abstract:
A large volume of visual content is inaccessible until effective and efficient indexing and retrieval of such data is achieved. In this paper, we introduce the DREAM system, a knowledge-assisted, semantic-driven, context-aware visual information retrieval system applied in the film post-production domain. We mainly focus on the automatic labelling and topic-map-related aspects of the framework. The use of context-related collateral knowledge, represented by a novel probability-based visual keyword co-occurrence matrix, proved effective in the experiments conducted during system evaluation. The automatically generated semantic labels were fed into the Topic Map Engine, which automatically constructs ontological networks using Topic Maps technology and thereby enhances the indexing and retrieval performance of the system towards a higher semantic level.
Abstract:
In this work, signal processing techniques are used to improve the quality of images obtained with multi-element synthetic aperture techniques. Using several apodization functions to obtain different side-lobe distributions, a polarity function and a threshold criterion are used to develop an image compounding technique. The spatial diversity is increased using an additional array, which generates complementary information about the defects, improving the results of the proposed algorithm and producing images with high resolution and contrast. The inspection of isotropic plate-like structures using linear arrays and Lamb waves is presented. Experimental results are shown for a 1-mm-thick isotropic aluminum plate with artificial defects using linear arrays formed by 30 piezoelectric elements, with the low-dispersion symmetric mode S0 at a frequency of 330 kHz. © 2011 American Institute of Physics.
Abstract:
This work presents the results of the recognition and mapping of the coastal environments of the Golfão Maranhense region, Brazil, using a methodological approach that included: (a) integrated analysis based on digital processing of Landsat-4 TM and SPOT-2 HRV optical images, RADARSAT-1 SAR (Synthetic Aperture Radar) images, and SRTM (Shuttle Radar Topography Mission) elevation data; (b) a geographic information system; and (c) field surveys of geomorphology, topography, and sedimentology. The coastal environments thus mapped were grouped into four sectors: Sector 1, with salt marshes, freshwater marshes, intermittent lakes, and an estuarine channel; Sector 2, comprising coastal tableland, muddy tidal flat, fluvial plain, sandy tidal flat, macrotidal beaches, built-up area, and artificial lakes; Sector 3, with mangrove, paleodunes, and mixed tidal flat; and Sector 4, consisting of mobile dunes. In addition, perennial lakes, ebb-tidal deltas, and sandy supratidal flats were also recognized. Digital processing and visual analysis of orbital remote sensing images, combined with the use of geographic information systems, proved effective for mapping tropical coastal zones, enabling the generation of products with good cartographic accuracy and precision.
Abstract:
Work carried out by: Garijo, J. C., Hernández León, S.
Abstract:
The research aims to develop a framework for the semantic-based digital survey of architectural heritage. Rooted in knowledge-based modeling, which extracts mathematical constraints on geometry from architectural treatises, the framework integrates as-built information obtained from image-based modeling with the ideal model in a BIM platform. Knowledge-based modeling transforms the geometry and parametric relations of architectural components from 2D prints into 3D digital models and, thanks to parametric modeling, creates a large number of variations based on shape grammar in real time. It also provides prior knowledge for semantically segmenting unorganized survey data. The emergence of SfM (Structure from Motion) makes it possible to reconstruct large, complex architectural scenes with high flexibility, low cost, and full automation, but with low reliability in metric accuracy. We address this problem by combining photogrammetric approaches, including camera configuration, image enhancement, and bundle adjustment. Experiments show that the accuracy of image-based modeling following our workflow is comparable to that of range-based modeling. We also demonstrate positive results of our optimized approach in the digital reconstruction of a portico, where a low-texture vault and dramatic transitions in illumination cause great difficulty for the workflow without optimization. Once the as-built model is obtained, it is integrated with the ideal model in a BIM platform, which allows multiple kinds of data enrichment. Despite its promising prospects in the AEC industry, BIM has been developed with limited consideration of reverse engineering from survey data. Besides representing the architectural heritage in parallel ways (ideal model and as-built model) and comparing their differences, we are concerned with how to create the as-built model in BIM software, which remains an open problem.
The research is intended to serve as a foundation for research in architectural history, the documentation and conservation of architectural heritage, and the renovation of existing buildings.
Abstract:
Accurate placement of lesions is crucial for the effectiveness and safety of a retinal laser photocoagulation treatment. Computer assistance provides the capability for improvements to treatment accuracy and execution time. The idea is to use video frames acquired from a scanning digital ophthalmoscope (SDO) to compensate for retinal motion during laser treatment. This paper presents a method for the multimodal registration of the initial frame from an SDO retinal video sequence to a retinal composite image, which may contain a treatment plan. The retinal registration procedure comprises the following steps: 1) detection of vessel centerline points and identification of the optic disc; 2) prealignment of the video frame and the composite image based on optic disc parameters; and 3) iterative matching of the detected vessel centerline points in expanding matching regions. This registration algorithm was designed for the initialization of a real-time registration procedure that registers the subsequent video frames to the composite image. The algorithm demonstrated its capability to register various pairs of SDO video frames and composite images acquired from patients.
Abstract:
Image-based modeling of tumor growth combines methods from cancer simulation and medical imaging. In this context, we present a novel approach to adapt a healthy brain atlas to MR images of tumor patients. In order to establish correspondence between a healthy atlas and a pathologic patient image, tumor growth modeling in combination with registration algorithms is employed. In a first step, the tumor is grown in the atlas based on a new multi-scale, multi-physics model including growth simulation from the cellular level up to the biomechanical level, accounting for cell proliferation and tissue deformations. Large-scale deformations are handled with an Eulerian approach for finite element computations, which can operate directly on the image voxel mesh. Subsequently, dense correspondence between the modified atlas and patient image is established using nonrigid registration. The method offers opportunities in atlas-based segmentation of tumor-bearing brain images as well as for improved patient-specific simulation and prognosis of tumor progression.
Abstract:
Percutaneous needle intervention based on PET/CT images is effective, but exposes the patient to unnecessary radiation due to the increased number of CT scans required. Computer assisted intervention can reduce the number of scans, but requires handling, matching and visualization of two different datasets. While one dataset is used for target definition according to metabolism, the other is used for instrument guidance according to anatomical structures. No navigation systems capable of handling such data and performing PET/CT image-based procedures while following clinically approved protocols for oncologic percutaneous interventions are available. The need for such systems is emphasized in scenarios where the target can be located in different types of tissue such as bone and soft tissue. These two tissues require different clinical protocols for puncturing and may therefore give rise to different problems during the navigated intervention. Studies comparing the performance of navigated needle interventions targeting lesions located in these two types of tissue are not often found in the literature. Hence, this paper presents an optical navigation system for percutaneous needle interventions based on PET/CT images. The system provides viewers for guiding the physician to the target with real-time visualization of PET/CT datasets, and is able to handle targets located in both bone and soft tissue. The navigation system and the required clinical workflow were designed taking into consideration clinical protocols and requirements, and the system is thus operable by a single person, even during transition to the sterile phase. Both the system and the workflow were evaluated in an initial set of experiments simulating 41 lesions (23 located in bone tissue and 18 in soft tissue) in swine cadavers. 
We also measured and decomposed the overall system error into distinct error sources, which allowed for the identification of particularities involved in the process as well as highlighting the differences between bone and soft tissue punctures. An overall average error of 4.23 mm and 3.07 mm for bone and soft tissue punctures, respectively, demonstrated the feasibility of using this system for such interventions. The proposed system workflow was shown to be effective in separating the preparation from the sterile phase, as well as in keeping the system manageable by a single operator. Among the distinct sources of error, the user error based on the system accuracy (defined as the distance from the planned target to the actual needle tip) appeared to be the most significant. Bone punctures showed higher user error, whereas soft tissue punctures showed higher tissue deformation error.
Abstract:
We present an image-based method for relighting a scene by analytically fitting cosine lobes to the reflectance function at each pixel, based on gradient illumination photographs. Realistic relighting results for many materials are obtained using a single per-pixel cosine lobe obtained from just two color photographs: one under uniform white illumination and the other under colored gradient illumination. For materials with wavelength-dependent scattering, a better fit can be obtained using independent cosine lobes for the red, green, and blue channels, obtained from three achromatic gradient illumination conditions instead of the colored gradient condition. We explore two cosine lobe reflectance functions, both of which allow an analytic fit to the gradient conditions. One is non-zero over half the sphere of lighting directions, which works well for diffuse and specular materials, but fails for materials with broader scattering such as fur. The other is non-zero everywhere, which works well for broadly scattering materials and still produces visually plausible results for diffuse and specular materials. We also perform an approximate diffuse/specular separation of the reflectance, and estimate scene geometry from the recovered photometric normals to produce hard shadows cast by the geometry, while still reconstructing the input photographs exactly.
Abstract:
We describe a user-assisted technique for 3D stereo conversion from 2D images. Our approach exploits the geometric structure of perspective images, including vanishing points. We allow a user to indicate lines, planes, and vanishing points in the input image, and directly employ these as constraints in an image warping framework to produce a stereo pair. By sidestepping explicit construction of a depth map, our approach is applicable to more general scenes and avoids potential artifacts of depth-image-based rendering. Our method is most suitable for scenes with large-scale structures such as buildings.