Biblioteca Digital

992 resultados para Query images

Rotation-discriminating template matching based on Fourier coefficients of radial projections with robustness to scaling and partial occlusion

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We consider brightness/contrast-invariant and rotation-discriminating template matching that searches an image to analyze A for a query image Q We propose to use the complex coefficients of the discrete Fourier transform of the radial projections to compute new rotation-invariant local features. These coefficients can be efficiently obtained via FFT. We classify templates in ""stable"" and ""unstable"" ones and argue that any local feature-based template matching may fail to find unstable templates. We extract several stable sub-templates of Q and find them in A by comparing the features. The matchings of the sub-templates are combined using the Hough transform. As the features of A are computed only once, the algorithm can find quickly many different sub-templates in A, and it is Suitable for finding many query images in A, multi-scale searching and partial occlusion-robust template matching. (C) 2009 Elsevier Ltd. All rights reserved.

Content Based Image Retrieval System for Malayalam Handwritten Characters

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Content Based Image Retrieval is one of the prominent areas in Computer Vision and Image Processing. Recognition of handwritten characters has been a popular area of research for many years and still remains an open problem. The proposed system uses visual image queries for retrieving similar images from database of Malayalam handwritten characters. Local Binary Pattern (LBP) descriptors of the query images are extracted and those features are compared with the features of the images in database for retrieving desired characters. This system with local binary pattern gives excellent retrieval performance

Fusion of fingerprint recognition methods for robust human identification

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Biometrics is one of the biggest tendencies in human identification. The fingerprint is the most widely used biometric. However considering the automatic fingerprint recognition a completely solved problem is a common mistake. The most popular and extensively used methods, the minutiae-based, do not perform well on poor-quality images and when just a small area of overlap between the template and the query images exists. The use of multibiometrics is considered one of the keys to overcome the weakness and improve the accuracy of biometrics systems. This paper presents the fusion of a minutiae-based and a ridge-based fingerprint recognition method at rank, decision and score level. The fusion techniques implemented leaded to a reduction of the Equal Error Rate by 31.78% (from 4.09% to 2.79%) and a decreasing of 6 positions in the rank to reach a Correct Retrieval (from rank 8 to 2) when assessed in the FVC2002-DB1A database. © 2008 IEEE.

Fusão de métodos baseados em minúcias e em cristas para reconhecimento de impressões digitais

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Pós-graduação em Ciência da Computação - IBILCE

Leveraging Multiple Features for Image Retrieval and Matching

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The goal of image retrieval and matching is to find and locate object instances in images from a large-scale image database. While visual features are abundant, how to combine them to improve performance by individual features remains a challenging task. In this work, we focus on leveraging multiple features for accurate and efficient image retrieval and matching. We first propose two graph-based approaches to rerank initially retrieved images for generic image retrieval. In the graph, vertices are images while edges are similarities between image pairs. Our first approach employs a mixture Markov model based on a random walk model on multiple graphs to fuse graphs. We introduce a probabilistic model to compute the importance of each feature for graph fusion under a naive Bayesian formulation, which requires statistics of similarities from a manually labeled dataset containing irrelevant images. To reduce human labeling, we further propose a fully unsupervised reranking algorithm based on a submodular objective function that can be efficiently optimized by greedy algorithm. By maximizing an information gain term over the graph, our submodular function favors a subset of database images that are similar to query images and resemble each other. The function also exploits the rank relationships of images from multiple ranked lists obtained by different features. We then study a more well-defined application, person re-identification, where the database contains labeled images of human bodies captured by multiple cameras. Re-identifications from multiple cameras are regarded as related tasks to exploit shared information. We apply a novel multi-task learning algorithm using both low level features and attributes. A low rank attribute embedding is joint learned within the multi-task learning formulation to embed original binary attributes to a continuous attribute space, where incorrect and incomplete attributes are rectified and recovered. To locate objects in images, we design an object detector based on object proposals and deep convolutional neural networks (CNN) in view of the emergence of deep networks. We improve a Fast RCNN framework and investigate two new strategies to detect objects accurately and efficiently: scale-dependent pooling (SDP) and cascaded rejection classifiers (CRC). The SDP improves detection accuracy by exploiting appropriate convolutional features depending on the scale of input object proposals. The CRC effectively utilizes convolutional features and greatly eliminates negative proposals in a cascaded manner, while maintaining a high recall for true objects. The two strategies together improve the detection accuracy and reduce the computational cost.

Efficient Image and Video Representations for Retrieval

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Image (Video) retrieval is an interesting problem of retrieving images (videos) similar to the query. Images (Videos) are represented in an input (feature) space and similar images (videos) are obtained by finding nearest neighbors in the input representation space. Numerous input representations both in real valued and binary space have been proposed for conducting faster retrieval. In this thesis, we present techniques that obtain improved input representations for retrieval in both supervised and unsupervised settings for images and videos. Supervised retrieval is a well known problem of retrieving same class images of the query. We address the practical aspects of achieving faster retrieval with binary codes as input representations for the supervised setting in the first part, where binary codes are used as addresses into hash tables. In practice, using binary codes as addresses does not guarantee fast retrieval, as similar images are not mapped to the same binary code (address). We address this problem by presenting an efficient supervised hashing (binary encoding) method that aims to explicitly map all the images of the same class ideally to a unique binary code. We refer to the binary codes of the images as `Semantic Binary Codes' and the unique code for all same class images as `Class Binary Code'. We also propose a new class based Hamming metric that dramatically reduces the retrieval times for larger databases, where only hamming distance is computed to the class binary codes. We also propose a Deep semantic binary code model, by replacing the output layer of a popular convolutional Neural Network (AlexNet) with the class binary codes and show that the hashing functions learned in this way outperforms the state of the art, and at the same time provide fast retrieval times. In the second part, we also address the problem of supervised retrieval by taking into account the relationship between classes. For a given query image, we want to retrieve images that preserve the relative order i.e. we want to retrieve all same class images first and then, the related classes images before different class images. We learn such relationship aware binary codes by minimizing the similarity between inner product of the binary codes and the similarity between the classes. We calculate the similarity between classes using output embedding vectors, which are vector representations of classes. Our method deviates from the other supervised binary encoding schemes as it is the first to use output embeddings for learning hashing functions. We also introduce new performance metrics that take into account the related class retrieval results and show significant gains over the state of the art. High Dimensional descriptors like Fisher Vectors or Vector of Locally Aggregated Descriptors have shown to improve the performance of many computer vision applications including retrieval. In the third part, we will discuss an unsupervised technique for compressing high dimensional vectors into high dimensional binary codes, to reduce storage complexity. In this approach, we deviate from adopting traditional hyperplane hashing functions and instead learn hyperspherical hashing functions. The proposed method overcomes the computational challenges of directly applying the spherical hashing algorithm that is intractable for compressing high dimensional vectors. A practical hierarchical model that utilizes divide and conquer techniques using the Random Select and Adjust (RSA) procedure to compress such high dimensional vectors is presented. We show that our proposed high dimensional binary codes outperform the binary codes obtained using traditional hyperplane methods for higher compression ratios. In the last part of the thesis, we propose a retrieval based solution to the Zero shot event classification problem - a setting where no training videos are available for the event. To do this, we learn a generic set of concept detectors and represent both videos and query events in the concept space. We then compute similarity between the query event and the video in the concept space and videos similar to the query event are classified as the videos belonging to the event. We show that we significantly boost the performance using concept features from other modalities.

Convolutional Neural Network Architectures for Template Matching

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The Neural Networks customized and tested in this thesis (WaldoNet, FlowNet and PatchNet) are a first exploration and approach to the Template Matching task. The possibilities of extension are therefore many and some are proposed below. During my thesis, I have analyzed the functioning of the classical algorithms and adapted with deep learning algorithms. The features extracted from both the template and the query images resemble the keypoints of the SIFT algorithm. Then, instead of similarity function or keypoints matching, WaldoNet and PatchNet use the convolutional layer to compare the features, while FlowNet uses the correlational layer. In addition, I have identified the major challenges of the Template Matching task (affine/non-affine transformations, intensity changes...) and solved them with a careful design of the dataset.

Using the Number of Pores on Fingerprint Images to Detect Spoofing Attacks

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Abstract-Due to the growing use of biometric technologies inour modern society, spoofing attacks are becoming a seriousconcern. Many solutions have been proposed to detect the use offake "fingerprints" on an acquisition device. In this paper, wepropose to take advantage of intrinsic features of friction ridgeskin: pores. The aim of this study is to investigate the potential ofusing pores to detect spoofing attacks.Results show that the use of pores is a promising approach. Fourmajor observations were made: First, results confirmed that thereproduction of pores on fake "fingerprints" is possible. Second,the distribution of the total number of pores between fake andgenuine fingerprints cannot be discriminated. Third, thedifference in pore quantities between a query image and areference image (genuine or fake) can be used as a discriminatingfactor in a linear discriminant analysis. In our sample, theobserved error rates were as follows: 45.5% of false positive (thefake passed the test) and 3.8% of false negative (a genuine printhas been rejected). Finally, the performance is improved byusing the difference of pore quantity obtained between adistorted query fingerprint and a non-distorted referencefingerprint. By using this approach, the error rates improved to21.2% of false acceptation rate and 8.3% of false rejection rate.

A review of atlas-based segmentation for magnetic resonance brain images.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Normal and abnormal brains can be segmented by registering the target image with an atlas. Here, an atlas is defined as the combination of an intensity image (template) and its segmented image (the atlas labels). After registering the atlas template and the target image, the atlas labels are propagated to the target image. We define this process as atlas-based segmentation. In recent years, researchers have investigated registration algorithms to match atlases to query subjects and also strategies for atlas construction. In this paper we present a review of the automated approaches for atlas-based segmentation of magnetic resonance brain images. We aim to point out the strengths and weaknesses of atlas-based methods and suggest new research directions. We use two different criteria to present the methods. First, we refer to the algorithms according to their atlas-based strategy: label propagation, multi-atlas methods, and probabilistic techniques. Subsequently, we classify the methods according to their medical target: the brain and its internal structures, tissue segmentation in healthy subjects, tissue segmentation in fetus, neonates and elderly subjects, and segmentation of damaged brains. A quantitative comparison of the results reported in the literature is also presented.

Learning 3D structure from 2D images using LBP features

Relevância:

30.00% 30.00%

Publicador:

Resumo:

An automatic machine learning strategy for computing the 3D structure of monocular images from a single image query using Local Binary Patterns is presented. The 3D structure is inferred through a training set composed by a repository of color and depth images, assuming that images with similar structure present similar depth maps. Local Binary Patterns are used to characterize the structure of the color images. The depth maps of those color images with a similar structure to the query image are adaptively combined and filtered to estimate the final depth map. Using public databases, promising results have been obtained outperforming other state-of-the-art algorithms and with a computational cost similar to the most efficient 2D-to-3D algorithms.

Context Models for Understanding Images and Videos

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A computer vision system that has to interact in natural language needs to understand the visual appearance of interactions between objects along with the appearance of objects themselves. Relationships between objects are frequently mentioned in queries of tasks like semantic image retrieval, image captioning, visual question answering and natural language object detection. Hence, it is essential to model context between objects for solving these tasks. In the first part of this thesis, we present a technique for detecting an object mentioned in a natural language query. Specifically, we work with referring expressions which are sentences that identify a particular object instance in an image. In many referring expressions, an object is described in relation to another object using prepositions, comparative adjectives, action verbs etc. Our proposed technique can identify both the referred object and the context object mentioned in such expressions. Context is also useful for incrementally understanding scenes and videos. In the second part of this thesis, we propose techniques for searching for objects in an image and events in a video. Our proposed incremental algorithms use the context from previously explored regions to prioritize the regions to explore next. The advantage of incremental understanding is restricting the amount of computation time and/or resources spent for various detection tasks. Our first proposed technique shows how to learn context in indoor scenes in an implicit manner and use it for searching for objects. The second technique shows how explicitly written context rules of one-on-one basketball can be used to sequentially detect events in a game.

Image summarisation: human action description from static images

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Dissertação de Mestrado, Processamento de Linguagem Natural e Indústrias da Língua, Faculdade de Ciências Humanas e Sociais, Universidade do Algarve, 2014

Advancing Bag-of-visual-words Representations For Lesion Classification In Retinal Images.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Diabetic Retinopathy (DR) is a complication of diabetes that can lead to blindness if not readily discovered. Automated screening algorithms have the potential to improve identification of patients who need further medical attention. However, the identification of lesions must be accurate to be useful for clinical application. The bag-of-visual-words (BoVW) algorithm employs a maximum-margin classifier in a flexible framework that is able to detect the most common DR-related lesions such as microaneurysms, cotton-wool spots and hard exudates. BoVW allows to bypass the need for pre- and post-processing of the retinographic images, as well as the need of specific ad hoc techniques for identification of each type of lesion. An extensive evaluation of the BoVW model, using three large retinograph datasets (DR1, DR2 and Messidor) with different resolution and collected by different healthcare personnel, was performed. The results demonstrate that the BoVW classification approach can identify different lesions within an image without having to utilize different algorithms for each lesion reducing processing time and providing a more flexible diagnostic system. Our BoVW scheme is based on sparse low-level feature detection with a Speeded-Up Robust Features (SURF) local descriptor, and mid-level features based on semi-soft coding with max pooling. The best BoVW representation for retinal image classification was an area under the receiver operating characteristic curve (AUC-ROC) of 97.8% (exudates) and 93.5% (red lesions), applying a cross-dataset validation protocol. To assess the accuracy for detecting cases that require referral within one year, the sparse extraction technique associated with semi-soft coding and max pooling obtained an AUC of 94.2 ± 2.0%, outperforming current methods. Those results indicate that, for retinal image classification tasks in clinical practice, BoVW is equal and, in some instances, surpasses results obtained using dense detection (widely believed to be the best choice in many vision problems) for the low-level descriptors.

The Biopsychosocial Spiritual Model Applied To The Treatment Of women With Breast Cancer, Through Rime Intervention (relaxation, mental Images, Spirituality).

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This postdoctoral study on the application of the RIME intervention in women that had undergone mastectomy and were in treatment, aimed to promote psychospiritual and social transformations to improve the quality of life, self-esteem and hope. A total of 28 women participated and were randomized into two groups. Brief Psychotherapy (PB) (average of six sessions) was administered in the Control Group, and RIME (three sessions) and BP (average of five sessions) were applied in the RIME Group. The quantitative results indicated a significant improvement (38.3%) in the Perception of Quality of Life after RIME according to the WHOQOL, compared both to the BP of the Control Group (12.5%), and the BP of the RIME Group (16.2%). There was a significant improvement in Self-esteem (Rosenberg) after RIME (14.6%) compared to the BP of the Control Group (worsened 35.9%), and the BP of the RIME Group (8.3%). The improvement in well-being, considering the focus worked on (Visual Analog Scale), was significant in the RIME Group (bad to good), as well as in the Control Group (unpleasant to good). The qualitative results indicated that RIME promotes creative transformations in the intrapsychic and interpersonal dimensions, so that new meanings and/or new attitudes emerge into the consciousness. It was observed that RIME has more strength of psychic structure, ego strengthening and provides a faster transformation that BP, therefore it can be indicated for crisis treatment in the hospital environment.

Vertical Bone Measurements From Cone Beam Computed Tomography Images Using Different Software Packages.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This article aimed at comparing the accuracy of linear measurement tools of different commercial software packages. Eight fully edentulous dry mandibles were selected for this study. Incisor, canine, premolar, first molar and second molar regions were selected. Cone beam computed tomography (CBCT) images were obtained with i-CAT Next Generation. Linear bone measurements were performed by one observer on the cross-sectional images using three different software packages: XoranCat®, OnDemand3D® and KDIS3D®, all able to assess DICOM images. In addition, 25% of the sample was reevaluated for the purpose of reproducibility. The mandibles were sectioned to obtain the gold standard for each region. Intraclass coefficients (ICC) were calculated to examine the agreement between the two periods of evaluation; the one-way analysis of variance performed with the post-hoc Dunnett test was used to compare each of the software-derived measurements with the gold standard. The ICC values were excellent for all software packages. The least difference between the software-derived measurements and the gold standard was obtained with the OnDemand3D and KDIS3D (-0.11 and -0.14 mm, respectively), and the greatest, with the XoranCAT (+0.25 mm). However, there was no statistical significant difference between the measurements obtained with the different software packages and the gold standard (p> 0.05). In conclusion, linear bone measurements were not influenced by the software package used to reconstruct the image from CBCT DICOM data.

«
1
2
3
4
5
6
7
8
...
66
67
»