5 results for Multimodal retrieval

in DRUM (Digital Repository at the University of Maryland)


Relevance:

70.00%

Abstract:

The size of online image datasets is constantly increasing. For a dataset with millions of images, image retrieval becomes a seemingly intractable problem for exhaustive similarity search algorithms. Hashing methods, which encode high-dimensional descriptors into compact binary strings, have become very popular because of their high efficiency in search and storage. In the first part, we propose a multimodal retrieval method based on latent feature models. The procedure consists of a nonparametric Bayesian framework for learning underlying semantically meaningful abstract features in a multimodal dataset, a probabilistic retrieval model that allows cross-modal queries, and an extension model for relevance feedback. In the second part, we focus on supervised hashing with kernels. We describe a flexible hashing procedure that treats binary codes and pairwise semantic similarity as latent and observed variables, respectively, in a probabilistic model based on Gaussian processes for binary classification. We present a scalable inference algorithm with the sparse pseudo-input Gaussian process (SPGP) model and distributed computing. In the last part, we define an incremental hashing strategy for dynamic databases where new images are added to the databases frequently. The method is based on a two-stage classification framework using binary and multi-class SVMs. The proposed method also enforces balance in the binary codes through an imbalance penalty to obtain higher-quality binary codes. We learn hash functions by an efficient algorithm in which the NP-hard problem of finding optimal binary codes is solved via cyclic coordinate descent and SVMs are trained in a parallelized incremental manner. For modifications such as adding images from an unseen class, we propose an incremental procedure for effective and efficient updates to the previous hash functions.
Experiments on three large-scale image datasets demonstrate that the incremental strategy is capable of efficiently updating hash functions to the same retrieval performance as hashing from scratch.
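The basic idea this abstract builds on, encoding descriptors as short binary strings and ranking by Hamming distance, can be illustrated with a minimal sketch. It uses random hyperplane (sign) hashing purely as a stand-in; it is not the latent-feature, Gaussian-process, or SVM-based hashing the thesis proposes, and the names `random_hyperplane_hash`, `hamming`, and `search` are hypothetical.

```python
import random


def random_hyperplane_hash(dim, n_bits, seed=0):
    """Return an encoder mapping a dim-dimensional vector to an n_bits integer code.

    Stand-in hash: one random hyperplane per bit (NOT the thesis's learned hash).
    """
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def encode(vec):
        code = 0
        for plane in planes:
            dot = sum(p * v for p, v in zip(plane, vec))
            # Each bit records which side of the hyperplane the vector falls on.
            code = (code << 1) | (1 if dot >= 0 else 0)
        return code

    return encode


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def search(query_code, db_codes, k):
    """Indices of the k database codes closest to the query in Hamming distance."""
    order = sorted(range(len(db_codes)), key=lambda i: hamming(query_code, db_codes[i]))
    return order[:k]


encode = random_hyperplane_hash(dim=3, n_bits=8)
codes = [encode(v) for v in ([1.0, -2.0, 0.5], [0.9, -2.1, 0.4], [-1.0, 2.0, -0.5])]
nearest = search(encode([1.0, -2.0, 0.5]), codes, k=2)
```

Comparing codes costs one XOR and a bit count per database item, which is where the search and storage savings of hashing come from.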

Relevance:

20.00%

Abstract:

The goal of image retrieval and matching is to find and locate object instances in images from a large-scale image database. While visual features are abundant, combining them so that they outperform individual features remains a challenging task. In this work, we focus on leveraging multiple features for accurate and efficient image retrieval and matching. We first propose two graph-based approaches to rerank initially retrieved images for generic image retrieval. In the graph, vertices are images while edges are similarities between image pairs. Our first approach employs a mixture Markov model based on a random walk model on multiple graphs to fuse graphs. We introduce a probabilistic model to compute the importance of each feature for graph fusion under a naive Bayesian formulation, which requires statistics of similarities from a manually labeled dataset containing irrelevant images. To reduce human labeling, we further propose a fully unsupervised reranking algorithm based on a submodular objective function that can be efficiently optimized by a greedy algorithm. By maximizing an information gain term over the graph, our submodular function favors a subset of database images that are similar to query images and resemble each other. The function also exploits the rank relationships of images from multiple ranked lists obtained by different features. We then study a better-defined application, person re-identification, where the database contains labeled images of human bodies captured by multiple cameras. Re-identifications from multiple cameras are regarded as related tasks to exploit shared information. We apply a novel multi-task learning algorithm using both low-level features and attributes. A low-rank attribute embedding is jointly learned within the multi-task learning formulation to embed the original binary attributes into a continuous attribute space, where incorrect and incomplete attributes are rectified and recovered.
To locate objects in images, we design an object detector based on object proposals and deep convolutional neural networks (CNNs) in view of the emergence of deep networks. We improve a Fast R-CNN framework and investigate two new strategies to detect objects accurately and efficiently: scale-dependent pooling (SDP) and cascaded rejection classifiers (CRC). The SDP improves detection accuracy by exploiting appropriate convolutional features depending on the scale of input object proposals. The CRC effectively utilizes convolutional features and eliminates many negative proposals in a cascaded manner, while maintaining a high recall for true objects. The two strategies together improve the detection accuracy and reduce the computational cost.
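The graph-fusion reranking described in this abstract can be pictured with a small sketch: several per-feature similarity graphs are combined into one transition matrix, and a random walk with restart from the query scores the database images. This is a generic random walk with restart, not the thesis's mixture Markov model; the importance weights, which the thesis estimates with a naive Bayesian model, are simply hypothetical inputs here, as are the toy graphs and all function names.

```python
def row_normalize(weights):
    """Turn a non-negative similarity matrix into a row-stochastic transition matrix."""
    n = len(weights)
    out = []
    for row in weights:
        total = sum(row)
        out.append([w / total for w in row] if total > 0 else [1.0 / n] * n)
    return out


def fuse(graphs, importances):
    """Weighted average of the per-feature transition matrices."""
    z = sum(importances)
    mats = [row_normalize(g) for g in graphs]
    n = len(mats[0])
    return [
        [sum(a / z * m[i][j] for a, m in zip(importances, mats)) for j in range(n)]
        for i in range(n)
    ]


def random_walk_scores(transition, query, restart=0.15, iters=100):
    """Random walk with restart at the query vertex; higher score means more relevant."""
    n = len(transition)
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            restart * (1.0 if j == query else 0.0)
            + (1.0 - restart) * sum(scores[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores


# Two toy similarity graphs over four images (vertex 0 is the query image).
g_color = [[0, 0.9, 0.1, 0.0], [0.9, 0, 0.1, 0.0], [0.1, 0.1, 0, 0.8], [0.0, 0.0, 0.8, 0]]
g_shape = [[0, 0.8, 0.2, 0.0], [0.8, 0, 0.1, 0.1], [0.2, 0.1, 0, 0.7], [0.0, 0.1, 0.7, 0]]

scores = random_walk_scores(fuse([g_color, g_shape], importances=[0.6, 0.4]), query=0)
reranked = sorted(range(1, len(scores)), key=lambda i: -scores[i])
```

In this toy setup image 1, which both features agree is close to the query, ends up ahead of image 3, which neither feature links to the query directly.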

Relevance:

20.00%

Abstract:

Image (video) retrieval is the problem of retrieving images (videos) similar to a query. Images (videos) are represented in an input (feature) space, and similar images (videos) are obtained by finding nearest neighbors in the input representation space. Numerous input representations, in both real-valued and binary spaces, have been proposed for faster retrieval. In this thesis, we present techniques that obtain improved input representations for retrieval in both supervised and unsupervised settings for images and videos. Supervised retrieval is the well-known problem of retrieving images of the same class as the query. In the first part, we address the practical aspects of achieving faster retrieval in the supervised setting with binary codes as input representations, where binary codes are used as addresses into hash tables. In practice, using binary codes as addresses does not guarantee fast retrieval, as similar images are not mapped to the same binary code (address). We address this problem by presenting an efficient supervised hashing (binary encoding) method that aims to explicitly map all the images of the same class, ideally, to a unique binary code. We refer to the binary codes of the images as "Semantic Binary Codes" and the unique code for all same-class images as the "Class Binary Code". We also propose a new class-based Hamming metric that dramatically reduces retrieval times for larger databases, where the Hamming distance is computed only to the class binary codes. We also propose a deep semantic binary code model, obtained by replacing the output layer of a popular convolutional neural network (AlexNet) with the class binary codes, and show that the hashing functions learned in this way outperform the state of the art while providing fast retrieval times. In the second part, we also address the problem of supervised retrieval by taking into account the relationship between classes.
For a given query image, we want to retrieve images that preserve the relative order, i.e., we want to retrieve all same-class images first, and then images of related classes before images of different classes. We learn such relationship-aware binary codes by minimizing the discrepancy between the inner product of the binary codes and the similarity between the classes. We calculate the similarity between classes using output embedding vectors, which are vector representations of classes. Our method deviates from other supervised binary encoding schemes in that it is the first to use output embeddings for learning hashing functions. We also introduce new performance metrics that take into account the related-class retrieval results and show significant gains over the state of the art. High-dimensional descriptors such as Fisher Vectors or the Vector of Locally Aggregated Descriptors have been shown to improve the performance of many computer vision applications, including retrieval. In the third part, we discuss an unsupervised technique for compressing high-dimensional vectors into high-dimensional binary codes to reduce storage complexity. In this approach, we deviate from traditional hyperplane hashing functions and instead learn hyperspherical hashing functions. The proposed method overcomes the computational challenges of directly applying the spherical hashing algorithm, which is intractable for compressing high-dimensional vectors. We present a practical hierarchical model that uses divide-and-conquer techniques with the Random Select and Adjust (RSA) procedure to compress such high-dimensional vectors. We show that our proposed high-dimensional binary codes outperform the binary codes obtained using traditional hyperplane methods for higher compression ratios. In the last part of the thesis, we propose a retrieval-based solution to the zero-shot event classification problem, a setting where no training videos are available for the event.
To do this, we learn a generic set of concept detectors and represent both videos and query events in the concept space. We then compute similarity between the query event and the video in the concept space and videos similar to the query event are classified as the videos belonging to the event. We show that we significantly boost the performance using concept features from other modalities.
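One reading of the class-based Hamming lookup from the first part of this abstract is sketched below: rather than comparing the query code against every image code, the query is compared only against one code per class, and the images stored under the nearest class are returned. The class labels, code values, and the names `nearest_class` and `retrieve` are hypothetical and only illustrate the idea.

```python
def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def nearest_class(query_code, class_codes):
    """Label whose class binary code is closest to the query code."""
    return min(class_codes, key=lambda label: hamming(query_code, class_codes[label]))


def retrieve(query_code, class_codes, images_by_class):
    """Return the images stored under the nearest class code."""
    return images_by_class[nearest_class(query_code, class_codes)]


# Hypothetical 8-bit class binary codes and a toy index.
class_codes = {"cat": 0b11110000, "dog": 0b00001111, "car": 0b10101010}
images_by_class = {"cat": ["cat_01", "cat_02"], "dog": ["dog_01"], "car": ["car_01"]}

# A query whose code differs from the "cat" class code in a single bit.
results = retrieve(0b11100000, class_codes, images_by_class)
```

The number of Hamming comparisons then grows with the number of classes rather than with the number of database images.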

Relevance:

20.00%

Abstract:

Executing a cloud or aerosol physical properties retrieval algorithm from controlled synthetic data is an important step in retrieval algorithm development. Synthetic data can help answer questions about the sensitivity and performance of the algorithm or aid in determining how an existing retrieval algorithm may perform with a planned sensor. Synthetic data can also help in solving issues that may have surfaced in the retrieval results. Synthetic data become very important when other validation methods, such as field campaigns, are of limited scope. Field campaigns tend to be of relatively short duration and are often costly. Ground stations have limited spatial coverage, while synthetic data can cover large spatial and temporal scales and a wide variety of conditions at a low cost. In this work I develop an advanced cloud and aerosol retrieval simulator for the MODIS instrument, known as the Multi-sensor Cloud and Aerosol Retrieval Simulator (MCARS). In close collaboration with the modeling community, I have seamlessly combined the GEOS-5 global climate model with the DISORT radiative transfer code, widely used by the remote sensing community, and with the observations from the MODIS instrument to create the simulator. With the MCARS simulator it was then possible to solve the long-standing issue of MODIS aerosol optical depth retrievals having a low bias for smoke aerosols: the MODIS aerosol retrieval did not account for the effects of humidity on smoke aerosols. The MCARS simulator also revealed an issue that had not been recognized previously, namely, that the value of the fine mode fraction could create a linear dependence between retrieved aerosol optical depth and land surface reflectance. MCARS provided the ability to examine aerosol retrievals against “ground truth” for hundreds of thousands of simultaneous samples for an area covered by only three AERONET ground stations.
Findings from MCARS are already being used to improve the performance of operational MODIS aerosol properties retrieval algorithms. The modeling community will use the MCARS data to create new parameterizations for aerosol properties as a function of properties of the atmospheric column and gain the ability to correct any assimilated retrieval data that may display similar dependencies in comparisons with ground measurements.

Relevance:

20.00%

Abstract:

Compressed covariance sensing using quadratic samplers has attracted increasing interest in the recent literature. The covariance matrix often plays the role of a sufficient statistic in many signal and information processing tasks. However, owing to the large dimension of the data, it may become necessary to obtain a compressed sketch of the high-dimensional covariance matrix to reduce the associated storage and communication costs. Nested sampling has been proposed in the past as an efficient sub-Nyquist sampling strategy that enables perfect reconstruction of the autocorrelation sequence of Wide-Sense Stationary (WSS) signals, as though it were sampled at the Nyquist rate. The key idea behind nested sampling is to exploit properties of the difference set that naturally arises in the quadratic measurement model associated with covariance compression. In this thesis, we focus on developing novel versions of nested sampling for low-rank Toeplitz covariance estimation and for phase retrieval, where the latter problem finds many applications in high-resolution optical imaging, X-ray crystallography, and molecular imaging. The problem of low-rank compressive Toeplitz covariance estimation is first shown to be fundamentally related to that of line spectrum recovery. In the absence of noise, this connection can be exploited to develop a particular kind of sampler, called the Generalized Nested Sampler (GNS), that can achieve optimal compression rates. In the presence of bounded noise, we develop a regularization-free algorithm that provably leads to stable recovery of the high-dimensional Toeplitz matrix from its order-wise minimal sketch acquired using a GNS. Contrary to existing TV-norm and nuclear-norm based reconstruction algorithms, our technique does not use any tuning parameters, which can be of great practical value.
The idea of nested sampling also finds a surprising use in the problem of phase retrieval, which has been of great interest in recent times for its convex formulation via PhaseLift. By using another modified version of nested sampling, namely the Partial Nested Fourier Sampler (PNFS), we show that, with probability one, it is possible to achieve a certain conjectured lower bound on the necessary measurement size. Moreover, for sparse data, an l1 minimization based algorithm is proposed that can lead to stable phase retrieval using an order-wise minimal number of measurements.
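The difference-set property that nested sampling relies on can be checked with a few lines of code. The sketch below uses the standard two-level nested sampler, with a dense inner level and a sparse outer level; it is not the GNS or PNFS introduced in the thesis, and the function names are hypothetical.

```python
def nested_positions(n1, n2):
    """Sample positions of a two-level nested sampler.

    Inner level: 1, 2, ..., n1 (unit spacing).
    Outer level: (n1 + 1), 2 * (n1 + 1), ..., n2 * (n1 + 1).
    """
    inner = list(range(1, n1 + 1))
    outer = [(n1 + 1) * m for m in range(1, n2 + 1)]
    return inner + outer


def difference_set(positions):
    """All pairwise differences, i.e. the autocorrelation lags the sampler can observe."""
    return {a - b for a in positions for b in positions}


positions = nested_positions(3, 3)   # [1, 2, 3, 4, 8, 12]
lags = difference_set(positions)
max_lag = 3 * (3 + 1) - 1            # n2 * (n1 + 1) - 1 = 11
```

With only six samples, the differences cover every integer lag from -11 to 11, which is why the autocorrelation sequence can be recovered as though the signal had been sampled uniformly over that range.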