991 results for Dictionary learning
Abstract:
Recent advances in computer vision and machine learning suggest that a wide range of problems can be addressed more appropriately by considering non-Euclidean geometry. In this paper we explore sparse dictionary learning over the space of linear subspaces, which form Riemannian structures known as Grassmann manifolds. To this end, we propose to embed Grassmann manifolds into the space of symmetric matrices by an isometric mapping, which enables us to devise a closed-form solution for updating a Grassmann dictionary, atom by atom. Furthermore, to handle non-linearity in data, we propose a kernelised version of the dictionary learning algorithm. Experiments on several classification tasks (face recognition, action recognition, dynamic texture classification) show that the proposed approach achieves considerable improvements in discrimination accuracy, in comparison to state-of-the-art methods such as kernelised Affine Hull Method and graph-embedding Grassmann discriminant analysis.
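As an illustration of the embedding idea, the block below writes out the projection mapping, a commonly used isometric embedding of Grassmann manifolds into symmetric matrices. The abstract does not name its mapping, so identifying it with the projection mapping is an assumption, and the symbols ($\Pi$, $Y$, $D_j$, $x$, $\lambda$) are illustrative rather than the authors' notation.

```latex
% A subspace \mathcal{Y} \in \mathcal{G}(p, d) is represented by an orthonormal basis Y.
\Pi(\mathcal{Y}) = Y Y^{\top}, \qquad Y \in \mathbb{R}^{d \times p},\quad Y^{\top} Y = I_p
% Distance induced by the embedding (Frobenius norm between projection matrices).
\delta(\mathcal{Y}_1, \mathcal{Y}_2) = \bigl\| Y_1 Y_1^{\top} - Y_2 Y_2^{\top} \bigr\|_F
% Sparse coding of a query subspace \mathcal{Q} over atoms \mathcal{D}_1, \dots, \mathcal{D}_N (assumed form).
\min_{x \in \mathbb{R}^{N}} \Bigl\| Q Q^{\top} - \sum_{j=1}^{N} x_j \, D_j D_j^{\top} \Bigr\|_F^{2} + \lambda \| x \|_1
```

Because $Y Y^{\top}$ does not depend on which orthonormal basis of the subspace is chosen, the embedded point is well defined for the subspace itself.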
Abstract:
This paper presents an effective feature representation method in the context of activity recognition. Efficient and effective feature representation plays a crucial role not only in activity recognition, but also in a wide range of applications such as motion analysis, tracking and 3D scene understanding. In the context of activity recognition, local features are increasingly popular for representing videos because of their simplicity and efficiency. While they achieve state-of-the-art performance with low computational requirements, their performance is still limited for real-world applications due to a lack of contextual information and models not being tailored to specific activities. We propose a new activity representation framework to address the shortcomings of the popular but simple bag-of-words approach. In our framework, multiple-instance SVM (mi-SVM) is first used to identify positive features for each action category, and the k-means algorithm is used to generate a codebook. Then locality-constrained linear coding is used to encode the features into the generated codebook, followed by spatio-temporal pyramid pooling to convey the spatio-temporal statistics. Finally, an SVM is used to classify the videos. Experiments carried out on two popular datasets of varying complexity demonstrate significant performance improvement over the baseline bag-of-features method.
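The pooling step of the pipeline above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it pools along the temporal axis only, assumes non-negative code vectors, and the function name, the `levels` parameter and the timestamp layout are all hypothetical.

```python
def temporal_pyramid_max_pool(codes, times, duration, levels=(1, 2, 4)):
    """Max-pool code vectors over a temporal pyramid.

    codes    -- list of equal-length, non-negative code vectors (assumption)
    times    -- timestamp of each code, in [0, duration)
    levels   -- number of temporal cells at each pyramid level (hypothetical default)
    Returns the concatenation of the pooled vector of every cell.
    """
    dim = len(codes[0])
    pooled = []
    for n_cells in levels:
        # Empty cells stay at zero, which is why non-negative codes are assumed.
        cells = [[0.0] * dim for _ in range(n_cells)]
        for code, t in zip(codes, times):
            c = min(int(t / duration * n_cells), n_cells - 1)
            cells[c] = [max(a, b) for a, b in zip(cells[c], code)]
        for cell in cells:
            pooled.extend(cell)
    return pooled


# Three 2-dimensional codes observed in a 4-second clip, pooled over 1 + 2 cells.
descriptor = temporal_pyramid_max_pool(
    [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]], [0.5, 1.5, 3.5], 4.0, levels=(1, 2)
)
print(descriptor)  # [3.0, 2.0, 1.0, 2.0, 3.0, 0.0]
```

The resulting vector has one block per pyramid cell, so a classifier sees both what occurred in the clip and roughly when it occurred.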
Abstract:
In big data image/video analytics, we encounter the problem of learning an over-complete dictionary for sparse representation from a large training dataset, which cannot be processed at once because of storage and computational constraints. To tackle the problem of dictionary learning in such scenarios, we propose an algorithm that exploits the inherent clustered structure of the training data and makes use of a divide-and-conquer approach. The fundamental idea behind the algorithm is to partition the training dataset into smaller clusters, and learn local dictionaries for each cluster. Subsequently, the local dictionaries are merged to form a global dictionary. Merging is done by solving another dictionary learning problem on the atoms of the locally trained dictionaries. This algorithm is referred to as the split-and-merge algorithm. We show that the proposed algorithm is efficient in its usage of memory and computational complexity, and performs on par with the standard learning strategy, which operates on the entire data at a time. As an application, we consider the problem of image denoising. We present a comparative analysis of our algorithm with the standard learning techniques that use the entire database at a time, in terms of training and denoising performance. We observe that the split-and-merge algorithm results in a remarkable reduction of training time, without significantly affecting the denoising performance.
Abstract:
Oversmoothing of speech parameter trajectories is one of the causes of quality degradation in HMM-based speech synthesis. Various methods have been proposed to overcome this effect, the most recent ones being global variance (GV) and modulation-spectrum-based post-filter (MSPF). However, there is still a significant quality gap between natural and synthesized speech. In this paper, we propose a two-fold post-filtering technique to alleviate, to a certain extent, the oversmoothing of spectral and excitation parameter trajectories in HMM-based speech synthesis. For the spectral parameters, we propose a sparse coding-based post-filter to match the trajectories of synthetic speech to those of natural speech, and for the excitation trajectory, we introduce a perceptually motivated post-filter. Experimental evaluations show quality improvement compared with existing methods.
Abstract:
We develop a new dictionary learning algorithm, called ℓ1-K-SVD, by minimizing the ℓ1 distortion on the data term. The proposed formulation corresponds to maximum a posteriori estimation assuming a Laplacian prior on the coefficient matrix and additive noise, and is, in general, robust to non-Gaussian noise. The ℓ1 distortion is minimized by employing the iteratively reweighted least-squares algorithm. The dictionary atoms and the corresponding sparse coefficients are simultaneously estimated in the dictionary update step. Experimental results show that ℓ1-K-SVD yields better noise robustness, faster convergence, and a higher atom recovery rate than the method of optimal directions, K-SVD, and the robust dictionary learning algorithm (RDL), in Gaussian as well as non-Gaussian noise. For a fixed value of sparsity, number of dictionary atoms, and data dimension, ℓ1-K-SVD outperforms K-SVD and RDL on small training sets. We also consider the generalized ℓp, 0 < p < 1, data metric to tackle heavy-tailed/impulsive noise. In an image denoising application, ℓ1-K-SVD was found to result in higher peak signal-to-noise ratio (PSNR) than K-SVD for Laplacian noise. The structural similarity index increases by 0.1 for low input PSNR, which is significant and demonstrates the efficacy of the proposed method. (C) 2015 Elsevier B.V. All rights reserved.
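To illustrate how iteratively reweighted least squares can minimize an ℓ1 data term, the sketch below fits a single coefficient for one fixed atom. It is a deliberately reduced example, not the ℓ1-K-SVD update itself; the function name, the `eps` floor and the toy data are assumptions.

```python
def irls_l1_coefficient(atom, signal, n_iter=100, eps=1e-8):
    """Approximately minimise sum_i |signal[i] - x * atom[i]| over the scalar x."""
    # Start from the ordinary least-squares coefficient.
    x = sum(d * y for d, y in zip(atom, signal)) / sum(d * d for d in atom)
    for _ in range(n_iter):
        # Reweighting: small residuals get large weights, outliers get small ones.
        weights = [1.0 / max(abs(y - x * d), eps) for d, y in zip(atom, signal)]
        num = sum(w * d * y for w, d, y in zip(weights, atom, signal))
        den = sum(w * d * d for w, d in zip(weights, atom))
        x = num / den  # weighted least-squares solution for the current weights
    return x


atom = [1.0, 1.0, 1.0, 1.0, 1.0]
signal = [2.0, 2.1, 1.9, 2.0, 50.0]  # the last sample is an impulsive outlier
x_ls = sum(d * y for d, y in zip(atom, signal)) / sum(d * d for d in atom)
x_l1 = irls_l1_coefficient(atom, signal)
```

On this toy data the least-squares coefficient is pulled far toward the outlier, while the reweighted estimate settles close to the bulk of the samples, which is the robustness property the abstract attributes to the ℓ1 data term.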
Abstract:
Fingerprints are used for identification in forensics, and identification systems are classified as manual or automatic. Automatic fingerprint identification systems are further classified as latent or exemplar. A novel exemplar technique, Fingerprint Image Verification using Dictionary Learning (FIVDL), is proposed to improve performance on low-quality fingerprints; the dictionary learning method reduces time complexity by using block processing instead of pixel processing. The dynamic range of an image is adjusted using the Successive Mean Quantization Transform (SMQT), and frequency-domain noise is reduced using spectral frequency histogram equalization. Then, an adaptive nonlinear dynamic range adjustment technique is used to determine the local spectral features on the corresponding fingerprint ridge frequency and orientation. The dictionary is constructed using the spatial fundamental frequency determined from the spectral features. These dictionaries help remove the spurious noise present in fingerprints. Further, the dictionaries are used to reconstruct the image for matching. The proposed FIVDL is verified on the FVC database sets, and experimental results show an improvement over state-of-the-art techniques. (C) 2015 The Authors. Published by Elsevier B.V.
Abstract:
A tree-based dictionary learning model is developed for joint analysis of imagery and associated text. The dictionary learning may be applied directly to the imagery from patches, or to general feature vectors extracted from patches or superpixels (using any existing method for image feature extraction). Each image is associated with a path through the tree (from root to a leaf), and each of the multiple patches in a given image is associated with one node in that path. Nodes near the tree root are shared between multiple paths, representing image characteristics that are common among different types of images. Moving toward the leaves, nodes become specialized, representing details in image classes. If available, words (text) are also jointly modeled, with a path-dependent probability over words. The tree structure is inferred via a nested Dirichlet process, and a retrospective stick-breaking sampler is used to infer the tree depth and width.
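The stick-breaking construction that underlies Dirichlet process priors can be sketched as below. This shows only the basic construction with a simple stopping rule, not the nested Dirichlet process or the retrospective sampler used in the paper; the function name, the tolerance and the seed are assumptions.

```python
import random


def stick_breaking_weights(alpha, tol=1e-3, max_atoms=1000, seed=0):
    """Draw mixture weights by repeatedly breaking off a Beta(1, alpha) fraction.

    Stops once the unassigned stick length drops to `tol` or below, so the
    number of weights is decided by the draw rather than fixed in advance.
    """
    rng = random.Random(seed)
    weights, remaining = [], 1.0
    while remaining > tol and len(weights) < max_atoms:
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)  # piece broken off the stick
        remaining *= 1.0 - fraction           # what is left for later pieces
    return weights, remaining


weights, leftover = stick_breaking_weights(alpha=2.0)
```

Larger values of `alpha` break off smaller pieces on average, so more components receive non-negligible weight, which loosely corresponds to a wider tree level.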
Abstract:
Traditional dictionary learning algorithms find a sparse representation of high-dimensional data by transforming each sample into a one-dimensional (1D) vector. This 1D model loses the inherent spatial structure of the data. An alternative is to employ tensor decomposition for dictionary learning on the data in their original structural form, a tensor, by learning multiple dictionaries along each mode and the corresponding sparse representation with respect to the Kronecker product of these dictionaries. To learn tensor dictionaries along each mode, all existing methods update each dictionary iteratively in an alternating manner. Although atoms from each mode dictionary jointly contribute to the sparsity of the tensor, existing works ignore atom correlations between different mode dictionaries by treating each mode dictionary independently. In this paper, we propose a joint multiple dictionary learning method for tensor sparse coding, which explores atom correlations for sparse representation and updates multiple atoms from each mode dictionary simultaneously. In this algorithm, the Frequent-Pattern Tree (FP-tree) mining algorithm is employed to exploit frequent atom patterns in the sparse representation. Inspired by the idea of K-SVD, we develop a new dictionary update method that jointly updates elements in each pattern. Experimental results demonstrate that our method outperforms other tensor-based dictionary learning algorithms.
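The link between mode-wise dictionaries and the Kronecker product mentioned above is the standard identity below. The symbols ($\mathcal{X}$ for the data tensor, $\mathcal{B}$ for the sparse core, $D_n$ for the mode-$n$ dictionary) are chosen here for illustration and are not taken from the paper; $\operatorname{vec}$ stacks entries with the first index varying fastest.

```latex
\mathcal{X} \approx \mathcal{B} \times_1 D_1 \times_2 D_2 \cdots \times_N D_N
\quad\Longleftrightarrow\quad
\operatorname{vec}(\mathcal{X}) \approx \bigl( D_N \otimes \cdots \otimes D_2 \otimes D_1 \bigr)\, \operatorname{vec}(\mathcal{B})
% Two-mode (matrix) special case:
X \approx D_1 B D_2^{\top}
\quad\Longleftrightarrow\quad
\operatorname{vec}(X) \approx ( D_2 \otimes D_1 )\, \operatorname{vec}(B)
```

Each column of the Kronecker dictionary is a product of one atom from every mode, which is why atoms from different mode dictionaries jointly determine the sparsity of $\mathcal{B}$.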
Abstract:
This PhD research proposes new machine learning techniques to improve human action recognition based on local features. Several novel video representation and classification techniques are proposed to increase performance with lower computational complexity. The major contributions are new feature representation techniques based on advanced machine learning methods such as multiple-instance dictionary learning, Latent Dirichlet Allocation (LDA) and sparse coding. A binary-tree-based classification technique is also proposed to deal with large numbers of action categories. These techniques not only improve classification accuracy with constrained computational resources but are also robust to challenging environmental conditions. They can be easily extended to a wide range of video applications to provide near real-time performance.
Abstract:
In this paper, we propose an anomaly detection algorithm based on the Histogram of Oriented Motion Vectors (HOMV) [1] in a sparse representation framework. Usual behavior is learned at each location by sparsely representing the HOMVs over learnt normal feature bases obtained using an online dictionary learning algorithm. Finally, an anomaly is detected based on the likelihood of the occurrence of sparse coefficients at that location. The proposed approach is found to be robust compared to existing methods, as demonstrated in experiments on the UCSD Ped1 and UCSD Ped2 datasets.
Abstract:
The Internet has revolutionised the way individuals communicate. We are witnessing the birth and development of an era characterised by the availability of free information accessible to all. In recent years, thanks to the spread of smartphones, tablets and other types of connected devices, the focus of innovation has shifted from people to objects. This is how the concept of the Internet of Things was born, a term used to describe the communication network created among the various devices connected to the Internet and capable of interacting autonomously. The application areas of the Internet of Things range from home automation to healthcare, from environmental monitoring to the concept of smart cities, and so on. The main goal of this discipline is to improve people's lives through systems that can interact without the need for human intervention. Precisely because of the heterogeneous nature of the discipline and its diverse application areas, the Internet of Things can run into problems arising from the presence of different technologies or heterogeneous ways of storing data. In this regard, the concept of the collaborative Internet of Things is introduced, a term that denotes the goal of building applications that can guarantee interoperability among the different ecosystems and among the different sources the Internet of Things draws on, exploiting the presence of Open Data publishing platforms. The goal of this thesis was to create a system for aggregating data from two platforms, ThingSpeak and Sparkfun, in order to unify them in a single database and extract meaningful information from the data using two Data Mining techniques: Dictionary Learning and Affinity Propagation. The two methodologies, which fall respectively among classification and clustering techniques, are illustrated.
Abstract:
Image super-resolution is defined as a class of techniques that enhance the spatial resolution of images. Super-resolution methods can be subdivided into single-image and multi-image methods. This thesis focuses on developing algorithms based on mathematical theories for single-image super-resolution problems. Indeed, in order to estimate an output image, we adopt a mixed approach: i.e., we use both a dictionary of patches with sparsity constraints (typical of learning-based methods) and regularization terms (typical of reconstruction-based methods). Although the existing methods already perform well, they do not take into account the geometry of the data to: regularize the solution, cluster data samples (samples are often clustered using algorithms with the Euclidean distance as a dissimilarity metric), learn dictionaries (they are often learned using PCA or K-SVD). Thus, state-of-the-art methods still suffer from shortcomings. In this work, we proposed three new methods to overcome these deficiencies. First, we developed SE-ASDS (a structure tensor based regularization term) in order to improve the sharpness of edges. SE-ASDS achieves much better results than many state-of-the-art algorithms. Then, we proposed AGNN and GOC algorithms for determining a local subset of training samples from which a good local model can be computed for reconstructing a given input test sample, where we take into account the underlying geometry of the data. AGNN and GOC methods outperform spectral clustering, soft clustering, and geodesic distance based subset selection in most settings. Next, we proposed the aSOB strategy, which takes into account the geometry of the data and the dictionary size. The aSOB strategy outperforms both PCA and PGA methods. Finally, we combine all our methods in a unique algorithm, named G2SR. Our proposed G2SR algorithm shows better visual and quantitative results when compared to the results of state-of-the-art methods.
Abstract:
Nanotechnology has revolutionised humanity's capability in building microscopic systems by manipulating materials on a molecular and atomic scale. Nanosystems are becoming increasingly smaller and more complex from the chemical perspective, which increases the demand for microscopic characterisation techniques. Among others, transmission electron microscopy (TEM) is an indispensable tool that is increasingly used to study the structures of nanosystems down to the molecular and atomic scale. However, despite the effectiveness of this tool, it can only provide 2-dimensional projection (shadow) images of the 3D structure, leaving the 3-dimensional information hidden, which can lead to incomplete or erroneous characterisation. One very promising inspection method is Electron Tomography (ET), which is rapidly becoming an important tool to explore the 3D nano-world. ET provides (sub-)nanometer resolution in all three dimensions of the sample under investigation. However, the fidelity of the ET tomogram that is achieved by current ET reconstruction procedures remains a major challenge. This thesis addresses the assessment and advancement of electron tomographic methods to enable high-fidelity three-dimensional investigations. A quality assessment investigation was conducted to provide a quantitative quality analysis of the main established ET reconstruction algorithms and to study the influence of the experimental conditions on the quality of the reconstructed ET tomogram. Regular shaped nanoparticles were used as a ground truth for this study. It is concluded that the fidelity of the post-reconstruction quantitative analysis and segmentation is limited mainly by the fidelity of the reconstructed ET tomogram. This motivates the development of an improved tomographic reconstruction process. In this thesis, a novel ET method is proposed, named dictionary learning electron tomography (DLET).
DLET is based on the recent mathematical theory of compressed sensing (CS), which employs the sparsity of ET tomograms to enable accurate reconstruction from undersampled (S)TEM tilt series. DLET learns the sparsifying transform (dictionary) in an adaptive way and reconstructs the tomogram simultaneously from highly undersampled tilt series. In this method, the sparsity is applied on overlapping image patches, favouring local structures. Furthermore, the dictionary is adapted to the specific tomogram instance, thereby favouring better sparsity and consequently higher quality reconstructions. The reconstruction algorithm is based on an alternating procedure that learns the sparsifying dictionary and employs it to remove artifacts and noise in one step, and then restores the tomogram data in the other step. Simulation and real ET experiments of several morphologies are performed with a variety of setups. Reconstruction results validate its efficiency in both noiseless and noisy cases and show that it yields an improved reconstruction quality with fast convergence. The proposed method enables the recovery of high-fidelity information without the need to worry about what sparsifying transform to select or whether the images used strictly follow the pre-conditions of a certain transform (e.g. strictly piecewise constant for Total Variation minimisation). This can also avoid artifacts that can be introduced by specific sparsifying transforms (e.g. the staircase artifacts that may result when using Total Variation minimisation). Moreover, this thesis shows how reliable elementally sensitive tomography using EELS is possible with the aid of both appropriate use of dual electron energy loss spectroscopy (DualEELS) and the DLET compressed sensing algorithm, to make the best use of the limited data volume and signal to noise inherent in core-loss electron energy loss spectroscopy (EELS) from nanoparticles of an industrially important material.
Taken together, the results presented in this thesis demonstrate how high-fidelity ET reconstructions can be achieved using a compressed sensing approach.
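One generic way to write a patch-based, dictionary-learning reconstruction with the alternating structure described above is shown below. It is an assumed illustrative formulation, not necessarily the exact DLET objective: $x$ is the tomogram, $R_j$ extracts the $j$-th overlapping patch, $D$ is the dictionary, $\alpha_j$ the sparse code of patch $j$, $A$ the projection operator, $y$ the measured tilt series, and $\nu$, $T$ are hypothetical tuning parameters.

```latex
\min_{x,\, D,\, \{\alpha_j\}} \; \sum_{j} \bigl\| R_j x - D \alpha_j \bigr\|_2^2 \;+\; \nu \, \bigl\| A x - y \bigr\|_2^2
\qquad \text{s.t. } \| \alpha_j \|_0 \le T \ \ \forall j
% Step 1 (x fixed): update D and \{\alpha_j\}; the patches D\alpha_j act as a denoised, artifact-reduced estimate.
% Step 2 (D, \{\alpha_j\} fixed): update x by least squares, restoring consistency with the measured data y.
```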
Abstract:
Object recognition has long been a core problem in computer vision. To improve object spatial support and speed up object localization for object recognition, generating high-quality category-independent object proposals as the input for object recognition systems has drawn attention recently. Given an image, we generate in advance a limited number of high-quality, category-independent object proposals, which are used as inputs for many computer vision tasks. We present an efficient dictionary-based model for the image classification task. We further extend the work to a discriminative dictionary learning method for tensor sparse coding. In the first part, a multi-scale greedy-based object proposal generation approach is presented. Based on the multi-scale nature of objects in images, our approach is built on top of a hierarchical segmentation. We first identify the representative and diverse exemplar clusters within each scale. Object proposals are obtained by selecting a subset from the multi-scale segment pool via maximizing a submodular objective function, which consists of a weighted coverage term, a single-scale diversity term and a multi-scale reward term. The weighted coverage term forces the selected set of object proposals to be representative and compact; the single-scale diversity term encourages choosing segments from different exemplar clusters so that they will cover as many object patterns as possible; the multi-scale reward term encourages the selected proposals to be discriminative and selected from multiple layers generated by the hierarchical image segmentation. The experimental results on the Berkeley Segmentation Dataset and PASCAL VOC2012 segmentation dataset demonstrate the accuracy and efficiency of our object proposal model. Additionally, we validate our object proposals in simultaneous segmentation and detection and outperform the state of the art.
To classify the object in the image, we design a discriminative, structural low-rank framework for image classification. We use a supervised learning method to construct a discriminative and reconstructive dictionary. By introducing an ideal regularization term, we perform low-rank matrix recovery for contaminated training data from all categories simultaneously without losing structural information. A discriminative low-rank representation for images with respect to the constructed dictionary is obtained. With semantic structure information and strong identification capability, this representation is good for classification tasks even using a simple linear multi-classifier.
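The subset selection described in the first part of this abstract relies on greedily maximizing a submodular function. The sketch below shows that greedy scheme for a coverage-style term only (a facility-location function over pairwise similarities); the diversity and multi-scale reward terms are omitted, and the function name, the similarity matrix and the budget are hypothetical.

```python
def greedy_select(similarity, budget):
    """Greedily pick up to `budget` items maximising sum_i max_{j in S} similarity[i][j]."""
    n = len(similarity)
    selected = []
    best_cover = [0.0] * n  # how well each item is covered by the current selection
    for _ in range(min(budget, n)):
        best_gain, best_j = 0.0, None
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain of adding candidate j to the current selection.
            gain = sum(max(similarity[i][j] - best_cover[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        if best_j is None:  # no candidate improves the objective
            break
        selected.append(best_j)
        best_cover = [max(best_cover[i], similarity[i][best_j]) for i in range(n)]
    return selected


# Four segments forming two similar pairs: {0, 1} and {2, 3}.
similarity = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]
proposals = greedy_select(similarity, budget=2)
```

Because marginal gains shrink as the selection grows, the second pick comes from the pair not yet covered, so the chosen set is representative without being redundant.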
Abstract:
Cross-domain and cross-modal matching has many applications in the field of computer vision and pattern recognition. A few examples are heterogeneous face recognition and cross-view action recognition. This is a very challenging task since the data in two domains can differ significantly. In this work, we propose a coupled dictionary and transformation learning approach that models the relationship between the data in both domains. The approach learns a pair of transformation matrices that map the data in the two domains in such a manner that they share common sparse representations with respect to their own dictionaries in the transformed space. The dictionaries for the two domains are learnt in a coupled manner with an additional discriminative term to ensure improved recognition performance. The dictionaries and the transformation matrices are jointly updated in an iterative manner. The applicability of the proposed approach is illustrated by evaluating its performance on different challenging tasks: face recognition across pose, illumination and resolution, heterogeneous face recognition and cross-view action recognition. Extensive experiments on five datasets, namely CMU-PIE, Multi-PIE, ChokePoint, HFB and IXMAS, and comparisons with several state-of-the-art approaches show the effectiveness of the proposed approach. (C) 2015 Elsevier B.V. All rights reserved.