21 results for Feature space

in Aston University Research Archive


Relevance:

100.00%

Publisher:

Abstract:

Principal component analysis (PCA) is well established in dimensionality reduction, and kernel PCA (KPCA) has also been proposed for statistical data analysis. However, KPCA fails to detect the nonlinear structure of data well when outliers exist. To mitigate this problem, this paper presents a novel algorithm, named iterative robust KPCA (IRKPCA). IRKPCA works well in dealing with outliers, and can be carried out in an iterative manner, which makes it suitable for processing incremental input data. As in traditional robust PCA (RPCA), a binary field is employed to characterize the outlier process, and the optimization problem is formulated as maximizing the marginal distribution of a Gibbs distribution. In this paper, this optimization problem is solved by stochastic gradient descent techniques. In IRKPCA, the outlier process is in a high-dimensional feature space, and therefore the kernel trick is used. IRKPCA can be regarded as a kernelized version of RPCA and a robust form of the kernel Hebbian algorithm. Experimental results on synthetic data demonstrate the effectiveness of IRKPCA. © 2010 Taylor & Francis.
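For orientation, the following is a minimal sketch of plain (non-robust) kernel PCA, the baseline that the abstract says breaks down under outliers. It is not the IRKPCA algorithm. The Gaussian kernel, the power-iteration solver, the function names and the toy data are all assumptions made for illustration.

```python
import math
import random


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def centered_kernel_matrix(points, gamma=1.0):
    """Kernel matrix centred in feature space: Kc = K - 1K - K1 + 1K1."""
    n = len(points)
    k = [[rbf_kernel(p, q, gamma) for q in points] for p in points]
    row_mean = [sum(row) / n for row in k]
    total_mean = sum(row_mean) / n
    # K is symmetric, so its column means equal its row means.
    return [[k[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]


def top_component(kc, iters=500, seed=0):
    """Leading eigenpair of the centred kernel matrix by power iteration.

    Assumes kc is not the zero matrix (i.e. the points are not all identical).
    """
    rng = random.Random(seed)
    n = len(kc)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    eigval = 0.0
    for _ in range(iters):
        w = [sum(kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        eigval = norm  # kc is positive semi-definite, so |lambda| = lambda
    return eigval, v


def project_training_points(kc):
    """Coordinates of the training points on the first kernel principal axis."""
    eigval, v = top_component(kc)
    return [math.sqrt(eigval) * x for x in v]


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated clusters in the plane.
    points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (3.0, 3.0), (3.1, 3.0), (3.0, 3.1)]
    print(project_training_points(centered_kernel_matrix(points)))
```

The computation touches the data only through kernel evaluations, which is the kernel trick the abstract refers to. A single far-away point changes every entry of the centred matrix, which gives some intuition for why a robust variant is of interest.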

Relevance:

60.00%

Publisher:

Abstract:

This paper surveys the context of feature extraction by neural network approaches, and compares and contrasts their behaviour as prospective data visualisation tools in a real world problem. We also introduce and discuss a hybrid approach which allows us to control the degree of discriminatory and topographic information in the extracted feature space.

Relevance:

60.00%

Publisher:

Abstract:

Magnification factors specify the extent to which the area of a small patch of the latent (or 'feature') space of a topographic mapping is magnified on projection to the data space, and are of considerable interest in both neuro-biological and data analysis contexts. Previous attempts to consider magnification factors for the self-organizing map (SOM) algorithm have been hindered because the mapping is only defined at discrete points (given by the reference vectors). In this paper we consider the batch version of SOM, for which a continuous mapping can be defined, as well as the Generative Topographic Mapping (GTM) algorithm of Bishop et al. (1997) which has been introduced as a probabilistic formulation of the SOM. We show how the techniques of differential geometry can be used to determine magnification factors as continuous functions of the latent space coordinates. The results are illustrated here using a problem involving the identification of crab species from morphological data.
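As a rough illustration of the quantity involved: for a smooth mapping from a two-dimensional latent space into data space with Jacobian J, the local area magnification is sqrt(det(J^T J)). The sketch below evaluates it numerically. The paraboloid mapping is a hypothetical stand-in, not a trained SOM or GTM, and the finite-difference Jacobian is an assumption made to keep the example short.

```python
import math


def jacobian(mapping, latent, eps=1e-6):
    """Central-difference Jacobian of a latent-to-data mapping.

    Returns a list of columns, one per latent coordinate.
    """
    columns = []
    for i in range(len(latent)):
        plus = list(latent)
        minus = list(latent)
        plus[i] += eps
        minus[i] -= eps
        y_plus = mapping(plus)
        y_minus = mapping(minus)
        columns.append([(a - b) / (2 * eps) for a, b in zip(y_plus, y_minus)])
    return columns


def magnification_factor(mapping, latent):
    """Area magnification sqrt(det(J^T J)) for a two-dimensional latent space."""
    c0, c1 = jacobian(mapping, latent)
    g00 = sum(a * a for a in c0)
    g11 = sum(b * b for b in c1)
    g01 = sum(a * b for a, b in zip(c0, c1))
    return math.sqrt(g00 * g11 - g01 * g01)


def paraboloid(latent):
    """Hypothetical smooth mapping from a 2-D latent space to 3-D data space."""
    u, v = latent
    return [u, v, u * u + v * v]


if __name__ == "__main__":
    for point in ([0.0, 0.0], [1.0, 0.0]):
        print(point, magnification_factor(paraboloid, point))
```

For this mapping the analytic value is sqrt(1 + 4u^2 + 4v^2), so the factor is 1 at the origin and grows where the surface is steeper. A mapping defined only at discrete reference vectors has no such Jacobian, which is the obstacle the abstract describes.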

Relevance:

60.00%

Publisher:

Abstract:

Using methods of statistical physics, we investigate the generalization performance of support vector machines (SVMs), which have been recently introduced as a general alternative to neural networks. For nonlinear classification rules, the generalization error saturates on a plateau when the number of examples is too small to properly estimate the coefficients of the nonlinear part. When trained on simple rules, we find that SVMs overfit only weakly. The performance of SVMs is strongly enhanced when the distribution of the inputs has a gap in feature space.

Relevance:

60.00%

Publisher:

Abstract:

This paper presents a novel approach to water pollution detection from remotely sensed, low-platform mounted, visible band camera images. We examine the feasibility of unsupervised segmentation for labelling slick regions (oily spills on the water surface). Adaptive and non-adaptive filtering is combined with density modelling of the obtained textural features. Particular effort is concentrated on textural feature extraction from raw intensity images using filter banks, and on adaptive feature extraction from the obtained output coefficients. Segmentation in the extracted feature space is achieved using Gaussian mixture models (GMM).

Relevance:

60.00%

Publisher:

Abstract:

A domain-independent ICA-based watermarking method is introduced and studied by numerical simulations. This approach can be used on images, music or video to convey a hidden message. It relies on embedding the information in a set of statistically independent sources (the independent components) as the feature space. For the experiments, the medium was arbitrarily chosen to be digital images.

Relevance:

60.00%

Publisher:

Abstract:

A domain independent ICA-based approach to watermarking is presented. This approach can be used on images, music or video to embed either a robust or fragile watermark. In the case of robust watermarking, the method shows high information rate and robustness against malicious and non-malicious attacks, while keeping a low induced distortion. The fragile watermarking scheme, on the other hand, shows high sensitivity to tampering attempts while keeping the requirement for high information rate and low distortion. The improved performance is achieved by employing a set of statistically independent sources (the independent components) as the feature space and principled statistical decoding methods. The performance of the suggested method is compared to other state of the art approaches. The paper focuses on applying the method to digitized images although the same approach can be used for other media, such as music or video.

Relevance:

60.00%

Publisher:

Abstract:

The assessment of the reliability of systems which learn from data is a key issue to investigate thoroughly before the actual application of information processing techniques to real-world problems. Over the recent years Gaussian processes and Bayesian neural networks have come to the fore and in this thesis their generalisation capabilities are analysed from theoretical and empirical perspectives. Upper and lower bounds on the learning curve of Gaussian processes are investigated in order to estimate the amount of data required to guarantee a certain level of generalisation performance. In this thesis we analyse the effects on the bounds and the learning curve induced by the smoothness of stochastic processes described by four different covariance functions. We also explain the early, linearly-decreasing behaviour of the curves and we investigate the asymptotic behaviour of the upper bounds. The effect of the noise and the characteristic lengthscale of the stochastic process on the tightness of the bounds are also discussed. The analysis is supported by several numerical simulations. The generalisation error of a Gaussian process is affected by the dimension of the input vector and may be decreased by input-variable reduction techniques. In conventional approaches to Gaussian process regression, the positive definite matrix estimating the distance between input points is often taken diagonal. In this thesis we show that a general distance matrix is able to estimate the effective dimensionality of the regression problem as well as to discover the linear transformation from the manifest variables to the hidden-feature space, with a significant reduction of the input dimension. 
Numerical simulations confirm the significant superiority of the general distance matrix with respect to the diagonal one. In the thesis we also present an empirical investigation of the generalisation errors of neural networks trained by two Bayesian algorithms, the Markov Chain Monte Carlo method and the evidence framework; the neural networks have been trained on the task of labelling segmented outdoor images.
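The contrast between a diagonal and a general distance matrix can be sketched as follows. The sketch assumes a squared-exponential covariance with the distance matrix factored as M = A^T A; the abstract does not state which of its four covariance functions is used with the general matrix, and the function names and the example matrix A are hypothetical.

```python
import math


def project(a, x):
    """Apply the linear map A (given as a list of rows) to the input vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def general_se_covariance(x1, x2, a):
    """Squared-exponential covariance with distance matrix M = A^T A.

    (x1 - x2)^T M (x1 - x2) equals the squared Euclidean distance between
    A x1 and A x2, so the covariance depends on the inputs only through the
    hidden features A x.
    """
    z1, z2 = project(a, x1), project(a, x2)
    sq_dist = sum((p - q) ** 2 for p, q in zip(z1, z2))
    return math.exp(-0.5 * sq_dist)


if __name__ == "__main__":
    # Hypothetical rank-one A: three manifest variables, one hidden feature
    # equal to the sum of the first two inputs; the third input is ignored.
    a = [[1.0, 1.0, 0.0]]
    print(general_se_covariance([1.0, 0.0, 5.0], [0.0, 1.0, -2.0], a))
    print(general_se_covariance([0.0, 0.0, 0.0], [1.0, 1.0, 0.0], a))
```

With a diagonal M each input dimension can only be rescaled, whereas a general M with fewer rows in A than input dimensions expresses a linear projection onto a lower-dimensional hidden-feature space, which is the reduction the abstract describes.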

Relevance:

60.00%

Publisher:

Abstract:

The joint sentiment-topic (JST) model was previously proposed to detect sentiment and topic simultaneously from text. The only supervision required by JST model learning is domain-independent polarity word priors. In this paper, we modify the JST model by incorporating word polarity priors through modifying the topic-word Dirichlet priors. We study the polarity-bearing topics extracted by JST and show that by augmenting the original feature space with polarity-bearing topics, the in-domain supervised classifiers learned from the augmented feature representation achieve state-of-the-art performance of 95% on the movie review data and an average of 90% on the multi-domain sentiment dataset. Furthermore, using feature augmentation and selection according to the information gain criteria for cross-domain sentiment classification, our proposed approach performs better than or comparably to previous approaches. Nevertheless, our approach is much simpler and does not require difficult parameter tuning.

Relevance:

60.00%

Publisher:

Abstract:

This paper addresses the security of a specific class of common watermarking methods based on dither modulation-quantisation index modulation (DM-QIM), focusing on watermark-only attacks (WOA). The vulnerabilities of and probable attacks on lattice structure based watermark embedding methods have been presented in the literature. DM-QIM is one of the best known lattice structure based watermarking techniques. In this paper, the authors discuss a watermark-only attack scenario (the attacker has access to a single watermarked content only). In the literature it is assumed that DM-QIM methods are secure against WOA. However, the authors show that the DM-QIM based embedding method is vulnerable to a guided key guessing attack that exploits subtle statistical regularities in the feature space embeddings for time series and images. Using a distribution-free algorithm, this paper presents an analysis of the attack and numerical results for multiple examples of image and time series data.
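To make the embedding scheme concrete, here is a minimal scalar sketch of dither-modulation QIM: each bit selects one of two quantisation lattices offset by half a step, and a secret dither shifts both. It illustrates the embedding and decoding only and does not reproduce the attack analysed in the paper. The step size, dither value and function names are assumptions chosen for illustration.

```python
def embed_bit(x, bit, step, dither):
    """Quantise x onto the lattice for `bit`: step * k + dither + bit * step / 2."""
    offset = dither + bit * step / 2.0
    return step * round((x - offset) / step) + offset


def decode_bit(y, step, dither):
    """Return the bit whose shifted lattice lies closest to the received value."""
    def distance(bit):
        return abs(y - embed_bit(y, bit, step, dither))
    return 0 if distance(0) <= distance(1) else 1


if __name__ == "__main__":
    # Hypothetical host samples, message bits and secret dither.
    host = [0.37, 1.92, -2.4, 5.05]
    bits = [1, 0, 1, 1]
    step, dither = 1.0, 0.23
    marked = [embed_bit(x, b, step, dither) for x, b in zip(host, bits)]
    print(marked)
    print([decode_bit(y, step, dither) for y in marked])
```

In this simplified scalar form every watermarked value lies at the dither plus a multiple of half the step, so the marked samples alone carry structure related to the key. That is offered only as intuition for why lattice-based embeddings can leak information in a watermark-only setting.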

Relevance:

60.00%

Publisher:

Abstract:

Acute life-threatening events are mostly predictable in adults and children. Despite real-time monitoring, these events still occur at a rate of 4%. This paper describes an automated prediction system based on feature space embedding and time series forecasting methods applied to the SpO2 signal, a pulsatile signal synchronised with the heartbeat. We develop an age-independent index of abnormality that distinguishes patient-specific transitions from normal to abnormal physiology. Two different methods were used to distinguish between normal and abnormal physiological trends based on SpO2 behaviour. The abnormality index derived by each method is compared against the current gold standard of clinical prediction of critical deterioration. Copyright © 2013 Inderscience Enterprises Ltd.

Relevance:

60.00%

Publisher:

Abstract:

Protein-DNA interactions are an essential feature in the genetic activities of life, and the ability to predict and manipulate such interactions has applications in a wide range of fields. This thesis presents methods for modelling the properties of protein-DNA interactions. In particular, it investigates methods for visualising and predicting the specificity of the DNA-binding Cys2His2 zinc finger interaction. The Cys2His2 zinc finger proteins interact via their individual fingers with base pair subsites on the target DNA. Four key residue positions on the α-helix of the zinc fingers make non-covalent interactions with the DNA with sequence specificity. Mutating these key residues generates combinatorial possibilities that could potentially bind to any DNA segment of interest. Many attempts have been made to predict the binding interaction using structural and chemical information, but with only limited success. The most important contribution of the thesis is that the developed model allows the binding properties of a given protein-DNA binding to be visualised in relation to other protein-DNA combinations without having to explicitly physically model the specific protein molecule and specific DNA sequence. To prove this, various databases were generated, including a synthetic database which includes all possible combinations of the DNA-binding Cys2His2 zinc finger interactions. NeuroScale, a topographic visualisation technique, is exploited to represent the geometric structures of the protein-DNA interactions by measuring dissimilarity between the data points. In order to verify the effect of visualisation on understanding the binding properties of the DNA-binding Cys2His2 zinc finger interaction, various prediction models are constructed by using both the high dimensional original data and the represented data in low dimensional feature space. 
Finally, novel data sets are studied through the selected visualisation models based on the experimental DNA-zinc finger protein database. The result of the NeuroScale projection shows that different dissimilarity representations give distinctive structural groupings that nevertheless cluster in biologically interesting ways. This method can be used to forecast the physicochemical properties of novel proteins, which may be beneficial for therapeutic purposes involving genome targeting in general.

Relevance:

60.00%

Publisher:

Abstract:

In recent years, learning word vector representations has attracted much interest in Natural Language Processing. Word representations or embeddings learned using unsupervised methods help address the problem of traditional bag-of-words approaches, which fail to capture contextual semantics. In this paper we go beyond vector representations at the word level and propose a novel framework that learns higher-level feature representations of n-grams, phrases and sentences using a deep neural network built from stacked Convolutional Restricted Boltzmann Machines (CRBMs). These representations have been shown to map syntactically and semantically related n-grams to nearby locations in the hidden feature space. We additionally experimented with incorporating these higher-level features into supervised classifier training for two sentiment analysis tasks: subjectivity classification and sentiment classification. Our results demonstrate the success of the proposed framework, with a 4% improvement in accuracy for subjectivity classification and improved results for sentiment classification over models trained without our higher-level features.

Relevance:

60.00%

Publisher:

Abstract:

Acute life-threatening events such as cardiac/respiratory arrests are often predictable in adults and children. However, critical events such as unplanned extubations are considered unpredictable. This paper seeks to evaluate the ability of automated prediction systems based on feature space embedding and time series methods to predict unplanned extubations in paediatric intensive care patients. We try to exploit the trends in physiological signals such as heart rate, respiratory rate, systolic blood pressure and oxygen saturation levels in the blood, using signal processing aspects of a frame-based approach that expands signals in a nonorthogonal basis derived from the data. We investigate the significance of the trends in a computerised prediction system. The results are compared with clinical observations of predictability. We conclude by investigating whether the prediction capability of the system could be exploited to prevent future unplanned extubations. © 2014 IEEE.