621 results for Dimensionality
Abstract:
© 2015 John P. Cunningham and Zoubin Ghahramani. Linear dimensionality reduction methods are a cornerstone of analyzing high dimensional data, due to their simple geometric interpretations and typically attractive computational properties. These methods capture many data features of interest, such as covariance, dynamical structure, correlation between data sets, input-output relationships, and margin between data classes. Methods have been developed with a variety of names and motivations in many fields, and perhaps as a result the connections between all these methods have not been highlighted. Here we survey methods from this disparate literature as optimization programs over matrix manifolds. We discuss principal component analysis, factor analysis, linear multidimensional scaling, Fisher's linear discriminant analysis, canonical correlations analysis, maximum autocorrelation factors, slow feature analysis, sufficient dimensionality reduction, undercomplete independent component analysis, linear regression, distance metric learning, and more. This optimization framework gives insight to some rarely discussed shortcomings of well-known methods, such as the suboptimality of certain eigenvector solutions. Modern techniques for optimization over matrix manifolds enable a generic linear dimensionality reduction solver, which accepts as input data and an objective to be optimized, and returns, as output, an optimal low-dimensional projection of the data. This simple optimization framework further allows straightforward generalizations and novel variants of classical methods, which we demonstrate here by creating an orthogonal-projection canonical correlations analysis. More broadly, this survey and generic solver suggest that linear dimensionality reduction can move toward becoming a blackbox, objective-agnostic numerical technology.
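The idea of treating a linear method as an optimization program over a matrix manifold can be sketched in the smallest possible case. The following is a toy illustration, not the authors' solver: it assumes 2-D data and a single projection direction, so the feasible set is just the unit circle, and a brute-force angle search stands in for a real manifold optimizer. The function names and the sample points are hypothetical.

```python
import math

def projected_variance(points, theta):
    """Variance of 2-D points projected onto the unit vector at angle theta."""
    ux, uy = math.cos(theta), math.sin(theta)
    proj = [x * ux + y * uy for x, y in points]
    mean = sum(proj) / len(proj)
    return sum((p - mean) ** 2 for p in proj) / len(proj)

def pca_angle(points):
    """Closed-form angle of the leading eigenvector of the 2x2 covariance matrix."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)

def search_angle(points, steps=3600):
    """Stand-in for a generic solver: brute-force search over unit directions."""
    candidates = [math.pi * k / steps for k in range(steps)]
    return max(candidates, key=lambda t: projected_variance(points, t))

# Hypothetical toy data, roughly aligned with the diagonal.
pts = [(0.0, 0.0), (1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]

print("variance at eigenvector direction:", projected_variance(pts, pca_angle(pts)))
print("variance at searched direction:   ", projected_variance(pts, search_angle(pts)))
```

For the PCA variance objective the eigenvector direction and the searched direction should agree up to the grid resolution. Swapping in a different objective function in `search_angle` is the toy analogue of the objective-agnostic solver the abstract describes.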
Abstract:
Semisupervised dimensionality reduction has been attracting much attention because it not only uses labeled and unlabeled data simultaneously but also works well in the out-of-sample setting. This paper proposes an effective approach to semisupervised dimensionality reduction through label propagation and label regression. Unlike previous efforts, the new approach propagates label information from labeled to unlabeled data with a well-designed random-walk mechanism, in which outliers are effectively detected and the resulting virtual labels of unlabeled data can be well encoded in a weighted regression model. These virtual labels are then regressed with a linear model to calculate the projection matrix for dimensionality reduction. In this way, when the manifold or clustering assumption holds for the data, the labels of the labeled data can be correctly propagated to the unlabeled data; thus, the proposed approach uses the labeled and unlabeled data more effectively than previous work. Experiments are carried out on several databases, and the advantage of the new approach is well demonstrated.
Abstract:
The Gaussian process latent variable model (GP-LVM) has been identified as an effective probabilistic approach for dimensionality reduction because it can obtain a low-dimensional manifold of a data set in an unsupervised fashion. However, the GP-LVM is insufficient for supervised learning tasks (e.g., classification and regression) because it ignores the class label information for dimensionality reduction. In this paper, a supervised GP-LVM is developed for supervised learning tasks, and the maximum a posteriori algorithm is introduced to estimate the positions of all samples in the latent variable space. We present experimental evidence suggesting that the supervised GP-LVM is able to use the class label information effectively, and thus it consistently outperforms the GP-LVM and the discriminative extension of the GP-LVM. A comparison with some supervised classification methods, such as Gaussian process classification and support vector machines, is also given to illustrate the advantage of the proposed method.
Abstract:
R. Jensen and Q. Shen. Semantics-Preserving Dimensionality Reduction: Rough and Fuzzy-Rough Based Approaches. IEEE Transactions on Knowledge and Data Engineering, 16(12): 1457-1471. 2004.
Abstract:
Ecological stability is touted as a complex and multifaceted concept, including components such as variability, resistance, resilience, persistence and robustness. Even though a complete appreciation of the effects of perturbations on ecosystems requires the simultaneous measurement of these multiple components of stability, most ecological research has focused on one or a few of those components analysed in isolation. Here, we present a new view of ecological stability that recognises explicitly the non-independence of components of stability. This provides an approach for simplifying the concept of stability. We illustrate the concept and approach using results from a field experiment, and show that the effective dimensionality of ecological stability is considerably lower than if the various components of stability were unrelated. However, strong perturbations can modify, and even decouple, relationships among individual components of stability. Thus, perturbations not only increase the dimensionality of stability but they can also alter the relationships among components of stability in different ways. Studies that focus on single forms of stability in isolation therefore risk underestimating significantly the potential of perturbations to destabilise ecosystems. In contrast, application of the multidimensional stability framework that we propose gives a far richer understanding of how communities respond to perturbations.
Abstract:
A novel non-linear dimensionality reduction method, called Temporal Laplacian Eigenmaps, is introduced to efficiently process time series data. In this embedding-based approach, temporal information is intrinsic to the objective function, which produces descriptions of low-dimensional spaces with time coherence between data points. Since the proposed scheme also includes bidirectional mapping between data and embedded spaces and automatic tuning of key parameters, it offers the same benefits as mapping-based approaches. Experiments on a couple of computer vision applications demonstrate the superiority of the new approach over other dimensionality reduction methods in terms of accuracy. Moreover, its lower computational cost and generalisation abilities suggest it is scalable to larger datasets. © 2010 IEEE.
Abstract:
In many applications in applied statistics, researchers reduce the complexity of a data set by combining a group of variables into a single measure using factor analysis or an index number. We argue that such compression loses information if the data actually has high dimensionality. We advocate the use of a non-parametric estimator commonly used in physics (the Takens estimator) to estimate the correlation dimension of the data prior to compression. The advantage of this approach over traditional linear data compression approaches is that the data does not have to be linearized. Applying our ideas to the United Nations Human Development Index, we find that the four variables used in its construction have dimension three and that the index loses information.
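To make the estimator concrete, here is a minimal sketch of the Takens estimate of correlation dimension, written as the negative reciprocal of the mean of ln(r / r0) over all pairwise distances r below a cutoff r0. The cutoff values, the function name and the synthetic data are assumptions chosen for illustration; they are not the settings or the Human Development Index data used in the paper.

```python
import math
import random

def takens_dimension(points, r0):
    """Takens estimate of correlation dimension from pairwise distances below r0."""
    logs = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            r = math.dist(points[i], points[j])
            if 0.0 < r < r0:
                logs.append(math.log(r / r0))
    if not logs:
        raise ValueError("no pairs closer than r0; increase r0 or add points")
    # Negative reciprocal of the mean log-ratio.
    return -len(logs) / sum(logs)

rng = random.Random(0)

# Synthetic data (an assumption for illustration): a line and a plane in 3-D.
ts = [rng.random() for _ in range(300)]
line = [(t / 3.0, 2.0 * t / 3.0, 2.0 * t / 3.0) for t in ts]
plane = [(rng.random(), rng.random(), 0.0) for _ in range(300)]

print("line estimate: ", takens_dimension(line, 0.1))
print("plane estimate:", takens_dimension(plane, 0.2))
```

Both point sets have three coordinates, yet the estimates should land near one and two respectively, which is the sense in which the number of variables can overstate the dimension of the data. The estimate depends on the choice of r0 and is biased slightly downward by edge effects in bounded samples.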
Abstract:
A reliable and valid instrument is needed to screen for depression in palliative patients. The interRAI Depression Rating Scale (DRS) is based on seven items in the interRAI Palliative Care instrument. This study is the first to explore the dimensionality, reliability and validity of the DRS in a palliative population. Palliative home care patients (n = 5,175) residing in Ontario (Canada) were assessed with the interRAI Palliative Care instrument. Exploratory factor analysis and Mokken scale analysis were used to identify candidate conceptual models and evaluate scale homogeneity/performance. Confirmatory factor analysis compared models using standard goodness-of-fit indices. Convergent and divergent validity were investigated by examining polychoric correlations between the DRS and other items. The “known groups” test determined whether the DRS meaningfully distinguished among client subgroups. The non-hierarchical two-factor model showed acceptable fit with the data, and ordinal alpha coefficients of 0.83 and 0.82 were observed for the two DRS subscales. Omega hierarchical (ωh) was 0.78 for the bifactor model, with the general factor explaining three quarters of the common variance. Despite the multidimensionality evident in the factor analyses, bifactor modelling and the Mokken homogeneity coefficient (0.34) suggest that the DRS is a coherent scale that captures important information on sub-constructs of depression (e.g., somatic symptoms). Higher correlations were seen between the DRS and mood and psychosocial well-being items, and lower correlations with functional status and demographic variables. The DRS distinguished in the expected manner for known risk factors (e.g., social support, pain). The results suggest that the DRS is primarily unidimensional and reliable for use in screening for depression in palliative care patients.
Abstract:
A criticism of consociational power sharing as an institutional response to violent conflict is that it buttresses rather than ameliorates the underlying (linguistic, religious or ethno-national) divide, hence prohibiting the emergence of new dimensions of political competition (such as economic left-right or moral liberal-conservative dimensions) that are characteristic of 'normal' societies. We test this argument in the context of the illustrative Northern Ireland case, using data from expert coding of party policy documents and opinion data derived from two Voter Advice Applications (VAAs). We find evidence for a moral liberal-conservative dimension of politics in addition to the ethno-national dimension. Hence, we caution against assuming that consociational polities are uni-dimensional.
Abstract:
This report explores how recurrent neural networks can be exploited for learning high-dimensional mappings. Since recurrent networks are as powerful as Turing machines, an interesting question is how recurrent networks can be used to simplify the problem of learning from examples. The main problem with learning high-dimensional functions is the curse of dimensionality, which roughly states that the number of examples needed to learn a function increases exponentially with input dimension. This thesis proposes a way of avoiding this problem by using a recurrent network to decompose a high-dimensional function into many lower-dimensional functions connected in a feedback loop.
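The exponential growth behind the curse of dimensionality can be shown with simple arithmetic. The sketch below assumes one common way of counting, namely covering the input space with a regular grid that has a fixed number of sample points per axis; the function name and the choice of ten points per axis are illustrative assumptions, not taken from the report.

```python
def grid_samples_needed(points_per_axis: int, dim: int) -> int:
    """Number of grid points needed to cover a dim-dimensional cube."""
    return points_per_axis ** dim

# Ten points per axis: the count multiplies by ten with every added dimension.
for d in (1, 2, 5, 10):
    print(d, grid_samples_needed(10, d))
```

Decomposing a high-dimensional function into several lower-dimensional ones, as the report proposes, replaces one large exponent with several small ones under this way of counting.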
Abstract:
Biological systems exhibit rich and complex behavior through the orchestrated interplay of a large array of components. It is hypothesized that separable subsystems with some degree of functional autonomy exist; deciphering their independent behavior and functionality would greatly facilitate understanding the system as a whole. Discovering and analyzing such subsystems are hence pivotal problems in the quest to gain a quantitative understanding of complex biological systems. In this work, using approaches from machine learning, physics and graph theory, methods for the identification and analysis of such subsystems were developed. A novel methodology, based on a recent machine learning algorithm known as non-negative matrix factorization (NMF), was developed to discover such subsystems in a set of large-scale gene expression data. This set of subsystems was then used to predict functional relationships between genes, and this approach was shown to score significantly higher than conventional methods when benchmarked against existing databases. Moreover, a mathematical treatment was developed to treat simple network subsystems based only on their topology (independent of particular parameter values). Application to a problem of experimental interest demonstrated the need for extensions to the conventional model to fully explain the experimental data. Finally, the notion of a subsystem was evaluated from a topological perspective. A number of different protein networks were examined to analyze their topological properties with respect to separability, seeking to find separable subsystems. These networks were shown to exhibit separability in a nonintuitive fashion, while the separable subsystems were of strong biological significance. It was demonstrated that the separability property found was not due to incomplete or biased data, but is likely to reflect biological structure.
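For readers unfamiliar with NMF, the following is a minimal sketch of the standard Lee-Seung multiplicative updates for the squared-error objective, which factor a non-negative matrix V into non-negative factors W and H. It illustrates plain NMF only and is not the thesis's methodology; the helper names, the small matrix `V`, the rank and the iteration counts are all hypothetical.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates; factors stay non-negative throughout."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

# Hypothetical non-negative data: 4 "genes" by 3 "conditions".
V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.5], [3.0, 1.0, 2.0], [0.5, 3.0, 1.0]]
W, H = nmf(V, rank=2)
print("reconstruction error:", frob_error(V, W, H))
```

In an expression-data setting, each column of W would be read as one candidate subsystem and each row of H as its activity across conditions, though that interpretation is an assumption here rather than something the abstract specifies.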
Abstract:
Functional Data Analysis (FDA) deals with samples where a whole function is observed for each individual. A particular case of FDA is when the observed functions are density functions, which are also an example of infinite-dimensional compositional data. In this work we compare several methods for dimensionality reduction for this particular type of data: functional principal components analysis (PCA) with or without a previous data transformation, and multidimensional scaling (MDS) for different inter-density distances, one of them taking into account the compositional nature of density functions. The different methods are applied to both artificial and real data (household income distributions).