6 results for Low-dimensional systems

at Duke University


Relevance:

90.00%

Publisher:

Abstract:

Subspaces and manifolds are two powerful models for high-dimensional signals. Subspaces model linear correlation and are a good fit to signals generated by physical systems, such as frontal images of human faces and multiple sources impinging on an antenna array. Manifolds model sources that are not linearly correlated, but where signals are determined by a small number of parameters. Examples are images of human faces under different poses or expressions, and handwritten digits with varying styles. However, there will always be some degree of model mismatch between the subspace or manifold model and the true statistics of the source. This dissertation exploits subspace and manifold models as prior information in various signal processing and machine learning tasks.

A near-low-rank Gaussian mixture model measures proximity to a union of linear or affine subspaces. This simple model can effectively capture the signal distribution when each class is near a subspace. This dissertation studies how the pairwise geometry between these subspaces affects classification performance. When model mismatch is vanishingly small, the probability of misclassification is determined by the product of the sines of the principal angles between subspaces. When the model mismatch is more significant, the probability of misclassification is determined by the sum of the squares of the sines of the principal angles. Reliability of classification is derived in terms of the distribution of signal energy across principal vectors. Larger principal angles lead to smaller classification error, motivating a linear transform that optimizes principal angles. This linear transformation, termed TRAIT, also preserves some specific features in each class, being complementary to a recently developed Low Rank Transform (LRT). Moreover, when the model mismatch is more significant, TRAIT shows superior performance compared to LRT.
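The two geometric quantities named above, the product of the sines of the principal angles and the sum of their squares, can be computed directly from orthonormal bases. The following is a minimal sketch, not the dissertation's code: the function name `principal_angles` and the two example planes in R^4 are assumptions chosen for illustration.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians, ascending) between the column spans of A and B."""
    # Orthonormal bases for the two subspaces.
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # The singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

# Hypothetical example: two planes in R^4 tilted by pi/6 and pi/3.
a, b = np.pi / 6, np.pi / 3
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0],
              [0.0, 0.0]])
B = np.array([[np.cos(a), 0.0],
              [0.0, np.cos(b)],
              [np.sin(a), 0.0],
              [0.0, np.sin(b)]])

theta = principal_angles(A, B)
sines = np.sin(theta)
product_of_sines = float(np.prod(sines))         # quantity cited for vanishing mismatch
sum_of_squared_sines = float(np.sum(sines**2))   # quantity cited for larger mismatch
```

Both quantities grow as the subspaces move apart, which is consistent with the statement that larger principal angles lead to smaller classification error.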

The manifold model enforces a constraint on the freedom of data variation. Learning features that are robust to data variation is very important, especially when the training set is small. A learning machine with a large number of parameters, e.g., a deep neural network, can describe a very complicated data distribution well. However, it is also more likely to be sensitive to small perturbations of the data, and to suffer from degraded performance when generalizing to unseen (test) data.

From the perspective of the complexity of function classes, such a learning machine has a huge capacity (complexity) and therefore tends to overfit. The manifold model provides a way of regularizing the learning machine so as to reduce the generalization error and thereby mitigate overfitting. Two different approaches to preventing overfitting are proposed, one from the perspective of data variation, the other from capacity/complexity control. In the first approach, the learning machine is encouraged to make decisions that vary smoothly for data points in local neighborhoods on the manifold. In the second approach, a graph adjacency matrix is derived for the manifold, and the learned features are encouraged to be aligned with the principal components of this adjacency matrix. Experimental results on benchmark datasets are presented, showing a clear advantage of the proposed approaches when the training set is small.

Stochastic optimization makes it possible to track a slowly varying subspace underlying streaming data. By approximating local neighborhoods using affine subspaces, a slowly varying manifold can be efficiently tracked as well, even with corrupted and noisy data. The more local neighborhoods are used, the better the approximation, but the higher the computational complexity. A multiscale approximation scheme is proposed, where the local approximating subspaces are organized in a tree structure. Splitting and merging of the tree nodes then allows efficient control of the number of neighborhoods. The deviation of each datum from the learned model is estimated, yielding a series of statistics for anomaly detection. This framework extends the classical changepoint detection technique, which only works for one-dimensional signals. Simulations and experiments highlight the robustness and efficacy of the proposed approach in detecting an abrupt change in an otherwise slowly varying low-dimensional manifold.
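To illustrate what tracking a subspace by stochastic optimization can look like, here is a minimal sketch using Oja's rule, a standard stochastic-gradient update for the leading principal direction. It is a swapped-in textbook technique, not necessarily the algorithm used in the dissertation, and it tracks a fixed one-dimensional subspace rather than a slowly varying one. The names `oja_step`, `u`, and `eta`, and the noise level, are assumptions made for the example.

```python
import numpy as np

def oja_step(w, x, eta):
    """One Oja update of a unit-norm direction estimate w given a new sample x."""
    w = w + eta * x * (x @ w)      # stochastic gradient step on the projected variance
    return w / np.linalg.norm(w)   # renormalize to keep w on the unit sphere

rng = np.random.default_rng(0)
d = 5
u = np.zeros(d)
u[0] = 1.0                         # hypothetical true direction of the signal subspace

w = rng.standard_normal(d)         # random initial estimate
w /= np.linalg.norm(w)

for _ in range(2000):
    # Streaming sample: a random coefficient along u plus small isotropic noise.
    x = rng.standard_normal() * u + 0.05 * rng.standard_normal(d)
    w = oja_step(w, x, eta=0.05)

alignment = abs(float(w @ u))      # close to 1 when the estimate matches the true direction
```

The residual of each sample after projection onto the estimate, `x - w * (w @ x)`, is one simple choice of per-datum deviation that could feed a detection statistic.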

Relevance:

90.00%

Publisher:

Abstract:

At the jamming transition, amorphous packings are known to display anomalous vibrational modes with a density of states (DOS) that remains constant at low frequency. The scaling of the DOS at higher packing fractions remains, however, unclear. One might expect to find a simple Debye scaling, but recent results from effective medium theory and the exact solution of mean-field models both predict an anomalous, non-Debye scaling. Being mean-field in nature, however, these solutions are only strictly valid in the limit of infinite spatial dimension, and it is unclear what value they have for finite-dimensional systems. Here, we study packings of soft spheres in dimensions 3 through 7 and find, away from jamming, a universal non-Debye scaling of the DOS that is consistent with the mean-field predictions. We also consider how the soft mode participation ratio evolves as dimension increases.

Relevance:

80.00%

Publisher:

Abstract:

Constant advances in technology have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true for analyzing biological data. For example, DNA sequence data can be viewed as categorical variables, with each nucleotide taking one of four categories. Gene expression data, depending on the quantification technology, could be continuous numbers or counts. With the advancement of high-throughput technology, such data have become unprecedentedly abundant. Efficient statistical approaches are therefore crucial in this big data era.

Previous statistical methods for big data often aim to find low-dimensional structures in the observed data. For example, a factor analysis model assumes a latent Gaussian-distributed multivariate vector. With this assumption, a factor model produces a low-rank estimate of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, which assumes mixture proportions of topics represented by a Dirichlet-distributed variable. This dissertation proposes several novel extensions to these previous statistical methods, developed to address challenges in big data. The novel methods are applied in multiple real-world applications, including construction of condition-specific gene co-expression networks, estimation of shared topics among newsgroups, analysis of promoter sequences, analysis of political-economic risk data, and estimation of population structure from genotype data.
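The low-rank covariance mentioned for factor analysis follows from the standard model x = Λz + ε with z ~ N(0, I_k) and ε ~ N(0, Ψ), Ψ diagonal, which gives Cov(x) = ΛΛᵀ + Ψ. The sketch below builds such a covariance and checks the rank of its structured part. The dimensions, the random loadings, and the noise variance are assumptions for illustration and are not taken from the dissertation.

```python
import numpy as np

rng = np.random.default_rng(0)
p, k = 6, 2                                  # observed dimension and number of latent factors

loadings = rng.standard_normal((p, k))       # factor loading matrix (Lambda), hypothetical values
noise_var = np.full(p, 0.1)                  # diagonal of the idiosyncratic noise covariance (Psi)

low_rank_part = loadings @ loadings.T        # rank at most k
covariance = low_rank_part + np.diag(noise_var)

rank_of_low_rank_part = int(np.linalg.matrix_rank(low_rank_part))
```

The model therefore describes a p-by-p covariance with p*k + p parameters instead of p*(p + 1)/2, which is the saving that matters when p is large.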

Relevance:

80.00%

Publisher:

Abstract:

Intriguing lattice dynamics has been predicted for aperiodic crystals that contain incommensurate substructures. Here we report inelastic neutron scattering measurements of phonon and magnon dispersions in Sr14Cu24O41, which contains incommensurate one-dimensional (1D) chain and two-dimensional (2D) ladder substructures. Two distinct acoustic phonon-like modes, corresponding to the sliding motion of one sublattice against the other, are observed for atomic motions polarized along the incommensurate axis. In the long wavelength limit, it is found that the sliding mode shows a remarkably small energy gap of 1.7-1.9 meV, indicating very weak interactions between the two incommensurate sublattices. The measurements also reveal a gapped and steep linear magnon dispersion of the ladder sublattice. The high group velocity of this magnon branch and weak coupling with acoustic phonons can explain the large magnon thermal conductivity in Sr14Cu24O41 crystals. In addition, the magnon specific heat is determined from the measured total specific heat and phonon density of states, and exhibits a Schottky anomaly due to gapped magnon modes of the spin chains. These findings offer new insights into the phonon and magnon dynamics and thermal transport properties of incommensurate magnetic crystals that contain low-dimensional substructures.

Relevance:

80.00%

Publisher:

Abstract:

Bayesian methods offer a flexible and convenient probabilistic learning framework to extract interpretable knowledge from complex and structured data. Such methods can characterize dependencies among multiple levels of hidden variables and share statistical strength across heterogeneous sources. In the first part of this dissertation, we develop two dependent variational inference methods for full posterior approximation in non-conjugate Bayesian models through hierarchical mixture- and copula-based variational proposals, respectively. The proposed methods move beyond the widely used factorized approximation to the posterior and provide generic applicability to a broad class of probabilistic models with minimal model-specific derivations. In the second part of this dissertation, we design probabilistic graphical models to accommodate multimodal data, describe dynamical behaviors and account for task heterogeneity. In particular, the sparse latent factor model is able to reveal common low-dimensional structures from high-dimensional data. We demonstrate the effectiveness of the proposed statistical learning methods on both synthetic and real-world data.

Relevance:

40.00%

Publisher:

Abstract:

The ability of diffuse reflectance spectroscopy to extract the quantitative biological composition of tissues has been used to discern tissue types in both pre-clinical and clinical cancer studies. Typically, diffuse reflectance spectroscopy systems are designed for single-point measurements. Clinically, an imaging system would provide valuable spatial information on tissue composition. While it is feasible to build a multiplexed fiber-optic probe-based spectral imaging system, such systems suffer from drawbacks with respect to cost and size. To address these drawbacks, we developed a compact, low-cost system using a broadband light source with an 8-slot filter wheel for illumination and silicon photodiodes for detection. The spectral imaging system was tested on a set of tissue-mimicking liquid phantoms, which yielded an optical property extraction accuracy of 6.40 ± 7.78% for the absorption coefficient (μa) and 11.37 ± 19.62% for the wavelength-averaged reduced scattering coefficient (μs′).