3 results for Antenna Array
at Duke University
Abstract:
Subspaces and manifolds are two powerful models for high-dimensional signals. Subspaces model linear correlation and are a good fit to signals generated by physical systems, such as frontal images of human faces and multiple sources impinging on an antenna array. Manifolds model sources that are not linearly correlated, but where signals are determined by a small number of parameters. Examples are images of human faces under different poses or expressions, and handwritten digits with varying styles. However, there will always be some degree of model mismatch between the subspace or manifold model and the true statistics of the source. This dissertation exploits subspace and manifold models as prior information in various signal processing and machine learning tasks.
A near-low-rank Gaussian mixture model measures proximity to a union of linear or affine subspaces. This simple model can effectively capture the signal distribution when each class is near a subspace. This dissertation studies how the pairwise geometry between these subspaces affects classification performance. When model mismatch is vanishingly small, the probability of misclassification is determined by the product of the sines of the principal angles between subspaces. When the model mismatch is more significant, the probability of misclassification is determined by the sum of the squares of the sines of the principal angles. Reliability of classification is derived in terms of the distribution of signal energy across principal vectors. Larger principal angles lead to smaller classification error, motivating a linear transform that optimizes principal angles. This linear transformation, termed TRAIT, also preserves some specific features in each class, being complementary to a recently developed Low Rank Transform (LRT). Moreover, when the model mismatch is more significant, TRAIT shows superior performance compared to LRT.
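The two geometric quantities named above, the product of the sines of the principal angles and the sum of their squares, can be computed directly from orthonormal bases of the subspaces. The following is a minimal sketch using NumPy; the function names and the example subspaces are illustrative assumptions, and the code only evaluates these quantities, not the misclassification probability itself.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians, ascending) between the column spans of A and B."""
    Qa, _ = np.linalg.qr(A)  # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)  # orthonormal basis for span(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

def product_of_sines(theta):
    """Quantity tied to the error when model mismatch is vanishingly small."""
    return float(np.prod(np.sin(theta)))

def sum_of_squared_sines(theta):
    """Quantity tied to the error when model mismatch is more significant."""
    return float(np.sum(np.sin(theta) ** 2))

# Hypothetical example: two planes in R^4 separated by angles of 30 and 60 degrees.
a, b = np.pi / 6, np.pi / 3
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0],
              [0.0, 0.0]])
B = np.array([[np.cos(a), 0.0],
              [0.0, np.cos(b)],
              [np.sin(a), 0.0],
              [0.0, np.sin(b)]])

theta = principal_angles(A, B)
print(np.degrees(theta), product_of_sines(theta), sum_of_squared_sines(theta))
```

Note that the product of sines collapses to zero as soon as the subspaces share a single direction, whereas the sum of squared sines degrades gradually.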
The manifold model enforces a constraint on the freedom of data variation. Learning features that are robust to data variation is very important, especially when the training set is small. A learning machine with a large number of parameters, e.g., a deep neural network, can describe a very complicated data distribution well. However, it is also more likely to be sensitive to small perturbations of the data, and to suffer from degraded performance when generalizing to unseen (test) data.
From the perspective of the complexity of function classes, such a learning machine has a huge capacity (complexity) and therefore tends to overfit. The manifold model provides a way of regularizing the learning machine so as to reduce the generalization error and thereby mitigate overfitting. Two approaches to preventing overfitting are proposed, one from the perspective of data variation, the other from that of capacity/complexity control. In the first approach, the learning machine is encouraged to make decisions that vary smoothly for data points in local neighborhoods on the manifold. In the second approach, a graph adjacency matrix is derived for the manifold, and the learned features are encouraged to be aligned with the principal components of this adjacency matrix. Experimental results on benchmark datasets are presented, showing a clear advantage of the proposed approaches when the training set is small.
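One common way to express "outputs vary smoothly over local neighborhoods" is a graph-Laplacian penalty built from a nearest-neighbor adjacency matrix. The sketch below is such a stand-in and is not claimed to be the regularizer used in the dissertation; the helper names, the choice of a 0/1 k-nearest-neighbor graph, and the circle data are all assumptions made for illustration.

```python
import numpy as np

def knn_adjacency(X, k):
    """Symmetric 0/1 k-nearest-neighbor adjacency matrix for the rows of X."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)               # a point is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]        # k closest points per row
    W = np.zeros((X.shape[0], X.shape[0]))
    rows = np.repeat(np.arange(X.shape[0]), k)
    W[rows, idx.ravel()] = 1.0
    return np.maximum(W, W.T)                  # symmetrize

def smoothness_penalty(F, W):
    """0.5 * sum_ij W_ij ||F_i - F_j||^2, computed as trace(F^T L F)."""
    L = np.diag(W.sum(axis=1)) - W             # unnormalized graph Laplacian
    return float(np.trace(F.T @ L @ F))

# Hypothetical data: 50 points on a circle, a one-dimensional manifold in R^2.
t = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
X = np.column_stack([np.cos(t), np.sin(t)])
W = knn_adjacency(X, k=2)

rng = np.random.default_rng(0)
F_smooth = X.copy()                            # features that follow the manifold
F_rough = rng.standard_normal(X.shape)         # features unrelated to the manifold

print(smoothness_penalty(F_smooth, W), smoothness_penalty(F_rough, W))
```

Added to a training loss with a small weight, a term of this kind penalizes features that change sharply between neighboring points, which is the intuition behind the first approach described above.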
Stochastic optimization makes it possible to track a slowly varying subspace underlying streaming data. By approximating local neighborhoods using affine subspaces, a slowly varying manifold can be efficiently tracked as well, even with corrupted and noisy data. The more local neighborhoods are used, the better the approximation, but the higher the computational complexity. A multiscale approximation scheme is proposed, where the local approximating subspaces are organized in a tree structure. Splitting and merging of the tree nodes then allows efficient control of the number of neighborhoods. The deviation of each datum from the learned model is estimated, yielding a series of statistics for anomaly detection. This framework extends the classical changepoint detection technique, which only works for one-dimensional signals. Simulations and experiments highlight the robustness and efficacy of the proposed approach in detecting an abrupt change in an otherwise slowly varying low-dimensional manifold.
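To make the tracking idea concrete, the sketch below follows a single linear subspace with a stochastic gradient step per sample and records each sample's residual as a deviation statistic. It is a simplified illustration under stated assumptions: one subspace rather than a multiscale tree of affine subspaces, a fixed step size, fully observed data, and synthetic streams. The function name `track_subspace` and all parameters are hypothetical.

```python
import numpy as np

def track_subspace(stream, dim, step=0.1, seed=0):
    """Track a dim-dimensional subspace; return the final basis and per-sample residual norms."""
    rng = np.random.default_rng(seed)
    n = stream.shape[1]
    U, _ = np.linalg.qr(rng.standard_normal((n, dim)))  # random orthonormal start
    residuals = []
    for x in stream:
        w = U.T @ x                    # coefficients of x in the current basis
        r = x - U @ w                  # part of x the current model cannot explain
        residuals.append(np.linalg.norm(r))
        # Gradient step on 0.5 * ||x - U w||^2 with w held fixed,
        # followed by QR to restore orthonormal columns.
        U, _ = np.linalg.qr(U + step * np.outer(r, w))
    return U, np.array(residuals)

# Hypothetical stream in R^10: a fixed 2-D subspace, then an abrupt switch at t = T.
rng = np.random.default_rng(1)
n, d, T = 10, 2, 1000
B1 = np.eye(n)[:, :2]                  # subspace before the change
B2 = np.eye(n)[:, 2:4]                 # subspace after the change
coeffs = rng.standard_normal((2 * T, d))
clean = np.vstack([coeffs[:T] @ B1.T, coeffs[T:] @ B2.T])
stream = clean + 0.01 * rng.standard_normal(clean.shape)

U, res = track_subspace(stream, dim=d)
print(np.median(res[T - 100:T]), res[T:T + 5].max())
```

The residual sequence `res` plays the role of the deviation statistics: it stays near the noise level while the model fits, and rises sharply once the data leave the tracked subspace. A changepoint rule such as thresholding or a cumulative-sum test could then be applied to it.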
Abstract:
The genomes of many strains of baker’s yeast, Saccharomyces cerevisiae, contain multiple repeats of the copper-binding protein Cup1. Cup1 is a member of the metallothionein family, and is found in a tandem array on chromosome VIII. In this thesis, I describe studies that characterized these tandem arrays and their mechanism of formation across diverse strains of yeast. I show that CUP1 arrays are an illuminating model system for observing recombination in eukaryotes, and describe insights derived from these observations.
In our first study, we analyzed 101 natural isolates of S. cerevisiae in order to examine the diversity of CUP1-containing repeats across different strains. We identified five distinct classes of repeats that contain CUP1. We also showed that some strains have only a single copy of CUP1. By comparing the sequences of all the strains, we were able to elucidate the mechanism of formation of the CUP1 tandem arrays, which involved unequal non-homologous recombination events starting from a strain that had only a single CUP1 gene. Our observation of CUP1 repeat formation allows more general insights about the formation of tandem repeats from single-copy genes in eukaryotes, which is one of the most important mechanisms by which organisms evolve.
In our second study, we delved deeper into our mechanistic investigations by measuring the relative rates of inter-homolog and intra-/inter-sister chromatid recombination in CUP1 tandem arrays. We used a diploid strain that is heterozygous both for insertion of a selectable marker (URA3) inside the tandem array, and also for markers at either end of the array. The intra-/inter-sister chromatid recombination rate turned out to be more than ten-fold greater than the inter-homolog rate. Moreover, we found that loss of the proteins Rad51 and Rad52, which are required for most inter-homolog recombination, did not greatly reduce recombination in the CUP1 tandem repeats. Additionally, we investigated the effects of elevated copper levels on the rate of each type of recombination at the CUP1 locus. Both types of recombination are increased at high concentrations of copper (as is known to be the case for CUP1 transcription). Furthermore, the inter-homolog recombination rate at the CUP1 locus is higher than the average over the genome during mitosis, but is lower than the average during meiosis.
The research described in Chapter 2 was published in 2014.