2 results for distributed combination of classifiers
in DRUM (Digital Repository at the University of Maryland)
Abstract:
An increasing focus in evolutionary biology is on the interplay between mesoscale ecological and evolutionary processes, such as population demographics, habitat tolerance, and especially geographic distribution, as potential drivers of patterns of diversification and extinction over geologic time. However, few studies to date connect organismal processes such as survival and reproduction through mesoscale patterns to long-term macroevolutionary trends. In my dissertation, I investigate how the mechanism of seed dispersal, mediated through geographic range size, influences diversification rates in the Rosales (Plantae: Anthophyta). In my first chapter, I validate the phylogenetic comparative methods that I use in my second and third chapters. Available state speciation and extinction (SSE) models make assumptions about evolution that fossil data show to be false. I show, however, that as long as net diversification rates remain positive, a condition likely true for the Rosales, these violations of SSE’s assumptions do not cause significantly biased results. With SSE methods validated, my second chapter reconstructs three associations that appear to increase diversification rate for Rosalean genera: (1) herbaceous habit; (2) a three-way interaction combining animal dispersal, high within-genus species richness, and geographic range on multiple continents; (3) a four-way interaction combining woody habit with the other three characteristics of (2). I suggest that the three- and four-way interactions represent colonization ability and resulting extinction resistance in the face of late Cenozoic climate change; however, there are other possibilities as well that I hope to investigate in future research. My third chapter reconstructs the phylogeographic history of the Rosales using both non-fossil-assisted SSE methods and fossil-informed traditional phylogeographic analysis.
Ancestral state reconstructions indicate that the Rosaceae diversified in North America while the other Rosalean families diversified elsewhere, possibly in Eurasia. SSE is able to successfully identify groups of genera that were likely to have been ancestrally widespread, but has poorer taxonomic resolution than methods that use fossil data. In conclusion, these chapters together suggest several potential causal links between organismal, mesoscale, and geologic scale processes, but further work will be needed to test the hypotheses that I raise here.
Abstract:
Finding rare events in multidimensional data is an important detection problem that has applications in many fields, such as risk estimation in the insurance industry, finance, flood prediction, medical diagnosis, quality assurance, security, or safety in transportation. The occurrence of such anomalies is so infrequent that there is usually not enough training data to learn an accurate statistical model of the anomaly class. In some cases, such events may have never been observed, so the only information that is available is a set of normal samples and an assumed pairwise similarity function. Such a metric may be known only up to a certain number of unspecified parameters, which would need to be either learned from training data or fixed by a domain expert. Sometimes, the anomalous condition may be formulated algebraically, such as a measure exceeding a predefined threshold, but nuisance variables may complicate the estimation of such a measure. Change detection methods used in time series analysis are not easily extendable to the multidimensional case, where discontinuities are not localized to a single point. On the other hand, in higher dimensions, data exhibits more complex interdependencies, and there is redundancy that could be exploited to adaptively model the normal data. In the first part of this dissertation, we review the theoretical framework for anomaly detection in images and previous anomaly detection work done in the context of crack detection and detection of anomalous components in railway tracks. In the second part, we propose new anomaly detection algorithms. The fact that curvilinear discontinuities in images are sparse with respect to the frame of shearlets allows us to pose this anomaly detection problem as basis pursuit optimization. Therefore, we pose the problem of detecting curvilinear anomalies in noisy textured images as a blind source separation problem under sparsity constraints, and propose an iterative shrinkage algorithm to solve it.
Taking advantage of the parallel nature of this algorithm, we describe how this method can be accelerated using graphics processing units (GPUs). Then, we propose a new method for finding defective components on railway tracks using cameras mounted on a train. We describe how to extract features and use a combination of classifiers to solve this problem. Then, we scale anomaly detection to bigger datasets with complex interdependencies. We show that the anomaly detection problem naturally fits in the multitask learning framework. The first task consists of learning a compact representation of the good samples, while the second task consists of learning the anomaly detector. Using deep convolutional neural networks, we show that it is possible to train a deep model with a limited number of anomalous examples. In sequential detection problems, the presence of time-variant nuisance parameters affects the detection performance. In the last part of this dissertation, we present a method for adaptively estimating the threshold of sequential detectors using Extreme Value Theory in a Bayesian framework. Finally, conclusions on the results obtained are provided, followed by a discussion of possible future work.