2 results for model selection in binary regression

in DRUM (Digital Repository at the University of Maryland)


Relevance:

100.00% 100.00%

Publisher:

Abstract:

An increasing focus in evolutionary biology is on the interplay between mesoscale ecological and evolutionary processes, such as population demographics, habitat tolerance, and especially geographic distribution, as potential drivers of patterns of diversification and extinction over geologic time. However, few studies to date connect organismal processes such as survival and reproduction through mesoscale patterns to long-term macroevolutionary trends. In my dissertation, I investigate how the mechanism of seed dispersal, mediated through geographic range size, influences diversification rates in the Rosales (Plantae: Anthophyta). In my first chapter, I validate the phylogenetic comparative methods that I use in my second and third chapters. Available state speciation and extinction (SSE) models make assumptions about evolution that are known from fossil data to be false. I show, however, that as long as net diversification rates remain positive, a condition likely true for the Rosales, these violations of SSE's assumptions do not cause significantly biased results. With SSE methods validated, my second chapter reconstructs three associations that appear to increase diversification rate for Rosalean genera: (1) herbaceous habit; (2) a three-way interaction combining animal dispersal, high within-genus species richness, and geographic range on multiple continents; (3) a four-way interaction combining woody habit with the other three characteristics of (2). I suggest that the three- and four-way interactions represent colonization ability and resulting extinction resistance in the face of late Cenozoic climate change; however, there are other possibilities that I hope to investigate in future research. My third chapter reconstructs the phylogeographic history of the Rosales using both non-fossil-assisted SSE methods and fossil-informed traditional phylogeographic analysis. 
Ancestral state reconstructions indicate that the Rosaceae diversified in North America while the other Rosalean families diversified elsewhere, possibly in Eurasia. SSE is able to successfully identify groups of genera that were likely to have been ancestrally widespread, but has poorer taxonomic resolution than methods that use fossil data. In conclusion, these chapters together suggest several potential causal links between organismal, mesoscale, and geologic scale processes, but further work will be needed to test the hypotheses that I raise here.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The size of online image datasets is constantly increasing. For a dataset with millions of images, image retrieval becomes a seemingly intractable problem for exhaustive similarity search algorithms. Hashing methods, which encode high-dimensional descriptors into compact binary strings, have become very popular because of their high efficiency in search and storage. In the first part, we propose a multimodal retrieval method based on latent feature models. The procedure consists of a nonparametric Bayesian framework for learning underlying semantically meaningful abstract features in a multimodal dataset, a probabilistic retrieval model that allows cross-modal queries, and an extension model for relevance feedback. In the second part, we focus on supervised hashing with kernels. We describe a flexible hashing procedure that treats binary codes and pairwise semantic similarity as latent and observed variables, respectively, in a probabilistic model based on Gaussian processes for binary classification. We present a scalable inference algorithm that uses the sparse pseudo-input Gaussian process (SPGP) model and distributed computing. In the last part, we define an incremental hashing strategy for dynamic databases to which new images are frequently added. The method is based on a two-stage classification framework using binary and multi-class SVMs. The proposed method also enforces balance in the binary codes through an imbalance penalty, which yields higher-quality codes. We learn hash functions with an efficient algorithm in which the NP-hard problem of finding optimal binary codes is solved via cyclic coordinate descent and the SVMs are trained in a parallelized incremental manner. For modifications such as adding images from an unseen class, we propose an incremental procedure for effective and efficient updates to the previous hash functions. 
Experiments on three large-scale image datasets demonstrate that the incremental strategy is capable of efficiently updating hash functions to the same retrieval performance as hashing from scratch.
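
The general idea of encoding descriptors as compact binary strings and searching by Hamming distance can be sketched as follows. This is only an illustration and not the method of the abstract above: it swaps in random-hyperplane hashing (an unsupervised, data-independent scheme) for the learned hash functions, and the names make_hyperplanes, encode, hamming and search are hypothetical helpers introduced here.

```python
import random


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random Gaussian hyperplanes (assumption: stand-in for learned hash functions)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def encode(vector, hyperplanes):
    """Map a real-valued descriptor to a binary code stored as a Python int.

    Each bit records on which side of one hyperplane the descriptor falls.
    """
    code = 0
    for plane in hyperplanes:
        dot = sum(w * x for w, x in zip(plane, vector))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def search(query_code, database_codes, k=1):
    """Return the indices of the k database codes closest to the query in Hamming distance."""
    order = sorted(range(len(database_codes)),
                   key=lambda i: hamming(query_code, database_codes[i]))
    return order[:k]


if __name__ == "__main__":
    rng = random.Random(1)
    planes = make_hyperplanes(n_bits=16, dim=8)
    database = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(100)]
    codes = [encode(v, planes) for v in database]
    # Query with a descriptor that is already in the database.
    print(search(encode(database[0], planes), codes, k=3))
```

Each 8-dimensional descriptor is reduced to a 16-bit integer here, so comparing two items costs one XOR and a bit count rather than a full floating-point distance. That is the source of the search and storage efficiency the abstract refers to.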