923 results for model selection in binary regression
Abstract:
Thesis (Master's)--University of Washington, 2016-06
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-08
Abstract:
This master's thesis studies the cross-validation criterion for choosing among small area models. The study is restricted to unit-level small area models. The basic small area model was introduced by Battese, Harter and Fuller in 1988. It is a linear mixed regression model with a random intercept, involving several parameters: the fixed-effect parameter β, the random component, and the variances of the residual error. The model of Battese et al. is used, in a survey setting, to predict the mean of a variable of interest y in each small area using an administrative auxiliary variable x known for the entire population. The estimation method uses a normal distribution to model the residual component of the model. Allowing a general residual dependence, that is, one other than the normal distribution, yields a more flexible methodology. This generalization leads to a new class of exchangeable models: the generalization lies in the modelling of the residual dependence, which may be either normal (the case of the Battese et al. model) or non-normal. The objective is to estimate the small area parameters as precisely as possible, which hinges on choosing the right residual dependence to use in the model. The cross-validation criterion is studied for this purpose.
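The model-selection idea at the heart of this abstract can be illustrated with a toy example: leave-one-out cross-validation choosing between an intercept-only model and a simple linear regression. This is a minimal pure-Python sketch with made-up data; the thesis applies the criterion to unit-level mixed models with different residual dependences, which this sketch does not attempt.

```python
import random

random.seed(0)

# Toy data: y depends linearly on x plus noise (hypothetical values).
x = [i / 10 for i in range(30)]
y = [2.0 + 1.5 * xi + random.gauss(0, 0.3) for xi in x]

def fit_slope(xs, ys):
    """Ordinary least squares for a one-predictor linear model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def loo_cv_error(xs, ys, use_slope):
    """Leave-one-out cross-validation: refit without point i, predict it."""
    err = 0.0
    for i in range(len(xs)):
        xt = xs[:i] + xs[i + 1:]
        yt = ys[:i] + ys[i + 1:]
        if use_slope:
            b0, b1 = fit_slope(xt, yt)
            pred = b0 + b1 * xs[i]
        else:
            pred = sum(yt) / len(yt)  # intercept-only model
        err += (ys[i] - pred) ** 2
    return err / len(xs)

cv_null = loo_cv_error(x, y, use_slope=False)
cv_linear = loo_cv_error(x, y, use_slope=True)
# The linear model should win here, since the data were generated with a slope.
```

The same logic scales to comparing candidate residual dependences: fit each candidate on the training folds and score its out-of-fold predictions.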
Abstract:
The spread of invasive organisms is one of the greatest threats to ecosystems and biodiversity worldwide. Understanding the evolutionary and ecological factors responsible for the transport, introduction, establishment and spread of invasive species will assist the development of control strategies. The New Zealand mudsnail, Potamopyrgus antipodarum (Gray 1843) (Gastropoda: Hydrobiidae), is a global freshwater invader, with populations established in Europe, Asia, the Americas and Australia. While sexual and asexual P. antipodarum coexist in the native range, invasive populations reproduce by parthenogenesis, producing dense populations that compete for resources with native species. Potamopyrgus antipodarum is a natural model system for the study of evolutionary and ecological processes underlying invasion. This thesis assesses the invasion history, genetic diversity and ecology of P. antipodarum in Australia, with particular focus on: a) potential source populations, b) distribution and structure of populations, and c) species traits related to the establishment, persistence and spread of invasive P. antipodarum. Genetic analyses were carried out on specimens collected for this study from New Zealand and Australia, along with existing museum samples. In combination with published data, the analyses revealed low genetic diversity among and within invasive populations in south-eastern Australia, relative to New Zealand populations. Phylogenetic relationships inferred from mitochondrial sequences indicated that the Australian populations belong to clades dominated by parthenogenetic haplotypes that are known to be present in Europe and the US. These ‘invasive clades’ are likely to originate from the North Island of New Zealand, and suggest a role for selection in determining genetic composition of invasive populations. The genotypic diversity of Australian P. antipodarum was low, with few, closely related clones distributed across south-eastern Australia. 
The pattern of clone distribution was not consistent with any of the assessed geographical or abiotic factors; instead, a few widely distributed clones were present in high frequencies at most sites. Differences in clone frequencies were found, which may indicate differential success of clonal lineages. A range of traits have been proposed as facilitators of invasion success, and within-species variation in these traits can promote differential success of genotypes. Using laboratory-based experiments, the performance of the three most common Australian clones was tested across a suite of invasion-relevant traits. Ecologically relevant variation in traits was found among the clones. These differences may have determined the spatial distribution of clones, and may continue to do so into the future. This thesis found that the P. antipodarum invasion of Australia is the result of few introductions of a small number of globally-invasive genotypes that vary in ecologically relevant traits. From a source of considerable genetic diversity in the native range, very few genotypes have become invasive. Those that are invasive appear to be very successful at continental scales. These findings highlight a capacity in asexual invaders to successfully invade, and potentially adapt to, a broad range of ecosystems. The P. antipodarum invasion system is amenable to research using combinations of field-based studies, molecular and laboratory approaches, and is likely to yield significant, broadly-applicable insights into invasion.
Abstract:
The role of computer modeling has grown recently to become an inseparable complement to experimental studies in the optimization of automotive engines and the development of future fuels. Traditionally, computer models rely on simplified global reaction steps to simulate combustion and pollutant formation inside the internal combustion engine. With the current interest in advanced combustion modes and injection strategies, this approach depends on arbitrary adjustment of model parameters that can reduce the credibility of the predictions. The purpose of this study is to enhance the combustion model of KIVA, a computational fluid dynamics code, by coupling its fluid mechanics solution with detailed kinetic reactions solved by the chemistry solver CHEMKIN. To this end, an engine-friendly reaction mechanism for n-heptane was selected to simulate diesel oxidation. Each cell in the computational domain is treated as a perfectly-stirred reactor undergoing adiabatic constant-volume combustion. The model was applied to ideally-prepared homogeneous-charge compression-ignition (HCCI) combustion and to direct-injection (DI) diesel combustion. Ignition and combustion results show that the code successfully simulates the premixed HCCI scenario when compared to traditional combustion models. Direct-injection cases, on the other hand, do not offer a reliable prediction, mainly because the perfectly-stirred reactor formulation lacks a turbulent-mixing model. In addition, the model is sensitive to intake conditions and experimental uncertainties, which requires the implementation of enhanced predictive tools. It is recommended that future improvements consider turbulent-mixing effects as well as optimization techniques to accurately simulate the actual in-cylinder process at reduced computational cost. Furthermore, the model requires the extension of existing fuel oxidation mechanisms to include pollutant formation kinetics for emission control studies.
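The cell-level chemistry described above (each cell a perfectly-stirred reactor burning adiabatically at constant volume) can be caricatured with a single-step Arrhenius reaction in place of the detailed n-heptane mechanism. All constants below are illustrative, not taken from KIVA, CHEMKIN, or the thesis:

```python
import math

# One-step global reaction F -> P with an Arrhenius rate.
# All parameter values are made up for illustration.
A, Ea, R = 1.0e9, 1.2e5, 8.314   # pre-exponential (1/s), J/mol, J/(mol K)
q, cv = 2.5e6, 1.2e3             # heat release (J/kg fuel), cv (J/(kg K))

def ignite(T0, Y0=1.0, dt=1e-6, t_end=0.05):
    """Integrate fuel mass fraction Y and temperature T in a closed,
    adiabatic, constant-volume cell (explicit Euler)."""
    T, Y, t = T0, Y0, 0.0
    while t < t_end and Y > 1e-6:
        w = A * math.exp(-Ea / (R * T)) * Y   # reaction rate (1/s)
        w = min(w, Y / dt)                    # never burn more fuel than remains
        Y -= dt * w
        T += dt * q * w / cv                  # released heat raises T
        t += dt
    return T, Y, t

# A hotter initial charge ignites and burns out sooner.
T_hot, Y_hot, t_hot = ignite(T0=1000.0)
T_cold, Y_cold, t_cold = ignite(T0=800.0)
```

Energy conservation fixes the final temperature at roughly T0 + q·Y0/cv regardless of the rate constants; only the ignition delay depends on them, which is why such models are so sensitive to intake conditions.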
Abstract:
One of the simplest models of adaptation to a new environment is Fisher's Geometric Model (FGM), in which populations move on a multidimensional landscape defined by the traits under selection. The predictions of this model have been found to be consistent with current observations of patterns of fitness increase in experimentally evolved populations. Recent studies investigated the dynamics of allele frequency change during adaptation of microbes to simple laboratory conditions and unveiled a dramatic pattern of competition between cohorts of mutations, i.e., multiple mutations simultaneously segregating and ultimately reaching fixation. Here, using simulations, we study the dynamics of phenotypic and genetic change as asexual populations under clonal interference climb a Fisherian landscape, and ask about the conditions under which FGM can display the simultaneous increase and fixation of multiple mutations (mutation cohorts) along the adaptive walk. We find that FGM under clonal interference, and with varying levels of pleiotropy, can reproduce the experimentally observed competition between different cohorts of mutations, some of which have a high probability of fixation along the adaptive walk. Overall, our results show that the surprising dynamics of mutation cohorts recently observed during experimental adaptation of microbial populations can be expected under one of the oldest and simplest theoretical models of adaptation, the FGM.
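An adaptive walk on FGM can be sketched in a few lines. The sketch below uses a Gaussian fitness peak and the sequential-fixation (strong-selection, weak-mutation) caricature, in which one mutation at a time fixes with probability proportional to its selective advantage; the full clonal-interference dynamics that the abstract studies, with multiple co-segregating lineages, is deliberately omitted. All parameter values are illustrative.

```python
import math
import random

random.seed(1)

def fitness(z):
    """Gaussian fitness peak at the origin of an n-dimensional trait space."""
    return math.exp(-sum(c * c for c in z))

def adaptive_walk(n=5, sigma=0.2, attempts=3000):
    """Sequential-fixation caricature of an adaptive walk on FGM:
    propose a random pleiotropic mutation; if beneficial, it fixes
    with probability ~ 2s (classical fixation-probability heuristic)."""
    z = [1.0] + [0.0] * (n - 1)          # start away from the optimum
    w = fitness(z)
    path = [w]                           # fitness after each fixation
    for _ in range(attempts):
        zm = [c + random.gauss(0, sigma) for c in z]
        wm = fitness(zm)
        s = wm / w - 1.0                 # selection coefficient
        if s > 0 and random.random() < 2 * s:
            z, w = zm, wm
            path.append(w)
    return path

path = adaptive_walk()
```

Because only beneficial mutations fix, the fitness trajectory is monotone and decelerating, the classic FGM pattern of diminishing-returns adaptation.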
Abstract:
We determine numerically the single-particle and the two-particle spectrum of the three-state quantum Potts model on a lattice by using the density matrix renormalization group method, and extract information on the asymptotic (small momentum) S-matrix of the quasiparticles. The low energy part of the finite size spectrum can be understood in terms of a simple effective model introduced in a previous work, and is consistent with an asymptotic S-matrix of an exchange form below a momentum scale p*. This scale appears to vanish faster than the Compton scale, mc, as one approaches the critical point, suggesting that a dangerously irrelevant operator may be responsible for the behaviour observed on the lattice.
Abstract:
BACKGROUND: Bilirubin can prevent lipid oxidation in vitro, but the association in vivo with oxidized low-density lipoprotein (Ox-LDL) levels has been poorly explored. Our aim was to assess the association of Ox-LDL with total bilirubin (TB) levels and with variables related to metabolic syndrome and inflammation in young obese individuals. FINDINGS: 125 obese patients (13.4 years; 53.6% females) were studied. TB, lipid profile including Ox-LDL, markers of glucose metabolism, and levels of C-reactive protein (CRP) and adiponectin were determined. Anthropometric data were also collected. In all patients, Ox-LDL correlated positively with BMI, total cholesterol, LDLc, triglycerides (TG), CRP, glucose, insulin and HOMAIR, and inversely with TB and the HDLc/total cholesterol ratio (P < 0.05 for all). In multiple linear regression analysis, LDLc, TG, HDLc and TB levels were significantly associated with Ox-LDL (standardized Beta: 0.656, 0.293, -0.283, -0.164, respectively; P < 0.01 for all). After removing TG and HDLc from the analysis, HOMAIR was included in the regression model. In this new model, LDLc remained the best predictor of Ox-LDL levels (β = 0.665, P < 0.001), followed by TB (β = -0.202, P = 0.002) and HOMAIR (β = 0.163, P = 0.010). CONCLUSIONS: Lower bilirubin levels may contribute to increased LDL oxidation in obese children and adolescents, predisposing to increased cardiovascular risk.
Abstract:
The size of online image datasets is constantly increasing. For an image dataset with millions of images, image retrieval becomes a seemingly intractable problem for exhaustive similarity search algorithms. Hashing methods, which encode high-dimensional descriptors into compact binary strings, have become very popular because of their efficiency in search and storage. In the first part, we propose a multimodal retrieval method based on latent feature models. The procedure consists of a nonparametric Bayesian framework for learning underlying semantically meaningful abstract features in a multimodal dataset, a probabilistic retrieval model that allows cross-modal queries, and an extension model for relevance feedback. In the second part, we focus on supervised hashing with kernels. We describe a flexible hashing procedure that treats binary codes and pairwise semantic similarity as latent and observed variables, respectively, in a probabilistic model based on Gaussian processes for binary classification. We present a scalable inference algorithm with the sparse pseudo-input Gaussian process (SPGP) model and distributed computing. In the last part, we define an incremental hashing strategy for dynamic databases where new images are added to the databases frequently. The method is based on a two-stage classification framework using binary and multi-class SVMs. The proposed method also enforces balance in binary codes via an imbalance penalty to obtain higher-quality binary codes. We learn hash functions with an efficient algorithm in which the NP-hard problem of finding optimal binary codes is solved via cyclic coordinate descent and SVMs are trained in a parallelized, incremental manner. For modifications such as adding images from an unseen class, we propose an incremental procedure for effective and efficient updates to the previous hash functions.
Experiments on three large-scale image datasets demonstrate that the incremental strategy is capable of efficiently updating hash functions to the same retrieval performance as hashing from scratch.
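The core idea of encoding descriptors into compact binary strings can be illustrated with the simplest such scheme, random-hyperplane hashing: the sign of a random projection gives one code bit, and similar descriptors collide in most bits. This is a minimal unsupervised sketch with made-up data, far simpler than the supervised Gaussian-process and SVM-based methods of the abstract.

```python
import random

random.seed(42)

D, B = 64, 16   # descriptor dimension and code length (illustrative)

# One random hyperplane per bit: the sign of the projection is the bit.
planes = [[random.gauss(0, 1) for _ in range(D)] for _ in range(B)]

def binary_code(x):
    """Encode a D-dimensional descriptor as a B-bit binary code."""
    return tuple(int(sum(p * v for p, v in zip(plane, x)) >= 0)
                 for plane in planes)

def hamming(a, b):
    """Hamming distance between two equal-length codes."""
    return sum(u != v for u, v in zip(a, b))

base = [random.gauss(0, 1) for _ in range(D)]
near = [v + random.gauss(0, 0.05) for v in base]   # small perturbation of base
far = [random.gauss(0, 1) for _ in range(D)]       # unrelated descriptor

d_near = hamming(binary_code(base), binary_code(near))
d_far = hamming(binary_code(base), binary_code(far))
```

Because comparing 16-bit codes costs a few machine instructions, retrieval over millions of images reduces to cheap Hamming-distance scans or hash-table lookups, which is the efficiency argument the abstract opens with.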
Abstract:
This thesis develops bootstrap methods for the factor models that have been widely used to generate forecasts since the pioneering article of Stock and Watson (2002) on diffusion indices. These models accommodate a large number of macroeconomic and financial variables as predictors, a useful feature for incorporating the diverse information available to economic agents. My thesis therefore proposes econometric tools that improve inference in factor models using latent factors extracted from a large panel of observed predictors. It is divided into three complementary chapters, the first two written in collaboration with Sílvia Gonçalves and Benoit Perron. In the first article, we study how bootstrap methods can be used for inference in models that forecast h periods ahead. To this end, it examines bootstrap inference in a factor-augmented regression setting where the errors may be autocorrelated. It generalizes the results of Gonçalves and Perron (2014) and proposes and justifies two residual-based approaches: the block wild bootstrap and the dependent wild bootstrap. Our simulations show improved coverage rates for confidence intervals of the estimated coefficients using these approaches, compared with asymptotic theory and the wild bootstrap, in the presence of serial correlation in the regression errors. The second chapter proposes bootstrap methods for constructing prediction intervals that relax the assumption of normality of the innovations. We propose bootstrap prediction intervals for an observation h periods ahead and for its conditional mean. We assume that these forecasts are made using a set of factors extracted from a large panel of variables. Because we treat these factors as latent, our forecasts depend both on the estimated factors and on the estimated regression coefficients. Under regularity conditions, Bai and Ng (2006) proposed the construction of asymptotic intervals under the assumption of Gaussian innovations. The bootstrap allows us to relax this assumption and to construct prediction intervals that are valid under more general assumptions. Moreover, even under Gaussianity, the bootstrap yields more accurate intervals when the cross-sectional dimension is relatively small, because it accounts for the bias of the ordinary least squares estimator, as shown in a recent study by Gonçalves and Perron (2014). In the third chapter, we propose consistent selection procedures for factor-augmented regressions in finite samples. We first show that the usual cross-validation method is inconsistent, but that its generalization, leave-d-out cross-validation, selects the smallest set of estimated factors spanning the space generated by the true factors. The second criterion, whose validity we also establish, generalizes the bootstrap approximation of Shao (1996) to factor-augmented regressions. Simulations show an improved probability of parsimoniously selecting the estimated factors compared with available selection methods. The empirical application revisits the relationship between macroeconomic and financial factors and the excess return on the US stock market. Among the factors estimated from a large panel of US macroeconomic and financial data, the factors strongly correlated with interest rate spreads and the Fama-French factors have good predictive power for excess returns.
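The wild bootstrap that the thesis builds on can be sketched in its simplest form: multiply each OLS residual by an independent Rademacher draw, rebuild the dependent variable, and re-estimate the coefficient. The sketch below uses a plain one-regressor regression with simulated heteroskedastic data; the factor-augmented setting, estimated factors, and the block/dependent variants of the thesis are omitted.

```python
import random

random.seed(3)

# Simulated heteroskedastic regression data (illustrative values).
n = 80
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.5 + 1.0 * xi + random.gauss(0, 1) * (1 + abs(xi)) for xi in x]

def ols_slope(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx

b1 = ols_slope(x, y)
b0 = sum(y) / n - b1 * sum(x) / n
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# Wild bootstrap: flip each residual's sign at random (Rademacher
# weights), rebuild y*, and re-estimate the slope.
draws = []
for _ in range(999):
    ystar = [b0 + b1 * xi + ri * random.choice((-1.0, 1.0))
             for xi, ri in zip(x, resid)]
    draws.append(ols_slope(x, ystar))

draws.sort()
lo, hi = draws[24], draws[974]   # 95% percentile interval for the slope
```

Because the weights preserve each residual's magnitude at its own observation, the bootstrap variance tracks the heteroskedasticity in the data, which is precisely why the wild bootstrap is preferred over an i.i.d. residual resample here.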
Abstract:
Background and objective: Participation in colorectal cancer (CRC) screening varies widely among different countries and different socio-demographic groups. Our objective was to assess the effectiveness of three primary-care interventions to increase CRC screening participation among persons over the age of 50 years and to identify the health and socio-demographic-related factors that determine greater participation. Methods: We conducted a randomized experimental study with only one post-test control group. A total of 1,690 subjects were randomly distributed into four groups: written briefing; telephone briefing; an invitation to attend a group meeting; and no briefing. Subjects were evaluated 2 years post-intervention, with the outcome variable being participation in CRC screening. Results: A total of 1,129 subjects were interviewed. Within the groups, homogeneity was tested in terms of socio-demographic characteristics and health-related variables. The proportion of subjects who participated in screening was: 15.4% in the written information group (95% confidence interval [CI]: 11.2-19.7); 28.8% in the telephone information group (95% CI: 23.6-33.9); 8.1% in the face-to-face information group (95% CI: 4.5-11.7); and 5.9% in the control group (95% CI: 2.9-9.0), with this difference proving statistically significant (p < 0.001). Logistic regression showed that only interventions based on written or telephone briefing were effective. Apart from type of intervention, number of reported health problems and place of residence remained in the regression model. Conclusions: Both written and telephone information can serve to improve participation in CRC screening. This preventive activity could be optimized by means of simple interventions coming within the scope of primary health-care professionals.
Abstract:
The aim of the present study was to propose and evaluate the use of factor analysis (FA) in obtaining latent variables (factors) that represent a set of pig traits simultaneously, for use in genome-wide selection (GWS) studies. We used crosses between outbred F2 populations of Brazilian Piau X commercial pigs. Data were obtained on 345 F2 pigs, genotyped for 237 SNPs, with 41 traits. FA allowed us to obtain four biologically interpretable factors: "weight", "fat", "loin", and "performance". These factors were used as dependent variables in multiple regression models of genomic selection (Bayes A, Bayes B, RR-BLUP, and Bayesian LASSO). The use of FA is presented as an interesting alternative to select individuals for multiple variables simultaneously in GWS studies; accuracy measurements of the factors were similar to those obtained when the original traits were considered individually. The similarities between the top 10% of individuals selected by the factor, and those selected by the individual traits, were also satisfactory. Moreover, the estimated markers effects for the traits were similar to those found for the relevant factor.
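The idea of summarizing a block of correlated traits with one latent variable can be illustrated with a one-factor sketch: simulate correlated "weight-like" traits driven by a latent factor, extract the leading principal component by power iteration, and check that the scores recover the latent factor. This is a PCA-style stand-in, not the proper factor analysis of the study, and all data below are hypothetical.

```python
import random

random.seed(7)

# Four correlated traits per animal, all driven by one latent factor
# (hypothetical; the study measured 41 real traits on 345 F2 pigs).
n, p = 200, 4
latent = [random.gauss(0, 1) for _ in range(n)]
X = [[0.8 * f + random.gauss(0, 0.5) for _ in range(p)] for f in latent]

mu = [sum(row[j] for row in X) / n for j in range(p)]
C = [[sum((row[j] - mu[j]) * (row[k] - mu[k]) for row in X) / (n - 1)
      for k in range(p)] for j in range(p)]

# Power iteration for the leading eigenvector of the covariance matrix.
v = [1.0] * p
for _ in range(50):
    w = [sum(C[j][k] * v[k] for k in range(p)) for j in range(p)]
    norm = sum(c * c for c in w) ** 0.5
    v = [c / norm for c in w]

# Factor score per animal: projection of centred traits on the eigenvector.
scores = [sum((row[j] - mu[j]) * v[j] for j in range(p)) for row in X]

def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sa = sum((u - ma) ** 2 for u in a) ** 0.5
    sb = sum((u - mb) ** 2 for u in b) ** 0.5
    return sum((u - ma) * (t - mb) for u, t in zip(a, b)) / (sa * sb)

r = corr(scores, latent)  # sign is arbitrary; magnitude should be high
```

The scores would then replace the raw traits as the dependent variable in a genomic-selection regression, which is exactly the substitution the study evaluates.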
Abstract:
Several deterministic and probabilistic methods are used to evaluate the probability of seismically induced soil liquefaction. Probabilistic models usually carry uncertainty both in the model itself and in the parameters used to develop it, and these model uncertainties vary from one statistical model to another. Most of them are epistemic and can be addressed through appropriate knowledge of the statistical model. One such epistemic uncertainty in evaluating liquefaction potential with a probabilistic model such as logistic regression is sampling bias: the difference between the class distribution in the sample used to develop the statistical model and the true population distribution of liquefaction and non-liquefaction instances. Recent studies have shown that sampling bias can significantly affect the probability predicted by a statistical model. To address this epistemic uncertainty, a new approach was developed for evaluating the probability of seismically-induced soil liquefaction, using a logistic regression model in combination with the Hosmer-Lemeshow statistic. This approach was used to estimate the population (true) ratio of liquefaction to non-liquefaction instances from the most up-to-date case histories based on the standard penetration test (SPT) and the cone penetration test (CPT). Other model uncertainties, such as the distribution and the significance of the explanatory variables, were addressed using the Kolmogorov-Smirnov test and the Wald statistic, respectively. Moreover, based on the estimated population distribution, logistic regression equations were proposed to calculate the probability of liquefaction for both the SPT- and CPT-based case histories. Additionally, the proposed probability curves were compared with existing probability curves based on the SPT and CPT case histories.
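The effect of sampling bias on a logistic model's output can be made concrete with the standard "prior correction" device of King and Zeng (2001): shift the intercept by the log odds-ratio of the sample versus population class frequencies. This is a generic illustration of the bias mechanism, not the Hosmer-Lemeshow-based approach the abstract describes; all numbers are made up.

```python
import math

def prior_corrected_prob(eta, sample_rate, pop_rate):
    """Correct a fitted logistic score eta for case-control style sampling
    bias by shifting the intercept with the log odds-ratio of the sample
    class frequency versus the true population class frequency
    (King & Zeng, 2001, prior correction)."""
    shift = (math.log(sample_rate / (1 - sample_rate))
             - math.log(pop_rate / (1 - pop_rate)))
    return 1 / (1 + math.exp(-(eta - shift)))

# A model trained on a 50/50 liquefaction/non-liquefaction sample,
# applied where the true liquefaction rate is only 20% (illustrative).
eta = 0.4                                  # fitted linear predictor
p_naive = 1 / (1 + math.exp(-eta))         # ignores the sampling bias
p_corr = prior_corrected_prob(eta, sample_rate=0.5, pop_rate=0.2)
```

Because liquefaction cases are over-represented in a balanced sample relative to the field, the naive probability overstates the hazard; the corrected value is pulled toward the true population rate.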
Abstract:
Planning, navigation, and search are fundamental human cognitive abilities central to spatial problem solving in search and rescue, law enforcement, and military operations. Despite a wealth of literature concerning naturalistic spatial problem solving in animals, literature on naturalistic spatial problem solving in humans is comparatively lacking and generally conducted by separate camps among which there is little crosstalk. Addressing this deficiency will allow us to predict spatial decision making in operational environments, and to understand the factors leading to those decisions. The present dissertation comprises two related efforts: (1) a set of empirical research studies intended to identify characteristics of planning, execution, and memory in naturalistic spatial problem solving tasks, and (2) a computational modeling effort to develop a model of naturalistic spatial problem solving. The results of the behavioral studies indicate that hierarchical representations of the problem space are linear in shape, and that human solutions are produced according to multiple optimization criteria. The Mixed Criteria Model presented in this dissertation accounts for global and local human performance in a traditional and a naturalistic Traveling Salesman Problem. The results of the empirical and modeling efforts hold implications for basic and applied science in domains such as problem solving, operations research, human-computer interaction, and artificial intelligence.
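One simple single-criterion baseline that human Traveling Salesman solutions are commonly compared against is the nearest-neighbor heuristic: always visit the closest unvisited city. The sketch below uses random planar "cities"; it is a generic illustration, not the dissertation's Mixed Criteria Model or its naturalistic terrain.

```python
import math
import random

random.seed(5)

# Random city layout in the unit square (illustrative).
cities = [(random.random(), random.random()) for _ in range(12)]

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def nearest_neighbor_tour(pts, start=0):
    """Greedy heuristic: repeatedly visit the closest unvisited city."""
    unvisited = set(range(len(pts))) - {start}
    tour = [start]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: dist(pts[last], pts[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def tour_length(pts, tour):
    """Total length of the closed tour (returns to the start city)."""
    return sum(dist(pts[a], pts[b]) for a, b in zip(tour, tour[1:] + tour[:1]))

tour = nearest_neighbor_tour(cities)
length = tour_length(cities, tour)
```

A mixed-criteria account would replace the single `min` distance rule with a weighted combination of objectives (e.g., distance plus region-level structure), which is the kind of extension the dissertation's model formalizes.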
Abstract:
The study analyzed hydro-climatic and land use sensitivities of stormwater runoff and quality in the complex coastal urban watershed of Miami River Basin, Florida by developing a Storm Water Management Model (EPA SWMM 5). Regression-based empirical models were also developed to explain stream water quality in relation to internal (land uses and hydrology) and external (upstream contribution, seawater) sources and drivers in six highly urbanized canal basins of Southeast Florida. Stormwater runoff and quality were most sensitive to rainfall, imperviousness, and conversion of open lands/parks to residential, commercial and industrial areas. In-stream dissolved oxygen and total phosphorus in the watersheds were dictated by internal stressors while external stressors were dominant for total nitrogen and specific conductance. The research findings and tools will be useful for proactive monitoring and management of storm runoff and urban stream water quality under the changing climate and environment in South Florida and around the world.