313 results for VARIABLE SELECTION
in Queensland University of Technology - ePrints Archive
Abstract:
The quality of species distribution models (SDMs) relies to a large degree on the quality of the input data, from bioclimatic indices to environmental and habitat descriptors (Austin, 2002). Recent reviews of SDM techniques have sought to optimize predictive performance (e.g. Elith et al., 2006). In general, SDMs employ one of three approaches to variable selection. The simplest approach relies on the expert to select the variables, as in environmental niche models (Nix, 1986) or a generalized linear model without variable selection (Miller and Franklin, 2002). A second approach explicitly incorporates variable selection into model fitting, which allows examination of particular combinations of variables. Examples include generalized linear or additive models with variable selection (Hastie et al. 2002), or classification trees with complexity or model-based pruning (Breiman et al., 1984; Zeileis, 2008). A third approach uses model averaging to summarize the overall contribution of a variable, without considering particular combinations. Examples include neural networks, boosted or bagged regression trees and Maximum Entropy, as compared in Elith et al. 2006. Typically, users of SDMs will either consider a small number of variable sets, via the first approach, or else supply all of the candidate variables (often numbering more than a hundred) to the second or third approaches. Bayesian SDMs exist, with several methods for eliciting and encoding priors on model parameters (see review in Low Choy et al. 2010). However, few methods have been published for informative variable selection; one example is Bayesian trees (O’Leary 2008). Here we report an elicitation protocol that helps make explicit a priori expert judgements on the quality of candidate variables. This protocol can be flexibly applied to any of the three approaches to variable selection described above, Bayesian or otherwise.
We demonstrate how this information can be obtained then used to guide variable selection in classical or machine learning SDMs, or to define priors within Bayesian SDMs.
Abstract:
We carried out a discriminant analysis with identity by descent (IBD) at each marker as inputs, and the sib pair type (affected-affected versus affected-unaffected) as the output. Using simple logistic regression for this discriminant analysis, we illustrate the importance of comparing models with different numbers of parameters. Such model comparisons are best carried out using either the Akaike information criterion (AIC) or the Bayesian information criterion (BIC). When AIC (or BIC) stepwise variable selection was applied to the German Asthma data set, a group of markers was selected that provides the best fit to the data (assuming an additive effect). Interestingly, these 25-26 markers were not identical to those with the highest (in magnitude) single-locus lod scores.
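As a rough illustration of how AIC and BIC trade fit against the number of parameters, the sketch below compares two candidate models. The log-likelihoods, parameter counts and sample size are invented for illustration and are not taken from the German Asthma data set; the helper names `aic` and `bic` are likewise hypothetical.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fitted models (illustrative numbers only).
n_obs = 200
small_model = {"loglik": -120.0, "k": 3}   # fewer markers, slightly worse fit
large_model = {"loglik": -115.0, "k": 10}  # more markers, slightly better fit

for name, m in (("small", small_model), ("large", large_model)):
    print(
        name,
        "AIC =", round(aic(m["loglik"], m["k"]), 2),
        "BIC =", round(bic(m["loglik"], m["k"], n_obs), 2),
    )
```

With these made-up numbers both criteria prefer the smaller model, because the improvement in log-likelihood does not offset the penalty for seven extra parameters. BIC penalizes the extra parameters more heavily than AIC whenever ln n exceeds 2.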
Abstract:
Genetic research of complex diseases is a challenging, but exciting, area of research. The early development of the research was limited, however, until the completion of the Human Genome and HapMap projects, along with the reduction in the cost of genotyping, which paved the way for understanding the genetic composition of complex diseases. In this thesis, we focus on the statistical methods for two aspects of genetic research: phenotype definition for diseases with complex etiology and methods for identifying potentially associated Single Nucleotide Polymorphisms (SNPs) and SNP-SNP interactions. With regard to phenotype definition for diseases with complex etiology, we first investigate the effects of different statistical phenotyping approaches on the subsequent analysis. In light of the findings, and the difficulties in validating the estimated phenotype, we propose two different methods for reconciling phenotypes of different models using Bayesian model averaging as a coherent mechanism for accounting for model uncertainty. In the second part of the thesis, the focus turns to the methods for identifying associated SNPs and SNP interactions. We review the use of Bayesian logistic regression with variable selection for SNP identification and extend the model to detect interaction effects in population-based case-control studies. In this part of the study, we also develop a machine learning algorithm to cope with large-scale data analysis, namely modified Logic Regression with Genetic Program (MLR-GEP), which is then compared with the Bayesian model, Random Forests and other variants of logic regression.
Abstract:
Motivated by the analysis of the Australian Grain Insect Resistance Database (AGIRD), we develop a Bayesian hurdle modelling approach to assess trends in strong resistance of stored grain insects to phosphine over time. The binary response variable from AGIRD, indicating presence or absence of strong resistance, is characterized by a majority of absence observations, and the hurdle model is a two-step approach that is useful when analyzing such a binary response dataset. The proposed hurdle model first uses Bayesian classification trees to identify covariates and covariate levels pertaining to possible presence or absence of strong resistance. Second, generalized additive models (GAMs) with spike and slab priors for variable selection are fitted to the subset of the dataset that the Bayesian classification tree identifies as indicating possible presence of strong resistance. From the GAM we assess trends, biosecurity issues and site-specific variables influencing the presence of strong resistance using a variable selection approach. The proposed Bayesian hurdle model is compared to its frequentist counterpart, and also to a naive Bayesian approach which fits a GAM to the entire dataset. The Bayesian hurdle model has the benefit of providing a set of good trees for use in the first step and appears to provide enough flexibility to represent the influence of variables on strong resistance compared to the frequentist model, but also captures the subtle changes in the trend that are missed by the frequentist and naive Bayesian models.
Abstract:
Purpose: Data from two randomized phase III trials were analyzed to evaluate prognostic factors and treatment selection in the first-line management of advanced non-small cell lung cancer patients with performance status (PS) 2. Patients and Methods: Patients randomized to combination chemotherapy (carboplatin and paclitaxel) in one trial and single-agent therapy (gemcitabine or vinorelbine) in the second were included in these analyses. Both studies had identical eligibility criteria and were conducted simultaneously. Comparison of efficacy and safety was performed between the two cohorts. A regression analysis identified prognostic factors and subgroups of patients that may benefit from combination or single-agent therapy. Results: Two hundred one patients were treated with combination and 190 with single-agent therapy. Objective responses were 37 and 15%, respectively. Median time to progression was 4.6 months in the combination arm and 3.5 months in the single-agent arm (p < 0.001). Median survival times were 8.0 and 6.6 months, and 1-year survival rates were 31 and 26%, respectively. Albumin <3.5 g, extrathoracic metastases, lactate dehydrogenase ≥200 IU, and 2 comorbid conditions predicted outcome. Patients with 0-2 risk factors had similar outcomes independent of treatment, whereas patients with 3-4 factors had a nonsignificant improvement in median survival with combination chemotherapy. Conclusion: Our results show that PS 2 non-small cell lung cancer patients are a heterogeneous group who have significantly different outcomes. Patients treated with first-line combination chemotherapy had a higher response and longer time to progression, whereas overall survival did not appear significantly different. A prognostic model may be helpful in selecting PS 2 patients for either treatment strategy. © 2009 by the International Association for the Study of Lung Cancer.
Abstract:
Ureaplasmas are the microorganisms most frequently isolated from the amniotic fluid of pregnant women and can cause chronic intrauterine infections. These tiny bacteria are thought to undergo rapid evolution and exhibit a hypermutatable phenotype; however, little is known about how ureaplasmas respond to selective pressures in utero. Using an ovine model of chronic intra-amniotic infection, we investigated if exposure of ureaplasmas to sub-inhibitory concentrations of erythromycin could induce phenotypic or genetic indicators of macrolide resistance. At 55 days gestation, 12 pregnant ewes received an intra-amniotic injection of a non-clonal, clinical U. parvum strain, followed by: (i) erythromycin treatment (IM, 30 mg/kg/day, n=6); or (ii) saline (IM, n=6) at 100 days gestation. Fetuses were then delivered surgically at 125 days gestation. Despite injecting the same inoculum into all ewes, significant differences between amniotic fluid and chorioamnion ureaplasmas were detected following chronic intra-amniotic infection. Numerous polymorphisms were observed in domain V of the 23S rRNA gene of ureaplasmas isolated from the chorioamnion (but not the amniotic fluid), resulting in a mosaic-like sequence. Chorioamnion isolates also harboured the macrolide resistance genes erm(B) and msr(D) and were associated with variable roxithromycin minimum inhibitory concentrations. Remarkably, this variability occurred independently of exposure of ureaplasmas to erythromycin, suggesting that low-level erythromycin exposure does not induce ureaplasmal macrolide resistance in utero. Rather, the significant differences observed between amniotic fluid and chorioamnion ureaplasmas suggest that different anatomical sites may select for ureaplasma sub-types within non-clonal, clinical strains. This may have implications for the treatment of intrauterine ureaplasma infections.
Abstract:
Aggressive driving has been associated with engagement in other risky driving behaviours, such as speeding, while drivers using their mobile phones have an increased crash risk, despite the tendency to reduce their speed. Research has amassed separately for mobile phone use and aggressive driving among younger drivers; however, little is known about the extent to which these behaviours may function independently and in combination to influence speed selection behaviour. The main aim of the current study was to investigate the effect of driver aggression (measured by the Driving Anger Expression Inventory) and mobile phone use on speed selection by young drivers. The CARRS-Q advanced driving simulator was used to test the speed selection of drivers aged 18 to 26 years (N = 32) in a suburban (60 kph zone) driving context. A 2 (level of driving anger expression: low, high) X 3 (mobile phone use condition: baseline, hands-free, hand-held) mixed factorial ANOVA was conducted with speed selection as the dependent variable. Results revealed a significant main effect for mobile phone use condition such that speed selection was lowest for the hand-held condition and highest for the baseline condition. Speed selection, however, was not significantly different across the levels of driving anger expression; nor was there a significant interaction effect between mobile phone use and driving anger expression. As young drivers are over-represented in road crash statistics, future research should further investigate the combined impact of driver aggression and mobile phone use on speed selection.
Abstract:
Consider a general regression model with an arbitrary and unknown link function and a stochastic selection variable that determines whether the outcome variable is observable or missing. The paper proposes U-statistics that are based on kernel functions as estimators for the directions of the parameter vectors in the link function and the selection equation, and shows that these estimators are consistent and asymptotically normal.
Abstract:
The specific mechanisms by which selective pressures affect individuals are often difficult to resolve. In tephritid fruit flies, males respond strongly and positively to certain plant-derived chemicals. Sexual selection by female choice has been hypothesized as the mechanism driving this behaviour in certain species, as females preferentially mate with males that have fed on these chemicals. This hypothesis is, to date, based on studies of only very few species and its generality is largely untested. We tested the hypothesis on different spatial scales (small cage and seminatural field-cage) using the monophagous fruit fly, Bactrocera cacuminata. This species is known to respond to methyl eugenol (ME), a chemical found in many plant species and one upon which previous studies have focused. Contrary to expectation, no obvious female choice was apparent in selecting ME-fed males over unfed males as measured by the number of matings achieved over time, copulation duration, or time of copulation initiation. However, the number of matings achieved by ME-fed males was significantly greater than that achieved by unfed males 16 and 32 days after exposure to ME in small cages (but not in a field-cage). This delayed advantage suggests that ME may not influence the pheromone system of B. cacuminata but may have other consequences, acting on some other fitness consequence (e.g., enhancement of physiology or survival) of male exposure to these chemicals. We discuss the ecological and evolutionary implications of our findings to explore alternate hypotheses to explain the patterns of response of dacine fruit flies to specific plant-derived chemicals.