971 results for Bayesian variable selection
Abstract:
This thesis presents Bayesian solutions to inference problems for three types of social network data structures: a single observation of a social network, repeated observations on the same social network, and repeated observations on a social network developing through time. A social network is conceived as a structure consisting of actors and their social interaction with each other. A common conceptualisation of social networks is to let the actors be represented by nodes in a graph, with edges between pairs of nodes that are relationally tied to each other according to some definition. Statistical analysis of social networks is to a large extent concerned with modelling these relational ties, which lends itself to empirical evaluation. The first paper deals with a family of statistical models for social networks called exponential random graphs, which take various structural features of the network into account. In general, the likelihood functions of exponential random graphs are only known up to a constant of proportionality. A procedure for performing Bayesian inference using Markov chain Monte Carlo (MCMC) methods is presented. The algorithm consists of two basic steps: one in which an ordinary Metropolis-Hastings updating step is used, and another in which an importance sampling scheme is used to calculate the acceptance probability of the Metropolis-Hastings step. In the second paper, a method for modelling reports given by actors (or other informants) on their social interaction with others is investigated in a Bayesian framework. The model contains two basic ingredients: the unknown network structure, and functions that link this unknown network structure to the reports given by the actors. These functions take the form of probit link functions. An intrinsic problem is that the model is not identified, meaning that there are combinations of values of the unknown structure and the parameters in the probit link functions that are observationally equivalent. Instead of using restrictions to achieve identification, it is proposed that the different observationally equivalent combinations of parameters and unknown structure be investigated a posteriori. Estimation of parameters is carried out using Gibbs sampling with a switching device that enables transitions between posterior modal regions. The main goal of the procedures is to provide tools for comparing different model specifications. Papers 3 and 4 propose Bayesian methods for longitudinal social networks. The premise of the models investigated is that overall change in social networks occurs as a consequence of sequences of incremental changes. Models for the evolution of social networks using continuous-time Markov chains are meant to capture these dynamics. Paper 3 presents an MCMC algorithm for exploring the posteriors of the parameters of such Markov chains. More specifically, the unobserved evolution of the network between observations is explicitly modelled, thereby avoiding the need to deal with explicit formulas for the transition probabilities. This enables likelihood-based parameter inference in a wider class of network evolution models than has been available before. Paper 4 builds on the inference procedure of Paper 3 and demonstrates how to perform model selection for a class of network evolution models.
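The two-step scheme described above can be illustrated with a minimal sketch, assuming a toy exponential random graph model with edge and triangle statistics; this is an illustration of the general idea, not the paper's exact algorithm, and all function names and tuning constants are hypothetical. The outer Metropolis-Hastings step updates the parameters, while the intractable ratio of normalising constants in the acceptance probability is estimated by importance sampling over auxiliary graphs drawn by an inner Gibbs sampler.

```python
import numpy as np
from scipy.special import logsumexp

def ergm_stats(adj):
    """Sufficient statistics of a toy ERGM: edge count and triangle count."""
    edges = adj.sum() / 2.0
    triangles = np.trace(adj @ adj @ adj) / 6.0
    return np.array([edges, triangles])

def dyad_change_stats(adj, i, j):
    """Change in (edges, triangles) when the edge (i, j) is switched on."""
    common_neighbours = np.sum(adj[i] * adj[j])
    return np.array([1.0, common_neighbours])

def sample_graph(theta, n_nodes, n_sweeps=20, rng=None):
    """Inner Gibbs sampler: draw a graph approximately from p(x | theta)."""
    rng = np.random.default_rng() if rng is None else rng
    adj = np.zeros((n_nodes, n_nodes))
    for _ in range(n_sweeps):
        for i in range(n_nodes):
            for j in range(i + 1, n_nodes):
                delta = dyad_change_stats(adj, i, j)
                p_on = 1.0 / (1.0 + np.exp(-theta @ delta))  # full conditional
                adj[i, j] = adj[j, i] = float(rng.random() < p_on)
    return adj

def log_prior(theta):
    """Vague independent normal prior on the ERGM parameters."""
    return -0.5 * theta @ theta / 100.0

def posterior_sample(x_obs, n_iter=200, n_aux=10, prop_sd=0.1, rng=None):
    """Outer Metropolis-Hastings on theta; the intractable ratio of
    normalising constants Z(theta')/Z(theta) in the acceptance probability
    is replaced by an importance-sampling estimate from auxiliary graphs."""
    rng = np.random.default_rng() if rng is None else rng
    s_obs, n = ergm_stats(x_obs), x_obs.shape[0]
    theta, draws = np.zeros(2), []
    for _ in range(n_iter):
        prop = theta + rng.normal(0.0, prop_sd, size=2)
        aux_stats = [ergm_stats(sample_graph(theta, n, rng=rng))
                     for _ in range(n_aux)]
        # E_{x ~ p(.|theta)}[exp((theta' - theta) . s(x))] = Z(theta')/Z(theta)
        log_z_ratio = logsumexp([(prop - theta) @ s for s in aux_stats]) - np.log(n_aux)
        log_alpha = ((prop - theta) @ s_obs - log_z_ratio
                     + log_prior(prop) - log_prior(theta))
        if np.log(rng.random()) < log_alpha:
            theta = prop
        draws.append(theta.copy())
    return np.array(draws)

# Hypothetical observed network on 20 nodes
rng = np.random.default_rng(0)
x = (rng.random((20, 20)) < 0.1).astype(float)
x = np.triu(x, 1)
x = x + x.T
draws = posterior_sample(x, rng=rng)
```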
Abstract:
This thesis presents a creative and practical approach to dealing with the problem of selection bias. Selection bias may be the most important and vexing problem in program evaluation, or in any line of research that attempts to assert causality. Some of the greatest minds in economics and statistics have scrutinized the problem of selection bias, with the resulting approaches – Rubin's potential outcome approach (Rosenbaum and Rubin, 1983; Rubin, 1991, 2001, 2004) or Heckman's selection model (Heckman, 1979) – being widely accepted and used as the best fixes. These solutions to the bias that arises in particular from self-selection are imperfect, and many researchers, when feasible, reserve their strongest causal inference for data from experimental rather than observational studies. The innovative aspect of this thesis is to propose a data transformation that allows measuring and testing the presence of selection bias in an automatic and multivariate way. The approach involves the construction of a multi-dimensional conditional space of the X matrix in which the bias associated with the treatment assignment has been eliminated. Specifically, we propose the use of a partial dependence analysis of the X-space as a tool for investigating the dependence relationship between a set of observable pre-treatment categorical covariates X and a treatment indicator variable T, in order to obtain a measure of bias according to their dependence structure. The measure of selection bias is then expressed in terms of the inertia due to the dependence between X and T that has been eliminated. Given the measure of selection bias, we propose a multivariate test of imbalance to check whether the detected bias is significant, using the asymptotic distribution of the inertia due to T (Estadella et al., 2005) and preserving the multivariate nature of the data. Further, we propose the use of a clustering procedure as a tool to find groups of comparable units on which to estimate local causal effects, with the multivariate test of imbalance serving as a stopping rule in choosing the best cluster solution. The method is nonparametric: it does not call for modelling the data based on some underlying theory or assumption about the selection process, but instead uses the existing variability within the data and lets the data speak. The idea of proposing this multivariate approach to measuring selection bias and testing balance comes from the consideration that, in applied research, all aspects of multivariate balance not represented in the univariate variable-by-variable summaries are ignored. The first part contains an introduction to evaluation methods as part of public and private decision processes and a review of the literature on evaluation methods. The attention is focused on Rubin's potential outcome approach, matching methods, and briefly on Heckman's selection model. The second part focuses on some resulting limitations of conventional methods, with particular attention to the problem of how to test balance correctly. The third part contains the original contribution, a simulation study that checks the performance of the method for a given dependence setting, and an application to a real data set. Finally, we discuss, conclude and outline future perspectives.
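A rough sketch of the inertia idea follows, under the simplifying assumption that the inertia of the cross-tabulation between combined covariate profiles and the treatment indicator equals the chi-square statistic divided by the sample size (the total inertia of correspondence analysis), with the asymptotic chi-square distribution supplying an imbalance test; this is not the thesis's partial dependence analysis, and the variable names and data are hypothetical.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

def inertia_due_to_treatment(X, T):
    """Total inertia of the cross-tab between combined covariate profiles X
    and treatment indicator T: chi-square statistic divided by sample size."""
    profiles = np.array(["|".join(map(str, row)) for row in X])
    cats, t_vals = np.unique(profiles), np.unique(T)
    table = np.array([[np.sum((profiles == c) & (T == t)) for t in t_vals]
                      for c in cats])
    stat, _, dof, _ = chi2_contingency(table, correction=False)
    return stat / len(T), stat, dof

# Hypothetical data: 3 categorical covariates, binary treatment whose
# assignment depends on the first covariate (i.e. selection bias is present)
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 3))
T = (rng.random(500) < 0.3 + 0.3 * X[:, 0]).astype(int)
inertia, stat, dof = inertia_due_to_treatment(X, T)
p_value = chi2.sf(stat, dof)  # asymptotic multivariate test of imbalance
print(f"inertia = {inertia:.4f}, p = {p_value:.4g}")
```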
Abstract:
In this work we propose a new approach for preliminary epidemiological studies of Standardized Mortality Ratios (SMR) collected over many spatial regions. A preliminary study of SMRs aims to formulate hypotheses to be investigated via individual epidemiological studies, which avoid the bias carried by aggregated analyses. Starting from the collected disease counts and the expected disease counts calculated by means of reference population disease rates, in each area an SMR is derived as the MLE under the Poisson assumption on each observation. Such estimators have high standard errors in small areas, i.e. where the expected count is low either because of the low population underlying the area or the rarity of the disease under study. Disease mapping models and other techniques for screening disease rates across the map, aiming to detect anomalies and possible high-risk areas, have been proposed in the literature under both the classical and the Bayesian paradigm. We approach this issue with a decision-oriented method, which focuses on multiple testing control without abandoning the preliminary-study perspective that an analysis of SMR indicators is meant to serve. We implement control of the FDR, a quantity largely used to address multiple comparison problems in the field of microarray data analysis but not usually employed in disease mapping. Controlling the FDR means providing an estimate of the FDR for a set of rejected null hypotheses. The small areas issue raises difficulties in applying traditional methods for FDR estimation, which are usually based only on knowledge of the p-values (Benjamini and Hochberg, 1995; Storey, 2003). Tests evaluated by a traditional p-value provide weak power in small areas, where the expected number of disease cases is small. Moreover, tests cannot be assumed independent when spatial correlation between SMRs is expected, nor are they identically distributed when the population underlying the map is heterogeneous. The Bayesian paradigm offers a way to overcome the inappropriateness of p-value based methods. Another peculiarity of the present work is to propose a hierarchical full Bayesian model for FDR estimation when testing many null hypotheses of absence of risk. We use concepts from Bayesian disease mapping models, referring in particular to the Besag, York and Mollié (1991) model, often used in practice for its flexible prior assumptions on the distribution of risks across regions. The borrowing of strength between prior and likelihood typical of a hierarchical Bayesian model has the advantage of evaluating a single test (i.e. a test in a single area) by means of all observations in the map under study, rather than just the single observation. This improves the power of the test in small areas and addresses more appropriately the spatial correlation that makes relative risks closer in spatially contiguous regions. The proposed model estimates the FDR by means of the MCMC-estimated posterior probabilities b_i of the null hypothesis (absence of risk) for each area. An estimate of the expected FDR conditional on the data (denoted FDR-hat) can be calculated for any set of b_i's corresponding to areas declared at high risk (where the null hypothesis is rejected) by averaging the b_i's themselves. FDR-hat provides an easy decision rule for selecting high-risk areas, i.e. selecting as many areas as possible such that FDR-hat does not exceed a prefixed value; we call these FDR-hat based decision (or selection) rules.
The sensitivity and specificity of such a rule depend on the accuracy of the FDR estimate: over-estimation of the FDR causes a loss of power, while under-estimation produces a loss of specificity. Moreover, our model retains the interesting feature of providing estimates of the relative risk values, as in the Besag, York and Mollié (1991) model. A simulation study was set up to evaluate the model's performance in terms of FDR estimation accuracy, sensitivity and specificity of the decision rule, and goodness of estimation of the relative risks. We chose a real map from which we generated several spatial scenarios whose disease counts vary according to the degree of spatial correlation, the area sizes, the number of areas where the null hypothesis is true, and the risk level in the high-risk areas. In summarizing the simulation results we always consider FDR estimation in sets constituted by all b_i's below a threshold t. We show graphs of FDR-hat and the true FDR (known by simulation) plotted against the threshold t to assess FDR estimation. Varying the threshold, we can learn which FDR values can be accurately estimated by a practitioner willing to apply the model (by the closeness between FDR-hat and the true FDR). By plotting the calculated sensitivity and specificity (both known by simulation) against FDR-hat, we can check the sensitivity and specificity of the corresponding FDR-hat based decision rules. To investigate the degree of over-smoothing of the relative risk estimates, we compare box plots of such estimates in high-risk areas (known by simulation), obtained from both our model and the classic Besag, York and Mollié model. All the summary tools are worked out for all simulated scenarios (54 in total). Results show that the FDR is well estimated (in the worst case we get an over-estimation, hence conservative FDR control) in scenarios with small areas, low risk levels and spatially correlated risks, which are our primary targets. In such scenarios we obtain good estimates of the FDR for all values less than or equal to 0.10; the sensitivity of FDR-hat based decision rules is generally low, but specificity is high, and the use of FDR-hat = 0.05 or FDR-hat = 0.10 based selection rules can be recommended. In cases where the number of true alternative hypotheses (the number of true high-risk areas) is small, FDR values up to 0.15 are also well estimated, and FDR-hat = 0.15 based decision rules gain power while maintaining high specificity. On the other hand, in scenarios with non-small areas and non-small risk levels the FDR is under-estimated except at very small values (much lower than 0.05), resulting in a loss of specificity of an FDR-hat = 0.05 based decision rule. In such scenarios FDR-hat = 0.05 or, even worse, FDR-hat = 0.10 based decision rules cannot be recommended because the true FDR is actually much higher. As regards relative risk estimation, our model achieves almost the same results as the classic Besag, York and Mollié model. For this reason, our model is interesting for its ability to perform both the estimation of relative risk values and FDR control, except in scenarios with non-small areas and large risk levels. A case study is finally presented to show how the method can be used in epidemiology.
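The FDR-hat based selection rule lends itself to a short sketch: given the MCMC-estimated posterior null probabilities b_i, sort them in increasing order, form running means over the growing rejection set, and reject the largest set whose running mean stays at or below the prefixed level. The b_i values below are hypothetical.

```python
import numpy as np

def fdr_hat_selection(b, alpha=0.05):
    """Select as many areas as possible while the estimated FDR (the running
    mean of the posterior null probabilities b_i over the rejected set)
    stays at or below alpha."""
    order = np.argsort(b)                 # most convincing areas first
    running_fdr = np.cumsum(b[order]) / np.arange(1, len(b) + 1)
    ok = np.nonzero(running_fdr <= alpha)[0]
    k = ok.max() + 1 if ok.size else 0
    rejected = order[:k]                  # areas declared at high risk
    return rejected, (running_fdr[k - 1] if k else 0.0)

# Hypothetical posterior null probabilities from the MCMC fit
b = np.array([0.01, 0.02, 0.30, 0.04, 0.85, 0.03, 0.60, 0.08])
areas, fdr_hat = fdr_hat_selection(b, alpha=0.10)
print("high-risk areas:", areas, "estimated FDR:", round(fdr_hat, 3))
```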
Abstract:
Major histocompatibility complex (MHC) antigen-presenting genes are the most variable loci in vertebrate genomes. Host-parasite co-evolution is assumed to maintain the excessive polymorphism at MHC loci. However, the molecular mechanisms underlying the striking diversity in the MHC remain contentious. The extent to which recombination contributes to the diversity at MHC loci in natural populations is still controversial, and there have been only a few comparative studies that make quantitative estimates of recombination rates. In this study, we performed a comparative analysis of 15 different ungulate species to estimate the population recombination rate and to quantify levels of selection. As expected, for all species we observed signatures of strong positive selection, and we identified individual residues experiencing selection that were congruent with those constituting the peptide-binding region of the human DRB gene. In addition, however, for each species we observed recombination rates that were significantly different from zero on the basis of likelihood-permutation tests, and in other non-quantitative analyses. Patterns of synonymous and non-synonymous sequence diversity were consistent with differing demographic histories between species, but recent simulation studies by other authors suggest inference of selection and recombination is likely to be robust to such deviations from standard models. If high rates of recombination are common in the MHC genes of other taxa, re-evaluation of many inference-based phylogenetic analyses of MHC loci, such as estimates of the divergence time of alleles and trans-specific polymorphism, may be required.
Abstract:
The majority of mutations that cause isolated GH deficiency type II (IGHD II) affect splicing of GH-1 transcripts and produce a dominant-negative GH isoform lacking exon 3, a 17.5-kDa isoform that leads to disruption of the GH secretory pathway. Clinical variability in the severity of the IGHD II phenotype depending on the GH-1 gene alteration has been reported, and in vitro and transgenic animal data suggest that the onset and severity of the phenotype relate to the proportion of the 17.5-kDa isoform produced. The removal of GH in IGHD creates a positive feedback loop driving more GH expression, which may itself increase 17.5-kDa isoform production from alternate splice sites in the mutated GH-1 allele. In this study, we aimed to test this idea by comparing the impact of stimulated expression by glucocorticoids on the production of different GH isoforms from wild-type (wt) and mutant GH-1 genes, relying on the glucocorticoid regulatory element within intron 1 of the GH-1 gene. AtT-20 cells were transfected with wt-GH or mutated GH-1 variants (5'IVS-3 + 2-bp T->C; 5'IVS-3 + 6 bp T->C; ISEm1: IVS-3 + 28 G->A) known to cause clinical IGHD II of varying severity. Cells were stimulated with 1 and 10 μM dexamethasone (DEX) for 24 h, after which the relative amounts of GH-1 splice variants were determined by semiquantitative and quantitative (TaqMan) RT-PCR. In the absence of DEX, only around 1% of wt-GH-1 transcripts were the 17.5-kDa isoform, whereas the three mutant GH-1 variants produced 29, 39, and 78% of the 17.5-kDa isoform. DEX stimulated total GH-1 gene transcription from all constructs. Notably, however, DEX increased the amount of the 17.5-kDa GH isoform relative to the 22- and 20-kDa isoforms produced from the mutated GH-1 variants, but not from wt-GH-1. This DEX-induced enhancement of 17.5-kDa GH isoform production, up to 100% in the most severe case, was completely blocked by the addition of RU486. In other studies, we measured cell proliferation rates, annexin V staining, and DNA fragmentation in cells transfected with the same GH-1 constructs. The results showed that the 5'IVS-3 + 2-bp GH-1 gene mutation had a more severe impact on those measures than the splice site mutations within 5'IVS-3 + 6 bp or ISE +28, in line with the clinical severity observed with these mutations. Our finding that the proportion of the 17.5-kDa isoform produced from mutant GH-1 alleles increases with increased drive for gene expression may help to explain the variable onset, progression, and severity observed in IGHD II.
Abstract:
In a matched experimental design, the effectiveness of matching in reducing bias and increasing power depends on the strength of the association between the matching variable and the outcome of interest. In particular, in the design of a community health intervention trial, the effectiveness of a matched design, where communities are matched according to some community characteristic, depends on the strength of the correlation between the matching characteristic and the change in the health behavior being measured. We attempt to estimate the correlation between community characteristics and changes in health behaviors in four datasets from community intervention trials and observational studies. Community characteristics that are highly correlated with changes in health behaviors would potentially be effective matching variables in studies of health intervention programs designed to change those behaviors. Among the community characteristics considered, the urban-rural character of the community was the most highly correlated with changes in health behaviors. The correlations between Per Capita Income, Percent Low Income, and Percent Aged over 65 and changes in health behaviors were marginally statistically significant (p < 0.08).
Abstract:
Traffic particle concentrations show considerable spatial variability within a metropolitan area. We consider latent variable semiparametric regression models for modeling the spatial and temporal variability of black carbon and elemental carbon concentrations in the greater Boston area. Measurements of these pollutants, which are markers of traffic particles, were obtained from several individual exposure studies conducted at specific household locations, as well as from 15 ambient monitoring sites in the city. The models allow for both flexible, nonlinear effects of covariates and unexplained spatial and temporal variability in exposure. In addition, the different individual exposure studies recorded different surrogates of traffic particles, with some recording only outdoor concentrations of black or elemental carbon, some recording indoor concentrations of black carbon, and others recording both indoor and outdoor concentrations of black carbon. A joint model for outdoor and indoor exposure that specifies a spatially varying latent variable provides greater spatial coverage in the area of interest. We propose a penalised spline formulation of the model that relates to generalised kriging of the latent traffic pollution variable and leads to a natural Bayesian Markov chain Monte Carlo algorithm for model fitting. We propose methods that allow us to control the degrees of freedom of the smoother in a Bayesian framework. Finally, we present results from an analysis that applies the model to data from summer and winter separately.
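The degrees-of-freedom control can be illustrated with a minimal frequentist sketch, assuming a simple truncated-line penalised spline basis rather than the paper's Bayesian MCMC formulation: the effective degrees of freedom of the smoother is the trace of the smoother matrix S_lambda, and lambda can be solved for to hit a target value. The knot placement and data are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq

def pspline_smoother(x, knots, lam):
    """Penalised spline (truncated-line basis) smoother matrix S_lambda."""
    B = np.column_stack([np.ones_like(x), x] +
                        [np.maximum(x - k, 0.0) for k in knots])
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))  # penalise only the knot terms
    return B @ np.linalg.solve(B.T @ B + lam * D, B.T)

def lam_for_df(x, knots, target_df):
    """Choose lambda so the effective degrees of freedom tr(S_lambda) hit a target."""
    f = lambda loglam: np.trace(pspline_smoother(x, knots, np.exp(loglam))) - target_df
    return np.exp(brentq(f, -10.0, 20.0))

# Hypothetical data: noisy nonlinear exposure signal
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 150))
y = np.sin(x) + rng.normal(0, 0.3, size=x.size)
knots = np.linspace(0.5, 9.5, 20)
lam = lam_for_df(x, knots, target_df=6.0)
y_hat = pspline_smoother(x, knots, lam) @ y  # fit with exactly 6 df
```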
Abstract:
The purpose of this study is to develop statistical methodology to facilitate indirect estimation of the concentrations of antiretroviral drugs and viral loads in the prostate gland and the seminal vesicle. Differences in antiretroviral drug concentrations between these organs may lead to suboptimal concentrations in one gland compared to the other. Suboptimal levels of the antiretroviral drugs will not fully suppress the virus in that gland, creating a source of sexually transmissible virus and increasing the chance of selecting for drug-resistant virus. This information may be useful in selecting an antiretroviral drug regimen that achieves optimal concentrations in most of the male genital tract glands. Using fractionally collected semen ejaculates, Lundquist (1949) measured levels of surrogate markers in each fraction that are uniquely produced by specific male accessory glands. To determine the original glandular concentrations of the surrogate markers, Lundquist solved a simultaneous series of linear equations. This method has several limitations. In particular, it does not yield a unique solution, it does not address measurement error, and it disregards inter-subject variability in the parameters. To cope with these limitations, we developed a mechanistic latent variable model based on the physiology of the male genital tract and the surrogate markers. We employ a Bayesian approach and perform a sensitivity analysis with regard to the distributional assumptions on the random effects and priors. The model and Bayesian approach are validated on experimental data where the concentration of a drug should be (biologically) differentially distributed between the two glands. In this example, the Bayesian model-based conclusions are found to be robust to model specification, and this hierarchical approach leads to more scientifically valid conclusions than the original methodology. In particular, unlike existing methods, the proposed model-based approach was not affected by a common form of outliers.
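A minimal sketch of the classical Lundquist-style step that the proposed model improves on, assuming hypothetical volume proportions and drug measurements: each fraction's drug concentration is a mixture of the glandular concentrations, so the latter solve a set of linear mixing equations. The least-squares solution below also illustrates the stated limitations, since measurement error and inter-subject variability are ignored.

```python
import numpy as np

# Hypothetical fractions-by-glands design: P[f, g] is the proportion of
# fraction f's volume contributed by gland g (prostate, seminal vesicle),
# inferred from gland-specific surrogate markers as in Lundquist (1949).
P = np.array([[0.8, 0.2],
              [0.5, 0.5],
              [0.2, 0.8],
              [0.1, 0.9]])
drug_in_fraction = np.array([7.4, 5.0, 2.6, 1.8])  # measured, hypothetical units

# Least-squares solution of P @ c = drug_in_fraction for the glandular
# concentrations c; with more fractions than glands there is no exact
# solution, and measurement error is not modelled.
c_glands, residual, _, _ = np.linalg.lstsq(P, drug_in_fraction, rcond=None)
print("prostate:", c_glands[0], "seminal vesicle:", c_glands[1])
```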
Abstract:
Experimental work and analysis were done to investigate engine startup robustness and emissions of a flex-fuel spark ignition (SI) direct injection (DI) engine. The vaporization and other characteristics of ethanol fuel blends present a challenge at engine startup. Strategies were investigated to reduce the enrichment requirements for the first engine startup cycle, and the emissions for the second and third fired cycles, at 25°C ± 1°C engine and intake air temperature. Research work was conducted on a single cylinder SIDI engine with gasoline and E85 fuels to study the effect on the first fired cycle of engine startup. Piston configurations that included a compression ratio change (11 vs 15.5) and piston geometry change (flat-top vs bowl) were tested, along with changes in intake cam timing (95, 110, 125) and fuel pressure (0.4 MPa vs 3 MPa). The goal was to replicate the engine speed, manifold pressure, fuel pressure and testing temperature from an engine startup trace in order to investigate the first fired cycle. Results showed the bowl piston enabled lower equivalence ratio engine starts with gasoline fuel, while also showing lower IMEP at the same equivalence ratio compared to the flat-top piston. With E85, the bowl piston showed reduced IMEP as compression ratio increased at the same equivalence ratio. A preference for constant intake valve timing across fuels seemed to indicate that the flat-top piston might be a good flex-fuel piston. Significant improvements were seen with the higher-CR bowl piston for high fuel pressure starts, but no improvement with low fuel pressures. Simulation work was conducted in GT-POWER to analyze the initial three cycles of engine startup for the same set of hardware used in the experiments. A steady-state validated model was modified for startup conditions, and the results allowed an understanding of the relative residual levels and IMEP at the test points in the cam phasing space. This allowed selecting additional test points that enable the use of higher residual levels, eliminating those with smaller trapped mass incapable of producing the IMEP required for proper engine turnover. The second phase of experimental testing, for the second and third startup cycles, revealed that both E10 and E85 prefer the same SOI of 240° bTDC for the flat-top piston and high injection pressures. Optimal cam timing for startup showed that E85 tolerates more residuals than E10. Higher internal residuals drive down the equivalence ratio (Ø) requirement for both fuels up to their combustion stability limit; this is thought to be a direct benefit to vaporization from the increased cycle start temperature. Benefits are shown for an advanced IMOP and retarded EMOP strategy at engine startup. Overall, the amount of residuals preferred by an engine running E10 at startup is thought to be constant across engine speeds, which could enable easier selection of optimized cam positions across the startup speeds.
Abstract:
Invasive species often evolve rapidly in response to the novel biotic and abiotic conditions in their introduced range. Such adaptive evolutionary changes might play an important role in the success of some invasive species. Here, we investigated whether introduced European populations of the South African ragwort Senecio inaequidens (Asteraceae) have genetically diverged from native populations. We carried out a greenhouse experiment in which 12 South African and 11 European populations were grown for several months at two levels of nutrient availability, as well as in the presence or absence of a generalist insect herbivore. We found that, in contrast to a current hypothesis, plants from introduced populations had a significantly lower reproductive output, but higher allocation to root biomass, and they were more tolerant of insect herbivory. Moreover, introduced populations were less genetically variable, but displayed greater plasticity in response to fertilization. Finally, introduced populations were phenotypically most similar to a subset of native populations from mountainous regions in southern Africa. Taking into account the species' likely history of introduction, our data support the idea that the invasion success of Senecio inaequidens in Central Europe is based on the selective introduction of specific preadapted and plastic genotypes rather than on adaptive evolution in the introduced range.
Abstract:
OBJECTIVE To compare the effects of antiplatelets and anticoagulants on stroke and death in patients with acute cervical artery dissection. DESIGN Systematic review with Bayesian meta-analysis. DATA SOURCES The reviewers searched MEDLINE and EMBASE from inception to November 2012, checked reference lists, and contacted authors. STUDY SELECTION Studies were eligible if they were randomised, quasi-randomised or observational comparisons of antiplatelets and anticoagulants in patients with cervical artery dissection. DATA EXTRACTION Data were extracted by one reviewer and checked by another. Bayesian techniques were used to appropriately account for studies with scarce event data and imbalances in the size of comparison groups. DATA SYNTHESIS Thirty-seven studies (1991 patients) were included. We found no randomised trial. The primary analysis revealed a large treatment effect in favour of antiplatelets for preventing the primary composite outcome of ischaemic stroke, intracranial haemorrhage or death within the first 3 months after treatment initiation (relative risk 0.32, 95% credibility interval 0.12 to 0.63), while the degree of between-study heterogeneity was moderate (τ² = 0.18). In an analysis restricted to studies of higher methodological quality, the possible advantage of antiplatelets over anticoagulants was less obvious than in the main analysis (relative risk 0.73, 95% credibility interval 0.17 to 2.30). CONCLUSION In view of these results and the safety advantages, easier usage and lower cost of antiplatelets, we conclude that antiplatelets should be given precedence over anticoagulants as a first-line treatment in patients with cervical artery dissection unless the results of an adequately powered randomised trial suggest the opposite.
Abstract:
This dissertation explores phase I dose-finding designs in cancer trials from three perspectives: alternative Bayesian dose-escalation rules, a design based on a time-to-dose-limiting-toxicity (DLT) model, and a design based on a discrete-time multi-state (DTMS) model. We list alternative Bayesian dose-escalation rules and perform a simulation study for intra-rule and inter-rule comparisons based on two statistical models to identify the most appropriate rule under certain scenarios. We provide evidence that all the Bayesian rules outperform the traditional "3+3" design in the allocation of patients and selection of the maximum tolerated dose. The design based on a time-to-DLT model uses patients' DLT information over multiple treatment cycles in estimating the probability of DLT at the end of treatment cycle 1. Dose-escalation decisions are made whenever a cycle-1 DLT occurs, or two months after the previous check point. Compared to the design based on a logistic regression model, the new design shows more safety benefits for trials in which more late-onset toxicities are expected. As a trade-off, the new design requires more patients on average. The design based on a discrete-time multi-state (DTMS) model has three important attributes: (1) toxicities are categorized over a distribution of severity levels, (2) early toxicity may inform dose escalation, and (3) no suspension is required between accrual cohorts. The proposed model accounts for the difference in the importance of the toxicity severity levels and for transitions between toxicity levels. We compare the operating characteristics of the proposed design with those of a similar design based on a fully-evaluated model that directly models the maximum observed toxicity level within the patients' entire assessment window. We describe settings in which, under comparable power, the proposed design shortens the trial. The proposed design offers more benefit compared to the alternative design as patient accrual becomes slower.
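As a minimal sketch of one Bayesian dose-escalation rule of the general kind compared here (an assumption for illustration, not a rule taken from the dissertation), a Beta-Binomial posterior for the DLT rate at the current dose can drive escalate/stay/de-escalate decisions; the target DLT rate, thresholds, and prior are illustrative.

```python
from scipy.stats import beta

def escalation_decision(n_treated, n_dlt, target=0.30, overdose_prob=0.25,
                        prior_a=1.0, prior_b=1.0):
    """Beta-Binomial posterior for the DLT rate at the current dose:
    escalate if P(rate > target | data) is small, de-escalate if large."""
    post = beta(prior_a + n_dlt, prior_b + n_treated - n_dlt)
    p_over = 1.0 - post.cdf(target)      # posterior P(DLT rate > target)
    if p_over < overdose_prob:
        return "escalate", p_over
    if p_over > 1.0 - overdose_prob:
        return "de-escalate", p_over
    return "stay", p_over

# Hypothetical cohort: 6 patients at the current dose, 1 DLT observed
print(escalation_decision(n_treated=6, n_dlt=1))
```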
Abstract:
In 2011, there will be an estimated 1,596,670 new cancer cases and 571,950 cancer-related deaths in the US. With the ever-increasing applications of cancer genetics in epidemiology, there is great potential to identify genetic risk factors that would help identify individuals with increased genetic susceptibility to cancer, which could be used to develop interventions or targeted therapies that could hopefully reduce cancer risk and mortality. In this dissertation, I propose to develop a new statistical method to evaluate the role of haplotypes in cancer susceptibility and development. This model will be flexible enough to handle not only haplotypes of any size, but also a variety of covariates. I will then apply this method to three cancer-related data sets (Hodgkin Disease, Glioma, and Lung Cancer). I hypothesize that there is substantial improvement in the estimation of the association between haplotypes and disease with the use of a Bayesian mathematical method to infer haplotypes that uses prior information from known genetic sources. Analyses based on haplotypes using information from publicly available genetic sources generally show increased odds ratios and smaller p-values in the Hodgkin, Glioma, and Lung data sets. For instance, the Bayesian Joint Logistic Model (BJLM) inferred haplotype TC had a substantially higher estimated effect size (OR = 12.16, 95% CI = 2.47-90.1 vs. 9.24, 95% CI = 1.81-47.2) and a more significant p-value (0.00044 vs. 0.008) for Hodgkin Disease compared to a traditional logistic regression approach. Also, the effect sizes of haplotypes modeled with recessive genetic effects were higher (and had more significant p-values) when analyzed with the BJLM. Full genetic models with haplotype information developed with the BJLM resulted in significantly higher discriminatory power and a significantly higher Net Reclassification Index compared to those developed with haplo.stats for lung cancer. Future work could incorporate the 1000 Genomes Project, which offers a larger selection of SNPs that could be incorporated into the information from known genetic sources. Other future analyses include testing non-binary outcomes, like the levels of biomarkers present in lung cancer (NNK), and extending this analysis to full GWAS studies.
Continental-Scale Footprint of Balancing and Positive Selection in a Small Rodent (Microtus arvalis)
Abstract:
Genetic adaptation to different environmental conditions is expected to lead to large differences between populations at selected loci, thus providing a signature of positive selection. Balancing selection, in contrast, can maintain polymorphisms over long evolutionary periods and even across geographic scales, and thus leads to low levels of divergence between populations at selected loci. However, little is known about the relative importance of these two selective forces in shaping genomic diversity, partly due to difficulties in recognizing balancing selection in species showing low levels of differentiation. Here we address this problem by studying genomic diversity in the European common vole (Microtus arvalis), a species presenting high levels of differentiation between populations (average FST = 0.31). We studied 3,839 Amplified Fragment Length Polymorphism (AFLP) markers genotyped in 444 individuals from 21 populations distributed across the European continent and hence over different environmental conditions. Our statistical approach to detecting markers under selection is based on a Bayesian method specifically developed for AFLP markers, which treats AFLPs as a nearly codominant marker system and therefore has increased power to detect selection. The high number of screened populations allowed us to detect the signature of balancing selection across a large geographic area. We detected 33 markers potentially under balancing selection, hence evidence of stabilizing selection across the 21 populations in Europe. However, our analyses identified four times as many markers (138) under positive selection, and geographical patterns suggest that some of these markers are probably associated with alpine regions, which seem to have environmental conditions that favour adaptation. We conclude that despite conditions favourable to the detection of balancing selection in this study, this evolutionary force seems to play a relatively minor role in shaping the genomic diversity of the common vole, which is more influenced by positive selection and neutral processes like drift and demographic history.
Abstract:
Many public health agencies and researchers are interested in comparing hospital outcomes, for example morbidity, mortality, and hospitalization, across areas and hospitals. However, since rates vary among hospitals because of several biases, we are interested in controlling for the bias and assessing real differences in clinical practice. In this study, we compared the variation between hospitals in rates of severe intraventricular haemorrhage (IVH) in infants using a Frequentist statistical approach versus Bayesian hierarchical models in a simulation study. The template data set for the simulation study comprised the numbers of infants with severe IVH in 24 intensive care units of the Australian and New Zealand Neonatal Network from 1995 to 1997, reflecting the severe IVH rate in preterm babies. We evaluated the rates of severe IVH for the 24 hospitals with two hierarchical models in the Bayesian approach, comparing their performance with the shrunken rates of the Frequentist method. Bayesian Gamma-Poisson (BGP) and Beta-Binomial (BBB) models were used, and the shrunken estimator of the Gamma-Poisson (FGP) hierarchical model, calculated using the maximum likelihood method, served as the Frequentist approach. To simulate data, the total number of infants in each hospital was kept fixed, and we analyzed the simulated data under both the Bayesian and Frequentist models with two true parameters for the severe IVH rate: the observed rate, and the expected severe IVH rate obtained by adjusting for five predictor variables in the template data. The bias in the estimated rates of severe IVH showed that the Bayesian models gave less variable estimates than the Frequentist model. We also discussed and compared the results from the three models to examine the variation in the rate of severe IVH using 20th-centile rates and the avoidable number of severe IVH cases.
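A minimal sketch of the Frequentist Gamma-Poisson (FGP) shrinkage approach, under the usual empirical-Bayes formulation (assumed here rather than taken from the study): counts y_i ~ Poisson(n_i * lambda_i) with lambda_i ~ Gamma(a, b), the hyperparameters fitted by maximising the negative-binomial marginal likelihood, giving shrunken rates (a + y_i)/(b + n_i). The data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def neg_marginal_loglik(log_ab, y, n):
    """Negative log marginal likelihood of the Gamma-Poisson model:
    y_i ~ Poisson(n_i * lam_i), lam_i ~ Gamma(a, b)  (shape a, rate b)."""
    a, b = np.exp(log_ab)
    ll = (gammaln(a + y) - gammaln(a) - gammaln(y + 1)
          + a * np.log(b) + y * np.log(n) - (a + y) * np.log(b + n))
    return -ll.sum()

def shrunken_rates(y, n):
    """Empirical-Bayes (maximum likelihood) shrinkage of per-hospital rates."""
    res = minimize(neg_marginal_loglik, x0=np.zeros(2), args=(y, n))
    a, b = np.exp(res.x)
    return (a + y) / (b + n)   # posterior means: raw rates shrunk toward a/b

# Hypothetical counts of severe IVH and numbers of infants per hospital
y = np.array([2, 0, 5, 1, 9, 3])
n = np.array([40, 25, 60, 30, 80, 50])
print(np.round(shrunken_rates(y, n), 4))
```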