932 resultados para Markov chains. Convergence. Evolutionary Strategy. Large Deviations
Resumo:
Generalized linear mixed models (GLMMs) provide an elegant framework for the analysis of correlated data. Due to the non-closed form of the likelihood, GLMMs are often fit by computational procedures like penalized quasi-likelihood (PQL). Special cases of these models are generalized linear models (GLMs), which are often fit using algorithms like iterative weighted least squares (IWLS). High computational costs and memory space constraints often make it difficult to apply these iterative procedures to data sets with very large number of cases. This paper proposes a computationally efficient strategy based on the Gauss-Seidel algorithm that iteratively fits sub-models of the GLMM to subsetted versions of the data. Additional gains in efficiency are achieved for Poisson models, commonly used in disease mapping problems, because of their special collapsibility property which allows data reduction through summaries. Convergence of the proposed iterative procedure is guaranteed for canonical link functions. The strategy is applied to investigate the relationship between ischemic heart disease, socioeconomic status and age/gender category in New South Wales, Australia, based on outcome data consisting of approximately 33 million records. A simulation study demonstrates the algorithm's reliability in analyzing a data set with 12 million records for a (non-collapsible) logistic regression model.
Resumo:
Simulation-based assessment is a popular and frequently necessary approach to evaluation of statistical procedures. Sometimes overlooked is the ability to take advantage of underlying mathematical relations and we focus on this aspect. We show how to take advantage of large-sample theory when conducting a simulation using the analysis of genomic data as a motivating example. The approach uses convergence results to provide an approximation to smaller-sample results, results that are available only by simulation. We consider evaluating and comparing a variety of ranking-based methods for identifying the most highly associated SNPs in a genome-wide association study, derive integral equation representations of the pre-posterior distribution of percentiles produced by three ranking methods, and provide examples comparing performance. These results are of interest in their own right and set the framework for a more extensive set of comparisons.
Resumo:
In this dissertation, the problem of creating effective large scale Adaptive Optics (AO) systems control algorithms for the new generation of giant optical telescopes is addressed. The effectiveness of AO control algorithms is evaluated in several respects, such as computational complexity, compensation error rejection and robustness, i.e. reasonable insensitivity to the system imperfections. The results of this research are summarized as follows: 1. Robustness study of Sparse Minimum Variance Pseudo Open Loop Controller (POLC) for multi-conjugate adaptive optics (MCAO). The AO system model that accounts for various system errors has been developed and applied to check the stability and performance of the POLC algorithm, which is one of the most promising approaches for the future AO systems control. It has been shown through numerous simulations that, despite the initial assumption that the exact system knowledge is necessary for the POLC algorithm to work, it is highly robust against various system errors. 2. Predictive Kalman Filter (KF) and Minimum Variance (MV) control algorithms for MCAO. The limiting performance of the non-dynamic Minimum Variance and dynamic KF-based phase estimation algorithms for MCAO has been evaluated by doing Monte-Carlo simulations. The validity of simple near-Markov autoregressive phase dynamics model has been tested and its adequate ability to predict the turbulence phase has been demonstrated both for single- and multiconjugate AO. It has also been shown that there is no performance improvement gained from the use of the more complicated KF approach in comparison to the much simpler MV algorithm in the case of MCAO. 3. Sparse predictive Minimum Variance control algorithm for MCAO. The temporal prediction stage has been added to the non-dynamic MV control algorithm in such a way that no additional computational burden is introduced. It has been confirmed through simulations that the use of phase prediction makes it possible to significantly reduce the system sampling rate and thus overall computational complexity while both maintaining the system stable and effectively compensating for the measurement and control latencies.
Evolutionary demography of long-lived monocarpic perennials: a time-lagged integral projection model
Resumo:
1. The evolution of flowering strategies (when and at what size to flower) in monocarpic perennials is determined by balancing current reproduction with expected future reproduction, and these are largely determined by size-specific patterns of growth and survival. However, because of the difficulty in following long-lived individuals throughout their lives, this theory has largely been tested using short-lived species (< 5 years). 2. Here, we tested this theory using the long-lived monocarpic perennial Campanula thyrsoides which can live up to 16 years. We used a novel approach that combined permanent plot and herb chronology data from a 3-year field study to parameterize and validate integral projection models (IPMs). 3. Similar to other monocarpic species, the rosette leaves of C. thyrsoides wither over winter and so size cannot be measured in the year of flowering. We therefore extended the existing IPM framework to incorporate an additional time delay that arises because flowering demography must be predicted from rosette size in the year before flowering. 4. We found that all main demographic functions (growth, survival probability, flowering probability and fecundity) were strongly size-dependent and there was a pronounced threshold size of flowering. There was good agreement between the predicted distribution of flowering ages obtained from the IPMs and that estimated in the field. Mostly, there was good agreement between the IPM predictions and the direct quantitative field measurements regarding the demographic parameters lambda, R-0 and T. We therefore conclude that the model captures the main demographic features of the field populations. 5. Elasticity analysis indicated that changes in the survival and growth function had the largest effect (c. 80%) on lambda and this was considerably larger than in short-lived monocarps. We found only weak selection pressure operating on the observed flowering strategy which was close to the predicted evolutionary stable strategy. 6. Synthesis. The extended IPM accurately described the demography of a long-lived monocarpic perennial using data collected over a relatively short period. We could show that the evolution of flowering strategies in short- and long-lived monocarps seem to follow the same general rules but with a longevity-related emphasis on survival over fecundity.
Resumo:
According to life-history theory age-dependent investments into reproduction are thought to co-vary with survival and growth of animals. In polygynous species, in which size is an important determinant of reproductive success, male reproduction via alternative mating tactics at young age are consequently expected to be the less frequent in species with higher survival. We tested this hypothesis in male Alpine ibex (Capra ibex), a highly sexually dimorphic mountain ungulate whose males have been reported to exhibit extremely high adult survival rates. Using data from two offspring cohorts in a population in the Swiss Alps, the effects of age, dominance and mating tactic on the likelihood of paternity were inferred within a Bayesian framework. In accordance with our hypothesis, reproductive success in male Alpine ibex was heavily biased towards older, dominant males that monopolized access to receptive females by adopting the 'tending' tactic, while success among young, subordinate males via the sneaking tactic 'coursing' was in general low and rare. In addition, we detected a high reproductive skew in male Alpine ibex, suggesting a large opportunity for selection. Compared with other ungulates with higher mortality rates, reproduction among young male Alpine ibex was much lower and more sporadic. Consistent with that, further examinations on the species level indicated that in polygynous ungulates the significance of early reproduction appears to decrease with increasing survival. Overall, this study supports the theory that survival prospects of males modulate the investments into reproduction via alternative mating tactics early in life. In the case of male Alpine ibex, the results indicate that their life-history strategy targets for long life, slow and prolonged growth and late reproduction.
Resumo:
Recently divergent species that can hybridize are ideal models for investigating the genetic exchanges that can occur while preserving the species boundaries. Petunia exserta is an endemic species from a very limited and specific area that grows exclusively in rocky shelters. These shaded spots are an inhospitable habitat for all other Petunia species, including the closely related and widely distributed species P. axillaris. Individuals with intermediate morphologic characteristics have been found near the rocky shelters and were believed to be putative hybrids between P. exserta and P. axillaris, suggesting a situation where Petunia exserta is losing its genetic identity. In the current study, we analyzed the plastid intergenic spacers trnS/trnG and trnH/psbA and six nuclear CAPS markers in a large sampling design of both species to understand the evolutionary process occurring in this biological system. Bayesian clustering methods, cpDNA haplotype networks, genetic diversity statistics, and coalescence-based analyses support a scenario where hybridization occurs while two genetic clusters corresponding to two species are maintained. Our results reinforce the importance of coupling differentially inherited markers with an extensive geographic sample to assess the evolutionary dynamics of recently diverged species that can hybridize. (C) 2013 Elsevier Inc. All rights reserved.
Resumo:
The success of combination antiretroviral therapy is limited by the evolutionary escape dynamics of HIV-1. We used Isotonic Conjunctive Bayesian Networks (I-CBNs), a class of probabilistic graphical models, to describe this process. We employed partial order constraints among viral resistance mutations, which give rise to a limited set of mutational pathways, and we modeled phenotypic drug resistance as monotonically increasing along any escape pathway. Using this model, the individualized genetic barrier (IGB) to each drug is derived as the probability of the virus not acquiring additional mutations that confer resistance. Drug-specific IGBs were combined to obtain the IGB to an entire regimen, which quantifies the virus' genetic potential for developing drug resistance under combination therapy. The IGB was tested as a predictor of therapeutic outcome using between 2,185 and 2,631 treatment change episodes of subtype B infected patients from the Swiss HIV Cohort Study Database, a large observational cohort. Using logistic regression, significant univariate predictors included most of the 18 drugs and single-drug IGBs, the IGB to the entire regimen, the expert rules-based genotypic susceptibility score (GSS), several individual mutations, and the peak viral load before treatment change. In the multivariate analysis, the only genotype-derived variables that remained significantly associated with virological success were GSS and, with 10-fold stronger association, IGB to regimen. When predicting suppression of viral load below 400 cps/ml, IGB outperformed GSS and also improved GSS-containing predictors significantly, but the difference was not significant for suppression below 50 cps/ml. Thus, the IGB to regimen is a novel data-derived predictor of treatment outcome that has potential to improve the interpretation of genotypic drug resistance tests.
Resumo:
A wealth of genetic associations for cardiovascular and metabolic phenotypes in humans has been accumulating over the last decade, in particular a large number of loci derived from recent genome wide association studies (GWAS). True complex disease-associated loci often exert modest effects, so their delineation currently requires integration of diverse phenotypic data from large studies to ensure robust meta-analyses. We have designed a gene-centric 50 K single nucleotide polymorphism (SNP) array to assess potentially relevant loci across a range of cardiovascular, metabolic and inflammatory syndromes. The array utilizes a "cosmopolitan" tagging approach to capture the genetic diversity across approximately 2,000 loci in populations represented in the HapMap and SeattleSNPs projects. The array content is informed by GWAS of vascular and inflammatory disease, expression quantitative trait loci implicated in atherosclerosis, pathway based approaches and comprehensive literature searching. The custom flexibility of the array platform facilitated interrogation of loci at differing stringencies, according to a gene prioritization strategy that allows saturation of high priority loci with a greater density of markers than the existing GWAS tools, particularly in African HapMap samples. We also demonstrate that the IBC array can be used to complement GWAS, increasing coverage in high priority CVD-related loci across all major HapMap populations. DNA from over 200,000 extensively phenotyped individuals will be genotyped with this array with a significant portion of the generated data being released into the academic domain facilitating in silico replication attempts, analyses of rare variants and cross-cohort meta-analyses in diverse populations. These datasets will also facilitate more robust secondary analyses, such as explorations with alternative genetic models, epistasis and gene-environment interactions.
Resumo:
Clays and claystones are used as backfill and barrier materials in the design of waste repositories, because they act as hydraulic barriers and retain contaminants. Transport through such barriers occurs mainly by molecular diffusion. There is thus an interest to relate the diffusion properties of clays to their structural properties. In previous work, we have developed a concept for up-scaling pore-scale molecular diffusion coefficients using a grid-based model for the sample pore structure. Here we present an operational algorithm which can generate such model pore structures of polymineral materials. The obtained pore maps match the rock’s mineralogical components and its macroscopic properties such as porosity, grain and pore size distributions. Representative ensembles of grains in 2D or 3D are created by a lattice Monte Carlo (MC) method, which minimizes the interfacial energy of grains starting from an initial grain distribution. Pores are generated at grain boundaries and/or within grains. The method is general and allows to generate anisotropic structures with grains of approximately predetermined shapes, or with mixtures of different grain types. A specific focus of this study was on the simulation of clay-like materials. The generated clay pore maps were then used to derive upscaled effective diffusion coefficients for non-sorbing tracers using a homogenization technique. The large number of generated maps allowed to check the relations between micro-structural features of clays and their effective transport parameters, as is required to explain and extrapolate experimental diffusion results. As examples, we present a set of 2D and 3D simulations and investigated the effects of nanopores within particles (interlayer pores) and micropores between particles. Archie’s simple power law is followed in systems with only micropores. When nanopores are present, additional parameters are required; the data reveal that effective diffusion coefficients could be described by a sum of two power functions, related to the micro- and nanoporosity. We further used the model to investigate the relationships between particle orientation and effective transport properties of the sample.
Resumo:
Variable number of tandem repeats (VNTR) are genetic loci at which short sequence motifs are found repeated different numbers of times among chromosomes. To explore the potential utility of VNTR loci in evolutionary studies, I have conducted a series of studies to address the following questions: (1) What are the population genetic properties of these loci? (2) What are the mutational mechanisms of repeat number change at these loci? (3) Can DNA profiles be used to measure the relatedness between a pair of individuals? (4) Can DNA fingerprint be used to measure the relatedness between populations in evolutionary studies? (5) Can microsatellite and short tandem repeat (STR) loci which mutate stepwisely be used in evolutionary analyses?^ A large number of VNTR loci typed in many populations were studied by means of statistical methods developed recently. The results of this work indicate that there is no significant departure from Hardy-Weinberg expectation (HWE) at VNTR loci in most of the human populations examined, and the departure from HWE in some VNTR loci are not solely caused by the presence of population sub-structure.^ A statistical procedure is developed to investigate the mutational mechanisms of VNTR loci by studying the allele frequency distributions of these loci. Comparisons of frequency distribution data on several hundreds VNTR loci with the predictions of two mutation models demonstrated that there are differences among VNTR loci grouped by repeat unit sizes.^ By extending the ITO method, I derived the distribution of the number of shared bands between individuals with any kinship relationship. A maximum likelihood estimation procedure is proposed to estimate the relatedness between individuals from the observed number of shared bands between them.^ It was believed that classical measures of genetic distance are not applicable to analysis of DNA fingerprints which reveal many minisatellite loci simultaneously in the genome, because the information regarding underlying alleles and loci is not available. I proposed a new measure of genetic distance based on band sharing between individuals that is applicable to DNA fingerprint data.^ To address the concern that microsatellite and STR loci may not be useful for evolutionary studies because of the convergent nature of their mutation mechanisms, by a theoretical study as well as by computer simulation, I conclude that the possible bias caused by the convergent mutations can be corrected, and a novel measure of genetic distance that makes the correction is suggested. In summary, I conclude that hypervariable VNTR loci are useful in evolutionary studies of closely related populations or species, especially in the study of human evolution and the history of geographic dispersal of Homo sapiens. (Abstract shortened by UMI.) ^
Resumo:
Models of DNA sequence evolution and methods for estimating evolutionary distances are needed for studying the rate and pattern of molecular evolution and for inferring the evolutionary relationships of organisms or genes. In this dissertation, several new models and methods are developed.^ The rate variation among nucleotide sites: To obtain unbiased estimates of evolutionary distances, the rate heterogeneity among nucleotide sites of a gene should be considered. Commonly, it is assumed that the substitution rate varies among sites according to a gamma distribution (gamma model) or, more generally, an invariant+gamma model which includes some invariable sites. A maximum likelihood (ML) approach was developed for estimating the shape parameter of the gamma distribution $(\alpha)$ and/or the proportion of invariable sites $(\theta).$ Computer simulation showed that (1) under the gamma model, $\alpha$ can be well estimated from 3 or 4 sequences if the sequence length is long; and (2) the distance estimate is unbiased and robust against violations of the assumptions of the invariant+gamma model.^ However, this ML method requires a huge amount of computational time and is useful only for less than 6 sequences. Therefore, I developed a fast method for estimating $\alpha,$ which is easy to implement and requires no knowledge of tree. A computer program was developed for estimating $\alpha$ and evolutionary distances, which can handle the number of sequences as large as 30.^ Evolutionary distances under the stationary, time-reversible (SR) model: The SR model is a general model of nucleotide substitution, which assumes (i) stationary nucleotide frequencies and (ii) time-reversibility. It can be extended to SRV model which allows rate variation among sites. I developed a method for estimating the distance under the SR or SRV model, as well as the variance-covariance matrix of distances. Computer simulation showed that the SR method is better than a simpler method when the sequence length $L>1,000$ bp and is robust against deviations from time-reversibility. As expected, when the rate varies among sites, the SRV method is much better than the SR method.^ The evolutionary distances under nonstationary nucleotide frequencies: The statistical properties of the paralinear and LogDet distances under nonstationary nucleotide frequencies were studied. First, I developed formulas for correcting the estimation biases of the paralinear and LogDet distances. The performances of these formulas and the formulas for sampling variances were examined by computer simulation. Second, I developed a method for estimating the variance-covariance matrix of the paralinear distance, so that statistical tests of phylogenies can be conducted when the nucleotide frequencies are nonstationary. Third, a new method for testing the molecular clock hypothesis was developed in the nonstationary case. ^
Resumo:
PURPOSE Therapeutic drug monitoring of patients receiving once daily aminoglycoside therapy can be performed using pharmacokinetic (PK) formulas or Bayesian calculations. While these methods produced comparable results, their performance has never been checked against full PK profiles. We performed a PK study in order to compare both methods and to determine the best time-points to estimate AUC0-24 and peak concentrations (C max). METHODS We obtained full PK profiles in 14 patients receiving a once daily aminoglycoside therapy. PK parameters were calculated with PKSolver using non-compartmental methods. The calculated PK parameters were then compared with parameters estimated using an algorithm based on two serum concentrations (two-point method) or the software TCIWorks (Bayesian method). RESULTS For tobramycin and gentamicin, AUC0-24 and C max could be reliably estimated using a first serum concentration obtained at 1 h and a second one between 8 and 10 h after start of the infusion. The two-point and the Bayesian method produced similar results. For amikacin, AUC0-24 could reliably be estimated by both methods. C max was underestimated by 10-20% by the two-point method and by up to 30% with a large variation by the Bayesian method. CONCLUSIONS The ideal time-points for therapeutic drug monitoring of once daily administered aminoglycosides are 1 h after start of a 30-min infusion for the first time-point and 8-10 h after start of the infusion for the second time-point. Duration of the infusion and accurate registration of the time-points of blood drawing are essential for obtaining precise predictions.
Resumo:
A search for nonresonant new phenomena, originating from either contact interactions or large extra spatial dimensions, has been carried out using events with two isolated electrons or muons. These events, produced at the LHC in proton-proton collisions at root s = 7 TeV, were recorded by the ATLAS detector. The data sample, collected throughout 2011, corresponds to an integrated luminosity of 4.9 and 5.0 fb(-1) in the e(+)e(-) and mu(+)mu(-) channels, respectively. No significant deviations from the Standard Model expectation are observed. Using a Bayesian approach, 95% confidence level lower limits ranging from 9.0 to 13.9 TeV are placed on the energy scale of llqq contact interactions in the left-left isoscalar model. Lower limits ranging from 2.4 to 3.9 TeV are also set on the string scale in large extra dimension models. After combining these limits with results from a similar search in the diphoton channel, slightly more stringent limits are obtained.