885 results for Asymptotic covariance matrix


Abstract:

In linear mixed models, model selection frequently includes the selection of random effects. Two versions of the Akaike information criterion (AIC) have been used, based either on the marginal or on the conditional distribution. We show that the marginal AIC is no longer an asymptotically unbiased estimator of the Akaike information, and in fact favours smaller models without random effects. For the conditional AIC, we show that ignoring estimation uncertainty in the random effects covariance matrix, as is common practice, induces a bias that leads to the selection of any random effect not predicted to be exactly zero. We derive an analytic representation of a corrected version of the conditional AIC, which avoids the high computational cost and imprecision of available numerical approximations. An implementation in an R package is provided. All theoretical results are illustrated in simulation studies, and their impact in practice is investigated in an analysis of childhood malnutrition in Zambia.
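
As a rough illustration of the quantity being corrected here, the following Python sketch computes the uncorrected conditional AIC of Vaida and Blanchard for a simulated random-intercept model, with the variance ratio treated as known. It is not the corrected criterion derived in the paper, nor its R implementation; all data and settings are made up.

```python
"""Illustrative sketch: conditional AIC for a random-intercept model.

This is NOT the corrected criterion derived in the paper; it is the
uncorrected conditional AIC of Vaida & Blanchard (2005), computed with the
variance ratio lambda = sigma^2 / tau^2 treated as known, to illustrate the
role of the effective degrees of freedom rho = trace(H).
"""
import numpy as np

rng = np.random.default_rng(0)

# Simulate a random-intercept model: y_ij = beta0 + beta1 * x_ij + b_i + eps_ij
n_groups, n_per = 20, 10
sigma, tau = 1.0, 0.7                      # error SD and random-intercept SD
groups = np.repeat(np.arange(n_groups), n_per)
x = rng.normal(size=n_groups * n_per)
b = rng.normal(scale=tau, size=n_groups)
y = 1.0 + 0.5 * x + b[groups] + rng.normal(scale=sigma, size=x.size)

X = np.column_stack([np.ones_like(x), x])                    # fixed-effects design
Z = (groups[:, None] == np.arange(n_groups)).astype(float)   # random-effects design
lam = sigma**2 / tau**2                                      # assumed known here

# Henderson's mixed-model equations give the hat matrix mapping y to yhat.
C = np.block([[X.T @ X, X.T @ Z],
              [Z.T @ X, Z.T @ Z + lam * np.eye(n_groups)]])
W = np.column_stack([X, Z])
H = W @ np.linalg.solve(C, W.T)
yhat = H @ y
rho = np.trace(H)                                            # effective degrees of freedom

# Conditional log-likelihood: y | b-hat ~ N(yhat, sigma^2 I)
n = y.size
cond_loglik = (-0.5 * n * np.log(2 * np.pi * sigma**2)
               - 0.5 * np.sum((y - yhat)**2) / sigma**2)
cAIC = -2 * cond_loglik + 2 * (rho + 1)                      # +1 for the error variance

print(f"effective df rho = {rho:.2f}, conditional AIC = {cAIC:.1f}")
```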

Abstract:

Models of DNA sequence evolution and methods for estimating evolutionary distances are needed for studying the rate and pattern of molecular evolution and for inferring the evolutionary relationships of organisms or genes. In this dissertation, several new models and methods are developed.

Rate variation among nucleotide sites: To obtain unbiased estimates of evolutionary distances, the rate heterogeneity among nucleotide sites of a gene should be taken into account. Commonly, the substitution rate is assumed to vary among sites according to a gamma distribution (gamma model) or, more generally, an invariant+gamma model that also allows for some invariable sites. A maximum likelihood (ML) approach was developed for estimating the shape parameter of the gamma distribution (α) and/or the proportion of invariable sites (θ). Computer simulation showed that (1) under the gamma model, α can be estimated well from 3 or 4 sequences if the sequences are long; and (2) the distance estimate is unbiased and robust against violations of the assumptions of the invariant+gamma model. However, this ML method requires a large amount of computation and is practical only for fewer than 6 sequences. Therefore, I developed a fast method for estimating α, which is easy to implement and requires no knowledge of the tree. A computer program was written for estimating α and evolutionary distances that can handle as many as 30 sequences.

Evolutionary distances under the stationary, time-reversible (SR) model: The SR model is a general model of nucleotide substitution that assumes (i) stationary nucleotide frequencies and (ii) time-reversibility. It can be extended to the SRV model, which allows rate variation among sites. I developed a method for estimating the distance under the SR or SRV model, as well as the variance-covariance matrix of the distances. Computer simulation showed that the SR method is better than simpler methods when the sequence length exceeds about 1,000 bp and is robust against deviations from time-reversibility. As expected, when the rate varies among sites, the SRV method is much better than the SR method.

Evolutionary distances under nonstationary nucleotide frequencies: The statistical properties of the paralinear and LogDet distances under nonstationary nucleotide frequencies were studied. First, I developed formulas for correcting the estimation biases of the paralinear and LogDet distances; the performance of these formulas, and of the formulas for the sampling variances, was examined by computer simulation. Second, I developed a method for estimating the variance-covariance matrix of the paralinear distance, so that statistical tests of phylogenies can be conducted when the nucleotide frequencies are nonstationary. Third, a new method for testing the molecular clock hypothesis in the nonstationary case was developed.
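
For orientation, the sketch below computes the basic paralinear distance of Lake (1994) from two aligned sequences with numpy; it does not include the bias corrections or the variance-covariance estimators developed in the dissertation, and the toy sequences are arbitrary.

```python
"""Illustrative sketch: paralinear distance between two aligned DNA sequences.

Implements the basic paralinear distance of Lake (1994),
    d = -(1/4) * ln[ det(J) / sqrt(det(D1) * det(D2)) ],
where J is the 4x4 joint frequency matrix and D1, D2 hold the marginal base
frequencies.  It does NOT include the bias corrections or the
variance-covariance estimator developed in the dissertation.
"""
import numpy as np

BASES = "ACGT"

def paralinear_distance(seq1: str, seq2: str) -> float:
    idx = {b: i for i, b in enumerate(BASES)}
    J = np.zeros((4, 4))
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in idx and b in idx:              # skip gaps / ambiguity codes
            J[idx[a], idx[b]] += 1
    J /= J.sum()                               # joint frequency matrix
    d1 = np.diag(J.sum(axis=1))                # marginal frequencies, sequence 1
    d2 = np.diag(J.sum(axis=0))                # marginal frequencies, sequence 2
    return -0.25 * (np.log(np.linalg.det(J))
                    - 0.5 * (np.log(np.linalg.det(d1)) + np.log(np.linalg.det(d2))))

# Tiny toy example (real applications need long alignments for a stable J).
s1 = "ACGTACGTACGTACGTGGCCA"
s2 = "ACGTACGAACGTACTTGGCCA"
print(f"paralinear distance = {paralinear_distance(s1, s2):.4f}")
```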

Abstract:

We develop statistical procedures for estimating shape and orientation of arbitrary three-dimensional particles. We focus on the case where particles cannot be observed directly, but only via sections. Volume tensors are used for describing particle shape and orientation, and we derive stereological estimators of the tensors. These estimators are combined to provide consistent estimators of the moments of the so-called particle cover density. The covariance structure associated with the particle cover density depends on the orientation and shape of the particles. For instance, if the distribution of the typical particle is invariant under rotations, then the covariance matrix is proportional to the identity matrix. We develop a non-parametric test for such isotropy. A flexible Lévy-based particle model is proposed, which may be analysed using a generalized method of moments in which the volume tensors enter. The developed methods are used to study the cell organization in the human brain cortex.
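
As a toy illustration of the rank-2 volume tensor that these estimators target, the sketch below approximates the second-order moment tensor of a voxelized particle and checks how far its eigenvalues depart from proportionality to the identity. It is not the paper's stereological (section-based) estimator, and the eigenvalue comparison is only a crude stand-in for the formal isotropy test.

```python
"""Toy sketch: a rank-2 volume tensor of a 3-D particle from voxel data.

The second-order volume tensor of a particle K (centred at its centroid) is
the integral of x x^T over K.  Here it is approximated by a sum over voxel
centres.  This is NOT the stereological (section-based) estimator of the
paper, and the eigenvalue comparison below is only a crude stand-in for the
formal isotropy test.
"""
import numpy as np

# Voxelize an ellipsoidal particle with semi-axes (3, 2, 1) on a 0.5 grid.
grid = np.arange(-4, 4.5, 0.5)
xs, ys, zs = np.meshgrid(grid, grid, grid, indexing="ij")
inside = (xs / 3.0) ** 2 + (ys / 2.0) ** 2 + (zs / 1.0) ** 2 <= 1.0
pts = np.column_stack([xs[inside], ys[inside], zs[inside]])
voxel_vol = 0.5 ** 3

# Centre at the centroid, then accumulate the second-order moment tensor.
pts = pts - pts.mean(axis=0)
T2 = voxel_vol * (pts.T @ pts)        # approximates the integral of x x^T dx

eigvals = np.linalg.eigvalsh(T2)
print("volume tensor:\n", np.round(T2, 2))
print("eigenvalue ratio (max/min):", round(eigvals.max() / eigvals.min(), 2))
# For an isotropic (rotation-invariant) particle distribution the expected
# tensor is proportional to the identity, i.e. the ratio would be near 1.
```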

Abstract:

With the recognition of the importance of evidence-based medicine, there is an emerging need for methods to systematically synthesize available data. Specifically, methods that provide accurate estimates of test characteristics for diagnostic tests are needed to help physicians make better clinical decisions. To provide more flexible approaches for meta-analysis of diagnostic tests, we developed three Bayesian generalized linear models. Two of these models, a bivariate normal and a bivariate binomial model, analyze pairs of sensitivity and specificity values while incorporating the correlation between these two outcomes. Noninformative independent uniform priors were used for the variances of sensitivity and specificity and for the correlation; an inverse Wishart prior was also applied to check the sensitivity of the results. The third model was a multinomial model in which the test results were modeled as multinomial random variables. All three models can include specific imaging techniques as covariates in order to compare performance, with vague normal priors assigned to the covariate coefficients. The computations were carried out with the 'Bayesian inference Using Gibbs Sampling' (BUGS) implementation of Markov chain Monte Carlo techniques. We investigated the properties of the three proposed models through extensive simulation studies and applied them to a previously published meta-analysis dataset on cervical cancer as well as to an unpublished melanoma dataset. In general, the point estimates of sensitivity and specificity were consistent between the Bayesian and frequentist bivariate normal and binomial models. However, in the simulation studies, the estimates of the correlation coefficient from the Bayesian bivariate models were not as good as those obtained from frequentist estimation, regardless of which prior distribution was used for the covariance matrix. The Bayesian multinomial model consistently underestimated sensitivity and specificity regardless of the sample size and correlation coefficient. In conclusion, the Bayesian bivariate binomial model provides the most flexible framework for future applications because of the following strengths: (1) it facilitates direct comparison between different tests; (2) it captures the variability in both sensitivity and specificity simultaneously, as well as the correlation between the two; and (3) it can be applied directly to sparse data without ad hoc corrections.
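
A minimal sketch of a bivariate binomial (logit-normal) meta-analysis model of this kind is given below, written for a recent version of PyMC rather than BUGS. The priors, study counts and variable names are illustrative assumptions, not the models or data of the study.

```python
"""Minimal sketch: bivariate binomial meta-analysis model for diagnostic tests.

Study-level true-positive and true-negative counts are modelled as binomial,
with logit-sensitivity and logit-specificity following a bivariate normal
across studies.  Priors, data and variable names are illustrative; the
original analysis was fitted in BUGS, not PyMC.
"""
import numpy as np
import pymc as pm
import pytensor.tensor as pt

# Hypothetical data: per-study diseased/healthy totals and TP/TN counts.
n_dis = np.array([40, 55, 30, 70, 45])
n_hea = np.array([60, 80, 50, 90, 65])
tp = np.array([34, 50, 24, 60, 40])
tn = np.array([51, 70, 41, 80, 55])
k = len(tp)

with pm.Model() as model:
    mu = pm.Normal("mu", mu=0.0, sigma=2.0, shape=2)        # mean logit(sens), logit(spec)
    sd = pm.Uniform("sd", lower=0.0, upper=3.0, shape=2)    # between-study SDs
    rho = pm.Uniform("rho", lower=-1.0, upper=1.0)          # correlation
    cov = pt.stack([pt.stack([sd[0] ** 2, rho * sd[0] * sd[1]]),
                    pt.stack([rho * sd[0] * sd[1], sd[1] ** 2])])
    theta = pm.MvNormal("theta", mu=mu, cov=cov, shape=(k, 2))
    sens = pm.Deterministic("sens", pm.math.invlogit(theta[:, 0]))
    spec = pm.Deterministic("spec", pm.math.invlogit(theta[:, 1]))
    pm.Binomial("tp_obs", n=n_dis, p=sens, observed=tp)
    pm.Binomial("tn_obs", n=n_hea, p=spec, observed=tn)
    idata = pm.sample(1000, tune=1000, target_accept=0.9, random_seed=2)

print(idata.posterior["mu"].mean(dim=("chain", "draw")).values)
```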

Abstract:

The infant mortality rate (IMR) is considered one of the most important indicators of a country's well-being. Countries around the world and health organizations such as the World Health Organization are dedicating their resources, knowledge and energy to reducing infant mortality rates. The well-known Millennium Development Goal 4 (MDG 4), which aims to achieve a two-thirds reduction in the under-five mortality rate between 1990 and 2015, is an example of this commitment.

In this study, our goal is to model the trends in IMR from the 1950s to the 2010s for selected countries. We would like to know how the IMR is changing over time and how it differs across countries.

IMR data collected over time form a time series, and the repeated observations are not statistically independent, so modeling the trend in IMR requires accounting for these correlations. We propose to use generalized least squares in a general linear model setting to handle the variance-covariance structure. To estimate the variance-covariance matrix, we draw on time-series models, in particular autoregressive and moving-average models. Furthermore, we compare the results from the general linear model with a correlation structure to those from ordinary least squares, which ignores the correlation structure, to assess how much the estimates change.
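
The sketch below illustrates this kind of comparison on a simulated declining series: an AR(1) parameter is estimated from OLS residuals and used to build the error covariance passed to GLS. The data are simulated, not real IMR observations, and the AR(1) structure is just one of the time-series covariances the abstract mentions.

```python
"""Sketch: trend fit for an IMR-like series by OLS vs. GLS with AR(1) errors.

The series below is simulated (not real infant mortality data).  An AR(1)
correlation parameter is estimated from the OLS residuals and used to build
the error covariance matrix passed to GLS, mirroring the idea of estimating
the variance-covariance structure from a time-series model.
"""
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)

# Simulate a declining "IMR" trend from 1950 to 2010 with AR(1) noise.
years = np.arange(1950, 2011)
t = years - years[0]
eps = np.zeros(t.size)
for i in range(1, t.size):
    eps[i] = 0.7 * eps[i - 1] + rng.normal(scale=2.0)
imr = 120.0 - 1.5 * t + eps

X = sm.add_constant(t)

# Step 1: OLS, ignoring the serial correlation.
ols = sm.OLS(imr, X).fit()

# Step 2: estimate the AR(1) parameter from the OLS residuals.
r = ols.resid
rho = np.sum(r[1:] * r[:-1]) / np.sum(r[:-1] ** 2)

# Step 3: GLS with covariance Sigma_ij proportional to rho^|i-j|.
sigma = rho ** np.abs(np.subtract.outer(t, t))
gls = sm.GLS(imr, X, sigma=sigma).fit()

print("OLS slope:", round(ols.params[1], 3), "SE:", round(ols.bse[1], 3))
print("GLS slope:", round(gls.params[1], 3), "SE:", round(gls.bse[1], 3))
```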

Abstract:

Many statistical studies feature data with both exact-time and interval-censored events. While a number of methods currently exist to handle interval-censored events and multivariate exact-time events separately, few techniques exist to deal with their combination. This thesis develops a theoretical framework for analyzing a multivariate endpoint comprising a single interval-censored event plus an arbitrary number of exact-time events. The approach fuses the exact-time events, modeled using the marginal method of Wei, Lin, and Weissfeld, with a piecewise-exponential interval-censored component. The resulting model incorporates more of the information in the data and also removes some of the biases associated with the exclusion of interval-censored events. A simulation study demonstrates that our approach produces reliable estimates for the model parameters and their variance-covariance matrix. As a real-world data example, we apply this technique to the Systolic Hypertension in the Elderly Program (SHEP) clinical trial, which features three correlated events: clinical non-fatal myocardial infarction, fatal myocardial infarction (two exact-time events), and silent myocardial infarction (one interval-censored event).
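
For intuition about the interval-censored component, the sketch below maximizes a piecewise-exponential likelihood for a single interval-censored event with scipy; the joint model with the exact-time (Wei-Lin-Weissfeld) events is not reproduced, and the data and cut points are simulated assumptions.

```python
"""Sketch: piecewise-exponential likelihood for an interval-censored event.

Each subject's event time is known only to lie in (left, right]; right == inf
means right-censored.  The hazard is constant within pre-specified intervals.
This is only the interval-censored component; it does not include the joint
model with the exact-time (Wei-Lin-Weissfeld) events developed in the thesis.
"""
import numpy as np
from scipy.optimize import minimize

cuts = np.array([0.0, 1.0, 2.0])               # hazard is constant within (0,1], (1,2], (2,inf)
rng = np.random.default_rng(4)

def cum_hazard(t, log_haz):
    """Cumulative hazard at times t for piecewise-constant hazards exp(log_haz)."""
    haz = np.exp(log_haz)
    edges = np.append(cuts, np.inf)
    exposure = np.clip(np.minimum(t[:, None], edges[1:]) - edges[:-1], 0.0, None)
    return exposure @ haz

def neg_loglik(log_haz, left, right):
    # P(event in (left, right]) = S(left) - S(right); S(inf) = 0 handles censoring.
    S_left = np.exp(-cum_hazard(left, log_haz))
    S_right = np.exp(-cum_hazard(right, log_haz))
    return -np.sum(np.log(np.clip(S_left - S_right, 1e-300, None)))

# Simulate true exponential event times, then coarsen them to visit intervals.
true_t = rng.exponential(scale=2.5, size=200)
visits = np.array([1.0, 2.0, 4.0])
left = np.array([visits[visits < t].max(initial=0.0) for t in true_t])
right = np.array([visits[visits >= t].min(initial=np.inf) for t in true_t])

fit = minimize(neg_loglik, x0=np.zeros(len(cuts)), args=(left, right), method="BFGS")
print("estimated piecewise hazards:", np.round(np.exp(fit.x), 3))
print("true constant hazard:", round(1 / 2.5, 3))
```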

Abstract:

We present GrafLab (GRAvity Field LABoratory), a novel graphical-user-interface program for spherical harmonic synthesis (SHS) written in MATLAB®. The program can conveniently compute 38 different functionals of the geopotential up to ultra-high degrees and orders of the spherical harmonic expansion. For the most difficult part of the SHS, namely the evaluation of the fully normalized associated Legendre functions (fnALFs), we use three different approaches depending on the required maximum degree: (i) the standard forward column method (up to maximum degree 1800, in some cases up to degree 2190); (ii) the modified forward column method combined with Horner's scheme (up to maximum degree 2700); and (iii) extended-range arithmetic (up to an arbitrary maximum degree). For maximum degree 2190, SHS with fnALFs evaluated using extended-range arithmetic takes only about 2-3 times longer than its standard-arithmetic counterpart, the standard forward column method. In GrafLab, the functionals of the geopotential can be evaluated on a regular grid or point-wise, and the input coordinates can either be read from a data file or entered manually. For computation on a regular grid we apply the lumped-coefficients approach because of its significant time efficiency. Furthermore, if a full variance-covariance matrix of the spherical harmonic coefficients is available, the commission errors of the functionals can be computed. When computing on a regular grid, the output functionals or their commission errors may be displayed on a map using an automatically selected cartographic projection.
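
The sketch below shows the standard forward column recursion for fnALFs (the first of the three approaches, in the form given by Holmes and Featherstone) in numpy; it is not GrafLab's MATLAB code, and in double precision it breaks down at the high degrees for which GrafLab switches to the modified method and extended-range arithmetic.

```python
"""Sketch: standard forward column recursion for fully normalized ALFs.

Computes Pbar_nm(cos(theta)) up to degree/order nmax using the standard
forward column method.  In double precision this fails at high degrees,
which is why GrafLab switches to the modified forward column method and
extended-range arithmetic; this sketch is NOT GrafLab's implementation.
"""
import numpy as np

def fnalf(theta: float, nmax: int) -> np.ndarray:
    """Return array P[n, m] of fully normalized ALFs at colatitude theta (rad)."""
    t, u = np.cos(theta), np.sin(theta)
    P = np.zeros((nmax + 1, nmax + 1))
    P[0, 0] = 1.0
    if nmax >= 1:
        P[1, 1] = np.sqrt(3.0) * u
    # Sectorial seeds: Pbar_mm = u * sqrt((2m+1)/(2m)) * Pbar_{m-1,m-1}
    for m in range(2, nmax + 1):
        P[m, m] = u * np.sqrt((2.0 * m + 1.0) / (2.0 * m)) * P[m - 1, m - 1]
    # Forward column recursion over degree n for each order m.
    for m in range(0, nmax):
        for n in range(m + 1, nmax + 1):
            a = np.sqrt((2.0 * n - 1.0) * (2.0 * n + 1.0) / ((n - m) * (n + m)))
            b = 0.0
            if n - m > 1:
                b = np.sqrt((2.0 * n + 1.0) * (n + m - 1.0) * (n - m - 1.0)
                            / ((n - m) * (n + m) * (2.0 * n - 3.0)))
            P[n, m] = a * t * P[n - 1, m] - b * P[n - 2, m]
    return P

P = fnalf(np.deg2rad(60.0), nmax=10)
# Spot check against the closed form Pbar_20 = sqrt(5) * (3*cos^2(theta) - 1) / 2.
t = np.cos(np.deg2rad(60.0))
print(P[2, 0], np.sqrt(5.0) * (3.0 * t**2 - 1.0) / 2.0)
```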

Abstract:

State-of-the-art process-based models have been shown to be applicable to the simulation and prediction of coastal morphodynamics. On annual to decadal time scales, however, these models may show limitations in reproducing complex natural morphological evolution patterns, such as the movement of bars and tidal channels, e.g. the observed decadal migration of the Medem Channel in the Elbe Estuary, German Bight. Here, a morphodynamic model is shown to simulate the hydrodynamics and sediment budgets of the domain to some extent, but it fails to adequately reproduce the pronounced channel migration, owing to an insufficient representation of bank erosion processes. To allow for long-term simulations of the domain, a nudging method is introduced to update the model-predicted bathymetries with observations: the predicted bathymetry is nudged towards true (observed) states in annual time steps. A sensitivity analysis of the user-defined correlation length scale, which defines the background error covariance matrix used in the nudging procedure, suggests that the optimal error correlation length is similar to the grid cell size, here 80-90 m. In addition, spatially heterogeneous correlation lengths produce more realistic channel depths than spatially homogeneous ones. Consecutive application of the nudging method compensates for the (stand-alone) model prediction errors and corrects the channel migration pattern, with a Brier skill score of 0.78. The proposed nudging method thus serves as an analytical approach to updating model predictions towards a predefined 'true' state and to the spatiotemporal interpolation of incomplete morphological data in long-term simulations.
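
A one-dimensional toy version of such a covariance-weighted update is sketched below: a background depth profile is pulled towards sparse observations using a Gaussian background error covariance with a user-chosen correlation length. This is an illustrative optimal-interpolation-style update on made-up data, not the scheme, grid or covariance model of the study.

```python
"""Toy sketch: covariance-weighted nudging of a 1-D "bathymetry" profile.

A background (model-predicted) depth profile is updated toward sparse
observations using weights derived from a Gaussian background error
covariance with a user-chosen correlation length.  This is an illustrative
optimal-interpolation-style update, not the scheme or grid of the study.
"""
import numpy as np

rng = np.random.default_rng(5)

# 1-D cross-section: 100 grid cells of 80 m each (loosely echoing the
# 80-90 m correlation length discussed in the abstract).
x = np.arange(100) * 80.0
truth = -5.0 - 3.0 * np.exp(-((x - 4000.0) / 800.0) ** 2)       # "true" channel
background = -5.0 - 3.0 * np.exp(-((x - 3000.0) / 800.0) ** 2)  # misplaced channel

# Sparse depth observations of the true state.
obs_idx = np.arange(5, 100, 10)
obs = truth[obs_idx] + rng.normal(scale=0.05, size=obs_idx.size)

L = 85.0                      # correlation length of the background errors [m]
sigma_b, sigma_o = 1.0, 0.05  # background and observation error SDs

def gaussian_cov(xa, xb, scale):
    return np.exp(-0.5 * ((xa[:, None] - xb[None, :]) / scale) ** 2)

B = sigma_b**2 * gaussian_cov(x, x, L)            # background error covariance
H = np.zeros((obs_idx.size, x.size))              # observation operator
H[np.arange(obs_idx.size), obs_idx] = 1.0
R = sigma_o**2 * np.eye(obs_idx.size)             # observation error covariance

# Analysis = background + K (obs - H background), K = B H^T (H B H^T + R)^-1
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
analysis = background + K @ (obs - H @ background)

print("RMS error before update:", round(np.sqrt(np.mean((background - truth) ** 2)), 3))
print("RMS error after update :", round(np.sqrt(np.mean((analysis - truth) ** 2)), 3))
```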

Abstract:

Finding the degree-constrained minimum spanning tree (DCMST) of a graph is a widely studied NP-hard problem. One of its most important applications is network design. Here we deal with a new variant of the DCMST problem, which consists of finding not only the degree- but also the role-constrained minimum spanning tree (DRCMST), i.e., we add constraints that restrict the role of the nodes in the tree to root, intermediate or leaf node. Furthermore, we do not limit the number of root nodes to one, so that, in general, we build a forest of DRCMSTs. The modeling of network design problems can benefit from the possibility of generating more than one tree and determining the role of the nodes in the network. We propose a novel permutation-based representation to encode these forests; in this representation, one permutation simultaneously encodes all the trees to be built. We simulate a wide variety of DRCMST problems and optimize them with eight different evolutionary computation algorithms whose individuals are encoded with the proposed representation: estimation of distribution algorithm, generational genetic algorithm, steady-state genetic algorithm, covariance matrix adaptation evolution strategy, differential evolution, elitist evolution strategy, non-elitist evolution strategy and particle swarm optimization. The best results are obtained with the estimation of distribution algorithm and both types of genetic algorithms, although the genetic algorithms are significantly faster.
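
To make the underlying constraint structure concrete, the sketch below is a greedy Kruskal-style baseline for the plain degree-constrained MST problem; it is not the paper's permutation representation, its role constraints or any of its evolutionary algorithms.

```python
"""Sketch: a greedy Kruskal-style heuristic for the plain degree-constrained
MST (DCMST) problem that the role-constrained variant builds on.

Edges are scanned in order of increasing weight and added whenever they do
not close a cycle and do not push either endpoint over its degree cap.  This
is only a baseline heuristic for the underlying problem; it is not the
paper's permutation representation or any of its evolutionary algorithms,
and under tight degree caps it may fail to connect all nodes.
"""
import numpy as np

rng = np.random.default_rng(7)

def greedy_dcmst(weights: np.ndarray, max_degree: int):
    n = weights.shape[0]
    parent = list(range(n))

    def find(i):                       # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    degree = [0] * n
    edges = sorted((weights[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj and degree[i] < max_degree and degree[j] < max_degree:
            parent[ri] = rj
            degree[i] += 1
            degree[j] += 1
            tree.append((i, j, w))
    return tree

# Random symmetric weight matrix for a small complete graph.
n = 8
W = rng.uniform(1.0, 10.0, size=(n, n))
W = (W + W.T) / 2.0

tree = greedy_dcmst(W, max_degree=3)
print(f"{len(tree)} edges selected, total weight = {sum(w for _, _, w in tree):.2f}")
```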

Abstract:

Finding the degree-constrained minimum spanning tree (DCMST) of a graph is a widely studied NP-hard problem. One of its most important applications is network design. Here we deal with a new variant of the DCMST problem, which consists of finding not only the degree- but also the role-constrained minimum spanning tree (DRCMST), i.e., we add constraints to restrict the role of the nodes in the tree to root, intermediate or leaf node. Furthermore, we do not limit the number of root nodes to one, so that, in general, we build a forest of DRCMSTs. The modeling of network design problems can benefit from the possibility of generating more than one tree and determining the role of the nodes in the network. We propose a novel permutation-based representation to encode the forest of DRCMSTs; in this representation, one permutation simultaneously encodes all the trees to be built. We simulate a wide variety of DRCMST problems and optimize them with eight different evolutionary computation algorithms that encode the individuals of the population using the proposed representation: an estimation of distribution algorithm (EDA), a generational genetic algorithm (gGA), a steady-state genetic algorithm (ssGA), the covariance matrix adaptation evolution strategy (CMAES), differential evolution (DE), an elitist evolution strategy (ElitistES), a non-elitist evolution strategy (NonElitistES) and particle swarm optimization (PSO). The best results are obtained with the estimation of distribution algorithm and both types of genetic algorithms, although the genetic algorithms are significantly faster.

Abstract:

Patterns in sequences of amino acid hydrophobic free energies predict secondary structures in proteins. In protein folding, matches in hydrophobic free energy statistical wavelengths appear to contribute to selective aggregation of secondary structures in “hydrophobic zippers.” In a similar setting, the use of Fourier analysis to characterize the dominant statistical wavelengths of peptide ligands’ and receptor proteins’ hydrophobic modes to predict such matches has been limited by the aliasing and end effects of short peptide lengths, as well as the broad-band, mode multiplicity of many of their frequency (power) spectra. In addition, the sequence locations of the matching modes are lost in this transformation. We make new use of three techniques to address these difficulties: (i) eigenfunction construction from the linear decomposition of the lagged covariance matrices of the ligands and receptors as hydrophobic free energy sequences; (ii) maximum entropy, complex poles power spectra, which select the dominant modes of the hydrophobic free energy sequences or their eigenfunctions; and (iii) discrete, best bases, trigonometric wavelet transformations, which confirm the dominant spectral frequencies of the eigenfunctions and locate them as (absolute valued) moduli in the peptide or receptor sequence. The leading eigenfunction of the covariance matrix of a transmembrane receptor sequence locates the same transmembrane segments seen in n-block-averaged hydropathy plots while leaving the remaining hydrophobic modes unsmoothed and available for further analyses as secondary eigenfunctions. In these receptor eigenfunctions, we find a set of statistical wavelength matches between peptide ligands and their G-protein and tyrosine kinase coupled receptors, ranging across examples from 13.10 amino acids in acid fibroblast growth factor to 2.18 residues in corticotropin releasing factor. We find that the wavelet-located receptor modes in the extracellular loops are compatible with studies of receptor chimeric exchanges and point mutations. A nonbinding corticotropin-releasing factor receptor mutant is shown to have lost the signatory mode common to the normal receptor and its ligand. Hydrophobic free energy eigenfunctions and their transformations offer new quantitative physical homologies in database searches for peptide-receptor matches.
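
Step (i) of this pipeline, the eigen-decomposition of a lagged covariance matrix of a hydrophobicity sequence, is sketched below in the style of singular spectrum analysis. The Kyte-Doolittle scale is used as a stand-in for the paper's hydrophobic free energy values, and the peptide is an arbitrary toy example, so the numbers are purely illustrative.

```python
"""Sketch: leading eigenfunction of the lagged covariance matrix of a
hydrophobicity sequence (a singular-spectrum-analysis-style decomposition).

The Kyte-Doolittle hydropathy scale is used as a stand-in for the
hydrophobic free energy values of the paper, and the peptide sequence is an
arbitrary toy example, so the numbers are purely illustrative.
"""
import numpy as np

KD = {  # Kyte-Doolittle hydropathy values
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def leading_eigenfunction(seq: str, window: int = 8):
    h = np.array([KD[a] for a in seq], dtype=float)
    h -= h.mean()
    # Trajectory matrix of lagged windows, then its (lagged) covariance matrix.
    n = len(h) - window + 1
    traj = np.array([h[i:i + window] for i in range(n)])
    cov = traj.T @ traj / n
    vals, vecs = np.linalg.eigh(cov)
    lead = vecs[:, -1]                       # eigenvector of the largest eigenvalue
    # Project the windows onto the leading eigenvector ("eigenfunction").
    recon = traj @ lead
    return vals[::-1], recon

toy_peptide = "MKTLLILAVVAAALAHSSQGRSSKKEEAALQRRFF"
eigvals, eigenfun = leading_eigenfunction(toy_peptide)
print("top three eigenvalues:", np.round(eigvals[:3], 2))
print("leading eigenfunction (first 10 positions):", np.round(eigenfun[:10], 2))
```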

Abstract:

As additivity is a very useful property for a distance measure, a general additive distance is proposed under the stationary time-reversible (SR) model of nucleotide substitution or, more generally, under the stationary, time-reversible, and rate variable (SRV) model, which allows rate variation among nucleotide sites. A method for estimating the mean distance and the sampling variance is developed. In addition, a method is developed for estimating the variance-covariance matrix of distances, which is useful for the statistical test of phylogenies and molecular clocks. Computer simulation shows (i) if the sequences are longer than, say, 1000 bp, the SR method is preferable to simpler methods; (ii) the SR method is robust against deviations from time-reversibility; (iii) when the rate varies among sites, the SRV method is much better than the SR method because the distance is seriously underestimated by the SR method; and (iv) our method for estimating the sampling variance is accurate for sequences longer than 500 bp. Finally, a test is constructed for testing whether DNA evolution follows a general Markovian model.
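
As a rough illustration of such a distance, the sketch below computes a standard point estimate under a stationary time-reversible model via the matrix logarithm of the estimated transition matrix, d = -tr{Π log(Π⁻¹F)}. It omits the paper's variance-covariance estimators and its SRV (rate-variation) extension, and the toy sequences are arbitrary.

```python
"""Sketch: evolutionary distance under a stationary time-reversible model.

Given two aligned sequences, the symmetrized joint frequency matrix F and
the average base frequencies Pi give a distance
    d = -trace( Pi * logm( Pi^{-1} F ) ),
a standard point estimate of the expected number of substitutions per site
under the general time-reversible framework.  It omits the variance-
covariance estimators and the SRV (rate-variation) extension of the paper.
"""
import numpy as np
from scipy.linalg import logm

BASES = "ACGT"

def sr_distance(seq1: str, seq2: str) -> float:
    idx = {b: i for i, b in enumerate(BASES)}
    F = np.zeros((4, 4))
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in idx and b in idx:
            F[idx[a], idx[b]] += 1
    F = (F + F.T) / (2.0 * F.sum())          # symmetrize (time-reversibility)
    pi = F.sum(axis=1)                        # stationary base frequencies
    Pi = np.diag(pi)
    P = np.linalg.solve(Pi, F)                # Pi^{-1} F, an estimate of exp(Qt)
    return float(-np.trace(Pi @ logm(P)).real)

s1 = "ACGTACGTACGTACGTGGCCAACGTTACG"
s2 = "ACGTACGAACGTACTTGGCCAACGTAACC"
print(f"SR distance = {sr_distance(s1, s2):.4f}")
```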

Abstract:

devcon transforms the coefficients of 0/1 dummy variables so that they reflect deviations from the "grand mean" rather than deviations from the reference category (the transformed coefficients are equivalent to those obtained by so-called "effects coding") and adds the coefficient for the reference category. The variance-covariance matrix of the estimates is transformed accordingly, so the transformed estimates can be used with postestimation procedures. In particular, devcon can be used to solve the identification problem for dummy-variable effects in the Blinder-Oaxaca decomposition (see the oaxaca package).
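
The sketch below shows the kind of linear transformation involved, in Python rather than Stata: reference-coded dummy coefficients are mapped to deviations from the unweighted grand mean, and the covariance matrix is transformed with the same map. The numbers are made up, and this is a conceptual illustration, not devcon's implementation.

```python
"""Sketch: re-expressing reference-coded dummy coefficients as deviations
from the unweighted grand mean (effects coding), including the covariance.

Given coefficients b_2..b_k for a k-category dummy set with category 1 as
reference (b_1 = 0), the deviations are d_j = b_j - mean(b_1..b_k).  Both the
coefficients and their variance-covariance matrix are transformed with the
same linear map A (d = A b, V_d = A V_b A').  This mirrors what devcon does
conceptually; it is not the Stata implementation itself.
"""
import numpy as np

k = 4                                     # number of categories
b = np.array([0.8, -0.3, 1.1])            # reference-coded coefficients b_2..b_k
V = np.diag([0.04, 0.05, 0.06])           # their variance-covariance matrix

# Map (b_2, ..., b_k) -> (d_1, d_2, ..., d_k), deviations from the grand mean.
A = np.vstack([np.full((1, k - 1), -1.0 / k),            # reference category
               np.eye(k - 1) - 1.0 / k])                  # remaining categories
d = A @ b
V_d = A @ V @ A.T

print("deviations from grand mean:", np.round(d, 3))
print("they sum to zero:", np.isclose(d.sum(), 0.0))
print("standard errors:", np.round(np.sqrt(np.diag(V_d)), 3))
```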

Abstract:

While the feasibility of bottleneck-induced speciation is in doubt, population bottlenecks may still affect the speciation process by interacting with divergent selection. To explore this possibility, I conducted a laboratory speciation experiment using Drosophila pseudoobscura involving 78 replicate populations assigned in a two-way factorial design to both bottleneck (present vs. absent) and environment (ancestral vs. novel) treatments. Populations independently evolved under these treatments and were then tested for assortative mating and male mating success against their common ancestor. Bottlenecks alone did not generate any premating isolation, despite an experimental design that was conducive to bottleneck-induced speciation. Premating isolation also did not evolve in the novel environment treatment, neither in the presence nor absence of bottlenecks. However, male mating success was significantly reduced in the novel environment treatment, both as a plastic response to this environment and as a result of environment-dependent inbreeding effects in the bottlenecked populations. Reduced mating success of derived males will hamper speciation by enhancing the mating success of immigrant, ancestral males. Novel environments are generally thought to promote ecological speciation by generating divergent natural selection. In the current experiment, however, the novel environment did not cause the evolution of any premating isolation and it reduced the likelihood of speciation through its effects on male mating success.

Abstract:

Single male sexually selected traits have been found to exhibit substantial genetic variance, even though natural and sexual selection are predicted to deplete genetic variance in these traits. We tested whether genetic variance in multiple male display traits of Drosophila serrata was maintained under field conditions. A breeding design involving 300 field-reared males and their laboratory-reared offspring allowed the estimation of the genetic variance-covariance matrix for six male cuticular hydrocarbons (CHCs) under field conditions. Despite individual CHCs displaying substantial genetic variance under field conditions, the vast majority of genetic variance in CHCs was not closely associated with the direction of sexual selection measured on field phenotypes. Relative concentrations of three CHCs correlated positively with body size in the field, but not under laboratory conditions, suggesting condition-dependent expression of CHCs under field conditions. Therefore condition dependence may not maintain genetic variance in preferred combinations of male CHCs under field conditions, suggesting that the large mutational target supplied by the evolution of condition dependence may not provide a solution to the lek paradox in this species. Sustained sexual selection may be adequate to deplete genetic variance in the direction of selection, perhaps as a consequence of the low rate of favorable mutations expected in multiple trait systems.
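
The central quantity here, the amount of genetic variance lying in the direction of selection, can be illustrated with a small numpy sketch: the selection gradient is projected through the genetic variance-covariance (G) matrix and compared with the leading eigenvalue of G. The G matrix and gradient below are arbitrary illustrative numbers, not the D. serrata CHC estimates.

```python
"""Sketch: how much genetic variance lies in the direction of selection.

For a genetic variance-covariance matrix G and a unit-length selection
gradient beta, the genetic variance in the direction of selection is
beta' G beta, which can be compared with the leading eigenvalue of G.
The 6x6 G matrix and gradient below are arbitrary illustrative numbers,
not the D. serrata CHC estimates.
"""
import numpy as np

rng = np.random.default_rng(6)

# Build an arbitrary positive-definite 6x6 "G" for six CHC traits.
A = rng.normal(size=(6, 6))
G = A @ A.T / 6.0

# An arbitrary selection gradient, normalized to unit length.
beta = rng.normal(size=6)
beta /= np.linalg.norm(beta)

var_in_beta = beta @ G @ beta                 # genetic variance along beta
eigvals = np.linalg.eigvalsh(G)[::-1]         # eigenvalues, largest first

print("genetic variance along the selection gradient:", round(var_in_beta, 3))
print("leading eigenvalue of G (gmax):", round(eigvals[0], 3))
print("fraction of total genetic variance along beta:",
      round(var_in_beta / eigvals.sum(), 3))
```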