4 resultados para Nonstationary variance

em DigitalCommons@The Texas Medical Center


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Models of DNA sequence evolution and methods for estimating evolutionary distances are needed for studying the rate and pattern of molecular evolution and for inferring the evolutionary relationships of organisms or genes. In this dissertation, several new models and methods are developed.^ The rate variation among nucleotide sites: To obtain unbiased estimates of evolutionary distances, the rate heterogeneity among nucleotide sites of a gene should be considered. Commonly, it is assumed that the substitution rate varies among sites according to a gamma distribution (gamma model) or, more generally, an invariant+gamma model which includes some invariable sites. A maximum likelihood (ML) approach was developed for estimating the shape parameter of the gamma distribution $(\alpha)$ and/or the proportion of invariable sites $(\theta).$ Computer simulation showed that (1) under the gamma model, $\alpha$ can be well estimated from 3 or 4 sequences if the sequence length is long; and (2) the distance estimate is unbiased and robust against violations of the assumptions of the invariant+gamma model.^ However, this ML method requires a huge amount of computational time and is useful only for less than 6 sequences. Therefore, I developed a fast method for estimating $\alpha,$ which is easy to implement and requires no knowledge of tree. A computer program was developed for estimating $\alpha$ and evolutionary distances, which can handle the number of sequences as large as 30.^ Evolutionary distances under the stationary, time-reversible (SR) model: The SR model is a general model of nucleotide substitution, which assumes (i) stationary nucleotide frequencies and (ii) time-reversibility. It can be extended to SRV model which allows rate variation among sites. I developed a method for estimating the distance under the SR or SRV model, as well as the variance-covariance matrix of distances. Computer simulation showed that the SR method is better than a simpler method when the sequence length $L>1,000$ bp and is robust against deviations from time-reversibility. As expected, when the rate varies among sites, the SRV method is much better than the SR method.^ The evolutionary distances under nonstationary nucleotide frequencies: The statistical properties of the paralinear and LogDet distances under nonstationary nucleotide frequencies were studied. First, I developed formulas for correcting the estimation biases of the paralinear and LogDet distances. The performances of these formulas and the formulas for sampling variances were examined by computer simulation. Second, I developed a method for estimating the variance-covariance matrix of the paralinear distance, so that statistical tests of phylogenies can be conducted when the nucleotide frequencies are nonstationary. Third, a new method for testing the molecular clock hypothesis was developed in the nonstationary case. ^

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Although many family-based genetic studies have collected dietary data, very few have used the dietary information in published findings. No single solution has been presented or discussed in the literature to deal with the problem of using factor analyses for the analyses of dietary data from several related individuals from a given household. The standard statistical approach of factor analysis cannot be applied to the VIVA LA FAMILIA Study diet data to ascertain dietary patterns since this population consists of three children from each family, thus the dietary patterns of the related children may be correlated and non-independent. Addressing this problem in this project will enable us to describe the dietary patterns in Hispanic families and to explore the relationships between dietary patterns and childhood obesity. ^ In the VIVA LA FAMILIA Study, an overweight child was first identified and then his/her siblings and parents were brought in for data collection which included 24 hour recalls and food frequency questionnaire (FFQ). Dietary intake data were collected using FFQ and 24 hour recalls on 1030 Hispanic children from 319 families. ^ The design of the VIVA LA FAMILIA Study has important and unique statistical considerations since its participants are related to each other, the majority form distinct nuclear families. Thus, the standard approach of factor analysis cannot be applied to these diet data to ascertain dietary patterns. In this project we propose to investigate whether the determinants of the correlation matrix of each family unit will allow us to adjust the original correlation matrix of the dietary intake data prior to ascertaining dietary intake patterns. If these methods are appropriate, then in the future the dietary patterns among related individuals could be assessed by standard orthogonal principal component factor analysis.^

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The electroencephalogram (EEG) is a physiological time series that measures electrical activity at different locations in the brain, and plays an important role in epilepsy research. Exploring the variance and/or volatility may yield insights for seizure prediction, seizure detection and seizure propagation/dynamics.^ Maximal Overlap Discrete Wavelet Transforms (MODWTs) and ARMA-GARCH models were used to determine variance and volatility characteristics of 66 channels for different states of an epileptic EEG – sleep, awake, sleep-to-awake and seizure. The wavelet variances, changes in wavelet variances and volatility half-lives for the four states were compared for possible differences between seizure and non-seizure channels.^ The half-lives of two of the three seizure channels were found to be shorter than all of the non-seizure channels, based on 95% CIs for the pre-seizure and awake signals. No discernible patterns were found the wavelet variances of the change points for the different signals. ^