13 results for Variance-covariance Matrices
in DigitalCommons@The Texas Medical Center
Abstract:
Models of DNA sequence evolution and methods for estimating evolutionary distances are needed for studying the rate and pattern of molecular evolution and for inferring the evolutionary relationships of organisms or genes. In this dissertation, several new models and methods are developed.
The rate variation among nucleotide sites: To obtain unbiased estimates of evolutionary distances, the rate heterogeneity among nucleotide sites of a gene should be considered. Commonly, the substitution rate is assumed to vary among sites according to a gamma distribution (gamma model) or, more generally, according to an invariant+gamma model, which includes some invariable sites. A maximum likelihood (ML) approach was developed for estimating the shape parameter of the gamma distribution ($\alpha$) and/or the proportion of invariable sites ($\theta$). Computer simulation showed that (1) under the gamma model, $\alpha$ can be well estimated from 3 or 4 sequences if the sequences are long; and (2) the distance estimate is unbiased and robust against violations of the assumptions of the invariant+gamma model.
However, this ML method requires a huge amount of computational time and is useful only for fewer than 6 sequences. Therefore, I developed a fast method for estimating $\alpha$ that is easy to implement and requires no knowledge of the tree. A computer program was developed for estimating $\alpha$ and evolutionary distances; it can handle as many as 30 sequences.
Evolutionary distances under the stationary, time-reversible (SR) model: The SR model is a general model of nucleotide substitution that assumes (i) stationary nucleotide frequencies and (ii) time-reversibility. It can be extended to the SRV model, which allows rate variation among sites. I developed a method for estimating the distance under the SR or SRV model, as well as the variance-covariance matrix of distances. Computer simulation showed that the SR method performs better than a simpler method when the sequence length $L > 1{,}000$ bp and is robust against deviations from time-reversibility. As expected, when the rate varies among sites, the SRV method is much better than the SR method.
Evolutionary distances under nonstationary nucleotide frequencies: The statistical properties of the paralinear and LogDet distances under nonstationary nucleotide frequencies were studied. First, I developed formulas for correcting the estimation biases of the paralinear and LogDet distances. The performance of these formulas and of the formulas for sampling variances was examined by computer simulation. Second, I developed a method for estimating the variance-covariance matrix of the paralinear distance, so that statistical tests of phylogenies can be conducted when the nucleotide frequencies are nonstationary. Third, a new method for testing the molecular clock hypothesis in the nonstationary case was developed.
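To make the distance measures above concrete, here is a minimal Python sketch of two textbook formulas; it is not the dissertation's ML or fast $\alpha$ method, nor its SR/SRV estimator. It computes the Jukes-Cantor (JC) distance with and without gamma-distributed rates, $d = \tfrac{3}{4}\alpha[(1 - \tfrac{4}{3}p)^{-1/\alpha} - 1]$, and Lake's paralinear distance $d = -\tfrac{1}{4}[\ln\det J - \tfrac{1}{2}\ln(\det D_X \det D_Y)]$, where $J$ is the 4×4 divergence matrix and $D_X$, $D_Y$ are diagonal matrices of base frequencies. The JC model, the function names, and the assumption of gap-free ACGT alignments are illustrative choices, not taken from the source.

```python
import math

BASES = "ACGT"

def p_distance(seq_x: str, seq_y: str) -> float:
    """Proportion of differing sites in a gap-free alignment."""
    if len(seq_x) != len(seq_y) or not seq_x:
        raise ValueError("sequences must be aligned and non-empty")
    return sum(a != b for a, b in zip(seq_x, seq_y)) / len(seq_x)

def jc_distance(p: float) -> float:
    """Jukes-Cantor distance with equal rates at all sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def jc_gamma_distance(p: float, alpha: float) -> float:
    """Jukes-Cantor distance with gamma-distributed rates of shape alpha."""
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)

def divergence_matrix(seq_x: str, seq_y: str):
    """J[i][j] = fraction of sites with base i in X and base j in Y."""
    n = len(seq_x)
    J = [[0.0] * 4 for _ in range(4)]
    for a, b in zip(seq_x, seq_y):
        J[BASES.index(a)][BASES.index(b)] += 1.0 / n
    return J

def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    size, det = len(a), 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det

def paralinear_distance(J) -> float:
    """-1/4 [ln det J - 1/2 ln(det D_X det D_Y)]; requires det J > 0."""
    fx = [sum(row) for row in J]        # base composition of X (row sums)
    fy = [sum(col) for col in zip(*J)]  # base composition of Y (column sums)
    log_dx = sum(math.log(f) for f in fx)  # ln det D_X, since D_X is diagonal
    log_dy = sum(math.log(f) for f in fy)  # ln det D_Y
    return -0.25 * (math.log(determinant(J)) - 0.5 * (log_dx + log_dy))
```

Two quick sanity checks follow from the formulas: as $\alpha \to \infty$ the gamma distance approaches the plain JC distance, and for a JC-type divergence matrix with equal base frequencies the paralinear distance reduces to the JC distance.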
Abstract:
It is widely acknowledged in the theoretical and empirical literature that social relationships, comprising structural measures (social networks) and functional measures (perceived social support), have an undeniable effect on health outcomes. However, the actual mechanism of this effect has yet to be clearly understood or explicated. In addition, comorbidity has been found to adversely affect social relationships and health-related quality of life (HRQoL), a valued outcome measure in cancer patients and survivors.
This cross-sectional study uses selected baseline data (N=3088) from the Women's Healthy Eating and Living (WHEL) study. LISREL 8.72 was used for the latent variable structural equation modeling (LVSEM). Because the data are ordinal, the Weighted Least Squares (WLS) estimation method with Asymptotic Distribution Free covariance matrices was chosen for this analysis. The primary exogenous predictor variables are Social Networks and Comorbidity; Perceived Social Support is the endogenous predictor variable. Three dimensions of HRQoL (physical, mental, and satisfaction with current quality of life) were the outcome variables.
This study hypothesizes and tests the mechanisms and pathways linking comorbidity, social relationships, and HRQoL using LVSEM. After the measurement models of social networks and perceived social support were tested, a structural model hypothesizing associations between the latent exogenous and endogenous variables was tested. After listwise deletion (N=2131), the results mostly confirmed the hypothesized relationships (TLI, CFI > 0.95; RMSEA = 0.05, p = 0.15). Comorbidity was adversely associated with all three HRQoL outcomes. Strong ties were negatively associated with perceived social support; social network had a strong positive association with perceived social support, which served as a mediator between social networks and HRQoL. Mental health quality of life was the most adversely affected by the predictor variables.
This study is a preliminary look at the integration of structural and functional measures of social relationships, comorbidity, and three HRQoL indicators using LVSEM. Developing stronger social networks and forming supportive relationships is beneficial for health outcomes such as the HRQoL of cancer survivors. Thus, the medical community treating cancer survivors, as well as survivors' social networks, needs to be informed and cognizant of these possible relationships.
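The fit criteria quoted above (TLI, CFI, RMSEA) are functions of the chi-square statistics of the fitted model and of a baseline (independence) model. The Python sketch below uses the common textbook definitions; LISREL's exact conventions (for example, $N$ versus $N-1$ in RMSEA) may differ, and any chi-square values passed in are hypothetical because the abstract does not report them.

```python
import math

def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (this version uses N - 1)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative fit index: fitted model (m) vs independence baseline (b)."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)  # max(chi2_b - df_b, chi2_m - df_m, 0)
    return 1.0 if d_b == 0.0 else 1.0 - d_m / d_b

def tli(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Tucker-Lewis index (non-normed fit index)."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)

# Hypothetical chi-square values for illustration only (not from the study).
print(rmsea(180.0, 60, 2131), cfi(180.0, 60, 9000.0, 78), tli(180.0, 60, 9000.0, 78))
```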
Abstract:
The infant mortality rate (IMR) is considered one of the most important indices of a country's well-being. Countries around the world and health organizations such as the World Health Organization are dedicating their resources, knowledge, and energy to reducing infant mortality rates. The well-known Millennium Development Goal 4 (MDG 4), which aims to achieve a two-thirds reduction in the under-five mortality rate between 1990 and 2015, is an example of this commitment.
In this study, our goal is to model the trends of IMR from the 1950s to the 2010s for selected countries. We would like to know how the IMR is changing over time and how it differs across countries.
IMR data collected over time form a time series. The repeated observations of an IMR time series are not statistically independent, so modeling the trend of IMR requires accounting for these correlations. We propose to use the generalized least squares (GLS) method in a general linear model setting to handle the variance-covariance structure in our model. To estimate the variance-covariance matrix, we draw on time-series models, especially autoregressive and moving average models. Furthermore, we compare the results from the general linear model with a correlation structure to those from ordinary least squares (OLS), which ignores the correlation structure, to check how much the estimates change.
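One way to carry out the GLS step described above is feasible GLS with an AR(1) error structure, the simplest of the autoregressive models mentioned. The pure-Python sketch below fits a linear trend by OLS, estimates the lag-1 autocorrelation $\rho$ of the residuals, and refits after the Prais-Winsten transformation, which for a given $\rho$ is equivalent to GLS with AR(1) covariance $\Sigma_{st} \propto \rho^{|s-t|}$. The linear trend, the AR(1) choice, and the function names are assumptions for illustration; the study's actual model may include other terms or ARMA orders.

```python
import math

def lstsq_2col(X, y):
    """Least squares for two regressors by solving the 2x2 normal equations."""
    s00 = sum(a * a for a, _ in X)
    s01 = sum(a * b for a, b in X)
    s11 = sum(b * b for _, b in X)
    r0 = sum(a * v for (a, _), v in zip(X, y))
    r1 = sum(b * v for (_, b), v in zip(X, y))
    det = s00 * s11 - s01 * s01
    return ((s11 * r0 - s01 * r1) / det, (s00 * r1 - s01 * r0) / det)

def lag1_autocorrelation(resid):
    """Moment estimate of the AR(1) coefficient from residuals."""
    den = sum(e * e for e in resid)
    if den == 0.0:
        return 0.0
    return sum(resid[t] * resid[t - 1] for t in range(1, len(resid))) / den

def prais_winsten(X, y, rho):
    """Transform rows so that AR(1) errors become uncorrelated (needs |rho| < 1)."""
    c = math.sqrt(1.0 - rho * rho)
    Xs = [(c * X[0][0], c * X[0][1])]
    ys = [c * y[0]]
    for t in range(1, len(y)):
        Xs.append((X[t][0] - rho * X[t - 1][0], X[t][1] - rho * X[t - 1][1]))
        ys.append(y[t] - rho * y[t - 1])
    return Xs, ys

def fgls_ar1_trend(time, y):
    """Return (OLS coefficients, GLS coefficients, estimated rho) for y = b0 + b1*t."""
    X = [(1.0, float(t)) for t in time]
    b_ols = lstsq_2col(X, y)
    resid = [v - (b_ols[0] + b_ols[1] * t) for (_, t), v in zip(X, y)]
    rho = lag1_autocorrelation(resid)
    b_gls = lstsq_2col(*prais_winsten(X, y, rho))
    return b_ols, b_gls, rho
```

Passing a time index that starts near zero (for example, calendar year minus 1950) keeps the normal equations well conditioned. Comparing `b_ols` with `b_gls` mirrors the OLS-versus-GLS comparison described in the abstract.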
Abstract:
Many statistical studies feature data with both exact-time and interval-censored events. While a number of methods currently exist to handle interval-censored events and multivariate exact-time events separately, few techniques exist to deal with their combination. This thesis develops a theoretical framework for analyzing a multivariate endpoint composed of a single interval-censored event plus an arbitrary number of exact-time events. The approach fuses the exact-time events, modeled using the marginal method of Wei, Lin, and Weissfeld, with a piecewise-exponential interval-censored component. The resulting model incorporates more of the information in the data and also removes some of the biases associated with excluding interval-censored events. A simulation study demonstrates that our approach produces reliable estimates of the model parameters and their variance-covariance matrix. As a real-world example, we apply this technique to the Systolic Hypertension in the Elderly Program (SHEP) clinical trial, which features three correlated events: clinical non-fatal myocardial infarction and fatal myocardial infarction (two exact-time events), and silent myocardial infarction (one interval-censored event).
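The interval-censored component can be illustrated with a piecewise-exponential model: the hazard $\lambda_k$ is constant on each interval $[c_k, c_{k+1})$, so the cumulative hazard is $H(t) = \sum_k \lambda_k \max\{0, \min(t, c_{k+1}) - c_k\}$, the survival function is $S(t) = e^{-H(t)}$, and an event known only to lie in $(L, R]$ contributes $\log[S(L) - S(R)]$ to the log-likelihood. The Python sketch below covers only that component; how the thesis combines it with the Wei-Lin-Weissfeld marginal model is not reproduced, and the cut points, rates, and function names are hypothetical.

```python
import math

def cumulative_hazard(t, cuts, rates):
    """H(t) for a piecewise-constant hazard; rates[k] applies on [cuts[k], cuts[k+1])."""
    if len(cuts) != len(rates) or cuts[0] != 0:
        raise ValueError("need one rate per interval and cuts starting at 0")
    h = 0.0
    for k, rate in enumerate(rates):
        start = cuts[k]
        end = cuts[k + 1] if k + 1 < len(cuts) else math.inf  # last interval is open
        if t <= start:
            break
        h += rate * (min(t, end) - start)
    return h

def survival(t, cuts, rates):
    return math.exp(-cumulative_hazard(t, cuts, rates))

def interval_censored_loglik(intervals, cuts, rates):
    """Sum of log[S(L) - S(R)] over (L, R] intervals; R = inf means right-censored at L."""
    total = 0.0
    for left, right in intervals:
        s_left = survival(left, cuts, rates)
        s_right = 0.0 if math.isinf(right) else survival(right, cuts, rates)
        total += math.log(s_left - s_right)
    return total
```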
Abstract:
Gene silencing due to epigenetic mechanisms shows evidence of significant contributions to cancer development. We hypothesize that the genetic architecture based on retrotransposon elements surrounding the transcription start site plays an important role in the suppression and promotion of DNA methylation. In our investigation, we found a high rate of SINE and LINE retrotransposon elements near the transcription start site of unmethylated genes compared with methylated genes. The presence of these elements was positively associated with promoter methylation, contrary to what the harmful effects of retrotransposon elements, which insert themselves randomly into the genome and can cause loss of gene function, would lead one to expect. In our genome-wide analysis of human genes, the results suggested that 22% of the genes in cancer were predicted to be methylation-prone; in cancer, these genes are generally down-regulated and function in developmental processes. In summary, our investigation validated our hypothesis and showed that these widespread genomic elements in cancer are highly associated with promoter DNA methylation and may further participate in influencing epigenetic regulation.
Abstract:
A nonparametric method was developed and tested for comparing the partial areas under two correlated receiver operating characteristic (ROC) curves. Based on the theory of generalized U-statistics, formulas were derived for computing the ROC area and the variance and covariance between portions of two ROC curves. A practical SAS application was also developed to facilitate the calculations. The accuracy of the nonparametric method was evaluated by comparing it with other methods. When applied to data from a published ROC analysis of CT images, our method gave results very close to the published ones. A hypothetical example was used to demonstrate the effects of two crossing ROC curves: the two total ROC areas are the same, yet the partial ROC curve analysis found that corresponding portions of the area under the two curves differ significantly. To handle ROC curves with large scales, such as those from a logistic regression model, we applied our method to a breast cancer study using Medicare claims data; it yielded the same ROC area as the SAS LOGISTIC procedure. Our method also provides an alternative to the global summary comparison of ROC areas by directly comparing the true-positive rates of two regression models and by determining the range of false-positive values over which the models differ.
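The U-statistic machinery behind such comparisons can be sketched for the full ROC area using the structural components of DeLong, DeLong, and Clarke-Pearson (1988); the partial-area formulas developed in this work restrict the comparison to a false-positive range and are not reproduced here. In this Python sketch, each subject has a score from two tests (A and B), the area is the Mann-Whitney statistic, and per-subject placement values give the two variances and the covariance between the correlated areas. The function names and the input layout are assumptions for illustration.

```python
import math

def psi(x, y):
    """Mann-Whitney kernel: 1 if diseased score x exceeds non-diseased score y, 0.5 for ties."""
    return 1.0 if x > y else (0.5 if x == y else 0.0)

def placements(pos, neg):
    """AUC plus structural components V10 (per diseased) and V01 (per non-diseased)."""
    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return sum(v10) / len(pos), v10, v01

def sample_cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (w - mb) for u, w in zip(a, b)) / (len(a) - 1)

def compare_correlated_aucs(pos, neg):
    """pos, neg: lists of (score_A, score_B) pairs measured on the same subjects."""
    m, n = len(pos), len(neg)
    auc_a, v10_a, v01_a = placements([p[0] for p in pos], [q[0] for q in neg])
    auc_b, v10_b, v01_b = placements([p[1] for p in pos], [q[1] for q in neg])
    var_a = sample_cov(v10_a, v10_a) / m + sample_cov(v01_a, v01_a) / n
    var_b = sample_cov(v10_b, v10_b) / m + sample_cov(v01_b, v01_b) / n
    cov_ab = sample_cov(v10_a, v10_b) / m + sample_cov(v01_a, v01_b) / n
    var_diff = var_a + var_b - 2.0 * cov_ab
    z = (auc_a - auc_b) / math.sqrt(var_diff) if var_diff > 0 else math.nan
    return {"auc_a": auc_a, "auc_b": auc_b, "var_a": var_a,
            "var_b": var_b, "cov_ab": cov_ab, "z": z}
```

The covariance term is what distinguishes this from two independent comparisons: when both tests are scored on the same subjects, ignoring `cov_ab` misstates the variance of the difference.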
Abstract:
With the recognition of the importance of evidence-based medicine, there is an emerging need for methods to systematically synthesize available data. Specifically, methods that provide accurate estimates of the characteristics of diagnostic tests are needed to help physicians make better clinical decisions. To provide more flexible approaches for meta-analysis of diagnostic tests, we developed three Bayesian generalized linear models. Two of these models, a bivariate normal model and a bivariate binomial model, analyzed pairs of sensitivity and specificity values while incorporating the correlation between these two outcome variables. Noninformative independent uniform priors were used for the variances of sensitivity and specificity and for their correlation. We also applied an inverse Wishart prior to check the sensitivity of the results to the choice of prior. The third model was a multinomial model in which the test results were modeled as multinomial random variables. All three models can include specific imaging techniques as covariates in order to compare performance. Vague normal priors were assigned to the coefficients of the covariates. The computations were carried out using BUGS (Bayesian inference Using Gibbs Sampling), an implementation of Markov chain Monte Carlo techniques. We investigated the properties of the three proposed models through extensive simulation studies. We also applied these models to a previously published meta-analysis dataset on cervical cancer as well as to an unpublished melanoma dataset. In general, our findings show that the point estimates of sensitivity and specificity were consistent among Bayesian and frequentist bivariate normal and binomial models. However, in the simulation studies, the estimates of the correlation coefficient from the Bayesian bivariate models were not as good as those obtained from frequentist estimation, regardless of which prior distribution was used for the covariance matrix.
The Bayesian multinomial model consistently underestimated the sensitivity and specificity regardless of the sample size and correlation coefficient. In conclusion, the Bayesian bivariate binomial model provides the most flexible framework for future applications because of the following strengths: (1) it facilitates direct comparison between different tests; (2) it captures the variability in both sensitivity and specificity simultaneously, as well as the intercorrelation between the two; and (3) it can be applied directly to sparse data without ad hoc correction.
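Point (3) can be seen in the inputs to a normal-approximation (bivariate normal) analysis: each study contributes logit sensitivity and logit specificity with within-study variances $1/TP + 1/FN$ and $1/TN + 1/FP$, which are undefined when a cell is zero unless a correction is added, whereas a binomial likelihood uses the counts directly. The Python sketch below assumes the common 0.5 continuity correction applied to all four cells; this convention and the function names are illustrative and not taken from the source.

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

def study_logits(tp, fn, tn, fp, correction=0.5):
    """Per-study logit(sensitivity), its variance, logit(specificity), its variance.

    Adds `correction` to every cell of the 2x2 table only if some cell is zero,
    the kind of ad hoc adjustment a binomial likelihood does not need.
    """
    cells = (tp, fn, tn, fp)
    if 0 in cells:
        tp, fn, tn, fp = (c + correction for c in cells)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return logit(sens), 1.0 / tp + 1.0 / fn, logit(spec), 1.0 / tn + 1.0 / fp
```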
Abstract:
Although many family-based genetic studies have collected dietary data, very few have used the dietary information in published findings. No single solution has been presented or discussed in the literature for applying factor analysis to dietary data from several related individuals in the same household. The standard statistical approach of factor analysis cannot be applied to the VIVA LA FAMILIA Study diet data to ascertain dietary patterns, because this population consists of three children from each family; the dietary patterns of related children may therefore be correlated and non-independent. Addressing this problem will enable us to describe the dietary patterns in Hispanic families and to explore the relationships between dietary patterns and childhood obesity.
In the VIVA LA FAMILIA Study, an overweight child was first identified, and then his or her siblings and parents were brought in for data collection, which included 24-hour recalls and a food frequency questionnaire (FFQ). Dietary intake data were collected using the FFQ and 24-hour recalls on 1030 Hispanic children from 319 families.
The design of the VIVA LA FAMILIA Study has important and unique statistical considerations, since its participants are related to each other and most of them form distinct nuclear families. Thus, the standard approach of factor analysis cannot be applied to these diet data to ascertain dietary patterns. In this project, we propose to investigate whether the determinants of the correlation matrix of each family unit will allow us to adjust the original correlation matrix of the dietary intake data before ascertaining dietary intake patterns. If these methods are appropriate, then in the future the dietary patterns among related individuals could be assessed by standard orthogonal principal component factor analysis.
Abstract:
The role of clinical chemistry has traditionally been to evaluate acutely ill or hospitalized patients. Traditional statistical methods have serious drawbacks in that they rely on univariate techniques. To demonstrate an alternative methodology, a multivariate analysis of covariance (MANCOVA) model was developed and applied to data from the Cooperative Study of Sickle Cell Disease (CSSCD).
The purpose of developing the model for the CSSCD laboratory data was to evaluate the comparability of results from the different clinics. Several variables were incorporated into the model to control for possible differences among the clinics that might confound any real laboratory differences.
Differences in LDH, alkaline phosphatase, and SGOT were identified, which will necessitate adjustments by clinic whenever these data are used. In addition, aberrant clinic values for LDH, creatinine, and BUN were identified.
The use of any statistical technique, including multivariate analysis, without thoughtful consideration may lead to spurious conclusions that may not be corrected for some time, if ever. However, the advantages of multivariate analysis far outweigh its potential problems. If its use increases, as it should, its application to laboratory data in prospective patient monitoring, quality control programs, and the interpretation of data from cooperative studies could have a major impact on the health and well-being of a large number of individuals.
Abstract:
The purpose of this study was to investigate whether an incongruence between the personality characteristics of individuals and the concomitant characteristics of health professional training environments on salient dimensions contributes to aspects of mental health. The dimensions examined were practical-theoretical orientation and the degree of structure versus lack of structure. They were selected because they are particularly important attributes of students and of learning environments. It was proposed that strain arises when the demands of the environment are disparate from the proclivities of the individual. This strain was hypothesized to contribute to anxiety, depression, and subjective distress.
Selected subscales of the Omnibus Personality Inventory (OPI) were the operational measures for the personality component of the dimensions studied. An environmental index was developed to assess students' perceptions of the learning environment on these same dimensions. The Beck Depression Inventory, the State-Trait Anxiety Inventory, and the General Well-Being Schedule measured the outcome variables.
A congruence model was employed to determine person-environment (P-E) interaction. Scores on the OPI scales and the environmental index were divided into high, medium, and low based on the range of scores. Congruence was defined as a match between the level of personality need and the complementary level of the perception of the environment; incongruence was defined as a mismatch between the person and the environment. The congruent category was compared with the incongruent categories by an analysis of variance procedure. Furthermore, analyses of covariance were conducted with perceived supportiveness of the learning environment and life events external to the learning environment as covariates. These factors were considered critical influences on the outcome measures.
One hundred and eighty-five students (49% of the population) at the College of Optometry at the University of Houston participated in the study. Students in all four years of the program were equally represented in the study. However, the sample differed from the total population in its representation by sex, marital status, and undergraduate major.
The results of the study did not support the hypotheses. Further, after adjustment for perceived supportiveness and life events external to the learning environment, there were no statistically significant differences between the congruent category and the incongruent categories. Means indicated that the study sample experienced significantly lower depression and subjective distress than the normative samples.
The results are interpreted in light of their utility for designing future studies of the effects of P-E interaction. Particular emphasis is placed on whether it is feasible to test a P-E interaction model with extant groups. Recommendations for subsequent research are proposed in light of the exploratory nature of the methodology.
Abstract:
Path analysis has been applied to components of the iron metabolic system with the intent of suggesting an integrated procedure for better evaluating iron nutritional status at the community level. The primary variables of interest in this study were (1) iron stores, (2) total iron-binding capacity, (3) serum ferritin, (4) serum iron, (5) transferrin saturation, and (6) hemoglobin concentration. Correlation coefficients for relationships among these variables were obtained from the published literature and postulated in a series of models using measures of those variables that are feasible to include in a community nutritional survey. The models were built on known information about iron metabolism and were limited by what had been reported in the literature in terms of correlation coefficients or quantitative relationships. Data were pooled from various studies, and correlations for the same bivariate relationships were averaged after Fisher z-transformation. Correlation matrices were then constructed by transforming the average values back into correlation coefficients. The results of the path analysis indicate that hemoglobin is not a good indicator of early iron deficiency: it does not account for variance in iron stores. On the other hand, 91% of the variance in iron stores is explained by serum ferritin and total iron-binding capacity. In addition, the magnitude of the path coefficient (.78) for the serum ferritin-iron stores relationship signifies that serum ferritin is the most important predictor of iron stores in the proposed model. Finally, drawing on known relations among variables and the amount of variance explained in the path models, it is suggested that the following blood measures be made in assessing community iron deficiency: (1) serum ferritin, (2) total iron-binding capacity, (3) serum iron, (4) transferrin saturation, and (5) hemoglobin concentration.
These measures (with acceptable ranges and cut-off points) could make possible the complete evaluation of all three stages of iron deficiency in persons surveyed at the community level.
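Two computations in this abstract can be sketched in Python: averaging correlations through Fisher's $z = \operatorname{artanh}(r)$ and back-transforming with $\tanh$, and obtaining standardized path coefficients and $R^2$ for an outcome with two correlated predictors, $\beta_1 = (r_{1y} - r_{12} r_{2y})/(1 - r_{12}^2)$, $\beta_2 = (r_{2y} - r_{12} r_{1y})/(1 - r_{12}^2)$, $R^2 = \beta_1 r_{1y} + \beta_2 r_{2y}$. The optional $n-3$ weights, the function names, and any input correlations are illustrative; the abstract reports only the resulting path coefficient (.78) and the 91% of variance explained, not the underlying correlation matrix.

```python
import math

def average_correlation(rs, ns=None):
    """Average correlations on Fisher's z scale, then back-transform.

    With ns=None this is a simple mean of z values; passing sample sizes
    weights each study by n - 3 (the inverse of the variance of z).
    """
    zs = [math.atanh(r) for r in rs]
    weights = [1.0] * len(rs) if ns is None else [n - 3.0 for n in ns]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return math.tanh(z_bar)

def two_predictor_paths(r1y, r2y, r12):
    """Standardized path coefficients and R^2 for y regressed on x1 and x2."""
    denom = 1.0 - r12 * r12
    b1 = (r1y - r12 * r2y) / denom
    b2 = (r2y - r12 * r1y) / denom
    return b1, b2, b1 * r1y + b2 * r2y
```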
Abstract:
The electroencephalogram (EEG) is a physiological time series that measures electrical activity at different locations in the brain, and it plays an important role in epilepsy research. Exploring the variance and/or volatility of the EEG may yield insights into seizure prediction, seizure detection, and seizure propagation/dynamics.
Maximal Overlap Discrete Wavelet Transforms (MODWTs) and ARMA-GARCH models were used to determine variance and volatility characteristics of 66 channels for different states of an epileptic EEG: sleep, awake, sleep-to-awake, and seizure. The wavelet variances, changes in wavelet variances, and volatility half-lives for the four states were compared for possible differences between seizure and non-seizure channels.
The half-lives of two of the three seizure channels were found to be shorter than those of all of the non-seizure channels, based on 95% CIs for the pre-seizure and awake signals. No discernible patterns were found in the wavelet variances or the change points for the different signals.
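Two of the quantities compared above can be sketched in Python. For a GARCH(1,1) model $\sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2$, a volatility shock decays geometrically at rate $\alpha_1 + \beta_1$, so its half-life is $\ln(0.5)/\ln(\alpha_1 + \beta_1)$. For the wavelet side, the sketch uses only the level-1 Haar MODWT, whose coefficients are $(X_t - X_{t-1})/2$, and averages their squares over the coefficients unaffected by the boundary. The study's actual wavelet filter, number of levels, and ARMA orders are not stated in the abstract, so these are assumptions, as are the function names.

```python
import math

def garch11_half_life(alpha1: float, beta1: float) -> float:
    """Periods for a volatility shock to decay by half in a stationary GARCH(1,1)."""
    persistence = alpha1 + beta1
    if not 0.0 < persistence < 1.0:
        raise ValueError("half-life requires 0 < alpha1 + beta1 < 1")
    return math.log(0.5) / math.log(persistence)

def haar_modwt_level1_variance(x):
    """Unbiased level-1 Haar MODWT wavelet variance.

    Level-1 Haar MODWT coefficients are (x[t] - x[t-1]) / 2; dropping the one
    circularly wrapped coefficient at t = 0 leaves N - 1 boundary-free terms.
    """
    if len(x) < 2:
        raise ValueError("need at least two samples")
    coeffs = [(x[t] - x[t - 1]) / 2.0 for t in range(1, len(x))]
    return sum(w * w for w in coeffs) / len(coeffs)
```

A shorter half-life means volatility shocks die out faster, which is the sense in which the half-lives of seizure and non-seizure channels can be compared.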