14 results for Genetic Variance-covariance Matrix

in DigitalCommons@The Texas Medical Center


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Models of DNA sequence evolution and methods for estimating evolutionary distances are needed for studying the rate and pattern of molecular evolution and for inferring the evolutionary relationships of organisms or genes. In this dissertation, several new models and methods are developed. The rate variation among nucleotide sites: To obtain unbiased estimates of evolutionary distances, the rate heterogeneity among nucleotide sites of a gene should be considered. Commonly, it is assumed that the substitution rate varies among sites according to a gamma distribution (gamma model) or, more generally, an invariant+gamma model which includes some invariable sites. A maximum likelihood (ML) approach was developed for estimating the shape parameter of the gamma distribution $(\alpha)$ and/or the proportion of invariable sites $(\theta)$. Computer simulation showed that (1) under the gamma model, $\alpha$ can be well estimated from 3 or 4 sequences if the sequences are long; and (2) the distance estimate is unbiased and robust against violations of the assumptions of the invariant+gamma model. However, this ML method requires a huge amount of computational time and is useful only for fewer than 6 sequences. Therefore, I developed a fast method for estimating $\alpha$, which is easy to implement and requires no knowledge of the tree. A computer program was developed for estimating $\alpha$ and evolutionary distances, which can handle as many as 30 sequences. Evolutionary distances under the stationary, time-reversible (SR) model: The SR model is a general model of nucleotide substitution, which assumes (i) stationary nucleotide frequencies and (ii) time-reversibility. It can be extended to the SRV model, which allows rate variation among sites. I developed a method for estimating the distance under the SR or SRV model, as well as the variance-covariance matrix of distances. 
Computer simulation showed that the SR method is better than a simpler method when the sequence length $L>1,000$ bp and is robust against deviations from time-reversibility. As expected, when the rate varies among sites, the SRV method is much better than the SR method. The evolutionary distances under nonstationary nucleotide frequencies: The statistical properties of the paralinear and LogDet distances under nonstationary nucleotide frequencies were studied. First, I developed formulas for correcting the estimation biases of the paralinear and LogDet distances. The performances of these formulas and the formulas for sampling variances were examined by computer simulation. Second, I developed a method for estimating the variance-covariance matrix of the paralinear distance, so that statistical tests of phylogenies can be conducted when the nucleotide frequencies are nonstationary. Third, a new method for testing the molecular clock hypothesis was developed in the nonstationary case.
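To illustrate why rate heterogeneity among sites matters for distance estimation, the sketch below uses the textbook Jukes-Cantor distance and its gamma-corrected form. This is not the SR/SRV method or the program described in the abstract; it is a much simpler model chosen only because its formulas are short. The function names and the observed proportion of differing sites `p` are assumptions made for the example.

```python
import math

def jc69_distance(p):
    """Jukes-Cantor distance assuming the same substitution rate at every site."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def jc69_gamma_distance(p, alpha):
    """Jukes-Cantor distance when site rates follow a gamma distribution
    with shape parameter alpha (smaller alpha = stronger heterogeneity)."""
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)

# Hypothetical proportion of sites that differ between two sequences.
p = 0.30
for alpha in (0.5, 1.0, 5.0):
    print("alpha =", alpha, "distance =", round(jc69_gamma_distance(p, alpha), 4))
print("equal rates distance =", round(jc69_distance(p), 4))
```

Under this simple model, ignoring strong rate variation (small $\alpha$) underestimates the distance, and the gamma-corrected value approaches the equal-rates value as $\alpha$ grows.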

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The infant mortality rate (IMR) is considered to be one of the most important indices of a country's well-being. Countries around the world and health organizations such as the World Health Organization are dedicating their resources, knowledge and energy to reducing infant mortality rates. The well-known Millennium Development Goal 4 (MDG 4), whose aim is to achieve a two-thirds reduction of the under-five mortality rate between 1990 and 2015, is an example of this commitment. In this study our goal is to model the trends of IMR from the 1950s to the 2010s for selected countries. We would like to know how the IMR is changing over time and how it differs across countries. IMR data collected over time form a time series. The repeated observations in an IMR time series are not statistically independent, so in modeling the trend of IMR it is necessary to account for these correlations. We proposed to use the generalized least squares method in a general linear model setting to deal with the variance-covariance structure in our model. To estimate the variance-covariance matrix, we referred to time-series models, especially the autoregressive and moving average models. Furthermore, we compared the results from the general linear model with a correlation structure to those from the ordinary least squares method, which does not take the correlation structure into account, to check how significantly the estimates change.
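As a rough illustration of generalized least squares for a trend with serially correlated errors, the sketch below assumes a first-order autoregressive, AR(1), error structure with a known correlation `rho`. For AR(1) errors, GLS is equivalent to ordinary least squares on Prais-Winsten transformed data, which avoids inverting the full variance-covariance matrix. The abstract does not state which time-series model was finally used, so the AR(1) choice, the function names, and the IMR values are all assumptions made for the example.

```python
import math

def ols_two_regressors(x0, x1, y):
    """Least-squares coefficients (a, b) for y = a*x0 + b*x1."""
    s00 = sum(u * u for u in x0)
    s01 = sum(u * v for u, v in zip(x0, x1))
    s11 = sum(v * v for v in x1)
    t0 = sum(u * w for u, w in zip(x0, y))
    t1 = sum(v * w for v, w in zip(x1, y))
    det = s00 * s11 - s01 * s01
    return (s11 * t0 - s01 * t1) / det, (s00 * t1 - s01 * t0) / det

def ar1_transform(v, rho):
    """Prais-Winsten whitening of a series under AR(1) correlation rho."""
    head = [math.sqrt(1.0 - rho * rho) * v[0]]
    return head + [v[t] - rho * v[t - 1] for t in range(1, len(v))]

def gls_trend_ar1(years, y, rho):
    """GLS intercept and slope of a linear trend with AR(1) errors."""
    ones = [1.0] * len(years)
    return ols_two_regressors(
        ar1_transform(ones, rho), ar1_transform(years, rho), ar1_transform(y, rho)
    )

# Hypothetical IMR values (deaths per 1,000 live births) by years since the start.
years = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
imr = [120.0, 101.0, 85.0, 66.0, 52.0, 41.0, 30.0]
print("OLS (rho = 0):  ", gls_trend_ar1(years, imr, 0.0))
print("GLS (rho = 0.6):", gls_trend_ar1(years, imr, 0.6))
```

Setting `rho` to zero reduces the transform to the identity, so the same function gives the ordinary least squares fit for comparison.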

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Many statistical studies feature data with both exact-time and interval-censored events. While a number of methods currently exist to handle interval-censored events and multivariate exact-time events separately, few techniques exist to deal with their combination. This thesis develops a theoretical framework for analyzing a multivariate endpoint composed of a single interval-censored event plus an arbitrary number of exact-time events. The approach fuses the exact-time events, modeled using the marginal method of Wei, Lin, and Weissfeld, with a piecewise-exponential interval-censored component. The resulting model incorporates more of the information in the data and also removes some of the biases associated with the exclusion of interval-censored events. A simulation study demonstrates that our approach produces reliable estimates for the model parameters and their variance-covariance matrix. As a real-world data example, we apply this technique to the Systolic Hypertension in the Elderly Program (SHEP) clinical trial, which features three correlated events: clinical non-fatal myocardial infarction, fatal myocardial infarction (two exact-time events), and silent myocardial infarction (one interval-censored event).

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Any functionally important mutation is embedded in an evolutionary matrix of other mutations. Cladistic analysis, based on this, is a method of investigating gene effects using a haplotype phylogeny to define a set of tests which localize causal mutations to branches of the phylogeny. Previous implementations of cladistic analysis have not addressed the issue of analyzing data from related individuals, though in human studies, family data are usually needed to obtain unambiguous haplotypes. In this study, a method of cladistic analysis is described in which haplotype effects are parameterized in a linear model which accounts for familial correlations. The method was used to study the effect of apolipoprotein (Apo) B gene variation on total-, LDL-, and HDL-cholesterol, triglyceride, and Apo B levels in 121 French families. Five polymorphisms defined Apo B haplotypes: the signal peptide insertion/deletion, Bsp 1286I, XbaI, MspI, and EcoRI. Eleven haplotypes were found, and a haplotype phylogeny was constructed and used to define a set of tests of haplotype effects on lipid and Apo B levels. This new method of cladistic analysis, the parametric method, found significant effects for single haplotypes for all variables. For HDL-cholesterol, 3 clusters of evolutionarily-related haplotypes affecting levels were found. Haplotype effects accounted for about 10% of the genetic variance of triglyceride and HDL-cholesterol levels. The results of the parametric method were compared to those of a method of cladistic analysis based on permutational testing. The permutational method detected fewer haplotype effects, even when modified to account for correlations within families. Simulation studies exploring these differences found evidence of systematic errors in the permutational method due to the process by which haplotype groups were selected for testing. The applicability of cladistic analysis to human data was shown. 
The parametric method is suggested as an improvement over the permutational method. This study has identified candidate haplotypes for sequence comparisons in order to locate the functional mutations in the Apo B gene which may influence plasma lipid levels.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

With the recognition of the importance of evidence-based medicine, there is an emerging need for methods to systematically synthesize available data. Specifically, methods to provide accurate estimates of test characteristics for diagnostic tests are needed to help physicians make better clinical decisions. To provide more flexible approaches for meta-analysis of diagnostic tests, we developed three Bayesian generalized linear models. Two of these models, a bivariate normal and a binomial model, analyzed pairs of sensitivity and specificity values while incorporating the correlation between these two outcome variables. Noninformative independent uniform priors were used for the variance of sensitivity, specificity and correlation. We also applied an inverse Wishart prior to check the sensitivity of the results. The third model was a multinomial model where the test results were modeled as multinomial random variables. All three models can include specific imaging techniques as covariates in order to compare performance. Vague normal priors were assigned to the coefficients of the covariates. The computations were carried out using the 'Bayesian inference using Gibbs sampling' implementation of Markov chain Monte Carlo techniques. We investigated the properties of the three proposed models through extensive simulation studies. We also applied these models to a previously published meta-analysis dataset on cervical cancer as well as to an unpublished melanoma dataset. In general, our findings show that the point estimates of sensitivity and specificity were consistent among Bayesian and frequentist bivariate normal and binomial models. However, in the simulation studies, the estimates of the correlation coefficient from Bayesian bivariate models are not as good as those obtained from frequentist estimation regardless of which prior distribution was used for the covariance matrix. 
The Bayesian multinomial model consistently underestimated the sensitivity and specificity regardless of the sample size and correlation coefficient. In conclusion, the Bayesian bivariate binomial model provides the most flexible framework for future applications because of the following strengths: (1) it facilitates direct comparison between different tests; (2) it captures the variability in both sensitivity and specificity simultaneously as well as the intercorrelation between the two; and (3) it can be directly applied to sparse data without ad hoc correction.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Linkage disequilibrium methods can be used to find genes influencing quantitative trait variation in humans. Linkage disequilibrium methods can require smaller sample sizes than linkage equilibrium methods, such as the variance component approach, to find loci with a specific effect size. The increase in power is at the expense of requiring more markers to be typed to scan the entire genome. This thesis compares different linkage disequilibrium methods to determine which factors influence the power to detect disequilibrium. The costs of disequilibrium and equilibrium tests were compared to determine whether the savings in phenotyping costs when using disequilibrium methods outweigh the additional genotyping costs. Nine linkage disequilibrium tests were examined by simulation. Five tests involve selecting isolated unrelated individuals, while four involve the selection of parent-child trios (TDT). All nine tests were found to be able to identify disequilibrium with the correct significance level in Hardy-Weinberg populations. Increasing linked genetic variance and trait allele frequency were found to increase the power to detect disequilibrium, while increasing the number of generations and distance between marker and trait loci decreased the power to detect disequilibrium. Discordant sampling was used for several of the tests. It was found that the more stringent the sampling, the greater the power to detect disequilibrium in a sample of given size. The power to detect disequilibrium was not affected by the presence of polygenic effects. When the trait locus had more than two trait alleles, the maximum power of the tests was less than one. 
For the simulation methods used here, when there were more than two trait alleles there was a probability equal to 1-heterozygosity of the marker locus that both trait alleles were in disequilibrium with the same marker allele, resulting in the marker being uninformative for disequilibrium. The five tests using isolated unrelated individuals were found to have excess error rates when there was disequilibrium due to population admixture. Increased error rates also resulted from increased unlinked major gene effects, discordant trait allele frequency, and increased disequilibrium. Polygenic effects did not affect the error rates. The tests based on the Transmission Disequilibrium Test (TDT) were not liable to any increase in error rates. For all sample ascertainment costs, for recent mutations ($<100$ generations), linkage disequilibrium tests were less expensive to carry out than the variance component test. Candidate gene scans saved even more money. The use of recently admixed populations also decreased the cost of performing a linkage disequilibrium test.
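For readers unfamiliar with the trio-based tests mentioned above, the sketch below computes the classic TDT statistic for a biallelic marker and a binary trait: among heterozygous parents, it compares how often a given allele is transmitted to the affected child versus not transmitted, using a McNemar-type chi-square with one degree of freedom. The thesis studies quantitative-trait variants of such tests, which are not reproduced here; the function name and the counts are assumptions made for the example.

```python
import math

def tdt_statistic(transmitted, untransmitted):
    """Classic TDT for one allele.

    transmitted / untransmitted: counts of heterozygous parents who did or
    did not pass the allele to the affected child.
    Returns (chi-square with 1 df, p-value).
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts from heterozygous parents in parent-child trios.
chi2, p_value = tdt_statistic(60, 40)
print("chi-square =", chi2, "p =", round(p_value, 4))
```

Because the comparison is made within families, the statistic does not depend on population allele frequencies, which is consistent with the abstract's finding that the TDT-based tests did not show inflated error rates under admixture.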

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Although many family-based genetic studies have collected dietary data, very few have used the dietary information in published findings. No single solution has been presented or discussed in the literature to deal with the problem of using factor analysis for the analysis of dietary data from several related individuals from a given household. The standard statistical approach of factor analysis cannot be applied to the VIVA LA FAMILIA Study diet data to ascertain dietary patterns since this population consists of three children from each family; thus, the dietary patterns of the related children may be correlated and non-independent. Addressing this problem in this project will enable us to describe the dietary patterns in Hispanic families and to explore the relationships between dietary patterns and childhood obesity. In the VIVA LA FAMILIA Study, an overweight child was first identified and then his/her siblings and parents were brought in for data collection, which included 24-hour recalls and a food frequency questionnaire (FFQ). Dietary intake data were collected using the FFQ and 24-hour recalls on 1030 Hispanic children from 319 families. The design of the VIVA LA FAMILIA Study has important and unique statistical considerations since its participants are related to each other; the majority form distinct nuclear families. Thus, the standard approach of factor analysis cannot be applied to these diet data to ascertain dietary patterns. In this project we propose to investigate whether the determinants of the correlation matrix of each family unit will allow us to adjust the original correlation matrix of the dietary intake data prior to ascertaining dietary intake patterns. If these methods are appropriate, then in the future the dietary patterns among related individuals could be assessed by standard orthogonal principal component factor analysis.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The molecular mechanisms controlling bone extracellular matrix (ECM) deposition by differentiated osteoblasts in postnatal life, called hereafter bone formation, are unknown. This contrasts with the growing knowledge about the genetic control of osteoblast differentiation during embryonic development. Cbfa1, a transcriptional activator of osteoblast differentiation during embryonic development, is also expressed in differentiated osteoblasts postnatally. The perinatal lethality occurring in Cbfa1-deficient mice has so far prevented the study of its function after birth. To determine if Cbfa1 plays a role during bone formation, we generated transgenic mice overexpressing the Cbfa1 DNA-binding domain (DeltaCbfa1) in differentiated osteoblasts only postnatally. DeltaCbfa1 has a higher affinity for DNA than Cbfa1 itself, has no transcriptional activity on its own, and can act in a dominant-negative manner in DNA cotransfection assays. DeltaCbfa1-expressing mice have a normal skeleton at birth but develop an osteopenic phenotype thereafter. Dynamic histomorphometric studies show that this phenotype is caused by a major decrease in the bone formation rate in the face of a normal number of osteoblasts, thus indicating that once osteoblasts are differentiated, Cbfa1 regulates their function. Molecular analyses reveal that the expression of the genes expressed in osteoblasts and encoding bone ECM proteins is nearly abolished in transgenic mice, and ex vivo assays demonstrated that DeltaCbfa1-expressing osteoblasts were less active than wild-type osteoblasts. We also show that Cbfa1 positively regulates the activity of its own promoter, which has the highest affinity Cbfa1-binding sites characterized. This study demonstrates that beyond its differentiation function Cbfa1 is the first transcriptional activator of bone formation identified to date and illustrates that developmentally important genes control physiological processes postnatally.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

A UV-induced mutation of the enzyme glyceraldehyde-3-phosphate dehydrogenase (GAPD) was characterized in the CHO clone A24. The asymmetric 4-banded zymogram and an in vitro GAPD activity equal to that of wild type cells were not consistent with models of a mutant heterozygote producing equal amounts of wild type and either catalytically active or inactive mutant subunits that interacted randomly. Cumulative evidence indicated that the site of the mutation was the GAPD structural locus expressed in CHO wild type cells, and that the mutant allele coded for a subunit that differed from the wild type subunit in stability and kinetics. The evidence included the appearance of a fifth band, the putative mutant homotetramer, after addition of the substrate glyceraldehyde-3-phosphate (GAP) to the gel matrix; dilution experiments indicating stability differences between the subunits; experiments with subsaturating levels of GAP indicating differences in affinity for the substrate; GAPD zymograms of A24 x mouse hybrids that were consistent with the presence of two distinct A24 subunits; independent segregation of A24 wild type and mutant electrophoretic bands from the hybrids, which was inconsistent with models of mutation of a locus involved in posttranslational modification; the mapping of both wild type and mutant forms of GAPD to chromosome 8; and the failure to detect any evidence of posttranslational modification (of other A24 isozymes, or through mixing of homogenates of A24 and mouse). The extent of skewing of the zymogram toward the wild type band, and the unreduced in vitro activity were inconsistent with models based solely on differences in activity of the two subunits. 
Comparison of wild type homotetramer bands in wild type cells and A24 suggested the latter had a preponderance of wild type subunits over mutant subunits, and had more GAPD tetramers than did CHO controls. Two CHO linkages, GAPD-triose phosphate isomerase, and acid phosphatase 2-adenosine deaminase were reported provisionally, and several others were confirmed.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Thoracic aortic aneurysms leading to aortic dissections (TAAD) are a major cause of morbidity and mortality in the United States. TAAD is a complication of some known genetic disorders, such as Marfan syndrome and Turner syndrome, but the majority of familial cases are not due to a known genetic syndrome. Previous studies by our group have established that nonsyndromic, familial TAAD is inherited in an autosomal dominant manner with decreased penetrance and variable expression. Using one large family with multiple members with TAAD for the genome wide scan, a major locus for familial TAAD was mapped to 5q13–14 (TAAD1). Nine out of 15 families studied were linked to this locus, establishing that TAAD1 was a major locus, and that there was genetic heterogeneity for the condition. Mapping of the TAAD2 locus was accomplished using a single large family with multiple members with TAAD not linked to known loci of aneurysm formation. This established a second novel locus for familial TAAD on 3p24–25 (LOD score of 4.3), termed the TAAD2 locus. Two putative loci with suggestive LOD scores were mapped on 4q and 12q through a genome scan carried out using three families. The TAAD phenotype in 12 families did not segregate with known loci, indicating further genetic heterogeneity. STS-tagged BAC-based contigs were constructed for the 7.8 Mb and 25 Mb critical intervals of TAAD1 and TAAD2, respectively, and characterized to identify the defective gene. The hypothesis that the defective genes responsible for TAAD1 and TAAD2 encode extracellular matrix (ECM) proteins, the major components of the elastic fiber system in the aortic media, was tested. Four genes encoding ECM proteins (versican, thrombospondin-3, and CRTL1 at TAAD1, and FBLN2 at TAAD2) were sequenced, but no disease-causing mutations were identified. Studies to identify the defective gene have been initiated through the positional candidate gene approach using a combination of bioinformatics and expression studies. 
The identification of the TAAD susceptibility genes will allow for presymptomatic diagnosis of individuals at risk for this life threatening disease. The identification of the molecular defects that contribute to TAAD will also further our understanding of the proteins that provide structural integrity to the aortic wall.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Apolipoprotein E (ApoE) plays a major role in the metabolism of high density and low density lipoproteins (HDL and LDL). Its common protein isoforms (E2, E3, E4) are risk factors for coronary artery disease (CAD) and explain between 16% and 23% of the inter-individual variation in plasma apoE levels. Linkage analysis has been completed for plasma apoE levels in the GENOA study (Genetic Epidemiology Network of Atherosclerosis). After stratification of the population by lipoprotein levels and body mass index (BMI) to create more homogeneity with regard to biological context for apoE levels, Hispanic families showed significant linkage on chromosome 17q for two strata (LOD=2.93 at 104 cM for a low cholesterol group, LOD=3.04 at 111 cM for a low cholesterol, high HDLC group). Replication of 17q linkage was observed for apoB and apoE levels in the unstratified Hispanic and African-American populations, and for apoE levels in African-American families. Replication of this 17q linkage in different populations and strata provides strong support for the presence of gene(s) in this region with significant roles in the determination of inter-individual variation in plasma apoE levels. Through a positional and functional candidate gene approach, ten genes were identified in the 17q linked region, and 62 polymorphisms in these genes were genotyped in the GENOA families. Association analysis was performed with FBAT, GEE, and variance-component based tests followed by conditional linkage analysis. Association studies with partial coverage of TagSNPs in the gene coding for apolipoprotein H (APOH) were performed, and significant results were found for 2 SNPs (APOH_20951 and APOH_05407) in the Hispanic low cholesterol strata accounting for 3.49% of the inter-individual variation in plasma apoE levels. 
Among the other candidate genes, we identified a haplotype block in the ACE1 gene that contains two major haplotypes associated with apoE levels as well as total cholesterol, apoB and LDLC levels in the unstratified Hispanic population. Identifying genes responsible for the remaining 60% of inter-individual variation in plasma apoE levels will yield new insights into the genetic interactions involved in lipid metabolism, and a more precise understanding of the risk factors leading to CAD.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The overall purpose of this study was to assess the relationship between the promoter region polymorphism (-2607 1G/2G) of matrix metalloproteinase-1 (MMP-1) and outcome in patients diagnosed with a primary brain tumor between 1994 and 2000 at The University of Texas M. D. Anderson Cancer Center. The MMP-1 polymorphism was genotyped for all brain tumor patients who participated in the Family Brain Tumor Study and for whom blood samples were available. Relevant covariates, including information on demographics, tumor histology, therapy and outcome, were abstracted from medical records for all cases from the original protocol. The hypothesis was that brain tumor patients with the 2G allele have a poorer prognosis and shorter survival than brain tumor patients with the 1G allele. Experimental Design: Genetic variants for the MMP-1 enzyme were determined by a polymerase chain reaction-restriction fragment length polymorphism assay. Comparison was made between the overall survival for cases with the 2G polymorphism and overall survival for cases with the 1G polymorphism using multivariable Cox Proportional-Hazard analysis, controlling for age, sex, Karnofsky Performance Scale (KPS), extent of surgery, tumor histology and treatment received. Kaplan-Meier and Cox Proportional-Hazard analyses were utilized to assess if the MMP-1 polymorphisms were related to overall survival. Results: Overall survival was not statistically significantly different between the 2G allele brain tumor patients and the 1G allele patients, and there was no statistically significant difference between tumor types. Conclusions: No association was found between MMP-1 polymorphisms and survival in patients with malignant gliomas.
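The Kaplan-Meier analysis named in the abstract estimates a survival curve from follow-up times that may be censored. The sketch below is a minimal product-limit estimator written for illustration only; it is not the study's analysis code, it does not include the Cox model, and the follow-up times and event indicators are invented for the example.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient.
    events: 1 if death was observed at that time, 0 if the patient was censored.
    Returns a list of (time, estimated survival) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up in months for one genotype group (0 = censored).
months = [5, 8, 8, 12, 15, 20, 24]
died = [1, 1, 0, 1, 0, 1, 0]
for t, s in kaplan_meier(months, died):
    print(t, round(s, 3))
```

Computing one such curve per genotype group (for example, 2G carriers versus 1G carriers) gives the kind of comparison the abstract describes; a formal test or a Cox model would then be needed to judge whether the curves differ.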

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Diabetes mellitus occurs in two forms, insulin-dependent (IDDM, formerly called juvenile type) and non-insulin-dependent (NIDDM, formerly called adult type). Prevalence figures from around the world for NIDDM show that all societies and all races are affected; although uncommon in some populations (0.4%), it is common (10%) or very common (40%) in others (Tables 1 and 2). In Mexican-Americans in particular, the prevalence rates (7-10%) are intermediate to those in Caucasians (1-2%) and Amerindians (35%). Information about the distribution of the disease and identification of high risk groups for developing glucose intolerance or its vascular manifestations by the study of genetic markers will help to clarify and solve some of the problems from the public health and genetic points of view. This research was designed to examine two general areas in relation to NIDDM. The first aims to determine the prevalence of polymorphic genetic markers in two groups distinguished by the presence or absence of diabetes and to determine whether there are any genetic marker-disease associations (univariate analysis using two by two tables and logistic regression to study the individual and joint effects of the different variables). The second deals with the effect of genetic differences on the variation in fasting plasma glucose and percent glycosylated hemoglobin (HbA1) (analysis of covariance for each marker, using age and sex as covariates). The results from the first analysis were not statistically significant at the corrected p value of 0.003 given the number of tests that were performed. From the analysis of covariance of all the markers studied, only Duffy and Phosphoglucomutase were statistically significant but poor predictors, given that the amount they explain in terms of variation in glycosylated hemoglobin is very small. Trying to determine the polygenic component of chronic disease is not an easy task. 
This study confirms the fact that a larger and random or representative sample is needed to be able to detect differences in the prevalence of a marker for association studies and in the genetic contribution to the variation in glucose and glycosylated hemoglobin. The importance that ethnic homogeneity in the groups studied and standardization in the methodology will have on the results has been stressed.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The Departamento de Arica in northern Chile was chosen as the investigation site for a study of the role of certain hematologic and glycolytic variables in the physiological and genetic adaptation to hypoxia. The population studied comprised 876 individuals, residents of seven villages at three altitudes: coast (0-500 m), sierra (2,500-3,500 m) and altiplano (>4,000 m). There was an equal number of males and females, ranging in age from 6 to 90 years. Although predominantly Aymara, those of mixed or Spanish origin were also examined. The specimens were collected in heparinized vacutainers, precipitated with cold trichloroacetic acid (TCA) and immediately frozen to -196 °C. Six variables were measured. Three were hematologic: hemoglobin, hematocrit and mean cell hemoglobin concentration. The three others were glycolytic: erythrocyte 2,3-diphosphoglycerate (DPG), adenosine triphosphate (ATP) and the percentage of phosphates (DPG + ATP) in the form of DPG. Hemoglobin and hematocrit were measured on site. The DPG and ATP content was assayed in specimens which had been frozen at -196 °C and transported to Houston. Structured interviews on site provided information as to lifestyle and family pedigrees. The following results were obtained: (1) The actual village, rather than the altitude, of examination accounted for the greatest proportion of the variance in all variables. On the coast, there are large differences in the levels of ionic lithium in the drinking water. The chemical environment of food and drink is postulated to account, in part, for the importance of geographic location in explaining the observed variance. (2) Measurements of individuals from the two extreme altitudes, coast and altiplano, did not exhibit the same relationship with age and body mass. The hematologic variables were significantly related to both age and body build on the coast. The glycolytic variables were significantly related to age and body mass in the altiplano. 
(3) The environment modified male values more than female values in all variables. The two sexes responded quite differently to age and changes in body mass as well. The question of differing adaptability of the two sexes is discussed. (4) Environmental factors explained a significantly higher proportion of total variability in the altiplano than on the coast for hemoglobin, hematocrit and DPG. Most of the ATP variability at both altitudes is explained by genetic factors.