836 results for Mixed linear models
Abstract:
The problem of regression under Gaussian assumptions is treated generally. The relationship between Bayesian prediction, regularization and smoothing is elucidated. The ideal regression is the posterior mean, and its computation scales as O(n^3), where n is the sample size. We show that the optimal m-dimensional linear model under a given prior is spanned by the first m eigenfunctions of a covariance operator, which is a trace-class operator. This is an infinite-dimensional analogue of principal component analysis. The importance of Hilbert space methods to practical statistics is also discussed.
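As a rough illustration of where the O(n^3) cost comes from, the posterior mean under a Gaussian prior can be computed by factorising the n-by-n covariance matrix of the observations. The sketch below is not taken from the paper: the squared-exponential kernel, the noise variance and all function names are assumptions chosen only to make the example self-contained.

```python
import math

def rbf_kernel(x1, x2, length_scale=1.0):
    """Squared-exponential covariance (an assumed prior, for illustration only)."""
    return math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix.

    The triple loop (i, j and the inner sum over k) is the O(n^3) step.
    """
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def solve_cholesky(low, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def posterior_mean(x_train, y_train, x_test, noise_var=0.1, length_scale=1.0):
    """Posterior mean k(x*, X) (K + noise_var * I)^{-1} y for a zero-mean prior."""
    n = len(x_train)
    k = [[rbf_kernel(x_train[i], x_train[j], length_scale)
          + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve_cholesky(cholesky(k), y_train)
    return [sum(rbf_kernel(xs, x_train[i], length_scale) * alpha[i]
                for i in range(n))
            for xs in x_test]
```

Calling `posterior_mean([0.0, 1.0, 2.0], [0.0, 1.0, 0.0], [0.5, 1.5])` returns the smoothed predictions at the two test points; the factorisation is done once and reused for every test point.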
Abstract:
2000 Mathematics Subject Classification: 62H12, 62P99
Abstract:
To provide biological insights into transcriptional regulation, a couple of groups have recently presented models relating promoter-bound transcription factors (TFs) to the downstream gene’s mean transcript level or transcript production rate over time. However, transcript production is dynamic and responds to changes in TF concentrations over time. Also, TFs are not the only factors that bind promoters; other DNA-binding factors (DBFs), especially nucleosomes, bind as well, resulting in competition between DBFs for binding at the same genomic location. Additionally, elements other than TFs also regulate transcription. Within the core promoter, various regulatory elements influence RNA polymerase II (RNAPII) recruitment, pre-initiation complex (PIC) formation, RNAPII searching for the transcription start site (TSS), and transcription initiation by RNAPII. Moreover, it has been proposed that nucleosomes downstream of the TSS resist RNAPII elongation.
Here, we provide a machine learning framework to predict transcript production rates from DNA sequences. We applied this framework to the yeast S. cerevisiae in two scenarios: a) predicting the dynamic transcript production rate during the cell cycle for native promoters; b) predicting the mean transcript production rate over time for synthetic promoters. As far as we know, our framework is the first successful attempt at a model that predicts dynamic transcript production rates from DNA sequences alone: on the cell cycle data set, we obtained a Pearson correlation coefficient Cp = 0.751 and a coefficient of determination r^2 = 0.564 on the test set for predicting the dynamic transcript production rate over time. Also, for the DREAM6 Gene Promoter Expression Prediction challenge, our fitted model outperformed all participating teams, including the best-performing team, as well as a model combining the best team’s k-mer-based sequence features with another paper’s biologically mechanistic features, on all scoring metrics.
Moreover, our framework shows its capability of identifying generalizable features by interpreting the highly predictive models, and thereby provides support for associated hypothesized mechanisms of transcriptional regulation. With the learned sparse linear models, we obtained results supporting the following biological insights: a) TFs govern the probability of RNAPII recruitment and initiation, possibly through interactions with PIC components and transcription cofactors; b) the core promoter amplifies transcript production, probably by influencing PIC formation, RNAPII recruitment, DNA melting, RNAPII searching for and selecting the TSS, releasing RNAPII from general transcription factors, and thereby initiation; c) there is strong transcriptional synergy between TFs and core promoter elements; d) the regulatory elements within the core promoter region are more than the TATA box and the nucleosome-free region, suggesting the existence of still unidentified TAF-dependent and cofactor-dependent core promoter elements in yeast S. cerevisiae; e) nucleosome occupancy is helpful for representing the regulatory roles of the +1 and -1 nucleosomes in transcription.
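The two test-set figures quoted in this abstract (a Pearson correlation of 0.751 and a coefficient of determination of 0.564) can be related by a quick arithmetic check. The sketch below computes a Pearson correlation from paired values and squares the reported coefficient. Whether the authors obtained their coefficient of determination by squaring the correlation is an assumption; the code only shows that the two numbers are consistent with that reading.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Values quoted in the abstract.
reported_correlation = 0.751
reported_determination = 0.564

# Assumption: the coefficient of determination is the squared correlation.
implied_determination = reported_correlation ** 2
```

Under that assumption, `implied_determination` agrees with the reported value to three decimal places.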
Abstract:
Mixtures of Zellner's g-priors have been studied extensively in linear models and have been shown to have numerous desirable properties for Bayesian variable selection and model averaging. Several extensions of g-priors to Generalized Linear Models (GLMs) have been proposed in the literature; however, the choice of prior distribution of g and resulting properties for inference have received considerably less attention. In this paper, we extend mixtures of g-priors to GLMs by assigning the truncated Compound Confluent Hypergeometric (tCCH) distribution to 1/(1+g) and illustrate how this prior distribution encompasses several special cases of mixtures of g-priors in the literature, such as the Hyper-g, truncated Gamma, Beta-prime, and the Robust prior. Under an integrated Laplace approximation to the likelihood, the posterior distribution of 1/(1+g) is in turn a tCCH distribution, and approximate marginal likelihoods are thus available analytically. We discuss the local geometric properties of the g-prior in GLMs and show that specific choices of the hyper-parameters satisfy the various desiderata for model selection proposed by Bayarri et al., such as asymptotic model selection consistency, information consistency, intrinsic consistency, and measurement invariance. We also illustrate inference using these priors and contrast them with others in the literature via simulation and real examples.
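For orientation, the role of the quantity 1/(1+g) is easiest to see in the Gaussian linear model rather than in the GLM setting this abstract addresses. The identity below is the standard textbook result for Zellner's g-prior with a zero prior mean and a full-rank design matrix. The symbols X, y, β and σ² are introduced here for illustration and do not appear in the abstract.

```latex
\beta \mid g, \sigma^2 \sim \mathcal{N}\!\left(0,\; g\,\sigma^2 (X^\top X)^{-1}\right)
\quad\Longrightarrow\quad
\mathbb{E}[\beta \mid y, g, \sigma^2]
  = \frac{g}{1+g}\,\hat\beta_{\mathrm{OLS}}
  = \left(1 - \frac{1}{1+g}\right)\hat\beta_{\mathrm{OLS}},
\qquad
\hat\beta_{\mathrm{OLS}} = (X^\top X)^{-1} X^\top y .
```

In this simplified setting, 1/(1+g) is the fraction by which the least-squares estimate is shrunk towards zero, which is one way to read the choice of placing a prior on 1/(1+g) rather than on g itself.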
Abstract:
Includes index.
Abstract:
Motivation: Unravelling the genetic architecture of complex traits requires large amounts of data, sophisticated models and large computational resources. The lack of user-friendly software incorporating all these requisites is delaying progress in the analysis of complex traits. Methods: Linkage disequilibrium and linkage analysis (LDLA) is a high-resolution gene mapping approach based on sophisticated mixed linear models, applicable to any population structure. LDLA can use population history information in addition to pedigree and molecular markers to decompose traits into genetic components. Analyses are distributed in parallel over a large public grid of computers in the UK. Results: We have proven the performance of LDLA with analyses of simulated data. There are real gains in statistical power to detect quantitative trait loci when using historical information compared with traditional linkage analysis. Moreover, the use of a grid of computers significantly increases computational speed, hence allowing analyses that would have been prohibitive on a single computer. © The Author 2009. Published by Oxford University Press. All rights reserved.
Abstract:
Population-wide associations between loci due to linkage disequilibrium can be used to map quantitative trait loci (QTL) with high resolution. However, spurious associations between markers and QTL can also arise as a consequence of population stratification. Statistical methods that cannot distinguish associations between loci due to linkage disequilibrium from those arising in other ways can yield false-positive results. The transmission-disequilibrium test (TDT) is a robust test for detecting QTL. The TDT exploits within-family associations that are not affected by population stratification. However, some TDTs are formulated in a rigid form, which limits their potential applications. In this study we generalize the TDT using mixed linear models to allow greater statistical flexibility. Allelic effects are estimated with two independent parameters: one exploiting the robust within-family information and the other the potentially biased between-family information. A significant difference between these two parameters can be used as evidence for spurious association. This methodology was then used to test the effects of the fourth melanocortin receptor (MC4R) on production traits in the pig. The new analyses supported the previously reported results, i.e., the studied polymorphism is either causal or in very strong linkage disequilibrium with the causal mutation, and provided no evidence for spurious association.
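The split into a within-family and a between-family allelic effect can be sketched with ordinary least squares. This is a simplification, not the authors' model: their analysis uses mixed linear models, whereas the sketch below omits random effects entirely. The genotype coding (for example, allele counts 0, 1, 2), the record layout and the function name are assumptions made for illustration.

```python
from collections import defaultdict

def within_between_effects(records):
    """Estimate between- and within-family allelic effects by least squares.

    records: list of (family_id, genotype, phenotype) tuples.
    Each genotype is split into its family mean (between-family part) and
    the deviation from that mean (within-family part).
    """
    by_family = defaultdict(list)
    for fam, genotype, _ in records:
        by_family[fam].append(genotype)
    fam_mean = {fam: sum(gs) / len(gs) for fam, gs in by_family.items()}

    between = [fam_mean[fam] for fam, _, _ in records]
    within = [g - fam_mean[fam] for fam, g, _ in records]
    pheno = [p for _, _, p in records]

    n = len(pheno)
    b_bar = sum(between) / n
    y_bar = sum(pheno) / n

    # The within-family deviations sum to zero in every family, so they are
    # orthogonal to both the intercept and the family means. The two slopes
    # of the joint regression can therefore be computed separately.
    beta_between = (
        sum((b - b_bar) * (y - y_bar) for b, y in zip(between, pheno))
        / sum((b - b_bar) ** 2 for b in between)
    )
    beta_within = (
        sum(w * y for w, y in zip(within, pheno))
        / sum(w ** 2 for w in within)
    )
    return beta_between, beta_within
```

A large gap between the two estimates is the kind of signal the abstract describes as evidence for spurious association; a formal test would need standard errors, which this sketch does not compute.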
Abstract:
The objectives of this study were to investigate the stand structure and succession dynamics in Scots pine (Pinus sylvestris L.) stands on pristine peatlands and in Scots pine and Norway spruce (Picea abies (L.) Karst.) dominated stands on drained peatlands. Furthermore, my focus was on characterising how the inherent and environmental factors and the intermediate thinnings modify the stand structure and succession. For pristine peatlands, the study was based on stand inventory data, while for drained peatlands, longitudinal data from repeatedly measured stands were utilised. The studied sites covered the most common peatland site types in Finland. They were classified into two categories according to the ecohydrological properties related to microsite variation and nutrient levels within sites. Tree diameter at breast height (DBH) and age distributions in relation to climate and site type were used to study the stand dynamics on pristine sites. On drained sites, the Weibull function was used to parameterise the DBH distributions, and mixed linear models were constructed to characterise the impacts of different ecological factors on stand dynamics. On pristine peatlands, both climate and the ecohydrology of the site proved to be crucial factors determining the stand structure and its dynamics. Irrespective of the vegetation succession, enhanced site productivity and increased stand stocking, they also significantly affected the stand dynamics on drained sites. On the most stocked sites on pristine peatlands, inter-tree competition also seemed to be a significant factor modifying stand dynamics. Tree age and size diversity increased with stand age, but levelled out in the long term. After drainage, the stand structural unevenness increased due to the regeneration and/or ingrowth of the trees.
This increase was more pronounced on sparsely forested composite sites than on more fully stocked genuine forested sites in Scots pine stands, which further undergo the formation of birch and spruce undergrowth beneath the overstory as succession proceeds. At 20-30 years after drainage the structural heterogeneity started to decrease, indicating increased inter-tree competition, which increased the mortality of suppressed trees within the stand. Peatland stands are more dynamic than anticipated and are generally not characterized by a balanced, self-perpetuating structure. On pristine sites, various successional pathways are possible, whereas on drained sites the succession follows a more uniform trend. Typically, stand succession proceeds without any distinct developmental stages on pristine peatlands, whereas on drained peatlands, at least three distinct stages could be identified. Thinnings had little impact on the stand succession. The new information on stand dynamics may be utilised, e.g., in forest management planning to facilitate the allocation of the growth resources to the desired crop component by appropriate silvicultural treatments, and to assist in assessing the effects of climate change on the forested boreal peatlands.
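The abstract mentions parameterising diameter distributions with the Weibull function but does not say how the parameters were estimated. As one possible approach, the sketch below fits a two-parameter Weibull distribution by maximum likelihood, solving the shape equation by bisection. The estimation method, the search bounds and the function name are all assumptions, and the bounds presume diameters on an ordinary scale (for example, centimetres) with a shape parameter below 50.

```python
import math

def fit_weibull(diameters, lo=0.05, hi=50.0, tol=1e-10):
    """Maximum-likelihood fit of a two-parameter Weibull distribution.

    diameters: positive tree diameters that are not all identical.
    Returns (shape, scale).
    """
    logs = [math.log(d) for d in diameters]
    mean_log = sum(logs) / len(logs)

    def score(k):
        # Profile-likelihood equation for the shape k; its root is the MLE.
        powers = [d ** k for d in diameters]
        weighted_log = sum(p * l for p, l in zip(powers, logs)) / sum(powers)
        return weighted_log - 1.0 / k - mean_log

    # score(k) increases monotonically in k, so bisection finds the root
    # provided it lies inside the assumed bracket [lo, hi].
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    # Given the shape, the scale has a closed-form MLE.
    scale = (sum(d ** shape for d in diameters) / len(diameters)) ** (1.0 / shape)
    return shape, scale
```

The fitted shape and scale per stand and measurement occasion could then serve as response variables in a mixed linear model, which is one reading of how the two methods in the abstract fit together.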
Abstract:
Key message: Eucalyptus pellita demonstrated good growth and wood quality traits in this study, with young plantation-grown timber being suitable for both solid and pulp wood products. All traits examined were under moderate levels of genetic control with little genotype by environment interaction when grown on two contrasting sites in Vietnam. Context: Eucalyptus pellita currently has a significant role in reforestation in the tropics. Research to support expanded use of this species is needed: in particular, research to better understand the genetic control of key traits will facilitate the development of genetically improved planting stock. Aims: This study aimed to provide estimates of the heritability of diameter at breast height over bark, wood basic density, Kraft pulp yield, modulus of elasticity and microfibril angle, and the genetic correlations among these traits, and to understand the importance of genotype by environment interactions in Vietnam. Methods: Data for diameter and wood properties were collected from two 10-year-old, open-pollinated progeny trials of E. pellita in Vietnam that evaluated 104 families from six native range and three orchard sources. Wood properties were estimated from wood samples using near-infrared (NIR) spectroscopy. Data were analysed using mixed linear models to estimate genetic parameters (heritability, proportion of variance between seed sources and genetic correlations). Results: Variation among the nine sources was small compared to additive variance. Narrow-sense heritability and genetic correlation estimates indicated that simultaneous improvements in most traits could be achieved from selection among and within families, as the genetic correlations among traits were either favourable or close to zero. Type B genetic correlations approached one for all traits, suggesting that genotype by environment interactions were of little importance.
These results support a breeding strategy utilizing a single breeding population advanced by selecting the best individuals across all seed sources. Conclusion: Both growth and wood properties have been evaluated. Multi-trait selection for growth and wood property traits will lead to more productive populations of E. pellita with both improved productivity and improved timber and pulp properties.
Abstract:
Adiposity, low aerobic fitness and low levels of activity are all associated with clustered cardiovascular disease risk in children, and their high prevalence represents a major public health concern. The aim of this study is to investigate the relationship of objectively measured physical activity (PA) with motor skills (agility and balance), aerobic fitness and %body fat in young children. This study comprises cross-sectional and longitudinal analyses using mixed linear models. Longitudinal data were adjusted for baseline outcome parameters. In all, 217 healthy preschool children (age 4-6 years, 48% boys) participated in this study. PA (accelerometers), agility (obstacle course), dynamic balance (balance beam), aerobic fitness (20-m shuttle run) and %body fat (bioelectric impedance) were measured at baseline and 9 months later. PA was positively associated with both motor skills and aerobic fitness at baseline as well as with their longitudinal changes. Specifically, only vigorous PA, but not total or moderate PA, was related to changes in aerobic fitness. Higher PA was associated with less %body fat at baseline, but not with its change. Conversely, baseline motor skills, aerobic fitness and %body fat were not related to changes in PA. In young children, baseline PA was associated with improvements in motor skills and in aerobic fitness, an important determinant of cardiovascular risk.
Abstract:
Predictors of random effects are usually based on the popular mixed effects (ME) model developed under the assumption that the sample is obtained from a conceptual infinite population; such predictors are employed even when the actual population is finite. Two alternatives that incorporate the finite nature of the population are obtained from the superpopulation model proposed by Scott and Smith (1969. Estimation in multi-stage surveys. J. Amer. Statist. Assoc. 64, 830-840) or from the finite population mixed model recently proposed by Stanek and Singer (2004. Predicting random effects from finite population clustered samples with response error. J. Amer. Statist. Assoc. 99, 1119-1130). Predictors derived under the latter model, with the additional assumptions that all variance components are known and that within-cluster variances are equal, have smaller mean squared error (MSE) than the competitors based on either the ME or Scott and Smith's models. As population variances are rarely known, we propose method-of-moments estimators to obtain empirical predictors and conduct a simulation study to evaluate their performance. The results suggest that the finite population mixed model empirical predictor is more stable than its competitors since, in terms of MSE, it is either the best or the second best and, when second best, its performance lies within acceptable limits. When both cluster and unit intra-class correlation coefficients are very high (e.g., 0.95 or more), the performance of the empirical predictors derived under the three models is similar. (c) 2007 Elsevier B.V. All rights reserved.
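To make the idea of an empirical predictor concrete, the sketch below covers only the conventional mixed effects (ME) case for balanced clusters: variance components are estimated by the method of moments from a one-way analysis of variance, then plugged into the usual shrinkage predictor of each cluster mean. It does not implement the finite population mixed model predictor or the Scott and Smith predictor, whose formulas are not given in the abstract. The function name and the balanced-design restriction are assumptions.

```python
def empirical_me_predictors(clusters):
    """Empirical shrinkage predictors of cluster means under a balanced ME model.

    clusters: list of equal-length lists of observations (at least two
    clusters, at least two units per cluster).
    Returns (var_between, var_within, shrinkage, predictors).
    """
    c = len(clusters)
    m = len(clusters[0])
    means = [sum(obs) / m for obs in clusters]
    grand = sum(means) / c

    # Method-of-moments estimators from the one-way ANOVA mean squares.
    ms_between = m * sum((mu - grand) ** 2 for mu in means) / (c - 1)
    ms_within = sum(
        (y - mu) ** 2 for obs, mu in zip(clusters, means) for y in obs
    ) / (c * (m - 1))
    var_within = ms_within
    var_between = max(0.0, (ms_between - ms_within) / m)  # truncate at zero

    # Shrinkage constant: share of a cluster-mean deviation attributed to
    # the random cluster effect rather than to within-cluster noise.
    denom = var_between + var_within / m
    shrinkage = var_between / denom if denom > 0 else 0.0

    predictors = [grand + shrinkage * (mu - grand) for mu in means]
    return var_between, var_within, shrinkage, predictors
```

For example, `empirical_me_predictors([[1, 3], [5, 7], [9, 11]])` pulls the outer cluster means (2 and 10) slightly towards the grand mean of 6, because part of their deviation is attributed to within-cluster noise.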
Abstract:
Validity of comparisons between expected breeding values obtained from best linear unbiased prediction procedures in genetic evaluations is dependent on genetic connectedness among herds. Different cattle breeding programmes have their own particular features that distinguish their database structure and can affect connectedness. Thus, the evolution of these programmes can also alter the connectedness measures. This study analysed the evolution of the genetic connectedness measures among Brazilian Nelore cattle herds from 1999 to 2008, using the French Criterion of Admission to the group of Connected Herds (CACO) method, based on coefficients of determination (CD) of contrasts. Genetic connectedness levels were analysed by using simple and multiple regression analyses on herd descriptors to understand their relationship and their temporal trends from the 1999-2003 to the 2004-2008 period. The results showed a high level of genetic connectedness, with CACO estimates higher than 0.4 for the majority of them. Evaluation of the last 5-year period showed only a small increase in average CACO measures compared with the first 5 years, from 0.77 to 0.80. The percentage of herds with CACO estimates lower than 0.7 decreased from 27.5% in the first period to 16.2% in the last one. The connectedness measures were correlated with the percentage of progeny from connecting sires, and the artificial insemination spread among Brazilian herds in recent years. But changes in connectedness levels were shown to be more complex, and their complete explanation cannot consider only herd descriptors. They involve more comprehensive changes in the relationship matrix, which can only be fully expressed by the CD of contrasts.
Abstract:
INTRODUCTION Our objective was to investigate potential associations between maxillary sinus floor extension and inclination of maxillary second premolars and second molars in patients with Class II Division 1 malocclusion whose orthodontic treatment included maxillary first molar extractions. METHODS The records of 37 patients (18 boys, 19 girls; mean age, 13.2 years; SD, 1.62 years) treated between 1998 and 2004 by 1 orthodontist with full Begg appliances were used in this study. Inclusion criteria were white patients with Class II Division 1 malocclusion, sagittal overjet of ≥4 mm, treatment plan including extraction of the maxillary first permanent molars, no missing teeth, and no agenesis. Maxillary posterior tooth inclination and lower maxillary sinus area in relation to the palatal plane were measured on lateral cephalograms at 3 time points: at the start and end of treatment, and on average 2.5 years posttreatment. Data were analyzed for the second premolar and second molar inclinations by using mixed linear models. RESULTS The analysis showed that the second molar inclination angle decreased by 7° after orthodontic treatment, compared with pretreatment values, and by 11.5° at the latest follow-up, compared with pretreatment. There was evidence that maxillary sinus volume was negatively correlated with second molar inclination angle; the greater the volume, the smaller the inclination angle. For premolars, inclination increased by 15.4° after orthodontic treatment compared with pretreatment, and by 8.1° at the latest follow-up compared with baseline. The volume of the maxillary sinus was not associated with premolar inclination. CONCLUSIONS We found evidence of an association between maxillary second molar inclination and surface area of the lower sinus in patients treated with maxillary first molar extractions. 
Clinicians who undertake such an extraction scheme in Class II patients should be aware of this potential association and consider appropriate biomechanics to control root uprighting.