27 resultados para Conditional autoregressive random effects model
em University of Queensland eSpace - Australia
Resumo:
Motivation: The clustering of gene profiles across some experimental conditions of interest contributes significantly to the elucidation of unknown gene function, the validation of gene discoveries and the interpretation of biological processes. However, this clustering problem is not straightforward as the profiles of the genes are not all independently distributed and the expression levels may have been obtained from an experimental design involving replicated arrays. Ignoring the dependence between the gene profiles and the structure of the replicated data can result in important sources of variability in the experiments being overlooked in the analysis, with the consequent possibility of misleading inferences being made. We propose a random-effects model that provides a unified approach to the clustering of genes with correlated expression levels measured in a wide variety of experimental situations. Our model is an extension of the normal mixture model to account for the correlations between the gene profiles and to enable covariate information to be incorporated into the clustering process. Hence the model is applicable to longitudinal studies with or without replication, for example, time-course experiments by using time as a covariate, and to cross-sectional experiments by using categorical covariates to represent the different experimental classes. Results: We show that our random-effects model can be fitted by maximum likelihood via the EM algorithm for which the E(expectation) and M(maximization) steps can be implemented in closed form. Hence our model can be fitted deterministically without the need for time-consuming Monte Carlo approximations. The effectiveness of our model-based procedure for the clustering of correlated gene profiles is demonstrated on three real datasets, representing typical microarray experimental designs, covering time-course, repeated-measurement and cross-sectional data. In these examples, relevant clusters of the genes are obtained, which are supported by existing gene-function annotation. A synthetic dataset is considered too.
Finite mixture regression model with random effects: application to neonatal hospital length of stay
Resumo:
A two-component mixture regression model that allows simultaneously for heterogeneity and dependency among observations is proposed. By specifying random effects explicitly in the linear predictor of the mixture probability and the mixture components, parameter estimation is achieved by maximising the corresponding best linear unbiased prediction type log-likelihood. Approximate residual maximum likelihood estimates are obtained via an EM algorithm in the manner of generalised linear mixed model (GLMM). The method can be extended to a g-component mixture regression model with the component density from the exponential family, leading to the development of the class of finite mixture GLMM. For illustration, the method is applied to analyse neonatal length of stay (LOS). It is shown that identification of pertinent factors that influence hospital LOS can provide important information for health care planning and resource allocation. (C) 2002 Elsevier Science B.V. All rights reserved.
Resumo:
We investigate whether relative contributions of genetic and shared environmental factors are associated with an increased risk in melanoma. Data from the Queensland Familial Melanoma Project comprising 15,907 subjects arising from 1912 families were analyzed to estimate the additive genetic, common and unique environmental contributions to variation in the age at onset of melanoma. Two complementary approaches for analyzing correlated time-to-onset family data were considered: the generalized estimating equations (GEE) method in which one can estimate relationship-specific dependence simultaneously with regression coefficients that describe the average population response to changing covariates; and a subject-specific Bayesian mixed model in which heterogeneity in regression parameters is explicitly modeled and the different components of variation may be estimated directly. The proportional hazards and Weibull models were utilized, as both produce natural frameworks for estimating relative risks while adjusting for simultaneous effects of other covariates. A simple Markov Chain Monte Carlo method for covariate imputation of missing data was used and the actual implementation of the Bayesian model was based on Gibbs sampling using the free ware package BUGS. In addition, we also used a Bayesian model to investigate the relative contribution of genetic and environmental effects on the expression of naevi and freckles, which are known risk factors for melanoma.
Resumo:
The estimated parameters of output distance functions frequently violate the monotonicity, quasi-convexity and convexity constraints implied by economic theory, leading to estimated elasticities and shadow prices that are incorrectly signed, and ultimately to perverse conclusions concerning the effects of input and output changes on productivity growth and relative efficiency levels. We show how a Bayesian approach can be used to impose these constraints on the parameters of a translog output distance function. Implementing the approach involves the use of a Gibbs sampler with data augmentation. A Metropolis-Hastings algorithm is also used within the Gibbs to simulate observations from truncated pdfs. Our methods are developed for the case where panel data is available and technical inefficiency effects are assumed to be time-invariant. Two models-a fixed effects model and a random effects model-are developed and applied to panel data on 17 European railways. We observe significant changes in estimated elasticities and shadow price ratios when regularity restrictions are imposed. (c) 2004 Elsevier B.V. All rights reserved.
Resumo:
The Professions in Australia Study is the first longitudinal investigation of the professions in Australia; it spans 33 years. Self-administered questionnaires were distributed on at least eight occasions between 1965 and 1998 to cohorts of students and later practitioners from the professions of engineering, law and medicine. The longitudinal design of this study has allowed for an investigation of individual change over time of three archetypal characteristics of the professions, service, knowledge and autonomy and two of the benefits of professional work, financial rewards and prestige. A cumulative logit random effects model was used to statistically assess changes in the ordinal response scores for measuring importance of the characteristics and benefits through stages of the career path. Individuals were also classified by average trends in response scores over time and hence professions are described through their members' tendency to follow a particular path in attitudes either of change or constancy, in relation to the importance of the five elements (characteristics and benefits). Comparisons in trends are also made between the three professions.
Resumo:
Many variables that are of interest in social science research are nominal variables with two or more categories, such as employment status, occupation, political preference, or self-reported health status. With longitudinal survey data it is possible to analyse the transitions of individuals between different employment states or occupations (for example). In the statistical literature, models for analysing categorical dependent variables with repeated observations belong to the family of models known as generalized linear mixed models (GLMMs). The specific GLMM for a dependent variable with three or more categories is the multinomial logit random effects model. For these models, the marginal distribution of the response does not have a closed form solution and hence numerical integration must be used to obtain maximum likelihood estimates for the model parameters. Techniques for implementing the numerical integration are available but are computationally intensive requiring a large amount of computer processing time that increases with the number of clusters (or individuals) in the data and are not always readily accessible to the practitioner in standard software. For the purposes of analysing categorical response data from a longitudinal social survey, there is clearly a need to evaluate the existing procedures for estimating multinomial logit random effects model in terms of accuracy, efficiency and computing time. The computational time will have significant implications as to the preferred approach by researchers. In this paper we evaluate statistical software procedures that utilise adaptive Gaussian quadrature and MCMC methods, with specific application to modeling employment status of women using a GLMM, over three waves of the HILDA survey.
Resumo:
When studying genotype X environment interaction in multi-environment trials, plant breeders and geneticists often consider one of the effects, environments or genotypes, to be fixed and the other to be random. However, there are two main formulations for variance component estimation for the mixed model situation, referred to as the unconstrained-parameters (UP) and constrained-parameters (CP) formulations. These formulations give different estimates of genetic correlation and heritability as well as different tests of significance for the random effects factor. The definition of main effects and interactions and the consequences of such definitions should be clearly understood, and the selected formulation should be consistent for both fixed and random effects. A discussion of the practical outcomes of using the two formulations in the analysis of balanced data from multi-environment trials is presented. It is recommended that the CP formulation be used because of the meaning of its parameters and the corresponding variance components. When managed (fixed) environments are considered, users will have more confidence in prediction for them but will not be overconfident in prediction in the target (random) environments. Genetic gain (predicted response to selection in the target environments from the managed environments) is independent of formulation.
Resumo:
The paper investigates a Bayesian hierarchical model for the analysis of categorical longitudinal data from a large social survey of immigrants to Australia. Data for each subject are observed on three separate occasions, or waves, of the survey. One of the features of the data set is that observations for some variables are missing for at least one wave. A model for the employment status of immigrants is developed by introducing, at the first stage of a hierarchical model, a multinomial model for the response and then subsequent terms are introduced to explain wave and subject effects. To estimate the model, we use the Gibbs sampler, which allows missing data for both the response and the explanatory variables to be imputed at each iteration of the algorithm, given some appropriate prior distributions. After accounting for significant covariate effects in the model, results show that the relative probability of remaining unemployed diminished with time following arrival in Australia.
Resumo:
Patient outcomes in transplantation would improve if dosing of immunosuppressive agents was individualized. The aim of this study is to develop a population pharmacokinetic model of tacrolimus in adult liver transplant recipients and test this model in individualizing therapy. Population analysis was performed on data from 68 patients. Estimates were sought for apparent clearance (CL/F) and apparent volume of distribution (V/F) using the nonlinear mixed effects model program (NONMEM). Factors screened for influence on these parameters were weight, age, sex, transplant type, biliary reconstructive procedure, postoperative day, days of therapy, liver function test results, creatinine clearance, hematocrit, corticosteroid dose, and interacting drugs. The predictive performance of the developed model was evaluated through Bayesian forecasting in an independent cohort of 36 patients. No linear correlation existed between tacrolimus dosage and trough concentration (r(2) = 0.005). Mean individual Bayesian estimates for CL/F and V/F were 26.5 8.2 (SD) L/hr and 399 +/- 185 L, respectively. CL/F was greater in patients with normal liver function. V/F increased with patient weight. CL/F decreased with increasing hematocrit. Based on the derived model, a 70-kg patient with an aspartate aminotransferase (AST) level less than 70 U/L would require a tacrolimus dose of 4.7 mg twice daily to achieve a steady-state trough concentration of 10 ng/mL. A 50-kg patient with an AST level greater than 70 U/L would require a dose of 2.6 mg. Marked interindividual variability (43% to 93%) and residual random error (3.3 ng/mL) were observed. Predictions made using the final model were reasonably nonbiased (0.56 ng/mL), but imprecise (4.8 ng/mL). Pharmacokinetic information obtained will assist in tacrolimus dosing; however, further investigation into reasons for the pharmacokinetic variability of tacrolimus is required.
Resumo:
The modelling of inpatient length of stay (LOS) has important implications in health care studies. Finite mixture distributions are usually used to model the heterogeneous LOS distribution, due to a certain proportion of patients sustaining-a longer stay. However, the morbidity data are collected from hospitals, observations clustered within the same hospital are often correlated. The generalized linear mixed model approach is adopted to accommodate the inherent correlation via unobservable random effects. An EM algorithm is developed to obtain residual maximum quasi-likelihood estimation. The proposed hierarchical mixture regression approach enables the identification and assessment of factors influencing the long-stay proportion and the LOS for the long-stay patient subgroup. A neonatal LOS data set is used for illustration, (C) 2003 Elsevier Science Ltd. All rights reserved.
Resumo:
Many studies on birds focus on the collection of data through an experimental design, suitable for investigation in a classical analysis of variance (ANOVA) framework. Although many findings are confirmed by one or more experts, expert information is rarely used in conjunction with the survey data to enhance the explanatory and predictive power of the model. We explore this neglected aspect of ecological modelling through a study on Australian woodland birds, focusing on the potential impact of different intensities of commercial cattle grazing on bird density in woodland habitat. We examine a number of Bayesian hierarchical random effects models, which cater for overdispersion and a high frequency of zeros in the data using WinBUGS and explore the variation between and within different grazing regimes and species. The impact and value of expert information is investigated through the inclusion of priors that reflect the experience of 20 experts in the field of bird responses to disturbance. Results indicate that expert information moderates the survey data, especially in situations where there are little or no data. When experts agreed, credible intervals for predictions were tightened considerably. When experts failed to agree, results were similar to those evaluated in the absence of expert information. Overall, we found that without expert opinion our knowledge was quite weak. The fact that the survey data is quite consistent, in general, with expert opinion shows that we do know something about birds and grazing and we could learn a lot faster if we used this approach more in ecology, where data are scarce. Copyright (c) 2005 John Wiley & Sons, Ltd.
Resumo:
The effect of the tumour-forming disease, fibropapillomatosis, on the somatic growth dynamics of green turtles resident in the Pala'au foraging grounds (Moloka'i, Hawai'i) was evaluated using a Bayesian generalised additive mixed modelling approach. This regression model enabled us to account for fixed effects (fibropapilloma tumour severity), nonlinear covariate functional form (carapace size, sampling year) as well as random effects due to individual heterogeneity and correlation between repeated growth measurements on some turtles. Somatic growth rates were found to be nonlinear functions of carapace size and sampling year but were not a function of low-to-moderate tumour severity. On the other hand, growth rates were significantly lower for turtles with advanced fibropapillomatosis, which suggests a limited or threshold-specific disease effect. However, tumour severity was an increasing function of carapace size-larger turtles tended to have higher tumour severity scores, presumably due to longer exposure of larger (older) turtles to the factors that cause the disease. Hence turtles with advanced fibropapillomatosis tended to be the larger turtles, which confounds size and tumour severity in this study. But somatic growth rates for the Pala'au population have also declined since the mid-1980s (sampling year effect) while disease prevalence and severity increased from the mid-1980s before levelling off by the mid-1990s. It is unlikely that this decline was related to the increasing tumour severity because growth rates have also declined over the last 10-20 years for other green turtle populations resident in Hawaiian waters that have low or no disease prevalence. The declining somatic growth rate trends evident in the Hawaiian stock are more likely a density-dependent effect caused by a dramatic increase in abundance by this once-seriously-depleted stock since the mid-1980s. So despite increasing fibropapillomatosis risk over the last 20 years, only a limited effect on somatic growth dynamics was apparent and the Hawaiian green turtle stock continues to increase in abundance.
Resumo:
Count data with excess zeros relative to a Poisson distribution are common in many biomedical applications. A popular approach to the analysis of such data is to use a zero-inflated Poisson (ZIP) regression model. Often, because of the hierarchical Study design or the data collection procedure, zero-inflation and lack of independence may occur simultaneously, which tender the standard ZIP model inadequate. To account for the preponderance of zero counts and the inherent correlation of observations, a class of multi-level ZIP regression model with random effects is presented. Model fitting is facilitated using an expectation-maximization algorithm, whereas variance components are estimated via residual maximum likelihood estimating equations. A score test for zero-inflation is also presented. The multi-level ZIP model is then generalized to cope with a more complex correlation structure. Application to the analysis of correlated count data from a longitudinal infant feeding study illustrates the usefulness of the approach.
Resumo:
Niche apportionment models have only been applied once to parasite communities. Only the random assortment model (RA), which indicates that species abundances are independent from each other and that interspecific competition is unimportant, provided a good fit to 3 out of 6 parasite communities investigated. The generality of this result needs to be validated, however. In this study we apply 5 niche apportionment models to the parasite communities of 14 fish species from the Great Barrier Reef. We determined which model fitted the data when using either numerical abundance or biomass as an estimate of parasite abundance, and whether the fit of niche apportionment models depends on how the parasite community is defined (e.g. ecto, endoparasites or all parasites considered together). The RA model provided a good fit for the whole community of parasites in 7 fish species when using biovolume (as a surrogate of biomass) as a measure of species abundance. The RA model also fitted observed data when ecto- and endoparasites were considered separately, using abundance or biovolume, but less frequently. Variation in fish sizes among species was not associated with the probability of a model fitting the data. Total numerical abundance and biovolume of parasites were not related across host species, suggesting that they capture different aspects of abundance. Biovolume is not only a better measurement to use with niche-orientated models, it should also be the preferred descriptor to analyse parasite community structure in other contexts. Most of the biological assumptions behind the RA model, i.e. randomness in apportioning niche space, lack of interspecific competition, independence of abundance among different species, and species with variable niches in changeable environments, are in accordance with some previous findings on parasite communities. Thus, parasite communities may generally be unsaturated with species, with empty niches, and interspecific interactions may generally be unimportant in determining parasite community structure.