34 resultados para Hierarchical regression model
em CentAUR: Central Archive University of Reading - UK
A hierarchical Bayesian model for predicting the functional consequences of amino-acid polymorphisms
Resumo:
Genetic polymorphisms in deoxyribonucleic acid coding regions may have a phenotypic effect on the carrier, e.g. by influencing susceptibility to disease. Detection of deleterious mutations via association studies is hampered by the large number of candidate sites; therefore methods are needed to narrow down the search to the most promising sites. For this, a possible approach is to use structural and sequence-based information of the encoded protein to predict whether a mutation at a particular site is likely to disrupt the functionality of the protein itself. We propose a hierarchical Bayesian multivariate adaptive regression spline (BMARS) model for supervised learning in this context and assess its predictive performance by using data from mutagenesis experiments on lac repressor and lysozyme proteins. In these experiments, about 12 amino-acid substitutions were performed at each native amino-acid position and the effect on protein functionality was assessed. The training data thus consist of repeated observations at each position, which the hierarchical framework is needed to account for. The model is trained on the lac repressor data and tested on the lysozyme mutations and vice versa. In particular, we show that the hierarchical BMARS model, by allowing for the clustered nature of the data, yields lower out-of-sample misclassification rates compared with both a BMARS and a frequen-tist MARS model, a support vector machine classifier and an optimally pruned classification tree.
Resumo:
In this paper we focus on the one year ahead prediction of the electricity peak-demand daily trajectory during the winter season in Central England and Wales. We define a Bayesian hierarchical model for predicting the winter trajectories and present results based on the past observed weather. Thanks to the flexibility of the Bayesian approach, we are able to produce the marginal posterior distributions of all the predictands of interest. This is a fundamental progress with respect to the classical methods. The results are encouraging in both skill and representation of uncertainty. Further extensions are straightforward at least in principle. The main two of those consist in conditioning the weather generator model with respect to additional information like the knowledge of the first part of the winter and/or the seasonal weather forecast. Copyright (C) 2006 John Wiley & Sons, Ltd.
Resumo:
In this paper we focus on the one year ahead prediction of the electricity peak-demand daily trajectory during the winter season in Central England and Wales. We define a Bayesian hierarchical model for predicting the winter trajectories and present results based on the past observed weather. Thanks to the flexibility of the Bayesian approach, we are able to produce the marginal posterior distributions of all the predictands of interest. This is a fundamental progress with respect to the classical methods. The results are encouraging in both skill and representation of uncertainty. Further extensions are straightforward at least in principle. The main two of those consist in conditioning the weather generator model with respect to additional information like the knowledge of the first part of the winter and/or the seasonal weather forecast. Copyright (C) 2006 John Wiley & Sons, Ltd.
Resumo:
A new class of parameter estimation algorithms is introduced for Gaussian process regression (GPR) models. It is shown that the integration of the GPR model with probability distance measures of (i) the integrated square error and (ii) Kullback–Leibler (K–L) divergence are analytically tractable. An efficient coordinate descent algorithm is proposed to iteratively estimate the kernel width using golden section search which includes a fast gradient descent algorithm as an inner loop to estimate the noise variance. Numerical examples are included to demonstrate the effectiveness of the new identification approaches.
Resumo:
Survival times for the Acacia mangium plantation in the Segaliud Lokan Project, Sabah, East Malaysia were analysed based on 20 permanent sample plots (PSPs) established in 1988 as a spacing experiment. The PSPs were established following a complete randomized block design with five levels of spacing randomly assigned to units within four blocks at different sites. The survival times of trees in years are of interest. Since the inventories were only conducted annually, the actual survival time for each tree was not observed. Hence, the data set comprises censored survival times. Initial analysis of the survival of the Acacia mangium plantation suggested there is block by spacing interaction; a Weibull model gives a reasonable fit to the replicate survival times within each PSP; but a standard Weibull regression model is inappropriate because the shape parameter differs between PSPs. In this paper we investigate the form of the non-constant Weibull shape parameter. Parsimonious models for the Weibull survival times have been derived using maximum likelihood methods. The factor selection for the parameters is based on a backward elimination procedure. The models are compared using likelihood ratio statistics. The results suggest that both Weibull parameters depend on spacing and block.
Resumo:
Objectives: To assess the potential source of variation that surgeon may add to patient outcome in a clinical trial of surgical procedures. Methods: Two large (n = 1380) parallel multicentre randomized surgical trials were undertaken to compare laparoscopically assisted hysterectomy with conventional methods of abdominal and vaginal hysterectomy; involving 43 surgeons. The primary end point of the trial was the occurrence of at least one major complication. Patients were nested within surgeons giving the data set a hierarchical structure. A total of 10% of patients had at least one major complication, that is, a sparse binary outcome variable. A linear mixed logistic regression model (with logit link function) was used to model the probability of a major complication, with surgeon fitted as a random effect. Models were fitted using the method of maximum likelihood in SAS((R)). Results: There were many convergence problems. These were resolved using a variety of approaches including; treating all effects as fixed for the initial model building; modelling the variance of a parameter on a logarithmic scale and centring of continuous covariates. The initial model building process indicated no significant 'type of operation' across surgeon interaction effect in either trial, the 'type of operation' term was highly significant in the abdominal trial, and the 'surgeon' term was not significant in either trial. Conclusions: The analysis did not find a surgeon effect but it is difficult to conclude that there was not a difference between surgeons. The statistical test may have lacked sufficient power, the variance estimates were small with large standard errors, indicating that the precision of the variance estimates may be questionable.
Resumo:
This paper derives some exact power properties of tests for spatial autocorrelation in the context of a linear regression model. In particular, we characterize the circumstances in which the power vanishes as the autocorrelation increases, thus extending the work of Krämer (2005). More generally, the analysis in the paper sheds new light on how the power of tests for spatial autocorrelation is affected by the matrix of regressors and by the spatial structure. We mainly focus on the problem of residual spatial autocorrelation, in which case it is appropriate to restrict attention to the class of invariant tests, but we also consider the case when the autocorrelation is due to the presence of a spatially lagged dependent variable among the regressors. A numerical study aimed at assessing the practical relevance of the theoretical results is included
Resumo:
The survival of Bifidobacterium longum NCIMB 8809 was studied during refrigerated storage for 6 weeks in model solutions, based on which a mathematical model was constructed describing cell survival as a function of pH, citric acid, protein and dietary fibre. A Central Composite Design (CCD) was developed studying the influence of four factors at three levels, i.e., pH (3.2–4), citric acid (2–15 g/l), protein (0–10 g/l), and dietary fibre (0–8 g/l). In total, 31 experimental runs were carried out. Analysis of variance (ANOVA) of the regression model demonstrated that the model fitted well the data. From the regression coefficients it was deduced that all four factors had a statistically significant (P < 0.05) negative effect on the log decrease [log10N0 week−log10N6 week], with the pH and citric acid being the most influential ones. Cell survival during storage was also investigated in various types of juices, including orange, grapefruit, blackcurrant, pineapple, pomegranate and strawberry. The highest cell survival (less than 0.4 log decrease) after 6 weeks of storage was observed in orange and pineapple, both of which had a pH of about 3.8. Although the pH of grapefruit and blackcurrant was similar (pH ∼3.2), the log decrease of the former was ∼0.5 log, whereas of the latter was ∼0.7 log. One reason for this could be the fact that grapefruit contained a high amount of citric acid (15.3 g/l). The log decrease in pomegranate and strawberry juices was extremely high (∼8 logs). The mathematical model was able to predict adequately the cell survival in orange, grapefruit, blackcurrant, and pineapple juices. However, the model failed to predict the cell survival in pomegranate and strawberry, most likely due to the very high levels of phenolic compounds in these two juices.
Resumo:
This work proposes a unified neurofuzzy modelling scheme. To begin with, the initial fuzzy base construction method is based on fuzzy clustering utilising a Gaussian mixture model (GMM) combined with the analysis of covariance (ANOVA) decomposition in order to obtain more compact univariate and bivariate membership functions over the subspaces of the input features. The mean and covariance of the Gaussian membership functions are found by the expectation maximisation (EM) algorithm with the merit of revealing the underlying density distribution of system inputs. The resultant set of membership functions forms the basis of the generalised fuzzy model (GFM) inference engine. The model structure and parameters of this neurofuzzy model are identified via the supervised subspace orthogonal least square (OLS) learning. Finally, instead of providing deterministic class label as model output by convention, a logistic regression model is applied to present the classifier’s output, in which the sigmoid type of logistic transfer function scales the outputs of the neurofuzzy model to the class probability. Experimental validation results are presented to demonstrate the effectiveness of the proposed neurofuzzy modelling scheme.
Resumo:
An online national survey among the Spanish population (n = 602) was conducted to examine the factors underlying a person’s support for commitments to global climate change reductions. Multiple hierarchical regression analysis was conducted in four steps and a structural equations model was tested. A survey tool designed by the Yale Project on Climate Change Communication was applied in order to build scales for the variables introduced in the study. The results show that perceived consumer effectiveness and risk perception are determinant factors of commitment to mitigating global climate change. However, there are differences in the influence that other factors, such as socio-demographics, view of nature and cultural cognition, have on the last predicted variable.
Resumo:
Classical regression methods take vectors as covariates and estimate the corresponding vectors of regression parameters. When addressing regression problems on covariates of more complex form such as multi-dimensional arrays (i.e. tensors), traditional computational models can be severely compromised by ultrahigh dimensionality as well as complex structure. By exploiting the special structure of tensor covariates, the tensor regression model provides a promising solution to reduce the model’s dimensionality to a manageable level, thus leading to efficient estimation. Most of the existing tensor-based methods independently estimate each individual regression problem based on tensor decomposition which allows the simultaneous projections of an input tensor to more than one direction along each mode. As a matter of fact, multi-dimensional data are collected under the same or very similar conditions, so that data share some common latent components but can also have their own independent parameters for each regression task. Therefore, it is beneficial to analyse regression parameters among all the regressions in a linked way. In this paper, we propose a tensor regression model based on Tucker Decomposition, which identifies not only the common components of parameters across all the regression tasks, but also independent factors contributing to each particular regression task simultaneously. Under this paradigm, the number of independent parameters along each mode is constrained by a sparsity-preserving regulariser. Linked multiway parameter analysis and sparsity modeling further reduce the total number of parameters, with lower memory cost than their tensor-based counterparts. The effectiveness of the new method is demonstrated on real data sets.
Resumo:
This study investigates variability in the intensity of the wintertime Siberian high (SH) by defining a robust SH index (SHI) and correlating it with selected meteorological fields and teleconnection indices. A dramatic trend of -2.5 hPa decade(-1) has been found in the SHI between 1978 and 2001 with unprecedented (since 1871) low values of the SHI. The weakening of the SH has been confirmed by analyzing different historical gridded analyses and individual station observations of sea level pressure (SLP) and excluding possible effects from the conversion of surface pressure to SLP. SHI correlation maps with various meteorological fields show that SH impacts on circulation and temperature patterns extend far outside the SH source area extending from the Arctic to the tropical Pacific. Advection of warm air from eastern Europe has been identified as the main mechanism causing milder than normal conditions over the Kara and Laptev Seas in association with a strong SH. Despite the strong impacts of the variability in the SH on climatic variability across the Northern Hemisphere, correlations between the SHI and the main teleconnection indices of the Northern Hemisphere are weak. Regression analysis has shown that teleconnection indices are not able to reproduce the interannual variability and trends in the SH. The inclusion of regional surface temperature in the regression model provides closer agreement between the original and reconstructed SHI.
Resumo:
The degree to which perceived controllability alters the way a stressor is experienced varies greatly among individuals. We used functional magnetic resonance imaging to examine the neural activation associated with individual differences in the impact of perceived controllability on self-reported pain perception. Subjects with greater activation in response to uncontrollable (UC) rather than controllable (C) pain in the pregenual anterior cingulate cortex (pACC), periaqueductal gray (PAG), and posterior insula/SII reported higher levels of pain during the UC versus C conditions. Conversely, subjects with greater activation in the ventral lateral prefrontal cortex (VLPFC) in anticipation of pain in the UC versus C conditions reported less pain in response to UC versus C pain. Activation in the VLPFC was significantly correlated with the acceptance and denial subscales of the COPE inventory [Carver, C. S., Scheier, M. F., & Weintraub, J. K. Assessing coping strategies: A theoretically based approach. Journal of Personality and Social Psychology, 56, 267–283, 1989], supporting the interpretation that this anticipatory activation was associated with an attempt to cope with the emotional impact of uncontrollable pain. A regression model containing the two prefrontal clusters (VLPFC and pACC) predicted 64% of the variance in pain rating difference, with activation in the two additional regions (PAG and insula/SII) predicting almost no additional variance. In addition to supporting the conclusion that the impact of perceived controllability on pain perception varies highly between individuals, these findings suggest that these effects are primarily top-down, driven by processes in regions of the prefrontal cortex previously associated with cognitive modulation of pain and emotion regulation.
Resumo:
Given the growing impact of human activities on the sea, managers are increasingly turning to marine protected areas (MPAs) to protect marine habitats and species. Many MPAs have been unsuccessful, however, and lack of income has been identified as a primary reason for failure. In this study, data from a global survey of 79 MPAs in 36 countries were analysed and attempts made to construct predictive models to determine the income requirements of any given MPA. Statistical tests were used to uncover possible patterns and relationships in the data, with two basic approaches. In the first of these, an attempt was made to build an explanatory "bottom-up" model of the cost structures that might be required to pursue various management activities. This proved difficult in practice owing to the very broad range of applicable data, spanning many orders of magnitude. In the second approach, a "top-down" regression model was constructed using logarithms of the base data, in order to address the breadth of the data ranges. This approach suggested that MPA size and visitor numbers together explained 46% of the minimum income requirements (P < 0.001), with area being the slightly more influential factor. The significance of area to income requirements was of little surprise, given its profile in the literature. However, the relationship between visitors and income requirements might go some way to explaining why northern hemisphere MPAs with apparently high incomes still claim to be under-funded. The relationship between running costs and visitor numbers has important implications not only in determining a realistic level of funding for MPAs, but also in assessing from where funding might be obtained. Since a substantial proportion of the income of many MPAs appears to be utilized for amenity purposes, a case may be made for funds to be provided from the typically better resourced government social and educational budgets as well as environmental budgets. Similarly visitor fees, already an important source of funding for some MPAs, might have a broader role to play in how MPAs are financed in the future. (C) 2007 Elsevier Ltd. All rights reserved.
Resumo:
Fixed transactions costs that prohibit exchange engender bias in supply analysis due to censoring of the sample observations. The associated bias in conventional regression procedures applied to censored data and the construction of robust methods for mitigating bias have been preoccupations of applied economists since Tobin [Econometrica 26 (1958) 24]. This literature assumes that the true point of censoring in the data is zero and, when this is not the case, imparts a bias to parameter estimates of the censored regression model. We conjecture that this bias can be significant; affirm this from experiments; and suggest techniques for mitigating this bias using Bayesian procedures. The bias-mitigating procedures are based on modifications of the key step that facilitates Bayesian estimation of the censored regression model; are easy to implement; work well in both small and large samples; and lead to significantly improved inference in the censored regression model. These findings are important in light of the widespread use of the zero-censored Tobit regression and we investigate their consequences using data on milk-market participation in the Ethiopian highlands. (C) 2004 Elsevier B.V. All rights reserved.