937 resultados para multiple linear regression models
Resumo:
The accurate in silico identification of T-cell epitopes is a critical step in the development of peptide-based vaccines, reagents, and diagnostics. It has a direct impact on the success of subsequent experimental work. Epitopes arise as a consequence of complex proteolytic processing within the cell. Prior to being recognized by T cells, an epitope is presented on the cell surface as a complex with a major histocompatibility complex (MHC) protein. A prerequisite therefore for T-cell recognition is that an epitope is also a good MHC binder. Thus, T-cell epitope prediction overlaps strongly with the prediction of MHC binding. In the present study, we compare discriminant analysis and multiple linear regression as algorithmic engines for the definition of quantitative matrices for binding affinity prediction. We apply these methods to peptides which bind the well-studied human MHC allele HLA-A*0201. A matrix which results from combining results of the two methods proved powerfully predictive under cross-validation. The new matrix was also tested on an external set of 160 binders to HLA-A*0201; it was able to recognize 135 (84%) of them.
Resumo:
2002 Mathematics Subject Classification: 62J05, 62G35.
Resumo:
The main topic of this thesis is confounding in linear regression models. It arises when a relationship between an observed process, the covariate, and an outcome process, the response, is influenced by an unmeasured process, the confounder, associated with both. Consequently, the estimators for the regression coefficients of the measured covariates might be severely biased, less efficient and characterized by misleading interpretations. Confounding is an issue when the primary target of the work is the estimation of the regression parameters. The central point of the dissertation is the evaluation of the sampling properties of parameter estimators. This work aims to extend the spatial confounding framework to general structured settings and to understand the behaviour of confounding as a function of the data generating process structure parameters in several scenarios focusing on the joint covariate-confounder structure. In line with the spatial statistics literature, our purpose is to quantify the sampling properties of the regression coefficient estimators and, in turn, to identify the most prominent quantities depending on the generative mechanism impacting confounding. Once the sampling properties of the estimator conditionally on the covariate process are derived as ratios of dependent quadratic forms in Gaussian random variables, we provide an analytic expression of the marginal sampling properties of the estimator using Carlson’s R function. Additionally, we propose a representative quantity for the magnitude of confounding as a proxy of the bias, its first-order Laplace approximation. To conclude, we work under several frameworks considering spatial and temporal data with specific assumptions regarding the covariance and cross-covariance functions used to generate the processes involved. This study allows us to claim that the variability of the confounder-covariate interaction and of the covariate plays the most relevant role in determining the principal marker of the magnitude of confounding.
Resumo:
In health related research it is common to have multiple outcomes of interest in a single study. These outcomes are often analysed separately, ignoring the correlation between them. One would expect that a multivariate approach would be a more efficient alternative to individual analyses of each outcome. Surprisingly, this is not always the case. In this article we discuss different settings of linear models and compare the multivariate and univariate approaches. We show that for linear regression models, the estimates of the regression parameters associated with covariates that are shared across the outcomes are the same for the multivariate and univariate models while for outcome-specific covariates the multivariate model performs better in terms of efficiency.
Resumo:
Considering the importance of spatial issues in transport planning, the main objective of this study was to analyze the results obtained from different approaches of spatial regression models. In the case of spatial autocorrelation, spatial dependence patterns should be incorporated in the models, since that dependence may affect the predictive power of these models. The results obtained with the spatial regression models were also compared with the results of a multiple linear regression model that is typically used in trips generation estimations. The findings support the hypothesis that the inclusion of spatial effects in regression models is important, since the best results were obtained with alternative models (spatial regression models or the ones with spatial variables included). This was observed in a case study carried out in the city of Porto Alegre, in the state of Rio Grande do Sul, Brazil, in the stages of specification and calibration of the models, with two distinct datasets.
Resumo:
In this paper, we compare three residuals to assess departures from the error assumptions as well as to detect outlying observations in log-Burr XII regression models with censored observations. These residuals can also be used for the log-logistic regression model, which is a special case of the log-Burr XII regression model. For different parameter settings, sample sizes and censoring percentages, various simulation studies are performed and the empirical distribution of each residual is displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended to the modified martingale-type residual in log-Burr XII regression models with censored data.
Resumo:
This paper proposes a regression model considering the modified Weibull distribution. This distribution can be used to model bathtub-shaped failure rate functions. Assuming censored data, we consider maximum likelihood and Jackknife estimators for the parameters of the model. We derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes and we also present some ways to perform global influence. Besides, for different parameter settings, sample sizes and censoring percentages, various simulations are performed and the empirical distribution of the modified deviance residual is displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended for a martingale-type residual in log-modified Weibull regression models with censored data. Finally, we analyze a real data set under log-modified Weibull regression models. A diagnostic analysis and a model checking based on the modified deviance residual are performed to select appropriate models. (c) 2008 Elsevier B.V. All rights reserved.
Resumo:
We introduce the log-beta Weibull regression model based on the beta Weibull distribution (Famoye et al., 2005; Lee et al., 2007). We derive expansions for the moment generating function which do not depend on complicated functions. The new regression model represents a parametric family of models that includes as sub-models several widely known regression models that can be applied to censored survival data. We employ a frequentist analysis, a jackknife estimator, and a parametric bootstrap for the parameters of the proposed model. We derive the appropriate matrices for assessing local influences on the parameter estimates under different perturbation schemes and present some ways to assess global influences. Further, for different parameter settings, sample sizes, and censoring percentages, several simulations are performed. In addition, the empirical distribution of some modified residuals are displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be extended to a modified deviance residual in the proposed regression model applied to censored data. We define martingale and deviance residuals to evaluate the model assumptions. The extended regression model is very useful for the analysis of real data and could give more realistic fits than other special regression models.
Resumo:
The ecotoxicological response of the living organisms in an aquatic system depends on the physical, chemical and bacteriological variables, as well as the interactions between them. An important challenge to scientists is to understand the interaction and behaviour of factors involved in a multidimensional process such as the ecotoxicological response.With this aim, multiple linear regression (MLR) and principal component regression were applied to the ecotoxicity bioassay response of Chlorella vulgaris and Vibrio fischeri in water collected at seven sites of Leça river during five monitoring campaigns (February, May, June, August and September of 2006). The river water characterization included the analysis of 22 physicochemical and 3 microbiological parameters. The model that best fitted the data was MLR, which shows: (i) a negative correlation with dissolved organic carbon, zinc and manganese, and a positive one with turbidity and arsenic, regarding C. vulgaris toxic response; (ii) a negative correlation with conductivity and turbidity and a positive one with phosphorus, hardness, iron, mercury, arsenic and faecal coliforms, concerning V. fischeri toxic response. This integrated assessment may allow the evaluation of the effect of future pollution abatement measures over the water quality of Leça River.
Resumo:
Statistical models allow the representation of data sets and the estimation and/or prediction of the behavior of a given variable through its interaction with the other variables involved in a phenomenon. Among other different statistical models, are the autoregressive state-space models (ARSS) and the linear regression models (LR), which allow the quantification of the relationships among soil-plant-atmosphere system variables. To compare the quality of the ARSS and LR models for the modeling of the relationships between soybean yield and soil physical properties, Akaike's Information Criterion, which provides a coefficient for the selection of the best model, was used in this study. The data sets were sampled in a Rhodic Acrudox soil, along a spatial transect with 84 points spaced 3 m apart. At each sampling point, soybean samples were collected for yield quantification. At the same site, soil penetration resistance was also measured and soil samples were collected to measure soil bulk density in the 0-0.10 m and 0.10-0.20 m layers. Results showed autocorrelation and a cross correlation structure of soybean yield and soil penetration resistance data. Soil bulk density data, however, were only autocorrelated in the 0-0.10 m layer and not cross correlated with soybean yield. The results showed the higher efficiency of the autoregressive space-state models in relation to the equivalent simple and multiple linear regression models using Akaike's Information Criterion. The resulting values were comparatively lower than the values obtained by the regression models, for all combinations of explanatory variables.
Resumo:
Is it possible to build predictive models (PMs) of soil particle-size distribution (psd) in a region with complex geology and a young and unstable land-surface? The main objective of this study was to answer this question. A set of 339 soil samples from a small slope catchment in Southern Brazil was used to build PMs of psd in the surface soil layer. Multiple linear regression models were constructed using terrain attributes (elevation, slope, catchment area, convergence index, and topographic wetness index). The PMs explained more than half of the data variance. This performance is similar to (or even better than) that of the conventional soil mapping approach. For some size fractions, the PM performance can reach 70 %. Largest uncertainties were observed in geologically more complex areas. Therefore, significant improvements in the predictions can only be achieved if accurate geological data is made available. Meanwhile, PMs built on terrain attributes are efficient in predicting the particle-size distribution (psd) of soils in regions of complex geology.
Resumo:
In this article, we compare three residuals based on the deviance component in generalised log-gamma regression models with censored observations. For different parameter settings, sample sizes and censoring percentages, various simulation studies are performed and the empirical distribution of each residual is displayed and compared with the standard normal distribution. For all cases studied, the empirical distributions of the proposed residuals are in general symmetric around zero, but only a martingale-type residual presented negligible kurtosis for the majority of the cases studied. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended for the martingale-type residual in generalised log-gamma regression models with censored data. A lifetime data set is analysed under log-gamma regression models and a model checking based on the martingale-type residual is performed.
Resumo:
We consider the issue of assessing influence of observations in the class of Birnbaum-Saunders nonlinear regression models, which is useful in lifetime data analysis. Our results generalize those in Galea et al. [8] which are confined to Birnbaum-Saunders linear regression models. Some influence methods, such as the local influence, total local influence of an individual and generalized leverage are discussed. Additionally, the normal curvatures for studying local influence are derived under some perturbation schemes. We also give an application to a real fatigue data set.
Resumo:
Lemonte and Cordeiro [Birnbaum-Saunders nonlinear regression models, Comput. Stat. Data Anal. 53 (2009), pp. 4441-4452] introduced a class of Birnbaum-Saunders (BS) nonlinear regression models potentially useful in lifetime data analysis. We give a general matrix Bartlett correction formula to improve the likelihood ratio (LR) tests in these models. The formula is simple enough to be used analytically to obtain several closed-form expressions in special cases. Our results generalize those in Lemonte et al. [Improved likelihood inference in Birnbaum-Saunders regressions, Comput. Stat. DataAnal. 54 (2010), pp. 1307-1316], which hold only for the BS linear regression models. We consider Monte Carlo simulations to show that the corrected tests work better than the usual LR tests.
Resumo:
In the Bayesian framework, predictions for a regression problem are expressed in terms of a distribution of output values. The mode of this distribution corresponds to the most probable output, while the uncertainty associated with the predictions can conveniently be expressed in terms of error bars. In this paper we consider the evaluation of error bars in the context of the class of generalized linear regression models. We provide insights into the dependence of the error bars on the location of the data points and we derive an upper bound on the true error bars in terms of the contributions from individual data points which are themselves easily evaluated.