802 resultados para multiple linear regression
Resumo:
In this paper, reanalysis fields from the ECMWF have been statistically downscaled to predict from large-scale atmospheric fields, surface moisture flux and daily precipitation at two observatories (Zaragoza and Tortosa, Ebro Valley, Spain) during the 1961-2001 period. Three types of downscaling models have been built: (i) analogues, (ii) analogues followed by random forests and (iii) analogues followed by multiple linear regression. The inputs consist of data (predictor fields) taken from the ERA-40 reanalysis. The predicted fields are precipitation and surface moisture flux as measured at the two observatories. With the aim to reduce the dimensionality of the problem, the ERA-40 fields have been decomposed using empirical orthogonal functions. Available daily data has been divided into two parts: a training period used to find a group of about 300 analogues to build the downscaling model (1961-1996) and a test period (19972001), where models' performance has been assessed using independent data. In the case of surface moisture flux, the models based on analogues followed by random forests do not clearly outperform those built on analogues plus multiple linear regression, while simple averages calculated from the nearest analogues found in the training period, yielded only slightly worse results. In the case of precipitation, the three types of model performed equally. These results suggest that most of the models' downscaling capabilities can be attributed to the analogues-calculation stage.
Resumo:
The paper describes the development and application of a multiple linear regression model to identify how the key elements of waste and recycling infrastructure, namely container capacity and frequency of collection affect the yield from municipal kerbside recycling programmes. The overall aim of the research was to gain an understanding of the factors affecting the yield from municipal kerbside recycling programmes in Scotland. The study isolates the principal kerbside collection service offered by 32 councils across Scotland, eliminating those recycling programmes associated with flatted properties or multi occupancies. The results of a regression analysis model has identified three principal factors which explain 80% of the variability in the average yield of the principal dry recyclate services: weekly residual waste capacity, number of materials collected and the weekly recycling capacity. The use of the model has been evaluated and recommendations made on ongoing methodological development and the use of the results in informing the design of kerbside recycling programmes. The authors hope that the research can provide insights for the ongoing development of methods to optimise the design and operation of kerbside recycling programmes.
Resumo:
The prediction of the time and the efficiency of the remediation of contaminated soils using soil vapor extraction remain a difficult challenge to the scientific community and consultants. This work reports the development of multiple linear regression and artificial neural network models to predict the remediation time and efficiency of soil vapor extractions performed in soils contaminated separately with benzene, toluene, ethylbenzene, xylene, trichloroethylene, and perchloroethylene. The results demonstrated that the artificial neural network approach presents better performances when compared with multiple linear regression models. The artificial neural network model allowed an accurate prediction of remediation time and efficiency based on only soil and pollutants characteristics, and consequently allowing a simple and quick previous evaluation of the process viability.
Resumo:
Nesse artigo, tem-se o interesse em avaliar diferentes estratégias de estimação de parâmetros para um modelo de regressão linear múltipla. Para a estimação dos parâmetros do modelo foram utilizados dados de um ensaio clínico em que o interesse foi verificar se o ensaio mecânico da propriedade de força máxima (EM-FM) está associada com a massa femoral, com o diâmetro femoral e com o grupo experimental de ratas ovariectomizadas da raça Rattus norvegicus albinus, variedade Wistar. Para a estimação dos parâmetros do modelo serão comparadas três metodologias: a metodologia clássica, baseada no método dos mínimos quadrados; a metodologia Bayesiana, baseada no teorema de Bayes; e o método Bootstrap, baseado em processos de reamostragem.
Resumo:
A combinatorial protocol (CP) is introduced here to interface it with the multiple linear regression (MLR) for variable selection. The efficiency of CP-MLR is primarily based on the restriction of entry of correlated variables to the model development stage. It has been used for the analysis of Selwood et al data set [16], and the obtained models are compared with those reported from GFA [8] and MUSEUM [9] approaches. For this data set CP-MLR could identify three highly independent models (27, 28 and 31) with Q2 value in the range of 0.632-0.518. Also, these models are divergent and unique. Even though, the present study does not share any models with GFA [8], and MUSEUM [9] results, there are several descriptors common to all these studies, including the present one. Also a simulation is carried out on the same data set to explain the model formation in CP-MLR. The results demonstrate that the proposed method should be able to offer solutions to data sets with 50 to 60 descriptors in reasonable time frame. By carefully selecting the inter-parameter correlation cutoff values in CP-MLR one can identify divergent models and handle data sets larger than the present one without involving excessive computer time.
Resumo:
Multiple regression analysis is a complex statistical method with many potential uses. It has also become one of the most abused of all statistical procedures since anyone with a data base and suitable software can carry it out. An investigator should always have a clear hypothesis in mind before carrying out such a procedure and knowledge of the limitations of each aspect of the analysis. In addition, multiple regression is probably best used in an exploratory context, identifying variables that might profitably be examined by more detailed studies. Where there are many variables potentially influencing Y, they are likely to be intercorrelated and to account for relatively small amounts of the variance. Any analysis in which R squared is less than 50% should be suspect as probably not indicating the presence of significant variables. A further problem relates to sample size. It is often stated that the number of subjects or patients must be at least 5-10 times the number of variables included in the study.5 This advice should be taken only as a rough guide but it does indicate that the variables included should be selected with great care as inclusion of an obviously unimportant variable may have a significant impact on the sample size required.
Resumo:
The accurate in silico identification of T-cell epitopes is a critical step in the development of peptide-based vaccines, reagents, and diagnostics. It has a direct impact on the success of subsequent experimental work. Epitopes arise as a consequence of complex proteolytic processing within the cell. Prior to being recognized by T cells, an epitope is presented on the cell surface as a complex with a major histocompatibility complex (MHC) protein. A prerequisite therefore for T-cell recognition is that an epitope is also a good MHC binder. Thus, T-cell epitope prediction overlaps strongly with the prediction of MHC binding. In the present study, we compare discriminant analysis and multiple linear regression as algorithmic engines for the definition of quantitative matrices for binding affinity prediction. We apply these methods to peptides which bind the well-studied human MHC allele HLA-A*0201. A matrix which results from combining results of the two methods proved powerfully predictive under cross-validation. The new matrix was also tested on an external set of 160 binders to HLA-A*0201; it was able to recognize 135 (84%) of them.
Resumo:
2002 Mathematics Subject Classification: 62J05, 62G35.
Resumo:
This study focuses on multiple linear regression models relating six climate indices (temperature humidity THI, environmental stress ESI, equivalent temperature index ETI, heat load HLI, modified HLI (HLI new), and respiratory rate predictor RRP) with three main components of cow’s milk (yield, fat, and protein) for cows in Iran. The least absolute shrinkage selection operator (LASSO) and the Akaike information criterion (AIC) techniques are applied to select the best model for milk predictands with the smallest number of climate predictors. Uncertainty estimation is employed by applying bootstrapping through resampling. Cross validation is used to avoid over-fitting. Climatic parameters are calculated from the NASA-MERRA global atmospheric reanalysis. Milk data for the months from April to September, 2002 to 2010 are used. The best linear regression models are found in spring between milk yield as the predictand and THI, ESI, ETI, HLI, and RRP as predictors with p-value < 0.001 and R2 (0.50, 0.49) respectively. In summer, milk yield with independent variables of THI, ETI, and ESI show the highest relation (p-value < 0.001) with R2 (0.69). For fat and protein the results are only marginal. This method is suggested for the impact studies of climate variability/change on agriculture and food science fields when short-time series or data with large uncertainty are available.
Resumo:
The ecotoxicological response of the living organisms in an aquatic system depends on the physical, chemical and bacteriological variables, as well as the interactions between them. An important challenge to scientists is to understand the interaction and behaviour of factors involved in a multidimensional process such as the ecotoxicological response.With this aim, multiple linear regression (MLR) and principal component regression were applied to the ecotoxicity bioassay response of Chlorella vulgaris and Vibrio fischeri in water collected at seven sites of Leça river during five monitoring campaigns (February, May, June, August and September of 2006). The river water characterization included the analysis of 22 physicochemical and 3 microbiological parameters. The model that best fitted the data was MLR, which shows: (i) a negative correlation with dissolved organic carbon, zinc and manganese, and a positive one with turbidity and arsenic, regarding C. vulgaris toxic response; (ii) a negative correlation with conductivity and turbidity and a positive one with phosphorus, hardness, iron, mercury, arsenic and faecal coliforms, concerning V. fischeri toxic response. This integrated assessment may allow the evaluation of the effect of future pollution abatement measures over the water quality of Leça River.
Resumo:
Multiple linear regression model plays a key role in statistical inference and it has extensive applications in business, environmental, physical and social sciences. Multicollinearity has been a considerable problem in multiple regression analysis. When the regressor variables are multicollinear, it becomes difficult to make precise statistical inferences about the regression coefficients. There are some statistical methods that can be used, which are discussed in this thesis are ridge regression, Liu, two parameter biased and LASSO estimators. Firstly, an analytical comparison on the basis of risk was made among ridge, Liu and LASSO estimators under orthonormal regression model. I found that LASSO dominates least squares, ridge and Liu estimators over a significant portion of the parameter space for large dimension. Secondly, a simulation study was conducted to compare performance of ridge, Liu and two parameter biased estimator by their mean squared error criterion. I found that two parameter biased estimator performs better than its corresponding ridge regression estimator. Overall, Liu estimator performs better than both ridge and two parameter biased estimator.
Resumo:
Chemical composition of rainwater changes from sea to inland under the influence of several major factors - topographic location of area, its distance from sea, annual rainfall. A model is developed here to quantify the variation in precipitation chemistry under the influence of inland distance and rainfall amount. Various sites in India categorized as 'urban', 'suburban' and 'rural' have been considered for model development. pH, HCO3, NO3 and Mg do not change much from coast to inland while, SO4 and Ca change is subjected to local emissions. Cl and Na originate solely from sea salinity and are the chemistry parameters in the model. Non-linear multiple regressions performed for the various categories revealed that both rainfall amount and precipitation chemistry obeyed a power law reduction with distance from sea. Cl and Na decrease rapidly for the first 100 km distance from sea, then decrease marginally for the next 100 km, and later stabilize. Regression parameters estimated for different cases were found to be consistent (R-2 similar to 0.8). Variation in one of the parameters accounted for urbanization. Model was validated using data points from the southern peninsular region of the country. Estimates are found to be within 99.9% confidence interval. Finally, this relationship between the three parameters - rainfall amount, coastline distance, and concentration (in terms of Cl and Na) was validated with experiments conducted in a small experimental watershed in the south-west India. Chemistry estimated using the model was in good correlation with observed values with a relative error of similar to 5%. Monthly variation in the chemistry is predicted from a downscaling model and then compared with the observed data. Hence, the model developed for rain chemistry is useful in estimating the concentrations at different spatio-temporal scales and is especially applicable for south-west region of India. (C) 2008 Elsevier Ltd. All rights reserved.
Resumo:
Multiple input multiple output (MIMO) systems with large number of antennas have been gaining wide attention as they enable very high throughputs. A major impediment is the complexity at the receiver needed to detect the transmitted data. To this end we propose a new receiver, called LRR (Linear Regression of MMSE Residual), which improves the MMSE receiver by learning a linear regression model for the error of the MMSE receiver. The LRR receiver uses pilot data to estimate the channel, and then uses locally generated training data (not transmitted over the channel), to find the linear regression parameters. The proposed receiver is suitable for applications where the channel remains constant for a long period (slow-fading channels) and performs quite well: at a bit error rate (BER) of 10(-3), the SNR gain over MMSE receiver is about 7 dB for a 16 x 16 system; for a 64 x 64 system the gain is about 8.5 dB. For large coherence time, the complexity order of the LRR receiver is the same as that of the MMSE receiver, and in simulations we find that it needs about 4 times as many floating point operations. We also show that further gain of about 4 dB is obtained by local search around the estimate given by the LRR receiver.
Resumo:
In this thesis, we consider Bayesian inference on the detection of variance change-point models with scale mixtures of normal (for short SMN) distributions. This class of distributions is symmetric and thick-tailed and includes as special cases: Gaussian, Student-t, contaminated normal, and slash distributions. The proposed models provide greater flexibility to analyze a lot of practical data, which often show heavy-tail and may not satisfy the normal assumption. As to the Bayesian analysis, we specify some prior distributions for the unknown parameters in the variance change-point models with the SMN distributions. Due to the complexity of the joint posterior distribution, we propose an efficient Gibbs-type with Metropolis- Hastings sampling algorithm for posterior Bayesian inference. Thereafter, following the idea of [1], we consider the problems of the single and multiple change-point detections. The performance of the proposed procedures is illustrated and analyzed by simulation studies. A real application to the closing price data of U.S. stock market has been analyzed for illustrative purposes.