17 results for Regression methods
in CentAUR: Central Archive University of Reading - UK
Abstract:
Classical regression methods take vectors as covariates and estimate the corresponding vectors of regression parameters. When addressing regression problems with covariates of more complex form, such as multi-dimensional arrays (i.e. tensors), traditional computational models can be severely compromised by ultrahigh dimensionality as well as complex structure. By exploiting the special structure of tensor covariates, the tensor regression model provides a promising solution for reducing the model's dimensionality to a manageable level, thus leading to efficient estimation. Most existing tensor-based methods estimate each individual regression problem independently, based on a tensor decomposition that allows the simultaneous projection of an input tensor onto more than one direction along each mode. In practice, multi-dimensional data are collected under the same or very similar conditions, so the data share some common latent components but can also have their own independent parameters for each regression task. It is therefore beneficial to analyse the regression parameters of all the regressions in a linked way. In this paper, we propose a tensor regression model based on Tucker decomposition, which simultaneously identifies both the common components of the parameters across all the regression tasks and the independent factors contributing to each particular task. Under this paradigm, the number of independent parameters along each mode is constrained by a sparsity-preserving regulariser. Linked multiway parameter analysis and sparsity modelling further reduce the total number of parameters, giving lower memory cost than tensor-based counterparts. The effectiveness of the new method is demonstrated on real data sets.
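As a rough illustration of why a Tucker structure helps, the sketch below (not the paper's algorithm; all dimensions and ranks are made-up example values) builds a coefficient tensor from a small core and per-mode factor matrices and compares its parameter count with that of an unstructured coefficient tensor.
```python
# Illustrative sketch (not the paper's algorithm): a Tucker-structured
# coefficient tensor for a scalar-on-tensor regression y = <X, B>.
# All dimensions and ranks below are made-up example values.
import numpy as np

rng = np.random.default_rng(0)

I, J, K = 20, 20, 20          # covariate tensor dimensions
R1, R2, R3 = 3, 3, 3          # Tucker ranks along each mode

# Tucker factors: a small core G and one factor matrix per mode.
G = rng.normal(size=(R1, R2, R3))
U1 = rng.normal(size=(I, R1))
U2 = rng.normal(size=(J, R2))
U3 = rng.normal(size=(K, R3))

# Reconstruct the full coefficient tensor B = G x1 U1 x2 U2 x3 U3.
B = np.einsum('abc,ia,jb,kc->ijk', G, U1, U2, U3)

# A single tensor covariate and its scalar response <X, B>.
X = rng.normal(size=(I, J, K))
y = np.tensordot(X, B, axes=3)

full_params = I * J * K
tucker_params = R1 * R2 * R3 + I * R1 + J * R2 + K * R3
print(f'unstructured parameters: {full_params}')   # 8000
print(f'Tucker parameters:       {tucker_params}') # 207
print(f'prediction <X, B> = {y:.3f}')
```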
Abstract:
The current energy requirements system used in the United Kingdom for lactating dairy cows utilizes key parameters such as metabolizable energy intake (MEI) at maintenance (MEm), the efficiency of utilization of MEI for 1) maintenance, 2) milk production (k(l)), 3) growth (k(g)), and the efficiency of utilization of body stores for milk production (k(t)). Traditionally, these have been determined using linear regression methods to analyze energy balance data from calorimetry experiments. Many studies have highlighted a number of concerns over current energy feeding systems, particularly in relation to these key parameters and the linear models used to analyze the data. Therefore, a database containing 652 dairy cow observations was assembled from calorimetry studies in the United Kingdom. Five functions for analyzing energy balance data were considered: a straight line, two diminishing-returns functions (the Mitscherlich and the rectangular hyperbola), and two sigmoidal functions (the logistic and the Gompertz). Meta-analysis of the data was conducted to estimate k(g) and k(t). Values of 0.83 to 0.86 and 0.66 to 0.69 were obtained for k(g) and k(t), respectively, using all the functions (with standard errors of 0.028 and 0.027); these values differ considerably from previous reports of 0.60 to 0.75 for k(g) and 0.82 to 0.84 for k(t). Using the estimated values of k(g) and k(t), the data were corrected to allow for body tissue changes. Based on the definition of k(l) as the derivative of the ratio of milk energy derived from MEI to MEI directed towards milk production, MEm and k(l) were determined. Meta-analysis of the pooled data showed that the average k(l) ranged from 0.50 to 0.58 and MEm ranged between 0.34 and 0.64 MJ/kg of BW^0.75 per day. Although the constrained Mitscherlich fitted the data as well as the straight line, more observations at high energy intakes (above 2.4 MJ/kg of BW^0.75 per day) are required to determine conclusively whether milk energy is related to MEI linearly or not.
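For readers unfamiliar with fitting diminishing-returns functions, the sketch below fits a Mitscherlich curve to synthetic energy-balance-style data with SciPy's curve_fit; the parameterization and the numbers are illustrative, not values from the study.
```python
# Hedged sketch: fitting a Mitscherlich (diminishing-returns) curve to
# synthetic energy-balance-style data; the parameterization and data are
# illustrative only.
import numpy as np
from scipy.optimize import curve_fit

def mitscherlich(x, a, b, c):
    """Diminishing-returns response approaching asymptote a as x grows."""
    return a * (1.0 - np.exp(-b * (x - c)))

rng = np.random.default_rng(1)
mei = np.linspace(0.5, 2.4, 60)            # example MEI range, MJ/kg BW^0.75 per day
milk_energy = mitscherlich(mei, 1.2, 1.5, 0.3) + rng.normal(0, 0.03, mei.size)

popt, pcov = curve_fit(mitscherlich, mei, milk_energy, p0=[1.0, 1.0, 0.0])
perr = np.sqrt(np.diag(pcov))              # approximate standard errors
print('estimates:', popt)
print('std errors:', perr)
```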
Abstract:
The Normal Quantile Transform (NQT) has been used in many hydrological and meteorological applications in order to make the Cumulative Distribution Function (CDF) of observed, simulated and forecast river discharge, water level or precipitation data Gaussian. It is also at the heart of the meta-Gaussian model for assessing the total predictive uncertainty of the Hydrological Uncertainty Processor (HUP) developed by Krzysztofowicz. In the field of geostatistics this transformation is better known as the Normal-Score Transform. In this paper some possible problems caused by small sample sizes when applying the NQT in flood forecasting systems will be discussed, and a novel way to solve them will be outlined by combining extreme value analysis and non-parametric regression methods. The method will be illustrated by examples of hydrological stream-flow forecasts.
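A minimal sketch of the NQT itself is given below, mapping a sample to standard-normal scores through its empirical ranks; the Weibull plotting position i/(n+1) used here is one common choice among several.
```python
# Minimal sketch of the Normal Quantile Transform (normal-score transform):
# replace each observation by the standard-normal quantile of its empirical
# non-exceedance probability.
import numpy as np
from scipy.stats import norm, rankdata

def nqt(x):
    """Map a sample to standard-normal scores via its empirical CDF."""
    x = np.asarray(x, dtype=float)
    ranks = rankdata(x)                  # 1..n, ties averaged
    p = ranks / (len(x) + 1.0)           # Weibull plotting position
    return norm.ppf(p)

discharge = np.array([12.0, 35.0, 8.5, 120.0, 60.0, 22.0, 15.0, 300.0])
print(nqt(discharge))
```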
Abstract:
Sixteen years (1994–2009) of ozone profiling by ozonesondes at Valentia Meteorological and Geophysical Observatory, Ireland (51.94° N, 10.23° W), along with a co-located MkIV Brewer spectrophotometer for the period 1993–2009, are analyzed. Simple and multiple linear regression methods are used to infer the recent trend, if any, in stratospheric column ozone over the station. The decadal trend from 1994 to 2010 is also calculated from the monthly mean data of the Brewer and column ozone data derived from satellite observations. Both of these show a 1.5 % increase per decade during this period, with an uncertainty of about ±0.25 %. Monthly mean data for March show a much stronger trend of ~4.8 % increase per decade for both ozonesonde and Brewer data. The ozone profile is divided into three vertical slots of 0–15 km, 15–26 km, and 26 km to the top of the atmosphere, and an 11-year running average is calculated. Ozone values for the month of March only are observed to increase at each level, with a maximum change of +9.2 ± 3.2 % per decade (between 1994 and 2009) observed in the vertical region from 15 to 26 km. In the tropospheric region from 0 to 15 km, the trend is positive but of poor statistical significance. However, for the top level above 26 km the trend is significantly positive, at about 4 % per decade. The March integrated ozonesonde column ozone during this period is found to increase at a rate of ~6.6 % per decade, compared with the Brewer and satellite positive trends of ~5 % per decade.
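The decadal-trend calculation amounts to a least-squares line through the mean ozone series; the sketch below shows the arithmetic on a synthetic series, not the Valentia data.
```python
# Hedged sketch: estimating a decadal trend from a monthly/annual mean ozone
# series with a simple least-squares line; the series is synthetic.
import numpy as np

rng = np.random.default_rng(2)
years = np.arange(1994, 2010) + 0.5                   # nominal mid-year times
ozone = (320.0 * (1.0 + 0.015 * (years - years[0]) / 10.0)
         + rng.normal(0, 5.0, years.size))            # DU, ~1.5 %/decade + noise

slope, intercept = np.polyfit(years, ozone, 1)        # DU per year
trend_per_decade = 100.0 * 10.0 * slope / np.mean(ozone)
print(f'trend: {trend_per_decade:.2f} % per decade')
```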
Abstract:
Although medieval rentals have been extensively studied, few scholars have used them to analyse variations in the rents paid on individual properties within a town. It has been claimed that medieval rents did not reflect economic values or market forces, but were set according to social and political rather than economic criteria, and remained ossified at customary levels. This paper uses hedonic regression methods to test whether property rents in medieval Gloucester were influenced by classic economic factors such as the location and use of a property. It investigates both rents and local rates (landgavel), and explores the relationship between the two. It also examines spatial autocorrelation. It finds significant relationships between urban rents and property characteristics that are similar to those found in modern studies. The findings are consistent with the view that, in Gloucester at least, medieval rents were strongly influenced by classical economic factors working through a competitive urban property market.
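A hedonic regression of this kind can be sketched as an ordinary least squares fit of log rent on property characteristics; the variables and data below are invented for illustration and are not the Gloucester rental data.
```python
# Hedged sketch of a hedonic rent regression: log(rent) on hypothetical
# property characteristics (frontage, distance to market, commercial use).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
frontage = rng.uniform(3, 15, n)            # street frontage (arbitrary units)
dist_market = rng.uniform(0.05, 1.0, n)     # distance to the market place
commercial = rng.integers(0, 2, n)          # 1 if commercial use

log_rent = (1.0 + 0.08 * frontage - 0.6 * dist_market
            + 0.3 * commercial + rng.normal(0, 0.2, n))

X = sm.add_constant(np.column_stack([frontage, dist_market, commercial]))
model = sm.OLS(log_rent, X).fit()
print(model.params)       # hedonic coefficients
print(model.pvalues)
```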
Abstract:
Although the sunspot-number series have existed since the mid-19th century, they are still the subject of intense debate, with the largest uncertainty being related to the "calibration" of the visual acuity of individual observers in the past. Daisy-chain regression methods are applied to inter-calibrate the observers, which may lead to significant bias and error accumulation. Here we present a novel method to calibrate the visual acuity of the key observers to the reference data set of Royal Greenwich Observatory sunspot groups for the period 1900-1976, using the statistics of the active-day fraction. For each observer we independently evaluate their observational threshold [S_S], defined such that the observer is assumed to miss all of the groups with an area smaller than S_S and report all the groups larger than S_S. Next, using a Monte-Carlo method, we construct, from the reference data set, a correction matrix for each observer. The correction matrices are significantly non-linear and cannot be approximated by a linear regression or proportionality. We emphasize that corrections based on a linear proportionality between annually averaged data lead to serious biases and distortions of the data. The correction matrices are applied to the original sunspot group records for each day, and finally the composite corrected series is produced for the period since 1748. The corrected series displays secular minima around 1800 (Dalton minimum) and 1900 (Gleissberg minimum), as well as the Modern grand maximum of activity in the second half of the 20th century. The uniqueness of the grand maximum is confirmed for the last 250 years. It is shown that the adoption of a linear relationship between the data of Wolf and Wolfer results in grossly inflated group numbers in the 18th and 19th centuries in some reconstructions.
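Under a simplified assumed form of such a correction matrix, with row g_obs holding the probability distribution of the true daily group count given the observed count, applying the correction to a daily record could look like the sketch below; the matrix here is invented for illustration and is not derived from the RGO reference data.
```python
# Hedged sketch under an assumed simplified form of the correction matrix:
# row g_obs holds P(true group count | observed count g_obs) for one observer,
# and each daily observed count is replaced by the expected true count.
# The matrix below is invented, not derived from RGO data.
import numpy as np

max_groups = 5
C = np.zeros((max_groups + 1, max_groups + 1))
for g_obs in range(max_groups + 1):
    # toy distribution: the true count tends to exceed the observed count
    w = np.exp(-0.8 * np.abs(np.arange(max_groups + 1) - (g_obs + 0.7)))
    C[g_obs] = w / w.sum()

observed_daily = np.array([0, 1, 1, 2, 3, 0, 4])         # example daily record
true_counts = np.arange(max_groups + 1)
corrected_daily = C[observed_daily] @ true_counts        # expected true counts
print(corrected_daily)                                   # note: not proportional to input
```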
Abstract:
In this correspondence, new robust nonlinear model construction algorithms for a large class of linear-in-the-parameters models are introduced to enhance model robustness via combined parameter regularization and new robust structural selection criteria. In parallel to parameter regularization, we use two classes of robust model selection criteria, based either on experimental design criteria that optimize model adequacy or on the predicted residual sums of squares (PRESS) statistic that optimizes model generalization capability. Three robust identification algorithms are introduced, i.e., the combined A-optimality and combined D-optimality criteria with the regularized orthogonal least squares algorithm, respectively, and the combined PRESS statistic with the regularized orthogonal least squares algorithm. A common characteristic of these algorithms is that the inherent computational efficiency associated with the orthogonalization scheme in orthogonal least squares or regularized orthogonal least squares has been extended such that the new algorithms are computationally efficient. Numerical examples are included to demonstrate the effectiveness of the algorithms.
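The PRESS statistic at the core of the third algorithm can be computed for any linear-in-the-parameters model from the hat matrix, as in the sketch below; the paper's regularized orthogonal least squares machinery is omitted.
```python
# Minimal sketch of the PRESS (leave-one-out) statistic for a
# linear-in-the-parameters model, using the hat-matrix identity
# e_loo_i = e_i / (1 - h_ii).
import numpy as np

rng = np.random.default_rng(4)
n, p = 100, 5
Phi = rng.normal(size=(n, p))                    # regressor (design) matrix
theta_true = rng.normal(size=p)
y = Phi @ theta_true + rng.normal(0, 0.1, n)

theta = np.linalg.lstsq(Phi, y, rcond=None)[0]
residuals = y - Phi @ theta
H = Phi @ np.linalg.inv(Phi.T @ Phi) @ Phi.T     # hat matrix
h = np.diag(H)

press = np.sum((residuals / (1.0 - h)) ** 2)     # sum of LOO squared errors
print(f'PRESS = {press:.4f}')
```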
Abstract:
Procedures for routine analysis of soil phosphorus (P) have been used for assessment of P status, distribution and P losses from cultivated mineral soils. No similar studies have been carried out on wetland peat soils. The objective was to compare the extraction efficiency of ammonium lactate (P-AL), sodium bicarbonate (P-Olsen), and double calcium lactate (P-DCaL) and the P distribution in the soil profile of wetland peat soils. For this purpose, 34 samples of the 0-30, 30-60 and 60-90 cm layers were collected from peat soils in Germany, Israel, Poland, Slovenia, Sweden and the United Kingdom and analysed for P. Mean soil pH (CaCl2, 0.01 M) was 5.84, 5.51 and 5.47 in the 0-30, 30-60 and 60-90 cm layers, respectively. The P-DCaL was consistently about half the magnitude of either P-AL or P-Olsen. The efficiency of P extraction increased in the order P-DCaL < P-AL ≤ P-Olsen, with corresponding means (mg kg⁻¹) for all soils (34 samples) of 15.32, 33.49 and 34.27 in 0-30 cm; 8.87, 17.30 and 21.46 in 30-60 cm; and 5.69, 14.00 and 21.40 in 60-90 cm. The means decreased with depth. When examining soils for each country separately, P-Olsen was relatively evenly distributed in the German, UK and Slovenian soils. P-Olsen was linearly correlated (r = 0.594, P = 0.0002) with pH, whereas the three P tests (except P-Olsen vs P-DCaL) correlated significantly with each other (P = 0.0178–0.0001). The strongest correlation (r = 0.617, P = 0.0001) was recorded for P-AL vs P-DCaL, and the two methods were inter-convertible using a regression equation: P-AL = -22.593 + 5.353 pH + 1.423 P-DCaL, R² = 0.550.
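The reported conversion can be applied directly, as in the short sketch below; the sample values are hypothetical.
```python
# Applying the reported conversion between the two soil P tests,
# P-AL = -22.593 + 5.353*pH + 1.423*P-DCaL (R^2 = 0.55);
# the sample values below are hypothetical.
def p_al_from_dcal(ph, p_dcal):
    """Estimate P-AL (mg/kg) from soil pH and P-DCaL (mg/kg)."""
    return -22.593 + 5.353 * ph + 1.423 * p_dcal

print(p_al_from_dcal(ph=5.8, p_dcal=15.0))   # e.g. a 0-30 cm sample
```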
Abstract:
Multiple regression analysis is a statistical technique that allows a dependent variable to be predicted from more than one independent variable and also identifies the influential independent variables. In this study, multiple regression analysis is applied to experimental data to predict the room mean velocity and determine the parameters that most influence the velocity. More than 120 experiments for four different heat source locations were carried out in a test chamber with a high-level wall-mounted air supply terminal at air change rates of 3-6 ach. The influence of environmental parameters such as supply air momentum, room heat load, Archimedes number and local temperature ratio was examined by two methods: a simple regression analysis incorporated into scatter matrix plots, and multiple stepwise regression analysis. It is concluded that, when a heat source is located along the jet centre line, the supply momentum mainly influences the room mean velocity regardless of the plume strength. However, when the heat source is located outside the jet region, the local temperature ratio (the inverse of the local heat removal effectiveness) is a major influencing parameter.
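A forward-stepwise multiple regression of this kind can be sketched as below, using the parameters named above as candidate predictors; the data and the p-value entry threshold are invented for illustration.
```python
# Hedged sketch: forward-stepwise multiple regression of room mean velocity
# on the parameters named in the abstract. Data are synthetic.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 120
data = pd.DataFrame({
    'momentum':   rng.uniform(0.1, 1.0, n),
    'heat_load':  rng.uniform(0.5, 3.0, n),
    'archimedes': rng.uniform(0.01, 0.5, n),
    'temp_ratio': rng.uniform(0.2, 1.5, n),
})
data['velocity'] = (0.05 + 0.30 * data['momentum']
                    + 0.02 * data['temp_ratio']
                    + rng.normal(0, 0.02, n))

selected, remaining = [], list(data.columns[:-1])
while remaining:
    # add the candidate with the smallest p-value, if it is significant
    pvals = {}
    for cand in remaining:
        X = sm.add_constant(data[selected + [cand]])
        pvals[cand] = sm.OLS(data['velocity'], X).fit().pvalues[cand]
    best = min(pvals, key=pvals.get)
    if pvals[best] > 0.05:
        break
    selected.append(best)
    remaining.remove(best)

print('selected predictors:', selected)
```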
Abstract:
The comparison of cognitive and linguistic skills in individuals with developmental disorders is fraught with methodological and psychometric difficulties. In this paper, we illustrate some of these issues by comparing the receptive vocabulary knowledge and non-verbal reasoning abilities of 41 children with Williams syndrome, a genetic disorder in which language abilities are often claimed to be relatively strong. Data from this group were compared with data from typically developing children, children with Down syndrome, and children with non-specific learning difficulties using a number of approaches including comparison of age-equivalent scores, matching, analysis of covariance, and regression-based standardization. Across these analyses children with Williams syndrome consistently demonstrated relatively good receptive vocabulary knowledge, although this effect appeared strongest in the oldest children.
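Regression-based standardization, one of the approaches mentioned above, can be sketched as follows: fit score on age in the typically developing sample and express each clinical-group score as a z-score of its residual from that fit. All data below are invented.
```python
# Hedged sketch of regression-based standardization: fit receptive-vocabulary
# score on age in a typically developing (TD) comparison sample, then express
# each clinical-group score as a z-score of its residual from that fit.
import numpy as np

rng = np.random.default_rng(6)

# typically developing comparison sample (synthetic)
td_age = rng.uniform(4, 15, 200)
td_score = 10 + 5.0 * td_age + rng.normal(0, 6, 200)

slope, intercept = np.polyfit(td_age, td_score, 1)
td_resid_sd = np.std(td_score - (intercept + slope * td_age), ddof=2)

# clinical group (e.g. Williams syndrome), scored against the TD regression
ws_age = np.array([6.0, 8.5, 11.0, 13.5])
ws_score = np.array([55.0, 70.0, 62.0, 88.0])
z = (ws_score - (intercept + slope * ws_age)) / td_resid_sd
print(np.round(z, 2))
```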
Abstract:
A novel sparse kernel density estimator is derived based on a regression approach, which selects a very small subset of significant kernels by means of the D-optimality experimental design criterion using an orthogonal forward selection procedure. The weights of the resulting sparse kernel model are calculated using the multiplicative nonnegative quadratic programming algorithm. The proposed method is computationally attractive, in comparison with many existing kernel density estimation algorithms. Our numerical results also show that the proposed method compares favourably with other existing methods, in terms of both test accuracy and model sparsity, for constructing kernel density estimates.
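The weight-fitting step can be illustrated with a generic multiplicative nonnegative update, in the spirit of, but not identical to, the cited multiplicative nonnegative quadratic programming algorithm, using the full Parzen estimate as the target and a crudely chosen subset of kernels.
```python
# Hedged sketch: nonnegative weights for a small set of Gaussian kernels fitted
# by a multiplicative update (valid because all quantities here are positive);
# the kernel subset is chosen crudely, not by the D-optimality procedure.
import numpy as np

rng = np.random.default_rng(11)
x = np.sort(rng.normal(0, 1, 300))
width = 0.3

def gauss(u, centres, h):
    return (np.exp(-0.5 * ((u[:, None] - centres[None, :]) / h) ** 2)
            / (h * np.sqrt(2 * np.pi)))

# target: full Parzen density at the data points; design: a few selected kernels
target = gauss(x, x, width).mean(axis=1)
centres = x[::50]                                   # small, crude kernel subset
Phi = gauss(x, centres, width)

w = np.full(len(centres), 1.0 / len(centres))
for _ in range(500):                                # multiplicative updates keep w >= 0
    w *= (Phi.T @ target) / (Phi.T @ Phi @ w + 1e-12)
w /= w.sum()                                        # normalize to a density
print(np.round(w, 3))
```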
Abstract:
The paper introduces an efficient construction algorithm for obtaining sparse linear-in-the-weights regression models based on an approach of directly optimizing model generalization capability. This is achieved by utilizing the delete-1 cross validation concept and the associated leave-one-out test error, also known as the predicted residual sums of squares (PRESS) statistic, without resorting to any other validation data set for model evaluation in the model construction process. Computational efficiency is ensured using an orthogonal forward regression, but the algorithm incrementally minimizes the PRESS statistic instead of the usual sum of the squared training errors. A local regularization method can naturally be incorporated into the model selection procedure to further enforce model sparsity. The proposed algorithm is fully automatic, and the user is not required to specify any criterion to terminate the model construction procedure. Comparisons with some of the existing state-of-the-art modeling methods are given, and several examples are included to demonstrate the ability of the proposed algorithm to effectively construct sparse models that generalize well.
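A stripped-down version of PRESS-driven forward selection, recomputing PRESS directly via the hat matrix rather than through the paper's orthogonal recursion and without local regularization, might look like this:
```python
# Hedged sketch of forward selection driven by the PRESS statistic: at each
# step add the candidate regressor that most reduces PRESS and stop when
# PRESS no longer decreases.
import numpy as np

def press(Phi, y):
    theta = np.linalg.lstsq(Phi, y, rcond=None)[0]
    e = y - Phi @ theta
    h = np.diag(Phi @ np.linalg.pinv(Phi.T @ Phi) @ Phi.T)
    return np.sum((e / (1.0 - h)) ** 2)

rng = np.random.default_rng(7)
n, m = 150, 12
candidates = rng.normal(size=(n, m))                # candidate regressor pool
y = 2.0 * candidates[:, 0] - 1.5 * candidates[:, 3] + rng.normal(0, 0.2, n)

selected = []
best_press = np.inf
while len(selected) < m:
    trial = {j: press(candidates[:, selected + [j]], y)
             for j in range(m) if j not in selected}
    j_best = min(trial, key=trial.get)
    if trial[j_best] >= best_press:
        break                                       # PRESS stopped improving
    best_press = trial[j_best]
    selected.append(j_best)

print('selected regressors:', selected, 'PRESS:', round(best_press, 4))
```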
Abstract:
A fundamental principle in practical nonlinear data modeling is the parsimonious principle of constructing the minimal model that explains the training data well. Leave-one-out (LOO) cross validation is often used to estimate generalization errors by choosing amongst different network architectures (M. Stone, "Cross-validatory choice and assessment of statistical predictions", J. R. Statist. Soc., Ser. B, 36, pp. 117-147, 1974). Based upon the minimization of LOO criteria, either the mean square of the LOO errors or the LOO misclassification rate, we present two backward elimination algorithms as model post-processing procedures for regression and classification problems, respectively. The proposed backward elimination procedures exploit an orthogonalization procedure to enable the orthogonality between the subspace spanned by the pruned model and the deleted regressor. Subsequently, it is shown that the LOO criteria used in both algorithms can be calculated via an analytic recursive formula, as derived in this contribution, without actually splitting the estimation data set, thereby reducing computational expense. Compared to most other model construction methods, the proposed algorithms are advantageous in several aspects: (i) there are no tuning parameters to be optimized through an extra validation data set; (ii) the procedure is fully automatic without an additional stopping criterion; and (iii) the model structure selection is directly based on model generalization performance. Illustrative examples on regression and classification are used to demonstrate that the proposed algorithms are viable post-processing methods to prune a model to gain extra sparsity and improved generalization.
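A stripped-down sketch of the regression variant is given below, with the analytic LOO recursion replaced by direct recomputation: it starts from the full model and deletes regressors while the leave-one-out MSE keeps improving.
```python
# Hedged sketch of LOO-driven backward elimination for regression: repeatedly
# delete the regressor whose removal gives the lowest leave-one-out MSE, and
# stop when no deletion improves it.
import numpy as np

def loo_mse(Phi, y):
    theta = np.linalg.lstsq(Phi, y, rcond=None)[0]
    e = y - Phi @ theta
    h = np.diag(Phi @ np.linalg.pinv(Phi.T @ Phi) @ Phi.T)
    return np.mean((e / (1.0 - h)) ** 2)

rng = np.random.default_rng(8)
n, m = 120, 8
Phi = rng.normal(size=(n, m))
y = 1.0 * Phi[:, 1] - 2.0 * Phi[:, 4] + rng.normal(0, 0.3, n)

kept = list(range(m))
current = loo_mse(Phi[:, kept], y)
while len(kept) > 1:
    trials = {j: loo_mse(Phi[:, [k for k in kept if k != j]], y) for j in kept}
    j_drop = min(trials, key=trials.get)
    if trials[j_drop] >= current:
        break                                   # pruning no longer helps
    current = trials[j_drop]
    kept.remove(j_drop)

print('kept regressors:', kept, 'LOO MSE:', round(current, 4))
```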
Abstract:
This paper derives an efficient algorithm for constructing sparse kernel density (SKD) estimates. The algorithm first selects a very small subset of significant kernels using an orthogonal forward regression (OFR) procedure based on the D-optimality experimental design criterion. The weights of the resulting sparse kernel model are then calculated using a modified multiplicative nonnegative quadratic programming algorithm. Unlike most of the SKD estimators, the proposed D-optimality regression approach is an unsupervised construction algorithm and it does not require an empirical desired response for the kernel selection task. The strength of the D-optimality OFR stems from the fact that the algorithm automatically selects a small subset of the most significant kernels related to the largest eigenvalues of the kernel design matrix, which accounts for most of the energy of the kernel training data, and this also guarantees the most accurate kernel weight estimate. The proposed method is also computationally attractive, in comparison with many existing SKD construction algorithms. Extensive numerical investigation demonstrates the ability of this regression-based approach to efficiently construct a very sparse kernel density estimate with excellent test accuracy, and our results show that the proposed method compares favourably with other existing sparse methods, in terms of test accuracy, model sparsity and complexity, for constructing kernel density estimates.
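The D-optimality selection step can be sketched as a greedy maximization of the log-determinant of the Gram matrix of the selected kernel columns, as below; the OFR recursion and the nonnegative weight step are omitted, and the data are synthetic.
```python
# Hedged sketch of greedy D-optimality kernel selection: build a Gaussian
# kernel design matrix over the data, then greedily add the kernel column that
# maximizes the log-determinant of the Gram matrix of the selected columns.
import numpy as np

rng = np.random.default_rng(9)
x = np.concatenate([rng.normal(-2, 0.5, 80), rng.normal(1, 1.0, 120)])
width = 0.4

# kernel design matrix: column j is the kernel centred on x[j], evaluated at all x
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / width) ** 2)

def logdet_gram(cols):
    Ks = K[:, cols]
    sign, val = np.linalg.slogdet(Ks.T @ Ks)
    return val if sign > 0 else -np.inf

selected = []
for _ in range(6):                                  # pick 6 kernels
    scores = {j: logdet_gram(selected + [j])
              for j in range(len(x)) if j not in selected}
    selected.append(max(scores, key=scores.get))

print('selected kernel centres:', np.round(x[selected], 2))
```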
Abstract:
The analysis of the auditory brainstem response (ABR) is of fundamental importance to the investigation of auditory system behavior, though its interpretation has a subjective nature because of the manual process employed in its study and the clinical experience required for its analysis. When analyzing the ABR, clinicians are often interested in the identification of ABR signal components referred to as Jewett waves. In particular, the detection and study of the time when these waves occur (i.e., the wave latency) is a practical tool for the diagnosis of disorders affecting the auditory system. In this context, the aim of this research is to compare ABR manual/visual analysis provided by different examiners. Methods: The ABR data were collected from 10 normal-hearing subjects (5 men and 5 women, from 20 to 52 years). A total of 160 data samples were analyzed and a pairwise comparison between four distinct examiners was executed. We carried out a statistical study aiming to identify significant differences between the assessments provided by the examiners, using linear regression in conjunction with the bootstrap as a method for evaluating the relation between the responses given by the examiners. Results: The analysis suggests agreement among examiners but reveals differences between assessments of the variability of the waves. We quantified the magnitude of the obtained wave latency differences: 18% of the investigated waves presented substantial (large or moderate) differences, and of these 3.79% were considered not acceptable for clinical practice. Conclusions: Our results characterize the variability of the manual analysis of ABR data and the necessity of establishing unified standards and protocols for the analysis of these data. These results may also contribute to the validation and development of automatic systems that are employed in the early diagnosis of hearing loss.
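The bootstrap-plus-regression comparison can be sketched as below: regress one examiner's wave latencies on another's and bootstrap the slope and intercept to obtain confidence intervals; the latencies are invented.
```python
# Hedged sketch: comparing two examiners' wave-latency markings by regressing
# one on the other and bootstrapping the slope and intercept. Data are invented.
import numpy as np

rng = np.random.default_rng(10)
n = 40
examiner_a = rng.normal(5.6, 0.25, n)                 # wave latency, ms
examiner_b = examiner_a + rng.normal(0.02, 0.08, n)   # slightly different marks

def fit(x, y):
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

boot = []
for _ in range(2000):                                 # bootstrap resamples of pairs
    idx = rng.integers(0, n, n)
    boot.append(fit(examiner_a[idx], examiner_b[idx]))
boot = np.array(boot)

print('slope 95% CI:    ', np.round(np.percentile(boot[:, 0], [2.5, 97.5]), 3))
print('intercept 95% CI:', np.round(np.percentile(boot[:, 1], [2.5, 97.5]), 3))
```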