51 resultados para Lanczos, Linear systems, Generalized cross validation


Relevância:

100.00% 100.00%

Publicador:

Resumo:

A new parameter-estimation algorithm, which minimises the cross-validated prediction error for linear-in-the-parameter models, is proposed, based on stacked regression and an evolutionary algorithm. It is initially shown that cross-validation is very important for prediction in linear-in-the-parameter models using a criterion called the mean dispersion error (MDE). Stacked regression, which can be regarded as a sophisticated type of cross-validation, is then introduced based on an evolutionary algorithm, to produce a new parameter-estimation algorithm, which preserves the parsimony of a concise model structure that is determined using the forward orthogonal least-squares (OLS) algorithm. The PRESS prediction errors are used for cross-validation, and the sunspot and Canadian lynx time series are used to demonstrate the new algorithms.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This study investigated the potential application of mid-infrared spectroscopy (MIR 4,000–900 cm−1) for the determination of milk coagulation properties (MCP), titratable acidity (TA), and pH in Brown Swiss milk samples (n = 1,064). Because MCP directly influence the efficiency of the cheese-making process, there is strong industrial interest in developing a rapid method for their assessment. Currently, the determination of MCP involves time-consuming laboratory-based measurements, and it is not feasible to carry out these measurements on the large numbers of milk samples associated with milk recording programs. Mid-infrared spectroscopy is an objective and nondestructive technique providing rapid real-time analysis of food compositional and quality parameters. Analysis of milk rennet coagulation time (RCT, min), curd firmness (a30, mm), TA (SH°/50 mL; SH° = Soxhlet-Henkel degree), and pH was carried out, and MIR data were recorded over the spectral range of 4,000 to 900 cm−1. Models were developed by partial least squares regression using untreated and pretreated spectra. The MCP, TA, and pH prediction models were improved by using the combined spectral ranges of 1,600 to 900 cm−1, 3,040 to 1,700 cm−1, and 4,000 to 3,470 cm−1. The root mean square errors of cross-validation for the developed models were 2.36 min (RCT, range 24.9 min), 6.86 mm (a30, range 58 mm), 0.25 SH°/50 mL (TA, range 3.58 SH°/50 mL), and 0.07 (pH, range 1.15). The most successfully predicted attributes were TA, RCT, and pH. The model for the prediction of TA provided approximate prediction (R2 = 0.66), whereas the predictive models developed for RCT and pH could discriminate between high and low values (R2 = 0.59 to 0.62). It was concluded that, although the models require further development to improve their accuracy before their application in industry, MIR spectroscopy has potential application for the assessment of RCT, TA, and pH during routine milk analysis in the dairy industry. The implementation of such models could be a means of improving MCP through phenotypic-based selection programs and to amend milk payment systems to incorporate MCP into their payment criteria.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Current methods for estimating vegetation parameters are generally sub-optimal in the way they exploit information and do not generally consider uncertainties. We look forward to a future where operational dataassimilation schemes improve estimates by tracking land surface processes and exploiting multiple types of observations. Dataassimilation schemes seek to combine observations and models in a statistically optimal way taking into account uncertainty in both, but have not yet been much exploited in this area. The EO-LDAS scheme and prototype, developed under ESA funding, is designed to exploit the anticipated wealth of data that will be available under GMES missions, such as the Sentinel family of satellites, to provide improved mapping of land surface biophysical parameters. This paper describes the EO-LDAS implementation, and explores some of its core functionality. EO-LDAS is a weak constraint variational dataassimilationsystem. The prototype provides a mechanism for constraint based on a prior estimate of the state vector, a linear dynamic model, and EarthObservationdata (top-of-canopy reflectance here). The observation operator is a non-linear optical radiative transfer model for a vegetation canopy with a soil lower boundary, operating over the range 400 to 2500 nm. Adjoint codes for all model and operator components are provided in the prototype by automatic differentiation of the computer codes. In this paper, EO-LDAS is applied to the problem of daily estimation of six of the parameters controlling the radiative transfer operator over the course of a year (> 2000 state vector elements). Zero and first order process model constraints are implemented and explored as the dynamic model. The assimilation estimates all state vector elements simultaneously. This is performed in the context of a typical Sentinel-2 MSI operating scenario, using synthetic MSI observations simulated with the observation operator, with uncertainties typical of those achieved by optical sensors supposed for the data. The experiments consider a baseline state vector estimation case where dynamic constraints are applied, and assess the impact of dynamic constraints on the a posteriori uncertainties. The results demonstrate that reductions in uncertainty by a factor of up to two might be obtained by applying the sorts of dynamic constraints used here. The hyperparameter (dynamic model uncertainty) required to control the assimilation are estimated by a cross-validation exercise. The result of the assimilation is seen to be robust to missing observations with quite large data gaps.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present an efficient graph-based algorithm for quantifying the similarity of household-level energy use profiles, using a notion of similarity that allows for small time–shifts when comparing profiles. Experimental results on a real smart meter data set demonstrate that in cases of practical interest our technique is far faster than the existing method for computing the same similarity measure. Having a fast algorithm for measuring profile similarity improves the efficiency of tasks such as clustering of customers and cross-validation of forecasting methods using historical data. Furthermore, we apply a generalisation of our algorithm to produce substantially better household-level energy use forecasts from historical smart meter data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We propose a new class of neurofuzzy construction algorithms with the aim of maximizing generalization capability specifically for imbalanced data classification problems based on leave-one-out (LOO) cross validation. The algorithms are in two stages, first an initial rule base is constructed based on estimating the Gaussian mixture model with analysis of variance decomposition from input data; the second stage carries out the joint weighted least squares parameter estimation and rule selection using orthogonal forward subspace selection (OFSS)procedure. We show how different LOO based rule selection criteria can be incorporated with OFSS, and advocate either maximizing the leave-one-out area under curve of the receiver operating characteristics, or maximizing the leave-one-out Fmeasure if the data sets exhibit imbalanced class distribution. Extensive comparative simulations illustrate the effectiveness of the proposed algorithms.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The aim of this paper is essentially twofold: first, to describe the use of spherical nonparametric estimators for determining statistical diagnostic fields from ensembles of feature tracks on a global domain, and second, to report the application of these techniques to data derived from a modern general circulation model. New spherical kernel functions are introduced that are more efficiently computed than the traditional exponential kernels. The data-driven techniques of cross-validation to determine the amount elf smoothing objectively, and adaptive smoothing to vary the smoothing locally, are also considered. Also introduced are techniques for combining seasonal statistical distributions to produce longer-term statistical distributions. Although all calculations are performed globally, only the results for the Northern Hemisphere winter (December, January, February) and Southern Hemisphere winter (June, July, August) cyclonic activity are presented, discussed, and compared with previous studies. Overall, results for the two hemispheric winters are in good agreement with previous studies, both for model-based studies and observational studies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Diffuse reflectance spectroscopy (DRS) is increasingly being used to predict numerous soil physical, chemical and biochemical properties. However, soil properties and processes vary at different scales and, as a result, relationships between soil properties often depend on scale. In this paper we report on how the relationship between one such property, cation exchange capacity (CEC), and the DRS of the soil depends on spatial scale. We show this by means of a nested analysis of covariance of soils sampled on a balanced nested design in a 16 km × 16 km area in eastern England. We used principal components analysis on the DRS to obtain a reduced number of variables while retaining key variation. The first principal component accounted for 99.8% of the total variance, the second for 0.14%. Nested analysis of the variation in the CEC and the two principal components showed that the substantial variance components are at the > 2000-m scale. This is probably the result of differences in soil composition due to parent material. We then developed a model to predict CEC from the DRS and used partial least squares (PLS) regression do to so. Leave-one-out cross-validation results suggested a reasonable predictive capability (R2 = 0.71 and RMSE = 0.048 molc kg− 1). However, the results from the independent validation were not as good, with R2 = 0.27, RMSE = 0.056 molc kg− 1 and an overall correlation of 0.52. This would indicate that DRS may not be useful for predictions of CEC. When we applied the analysis of covariance between predicted and observed we found significant scale-dependent correlations at scales of 50 and 500 m (0.82 and 0.73 respectively). DRS measurements can therefore be useful to predict CEC if predictions are required, for example, at the field scale (50 m). This study illustrates that the relationship between DRS and soil properties is scale-dependent and that this scale dependency has important consequences for prediction of soil properties from DRS data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Maps of kriged soil properties for precision agriculture are often based on a variogram estimated from too few data because the costs of sampling and analysis are often prohibitive. If the variogram has been computed by the usual method of moments, it is likely to be unstable when there are fewer than 100 data. The scale of variation in soil properties should be investigated prior to sampling by computing a variogram from ancillary data, such as an aerial photograph of the bare soil. If the sampling interval suggested by this is large in relation to the size of the field there will be too few data to estimate a reliable variogram for kriging. Standardized variograms from aerial photographs can be used with standardized soil data that are sparse, provided the data are spatially structured and the nugget:sill ratio is similar to that of a reliable variogram of the property. The problem remains of how to set this ratio in the absence of an accurate variogram. Several methods of estimating the nugget:sill ratio for selected soil properties are proposed and evaluated. Standardized variograms with nugget:sill ratios set by these methods are more similar to those computed from intensive soil data than are variograms computed from sparse soil data. The results of cross-validation and mapping show that the standardized variograms provide more accurate estimates, and preserve the main patterns of variation better than those computed from sparse data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Matheron's usual variogram estimator can result in unreliable variograms when data are strongly asymmetric or skewed. Asymmetry in a distribution can arise from a long tail of values in the underlying process or from outliers that belong to another population that contaminate the primary process. This paper examines the effects of underlying asymmetry on the variogram and on the accuracy of prediction, and the second one examines the effects arising from outliers. Standard geostatistical texts suggest ways of dealing with underlying asymmetry; however, this is based on informed intuition rather than detailed investigation. To determine whether the methods generally used to deal with underlying asymmetry are appropriate, the effects of different coefficients of skewness on the shape of the experimental variogram and on the model parameters were investigated. Simulated annealing was used to create normally distributed random fields of different size from variograms with different nugget:sill ratios. These data were then modified to give different degrees of asymmetry and the experimental variogram was computed in each case. The effects of standard data transformations on the form of the variogram were also investigated. Cross-validation was used to assess quantitatively the performance of the different variogram models for kriging. The results showed that the shape of the variogram was affected by the degree of asymmetry, and that the effect increased as the size of data set decreased. Transformations of the data were more effective in reducing the skewness coefficient in the larger sets of data. Cross-validation confirmed that variogram models from transformed data were more suitable for kriging than were those from the raw asymmetric data. The results of this study have implications for the 'standard best practice' in dealing with asymmetry in data for geostatistical analyses. (C) 2007 Elsevier Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Asymmetry in a distribution can arise from a long tail of values in the underlying process or from outliers that belong to another population that contaminate the primary process. The first paper of this series examined the effects of the former on the variogram and this paper examines the effects of asymmetry arising from outliers. Simulated annealing was used to create normally distributed random fields of different size that are realizations of known processes described by variograms with different nugget:sill ratios. These primary data sets were then contaminated with randomly located and spatially aggregated outliers from a secondary process to produce different degrees of asymmetry. Experimental variograms were computed from these data by Matheron's estimator and by three robust estimators. The effects of standard data transformations on the coefficient of skewness and on the variogram were also investigated. Cross-validation was used to assess the performance of models fitted to experimental variograms computed from a range of data contaminated by outliers for kriging. The results showed that where skewness was caused by outliers the variograms retained their general shape, but showed an increase in the nugget and sill variances and nugget:sill ratios. This effect was only slightly more for the smallest data set than for the two larger data sets and there was little difference between the results for the latter. Overall, the effect of size of data set was small for all analyses. The nugget:sill ratio showed a consistent decrease after transformation to both square roots and logarithms; the decrease was generally larger for the latter, however. Aggregated outliers had different effects on the variogram shape from those that were randomly located, and this also depended on whether they were aggregated near to the edge or the centre of the field. The results of cross-validation showed that the robust estimators and the removal of outliers were the most effective ways of dealing with outliers for variogram estimation and kriging. (C) 2007 Elsevier Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The aim of the study was to establish and verify a predictive vegetation model for plant community distribution in the alti-Mediterranean zone of the Lefka Ori massif, western Crete. Based on previous work three variables were identified as significant determinants of plant community distribution, namely altitude, slope angle and geomorphic landform. The response of four community types against these variables was tested using classification trees analysis in order to model community type occurrence. V-fold cross-validation plots were used to determine the length of the best fitting tree. The final 9node tree selected, classified correctly 92.5% of the samples. The results were used to provide decision rules for the construction of a spatial model for each community type. The model was implemented within a Geographical Information System (GIS) to predict the distribution of each community type in the study site. The evaluation of the model in the field using an error matrix gave an overall accuracy of 71%. The user's accuracy was higher for the Crepis-Cirsium (100%) and Telephium-Herniaria community type (66.7%) and relatively lower for the Peucedanum-Alyssum and Dianthus-Lomelosia community types (63.2% and 62.5%, respectively). Misclassification and field validation points to the need for improved geomorphological mapping and suggests the presence of transitional communities between existing community types.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We consider the application of the conjugate gradient method to the solution of large, symmetric indefinite linear systems. Special emphasis is put on the use of constraint preconditioners and a new factorization that can reduce the number of flops required by the preconditioning step. Results concerning the eigenvalues of the preconditioned matrix and its minimum polynomial are given. Numerical experiments validate these conclusions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The use of special units for logarithmic ratio quantities is reviewed. The neper is used with a natural logarithm (logarithm to the base e) to express the logarithm of the amplitude ratio of two pure sinusoidal signals, particularly in the context of linear systems where it is desired to represent the gain or loss in amplitude of a single-frequency signal between the input and output. The bel, and its more commonly used submultiple, the decibel, are used with a decadic logarithm (logarithm to the base 10) to measure the ratio of two power-like quantities, such as a mean square signal or a mean square sound pressure in acoustics. Thus two distinctly different quantities are involved. In this review we define the quantities first, without reference to the units, as is standard practice in any system of quantities and units. We show that two different definitions of the quantity power level, or logarithmic power ratio, are possible. We show that this leads to two different interpretations for the meaning and numerical values of the units bel and decibel. We review the question of which of these alternative definitions is actually used, or is used by implication, by workers in the field. Finally, we discuss the relative advantages of the alternative definitions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

BACKGROUND: The serum peptidome may be a valuable source of diagnostic cancer biomarkers. Previous mass spectrometry (MS) studies have suggested that groups of related peptides discriminatory for different cancer types are generated ex vivo from abundant serum proteins by tumor-specific exopeptidases. We tested 2 complementary serum profiling strategies to see if similar peptides could be found that discriminate ovarian cancer from benign cases and healthy controls. METHODS: We subjected identically collected and processed serum samples from healthy volunteers and patients to automated polypeptide extraction on octadecylsilane-coated magnetic beads and separately on ZipTips before MALDI-TOF MS profiling at 2 centers. The 2 platforms were compared and case control profiling data analyzed to find altered MS peak intensities. We tested models built from training datasets for both methods for their ability to classify a blinded test set. RESULTS: Both profiling platforms had CVs of approximately 15% and could be applied for high-throughput analysis of clinical samples. The 2 methods generated overlapping peptide profiles, with some differences in peak intensity in different mass regions. In cross-validation, models from training data gave diagnostic accuracies up to 87% for discriminating malignant ovarian cancer from healthy controls and up to 81% for discriminating malignant from benign samples. Diagnostic accuracies up to 71% (malignant vs healthy) and up to 65% (malignant vs benign) were obtained when the models were validated on the blinded test set. CONCLUSIONS: For ovarian cancer, altered MALDI-TOF MS peptide profiles alone cannot be used for accurate diagnoses.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The aim of the study was to establish and verify a predictive vegetation model for plant community distribution in the alti-Mediterranean zone of the Lefka Ori massif, western Crete. Based on previous work three variables were identified as significant determinants of plant community distribution, namely altitude, slope angle and geomorphic landform. The response of four community types against these variables was tested using classification trees analysis in order to model community type occurrence. V-fold cross-validation plots were used to determine the length of the best fitting tree. The final 9node tree selected, classified correctly 92.5% of the samples. The results were used to provide decision rules for the construction of a spatial model for each community type. The model was implemented within a Geographical Information System (GIS) to predict the distribution of each community type in the study site. The evaluation of the model in the field using an error matrix gave an overall accuracy of 71%. The user's accuracy was higher for the Crepis-Cirsium (100%) and Telephium-Herniaria community type (66.7%) and relatively lower for the Peucedanum-Alyssum and Dianthus-Lomelosia community types (63.2% and 62.5%, respectively). Misclassification and field validation points to the need for improved geomorphological mapping and suggests the presence of transitional communities between existing community types.