920 resultados para Least-Squares prediction
Resumo:
The scaling of metabolic rates to body size is widely considered to be of great biological and ecological importance, and much attention has been devoted to determining its theoretical and empirical value. Most debate centers on whether the underlying power law describing metabolic rates is 2/3 (as predicted by scaling of surface area/volume relationships) or 3/4 ("Kleiber's law"). Although recent evidence suggests that empirically derived exponents vary among clades with radically different metabolic strategies, such as ectotherms and endotherms, models, such as the metabolic theory of ecology, depend on the assumption that there is at least a predominant, if not universal, metabolic scaling exponent. Most analyses claimed to support the predictions of general models, however, failed to control for phylogeny. We used phylogenetic generalized least-squares models to estimate allometric slopes for both basal metabolic rate (BMR) and field metabolic rate (FMR) in mammals. Metabolic rate scaling conformed to no single theoretical prediction, but varied significantly among phylogenetic lineages. In some lineages we found a 3/4 exponent, in others a 2/3 exponent, and in yet others exponents differed significantly from both theoretical values. Analysis of the phylogenetic signal in the data indicated that the assumptions of neither species-level analysis nor independent contrasts were met. Analyses that assumed no phylogenetic signal in the data (species-level analysis) or a strong phylogenetic signal (independent contrasts), therefore, returned estimates of allometric slopes that were erroneous in 30% and 50% of cases, respectively. Hence, quantitative estimation of the phylogenetic signal is essential for determining scaling exponents. The lack of evidence for a predominant scaling exponent in these analyses suggests that general models of metabolic scaling, and macro-ecological theories that depend on them, have little explanatory power.
Resumo:
A common problem in many data based modelling algorithms such as associative memory networks is the problem of the curse of dimensionality. In this paper, a new two-stage neurofuzzy system design and construction algorithm (NeuDeC) for nonlinear dynamical processes is introduced to effectively tackle this problem. A new simple preprocessing method is initially derived and applied to reduce the rule base, followed by a fine model detection process based on the reduced rule set by using forward orthogonal least squares model structure detection. In both stages, new A-optimality experimental design-based criteria we used. In the preprocessing stage, a lower bound of the A-optimality design criterion is derived and applied as a subset selection metric, but in the later stage, the A-optimality design criterion is incorporated into a new composite cost function that minimises model prediction error as well as penalises the model parameter variance. The utilisation of NeuDeC leads to unbiased model parameters with low parameter variance and the additional benefit of a parsimonious model structure. Numerical examples are included to demonstrate the effectiveness of this new modelling approach for high dimensional inputs.
Resumo:
A new parameter-estimation algorithm, which minimises the cross-validated prediction error for linear-in-the-parameter models, is proposed, based on stacked regression and an evolutionary algorithm. It is initially shown that cross-validation is very important for prediction in linear-in-the-parameter models using a criterion called the mean dispersion error (MDE). Stacked regression, which can be regarded as a sophisticated type of cross-validation, is then introduced based on an evolutionary algorithm, to produce a new parameter-estimation algorithm, which preserves the parsimony of a concise model structure that is determined using the forward orthogonal least-squares (OLS) algorithm. The PRESS prediction errors are used for cross-validation, and the sunspot and Canadian lynx time series are used to demonstrate the new algorithms.
Resumo:
The potential of visible-near infrared spectra, obtained using a light backscatter sensor, in conjunction with chemometrics, to predict curd moisture and whey fat content in a cheese vat was examined. A three-factor (renneting temperature, calcium chloride, cutting time), central composite design was carried out in triplicate. Spectra (300–1,100 nm) of the product in the cheese vat were captured during syneresis using a prototype light backscatter sensor. Stirring followed upon cutting the gel, and samples of curd and whey were removed at 10 min intervals and analyzed for curd moisture and whey fat content. Spectral data were used to develop models for predicting curd moisture and whey fat contents using partial least squares regression. Subjecting the spectral data set to Jack-knifing improved the accuracy of the models. The whey fat models (R = 0.91, 0.95) and curd moisture model (R = 0.86, 0.89) provided good and approximate predictions, respectively. Visible-near infrared spectroscopy was found to have potential for the prediction of important syneresis indices in stirred cheese vats.
Resumo:
Numerical weather prediction (NWP) centres use numerical models of the atmospheric flow to forecast future weather states from an estimate of the current state. Variational data assimilation (VAR) is used commonly to determine an optimal state estimate that miminizes the errors between observations of the dynamical system and model predictions of the flow. The rate of convergence of the VAR scheme and the sensitivity of the solution to errors in the data are dependent on the condition number of the Hessian of the variational least-squares objective function. The traditional formulation of VAR is ill-conditioned and hence leads to slow convergence and an inaccurate solution. In practice, operational NWP centres precondition the system via a control variable transform to reduce the condition number of the Hessian. In this paper we investigate the conditioning of VAR for a single, periodic, spatially-distributed state variable. We present theoretical bounds on the condition number of the original and preconditioned Hessians and hence demonstrate the improvement produced by the preconditioning. We also investigate theoretically the effect of observation position and error variance on the preconditioned system and show that the problem becomes more ill-conditioned with increasingly dense and accurate observations. Finally, we confirm the theoretical results in an operational setting by giving experimental results from the Met Office variational system.
Resumo:
The aim of this study was to investigate the effects of numerous milk compositional factors on milk coagulation properties using Partial Least Squares (PLS). Milk from herds of Jersey and Holstein- Friesian cattle was collected across the year and blended (n=55), to maximise variation in composition and coagulation. The milk was analysed for casein, protein, fat, titratable acidity, lactose, Ca2+, urea content, micelles size, fat globule size, somatic cell count and pH. Milk coagulation properties were defined as coagulation time, curd firmness and curd firmness rate measured by a controlled strain rheometer. The models derived from PLS had higher predictive power than previous models demonstrating the value of measuring more milk components. In addition to the well-established relationships with casein and protein levels, CMS and fat globule size were found to have as strong impact on all of the three models. The study also found a positive impact of fat on milk coagulation properties and a strong relationship between lactose and curd firmness, and urea and curd firmness rate, all of which warrant further investigation due to current lack of knowledge of the underlying mechanism. These findings demonstrate the importance of using a wider range of milk compositional variables for the prediction of the milk coagulation properties, and hence as indicators of milk suitability for cheese making.
Resumo:
Drinking water distribution networks risk exposure to malicious or accidental contamination. Several levels of responses are conceivable. One of them consists to install a sensor network to monitor the system on real time. Once a contamination has been detected, this is also important to take appropriate counter-measures. In the SMaRT-OnlineWDN project, this relies on modeling to predict both hydraulics and water quality. An online model use makes identification of the contaminant source and simulation of the contaminated area possible. The objective of this paper is to present SMaRT-OnlineWDN experience and research results for hydraulic state estimation with sampling frequency of few minutes. A least squares problem with bound constraints is formulated to adjust demand class coefficient to best fit the observed values at a given time. The criterion is a Huber function to limit the influence of outliers. A Tikhonov regularization is introduced for consideration of prior information on the parameter vector. Then the Levenberg-Marquardt algorithm is applied that use derivative information for limiting the number of iterations. Confidence intervals for the state prediction are also given. The results are presented and discussed on real networks in France and Germany.
Resumo:
In this work calibration models were constructed to determine the content of total lipids and moisture in powdered milk samples. For this, used the near-infrared spectroscopy by diffuse reflectance, combined with multivariate calibration. Initially, the spectral data were submitted to correction of multiplicative light scattering (MSC) and Savitzsky-Golay smoothing. Then, the samples were divided into subgroups by application of hierarchical clustering analysis of the classes (HCA) and Ward Linkage criterion. Thus, it became possible to build regression models by partial least squares (PLS) that allowed the calibration and prediction of the content total lipid and moisture, based on the values obtained by the reference methods of Soxhlet and 105 ° C, respectively . Therefore, conclude that the NIR had a good performance for the quantification of samples of powdered milk, mainly by minimizing the analysis time, not destruction of the samples and not waste. Prediction models for determination of total lipids correlated (R) of 0.9955, RMSEP of 0.8952, therefore the average error between the Soxhlet and NIR was ± 0.70%, while the model prediction to content moisture correlated (R) of 0.9184, RMSEP, 0.3778 and error of ± 0.76%
Resumo:
This work is combined with the potential of the technique of near infrared spectroscopy - NIR and chemometrics order to determine the content of diclofenac tablets, without destruction of the sample, to which was used as the reference method, ultraviolet spectroscopy, which is one of the official methods. In the construction of multivariate calibration models has been studied several types of pre-processing of NIR spectral data, such as scatter correction, first derivative. The regression method used in the construction of calibration models is the PLS (partial least squares) using NIR spectroscopic data of a set of 90 tablets were divided into two sets (calibration and prediction). 54 were used in the calibration samples and the prediction was used 36, since the calibration method used was crossvalidation method (full cross-validation) that eliminates the need for a validation set. The evaluation of the models was done by observing the values of correlation coefficient R 2 and RMSEC mean square error (calibration error) and RMSEP (forecast error). As the forecast values estimated for the remaining 36 samples, which the results were consistent with the values obtained by UV spectroscopy
Resumo:
In this work, the quantitative analysis of glucose, triglycerides and cholesterol (total and HDL) in both rat and human blood plasma was performed without any kind of pretreatment of samples, by using near infrared spectroscopy (NIR) combined with multivariate methods. For this purpose, different techniques and algorithms used to pre-process data, to select variables and to build multivariate regression models were compared between each other, such as partial least squares regression (PLS), non linear regression by artificial neural networks, interval partial least squares regression (iPLS), genetic algorithm (GA), successive projections algorithm (SPA), amongst others. Related to the determinations of rat blood plasma samples, the variables selection algorithms showed satisfactory results both for the correlation coefficients (R²) and for the values of root mean square error of prediction (RMSEP) for the three analytes, especially for triglycerides and cholesterol-HDL. The RMSEP values for glucose, triglycerides and cholesterol-HDL obtained through the best PLS model were 6.08, 16.07 e 2.03 mg dL-1, respectively. In the other case, for the determinations in human blood plasma, the predictions obtained by the PLS models provided unsatisfactory results with non linear tendency and presence of bias. Then, the ANN regression was applied as an alternative to PLS, considering its ability of modeling data from non linear systems. The root mean square error of monitoring (RMSEM) for glucose, triglycerides and total cholesterol, for the best ANN models, were 13.20, 10.31 e 12.35 mg dL-1, respectively. Statistical tests (F and t) suggest that NIR spectroscopy combined with multivariate regression methods (PLS and ANN) are capable to quantify the analytes (glucose, triglycerides and cholesterol) even when they are present in highly complex biological fluids, such as blood plasma
Resumo:
The aim of this study was to evaluate the potential of near-infrared reflectance spectroscopy (NIRS) as a rapid and non-destructive method to determine the soluble solid content (SSC), pH and titratable acidity of intact plums. Samples of plum with a total solids content ranging from 5.7 to 15%, pH from 2.72 to 3.84 and titratable acidity from 0.88 a 3.6% were collected from supermarkets in Natal-Brazil, and NIR spectra were acquired in the 714 2500 nm range. A comparison of several multivariate calibration techniques with respect to several pre-processing data and variable selection algorithms, such as interval Partial Least Squares (iPLS), genetic algorithm (GA), successive projections algorithm (SPA) and ordered predictors selection (OPS), was performed. Validation models for SSC, pH and titratable acidity had a coefficient of correlation (R) of 0.95 0.90 and 0.80, as well as a root mean square error of prediction (RMSEP) of 0.45ºBrix, 0.07 and 0.40%, respectively. From these results, it can be concluded that NIR spectroscopy can be used as a non-destructive alternative for measuring the SSC, pH and titratable acidity in plums
Resumo:
Aiming to consumer s safety the presence of pathogenic contaminants in foods must be monitored because they are responsible for foodborne outbreaks that depending on the level of contamination can ultimately cause the death of those who consume them. In industry is necessary that this identification be fast and profitable. This study shows the utility and application of near-infrared (NIR) transflectance spectroscopy as an alternative method for the identification and classification of Escherichia coli and Salmonella Enteritidis in commercial fruit pulp (pineapple). Principal Component Analysis (PCA), Independent Modeling of Class Analogy (SIMCA) and Discriminant Analysis Partial Least Squares (PLS-DA) were used in the analysis. It was not possible to obtain total separation between samples using PCA and SIMCA. The PLS-DA showed good performance in prediction capacity reaching 87.5% for E. coli and 88.3% for S. Enteritides, respectively. The best models were obtained for the PLS-DA with second derivative spectra treated with a sensitivity and specificity of 0.87 and 0.83, respectively. These results suggest that the NIR spectroscopy and PLS-DA can be used to discriminate and detect bacteria in the fruit pulp
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
The code STATFLUX, implementing a new and simple statistical procedure for the calculation of transfer coefficients in radionuclide transport to animals and plants, is proposed. The method is based on the general multiple-compartment model, which uses a system of linear equations involving geometrical volume considerations. Flow parameters were estimated by employing two different least-squares procedures: Derivative and Gauss-Marquardt methods, with the available experimental data of radionuclide concentrations as the input functions of time. The solution of the inverse problem, which relates a given set of flow parameter with the time evolution of concentration functions, is achieved via a Monte Carlo Simulation procedure.Program summaryTitle of program: STATFLUXCatalogue identifier: ADYS_v1_0Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADYS_v1_0Program obtainable from: CPC Program Library, Queen's University of Belfast, N. IrelandLicensing provisions: noneComputer for which the program is designed and others on which it has been tested: Micro-computer with Intel Pentium III, 3.0 GHzInstallation: Laboratory of Linear Accelerator, Department of Experimental Physics, University of São Paulo, BrazilOperating system: Windows 2000 and Windows XPProgramming language used: Fortran-77 as implemented in Microsoft Fortran 4.0. NOTE: Microsoft Fortran includes non-standard features which are used in this program. Standard Fortran compilers such as, g77, f77, ifort and NAG95, are not able to compile the code and therefore it has not been possible for the CPC Program Library to test the program.Memory, required to execute with typical data: 8 Mbytes of RAM memory and 100 MB of Hard disk memoryNo. of bits in a word: 16No. of lines in distributed program, including test data, etc.: 6912No. of bytes in distributed Program, including test data, etc.: 229 541Distribution format: tar.gzNature of the physical problem: the investigation of transport mechanisms for radioactive substances, through environmental pathways, is very important for radiological protection of populations. One such pathway, associated with the food chain, is the grass-animal-man sequence. The distribution of trace elements in humans and laboratory animals has been intensively studied over the past 60 years [R.C. Pendlenton, C.W. Mays, R.D. Lloyd, A.L. Brooks, Differential accumulation of iodine-131 from local fallout in people and milk, Health Phys. 9 (1963) 1253-1262]. In addition, investigations on the incidence of cancer in humans, and a possible causal relationship to radioactive fallout, have been undertaken [E.S. Weiss, M.L. Rallison, W.T. London, W.T. Carlyle Thompson, Thyroid nodularity in southwestern Utah school children exposed to fallout radiation, Amer. J. Public Health 61 (1971) 241-249; M.L. Rallison, B.M. Dobyns, F.R. Keating, J.E. Rall, F.H. Tyler, Thyroid diseases in children, Amer. J. Med. 56 (1974) 457-463; J.L. Lyon, M.R. Klauber, J.W. Gardner, K.S. Udall, Childhood leukemia associated with fallout from nuclear testing, N. Engl. J. Med. 300 (1979) 397-402]. From the pathways of entry of radionuclides in the human (or animal) body, ingestion is the most important because it is closely related to life-long alimentary (or dietary) habits. Those radionuclides which are able to enter the living cells by either metabolic or other processes give rise to localized doses which can be very high. The evaluation of these internally localized doses is of paramount importance for the assessment of radiobiological risks and radiological protection. The time behavior of trace concentration in organs is the principal input for prediction of internal doses after acute or chronic exposure. The General Multiple-Compartment Model (GMCM) is the powerful and more accepted method for biokinetical studies, which allows the calculation of concentration of trace elements in organs as a function of time, when the flow parameters of the model are known. However, few biokinetics data exist in the literature, and the determination of flow and transfer parameters by statistical fitting for each system is an open problem.Restriction on the complexity of the problem: This version of the code works with the constant volume approximation, which is valid for many situations where the biological half-live of a trace is lower than the volume rise time. Another restriction is related to the central flux model. The model considered in the code assumes that exist one central compartment (e.g., blood), that connect the flow with all compartments, and the flow between other compartments is not included.Typical running time: Depends on the choice for calculations. Using the Derivative Method the time is very short (a few minutes) for any number of compartments considered. When the Gauss-Marquardt iterative method is used the calculation time can be approximately 5-6 hours when similar to 15 compartments are considered. (C) 2006 Elsevier B.V. All rights reserved.
Resumo:
Additive and nonadditive genetic effects on preweaning weight gain (PWG) of a commercial crossbred population were estimated using different genetic models and estimation methods. The data set consisted of 103,445 records on purebred and crossbred Nelore-Hereford calves raised under pasture conditions on farms located in south, southeast, and middle west Brazilian regions. In addition to breed additive and dominance effects, the models including different epistasis covariables were tested. Models considering joint additive and environment (latitude) by genetic effects interactions were also applied. In a first step, analyses were carried out under animal models. In a second step, preadjusted records were analyzed using ordinary least squares (OLS) and ridge regression (RR). The results reinforced evidence that breed additive and dominance effects are not sufficient to explain the observed variability in preweaning traits of Bos taurus x Bos indicus calves, and that genotype x environment interaction plays an important role in the evaluation of crossbred calves. Data were ill-conditioned to estimate the effects of genotype x environment interactions. Models including these effects presented multicolinearity problems. In this case, RR seemed to be a powerful tool for obtaining more plausible and stable estimates. Estimated prediction error variances and variance inflation factors were drastically reduced, and many effects that were not significant under ordinary least squares became significant under RR. Predictions of PWG based on RR estimates were more acceptable from a biological perspective. In temperate and subtropical regions, calves with intermediate genetic compositions (close to 1/2 Nelore) exhibited greater predicted PWG. In the tropics, predicted PWG increased linearly as genotype got closer to Nelore. ©2006 American Society of Animal Science. All rights reserved.