874 results for cross validation


Relevance:

60.00%

Publisher:

Abstract:

Volatile chemical compounds responsible for the aroma of wine are derived from a number of different biochemical and chemical pathways. These compounds are formed during grape berry metabolism, crushing of the berries, fermentation processes (i.e. yeast and malolactic bacteria) and also during the ageing and storage of wine. Not surprisingly, there are a large number of chemical classes of compounds found in wine which are present at varying concentrations (ng L−1 to mg L−1), exhibit differing potencies, and have a broad range of volatilities and boiling points. The aim of this work was to investigate the potential use of near infrared (NIR) spectroscopy combined with chemometrics as a rapid and low-cost technique to measure volatile compounds in Riesling wines. Samples of commercial Riesling wine were analyzed using an NIR instrument, and volatile compounds were measured by gas chromatography (GC) coupled with selected ion monitoring mass spectrometry. Correlations between the NIR and GC data were developed using partial least-squares (PLS) regression with full (leave-one-out) cross validation. Coefficients of determination in cross validation (R2) and standard errors in cross validation (SECV) were 0.74 (SECV: 313.6 μg L−1) for esters, 0.90 (SECV: 20.9 μg L−1) for monoterpenes and 0.80 (SECV: 1658 μg L−1) for short-chain fatty acids. This study has shown that volatile chemical compounds present in wine can be measured by NIR spectroscopy. Further development with larger data sets will be required to test the predictive ability of the NIR calibration models developed.
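The full (leave-one-out) cross-validation procedure described above can be sketched in a few lines. This is a hedged illustration on synthetic data, with ordinary least squares standing in for PLS regression; all names and numbers are illustrative, not from the study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for NIR predictors (X) and a GC-measured concentration (y).
n, p = 40, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(scale=0.3, size=n)

def loo_cv(X, y):
    """Full (leave-one-out) cross-validation with an ordinary least-squares fit."""
    preds = np.empty(len(y))
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        coef, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        preds[i] = X[i] @ coef
    return preds

resid = y - loo_cv(X, y)
secv = np.sqrt(np.mean(resid ** 2))                        # standard error of cross-validation
r2cv = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)  # R2 in cross-validation
```

Each sample is held out once, predicted from a model fitted to the rest, and the pooled residuals give SECV and R2cv.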

Relevance:

60.00%

Publisher:

Abstract:

To facilitate marketing and export, the Australian macadamia industry requires accurate crop forecasts. Each year, two levels of crop predictions are produced for this industry. The first is an overall longer-term forecast based on tree census data of growers in the Australian Macadamia Society (AMS). This data set currently accounts for around 70% of total production, and is supplemented by our best estimates of non-AMS orchards. Given these total tree numbers, average yields per tree are needed to complete the long-term forecasts. Yields from regional variety trials were initially used, but were found to be consistently higher than the average yields that growers were obtaining. Hence, a statistical model was developed using growers' historical yields, also taken from the AMS database. This model accounted for the effects of tree age, variety, year, region and tree spacing, and explained 65% of the total variation in the yield per tree data. The second level of crop prediction is an annual climate adjustment of these overall long-term estimates, taking into account the expected effects on production of the previous year's climate. This adjustment is based on relative historical yields, measured as the percentage deviance between expected and actual production. The dominant climatic variables are observed temperature, evaporation, solar radiation and modelled water stress. Initially, a number of alternative statistical models showed good agreement with the historical data, with jack-knife cross-validation R2 values of 96% or better. However, forecasts varied quite widely between these alternative models. Exploratory multivariate analyses and nearest-neighbour methods were used to investigate these differences. For 2001-2003, the overall forecasts were in the right direction (when compared with the long-term expected values), but were over-estimates. 
In 2004 the forecast was well under the observed production, and in 2005 the revised models produced a forecast within 5.1% of the actual production. Over the first five years of forecasting, the absolute deviance for the climate-adjustment models averaged 10.1%, just outside the targeted objective of 10%.
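The skill metric used above, percentage deviance between expected and actual production, can be illustrated with hypothetical figures; the choice of actual production as the denominator is an assumption, not stated in the abstract:

```python
# Hypothetical production figures (kt) illustrating the forecast skill metric:
# percentage deviance between expected (forecast) and actual production.
expected = [38.0, 41.0, 44.0, 47.0, 50.0]   # illustrative forecasts
actual   = [35.1, 37.5, 40.2, 52.6, 49.1]   # illustrative observed production

deviance = [100.0 * (e - a) / a for e, a in zip(expected, actual)]
mean_abs_deviance = sum(abs(d) for d in deviance) / len(deviance)
```

Averaging the absolute deviances over the forecast years gives the summary figure reported against the 10% target.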

Relevance:

60.00%

Publisher:

Abstract:

Grass (monocots) and non-grass (dicots) proportions in ruminant diets are important nutritionally because the non-grasses are usually higher in nutritive value, particularly protein, than the grasses, especially in tropical pastures. For ruminants grazing tropical pastures where the grasses are C-4 species and most non-grasses are C-3 species, the ratio of C-13/C-12 in diet and faeces, measured as delta C-13 parts per thousand, is proportional to dietary non-grass%. This paper describes the development of a faecal near infrared (NIR) spectroscopy calibration equation for predicting faecal delta C-13 from which dietary grass and non-grass proportions can be calculated. Calibration development used cattle faeces derived from diets containing only C-3 non-grass and C-4 grass components, and a series of expansion and validation steps was employed to develop robustness and predictive reliability. The final calibration equation contained 1637 samples and a faecal delta C-13 range (parts per thousand) of −12.27 to −27.65. Calibration statistics were: standard error of calibration (SEC) of 0.78, standard error of cross-validation (SECV) of 0.80, standard deviation (SD) of reference values of 3.11 and R-2 of 0.94. Validation statistics for the final calibration equation applied to 60 samples were: standard error of prediction (SEP) of 0.87, bias of -0.15, R-2 of 0.92 and RPD of 3.16. The calibration equation was also tested on faeces from diets containing C-4 non-grass species or temperate C-3 grass species. Faecal delta C-13 predictions indicated that the spectral basis of the calibration was not related to C-13/C-12 ratios per se but to consistent differences between grasses and non-grasses in chemical composition and that the differences were modified by photosynthetic pathway. 
Thus, although the calibration equation could not be used to make valid faecal delta C-13 predictions when the diet contained either C-3 grass or C-4 non-grass, it could be used to make useful estimates of dietary non-grass proportions. It could also be utilised to make useful estimates of non-grass in mixed C-3 grass/non-grass diets by applying a modified formula to calculate non-grass from predicted faecal delta C-13. The development of a robust faecal-NIR calibration equation for estimating non-grass proportions in the diets of grazing cattle demonstrated a novel and useful application of NIR spectroscopy in agriculture.
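The validation statistics quoted above (SEP, bias, SD of reference values, RPD, R-2) can be reproduced on simulated data. The values below are synthetic stand-ins, not the study's measurements:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical reference vs NIR-predicted faecal delta13C values (per mil).
reference = rng.uniform(-27.65, -12.27, size=60)
predicted = reference + rng.normal(scale=0.85, size=60)   # simulated prediction error

resid = predicted - reference
bias = resid.mean()                            # mean prediction error
sep = np.sqrt(np.mean((resid - bias) ** 2))    # standard error of prediction
sd = reference.std(ddof=1)                     # SD of the reference values
rpd = sd / sep                                 # ratio of performance to deviation
r2 = np.corrcoef(reference, predicted)[0, 1] ** 2
```

An RPD of about 3 or more, as reported above, indicates that prediction error is small relative to the natural spread of the reference values.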

Relevance:

60.00%

Publisher:

Abstract:

Acidity in terms of pH and titratable acids influences the texture and flavour of fermented dairy products, such as Kefir. However, the methods for determining pH and titratable acidity (TA) are time consuming. Near infrared (NIR) spectroscopy is a non-destructive method, which simultaneously predicts multiple traits from a single scan and can be used to predict pH and TA. The best pH NIR calibration model was obtained with no spectral pre-treatment applied, whereas smoothing was found to be the best pre-treatment to develop the TA calibration model. Using cross-validation, the prediction results were found acceptable for both pH and TA. With external validation, similar results were found for pH and TA, and both models were found to be acceptable for screening purposes.
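The smoothing pre-treatment mentioned above can be illustrated simply; the abstract does not specify the smoother, so a plain moving average is assumed here as a stand-in:

```python
import numpy as np

# Moving-average smoothing of a spectrum: a simple stand-in for the smoothing
# pre-treatment applied before building the TA calibration model.
def smooth(spectrum, window=5):
    kernel = np.ones(window) / window
    return np.convolve(spectrum, kernel, mode="same")

# Synthetic noisy "spectrum" for illustration only.
noisy = np.sin(np.linspace(0, 3, 200)) + 0.05 * np.random.default_rng(10).normal(size=200)
smoothed = smooth(noisy)
```

Smoothing suppresses point-to-point noise before the calibration model is fitted, at the cost of some spectral resolution.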

Relevance:

60.00%

Publisher:

Abstract:

Hydrogen cyanide (HCN) is a toxic chemical that can potentially cause mild to severe reactions in animals grazing forage sorghum. Developing technologies to monitor the level of HCN in the growing crop would benefit graziers, so that they can move cattle into paddocks with acceptable levels of HCN. In this study, we developed near-infrared spectroscopy (NIRS) calibrations to estimate HCN in forage sorghum and hay. The full NIRS spectral range (400-2498 nm) was used, as well as specific ranges within it, i.e., visible (400-750 nm), shortwave (800-1100 nm) and near-infrared (NIR) (1100-2498 nm). Using the full-spectrum approach and partial least-squares (PLS) regression, the calibration produced a coefficient of determination (R-2) of 0.838 and a standard error of cross-validation (SECV) of 0.040%, while the validation set had an R-2 of 0.824 with a low standard error of prediction (SEP) of 0.047%. When using a multiple linear regression (MLR) approach, the best model (NIR spectra) produced an R-2 of 0.847 with a standard error of calibration (SEC) of 0.050%, and an R-2 of 0.829 with an SEP of 0.057% for the validation set. The MLR models built from these spectral regions all used nine wavelengths. Two specific wavelengths, 2034 and 2458 nm, were of interest, the former associated with C=O carbonyl stretch and the latter with C-N-C stretching. The most accurate PLS and MLR models produced ratios of standard error of prediction to standard deviation of 3.4 and 3.0, respectively, suggesting that the calibrations could be used for screening breeding material. The results indicated that it should be feasible to develop calibrations using PLS or MLR models for a number of users, including breeding programs screening for genotypes with low HCN, as well as graziers monitoring crop status to help with grazing efficiency.
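The MLR calibration/validation split described above, with SEC computed on the calibration set and SEP on a held-out validation set, can be sketched as follows; all absorbances and HCN values are synthetic:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical absorbances at nine selected wavelengths vs HCN content (%).
n = 80
X = rng.normal(size=(n, 9))
y = 0.02 * X[:, 0] + 0.01 * X[:, 1] + 0.05 + rng.normal(scale=0.005, size=n)

train, val = np.arange(60), np.arange(60, 80)     # calibration / validation split
A = np.column_stack([np.ones(train.size), X[train]])
coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)  # MLR fit with intercept

def predict(idx):
    return np.column_stack([np.ones(idx.size), X[idx]]) @ coef

sec = np.sqrt(np.mean((predict(train) - y[train]) ** 2))  # standard error of calibration
sep = np.sqrt(np.mean((predict(val) - y[val]) ** 2))      # standard error of prediction
```

SEC measures fit on the samples used to build the model; SEP, on unseen samples, is the more honest guide to screening performance.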

Relevance:

60.00%

Publisher:

Abstract:

Context. Irregular plagues of house mice cause high production losses in grain crops in Australia. If plagues can be forecast through broad-scale monitoring or model-based prediction, then mice can be proactively controlled by poison baiting. Aims. To predict mouse plagues in grain crops in Queensland and assess the value of broad-scale monitoring. Methods. Regular trapping of mice at the same sites on the Darling Downs in southern Queensland has been undertaken since 1974. This provides an index of abundance over time that can be related to rainfall, crop yield, winter temperature and past mouse abundance. Other sites have been trapped over a shorter time period elsewhere on the Darling Downs and in central Queensland, allowing a comparison of mouse population dynamics and cross-validation of models predicting mouse abundance. Key results. On the regularly trapped 32-km transect on the Darling Downs, damaging mouse densities occur in 50% of years and a plague in 25% of years, with no detectable increase in mean monthly mouse abundance over the past 35 years. High mouse abundance on this transect is not consistently matched by high abundance in the broader area. Annual maximum mouse abundance in autumn–winter can be predicted (R2 = 57%) from spring mouse abundance and autumn–winter rainfall in the previous year. In central Queensland, mouse dynamics contrast with those on the Darling Downs and lack the distinct annual cycle, with peak abundance occurring in any month outside early spring. On average, damaging mouse densities occur in 1 in 3 years and a plague occurs in 1 in 7 years. The dynamics of mouse populations on two transects ~70 km apart were rarely synchronous. Autumn–winter rainfall can indicate mouse abundance in some seasons (R2 = ~52%). Conclusion. Early warning of mouse plague formation in Queensland grain crops from regional models should trigger farm-based monitoring. 
This can be incorporated with rainfall into a simple model predicting future abundance that will determine any need for mouse control. Implications. A model-based warning of a possible mouse plague can highlight the need for local monitoring of mouse activity, which in turn could trigger poison baiting to prevent further mouse build-up.

Relevance:

60.00%

Publisher:

Abstract:

PURPOSE To develop and test decision tree (DT) models to classify physical activity (PA) intensity from accelerometer output and Gross Motor Function Classification System (GMFCS) classification level in ambulatory youth with cerebral palsy (CP); and 2) compare the classification accuracy of the new DT models to that achieved by previously published cut-points for youth with CP. METHODS Youth with CP (GMFCS Levels I - III) (N=51) completed seven activity trials with increasing PA intensity while wearing a portable metabolic system and ActiGraph GT3X accelerometers. DT models were used to identify vertical axis (VA) and vector magnitude (VM) count thresholds corresponding to sedentary (SED) (<1.5 METs), light PA (LPA) (>/=1.5 and <3 METs) and moderate-to-vigorous PA (MVPA) (>/=3 METs). Models were trained and cross-validated using the 'rpart' and 'caret' packages within R. RESULTS For the VA (VA_DT) and VM decision trees (VM_DT), a single threshold differentiated LPA from SED, while the threshold for differentiating MVPA from LPA decreased as the level of impairment increased. The average cross-validation accuracy for the VC_DT was 81.1%, 76.7%, and 82.9% for GMFCS levels I, II, and III, respectively. The corresponding cross-validation accuracy for the VM_DT was 80.5%, 75.6%, and 84.2%, respectively. Within each GMFCS level, the decision tree models achieved better PA intensity recognition than previously published cut-points. The accuracy differential was greatest among GMFCS level III participants, in whom the previously published cut-points misclassified 40% of the MVPA activity trials. CONCLUSION GMFCS-specific cut-points provide more accurate assessments of MVPA levels in youth with CP across the full spectrum of ambulatory ability.
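The count-threshold idea above can be sketched with a single-split decision stump, a simplified numpy stand-in for the 'rpart' decision trees used in the study; all counts and MET values below are simulated:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical accelerometer counts and measured METs (toy monotone relation).
counts = rng.uniform(0, 3000, size=300)
mets = 1.0 + counts / 800.0 + rng.normal(scale=0.3, size=300)
labels = np.digitize(mets, [1.5, 3.0])      # 0 = SED, 1 = LPA, 2 = MVPA

def best_threshold(counts, is_above):
    """Single-split decision stump: the count cut-point minimising misclassification."""
    candidates = np.unique(counts)
    errors = [np.mean((counts >= t) != is_above) for t in candidates]
    return candidates[int(np.argmin(errors))]

sed_lpa_cut = best_threshold(counts, labels >= 1)    # SED vs at-least-light
lpa_mvpa_cut = best_threshold(counts, labels >= 2)   # LPA vs MVPA
```

Fitting such stumps separately per GMFCS level would yield the level-specific MVPA thresholds described in the abstract.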

Relevance:

60.00%

Publisher:

Abstract:

Four species of large mackerels (Scomberomorus spp.) co-occur in the waters off northern Australia and are important to fisheries in the region. State fisheries agencies monitor these species for fisheries assessment; however, data inaccuracies may exist due to difficulties with identification of these closely related species, particularly when specimens are incomplete after fish processing. This study examined the efficacy of using otolith morphometrics to differentiate among the four mackerel species off northeastern Australia and to predict species identity. Seven otolith measurements and five shape indices were recorded from 555 mackerel specimens. Multivariate modelling, including linear discriminant analysis (LDA) and support vector machines, successfully differentiated among the four species based on otolith morphometrics. Cross-validation determined a predictive accuracy of at least 96% for both models. An optimum predictive model for the four mackerel species was an LDA model that included fork length, feret length, feret width, perimeter, area, roundness, form factor and rectangularity as explanatory variables. This analysis may improve the accuracy of fisheries monitoring, the estimates based on this monitoring (e.g. mortality rate) and the overall management of mackerel species in Australia.
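Cross-validated species prediction from morphometrics can be sketched with a nearest-centroid classifier standing in for LDA; the four "species" below are synthetic clusters, not otolith data:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy otolith morphometrics: four species as separated clusters in feature space.
n_per, p = 50, 5
centers = rng.normal(scale=4.0, size=(4, p))
X = np.vstack([c + rng.normal(size=(n_per, p)) for c in centers])
y = np.repeat(np.arange(4), n_per)

def cv_accuracy(X, y, k=5):
    """k-fold cross-validated accuracy of a nearest-centroid classifier."""
    idx = rng.permutation(len(y))
    correct = 0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        cents = np.array([X[train][y[train] == c].mean(axis=0) for c in range(4)])
        dists = ((X[fold][:, None, :] - cents[None]) ** 2).sum(axis=2)
        correct += np.sum(dists.argmin(axis=1) == y[fold])
    return correct / len(y)

acc = cv_accuracy(X, y)
```

Holding out each fold in turn gives an accuracy estimate analogous to the "at least 96%" cross-validation figure above.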

Relevance:

60.00%

Publisher:

Abstract:

BACKGROUND Polygenic risk scores comprising established susceptibility variants have been shown to be informative classifiers for several complex diseases, including prostate cancer. For prostate cancer, it is unknown whether inclusion of genetic markers that have so far not been associated with prostate cancer risk at a genome-wide significant level will improve disease prediction. METHODS We built polygenic risk scores in a large training set comprising over 25,000 individuals. Initially, 65 established prostate cancer susceptibility variants were selected. After LD pruning, additional variants were prioritized based on their association with prostate cancer. Six-fold cross validation was performed to assess genetic risk scores and optimize the number of additional variants to be included. The final model was evaluated in an independent study population including 1,370 cases and 1,239 controls. RESULTS The polygenic risk score with 65 established susceptibility variants provided an area under the curve (AUC) of 0.67. Adding 68 novel variants significantly increased the AUC to 0.68 (P = 0.0012) and the net reclassification index by 0.21 (P = 8.5E-08). All novel variants were located in genomic regions established as associated with prostate cancer risk. CONCLUSIONS Inclusion of additional genetic variants from established prostate cancer susceptibility regions improves disease prediction. Prostate 75:1467–1474, 2015. © 2015 Wiley Periodicals, Inc.
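A polygenic risk score and its AUC can be computed as below; genotypes and per-variant weights are simulated for illustration, and the AUC uses the standard rank-sum (Mann-Whitney U) identity:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical genotypes (0/1/2 risk-allele counts) and per-variant log odds ratios.
n, m = 2000, 65
geno = rng.integers(0, 3, size=(n, m))
weights = rng.normal(scale=0.08, size=m)
prs = geno @ weights                          # polygenic risk score per individual
prob = 1.0 / (1.0 + np.exp(-(prs - prs.mean())))
case = rng.random(n) < prob                   # simulated case/control status

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

score_auc = auc(prs, case)
```

The AUC is the probability that a randomly chosen case has a higher score than a randomly chosen control, which is what the 0.67 vs 0.68 comparison above quantifies.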

Relevance:

60.00%

Publisher:

Abstract:

An estimate of the groundwater budget at the catchment scale is extremely important for the sustainable management of available water resources. Water resources are generally subjected to over-exploitation for agricultural and domestic purposes in agrarian economies like India. The double water-table fluctuation method is a reliable method for calculating the water budget in semi-arid crystalline rock areas. Extensive measurements of water levels from a dense network before and after the monsoon rainfall were made in a 53 km² watershed in southern India and various components of the water balance were then calculated. Later, water level data underwent geostatistical analyses to determine the priority and/or redundancy of each measurement point using a cross-validation method. An optimal network evolved from these analyses. The network was then used in re-calculation of the water-balance components. It was established that such an optimized network provides far fewer measurement points without considerably changing the conclusions regarding groundwater budget. This exercise is helpful in reducing the time and expenditure involved in exhaustive piezometric surveys and also in determining the water budget for large watersheds (greater than 50 km²).
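The cross-validation used above to rank measurement points can be sketched by predicting each point from the remaining network; inverse-distance weighting stands in for the geostatistical interpolator, and the network and levels are synthetic:

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical piezometer locations (km) and pre-monsoon water levels (m).
pts = rng.uniform(0, 8, size=(40, 2))
levels = 320.0 + 0.8 * pts[:, 0] - 0.6 * pts[:, 1] + rng.normal(scale=0.4, size=40)

def idw(train_pts, train_vals, target, power=2.0):
    """Inverse-distance weighting, a simple stand-in for geostatistical interpolation."""
    d = np.linalg.norm(train_pts - target, axis=1)
    w = 1.0 / np.maximum(d, 1e-9) ** power
    return np.sum(w * train_vals) / np.sum(w)

# Leave-one-out cross-validation error at each point; points that are predicted
# well by their neighbours are candidates for removal when optimising the network.
loo_error = np.array([
    abs(idw(np.delete(pts, i, axis=0), np.delete(levels, i), pts[i]) - levels[i])
    for i in range(len(pts))
])
```

Ranking points by `loo_error` identifies redundant piezometers, mirroring the network-optimisation step in the abstract.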

Relevance:

60.00%

Publisher:

Abstract:

The relationship between site characteristics and understorey vegetation composition was analysed with quantitative methods, especially from the viewpoint of site quality estimation. Theoretical models were applied to an empirical data set collected from the upland forests of southern Finland comprising 104 sites dominated by Scots pine (Pinus sylvestris L.), and 165 sites dominated by Norway spruce (Picea abies (L.) Karsten). Site index H100 was used as an independent measure of site quality. A new model for the estimation of site quality at sites with a known understorey vegetation composition was introduced. It is based on the application of Bayes' theorem to the density function of site quality within the study area combined with the species-specific presence-absence response curves. The resulting posterior probability density function may be used for calculating an estimate for the site variable. Using this method, a jackknife estimate of site index H100 was calculated separately for pine- and spruce-dominated sites. The results indicated that the cross-validation root mean squared error (RMSEcv) of the estimates improved from 2.98 m down to 2.34 m relative to the "null" model (standard deviation of the sample distribution) in pine-dominated forests. In spruce-dominated forests RMSEcv decreased from 3.94 m down to 3.16 m. In order to assess these results, four other estimation methods based on understorey vegetation composition were applied to the same data set. The results showed that none of the methods was clearly superior to the others. In pine-dominated forests, RMSEcv varied between 2.34 and 2.47 m, and the corresponding range for spruce-dominated forests was from 3.13 to 3.57 m.
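The Bayes-theorem estimator described above can be sketched directly: a prior density of site index over the area is multiplied by species-specific presence/absence response curves and normalised. The prior, response curves and species observations below are all assumed for illustration:

```python
import numpy as np

# Sketch of the Bayesian site-index estimator: combine a prior density of H100
# with assumed presence/absence response curves of hypothetical indicator species.
h = np.linspace(10, 35, 251)                    # candidate site indices (m)
dh = h[1] - h[0]
prior = np.exp(-0.5 * ((h - 24.0) / 4.0) ** 2)  # assumed Gaussian prior over the area

def presence_prob(h, optimum, tolerance):
    """Assumed Gaussian presence-probability response curve for one species."""
    return 0.9 * np.exp(-0.5 * ((h - optimum) / tolerance) ** 2) + 0.05

# Observed vegetation: species A present, species B absent (both hypothetical).
likelihood = presence_prob(h, 28.0, 3.0) * (1.0 - presence_prob(h, 15.0, 3.0))
posterior = prior * likelihood
posterior /= posterior.sum() * dh               # normalise via Bayes' theorem
h100_estimate = (h * posterior).sum() * dh      # posterior-mean site index
```

Presence of a high-fertility indicator pulls the estimate above the prior mean; absence of a low-fertility indicator sharpens it further.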

Relevance:

60.00%

Publisher:

Abstract:

Gaussian Processes (GPs) are promising Bayesian methods for classification and regression problems. They have also been used for semi-supervised learning tasks. In this paper, we propose a new algorithm for solving the semi-supervised binary classification problem using sparse GP regression (GPR) models. It is closely related to semi-supervised learning based on support vector regression (SVR) and maximum margin clustering. The proposed algorithm is simple and easy to implement. It gives a sparse solution directly, unlike the SVR-based algorithm. Also, the hyperparameters are estimated easily without resorting to the expensive cross-validation technique. Use of a sparse GPR model helps in making the proposed algorithm scalable. Preliminary results on synthetic and real-world data sets demonstrate the efficacy of the new algorithm.
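A minimal numpy sketch of GP regression with an RBF kernel; hyperparameters (length-scale, noise) are fixed here rather than estimated, and the paper's sparse, semi-supervised extension is not reproduced:

```python
import numpy as np

rng = np.random.default_rng(7)

def rbf(a, b, length_scale=0.7):
    """RBF (squared-exponential) kernel between two 1-D point sets."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale ** 2)

x_train = rng.uniform(-3, 3, size=25)
y_train = np.sin(x_train) + rng.normal(scale=0.1, size=25)
x_test = np.linspace(-3, 3, 50)

K = rbf(x_train, x_train) + 0.01 * np.eye(25)   # kernel matrix plus noise term
alpha = np.linalg.solve(K, y_train)
gp_mean = rbf(x_test, x_train) @ alpha          # GP posterior-mean prediction
```

Sparse GPR approximates `K` through a small set of inducing points, which is what makes the proposed algorithm scalable.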

Relevance:

60.00%

Publisher:

Abstract:

In this paper, the reduced level of rock at Bangalore, India, is derived from data for 652 boreholes in an area covering 220 sq.km. To predict the reduced level of rock in the subsurface of Bangalore and to study the spatial variability of the rock depth, ordinary kriging and Support Vector Machine (SVM) models have been developed. In ordinary kriging, knowledge of the semivariogram of the reduced level of rock from the 652 points in Bangalore is used to predict the reduced level of rock at any point in the subsurface where field measurements are not available. A cross-validation (Q1 and Q2) analysis is also done for the developed ordinary kriging model. The SVM, a novel type of learning machine based on statistical learning theory, uses a regression technique with an ε-insensitive loss function to predict the reduced level of rock from a large data set. A comparison between the ordinary kriging and SVM models demonstrates that the SVM is superior to ordinary kriging in predicting rock depth.
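The ε-insensitive loss that distinguishes SVM regression from least-squares methods such as kriging can be stated in a couple of lines:

```python
import numpy as np

# Vapnik's epsilon-insensitive loss used by SVM regression: residuals inside
# the eps-tube incur no penalty, unlike the squared loss implicit in kriging.
def eps_insensitive(residuals, eps=0.1):
    return np.maximum(np.abs(residuals) - eps, 0.0)

loss = eps_insensitive(np.array([-0.30, -0.05, 0.0, 0.08, 0.25]))
# loss -> [0.2, 0.0, 0.0, 0.0, 0.15]
```

Ignoring residuals inside the tube yields a sparse set of support vectors, which is why SVM regression can generalise well from large borehole data sets.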

Relevance:

60.00%

Publisher:

Abstract:

The inverse problem in diffuse optical tomography is known to be nonlinear, ill-posed, and sometimes under-determined, requiring regularization to obtain meaningful results, with Tikhonov-type regularization being the most popular. The choice of the regularization parameter dictates the reconstructed optical image quality and is typically made empirically or based on prior experience. An automated method for optimal selection of the regularization parameter, based on the regularized minimal residual method (MRM), is proposed and compared with the traditional generalized cross-validation method. The results obtained using numerical and gelatin phantom data indicate that the MRM-based method is capable of providing the optimal regularization parameter. (C) 2012 Society of Photo-Optical Instrumentation Engineers (SPIE). DOI: 10.1117/1.JBO.17.10.106015
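Generalized cross-validation, the baseline method above, selects the Tikhonov parameter by minimising the GCV score; a numpy sketch on a synthetic ill-conditioned problem (the Jacobian `J` and noise level are assumptions, not the paper's phantom data):

```python
import numpy as np

rng = np.random.default_rng(8)

# Ill-conditioned linear inverse problem y = J x + noise.
m, n = 50, 30
J = rng.normal(size=(m, n)) @ np.diag(1.0 / np.arange(1, n + 1) ** 2)
x_true = rng.normal(size=n)
y = J @ x_true + rng.normal(scale=0.01, size=m)

def gcv_score(lam):
    """GCV score for the Tikhonov solution x = (J'J + lam I)^{-1} J' y."""
    A = J.T @ J + lam * np.eye(n)
    x = np.linalg.solve(A, J.T @ y)
    trace_H = np.trace(J @ np.linalg.solve(A, J.T))   # effective dof of the fit
    return m * np.sum((J @ x - y) ** 2) / (m - trace_H) ** 2

lams = np.logspace(-8, 0, 33)
best_lam = lams[int(np.argmin([gcv_score(l) for l in lams]))]
```

The MRM-based method proposed in the paper replaces this GCV score with a regularized minimal-residual criterion but follows the same pattern of scanning candidate parameters.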

Relevance:

60.00%

Publisher:

Abstract:

Purpose: To develop a computationally efficient automated method for the optimal choice of regularization parameter in diffuse optical tomography. Methods: The least-squares QR (LSQR)-type method that uses Lanczos bidiagonalization is known to be computationally efficient in performing the reconstruction procedure in diffuse optical tomography. It is deployed here via an optimization procedure that uses the simplex method to find the optimal regularization parameter. The proposed LSQR-type method is compared with traditional methods such as the L-curve, generalized cross-validation (GCV), and the recently proposed minimal residual method (MRM)-based choice of regularization parameter, using numerical and experimental phantom data. Results: The results indicate that the proposed LSQR-type and MRM-based methods perform similarly in terms of reconstructed image quality, and that both are superior to the L-curve and GCV-based methods. The computational complexity of the proposed method is at least five times lower than that of the MRM-based method, making it an optimal technique. Conclusions: The LSQR-type method overcomes the computationally expensive nature of the MRM-based automated choice of the optimal regularization parameter in diffuse optical tomographic imaging, making it more suitable for real-time deployment. (C) 2013 American Association of Physicists in Medicine. http://dx.doi.org/10.1118/1.4792459
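The Lanczos (Golub-Kahan) bidiagonalisation at the heart of LSQR-type solvers can be sketched directly; the matrix below is a random stand-in for the Jacobian, and the simplex search over the regularization parameter is omitted:

```python
import numpy as np

rng = np.random.default_rng(9)

# A few steps of Golub-Kahan (Lanczos) bidiagonalisation, the core of LSQR-type
# solvers: A is reduced to a small lower-bidiagonal matrix B with A @ V = U @ B,
# on which regularized least-squares solves become cheap.
A = rng.normal(size=(40, 25))
b = rng.normal(size=40)
k = 6                                       # number of bidiagonalisation steps

m, n = A.shape
U = np.zeros((m, k + 1)); V = np.zeros((n, k))
alpha = np.zeros(k); beta = np.zeros(k + 1)
U[:, 0] = b / np.linalg.norm(b)
for j in range(k):
    w = A.T @ U[:, j] - (beta[j] * V[:, j - 1] if j > 0 else 0.0)
    alpha[j] = np.linalg.norm(w); V[:, j] = w / alpha[j]
    w = A @ V[:, j] - alpha[j] * U[:, j]
    beta[j + 1] = np.linalg.norm(w); U[:, j + 1] = w / beta[j + 1]

B = np.zeros((k + 1, k))
for j in range(k):
    B[j, j] = alpha[j]; B[j + 1, j] = beta[j + 1]
relation_err = np.linalg.norm(A @ V - U @ B)   # should be near machine precision
```

Because `B` is only (k+1) x k, solving the projected regularized problem for many candidate parameters is cheap, which is the source of the speed-up over the MRM-based method.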