917 results for Cross-validation


Relevance: 60.00%

Abstract:

The majority of distribution utilities do not have accurate information on the constituents of their loads. This information is very useful in managing and planning the network adequately and economically. Customer loads are normally categorized into three main sectors: 1) residential; 2) industrial; and 3) commercial. In this paper, penalized least-squares regression and Euclidean distance methods are developed for this application to identify and quantify the makeup of a feeder load with unknown sectors/subsectors. This process is done on a monthly basis to account for seasonal and other load changes. The error between the actual and estimated load profiles is used as a benchmark of accuracy. This approach has been shown to be accurate in identifying customer types in unknown load profiles, and is used in cross-validation of the results and initial assumptions.

Relevance: 60.00%

Abstract:

Background: Predicting protein subnuclear localization is a challenging problem. Some previous works based on non-sequence information, including Gene Ontology annotations and kernel fusion, have respective limitations. The aim of this work is twofold: the first is to propose a novel individual feature extraction method; the second is to develop an ensemble method to improve prediction performance using comprehensive information represented in the form of a high-dimensional feature vector obtained by 11 feature extraction methods. Methodology/Principal Findings: A novel two-stage multiclass support vector machine is proposed to predict protein subnuclear localizations. It only considers those feature extraction methods based on amino acid classifications and physicochemical properties. In order to speed up our system, an automatic search method for the kernel parameter is used. The prediction performance of our method is evaluated on four datasets: the Lei dataset, the multi-localization dataset, the SNL9 dataset and a new independent dataset. The overall accuracy of prediction for 6 localizations on the Lei dataset is 75.2% and that for 9 localizations on the SNL9 dataset is 72.1% in the leave-one-out cross validation, 71.7% for the multi-localization dataset and 69.8% for the new independent dataset, respectively. Comparisons with existing methods show that our method performs better for both single-localization and multi-localization proteins and achieves more balanced sensitivities and specificities on large-size and small-size subcellular localizations. The overall accuracy improvements are 4.0% and 4.7% for single-localization proteins and 6.5% for multi-localization proteins. The reliability and stability of our classification model are further confirmed by permutation analysis. Conclusions: It can be concluded that our method is effective and valuable for predicting protein subnuclear localizations. A web server has been designed to implement the proposed method. It is freely available at http://bioinformatics.awowshop.com/snlpred_page.php.

Relevance: 60.00%

Abstract:

Early detection, clinical management and disease recurrence monitoring are critical areas in cancer treatment, and specific biomarker panels are likely to be very important in each of them. We have previously demonstrated that levels of alpha-2-heremans-schmid-glycoprotein (AHSG), complement component C3 (C3), clusterin (CLI), haptoglobin (HP) and serum amyloid A (SAA) are significantly altered in serum from patients with squamous cell carcinoma of the lung. Here, we report the abundance levels for these proteins in serum samples from patients with advanced breast cancer, colorectal cancer (CRC) and lung cancer compared to healthy controls (age and gender matched) using commercially available enzyme-linked immunosorbent assay kits. Logistic regression (LR) models were fitted to the resulting data, and the classification ability of the proteins was evaluated using receiver-operating characteristic curve analysis and leave-one-out cross-validation (LOOCV). The most accurate individual candidate biomarkers were C3 for breast cancer [area under the curve (AUC) = 0.89, LOOCV = 73%], CLI for CRC (AUC = 0.98, LOOCV = 90%), HP for small cell lung carcinoma (AUC = 0.97, LOOCV = 88%), C3 for lung adenocarcinoma (AUC = 0.94, LOOCV = 89%) and HP for squamous cell carcinoma of the lung (AUC = 0.94, LOOCV = 87%). The best dual combinations of biomarkers using LR analysis were found to be AHSG + C3 (AUC = 0.91, LOOCV = 83%) for breast cancer, CLI + HP (AUC = 0.98, LOOCV = 92%) for CRC, C3 + SAA (AUC = 0.97, LOOCV = 91%) for small cell lung carcinoma and HP + SAA for both adenocarcinoma (AUC = 0.98, LOOCV = 96%) and squamous cell carcinoma of the lung (AUC = 0.98, LOOCV = 84%). The high AUC values reported here indicate that these candidate biomarkers have the potential to discriminate accurately between control and cancer groups both individually and in combination with other proteins. Copyright © 2011 UICC.

Relevance: 60.00%

Abstract:

OBJECTIVE: This study explored gene expression differences in predicting response to chemoradiotherapy in esophageal cancer. PURPOSE: A major pathological response to neoadjuvant chemoradiation is observed in about 40% of esophageal cancer patients and is associated with favorable outcomes. However, patients with tumors of similar histology, differentiation, and stage can have vastly different responses to the same neoadjuvant therapy. This dichotomy may be due to differences in the molecular genetic environment of the tumor cells. BACKGROUND DATA: Diagnostic biopsies were obtained from a training cohort of esophageal cancer patients (13), and extracted RNA was hybridized to genome expression microarrays. The resulting gene expression data were verified by qRT-PCR. In a larger, independent validation cohort (27), we examined differential gene expression by qRT-PCR. The ability of differentially regulated genes to predict response to therapy was assessed in a multivariate leave-one-out cross-validation model. RESULTS: Although 411 genes were differentially expressed between normal and tumor tissue, only 103 genes were altered between responder and non-responder tumors, and 67 genes were differentially expressed >2-fold. These included genes previously reported in esophageal cancer and a number of novel genes. In the validation cohort, 8 of 12 selected genes were significantly different between the response groups. In the predictive model, 5 of 8 genes could predict response to therapy with 95% accuracy in a subset (74%) of patients. CONCLUSIONS: This study has identified a gene microarray pattern and a set of genes associated with response to neoadjuvant chemoradiation in esophageal cancer. The potential of these genes as biomarkers of response to treatment warrants further investigation. Copyright © 2009 by Lippincott Williams & Wilkins.

Relevance: 60.00%

Abstract:

A generalised gamma bidding model is presented, which incorporates many previous models. The log likelihood equations are provided. Using a new method of testing, variants of the model are fitted to some real data for construction contract auctions to find the best fitting models for groupings of bidders. The results are examined for simplifying assumptions, including all those in the main literature. These indicate no one model to be best for all datasets. However, some models do appear to perform significantly better than others and it is suggested that future research would benefit from a closer examination of these.

Relevance: 60.00%

Abstract:

The study investigated the influence of traffic and land use parameters on metal build-up on urban road surfaces. Mathematical relationships were developed to predict metals originating from fuel combustion and vehicle wear. The analysis found that nickel and chromium originate from exhaust emissions; lead, copper and zinc from vehicle wear; cadmium from both exhaust and wear; and manganese from geogenic sources. Land use does not demonstrate a clear pattern in relation to the metal build-up process, though its inherent characteristics, such as traffic activities, exert influence. The equation derived for fuel-related metal load has a high cross-validated coefficient of determination (Q²) and low Standard Error of Cross-Validation (SECV) values, indicating that the model is reliable, while the equation derived for wear-related metal load has low Q² and high SECV values, suggesting its use only in preliminary investigations. Relative Prediction Error values for both equations are considered to be well within the error limits for a complex system such as an urban road surface. These equations will be beneficial for developing reliable stormwater treatment strategies in urban areas which specifically focus on mitigation of metal pollution.
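
The two cross-validation statistics named in this abstract can be illustrated with a minimal sketch. The abstract does not give its formulas, so the code assumes one common convention: leave-one-out predictions, Q² = 1 - PRESS / SS_tot, and SECV = sqrt(PRESS / n). The single-predictor model and all function names are hypothetical stand-ins, not the study's equations.

```python
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def loo_predictions(xs, ys):
    """Predict each observation from a model fitted without it."""
    preds = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(a + b * xs[i])
    return preds


def q2_and_secv(ys, preds):
    """Assumed convention: Q2 = 1 - PRESS/SS_tot, SECV = sqrt(PRESS/n)."""
    press = sum((y - p) ** 2 for y, p in zip(ys, preds))
    y_bar = mean(ys)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - press / ss_tot, sqrt(press / len(ys))
```

Under this convention, a Q² close to 1 together with a small SECV means the held-out observations are predicted almost as well as the fitted ones, which is the pattern the abstract reports for the fuel-related equation.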

Relevance: 60.00%

Abstract:

Spatially-explicit modelling of grassland classes is important to site-specific planning for improving grassland and environmental management over large areas. In this study, a climate-based grassland classification model, the Comprehensive and Sequential Classification System (CSCS) was integrated with spatially interpolated climate data to classify grassland in Gansu province, China. The study area is characterized by complex topographic features imposed by plateaus, high mountains, basins and deserts. To improve the quality of the interpolated climate data and the quality of the spatial classification over this complex topography, three linear regression methods, namely an analytic method based on multiple regression and residues (AMMRR), a modification of the AMMRR method through adding the effect of slope and aspect to the interpolation analysis (M-AMMRR) and a method which replaces the IDW approach for residue interpolation in M-AMMRR with an ordinary kriging approach (I-AMMRR), for interpolating climate variables were evaluated. The interpolation outcomes from the best interpolation method were then used in the CSCS model to classify the grassland in the study area. Climate variables interpolated included the annual cumulative temperature and annual total precipitation. The results indicated that the AMMRR and M-AMMRR methods generated acceptable climate surfaces but the best model fit and cross validation result were achieved by the I-AMMRR method. Twenty-six grassland classes were classified for the study area. The four grassland vegetation classes that covered more than half of the total study area were "cool temperate-arid temperate zonal semi-desert", "cool temperate-humid forest steppe and deciduous broad-leaved forest", "temperate-extra-arid temperate zonal desert", and "frigid per-humid rain tundra and alpine meadow". 
The vegetation classification map generated in this study provides spatial information on the locations and extents of the different grassland classes. This information can be used to facilitate government agencies' decision-making in land-use planning and environmental management, and for vegetation and biodiversity conservation. The information can also be used to assist land managers in the estimation of safe carrying capacities which will help to prevent overgrazing and land degradation.

Relevance: 60.00%

Abstract:

OBJECTIVES: To compare the classification accuracy of previously published RT3 accelerometer cut-points for youth using energy expenditure, measured via portable indirect calorimetry, as a criterion measure. DESIGN: Cross-sectional cross-validation study. METHODS: 100 children (mean age 11.2±2.8 years, 61% male) completed 12 standardized activity trials (3 sedentary, 5 lifestyle and 4 ambulatory) while wearing an RT3 accelerometer. V̇O2 was measured concurrently using the Oxycon Mobile portable calorimeter. Cut-points by Vanhelst (VH), Rowlands (RW), Chu (CH), Kavouras (KV) and the RT3 manufacturer (RT3M) were used to classify PA intensity as sedentary (SED), light (LPA), moderate (MPA) or vigorous (VPA). Classification accuracy was evaluated using the area under the Receiver Operating Characteristic curve (ROC-AUC) and weighted Kappa (κ). RESULTS: For moderate-to-vigorous PA (MVPA), VH, KV and RW exhibited excellent classification accuracy (ROC-AUC≥0.90), while CH and RT3M exhibited good classification accuracy (ROC-AUC>0.80). Classification accuracy for LPA was fair to poor (ROC-AUC<0.76). For SED, VH exhibited excellent classification accuracy (ROC-AUC>0.90), while RW, CH, and RT3M exhibited good classification accuracy (ROC-AUC>0.80). Kappa statistics ranged from 0.67 (VH) to 0.55 (CH). CONCLUSIONS: All cut-points provided acceptable classification accuracy for SED and MVPA, but limited accuracy for LPA. On the basis of classification accuracy over all four levels of intensity, the use of the VH cut-points is recommended.
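
As a rough illustration of the ROC-AUC statistic used above, the sketch below computes it as the probability that a randomly chosen positive epoch has a higher score than a randomly chosen negative one, counting ties as one half. The function name and the example values are hypothetical; the study's own software and data are not described in the abstract.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: numeric values (for example, accelerometer counts per epoch).
    labels: 1 if the criterion measure marks the epoch as positive, else 0.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical example: four epochs, two of them criterion-positive.
example_auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The pairwise form is quadratic in the number of epochs, which is acceptable for a sketch; a rank-based implementation would be preferable for large samples.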

Relevance: 60.00%

Abstract:

The purpose of this study was to derive ActiGraph cut-points for sedentary (SED), light-intensity physical activity (LPA), and moderate-to-vigorous physical activity (MVPA) in toddlers and evaluate their validity in an independent sample. The predictive validity of established preschool cut-points was also evaluated and compared. Twenty-two toddlers (mean age = 2.1 years ± 0.4 years) wore an ActiGraph accelerometer during a videotaped 20-min play period. Videos were subsequently coded for physical activity (PA) intensity using the modified Children's Activity Rating Scale (CARS). Receiver operating characteristic (ROC) curve analyses were conducted to determine cut-points. Predictive validity was assessed in an independent sample of 18 toddlers (mean age = 2.3 ± 0.4 years). From the ROC curve analyses, the 15-s count ranges corresponding to SED, LPA, and MVPA were 0–48, 49–418, and >418 counts/15 s, respectively. Classification accuracy was fair for the SED threshold (ROC-AUC = 0.74, 95% confidence interval = 0.71–0.76) and excellent for the MVPA threshold (ROC-AUC = 0.90, 95% confidence interval = 0.88–0.92). In the cross-validation sample, the toddler cut-point and established preschool cut-points significantly overestimated time spent in SED and underestimated time spent in LPA. For MVPA, mean differences between observed and predicted values for the toddler and Pate cut-points were not significantly different from zero. In summary, the ActiGraph accelerometer can provide useful group-level estimates of MVPA in toddlers. The results support the use of the Pate cut-point of 420 counts/15 s for MVPA.
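
The abstract does not state which ROC criterion was used to pick the cut-points. As one possible reading, the sketch below selects the threshold that maximises Youden's J (sensitivity + specificity - 1); this criterion, the function name and the example counts are all assumptions made for illustration.

```python
def youden_cut_point(counts, is_active):
    """Return (threshold, sensitivity, specificity) maximising Youden's J.

    counts:    accelerometer counts per epoch.
    is_active: 1 if direct observation coded the epoch as active, else 0.
    An epoch is classified as active when its count is >= threshold.
    """
    n_pos = sum(1 for a in is_active if a)
    n_neg = len(is_active) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both active and inactive epochs")
    best = None
    for t in sorted(set(counts)):
        tp = sum(1 for c, a in zip(counts, is_active) if a and c >= t)
        tn = sum(1 for c, a in zip(counts, is_active) if not a and c < t)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1.0
        if best is None or j > best[0]:   # keep the lowest threshold on ties
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical epochs: four inactive followed by four active.
cut, sens, spec = youden_cut_point(
    [10, 50, 120, 300, 450, 500, 700, 900],
    [0, 0, 0, 0, 1, 1, 1, 1],
)
```

A threshold derived this way still needs to be checked on an independent sample, which is the role of the cross-validation group of 18 toddlers in the study.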

Relevance: 60.00%

Abstract:

The unique physical and movement characteristics of children necessitate the development of accelerometer equations and cut points that are population specific. The purpose of this study is to develop an ecologically valid cut point for the Biotrainer Pro monitor that reflects a threshold for moderate-intensity physical activity in elementary school children. A sample of 30 children (ages 8-12) wore a Biotrainer monitor while completing a series of 7 movement tasks (calibration phase) and while participating in an organized group activity (cross-validation phase). Videotapes from each session were processed using a computerized direct-observation technique to provide a criterion measure of physical activity. Analyses involved the use of mixed-model regression and receiver operator characteristic (ROC) curves. The results indicated that a cut point of 4 counts/min provides the optimal balance between the related needs for sensitivity (accurately detecting activity) and specificity (limiting misclassification of activity as inactivity). Results with the cross-validation data demonstrated that this value yielded the best overall kappa (.58) and a high classification agreement (84%) for activity determination. The specificity of 93% demonstrates that the proposed cut point can accurately detect activity; however, the lower sensitivity value of 61% suggests that some minutes of activity might be incorrectly classified as inactivity. The cut point of 4 counts/min provides an ecologically valid cut point to capture physical activity in children using the Biotrainer Pro activity monitor.

Relevance: 60.00%

Abstract:

The concentrations of Na, K, Ca, Mg, Ba, Sr, Fe, Al, Mn, Zn, Pb, Cu, Ni, Cr, Co, Se, U and Ti were determined in the osteoderms and/or flesh of estuarine crocodiles (Crocodylus porosus) captured in three adjacent catchments within the Alligator Rivers Region (ARR) of northern Australia. Results from multivariate analysis of variance showed that when all metals were considered simultaneously, catchment effects were significant (P≤0.05). Despite considerable within-catchment variability, linear discriminant analysis (LDA) showed that differences in elemental signatures in the osteoderms and/or flesh of C. porosus amongst the catchments were sufficient to classify individuals accurately to their catchment of occurrence. Using cross-validation, the accuracy of classifying a crocodile to its catchment of occurrence was 76% for osteoderms and 60% for flesh. These data suggest that osteoderms provide better predictive accuracy than flesh for discriminating crocodiles amongst catchments. There was no advantage in combining the osteoderm and flesh results to increase the accuracy of classification (i.e. 67%). Based on the discriminant function coefficients for the osteoderm data, Ca, Co, Mg and U were the most important elements for discriminating amongst the three catchments. For flesh data, Ca, K, Mg, Na, Ni and Pb were the most important metals for discriminating amongst the catchments. Reasons for differences in the elemental signatures of crocodiles between catchments are generally not interpretable, due to limited data on surface water and sediment chemistry of the catchments or chemical composition of dietary items of C. porosus. From a wildlife management perspective, the provenance or source catchment(s) of 'problem' crocodiles captured at settlements or recreational areas along the ARR coastline may be established using catchment-specific elemental signatures. 
If the incidence of problem crocodiles can be reduced in settled or recreational areas by effective management at their source, then public safety concerns about these predators may be moderated, as well as the cost of their capture and removal. Copyright © 2002 Elsevier Science B.V.

Relevance: 60.00%

Abstract:

Brain decoding of functional Magnetic Resonance Imaging data is a pattern analysis task that links brain activity patterns to the experimental conditions. Classifiers predict the neural states from the spatial and temporal pattern of brain activity extracted from multiple voxels in the functional images over a certain period of time. The prediction results offer insight into the nature of neural representations and cognitive mechanisms, and the classification accuracy determines our confidence in understanding the relationship between brain activity and stimuli. In this paper, we compared the efficacy of three machine learning algorithms (neural networks, support vector machines, and conditional random fields) in decoding the visual stimuli or neural cognitive states from functional Magnetic Resonance data. Leave-one-out cross validation was performed to quantify the generalization accuracy of each algorithm on unseen data. The results indicated that support vector machines and conditional random fields have comparable performance and that the potential of the latter is worthy of further investigation.
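
The leave-one-out procedure mentioned above can be sketched as follows. Each sample is held out once, a classifier is trained on the rest, and accuracy is the fraction of held-out samples predicted correctly. The nearest-centroid classifier is a deliberately simple stand-in chosen for this sketch; it is not one of the three algorithms compared in the paper, and the function names are hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class mean is closest to x (squared distance)."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1

    def dist2(label):
        return sum((s / counts[label] - v) ** 2 for s, v in zip(sums[label], x))

    return min(sums, key=dist2)


def loocv_accuracy(X, y, predict=nearest_centroid_predict):
    """Leave-one-out accuracy: train on n-1 samples, test on the one left out."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            hits += 1
    return hits / len(X)
```

Any classifier with the same `predict(train_X, train_y, x)` shape can be passed in, so the same loop could compare several algorithms on identical held-out samples.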

Relevance: 60.00%

Abstract:

Thin plate spline finite element methods are used to fit a surface to an irregularly scattered dataset [S. Roberts, M. Hegland, and I. Altas. Approximation of a Thin Plate Spline Smoother using Continuous Piecewise Polynomial Functions. SIAM, 1:208--234, 2003]. The computational bottleneck for this algorithm is the solution of large, ill-conditioned systems of linear equations at each step of a generalised cross validation algorithm. Preconditioning techniques are investigated to accelerate the convergence of the solution of these systems using Krylov subspace methods. The preconditioners under consideration are block diagonal, block triangular and constraint preconditioners [M. Benzi, G. H. Golub, and J. Liesen. Numerical solution of saddle point problems. Acta Numer., 14:1--137, 2005]. The effectiveness of each of these preconditioners is examined on a sample dataset taken from a known surface. From our numerical investigation, constraint preconditioners appear to provide improved convergence for this surface fitting problem compared to block preconditioners.
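
For orientation, the generalised cross validation score for a linear smoother is usually written in the standard form below. The notation is an assumption of this note and may differ from that of the cited papers: y is the vector of n data values, λ the smoothing parameter, and A(λ) the influence matrix that maps the data to the fitted values.

```latex
V(\lambda) \;=\;
\frac{\tfrac{1}{n}\,\bigl\lVert \bigl(I - A(\lambda)\bigr)\,y \bigr\rVert^{2}}
     {\Bigl[\tfrac{1}{n}\,\operatorname{tr}\bigl(I - A(\lambda)\bigr)\Bigr]^{2}}
```

Minimising V(λ) means evaluating it for several values of λ, and each evaluation requires solving the smoother's linear system, which is consistent with the abstract's description of a solve at each step of the algorithm.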

Relevance: 60.00%

Abstract:

Existing crowd counting algorithms rely on holistic, local or histogram based features to capture crowd properties. Regression is then employed to estimate the crowd size. Insufficient testing across multiple datasets has made it difficult to compare and contrast different methodologies. This paper presents an evaluation across multiple datasets to compare holistic, local and histogram based methods, and to compare various image features and regression models. A K-fold cross validation protocol is followed to evaluate the performance across five public datasets: UCSD, PETS 2009, Fudan, Mall and Grand Central datasets. Image features are categorised into five types: size, shape, edges, keypoints and textures. The regression models evaluated are: Gaussian process regression (GPR), linear regression, K nearest neighbours (KNN) and neural networks (NN). The results demonstrate that local features outperform equivalent holistic and histogram based features; optimal performance is observed using all image features except for textures; and that GPR outperforms linear, KNN and NN regression.
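
A K-fold protocol of the kind mentioned above splits the samples into K disjoint folds and uses each fold once for testing while training on the rest. The sketch below shows only the index bookkeeping; the function name, the shuffling step and the fixed seed are assumptions, since the abstract does not describe how the folds were formed.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]   # k near-equal, disjoint folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Example: 10 samples, 5 folds; each sample is tested exactly once.
splits = list(k_fold_indices(10, 5))
```

Reusing the same splits for every feature set and regression model keeps the comparison between methods like-for-like.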

Relevance: 60.00%

Abstract:

Research problem: Overfitting and collinearity problems commonly exist in current construction cost estimation applications and obstruct researchers and practitioners in achieving better modelling results. Research objective and method: A hybrid approach of Akaike information criterion (AIC) stepwise regression and principal component regression (PCR) is proposed to help solve overfitting and collinearity problems. Utilization of this approach in linear regression is validated by comparing it with other commonly used approaches. The mean square error obtained by leave-one-out cross validation (MSE_LOOCV) is used for model selection when choosing predictor variables.
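
For ordinary least squares, the leave-one-out mean square error can be obtained from a single fit, because the held-out residual for observation i equals e_i / (1 - h_ii), where e_i is the ordinary residual and h_ii the leverage. The sketch below applies this identity to a one-predictor model and uses the result to pick between candidate predictors. It illustrates the selection criterion only, not the hybrid AIC and PCR approach proposed in the paper; the function names and variable names are hypothetical.

```python
from statistics import mean


def mse_loocv(xs, ys):
    """Leave-one-out MSE for y = a + b*x without refitting n times."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    total = 0.0
    for x, y in zip(xs, ys):
        leverage = 1.0 / n + (x - x_bar) ** 2 / sxx    # h_ii for simple regression
        loo_residual = (y - (a + b * x)) / (1.0 - leverage)
        total += loo_residual ** 2
    return total / n


def select_predictor(candidates, ys):
    """Return the name of the candidate column with the lowest LOOCV MSE."""
    return min(candidates, key=lambda name: mse_loocv(candidates[name], ys))
```

Because the criterion is computed on held-out observations, a predictor that only fits noise is penalised, which is why it is a reasonable guard against overfitting in variable selection.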