929 results for PREDICTIVE PERFORMANCE


Abstract:

Strategies are compared for developing a linear regression model with stochastic (multivariate normal) regressor variables and subsequently assessing its predictive ability. The bias and mean squared error of four estimators of predictive performance are evaluated in samples simulated from 32 population correlation matrices. Models including all available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated combine one of two stopping rules, $C_p$ or $S_p$, with either an 'all possible subsets' or a 'forward selection' search. The performance estimators include parametric ($\mathrm{MSEP}_m$) and non-parametric (PRESS) assessments computed on the entire sample, and two data-splitting estimates restricted to a random or a balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures.

The subset selection techniques examined do not generally improve predictions relative to the full model. 'Forward selection' approaches yield slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches, but no differences are detected between the performances of $C_p$ and $S_p$. In every case, the prediction errors of models obtained by subset selection in either half split exceed those obtained using all predictors and the entire sample.

Only the random-split estimator is conditionally (on $\beta$) unbiased; however, $\mathrm{MSEP}_m$ is unbiased on average, and PRESS nearly so, in unselected (fixed-form) models. When subset selection techniques are used, $\mathrm{MSEP}_m$ and PRESS always underestimate prediction errors, by as much as 27 percent on average in small samples. Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent lower than that of the unbiased random-split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and seems of little value when the regressor variables are stochastic.

To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development and that a leave-one-out statistic (e.g. PRESS) be used for assessment.
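PRESS, the leave-one-out statistic recommended above, does not require n separate refits: for ordinary least squares, the deleted residual equals $e_i/(1-h_{ii})$, where $h_{ii}$ is the i-th diagonal element of the hat matrix. The Python/NumPy sketch below assumes a full-rank design matrix that already contains an intercept column; the function names and the synthetic data are illustrative only, and the brute-force version is included just to check the shortcut.

```python
import numpy as np


def press_statistic(X, y):
    """PRESS for OLS via the hat-matrix shortcut (no refitting)."""
    # Hat matrix H = X (X'X)^{-1} X', obtained here through the pseudo-inverse of X
    H = X @ np.linalg.pinv(X)
    residuals = y - H @ y
    leverage = np.diag(H)
    # Deleted (leave-one-out) residuals are e_i / (1 - h_ii)
    return float(np.sum((residuals / (1.0 - leverage)) ** 2))


def press_by_refitting(X, y):
    """Brute-force PRESS: refit the model n times, leaving one case out each time."""
    total = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        total += float((y[i] - X[i] @ beta) ** 2)
    return total


# Synthetic data: intercept plus four normally distributed (stochastic) regressors
rng = np.random.default_rng(0)
n, p = 30, 4
X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])
y = X @ np.array([1.0, 0.5, -0.3, 0.0, 0.2]) + rng.standard_normal(n)

print(press_statistic(X, y), press_by_refitting(X, y))
```

Both functions return the same value up to rounding, which is why PRESS stays cheap even when the whole sample is used for model development.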

Abstract:

During the last 40 years, decision support tools for selecting plant species for ecological restoration in Spain have been based on species distribution models (also called ecological niche models), which estimate the probability of occurrence of a species as a function of environmental predictors (e.g., climate, soil). This Thesis proposes several methodological improvements intended to increase the predictive performance of such models, given the data currently available in Spain and focusing on the application of the models to species selection for ecological restoration.

Fine-grained species distribution data are required to train models for use at the scale of ecological restoration projects, but such data are not always available for every species. Coarse-grained data, on the other hand, are available for almost every plant species in Spain. A recalibration method is proposed that updates a coarse-grained logistic regression model using a new fine-grained updating sample. The method achieves acceptable predictive performance with a relatively small updating sample (25 occurrences of the species), in contrast with the much larger samples (more than 100 occurrences) required by a conventional modeling approach that does not use the previous coarse-grained model.

The choice of statistical method can have a dramatic effect on model performance, so comparisons of methods have received much attention in the last decade. Previous studies found logistic regression to perform worse than newer methods such as maximum entropy models. The results of this Thesis show that the observed difference arises because maximum entropy models include regularization techniques, whereas the versions of logistic regression used in the comparisons did not. Once regularization is added to logistic regression through a penalization procedure, the differences in predictive performance disappear. Penalized logistic regression can therefore be considered on a par with the best-performing methods for modeling species distributions, such as maximum entropy.

Species distribution models usually omit soil-related predictors because direct measurements of soil physical or chemical properties are often lacking. Including coarse-grained soil data from national or continental soil maps could be a reasonable alternative. The results of this Thesis suggest that the predictive performance of fine-grained species distribution models increases slightly but statistically significantly when soil predictors from coarse-grained soil maps are included.

Model validation is a key stage in developing empirical models such as species distribution models. Validation usually evaluates model performance for each species separately, i.e., by comparing observed species presences or absences with predicted probabilities at a set of sites. This kind of evaluation does not answer a key question in ecological restoration projects: which n species are the most suitable for the site to be restored? A method is proposed to address this question by estimating the ability of a set of models to discriminate between present and absent species at an evaluation site. The method has been successfully applied to the validation of 188 distribution models of woody species used to support species selection for ecological restoration in Spain.

The proposed methodological approaches improve the predictive performance of species distribution models applied to species selection in ecological restoration, and increase the number of species for which a model can be fitted to support decisions.
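The finding above is that a penalty is what closes the gap between logistic regression and maximum entropy. The abstract does not say which penalty was used, so the sketch below assumes an L2 (ridge) penalty, fitted by Newton's method in NumPy with the intercept left unpenalized. `fit_ridge_logistic` and `lam` are hypothetical names; in practice `lam` would be tuned, for example by cross-validation, and the presence/absence data here are synthetic.

```python
import numpy as np


def fit_ridge_logistic(X, y, lam=1.0, n_iter=100, tol=1e-10):
    """Minimize negative log-likelihood + (lam / 2) * ||slopes||^2 by Newton's method."""
    n, p = X.shape
    Xb = np.column_stack([np.ones(n), X])  # prepend an intercept column
    beta = np.zeros(p + 1)
    penalty = lam * np.eye(p + 1)
    penalty[0, 0] = 0.0  # do not shrink the intercept
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(Xb @ beta)))
        grad = Xb.T @ (prob - y) + penalty @ beta
        weights = prob * (1.0 - prob)
        hess = Xb.T @ (Xb * weights[:, None]) + penalty
        step = np.linalg.solve(hess, grad)
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Synthetic presence/absence data: 5 environmental predictors, only 2 of them informative
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
true_logit = -0.5 + 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

weak = fit_ridge_logistic(X, y, lam=0.1)
strong = fit_ridge_logistic(X, y, lam=50.0)
print(np.linalg.norm(weak[1:]), np.linalg.norm(strong[1:]))
```

A larger `lam` pulls the slope coefficients toward zero, which is the regularization effect that maximum entropy software applies by default.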

Abstract:

Objective: The main purpose of this research is the novel use of artificial metaplasticity on a multilayer perceptron (AMMLP) as a data mining tool for predicting the outcome of patients with acquired brain injury (ABI) after cognitive rehabilitation. The final goal is to increase knowledge in the field of rehabilitation theory based on cognitive affectation. Methods and materials: The data set used in this study contains records of 123 ABI patients with moderate to severe cognitive affectation (according to the Glasgow Coma Scale) who underwent rehabilitation at the Institut Guttmann Neurorehabilitation Hospital (IG) using the tele-rehabilitation platform PREVIRNEC©. The variables included in the analysis comprise the patient's initial neuropsychological evaluation (cognitive affectation profile), the results of the rehabilitation tasks performed by the patient in PREVIRNEC© and the patient's outcome after 3–5 months of treatment. To predict treatment outcome, we apply and compare three data mining techniques: the AMMLP model, a backpropagation neural network (BPNN) and a C4.5 decision tree. Results: The predictive performance of the models was measured by ten-fold cross-validation, and several architectures were tested. The AMMLP model performs best, with an average prediction accuracy of 91.56%, compared with 80.18% for BPNN and 89.91% for C4.5. The best single AMMLP model achieved a specificity of 92.38%, a sensitivity of 91.76% and a prediction accuracy of 92.07%. Conclusions: The proposed prediction model increases knowledge about the factors contributing to the recovery of an ABI patient and makes it possible to estimate treatment efficacy in individual patients. The ability to predict treatment outcomes may provide new insights toward improving effectiveness and creating personalized therapeutic interventions based on clinical evidence.
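The sensitivity, specificity and accuracy figures above are all read off a binary confusion matrix. The sketch below assumes labels coded 1 = positive outcome and 0 = negative (the abstract does not say which outcome is treated as positive), and the toy labels are invented. It also shows that accuracy is the prevalence-weighted average of sensitivity and specificity; the reported 92.07% equals the simple mean of 92.38% and 91.76%, which is what one would expect if the two outcome classes were about equally frequent.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy, with 1 = positive class and 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / len(pairs),
    }


# Toy labels (invented): 4 positive and 6 negative cases
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
m = binary_metrics(y_true, y_pred)
prevalence = sum(y_true) / len(y_true)
# Accuracy equals the prevalence-weighted average of sensitivity and specificity
print(m, prevalence * m["sensitivity"] + (1 - prevalence) * m["specificity"])
```

In a ten-fold cross-validation these counts would be accumulated over the ten held-out folds before the ratios are taken.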

Abstract:

Case-based reasoning (CBR) is a distinctive tool for the evaluation of possible failure of firms (EOPFOF) because of its ease of interpretation and implementation. Ensemble computing, a counterpart of group decision-making in society, offers a potential means of improving the predictive performance of CBR-based EOPFOF. This research integrates bagging and proportion case-basing with CBR to produce a proportion bagging CBR method for EOPFOF. First, diverse case bases are produced by multiple case-basing, in which a volume parameter controls the size of each case base. Next, the classic case retrieval algorithm is applied to generate diverse member CBR predictors. Finally, majority voting, the most frequently used aggregation mechanism in ensemble computing, combines the outputs of the member CBR predictors into the final prediction of the CBR ensemble. In an empirical experiment, we statistically validated the results of the CBR ensemble built from multiple case bases by comparing them with those of multivariate discriminant analysis, logistic regression, classic CBR, the best member CBR predictor and a bagging CBR ensemble. The results for Chinese EOPFOF three years prior to possible failure indicate that the new CBR ensemble significantly improved CBR's predictive ability and outperformed all the comparison methods.
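The three steps of the proportion bagging CBR ensemble (build case bases, retrieve, vote) can be sketched as follows. Several details are assumptions not taken from the abstract: cases are numeric feature vectors, "classic case retrieval" is approximated by k-nearest-neighbour retrieval with Euclidean distance, each member case base is drawn with replacement with size `volume * n`, and `volume`, `n_members` and `k` are hypothetical parameter names. The two-cluster data are synthetic.

```python
from collections import Counter

import numpy as np


def build_case_bases(n_cases, n_members, volume, rng):
    """Draw member case bases (index arrays) with replacement; volume controls their size."""
    size = max(1, int(round(volume * n_cases)))
    return [rng.integers(0, n_cases, size=size) for _ in range(n_members)]


def cbr_predict(X_base, y_base, query, k):
    """Assumed retrieval: k most similar cases by Euclidean distance, majority label."""
    dist = np.linalg.norm(X_base - query, axis=1)
    nearest = np.argsort(dist, kind="stable")[:k]
    return Counter(y_base[nearest].tolist()).most_common(1)[0][0]


def proportion_bagging_cbr(X, y, query, n_members=15, volume=0.7, k=3, seed=0):
    rng = np.random.default_rng(seed)
    votes = [cbr_predict(X[idx], y[idx], query, k)
             for idx in build_case_bases(len(y), n_members, volume, rng)]
    # Majority voting over the member CBR predictors gives the ensemble output
    return Counter(votes).most_common(1)[0][0]


# Synthetic firms: label 1 = failed, 0 = healthy, in two well-separated clusters
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(2.0, 0.3, (20, 2)), rng.normal(-2.0, 0.3, (20, 2))])
y = np.array([1] * 20 + [0] * 20)
print(proportion_bagging_cbr(X, y, np.array([2.0, 2.0])))
```

Varying `volume` below 1 makes the member case bases more diverse, which is the lever the volume parameter in the abstract is meant to provide.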

Abstract:

The impact of Parkinson's disease and its treatment on patients' health-related quality of life can be estimated either with generic measures, such as the EuroQol 5-Dimension questionnaire (EQ-5D), or with disease-specific measures, such as the 8-item Parkinson's Disease Questionnaire (PDQ-8). In clinical studies, PDQ-8 may be used instead of EQ-5D because of a lack of resources, time or clinical interest in generic measures. However, PDQ-8 cannot be used in cost-effectiveness analyses, which require generic measures and quantitative utility scores such as EQ-5D. A commonly used solution to this problem is to predict EQ-5D from PDQ-8. In this paper, we propose a new probabilistic method to predict EQ-5D from PDQ-8 using multi-dimensional Bayesian network classifiers. Our approach is evaluated with five-fold cross-validation experiments on a Parkinson's data set of 488 patients, and is compared with two additional Bayesian network-based approaches, two commonly used mapping methods (ordinary least squares and censored least absolute deviations) and a deterministic model. The experimental results are promising both in terms of predictive performance and in identifying dependence relationships among EQ-5D and PDQ-8 items that the mapping approaches are unable to detect.
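For comparison with the Bayesian network approach, the ordinary least squares mapping baseline regresses an EQ-5D utility index on the eight PDQ-8 item scores, and five-fold cross-validation estimates how well that mapping predicts for unseen patients. The sketch below is not the paper's proposed method: it uses a synthetic cohort (items scored 0-4 and an invented utility formula), and reports the cross-validated RMSE as one possible summary of predictive performance.

```python
import numpy as np


def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and split into k nearly equal, disjoint test folds."""
    return np.array_split(rng.permutation(n), k)


def cv_rmse_ols(X, y, k=5, seed=0):
    """Out-of-fold RMSE of an OLS mapping with intercept, estimated by k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    Xb = np.column_stack([np.ones(len(y)), X])  # intercept + item scores
    errors = np.empty(len(y))
    for test in kfold_indices(len(y), k, rng):
        train = np.setdiff1d(np.arange(len(y)), test)
        beta, *_ = np.linalg.lstsq(Xb[train], y[train], rcond=None)
        errors[test] = y[test] - Xb[test] @ beta
    return float(np.sqrt(np.mean(errors ** 2)))


# Synthetic cohort: 8 item scores (0-4) per patient and an invented utility index
rng = np.random.default_rng(7)
items = rng.integers(0, 5, size=(488, 8)).astype(float)
utility = 1.0 - 0.03 * items.sum(axis=1) + rng.normal(0.0, 0.05, 488)
print(cv_rmse_ols(items, utility))
```

Each patient is predicted exactly once by a model that never saw that patient, so the RMSE is an out-of-sample estimate rather than a goodness-of-fit figure.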

Abstract:

Multi-dimensional classification (MDC) is the supervised learning problem in which an instance is associated with multiple classes, rather than with a single class as in traditional classification problems. Since these classes are often strongly correlated, modeling the dependencies between them allows MDC methods to improve their performance, at the expense of increased computational cost. In this paper we focus on the classifier chains (CC) approach for modeling dependencies, one of the most popular and highest-performing methods for multi-label classification (MLC), a particular case of MDC that involves only binary classes (i.e., labels). The original CC algorithm makes a greedy approximation: it is fast but tends to propagate errors along the chain. Here we present novel Monte Carlo schemes both for finding a good chain sequence and for performing efficient inference. Our algorithms remain tractable for high-dimensional data sets and obtain the best predictive performance across several real data sets.
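The error-propagation problem of greedy chain inference, and how sampling can avoid it, can be sketched for binary labels. Each chain link here is a hypothetical logistic model whose inputs are the features plus the labels already assigned earlier in the chain, with made-up weights. Greedy inference thresholds each link at 0.5 in turn; the Monte Carlo search samples whole label vectors from the chain and keeps the one with the highest joint probability. This is one simple sampling scheme for inference only, not necessarily the paper's exact method, and it does not search over chain orders.

```python
import math
import random


def link_prob(weights, bias, x, previous_labels):
    """P(y_j = 1 | x, y_1..y_{j-1}) for one link of the chain (logistic model)."""
    z = bias + sum(w * v for w, v in zip(weights, list(x) + list(previous_labels)))
    return 1.0 / (1.0 + math.exp(-z))


def joint_prob(chain, x, labels):
    """Product of the link probabilities along the chain for a full label vector."""
    prob = 1.0
    for j, (weights, bias) in enumerate(chain):
        p = link_prob(weights, bias, x, labels[:j])
        prob *= p if labels[j] == 1 else 1.0 - p
    return prob


def greedy_inference(chain, x):
    labels = []
    for weights, bias in chain:
        labels.append(1 if link_prob(weights, bias, x, labels) >= 0.5 else 0)
    return labels


def monte_carlo_inference(chain, x, n_samples=200, seed=0):
    rng = random.Random(seed)
    best = greedy_inference(chain, x)  # the greedy path is kept as a baseline candidate
    best_prob = joint_prob(chain, x, best)
    for _ in range(n_samples):
        labels = []
        for weights, bias in chain:
            labels.append(1 if rng.random() < link_prob(weights, bias, x, labels) else 0)
        prob = joint_prob(chain, x, labels)
        if prob > best_prob:
            best, best_prob = labels, prob
    return best, best_prob


# Hypothetical two-link chain: y1 = 1 is slightly more likely, but only y1 = 0
# makes y2 nearly certain, so the jointly most probable vector is [0, 0]
chain = [([0.4055], 0.0), ([0.0, 4.7], -4.595)]
x = [1.0]
print(greedy_inference(chain, x), monte_carlo_inference(chain, x))
```

Because the greedy path is the starting candidate, the sampled answer can only match or improve on its joint probability.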

Abstract:

Thesis (Master's)--University of Washington, 2016-06

Abstract:

Patient outcomes in transplantation would improve if dosing of immunosuppressive agents were individualized. The aim of this study was to develop a population pharmacokinetic model of tacrolimus in adult liver transplant recipients and to test this model in individualizing therapy. Population analysis was performed on data from 68 patients. Estimates were sought for apparent clearance (CL/F) and apparent volume of distribution (V/F) using the nonlinear mixed-effects modeling program NONMEM. Factors screened for influence on these parameters were weight, age, sex, transplant type, biliary reconstructive procedure, postoperative day, days of therapy, liver function test results, creatinine clearance, hematocrit, corticosteroid dose and interacting drugs. The predictive performance of the developed model was evaluated through Bayesian forecasting in an independent cohort of 36 patients. No linear correlation existed between tacrolimus dosage and trough concentration (r² = 0.005). Mean individual Bayesian estimates for CL/F and V/F were 26.5 ± 8.2 (SD) L/hr and 399 ± 185 L, respectively. CL/F was greater in patients with normal liver function, V/F increased with patient weight, and CL/F decreased with increasing hematocrit. Based on the derived model, a 70-kg patient with an aspartate aminotransferase (AST) level below 70 U/L would require a tacrolimus dose of 4.7 mg twice daily to achieve a steady-state trough concentration of 10 ng/mL, whereas a 50-kg patient with an AST level above 70 U/L would require a dose of 2.6 mg. Marked interindividual variability (43% to 93%) and residual random error (3.3 ng/mL) were observed. Predictions made using the final model were reasonably unbiased (0.56 ng/mL) but imprecise (4.8 ng/mL). The pharmacokinetic information obtained will assist in tacrolimus dosing; however, further investigation into the reasons for the pharmacokinetic variability of tacrolimus is required.

Abstract:

The aim of this study was to determine the most informative sampling time(s) for a precise prediction of the tacrolimus area under the concentration-time curve (AUC). Fifty-four concentration-time profiles of tacrolimus from 31 adult liver transplant recipients were analyzed. Each profile contained 5 tacrolimus whole-blood concentrations (predose and 1, 2, 4, and 6 or 8 hours postdose), measured using liquid chromatography-tandem mass spectrometry. The concentration at 6 hours was interpolated for each profile, and 54 values of AUC(0-6) were calculated using the trapezoidal rule. The best sampling times were then determined using limited sampling strategies and sensitivity analysis. Linear mixed-effects modeling was performed to estimate the regression coefficients of equations incorporating each concentration-time point (C0, C1, C2, C4, interpolated C5 and interpolated C6) as a predictor of AUC(0-6). Predictive performance was evaluated by the mean error (ME) and root mean square error (RMSE). Limited sampling strategy (LSS) equations with C2, C4 and C5 gave similar results for the prediction of AUC(0-6) (R² = 0.869, 0.844 and 0.832, respectively), and these 3 time points were superior to C0 in predicting AUC. The ME was similar for all time points; the RMSE was smallest for C2, C4 and C5. The highest sensitivity index occurred at 4.9 hours postdose at steady state, suggesting that this time point provides the most information about AUC(0-12). The results of the limited sampling strategies and sensitivity analysis supported the use of a single blood sample at 5 hours postdose as a predictor of both AUC(0-6) and AUC(0-12). A jackknife procedure used to evaluate the predictive performance of the model showed that a sample collected 5 hours after dosing could be considered the optimal sampling time for predicting AUC(0-6).
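The AUC(0-6) values above come from the linear trapezoidal rule. A small Python sketch follows; it assumes linear interpolation for the 6-hour concentration when a profile was sampled at 8 hours instead (the abstract does not say how the interpolation was done), and the concentrations in the example are invented.

```python
def interpolate(t, times, conc):
    """Linear interpolation of the concentration at time t (times must be increasing)."""
    for (t0, c0), (t1, c1) in zip(zip(times, conc), zip(times[1:], conc[1:])):
        if t0 <= t <= t1:
            return c0 + (c1 - c0) * (t - t0) / (t1 - t0)
    raise ValueError("t lies outside the sampled interval")


def auc_trapezoid(times, conc, t_end):
    """AUC from the first sample to t_end by the linear trapezoidal rule."""
    pts = [(t, c) for t, c in zip(times, conc) if t < t_end]
    pts.append((t_end, interpolate(t_end, times, conc)))  # assumed linear interpolation
    return sum((t1 - t0) * (c0 + c1) / 2.0 for (t0, c0), (t1, c1) in zip(pts, pts[1:]))


# Hypothetical profile sampled predose and at 1, 2, 4 and 8 h postdose (ng/mL)
times = [0.0, 1.0, 2.0, 4.0, 8.0]
conc = [5.0, 15.0, 12.0, 9.0, 7.0]
print(auc_trapezoid(times, conc, 6.0))  # 61.5 ng·h/mL
```

If a profile was actually sampled at 6 hours, the interpolation step simply returns the measured value, so the same function serves both sampling schedules.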

Abstract:

Background: Lean bodyweight (LBW) has been recommended for scaling drug doses. However, current methods for predicting LBW are inconsistent at extremes of size and could be misleading when interpreting weight-based regimens. Objective: The objective of the present study was to develop a semi-mechanistic model to predict fat-free mass (FFM) from subject characteristics in a population that includes extremes of size. FFM is considered to closely approximate LBW; there are several reference methods for assessing FFM, whereas there are no reference standards for LBW. Patients and methods: A total of 373 patients (168 male, 205 female) were included in the study. These data arose from two populations. Population A (index dataset) contained anthropometric characteristics, FFM estimated by dual-energy x-ray absorptiometry (DXA, a reference method) and bioelectrical impedance analysis (BIA) data. Population B (test dataset) contained the same anthropometric measures and FFM data as population A, but no BIA data. The patients in population A had a wide range of age (18-82 years), bodyweight (40.7-216.5 kg) and BMI (17.1-69.9 kg/m²). Patients in population B had BMI values of 18.7-38.4 kg/m². A two-stage semi-mechanistic model to predict FFM was developed using data from population A. In stage 1, a model was developed to predict impedance; in stage 2, a model incorporating the predicted impedance was used to predict FFM. These two models were combined into an overall model that predicts FFM from patient characteristics. The developed FFM model was externally evaluated by predicting into population B. Results: The semi-mechanistic model to predict impedance incorporated sex, height and bodyweight. It provides a good predictor of impedance for both males and females (r² = 0.78, mean error [ME] = 2.30 × 10⁻³, root mean square error [RMSE] = 51.56 [approximately 10% of mean]).
The final model for FFM incorporated sex, height and bodyweight. The developed model for FFM provided good predictive performance for both males and females (r² = 0.93, ME = -0.77, RMSE = 3.33 [approximately 6% of mean]). In addition, the model accurately predicted the FFM of subjects in population B (r² = 0.85, ME = -0.04, RMSE = 4.39 [approximately 7% of mean]). Conclusions: A semi-mechanistic model has been developed to predict FFM (and therefore LBW) from easily accessible patient characteristics. This model has been prospectively evaluated and shown to have good predictive performance.
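The bias and precision statistics quoted above (ME and RMSE, with RMSE also expressed as a percentage of the mean) can be computed as follows. The sketch assumes that prediction error is defined as predicted minus observed and that "mean" refers to the mean observed value; neither convention is stated in the abstract. The FFM values are invented.

```python
import math


def prediction_error_summary(observed, predicted):
    """Mean error (bias), RMSE (imprecision) and RMSE as a percentage of the mean observed value."""
    errors = [p - o for o, p in zip(observed, predicted)]  # assumed sign: predicted - observed
    n = len(errors)
    me = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_observed = sum(observed) / n
    return me, rmse, 100.0 * rmse / mean_observed


# Invented fat-free mass values (kg) for four subjects
observed = [50.0, 60.0, 40.0, 55.0]
predicted = [52.0, 57.0, 41.0, 54.0]
print(prediction_error_summary(observed, predicted))
```

An ME close to zero indicates little systematic bias, while the RMSE captures the typical size of individual prediction errors; the two should be read together.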

Abstract:

The aim of this study was to ascertain the most suitable dosing schedule for gentamicin in patients receiving hemodialysis. We developed a model to describe the concentration-time course of gentamicin in patients receiving hemodialysis and used it to evaluate an optimal dosing schedule. Various dosing regimens were compared in their ability to achieve a maximum concentration (Cmax) ≥ 8 mg/L and an area under the concentration-time curve (AUC) of 70-120 mg·h/L per 24 hours. The model was evaluated by comparing model predictions against real data collected retrospectively. Simulations from the model confirmed the benefits of predialysis dosing; the mean optimal dose was 230 mg administered immediately before dialysis. The model was found to have good predictive performance when simulated data were compared with data observed in real patients. In summary, a model was developed that describes gentamicin pharmacokinetics in patients receiving hemodialysis, and predialysis dosing provided a better pharmacokinetic profile than postdialysis dosing.
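The regimen comparison above rests on checking each simulated regimen against two targets: Cmax ≥ 8 mg/L and an AUC over 24 hours between 70 and 120 mg·h/L. The sketch below is not the authors' model. It assumes a one-compartment model with an instantaneous (bolus) dose, slow first-order elimination between dialysis sessions and faster elimination during a 4-hour session, and every parameter value (volume, both half-lives, the 200 mg dose) is hypothetical. It only illustrates why the timing of a dose relative to dialysis changes exposure.

```python
import math


def concentration(t, dose, volume, k_off, k_dial, dial_start, dial_end):
    """Concentration (mg/L) at time t (h) after a bolus dose at t = 0 (hypothetical model)."""
    # Time spent before, during and after the dialysis window, each capped at t
    before = min(t, dial_start)
    during = max(0.0, min(t, dial_end) - dial_start)
    after = max(0.0, t - dial_end)
    return dose / volume * math.exp(-k_off * (before + after) - k_dial * during)


def exposure(dose, dial_start, dial_end, volume=20.0,
             k_off=math.log(2) / 40.0, k_dial=math.log(2) / 4.0, steps=2400):
    """Return (Cmax, AUC over 0-24 h by the trapezoidal rule on a fine time grid)."""
    grid = [24.0 * i / steps for i in range(steps + 1)]
    conc = [concentration(t, dose, volume, k_off, k_dial, dial_start, dial_end) for t in grid]
    auc = sum((grid[i + 1] - grid[i]) * (conc[i] + conc[i + 1]) / 2.0 for i in range(steps))
    return conc[0], auc  # for a bolus, Cmax is the concentration at t = 0


def meets_targets(cmax, auc24):
    return cmax >= 8.0 and 70.0 <= auc24 <= 120.0


# Predialysis: dose, then a 4-h session. Postdialysis: no session within the next 24 h.
pre = exposure(200.0, dial_start=0.0, dial_end=4.0)
post = exposure(200.0, dial_start=24.0, dial_end=24.0)
print(pre, meets_targets(*pre))
print(post, meets_targets(*post))
```

With these made-up parameters, the session that follows a predialysis dose removes part of the drug and keeps the 24-hour AUC inside the window, while the same dose given after dialysis overshoots it.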

Abstract:

When making predictions with complex simulators it can be important to quantify the various sources of uncertainty. Errors in the structural specification of the simulator, for example due to missing processes or incorrect mathematical specification, can be a major source of uncertainty, but are often ignored. We introduce a methodology for inferring the discrepancy between the simulator and the system in discrete-time dynamical simulators. We assume a structural form for the discrepancy function, and show how to infer the maximum-likelihood parameter estimates using a particle filter embedded within a Monte Carlo expectation maximization (MCEM) algorithm. We illustrate the method on a conceptual rainfall-runoff simulator (logSPM) used to model the Abercrombie catchment in Australia. We assess the simulator and discrepancy model on the basis of their predictive performance using proper scoring rules. This article has supplementary material online. © 2011 International Biometric Society.
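The abstract assesses the simulator and discrepancy model with proper scoring rules but does not name which ones. As one common example (an assumption, not necessarily the rule used in the paper), the continuous ranked probability score of an ensemble or particle forecast can be computed from its members with the identity CRPS(F, y) = E|X - y| - 0.5 E|X - X'|, where X and X' are independent draws from the forecast; lower is better.

```python
def crps_ensemble(members, y):
    """CRPS of the empirical distribution of equally weighted ensemble members at observation y."""
    m = len(members)
    spread_to_obs = sum(abs(x - y) for x in members) / m
    spread_within = sum(abs(a - b) for a in members for b in members) / (m * m)
    return spread_to_obs - 0.5 * spread_within


# A sharp forecast centred on the observation scores better (lower) than a diffuse one
print(crps_ensemble([0.9, 1.1], 1.0), crps_ensemble([0.0, 2.0], 1.0))
```

Averaging this score over many observation times gives a single number for comparing the simulator with and without the discrepancy term.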

Abstract:

MOTIVATION: G protein-coupled receptors (GPCRs) play an important role in many physiological systems by transducing an extracellular signal into an intracellular response. Over 50% of all marketed drugs are targeted towards a GPCR. There is considerable interest in developing an algorithm that could effectively predict the function of a GPCR from its primary sequence. Such an algorithm is useful not only in identifying novel GPCR sequences but also in characterizing the interrelationships between known GPCRs. RESULTS: An alignment-free approach to GPCR classification has been developed using techniques drawn from data mining and proteochemometrics. A dataset of over 8000 sequences was constructed to train the algorithm; this represents one of the largest GPCR datasets currently available. A predictive algorithm was developed based upon the simplest reasonable numerical representation of the protein's physicochemical properties. A selective top-down approach was developed, which used a hierarchical classifier to assign sequences to subdivisions within the GPCR hierarchy. The predictive performance of the algorithm was assessed against several standard data mining classifiers and further validated against Support Vector Machine-based GPCR prediction servers. The selective top-down approach achieves significantly higher accuracy than standard data mining methods in almost all cases.
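The top-down idea, classifying a sequence first at the top of the hierarchy and then only among the children of the predicted node, can be sketched as below. Everything concrete here is an assumption for illustration: an amino-acid composition vector stands in for the paper's unspecified physicochemical representation, a nearest-centroid rule stands in for the per-node classifiers (the "selective" choice of classifier per node is not modelled), and the class names and sequences are toy values, not real GPCR data.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Alignment-free representation: fraction of each of the 20 standard amino acids."""
    seq = seq.upper()
    n = max(1, len(seq))
    return [seq.count(a) / n for a in AMINO_ACIDS]


def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def train_hierarchy(labelled):
    """labelled: (sequence, path) pairs, path a tuple such as ("Family1", "1a").
    Returns {parent_path: {child_name: centroid}}; the root parent is ()."""
    groups = {}
    for seq, path in labelled:
        vec = composition(seq)
        for depth in range(len(path)):
            parent, child = tuple(path[:depth]), path[depth]
            groups.setdefault(parent, {}).setdefault(child, []).append(vec)
    return {parent: {child: centroid(vs) for child, vs in kids.items()}
            for parent, kids in groups.items()}


def classify_top_down(model, seq):
    """Descend the hierarchy, choosing the nearest child centroid at each level."""
    vec = composition(seq)
    path = ()
    while path in model:  # stop when the current node has no children
        children = model[path]
        path = path + (min(children, key=lambda c: sq_dist(vec, children[c])),)
    return path


toy = [
    ("AAAAAAGG", ("Family1", "1a")), ("AAAAAAGA", ("Family1", "1a")),
    ("AAAAAAKK", ("Family1", "1b")),
    ("WWWWWWGG", ("Family2", "2a")), ("WWWWWWKK", ("Family2", "2b")),
]
model = train_hierarchy(toy)
print(classify_top_down(model, "AAAAAAAK"), classify_top_down(model, "WWWWWWWG"))
```

Restricting each decision to the children of the node already chosen keeps every individual classifier small, at the cost that an error near the root cannot be corrected further down.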

Abstract:

It is well established that accent recognition can reach up to 95% accuracy on noise-free signals, using feature extraction techniques such as mel-frequency cepstral coefficients and binary classifiers such as discriminant analysis, support vector machines and k-nearest neighbors. In this paper, we demonstrate that predictive performance can drop by as much as 15 percentage points when the signals are noisy. Specifically, we perturb the signals with different levels of white noise; as the noise becomes stronger, out-of-sample predictive accuracy deteriorates from 95% to 80%, whereas in-sample prediction gives overly optimistic results. ACM Computing Classification System (1998): C.3, C.5.1, H.1.2, H.2.4, G.3.
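The perturbation step above can be reproduced in outline by adding white Gaussian noise scaled to a target signal-to-noise ratio. The abstract describes noise "levels" without units, so expressing them as SNR in decibels is an assumption here, as is the 220 Hz tone used as a stand-in for a speech recording.

```python
import numpy as np


def add_white_noise(signal, snr_db, rng):
    """Add zero-mean white Gaussian noise so that the result has roughly the given SNR (dB)."""
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)


rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
clean = np.sin(2 * np.pi * 220.0 * t)  # 1 s of a 220 Hz tone sampled at 16 kHz
for snr in (30, 20, 10, 0):
    noisy = add_white_noise(clean, snr, rng)
    measured = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
    print(snr, round(float(measured), 2))
```

Features such as MFCCs would then be extracted from each noisy copy, and the classifier would be scored on held-out recordings to obtain the out-of-sample accuracy.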