950 results for cross-validation


Relevance:

60.00%

Publisher:

Abstract:

The Flow State Scale-2 (FSS-2) and Dispositional Flow Scale-2 (DFS-2) are presented as two self-report instruments designed to assess flow experiences in physical activity. Item modifications were made to the original versions of these scales in order to improve the measurement of some of the flow dimensions. Confirmatory factor analyses of an item identification and a cross-validation sample demonstrated a good fit of the new scales. There was support for both a 9-first-order factor model and a higher order model with a global flow factor. The item identification sample yielded mean item loadings on the first-order factor of .78 for the FSS-2 and .77 for the DFS-2. Reliability estimates ranged from .80 to .90 for the FSS-2, and .81 to .90 for the DFS-2. In the cross-validation sample, mean item loadings on the first-order factor were .80 for the FSS-2, and .73 for the DFS-2. Reliability estimates ranged from .80 to .92 for the FSS-2 and .78 to .86 for the DFS-2. The scales are presented as ways of assessing flow experienced within a particular event (FSS-2) or the frequency of flow experiences in chosen physical activity in general (DFS-2).

Abstract:

In the context of cancer diagnosis and treatment, we consider the problem of constructing an accurate prediction rule on the basis of a relatively small number of tumor tissue samples of known type containing the expression data on very many (possibly thousands) genes. Recently, results have been presented in the literature suggesting that it is possible to construct a prediction rule from only a few genes such that it has a negligible prediction error rate. However, in these results the test error or the leave-one-out cross-validated error is calculated without allowance for the selection bias. There is no allowance because the rule is either tested on tissue samples that were used in the first instance to select the genes being used in the rule or because the cross-validation of the rule is not external to the selection process; that is, gene selection is not performed in training the rule at each stage of the cross-validation process. We describe how in practice the selection bias can be assessed and corrected for by either performing a cross-validation or applying the bootstrap external to the selection process. We recommend using 10-fold rather than leave-one-out cross-validation, and concerning the bootstrap, we suggest using the so-called .632+ bootstrap error estimate designed to handle overfitted prediction rules. Using two published data sets, we demonstrate that when correction is made for the selection bias, the cross-validated error is no longer zero for a subset of only a few genes.
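
The idea of cross-validation that is external to the selection process can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the method of the abstract's authors: the ranking score (absolute difference in class means), the nearest-centroid rule, and all function names are hypothetical choices, and the labels are assumed to be 0 and 1 with both classes present in every training split. The point it shows is that gene selection is repeated inside every fold, using training rows only.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def class_means(X, y, rows, cols):
    """Per-class mean of each selected column, computed on the given rows only."""
    means = {}
    for label in set(y[r] for r in rows):
        members = [r for r in rows if y[r] == label]
        means[label] = [sum(X[r][c] for r in members) / len(members) for c in cols]
    return means

def select_genes(X, y, rows, m):
    """Hypothetical filter: keep the m genes with the largest class-mean gap."""
    cols = list(range(len(X[0])))
    mu = class_means(X, y, rows, cols)
    score = [abs(mu[0][c] - mu[1][c]) for c in cols]
    return sorted(cols, key=lambda c: -score[c])[:m]

def predict(X, row, cols, mu):
    """Nearest-centroid rule restricted to the selected genes."""
    def dist(label):
        return sum((X[row][c] - mu[label][j]) ** 2 for j, c in enumerate(cols))
    return min(mu, key=dist)

def external_cv_error(X, y, k=10, m=5, seed=0):
    """k-fold error rate with gene selection redone inside each fold."""
    errors = 0
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [r for r in range(len(X)) if r not in held_out]
        cols = select_genes(X, y, train, m)   # held-out rows never influence selection
        mu = class_means(X, y, train, cols)
        errors += sum(predict(X, r, cols, mu) != y[r] for r in fold)
    return errors / len(X)
```

Selecting genes once on all samples and then cross-validating only the classifier would correspond to moving the `select_genes` call outside the loop, which is the source of the selection bias the abstract describes.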

Abstract:

Motivation: Prediction methods for identifying binding peptides could minimize the number of peptides required to be synthesized and assayed, and thereby facilitate the identification of potential T-cell epitopes. We developed a bioinformatic method for the prediction of peptide binding to MHC class II molecules. Results: Experimental binding data and expert knowledge of anchor positions and binding motifs were combined with an evolutionary algorithm (EA) and an artificial neural network (ANN): binding data extraction --> peptide alignment --> ANN training and classification. This method, termed PERUN, was implemented for the prediction of peptides that bind to HLA-DR4(B1*0401). The respective positive predictive values of PERUN predictions of high-, moderate-, low- and zero-affinity binders were assessed as 0.8, 0.7, 0.5 and 0.8 by cross-validation, and 1.0, 0.8, 0.3 and 0.7 by experimental binding. This illustrates the synergy between experimentation and computer modeling, and its application to the identification of potential immunotherapeutic peptides.

Abstract:

The principle of using induction rules based on spatial environmental data to model a soil map has previously been demonstrated. Whilst the general pattern of classes of large spatial extent and those with close association with geology were delineated, small classes and the detailed spatial pattern of the map were less well rendered. Here we examine several strategies to improve the quality of the soil map models generated by rule induction. Terrain attributes that are better suited to landscape description at a resolution of 250 m are introduced as predictors of soil type. A map sampling strategy is developed. Classification error is reduced by using boosting rather than cross validation to improve the model. Further, the benefit of incorporating the local spatial context for each environmental variable into the rule induction is examined. The best model was achieved by sampling in proportion to the spatial extent of the mapped classes, boosting the decision trees, and using spatial contextual information extracted from the environmental variables.

Abstract:

Objective: To develop a model to predict the bleeding source and identify the cohort amongst patients with acute gastrointestinal bleeding (GIB) who require urgent intervention, including endoscopy. Patients with acute GIB, an unpredictable event, are most commonly evaluated and managed by non-gastroenterologists. Rapid and consistently reliable risk stratification of patients with acute GIB for urgent endoscopy may potentially improve outcomes amongst such patients by targeting scarce health-care resources to those who need them the most. Design and methods: Using ICD-9 codes for acute GIB, 189 patients with acute GIB and all available data variables required to develop and test models were identified from a hospital medical records database. Data on 122 patients were utilized for development of the model and data on 67 patients were utilized to perform comparative analysis of the models. Clinical data such as presenting signs and symptoms, demographic data, presence of co-morbidities, laboratory data and corresponding endoscopic diagnosis and outcomes were collected. Clinical data and endoscopic diagnosis collected for each patient were utilized to retrospectively ascertain optimal management for each patient. Clinical presentations and corresponding treatment were utilized as training examples. Eight mathematical models including artificial neural network (ANN), support vector machine (SVM), k-nearest neighbor, linear discriminant analysis (LDA), shrunken centroid (SC), random forest (RF), logistic regression, and boosting were trained and tested. The performance of these models was compared using standard statistical analysis and ROC curves. Results: Overall the random forest model best predicted the source, need for resuscitation, and disposition with accuracies of approximately 80% or higher (accuracy for endoscopy was greater than 75%). The area under ROC curve for RF was greater than 0.85, indicating excellent performance by the random forest model. Conclusion: While most mathematical models are effective as a decision support system for evaluation and management of patients with acute GIB, in our testing, the RF model consistently demonstrated the best performance. Amongst patients presenting with acute GIB, mathematical models may facilitate the identification of the source of GIB, need for intervention and allow optimization of care and healthcare resource allocation; these however require further validation. (c) 2007 Elsevier B.V. All rights reserved.

Abstract:

Purpose: The range of variability between individuals of the same chronological age (CA) in somatic and biological maturity is large and especially accentuated around the adolescent growth spurt. Maturity assessment is an important consideration when dealing with adolescents, from both a research perspective and youth sports stratification. A noninvasive, practical method predicting years from peak height velocity (a maturity offset value) by using anthropometric variables is developed in one sample and cross-validated in two different samples. Methods: Gender-specific multiple regression equations were calculated on a sample of 152 Canadian children aged 8-16 yr (79 boys; 73 girls) who were followed through adolescence from 1991 to 1997. The equations included three somatic dimensions (height, sitting height, and leg length), CA, and their interactions. The equations were cross-validated on a combined sample of Canadian (71 boys, 40 girls measured from 1964 through 1973) and Flemish children (50 boys, 48 girls measured from 1985 through 1999). Results: The coefficient of determination (R²) for the boys' model was 0.92 and for the girls' model 0.91; the SEEs were 0.49 and 0.50, respectively. Mean difference between actual and predicted maturity offset for the verification samples was 0.24 (SD 0.65) yr in boys and 0.001 (SD 0.68) yr in girls. Conclusion: Although the cross-validation meets statistical standards for acceptance, caution is warranted with regard to implementation. It is recommended that maturity offset be considered as a categorical rather than a continuous assessment. Nevertheless, the equations presented are a reliable, noninvasive and practical solution for the measure of biological maturity for matching adolescent athletes.

Abstract:

Objective: To compare percentage body fat (%BF) for a given body mass index (BMI) among New Zealand European, Maori and Pacific Island children. To develop prediction equations based on bioimpedance measurements for the estimation of fat-free mass (FFM) appropriate to children in these three ethnic groups. Design: Cross-sectional study. Purposive sampling of schoolchildren aimed at recruiting three children of each sex and ethnicity for each year of age. Double cross-validation of FFM prediction equations developed by multiple regression. Setting: Local schools in Auckland. Subjects: Healthy European, Maori and Pacific Island children (n = 172, 83 M, 89 F, mean age 9.4 ± 2.8 (s.d.), range 5-14 y). Measurements: Height, weight, age, sex and ethnicity were recorded. FFM was derived from measurements of total body water by deuterium dilution, and resistance and reactance were measured by bioimpedance analysis. Results: For fixed BMI, the Maori and Pacific Island girls averaged 3.7% lower %BF than European girls. For boys a similar relation was not found since BMI did not significantly influence %BF of European boys (P = 0.18). Based on bioimpedance measurements a single prediction equation was developed for all children: FFM (kg) = 0.622 × height (cm)² / resistance + 0.234 × weight (kg) + 1.166, R² = 0.96, s.e.e. = 2.44 kg. Ethnicity, age and sex were not significant predictors. Conclusions: A robust equation for estimation of FFM in New Zealand European, Maori and Pacific Island children in the 5-14 y age range that is more suitable than BMI for the determination of body fatness in field studies has been developed. Sponsorship: Maurice and Phyllis Paykel Trust, Auckland University of Technology Contestable Grants Fund and the Ministry of Health.
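
As a worked illustration of the prediction equation quoted in this abstract, the sketch below evaluates it for one child. The coefficients come from the abstract; the function names, the assumption that resistance is in ohms (the abstract gives no unit), the two-compartment %BF formula, and the input values are all illustrative assumptions, not data from the study.

```python
def fat_free_mass_kg(height_cm, resistance_ohm, weight_kg):
    """FFM from the abstract's equation; resistance unit (ohm) is an assumption."""
    return 0.622 * height_cm ** 2 / resistance_ohm + 0.234 * weight_kg + 1.166

def percent_body_fat(weight_kg, ffm_kg):
    """Two-compartment model: fat mass is body weight minus fat-free mass."""
    return 100.0 * (weight_kg - ffm_kg) / weight_kg

# Hypothetical child: 140 cm, 700 ohm, 35 kg (made-up values for illustration).
ffm = fat_free_mass_kg(140.0, 700.0, 35.0)
bf = percent_body_fat(35.0, ffm)
print(f"FFM = {ffm:.3f} kg, %BF = {bf:.1f}")
```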

Abstract:

The main purpose of this study was to examine the applicability of geostatistical modeling to obtain valuable information for assessing the environmental impact of sewage outfall discharges. The data set used was obtained in a monitoring campaign at the S. Jacinto outfall, located off the Portuguese west coast near the Aveiro region, using an AUV. Matheron's classical estimator was used to compute the experimental semivariogram, which was fitted to three theoretical models: spherical, exponential and Gaussian. The cross-validation procedure suggested the best semivariogram model, and ordinary kriging was used to obtain the predictions of salinity at unknown locations. The generated map clearly shows the plume dispersion in the studied area, indicating that the effluent does not reach the nearby beaches. Our study suggests that an optimal design for the AUV sampling trajectory, from a geostatistical prediction point of view, can help to compute more precise predictions and hence to quantify dilution more accurately. Moreover, since accurate measurements of the plume's dilution are rare, these studies might be very helpful in the future for validation of dispersion models.
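
For readers unfamiliar with Matheron's classical estimator, the experimental semivariogram at lag h is half the mean squared difference between values at pairs of locations separated by approximately h. The sketch below is a minimal illustration of that formula, assuming simple fixed-width distance bins; the function name, the binning scheme, and the parameters are hypothetical and do not reproduce the study's actual processing.

```python
import math

def matheron_semivariogram(points, values, lag_width, n_lags):
    """Experimental semivariogram: gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)).

    Pairs are grouped into bins [b * lag_width, (b + 1) * lag_width).
    Bins with no pairs are reported as None.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            h = math.dist(points[i], points[j])   # Euclidean separation
            b = int(h // lag_width)               # index of the distance bin
            if b < n_lags:
                sums[b] += (values[i] - values[j]) ** 2
                counts[b] += 1
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]
```

The resulting values per lag are what a theoretical model (spherical, exponential or Gaussian) would then be fitted to before kriging.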

Abstract:

In this work the identification and diagnosis of various stages of chronic liver disease is addressed. The classification results of a support vector machine, a decision tree and a k-nearest neighbor classifier are compared. Ultrasound image intensity and textural features are jointly used with clinical and laboratory data in the staging process. Classifier training is performed using a population of 97 patients at six different stages of chronic liver disease and a leave-one-out cross-validation strategy. The best results are obtained using the support vector machine with a radial-basis kernel, with an overall accuracy of 73.20%. The good performance of the method is a promising indicator that it can be used, in a noninvasive way, to provide reliable information about chronic liver disease staging.
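
Leave-one-out cross-validation, as used in this abstract, trains on all samples but one and tests on the sample left out, repeating this once per sample. The sketch below illustrates the procedure with a plain k-nearest neighbor classifier on generic feature tuples; the function names and the choice of k are assumptions for illustration, and the code does not reproduce the study's features or its support vector machine.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def leave_one_out_accuracy(X, y, k=3):
    """Hold out each sample once, train on the rest, and count correct predictions."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]   # every sample except the i-th
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, X[i], k) == y[i]
    return hits / len(X)
```

With n samples the classifier is trained n times, which is affordable for small clinical data sets such as the one described here.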

Abstract:

In this work the liver contour is semi-automatically segmented and quantified in order to help the identification and diagnosis of diffuse liver disease. The features extracted from the liver contour are jointly used with clinical and laboratory data in the staging process. The classification results of a support vector machine, a Bayesian and a k-nearest neighbor classifier are compared. A population of 88 patients at five different stages of diffuse liver disease and a leave-one-out cross-validation strategy are used in the classification process. The best results are obtained using the k-nearest neighbor classifier, with an overall accuracy of 80.68%. The good performance of the proposed method suggests that it is a reliable indicator that can improve the information available for staging diffuse liver disease.

Abstract:

Introduction: Email is currently regarded as an important means of communication. Electronic messages, commonly known as emails, are used easily and frequently to send and receive all kinds of information. They serve many purposes, generating a large number of messages every day and, consequently, an enormous volume of information. This volume requires constant handling of messages to keep the collection organized. Typically, this handling consists of organizing the messages into a taxonomy. The adopted taxonomy reflects the particular interests and preferences of the user. Motivation: Organizing emails manually is a slow, time-consuming activity. Optimizing this process by implementing an automatic method tends to improve user satisfaction. There is a growing need for new solutions for handling digital content that save the user effort and cost; this need, specifically in the context of email handling, motivated this work. Hypothesis: The main objective of this project is to enable the ad hoc organization of emails with reduced effort on the part of the user. The proposed methodology aims to organize emails into a set of disjoint categories that reflect the user's preferences. The main purpose of this process is to produce an organization in which messages are classified into appropriate classes while requiring the least possible effort from the user. To achieve these objectives, the project draws on text mining techniques, in particular automatic text categorization, and on active learning. To reduce the need to query the user (to label examples according to the desired categories), the d-confidence algorithm was used. Automatic email organization process: The process of automatically organizing emails is carried out in three distinct phases: indexing, classification and evaluation. In the first phase, indexing, the emails undergo a cleaning transformation that essentially aims to generate a representation of the emails suited to automatic processing. The second phase is classification. This phase uses the data set produced by the previous phase to build a classification model, which is subsequently applied to new emails. Starting from a matrix representing emails, terms and their respective weights, together with a set of manually classified examples, a classifier is generated through a learning process. The resulting classifier is then applied to the set of emails, yielding a classification of all emails. Classification is based on a support vector machine classifier using the d-confidence active learning algorithm. The d-confidence algorithm aims to propose to the user the most significant examples for labeling. By identifying the emails carrying the most relevant information for the learning process, the number of iterations, and consequently the effort required from users, is reduced. The third and final phase is evaluation. In this phase the performance of the classification process and the efficiency of the d-confidence algorithm are evaluated. The evaluation method adopted is the cross-validation method known as 10-fold cross validation. Conclusions: The automatic email organization process was developed successfully; the performance of the generated classifier and of the d-confidence algorithm was relatively good. On average, the categories show relatively low error rates, except for the most generic classes. The effort required from the user was reduced, since with the d-confidence algorithm an error rate close to the final value was obtained even with fewer labeled cases than a supervised method requires. It is important to note that, beyond the automatic email organization process, this project was an excellent opportunity to acquire solid knowledge of text mining and of automatic classification and information retrieval processes. Studying such interesting areas sparked new interests that represent real challenges for the future.

Abstract:

Dissertation presented to obtain the degree of Doctor in Informatics from the Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia.

Abstract:

INTRODUCTION: There are several risk scores for stratification of patients with ST-segment elevation myocardial infarction (STEMI), the most widely used of which are the TIMI and GRACE scores. However, these are complex and require several variables. The aim of this study was to obtain a reduced model with fewer variables and similar predictive and discriminative ability. METHODS: We studied 607 patients (age 62 years, SD=13; 76% male) who were admitted with STEMI and underwent successful primary angioplasty. Our endpoints were all-cause in-hospital and 30-day mortality. Considering all variables from the TIMI and GRACE risk scores, multivariate logistic regression models were fitted to the data to identify the variables that best predicted death. RESULTS: Compared to the TIMI score, the GRACE score had better predictive and discriminative performance for in-hospital mortality, with similar results for 30-day mortality. After data modeling, the variables with highest predictive ability were age, serum creatinine, heart failure and the occurrence of cardiac arrest. The new predictive model was compared with the GRACE risk score, after internal validation using 10-fold cross validation. A similar discriminative performance was obtained and some improvement was achieved in estimates of probabilities of death (increased for patients who died and decreased for those who did not). CONCLUSION: It is possible to simplify risk stratification scores for STEMI and primary angioplasty using only four variables (age, serum creatinine, heart failure and cardiac arrest). This simplified model maintained a good predictive and discriminative performance for short-term mortality.

Abstract:

Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.