181 results for Prediction algorithms


Relevance:

20.00%

Publisher:

Abstract:

The noise power spectrum (NPS) is the reference metric for understanding the noise content in computed tomography (CT) images. To evaluate the noise properties of clinical multidetector CT (MDCT) scanners, local 2D and 3D NPSs were computed for different acquisition and reconstruction parameters. A 64-MDCT and a 128-MDCT scanner were employed. Measurements were performed on a water phantom in axial and helical acquisition modes. The CT dose index was identical for both installations. The influence of parameters such as the pitch, the reconstruction filter (soft, standard and bone) and the reconstruction algorithm (filtered back-projection (FBP) or adaptive statistical iterative reconstruction (ASIR)) was investigated. Images were also reconstructed in the coronal plane using a reformat process. Then the 2D and 3D NPSs were computed. In axial acquisition mode, the magnitude of the 2D axial NPS varied substantially along the z-direction when measured at the phantom center. In helical mode, a directional dependency with a lobular shape was observed while the magnitude of the NPS remained constant. Substantial effects of the reconstruction filter, pitch and reconstruction algorithm were observed on the 3D NPS results for both MDCTs. With ASIR, a reduction of the NPS magnitude and a shift of the NPS peak toward low frequencies were visible. The 2D coronal NPS obtained from the reformatted images was affected by the interpolation when compared with the 2D coronal NPS obtained from 3D measurements. The noise properties of volumes measured in last-generation MDCTs were studied using the local 3D NPS metric. However, the impact of noise non-stationarity may need further investigation.
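As a rough illustration of what a local 2D NPS estimate involves, the sketch below uses the commonly quoted ensemble-average formula, NPS = (Δx·Δy / (Nx·Ny)) · ⟨|DFT(ROI − mean)|²⟩. The abstract does not give the authors' exact procedure, so the function name `nps_2d`, the mean-only detrending, and the synthetic white-noise ROIs are all assumptions made for the example.

```python
import numpy as np

def nps_2d(rois, pixel_mm):
    """Ensemble-averaged 2D NPS from a stack of noise ROIs.

    rois: array of shape (n_rois, ny, nx), e.g. in HU (synthetic here).
    pixel_mm: pixel spacing in mm, assumed equal in x and y.
    Returns the NPS (HU^2 mm^2) on the unshifted FFT frequency grid.
    """
    rois = np.asarray(rois, dtype=float)
    _, ny, nx = rois.shape
    # Subtract each ROI's mean so only the fluctuations remain.
    detrended = rois - rois.mean(axis=(1, 2), keepdims=True)
    power = np.abs(np.fft.fft2(detrended, axes=(1, 2))) ** 2
    return (pixel_mm * pixel_mm) / (nx * ny) * power.mean(axis=0)

# Synthetic white noise stands in for water-phantom ROIs (assumption).
rng = np.random.default_rng(0)
rois = rng.normal(0.0, 10.0, size=(50, 64, 64))
pixel_mm = 0.5
nps = nps_2d(rois, pixel_mm)

# Sanity check: integrating the NPS over frequency recovers the pixel variance.
df = 1.0 / (64 * pixel_mm)
variance_from_nps = nps.sum() * df * df
```

The closing check relies on Parseval's theorem: the integral of the NPS over the frequency plane equals the mean per-ROI variance, which is a convenient way to validate the normalisation.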

Abstract:

The state of the art for describing image quality in medical imaging is to assess the performance of an observer conducting a task of clinical interest. This can be done by using a model observer leading to a figure of merit such as the signal-to-noise ratio (SNR). Using the non-prewhitening (NPW) model observer, we objectively characterised the evolution of its figure of merit in various acquisition conditions. The NPW model observer usually requires the modulation transfer function (MTF) as well as noise power spectra. However, although computing the MTF poses no problem with the traditional filtered back-projection (FBP) algorithm, this is not the case with iterative reconstruction (IR) algorithms, such as adaptive statistical iterative reconstruction (ASIR) or model-based iterative reconstruction (MBIR). Given that the target transfer function (TTF) had already been shown to express the system resolution accurately even with non-linear algorithms, we decided to tune the NPW model observer by replacing the standard MTF with the TTF. The TTF was estimated using a custom-made phantom containing cylindrical inserts surrounded by water. The contrast differences between the inserts and water were plotted for each acquisition condition. Then, mathematical transformations were performed leading to the TTF. As expected, the first results showed that the TTF depended on the image contrast and noise levels for both ASIR and MBIR. Moreover, with FBP the TTF also proved to depend on contrast and noise when using the lung kernel. Those results were then introduced into the NPW model observer. We observed an enhancement of SNR every time we switched from FBP to ASIR to MBIR. IR algorithms greatly improve image quality, especially in low-dose conditions. Based on our results, the use of MBIR could lead to further dose reduction in several clinical applications.
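To make the substitution of the TTF for the MTF concrete, here is a minimal sketch of the NPW figure of merit in its commonly used frequency-domain form, SNR² = [∑|W|²·TTF²·Δf²]² / ∑|W|²·TTF²·NPS·Δf². The abstract does not state the exact expression the authors used, so this form, the function name `npw_snr`, and the toy arrays are assumptions.

```python
import numpy as np

def npw_snr(task, ttf, nps, df):
    """NPW SNR on a discrete 2D frequency grid.

    task: |W(u, v)|, magnitude of the task function (signal spectrum).
    ttf:  TTF(u, v), used in place of the MTF.
    nps:  NPS(u, v), noise power spectrum on the same grid.
    df:   frequency bin width, assumed equal in u and v.
    """
    signal = (task ** 2) * (ttf ** 2)
    numerator = (signal.sum() * df * df) ** 2
    denominator = (signal * nps).sum() * df * df
    return float(np.sqrt(numerator / denominator))

# Toy inputs (hypothetical): flat task function, ideal TTF, white noise.
task = np.ones((4, 4))
ttf = np.ones((4, 4))
nps = 2.0 * np.ones((4, 4))
snr_high_noise = npw_snr(task, ttf, nps, df=0.5)
snr_low_noise = npw_snr(task, ttf, nps / 2.0, df=0.5)
```

With these toy inputs, halving the NPS raises the SNR by a factor of √2, which mirrors the qualitative behaviour described when moving to reconstructions with lower noise.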

Abstract:

Activity monitors based on accelerometry are used to predict the speed and energy cost of walking at 0% slope, but not at other inclinations. Parallel measurements of body accelerations and altitude variation were studied to determine whether walking speed prediction could be improved. Fourteen subjects walked twice along a 1.3 km circuit with substantial slope variations (-17% to +17%). The parameters recorded were body acceleration using a uni-axial accelerometer, altitude variation using differential barometry, and walking speed using satellite positioning (DGPS). Linear regressions were calculated between acceleration and walking speed, and between acceleration/altitude and walking speed. These predictive models, calculated using the data from the first circuit run, were used to predict speed during the second circuit. Finally the predicted velocity was compared with the measured one. The result was that acceleration alone failed to predict speed (mean r = 0.4). Adding altitude variation improved the prediction (mean r = 0.7). With regard to the altitude/acceleration-speed relationship, substantial inter-individual variation was found. It is concluded that accelerometry, combined with altitude measurement, can assess position variations of humans provided inter-individual variation is taken into account. It is also confirmed that DGPS can be used for outdoor walking speed measurements, opening up new perspectives in the field of biomechanics.
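The two-predictor regression described above can be sketched as follows: fit speed on acceleration and altitude variation using the first circuit run, then predict the second run and correlate the prediction with the measured speed. The data below are synthetic and the coefficients are invented for illustration; none of the numbers come from the study.

```python
import numpy as np

def fit_speed_model(accel, altitude_rate, speed):
    """Least-squares fit of speed = b0 + b1*accel + b2*altitude_rate."""
    X = np.column_stack([np.ones_like(accel), accel, altitude_rate])
    coef, *_ = np.linalg.lstsq(X, speed, rcond=None)
    return coef

def predict_speed(coef, accel, altitude_rate):
    """Apply a fitted model to new acceleration/altitude samples."""
    X = np.column_stack([np.ones_like(accel), accel, altitude_rate])
    return X @ coef

rng = np.random.default_rng(1)

def synthetic_run(n):
    # Hypothetical generating relation, chosen only so the example runs.
    accel = rng.uniform(0.5, 2.0, n)
    altitude_rate = rng.uniform(-0.17, 0.17, n)
    speed = 1.0 + 0.5 * accel - 2.0 * altitude_rate + rng.normal(0.0, 0.01, n)
    return accel, altitude_rate, speed

a1, h1, v1 = synthetic_run(200)   # first circuit run: model calibration
a2, h2, v2 = synthetic_run(200)   # second circuit run: model evaluation

coef = fit_speed_model(a1, h1, v1)
predicted = predict_speed(coef, a2, h2)
r = np.corrcoef(predicted, v2)[0, 1]
```

Because the abstract reports substantial inter-individual variation, a model of this kind would presumably be fitted per subject rather than pooled.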

Abstract:

Our goal was to evaluate the diagnostic utility of C-reactive protein (CRP), alone or combined with clinical probability assessment, in patients with suspected pulmonary embolism (PE), and to compare its performance with that of a D-dimer assay. We conducted a prospective study in which we performed a common immuno-turbidimetric CRP test and a rapid enzyme-linked immunosorbent assay (ELISA) D-dimer test in 259 consecutive outpatients with suspected PE at the emergency department of a teaching hospital. We assessed the clinical probability of PE with a validated prediction rule that could be overridden by clinical judgment. Patients with D-dimer levels ≥ 500 µg/l underwent a work-up consisting of lower-limb venous ultrasound, spiral computerized tomography, ventilation-perfusion scan, or pulmonary angiography. Patients were followed up for three months. Seventy-seven (30%) of the patients had PE. CRP alone had a sensitivity of 84% (95% confidence interval [CI]: 74 to 92%) and a negative predictive value (NPV) of 87% (95% CI: 78 to 93%) at a cutpoint of 5 mg/l. Overall, 61 (24%) patients with a low clinical probability of PE had a CRP < 5 mg/l. Due to the low prevalence of PE (9%) in this subgroup, the NPV increased to 97% (95% CI: 89 to 100%). The D-dimer test (cutpoint 500 µg/l) showed a sensitivity of 100% (95% CI: 95 to 100%) and an NPV of 100% (95% CI: 94 to 100%) irrespective of clinical probability, and accurately ruled out PE in 56 (22%) patients. Standard CRP tests, alone or combined with clinical probability assessment, cannot safely exclude PE.
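The sensitivity and NPV figures follow from a 2×2 table, and the rise in NPV in the low-probability subgroup is the usual effect of lower prevalence. The sketch below illustrates both. The cell counts and the specificity are hypothetical values chosen only to give rounded figures close to those quoted; the abstract reports neither.

```python
def sensitivity(tp, fn):
    """Fraction of diseased patients with a positive test."""
    return tp / (tp + fn)

def negative_predictive_value(tn, fn):
    """Fraction of negative tests that are truly disease-free."""
    return tn / (tn + fn)

def npv_from_rates(sens, spec, prevalence):
    """NPV via Bayes' rule from sensitivity, specificity and prevalence."""
    true_neg = spec * (1.0 - prevalence)
    false_neg = (1.0 - sens) * prevalence
    return true_neg / (true_neg + false_neg)

# Hypothetical counts for the CRP test (not reported in the abstract).
tp, fn, tn = 65, 12, 80
crp_sensitivity = sensitivity(tp, fn)
crp_npv = negative_predictive_value(tn, fn)

# Same test characteristics at two prevalences (specificity is assumed).
npv_overall = npv_from_rates(0.84, 0.44, 0.30)
npv_low_probability = npv_from_rates(0.84, 0.44, 0.09)
```

The last two lines show why the same assay looks better at ruling out disease in a low-prevalence subgroup even though its sensitivity is unchanged.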

Abstract:

Résumé (English translation)

This thesis is devoted to the analysis, modelling and visualisation of spatially referenced environmental data using machine learning algorithms. Machine learning can be considered, in a broad sense, as a subfield of artificial intelligence that is particularly concerned with the development of techniques and algorithms allowing a machine to learn from data. In this thesis, machine learning algorithms are adapted for application to environmental data and to spatial prediction. Why machine learning? Because most machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modelling tools. They can solve classification, regression and probability density modelling problems in high-dimensional spaces composed of spatially referenced informative variables ("geo-features") in addition to geographical coordinates. Moreover, they are well suited to implementation as decision-support tools for environmental questions ranging from pattern recognition to modelling and prediction, by way of automatic mapping. Their efficiency is comparable to that of geostatistical models in the space of geographical coordinates, but they are indispensable for high-dimensional data that include geo-features. The most important and most popular machine learning algorithms are presented theoretically and implemented as software for the environmental sciences.
The main algorithms described are the multilayer perceptron (MLP), the best-known algorithm in artificial intelligence; general regression neural networks (GRNN); probabilistic neural networks (PNN); self-organising maps (SOM); Gaussian mixture models (GMM); radial basis function networks (RBF); and mixture density networks (MDN). This range of algorithms covers varied tasks such as classification, regression and probability density estimation. Exploratory data analysis (EDA) is the first step of any data analysis. In this thesis, the concepts of exploratory spatial data analysis (ESDA) are treated both with the traditional geostatistical approach, using experimental variography, and according to the principles of machine learning. Experimental variography, which studies the relationships between pairs of points, is a basic tool for the geostatistical analysis of anisotropic spatial correlations and makes it possible to detect the presence of spatial patterns that can be described by a statistic. The machine learning approach to ESDA is presented through the application of the k-nearest neighbours method, which is very simple and has excellent interpretation and visualisation properties. An important part of the thesis deals with topical subjects such as the automatic mapping of spatial data. The general regression neural network is proposed to solve this task efficiently.
The performance of the GRNN is demonstrated on the Spatial Interpolation Comparison (SIC) 2004 data, for which the GRNN significantly outperforms all other methods, particularly in emergency situations. The thesis is composed of four chapters: theory, applications, software tools and guided examples. An important part of the work consists of a collection of software: Machine Learning Office. This software collection was developed over the last 15 years and has been used for teaching numerous courses, including international workshops in China, France, Italy, Ireland and Switzerland, as well as in fundamental and applied research projects. The case studies considered cover a wide spectrum of real low- and high-dimensional geo-environmental problems, such as air, soil and water pollution by radioactive products and heavy metals, the classification of soil types and hydrogeological units, uncertainty mapping for decision support and the assessment of natural hazards (landslides, avalanches). Complementary tools for exploratory data analysis and visualisation were also developed, with care taken to create a user-friendly, easy-to-use interface.

Machine Learning for geospatial data: algorithms, software tools and case studies

Abstract

The thesis is devoted to the analysis, modeling and visualisation of spatial environmental data using machine learning algorithms. In a broad sense machine learning can be considered as a subfield of artificial intelligence. It is mainly concerned with the development of techniques and algorithms that allow computers to learn from data. In this thesis machine learning algorithms are adapted to learn from spatial environmental data and to make spatial predictions. Why machine learning?
In a few words, most machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modeling tools. They can find solutions to classification, regression, and probability density modeling problems in high-dimensional geo-feature spaces, composed of geographical space and additional relevant spatially referenced features. They are well suited to be implemented as predictive engines in decision support systems, for the purposes of environmental data mining including pattern recognition, modeling and prediction as well as automatic data mapping. Their efficiency is competitive with that of geostatistical models in low-dimensional geographical spaces, but they are indispensable in high-dimensional geo-feature spaces. The most important and popular machine learning algorithms and models of interest for geo- and environmental sciences are presented in detail, from the theoretical description of the concepts to the software implementation. The main algorithms and models considered are the following: multi-layer perceptron (a workhorse of machine learning), general regression neural networks, probabilistic neural networks, self-organising (Kohonen) maps, Gaussian mixture models, radial basis function networks, and mixture density networks. This set of models covers machine learning tasks such as classification, regression, and density estimation. Exploratory data analysis (EDA) is an initial and very important part of data analysis. In this thesis the concepts of exploratory spatial data analysis (ESDA) are considered using both the traditional geostatistical approach, such as experimental variography, and machine learning. Experimental variography is a basic tool for the geostatistical analysis of anisotropic spatial correlations, which helps to reveal the presence of spatial patterns, at least those described by two-point statistics.
A machine learning approach to ESDA is presented by applying the k-nearest neighbors (k-NN) method, which is simple and has very good interpretation and visualization properties. An important part of the thesis deals with a current hot topic, namely the automatic mapping of geospatial data. The general regression neural network (GRNN) is proposed as an efficient model to solve this task. The performance of the GRNN model is demonstrated on Spatial Interpolation Comparison (SIC) 2004 data, where the GRNN model significantly outperformed all other approaches, especially under emergency conditions. The thesis consists of four chapters and has the following structure: theory, applications, software tools, and how-to-do-it examples. An important part of the work is a collection of software tools: Machine Learning Office. The Machine Learning Office tools were developed during the last 15 years and have been used both for many teaching courses, including international workshops in China, France, Italy, Ireland and Switzerland, and for carrying out fundamental and applied research projects. The case studies considered cover a wide spectrum of real-life low- and high-dimensional geo- and environmental problems, such as air, soil and water pollution by radionuclides and heavy metals, classification of soil types and hydro-geological units, decision-oriented mapping with uncertainties, and natural hazard (landslides, avalanches) assessment and susceptibility mapping. Complementary tools useful for exploratory data analysis and visualisation were developed as well. The software is user friendly and easy to use.
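For readers unfamiliar with the GRNN proposed here for automatic mapping, the sketch below shows its usual formulation as Gaussian-kernel weighted averaging of the training values (Nadaraya-Watson regression). It is not taken from Machine Learning Office; the function name `grnn_predict`, the isotropic kernel, and the two-point toy data set are assumptions for illustration.

```python
import numpy as np

def grnn_predict(train_xy, train_values, query_xy, sigma):
    """Predict values at query locations with a Gaussian-kernel GRNN.

    train_xy:     (n, d) coordinates of the measurements.
    train_values: (n,) measured values.
    query_xy:     (m, d) locations to map.
    sigma:        kernel width, the single parameter to tune.
    """
    train_xy = np.asarray(train_xy, dtype=float)
    train_values = np.asarray(train_values, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)
    # Squared distances between every query point and every training point.
    d2 = ((query_xy[:, None, :] - train_xy[None, :, :]) ** 2).sum(axis=2)
    weights = np.exp(-d2 / (2.0 * sigma ** 2))
    # Each prediction is a weighted mean of the training values.
    return (weights @ train_values) / weights.sum(axis=1)

# Toy data (hypothetical): two measurements on a line.
train_xy = [[0.0, 0.0], [1.0, 0.0]]
train_values = [1.0, 3.0]
midpoint = grnn_predict(train_xy, train_values, [[0.5, 0.0]], sigma=0.5)[0]
at_sample = grnn_predict(train_xy, train_values, [[0.0, 0.0]], sigma=0.1)[0]
```

In practice the kernel width would be chosen by cross-validation, and a query far from every sample with a very small `sigma` can underflow to zero weights, so a production version would need a guard for that case.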

Abstract:

3 Summary

3.1 English

The pharmaceutical industry has been facing several challenges in recent years, and the optimization of its drug discovery pipeline is believed to be the only viable solution. High-throughput techniques contribute actively to this optimization, especially when complemented by computational approaches aiming at rationalizing the enormous amount of information that they can produce. In silico techniques, such as virtual screening or rational drug design, are now routinely used to guide drug discovery. Both rely heavily on the prediction of the molecular interaction (docking) occurring between drug-like molecules and a therapeutically relevant target. Several software packages are available to this end, but despite the very promising picture drawn in most benchmarks, they still hold several hidden weaknesses. As pointed out in several recent reviews, the docking problem is far from being solved, and there is now a need for methods able to identify binding modes with high accuracy, which is essential to reliably compute the binding free energy of the ligand. This quantity is directly linked to its affinity and can be related to its biological activity. Accurate docking algorithms are thus critical for both the discovery and the rational optimization of new drugs. In this thesis, a new docking software aiming at this goal is presented, EADock. It uses a hybrid evolutionary algorithm with two fitness functions, in combination with a sophisticated management of the diversity. EADock is interfaced with the CHARMM package for energy calculations and coordinate handling. A validation was carried out on 37 crystallized protein-ligand complexes featuring 11 different proteins.
The search space was defined as a sphere of radius 15 Å around the center of mass of the ligand position in the crystal structure and, contrary to other benchmarks, our algorithm was fed with optimized ligand positions up to 10 Å root mean square deviation (RMSD) from the crystal structure. This validation illustrates the efficiency of our sampling heuristic, as correct binding modes, defined by an RMSD to the crystal structure lower than 2 Å, were identified and ranked first for 68% of the complexes. The success rate increases to 78% when considering the five best-ranked clusters, and to 92% when all clusters present in the last generation are taken into account. Most failures in this benchmark could be explained by the presence of crystal contacts in the experimental structure. EADock has been used to understand molecular interactions involved in the regulation of the Na,K-ATPase, and in the activation of the nuclear hormone peroxisome proliferator-activated receptor α (PPARα). It also helped to understand the action of common pollutants (phthalates) on PPARγ, and the impact of biotransformations of the anticancer drug Imatinib (Gleevec®) on its binding mode to the Bcr-Abl tyrosine kinase. Finally, a fragment-based rational drug design approach using EADock was developed, and led to the successful design of new peptidic ligands for the α5β1 integrin and for the human PPARα. In both cases, the designed peptides presented activities comparable to that of well-established ligands such as the anticancer drug Cilengitide and Wy14,643, respectively.

3.2 French (English translation)

The recent difficulties of the pharmaceutical industry seem resolvable only through optimisation of its drug development process. This optimisation increasingly involves so-called high-throughput techniques, which are particularly effective when coupled with computational tools able to manage the mass of data produced.
In silico approaches such as virtual screening or the rational design of new molecules are now routinely used. Both rely on the ability to predict the details of the molecular interaction between a molecule resembling an active pharmaceutical ingredient (API) and a target protein of therapeutic interest. Benchmarks of software addressing this prediction are flattering, but several problems remain. The recent literature tends to question their reliability, pointing to an emerging need for more accurate approaches to predicting the binding mode. This accuracy is essential for computing the binding free energy, which is directly related to the affinity of the potential API for the target protein and indirectly related to its biological activity. Accurate prediction is therefore particularly important for the discovery and optimisation of new active molecules. This thesis presents a new software package, EADock, that emphasises such accuracy. This hybrid evolutionary algorithm uses two selection pressures, combined with sophisticated diversity management. EADock relies on CHARMM for energy calculations and the handling of atomic coordinates. It was validated on 37 crystallised protein-ligand complexes, including 11 different proteins. The search space was extended to a sphere of radius 15 Å around the centre of mass of the crystallised ligand and, unlike usual benchmarks, the algorithm started from optimised solutions with an RMSD of up to 10 Å from the crystal structure. This validation demonstrated the efficiency of our search heuristic, since binding modes with an RMSD below 2 Å from the crystal structure were ranked first for 68% of the complexes.
When the five best solutions are taken into account, the success rate climbs to 78%, and to 92% when the whole of the last generation is taken into account. Most prediction errors are attributable to the presence of crystal contacts. Since then, EADock has been used to understand the molecular mechanisms involved in the regulation of the Na,K-ATPase and in the activation of the peroxisome proliferator-activated receptor α (PPARα). It also made it possible to describe the interaction of commonly encountered pollutants with PPARγ, as well as the influence of the metabolism of Imatinib (an anticancer API) on its binding to the Bcr-Abl kinase. An approach based on predicting the interactions of molecular fragments with the target protein is also proposed. It enabled the discovery of new peptide ligands of PPARα and of the α5β1 integrin. In both cases, the activity of these new peptides is comparable to that of well-established ligands, such as Wy14,643 for the former and Cilengitide (an anticancer API) for the latter.
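The success criterion used in this validation, an RMSD below 2 Å between the predicted and crystallographic ligand positions, can be sketched as follows. This is not EADock code: the function names are hypothetical, the coordinates are toy values, and the sketch assumes both poses list the same atoms in the same order and are expressed in the same (receptor) frame, so no superposition is applied.

```python
import numpy as np

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched sets of atom coordinates (Å)."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both poses must contain the same atoms")
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))

def is_correct_binding_mode(pose, crystal, threshold=2.0):
    """Apply the 2 Å criterion quoted in the summary."""
    return rmsd(pose, crystal) < threshold

# Toy three-atom ligand (hypothetical coordinates, in Å).
crystal = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0]])
close_pose = crystal + np.array([1.0, 0.0, 0.0])   # rigid 1 Å shift
far_pose = crystal + np.array([3.0, 0.0, 0.0])     # rigid 3 Å shift

close_rmsd = rmsd(close_pose, crystal)
```

A rigid translation gives an RMSD equal to the shift length, so the first pose passes the criterion and the second does not.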

Abstract:

BACKGROUND: Prognostic models have been developed to predict survival of patients with newly diagnosed glioblastoma (GBM). To improve predictions, models should be updated with information at recurrence. We performed a pooled analysis of European Organization for Research and Treatment of Cancer (EORTC) trials on recurrent glioblastoma to validate existing clinical prognostic factors, identify new markers, and derive new predictions for overall survival (OS) and progression-free survival (PFS). METHODS: Data from 300 patients with recurrent GBM recruited in eight phase I or II trials conducted by the EORTC Brain Tumour Group were used to evaluate patients' age, sex, World Health Organisation (WHO) performance status (PS), presence of neurological deficits, disease history, use of steroids or anti-epileptics and disease characteristics to predict PFS and OS. Prognostic calculators were developed in patients initially treated by chemoradiation with temozolomide. RESULTS: Poor PS and more than one target lesion had a significant negative prognostic impact for both PFS and OS. Patients with large tumours measured by the maximum diameter of the largest lesion (≥42 mm) and treated with steroids at baseline had shorter OS. Tumours with predominant frontal location had better survival. Age and sex did not show independent prognostic values for PFS or OS. CONCLUSIONS: This analysis confirms performance status but not age as a major prognostic factor for PFS and OS in recurrent GBM. Patients with multiple and large lesions have an increased risk of death. With these data, prognostic calculators with confidence intervals for both medians and fixed-time probabilities of survival were derived.

Abstract:

BACKGROUND: Cytomegalovirus (CMV) disease remains an important problem in solid-organ transplant recipients, with the greatest risk among donor CMV-seropositive, recipient-seronegative (D(+)/R(-)) patients. CMV-specific cell-mediated immunity may be able to predict which patients will develop CMV disease. METHODS: We prospectively included D(+)/R(-) patients who received antiviral prophylaxis. We used the Quantiferon-CMV assay to measure interferon-γ levels following in vitro stimulation with CMV antigens. The test was performed at the end of prophylaxis and 1 and 2 months later. The primary outcome was the incidence of CMV disease at 12 months after transplant. We calculated positive and negative predictive values of the assay for protection from CMV disease. RESULTS: Overall, 28 of 127 (22%) patients developed CMV disease. Of 124 evaluable patients, 31 (25%) had a positive result, 81 (65.3%) had a negative result, and 12 (9.7%) had an indeterminate result (negative mitogen and CMV antigen) with the Quantiferon-CMV assay. At 12 months, patients with a positive result had a subsequent lower incidence of CMV disease than patients with a negative and an indeterminate result (6.4% vs 22.2% vs 58.3%, respectively; P < .001). Positive and negative predictive values of the assay for protection from CMV disease were 0.90 (95% confidence interval [CI], .74-.98) and 0.27 (95% CI, .18-.37), respectively. CONCLUSIONS: This assay may be useful to predict if patients are at low, intermediate, or high risk for the development of subsequent CMV disease after prophylaxis. CLINICAL TRIALS REGISTRATION: NCT00817908.

Abstract:

This paper is addressed to primary care physicians, cardiologists, internists, angiologists and other doctors wishing to improve vascular risk prediction in primary care. Many cardiovascular risk factors act aggressively on the arterial wall and result in atherosclerosis and atherothrombosis. Cardiovascular prognosis derived from ultrasound imaging is, however, excellent in subjects without formation of intimal thickening or atheromas. Since ultrasound visualises the arterial wall directly, the information derived from the arterial wall may add independent incremental information to the knowledge of risk derived from global risk assessment. This paper provides an overview of plaque imaging for vascular risk prediction in two parts: Part 1: Carotid intima-media thickness (IMT) is frequently used as a surrogate marker for outcome in intervention studies addressing rather large cohorts of subjects. Carotid IMT as a risk prediction tool for the prevention of acute myocardial infarction and stroke has been extensively studied in many patients since 1987, and has yielded incremental hazard ratios for these cardiovascular events independently of established cardiovascular risk factors. However, carotid IMT measurements are not used uniformly and therefore still lack widely accepted standardisation. Hence, at an individual, practice-based level, carotid IMT is not recommended as a risk assessment tool. The total plaque area of the carotid arteries (TPA) is a measure of the global plaque burden within both carotid arteries. It was recently shown in a large Norwegian cohort involving over 6000 subjects that TPA is a very good predictor of future myocardial infarction in women, with an area under the receiver operating characteristic (ROC) curve (AUC) of 0.73 (in men: 0.63). Further, the AUC for risk prediction is high both for vascular death in a vascular prevention clinic group (AUC 0.77) and for fatal or nonfatal myocardial infarction in a true primary care group (AUC 0.79).
Since TPA has acceptable reproducibility, allows calculation of posttest risk and is easily obtained at low cost, this risk assessment tool may come in for more widespread use in the future and also serve as a tool for atherosclerosis tracking and guidance for intensity of preventive therapy. However, more studies with TPA are needed. Part 2: Carotid and femoral plaque formation as detected by ultrasound offers a global view of the extent of atherosclerosis. Several prospective cohort studies have shown that cardiovascular risk prediction is greater for plaques than for carotid IMT. The number of arterial beds affected by significant atheromas may simply be added numerically to derive additional information on the risk of vascular events. A new atherosclerosis burden score (ABS) simply calculates the sum of carotid and femoral plaques encountered during ultrasound scanning. ABS correlates well and independently with the presence of coronary atherosclerosis and stenosis as measured by invasive coronary angiogram. However, the prognostic power of ABS as an independent marker of risk still needs to be elucidated in prospective studies. In summary, the large number of ways to measure atherosclerosis and related changes in human arteries by ultrasound indicates that this technology is not yet sufficiently perfected and needs more standardisation and workup on clearly defined outcome studies before it can be recommended as a practice-based additional risk modifier.

Abstract:

A wide range of modelling algorithms is used by ecologists, conservation practitioners, and others to predict species ranges from point locality data. Unfortunately, the amount of data available is limited for many taxa and regions, making it essential to quantify the sensitivity of these algorithms to sample size. This is the first study to address this need by rigorously evaluating a broad suite of algorithms with independent presence-absence data from multiple species and regions. We evaluated predictions from 12 algorithms for 46 species (from six different regions of the world) at three sample sizes (100, 30, and 10 records). We used data from natural history collections to run the models, and evaluated the quality of model predictions with area under the receiver operating characteristic curve (AUC). With decreasing sample size, model accuracy decreased and variability increased across species and between models. Novel modelling methods that incorporate both interactions between predictor variables and complex response shapes (i.e. GBM, MARS-INT, BRUTO) performed better than most methods at large sample sizes but not at the smallest sample sizes. Other algorithms were much less sensitive to sample size, including an algorithm based on maximum entropy (MAXENT) that had among the best predictive power across all sample sizes. Relative to other algorithms, a distance metric algorithm (DOMAIN) and a genetic algorithm (OM-GARP) had intermediate performance at the largest sample size and among the best performance at the lowest sample size. No algorithm predicted consistently well with small sample size (n < 30) and this should encourage highly conservative use of predictions based on small sample size and restrict their use to exploratory modelling.
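The evaluation metric used here, AUC computed against independent presence-absence data, has a simple rank interpretation: it is the probability that a randomly chosen presence site receives a higher predicted suitability than a randomly chosen absence site, with ties counted as one half. The sketch below computes it that way; the function name and the suitability scores are hypothetical.

```python
def auc(presence_scores, absence_scores):
    """AUC as the fraction of presence/absence pairs ranked correctly."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0      # presence site ranked above absence site
            elif p == a:
                wins += 0.5      # ties count as half
    return wins / (len(presence_scores) * len(absence_scores))

# Hypothetical model output at three presence and three absence sites.
presence = [0.9, 0.8, 0.4]
absence = [0.7, 0.3, 0.2]
model_auc = auc(presence, absence)
```

Here eight of the nine presence/absence pairs are ordered correctly, giving an AUC of 8/9. The double loop is quadratic in the number of sites, which is fine for illustration; a rank-based formula would be preferable for large evaluation sets.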

Abstract:

We present the most comprehensive comparison to date of the predictive benefit of genetics in addition to currently used clinical variables, using genotype data for 33 single-nucleotide polymorphisms (SNPs) in 1,547 Caucasian men from the placebo arm of the REduction by DUtasteride of prostate Cancer Events (REDUCE®) trial. Moreover, we conducted a detailed comparison of three techniques for incorporating genetics into clinical risk prediction. The first method was a standard logistic regression model, which included separate terms for the clinical covariates and for each of the genetic markers. This approach ignores a substantial amount of external information concerning effect sizes for these Genome Wide Association Study (GWAS)-replicated SNPs. The second and third methods investigated two possible approaches to incorporating meta-analysed external SNP effect estimates: one via a weighted prostate cancer (PCa) 'risk' score based solely on the meta-analysis estimates, and the other incorporating both the current and prior data via informative priors in a Bayesian logistic regression model. All methods demonstrated a slight improvement in predictive performance upon incorporation of genetics. The two methods that incorporated external information showed the greatest increase in receiver-operating-characteristic AUC, from 0.61 to 0.64. The value of our methods comparison is likely to lie in observations of performance similarities, rather than differences, between three approaches with very different resource requirements. The two methods that included external information performed best, but only marginally, despite substantial differences in complexity.
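A weighted genetic risk score of the kind mentioned as the second method is commonly built as the sum of each SNP's risk-allele count multiplied by its externally estimated log odds ratio. The abstract does not give the exact construction, so the sketch below assumes that form; the three odds ratios and the genotype are invented for illustration.

```python
import math

def weighted_risk_score(risk_allele_counts, log_odds_ratios):
    """Sum of risk-allele counts (0, 1 or 2 per SNP) weighted by external log ORs."""
    if len(risk_allele_counts) != len(log_odds_ratios):
        raise ValueError("one weight is needed per SNP")
    return sum(count * beta
               for count, beta in zip(risk_allele_counts, log_odds_ratios))

# Hypothetical meta-analysis odds ratios for three SNPs (not from the study).
odds_ratios = [1.2, 1.1, 1.3]
log_odds_ratios = [math.log(o) for o in odds_ratios]

# Hypothetical genotype: two risk alleles at SNP 1, none at SNP 2, one at SNP 3.
genotype = [2, 0, 1]
score = weighted_risk_score(genotype, log_odds_ratios)
combined_odds_ratio = math.exp(score)
```

Because the weights are fixed from external data, the score enters a clinical model as a single covariate, which is part of why this approach needs far fewer resources than fitting a separate term for each SNP or a full Bayesian model.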