12 results for Bootstrapping resampling

at Université de Lausanne, Switzerland


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Introduction: As part of the MicroArray Quality Control (MAQC)-II project, this analysis examines how the choice of univariate feature-selection methods and classification algorithms may influence the performance of genomic predictors under varying degrees of prediction difficulty represented by three clinically relevant endpoints. Methods: We used gene-expression data from 230 breast cancers (grouped into training and independent validation sets), and we examined 40 predictors (five univariate feature-selection methods combined with eight different classifiers) for each of the three endpoints. Their classification performance was estimated on the training set by using two different resampling methods and compared with the accuracy observed in the independent validation set. Results: A ranking of the three classification problems was obtained, and the performance of 120 models was estimated and assessed on an independent validation set. The bootstrapping estimates were closer to the validation performance than were the cross-validation estimates. The required sample size for each endpoint was estimated, and both gene-level and pathway-level analyses were performed on the obtained models. Conclusions: We showed that genomic predictor accuracy is determined largely by an interplay between sample size and classification difficulty. Variations on univariate feature-selection methods and choice of classification algorithm have only a modest impact on predictor performance, and several statistically equally good predictors can be developed for any given classification problem.
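
The idea of estimating classification performance by resampling the training set can be illustrated with a minimal sketch. This is not the MAQC-II pipeline: the one-feature threshold classifier, the function names and the toy data are all assumptions made for illustration. Each bootstrap resample is used for training, and the samples left out of that resample ("out-of-bag") are used for testing.

```python
import random
from statistics import mean

def fit_threshold(train):
    """Fit a toy one-feature classifier: the midpoint between the class means."""
    mean0 = mean(x for x, y in train if y == 0)
    mean1 = mean(x for x, y in train if y == 1)
    return (mean0 + mean1) / 2, mean1 >= mean0

def predict(model, x):
    """Return 1 if x falls on the class-1 side of the threshold, else 0."""
    threshold, positive_above = model
    return int((x > threshold) == positive_above)

def bootstrap_oob_accuracy(data, n_boot=200, seed=0):
    """Average accuracy on the samples left out of each bootstrap resample."""
    rng = random.Random(seed)
    n = len(data)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n indices with replacement
        chosen = set(idx)
        train = [data[i] for i in idx]
        held_out = [data[i] for i in range(n) if i not in chosen]
        # Skip degenerate resamples: a class is missing, or nothing was left out.
        if not held_out or len({y for _, y in train}) < 2:
            continue
        model = fit_threshold(train)
        scores.append(mean(int(predict(model, x) == y) for x, y in held_out))
    return mean(scores)

# Hypothetical, well-separated toy data: (feature value, class label).
toy = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
       (1.6, 1), (1.7, 1), (1.8, 1), (1.9, 1)]
oob_accuracy = bootstrap_oob_accuracy(toy)
```

A k-fold cross-validation estimate would differ only in how the train and held-out index sets are drawn (disjoint folds instead of resamples with replacement).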

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Galton (1907) first demonstrated the "wisdom of crowds" phenomenon by averaging independent estimates of unknown quantities given by many individuals. Herzog and Hertwig (2009; hereafter H&H in Psychological Science) showed that individuals' own estimates can be improved by asking them to make two estimates at separate times and averaging them. H&H claimed to observe far greater improvement in accuracy when participants received "dialectical" instructions to consider why their first estimate might be wrong before making their second estimates than when they received standard instructions. We reanalyzed H&H's data using measures of accuracy that are unrelated to the frequency of identical first and second responses and found that participants in both conditions improved their accuracy to an equal degree.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

INTRODUCTION/OBJECTIVES: Detection rates for adenoma and early colorectal cancer (CRC) are insufficient due to low compliance with invasive screening procedures such as colonoscopy. Available non-invasive screening tests unfortunately have low sensitivity and specificity. Therefore, there is a large unmet need for a cost-effective, reliable and non-invasive test to screen for early neoplastic and pre-neoplastic lesions. AIMS & METHODS: The objective is to develop a screening test able to detect early CRCs and adenomas. This test is based on a nucleic acid multi-gene assay performed on peripheral blood mononuclear cells (PBMCs). A colonoscopy-controlled feasibility study was conducted on 179 subjects. The first 92 subjects were used as a training set to generate a statistically significant signature. Colonoscopy revealed 21 subjects with CRC, 30 with adenoma larger than 1 cm and 41 with no neoplastic or inflammatory lesions. The second group of 48 subjects (controls, CRC and polyps) was used as a test set and will be kept blinded for the entire data analysis. To determine organ and disease specificity, 38 subjects were used: 24 with inflammatory bowel disease (IBD) and 14 with cancers other than CRC (OC). Blood samples were taken from each patient on the day of the colonoscopy and PBMCs were purified.
Total RNA was extracted following standard procedures. Multiplex RT-qPCR was applied to 92 different candidate biomarkers. Different univariate and multivariate statistical methods were applied to these candidates, and 60 biomarkers with significant p-values (<0.01) were selected. These biomarkers are involved in several different biological functions, such as cellular movement, cell signaling and interaction, tissue and cellular development, cancer, and cell growth and proliferation. Two distinct biomarker signatures, named COLOX CRC and COLOX POL, are used to separate patients without lesions from those with cancer or with adenoma, respectively. COLOX performance was validated using a random resampling method, the bootstrap. RESULTS: The COLOX CRC and POL tests successfully separate patients without lesions from those with CRC (Se 67%, Sp 93%, AUC 0.87) and from those with adenoma larger than 1 cm (Se 63%, Sp 83%, AUC 0.77), respectively. 6/24 patients in the IBD group and 1/14 patients in the OC group had a positive COLOX CRC. CONCLUSION: The two COLOX tests demonstrated high sensitivity and specificity for detecting CRCs and adenomas larger than 1 cm. A prospective, multicenter, pivotal study is underway to confirm these promising results in a larger cohort.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Significant progress has been made with regard to the quantitative integration of geophysical and hydrological data at the local scale. However, extending corresponding approaches beyond the local scale still represents a major challenge, yet is critically important for the development of reliable groundwater flow and contaminant transport models. To address this issue, I have developed a hydrogeophysical data integration technique based on a two-step Bayesian sequential simulation procedure that is specifically targeted towards larger-scale problems. The objective is to simulate the distribution of a target hydraulic parameter based on spatially exhaustive, but poorly resolved, measurements of a pertinent geophysical parameter and locally highly resolved, but spatially sparse, measurements of the considered geophysical and hydraulic parameters. To this end, my algorithm links the low- and high-resolution geophysical data via a downscaling procedure before relating the downscaled regional-scale geophysical data to the high-resolution hydraulic parameter field. I first illustrate the application of this novel data integration approach to a realistic synthetic database consisting of collocated high-resolution borehole measurements of the hydraulic and electrical conductivities and spatially exhaustive, low-resolution electrical conductivity estimates obtained from electrical resistivity tomography (ERT). The overall viability of this method is tested and verified by performing and comparing flow and transport simulations through the original and simulated hydraulic conductivity fields.
The corresponding results indicate that the proposed data integration procedure does indeed allow for obtaining faithful estimates of the larger-scale hydraulic conductivity structure and reliable predictions of the transport characteristics over medium- to regional-scale distances. The approach is then applied to a corresponding field scenario consisting of collocated high-resolution measurements of the electrical conductivity, as measured using a cone penetrometer testing (CPT) system, and the hydraulic conductivity, as estimated from electromagnetic flowmeter and slug test measurements, in combination with spatially exhaustive low-resolution electrical conductivity estimates obtained from surface-based electrical resistivity tomography (ERT). The corresponding results indicate that the newly developed data integration approach is indeed capable of adequately capturing both the small-scale heterogeneity and the larger-scale trend of the prevailing hydraulic conductivity field. The results also indicate that this novel data integration approach is remarkably flexible and robust and hence can be expected to be applicable to a wide range of geophysical and hydrological data at all scale ranges. In the second part of my thesis, I evaluate in detail the viability of sequential geostatistical resampling as a proposal mechanism for Markov Chain Monte Carlo (MCMC) methods applied to high-dimensional geophysical and hydrological inverse problems in order to allow for a more accurate and realistic quantification of the uncertainty associated with the thus inferred models. Focusing on a series of pertinent crosshole georadar tomographic examples, I investigated two classes of geostatistical resampling strategies with regard to their ability to efficiently and accurately generate independent realizations from the Bayesian posterior distribution.
The corresponding results indicate that, despite its popularity, sequential resampling is rather inefficient at drawing independent posterior samples for realistic synthetic case studies, notably for the practically common and important scenario of pronounced spatial correlation between model parameters. To address this issue, I have developed a new gradual-deformation-based perturbation approach, which is flexible with regard to the number of model parameters as well as the perturbation strength. Compared to sequential resampling, this newly proposed approach was proven to be highly effective in decreasing the number of iterations required for drawing independent samples from the Bayesian posterior distribution.
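
The abstract does not give the exact form of the gradual-deformation perturbation, so the following is only a sketch of one common formulation, under stated assumptions: the current model m is combined with an independent draw z from the same zero-mean Gaussian prior as m' = m cos(theta) + z sin(theta), where theta sets the perturbation strength. The function name is hypothetical, and the parameters are treated as independent with unit variance, which ignores the spatial correlation that matters in the thesis.

```python
import math
import random

def gradual_deformation_proposal(current, theta, rng):
    """Propose m' = m*cos(theta) + z*sin(theta), with z ~ N(0, 1) i.i.d.

    Assumes a zero-mean, unit-variance Gaussian prior on each parameter.
    The proposal then keeps the prior variance, because
    cos^2(theta) + sin^2(theta) = 1.
    """
    c, s = math.cos(theta), math.sin(theta)
    return [c * m + s * rng.gauss(0.0, 1.0) for m in current]

rng = random.Random(42)
model = [rng.gauss(0.0, 1.0) for _ in range(5)]
unchanged = gradual_deformation_proposal(model, 0.0, rng)   # theta = 0: no perturbation
small_step = gradual_deformation_proposal(model, 0.1, rng)  # mild perturbation
```

Small values of theta give proposals close to the current model, and theta = pi/2 replaces the model with a fresh prior draw, which is the sense in which the perturbation strength is adjustable.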

Relevance:

10.00% 10.00%

Publisher:

Abstract:

BACKGROUND: Chest pain raises concern for the possibility of coronary heart disease. Scoring methods have been developed to identify coronary heart disease in emergency settings, but not in primary care. METHODS: Data were collected from a multicenter Swiss clinical cohort study including 672 consecutive patients with chest pain who had visited one of 59 family practitioners' offices. Using delayed diagnosis, we derived a prediction rule to rule out coronary heart disease by means of a logistic regression model. Known cardiovascular risk factors, pain characteristics, and physical signs associated with coronary heart disease were explored to develop a clinical score. Patients diagnosed with angina or acute myocardial infarction within the year following their initial visit comprised the coronary heart disease group. RESULTS: The coronary heart disease score was derived from eight variables: age, gender, duration of chest pain from 1 to 60 minutes, substernal chest pain location, pain that increases with exertion, absence of a tenderness point on palpation, cardiovascular risk factors, and personal history of cardiovascular disease. The area under the receiver operating characteristic curve was 0.95 (95% confidence interval, 0.92 to 0.97). Based on this score, 413 patients were considered low risk, using the 5th percentile of scores among coronary heart disease patients as the cutoff. Internal validity was confirmed by bootstrapping. External validation using data from a German cohort (Marburg, n = 774) yielded an area under the receiver operating characteristic curve of 0.75 (95% confidence interval, 0.72 to 0.81) with a sensitivity of 85.6% and a specificity of 47.2%. CONCLUSIONS: This score, based only on history and physical examination, is a complementary tool for ruling out coronary heart disease in primary care patients complaining of chest pain.
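
The abstract does not say how the bootstrap was applied. One common use, sketched here as an assumption, is a percentile bootstrap interval for the area under the ROC curve. The function names and the scores below are hypothetical and are not data from the study.

```python
import random

def auc(pos_scores, neg_scores):
    """Probability that a random case outscores a random non-case (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos_scores) * len(neg_scores))

def bootstrap_auc_ci(pos_scores, neg_scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling cases and non-cases separately."""
    rng = random.Random(seed)
    stats = sorted(
        auc(rng.choices(pos_scores, k=len(pos_scores)),
            rng.choices(neg_scores, k=len(neg_scores)))
        for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical risk scores for patients with and without the disease.
cases = [0.9, 0.8, 0.4]
non_cases = [0.7, 0.3, 0.2]
point_estimate = auc(cases, non_cases)  # 8 of the 9 case/non-case pairs are ordered correctly
low, high = bootstrap_auc_ci(cases, non_cases)
```

Resampling the two groups separately keeps the number of cases fixed in every replicate, which avoids replicates with no cases at all in small samples.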

Relevance:

10.00% 10.00%

Publisher:

Abstract:

BACKGROUND: The study aimed to compare the cost-effectiveness of concomitant and adjuvant temozolomide (TMZ) for the treatment of newly diagnosed glioblastoma multiforme versus initial radiotherapy alone from a public health care perspective. METHODS: The economic evaluation was performed alongside a randomized, multicenter, phase 3 trial. The primary endpoint of the trial was overall survival. Costs included all direct medical costs. Economic data were collected prospectively for a subgroup of 219 patients (38%). Unit costs for drugs, procedures, laboratory and imaging, radiotherapy, and hospital costs per day were collected from the official national reimbursement lists for 2004. For the cost-effectiveness analysis, survival was expressed as 2.5-year restricted mean estimates. The incremental cost-effectiveness ratio (ICER) was constructed. Confidence intervals for the ICER were calculated using the Fieller method and bootstrapping. RESULTS: The difference in 2.5-year restricted mean survival between the treatment arms was 0.25 life-years and the ICER was €37,361 per life-year gained, with a 95% confidence interval (CI) ranging from €19,544 to €123,616. The area between the survival curves of the treatment arms suggests an increase in the overall survival gain with longer follow-up. An extrapolation of the overall survival per treatment arm and imputation of costs for the extrapolated survival showed a substantial reduction in the ICER. CONCLUSIONS: The ICER of €37,361 per life-year gained is a conservative estimate. We concluded that despite the high TMZ acquisition costs, the costs per life-year gained are comparable to accepted first-line treatment with chemotherapy in patients with cancer.
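
The ICER is the difference in mean cost between the arms divided by the difference in mean effect. The sketch below shows a percentile bootstrap interval for that ratio under stated assumptions. It is not the trial's analysis (the Fieller method is not shown), and the function names and patient-level numbers are hypothetical.

```python
import random
from statistics import mean

def icer(treatment, control):
    """Incremental cost-effectiveness ratio from lists of (cost, life_years)."""
    delta_cost = mean(c for c, _ in treatment) - mean(c for c, _ in control)
    delta_effect = mean(e for _, e in treatment) - mean(e for _, e in control)
    return delta_cost / delta_effect

def bootstrap_icer_ci(treatment, control, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling patients within each arm.

    Assumes the effect difference stays away from zero in every resample;
    otherwise the ratio is unstable and a percentile interval is unreliable.
    """
    rng = random.Random(seed)
    ratios = sorted(
        icer(rng.choices(treatment, k=len(treatment)),
             rng.choices(control, k=len(control)))
        for _ in range(n_boot)
    )
    return ratios[int(alpha / 2 * n_boot)], ratios[int((1 - alpha / 2) * n_boot) - 1]

# Hypothetical patient-level data: (direct medical cost, restricted life-years).
treatment_arm = [(30000, 1.5), (34000, 1.7), (31000, 1.6), (33000, 1.6)]
control_arm = [(22000, 1.3), (24000, 1.4), (23000, 1.35), (23000, 1.35)]
point_icer = icer(treatment_arm, control_arm)  # 9000 extra cost / 0.25 extra life-years
low, high = bootstrap_icer_ci(treatment_arm, control_arm)
```

Because the interval is for a ratio, it is typically asymmetric around the point estimate, as is the interval reported in the abstract.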

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Accurately determining subpopulation sizes in bimodal populations remains problematic, yet it is a powerful way to compare cellular heterogeneity under different environmental conditions. So far, most studies have relied on qualitative descriptions of population distribution patterns, on population-independent descriptors, or on arbitrary placement of thresholds distinguishing biological ON from OFF states. We found that all these methods fall short of accurately describing small population sizes in bimodal populations. Here we propose a simple, statistics-based method for the analysis of small subpopulation sizes for use in the free software environment R and test this method on real as well as simulated data. Four so-called population splitting methods were designed with different algorithms that can estimate subpopulation sizes from bimodal populations. All four methods proved more precise than previously used methods when analyzing subpopulation sizes of transfer-competent cells arising in populations of the bacterium Pseudomonas knackmussii B13. The methods' resolving powers were further explored by bootstrapping and simulations. Two of the methods were not severely limited by the proportions of subpopulations they could estimate correctly, but the two others only allowed accurate subpopulation quantification when the subpopulation amounted to less than 25% of the total population. In contrast, only one method was still sufficiently accurate with subpopulations smaller than 1% of the total population. This study proposes a number of rational approximations for quantifying small subpopulations and offers an easy-to-use protocol for their implementation in the open source statistical software environment R.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

BACKGROUND: The objective is to develop a cost-effective, reliable and non-invasive screening test able to detect early CRCs and adenomas. The test is based on a nucleic acid multigene assay performed on peripheral blood mononuclear cells (PBMCs). METHODS: A colonoscopy-controlled study was conducted on 179 subjects. 92 subjects (21 CRC, 30 adenoma >1 cm and 41 controls) were used as a training set to generate a signature. Another 48 subjects, kept blinded (controls, CRC and polyps), were used as a test set. To determine organ and disease specificity, 38 subjects were used: 24 with inflammatory bowel disease (IBD) and 14 with other cancers (OC). Blood samples were taken and PBMCs were purified. After RNA extraction, multiplex RT-qPCR was applied to 92 different candidate biomarkers. After different univariate and multivariate analyses, 60 biomarkers with significant p-values (<0.01) were selected. Two distinct biomarker signatures, named COLOX CRC and COLOX POL, are used to separate patients without lesions from those with CRC or with adenoma. COLOX performance was validated using a random resampling method, the bootstrap. RESULTS: The COLOX CRC and POL tests successfully separate patients without lesions from those with CRC (Se 67%, Sp 93%, AUC 0.87) and from those with adenoma >1 cm (Se 63%, Sp 83%, AUC 0.77). 6/24 patients in the IBD group and 1/14 patients in the OC group had a positive COLOX CRC. CONCLUSION: The two COLOX tests demonstrated high Se and Sp for detecting CRCs and adenomas >1 cm. A prospective, multicenter, pivotal study is underway to confirm these promising results in a larger cohort.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Background: Detection rates for adenoma and early colorectal cancer (CRC) are unsatisfactory due to low compliance with invasive screening procedures such as colonoscopy. There is a large unmet screening need calling for an accurate, non-invasive and cost-effective test to screen for early neoplastic and pre-neoplastic lesions. Our goal is to identify effective biomarker combinations to develop a screening test aimed at detecting precancerous lesions and early CRC stages, based on a multigene assay performed on peripheral blood mononuclear cells (PBMC). Methods: A pilot study was conducted on 92 subjects. Colonoscopy revealed 21 CRC, 30 adenomas larger than 1 cm and 41 healthy controls. A panel of 103 biomarkers was selected by two approaches: a candidate gene approach based on literature review and whole transcriptome analysis of a subset of this cohort by Illumina TAG profiling. Blood samples were taken from each patient and PBMC were purified. Total RNA was extracted and the 103 biomarkers were tested by multiplex RT-qPCR on the cohort. Different univariate and multivariate statistical methods were applied to the PCR data, and 60 biomarkers with a significant p-value (< 0.01) for most of the methods were selected. Results: The 60 biomarkers are involved in several different biological functions, such as cell adhesion, cell motility, cell signaling, cell proliferation, development and cancer. Two distinct molecular signatures derived from the biomarker combinations were established based on penalized logistic regression to separate patients without lesions from those with CRC or adenoma. These signatures were validated using the bootstrap method, leading to a separation of patients without lesions from those with CRC (Se 67%, Sp 93%, AUC 0.87) and from those with adenoma larger than 1 cm (Se 63%, Sp 83%, AUC 0.77).
In addition, the organ and disease specificity of these signatures was confirmed in patients with other cancer types and inflammatory bowel diseases. Conclusions: The two defined biomarker combinations effectively detect the presence of CRC and adenomas larger than 1 cm with high sensitivity and specificity. A prospective, multicentric, pivotal study is underway to validate these results in a larger cohort.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Intravenous thrombolysis (IVT) as treatment in acute ischaemic strokes may be insufficient to achieve recanalisation in certain patients. Predicting the probability of non-recanalisation after IVT may have the potential to influence patient selection for more aggressive management strategies. We aimed to derive and internally validate a predictive score for post-thrombolytic non-recanalisation, using clinical and radiological variables. In thrombolysis registries from four Swiss academic stroke centres (Lausanne, Bern, Basel and Geneva), patients were selected with large arterial occlusion on acute imaging and with repeated arterial assessment at 24 hours. Based on a logistic regression analysis, an integer-based score for each covariate of the fitted multivariate model was generated. The performance of the integer-based predictive model was assessed by bootstrapping the available data and by cross-validation (delete-d method). In 599 thrombolysed strokes, five variables were identified as independent predictors of absence of recanalisation: Acute glucose > 7 mmol/l (A), significant extracranial vessel STenosis (ST), decreased Range of visual fields (R), large Arterial occlusion (A) and decreased Level of consciousness (L). All variables were weighted 1, except for (L), which obtained 2 points based on β-coefficients on the logistic scale. ASTRAL-R scores of 0, 3 and 6 corresponded to non-recanalisation probabilities of 18, 44 and 74%, respectively. Predictive ability showed an AUC of 0.66 (95% CI, 0.61-0.70) when using the bootstrap and 0.66 (0.63-0.68) when using delete-d cross-validation. In conclusion, the 5-item ASTRAL-R score moderately predicts non-recanalisation at 24 hours in thrombolysed ischaemic strokes. If its performance can be confirmed by external validation and its clinical usefulness can be proven, the score may influence patient selection for more aggressive revascularisation strategies in routine clinical practice.
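
The point weights and the three reported probabilities above are enough to sketch how such an integer-based score is computed. The dictionary keys and function name are hypothetical labels chosen for illustration. Probabilities for scores other than 0, 3 and 6 are not given in the abstract and are left out rather than guessed.

```python
# Points per item as stated in the abstract: every item scores 1, except
# decreased level of consciousness, which scores 2.
POINTS = {
    "acute_glucose_above_7_mmol_l": 1,
    "extracranial_vessel_stenosis": 1,
    "decreased_range_of_visual_fields": 1,
    "large_arterial_occlusion": 1,
    "decreased_level_of_consciousness": 2,
}

# Non-recanalisation probabilities reported in the abstract (scores 0, 3 and 6 only).
REPORTED_PROBABILITY = {0: 0.18, 3: 0.44, 6: 0.74}

def astral_r_score(findings):
    """Sum the points of every item flagged True in `findings` (a dict of booleans)."""
    return sum(points for item, points in POINTS.items() if findings.get(item, False))

# Hypothetical patient with three of the five findings present.
patient = {
    "acute_glucose_above_7_mmol_l": True,
    "large_arterial_occlusion": True,
    "decreased_range_of_visual_fields": True,
}
score = astral_r_score(patient)
probability = REPORTED_PROBABILITY.get(score)  # None if the abstract gives no value
```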