80 results for parameter estimates
Batch effect confounding leads to strong bias in performance estimates obtained by cross-validation.
Abstract:
BACKGROUND: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences ("batch effects") as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies. FOCUS: The current study focuses on the construction of classifiers, and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is a main difference compared to previous studies, which have mostly focused on the predictive performance and how it relates to the presence of batch effects. DATA: We work on simulated data sets. To have realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., 'control') or group 2 (e.g., 'treated'). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios for our study, most importantly different levels of confounding between groups and batch effects. METHODS: We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, are performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data.
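For illustration only, the sketch below shows the kind of nested cross-validation the abstract refers to, with feature selection and parameter tuning repeated inside every training fold so that the outer performance estimate is not optimistically biased. The estimator, the univariate F-test filter (standing in for the Wilcoxon filter), the fold counts and the parameter grid are assumptions, not the authors' exact setup.

```python
# Minimal nested cross-validation sketch: feature selection and tuning are
# re-fit inside each outer training fold; the outer loop only estimates
# performance. All modelling choices below are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=120, n_features=500, n_informative=20,
                           random_state=0)

pipe = Pipeline([
    ("filter", SelectKBest(f_classif)),   # stand-in for the Wilcoxon filter
    ("clf", SVC(kernel="linear")),        # one of the classifiers considered
])
param_grid = {"filter__k": [10, 50, 100], "clf__C": [0.1, 1, 10]}

inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)   # tuning
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)   # estimation

tuned = GridSearchCV(pipe, param_grid, cv=inner)
scores = cross_val_score(tuned, X, y, cv=outer)
print("nested CV accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))
```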
Abstract:
The loss of presynaptic markers is thought to represent a strong pathologic correlate of cognitive decline in Alzheimer's disease (AD). Spinophilin is a postsynaptic marker mainly localized to the heads of dendritic spines. We assessed total numbers of spinophilin-immunoreactive puncta in the CA1 and CA3 fields of the hippocampus and area 9 in 18 elderly individuals with various degrees of cognitive decline. The decrease in spinophilin immunoreactivity was significantly related to both Braak neurofibrillary tangle (NFT) staging and clinical severity, but not to A-beta deposition staging. The total numbers of spinophilin-immunoreactive puncta in the CA1 field and area 9 were significantly related to MMSE scores and predicted 23.5% and 61.9% of the variability in MMSE scores, respectively. The relationship between the total number of spinophilin-immunoreactive puncta in the CA1 field and MMSE scores did not persist when adjusting for Braak NFT staging. In contrast, the total number of spinophilin-immunoreactive puncta in area 9 remained significantly related to the cognitive outcome, explaining an additional 9.6% of the variability in MMSE scores and 25.6% of the variability in Clinical Dementia Rating scores. Our data suggest that neocortical dendritic spine loss is an independent parameter to consider in AD clinicopathologic correlations.
Abstract:
The aim of the present study was to retrospectively estimate the absorbed dose to the kidneys in 17 patients treated in clinical practice with 90Y-ibritumomab tiuxetan for non-Hodgkin's lymphoma, using appropriate available dosimetric approaches. METHODS: The single-view effective point source method, including background subtraction, is used for planar quantification of renal activity. Since the high uptake in the liver affects the activity estimate in the right kidney, the dose to the left kidney serves as a surrogate for the dose to both kidneys. Calculation of the absorbed dose is based on the Medical Internal Radiation Dose methodology, with adjustment for patient kidney mass. RESULTS: The median dose to the kidneys, based on the left kidney only, is 2.1 mGy/MBq (range, 0.92-4.4), whereas a value of 2.5 mGy/MBq (range, 1.5-4.7) is obtained when considering the activity in both kidneys. CONCLUSIONS: Irrespective of the method, the kidney doses obtained in the present study were about 10 times higher than the median dose of 0.22 mGy/MBq (range, 0.00-0.95) originally reported in the study leading to Food and Drug Administration approval. Our results are in good agreement with kidney-dose estimates recently reported for high-dose myeloablative therapy with 90Y-ibritumomab tiuxetan.
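As a rough, hedged illustration of the mass-adjusted MIRD-style calculation mentioned above, the sketch below multiplies a cumulated activity by a phantom S value rescaled by the phantom-to-patient kidney mass ratio. Every numerical value (administered activity, residence time, S value, masses) is an invented placeholder, not the study's data.

```python
# Illustrative first-order MIRD-style kidney dose calculation; every number
# below is an assumed placeholder, not taken from the study.
A0_MBq = 1000.0                   # administered activity (assumed)
residence_time_h = 30.0           # kidney residence time (assumed)
A_cum_MBq_h = A0_MBq * residence_time_h   # cumulated activity

S_phantom_mGy_per_MBq_h = 1e-3    # phantom kidney self-dose S value (assumed)
m_phantom_g, m_patient_g = 299.0, 250.0   # phantom vs. patient kidney mass

# common first-order mass adjustment for self-dose
S_patient = S_phantom_mGy_per_MBq_h * (m_phantom_g / m_patient_g)

dose_mGy = A_cum_MBq_h * S_patient
print("kidney dose: %.1f mGy (%.2f mGy/MBq)" % (dose_mGy, dose_mGy / A0_MBq))
```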
Abstract:
Time-lapse crosshole ground-penetrating radar (GPR) data, collected while infiltration occurs, can provide valuable information regarding the hydraulic properties of the unsaturated zone. In particular, the stochastic inversion of such data provides estimates of parameter uncertainties, which are necessary for hydrological prediction and decision making. Here, we investigate the effect of different infiltration conditions on the stochastic inversion of time-lapse, zero-offset-profile, GPR data. Inversions are performed using a Bayesian Markov-chain-Monte-Carlo methodology. Our results clearly indicate that considering data collected during a forced infiltration test helps to better refine soil hydraulic properties compared to data collected under natural infiltration conditions.
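The abstract gives no implementation details; for orientation only, the toy sketch below shows a generic random-walk Metropolis sampler of the kind used in Bayesian Markov-chain-Monte-Carlo inversion. The forward model, data and prior are placeholders, not the GPR petrophysical model of the study.

```python
# Generic random-walk Metropolis sketch (toy scalar problem, not the study's
# GPR inversion): propose, evaluate the posterior ratio, accept or reject.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(2.0, 0.1, size=20)      # toy "observations"
sigma = 0.1

def log_likelihood(theta):
    return -0.5 * np.sum((data - theta) ** 2) / sigma ** 2

def log_prior(theta):
    return 0.0 if 0.0 < theta < 5.0 else -np.inf   # uniform prior

theta, samples = 1.0, []
for _ in range(5000):
    proposal = theta + rng.normal(0.0, 0.05)        # random-walk proposal
    log_alpha = (log_likelihood(proposal) + log_prior(proposal)
                 - log_likelihood(theta) - log_prior(theta))
    if np.log(rng.uniform()) < log_alpha:           # Metropolis accept/reject
        theta = proposal
    samples.append(theta)

print("posterior mean: %.3f" % np.mean(samples[1000:]))
```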
Abstract:
Modern sonic logging tools designed for shallow environmental and engineering applications allow for P-wave phase velocity measurements over a wide frequency band. Methodological considerations indicate that, for saturated unconsolidated sediments in the silt to sand range and source frequencies ranging from approximately 1 to 30 kHz, the observable poro-elastic P-wave velocity dispersion is sufficiently pronounced to allow for reliable first-order estimations of the underlying permeability structure. These predictions have been tested on and verified for a surficial alluvial aquifer. Our results indicate that, even without any further calibration, the permeability estimates thus obtained, as well as their variability within the pertinent lithological units, are remarkably close to those expected based on the corresponding granulometric characteristics.
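A commonly used first-order link between observable poro-elastic dispersion and permeability is Biot's characteristic frequency, around which velocity dispersion is most pronounced. The sketch below evaluates it for a few assumed permeabilities; the parameter values are illustrative and not the study's calibration.

```python
# Biot characteristic frequency f_c = (porosity * fluid viscosity) /
# (2 * pi * fluid density * permeability); illustrative values only.
import math

porosity = 0.35        # [-] (assumed)
viscosity = 1.0e-3     # Pa.s, water (assumed)
rho_fluid = 1000.0     # kg/m^3 (assumed)

def critical_frequency(permeability_m2):
    return porosity * viscosity / (2.0 * math.pi * rho_fluid * permeability_m2)

for k in (1e-11, 1e-12, 1e-13):   # roughly coarse sand to silt
    print("k = %.0e m^2  ->  f_c ~ %.1f kHz" % (k, critical_frequency(k) / 1e3))
```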
Abstract:
Participation is a key indicator of the potential effectiveness of any population-based intervention. Defining, measuring and reporting participation in cancer screening programmes has become more heterogeneous as the number and diversity of interventions have increased, and the purposes of this benchmarking parameter have broadened. This study, centred on colorectal cancer, addresses current issues that affect the increasingly complex task of comparing screening participation across settings. Reports from programmes with a defined target population and active invitation scheme, published between 2005 and 2012, were reviewed. Differences in defining and measuring participation were identified and quantified, and participation indicators were grouped by aims of measure and temporal dimensions. We found that consistent terminology, clear and complete reporting of participation definition and systematic documentation of coverage by invitation were lacking. Further, adherence to definitions proposed in the 2010 European Guidelines for Quality Assurance in Colorectal Cancer Screening was suboptimal. Ineligible individuals represented 1% to 15% of invitations, and variable criteria for ineligibility yielded differences in participation estimates that could obscure the interpretation of colorectal cancer screening participation internationally. Excluding ineligible individuals from the reference population enhances comparability of participation measures. Standardised measures of cumulative participation to compare screening protocols with different intervals and inclusion of time since invitation in definitions are urgently needed to improve international comparability of colorectal cancer screening participation. Recommendations to improve comparability of participation indicators in cancer screening interventions are made.
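To make the point about ineligible individuals concrete, the toy calculation below contrasts a crude participation rate with one that excludes ineligible individuals from the reference population; the counts are invented.

```python
# Invented counts illustrating how the choice of denominator changes the
# reported participation rate.
invited = 10000
ineligible = 800        # e.g., prior colectomy, recent colonoscopy, moved away
participants = 4500

crude = participants / invited
eligible_based = participants / (invited - ineligible)
print("crude participation:          %.1f%%" % (100 * crude))
print("eligible-based participation: %.1f%%" % (100 * eligible_based))
```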
Abstract:
OBJECTIVES: Family studies typically use multiple sources of information on each individual including direct interviews and family history information. The aims of the present study were to: (1) assess agreement for diagnoses of specific substance use disorders between direct interviews and the family history method; (2) compare prevalence estimates according to the two methods; (3) test strategies to approximate prevalence estimates according to family history reports to those based on direct interviews; (4) determine covariates of inter-informant agreement; and (5) identify covariates that affect the likelihood of reporting disorders by informants. METHODS: Analyses were based on family study data comprising 1621 distinct informant-index subject pairs, with informants being first-degree relatives and spouses. RESULTS: Our main findings were: (1) inter-informant agreement was fair to good for all substance disorders, except for alcohol abuse; (2) the family history method underestimated the prevalence of drug but not alcohol use disorders; (3) lowering diagnostic thresholds for drug disorders and combining multiple family histories increased the accuracy of prevalence estimates for these disorders according to the family history method; (4) female sex of index subjects was associated with higher agreement for nearly all disorders; and (5) informants who themselves had a history of the same substance use disorder were more likely to report this disorder in their relatives, which entails the risk of overestimation of the size of familial aggregation. CONCLUSION: Our findings have important implications for the best-estimate procedure applied in family studies.
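For readers unfamiliar with the agreement statistics involved, the sketch below computes Cohen's kappa and the two prevalence estimates on invented binary diagnoses; it is not the study's data or its exact analysis.

```python
# Toy inter-informant agreement example on invented binary diagnoses.
from sklearn.metrics import cohen_kappa_score

direct_interview = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]   # index subjects' own reports
family_history   = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]   # informants' reports about them

print("kappa = %.2f" % cohen_kappa_score(direct_interview, family_history))
print("prevalence (direct interview): %.0f%%"
      % (100 * sum(direct_interview) / len(direct_interview)))
print("prevalence (family history):   %.0f%%"
      % (100 * sum(family_history) / len(family_history)))
```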
Quantifying uncertainty: physicians' estimates of infection in critically ill neonates and children.
Abstract:
To determine the diagnostic accuracy of physicians' prior probability estimates of serious infection in critically ill neonates and children, we conducted a prospective cohort study in 2 intensive care units. Using available clinical, laboratory, and radiographic information, 27 physicians provided 2567 probability estimates for 347 patients (follow-up rate, 92%). The median probability estimate of infection increased from 0% (i.e., no antibiotic treatment or diagnostic work-up for sepsis), to 2% on the day preceding initiation of antibiotic therapy, to 20% at initiation of antibiotic treatment (P<.001). At initiation of treatment, predictions discriminated well between episodes subsequently classified as proven infection and episodes ultimately judged unlikely to be infection (area under the curve, 0.88). Physicians also showed a good ability to predict blood culture-positive sepsis (area under the curve, 0.77). Treatment and testing thresholds were derived from the provided predictions and treatment rates. Physicians' prognoses regarding the presence of serious infection were remarkably precise. Studies investigating the value of new tests for diagnosis of sepsis should establish that they add incremental value to physicians' judgment.
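As a minimal illustration of the discrimination measure reported above (area under the ROC curve), the snippet below scores invented physician probability estimates against an invented outcome; it is not the study's data.

```python
# Toy AUC calculation on invented probability estimates and outcomes.
from sklearn.metrics import roc_auc_score

proven_infection = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]                       # invented outcomes
prior_estimates  = [0.9, 0.6, 0.2, 0.1, 0.8, 0.3, 0.05, 0.7, 0.4, 0.2]  # invented estimates

print("AUC = %.2f" % roc_auc_score(proven_infection, prior_estimates))
```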
Abstract:
Diagnostic information on children is typically elicited from both children and their parents. The aims of the present paper were to: (1) compare prevalence estimates according to maternal reports, paternal reports and direct interviews of children [major depressive disorder (MDD), anxiety and attention-deficit and disruptive behavioural disorders]; (2) assess mother-child, father-child and inter-parental agreement for these disorders; (3) determine the association between several child, parent and familial characteristics and the degree of diagnostic agreement or the likelihood of parental reporting; (4) determine the predictive validity of diagnostic information provided by parents and children. Analyses were based on 235 mother-offspring, 189 father-offspring and 128 mother-father pairs. Diagnostic assessment included the Kiddie Schedule for Affective Disorders and Schizophrenia (K-SADS) (offspring) and the Diagnostic Interview for Genetic Studies (DIGS) (parents and offspring at follow-up) interviews. Parental reports were collected using the Family History - Research Diagnostic Criteria (FH-RDC). Analyses revealed: (1) prevalence estimates for internalizing disorders were generally lower according to parental information than according to the K-SADS; (2) mother-child and father-child agreement was poor and within similar ranges; (3) parents with a history of MDD or attention deficit hyperactivity disorder (ADHD) reported these disorders in their children more frequently; (4) in a sub-sample followed up into adulthood, diagnoses of MDD, separation anxiety and conduct disorder at baseline concurred with the corresponding lifetime diagnosis at age 19 according to the child rather than according to the parents. In conclusion, our findings support large discrepancies in diagnostic information provided by parents and children, with generally lower reporting of internalizing disorders by parents, and differential reporting of depression and ADHD by parental disease status. Follow-up data also support the validity of information provided by adolescent offspring.
Abstract:
Significant progress has been made with regard to the quantitative integration of geophysical and hydrological data at the local scale. However, extending corresponding approaches beyond the local scale still represents a major challenge, yet is critically important for the development of reliable groundwater flow and contaminant transport models. To address this issue, I have developed a hydrogeophysical data integration technique based on a two-step Bayesian sequential simulation procedure that is specifically targeted towards larger-scale problems. The objective is to simulate the distribution of a target hydraulic parameter based on spatially exhaustive, but poorly resolved, measurements of a pertinent geophysical parameter and locally highly resolved, but spatially sparse, measurements of the considered geophysical and hydraulic parameters. To this end, my algorithm links the low- and high-resolution geophysical data via a downscaling procedure before relating the downscaled regional-scale geophysical data to the high-resolution hydraulic parameter field. I first illustrate the application of this novel data integration approach to a realistic synthetic database consisting of collocated high-resolution borehole measurements of the hydraulic and electrical conductivities and spatially exhaustive, low-resolution electrical conductivity estimates obtained from electrical resistivity tomography (ERT). The overall viability of this method is tested and verified by performing and comparing flow and transport simulations through the original and simulated hydraulic conductivity fields. The corresponding results indicate that the proposed data integration procedure does indeed allow for obtaining faithful estimates of the larger-scale hydraulic conductivity structure and reliable predictions of the transport characteristics over medium- to regional-scale distances. The approach is then applied to a corresponding field scenario consisting of collocated high-resolution measurements of the electrical conductivity, as measured using a cone penetrometer testing (CPT) system, and the hydraulic conductivity, as estimated from electromagnetic flowmeter and slug test measurements, in combination with spatially exhaustive low-resolution electrical conductivity estimates obtained from surface-based electrical resistivity tomography (ERT). The corresponding results indicate that the newly developed data integration approach is indeed capable of adequately capturing both the small-scale heterogeneity as well as the larger-scale trend of the prevailing hydraulic conductivity field.
The results also indicate that this novel data integration approach is remarkably flexible and robust and hence can be expected to be applicable to a wide range of geophysical and hydrological data at all scale ranges. In the second part of my thesis, I evaluate in detail the viability of sequential geostatistical resampling as a proposal mechanism for Markov Chain Monte Carlo (MCMC) methods applied to high-dimensional geophysical and hydrological inverse problems in order to allow for a more accurate and realistic quantification of the uncertainty associated with the thus inferred models. Focusing on a series of pertinent crosshole georadar tomographic examples, I investigated two classes of geostatistical resampling strategies with regard to their ability to efficiently and accurately generate independent realizations from the Bayesian posterior distribution. The corresponding results indicate that, despite its popularity, sequential resampling is rather inefficient at drawing independent posterior samples for realistic synthetic case studies, notably for the practically common and important scenario of pronounced spatial correlation between model parameters. To address this issue, I have developed a new gradual-deformation-based perturbation approach, which is flexible with regard to the number of model parameters as well as the perturbation strength. Compared to sequential resampling, this newly proposed approach was proven to be highly effective in decreasing the number of iterations required for drawing independent samples from the Bayesian posterior distribution.
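The gradual-deformation idea mentioned above can be sketched as follows: two realizations drawn from the same Gaussian prior are combined as m_new = cos(theta) * m_old + sin(theta) * m_indep, which preserves the prior covariance while theta controls the perturbation strength. The toy 1-D field below is an assumption for illustration, not the thesis implementation.

```python
# Gradual-deformation proposal sketch on a toy 1-D Gaussian random field.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = np.arange(n)
C = np.exp(-np.abs(x[:, None] - x[None, :]) / 20.0)   # exponential covariance (assumed)
L = np.linalg.cholesky(C + 1e-10 * np.eye(n))

def prior_realization():
    return L @ rng.standard_normal(n)

def gradual_deformation_proposal(m_current, theta=0.1):
    m_indep = prior_realization()
    # cos^2 + sin^2 = 1 keeps the proposal within the same Gaussian prior
    return np.cos(theta) * m_current + np.sin(theta) * m_indep

m = prior_realization()
m_new = gradual_deformation_proposal(m, theta=0.2)
print("correlation(old, new) = %.2f" % np.corrcoef(m, m_new)[0, 1])
```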
Abstract:
Natural selection is typically exerted at some specific life stages. If natural selection takes place before a trait can be measured, using conventional models can lead to incorrect inferences about population parameters. When the missing data process relates to the trait of interest, valid inference requires explicit modeling of the missing data process. We propose a joint modeling approach, a shared parameter model, to account for nonrandom missing data. It consists of an animal model for the phenotypic data and a logistic model for the missing data process, linked by the additive genetic effects. A Bayesian approach is taken and inference is made using integrated nested Laplace approximations. From a simulation study we find that wrongly assuming that missing data are missing at random can result in severely biased estimates of additive genetic variance. Using real data from a wild population of Swiss barn owls Tyto alba, our model indicates that the missing individuals would display large black spots; and we conclude that genes affecting this trait are already under selection before it is expressed. Our model is a tool to correctly estimate the magnitude of both natural selection and additive genetic variance.
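The shared-parameter structure can be made concrete with a small simulation: the same additive genetic effect enters both the phenotype model and a logistic model for whether the phenotype is observed, so a naive analysis of the observed records is biased. The parameter values below are invented and no pedigree is used; this is a structural sketch, not the authors' model fit.

```python
# Toy shared-parameter simulation: additive genetic effect a drives both the
# trait y and the probability that y is observed (nonrandom missingness).
import numpy as np

rng = np.random.default_rng(2)
n = 5000
sigma_a, sigma_e = 1.0, 1.0
mu, alpha, beta = 10.0, 0.0, 1.5       # beta != 0 links trait and missingness

a = rng.normal(0.0, sigma_a, n)         # additive genetic effects
y = mu + a + rng.normal(0.0, sigma_e, n)

p_obs = 1.0 / (1.0 + np.exp(-(alpha + beta * a)))
observed = rng.uniform(size=n) < p_obs

print("mean y, all individuals:       %.2f" % y.mean())
print("mean y, observed records only: %.2f" % y[observed].mean())  # biased upward
```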
Abstract:
Significant progress has been made with regard to the quantitative integration of geophysical and hydrological data at the local scale for the purpose of improving predictions of groundwater flow and solute transport. However, extending corresponding approaches to the regional scale still represents one of the major challenges in the domain of hydrogeophysics. To address this problem, we have developed a regional-scale data integration methodology based on a two-step Bayesian sequential simulation approach. Our objective is to generate high-resolution stochastic realizations of the regional-scale hydraulic conductivity field in the common case where there exist spatially exhaustive but poorly resolved measurements of a related geophysical parameter, as well as highly resolved but spatially sparse collocated measurements of this geophysical parameter and the hydraulic conductivity. To integrate this multi-scale, multi-parameter database, we first link the low- and high-resolution geophysical data via a stochastic downscaling procedure. This is followed by relating the downscaled geophysical data to the high-resolution hydraulic conductivity distribution. After outlining the general methodology of the approach, we demonstrate its application to a realistic synthetic example where we consider as data high-resolution measurements of the hydraulic and electrical conductivities at a small number of borehole locations, as well as spatially exhaustive, low-resolution estimates of the electrical conductivity obtained from surface-based electrical resistivity tomography. The different stochastic realizations of the hydraulic conductivity field obtained using our procedure are validated by comparing their solute transport behaviour with that of the underlying 'true' hydraulic conductivity field. We find that, even in the presence of strong subsurface heterogeneity, our proposed procedure allows for the generation of faithful representations of the regional-scale hydraulic conductivity structure and reliable predictions of solute transport over long, regional-scale distances.
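As a heavily simplified, purely illustrative sketch of the second step only, the code below calibrates a log-linear relation between electrical and hydraulic conductivity on collocated 'borehole' data and then draws stochastic hydraulic-conductivity values on a grid of downscaled geophysical estimates. The sequential, covariance-aware conditioning of the actual method is omitted and all numbers are invented.

```python
# Simplified stand-in for step two: draw hydraulic conductivity from a
# conditional relation calibrated on collocated data. Not the actual method.
import numpy as np

rng = np.random.default_rng(3)

# collocated data: log10 electrical conductivity -> log10 hydraulic conductivity
sigma_bh = rng.uniform(-2.5, -1.0, 50)
logK_bh = -3.0 + 1.2 * sigma_bh + rng.normal(0.0, 0.3, 50)

slope, intercept = np.polyfit(sigma_bh, logK_bh, 1)
resid_std = np.std(logK_bh - (intercept + slope * sigma_bh))

# placeholder for the downscaled regional-scale electrical conductivity field
sigma_grid = rng.uniform(-2.5, -1.0, (50, 50))

def simulate_logK(sigma_field):
    noise = rng.normal(0.0, resid_std, sigma_field.shape)
    return intercept + slope * sigma_field + noise

realization = simulate_logK(sigma_grid)
print("simulated log10(K): mean %.2f, std %.2f"
      % (realization.mean(), realization.std()))
```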
Abstract:
To evaluate how young physicians in training perceive their patients' cardiovascular risk based on the medical charts and their clinical judgment. Cross sectional observational study. University outpatient clinic, Lausanne, Switzerland. Two hundred hypertensive patients and 50 non-hypertensive patients with at least one cardiovascular risk factor. Comparison of the absolute 10-year cardiovascular risk calculated by a computer program based on the Framingham score and adapted for physicians by the WHO/ISH with the perceived risk as assessed clinically by the physicians. Physicians underestimated the 10-year cardiovascular risk of their patients compared to that calculated with the Framingham score. Concordance between methods was 39% for hypertensive patients and 30% for non-hypertensive patients. Underestimation of cardiovascular risk for hypertensive patients was related to the fact that they had a stabilized systolic blood pressure under 140 mm Hg (OR = 2.1 [1.1; 4.1]). These data show that young physicians in training often have an incorrect perception of the cardiovascular risk of their patients, with a tendency to underestimate the risk. However, the calculated risk could also be slightly overestimated when applying the Framingham Heart Study model to a Swiss population. To implement a systematic evaluation of risk factors in primary care, greater emphasis should be placed on the teaching of cardiovascular risk evaluation and on the implementation of quality improvement programs.
Abstract:
The tendency for more closely related species to share similar traits and ecological strategies can be explained by their longer shared evolutionary histories and represents phylogenetic conservatism. How strongly species traits co-vary with phylogeny can significantly impact how we analyze cross-species data and can influence our interpretation of assembly rules in the rapidly expanding field of community phylogenetics. Phylogenetic conservatism is typically quantified by analyzing the distribution of species values on the phylogenetic tree that connects them. Many phylogenetic approaches, however, assume a completely sampled phylogeny: while we have good estimates of deeper phylogenetic relationships for many species-rich groups, such as birds and flowering plants, we often lack information on more recent interspecific relationships (i.e., within a genus). A common solution has been to represent these relationships as polytomies on trees using taxonomy as a guide. Here we show that such trees can dramatically inflate estimates of phylogenetic conservatism quantified using S. P. Blomberg et al.'s K statistic. Using simulations, we show that even randomly generated traits can appear to be phylogenetically conserved on poorly resolved trees. We provide a simple rarefaction-based solution that can reliably retrieve unbiased estimates of K, and we illustrate our method using data on first flowering times from Thoreau's woods (Concord, Massachusetts, USA).
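For reference, the sketch below computes Blomberg et al.'s K statistic from a phylogenetic covariance matrix for a tiny, fully resolved example tree; the tree and traits are invented, and the paper's rarefaction over random polytomy resolutions is not reproduced here.

```python
# Blomberg's K from a phylogenetic covariance matrix V and trait vector x
# (K = 1 is the Brownian-motion expectation). Example data are invented.
import numpy as np

def blomberg_k(V, x):
    n = len(x)
    Vinv = np.linalg.inv(V)
    one = np.ones(n)
    a_hat = (one @ Vinv @ x) / (one @ Vinv @ one)        # phylogenetic mean
    mse0 = (x - a_hat) @ (x - a_hat) / (n - 1)           # ordinary mean square
    mse = (x - a_hat) @ Vinv @ (x - a_hat) / (n - 1)     # phylogenetic mean square
    expected_ratio = (np.trace(V) - n / (one @ Vinv @ one)) / (n - 1)
    return (mse0 / mse) / expected_ratio

# covariance matrix of a small, fully resolved 4-tip tree (shared branch = 0.5)
V = np.array([[1.0, 0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.5],
              [0.0, 0.0, 0.5, 1.0]])

rng = np.random.default_rng(4)
x_bm = np.linalg.cholesky(V) @ rng.standard_normal(4)   # Brownian-motion trait
x_rand = rng.standard_normal(4)                         # phylogeny-free trait
print("K (Brownian trait): %.2f" % blomberg_k(V, x_bm))
print("K (random trait):   %.2f" % blomberg_k(V, x_rand))
```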