956 results for Data sets
Abstract:
Self-reported home values are widely used as a measure of housing wealth by researchers employing a variety of data sets and studying a number of different individual and household level decisions. The accuracy of this measure is an open empirical question, and requires some type of market assessment of the values reported. In this research, we study the predictive power of self-reported housing wealth when estimating sales prices utilizing the Health and Retirement Study. We find that homeowners, on average, overestimate the value of their properties by between 5% and 10%. More importantly, we are the first to document a strong correlation between accuracy and the economic conditions at the time of the purchase of the property (measured by the prevalent interest rate, the growth of household income, and the growth of median housing prices). While most individuals overestimate the value of their properties, those who bought during more difficult economic times tend to be more accurate, and in some cases even underestimate the value of their house. These results establish a surprisingly strong, likely permanent, and in many cases long-lived, effect of the initial conditions surrounding the purchases of properties, on how individuals value them. This cyclicality of the overestimation of house prices can provide some explanations for the difficulties currently faced by many homeowners, who were expecting large appreciations in home value to rescue them in case of increases in interest rates which could jeopardize their ability to live up to their financial commitments.
Abstract:
To be diagnostically useful, structural MRI must reliably distinguish Alzheimer's disease (AD) from normal aging in individual scans. Recent advances in statistical learning theory have led to the application of support vector machines to MRI for detection of a variety of disease states. The aims of this study were to assess how successfully support vector machines assigned individual diagnoses and to determine whether data-sets combined from multiple scanners and different centres could be used to obtain effective classification of scans. We used linear support vector machines to classify the grey matter segment of T1-weighted MR scans from pathologically proven AD patients and cognitively normal elderly individuals obtained from two centres with different scanning equipment. Because the clinical diagnosis of mild AD is difficult we also tested the ability of support vector machines to differentiate control scans from patients without post-mortem confirmation. Finally we sought to use these methods to differentiate scans between patients suffering from AD from those with frontotemporal lobar degeneration. Up to 96% of pathologically verified AD patients were correctly classified using whole brain images. Data from different centres were successfully combined achieving comparable results from the separate analyses. Importantly, data from one centre could be used to train a support vector machine to accurately differentiate AD and normal ageing scans obtained from another centre with different subjects and different scanner equipment. Patients with mild, clinically probable AD and age/sex matched controls were correctly separated in 89% of cases which is compatible with published diagnosis rates in the best clinical centres. This method correctly assigned 89% of patients with post-mortem confirmed diagnosis of either AD or frontotemporal lobar degeneration to their respective group. 
Our study leads to three conclusions: Firstly, support vector machines successfully separate patients with AD from healthy aging subjects. Secondly, they perform well in the differential diagnosis of two different forms of dementia. Thirdly, the method is robust and can be generalized across different centres. This suggests an important role for computer based diagnostic image analysis for clinical practice.
Abstract:
The network choice revenue management problem models customers as choosing from an offer-set, and the firm decides the best subset to offer at any given moment to maximize expected revenue. The resulting dynamic program for the firm is intractable and approximated by a deterministic linear program called the CDLP which has an exponential number of columns. However, under the choice-set paradigm when the segment consideration sets overlap, the CDLP is difficult to solve. Column generation has been proposed but finding an entering column has been shown to be NP-hard. In this paper, starting with a concave program formulation based on segment-level consideration sets called SDCP, we add a class of constraints called product constraints, that project onto subsets of intersections. In addition we propose a natural direct tightening of the SDCP called ?SDCP, and compare the performance of both methods on the benchmark data sets in the literature. Both the product constraints and the ?SDCP method are very simple and easy to implement and are applicable to the case of overlapping segment consideration sets. In our computational testing on the benchmark data sets in the literature, SDCP with product constraints achieves the CDLP value at a fraction of the CPU time taken by column generation and we believe is a very promising approach for quickly approximating CDLP when segment consideration sets overlap and the consideration sets themselves are relatively small.
Abstract:
In this paper, we propose two active learning algorithms for semiautomatic definition of training samples in remote sensing image classification. Based on predefined heuristics, the classifier ranks the unlabeled pixels and automatically chooses those that are considered the most valuable for its improvement. Once the pixels have been selected, the analyst labels them manually and the process is iterated. Starting with a small and nonoptimal training set, the model itself builds the optimal set of samples which minimizes the classification error. We have applied the proposed algorithms to a variety of remote sensing data, including very high resolution and hyperspectral images, using support vector machines. Experimental results confirm the consistency of the methods. The required number of training samples can be reduced to 10% using the methods proposed, reaching the same level of accuracy as larger data sets. A comparison with a state-of-the-art active learning method, margin sampling, is provided, highlighting advantages of the methods proposed. The effect of spatial resolution and separability of the classes on the quality of the selection of pixels is also discussed.
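The margin sampling baseline named in this abstract can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes a binary classifier has already produced a signed decision value for each unlabeled pixel (for example an SVM output), and the function name `margin_sampling` and the example values are hypothetical.

```python
def margin_sampling(decision_values, batch_size):
    """Return indices of the unlabeled samples closest to the decision boundary.

    decision_values: signed classifier outputs, one per unlabeled pixel
                     (hypothetical input; any binary margin classifier works).
    batch_size:      number of pixels to hand to the analyst for labeling.
    """
    # Rank candidates by absolute decision value: a small value means the
    # classifier is least certain about that pixel.
    ranked = sorted(range(len(decision_values)),
                    key=lambda i: abs(decision_values[i]))
    return ranked[:batch_size]


# Made-up decision values for six unlabeled pixels.
scores = [2.1, -0.05, 0.8, -1.7, 0.3, -0.4]
selected = margin_sampling(scores, batch_size=3)
print(selected)
```

In an active learning loop, the selected pixels would be labeled by the analyst, added to the training set, and the classifier retrained before the next ranking.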
Batch effect confounding leads to strong bias in performance estimates obtained by cross-validation.
Abstract:
BACKGROUND: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences ("batch effects") as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies. FOCUS: The current study focuses on the construction of classifiers, and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is a main difference compared to previous studies, which have mostly focused on the predictive performance and how it relates to the presence of batch effects. DATA: We work on simulated data sets. To have realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., 'control') or group 2 (e.g., 'treated'). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios for our study, most importantly different levels of confounding between groups and batch effects. METHODS: We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, is performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data.
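The bias described in this abstract can be illustrated with a small simulation. The sketch below is not the study's setup: it assumes pure-noise features with no true group difference, a fully confounded design (each group measured in its own batch), a nearest-centroid classifier and plain 5-fold cross-validation as stand-ins for the classifiers and nested scheme named above. All constants are hypothetical.

```python
import random

random.seed(0)
N_FEATURES = 10
BATCH_SHIFT = 5.0  # assumed size of the technical batch offset


def sample(batch_offset):
    # No true group signal: every feature is noise plus a batch offset.
    return [random.gauss(0.0, 1.0) + batch_offset for _ in range(N_FEATURES)]


def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def predict(x, c1, c2):
    # Assign the sample to the group with the nearer centroid.
    d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
    d2 = sum((a - b) ** 2 for a, b in zip(x, c2))
    return 1 if d1 <= d2 else 2


def accuracy(train, test):
    c1 = centroid([x for x, g in train if g == 1])
    c2 = centroid([x for x, g in train if g == 2])
    return sum(predict(x, c1, c2) == g for x, g in test) / len(test)


# Fully confounded design: group 1 measured in batch A, group 2 in batch B.
data = [(sample(0.0), 1) for _ in range(50)] + \
       [(sample(BATCH_SHIFT), 2) for _ in range(50)]
random.shuffle(data)

# 5-fold cross-validation on the confounded data.
k = 5
folds = [data[i::k] for i in range(k)]
cv_acc = sum(
    accuracy([s for j, f in enumerate(folds) if j != i for s in f], folds[i])
    for i in range(k)
) / k

# Independent data: both groups measured in one batch, so the batch
# offset no longer separates them.
independent = [(sample(0.0), g) for g in (1, 2) for _ in range(100)]
indep_acc = accuracy(data, independent)

print(f"cross-validated accuracy: {cv_acc:.2f}")
print(f"independent-data accuracy: {indep_acc:.2f}")
```

Because the classifier only learns the batch offset, the cross-validated estimate looks excellent while performance on independent data stays near chance.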
Abstract:
Understanding how communities of living organisms assemble has been a central question in ecology since the early days of the discipline. Disentangling the different processes involved in community assembly is not only interesting in itself but also crucial for an understanding of how communities will behave under future environmental scenarios. The traditional concept of assembly rules reflects the notion that species do not co-occur randomly but are restricted in their co-occurrence by interspecific competition. This concept can be redefined in a more general framework where the co-occurrence of species is a product of chance, historical patterns of speciation and migration, dispersal, abiotic environmental factors, and biotic interactions, with none of these processes being mutually exclusive. Here we present a survey and meta-analyses of 59 papers that compare observed patterns in plant communities with null models simulating random patterns of species assembly. According to the type of data under study and the different methods that are applied to detect community assembly, we distinguish four main types of approach in the published literature: species co-occurrence, niche limitation, guild proportionality and limiting similarity. Results from our meta-analyses suggest that non-random co-occurrence of plant species is not a widespread phenomenon. However, whether this finding reflects the individualistic nature of plant communities or is caused by methodological shortcomings associated with the studies considered cannot be discerned from the available metadata. We advocate that more thorough surveys be conducted using a set of standardized methods to test for the existence of assembly rules in data sets spanning larger biological and geographical scales than have been considered until now. We underpin this general advice with guidelines that should be considered in future assembly rules research. 
This will enable us to draw more accurate and general conclusions about the non-random aspect of assembly in plant communities.
Abstract:
Background and aim of the study: Genomic gains and losses play a crucial role in the development and progression of DLBCL and are closely related to gene expression profiles (GEP), including the germinal center B-cell like (GCB) and activated B-cell like (ABC) cell of origin (COO) molecular signatures. To identify new oncogenes or tumor suppressor genes (TSG) involved in DLBCL pathogenesis and to determine their prognostic values, an integrated analysis of high-resolution gene expression and copy number profiling was performed. Patients and methods: Two hundred and eight adult patients with de novo CD20+ DLBCL enrolled in the prospective multicentric randomized LNH-03 GELA trials (LNH03-1B, -2B, -3B, 39B, -5B, -6B, -7B) with available frozen tumour samples, centralized reviewing and adequate DNA/RNA quality were selected. 116 patients were treated by Rituximab(R)-CHOP/R-miniCHOP and 92 patients were treated by the high dose (R)-ACVBP regimen dedicated to patients younger than 60 years (y) in frontline. Tumour samples were simultaneously analysed by high resolution comparative genomic hybridization (CGH, Agilent, 144K) and gene expression arrays (Affymetrix, U133+2). Minimal common regions (MCR), as defined by segments that affect the same chromosomal region in different cases, were delineated. Gene expression and MCR data sets were merged using Gene expression and dosage integrator algorithm (GEDI, Lenz et al. PNAS 2008) to identify new potential driver genes. Results: A total of 1363 recurrent (defined by a penetrance > 5%) MCRs within the DLBCL data set, ranging in size from 386 bp, affecting a single gene, to more than 24 Mb were identified by CGH. Of these MCRs, 756 (55%) showed a significant association with gene expression: 396 (59%) gains, 354 (52%) single-copy deletions, and 6 (67%) homozygous deletions. 
By this integrated approach, in addition to previously reported genes (CDKN2A/2B, PTEN, DLEU2, TNFAIP3, B2M, CD58, TNFRSF14, FOXP1, REL...), several genes targeted by gene copy abnormalities with a dosage effect and potential physiopathological impact were identified, including genes with TSG activity involved in cell cycle (HACE1, CDKN2C) immune response (CD68, CD177, CD70, TNFSF9, IRAK2), DNA integrity (XRCC2, BRCA1, NCOR1, NF1, FHIT) or oncogenic functions (CD79b, PTPRT, MALT1, AUTS2, MCL1, PTTG1...) with distinct distribution according to COO signature. The CDKN2A/2B tumor suppressor locus (9p21) was deleted homozygously in 27% of cases and hemizygously in 9% of cases. Biallelic loss was observed in 49% of ABC DLBCL and in 10% of GCB DLBCL. This deletion was strongly correlated to age and associated to a limited number of additional genetic abnormalities including trisomy 3, 18 and short gains/losses of Chr. 1, 2, 19 regions (FDR < 0.01), allowing to identify genes that may have synergistic effects with CDKN2A/2B inactivation. With a median follow-up of 42.9 months, only CDKN2A/2B biallelic deletion strongly correlates (FDR p.value < 0.01) to a poor outcome in the entire cohort (4y PFS = 44% [32-61] respectively vs. 74% [66-82] for patients in germline configuration; 4y OS = 53% [39-72] vs 83% [76-90]). In a Cox proportional hazard prediction of the PFS, CDKN2A/2B deletion remains predictive (HR = 1.9 [1.1-3.2], p = 0.02) when combined with IPI (HR = 2.4 [1.4-4.1], p = 0.001) and GCB status (HR = 1.3 [0.8-2.3], p = 0.31). This difference remains predictive in the subgroup of patients treated by R-CHOP (4y PFS = 43% [29-63] vs. 66% [55-78], p=0.02), in patients treated by R-ACVBP (4y PFS = 49% [28-84] vs. 83% [74-92], p=0.003), and in GCB (4y PFS = 50% [27-93] vs. 81% [73-90], p=0.02), or ABC/unclassified (5y PFS = 42% [28-61] vs. 67% [55-82] p = 0.009) molecular subtypes (Figure 1). 
Conclusion: We report for the first time an integrated genetic analysis of a large cohort of DLBCL patients included in a prospective multicentric clinical trial program, allowing the identification of new potential driver genes with pathogenic impact. However, CDKN2A/2B deletion constitutes the strongest and unique prognostic factor of chemoresistance to R-CHOP, regardless of the COO signature, which is not overcome by a more intensified immunochemotherapy. Patients displaying this frequent genomic abnormality warrant new and dedicated therapeutic approaches.
Abstract:
ABSTRACT: BACKGROUND: Many studies have been published outlining the global effects of 17 beta-estradiol (E2) on gene expression in human epithelial breast cancer derived MCF-7 cells. These studies show large variation in results, reporting between ~100 and ~1500 genes regulated by E2, with poor overlap. RESULTS: We performed a meta-analysis of these expression studies, using the Rank product method to obtain a more accurate and stable list of the differentially expressed genes, and of pathways regulated by E2. We analyzed 9 time-series data sets, concentrating on response at 3-4 hrs (early) and at 24 hrs (late). We found >1000 statistically significant probe sets after correction for multiple testing at 3-4 hrs, and >2000 significant probe sets at 24 hrs. Differentially expressed genes were examined by pathway analysis. This revealed 15 early response pathways, mostly related to cell signaling and proliferation, and 20 late response pathways, mostly related to breast cancer, cell division, DNA repair and recombination. CONCLUSIONS: Our results show that meta-analysis identified more differentially expressed genes than the individual studies, and that these genes act together in networks. These results provide new insight into E2 regulated mechanisms, especially in the context of breast cancer.
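The core of the Rank product method used in this meta-analysis is the geometric mean of a gene's ranks across studies. The sketch below shows only that statistic, under the assumption that each study supplies one log fold change per gene; the significance assessment (usually done by permutation) is omitted, and the gene names and values are made up.

```python
import math


def rank_product(fold_changes):
    """Geometric mean of per-study ranks for each gene.

    fold_changes: dict mapping gene -> list of log fold changes,
                  one value per study (hypothetical input layout).
    """
    genes = list(fold_changes)
    n_studies = len(next(iter(fold_changes.values())))
    log_rank_sum = {g: 0.0 for g in genes}
    for s in range(n_studies):
        # Rank 1 = most strongly up-regulated gene in this study.
        ordered = sorted(genes, key=lambda g: fold_changes[g][s], reverse=True)
        for rank, g in enumerate(ordered, start=1):
            log_rank_sum[g] += math.log(rank)
    # Averaging the log ranks and exponentiating gives the geometric mean.
    return {g: math.exp(log_rank_sum[g] / n_studies) for g in genes}


# Made-up log fold changes for four genes in three studies.
data = {
    "geneA": [2.5, 1.9, 2.2],
    "geneB": [0.1, 2.0, -0.3],
    "geneC": [1.0, 0.5, 1.1],
    "geneD": [-0.8, -0.2, 0.0],
}
rp = rank_product(data)
for gene in sorted(rp, key=rp.get):
    print(gene, round(rp[gene], 3))
```

A small rank product means the gene is consistently near the top of the list across studies, which is why the statistic tolerates the between-study variation mentioned in the abstract.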
Abstract:
This paper describes the development and applications of a super-resolution method, known as Super-Resolution Variable-Pixel Linear Reconstruction. The algorithm works by combining different lower resolution images in order to obtain, as a result, a higher resolution image. We show that it can make significant spatial resolution improvements to satellite images of the Earth's surface, allowing recognition of objects with size approaching the limiting spatial resolution of the lower resolution images. The algorithm is based on the Variable-Pixel Linear Reconstruction algorithm developed by Fruchter and Hook, a well-known method in astronomy but never used for Earth remote sensing purposes. The algorithm preserves photometry, can weight input images according to the statistical significance of each pixel, and removes the effect of geometric distortion on both image shape and photometry. In this paper, we describe its development for remote sensing purposes, show the usefulness of the algorithm when working with images as different from astronomical images as remote sensing ones, and show applications to: 1) a set of simulated multispectral images obtained from a real Quickbird image; and 2) a set of multispectral real Landsat Enhanced Thematic Mapper Plus (ETM+) images. These examples show that the algorithm provides a substantial improvement in limiting spatial resolution for both simulated and real data sets without significantly altering the multispectral content of the input low-resolution images, without amplifying the noise, and with very few artifacts.
Abstract:
This paper reports the results from a second characterisation of the 91500 zircon, including data from electron probe microanalysis, laser ablation inductively coupled plasma-mass spectrometry (LA-ICP-MS), secondary ion mass spectrometry (SIMS) and laser fluorination analyses. The focus of this initiative was to establish the suitability of this large single zircon crystal for calibrating in situ analyses of the rare earth elements and oxygen isotopes, as well as to provide working values for key geochemical systems. In addition to extensive testing of the chemical and structural homogeneity of this sample, the occurrence of banding in 91500 in both backscattered electron and cathodoluminescence images is described in detail. Blind intercomparison data reported by both LA-ICP-MS and SIMS laboratories indicate that only small systematic differences exist between the data sets provided by these two techniques. Furthermore, the use of NIST SRM 610 glass as the calibrant for SIMS analyses was found to introduce little or no systematic error into the results for zircon. Based on both laser fluorination and SIMS data, zircon 91500 seems to be very well suited for calibrating in situ oxygen isotopic analyses.
Abstract:
The evolution of ants is marked by remarkable adaptations that allowed the development of very complex social systems. To identify how ant-specific adaptations are associated with patterns of molecular evolution, we searched for signs of positive selection on amino-acid changes in proteins. We identified 24 functional categories of genes which were enriched for positively selected genes in the ant lineage. We also reanalyzed genome-wide data sets in bees and flies with the same methodology to check whether positive selection was specific to ants or also present in other insects. Notably, genes implicated in immunity were enriched for positively selected genes in the three lineages, ruling out the hypothesis that the evolution of hygienic behaviors in social insects caused a major relaxation of selective pressure on immune genes. Our scan also indicated that genes implicated in neurogenesis and olfaction started to undergo increased positive selection before the evolution of sociality in Hymenoptera. Finally, the comparison between these three lineages allowed us to pinpoint molecular evolution patterns that were specific to the ant lineage. In particular, there was ant-specific recurrent positive selection on genes with mitochondrial functions, suggesting that mitochondrial activity was improved during the evolution of this lineage. This might have been an important step toward the evolution of extreme lifespan that is a hallmark of ants.
Abstract:
PURPOSE: To use diffusion-tensor (DT) magnetic resonance (MR) imaging in patients with essential tremor who were treated with transcranial MR imaging-guided focused ultrasound lesion inducement to identify the structural connectivity of the ventralis intermedius nucleus of the thalamus and determine how DT imaging changes correlated with tremor changes after lesion inducement. MATERIALS AND METHODS: With institutional review board approval, and with prospective informed consent, 15 patients with medication-refractory essential tremor were enrolled in a HIPAA-compliant pilot study and were treated with transcranial MR imaging-guided focused ultrasound surgery targeting the ventralis intermedius nucleus of the thalamus contralateral to their dominant hand. Fourteen patients were ultimately included. DT MR imaging studies at 3.0 T were performed preoperatively and 24 hours, 1 week, 1 month, and 3 months after the procedure. Fractional anisotropy (FA) maps were calculated from the DT imaging data sets for all time points in all patients. Voxels where FA consistently decreased over time were identified, and FA change in these voxels was correlated with clinical changes in tremor over the same period by using Pearson correlation. RESULTS: Ipsilateral brain structures that showed prespecified negative correlation values of FA over time of -0.5 or less included the pre- and postcentral subcortical white matter in the hand knob area; the region of the corticospinal tract in the centrum semiovale, in the posterior limb of the internal capsule, and in the cerebral peduncle; the thalamus; the region of the red nucleus; the location of the central tegmental tract; and the region of the inferior olive. The contralateral middle cerebellar peduncle and bilateral portions of the superior vermis also showed persistent decrease in FA over time. There was strong correlation between decrease in FA and clinical improvement in hand tremor 3 months after lesion inducement (P < .001). 
CONCLUSION: DT MR imaging after MR imaging-guided focused ultrasound thalamotomy depicts changes in specific brain structures. The magnitude of the DT imaging changes after thalamic lesion inducement correlates with the degree of clinical improvement in essential tremor.
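One reading of the voxel-selection step in this abstract is to correlate each voxel's FA values with time and keep voxels whose Pearson correlation is -0.5 or less. The sketch below follows that reading as an assumption, not as the authors' pipeline; the voxel names and FA values are made up.

```python
import math


def pearson(x, y):
    # Pearson correlation coefficient of two equal-length sequences.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Imaging time points in days: baseline, 24 hours, 1 week, 1 month, 3 months.
days = [0, 1, 7, 30, 90]

# Hypothetical FA values per voxel at each time point.
fa_by_voxel = {
    "voxel_a": [0.60, 0.55, 0.50, 0.45, 0.40],  # steadily decreasing
    "voxel_b": [0.50, 0.52, 0.49, 0.51, 0.50],  # essentially flat
}

THRESHOLD = -0.5  # correlation cut-off quoted in the abstract

correlations = {v: pearson(days, fa) for v, fa in fa_by_voxel.items()}
decreasing = [v for v, r in correlations.items() if r <= THRESHOLD]
print(decreasing)
```

The same `pearson` helper could then relate the FA change in the retained voxels to a clinical tremor score, as the abstract describes.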
Abstract:
The most advanced stage of water erosion, the gully, represents severe problems in different contexts, both in rural and urban environments. In the search for a stabilization of the process in a viable manner it is of utmost importance to assess the efficiency of evaluation methodologies. For this purpose, the efficiency of low-cost conservation practices was tested for the reduction of soil and nutrient losses caused by erosion from gullies in Pinheiral, state of Rio de Janeiro. The following areas were studied: a gully recovered by means of physical and biological strategies; gullies in the recovery stage, by means of physical strategies only; and gullies under no restoration treatment. During the summer of 2005/2006, the following data sets were collected for this study: soil classification of each of the eroded gully areas; planimetric and altimetric survey; determination of rain erosivity indexes; determination of amount of soil sediment; sediment grain size characteristics; natural amounts of nutrients Ca, Mg, K and P, as well as total C and N concentrations. The results for the first three measurements were 52.5, 20.5, and 29.0 Mg in the sediments from the gully without intervention, and 1.0, 1.7 and 1.8 Mg from the gully with physical interventions, indicating an average reduction of 95 %. The fully recovered gully produced no sediment during the period. The data of total nutrient loss from the three gullies under investigation showed reductions of 98 % for the recovering gully, and 99 % for the fully recovered one. As for the loss of nutrients, the data indicate a nutrient loss of 1,811 kg from the non-treated gully. The use of physical and biological interventions made it possible to reduce overall nutrient loss by more than 96 %, over the entire rainy season, as compared to the non-treated gully. Results show that the methods used were effective in reducing soil and nutrient losses from gullies.
Abstract:
Objective: Health status measures usually have an asymmetric distribution and present a high percentage of respondents with the best possible score (ceiling effect), especially when they are assessed in the overall population. Different methods to model this type of variable have been proposed that take into account the ceiling effect: the tobit models, the Censored Least Absolute Deviations (CLAD) models or the two-part models, among others. The objective of this work was to describe the tobit model, and compare it with the Ordinary Least Squares (OLS) model, which ignores the ceiling effect. Methods: Two different data sets have been used in order to compare both models: a) real data coming from the European Study of Mental Disorders (ESEMeD), in order to model the EQ5D index, one of the measures of utilities most commonly used for the evaluation of health status; and b) data obtained from simulation. Cross-validation was used to compare the predicted values of the tobit model and the OLS models. The following estimators were compared: the percentage of absolute error (R1), the percentage of squared error (R2), the Mean Squared Error (MSE) and the Mean Absolute Prediction Error (MAPE). Different datasets were created for different values of the error variance and different percentages of individuals with ceiling effect. The estimations of the coefficients, the percentage of explained variance and the plots of residuals versus predicted values obtained under each model were compared. Results: With regard to the results of the ESEMeD study, the predicted values obtained with the OLS model and those obtained with the tobit models were very similar. The regression coefficients of the linear model were consistently smaller than those from the tobit model. In the simulation study, we observed that when the error variance was small (s=1), the tobit model presented unbiased estimations of the coefficients and accurate predicted values, especially when the percentage of individuals with the highest possible score was small. However, when the error variance was greater (s=10 or s=20), the percentage of explained variance for the tobit model and the predicted values were more similar to those obtained with an OLS model. Conclusions: The proportion of variability accounted for by the models and the percentage of individuals with the highest possible score have an important effect on the performance of the tobit model in comparison with the linear model.
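The finding that the linear model's coefficients were smaller than the tobit ones is what a ceiling effect would be expected to produce. The simulation below is a rough sketch of that mechanism only; it does not fit a tobit model and does not reproduce the study's data. The slope, ceiling, noise level and sample size are all assumed values.

```python
import random

random.seed(1)
TRUE_SLOPE = 2.0  # assumed slope of the latent (uncensored) outcome
CEILING = 2.5     # assumed best possible score


def ols_slope(x, y):
    # Closed-form simple linear regression slope.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


x = [random.uniform(0.0, 1.0) for _ in range(5000)]
latent = [1.0 + TRUE_SLOPE * xi + random.gauss(0.0, 0.5) for xi in x]
# Observed scores cannot exceed the ceiling.
observed = [min(v, CEILING) for v in latent]

share_at_ceiling = sum(v == CEILING for v in observed) / len(observed)
slope_latent = ols_slope(x, latent)
slope_observed = ols_slope(x, observed)

print(f"share at ceiling: {share_at_ceiling:.2f}")
print(f"OLS slope on latent outcome:   {slope_latent:.2f}")
print(f"OLS slope on observed outcome: {slope_observed:.2f}")
```

OLS on the censored scores gives a flatter slope than OLS on the latent outcome, because observations at the ceiling stop responding to the predictor. A tobit model instead treats those observations as censored.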
Abstract:
Purpose: SIOPEN scoring of 123I mIBG imaging has been shown to predict response to induction chemotherapy and outcome at diagnosis in children with HRN. Method: Patterns of skeletal 123I mIBG uptake were assigned numerical scores (Mscore) ranging from 0 (no metastasis) to 72 (diffuse metastases) within 12 body areas as described previously. 271 anonymised, paired image data sets acquired at diagnosis and on completion of Rapid COJEC induction chemotherapy were reviewed, constituting a representative sample of 1602 children treated prospectively within the HR-NBL1/SIOPEN trial. Pre- and post-treatment Mscores were compared with bone marrow cytology (BM) and 3 year event free survival (EFS). Results: 224/271 patients showed skeletal MIBG-uptake at diagnosis and were evaluable for MIBG-response. Complete response (CR) on MIBG to Rapid COJEC induction was achieved by 66%, 34% and 15% of patients who had pre-treatment Mscores of <18 (n=65, 29%), 18-44 (n=95, 42%) and ≥45 (n=64, 28.5%) respectively (chi squared test p<.0001). Mscore at diagnosis and on completion of Rapid COJEC correlated strongly with BM involvement (p<0.0001). The correlation of pre score with post scores and response was highly significant (p<0.001). Most importantly, the 3 year EFS in 47 children with Mscore 0 at diagnosis was 0.68 (±0.07), by comparison with 0.42 (±0.06), 0.35 (±0.05) and 0.25 (±0.06) for patients in pre-treatment score groups <18, 18-44 and ≥45, respectively (p<0.001). An Mscore threshold of ≥45 at diagnosis was associated with significantly worse outcome by comparison with all other Mscore groups (p=0.029). The 3 year EFS of 0.53 (±0.07) of patients in metastatic CR (mIBG and BM) after Rapid COJEC (33%) is clearly superior to patients not achieving metastatic CR (0.24 (±0.04), p=0.005). Conclusion: SIOPEN scoring of 123I mIBG imaging has been shown to predict response to induction chemotherapy and outcome at diagnosis in children with HRN.