969 results for classification scheme
Batch effect confounding leads to strong bias in performance estimates obtained by cross-validation.
Abstract:
BACKGROUND: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences ("batch effects") as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies. FOCUS: The current study focuses on the construction of classifiers, and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is a main difference compared to previous studies, which have mostly focused on the predictive performance and how it relates to the presence of batch effects. DATA: We work on simulated data sets. To have realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., 'control') or group 2 (e.g., 'treated'). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios for our study, most importantly different levels of confounding between groups and batch effects. METHODS: We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, is performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data.
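The bias described in this abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' protocol: it uses plain k-fold cross-validation instead of the nested scheme, a nearest-centroid classifier instead of the classifiers listed above, and synthetic Gaussian features instead of real expression data. All function names and parameter values are hypothetical. The two groups have no true difference, but group 2 is measured entirely in a shifted batch, so the batch effect is fully confounded with the group label.

```python
import random

def simulate(n_per_group, n_features, batch_shift_group2, rng):
    """Return (X, y) for two groups with NO true group difference.
    Every feature of group 2 is shifted by `batch_shift_group2`,
    mimicking a batch effect fully confounded with the group label."""
    X, y = [], []
    for label, shift in ((0, 0.0), (1, batch_shift_group2)):
        for _ in range(n_per_group):
            X.append([rng.gauss(0.0, 1.0) + shift for _ in range(n_features)])
            y.append(label)
    return X, y

def fit_centroids(X, y):
    """Nearest-centroid 'training': one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Assign x to the class with the closest centroid."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sqdist(centroids[label]))

def accuracy(centroids, X, y):
    return sum(predict(centroids, x) == t for x, t in zip(X, y)) / len(y)

def cv_accuracy(X, y, k, rng):
    """Plain k-fold cross-validation (not nested)."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in fold)
    return correct / len(y)

rng = random.Random(0)
# Training data: batch effect confounded with group (shift of 2 SD per feature).
X_train, y_train = simulate(40, 20, 2.0, rng)
# Independent data: same groups, but no batch difference between them.
X_new, y_new = simulate(100, 20, 0.0, rng)

cv_acc = cv_accuracy(X_train, y_train, 5, rng)
new_acc = accuracy(fit_centroids(X_train, y_train), X_new, y_new)
print(f"cross-validated accuracy:     {cv_acc:.2f}")
print(f"accuracy on independent data: {new_acc:.2f}")
```

In this setup the classifier learns only the batch shift, so the cross-validated estimate should be close to 1 while the accuracy on independent data should stay near chance level.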
Abstract:
The absolute K magnitudes and kinematic parameters of about 350 oxygen-rich Long-Period Variable stars are calibrated, by means of an up-to-date maximum-likelihood method, using HIPPARCOS parallaxes and proper motions together with radial velocities and, as additional data, periods and V-K colour indices. Four groups, differing by their kinematics and mean magnitudes, are found. For each of them, we also obtain the distributions of magnitude, period and de-reddened colour of the base population, as well as de-biased period-luminosity-colour relations and their two-dimensional projections. The SRa semiregulars do not seem to constitute a separate class of LPVs. The SRb appear to belong to two populations of different ages. In a PL diagram, they constitute two evolutionary sequences towards the Mira stage. The Miras of the disk appear to pulsate on a lower-order mode. The slopes of their de-biased PL and PC relations are found to be very different from the ones of the Oxygen Miras of the LMC. This suggests that a significant number of so-called Miras of the LMC are misclassified. This also suggests that the Miras of the LMC do not constitute a homogeneous group, but include a significant proportion of metal-deficient stars, suggesting a relatively smooth star formation history. As a consequence, one may not trivially transpose the LMC period-luminosity relation from one galaxy to the other.
Abstract:
Objectives: Recent population genetic studies suggest that the Staphylococcal Chromosome Cassettes mec (SCCmec) was acquired at a global scale much more frequently than previously thought. We hypothesized that such acquisitions can also be observed at a local level. In the present study, we aimed at investigating the diversity of SCCmec in a local MRSA population, where the dissemination of four MRSA clones has been observed (JCM 2007, 45: 3729). Methods: All the MRSA isolates (one per patient) recovered in the Vaud canton of Switzerland from January 2005 to December 2008 were analyzed in this study. We used the Double Locus Sequence Typing (DLST) method, based on clfB and spa loci, and the e-BURST algorithm to group the types with one allele in common (i.e. clone). To increase the discriminatory power of the DLST method, a third polymorphic marker (clfA) was further analyzed on a sub-sample of isolates. The SCCmec type of each isolate was determined with the first two PCRs of the Kondo scheme. Results: DLST analysis indicated that 1884/2036 isolates (92.5%) belong to the four predominant clones. A majority of isolates in each clone harboured an identical SCCmec type: 61/64 (95%) isolates to DLST clone 1−1 SCCmec IV, 1282/1323 (97%) to clone 2−2 SCCmec II, 237/288 (82%) to clone 3−3 SCCmec IV, and 192/209 (92%) to clone 4−4 SCCmec I. Unexpectedly, different SCCmec types were present in a single predominant DLST clone: SCCmec V plus one unusual type in 3 isolates of clone 1−1; SCCmec I, IV, V, VI plus two unusual types in 41 isolates of clone 2−2; SCCmec I, II, VI plus three unusual types in 51 isolates of clone 3−3; and SCCmec II, IV, V plus one unusual type in 17 isolates of clone 4−4. Interestingly, adding a third locus generally did not change the classification of incongruent SCCmec types, suggesting that these SCCmec elements have been acquired locally during the dissemination of the clones. 
Conclusion: Although the SCCmec diversity within clones was relatively low at a local level, a significant proportion of isolates with different SCCmec have been identified in the four major clones. This suggests that the local acquisition of SCCmec elements is not a rare event and illustrates the great capacity of S. aureus to quickly adapt to its environment by acquiring new genetic elements.
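The grouping step in the Methods (types with one allele in common form a clone) can be sketched as a single-linkage grouping over (clfB, spa) allele pairs. This is a simplified stand-in, not the e-BURST algorithm itself; the function name and the example allele numbers are hypothetical.

```python
from collections import defaultdict

def group_into_clones(types):
    """Group DLST types, given as (clfB_allele, spa_allele) pairs.
    Two types fall into the same clone when they share at least one
    allele at the same locus; sharing is chained (single linkage)."""
    parent = {t: t for t in types}

    def find(t):
        # Union-find lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    # Index the types by (locus, allele) so sharing is easy to detect.
    by_allele = defaultdict(list)
    for t in parent:
        by_allele[("clfB", t[0])].append(t)
        by_allele[("spa", t[1])].append(t)

    # Merge all types that carry the same allele at the same locus.
    for members in by_allele.values():
        for other in members[1:]:
            parent[find(members[0])] = find(other)

    clones = defaultdict(set)
    for t in parent:
        clones[find(t)].add(t)
    return sorted(sorted(c) for c in clones.values())

# Hypothetical allele pairs, for illustration only.
example_types = [(1, 1), (1, 5), (2, 2), (7, 2), (3, 3)]
example_clones = group_into_clones(example_types)
print(example_clones)
```

Here types 1-1 and 1-5 share the clfB allele, types 2-2 and 7-2 share the spa allele, and type 3-3 remains on its own.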
Abstract:
During the period 1996-2000, forty-three heavy rainfall events were detected in the Internal Basins of Catalonia (northeastern Spain). Most of these events caused floods and serious damage. This high number calls for a methodology to classify them on the basis of their surface rainfall distribution, their internal organization and their physical features. The aim of this paper is to present a methodology for systematically analyzing the convective structures responsible for those heavy rainfall events, using the information supplied by the meteorological radar. The proposed methodology is as follows. Firstly, the rainfall intensity and the surface rainfall pattern are analyzed on the basis of the raingauge data. Secondly, the convective structures at the lowest level are identified and characterized by using a 2-D algorithm, and the convective cells are identified by using a 3-D procedure that looks for the reflectivity cores in every radar volume. Thirdly, the convective cells (3-D) are associated with the 2-D structures (convective rainfall areas). This methodology has been applied to the 43 heavy rainfall events using the meteorological radar located near Barcelona and the SAIH automatic raingauge network.
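The abstract does not specify how the 2-D algorithm identifies structures. One common approach, assumed here purely for illustration, is to threshold the lowest-level reflectivity field and label connected regions. The function name, the 40 dBZ threshold, and the toy grid below are hypothetical.

```python
def label_structures(field, threshold):
    """Label 4-connected regions of a 2-D grid whose values are at or
    above `threshold`. Returns (labels, count); background cells get 0."""
    rows, cols = len(field), len(field[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if field[r][c] < threshold or labels[r][c]:
                continue
            # Start a new structure and flood-fill its neighbours.
            count += 1
            labels[r][c] = count
            stack = [(r, c)]
            while stack:
                i, j = stack.pop()
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < rows and 0 <= nj < cols
                            and not labels[ni][nj]
                            and field[ni][nj] >= threshold):
                        labels[ni][nj] = count
                        stack.append((ni, nj))
    return labels, count

# Toy reflectivity grid in dBZ (made-up values).
field = [
    [10, 45, 50, 10, 10],
    [10, 42, 10, 10, 41],
    [10, 10, 10, 10, 47],
]
labels, count = label_structures(field, threshold=40)  # 40 dBZ is an assumed cutoff
print(count)
for row in labels:
    print(row)
```

The same idea extends to three dimensions by adding the vertical neighbours, which is one way a 3-D cell could then be associated with the 2-D structure lying beneath it.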
Abstract:
INTRODUCTION: The 2004 version of the World Health Organization classification subdivides thymic epithelial tumors into A, AB, B1, B2, and B3 (and rare other) thymomas and thymic carcinomas (TC). Due to a morphological continuum between some thymoma subtypes and some morphological overlap between thymomas and TC, a variable proportion of cases may pose problems in classification, contributing to the poor interobserver reproducibility in some studies. METHODS: To overcome this problem, hematoxylin-eosin-stained and immunohistochemically processed sections of prototypic, "borderland," and "combined" thymomas and TC (n = 72) were studied by 18 pathologists at an international consensus slide workshop supported by the International Thymic Malignancy Interest Group. RESULTS: Consensus was achieved on refined criteria for decision making at the A/AB borderland, the distinction between B1, B2, and B3 thymomas and the separation of B3 thymomas from TCs. "Atypical type A thymoma" is tentatively proposed as a new type A thymoma variant. New reporting strategies for tumors with more than one histological pattern are proposed. CONCLUSION: These guidelines can set the stage for reproducibility studies and the design of a clinically meaningful grading system for thymic epithelial tumors.
Abstract:
Abstract: International classification of Finland's acid sulfate soils
Abstract:
111 patients with acute leukemia, including 29 children, were classified according to the surface markers and cytochemistry of their blasts. The acute leukemias were separated into two major groups (lymphoid and non-lymphoid) depending on the presence or absence of specific lymphoid markers. On the basis of these criteria a correlation of 94% with the hematological diagnosis was obtained. Acute lymphoblastic leukemia (ALL) was divisible into three sub-groups: 11 cases expressing T-cell specific markers were classified as T-ALL and 33 cases expressing the common ALL antigen (CALLA) as c-ALL. 18 of the latter expressed an additional marker, DSA (Daudi surface antigen), splitting the c-ALL cases into two subgroups. Cytochemistry of the cases lacking specific surface markers (n = 67) served to diagnose 41 acute myeloid leukemia (AML) cases and 8 monoblastic leukemias. The remaining 18 cases could not be classified. The presence or absence of HLA-DR (Ia) antigens served to subdivide AML into two major subgroups. The prognostic significance of these new diagnostic splits is under active study.
Abstract:
BACKGROUND: Extensive research exists estimating the effect of hazardous alcohol use on morbidity and mortality, but little research quantifies the association between alcohol consumption and utility scores in patients with alcohol dependence. In the context of comparative research, the World Health Organisation (WHO) proposed to categorise the risk for alcohol-related acute and chronic harm according to patients' average daily alcohol consumption. OBJECTIVES: To estimate utility scores associated with each category of the WHO drinking risk-level classification in patients with alcohol dependence (AD). METHODS: We used data from CONTROL, an observational cohort study including 143 AD patients from the Alcohol Treatment Center at Lausanne University Hospital, followed for 12 months. Average daily alcohol consumption was assessed monthly using the Timeline Follow-back method and patients were categorised according to the WHO drinking risk-level classification: abstinent, low, medium, high and very high. Other measures, such as sociodemographic characteristics and utility scores derived from the EuroQoL 5-Dimensions questionnaire (EQ-5D), were collected every three months. Mixed models for repeated measures were used to estimate mean utility scores associated with WHO drinking risk-level categories. RESULTS: A total of 143 patients were included, and the 12-month follow-up permitted the assessment of 1318 person-months. At baseline the mean age of the patients was 44.6 (SD 11.8) and the majority of patients were male (63.6%). Using repeated measures analysis, utility scores decreased with increasing drinking levels, ranging from 0.80 in abstinent patients to 0.62 in patients with very high risk drinking level (p<0.0001). CONCLUSIONS: In this sample of patients with alcohol dependence undergoing specialized care, utility scores estimated from the EQ-5D appeared to vary substantially and consistently according to patients' WHO drinking level.
Abstract:
CD34/QBEND10 immunostaining has been assessed in 150 bone marrow biopsies (BMB) including 91 myelodysplastic syndromes (MDS), 16 MDS-related AML, 25 reactive BMB, and 18 cases where RA could neither be established nor ruled out. All cases were reviewed and classified according to the clinical and morphological FAB criteria. The percentage of CD34-positive (CD34 +) hematopoietic cells and the number of clusters of CD34+ cells in 10 HPF were determined. In most cases the CD34+ cell count was similar to the blast percentage determined morphologically. In RA, however, not only typical blasts but also less immature hemopoietic cells lying morphologically between blasts and promyelocytes were stained with CD34. The CD34+ cell count and cluster values were significantly higher in RA than in BMB with reactive changes (p<0.0001 for both), in RAEB than in RA (p=0.0006 and p=0.0189, respectively), in RAEBt than in RAEB (p=0.0001 and p=0.0038), and in MDS-AML than in RAEBt (p<0.0001 and p=0.0007). Presence of CD34+ cell clusters in RA correlated with increased risk of progression of the disease. We conclude that CD34 immunostaining in BMB is a useful tool for distinguishing RA from other anemias, assessing blast percentage in MDS cases, classifying them according to FAB, and following their evolution.
Abstract:
BACKGROUND: Several studies have established Glioblastoma Multiforme (GBM) prognostic and predictive models based on age and Karnofsky Performance Status (KPS), while very few studies evaluated the prognostic and predictive significance of preoperative MR-imaging. However, to date, there is no simple preoperative GBM classification that also correlates with a highly prognostic genomic signature. Thus, we present for the first time a biologically relevant, and clinically applicable tumor Volume, patient Age, and KPS (VAK) GBM classification that can easily and non-invasively be determined upon patient admission. METHODS: We quantitatively analyzed the volumes of 78 GBM patient MRIs present in The Cancer Imaging Archive (TCIA) corresponding to patients in The Cancer Genome Atlas (TCGA) with VAK annotation. The variables were then combined using a simple 3-point scoring system to form the VAK classification. A validation set (N = 64) from both the TCGA and Rembrandt databases was used to confirm the classification. Transcription factor and genomic correlations were performed using the gene pattern suite and Ingenuity Pathway Analysis. RESULTS: VAK-A and VAK-B classes showed significant median survival differences in discovery (P = 0.007) and validation sets (P = 0.008). VAK-A is significantly associated with P53 activation, while VAK-B shows significant P53 inhibition. Furthermore, a molecular gene signature comprised of a total of 25 genes and microRNAs was significantly associated with the classes and predicted survival in an independent validation set (P = 0.001). A favorable MGMT promoter methylation status resulted in a 10.5 months additional survival benefit for VAK-A compared to VAK-B patients. 
CONCLUSIONS: The non-invasively determined VAK classification with its implication of VAK-specific molecular regulatory networks, can serve as a very robust initial prognostic tool, clinical trial selection criteria, and important step toward the refinement of genomics-based personalized therapy for GBM patients.
Abstract:
A semisupervised support vector machine is presented for the classification of remote sensing images. The method exploits the wealth of unlabeled samples for regularizing the training kernel representation locally by means of cluster kernels. The method learns a suitable kernel directly from the image and thus avoids assuming a priori signal relations by using a predefined kernel structure. Good results are obtained in image classification examples when few labeled samples are available. The method scales almost linearly with the number of unlabeled samples and provides out-of-sample predictions.
Abstract:
The Brazilian System of Soil Classification (SiBCS) is a taxonomic system, open and in permanent construction, as new knowledge on Brazilian soils is obtained. The objective of this study was to characterize the chemical, physical, morphological, micro-morphological and mineralogical properties of four pedons of Oxisols in a highland toposequence in the upper Jequitinhonha Valley, emphasizing aspects of their genesis, classification and landscape development. The pedons occupy the following slope positions: summit - Red Oxisol (LV), mid slope (upper third) - Yellow-Red Oxisol (LVA), lower slope (middle third) - Yellow Oxisol (LA) and bottom of the valley (lowest third) - "Gray Oxisol" ("LAC"). These pedons were described and sampled for characterization in chemical and physical routine analyses. The total Fe, Al and Mn contents were determined by sulfuric attack and the Fe, Al and Mn oxides in dithionite-citrate-bicarbonate and oxalate extraction. The mineralogy of silicate clays was identified by X-ray diffraction and the Fe oxides were detected by differential X-ray diffraction. Total Ti, Ga and Zr contents were determined by X-ray fluorescence spectrometry. The "LAC" is gray-colored and contains significant fragments of structure units in the form of a dense paste, characteristic of a gleysoil, in the horizons A and BA. All pedons are very clayey, dystrophic and have low contents of available P and a pH of around 5. The soil color was related to the Fe oxide content, which decreased along the slope. The decrease of crystalline and low-crystalline Fe along the slope confirmed the loss of Fe from the "LAC". Total Si increased along the slope and total Al remained constant. The clay fraction in all pedons was dominated by kaolinite and gibbsite. Hematite and goethite were identified in LV, low-intensity hematite and goethite in LVA, goethite in LA. In the "LAC", no hematite or goethite peaks were detected by differential X-ray diffraction.
The micro-morphology indicated prevalence of granular microstructure and porosity with complex stacking patterns. The soil properties in the toposequence converged to a single soil class, the Oxisols, derived from the same source material. The landscape evolution and genesis of Oxisols of the highlands in the upper Jequitinhonha Valley are related to the evolution of the drainage system and the activity of excavating fauna.
Abstract:
The research considers the problem of spatial data classification using machine learning algorithms: probabilistic neural networks (PNN) and support vector machines (SVM). A simple k-nearest neighbor algorithm is considered as a benchmark model. PNN is a neural network reformulation of well-known nonparametric principles of probability density modeling using a kernel density estimator and Bayesian optimal or maximum a posteriori decision rules. PNN is well suited to problems where not only predictions but also quantification of accuracy and integration of prior information are necessary. An important property of PNNs is that they can be easily used in decision support systems dealing with problems of automatic classification. The support vector machine is an implementation of the principles of statistical learning theory for classification tasks. Recently, SVMs have been successfully applied to different environmental topics: classification of soil types and hydro-geological units, optimization of monitoring networks, and susceptibility mapping of natural hazards. In the present paper both simulated and real data case studies (low and high dimensional) are considered. The main attention is paid to the detection and learning of spatial patterns by the algorithms applied.
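The PNN principle described above (a kernel density estimate per class combined with a maximum a posteriori decision rule) can be sketched in a few lines. This is a minimal illustration, assuming an isotropic Gaussian kernel with a single shared bandwidth `sigma`; the function name, the bandwidth value, and the toy coordinates are hypothetical and are not taken from the paper.

```python
import math

def pnn_posteriors(train_X, train_y, x, sigma, priors=None):
    """Class posteriors at point x from per-class Gaussian kernel
    density estimates. The kernel's normalising constant is the same
    for every class (shared sigma), so it cancels and is omitted.
    If `priors` is None, class frequencies in the training set are used."""
    scores = {}
    for cls in sorted(set(train_y)):
        pts = [p for p, t in zip(train_X, train_y) if t == cls]
        density = sum(
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * sigma ** 2))
            for p in pts
        ) / len(pts)
        prior = priors[cls] if priors else len(pts) / len(train_y)
        scores[cls] = prior * density
    total = sum(scores.values())  # note: zero if x is extremely far from all points
    return {cls: s / total for cls, s in scores.items()}

# Toy spatial samples (coordinates and labels are made up).
train_X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_y = ["a", "a", "b", "b"]

post = pnn_posteriors(train_X, train_y, (0.5, 0.5), sigma=1.0)
predicted = max(post, key=post.get)  # maximum a posteriori decision
print(predicted, post)
```

Because the output is a posterior per class rather than a bare label, the same call gives a rough measure of prediction confidence, and passing `priors` is where prior information would enter.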
Abstract:
Soil science has sought to develop better techniques for the classification of soils, one of which is the use of remote sensing applications. The use of ground sensors to obtain soil spectral data has enabled the characterization of these data and the advancement of techniques for the quantification of soil attributes. In order to do this, the creation of a soil spectral library is necessary. A spectral library should be representative of the variability of the soils in a region. The objective of this study was to create a spectral library of distinct soils from several agricultural regions of Brazil. Spectral data were collected (using a Fieldspec sensor, 350-2,500 nm) for the horizons of 223 soil profiles from the regions of Matão, Paraguaçu Paulista, Andradina, Ipaussu, Mirandópolis, Piracicaba, São Carlos, Araraquara, Guararapes, Valparaíso (SP); Naviraí, Maracajú, Rio Brilhante, Três Lagoas (MS); Goianésia (GO); and Uberaba and Lagoa da Prata (MG). A Principal Component Analysis (PCA) of the data was then performed and a graphic representation of the spectral curve was created for each profile. The reflectance intensity of the curves was principally influenced by the levels of Fe2O3, clay, organic matter and the presence of opaque minerals. There was no change in the spectral curves in the horizons of the Latossolos, Nitossolos, and Neossolos Quartzarênicos. Argissolos had superficial horizon curves with the greatest intensity of reflection above 2,200 nm. Cambissolos and Neossolos Litólicos had curves with greater reflectance intensity in poorly developed horizons. Gleisols showed a convex curve in the region of 350-400 nm. The PCA was able to separate different data collection areas according to the region of source material. Principal component one (PC1) was correlated with the intensity of reflectance samples and PC2 with the slope between the visible and infrared samples. 
The use of the Spectral Library as an indicator of possible soil classes proved to be an important tool in profile classification.