133 resultados para classification aided by clustering
Resumo:
BACKGROUND AND PURPOSE: MCI was recently subdivided into sd-aMCI, sd-fMCI, and md-aMCI. The current investigation aimed to discriminate between MCI subtypes by using DTI. MATERIALS AND METHODS: Sixty-six prospective participants were included: 18 with sd-aMCI, 13 with sd-fMCI, and 35 with md-aMCI. Statistics included group comparisons using TBSS and individual classification using SVMs. RESULTS: The group-level analysis revealed a decrease in FA in md-aMCI versus sd-aMCI in an extensive bilateral, right-dominant network, and a more pronounced reduction of FA in md-aMCI compared with sd-fMCI in right inferior fronto-occipital fasciculus and inferior longitudinal fasciculus. The comparison between sd-fMCI and sd-aMCI, as well as the analysis of the other diffusion parameters, yielded no significant group differences. The individual-level SVM analysis provided discrimination between the MCI subtypes with accuracies around 97%. The major limitation is the relatively small number of cases of MCI. CONCLUSIONS: Our data show that, at the group level, the md-aMCI subgroup has the most pronounced damage in white matter integrity. Individually, SVM analysis of white matter FA provided highly accurate classification of MCI subtypes.
Resumo:
Colorectal cancer (CRC) is a major cause of cancer mortality. Whereas some patients respond well to therapy, others do not, and thus more precise, individualized treatment strategies are needed. To that end, we analyzed gene expression profiles from 1,290 CRC tumors using consensus-based unsupervised clustering. The resultant clusters were then associated with therapeutic response data to the epidermal growth factor receptor-targeted drug cetuximab in 80 patients. The results of these studies define six clinically relevant CRC subtypes. Each subtype shares similarities to distinct cell types within the normal colon crypt and shows differing degrees of 'stemness' and Wnt signaling. Subtype-specific gene signatures are proposed to identify these subtypes. Three subtypes have markedly better disease-free survival (DFS) after surgical resection, suggesting these patients might be spared from the adverse effects of chemotherapy when they have localized disease. One of these three subtypes, identified by filamin A expression, does not respond to cetuximab but may respond to cMET receptor tyrosine kinase inhibitors in the metastatic setting. Two other subtypes, with poor and intermediate DFS, associate with improved response to the chemotherapy regimen FOLFIRI in adjuvant or metastatic settings. Development of clinically deployable assays for these subtypes and of subtype-specific therapies may contribute to more effective management of this challenging disease.
Resumo:
Abstract : This work is concerned with the development and application of novel unsupervised learning methods, having in mind two target applications: the analysis of forensic case data and the classification of remote sensing images. First, a method based on a symbolic optimization of the inter-sample distance measure is proposed to improve the flexibility of spectral clustering algorithms, and applied to the problem of forensic case data. This distance is optimized using a loss function related to the preservation of neighborhood structure between the input space and the space of principal components, and solutions are found using genetic programming. Results are compared to a variety of state-of--the-art clustering algorithms. Subsequently, a new large-scale clustering method based on a joint optimization of feature extraction and classification is proposed and applied to various databases, including two hyperspectral remote sensing images. The algorithm makes uses of a functional model (e.g., a neural network) for clustering which is trained by stochastic gradient descent. Results indicate that such a technique can easily scale to huge databases, can avoid the so-called out-of-sample problem, and can compete with or even outperform existing clustering algorithms on both artificial data and real remote sensing images. This is verified on small databases as well as very large problems. Résumé : Ce travail de recherche porte sur le développement et l'application de méthodes d'apprentissage dites non supervisées. Les applications visées par ces méthodes sont l'analyse de données forensiques et la classification d'images hyperspectrales en télédétection. Dans un premier temps, une méthodologie de classification non supervisée fondée sur l'optimisation symbolique d'une mesure de distance inter-échantillons est proposée. Cette mesure est obtenue en optimisant une fonction de coût reliée à la préservation de la structure de voisinage d'un point entre l'espace des variables initiales et l'espace des composantes principales. Cette méthode est appliquée à l'analyse de données forensiques et comparée à un éventail de méthodes déjà existantes. En second lieu, une méthode fondée sur une optimisation conjointe des tâches de sélection de variables et de classification est implémentée dans un réseau de neurones et appliquée à diverses bases de données, dont deux images hyperspectrales. Le réseau de neurones est entraîné à l'aide d'un algorithme de gradient stochastique, ce qui rend cette technique applicable à des images de très haute résolution. Les résultats de l'application de cette dernière montrent que l'utilisation d'une telle technique permet de classifier de très grandes bases de données sans difficulté et donne des résultats avantageusement comparables aux méthodes existantes.
Resumo:
When dealing with multi-angular image sequences, problems of reflectance changes due either to illumination and acquisition geometry, or to interactions with the atmosphere, naturally arise. These phenomena interplay with the scene and lead to a modification of the measured radiance: for example, according to the angle of acquisition, tall objects may be seen from top or from the side and different light scatterings may affect the surfaces. This results in shifts in the acquired radiance, that make the problem of multi-angular classification harder and might lead to catastrophic results, since surfaces with the same reflectance return significantly different signals. In this paper, rather than performing atmospheric or bi-directional reflection distribution function (BRDF) correction, a non-linear manifold learning approach is used to align data structures. This method maximizes the similarity between the different acquisitions by deforming their manifold, thus enhancing the transferability of classification models among the images of the sequence.
Resumo:
Tire traces can be observed on several crime scenes as vehicles are often used by criminals. The tread abrasion on the road, while braking or skidding, leads to the production of small rubber particles which can be collected for comparison purposes. This research focused on the statistical comparison of Py-GC/MS profiles of tire traces and tire treads. The optimisation of the analytical method was carried out using experimental designs. The aim was to determine the best pyrolysis parameters regarding the repeatability of the results. Thus, the pyrolysis factor effect could also be calculated. The pyrolysis temperature was found to be five time more important than time. Finally, a pyrolysis at 650 °C during 15 s was selected. Ten tires of different manufacturers and models were used for this study. Several samples were collected on each tire, and several replicates were carried out to study the variability within each tire (intravariability). More than eighty compounds were integrated for each analysis and the variability study showed that more than 75% presented a relative standard deviation (RSD) below 5% for the ten tires, thus supporting a low intravariability. The variability between the ten tires (intervariability) presented higher values and the ten most variant compounds had a RSD value above 13%, supporting their high potential of discrimination between the tires tested. Principal Component Analysis (PCA) was able to fully discriminate the ten tires with the help of the first three principal components. The ten tires were finally used to perform braking tests on a racetrack with a vehicle equipped with an anti-lock braking system. The resulting tire traces were adequately collected using sheets of white gelatine. As for tires, the intravariability for the traces was found to be lower than the intervariability. Clustering methods were carried out and the Ward's method based on the squared Euclidean distance was able to correctly group all of the tire traces replicates in the same cluster than the replicates of their corresponding tire. Blind tests on traces were performed and were correctly assigned to their tire source. These results support the hypothesis that the tested tires, of different manufacturers and models, can be discriminated by a statistical comparison of their chemical profiles. The traces were found to be not differentiable from their source but differentiable from all the other tires present in the subset. The results are promising and will be extended on a larger sample set.
Resumo:
Multicentric carpotarsal osteolysis (MCTO) is a rare skeletal dysplasia characterized by aggressive osteolysis, particularly affecting the carpal and tarsal bones, and is frequently associated with progressive renal failure. Using exome capture and next-generation sequencing in five unrelated simplex cases of MCTO, we identified previously unreported missense mutations clustering within a 51 base pair region of the single exon of MAFB, validated by Sanger sequencing. A further six unrelated simplex cases with MCTO were also heterozygous for previously unreported mutations within this same region, as were affected members of two families with autosomal-dominant MCTO. MAFB encodes a transcription factor that negatively regulates RANKL-induced osteoclastogenesis and is essential for normal renal development. Identification of this gene paves the way for development of novel therapeutic approaches for this crippling disease and provides insight into normal bone and kidney development.
Resumo:
General clustering deals with weighted objects and fuzzy memberships. We investigate the group- or object-aggregation-invariance properties possessed by the relevant functionals (effective number of groups or objects, centroids, dispersion, mutual object-group information, etc.). The classical squared Euclidean case can be generalized to non-Euclidean distances, as well as to non-linear transformations of the memberships, yielding the c-means clustering algorithm as well as two presumably new procedures, the convex and pairwise convex clustering. Cluster stability and aggregation-invariance of the optimal memberships associated to the various clustering schemes are examined as well.
Resumo:
Most hematopoietic stem cells (HSC) in the bone marrow reside in a quiescent state and occasionally enter the cell cycle upon cytokine-induced activation. Although the mechanisms regulating HSC quiescence and activation remain poorly defined, recent studies have revealed a role of lipid raft clustering (LRC) in HSC activation. Here, we tested the hypothesis that changes in lipid raft distribution could serve as an indicator of the quiescent and activated state of HSCs in response to putative niche signals. A semi-automated image analysis tool was developed to map the presence or absence of lipid raft clusters in live HSCs cultured for just one hour in serum-free medium supplemented with stem cell factor (SCF). By screening the ability of 19 protein candidates to alter lipid raft dynamics, we identified six factors that induced either a marked decrease (Wnt5a, Wnt3a and Osteopontin) or increase (IL3, IL6 and VEGF) in LRC. Cell cycle kinetics of single HSCs exposed to these factors revealed a correlation of LRC dynamics and proliferation kinetics: factors that decreased LRC slowed down cell cycle kinetics, while factors that increased LRC led to faster and more synchronous cycling. The possibility of identifying, by LRC analysis at very early time points, whether a stem cell is activated and possibly committed upon exposure to a signaling cue of interest could open up new avenues for large-scale screening efforts.
Resumo:
In this paper, we consider active sampling to label pixels grouped with hierarchical clustering. The objective of the method is to match the data relationships discovered by the clustering algorithm with the user's desired class semantics. The first is represented as a complete tree to be pruned and the second is iteratively provided by the user. The active learning algorithm proposed searches the pruning of the tree that best matches the labels of the sampled points. By choosing the part of the tree to sample from according to current pruning's uncertainty, sampling is focused on most uncertain clusters. This way, large clusters for which the class membership is already fixed are no longer queried and sampling is focused on division of clusters showing mixed labels. The model is tested on a VHR image in a multiclass classification setting. The method clearly outperforms random sampling in a transductive setting, but cannot generalize to unseen data, since it aims at optimizing the classification of a given cluster structure.
Resumo:
PURPOSE: According to estimations around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary in order to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics and to develop mapping and predictive tools in order to improve local radon prediction. METHOD: About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering via pair-wise Kolmogorov distances between IRC distributions of lithological units. For IRC mapping and prediction we used random forests and Bayesian additive regression trees (BART). RESULTS: The automated classification groups lithological units well in terms of their IRC characteristics. Especially the IRC differences in metamorphic rocks like gneiss are well revealed by this method. The maps produced by random forests soundly represent the regional difference of IRCs in Switzerland and improve the spatial detail compared to existing approaches. We could explain 33% of the variations in IRC data with random forests. Additionally, the influence of a variable evaluated by random forests shows that building characteristics are less important predictors for IRCs than spatial/geological influences. BART could explain 29% of IRC variability and produced maps that indicate the prediction uncertainty. CONCLUSION: Ensemble regression trees are a powerful tool to model and understand the multidimensional influences on IRCs. Automatic clustering of lithological units complements this method by facilitating the interpretation of radon properties of rock types. This study provides an important element for radon risk communication. Future approaches should consider taking into account further variables like soil gas radon measurements as well as more detailed geological information.
Resumo:
The World Health Organization (WHO) plans to submit the 11th revision of the International Classification of Diseases (ICD) to the World Health Assembly in 2018. The WHO is working toward a revised classification system that has an enhanced ability to capture health concepts in a manner that reflects current scientific evidence and that is compatible with contemporary information systems. In this paper, we present recommendations made to the WHO by the ICD revision's Quality and Safety Topic Advisory Group (Q&S TAG) for a new conceptual approach to capturing healthcare-related harms and injuries in ICD-coded data. The Q&S TAG has grouped causes of healthcare-related harm and injuries into four categories that relate to the source of the event: (a) medications and substances, (b) procedures, (c) devices and (d) other aspects of care. Under the proposed multiple coding approach, one of these sources of harm must be coded as part of a cluster of three codes to depict, respectively, a healthcare activity as a 'source' of harm, a 'mode or mechanism' of harm and a consequence of the event summarized by these codes (i.e. injury or harm). Use of this framework depends on the implementation of a new and potentially powerful code-clustering mechanism in ICD-11. This new framework for coding healthcare-related harm has great potential to improve the clinical detail of adverse event descriptions, and the overall quality of coded health data.
Resumo:
The long term goal of this research is to develop a program able to produce an automatic segmentation and categorization of textual sequences into discourse types. In this preliminary contribution, we present the construction of an algorithm which takes a segmented text as input and attempts to produce a categorization of sequences, such as narrative, argumentative, descriptive and so on. Also, this work aims at investigating a possible convergence between the typological approach developed in particular in the field of text and discourse analysis in French by Adam (2008) and Bronckart (1997) and unsupervised statistical learning.
Resumo:
This paper presents a semisupervised support vector machine (SVM) that integrates the information of both labeled and unlabeled pixels efficiently. Method's performance is illustrated in the relevant problem of very high resolution image classification of urban areas. The SVM is trained with the linear combination of two kernels: a base kernel working only with labeled examples is deformed by a likelihood kernel encoding similarities between labeled and unlabeled examples. Results obtained on very high resolution (VHR) multispectral and hyperspectral images show the relevance of the method in the context of urban image classification. Also, its simplicity and the few parameters involved make the method versatile and workable by unexperienced users.