808 results for Agglomerative Hierarchical Clustering
Abstract:
Although Leontopodium alpinum is considered to be threatened in many countries, only limited scientific information about its autecology is available. In this study, we aim to define the most important ecological factors which influence the distribution of L. alpinum in the Swiss Alps. These were assessed at the national scale using species distribution models based on topoclimatic predictors and at the community scale using exhaustive plant inventories. The latter were analysed using hierarchical clustering and principal component analysis, and the results were interpreted using ecological indicator values. L. alpinum was found almost exclusively on base-rich bedrocks (limestone and ultramafic rocks). The species distribution models showed that the available moisture (dry regions, mostly in the Inner Alps), elevation (mostly above 2000 m.a.s.l.) and slope (mostly >30°) were the most important predictors. The relevés showed that L. alpinum is present in a wide range of plant communities, all subalpine-alpine open grasslands, with a low grass cover. As a light-demanding and short species, L. alpinum requires light at ground level; hence, it can only grow in open, nutrient-poor grasslands. These requirements are met under dry conditions (dry, summer-warm climate, rocky and draining soil, south-facing aspect and/or steep slope), at high elevations, on oligotrophic soils and/or on windy ridges. Base-rich soils also appear to be essential, although it is still unclear if this corresponds to physiological or ecological (lower competition) requirements.
Abstract:
Previous microarray studies on breast cancer identified multiple tumour classes, of which the most prominent, named luminal and basal, differ in expression of the oestrogen receptor alpha gene (ER). We report here the identification of a group of breast tumours with increased androgen signalling and a 'molecular apocrine' gene expression profile. Tumour samples from 49 patients with large operable or locally advanced breast cancers were tested on Affymetrix U133A gene expression microarrays. Principal components analysis and hierarchical clustering split the tumours into three groups: basal, luminal and a group we call molecular apocrine. All of the molecular apocrine tumours have strong apocrine features on histological examination (P=0.0002). The molecular apocrine group is androgen receptor (AR) positive and contains all of the ER-negative tumours outside the basal group. Kolmogorov-Smirnov testing indicates that oestrogen signalling is most active in the luminal group, and androgen signalling is most active in the molecular apocrine group. ERBB2 amplification is more common in the molecular apocrine group than in the other groups. Genes that best split the three groups were identified by Wilcoxon test. Correlation of the average expression profile of these genes in our data with the expression profile of individual tumours in four published breast cancer studies suggests that molecular apocrine tumours represent 8-14% of tumours in these studies. Our data show that it is possible with microarray data to divide mammary tumour cells into three groups based on steroid receptor activity: luminal (ER+ AR+), basal (ER- AR-) and molecular apocrine (ER- AR+).
Abstract:
Hierarchical clustering is a popular method for finding structure in multivariate data, resulting in a binary tree constructed on the particular objects of the study, usually sampling units. The user faces the decision where to cut the binary tree in order to determine the number of clusters to interpret, and there are various ad hoc rules for arriving at a decision. A simple permutation test is presented that diagnoses whether non-random levels of clustering are present in the set of objects and, if so, indicates the specific level at which the tree can be cut. The test is validated against random matrices to verify the type I error probability, and a power study is performed on data sets with known clusteredness to study the type II error.
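To make the "binary tree" and the "where to cut" decision concrete, the following is a minimal sketch in plain Python. It assumes Euclidean distances and average linkage, neither of which is stated in the abstract, and it does not implement the permutation test itself. The function names `agglomerate` and `cut` and the sample points are hypothetical.

```python
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def agglomerate(points):
    """Average-linkage agglomerative clustering (an assumed linkage choice).

    Returns the binary tree as a list of merges (left, right, height),
    where left and right are sorted tuples of point indices.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []

    def linkage(pair):
        a, b = pair
        total = sum(euclidean(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest average distance.
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return merges

def cut(merges, n_points, height):
    """Clusters obtained by keeping only the merges at or below `height`."""
    clusters = {(i,) for i in range(n_points)}
    for a, b, h in merges:
        if h <= height:
            clusters -= {a, b}
            clusters.add(tuple(sorted(a + b)))
    return sorted(clusters)

# Hypothetical sample: two tight pairs and one isolated point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
tree = agglomerate(points)
groups = cut(tree, len(points), height=2.0)
```

Cutting at a different height yields a different number of clusters, which is exactly the decision the abstract says is usually made with ad hoc rules.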
Abstract:
Peatlands are soil environments that accumulate water and organic carbon and function as records of paleo-environmental changes. The variability in the composition of organic matter is reflected in their morphological, physical, and chemical properties. The aim of this study was to characterize these properties in peatlands from the headwaters of the Rio Araçuaí (Araçuaí River) in different stages of preservation. Two cores from peatlands with different vegetation types (moist grassland and semideciduous seasonal forest) from the Rio Preto [Preto River] headwaters (conservation area) and the Córrego Cachoeira dos Borges [Cachoeira dos Borges stream] (disturbed area) were sampled. Both are tributaries of the Rio Araçuaí. Samples were taken from layers of 15 cm, and morphological, physical, and chemical analyses were performed. The 14C age and δ13C values were determined in three samples from each core and the vertical growth and organic carbon accumulation rates were estimated. Dendrograms were constructed for each peatland by hierarchical clustering of similar layers with data from 34 parameters. The headwater peatlands of the Rio Araçuaí have a predominance of organic material in an advanced stage of decomposition and their soils are classified as Typic Haplosaprists. The organic matter in the Histosols of the peatlands of the headwaters of the Rio Araçuaí shows marked differences with respect to its morphological, physical, and chemical composition, as it is influenced by the type of vegetation that colonizes it. The peat from the headwaters of the Córrego Cachoeira dos Borges is in a more advanced stage of degradation than the peat from the Rio Preto, which highlights the urgent need for protection of these ecosystems/soil environments.
Abstract:
This paper examines a dataset derived from observational tracking, in order to analyze where and how middle-class working families spend time at home. We use an ethnographic approach to study the everyday lives of Italian dual-income middle-class families, with the aim of quantitatively analyzing the use of home spaces and the types of activities of family members on weekday afternoons and evenings. The different analyses (multiple correspondence analysis, agglomerative hierarchical clustering, discriminant analysis) show how particular spaces and activities in these spaces are dominated by certain family members. We suggest a combination of qualitative and quantitative methodologies as useful tools to explore in detail the everyday lives of families, and to understand how family members use the domestic spaces. In particular, we consider the use of quantitative analyses relevant for examining ethnographic data, especially in connection with methodological reflexivity among researchers.
Abstract:
Microarray gene expression profiles of fresh clinical samples of chronic myeloid leukaemia in chronic phase, acute promyelocytic leukaemia and acute monocytic leukaemia were compared with profiles from cell lines representing the corresponding types of leukaemia (K562, NB4, HL60). In a hierarchical clustering analysis, all clinical samples clustered separately from the cell lines, regardless of leukaemic subtype. Gene ontology analysis showed that cell lines chiefly overexpressed genes related to macromolecular metabolism, whereas in clinical samples genes related to the immune response were abundantly expressed. These findings must be taken into consideration when conclusions from cell line-based studies are extrapolated to patients.
Abstract:
The ability to obtain gene expression profiles from human disease specimens provides an opportunity to identify relevant gene pathways, but is limited by the absence of data sets spanning a broad range of conditions. Here, we analyzed publicly available microarray data from 16 diverse skin conditions in order to gain insight into disease pathogenesis. Unsupervised hierarchical clustering separated samples by disease as well as common cellular and molecular pathways. Disease-specific signatures were leveraged to build a multi-disease classifier, which predicted the diagnosis of publicly and prospectively collected expression profiles with 93% accuracy. In one sample, the molecular classifier differed from the initial clinical diagnosis and correctly predicted the eventual diagnosis as the clinical presentation evolved. Finally, integration of IFN-regulated gene programs with the skin database revealed a significant inverse correlation between IFN-β and IFN-γ programs across all conditions. Our study provides an integrative approach to the study of gene signatures from multiple skin conditions, elucidating mechanisms of disease pathogenesis. In addition, these studies provide a framework for developing tools for personalized medicine toward the precise prediction, prevention, and treatment of disease on an individual level.
Abstract:
In this paper, we consider active sampling to label pixels grouped with hierarchical clustering. The objective of the method is to match the data relationships discovered by the clustering algorithm with the user's desired class semantics. The first is represented as a complete tree to be pruned and the second is iteratively provided by the user. The active learning algorithm proposed searches the pruning of the tree that best matches the labels of the sampled points. By choosing the part of the tree to sample from according to current pruning's uncertainty, sampling is focused on most uncertain clusters. This way, large clusters for which the class membership is already fixed are no longer queried and sampling is focused on division of clusters showing mixed labels. The model is tested on a VHR image in a multiclass classification setting. The method clearly outperforms random sampling in a transductive setting, but cannot generalize to unseen data, since it aims at optimizing the classification of a given cluster structure.
Abstract:
A new issue, once again a bouquet of attractive papers. First of all the paper by Droit-Dupré et al. (10.1007/s00428-015-1724-9). The group studied colonic adenocarcinomas, not otherwise specified, by immunohistochemistry for the expression of markers of intestinal epithelial cell differentiation. Hierarchical clustering analysis identified a major cluster of two thirds of the case series, expressing cytokeratin 20, CDX2 and MUC2 and invariably mismatch repair competent, which they called crypt-like. In stage III colon cancer, the crypt-like cluster had a better prognosis. The paper is a relatively simple example of what is happening in cancer classification beyond morphology: multiparameter differentiation and (epi)genomic markers defining new subtypes of cancer with potential clinical significance in clinical decision making.
Abstract:
In a hydraulic turbine, the rotation of the blades in the water creates a low-pressure zone, causing the water to pass from the liquid to the gaseous state. This phase-change phenomenon is called cavitation and is similar to boiling. When the vapour cavities that form implode near the walls, severe erosion of the materials results, considerably accelerating the degradation of the turbine. A system for detecting cavitation erosion from vibration measurements, usable on turbines in operation, was therefore installed on four turbine-generator units of a power plant and makes it possible to estimate the erosion rate precisely in kg/10,000 h. The present project has two main objectives. First, to study the behaviour of cavitation on a target turbine-generator unit and to build a statistical model in order to predict the cavitation variable as a function of the operating variables (such as the wicket gate opening, the flow rate, the upstream and downstream levels, etc.). Second, to develop a methodology that allows the study to be reproduced at other sites. A retrospective study will be carried out, focusing on the data available since the system was updated in 2010. Preliminary results have highlighted the heterogeneity of the cavitation behaviour as well as changes in the relationship between cavitation and various operating variables. We propose to develop a suitable probabilistic model, using in particular hierarchical clustering and multiple linear regression models.
Abstract:
Besides the spinal deformity, scoliosis notably modifies the general appearance of the trunk, resulting in trunk rotation, imbalance, and asymmetries that constitute patients' major concern. Existing classifications of scoliosis, based on the type of spinal curve as depicted on radiographs, are currently used to guide treatment strategies. Unfortunately, even when a perfect correction of the spinal curve is achieved, some trunk deformities remain, making patients dissatisfied with the treatment received. The purpose of this study is to identify possible shape patterns of trunk surface deformity associated with scoliosis. First, the trunk surface is represented by a multivariate functional trunk shape descriptor based on 3-D clinical measurements computed on cross sections of the trunk. Then, the classical formulation of hierarchical clustering is adapted to the case of multivariate functional data and applied to a set of 236 trunk surface 3-D reconstructions. The highest internal validity is obtained when considering 11 clusters that explain up to 65% of the variance in our dataset. Our clustering result shows a concordance with the radiographic classification of spinal curves in 68% of the cases. As opposed to radiographic evaluation, the trunk descriptor is 3-D and its functional nature offers a compact and elegant description of not only the type, but also the severity and extent of the trunk surface deformity along the trunk length. In future work, new management strategies based on the resulting trunk shape patterns could be devised in order to improve the esthetic outcome after treatment, and thus patient satisfaction.
Abstract:
The diversity of social bees was assessed at 15 sites across five locations of the Nilgiri Biosphere Reserve, Western Ghats, India, from January to December 2007. We also conducted floristic analyses of local vegetation in each site using one-hectare sample plots. All woody species with a dbh (diameter at breast height) : 30 cm were recorded within the plots. A total area of 9.72 ha was assessed for floristic composition. Similarity of floristic composition between sites was determined using the Jaccard distance measure, and a dendrogram was constructed based on the hierarchical clustering of floristic dissimilarities between sites. A Bee Importance Index (BII) was developed to give a measure of the bee diversity at each site. This index was the sum of the species richness of bees in a site and their visitation frequencies to flowers, calculated as mean flower visits hour⁻¹ in two focal patches within one-hectare plots. The visits of bee species to flowers were also recorded. The Jaccard distance measure indicated that the montane sites were quite dissimilar to the low elevation sites in floristic diversity. The BII was 7-9 for the wet forest sites and ranged from 4-6 for drier forest sites. Seventy-three plant species were identified as social bee plants, and of them 45% were visited by one species of bee, 37% by two bee species and 18% by more than two bee species, indicating a certain degree of floral specialization among bees.
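As an illustration of the Jaccard distance mentioned above, here is a small sketch in plain Python that computes pairwise floristic dissimilarities between sites from species lists. The site names and species sets are invented for the example and are not taken from the study; `jaccard_distance` and `distance_matrix` are hypothetical helper names.

```python
def jaccard_distance(a, b):
    """Jaccard distance: 1 - |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty species lists are treated as identical
    return 1.0 - len(a & b) / len(union)

def distance_matrix(sites):
    """Pairwise Jaccard distances, keyed by (site, site) name pairs."""
    names = sorted(sites)
    return {
        (p, q): jaccard_distance(sites[p], sites[q])
        for p in names
        for q in names
    }

# Invented species lists, for illustration only.
sites = {
    "montane": {"sp_a", "sp_b", "sp_c"},
    "lowland_1": {"sp_b", "sp_c", "sp_d"},
    "lowland_2": {"sp_e", "sp_f"},
}
distances = distance_matrix(sites)
```

A matrix like `distances` is the kind of input from which a dendrogram of sites can then be built with any agglomerative method.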
Abstract:
K-Means is a popular clustering algorithm which adopts an iterative refinement procedure to determine data partitions and to compute their associated centres of mass, called centroids. The straightforward implementation of the algorithm is often referred to as 'brute force' since it computes a proximity measure from each data point to each centroid at every iteration of the K-Means process. Efficient implementations of the K-Means algorithm have been predominantly based on multi-dimensional binary search trees (KD-Trees). A combination of an efficient data structure and geometrical constraints reduces the number of distance computations required at each iteration. In this work we present a general space partitioning approach for improving the efficiency and the scalability of the K-Means algorithm. We propose to adopt approximate hierarchical clustering methods to generate binary space partitioning trees, in contrast to KD-Trees. In the experimental analysis, we have tested the performance of the proposed Binary Space Partitioning K-Means (BSP-KM) when a divisive clustering algorithm is used. We have carried out extensive experimental tests to compare the proposed approach to the one based on KD-Trees (KD-KM) over a wide range of the parameter space. BSP-KM is more scalable than KD-KM, while keeping the deterministic nature of the 'brute force' algorithm. In particular, the proposed space partitioning approach has been shown to overcome the well-known limitation of KD-Trees in high-dimensional spaces and can also be adopted to improve the efficiency of other algorithms in which KD-Trees have been used.
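The 'brute force' baseline described above can be sketched in a few lines of plain Python. This is an illustrative sketch only: it assumes squared Euclidean distance and caller-supplied initial centroids, and it is not the BSP-KM or KD-KM method. The function name `kmeans_brute_force` and the sample data are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def kmeans_brute_force(points, centroids, max_iter=100):
    """Plain K-Means: every point is compared with every centroid
    at every iteration, which is what makes it 'brute force'."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid for each point.
        labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                updated.append(old)  # keep an empty cluster's centroid
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return centroids, labels

# Hypothetical data: two well-separated groups.
points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centres, labels = kmeans_brute_force(points, [(0, 0), (10, 10)])
```

For a fixed set of initial centroids the result is deterministic. The tree-based variants discussed in the abstract aim to avoid many of the point-to-centroid comparisons made in the assignment step.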
Abstract:
The polar winter stratospheric vortex is a coherent structure that undergoes different types of deformation that can be revealed by the geometric invariant moments. Three moments are used—the aspect ratio, the centroid latitude, and the area of the vortex based on stratospheric data from the 40-yr ECMWF Re-Analysis (ERA-40) project—to study sudden stratospheric warmings. Hierarchical clustering combined with data image visualization techniques is used as well. Using the gap statistic, three optimal clusters are obtained based on the three geometric moments considered here. The 850-K potential vorticity field, as well as the vertical profiles of polar temperature and zonal wind, provides evidence that the clusters represent, respectively, the undisturbed (U), displaced (D), and split (S) states of the polar vortex. This systematic method for identifying and characterizing the state of the polar vortex using objective methods is useful as a tool for analyzing observations and as a test for climate models to simulate the observations. The method correctly identifies all previously identified major warmings and also identifies significant minor warmings where the atmosphere is substantially disturbed but does not quite meet the criteria to qualify as a major stratospheric warming.
Abstract:
The presence of 10 virulence genes was examined using polymerase chain reaction (PCR) in 365 European O157 and non-O157 Escherichia coli isolates associated with verotoxin production. Strain-specific PCR data were analysed using hierarchical clustering. The resulting dendrogram clearly separated O157 from non-O157 strains. The former clustered typical high-risk seropathotype (SPT) A strains from all regions, including Sweden and Spain, which were homogeneous by Cramer's V statistic, and strains with less typical O157 features mostly from Hungary. The non-O157 strains divided into a high-risk SPTB harbouring O26, O111 and O103 strains, a group pathogenic to pigs, and a group with few virulence genes other than for verotoxin. The data demonstrate that SPT designation and selected PCR separated verotoxigenic E. coli of high and low risk to humans, although more virulence genes or pulsed-field gel electrophoresis will need to be included to separate high-risk strains further for epidemiological tracing.