795 results for label hierarchical clustering
Abstract:
This paper studies the statistical distributions of worldwide earthquakes from 1963 to 2012. A Cartesian grid, dividing Earth into geographic regions, is considered. Entropy and the Jensen–Shannon divergence are used to analyze and compare real-world data. Hierarchical clustering and multidimensional scaling techniques are adopted for data visualization. Entropy-based indices have the advantage of leading to a single parameter expressing the relationships between the seismic data. Classical and generalized (fractional) entropy and Jensen–Shannon divergence are tested. The generalized measures lead to a clear identification of patterns embedded in the data and contribute to a better understanding of earthquake distributions.
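As a minimal sketch of the classical measures named in this abstract, the following Python computes the Shannon entropy of a normalized histogram and the Jensen–Shannon divergence between two such histograms. The region names and counts are hypothetical placeholders, not data from the paper, and the generalized (fractional) variants are not reproduced here because the abstract does not give their formulas.

```python
import math

def normalize(counts):
    """Turn non-negative event counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]

def shannon_entropy(p):
    """Classical Shannon entropy in nats; zero-probability bins contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence: entropy of the mixture minus the mean entropy."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(m) - (shannon_entropy(p) + shannon_entropy(q)) / 2

# Hypothetical event counts per magnitude bin for three grid regions.
region_a = normalize([12, 30, 7, 1])
region_b = normalize([10, 28, 9, 3])
region_c = normalize([1, 2, 20, 27])

# Pairwise divergences like these could fill the dissimilarity matrix
# that hierarchical clustering or multidimensional scaling takes as input.
d_ab = jensen_shannon(region_a, region_b)
d_ac = jensen_shannon(region_a, region_c)
```

The divergence is symmetric, is zero for identical distributions, and is bounded above by ln 2 when natural logarithms are used, which makes it convenient as a single dissimilarity value per pair of regions.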
Abstract:
Proceedings of the 3rd International Conference on Fractional Systems and Signals, Ghent, Belgium
Abstract:
In this paper we study several natural and man-made complex phenomena from the perspective of dynamical systems. For each class of phenomena, the system outputs are time-series records obtained under identical conditions. The time series are viewed as manifestations of the system behavior and are processed to analyze the system dynamics. First, we use the Fourier transform to process the data and we approximate the amplitude spectra by means of power law (PL) functions. We interpret the PL parameters as a phenomenological signature of the system dynamics. Second, we adopt the techniques of non-hierarchical clustering and multidimensional scaling to visualize hidden relationships between the complex phenomena. Third, we propose a vector-field-based analogy to interpret the patterns unveiled by the PL parameters.
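The first step described in this abstract, approximating an amplitude spectrum by a power law, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the model form |F(f)| ≈ a·f^b, the least-squares fit in log-log coordinates, and the function names are all choices made here for the example.

```python
import cmath
import math

def amplitude_spectrum(x):
    """Amplitudes of a naive discrete Fourier transform for bins 0 .. n//2."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def fit_power_law(freqs, amps):
    """Least-squares fit of amps ~ a * freqs**b, done as a line in log-log space.

    All frequencies and amplitudes must be strictly positive, so the
    zero-frequency bin has to be dropped before calling this function.
    """
    xs = [math.log(f) for f in freqs]
    ys = [math.log(a) for a in amps]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope of the log-log line: the exponent
    a = math.exp(mean_y - b * mean_x)  # intercept of the log-log line: the scale
    return a, b

# Synthetic check: an exact power law is recovered by the fit.
freqs = [1.0, 2.0, 4.0, 8.0]
amps = [3.0 * f ** -1.5 for f in freqs]
a_hat, b_hat = fit_power_law(freqs, amps)
```

Each time series then reduces to one pair (a, b), and those pairs are the kind of low-dimensional signature that clustering or multidimensional scaling can compare across phenomena.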
Abstract:
The last 40 years of the world economy are analyzed by means of computer visualization methods. Multidimensional scaling and the hierarchical clustering tree techniques are used. The current Western downturn in favor of Asian partners may still be reversed in the coming decades.
Abstract:
Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.
Abstract:
Propolis is a chemically complex biomass produced by honeybees (Apis mellifera) from plant resins to which salivary enzymes, beeswax, and pollen are added. The biological activities described for propolis have also been identified in the resin of the donor plants, but a major challenge for standardizing the chemical composition and biological effects of propolis lies in better understanding the influence of seasonality on the chemical constituents of that raw material. Since propolis quality depends, among other variables, on the local flora, which is strongly influenced by (a)biotic factors over the seasons, unraveling the effect of harvest season on the propolis chemical profile is an issue of recognized importance. Fast, cheap, and robust analytical techniques seem to be the best choice for large-scale quality control processes in the most demanding markets, e.g., human health applications. To that end, UV-Visible (UV-Vis) scanning spectrophotometry of hydroalcoholic extracts (HE) of seventy-three propolis samples, collected over the seasons in 2014 (summer, spring, autumn, and winter) and 2015 (summer and autumn) in Southern Brazil, was adopted. Machine learning and chemometrics techniques were then applied to the UV-Vis dataset to gain insight into the effect of seasonality on the claimed chemical heterogeneity of propolis samples, as determined by changes in the flora of the geographic region under study. Descriptive and classification models were built following a chemometric approach, i.e., principal component analysis (PCA) and hierarchical clustering analysis (HCA), supported by scripts written in the R language. The UV-Vis profiles, combined with chemometric analysis, allowed a typical pattern to be identified in propolis samples collected in the summer.
Importantly, the discrimination based on PCA could be improved by using the dataset of the fingerprint region of phenolic compounds (λ = 280-400 nm), suggesting that, besides their biological activities, those secondary metabolites also play a relevant role in the discrimination and classification of that complex matrix through bioinformatics tools. Finally, a series of machine learning approaches, e.g., partial least squares-discriminant analysis (PLS-DA), k-Nearest Neighbors (kNN), and Decision Trees, proved to be complementary to PCA and HCA, yielding relevant information on sample discrimination.
Abstract:
Recurrent breast cancer occurring after the initial treatment is associated with poor outcome. A bimodal relapse pattern after surgery for primary tumor has been described with peaks of early and late recurrence occurring at about 2 and 5 years, respectively. Although several clinical and pathological features have been used to discriminate between low- and high-risk patients, the identification of molecular biomarkers with prognostic value remains an unmet need in the current management of breast cancer. Using microarray-based technology, we have performed a microRNA expression analysis in 71 primary breast tumors from patients that either remained disease-free at 5 years post-surgery (group A) or developed early (group B) or late (group C) recurrence. Unsupervised hierarchical clustering of microRNA expression data segregated tumors in two groups, mainly corresponding to patients with early recurrence and those with no recurrence. Microarray data analysis and RT-qPCR validation led to the identification of a set of 5 microRNAs (the 5-miRNA signature) differentially expressed between these two groups: miR-149, miR-10a, miR-20b, miR-30a-3p and miR-342-5p. All five microRNAs were down-regulated in tumors from patients with early recurrence. We show here that the 5-miRNA signature defines a high-risk group of patients with shorter relapse-free survival and has predictive value to discriminate non-relapsing versus early-relapsing patients (AUC = 0.993, p-value<0.05). Network analysis based on miRNA-target interactions curated by public databases suggests that down-regulation of the 5-miRNA signature in the subset of early-relapsing tumors would result in an overall increased proliferative and angiogenic capacity. In summary, we have identified a set of recurrence-related microRNAs with potential prognostic value to identify patients who will likely develop metastasis early after primary breast surgery.
Abstract:
Although Leontopodium alpinum is considered to be threatened in many countries, only limited scientific information about its autecology is available. In this study, we aim to define the most important ecological factors which influence the distribution of L. alpinum in the Swiss Alps. These were assessed at the national scale using species distribution models based on topoclimatic predictors and at the community scale using exhaustive plant inventories. The latter were analysed using hierarchical clustering and principal component analysis, and the results were interpreted using ecological indicator values. L. alpinum was found almost exclusively on base-rich bedrocks (limestone and ultramafic rocks). The species distribution models showed that the available moisture (dry regions, mostly in the Inner Alps), elevation (mostly above 2000 m a.s.l.) and slope (mostly >30°) were the most important predictors. The relevés showed that L. alpinum is present in a wide range of plant communities, all subalpine-alpine open grasslands, with a low grass cover. As a light-demanding and short species, L. alpinum requires light at ground level; hence, it can only grow in open, nutrient-poor grasslands. These conditions are met in dry conditions (dry, summer-warm climate, rocky and draining soil, south-facing aspect and/or steep slope), at high elevations, on oligotrophic soils and/or on windy ridges. Base-rich soils also appear to be essential, although it is still unclear whether this corresponds to physiological or ecological (lower competition) requirements.
Abstract:
Previous microarray studies on breast cancer identified multiple tumour classes, of which the most prominent, named luminal and basal, differ in expression of the oestrogen receptor alpha gene (ER). We report here the identification of a group of breast tumours with increased androgen signalling and a 'molecular apocrine' gene expression profile. Tumour samples from 49 patients with large operable or locally advanced breast cancers were tested on Affymetrix U133A gene expression microarrays. Principal components analysis and hierarchical clustering split the tumours into three groups: basal, luminal and a group we call molecular apocrine. All of the molecular apocrine tumours have strong apocrine features on histological examination (P=0.0002). The molecular apocrine group is androgen receptor (AR) positive and contains all of the ER-negative tumours outside the basal group. Kolmogorov-Smirnov testing indicates that oestrogen signalling is most active in the luminal group, and androgen signalling is most active in the molecular apocrine group. ERBB2 amplification is more common in the molecular apocrine group than in the other groups. Genes that best split the three groups were identified by Wilcoxon test. Correlation of the average expression profile of these genes in our data with the expression profile of individual tumours in four published breast cancer studies suggests that molecular apocrine tumours represent 8-14% of tumours in these studies. Our data show that it is possible with microarray data to divide mammary tumour cells into three groups based on steroid receptor activity: luminal (ER+ AR+), basal (ER- AR-) and molecular apocrine (ER- AR+).
Abstract:
Cape Verde is a tropical oceanic ecosystem, highly fragmented and dispersed, with islands physically isolated by distance and depth. To understand how isolation affects the ecological variability in this archipelago, we conducted a research project on the community structure of the 18 commercially most important demersal fishes. An index of ecological distance based on species relative dominance (Di) is developed from Catch Per Unit Effort, derived from an extensive database of artisanal fisheries. Two ecological measures of distance between islands are calculated: at the species level, DDi, and at the community level, DD (sum of DDi). A physical isolation factor (Idb) combining distance (d) and bathymetry (b) is proposed. Covariance analysis shows that isolation factor is positively correlated with both DDi and DD, suggesting that Idb can be considered as an ecological isolation factor. The effect of Idb varies with season and species. This effect is stronger in summer (May to November), than in winter (December to April), which appears to be more unstable. Species react differently to Idb, independently of season. A principal component analysis on the monthly (DDi) for the 12 islands and the 18 species, complemented by an agglomerative hierarchical clustering, shows a geographic pattern of island organization, according to Idb. Results indicate that the ecological structure of demersal fish communities of Cape Verde archipelago, both in time and space, can be explained by a geographic isolation factor. The analytical approach used here is promising and could be tested in other archipelago systems.
Abstract:
Hierarchical clustering is a popular method for finding structure in multivariate data, resulting in a binary tree constructed on the particular objects of the study, usually sampling units. The user faces the decision where to cut the binary tree in order to determine the number of clusters to interpret and there are various ad hoc rules for arriving at a decision. A simple permutation test is presented that diagnoses whether non-random levels of clustering are present in the set of objects and, if so, indicates the specific level at which the tree can be cut. The test is validated against random matrices to verify the type I error probability and a power study is performed on data sets with known clusteredness to study the type II error.
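The general idea of a permutation test on a clustering tree can be illustrated with the sketch below. It is not the procedure from the paper, whose details the abstract does not give. The assumptions made here are: single-linkage clustering, the height of the last merge as the test statistic, and a null distribution obtained by shuffling each column of the data matrix independently. The function names and the toy data are hypothetical, and this version tests only the top node rather than locating the level at which to cut.

```python
import math
import random

def single_linkage_heights(points):
    """Sorted merge heights of a single-linkage dendrogram.

    These equal the edge lengths of a minimum spanning tree, built here
    with Prim's algorithm on Euclidean distances.
    """
    n = len(points)
    best = [math.inf] * n      # shortest known distance from each point to the tree
    in_tree = [False] * n
    best[0] = 0.0
    heights = []
    for step in range(n):
        i = min((j for j in range(n) if not in_tree[j]), key=lambda j: best[j])
        in_tree[i] = True
        if step > 0:           # the starting point joins without an edge
            heights.append(best[i])
        for j in range(n):
            if not in_tree[j]:
                best[j] = min(best[j], math.dist(points[i], points[j]))
    return sorted(heights)

def permutation_p_value(points, n_perm=199, seed=0):
    """Share of column-shuffled data sets whose top merge is at least as high."""
    rng = random.Random(seed)
    observed = single_linkage_heights(points)[-1]
    columns = [list(col) for col in zip(*points)]
    exceed = 0
    for _ in range(n_perm):
        shuffled = []
        for col in columns:
            col = col[:]       # shuffle a copy so the original data stay intact
            rng.shuffle(col)
            shuffled.append(col)
        permuted = list(zip(*shuffled))
        if single_linkage_heights(permuted)[-1] >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Hypothetical toy data: two tight groups far apart along the diagonal.
group = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
data = group + [(x + 10.0, y + 10.0) for x, y in group]
p_value = permutation_p_value(data)
```

Shuffling columns independently keeps each variable's marginal distribution but breaks the association between variables, so a small p-value suggests that the height of the top split is unlikely to arise from unstructured data with the same marginals. With a single variable the shuffle changes nothing, so this particular null is informative only for multivariate data.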
Abstract:
Peatlands are soil environments that accumulate water and organic carbon and function as records of paleo-environmental changes. The variability in the composition of organic matter is reflected in their morphological, physical, and chemical properties. The aim of this study was to characterize these properties in peatlands from the headwaters of the Rio Araçuaí (Araçuaí River) in different stages of preservation. Two cores from peatlands with different vegetation types (moist grassland and semideciduous seasonal forest) from the Rio Preto [Preto River] headwaters (conservation area) and the Córrego Cachoeira dos Borges [Cachoeira dos Borges stream] (disturbed area) were sampled. Both are tributaries of the Rio Araçuaí. Samples were taken from layers of 15 cm, and morphological, physical, and chemical analyses were performed. The 14C age and δ13C values were determined in three samples from each core and the vertical growth and organic carbon accumulation rates were estimated. Dendrograms were constructed for each peatland by hierarchical clustering of similar layers with data from 34 parameters. The headwater peatlands of the Rio Araçuaí have a predominance of organic material in an advanced stage of decomposition and their soils are classified as Typic Haplosaprists. The organic matter in the Histosols of the peatlands of the headwaters of the Rio Araçuaí shows marked differences with respect to its morphological, physical, and chemical composition, as it is influenced by the type of vegetation that colonizes it. The peat from the headwaters of the Córrego Cachoeira dos Borges is in a more advanced stage of degradation than the peat from the Rio Preto, which highlights the urgent need for protection of these ecosystems/soil environments.
Abstract:
Microarray gene expression profiles of fresh clinical samples of chronic myeloid leukaemia in chronic phase, acute promyelocytic leukaemia and acute monocytic leukaemia were compared with profiles from cell lines representing the corresponding types of leukaemia (K562, NB4, HL60). In a hierarchical clustering analysis, all clinical samples clustered separately from the cell lines, regardless of leukaemic subtype. Gene ontology analysis showed that cell lines chiefly overexpressed genes related to macromolecular metabolism, whereas in clinical samples genes related to the immune response were abundantly expressed. These findings must be taken into consideration when conclusions from cell line-based studies are extrapolated to patients.
Abstract:
The ability to obtain gene expression profiles from human disease specimens provides an opportunity to identify relevant gene pathways, but is limited by the absence of data sets spanning a broad range of conditions. Here, we analyzed publicly available microarray data from 16 diverse skin conditions in order to gain insight into disease pathogenesis. Unsupervised hierarchical clustering separated samples by disease as well as common cellular and molecular pathways. Disease-specific signatures were leveraged to build a multi-disease classifier, which predicted the diagnosis of publicly and prospectively collected expression profiles with 93% accuracy. In one sample, the molecular classifier differed from the initial clinical diagnosis and correctly predicted the eventual diagnosis as the clinical presentation evolved. Finally, integration of IFN-regulated gene programs with the skin database revealed a significant inverse correlation between IFN-β and IFN-γ programs across all conditions. Our study provides an integrative approach to the study of gene signatures from multiple skin conditions, elucidating mechanisms of disease pathogenesis. In addition, these studies provide a framework for developing tools for personalized medicine toward the precise prediction, prevention, and treatment of disease on an individual level.
Abstract:
A new issue, once again a bouquet of attractive papers. First of all the paper by Droit-Dupré et al. (10.1007/s00428-015-1724-9). The group studied colonic adenocarcinomas, not otherwise specified, by immunohistochemistry for the expression of markers of intestinal epithelial cell differentiation. Hierarchical clustering analysis identified a major cluster of two thirds of the case series, expressing cytokeratin 20, CDX2 and MUC2 and invariably mismatch repair competent, which they called crypt-like. In stage III colon cancer, the crypt-like cluster had a better prognosis. The paper is a relatively simple example of what is happening in cancer classification beyond morphology: multiparameter differentiation and (epi)genomic markers defining new subtypes of cancer with potential clinical significance in clinical decision making.