811 results for hierarchical clustering
Abstract:
In this paper we study several natural and man-made complex phenomena from the perspective of dynamical systems. For each class of phenomena, the system outputs are time-series records obtained under identical conditions. The time series are viewed as manifestations of the system behavior and are processed to analyze the system dynamics. First, we use the Fourier transform to process the data and approximate the amplitude spectra by means of power law (PL) functions. We interpret the PL parameters as a phenomenological signature of the system dynamics. Second, we adopt the techniques of non-hierarchical clustering and multidimensional scaling to visualize hidden relationships between the complex phenomena. Third, we propose a vector-field-based analogy to interpret the patterns unveiled by the PL parameters.
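The first step described in this abstract (Fourier transform, then a power law fit to the amplitude spectrum) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the synthetic test signal, and the choice of a least-squares fit in log-log space are all assumptions made for the example.

```python
import cmath
import math


def amplitude_spectrum(signal):
    """Naive DFT amplitudes for bins k = 1 .. (n - 1) // 2 (DC and Nyquist skipped)."""
    n = len(signal)
    spectrum = []
    for k in range(1, (n - 1) // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))
        spectrum.append(abs(coeff))
    return spectrum


def fit_power_law(freqs, amps):
    """Fit amp = a * f**b by ordinary least squares on log(amp) vs log(f)."""
    xs = [math.log(f) for f in freqs]
    ys = [math.log(v) for v in amps]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic signal (an assumption for the demo): cosines whose amplitudes
# decay as k**-1.5, so the fitted exponent should come out close to -1.5.
N = 32
signal = [sum(k ** -1.5 * math.cos(2 * math.pi * k * t / N) for k in range(1, 16))
          for t in range(N)]

freqs = list(range(1, 16))
amps = amplitude_spectrum(signal)
a, b = fit_power_law(freqs, amps)
print(f"a = {a:.3f}, b = {b:.3f}")
```

The pair (a, b) plays the role of the PL parameters mentioned in the abstract. For real records, a fast transform from a numerical library would normally replace the naive DFT used here.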
Abstract:
The last 40 years of the world economy are analyzed by means of computer visualization methods. Multidimensional scaling and the hierarchical clustering tree techniques are used. The current Western downturn in favor of Asian partners may still be reversed in the coming decades.
Abstract:
Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.
Abstract:
Propolis is a chemically complex biomass produced by honeybees (Apis mellifera) from plant resins mixed with salivary enzymes, beeswax, and pollen. The biological activities described for propolis have also been identified in the resins of donor plants, but a major challenge for standardizing the chemical composition and biological effects of propolis lies in better understanding the influence of seasonality on the chemical constituents of that raw material. Since propolis quality depends, among other variables, on the local flora, which is strongly influenced by (a)biotic factors over the seasons, unraveling the effect of harvest season on the propolis chemical profile is an issue of recognized importance. Fast, cheap, and robust analytical techniques seem to be the best choice for large-scale quality control processes in the most demanding markets, e.g., human health applications. To that end, UV-Visible (UV-Vis) scanning spectrophotometry was applied to hydroalcoholic extracts (HE) of seventy-three propolis samples collected over the seasons in 2014 (summer, spring, autumn, and winter) and 2015 (summer and autumn) in Southern Brazil. Machine learning and chemometrics techniques were then applied to the UV-Vis dataset to gain insight into the effect of seasonality on the claimed chemical heterogeneity of propolis samples, as determined by changes in the flora of the geographic region under study. Descriptive and classification models were built following a chemometric approach, i.e., principal component analysis (PCA) and hierarchical clustering analysis (HCA), supported by scripts written in the R language. The UV-Vis profiles, combined with chemometric analysis, allowed a typical pattern to be identified in propolis samples collected in the summer.
Importantly, the discrimination based on PCA could be improved by using the dataset of the fingerprint region of phenolic compounds (λ = 280-400 nm), suggesting that, besides the biological activities of those secondary metabolites, they also play a relevant role in the discrimination and classification of that complex matrix through bioinformatics tools. Finally, a series of machine learning approaches, e.g., partial least squares-discriminant analysis (PLS-DA), k-Nearest Neighbors (kNN), and Decision Trees, proved to be complementary to PCA and HCA, yielding relevant information for sample discrimination.
Abstract:
Recurrent breast cancer occurring after the initial treatment is associated with poor outcome. A bimodal relapse pattern after surgery for primary tumor has been described with peaks of early and late recurrence occurring at about 2 and 5 years, respectively. Although several clinical and pathological features have been used to discriminate between low- and high-risk patients, the identification of molecular biomarkers with prognostic value remains an unmet need in the current management of breast cancer. Using microarray-based technology, we have performed a microRNA expression analysis in 71 primary breast tumors from patients that either remained disease-free at 5 years post-surgery (group A) or developed early (group B) or late (group C) recurrence. Unsupervised hierarchical clustering of microRNA expression data segregated tumors in two groups, mainly corresponding to patients with early recurrence and those with no recurrence. Microarray data analysis and RT-qPCR validation led to the identification of a set of 5 microRNAs (the 5-miRNA signature) differentially expressed between these two groups: miR-149, miR-10a, miR-20b, miR-30a-3p and miR-342-5p. All five microRNAs were down-regulated in tumors from patients with early recurrence. We show here that the 5-miRNA signature defines a high-risk group of patients with shorter relapse-free survival and has predictive value to discriminate non-relapsing versus early-relapsing patients (AUC = 0.993, p-value<0.05). Network analysis based on miRNA-target interactions curated by public databases suggests that down-regulation of the 5-miRNA signature in the subset of early-relapsing tumors would result in an overall increased proliferative and angiogenic capacity. In summary, we have identified a set of recurrence-related microRNAs with potential prognostic value to identify patients who will likely develop metastasis early after primary breast surgery.
Abstract:
Although Leontopodium alpinum is considered to be threatened in many countries, only limited scientific information about its autecology is available. In this study, we aim to define the most important ecological factors which influence the distribution of L. alpinum in the Swiss Alps. These were assessed at the national scale using species distribution models based on topoclimatic predictors and at the community scale using exhaustive plant inventories. The latter were analysed using hierarchical clustering and principal component analysis, and the results were interpreted using ecological indicator values. L. alpinum was found almost exclusively on base-rich bedrocks (limestone and ultramafic rocks). The species distribution models showed that the available moisture (dry regions, mostly in the Inner Alps), elevation (mostly above 2000 m a.s.l.) and slope (mostly >30°) were the most important predictors. The relevés showed that L. alpinum is present in a wide range of plant communities, all subalpine-alpine open grasslands, with a low grass cover. As a light-demanding and short species, L. alpinum requires light at ground level; hence, it can only grow in open, nutrient-poor grasslands. These conditions are met in dry conditions (dry, summer-warm climate, rocky and draining soil, south-facing aspect and/or steep slope), at high elevations, on oligotrophic soils and/or on windy ridges. Base-rich soils appear to also be essential, although it is still unclear if this corresponds to physiological or ecological (lower competition) requirements.
Abstract:
Previous microarray studies on breast cancer identified multiple tumour classes, of which the most prominent, named luminal and basal, differ in expression of the oestrogen receptor alpha gene (ER). We report here the identification of a group of breast tumours with increased androgen signalling and a 'molecular apocrine' gene expression profile. Tumour samples from 49 patients with large operable or locally advanced breast cancers were tested on Affymetrix U133A gene expression microarrays. Principal components analysis and hierarchical clustering split the tumours into three groups: basal, luminal and a group we call molecular apocrine. All of the molecular apocrine tumours have strong apocrine features on histological examination (P=0.0002). The molecular apocrine group is androgen receptor (AR) positive and contains all of the ER-negative tumours outside the basal group. Kolmogorov-Smirnov testing indicates that oestrogen signalling is most active in the luminal group, and androgen signalling is most active in the molecular apocrine group. ERBB2 amplification is more common in the molecular apocrine group than in the other groups. Genes that best split the three groups were identified by Wilcoxon test. Correlation of the average expression profile of these genes in our data with the expression profile of individual tumours in four published breast cancer studies suggests that molecular apocrine tumours represent 8-14% of tumours in these studies. Our data show that it is possible with microarray data to divide mammary tumour cells into three groups based on steroid receptor activity: luminal (ER+ AR+), basal (ER- AR-) and molecular apocrine (ER- AR+).
Abstract:
Cape Verde is a tropical oceanic ecosystem, highly fragmented and dispersed, with islands physically isolated by distance and depth. To understand how isolation affects the ecological variability in this archipelago, we conducted a research project on the community structure of the 18 commercially most important demersal fishes. An index of ecological distance based on species relative dominance (Di) is developed from Catch Per Unit Effort, derived from an extensive database of artisanal fisheries. Two ecological measures of distance between islands are calculated: at the species level, DDi, and at the community level, DD (sum of DDi). A physical isolation factor (Idb) combining distance (d) and bathymetry (b) is proposed. Covariance analysis shows that the isolation factor is positively correlated with both DDi and DD, suggesting that Idb can be considered as an ecological isolation factor. The effect of Idb varies with season and species. This effect is stronger in summer (May to November) than in winter (December to April), which appears to be more unstable. Species react differently to Idb, independently of season. A principal component analysis on the monthly DDi values for the 12 islands and the 18 species, complemented by an agglomerative hierarchical clustering, shows a geographic pattern of island organization, according to Idb. Results indicate that the ecological structure of demersal fish communities of the Cape Verde archipelago, both in time and space, can be explained by a geographic isolation factor. The analytical approach used here is promising and could be tested in other archipelago systems.
Abstract:
Hierarchical clustering is a popular method for finding structure in multivariate data, resulting in a binary tree constructed on the particular objects of the study, usually sampling units. The user faces the decision where to cut the binary tree in order to determine the number of clusters to interpret and there are various ad hoc rules for arriving at a decision. A simple permutation test is presented that diagnoses whether non-random levels of clustering are present in the set of objects and, if so, indicates the specific level at which the tree can be cut. The test is validated against random matrices to verify the type I error probability and a power study is performed on data sets with known clusteredness to study the type II error.
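The abstract does not spell out its test, so the following is only a generic sketch of the idea of a permutation test on a clustering tree, under stated assumptions: single-linkage clustering, the height of the final merge as the test statistic, and independent shuffling of each variable (column) to generate the null distribution. The function names and the toy data are hypothetical, and the procedure should not be read as the authors' method.

```python
import math
import random


def merge_heights(points):
    """Single-linkage agglomerative clustering; returns merge heights in order."""
    clusters = [[i] for i in range(len(points))]
    heights = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
        heights.append(d)
    return heights


def permutation_p_value(points, n_perm=200, seed=0):
    """Share of column-shuffled data sets whose final merge is at least as high."""
    rng = random.Random(seed)
    observed = merge_heights(points)[-1]
    columns = [list(col) for col in zip(*points)]
    hits = 0
    for _ in range(n_perm):
        # Shuffling each column independently keeps the marginals but
        # destroys the joint structure that produces clusters.
        shuffled = [rng.sample(col, len(col)) for col in columns]
        permuted = list(zip(*shuffled))
        if merge_heights(permuted)[-1] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): two well-separated groups of six points each.
group = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8)]
points = group + [(x + 10, y + 10) for x, y in group]
p_value = permutation_p_value(points)
print(f"p = {p_value:.3f}")
```

A small p-value here suggests that the top split of the tree is larger than expected from unstructured data. Applying the same comparison at lower merge levels is one way to pick a cut level, again as an assumption rather than the published procedure.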
Abstract:
Peatlands are soil environments that accumulate water and organic carbon and function as records of paleo-environmental changes. The variability in the composition of organic matter is reflected in their morphological, physical, and chemical properties. The aim of this study was to characterize these properties in peatlands from the headwaters of the Rio Araçuaí (Araçuaí River) in different stages of preservation. Two cores from peatlands with different vegetation types (moist grassland and semideciduous seasonal forest) from the Rio Preto [Preto River] headwaters (conservation area) and the Córrego Cachoeira dos Borges [Cachoeira dos Borges stream] (disturbed area) were sampled. Both are tributaries of the Rio Araçuaí. Samples were taken from layers of 15 cm, and morphological, physical, and chemical analyses were performed. The 14C age and δ13C values were determined in three samples from each core and the vertical growth and organic carbon accumulation rates were estimated. Dendrograms were constructed for each peatland by hierarchical clustering of similar layers with data from 34 parameters. The headwater peatlands of the Rio Araçuaí have a predominance of organic material in an advanced stage of decomposition and their soils are classified as Typic Haplosaprists. The organic matter in the Histosols of the peatlands of the headwaters of the Rio Araçuaí shows marked differences with respect to its morphological, physical, and chemical composition, as it is influenced by the type of vegetation that colonizes it. The peat from the headwaters of the Córrego Cachoeira dos Borges is in a more advanced stage of degradation than the peat from the Rio Preto, which highlights the urgent need for protection of these ecosystems/soil environments.
Abstract:
Microarray gene expression profiles of fresh clinical samples of chronic myeloid leukaemia in chronic phase, acute promyelocytic leukaemia and acute monocytic leukaemia were compared with profiles from cell lines representing the corresponding types of leukaemia (K562, NB4, HL60). In a hierarchical clustering analysis, all clinical samples clustered separately from the cell lines, regardless of leukaemic subtype. Gene ontology analysis showed that cell lines chiefly overexpressed genes related to macromolecular metabolism, whereas in clinical samples genes related to the immune response were abundantly expressed. These findings must be taken into consideration when conclusions from cell line-based studies are extrapolated to patients.
Abstract:
The ability to obtain gene expression profiles from human disease specimens provides an opportunity to identify relevant gene pathways, but is limited by the absence of data sets spanning a broad range of conditions. Here, we analyzed publicly available microarray data from 16 diverse skin conditions in order to gain insight into disease pathogenesis. Unsupervised hierarchical clustering separated samples by disease as well as common cellular and molecular pathways. Disease-specific signatures were leveraged to build a multi-disease classifier, which predicted the diagnosis of publicly and prospectively collected expression profiles with 93% accuracy. In one sample, the molecular classifier differed from the initial clinical diagnosis and correctly predicted the eventual diagnosis as the clinical presentation evolved. Finally, integration of IFN-regulated gene programs with the skin database revealed a significant inverse correlation between IFN-β and IFN-γ programs across all conditions. Our study provides an integrative approach to the study of gene signatures from multiple skin conditions, elucidating mechanisms of disease pathogenesis. In addition, these studies provide a framework for developing tools for personalized medicine toward the precise prediction, prevention, and treatment of disease on an individual level.
Abstract:
In this paper, we consider active sampling to label pixels grouped with hierarchical clustering. The objective of the method is to match the data relationships discovered by the clustering algorithm with the user's desired class semantics. The former is represented as a complete tree to be pruned and the latter is iteratively provided by the user. The proposed active learning algorithm searches for the pruning of the tree that best matches the labels of the sampled points. By choosing the part of the tree to sample from according to the current pruning's uncertainty, sampling is focused on the most uncertain clusters. This way, large clusters for which the class membership is already fixed are no longer queried and sampling is focused on the division of clusters showing mixed labels. The model is tested on a VHR image in a multiclass classification setting. The method clearly outperforms random sampling in a transductive setting, but cannot generalize to unseen data, since it aims at optimizing the classification of a given cluster structure.
Abstract:
A new issue, once again a bouquet of attractive papers. First of all the paper by Droit-Dupré et al. (10.1007/s00428-015-1724-9). The group studied colonic adenocarcinomas, not otherwise specified, by immunohistochemistry for the expression of markers of intestinal epithelial cell differentiation. Hierarchical clustering analysis identified a major cluster of two thirds of the case series, expressing cytokeratin 20, CDX2 and MUC2 and invariably mismatch repair competent, which they called crypt-like. In stage III colon cancer, the crypt-like cluster had a better prognosis. The paper is a relatively simple example of what is happening in cancer classification beyond morphology: multiparameter differentiation and (epi)genomic markers defining new subtypes of cancer with potential clinical significance in clinical decision making.
Abstract:
The present study examines the repertory of liturgical chant known as St. Petersburg Court Chant which emerged within the Imperial Court of St. Petersburg, Russia, and appeared in print in a number of revisions during the course of the 19th century, eventually to spread throughout the Russian Empire and even abroad. The study seeks answers to questions on the essence and composition of Court Chant, its history and liturgical background, and most importantly, its musical relationship to other repertories of Eastern Slavic chant. The research questions emerge from previous literary accounts of Court Chant (summarized in the Introduction), which have tended to be inaccurate and generally not based on critical research. The study is divided into eight main chapters. Chapter 1 provides a survey of the history of Eastern Slavic chant and the Imperial Court Chapel of St. Petersburg until 1917, with special emphasis on the history of singing traditional chant in polyphony, the status of the Court Chapel as a government authority, and its endeavours in publishing church music. Chapter 2 deals with the liturgical background of Eastern chant, the chant genres, and main repertories of Eastern Slavic chant. Chapter 3 concentrates on chant sources: it introduces the musical notations utilised, after which a typology of chant books is presented. The discussion continues with a survey of the sources of Court Chant and their content, the specimens selected for closer analysis, the comparative materials from other repertories, and ends with a commentary on some chant sources that have been excluded. The comparative sources include a specimen from around the beginning of the 12th century, a few manuscripts from the 17th century, and printed and manuscript chant books from the early 18th to early 20th century, covering the geographical area delimited by western Ukraine, Astrakhan, Nizhny Novgorod, and the Solovetsky Monastery.
Chapter 4 presents the approach and methods used in the subsequent analytical comparisons. After a survey of the pitch organization of Eastern Slavic chant, the customary harmonization strategy of traditional chant polyphony is examined, according to which a method for meaningful analysis of the harmony is proposed. The method is based on the observation that the harmonic framework of chant polyphony derives from the standard pitch collection of monodic chant known as the Church Gamut, specific pitches of which form eight harmonic regions that behave like the usual tonalities of major and harmonic minor. Because of the considerable quantity of comparative chant forms, computer-assisted statistical methods are applied to the analysis of chant melodies. The primary chant forms and their respective comparative forms have been pre-processed into reduced chant prototypes and divided into redactions. The analyses are carried out by measuring the formal dissimilarities of the primary chant forms of the Court Chant repertory against each comparative form, and also by measuring the reciprocal dissimilarities of all chant versions in a redaction, the results of which are subjected to agglomerative hierarchical clustering in order to find out how the chant forms relate to each other. The dissimilarities are determined by applying a metric dissimilarity function that is based on the Levenshtein Distance. Chapter 5 provides the melodic and harmonic analyses of generic chants (chants used for multiple texts of different lengths), i.e., chants for stichera samoglasny and troparia, Chapter 6 of pseudo-generic chants (chants that are used for multiple texts but with certain restrictions), i.e., chants for heirmoi, prokeimena, and three other hymns, and Chapter 7 of non-generic chants, covering nine chants that in the Court repertory are not shared by multiple texts. The results are summarized and evaluated in Chapter 8. 
Accordingly, it can be established that, contrary to previous conceptions, melodically, Court Chant is in effect a full part of the wider Eastern Slavic chant tradition. Even if it is somewhat detached from the chant versions of the Synodal square-note chant books and the local tradition of Moscow, it is particularly close to chant forms of East Ukraine and some vernacular repertories from Russia. Likewise, the harmonization strategies of Court Chant do not show significant individuality in comparison with those of the available polyphonic comparative sources, the main difference being the part-writing, which generally conforms to the Western common-practice standard, whereas the deviations from this tend to be more significant in other analysed repertories of polyphonic chant. Thus, insofar as the subsequent prevalence of Court Chant is not based on its forceful dissemination by authorities (as suggested in previous literature but for which little tangible evidence could be found in Chapter 1), in the present author’s interpretation, Court Chant attained its dominance principally because musically it was considered sufficiently traditional, and as a chant body supported by the government, was conveniently available in print in serviceable harmonizations.
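The melodic comparison described in this abstract (pairwise dissimilarities based on the Levenshtein distance, followed by agglomerative hierarchical clustering) can be illustrated with a small sketch. It rests on several assumptions: the plain, unnormalised Levenshtein distance stands in for the study's dissimilarity function, average linkage is chosen arbitrarily, and the three pitch strings labelled A, B and C are invented toy data rather than actual chant prototypes.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def agglomerate(items, dist):
    """Average-linkage clustering; returns a list of (left, right, height) merges."""
    clusters = [(name,) for name in items]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [dist(items[p], items[q])
                         for p in clusters[a] for q in clusters[b]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges


# Hypothetical reduced melodies written as pitch letters (toy data only).
melodies = {"A": "CDEDC", "B": "CDEFC", "C": "GAGFE"}
merges = agglomerate(melodies, levenshtein)
for left, right, height in merges:
    print(left, right, height)
```

In this toy case the two melodies that differ by a single pitch are joined first, and the third joins them at a much greater height, which is the kind of relationship a dendrogram of chant versions would show.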