35 results for Agglomerative Hierarchical Clustering
Abstract:
Online music databases have increased significantly as a consequence of the rapid growth of the Internet and digital audio, requiring the development of faster and more efficient tools for music content analysis. Musical genres are widely used to organize music collections. In this paper, the problem of automatic single and multi-label music genre classification is addressed by exploring rhythm-based features obtained from a respective complex network representation. A Markov model is built in order to analyse the temporal sequence of rhythmic notation events. Feature analysis is performed by using two multi-variate statistical approaches: principal components analysis (unsupervised) and linear discriminant analysis (supervised). Similarly, two classifiers are applied in order to identify the category of rhythms: parametric Bayesian classifier under the Gaussian hypothesis (supervised) and agglomerative hierarchical clustering (unsupervised). Qualitative results obtained by using the kappa coefficient and the obtained clusters corroborated the effectiveness of the proposed method.
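The Markov modelling step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a first-order model and a hypothetical encoding of rhythmic events as duration symbols; the abstract does not specify either, so this is not the authors' implementation.

```python
from collections import Counter, defaultdict


def transition_probabilities(events):
    """Estimate first-order Markov transition probabilities from a sequence."""
    counts = defaultdict(Counter)
    # Count how often each event is followed by each other event.
    for current, following in zip(events, events[1:]):
        counts[current][following] += 1
    probabilities = {}
    for state, successors in counts.items():
        total = sum(successors.values())
        probabilities[state] = {s: n / total for s, n in successors.items()}
    return probabilities


# Hypothetical rhythm: q = quarter note, e = eighth note, h = half note.
rhythm = ["q", "e", "e", "q", "q", "e", "e", "h"]
P = transition_probabilities(rhythm)
```

Each row of `P` sums to one, and the resulting transition table (or measurements derived from it) could serve as a feature vector for a classifier.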
Abstract:
Macro- and microarrays are well-established technologies to determine gene functions through repeated measurements of transcript abundance. We constructed a chicken skeletal muscle-associated array based on a muscle-specific EST database, which was used to generate a tissue expression dataset of approximately 4500 chicken genes across 5 adult tissues (skeletal muscle, heart, liver, brain, and skin). Only a small number of ESTs were sufficiently well characterized by BLAST searches to determine their probable cellular functions. Evidence of a particular tissue-characteristic expression can be considered an indication that the transcript is likely to be functionally significant. The skeletal muscle macroarray platform was first used to search for evidence of tissue-specific expression, focusing on the biological function of genes/transcripts, since gene expression profiles generated across tissues were found to be reliable and consistent. Hierarchical clustering analysis revealed consistent clustering among genes assigned to 'developmental growth', such as the ontology genes and germ layers. Accuracy of the expression data was supported by comparing information from known transcripts and the tissue from which each transcript was derived with macroarray data. Hybridization assays resulted in consistent tissue expression profiles, which will be useful to dissect tissue-regulatory networks and to predict functions of novel genes identified after extensive sequencing of the genomes of model organisms. Screening our skeletal-muscle platform using 5 chicken adult tissues allowed us to identify 43 'tissue-specific' transcripts and 112 co-expressed uncharacterized transcripts with 62 putative motifs. This platform also represents an important tool for functional investigation of novel genes, for determining expression patterns according to developmental stages, for evaluating differences in muscular growth potential between chicken lines, and for identifying tissue-specific genes.
Abstract:
Background: High-throughput molecular approaches for gene expression profiling, such as Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS) or Sequencing-by-Synthesis (SBS), represent powerful techniques that provide global transcription profiles of different cell types through sequencing of short fragments of transcripts, called sequence tags. These techniques have improved our understanding of the relationships between these expression profiles and cellular phenotypes. Despite this, more reliable datasets are still necessary. In this work, we present a web-based tool named S3T: Score System for Sequence Tags, to index sequenced tags in accordance with their reliability. This is done through a series of evaluations based on a defined rule set. S3T allows the identification/selection of tags considered more reliable for further gene expression analysis. Results: This methodology was applied to a public SAGE dataset. In order to compare data before and after filtering, a hierarchical clustering analysis was performed on samples from the same type of tissue, in distinct biological conditions, using these two datasets. Our results provide evidence suggesting that it is possible to find more congruous clusters after using the S3T scoring system. Conclusion: These results substantiate the proposed application to generate more reliable data. This is a significant contribution to the determination of global gene expression profiles. The library analysis with S3T is freely available at http://gdm.fmrp.usp.br/s3t/. S3T source code and datasets can also be downloaded from the aforementioned website.
Abstract:
This paper analyses the presence of financial constraint in the investment decisions of 367 Brazilian firms from 1997 to 2004, using a Bayesian econometric model with group-varying parameters. The motivation for this paper is the use of clustering techniques to group firms in a totally endogenous form. In order to classify the firms we used a hybrid clustering method, that is, hierarchical and non-hierarchical clustering techniques jointly. To estimate the parameters a Bayesian approach was considered. Prior distributions were assumed for the parameters, classifying the model in random or fixed effects. Ordinate predictive density criterion was used to select the model providing a better prediction. We tested thirty models and the better prediction considers the presence of 2 groups in the sample, assuming the fixed effect model with a Student t distribution with 20 degrees of freedom for the error. The results indicate robustness in the identification of financial constraint when the firms are classified by the clustering techniques. (C) 2010 Elsevier B.V. All rights reserved.
Abstract:
In breast cancer patients, primary chemotherapy is associated with the same survival benefits as adjuvant chemotherapy. Residual tumors represent a clinical challenge, as they may be resistant to additional cycles of the same drugs. Our aim was to identify differential transcripts expressed in residual tumors, after neoadjuvant chemotherapy, that might be related to tumor resistance. Hence, 16 patients with paired tumor samples, collected before and after treatment (4 cycles doxorubicin/cyclophosphamide, AC), had their gene expression evaluated on cDNA microarray slides containing 4,608 genes. Three hundred and eighty-nine genes were differentially expressed (paired Student's t-test, pFDR<0.01) between pre- and post-chemotherapy samples, and among the regulated functions were the JNK cascade and cell death. Unsupervised hierarchical clustering identified one branch comprising exclusively eight pre-chemotherapy samples and another branch including the corresponding eight post-chemotherapy samples and the other 16 paired pre/post-chemotherapy samples. No differences in clinical and tumor parameters could explain this clustering. Another group of 11 patients with paired samples had expression of selected genes determined by real-time RT-PCR, and CTGF and DUSP1 were confirmed to be more highly expressed in post- as compared to pre-chemotherapy samples. After neoadjuvant chemotherapy some residual samples may retain their molecular signature while others present significant changes in their gene expression, probably induced by the treatment. CTGF and DUSP1 overexpression in residual samples may be a reflection of resistance to further administration of the AC regimen.
Abstract:
Objective. To explore the relationship between biomarkers of pulmonary arterial hypertension (PAH), interferon (IFN)-regulated gene expression, and the alternative activation pathway in systemic sclerosis (SSc). Methods. Peripheral blood mononuclear cells (PBMCs) were purified from healthy controls, patients with idiopathic PAH, and SSc patients (classified as having diffuse cutaneous SSc, limited cutaneous SSc [lcSSc] without PAH, and lcSSc with PAH). IFN-regulated and "PAH biomarker" genes were compared after supervised hierarchical clustering. Messenger RNA levels of selected IFN-regulated genes (Siglec1 and MX1), biomarker genes (IL13RA1, CCR1, and JAK2), and the alternative activation marker gene (MRC1) were analyzed on PBMCs and on CD14- and CD14+ cell populations. Interleukin-13 (IL-13) and IL-4 concentrations were measured in plasma by immunoassay. CD14, MRC1, and IL13RA1 surface expression was analyzed by flow cytometry. Results. Increased PBMC expression of both IFN-regulated and biomarker genes distinguished SSc patients from healthy controls. Expression of genes in the biomarker cluster, but not in the IFN-regulated cluster, distinguished lcSSc with PAH from lcSSc without PAH. The genes CCR1 (P < 0.001) and JAK2 (P < 0.001) were expressed more highly in lcSSc patients with PAH compared with controls and mainly by CD14+ cells. MRC1 expression was increased exclusively in lcSSc patients with PAH (P < 0.001) and correlated strongly with pulmonary artery pressure (r = 0.52, P = 0.03) and higher mortality (P = 0.02). MRC1 expression was higher in CD14+ cells and was greatly increased by stimulation with IL-13. IL-13 concentrations in plasma were most highly increased in lcSSc patients with PAH (P < 0.001). Conclusion. IFN-regulated and biomarker genes represent distinct, although related, clusters in lcSSc patients with PAH. MRC1, a marker for the effect of IL-13 on alternative monocyte/macrophage activation, is associated with this severe complication and is related to mortality.
Abstract:
The expression of peripheral tissue antigens (PTAs) in the thymus by medullary thymic epithelial cells (mTECs) is essential for central self-tolerance in the generation of the T cell repertoire. Due to the heterogeneity of autoantigen representation, this phenomenon has been termed promiscuous gene expression (PGE), in which the autoimmune regulator (Aire) gene plays a key role as a transcription factor for part of these genes. Here we used a microarray strategy to assess PGE in the cultured murine CD80(+) 3.10 mTEC line. Hierarchical clustering of the data showed that PTA genes were differentially expressed, making it possible to identify their respective induced or repressed mRNAs. To further investigate the control of PGE, we tested the hypothesis that genes involved in this phenomenon might also be modulated by a transcriptional network. We then reconstructed such a network based on the microarray expression data, featuring the guanylate cyclase 2d (Gucy2d) gene as a main node. Under this condition, we established 167 positive and negative interactions with downstream PTA genes. When Aire was silenced by RNA interference, Gucy2d, although down-regulated, established a larger number (355) of interactions with PTA genes. T- and G-boxes corresponding to AIRE protein binding sites located upstream of the ATG codon of Gucy2d support this effect. These findings provide evidence that Aire plays a role in association with Gucy2d, which is connected to several PTA genes and establishes a cascade-like transcriptional control of promiscuous gene expression in mTEC cells. (C) 2009 Elsevier Ltd. All rights reserved.
Abstract:
Urinary bladder cancer is the fourth most common malignancy in the Western world. Transitional cell carcinoma (TCC) is the most common subtype, accounting for about 90% of all bladder cancers. The TP53 gene plays an essential role in the regulation of the cell cycle and apoptosis and therefore contributes to cellular transformation and malignancy; however, little is known about the differential gene expression patterns in human tumors that present with the wild-type or mutated TP53 gene. Therefore, because gene profiling can provide new insights into the molecular biology of bladder cancer, the present study aimed to compare the molecular profiles of bladder cancer cell lines with different TP53 alleles, including the wild type (RT4) and two mutants (5637, with mutations in codons 280 and 72; and T24, a TP53 allele encoding an in-frame deletion of tyrosine 126). Unsupervised hierarchical clustering and gene networks were constructed based on data generated by cDNA microarrays using mRNA from the three cell lines. Differentially expressed genes related to the cell cycle, cell division, cell death, and cell proliferation were observed in the three cell lines. However, the cDNA microarray data did not cluster cell lines based on their TP53 allele. The gene profiles of the RT4 cells were more similar to those of T24 than to those of the 5637 cells. While the deregulation of both the cell cycle and the apoptotic pathways was particularly related to TCC, these alterations were not associated with the TP53 status.
Abstract:
Mesenchymal stem cells (MSC) are multipotent cells which can be obtained from several adult and fetal tissues including human umbilical cord units. We have recently shown that umbilical cord tissue (UC) is richer in MSC than umbilical cord blood (UCB), but their origin and characteristics in blood as compared to the cord remain unknown. Here we compared, for the first time, the exonic protein-coding and intronic noncoding RNA (ncRNA) expression profiles of MSC from match-paired UC and UCB samples, harvested from the same donors and processed simultaneously under the same culture conditions. The patterns of intronic ncRNA expression in MSC from UC and UCB paired units were highly similar, indicative of their common donor origin. The respective exonic protein-coding transcript expression profiles, however, were significantly different. Hierarchical clustering based on protein-coding expression similarities grouped MSC according to their tissue location rather than original donor. Genes related to systems development, osteogenesis and immune system were expressed at higher levels in UCB, whereas genes related to cell adhesion, morphogenesis, secretion, angiogenesis and neurogenesis were more highly expressed in UC cells. These molecular differences observed in tissue-specific MSC gene expression may reflect functional activities influenced by distinct niches and should be considered when developing clinical protocols involving MSC from different sources. In addition, these findings reinforce our previous suggestion on the importance of banking the whole umbilical cord unit for research or future therapeutic use.
Abstract:
Unlike theoretical scale-free networks, most real networks present multi-scale behavior, with nodes structured in different types of functional groups and communities. While the majority of approaches for classification of nodes in a complex network have relied on local measurements of the topology/connectivity around each node, valuable information about node functionality can be obtained by concentric (or hierarchical) measurements. This paper extends previous methodologies based on concentric measurements by studying the possibility of using agglomerative clustering methods to obtain a set of functional groups of nodes, considering the nodes of a particular institutional collaboration network that includes various known communities (departments of the University of Sao Paulo). Among the findings, we emphasize the scale-free nature of the network obtained, as well as the identification of different patterns of authorship emerging from different areas (e.g. human and exact sciences). Another interesting result concerns the relatively uniform distribution of hubs along concentric levels, in contrast to the non-uniform pattern found in theoretical scale-free networks such as the BA model. (C) 2008 Elsevier B.V. All rights reserved.
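As a rough illustration of how agglomerative clustering could group nodes described by measurement vectors, the sketch below uses average linkage and Euclidean distance. Both choices, the function name `agglomerative`, and the toy feature vectors are assumptions; the abstract does not state which linkage or measurements were used.

```python
import math


def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of point indices."""
    # Start with every point in its own cluster.
    clusters = [[i] for i in range(len(points))]

    def average_linkage(a, b):
        distances = [math.dist(points[i], points[j]) for i in a for j in b]
        return sum(distances) / len(distances)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average distance.
        _, a, b = min(
            (average_linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        # Merge the closest pair and drop the absorbed cluster.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(cluster) for cluster in clusters]


# Hypothetical per-node measurement vectors (for example, two concentric measurements).
features = [(1.0, 0.1), (1.1, 0.2), (5.0, 4.0), (5.2, 4.1), (9.0, 0.5)]
groups = agglomerative(features, n_clusters=3)
```

This naive version recomputes all pairwise linkages at every merge, which is fine for a handful of nodes but would be too slow for a large collaboration network.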
Abstract:
In the southern region of Mato Grosso do Sul state, Brazil, a foot-and-mouth disease (FMD) epidemic started in September 2005. A total of 33 outbreaks were detected and 33,741 FMD-susceptible animals were slaughtered and destroyed. There were no reports of FMD cases in species other than bovines. Based on the data of this epidemic, an analysis using the K-function was carried out and spatial clustering of outbreaks was observed within a range of 25 km. This observation may be related to the dynamics of foot-and-mouth disease spread and to the measures undertaken to control the dissemination of the disease. The control measures were effective, since the disease did not spread to farms more than 47 km from the initial outbreaks.
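For readers unfamiliar with the K-function, a naive estimator (without the edge correction that a real analysis would normally apply) can be sketched as follows. The coordinates and study-area size are invented for illustration and are not the epidemic data.

```python
import math


def ripley_k(points, radius, area):
    """Naive Ripley's K estimate: area * (ordered pairs within radius) / (n * (n - 1))."""
    n = len(points)
    close_pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= radius:
                close_pairs += 1
    return area * close_pairs / (n * (n - 1))


# Hypothetical outbreak coordinates in km inside a 200 km x 200 km study area.
outbreaks = [(0, 0), (3, 4), (6, 8), (100, 100), (103, 104)]
k_observed = ripley_k(outbreaks, radius=25, area=200 * 200)

# Under complete spatial randomness the expected value is pi * r^2.
k_random = math.pi * 25 ** 2
```

An observed value well above `k_random` at a given radius suggests clustering at that scale; significance is usually judged against simulated random patterns rather than this single comparison.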
Abstract:
Gene clustering is a useful exploratory technique to group together genes with similar expression levels under distinct cell cycle phases or distinct conditions. It helps the biologist to identify potentially meaningful relationships between genes. In this study, we propose a clustering method based on multivariate normal mixture models, where the number of clusters is predicted via sequential hypothesis tests: at each step, the method considers a mixture model of m components (m = 2 in the first step) and tests if in fact it should be m - 1. If the hypothesis is rejected, m is increased and a new test is carried out. The method continues (increasing m) until the hypothesis is accepted. The theoretical core of the method is the full Bayesian significance test, an intuitive Bayesian approach, which needs no model complexity penalization nor positive probabilities for sharp hypotheses. Numerical experiments were based on a cDNA microarray dataset consisting of expression levels of 205 genes belonging to four functional categories, for 10 distinct strains of Saccharomyces cerevisiae. To analyze the method's sensitivity to data dimension, we performed principal components analysis on the original dataset and predicted the number of classes using 2 to 10 principal components. Compared to Mclust (model-based clustering), our method shows more consistent results.
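The sequential procedure described in this abstract can be sketched as a simple loop. The function `rejects_reduction` is a hypothetical stand-in for the full Bayesian significance test applied to a fitted mixture model; only the control flow is taken from the abstract, and the cap `m_max` is an added safeguard.

```python
def select_num_components(rejects_reduction, m_start=2, m_max=50):
    """Increase m until the hypothesis 'm - 1 components suffice' is accepted."""
    m = m_start
    while m <= m_max:
        # rejects_reduction(m) is True when the test rejects the hypothesis
        # that a mixture of m components should in fact have m - 1 components.
        if not rejects_reduction(m):
            return m - 1
        m += 1
    raise RuntimeError("no hypothesis accepted up to m_max")


def toy_test(m):
    """Toy stand-in: pretend the data truly have 4 clusters."""
    return m - 1 < 4


predicted = select_num_components(toy_test)
```

In a real setting each call to the test would involve fitting a multivariate normal mixture with m components, which is where nearly all of the computational cost lies.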
Abstract:
Biological neuronal networks constitute a special class of dynamical systems, as they are formed by individual geometrical components, namely the neurons. In the existing literature, relatively little attention has been given to the influence of neuron shape on the overall connectivity and dynamics of the emerging networks. The current work addresses this issue by considering simplified neuronal shapes consisting of circular regions (soma/axons) with spokes (dendrites). Networks are grown by placing these patterns randomly in the two-dimensional (2D) plane and establishing connections whenever a piece of dendrite falls inside an axon. Several topological and dynamical properties of the resulting graph are measured, including the degree distribution, clustering coefficients, symmetry of connections, size of the largest connected component, as well as three hierarchical measurements of the local topology. By varying the number of processes of the individual basic patterns, we can quantify relationships between the individual neuronal shape and the topological and dynamical features of the networks. Integrate-and-fire dynamics on these networks is also investigated with respect to transient activation from a source node, indicating that long-range connections play an important role in the propagation of avalanches.
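The growth rule described here (connect two neurons whenever part of a dendrite falls inside an axon) can be sketched geometrically. The sketch assumes each dendrite is a straight spoke from the neuron centre, spokes are equally spaced, and an edge (i, j) points from the axon of neuron i to a dendrite of neuron j. All function names and parameter values are hypothetical, and fixed positions replace random placement so the example is reproducible.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def dendrites(center, n_spokes, length):
    """Equally spaced straight spokes radiating from the neuron centre."""
    cx, cy = center
    spokes = []
    for k in range(n_spokes):
        angle = 2 * math.pi * k / n_spokes
        tip = (cx + length * math.cos(angle), cy + length * math.sin(angle))
        spokes.append(((cx, cy), tip))
    return spokes


def grow_network(centers, n_spokes, spoke_length, axon_radius):
    """Directed edges (i, j): a dendrite of neuron j reaches into the axon disc of neuron i."""
    edges = set()
    for i, axon_center in enumerate(centers):
        for j, other_center in enumerate(centers):
            if i == j:
                continue
            if any(
                point_segment_distance(axon_center, a, b) <= axon_radius
                for a, b in dendrites(other_center, n_spokes, spoke_length)
            ):
                edges.add((i, j))
    return edges


# Fixed hypothetical positions instead of random placement.
centers = [(0.0, 0.0), (3.0, 0.0), (0.0, 10.0)]
edges = grow_network(centers, n_spokes=4, spoke_length=2.5, axon_radius=1.0)
```

Varying `n_spokes` in such a model is one way to examine how the number of processes changes the degree distribution of the resulting graph.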
Abstract:
This work shows the application of the analytic hierarchy process (AHP) in the full cost accounting (FCA) within the integrated resource planning (IRP) process. For this purpose, a pioneer case was developed and different energy solutions of supply and demand for a metropolitan airport (Congonhas) were considered [Moreira, E.M., 2005. Modelamento energetico para o desenvolvimento limpo de aeroporto metropolitano baseado na filosofia do PIR-O caso da metropole de Sao Paulo. Dissertacao de mestrado, GEPEA/USP]. These solutions were compared and analyzed utilizing the software solution "Decision Lens" that implements the AHP. The final part of this work has a classification of resources that can be considered to be the initial target as energy resources, thus facilitating the restraints of the IRP of the airport and setting parameters aiming at sustainable development. (C) 2007 Elsevier Ltd. All rights reserved.
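To give a sense of how the AHP turns pairwise judgments into a ranking, the sketch below derives priority weights with the row geometric-mean approximation. This is a common stand-in for the principal-eigenvector calculation and is not necessarily what "Decision Lens" computes; the comparison matrix is invented for illustration.

```python
import math


def ahp_priorities(matrix):
    """Approximate AHP priority weights using normalized row geometric means."""
    geometric_means = [math.prod(row) ** (1 / len(row)) for row in matrix]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]


# Hypothetical, perfectly consistent judgments among three energy options:
# option A is twice as preferable as B, and B is twice as preferable as C.
comparisons = [
    [1, 2, 4],
    [1 / 2, 1, 2],
    [1 / 4, 1 / 2, 1],
]
weights = ahp_priorities(comparisons)
```

For this consistent matrix the weights come out in the ratio 4:2:1. With real, inconsistent judgments a consistency check would normally accompany the weights.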
Abstract:
A chemotaxonomic analysis is described of a database containing various types of compounds from the Heliantheae tribe (Asteraceae) using Self-Organizing Maps (SOM). The numbers of occurrences of 9 chemical classes in different taxa of the tribe were used as variables. The study shows that SOM applied to chemical data can contribute to differentiate genera, subtribes, and groups of subtribes (subtribe branches), as well as to tribal and subtribal classifications of Heliantheae, exhibiting a high hit percentage comparable to that of an expert performance, and in agreement with the previous tribe classification proposed by Stuessy.