808 results for Agglomerative Hierarchical Clustering
Abstract:
© 2014 Cises. This work is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0).
Abstract:
Garlic is a spice and a medicinal plant; hence, there is increasing interest in developing new varieties with different culinary properties or a high content of nutraceutical compounds. Phenotypic traits and dominant molecular markers are predominantly used to evaluate the genetic diversity of garlic clones. However, 24 garlic-specific SSR (codominant) markers are available in the literature, fostering germplasm research. In this study, we genotyped 130 garlic accessions from Brazil and abroad using 17 polymorphic SSR markers to assess genetic diversity and structure. This is the first attempt to evaluate a large set of accessions maintained by Brazilian institutions. A high level of redundancy was detected in the collection (50 % of the accessions represented eight haplotypes). However, non-redundant accessions presented high genetic diversity. We detected on average five alleles per locus, a Shannon index of 1.2, HO of 0.5, and HE of 0.6. A core collection was set with 17 accessions, covering 100 % of the alleles with minimum redundancy. Overall FST and D values indicate a strong genetic structure within accessions. Two major groups identified by both model-based (Bayesian approach) and hierarchical clustering (UPGMA dendrogram) techniques were coherent with the classification of accessions according to maturity time (growth cycle): early-late and midseason accessions. Assessing the genetic diversity and structure of garlic collections is the first step towards efficient management and conservation of accessions in genebanks, as well as towards advancing future genetic studies and the improvement of garlic worldwide.
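The UPGMA dendrogram mentioned above corresponds to average-linkage agglomerative clustering. A minimal sketch follows, assuming a precomputed pairwise distance matrix; the `upgma` function, the accession names, and the distances are hypothetical and are not taken from the study.

```python
from itertools import combinations

def upgma(dist, labels):
    """Average-linkage (UPGMA) agglomerative clustering.

    dist: symmetric matrix (list of lists) of pairwise distances.
    labels: names of the items, in matrix order.
    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    # Each active cluster is a tuple of original item indices.
    clusters = [(i,) for i in range(len(labels))]
    merges = []

    def avg_dist(a, b):
        # UPGMA distance: unweighted mean over all cross-cluster item pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Pick the closest pair of active clusters and merge it.
        a, b = min(combinations(clusters, 2), key=lambda p: avg_dist(*p))
        merges.append((
            tuple(labels[i] for i in a),
            tuple(labels[i] for i in b),
            avg_dist(a, b),
        ))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical genetic distances among four accessions (illustration only).
labels = ["acc1", "acc2", "acc3", "acc4"]
dist = [
    [0.0, 0.1, 0.6, 0.7],
    [0.1, 0.0, 0.5, 0.8],
    [0.6, 0.5, 0.0, 0.2],
    [0.7, 0.8, 0.2, 0.0],
]
merges = upgma(dist, labels)
for left, right, distance in merges:
    print(left, right, round(distance, 3))
```

The merge history is what a dendrogram draws: reading it from the last entry backwards gives the top-level split first, which is how two major groups would be read off such a tree.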
Abstract:
Macro- and microarrays are well-established technologies to determine gene functions through repeated measurements of transcript abundance. We constructed a chicken skeletal muscle-associated array based on a muscle-specific EST database, which was used to generate a tissue expression dataset of approximately 4500 chicken genes across 5 adult tissues (skeletal muscle, heart, liver, brain, and skin). Only a small number of ESTs were sufficiently well characterized by BLAST searches to determine their probable cellular functions. Evidence of a particular tissue-characteristic expression can be considered an indication that the transcript is likely to be functionally significant. The skeletal muscle macroarray platform was first used to search for evidence of tissue-specific expression, focusing on the biological function of genes/transcripts, since gene expression profiles generated across tissues were found to be reliable and consistent. Hierarchical clustering analysis revealed consistent clustering among genes assigned to 'developmental growth', such as the ontology genes and germ layers. The accuracy of the expression data was supported by comparing information from known transcripts, and the tissue from which each transcript was derived, with the macroarray data. Hybridization assays resulted in consistent tissue expression profiles, which will be useful to dissect tissue-regulatory networks and to predict the functions of novel genes identified after extensive sequencing of the genomes of model organisms. Screening our skeletal-muscle platform using 5 chicken adult tissues allowed us to identify 43 'tissue-specific' transcripts and 112 co-expressed uncharacterized transcripts with 62 putative motifs. This platform also represents an important tool for the functional investigation of novel genes, for determining expression patterns according to developmental stage, for evaluating differences in muscular growth potential between chicken lines, and for identifying tissue-specific genes.
Abstract:
Background: High-throughput molecular approaches for gene expression profiling, such as Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS) or Sequencing-by-Synthesis (SBS), represent powerful techniques that provide global transcription profiles of different cell types through the sequencing of short fragments of transcripts, called sequence tags. These techniques have improved our understanding of the relationships between these expression profiles and cellular phenotypes. Despite this, more reliable datasets are still necessary. In this work, we present a web-based tool named S3T (Score System for Sequence Tags) to index sequenced tags in accordance with their reliability. This is done through a series of evaluations based on a defined rule set. S3T allows the identification and selection of tags considered more reliable for further gene expression analysis. Results: This methodology was applied to a public SAGE dataset. In order to compare the data before and after filtering, a hierarchical clustering analysis was performed on samples from the same type of tissue, in distinct biological conditions, using these two datasets. Our results provide evidence suggesting that it is possible to find more congruous clusters after using the S3T scoring system. Conclusion: These results substantiate the proposed application to generate more reliable data. This is a significant contribution to the determination of global gene expression profiles. The library analysis with S3T is freely available at http://gdm.fmrp.usp.br/s3t/. S3T source code and datasets can also be downloaded from the aforementioned website.
Abstract:
This paper analyses the presence of financial constraint in the investment decisions of 367 Brazilian firms from 1997 to 2004, using a Bayesian econometric model with group-varying parameters. The motivation for this paper is the use of clustering techniques to group firms in a totally endogenous form. In order to classify the firms we used a hybrid clustering method, that is, hierarchical and non-hierarchical clustering techniques used jointly. A Bayesian approach was used to estimate the parameters. Prior distributions were assumed for the parameters, classifying the model as random or fixed effects. The ordinate predictive density criterion was used to select the model providing the best prediction. We tested thirty models, and the best prediction considers the presence of 2 groups in the sample, assuming the fixed effects model with a Student t distribution with 20 degrees of freedom for the error. The results indicate robustness in the identification of financial constraint when the firms are classified by the clustering techniques. (C) 2010 Elsevier B.V. All rights reserved.
Abstract:
In breast cancer patients, primary chemotherapy is associated with the same survival benefits as adjuvant chemotherapy. Residual tumors represent a clinical challenge, as they may be resistant to additional cycles of the same drugs. Our aim was to identify differentially expressed transcripts in residual tumors, after neoadjuvant chemotherapy, that might be related to tumor resistance. Hence, 16 patients with paired tumor samples, collected before and after treatment (4 cycles of doxorubicin/cyclophosphamide, AC), had their gene expression evaluated on cDNA microarray slides containing 4,608 genes. Three hundred and eighty-nine genes were differentially expressed (paired Student's t-test, pFDR<0.01) between pre- and post-chemotherapy samples, and among the regulated functions were the JNK cascade and cell death. Unsupervised hierarchical clustering identified one branch comprising exclusively eight pre-chemotherapy samples and another branch including the corresponding eight post-chemotherapy samples and 16 other paired pre/post-chemotherapy samples. No differences in clinical and tumor parameters could explain this clustering. Another group of 11 patients with paired samples had the expression of selected genes determined by real-time RT-PCR, and CTGF and DUSP1 were confirmed to be more highly expressed in post- as compared to pre-chemotherapy samples. After neoadjuvant chemotherapy some residual samples may retain their molecular signature while others present significant changes in their gene expression, probably induced by the treatment. CTGF and DUSP1 overexpression in residual samples may be a reflection of resistance to further administration of the AC regimen.
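The paired Student's t-test used above compares each patient's pre- and post-treatment values through their per-patient differences. A minimal sketch of the test statistic for a single gene follows; the `paired_t_statistic` helper and the expression values are hypothetical, and the conversion to a p-value and the FDR correction applied in the study are not reproduced here.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for a paired Student's t-test."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # One difference per patient: post-treatment minus pre-treatment.
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical log-expression values for one gene in four patients.
pre = [5.0, 6.0, 5.5, 6.5]
post = [6.0, 7.5, 6.5, 8.0]
t, df = paired_t_statistic(pre, post)
print(round(t, 3), df)
```

In a microarray setting this statistic would be computed once per gene, which is why a multiple-testing correction such as the reported pFDR threshold is needed afterwards.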
Abstract:
Objective. To explore the relationship between biomarkers of pulmonary arterial hypertension (PAH), interferon (IFN)-regulated gene expression, and the alternative activation pathway in systemic sclerosis (SSc). Methods. Peripheral blood mononuclear cells (PBMCs) were purified from healthy controls, patients with idiopathic PAH, and SSc patients (classified as having diffuse cutaneous SSc, limited cutaneous SSc [lcSSc] without PAH, and lcSSc with PAH). IFN-regulated and "PAH biomarker" genes were compared after supervised hierarchical clustering. Messenger RNA levels of selected IFN-regulated genes (Siglec1 and MX1), biomarker genes (IL13RA1, CCR1, and JAK2), and the alternative activation marker gene (MRC1) were analyzed on PBMCs and on CD14- and CD14+ cell populations. Interleukin-13 (IL-13) and IL-4 concentrations were measured in plasma by immunoassay. CD14, MRC1, and IL13RA1 surface expression was analyzed by flow cytometry. Results. Increased PBMC expression of both IFN-regulated and biomarker genes distinguished SSc patients from healthy controls. Expression of genes in the biomarker cluster, but not in the IFN-regulated cluster, distinguished lcSSc with PAH from lcSSc without PAH. The genes CCR1 (P < 0.001) and JAK2 (P < 0.001) were expressed more highly in lcSSc patients with PAH compared with controls and mainly by CD14+ cells. MRC1 expression was increased exclusively in lcSSc patients with PAH (P < 0.001) and correlated strongly with pulmonary artery pressure (r = 0.52, P = 0.03) and higher mortality (P = 0.02). MRC1 expression was higher in CD14+ cells and was greatly increased by stimulation with IL-13. IL-13 concentrations in plasma were most highly increased in lcSSc patients with PAH (P < 0.001). Conclusion. IFN-regulated and biomarker genes represent distinct, although related, clusters in lcSSc patients with PAH. MRC1, a marker for the effect of IL-13 on alternative monocyte/macrophage activation, is associated with this severe complication and is related to mortality.
Abstract:
The expression of peripheral tissue antigens (PTAs) in the thymus by medullary thymic epithelial cells (mTECs) is essential for central self-tolerance in the generation of the T cell repertoire. Due to the heterogeneity of autoantigen representation, this phenomenon has been termed promiscuous gene expression (PGE), in which the autoimmune regulator (Aire) gene plays a key role as a transcription factor for part of these genes. Here we used a microarray strategy to assess PGE in the cultured murine CD80(+) 3.10 mTEC line. Hierarchical clustering of the data showed that PTA genes were differentially expressed, making it possible to find their respective induced or repressed mRNAs. To further investigate the control of PGE, we tested the hypothesis that genes involved in this phenomenon might also be modulated by a transcriptional network. We then reconstructed such a network based on the microarray expression data, featuring the guanylate cyclase 2d (Gucy2d) gene as a main node. In this condition, we established 167 positive and negative interactions with downstream PTA genes. When Aire was silenced by RNA interference, Gucy2d, although downregulated, established a larger number (355) of interactions with PTA genes. T- and G-boxes corresponding to AIRE protein binding sites located upstream of the ATG codon of Gucy2d support this effect. These findings provide evidence that Aire plays a role in association with Gucy2d, which is connected to several PTA genes and establishes a cascade-like transcriptional control of promiscuous gene expression in mTEC cells. (C) 2009 Elsevier Ltd. All rights reserved.
Abstract:
Urinary bladder cancer is the fourth most common malignancy in the Western world. Transitional cell carcinoma (TCC) is the most common subtype, accounting for about 90% of all bladder cancers. The TP53 gene plays an essential role in the regulation of the cell cycle and apoptosis and therefore contributes to cellular transformation and malignancy; however, little is known about the differential gene expression patterns in human tumors that present with the wild-type or mutated TP53 gene. Therefore, because gene profiling can provide new insights into the molecular biology of bladder cancer, the present study aimed to compare the molecular profiles of bladder cancer cell lines with different TP53 alleles, including the wild type (RT4) and two mutants (5637, with mutations in codons 280 and 72; and T24, a TP53 allele encoding an in-frame deletion of tyrosine 126). Unsupervised hierarchical clustering and gene networks were constructed based on data generated by cDNA microarrays using mRNA from the three cell lines. Differentially expressed genes related to the cell cycle, cell division, cell death, and cell proliferation were observed in the three cell lines. However, the cDNA microarray data did not cluster cell lines based on their TP53 allele. The gene profiles of the RT4 cells were more similar to those of T24 than to those of the 5637 cells. While the deregulation of both the cell cycle and the apoptotic pathways was particularly related to TCC, these alterations were not associated with the TP53 status.
Abstract:
A multivariate model using hierarchical clustering and discriminant analysis is used to identify clusters of community opportunity and community vulnerability across Australia's mega metropolitan regions. Variables used in the model measure aspects of structural economic change, occupational change, human capital, income, unemployment, family/household disadvantage, and housing stress. A nine-cluster solution is used to categorise communities across metropolitan space. Significant between-city variations in the incidence of these clusters of opportunity and vulnerability are apparent, suggesting the emergence of marked differentiation between Australia's mega metropolitan regions in their adjustments to changing economic and social conditions. JEL classification: C49, R11, R12.
Abstract:
Diseases and insect pests are major causes of low yields of common bean (Phaseolus vulgaris L.) in Latin America and Africa. Anthracnose, angular leaf spot and common bacterial blight are widespread foliar diseases of common bean that also infect pods and seeds. One thousand and eighty-two accessions from a common bean core collection from the primary centres of origin were investigated for reaction to these three diseases. Angular leaf spot and common bacterial blight were evaluated in the field at Santander de Quilichao, Colombia, and anthracnose was evaluated in a screenhouse in Popayan, Colombia. By using the 15-group level from a hierarchical clustering procedure, it was found that 7 groups were formed with mainly Andean common bean accessions (Andean gene pool), 7 groups with mainly Middle American accessions (Middle American gene pool), while 1 group contained mixed accessions. Consistent with the theory of co-evolution, it was generally observed that accessions from the Andean gene pool were resistant to Middle American pathogen isolates causing anthracnose, while the Middle American accessions were resistant to pathogen isolates from the Andes. Different combinations of resistance patterns were found, and breeders can use this information to select a specific group of accessions on the basis of their needs.
Abstract:
Technological and scientific advances in health care have been bringing together fields such as medicine and mathematics, with science tasked with making the means of research, diagnosis, monitoring, and therapy more effective. The methods developed and the studies presented in this dissertation stem from the need to find answers and solutions to the different challenges identified in the field of anesthesia. The nature of these problems necessarily leads to the application, adaptation, and combination of different methods and models from several areas of mathematics. The ability to induce anesthesia in patients safely and reliably involves a wide variety of situations that must be taken into account and therefore requires intensive study. Thus, prediction methods and models that allow better personalization of the dose administered to the patient, and monitoring of the effect induced by each drug with more reliable signals, are fundamental to research and progress in this field. In this context, with the aim of clarifying the use of appropriate statistical treatment in anesthesia studies, I propose to address different statistical analyses in order to develop a prediction model of the brain response to two drugs during sedation. Data obtained from volunteers will be used to study the pharmacodynamic interaction between two anesthetic drugs. In the first phase, linear regression models are explored to model the effect of the drugs on the BIS brain signal (the bispectral index of the EEG, an indicator of the depth of anesthesia); that is, to estimate the effect that drug concentrations have on the depression of the electroencephalogram (assessed by the BIS).
In the second phase of this work, the aim is to identify different interactions using Cluster Analysis and to validate the respective model using Discriminant Analysis, identifying homogeneous groups in the sample obtained through the clustering techniques. The number of groups in the sample was obtained, in an exploratory phase, by hierarchical clustering techniques, and the characterization of the identified groups was obtained by k-means clustering. The reproducibility of the resulting clustering models was tested through discriminant analysis. The main conclusions indicate that the significance test of the linear regression equation showed the model to be highly significant. The variables propofol and remifentanil significantly influence the BIS, and the model improves with the inclusion of remifentanil. This work also demonstrates that it is possible to build a model that groups drug concentrations based on their effect on the BIS brain signal, with the support of clustering and discriminant techniques. The results clearly demonstrate the pharmacodynamic interaction of the two drugs when Cluster 1 and Cluster 3 are analyzed. For similar concentrations of propofol, the effect on the BIS is clearly different depending on the magnitude of the remifentanil concentration. In summary, the study clearly demonstrates that when remifentanil is administered with propofol (a hypnotic), the effect of the latter is potentiated, driving the BIS signal to quite low values.
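The two-stage grouping described above (a hierarchical method in an exploratory phase, then k-means to characterize the groups) can be sketched as follows. This is an illustration under stated assumptions, not the dissertation's procedure: the number of groups `k` is fixed by hand rather than read from a dendrogram, average linkage is an arbitrary choice, the function names and data points are hypothetical, and the discriminant-analysis validation step is not shown.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def agglomerate(points, k):
    """Average-linkage agglomerative clustering, stopped at k clusters."""
    clusters = [[p] for p in points]

    def avg_dist(a, b):
        return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Find the closest pair of clusters and merge it.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: avg_dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def kmeans(points, seeds, max_iter=100):
    """Lloyd's k-means started from the given seed centroids."""
    centers = list(seeds)
    groups = [[] for _ in centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Keep the old center if a group ends up empty.
        new_centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, groups

# Hypothetical two-dimensional observations (illustration only).
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (4.0, 4.0), (4.2, 3.9), (3.9, 4.1)]
clusters = agglomerate(points, k=2)              # stage 1: hierarchical grouping
seeds = [centroid(c) for c in clusters]          # seeds taken from stage 1
centers, groups = kmeans(points, seeds)          # stage 2: k-means refinement
print(centers)
```

Seeding k-means from the hierarchical result removes its dependence on random initialization, which is one common motivation for combining the two families of methods.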
Abstract:
This paper deals with the establishment of a methodology for characterizing the electric power profiles of medium voltage (MV) consumers. The characterization is supported by the knowledge discovery in databases (KDD) process. Data Mining techniques are used to obtain typical load profiles of MV customers and specific knowledge of these customers' consumption habits. In order to form the different customer classes and to find a set of representative consumption patterns, a hierarchical clustering algorithm and a clustering ensemble combination approach (WEACS) are used. Taking into account the typical consumption profile of the class to which the customers belong, new tariff options were defined and new energy coefficient prices were proposed. Finally, using the results obtained, the consequences for the interaction between customers and electric power suppliers are analyzed.
Abstract:
With the liberalization of the electricity market, distribution and retail companies are looking for better market strategies based on adequate information about the consumption patterns of their electricity customers. In this environment all consumers are free to choose their electricity supplier. A fair insight into customer behaviour will permit the definition of specific contract aspects based on the different consumption patterns. In this paper Data Mining (DM) techniques are applied to electricity consumption data from a utility's client database. To form the different customer classes and find a set of representative consumption patterns, we used the Two-Step algorithm, which is a hierarchical clustering algorithm. Each consumer class will be represented by its load profile resulting from the clustering operation. Next, to characterize each consumer class, a classification model will be constructed with the C5.0 classification algorithm.
Abstract:
This paper describes a methodology that was developed for the classification of Medium Voltage (MV) electricity customers. Starting from a sample of databases resulting from a monitoring campaign, Data Mining (DM) techniques are used to discover a set of typical MV consumer load profiles and, therefore, to extract knowledge regarding electric energy consumption patterns. In the first stage, several hierarchical clustering algorithms were applied and their clustering performance was compared using adequacy measures. In the second stage, a classification model was developed to allow new consumers to be classified into one of the clusters obtained from the previous process. Finally, the interpretation of the discovered knowledge is presented and discussed.
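The first stage above, comparing several hierarchical clustering algorithms with an adequacy measure, can be sketched as below. The abstract does not name the algorithms or the measures, so everything here is an assumption: three standard linkage rules stand in for the algorithms, the mean distance of each profile to its cluster centroid stands in for the adequacy measure (lower means tighter clusters), and the load profiles are hypothetical.

```python
import math

# Linkage rules: how to combine the list of cross-cluster point distances.
LINKAGES = {
    "single": min,                          # closest cross-cluster pair
    "complete": max,                        # farthest cross-cluster pair
    "average": lambda d: sum(d) / len(d),   # mean over all cross-cluster pairs
}

def agglomerate(points, k, linkage):
    """Agglomerative clustering with the named linkage, stopped at k clusters."""
    clusters = [[p] for p in points]
    combine = LINKAGES[linkage]

    def cluster_dist(a, b):
        return combine([math.dist(p, q) for p in a for q in b])

    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def mean_dist_to_centroid(clusters):
    """Stand-in adequacy measure: average distance from a point to its cluster centroid."""
    total, count = 0.0, 0
    for members in clusters:
        center = tuple(sum(c) / len(members) for c in zip(*members))
        total += sum(math.dist(p, center) for p in members)
        count += len(members)
    return total / count

# Hypothetical normalized load profiles, three readings per customer.
profiles = [
    (0.20, 0.90, 0.30), (0.25, 0.85, 0.35),
    (0.80, 0.30, 0.70), (0.75, 0.35, 0.80),
    (0.50, 0.50, 0.50),
]
results = {name: agglomerate(profiles, 2, name) for name in LINKAGES}
scores = {name: mean_dist_to_centroid(cl) for name, cl in results.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(name, round(score, 4))
```

Running every candidate with the same number of clusters keeps the comparison fair, since most compactness measures improve automatically as the number of clusters grows.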