2 results for r-functions in AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Abstract:
The main aim of this Ph.D. dissertation is the study of clustering dependent data by means of copula functions, with particular emphasis on microarray data. Copula functions are a popular multivariate modeling tool in every field where multivariate dependence is of great interest, yet their use in clustering has not yet been investigated. The first part of this work reviews the literature on clustering methods, copula functions and microarray experiments. Attention focuses on the K-means (Hartigan, 1975; Hartigan and Wong, 1979), hierarchical (Everitt, 1974) and model-based (Fraley and Raftery, 1998, 1999, 2000, 2007) clustering techniques, whose performance is compared. Then the probabilistic interpretation of Sklar’s theorem (Sklar, 1959), estimation methods for copulas such as Inference Functions for Margins (Joe and Xu, 1996), and the Archimedean and elliptical copula families are presented. Finally, applications of clustering methods and copulas to genetic and microarray experiments are highlighted. The second part contains the original contribution. A simulation study is performed to evaluate how well the K-means and hierarchical bottom-up clustering methods identify clusters according to the dependence structure of the data-generating process. Several simulations are performed by varying different conditions (e.g., the kind of margins (distinct, overlapping and nested) and the value of the dependence parameter), and the results are evaluated by means of different performance measures. In light of the simulation results and of the limitations of the two clustering methods investigated, a new clustering algorithm based on copula functions (‘CoClust’ for short) is proposed. The basic idea and the iterative procedure of CoClust are given, together with a description of the R functions written to implement it and their output.
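For reference, Sklar’s theorem and the two-step Inference Functions for Margins (IFM) estimator mentioned above can be written in standard textbook form. C is unique when the margins are continuous. The notation (H, F_j, f_j, C, c, α_j, θ) is ours and may differ from the thesis.

```latex
% Sklar (1959): a joint CDF H with margins F_1, ..., F_d factorises through a copula C
H(x_1,\dots,x_d) = C\bigl(F_1(x_1),\dots,F_d(x_d)\bigr)

% For continuous margins with densities f_j, the joint density splits accordingly
h(x_1,\dots,x_d) = c\bigl(F_1(x_1),\dots,F_d(x_d)\bigr)\,\prod_{j=1}^{d} f_j(x_j)

% IFM (Joe and Xu, 1996), step 1: fit each margin separately
\hat\alpha_j = \arg\max_{\alpha_j} \sum_{i=1}^{n} \log f_j(x_{ij};\alpha_j), \qquad j = 1,\dots,d

% IFM, step 2: fit the copula parameter with the estimated margins held fixed
\hat\theta = \arg\max_{\theta} \sum_{i=1}^{n} \log c\bigl(F_1(x_{i1};\hat\alpha_1),\dots,F_d(x_{id};\hat\alpha_d);\theta\bigr)
```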
The CoClust algorithm is tested on simulated data (by varying the number of clusters, the copula models, the dependence parameter value and the degree of overlap of the margins), and its performance is compared with that of model-based clustering using different performance measures, such as the percentage of correctly identified numbers of clusters and the non-rejection percentage of H0 on . It is shown that the CoClust algorithm overcomes all the observed limitations of the other clustering techniques investigated and is able to identify clusters according to the dependence structure of the data, regardless of the degree of overlap of the margins and the strength of the dependence. CoClust uses a criterion based on the maximized copula log-likelihood and can account for virtually any dependence relationship between observations. Several distinctive features of CoClust are shown, e.g., its ability to identify the true number of clusters and the fact that it does not require a starting classification. Finally, the CoClust algorithm is applied to the real microarray data of Hedenfalk et al. (2001), both to the gene expressions observed in three different cancer samples and to the columns (tumor samples) of the whole data matrix.
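The CoClust criterion is based on the maximized copula log-likelihood. The thesis's R functions are not reproduced here. As a generic, hypothetical illustration, the Python sketch below fits a bivariate Clayton copula (an Archimedean family) by maximizing its log-likelihood over the dependence parameter θ.

The margins are replaced by rank-based pseudo-observations, a semiparametric swap-in for the parametric margins of IFM. The function names, the golden-section optimizer and the search bounds are all our assumptions. This is not the CoClust procedure itself.

```python
# Illustrative sketch with hypothetical names; NOT the thesis's CoClust R code.
import math
import random


def pseudo_observations(xs):
    """Rank-transform a sample into (0, 1): u_i = rank_i / (n + 1)."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def clayton_loglik(theta, u, v):
    """Log-likelihood of the bivariate Clayton copula density, theta > 0."""
    total = 0.0
    for a, b in zip(u, v):
        s = a ** (-theta) + b ** (-theta) - 1.0
        total += (math.log1p(theta)
                  - (theta + 1.0) * (math.log(a) + math.log(b))
                  - (2.0 + 1.0 / theta) * math.log(s))
    return total


def fit_clayton(x, y, lo=1e-3, hi=30.0, tol=1e-6):
    """Maximise the Clayton pseudo-log-likelihood over theta in [lo, hi].

    Golden-section search; assumes the profile is unimodal on [lo, hi].
    Returns (theta_hat, maximised log-likelihood).
    """
    u, v = pseudo_observations(x), pseudo_observations(y)
    g = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio, about 0.618
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = clayton_loglik(c, u, v), clayton_loglik(d, u, v)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = clayton_loglik(c, u, v)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = clayton_loglik(d, u, v)
    theta = (a + b) / 2.0
    return theta, clayton_loglik(theta, u, v)


def sample_clayton(n, theta, rng):
    """Draw n pairs from a Clayton copula by conditional inversion."""
    pairs = []
    for _ in range(n):
        u, w = 1.0 - rng.random(), 1.0 - rng.random()  # values in (0, 1]
        v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


if __name__ == "__main__":
    rng = random.Random(0)
    pairs = sample_clayton(2000, 2.0, rng)
    theta_hat, ll = fit_clayton([p[0] for p in pairs], [p[1] for p in pairs])
    print(f"theta_hat = {theta_hat:.3f}, max log-likelihood = {ll:.1f}")
```

The golden-section search keeps the sketch dependency-free. Any bounded scalar optimizer would serve equally well.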
Abstract:
With this work I elucidated new and unexpected mechanisms of two potent and highly specific transcription inhibitors: Triptolide and Camptothecin. Triptolide (TPL) is a diterpene epoxide derived from the Chinese plant Tripterygium wilfordii Hook F. TPL inhibits the ATPase activity of XPB, a subunit of the general transcription factor TFIIH. In this thesis I found that the degradation of Rpb1 (the largest subunit of RNA Polymerase II) caused by TPL treatment is preceded by a hyperphosphorylation event at serine 5 of the carboxy-terminal domain (CTD) of Rpb1. This event is concomitant with a block of RNA Polymerase II at the promoters of active genes. The enzyme responsible for the Ser5 hyperphosphorylation is CDK7. Notably, CDK7 downregulation rescued both the Ser5 hyperphosphorylation and the Rpb1 degradation triggered by TPL.
Camptothecin (CPT), derived from the plant Camptotheca acuminata, specifically inhibits topoisomerase 1 (Top1). We first found that CPT induces antisense transcription at divergent CpG island promoters. Interestingly, immunofluorescence experiments showed that CPT induces a burst of R loop structures (DNA/RNA hybrids) at nucleoli and mitochondria. We then investigated the role of Top1 in R loop homeostasis through an RNA interference (RNAi) approach based on short interfering RNAs. Using DNA/RNA immunoprecipitation coupled to next-generation sequencing (NGS), I found that Top1 depletion increases R loop levels genome-wide, and that this increase spans the entire gene body. At a subset of loci, R loops were particularly affected by Top1 depletion: some of these genes showed the formation of new R loop structures, whereas other loci showed a reduction in R loops. Interestingly, new peaks usually appear across the entire gene body of tandem or divergent genes, whereas the loss of R loop peaks seems to be a specific feature of the 3’ end regions of convergent genes.