5 results for Microarray Experiments

in AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevance:

70.00%

Publisher:

Abstract:

The main aim of this Ph.D. dissertation is the study of clustering dependent data by means of copula functions, with particular emphasis on microarray data. Copula functions are a popular multivariate modeling tool in every field where multivariate dependence is of great interest, yet their use in clustering has not yet been investigated. The first part of this work reviews the literature on clustering methods, copula functions and microarray experiments. Attention focuses on the K-means (Hartigan, 1975; Hartigan and Wong, 1979), the hierarchical (Everitt, 1974) and the model-based (Fraley and Raftery, 1998, 1999, 2000, 2007) clustering techniques, because their performance is compared. Then the probabilistic interpretation of Sklar’s theorem (Sklar, 1959), estimation methods for copulas such as Inference for Margins (Joe and Xu, 1996), and the Archimedean and Elliptical copula families are presented. Finally, applications of clustering methods and copulas to genetic and microarray experiments are highlighted. The second part contains the original contribution. A simulation study is performed to evaluate the performance of the K-means and the hierarchical bottom-up clustering methods in identifying clusters according to the dependence structure of the data-generating process. Different simulations are performed by varying several conditions (e.g., the kind of margins (distinct, overlapping and nested) and the value of the dependence parameter), and the results are evaluated by means of different performance measures. In light of the simulation results and of the limits of the two investigated clustering methods, a new clustering algorithm based on copula functions (‘CoClust’ for short) is proposed. The basic idea and the iterative procedure of CoClust are given, together with a description of the R functions written and their output.
The CoClust algorithm is tested on simulated data (by varying the number of clusters, the copula models, the dependence parameter value and the degree of overlap of margins) and is compared with model-based clustering using different performance measures, such as the percentage of cases in which the number of clusters is correctly identified and the non-rejection percentage of H0 on . It is shown that the CoClust algorithm overcomes all the observed limits of the other investigated clustering techniques and identifies clusters according to the dependence structure of the data, independently of the degree of overlap of margins and the strength of the dependence. CoClust uses a criterion based on the maximized log-likelihood function of the copula and can account for virtually any dependence relationship between observations. Several distinctive features of CoClust are shown, e.g. its capability of identifying the true number of clusters and the fact that it does not require a starting classification. Finally, the CoClust algorithm is applied to the real microarray data of Hedenfalk et al. (2001), both to the gene expressions observed in three different cancer samples and to the columns (tumor samples) of the whole data matrix.
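
The abstract above says that CoClust relies on a criterion based on the maximized log-likelihood of a copula. As a minimal sketch of what such a quantity can look like, and not a reproduction of the CoClust procedure itself, the Python below evaluates the log-likelihood of a bivariate Gaussian copula on rank-based normal scores and maximizes it over a grid of correlation values. The choice of the Gaussian family, the grid search, the toy data and all function names are assumptions made for illustration only.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the inverse CDF


def normal_scores(values):
    """Rank-transform a sample to (0, 1), then map it to normal scores.

    Uses rank / (n + 1) pseudo-observations; ties are not handled.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def gaussian_copula_loglik(zx, zy, rho):
    """Log-likelihood of a bivariate Gaussian copula at correlation rho."""
    one_minus = 1.0 - rho * rho
    total = 0.0
    for x, y in zip(zx, zy):
        quad = rho * rho * (x * x + y * y) - 2.0 * rho * x * y
        total += -0.5 * math.log(one_minus) - quad / (2.0 * one_minus)
    return total


def fit_rho(zx, zy, steps=199):
    """Grid search over rho in [-0.99, 0.99] for the largest log-likelihood."""
    grid = [-0.99 + 1.98 * k / (steps - 1) for k in range(steps)]
    best = max(grid, key=lambda r: gaussian_copula_loglik(zx, zy, r))
    return best, gaussian_copula_loglik(zx, zy, best)


# Toy data (assumed): y depends on x, noise does not.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(400)]
y = [0.8 * a + 0.6 * rng.gauss(0, 1) for a in x]
noise = [rng.gauss(0, 1) for _ in range(400)]

rho_dep, ll_dep = fit_rho(normal_scores(x), normal_scores(y))
rho_ind, ll_ind = fit_rho(normal_scores(x), normal_scores(noise))
print(rho_dep, ll_dep)
print(rho_ind, ll_ind)
```

In this sketch a dependent pair yields a much larger maximized log-likelihood than an independent pair, which is the kind of signal a likelihood-based clustering criterion could exploit.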

Relevance:

70.00%

Publisher:

Abstract:

In the past decade, the advent of efficient genome sequencing tools and high-throughput experimental biotechnology has led to enormous progress in the life sciences. Among the most important innovations is microarray technology. It allows the expression of thousands of genes to be quantified simultaneously by measuring the hybridization from a tissue of interest to probes on a small glass or plastic slide. The characteristics of these data include a fair amount of random noise, a predictor dimension in the thousands, and a sample size in the dozens. One of the most exciting areas to which microarray technology has been applied is the challenge of deciphering complex diseases such as cancer. In these studies, samples are taken from two or more groups of individuals with heterogeneous phenotypes, pathologies, or clinical outcomes. These samples are hybridized to microarrays in an effort to find a small number of genes that are strongly correlated with the group of individuals. Even though methods for analysing the data are now well developed and close to reaching a standard organization (through the efforts of dedicated international projects such as the Microarray Gene Expression Data (MGED) Society [1]), it is not infrequent to come across a clinician's question for which no compelling statistical method exists. The contribution of this dissertation to deciphering disease is the development of new approaches aimed at handling open problems posed by clinicians in specific experimental designs. In Chapter 1, starting from a necessary biological introduction, we review microarray technologies and all the important steps of an experiment, from the production of the array to quality control, ending with the preprocessing steps that are used in the data analysis in the rest of the dissertation.
In Chapter 2 a critical review of standard analysis methods is provided, stressing most of their problems. In Chapter 3 a method is introduced to address the issue of unbalanced design in microarray experiments. In microarray experiments, experimental design is a crucial starting point for obtaining reasonable results. In a two-class problem, an equal or similar number of samples should be collected for the two classes. However, in some cases, e.g. rare pathologies, the approach to be taken is less evident. We propose to address this issue by applying a modified version of SAM [2]. MultiSAM consists of a reiterated application of a SAM analysis, comparing the less populated class (LPC) with 1,000 random samplings of the same size from the more populated class (MPC). A list of the differentially expressed genes is generated for each SAM application. After 1,000 reiterations, each probe is given a "score" ranging from 0 to 1,000 based on how often it recurs as differentially expressed in the 1,000 lists. The performance of MultiSAM was compared with that of SAM and LIMMA [3] on two simulated data sets generated from beta and exponential distributions. The results of all three algorithms on low-noise data sets seem acceptable. However, on a real unbalanced two-channel data set regarding chronic lymphocytic leukemia, LIMMA finds no significant probe, SAM finds 23 significantly changed probes but cannot separate the two classes, while MultiSAM finds 122 probes with score >300 and separates the data into two clusters by hierarchical clustering. We also report extra-assay validation in terms of differentially expressed genes. Although standard algorithms perform well on low-noise simulated data sets, MultiSAM seems to be the only one able to reveal subtle differences in gene expression profiles on real unbalanced data. In Chapter 4 a method to address the evaluation of similarities in a three-class problem by means of the Relevance Vector Machine [4] is described.
In fact, when microarray data are viewed in a prognostic and diagnostic clinical framework, differences are not the only thing that can play a crucial role. In some cases similarities can give useful, and sometimes even more important, information. The goal, given three classes, could be to establish, with a certain level of confidence, whether the third one is similar to the first or to the second. In this work we show that the Relevance Vector Machine (RVM) [2] could be a possible solution to the limitations of standard supervised classification. In fact, RVM offers many advantages compared, for example, with its well-known precursor, the Support Vector Machine (SVM) [3]. Among these advantages, the estimate of the posterior probability of class membership is a key feature for addressing the similarity issue. This is a highly important, but often overlooked, option of any practical pattern recognition system. We focused on a three-class tumor grade problem, with 67 samples of grade 1 (G1), 54 samples of grade 3 (G3) and 100 samples of grade 2 (G2). The goal is to find a model able to separate G1 from G3, and then to evaluate the third class, G2, as a test set to obtain the probability that G2 samples belong to class G1 or class G3. The analysis showed that breast cancer samples of grade 2 have a molecular profile more similar to breast cancer samples of grade 1. In the literature this result had been conjectured, but no measure of significance had been given before.
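
The resampling scheme that the abstract above attributes to MultiSAM (repeatedly comparing the less populated class with equally sized random draws from the more populated class, then counting how often each probe is called) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a Welch t-statistic with a fixed cutoff stands in for the SAM analysis, and the function names, the cutoff `t_cut`, the toy data and the reduced iteration count are all hypothetical.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic; a simple stand-in for SAM's statistic, not SAM."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def resampling_scores(lpc, mpc, n_iter=1000, t_cut=3.0, seed=0):
    """Count, per probe, how often it is called in repeated balanced tests.

    lpc, mpc: lists of samples; each sample is a list of probe values.
    Each iteration draws len(lpc) samples from mpc without replacement.
    """
    rng = random.Random(seed)
    n_probes = len(lpc[0])
    lpc_by_probe = [[s[p] for s in lpc] for p in range(n_probes)]
    scores = [0] * n_probes
    for _ in range(n_iter):
        subset = rng.sample(mpc, len(lpc))
        for p in range(n_probes):
            values = [s[p] for s in subset]
            if abs(welch_t(lpc_by_probe[p], values)) > t_cut:
                scores[p] += 1
    return scores


# Toy data (assumed): 5 vs 30 samples, 4 probes, probe 0 shifted in the LPC.
rng = random.Random(1)
mpc = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(30)]
lpc = [[rng.gauss(6, 1)] + [rng.gauss(0, 1) for _ in range(3)] for _ in range(5)]
scores = resampling_scores(lpc, mpc, n_iter=200)
print(scores)
```

Each score is bounded by the number of iterations, so a threshold such as the abstract's "score >300" out of 1,000 corresponds to a probe being called in more than 30% of the balanced comparisons.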

Relevance:

60.00%

Publisher:

Abstract:

Cardiac morphogenesis is a complex process governed by evolutionarily conserved transcription factors and signaling molecules. The Drosophila cardiac tube is linear, made of 52 pairs of cardiomyocytes (CMs), which express specific transcription factor genes that have human homologues implicated in Congenital Heart Diseases (CHDs) (NKX2-5, GATA4 and TBX5). It is composed of a rostral portion named the aorta and a caudal one called the heart, distinguished by morphological and functional differences controlled by Hox genes, key regulators of axial patterning. Overexpression and inactivation of the Hox gene abdominal-A (abd-A), which is expressed exclusively in the heart, revealed that abd-A controls heart identity. The aim of our work is to isolate the heart-specific cis-regulatory sequences of abd-A direct target genes, the realizator genes granting heart identity. In each segment of the heart, four pairs of CMs express tinman (tin), homologous to NKX2-5, and acquire strong contractile and automatic rhythmic activities. By tyramide-amplified FISH, we found that seven genes, encoding ion channels, pumps or transporters, are specifically expressed in the Tin-CMs of the heart. We initially used tools available online to identify their heart-specific cis-regulatory modules by looking for Conserved Non-coding Sequences containing clusters of binding sites for various cardiac transcription factors, including Hox proteins. Based on these data we generated several reporter gene constructs and transgenic embryos, but none of them showed reporter gene expression in the heart. In order to identify additional abd-A target genes, we performed microarray experiments comparing the transcriptomes of aorta versus heart and identified 144 genes overexpressed in the heart.
In order to find the heart-specific cis-regulatory regions of these target genes, we developed a new bioinformatic approach in which prediction is based on pattern matching and order statistics. We first retrieved Conserved Non-coding Sequences from the alignment between the D. melanogaster and D. pseudoobscura genomes. We scored combinations of conserved occurrences of ABD-A, ABD-B, TIN, PNR, dMEF2, MADS box, T-box and E-box sites, and we ranked these results using two independent strategies. On the one hand we ranked the putative cis-regulatory sequences according to the best-scoring ABD-A binding sites; on the other hand we ranked them according to the conservation of binding sites. We then integrated the two independently obtained lists and ranked them again to produce a final rank. We generated nGFP reporter construct flies for in vivo validation. We identified three 1 kb-long heart-specific enhancers. By in vivo and in vitro experiments we are determining whether they are direct abd-A targets, demonstrating the role of a Hox gene in the realization of heart identity. The identified abd-A direct target genes may also be targets of the NKX2-5, GATA4 and/or TBX5 homologues tin, pannier and Doc genes, respectively. The identification of sequences coregulated by a Hox protein and by the homologues of transcription factors causing CHDs will provide a means to test whether these factors function as Hox cofactors granting cardiac specificity to Hox proteins, increasing our knowledge of the molecular mechanisms underlying CHDs. Finally, it may be investigated whether these Hox targets are involved in CHDs.
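
The abstract above describes ranking candidate sequences by two independent criteria and then integrating the two lists into a final rank, without stating the integration rule. The Python below is one plausible sketch, assuming a simple mean-rank rule; the rule itself, the candidate names and the score values are hypothetical and are not taken from the thesis.

```python
def ranks_from_scores(scores):
    """Rank 1 = highest score; ties are broken by name for determinism."""
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: i for i, name in enumerate(ordered, start=1)}


def combine_ranks(rank_a, rank_b):
    """Order candidates by their mean rank across two lists (assumed rule)."""
    mean_rank = {k: (rank_a[k] + rank_b[k]) / 2 for k in rank_a}
    return sorted(mean_rank, key=lambda k: (mean_rank[k], k))


# Hypothetical candidates with a binding-site score and a conservation score.
site_score = {"cns1": 9.1, "cns2": 8.0, "cns3": 7.4, "cns4": 2.5}
conservation = {"cns1": 0.60, "cns2": 0.95, "cns3": 0.90, "cns4": 0.10}

final = combine_ranks(ranks_from_scores(site_score),
                      ranks_from_scores(conservation))
print(final)  # ['cns2', 'cns1', 'cns3', 'cns4']
```

Here `cns2` comes first because it is second by binding-site score and first by conservation, giving the lowest mean rank; other aggregation rules would be equally compatible with the abstract's wording.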

Relevance:

60.00%

Publisher:

Abstract:

Brown rot caused by Monilinia laxa and Monilinia fructigena is considered one of the most important diseases affecting Prunus species. Although some losses can result from rotten fruits in the orchard, most of the damage is caused to fruits during the post-harvest phase. Several studies have reported that brown rot incidence varies greatly during fruit development; it was found that during a period corresponding to the pit hardening stage, fruit susceptibility decreases drastically, to be quickly restored afterwards. However, the molecular basis of this phenomenon is still not well understood. Furthermore, no difference in rot incidence was found between wounded and unwounded fruits, suggesting that resistance is associated more with a specific biochemical response of the fruit than with higher mechanical resistance. So far, the Monilinia-peach interaction has been analyzed through chemical approaches. In this study, a bio-molecular approach was undertaken in order to reveal alterations in gene expression associated with the variation in susceptibility. In this thesis three different methods for gene expression analysis were used to analyze the alterations in gene expression occurring in peach fruits during the pit hardening stage, in a period encompassing the temporary change in Monilinia susceptibility: real time PCR, microarray and cDNA AFLP techniques. In 2005, peach fruits (cv. K2) were harvested weekly during a 19-week-long period, starting from the fourth week after full bloom, until full maturity. At each sampling time, three replicates of 5 fruits each were dipped in the M. laxa conidial suspension or in distilled water, as negative control. The fruits were maintained at room temperature for 3 hours; afterwards, they were peeled with a scalpel; the peel was immediately frozen in liquid nitrogen and transferred to -80 °C until use.
The degree of susceptibility of peach fruit to the pathogen was determined on 3 replicates of 20 fruits each, as the percentage of infected fruits after one week at 20 °C. Real time PCR analysis was performed to study the variation in expression of the genes encoding enzymes of the phenylpropanoid pathway (phenylalanine ammonia lyase (PAL), chalcone synthase (CHS), cinnamate 4-hydroxylase (C4H), leucoanthocyanidine reductase (LAR), hydroxycinnamoyl CoA quinate hydroxycinnamoyl transferase (HQT)) and of the jasmonate pathway, such as lipoxygenase (LOX), both involved in the production of important defense compounds. Alteration in gene expression was monitored weekly on fruit samples, inoculated with M. laxa or mock-inoculated, from a period encompassing the pit hardening stage and the corresponding temporary resistance to M. laxa infections, from the 6th to the 12th week after full bloom (AFB). The data suggest a critical change in the expression level of the phenylpropanoid pathway from the 7th to the 8th week AFB; such a change could be directly physiologically associated with peach growth, and it could indirectly determine the decrease in susceptibility of peach fruit to Monilinia rot during the subsequent weeks. To investigate the transcriptome variation underlying the temporary loss of susceptibility of peach fruits to Monilinia rot, the microarray and the cDNA AFLP techniques were used. The samples harvested in the 8th week AFB (named S, for susceptible) and in the 12th week AFB (named R, for resistant) were compared, both inoculated and mock-inoculated. The microarray experiments were carried out at the University of Padua (Dept. of Environmental Agronomy and Crop Science), using the μPEACH1.0 microarray together with the appropriate protocols. The analysis showed that 30 genes (corresponding to 0.6% of the total sequences (4806) contained in the μPEACH1.0 microarray) were up-regulated and 31 (0.6%) down-regulated in RH vs. SH fruits.
On the other hand, 20 genes (0.4%) were shown to be up-regulated and 13 (0.3%) down-regulated in the RI vs. SI fruits. No genes were found to be differentially expressed in the mock-inoculated resistant fruits (RH) vs. the inoculated resistant ones (RI). Among the up-regulated genes, an ATP sulfurylase, a heat shock protein 70, the major allergen Pru P1, a harpin-inducing protein and S-adenosylmethionine decarboxylase were found; conversely, among the down-regulated ones, cinnamyl alcohol dehydrogenase, a histidine-containing phosphotransfer protein and ferritin were found. The microarray experimental results, and the data indirectly derived from them, were tested by real time PCR analysis. cDNA AFLP analysis was also performed on the same samples. A total of 339 transcript-derived fragments considered significant for Monilinia resistance were selected, sequenced and classified. Genes potentially involved in cell rescue and defence were well represented (8%); several genes (12.1%) involved in protein folding and post-translational modification, and genes (9.2%) involved in cellular transport, were also found. A further 10.3% of genes were classified as involved in the metabolism of amino acids, carbohydrates and fatty acids. In addition, genes involved in protein synthesis (5.7%) and in signal transduction and communication (5.7%) were found. Among the most interesting genes found to be differentially expressed between susceptible and resistant fruits were genes encoding pathogenesis-related (PR) proteins. To investigate the association between Monilinia resistance and PR biological function, the major allergen Pru P1 (GenBank accession AM493970) and its isoform (here named Pru P2) were expressed in a heterologous system and assayed in vitro for their antimicrobial activity. The ribonuclease activity of the recombinant Pru P1 and Pru P2 proteins was assayed against peach total RNA.
Like other PR10 proteins, they showed ribonucleolytic activity, which could be important in countering pathogen penetration. Moreover, the Pru P1 and Pru P2 recombinant proteins were checked for direct antimicrobial activity. No inhibitory effect of Pru P1 or Pru P2 was detected against the selected fungi.
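
The percentages quoted in the abstract above for the microarray comparisons follow directly from the gene counts and the 4806 sequences on the array. A quick arithmetic check in Python, using only the figures given in the abstract (the dictionary labels are shorthand introduced here):

```python
TOTAL_SEQUENCES = 4806  # sequences on the array, as stated in the abstract

# Differentially expressed gene counts reported for each comparison.
counts = {
    "up, RH vs. SH": 30,
    "down, RH vs. SH": 31,
    "up, RI vs. SI": 20,
    "down, RI vs. SI": 13,
}

# Percentage of the array, rounded to one decimal place.
percentages = {label: round(100 * n / TOTAL_SEQUENCES, 1)
               for label, n in counts.items()}
print(percentages)
```

The rounded values are 0.6, 0.6, 0.4 and 0.3 percent, matching the figures reported.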

Relevance:

30.00%

Publisher:

Abstract:

Adhesion, immune evasion and invasion are key determinants of bacterial pathogenesis. Pathogenic bacteria possess a wide variety of surface-exposed and secreted proteins which allow them to adhere to tissues, escape the immune system and spread throughout the human body. Therefore, extensive contacts between the human and the bacterial extracellular proteomes take place at the host-pathogen interface at the protein level. Recent research has emphasized the importance of a global and deeper understanding of the molecular mechanisms that underlie bacterial immune evasion and pathogenesis. Through the use of a large-scale, unbiased, protein microarray-based approach and of wide libraries of purified human and bacterial proteins, novel host-pathogen interactions were identified. This approach was first applied to Staphylococcus aureus, the cause of a wide variety of diseases ranging from skin infections to endocarditis and sepsis. The screening led to the identification of several novel interactions between the human and the S. aureus extracellular proteomes. The interaction between the S. aureus immune evasion protein FLIPr (formyl peptide receptor-like 1 inhibitory protein) and the human complement component C1q, key players in the offense-defense struggle, was characterized using label-free techniques and functional assays. The same approach was also applied to Neisseria meningitidis, a major cause of bacterial meningitis and fulminant sepsis worldwide. The screening led to the identification of several potential human receptors for the neisserial adhesin A (NadA), an important adhesion protein and key determinant of meningococcal interactions with the human host at various stages. The interaction between NadA and human LOX-1 (lectin-like oxidized low-density lipoprotein receptor 1) was confirmed using label-free technologies and cell binding experiments in vitro. Taken together, these two examples provided concrete insights into S. aureus and N. meningitidis pathogenesis, and identified protein microarrays coupled with appropriate validation methodologies as a powerful large-scale tool for studies of host-pathogen interactions.