The objective of this study was to comprehensively compare the genomic profiles in the breast of parous and nulliparous postmenopausal women to identify genes that permanently change their expression following pregnancy. The study was designed as a two-phase approach. In the discovery phase, we compared breast genomic profiles of 37 parous with 18 nulliparous postmenopausal women. In the validation phase, confirmation of the genomic patterns observed in the discovery phase was sought in an independent set of 30 parous and 22 nulliparous postmenopausal women. RNA was hybridized to Affymetrix HG_U133 Plus 2.0 oligonucleotide arrays containing probes to 54,675 transcripts, scanned and the images analyzed using Affymetrix GCOS software. Surrogate variable analysis, logistic regression, and significance analysis of microarrays were used to identify statistically significant differences in expression of genes. The false discovery rate (FDR) approach was used to control for multiple comparisons. We found that 208 genes (305 probe sets) were differentially expressed between parous and nulliparous women in both discovery and validation phases of the study at an FDR of 10% and with at least a 1.25-fold change. These genes are involved in regulation of transcription, centrosome organization, RNA splicing, cell-cycle control, adhesion, and differentiation. The results provide initial evidence that full-term pregnancy induces long-term genomic changes in the breast. The genomic signature of pregnancy could be used as an intermediate marker to assess potential chemopreventive interventions with hormones mimicking the effects of pregnancy for prevention of breast cancer.
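The selection rule described above (an FDR of 10% combined with at least a 1.25-fold change) can be illustrated with a minimal sketch. It uses the Benjamini-Hochberg procedure as a stand-in, which is an assumption: the study itself relied on surrogate variable analysis, logistic regression, and significance analysis of microarrays. The function names and the `(probe_id, p_value, fold_change)` record layout are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def select_probes(records, fdr=0.10, min_fold=1.25):
    """Keep probe IDs with q <= fdr and a fold change of at least min_fold
    in either direction. `records` holds (probe_id, p_value, fold_change)
    tuples; this layout is a hypothetical illustration."""
    qvalues = benjamini_hochberg([p for _, p, _ in records])
    selected = []
    for (probe_id, _, fold), q in zip(records, qvalues):
        magnitude = max(fold, 1.0 / fold)  # treat 0.8x like 1.25x
        if q <= fdr and magnitude >= min_fold:
            selected.append(probe_id)
    return selected
```

Under this sketch, a two-phase design would call `select_probes` once per phase and keep only the probe sets returned by both calls.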


Background: Arbuscular mycorrhizal fungi (AMF) form symbiotic relationships with most plants through hyphal networks that associate with the roots of their hosts. Previous studies have revealed extreme levels of genetic variation at specific loci, suggesting that AMF may contain thousands of genetically divergent nuclei within a single cytoplasm. Although no sexual reproduction has so far been observed in these mycorrhizal fungi, high levels of genetic variation can nevertheless be maintained both by the exchange of nuclei between hyphae and by frequent recombination between nuclei. AMF propagate through spores, each of which contains a sample of an initial population of heterogeneous nuclei inherited directly from the parent mycelium. To our knowledge, AMF are the only organisms that never pass through a mononuclear stage, which allows nuclei to diverge genetically within the same cytoplasm. These unusual aspects of AMF biology make it difficult to estimate their genetic diversity. This is a major challenge for ecologists in the field as well as for molecular biologists in the laboratory. Even beyond questions of species diversity, the extent of polymorphism among mycorrhizal nuclei is poorly known. The work presented in this thesis therefore explores the different aspects of the unusual genomic architecture of AMF. Results: The extent of intra-isolate polymorphism had already been observed for the large subunit ribosomal RNA of the isolate Glomus irregulare DAOM-197198 (previously identified as G. intraradices) and for the polymerase1-like (PLS) gene of Glomus etunicatum isolate NPI. 
First, we confirmed these results and also found that these variants were transcribed. We then demonstrated a genetic bottleneck at sporulation for the PLS locus in G. etunicatum, illustrating the strong sampling effects that occur between spore generations. Finally, we estimated the genetic differentiation of AMF using both gene networks applied to high-throughput sequencing data and five new single-copy genomic markers. These analyses reveal that genomic differentiation is systematically present in two species (G. irregulare and G. diaphanum). Conclusions: The results of this thesis provide further evidence for the scenario of genomic differentiation among nuclei within the same mycorrhizal isolate. Thus, at least three members of the genus Glomus, G. irregulare, G. diaphanum and G. etunicatum, appear to be organisms whose genome organization cannot be described by a strict Mendelian model, which supports the hypothesis that genetically differentiated mycorrhizal nuclei form a pangenome.
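The sampling effect at sporulation mentioned above can be made concrete with a small worked formula. The sketch below assumes that each spore receives nuclei drawn independently from the parent population (a simple binomial model, which the thesis abstract does not specify); the function names, variant frequencies, and nuclei counts are hypothetical.

```python
def loss_probability(freq, n_nuclei):
    """Probability that a variant carried by a fraction `freq` of parental
    nuclei is absent from a spore that receives `n_nuclei` nuclei, assuming
    independent draws (an assumption, not the thesis's model)."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must lie in [0, 1]")
    if n_nuclei < 0:
        raise ValueError("n_nuclei must be non-negative")
    return (1.0 - freq) ** n_nuclei


def expected_variants_retained(freqs, n_nuclei):
    """Expected number of distinct variants present in one spore."""
    return sum(1.0 - loss_probability(f, n_nuclei) for f in freqs)
```

Under this model, rare variants are lost from individual spores far more often than common ones, which is the kind of bottleneck a per-spore survey of a polymorphic locus would detect.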


The genome structure of Colletotrichum lindemuthianum in a set of diverse isolates was investigated using a combination of physical and molecular approaches. Flow cytometric measurement of genome size revealed significant variation between strains, with the smallest genome representing 59% of the largest. Southern-blot profiles of a cloned fungal telomere revealed a total chromosome number varying from 9 to 12. Chromosome separations using pulsed-field gel electrophoresis (PFGE) showed that these chromosomes belong to two distinct size classes: a variable number of small (< 2.5 Mb) polymorphic chromosomes and a set of unresolved chromosomes larger than 7 Mb. Two dispersed repeat elements were shown to cluster on distinct polymorphic minichromosomes. Single-copy flanking sequences from these repeat-containing clones specifically marked distinct small chromosomes. These markers were absent in some strains, indicating that part of the observed variability in genome organization may be explained by the presence or absence, in a given strain, of dispensable genomic regions and/or chromosomes.


The genome of the most virulent among 22 Brazilian geographical isolates of Spodoptera frugiperda nucleopolyhedrovirus, isolate 19 (SfMNPV-19), was completely sequenced and shown to comprise 132 565 bp and 141 open reading frames (ORFs). A total of 11 ORFs with no homology to genes in the GenBank database were found. Of those, four had typical baculovirus promoter motifs and polyadenylation sites. Computer-simulated restriction enzyme cleavage patterns of SfMNPV-19 were compared with published physical maps of other SfMNPV isolates. Differences were observed in terms of the restriction profiles and genome size. Comparison of SfMNPV-19 with the sequence of the SfMNPV isolate 3AP2 indicated that they differed due to a 1427 bp deletion, as well as by a series of smaller deletions and point mutations. The majority of genes of SfMNPV-19 were conserved in the closely related Spodoptera exigua NPV (SeMNPV) and Agrotis segetum NPV (AgseMNPV-A), but a few regions experienced major changes and rearrangements. Syntenic maps for the genomes of group II NPVs revealed that gene collinearity was observed only within certain clusters. Analysis of the dynamics of gene gain and loss along the phylogenetic tree of the NPVs showed that group II had only five defining genes and supported the hypothesis that these viruses form ten highly divergent ancient lineages. Crucially, more than 60% of the gene gain events followed a power-law relation to genetic distance among baculoviruses, indicative of temporal organization in the gene accretion process.
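The computer-simulated restriction cleavage mentioned above can be sketched as a simple in silico digest. This is a minimal illustration, not the tool used in the study: the function name is hypothetical, the genome is treated as circular by default (baculovirus genomes are circular double-stranded DNA), and only palindromic recognition sites are handled, so a single-strand search suffices.

```python
def fragment_lengths(genome, site, cut_offset, circular=True):
    """Return fragment lengths (largest first) after cutting `genome` at every
    occurrence of a palindromic recognition `site`. `cut_offset` is the cut
    position within the site, e.g. 1 for G^AATTC."""
    genome = genome.upper()
    site = site.upper()
    n = len(genome)
    # On a circular molecule a site may span the arbitrary sequence origin.
    search = genome + genome[: len(site) - 1] if circular else genome
    cuts = set()
    start = search.find(site)
    while start != -1 and start < n:
        cut = start + cut_offset
        cuts.add(cut % n if circular else cut)
        start = search.find(site, start + 1)
    cuts = sorted(cuts)
    if not cuts:
        return [n]  # uncut molecule
    if circular:
        # Each fragment runs from one cut to the next, wrapping at the end.
        lengths = [
            (cuts[(i + 1) % len(cuts)] - cuts[i]) % n or n
            for i in range(len(cuts))
        ]
    else:
        bounds = [0] + cuts + [n]
        lengths = [b - a for a, b in zip(bounds, bounds[1:]) if b > a]
    return sorted(lengths, reverse=True)
```

Comparing the sorted fragment lists predicted for two genome sequences, or a predicted list against a published physical map, is one way to spot the kind of profile and size differences described above.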


Legumes develop root nodules from pluripotent stem cells in the root pericycle in response to mitogenic activation by a decorated chitin-like nodulation factor synthesized in Rhizobium bacteria. The soybean genes encoding the receptor for such signals were cloned using map-based cloning approaches. Pluripotent cells in the root pericycle and the outer or inner cortex undergo repeated cell divisions to initiate a composite nodule primordium that develops to a functional nitrogen-fixing nodule. The process itself is autoregulated, leading to the characteristic nodulation of the upper root system. Autoregulation of nodulation (AON) in all legumes is controlled in part by a leucine-rich repeat receptor kinase gene (GmNARK). Mutations of GmNARK, and its other legume orthologues, result in abundant nodulation caused by the loss of a yet-undefined negative nodulation repressor system. AON receptor kinases are involved in perception of a long-distance, root-derived signal, to negatively control nodule proliferation. GmNARK and LjHAR1 are expressed in phloem parenchyma. GmNARK kinase domain interacts with Kinase Associated Protein Phosphatase (KAPP). NARK gene expression did not mirror biological NARK activity in nodulation control, as q-RT-PCR in soybean revealed high NARK expression in roots, root tips, leaves, petioles, stems and hypocotyls, while shoot and root apical meristems were devoid of NARK RNA. High-throughput transcript analysis in soybean leaf and root indicated that major genes involved in JA synthesis or response are preferentially down-regulated in leaf but not root of wild type, but not NARK mutants, suggesting that AON signaling may in part be controlled by events relating to hormone metabolism. Ethylene and abscisic acid insensitive mutants of L. japonicus are described. 
Nodulation in legumes has significance to global economies and ecologies, as the nitrogen input into the biosphere allows food, feed and biofuel production without the inherent costs associated with nitrogen fertilization [1]. Nodulation involves the production of a new organ capable of nitrogen fixation [2] and as such is an excellent system to study plant-microbe interaction, plant development, long-distance signaling and functional genomics of stem cell proliferation [3, 4]. Concerted international effort over the last 20 years, using a combination of induced mutagenesis followed by gene discovery (forward genetics) and molecular/biochemical approaches, has revealed a complex developmental pathway that ‘borrows’ genetic programs from various sources and orchestrates these into a novel contribution. We report our laboratory’s contribution to the present analysis in the field.


Apple consumption is highly recommended for a healthy diet, and apple is the most important fruit produced in temperate climate regions. Unfortunately, it is also one of the fruits that most often provokes allergy in atopic patients, and the only treatment available to date for these apple-allergic patients is avoidance. Apple allergy is due to the presence of four major classes of allergens: Mal d 1 (PR-10/Bet v 1-like proteins), Mal d 2 (thaumatin-like proteins), Mal d 3 (lipid transfer protein) and Mal d 4 (profilin). In this work, new advances in the characterization of apple allergen gene families were achieved using a multidisciplinary approach. First, a genomic approach was used to characterize the allergen gene families of Mal d 1 (task of Chapter 1), Mal d 2 and Mal d 4 (task of Chapter 5). In particular, in Chapter 1 the study of two large contiguous blocks of DNA sequence containing the Mal d 1 gene cluster on LG16 yielded many new findings on the number and orientation of genes in the cluster, their physical distances, their regulatory sequences and the presence of other genes or pseudogenes in this genomic region. Three new members were discovered co-localizing with the other Mal d 1 genes of LG16, suggesting that the complexity of the genetic basis of allergenicity will increase with new advances. Many retrotransposon elements were also found in this cluster. Thanks to the development of molecular markers on the two sequences, the physical and genetic maps of the region were successfully anchored. Moreover, in Chapter 5 the existence of loci for the thaumatin-like protein family in apple (Mal d 2.03 on LG4 and Mal d 2.02 on LG17) other than the one reported so far was demonstrated for the first time. In addition, one new locus for profilins (Mal d 4.04) was mapped on LG2, close to the Mal d 4.02 locus, suggesting a cluster organization for this gene family, as is well documented for the Mal d 1 family. 
Secondly, a methodological approach was used to set up a highly specific tool to discriminate and quantify the expression of each Mal d 1 allergen gene (task of Chapter 2). In particular, a set of 20 Mal d 1 gene-specific primer pairs for quantitative real-time PCR was validated and optimized. As a first application, this tool was used on leaf and fruit tissues of the cultivar Florina to identify the Mal d 1 allergen genes that are expressed in different tissues. The differential expression found in this study revealed tissue specificity for some Mal d 1 genes: 10/20 Mal d 1 genes were expressed in fruits and are therefore probably more involved in allergic reactions, while 17/20 Mal d 1 genes were expressed in leaves challenged with the fungus Venturia inaequalis and are therefore probably of interest for the study of plant defense mechanisms. In Chapter 3 the specific expression levels of the 10 Mal d 1 isoallergen genes found to be expressed in fruits were studied for the first time in skin and flesh of apples of different genotypes. A complex gene expression profile was obtained owing to the high gene, tissue and genotype variability. Despite this, the expression patterns of Mal d 1.06A and Mal d 1.07 were particularly associated with the degree of allergenicity of the different cultivars. They were not the most highly expressed Mal d 1 genes in apple, but it is hypothesized here that both qualitative and quantitative aspects of Mal d 1 gene expression are important in determining allergenicity. In Chapter 4 a clear modulation of all 17 PR-10 genes tested in young leaves of Florina after challenge with the fungus V. inaequalis is reported, with a distinct expression profile for each gene. Interestingly, all the Mal d 1 genes were up-regulated except Mal d 1.10, which was down-regulated after challenge with the fungus. 
The differences in direction, timing and magnitude of induction seem to confirm the hypothesis of subfunctionalization within the gene family despite high sequence and structural similarity. Moreover, the modulation of PR-10 genes shown in both compatible (Gala-V. inaequalis) and incompatible (Florina-V. inaequalis) interactions helps to validate the hypothesis of an indirect role for at least some of these proteins in induced defense responses. Finally, a certain modulation of PR-10 transcripts found also in leaves treated with water confirms their ability to respond to abiotic stress as well. To conclude, the genomic approach used here made it possible to create a comprehensive inventory of all the genes of the allergen families, especially in the case of extended gene families like Mal d 1. This knowledge can be considered a basic prerequisite for many further studies. On the other hand, the specific transcriptional approach made it possible to evaluate the behavior of the Mal d 1 genes in different samples and conditions and therefore to speculate on their involvement in apple allergenicity. Considering the dual nature of Mal d 1 proteins, as apple allergens and as PR-10 proteins, the gene expression analysis upon fungal attack laid the basis for unravelling the biological functions of Mal d 1. In particular, the knowledge acquired in this work about the PR-10 genes putatively more involved in the specific Malus-V. inaequalis interaction will be helpful, in the future, to steer apple breeding towards hypoallergenic genotypes without compromising the plants' response mechanisms to stress conditions. In the future, the survey of the differences in allergenicity among cultivars has to be made more thorough by including other genotypes and allergic patients in the tests. 
After this, allelic diversity analysis of the high- and low-allergenic cultivars for all the allergen genes, in particular those whose transcription levels correlate with allergenicity, will provide the genetic background of the low-allergenic ones. This step from genes to alleles will allow the development of molecular markers for them that might be used to effectively direct apple breeding towards hypoallergenicity. Another important step forward for the study of apple allergens will be the use of a specific proteomic approach, since apple allergy is a multifactorial disease and only an interdisciplinary and integrated approach can be effective for its prevention and treatment.


Here I will focus on three main topics that best cover the projects I worked on during my three-year PhD, which I spent in different research laboratories addressing, both computationally and practically, important problems in modern molecular genomics. The first topic is the use of a livestock species (pig) as a model of obesity, a complex human dysfunction. My efforts here concern the detection and annotation of single nucleotide polymorphisms (SNPs). I developed a pipeline for mining human and porcine sequences. Starting from a set of human genes related to obesity, the platform returns a list of annotated porcine SNPs extracted from a new set of potential obesity genes. 565 of these SNPs were analyzed on an Illumina chip to test their involvement in obesity in a population of more than 500 pigs. Results will be discussed. All the computational analyses and experiments were done in collaboration with the Biocomputing group and Dr. Luca Fontanesi, respectively, under the direction of Prof. Rita Casadio at the University of Bologna, Italy. The second topic concerns developing a methodology, based on factor analysis, to simultaneously mine information from different levels of biological organization. With specific test cases we develop models of the complexity of the mRNA-miRNA molecular interaction in brain tumors measured indirectly by microarray and quantitative PCR. This work was done under the supervision of Prof. Christine Nardini at the “CAS-MPG Partner Institute for Computational Biology” in Shanghai, China (co-founded by the Max Planck Society and the Chinese Academy of Sciences). The third topic concerns the development of a new method to overcome the variety of PCR technologies routinely adopted to characterize unknown flanking DNA regions of a viral integration locus in the human genome after clinical gene therapy. 
This new method is based entirely on next-generation sequencing; it reduces the time required to detect insertion sites and decreases the complexity of the procedure. This work was done in collaboration with the group of Dr. Manfred Schmidt at the Nationales Centrum für Tumorerkrankungen (Heidelberg, Germany), supervised by Dr. Annette Deichmann and Dr. Ali Nowrouzi. Furthermore, I add as an Appendix the description of an R package for gene network reconstruction that I helped to develop for scientific use (http://www.bioconductor.org/help/bioc-views/release/bioc/html/BUS.html).


A series of human-rodent somatic cell hybrids were investigated by Southern blot analysis for the presence or absence of twenty-six molecular markers and three isozyme loci from human chromosome 19. Based on the co-retention of these markers in the various independent hybrid clones containing portions of human chromosome 19 and on pulsed field mapping, chromosome 19 is divided into twenty ordered regions. The most likely marker order for the chromosome is: (LDLR, C3)-(cen-MANNB)-D19S7-PEPD-D19S9-GPI-TGFβ-(CYP2A, NCA, CGM2, BCKAD)-PSG1a-(D19S8, XRCC1)-(D19S19, ATP1A3)-(D19S37, APOC2)-CKMM-ERCC2-ERCC1-(D19S62, D19S51)-D19S6-D19S50-D19S22-(CGB, FTL)-qter. The region of 19q between the proximal marker D19S7 and the distal gene coding for the beta subunit of chorionic gonadotropin (CGB) is about 37 Mb in size and covers about 37 cM genetic distance. The ratio of genetic to physical distance on 19q is therefore very close to the genomic average of 1 cM/Mb. Estimates of physical distances for intervals between chromosome 19 markers were calculated using a mapping function which estimates distances based on the number of breaks in hybrid clone panels. The consensus genetic distances between individual markers (established at HBM10) were compared to these estimates of physical distances. The close agreement between the two estimates suggested that spontaneously broken hybrids are as appropriate for this type of study as radiation hybrids. All three DNA repair genes located on chromosome 19 were found to have homologues on Chinese hamster chromosome 9, which is hemizygous in CHO cells, providing an explanation for the apparent ease with which mutations at these loci were identified in CHO cells. Homologues of CKMM and TGFβ (from human chromosome 19q) and a mini-satellite DNA specific to the distal region of human chromosome 19q were also mapped to Chinese hamster chromosome 9. Markers from 19p did not map to this hamster chromosome. 
Thus the q-arm of chromosome 19, at least between the genes PEPD and ERCC1, appears to be a linkage group which is conserved intact between humans and Chinese hamsters.


Previous restriction analysis of cloned equine DNA and genomic DNA of equine peripheral blood mononuclear cells had indicated the existence of one c epsilon, one c alpha and up to six c gamma genes in the haploid equine genome. The c epsilon and c alpha genes have been aligned on a 30 kb DNA fragment in the order 5' c epsilon-c alpha 3'. Here we describe the alignment of the equine c mu and c gamma genes by deletion analysis of one IgM, four IgG and two equine light chain expressing heterohybridomas. This analysis establishes the existence of six c gamma genes per haploid genome. The genomic alignment of the cH-genes is 5' c mu/(/) c gamma 1/(/) c gamma 2/(/) c gamma 3/(/) c gamma 4/(/) c gamma 5/(/) c gamma 6/(/) c epsilon-c alpha 3', naming the c gamma genes according to their position relative to c mu. For three of the c gamma genes the corresponding IgG isotypes could be identified as IgGa for c gamma 1, IgG(T) for c gamma 3 and IgGb for c gamma 4.


Organization of transgenes in rice transformed through direct DNA transfer strongly suggests a two-phase integration mechanism. In the “preintegration” phase, transforming plasmid molecules (either intact or partial) are spliced together. This gives rise to rearranged transgenic sequences, which upon integration do not contain any interspersed plant genomic sequences. Subsequently, integration of transgenic DNA into the host genome is initiated. Our experiments suggest that the original site of integration acts as a hot spot, facilitating subsequent integration of successive transgenic molecules at the same locus. The resulting transgenic locus may have plant DNA separating the transgenic sequences. Our data indicate that transformation through direct DNA transfer, specifically particle bombardment, generally results in a single transgenic locus as a result of this two-phase integration mechanism. Transgenic plants generated through such processes may, therefore, be more amenable to breeding programs as the single transgenic locus will be easier to characterize genetically. Results from direct DNA transfer experiments suggest that in the absence of protein factors involved in exogenous DNA transfer through Agrobacterium, the qualitative and/or quantitative efficiency of transformation events is not compromised. Our results cast doubt on the role of Agrobacterium vir genes in the integration process.


A set of oat–maize chromosome addition lines with individual maize (Zea mays L.) chromosomes present in plants with a complete oat (Avena sativa L.) chromosome complement provides a unique opportunity to analyze the organization of centromeric regions of each maize chromosome. A DNA sequence, MCS1a, described previously as a maize centromere-associated sequence, was used as a probe to isolate cosmid clones from a genomic library made of DNA purified from a maize chromosome 9 addition line. Analysis of six cosmid clones containing centromeric DNA segments revealed a complex organization. The MCS1a sequence was found to comprise a portion of the long terminal repeats of a retrotransposon-like repeated element, termed CentA. Two of the six cosmid clones contained regions composed of a newly identified family of tandem repeats, termed CentC. Copies of CentA and tandem arrays of CentC are interspersed with other repetitive elements, including the previously identified maize retroelements Huck and Prem2. Fluorescence in situ hybridization revealed that CentC and CentA elements are limited to the centromeric region of each maize chromosome. The retroelements Huck and Prem2 are dispersed along all maize chromosomes, although Huck elements are present in an increased concentration around centromeric regions. Significant variation in the size of the blocks of CentC and in the copy number of CentA elements, as well as restriction fragment length variations were detected within the centromeric region of each maize chromosome studied. The different proportions and arrangements of these elements and likely others provide each centromeric region with a unique overall structure.