965 results for Genes Regulatory Sequences
Abstract:
MicroRNAs (miRNAs) are major post-transcriptional regulators of gene expression, yet their origins and functional evolution in mammals remain little understood due to the lack of appropriate comparative data. Using RNA sequencing, we have generated extensive and comparable miRNA data for five organs in six species that represent all main mammalian lineages and birds (the evolutionary outgroup) with the aim to unravel the evolution of mammalian miRNAs. Our analyses reveal an overall expansion of miRNA repertoires in mammals, with threefold accelerated birth rates of miRNA families in placentals and marsupials, facilitated by the de novo emergence of miRNAs in host gene introns. Generally, our analyses suggest a high rate of miRNA family turnover in mammals with many newly emerged miRNA families being lost soon after their formation. Selectively preserved mammalian miRNA families gradually evolved higher expression levels, as well as altered mature sequences and target gene repertoires, and were apparently mainly recruited to exert regulatory functions in nervous tissues. However, miRNAs that originated on the X chromosome evolved high expression levels and potentially diverse functions during spermatogenesis, including meiosis, through selectively driven duplication-divergence processes. Overall, our study thus provides detailed insights into the birth and evolution of mammalian miRNA genes and the associated selective forces.
Abstract:
Members of the leucine-rich repeat protein family are involved in diverse functions, including protein phosphatase 2 inhibition, cell cycle regulation, gene regulation and signalling pathways. A novel Schistosoma mansoni gene, called SmLANP, presenting homology to various genes coding for proteins that belong to the superfamily of leucine-rich repeat proteins, was characterized here. SmLANP was 1184 bp in length, as determined from cDNA and genomic sequences, and encoded a 296 amino acid open reading frame spanning from 6 to 894 bp. The predicted amino acid sequence had a calculated molecular weight of 32 kDa. Analysis of the predicted sequence indicated the presence of 3 leucine-rich repeat (LRR) domains located in the N-terminal region and an aspartic acid-rich region at the C-terminal end. The SmLANP transcript is expressed in all stages of the S. mansoni life cycle analyzed, with the highest expression level in males. The SmLANP protein was expressed in a GST expression system, and antibodies against the recombinant protein were raised in mice. An immunolocalization assay using adult worms showed that the protein is mainly present in cell nuclei throughout the whole body and is strongly expressed in the tegument cell body nuclei. As members of this family are usually involved in protein-protein interactions, a yeast two-hybrid assay was conducted to identify putative binding partners for SmLANP. Thirty-six possible partners were identified, and ATP synthase subunit alpha was confirmed by pull-down assays as a binding partner of the SmLANP protein.
Abstract:
OBJECTIVE To study the molecular genetic and clinical features of cerebral cavernous malformations (CCM) in a cohort of Spanish patients. METHODS We analyzed the CCM1, CCM2, and CCM3 genes by MLPA and direct sequencing of exons and intronic boundaries in 94 familial forms and 41 sporadic cases of CCM patients of Spanish extraction. When available, RNA studies were performed to look for alternative or cryptic splicing. RESULTS A total of 26 pathogenic mutations, 22 of which predict truncated proteins, were identified in 29 familial forms and in three sporadic cases. The repertoire includes six novel nonsense and frameshift mutations in CCM1 and CCM3. We also found four missense mutations, one of them located at the third NPXY motif of CCM1 and another one that leads to cryptic splicing of CCM1 exon 6. We found four genomic deletions, with the loss of the whole CCM2 gene in one patient and a partial loss of the CCM1 and CCM2 genes in three other patients. Four families had mutations in CCM3. The results include a high frequency of intronic variants, although most of them lie outside consensus splicing sequences. The main symptoms at clinical onset were cerebral haemorrhage, migraines and epileptic seizures. The rare co-occurrence of CCM with Noonan and Chiari syndromes and delayed menarche is reported. CONCLUSIONS Analysis of CCM genes by sequencing and MLPA detected mutations in almost 35% of a Spanish cohort (36% of familial cases and 10% of sporadic patients). The results include 13 new mutations of CCM genes and the main clinical symptoms that deserve consideration in molecular diagnosis and genetic counselling of cerebral cavernous malformations.
Abstract:
The vast majority of the biology of a newly sequenced genome is inferred from the set of encoded proteins. Predicting this set is therefore invariably the first step after the completion of the genome DNA sequence. Here we review the main computational pipelines used to generate the human reference protein-coding gene sets.
Abstract:
The analysis of conservation between the human and mouse genomes resulted in the identification of a large number of conserved nongenic sequences (CNGs). The functional significance of this nongenic conservation remains unknown, however. The availability of the sequence of a third mammalian genome, the dog, allows for a large-scale analysis of evolutionary attributes of CNGs in mammals. We have aligned 1638 previously identified CNGs and 976 conserved exons (CODs) from human chromosome 21 (Hsa21) with their orthologous sequences in mouse and dog. Attributes of selective constraint, such as sequence conservation, clustering, and direction of substitutions were compared between CNGs and CODs, showing a clear distinction between the two classes. We subsequently performed a chromosome-wide analysis of CNGs by correlating selective constraint metrics with their position on the chromosome and relative to their distance from genes. We found that CNGs appear to be randomly arranged in intergenic regions, with no bias to be closer or farther from genes. Moreover, conservation and clustering of substitutions of CNGs appear to be completely independent of their distance from genes. These results suggest that the majority of CNGs are not typical of previously described regulatory elements in terms of their location. We propose models for a global role of CNGs in genome function and regulation, through long-distance cis or trans chromosomal interactions.
Abstract:
The opportunistic, ubiquitous pathogen Pseudomonas aeruginosa strain PAO1 is a versatile Gram-negative bacterium that has the extraordinary capacity to colonize a wide diversity of ecological niches and to cause severe and persistent infections in humans. To ensure optimal coordination of the genes involved in nutrient utilization, this bacterium uses the NtrB/C and/or the CbrA/B two-component systems to sense nutrient availability and to regulate accordingly the expression of genes involved in nutrient uptake and catabolism. NtrB/C is specialized in nitrogen utilization, while the CbrA/B system is involved in both carbon and nitrogen utilization; both systems activate the expression of their target genes in concert with the alternative sigma factor RpoN. Moreover, the NtrB/C and CbrA/B two-component systems regulate the secondary metabolism of the bacterium, such as the production of virulence factors. In addition to this fine-tuned transcriptional regulation, P. aeruginosa can rapidly modulate its metabolism using small non-coding regulatory RNAs (sRNAs), which regulate gene expression at the post-transcriptional level by diverse and sophisticated mechanisms and contribute to the fast physiological adaptability of this bacterium. In our search for novel RpoN-dependent sRNAs modulating the nutritional adaptation of P. aeruginosa PAO1, we discovered NrsZ (Nitrogen regulated sRNA), a novel RpoN-dependent sRNA that is induced under nitrogen starvation by the NtrB/C two-component system. NrsZ has a unique architecture, formed of three similar stem-loop structures (SL I, II and III) separated by variable spacer sequences. Moreover, this sRNA is processed into short individual stem-loop molecules by internal cleavage involving the endoribonuclease RNase E. Concerning NrsZ functions in P. aeruginosa PAO1, this sRNA was shown to trigger swarming motility and the production of rhamnolipid biosurfactants.
This regulation is due to the NrsZ-mediated activation of rhlA expression, a gene encoding an enzyme essential for swarming motility and rhamnolipid production. Interestingly, the SL I structure of NrsZ alone ensures its regulatory function on rhlA expression, suggesting that the similar SLs are the functional units of this modular sRNA. However, the mechanism by which NrsZ activates rhlA expression remains unclear and is currently being investigated. Additionally, the NrsZ regulatory network was investigated by a transcriptome analysis, which suggested that numerous genes involved in both primary and secondary metabolism are regulated by this sRNA. To emphasize the importance of NrsZ, we investigated its conservation in other Pseudomonas species and demonstrated that NrsZ is conserved and expressed under nitrogen limitation in Pseudomonas protegens Pf-5, Pseudomonas putida KT2442, Pseudomonas entomophila L48 and Pseudomonas syringae pv. tomato DC3000, strains with different ecological features, suggesting an important role for NrsZ in the adaptation of pseudomonads to nitrogen starvation. Interestingly, the architecture of the different NrsZ homologs is similarly composed of SL structures and variable spacer sequences. However, the number of SL repetitions is not identical, and one to six SLs were predicted in the different NrsZ homologs. Moreover, NrsZ is processed into short molecules in all the strains, similarly to what was observed in P. aeruginosa PAO1, and heterologous expression of the NrsZ homologs restored rhlA expression, swarming motility and rhamnolipid production in the P. aeruginosa NrsZ mutant. In many respects, NrsZ is an atypical sRNA in the bacterial panorama. To our knowledge, NrsZ is the first described sRNA induced by the NtrB/C system. Moreover, its unique modular architecture and its processing into similar short SL molecules suggest that NrsZ belongs to a novel family of bacterial sRNAs.
Abstract:
Developmentally regulated mechanisms involving alternative RNA splicing and/or polyadenylation, as well as transcription termination, are implicated in controlling the levels of secreted mu (mu s), membrane mu (mu m) and delta immunoglobulin (Ig) heavy chain mRNAs during B cell differentiation (the mu gene encodes the mu heavy chain). Using expression vectors constructed with genomic DNA segments containing the mu m polyadenylation signal region, we analyzed poly(A) site utilization and termination of transcription in stably transfected myeloma cells and in murine fibroblast L cells. We found that the gene segment containing the mu m poly(A) signals, along with 536 bp of downstream flanking sequence, acted as a transcription terminator in both myeloma cells and L cell fibroblasts. Neither a 141-bp DNA fragment (which directed efficient polyadenylation at the mu m site) nor the 536-bp flanking nucleotide sequence alone was sufficient to obtain similar regulation. This shows that the mu m poly(A) region plays a central role in controlling developmentally regulated transcription termination by blocking downstream delta gene expression. Because this gene segment exhibited the same RNA processing and termination activities in fibroblasts, it appears that these processes are not tissue-specific.
Abstract:
Information about the genomic coordinates and the sequence of experimentally identified transcription factor binding sites is found scattered under a variety of diverse formats. The availability of standard collections of such high-quality data is important to design, evaluate and improve novel computational approaches to identify binding motifs on promoter sequences from related genes. ABS (http://genome.imim.es/datasets/abs2005/index.html) is a public database of known binding sites identified in promoters of orthologous vertebrate genes that have been manually curated from bibliography. We have annotated 650 experimental binding sites from 68 transcription factors and 100 orthologous target genes in human, mouse, rat or chicken genome sequences. Computational predictions and promoter alignment information are also provided for each entry. A simple and easy-to-use web interface facilitates data retrieval allowing different views of the information. In addition, the release 1.0 of ABS includes a customizable generator of artificial datasets based on the known sites contained in the collection and an evaluation tool to aid during the training and the assessment of motif-finding programs.
Abstract:
Several approaches have been developed to estimate both the relative and absolute rates of speciation and extinction within clades based on molecular phylogenetic reconstructions of evolutionary relationships, according to an underlying model of diversification. However, the macroevolutionary models established for eukaryotes have scarcely been used with prokaryotes. We have investigated the rate and pattern of cladogenesis in the genus Aeromonas (γ-Proteobacteria, Proteobacteria, Bacteria) using the sequences of five housekeeping genes and an uncorrelated relaxed-clock approach. To our knowledge, until now this analysis has never been applied to all the species described in a bacterial genus and thus opens up the possibility of establishing models of speciation from sequence data commonly used in phylogenetic studies of prokaryotes. Our results suggest that the genus Aeromonas began to diverge between 248 and 266 million years ago, exhibiting a constant divergence rate through the Phanerozoic, which could be described as a pure birth process.
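As a rough illustration of what a "pure birth process" implies, the sketch below uses the standard Yule model, in which every lineage splits at a constant rate and no lineage goes extinct. It is not the authors' relaxed-clock analysis. The function names and the numeric inputs are hypothetical, chosen only to show the arithmetic; the abstract does not report a species count or a rate.

```python
import math


def expected_lineages(n0: float, rate: float, t: float) -> float:
    """Expected lineage count after time t under a pure-birth (Yule) model.

    With a constant per-lineage splitting rate and no extinction,
    E[N(t)] = n0 * exp(rate * t).
    """
    return n0 * math.exp(rate * t)


def crude_birth_rate(n_tips: float, crown_age: float, n0: float = 2) -> float:
    """Method-of-moments rate estimate: ln(n_tips / n0) / crown_age.

    Starts from n0 = 2 lineages at the crown node. This is a simple
    textbook estimator, not the method used in the study.
    """
    return math.log(n_tips / n0) / crown_age


if __name__ == "__main__":
    # Hypothetical inputs: 30 extant species, crown age of 250 million years.
    rate = crude_birth_rate(n_tips=30, crown_age=250.0)
    print(f"estimated rate per lineage per Myr: {rate:.5f}")
    print(f"expected lineages at 250 Myr: {expected_lineages(2, rate, 250.0):.1f}")
```

Under this model a plot of log lineage count against time is a straight line, which is one common way to read "a constant divergence rate".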
Abstract:
While there is evidence that the two ubiquitously expressed thyroid hormone (T3) receptors, TRalpha1 and TRbeta1, have distinct functional specificities, the mechanism by which they discriminate potential target genes remains largely unexplained. In this study, we demonstrate that the thyroid hormone response elements (TRE) from the malic enzyme and myelin basic protein genes (METRE and MBPTRE) respectively, are not functionally equivalent. The METRE, which is a direct repeat motif with a 4-base pair gap between the two half-site hexamers binds thyroid hormone receptor as a heterodimer with 9-cis-retinoic acid receptor (RXR) and mediates a high T3-dependent activation in response to TRalpha1 or TRbeta1 in NIH3T3 cells. In contrast, the MBPTRE, which consists of an inverted palindrome formed by two hexamers spaced by 6 base pairs, confers an efficient transactivation by TRbeta1 but a poor transactivation by TRalpha1. While both receptors form heterodimers with RXR on MBPTRE, the poor transactivation by TRalpha1 correlates also with its ability to bind efficiently as a monomer. This monomer, which is only observed with TRalpha1 bound to MBPTRE, interacts neither with N-CoR nor with SRC-1, explaining its functional inefficacy. However, in Xenopus oocytes, in which RXR proteins are not detectable, the transactivation mediated by TRalpha1 and TRbeta1 is equivalent and independent of a RXR supply, raising the question of the identity of the thyroid hormone receptor partner in these cells. Thus, in mammalian cells, the binding characteristics of TRalpha1 to MBPTRE (i.e. high monomer binding efficiency and low transactivation activity) might explain the particular pattern of T3 responsiveness of MBP gene expression during central nervous system development.
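The two response-element architectures described above (a direct repeat with a 4-base pair gap, and an inverted palindrome with a 6-base pair spacer) can be sketched as a simple sequence scan. This is only an illustration: the half-site consensus AGGTCA is an assumption taken from the general nuclear receptor literature, not from this abstract, and real elements such as METRE and MBPTRE may deviate from the consensus. The function names are hypothetical.

```python
import re

# Assumed consensus half-site hexamer (not stated in the abstract).
HALF_SITE = "AGGTCA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_direct_repeat(seq: str, gap: int = 4) -> list[int]:
    """Start positions of half-site, `gap` arbitrary bases, half-site (same strand)."""
    pattern = f"{HALF_SITE}[ACGT]{{{gap}}}{HALF_SITE}"
    return [m.start() for m in re.finditer(pattern, seq.upper())]


def find_inverted_palindrome(seq: str, gap: int = 6) -> list[int]:
    """Start positions of two half-sites pointing away from each other.

    The upstream hexamer is the reverse complement of the half-site,
    followed by `gap` arbitrary bases and the forward half-site.
    """
    pattern = f"{revcomp(HALF_SITE)}[ACGT]{{{gap}}}{HALF_SITE}"
    return [m.start() for m in re.finditer(pattern, seq.upper())]


if __name__ == "__main__":
    dr4_like = "TT" + "AGGTCA" + "CTGA" + "AGGTCA" + "GG"
    ip6_like = "A" + "TGACCT" + "AAAAAA" + "AGGTCA"
    print(find_direct_repeat(dr4_like))
    print(find_inverted_palindrome(ip6_like))
```

An exact-match scan like this finds only perfect consensus sites; a position weight matrix would be the usual choice for degenerate natural elements.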
Abstract:
Alternative splicing produces multiple isoforms from the same gene, thus increasing the number of transcripts in a species. Alternative splicing is a virtually ubiquitous mechanism in eukaryotes; for example, more than 90% of human protein-coding genes are alternatively spliced. Recent evolutionary studies showed that alternative splicing is a fast-evolving and highly species-specific mechanism. The rapid evolution of alternative splicing has been considered a contribution to the phenotypic diversity between species. However, the function of many isoforms produced by alternative splicing remains unclear, and they might be the result of noisy splicing. Thus, the functional relevance of alternative splicing and the evolutionary mechanisms of its rapid divergence among species are still poorly understood. During my thesis, I performed a large-scale analysis of the regulatory mechanisms that drive the rapid evolution of alternative splicing. To study the evolution of alternative splicing regulatory mechanisms, I used an extensive RNA-sequencing dataset comprising 12 tetrapod species (human, chimpanzee, bonobo, gorilla, orangutan, macaque, marmoset, mouse, opossum, platypus, chicken and frog) and 8 tissues (cerebellum, brain, heart, kidney, liver, testis, placenta and ovary). To identify the catalogue of alternative splicing cis-acting regulatory elements in the different tetrapod species, I used a previously defined computational approach. This approach is a statistical analysis of exon/intron and splice site composition and relies on a principle of compensation between splice site strength and the presence of additional regulators. With an evolutionary comparative analysis of the exonic cis-acting regulators, I showed that these regulatory elements are generally shared among primates and more conserved than non-regulatory elements. In addition, I showed that the usage of these regulatory elements is also more conserved than expected by chance.
In addition to the identification of species-specific cis-acting regulators, these results may explain the rapid evolution of alternative splicing. I also developed a new approach based on evolutionary sequence changes and corresponding alternative splicing changes to identify potential splicing cis-acting regulators in primates. The identification of lineage-specific substitutions and corresponding lineage-specific alternative splicing changes allowed me to annotate the genomic sequences that might have played a role in the alternative splicing pattern differences among primates. Finally, I showed that the identified splicing cis-acting regulator datasets are enriched in human disease-causing mutations, thus confirming their biological relevance.
Abstract:
Background: Information about the composition of regulatory regions is of great value for designing experiments to functionally characterize gene expression. The multiplicity of available applications to predict transcription factor binding sites in a particular locus contrasts with the substantial computational expertise required to use them, which may constitute a barrier for the experimental community. Results: CBS (Conserved regulatory Binding Sites, http://compfly.bio.ub.es/CBS) is a public platform of evolutionarily conserved binding sites and enhancers predicted in multiple Drosophila genomes, complemented with published chromatin signatures associated with transcriptionally active regions and other experimental sources of information. Rapid access to this novel body of knowledge through a user-friendly web interface enables non-expert users to identify the binding sequences available for any particular gene, transcription factor, or genome region. Conclusions: The CBS platform is a powerful resource that provides tools for data mining individual sequences and groups of co-expressed genes with epigenomics information to conduct regulatory screenings in Drosophila.
Abstract:
The human immunoglobulin lambda variable locus (IGLV) is mapped at chromosome 22 band q11.1-q11.2. The 30 functional germline v-lambda genes sequenced until now have been subgrouped into 10 families (Vl1 to Vl10). The number of Vl genes has been estimated at approximately 70. This locus is formed by three gene clusters (VA, VB and VC) that encompass the variable coding genes (V) responsible for the synthesis of lambda-type Ig light chains, and the Jl-Cl cluster with the joining segments and the constant genes. Recently the entire variable lambda gene locus was mapped by contig methodology and its one-megabase DNA fully sequenced. All the known functional V-lambda genes and pseudogenes were located. We screened a human genomic DNA cosmid library and isolated a clone with an insert of 37 kb (cosmid 8.3) encompassing four functional genes (IGLV7S1, IGLV1S1, IGLV1S2 and IGLV5a), a pseudogene (VlA) and a vestigial sequence (vg1) to study in detail the positions of the restriction sites surrounding the Vl genes. We generated a high-resolution restriction map, locating 31 restriction sites in 37 kb of the VB cluster, a region rich in functional Vl genes. This mapping information opens the perspective for further RFLP studies and sequencing