958 results for PROTEIN-CODING GENES
Abstract:
The obligate intracellular bacterium Chlamydia trachomatis is a human pathogen of major public health significance. Strains can be classified into 15 main serovars (A to L3) that preferentially cause ocular infections (A-C), genital infections (D-K) or lymphogranuloma venereum (LGV) (L1-L3), but the molecular basis behind their distinct tropism, ecological success and pathogenicity is not well defined. Most chlamydial research demands culture in eukaryotic cell lines, but it is not known whether strains become laboratory adapted. Using mainly genomics and transcriptomics, we aimed to investigate the evolutionary patterns underlying the adaptation of C. trachomatis to the different human tissues, with emphasis on the identification of molecular patterns of genes encoding hypothetical proteins, and to understand the adaptive process behind the in vivo to in vitro transition of C. trachomatis. Our results highlight a positive selection-driven evolution of C. trachomatis towards niche-specific adaptation, essentially targeting host-interacting proteins, namely effectors and inclusion membrane proteins, some of which also displayed niche-specific expression patterns. We also identified potential "ocular-specific" pseudogenes, and pointed out the major gene targets of adaptive mutations associated with LGV infections. We further observed that the in vivo-derived genetic makeup of C. trachomatis is not significantly compromised by its long-term laboratory propagation. In contrast, its introduction in vitro has the potential to affect the phenotype, likely yielding virulence attenuation. In fact, we observed a "genital-specific" rampant inactivation of the virulence gene CT135, which may impact the interpretation of data derived from studies requiring culture. Globally, the findings presented in this Ph.D. thesis contribute to the understanding of C. trachomatis adaptive evolution and provide new insights into the biological role of C. trachomatis hypothetical proteins. They also launch research questions for future functional studies aiming to clarify the determinants of tissue tropism, virulence or pathogenic dissimilarities among C. trachomatis strains.
Abstract:
A stringent branch-site codon model was used to detect positive selection in vertebrate evolution. We show that the test is robust to the large evolutionary distances involved. Positive selection was detected in 77% of 884 genes studied. Most positive selection concerns a few sites on a single branch of the phylogenetic tree: Between 0.9% and 4.7% of sites are affected by positive selection depending on the branches. No functional category was overrepresented among genes under positive selection. Surprisingly, whole genome duplication had no effect on the prevalence of positive selection, whether the fish-specific genome duplication or the two rounds at the origin of vertebrates. Thus positive selection has not been limited to a few gene classes, or to specific evolutionary events such as duplication, but has been pervasive during vertebrate evolution.
Abstract:
The vast majority of the biology of a newly sequenced genome is inferred from the set of encoded proteins. Predicting this set is therefore invariably the first step after the completion of the genome DNA sequence. Here we review the main computational pipelines used to generate the human reference protein-coding gene sets.
Abstract:
Protein-coding genes evolve at different rates, and the influence of different parameters, from gene size to expression level, has been extensively studied. While in yeast gene expression level is the major causal factor of gene evolutionary rate, the situation is more complex in animals. Here we investigate these relations further, especially taking into account gene expression in different organs as well as indirect correlations between parameters. We used RNA-seq data from two large datasets, covering 22 mouse tissues and 27 human tissues. Over all tissues, evolutionary rate only correlates weakly with levels and breadth of expression. The strongest explanatory factors of purifying selection are GC content, expression in many developmental stages, and expression in brain tissues. While the main component of evolutionary rate is purifying selection, we also find tissue-specific patterns for sites under neutral evolution and for positive selection. We observe fast evolution of genes expressed in testis, but also in other tissues, notably liver, which is explained by weak purifying selection rather than by positive selection.
Abstract:
Intron splicing is one of the most important steps involved in the maturation process of a pre-mRNA. Although the sequence profiles around the splice sites have been studied extensively, the levels of sequence identity between the exonic sequences preceding the donor sites and the intronic sequences preceding the acceptor sites have not been examined as thoroughly. In this study we investigated identity patterns between the last 15 nucleotides of the exonic sequence preceding the 5' splice site and the intronic sequence preceding the 3' splice site in a set of human protein-coding genes that do not exhibit intron retention. We found that almost 60% of consecutive exons and introns in human protein-coding genes share at least two identical nucleotides at their 3' ends and, on average, the sequence identity length is 2.47 nucleotides. Based on our findings we conclude that the 3' ends of exons and introns tend to have longer identical sequences when taken from the same gene than when taken from different genes. Our results hold even if the pairs are non-consecutive in the transcription order. (C) 2012 Elsevier Ltd. All rights reserved.
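The identity measure described in this abstract can be illustrated with a minimal sketch. It assumes that "sequence identity length" means the number of consecutive matching nucleotides counted backwards from the 3' end of each sequence; the abstract does not spell out the exact definition. The function names and the toy sequences are hypothetical and are not data from the study.

```python
def shared_3prime_identity(exon_tail: str, intron_tail: str) -> int:
    """Count consecutive identical nucleotides, starting at the 3' ends."""
    length = 0
    for a, b in zip(reversed(exon_tail.upper()), reversed(intron_tail.upper())):
        if a != b:
            break
        length += 1
    return length


def summarize(pairs, window=15):
    """Return (fraction of pairs sharing >= 2 nt, mean identity length)."""
    lengths = [
        shared_3prime_identity(exon[-window:], intron[-window:])
        for exon, intron in pairs
    ]
    frac_at_least_two = sum(n >= 2 for n in lengths) / len(lengths)
    mean_length = sum(lengths) / len(lengths)
    return frac_at_least_two, mean_length


# Hypothetical toy (exon 3' end, intron 3' end) pairs, for illustration only.
pairs = [
    ("AAAAAAAAAAATCAG", "CCCCCCCCCCCACAG"),  # share "CAG" (3 nt)
    ("GGGGGGGGGGGGTAG", "TTTTTTTTTTTTCAG"),  # share "AG" (2 nt)
    ("ACACACACACACAAT", "TTTTTTTTTTTTCAG"),  # share nothing (0 nt)
    ("ACACACACACACATG", "TTTTTTTTTTTTCAG"),  # share "G" (1 nt)
]

frac_at_least_two, mean_length = summarize(pairs)
print(frac_at_least_two, mean_length)
```

On real data the pairs would come from annotated exon and intron coordinates, and the same-gene versus different-gene comparison would require a second set of pairs drawn across genes.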
Abstract:
The fate of redundant genes resulting from genome duplication is poorly understood. Previous studies indicated that ribosomal RNA genes from one parental origin are epigenetically silenced during interspecific hybridization or polyploidization. Regulatory mechanisms for protein-coding genes in polyploid genomes are unknown, partly because of difficulty in studying expression patterns of homologous genes. Here we apply amplified fragment length polymorphism (AFLP)–cDNA display to perform a genome-wide screen for orthologous genes silenced in Arabidopsis suecica, an allotetraploid derived from Arabidopsis thaliana and Cardaminopsis arenosa. We identified ten genes that are silenced from either A. thaliana or C. arenosa origin in A. suecica and located in four of the five A. thaliana chromosomes. These genes represent a variety of RNA and predicted proteins including four transcription factors such as TCP3. The silenced genes in the vicinity of TCP3 are hypermethylated and reactivated by blocking DNA methylation, suggesting epigenetic regulation is involved in the expression of orthologous genes in polyploid genomes. Compared with classic genetic mutations, epigenetic regulation may be advantageous for selection and adaptation of polyploid species during evolution and development.
Abstract:
Few promoters are active at high levels in all cells. Of these, the majority encode structural RNAs transcribed by RNA polymerases I or III and are not accessible for the expression of proteins. An exception is the small nuclear RNAs (snRNAs) transcribed by RNA polymerase II. Although snRNA biosynthesis is unique and thought not to be compatible with synthesis of functional mRNA, we have tested these promoters for their ability to express functional mRNAs. We have used the murine U1a and U1b snRNA gene promoters to express the Escherichia coli lacZ gene and the human alpha-globin gene from either episomal or integrated templates by transfection or infection into a variety of mammalian cell types. Equivalent expression of beta-galactosidase was obtained from < 250 nucleotides of 5'-flanking sequence containing the complete promoter of either U1 snRNA gene or from the 750-nt cytomegalovirus promoter and enhancer regions. The mRNA was accurately initiated at the U1 start site, efficiently spliced and polyadenylylated, and localized to polyribosomes. Recombinant adenovirus containing the U1b-lacZ chimeric gene transduced and expressed beta-galactosidase efficiently in human 293 cells and airway epithelial cells in culture. Viral vectors containing U1 snRNA promoters may be an attractive alternative to vectors containing viral promoters for persistent high-level expression of therapeutic genes or proteins.
Abstract:
Background: Myelodysplastic syndromes (MDS) are a group of clonal hematological disorders characterized by ineffective hematopoiesis with morphological evidence of marrow cell dysplasia resulting in peripheral blood cytopenia. Microarray technology has permitted a refined high-throughput mapping of the transcriptional activity in the human genome. Non-coding RNAs (ncRNAs) transcribed from intronic regions of genes are involved in a number of processes related to post-transcriptional control of gene expression, and in the regulation of exon-skipping and intron retention. Characterization of ncRNAs in progenitor cells and stromal cells of MDS patients could be strategic for understanding gene expression regulation in this disease. Methods: In this study, gene expression profiles of CD34(+) cells of 4 patients with MDS of refractory anemia with ringed sideroblasts (RARS) subgroup and stromal cells of 3 patients with MDS-RARS were compared with healthy individuals using 44 k combined intron-exon oligoarrays, which included probes for exons of protein-coding genes, and for non-coding RNAs transcribed from intronic regions in either the sense or antisense strands. Real-time RT-PCR was performed to confirm the expression levels of selected transcripts. Results: In CD34(+) cells of MDS-RARS patients, 216 genes were significantly differentially expressed (q-value <= 0.01) in comparison to healthy individuals, of which 65 (30%) were non-coding transcripts. In stromal cells of MDS-RARS, 12 genes were significantly differentially expressed (q-value <= 0.05) in comparison to healthy individuals, of which 3 (25%) were non-coding transcripts. Conclusions: These results demonstrated, for the first time, the differential ncRNA expression profile between MDS-RARS and healthy individuals, in CD34(+) cells and stromal cells, suggesting that ncRNAs may play an important role during the development of myelodysplastic syndromes.
Abstract:
The clear cell subtype of renal cell carcinoma (RCC) is the most lethal and prevalent cancer of the urinary system. To investigate the molecular changes associated with malignant transformation in clear cell RCC, the gene expression profiles of matched samples of tumor and adjacent non-neoplastic tissue were obtained from six patients. A custom-built cDNA microarray platform was used, comprising 2292 probes that map to exons of genes and 822 probes for noncoding RNAs mapping to intronic regions. Intronic transcription was detected in all normal and neoplastic renal tissues. A subset of 55 transcripts was significantly down-regulated in clear cell RCC relative to the matched nontumor tissue as determined by a combination of two statistical tests and leave-one-out patient cross-validation. Among the down-regulated transcripts, 49 mapped to untranslated or coding exons and 6 were intronic relative to known exons of protein-coding genes. Lower levels of expression of SIN3B, TRIP3, SYNJ2BP and NDE1 (P<0.02), and of intronic transcripts derived from SND1 and ACTN4 loci (P<0.05), were confirmed in clear cell RCC by real-time RT-PCR. A subset of 25 transcripts was deregulated in six additional non-clear cell RCC samples, pointing to common transcriptional alterations in RCC irrespective of the histological subtype or differentiation state of the tumor. Our results indicate a novel set of tumor suppressor gene candidates, including noncoding intronic RNAs, which may play a significant role in malignant transformations of normal renal cells. (C) 2008 Wiley-Liss, Inc.
Abstract:
The nucleotide sequences of four genes encoding Trimeresurus gramineus (green habu snake, Crotalinae) venom gland phospholipase A2 (PLA2; phosphatidylcholine 2-acylhydrolase, EC 3.1.1.4) isozymes were compared internally and externally with those of six genes encoding Trimeresurus flavoviridis (habu snake, Crotalinae) venom gland PLA2 isozymes. The numbers of nucleotide substitutions per site (KN) for the noncoding regions including introns were one-third to one-eighth of the numbers of nucleotide substitutions per synonymous site (KS) for the protein-coding regions of exons, indicating that the noncoding regions are much more conserved than the protein-coding regions. The KN values for the introns were found to be nearly equivalent to those of introns of the T. gramineus and T. flavoviridis TATA box-binding protein genes, which are assumed to be general (nonvenomous) genes. Thus, it is evident that the introns of venom gland PLA2 isozyme genes have evolved at a similar rate to those of nonvenomous genes. The numbers of nucleotide substitutions per nonsynonymous site (KA) were close to or larger than the KS values for the protein-coding regions in venom gland PLA2 isozyme genes. All of the data combined reveal that Darwinian-type accelerated evolution has universally occurred only in the protein-coding regions of Crotalinae snake venom PLA2 isozyme genes.
Abstract:
A number of experimental methods have been reported for estimating the number of genes in a genome, or the closely related coding density of a genome, defined as the fraction of base pairs in codons. Recently, DNA sequence data representative of the genome as a whole have become available for several organisms, making the problem of estimating coding density amenable to sequence analytic methods. Estimates of coding density for a single genome vary widely, so that methods with characterized error bounds have become increasingly desirable. We present a method to estimate the protein coding density in a corpus of DNA sequence data, in which a 'coding statistic' is calculated for a large number of windows of the sequence under study, and the distribution of the statistic is decomposed into two normal distributions, assumed to be the distributions of the coding statistic in the coding and noncoding fractions of the sequence windows. The accuracy of the method is evaluated using known data and application is made to the yeast chromosome III sequence and to C. elegans cosmid sequences. It can also be applied to fragmentary data, for example a collection of short sequences determined in the course of STS mapping.
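The decomposition step described in this abstract can be sketched as a two-component normal mixture fitted by expectation-maximization. This is only an illustration under stated assumptions: the abstract does not say which fitting procedure or which coding statistic the authors used, so EM is a swapped-in standard technique, the synthetic window scores stand in for a real coding statistic, and the convention that the coding component has the higher mean is assumed. All names here are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_normals(values, iterations=200):
    """Fit a 1-D mixture of two normals by EM.

    Returns a list of two (weight, mean, sigma) tuples.
    """
    xs = sorted(values)
    n = len(xs)
    # Crude initialisation: quartiles as starting means, overall spread as sigma.
    mu = [xs[n // 4], xs[(3 * n) // 4]]
    mean_all = sum(xs) / n
    sd_all = math.sqrt(sum((x - mean_all) ** 2 for x in xs) / n)
    sigma = [sd_all, sd_all]
    weight = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of component 1 for each window score.
        resp = []
        for x in xs:
            p0 = weight[0] * normal_pdf(x, mu[0], sigma[0])
            p1 = weight[1] * normal_pdf(x, mu[1], sigma[1])
            total = p0 + p1
            resp.append(p1 / total if total > 0 else 0.5)
        # M-step: re-estimate weights, means and standard deviations.
        n1 = sum(resp)
        n0 = n - n1
        if n0 < 1e-9 or n1 < 1e-9:
            break  # one component collapsed; keep the last valid estimate
        mu = [
            sum((1 - r) * x for r, x in zip(resp, xs)) / n0,
            sum(r * x for r, x in zip(resp, xs)) / n1,
        ]
        var0 = sum((1 - r) * (x - mu[0]) ** 2 for r, x in zip(resp, xs)) / n0
        var1 = sum(r * (x - mu[1]) ** 2 for r, x in zip(resp, xs)) / n1
        sigma = [max(1e-6, math.sqrt(var0)), max(1e-6, math.sqrt(var1))]
        weight = [n0 / n, n1 / n]
    return list(zip(weight, mu, sigma))


# Synthetic stand-in for per-window coding statistics (not real genome data):
# 600 "coding" windows and 1400 "noncoding" windows, i.e. a true density of 0.3.
random.seed(0)
scores = [random.gauss(2.0, 0.5) for _ in range(600)]
scores += [random.gauss(0.0, 0.5) for _ in range(1400)]

components = fit_two_normals(scores)
# Assumption: the component with the higher mean is the coding one.
estimated_density = max(components, key=lambda c: c[1])[0]
print(f"estimated coding density: {estimated_density:.3f}")
```

The mixing weight of the coding component plays the role of the coding density estimate. When the two distributions overlap heavily, the estimate becomes sensitive to initialisation, which is one reason characterized error bounds matter for this kind of method.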
Abstract:
Association studies have revealed expression quantitative trait loci (eQTLs) for a large number of genes. However, the causative variants that regulate gene expression levels are generally unknown. We hypothesized that copy-number variation of sequence repeats contributes to the expression variation of some genes. Our laboratory has previously identified that the rare expansion of a repeat c.-174CGGGGCGGGGCG in the promoter region of the CSTB gene causes a silencing of the gene, resulting in progressive myoclonus epilepsy. Here, we genotyped the repeat length and quantified CSTB expression by quantitative real-time polymerase chain reaction in 173 lymphoblastoid cell lines (LCLs) and fibroblast samples from the GenCord collection. The majority of alleles contain either two or three copies of this repeat. Independent analysis revealed that the c.-174CGGGGCGGGGCG repeat length is strongly associated with CSTB expression (P = 3.14 × 10⁻¹¹) in LCLs only. Examination of both genotyped and imputed single-nucleotide polymorphisms (SNPs) within 2 Mb of CSTB revealed that the dodecamer repeat represents the strongest cis-eQTL for CSTB in LCLs. We conclude that the common two or three copy variation is likely the causative cis-eQTL for CSTB expression variation. More broadly, we propose that polymorphic tandem repeats may represent the causative variation of a fraction of cis-eQTLs in the genome.
Abstract:
HD (Huntington's disease) is a late onset heritable neurodegenerative disorder that is characterized by neuronal dysfunction and death, particularly in the cerebral cortex and medium spiny neurons of the striatum. This is followed by progressive chorea, dementia and emotional dysfunction, eventually resulting in death. HD is caused by an expanded CAG repeat in the first exon of the HD gene that results in an abnormally elongated polyQ (polyglutamine) tract in its protein product, Htt (Huntingtin). Wild-type Htt is largely cytoplasmic; however, in HD, proteolytic N-terminal fragments of Htt form insoluble deposits in both the cytoplasm and nucleus, provoking the idea that mutHtt (mutant Htt) causes transcriptional dysfunction. While a number of specific transcription factors and co-factors have been proposed as mediators of mutHtt toxicity, the causal relationship between these Htt/transcription factor interactions and HD pathology remains unknown. Previous work has highlighted REST [RE1 (repressor element 1)-silencing transcription factor] as one such transcription factor. REST is a master regulator of neuronal genes, repressing their expression. Many of its direct target genes are known or suspected to have a role in HD pathogenesis, including BDNF (brain-derived neurotrophic factor). Recent evidence has also shown that REST regulates transcription of regulatory miRNAs (microRNAs), many of which are known to regulate neuronal gene expression and are dysregulated in HD. Thus repression of miRNAs constitutes a second, indirect mechanism by which REST can alter the neuronal transcriptome in HD. We will describe the evidence that disruption to the REST regulon brought about by a loss of interaction between REST and mutHtt may be a key contributory factor in the widespread dysregulation of gene expression in HD.