952 results for DNA sequence
Abstract:
In a number of programs for gene structure prediction in higher eukaryotic genomic sequences, exon prediction is decoupled from gene assembly: a large pool of candidate exons is predicted and scored from features located in the query DNA sequence, and candidate genes are assembled from such a pool as sequences of nonoverlapping frame-compatible exons. Genes are scored as a function of the scores of the assembled exons, and the highest scoring candidate gene is assumed to be the most likely gene encoded by the query DNA sequence. Considering additive gene scoring functions, currently available algorithms to determine such a highest scoring candidate gene run in time proportional to the square of the number of predicted exons. Here, we present an algorithm whose running time grows only linearly with the size of the set of predicted exons. Polynomial algorithms rely on the fact that, while scanning the set of predicted exons, the highest scoring gene ending in a given exon can be obtained by appending the exon to the highest scoring among the highest scoring genes ending at each compatible preceding exon. The algorithm here relies on the simple fact that such a highest scoring gene can be stored and updated. This requires scanning the set of predicted exons simultaneously by increasing acceptor and donor position. On the other hand, the algorithm described here does not assume an underlying gene structure model. Instead, valid gene structures are defined externally in the so-called Gene Model. The Gene Model simply specifies which gene features are allowed immediately upstream of which other gene features in valid gene structures. This allows for great flexibility in formulating the gene identification problem. In particular, it allows for multiple-gene two-strand predictions and for considering gene features other than coding exons (such as promoter elements) in valid gene structures.
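The store-and-update idea in this abstract can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the authors' implementation: it ignores reading-frame compatibility, strands, and the Gene Model, and treats two exons as compatible whenever the first ends before the second begins. The names `Exon` and `best_gene` are hypothetical. The scan itself is linear in the number of exons; the two sorts cost O(n log n) unless the exons already arrive ordered by acceptor and by donor position.

```python
from typing import List, NamedTuple, Tuple


class Exon(NamedTuple):
    acceptor: int  # start position (inclusive)
    donor: int     # end position (inclusive), donor >= acceptor
    score: float


def best_gene(exons: List[Exon]) -> Tuple[float, List[Exon]]:
    """Return the highest additive score and the exon chain achieving it.

    Simplifying assumption: exon a may precede exon b iff a.donor < b.acceptor.
    """
    n = len(exons)
    if n == 0:
        return 0.0, []

    # Two views of the same exon set: by increasing acceptor and by increasing donor.
    by_acceptor = sorted(range(n), key=lambda i: exons[i].acceptor)
    by_donor = sorted(range(n), key=lambda i: exons[i].donor)

    best_score = [0.0] * n  # score of the best gene ending at exon i
    prev = [-1] * n         # predecessor of exon i in that gene (-1 = none)
    running_best = -1       # exon ending the best gene seen so far upstream
    j = 0                   # cursor into by_donor; only ever moves forward

    for i in by_acceptor:
        # Absorb every exon that ends before exon i starts. Such an exon also
        # starts before exon i, so its best_score has already been computed.
        while j < n and exons[by_donor[j]].donor < exons[i].acceptor:
            k = by_donor[j]
            if running_best == -1 or best_score[k] > best_score[running_best]:
                running_best = k
            j += 1
        # Extend the stored best gene only if doing so improves the score.
        if running_best != -1 and best_score[running_best] > 0:
            best_score[i] = exons[i].score + best_score[running_best]
            prev[i] = running_best
        else:
            best_score[i] = exons[i].score

    # Trace back from the exon that ends the overall best gene.
    end = max(range(n), key=lambda i: best_score[i])
    total = best_score[end]
    chain = []
    while end != -1:
        chain.append(exons[end])
        end = prev[end]
    chain.reverse()
    return total, chain
```

Because the cursor `j` never moves backwards, each exon is visited once in each ordering. A quadratic approach would instead re-examine all compatible preceding exons for every exon.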
Abstract:
The vast majority of the biology of a newly sequenced genome is inferred from the set of encoded proteins. Predicting this set is therefore invariably the first step after the completion of the genome DNA sequence. Here we review the main computational pipelines used to generate the human reference protein-coding gene sets.
Abstract:
Background: Despite the continuous production of genome sequence for a number of organisms, reliable, comprehensive, and cost-effective gene prediction remains problematic. This is particularly true for genomes for which there is not a large collection of known gene sequences, such as the recently published chicken genome. We used the chicken sequence to test comparative and homology-based gene-finding methods followed by experimental validation as an effective genome annotation method. Results: We performed experimental evaluation by RT-PCR of three different computational gene finders, Ensembl, SGP2 and TWINSCAN, applied to the chicken genome. A Venn diagram was computed and each component of it was evaluated. The results showed that de novo comparative methods can identify up to about 700 chicken genes with no previous evidence of expression, and can correctly extend about 40% of homology-based predictions at the 5' end. Conclusions: De novo comparative gene prediction followed by experimental verification is effective at enhancing the annotation of the newly sequenced genomes provided by standard homology-based methods.
Abstract:
PHO1 has been recently identified as a protein involved in the loading of inorganic phosphate into the xylem of roots in Arabidopsis. The genome of Arabidopsis contains 11 members of the PHO1 gene family. The cDNAs of all PHO1 homologs have been cloned and sequenced. All proteins have the same topology and harbor a SPX tripartite domain in the N-terminal hydrophilic portion and an EXS domain in the C-terminal hydrophobic portion. The SPX and EXS domains have been identified in yeast (Saccharomyces cerevisiae) proteins involved in either phosphate transport or sensing or in sorting proteins to endomembranes. The Arabidopsis genome contains additional proteins of unknown function containing either a SPX or an EXS domain. Phylogenetic analysis indicated that the PHO1 family is subdivided into at least three clusters. Reverse transcription-PCR revealed a broad pattern of expression in leaves, roots, stems, and flowers for most genes, although two genes are expressed exclusively in flowers. Analysis of the activity of the promoter of all PHO1 homologs using promoter-beta-glucuronidase fusions revealed a predominant expression in the vascular tissues of roots, leaves, stems, or flowers. beta-Glucuronidase expression is also detected for several promoters in nonvascular tissue, including hydathodes, trichomes, root tip, root cortical/epidermal cells, and pollen grains. The expression pattern of PHO1 homologs indicates a likely role of the PHO1 proteins not only in the transfer of phosphate to the vascular cylinder of various tissues but also in the acquisition of phosphate into cells, such as pollen or root epidermal/cortical cells.
Abstract:
OBJECTIVE: To identify the genetic causes underlying early-onset autosomal recessive retinitis pigmentosa (arRP) in the Spanish population and describe the associated phenotype. DESIGN: Case series. PARTICIPANTS: A total of 244 unrelated families affected by early-onset arRP. METHODS: Homozygosity mapping or exome sequencing analysis was performed in 3 families segregating arRP. A mutational screening was performed in 241 additional unrelated families for the p.Ser542Stop mutation. Haplotype analysis was also conducted. Individuals who were homozygotes, double heterozygotes, or carriers of mutations in RP1 underwent an ophthalmic evaluation to establish a genotype-phenotype correlation. MAIN OUTCOME MEASURES: DNA sequence variants, homozygous regions, haplotypes, best-corrected visual acuity, visual field assessments, electroretinogram responses, and optical coherence tomography images. RESULTS: Four novel mutations in RP1 were identified. The new mutation p.Ser542Stop was present in 11 of the 244 studied families (4.5%). All chromosomes harboring this mutation shared the same haplotype. All patients presented a common phenotype with an early age of onset and prompt macular degeneration, whereas the heterozygote carriers did not show any signs of retinitis pigmentosa (RP). CONCLUSIONS: p.Ser542Stop is a single founder mutation and the most prevalent mutation described in the Spanish population. It causes early-onset RP with rapid macular degeneration and is responsible for 4.5% of all cases. Our data suggest that the implication of RP1 in arRP may be underestimated. FINANCIAL DISCLOSURE(S): The author(s) have no proprietary or commercial interest in any materials discussed in this article.
Abstract:
BACKGROUND: MYCN oncogene amplification has been defined as the most important prognostic factor for neuroblastoma (NB), the most common solid extracranial neoplasm in children. High copy numbers are strongly associated with rapid tumor progression and poor outcome, independently of tumor stage or patient age, and this has become an important factor in treatment stratification. PROCEDURE: By real-time quantitative PCR analysis, we evaluated the clinical relevance of circulating MYCN DNA in 267 patients with locoregional or metastatic NB in children less than 18 months of age. RESULTS: For patients in this age group with INSS stage 4 or 4S NB and stage 3 patients, serum-based determination of MYCN DNA sequences had good sensitivity (85%, 83%, and 75% respectively) and high specificity (100%) when compared to direct tumor gene determination. In contrast, the approach showed low sensitivity in patients with stage 1 and 2 disease. CONCLUSION: Our results show that the sensitivity of the serum-based MYCN DNA sequence determination depends on the stage of the disease. However, this simple, reproducible assay may represent a reasonably sensitive and very specific tool to assess tumor MYCN status in cases with stage 3 and metastatic disease for whom a wait-and-see strategy is often recommended.
Abstract:
In the ecologically important arbuscular mycorrhizal fungi (AMF), Sod1 encodes a functional polypeptide that confers increased tolerance to oxidative stress and that is upregulated inside the roots during early steps of the symbiosis with host plants. It is still unclear whether its expression is directed at scavenging reactive oxygen species (ROS) produced by the host, if it plays a role in the fungus-host dialogue, or if it is a consequence of oxidative stress from the surrounding environment. All these possibilities are equally likely, and molecular variation at the Sod1 locus can possibly have adaptive implications for one or all of the three mentioned functions. In this paper, we analyzed the diversity of the Sod1 gene in six AMF species, as well as 14 Glomus intraradices isolates from a single natural population. By sequencing this locus, we identified a large amount of nucleotide and amino acid molecular diversity both among AMF species and individuals, suggesting a rapid divergence of its codons. The Sod1 gene was monomorphic within each isolate we analyzed, and quantitative PCR strongly suggests that this locus is present as a single copy in G. intraradices. Maximum-likelihood analyses performed using a variety of models for codon evolution indicated that a number of amino acid sites most likely evolved under the regime of positive selection among AMF species. In addition, we found that some isolates of G. intraradices from a natural population harbor very divergent orthologous Sod1 sequences, and our analysis suggested that diversifying selection, rather than recombination, was responsible for the persistence of this molecular diversity within the AMF population.
Abstract:
Parasites of the Leishmania Viannia subgenus are major causative agents of mucocutaneous leishmaniasis (MCL), a disease characterised by parasite dissemination (metastasis) from the original cutaneous lesion to form debilitating secondary lesions in the nasopharyngeal mucosa. We employed a protein profiling approach to identify potential metastasis factors in laboratory clones of L. (V.) guyanensis with stable phenotypes ranging from highly metastatic (M+) through infrequently metastatic (M+/M-) to non-metastatic (M-). Comparison of the soluble proteomes of promastigotes by two-dimensional electrophoresis revealed two abundant protein spots specifically associated with M+ and M+/M- clones (Met2 and Met3) and two others exclusively expressed in M- parasites (Met1 and Met4). The association between clinical disease phenotype and differential expression of Met1-Met4 was less clear in L. Viannia strains from mucosal (M+) or cutaneous (M-) lesions of patients. Identification of Met1-Met4 by biological mass spectrometry (LC-ES-MS/MS) and bioinformatics revealed that M+ and M- clones express distinct acidic and neutral isoforms of both elongation factor-1 subunit beta (EF-1beta) and cytosolic tryparedoxin peroxidase (TXNPx). This interchange of isoforms may relate to the mechanisms by which the activities of EF-1beta and TXNPx are modulated, and/or differential post-translational modification of the gene product(s). The multiple metabolic functions of EF-1 and TXNPx support the plausibility of their participation in parasite survival and persistence and thereby, metastatic disease. Both polypeptides are active in resistance to chemical and oxidant stress, providing a basis for further elucidation of the importance of antioxidant defence in the pathogenesis underlying MCL.
Abstract:
We report a Spanish family with autosomal-dominant non-neuropathic hereditary amyloidosis with a unique hepatic presentation and death from liver failure, usually by the sixth decade. The disease is caused by a previously unreported deletion/insertion mutation in exon 4 of the apolipoprotein AI (apoAI) gene encoding loss of residues 60-71 of normal mature apoAI and insertion at that position of two new residues, ValThr. Affected individuals are heterozygous for this mutation and have both normal apoAI and variant molecules bearing one extra positive charge, as predicted from the DNA sequence. The amyloid fibrils are composed exclusively of NH2-terminal fragments of the variant, ending mainly at positions corresponding to residues 83 and 92 in the mature wild-type sequence. Amyloid fibrils derived from the other three known amyloidogenic apoAI variants are also composed of similar NH2-terminal fragments. All known amyloidogenic apoAI variants carry one extra positive charge in this region, suggesting that it may be responsible for their enhanced amyloidogenicity. In addition to causing a new phenotype, this is the first deletion mutation to be described in association with hereditary amyloidosis and it significantly extends the value of the apoAI model for investigation of molecular mechanisms of amyloid fibrillogenesis.
Abstract:
Background and Aims: Paleoclimatic data indicate that an abrupt climate change occurred at the Eocene-Oligocene (E-O) boundary affecting the distribution of tropical forests on Earth. The same period has seen the emergence of South-East (SE) Asia, caused by the collision of the Eurasian and Australian plates. How the combination of these climatic and geomorphological factors affected the spatio-temporal history of angiosperms is little known. This topic is investigated by using the worldwide sapindaceous clade as a case study. Methods: Analyses of divergence time inference, diversification and biogeography (constrained by paleogeography) are applied to a combined plastid and nuclear DNA sequence data set. Biogeographical and diversification analyses are performed over a set of trees to take phylogenetic and dating uncertainty into account. Results are analysed in the context of past climatic fluctuations. Key Results: An increase in the number of dispersal events at the E-O boundary is recorded, which intensified during the Miocene. This pattern is associated with a higher rate in the emergence of new genera. These results are discussed in light of the geomorphological importance of SE Asia, which acted as a tropical bridge allowing multiple contacts between areas and additional speciation across landmasses derived from Laurasia and Gondwana. Conclusions: This study demonstrates the importance of the combined effect of geomorphological (the emergence of most islands in SE Asia approx. 30 million years ago) and climatic (the dramatic E-O climate change that shifted the tropical belt and reduced sea levels) factors in shaping species distribution within the sapindaceous clade.
Abstract:
In human transcriptional regulation, DNA-sequence-specific factors can associate with intermediaries that orchestrate interactions with a diverse set of chromatin-modifying enzymes. One such intermediary is HCFC1 (also known as HCF-1). HCFC1, first identified in herpes simplex virus transcription, has a poorly defined role in cellular transcriptional regulation. We show here that, in HeLa cells, HCFC1 is observed bound to 5400 generally active CpG-island promoters. Examination of the DNA sequences underlying the HCFC1-binding sites revealed three sequence motifs associated with the binding of (1) ZNF143 and THAP11 (also known as Ronin), (2) GABP, and (3) YY1 sequence-specific transcription factors. Subsequent analysis revealed colocalization of HCFC1 with these four transcription factors at ∼90% of the 5400 HCFC1-bound promoters. These studies suggest that a relatively small number of transcription factors play a major role in HeLa-cell transcriptional regulation in association with HCFC1.
Abstract:
BACKGROUND: We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a 'reference set' of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment. RESULTS: The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified. CONCLUSION: This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence.
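The coding-nucleotide accuracy figures quoted in this abstract can be made concrete with a small sketch. It assumes the convention common in gene-prediction evaluation, where sensitivity is the fraction of annotated coding nucleotides that are predicted and "specificity" is the fraction of predicted coding nucleotides that are annotated (what other fields call precision). The function names and toy coordinates are hypothetical, and this is not the EGASP evaluation code.

```python
from typing import Iterable, Set, Tuple

Interval = Tuple[int, int]  # (start, end), both inclusive


def covered_positions(intervals: Iterable[Interval]) -> Set[int]:
    """Expand inclusive intervals into the set of positions they cover."""
    positions: Set[int] = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_accuracy(
    annotated: Iterable[Interval], predicted: Iterable[Interval]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the nucleotide level.

    Assumed convention: specificity = TP / (TP + FP), i.e. precision.
    """
    truth = covered_positions(annotated)
    guess = covered_positions(predicted)
    true_positives = len(truth & guess)
    sensitivity = true_positives / len(truth) if truth else 0.0
    specificity = true_positives / len(guess) if guess else 0.0
    return sensitivity, specificity


# Toy example: two annotated coding exons versus two predicted ones.
sn, sp = nucleotide_accuracy([(1, 10), (21, 30)], [(6, 10), (21, 35)])
```

Expanding intervals into position sets is convenient for short toy regions; for genome-scale data an interval-overlap computation would be the more practical choice.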
Abstract:
We describe the transcriptional potentiation in estrogen responsive transcription extracts of the Xenopus vitellogenin B1 gene promoter through the formation of a positioned nucleosome. Nuclease digestion and hydroxyl radical cleavage indicate that strong, DNA sequence-directed positioning of a nucleosome occurs between -300 and -140 relative to the start site of transcription. Deletion of this DNA sequence abolishes the potentiation of transcription due to nucleosome assembly. The wrapping of DNA around the histone core of the nucleosome positioned between -300 and -140 creates a static loop in which distal estrogen receptor binding sites are brought close to proximal promoter elements. This might facilitate interactions between the trans-acting factors themselves and/or RNA polymerase. Such a nucleosome provides an example of how chromatin structure might have a positive effect on the transcription process.
Abstract:
In addition to differences in protein-coding gene sequences, changes in expression resulting from mutations in regulatory sequences have long been hypothesized to be responsible for phenotypic differences between species. However, unlike comparison of genome sequences, few studies, generally restricted to pairwise comparisons of closely related mammalian species, have assessed between-species differences at the transcriptome level. They reported that gene expression evolves at different rates in various organs and in a pattern that is overall consistent with neutral models of evolution. In the first part of my thesis, I investigated the evolution of gene expression in therian mammals (i.e., placental mammals and marsupials), based on microarray data from human, mouse and the gray short-tailed opossum (Monodelphis domestica). In addition to autosomal genes, a special focus was given to the evolution of X-linked genes. The therian X chromosome was recently shown to be younger than previously thought and to harbor a specific gene content (e.g., genes involved in brain or reproductive functions) that is thought to have been shaped by specific sex-related evolutionary forces. Sex chromosomes derive from ordinary autosomes and their differentiation led to the degeneration of the Y chromosome (in mammals) or W chromosome (in birds). Consequently, X- or Z-linked genes differ in gene dose between males and females such that the heterogametic sex has half the X/Z gene dose compared to the ancestral state. To cope with this dosage imbalance, mammals have been reported to have evolved mechanisms of dosage compensation.
In the first project, I first showed that transcriptomes evolve at different rates in different organs. Out of the five tissues I investigated, the testis is the most rapidly evolving organ at the gene expression level while the brain has the most conserved transcriptome.
Second, my analyses revealed that mammalian gene expression evolution is compatible with a neutral model, where the rate of change in gene expression levels is linked to the efficiency of purifying selection in a given lineage, which, in turn, is determined by the long-term effective population size in that lineage. Thus, the rate of DNA sequence evolution, which could be expected to determine the rate of regulatory sequence change, does not seem to be a major determinant of the rate of gene expression evolution. Hence, most gene expression changes seem to be (slightly) deleterious. Finally, X-linked genes seem to have experienced elevated rates of gene expression change during the early stage of X evolution. To further investigate the evolution of mammalian gene expression, we generated an extensive RNA-Seq gene expression dataset for nine mammalian species and a bird. The analyses of this dataset confirmed the patterns previously observed with microarrays and helped to significantly deepen our view on gene expression evolution.
In a specific project based on these data, I sought to assess in detail patterns of evolution of dosage compensation in amniotes. My analyses revealed the absence of male to female dosage compensation in monotremes and its presence in marsupials and, in addition, confirmed patterns previously described for placental mammals and birds. I then assessed the global level of expression of X/Z chromosomes and contrasted this with its ancestral gene expression levels estimated from orthologous autosomal genes in species with non-homologous sex chromosomes. This analysis revealed a lack of up-regulation for placental mammals, the level of expression of X-linked genes being proportional to gene dose. Interestingly, the ancestral gene expression level was at least partially restored in marsupials as well as in the heterogametic sex of monotremes and birds.
Finally, I investigated alternative mechanisms of dosage compensation and found that gene duplication did not seem to be a widespread mechanism to restore the ancestral gene dose. However, I could show that placental mammals have preferentially down-regulated autosomal genes interacting with X-linked genes which underwent gene expression decrease, and thus identified a novel alternative mechanism of dosage compensation.