33 results for Sequence analysis
Abstract:
Background: Despite the continuous production of genome sequences for a number of organisms, reliable, comprehensive, and cost-effective gene prediction remains problematic. This is particularly true for genomes for which there is not a large collection of known gene sequences, such as the recently published chicken genome. We used the chicken sequence to evaluate comparative and homology-based gene-finding methods, followed by experimental validation, as a genome annotation strategy. Results: We performed an experimental evaluation by RT-PCR of three different computational gene finders, Ensembl, SGP2 and TWINSCAN, applied to the chicken genome. A Venn diagram was computed and each of its components was evaluated. The results showed that de novo comparative methods can identify up to about 700 chicken genes with no previous evidence of expression, and can correctly extend about 40% of homology-based predictions at the 5' end. Conclusions: De novo comparative gene prediction followed by experimental verification is effective at enhancing the annotation that standard homology-based methods provide for newly sequenced genomes.
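The Venn-diagram step can be sketched in Python as below. This is a minimal illustration, not the study's pipeline: it assumes each gene finder's output has already been reduced to a set of shared locus identifiers (real comparisons usually match predictions by genomic overlap), and all locus IDs are hypothetical.

```python
from itertools import product


def venn_regions(predictions: dict[str, set[str]]) -> dict[frozenset[str], set[str]]:
    """Split predicted loci into the exclusive regions of a Venn diagram.

    Each key of the result is the set of gene finders that predicted the
    loci in that region, and no other finder predicted them.
    """
    names = list(predictions)
    regions = {}
    for membership in product([True, False], repeat=len(names)):
        if not any(membership):
            continue  # skip the empty "predicted by nobody" region
        inside = [predictions[n] for n, m in zip(names, membership) if m]
        outside = [predictions[n] for n, m in zip(names, membership) if not m]
        region = set.intersection(*inside) - set().union(*outside)
        key = frozenset(n for n, m in zip(names, membership) if m)
        regions[key] = region
    return regions


# Hypothetical locus IDs, for illustration only.
predictions = {
    "Ensembl": {"g1", "g2", "g3"},
    "SGP2": {"g2", "g3", "g4"},
    "TWINSCAN": {"g3", "g4", "g5"},
}
regions = venn_regions(predictions)

# Loci predicted only by the de novo comparative finders (never by Ensembl).
de_novo_only = set().union(*(r for k, r in regions.items() if "Ensembl" not in k))
print(sorted(de_novo_only))  # ['g4', 'g5']
```

Each region could then be sampled separately for experimental (for example, RT-PCR) validation, which is what makes the per-component evaluation possible.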
Abstract:
We address the problem of comparing and characterizing the promoter regions of genes with similar expression patterns. This remains a challenging problem in sequence analysis, because the promoter regions of co-expressed genes often do not show discernible sequence conservation. In our approach, therefore, we have not compared the nucleotide sequences of promoters directly. Instead, we have obtained predictions of transcription factor binding sites, annotated the predicted sites with the labels of the corresponding binding factors, and aligned the resulting sequences of labels, which we refer to here as transcription factor maps (TF-maps). To obtain the global pairwise alignment of two TF-maps, we have adapted an algorithm initially developed to align restriction enzyme maps. We have optimized the parameters of the algorithm on a small but well-curated collection of human–mouse orthologous gene pairs. Results on this dataset, as well as on an independent, much larger dataset from the CISRED database, indicate that TF-map alignments are able to uncover conserved regulatory elements that cannot be detected by typical sequence alignments.
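To make the idea of aligning sequences of labels concrete, here is a plain Needleman-Wunsch global alignment over TF labels in Python. It is a simplified stand-in, not the authors' adapted restriction-map algorithm: it ignores site positions and binding scores, and the match, mismatch and gap values are arbitrary assumptions rather than the optimized parameters.

```python
def align_labels(a: list[str], b: list[str], match: int = 2,
                 mismatch: int = -1, gap: int = -2):
    """Global (Needleman-Wunsch) alignment of two sequences of TF labels.

    Returns (score, aligned_a, aligned_b), with "-" marking gaps.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]


# Hypothetical TF-maps: predicted sites listed in promoter order.
human = ["SP1", "TATA", "NFY"]
mouse = ["SP1", "NFY"]
print(align_labels(human, mouse))  # (2, ['SP1', 'TATA', 'NFY'], ['SP1', '-', 'NFY'])
```

Because labels rather than nucleotides are compared, two promoters with little nucleotide similarity can still align well if they share the same order of predicted binding sites.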
Abstract:
Immunity-related GTPases (IRG) play an important role in defense against intracellular pathogens. One member of this gene family in humans, IRGM, has recently been implicated as a risk factor for Crohn's disease. We analyzed the detailed structure of this gene family among primates and showed that most of the IRG gene cluster was deleted early in primate evolution, after the divergence of the anthropoids from prosimians (about 50 million years ago). Comparative sequence analysis of New World and Old World monkey species shows that the single-copy IRGM gene became pseudogenized as a result of an Alu retrotransposition event in the anthropoid common ancestor that disrupted the open reading frame (ORF). We find that the ORF was reestablished as a part of a polymorphic stop codon in the common ancestor of humans and great apes. Expression analysis suggests that this change occurred in conjunction with the insertion of an endogenous retrovirus, which altered the transcription initiation, splicing, and expression profile of IRGM. These data argue that the gene became pseudogenized and was then resurrected through a series of complex structural events, and suggest remarkable functional plasticity in which alleles experience diverse evolutionary pressures over time. Such dynamism in structure and evolution may be critical for a gene family locked in an arms race with an ever-changing repertoire of intracellular parasites.
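What "disrupting the open reading frame" means at the sequence level can be illustrated by scanning a coding sequence codon by codon for a premature in-frame stop. A minimal Python sketch, assuming the standard genetic code's stop codons and hypothetical toy sequences; it does not model the Alu insertion itself or the later restoration of the ORF.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def first_premature_stop(cds: str) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop codon that
    occurs before the last codon, or None if the ORF is intact."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            return index
    return None


intact = "ATGGCCAAATAA"        # ATG GCC AAA TAA: stop only at the end
disrupted = "ATGGCCTGAAAATAA"  # ATG GCC TGA ...: premature TGA at codon 2
print(first_premature_stop(intact), first_premature_stop(disrupted))  # None 2
```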
Abstract:
The As Pontes basin (12 km²), NW Iberian Peninsula, is bounded by a double restraining bend of a dextral strike-slip fault, which is related to the western onshore end of the Pyrenean belt. Surface and subsurface data obtained from intensive coal exploration and mining in the basin since the 1960s, together with additional structural and stratigraphic sequence analysis, allowed us to determine the geometric relationships between tectonic structures and stratigraphic markers. The small size of the basin and the large amount of high-quality data make the As Pontes basin a unique natural laboratory for improving our understanding of the origin and evolution of restraining bends. The double restraining bend is the end stage of the structural evolution of a compressive underlapping stepover, where the basin was formed. During the first stage (stepover stage), which began ca. 30 Ma ago (latest Rupelian) and lasted 3.4 My, two small isolated basins bounded by thrusts and normal faults were formed. Over the following 1.3 My (transition stage), the strike-slip faults that defined the stepover grew towards each other until they joined to form the double restraining bend, which bounds a single large As Pontes basin. The history of the basin was controlled by the activity of the double restraining bend for a further 3.4 My (restraining bend stage) and ended in mid-Aquitanian times (ca. 22 Ma).
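As a consistency check on this chronology, the three stage durations add up to the stated end of basin activity (all ages approximate):

```latex
t_{\mathrm{end}} \approx 30~\mathrm{Ma} - (3.4 + 1.3 + 3.4)~\mathrm{My}
  = 30~\mathrm{Ma} - 8.1~\mathrm{My} = 21.9~\mathrm{Ma} \approx 22~\mathrm{Ma}
```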
Abstract:
Background: The RPS4 gene encodes ribosomal protein S4, a highly conserved protein present in all kingdoms. In primates, RPS4 is encoded by two functional genes, one on each sex chromosome: RPS4X and RPS4Y. In humans, RPS4Y is duplicated, so the Y chromosome carries a third functional paralog, RPS4Y2, which shows a testis-specific expression pattern. Results: DNA sequence analysis of the intronic and cDNA regions of RPS4Y genes from species covering the entire primate phylogeny showed that the duplication event leading to the second Y-linked copy occurred after the divergence of New World monkeys, about 35 million years ago. Maximum likelihood analyses of synonymous and non-synonymous substitutions revealed that positive selection acted on the RPS4Y2 gene in the human lineage, the first evidence of positive selection on a ribosomal protein gene. Putative positively selected amino acid replacements affected all three domains of the protein; one of them lies in the KOW protein domain and alters the only invariant position of this motif, and might therefore have a dramatic effect on protein function. Conclusion: Here, we shed new light on the evolutionary history of the RPS4Y gene family, especially that of RPS4Y2. The results suggest that the RPS4Y1 gene might be maintained to compensate for gene dosage between the sexes, while RPS4Y2 might have acquired a new function, at least in the lineage leading to humans.
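The quantity behind such tests is the ratio ω = dN/dS of non-synonymous to synonymous substitutions per site, where ω > 1 is taken as evidence of positive selection. The study used maximum likelihood; the Python sketch below instead uses the simpler Nei-Gojobori counting approach with a Jukes-Cantor correction, only to show what dN and dS measure. It assumes two aligned, gap-free, in-frame coding sequences without stop codons, averages codons with several differences over all mutational pathways (without excluding paths through stop codons, a simplification), and uses hypothetical toy sequences.

```python
import math
from itertools import permutations

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_sites(codon: str) -> float:
    """Synonymous sites of a codon: synonymous single-base changes / 3."""
    changes = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                changes += CODE[mutant] == CODE[codon]
    return changes / 3


def codon_differences(c1: str, c2: str) -> tuple[float, float]:
    """Synonymous and non-synonymous differences, averaged over all orders
    in which the differing positions could have mutated."""
    positions = [p for p in range(3) if c1[p] != c2[p]]
    if not positions:
        return 0.0, 0.0
    paths = list(permutations(positions))
    syn = non = 0
    for order in paths:
        current = c1
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
    return syn / len(paths), non / len(paths)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction; undefined when p >= 0.75 (saturation)."""
    if p >= 0.75:
        raise ValueError("too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (dN, dS) for two aligned coding sequences of equal length."""
    sites_s = sites_n = diff_s = diff_n = 0.0
    for i in range(0, len(seq1) - len(seq1) % 3, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        sites_s += s
        sites_n += 3 - s
        ds, dn = codon_differences(c1, c2)
        diff_s += ds
        diff_n += dn
    return jukes_cantor(diff_n / sites_n), jukes_cantor(diff_s / sites_s)


# Hypothetical toy sequences differing by one silent change (CTG -> CTC, both Leu).
d_n, d_s = dn_ds("ATGCTGAAAGGTCCA", "ATGCTCAAAGGTCCA")
print(f"dN={d_n:.3f} dS={d_s:.3f}")
```

Maximum likelihood codon models are preferred in practice because they account for transition/transversion bias and codon frequencies and can test selection on specific lineages, such as the human RPS4Y2 branch.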
Abstract:
Background: Freshwater planarians are an attractive model for regeneration and stem cell research and have become a promising tool in the field of regenerative medicine. With the availability of a sequenced planarian genome, the recent application of modern genetic and high-throughput tools has revitalized interest in these animals, long known for their remarkable regenerative capabilities, which enable them to regrow even a new head after decapitation. However, a detailed description of the planarian transcriptome is essential for future investigation of regenerative processes using planarians as a model system. Results: To complement and improve existing gene annotations, we used a 454 pyrosequencing approach to analyze the transcriptome of the planarian species Schmidtea mediterranea. Altogether, 598,435 454-sequencing reads, with an average length of 327 bp, were assembled together with the ~10,000 sequences of the S. mediterranea UniGene set using different similarity cutoffs. The assembly was then mapped onto the current genome data. Remarkably, our Smed454 dataset contains more than 3 million novel transcribed nucleotides sequenced for the first time. A descriptive analysis of planarian splice sites was conducted on those Smed454 contigs that mapped uniquely to the current genome assembly. Sequence analysis allowed us to identify genes encoding putative proteins with defined structural properties, such as transmembrane domains. Moreover, we annotated the Smed454 dataset using Gene Ontology, and identified putative homologues of several gene families that may play a key role during regeneration, such as neurotransmitter and hormone receptors, homeobox-containing genes, and genes related to eye function. Conclusions: We report the first planarian transcript dataset, Smed454, as an open resource tool that can be accessed via a web interface. Smed454 contains significant novel sequence information about most expressed genes of S. mediterranea. Analysis of the annotated data promises to contribute to the identification of gene families poorly characterized at a functional level. The Smed454 transcriptome data will assist in the molecular characterization of S. mediterranea as a model organism, which will be useful to a broad scientific community.
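A descriptive splice-site analysis usually begins by tallying the donor and acceptor dinucleotides of each intron (canonical GT-AG versus minor classes such as GC-AG and AT-AC). A minimal Python sketch, assuming intron sequences have already been extracted, on the transcribed strand, from contigs that map uniquely to the genome; the intron sequences are hypothetical.

```python
from collections import Counter


def splice_site_classes(introns: list[str]) -> Counter:
    """Count introns by donor-acceptor dinucleotide pair, e.g. 'GT-AG'."""
    return Counter(f"{i[:2].upper()}-{i[-2:].upper()}" for i in introns)


introns = ["GTAAGTTTTCAG", "GTGAGCCCAG", "GCAAGTTAG", "ATATCCTTAC"]  # hypothetical
classes = splice_site_classes(introns)
canonical_fraction = classes["GT-AG"] / sum(classes.values())
print(classes.most_common(), f"canonical GT-AG: {canonical_fraction:.0%}")
```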
Abstract:
Adenoviruses of primates include human (HAdV) and simian (SAdV) isolates classified into 8 species (Human Adenovirus A to G, and Simian Adenovirus A). In this study, a novel adenovirus was isolated from a colony of cynomolgus macaques (Macaca fascicularis) and subcultured in VERO cells. Its complete genome was purified, and a region encompassing the hexon gene, the protease gene, the DNA binding protein (DBP) gene and the 100 kDa protein gene was amplified by PCR and sequenced by primer walking. Sequence analysis of these four genes showed that the new isolate had 80% identity to other primate adenoviruses and lacked recombination events. Phylogenetic analysis of this new monkey AdV based on the combined sequences of the four genes supported a close relationship to SAdV-3 and SAdV-6, lineages isolated from rhesus monkeys. The clade formed by these three types is separated from the remaining clades and establishes a novel branch related to species HAdV-A, F and G. However, the genetic distance between the newly isolated monkey AdV and these species is large enough to place it in a new, not yet established species. The results presented here widen our knowledge of SAdV and represent an important contribution to understanding the evolutionary history of primate adenoviruses.
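Identity figures such as the 80% above come from pairwise comparison of aligned sequences. A minimal Python sketch, assuming the two sequences are already aligned and that gapped columns are excluded from the denominator (one of several conventions; the exact convention used in the study is not stated here); the alignment is hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two aligned sequences, ignoring gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
               if x != "-" and y != "-"]
    matches = sum(x == y for x, y in columns)
    return 100 * matches / len(columns)


identity = percent_identity("ACGT-ACGTA", "ACGTTACGAA")  # hypothetical alignment
print(f"{identity:.1f}% identity, p-distance {1 - identity / 100:.3f}")
# 88.9% identity, p-distance 0.111
```

The p-distance (proportion of differing sites) is the simplest genetic distance; phylogenetic analyses normally apply a substitution model on top of it.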
Abstract:
Background: Non-long terminal repeat (non-LTR) retrotransposons have contributed to shaping the structure and function of genomes. In silico and experimental approaches have been used to identify the non-LTR elements of the urochordate Ciona intestinalis. Knowledge of the types and abundance of non-LTR elements in urochordates is a key step in understanding their contribution to the structure and function of vertebrate genomes. Results: Consensus elements phylogenetically related to the I, LINE1, LINE2, LOA and R2 clades, five of the 14 eukaryotic non-LTR clades, are described from C. intestinalis. The ascidian elements showed conservation of both the reverse transcriptase coding sequence and the overall structural organization seen in each clade. The apurinic/apyrimidinic endonuclease and nucleic-acid-binding domains encoded upstream of the reverse transcriptase, and the RNase H and restriction enzyme-like endonuclease motifs encoded downstream of it, were identified in the corresponding Ciona families. Conclusions: The genome of C. intestinalis harbors representatives of at least five clades of non-LTR retrotransposons. The copy number of each element per haploid genome is low (fewer than 100), far below the values reported for vertebrate counterparts but within the range for protostomes. Genomic and sequence analysis shows that the ascidian non-LTR elements are unmethylated and flanked by genomic segments with a lower-than-average gene density for the genome. The analysis provides valuable data for understanding the evolution of early chordate genomes and broadens our view of the distribution of non-LTR retrotransposons in eukaryotes.
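A common sequence-based indicator related to methylation history is the CpG observed/expected ratio: in DNA that has been methylated over long evolutionary periods, 5-methylcytosine tends to deaminate to thymine, so CpG dinucleotides become depleted and the ratio falls well below 1. Whether this measure was used in the study is not stated here, so the Python sketch below is an illustrative assumption, using the widely used Gardiner-Garden and Frommer formula (CpG count × length / (C count × G count)) on hypothetical toy sequences.

```python
def cpg_observed_expected(seq: str) -> float:
    """CpG o/e = (number of CG dinucleotides * length) / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0  # ratio undefined; report 0 for sequences lacking C or G
    # "CG" cannot overlap itself, so str.count gives the dinucleotide count.
    return seq.count("CG") * len(seq) / (c * g)


for s in ["CGCG", "CCGG", "CATG"]:  # hypothetical toy sequences
    print(s, round(cpg_observed_expected(s), 2))
```

On real elements the ratio would be computed over whole consensus or genomic copies, and compared with the genome-wide background rather than read in isolation.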
Abstract:
Background: In July 2005, an outbreak of acute gastroenteritis occurred at a residential summer camp in the province of Barcelona (northeastern Spain). Forty-four residents and employees were affected. All of them had eaten the same lunch on 13 July (paella, round of beef and fruit). The aim of this study was to investigate this foodborne norovirus outbreak, in which laboratory tests demonstrated the involvement of a food handler. Methods: A retrospective cohort study was designed. Personal or telephone interviews were carried out to collect demographic, clinical and microbiological data on the exposed people, as well as their food consumption at the suspected lunch. The food handlers of the summer camp were interviewed. Ten stool samples were requested from symptomatic exposed residents and from the three food handlers who prepared the suspected food. Stools were tested for bacteria and noroviruses. Norovirus was detected using RT-PCR and sequence analysis. Attack rates, relative risks (RR) and their 95% confidence intervals (CI) were calculated to assess the association between food consumption and disease. Results: The global attack rate of the outbreak was 55%. The main symptoms were abdominal pain (90%), nausea (85%), vomiting (70%) and diarrhoea (42.5%). The illness resolved within 24-48 hours. Norovirus was detected in seven faecal samples, one of which came from an asymptomatic food handler who had not eaten the suspected food (round of beef) but had cooked and served the lunch. Analysis of the two suspected foods isolated no pathogenic bacteria and detected no viruses. Molecular analysis showed that the viral strain was the same in ill patients and in the asymptomatic food handler (genotype GII.2 Melksham-like). Conclusions: In outbreaks of foodborne disease, the search for viruses in affected patients and in all food handlers, even asymptomatic ones, is essential.
Health education of food handlers with respect to hand washing should be promoted.
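The attack-rate and relative-risk calculation described in the Methods can be sketched as follows. This Python example uses the standard log (Katz) method for the 95% confidence interval of a risk ratio; the abstract does not say which interval method was used, and the 2×2 counts below are hypothetical, not the outbreak data.

```python
import math


def relative_risk(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Attack rates and risk ratio with a log-method confidence interval.

    a: exposed and ill,   b: exposed and well
    c: unexposed and ill, d: unexposed and well  (a and c must be > 0)
    """
    attack_exposed = a / (a + b)
    attack_unexposed = c / (c + d)
    rr = attack_exposed / attack_unexposed
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return attack_exposed, attack_unexposed, rr, lower, upper


# Hypothetical counts for one food item, for illustration only.
ar_exp, ar_unexp, rr, lo, hi = relative_risk(a=30, b=10, c=5, d=35)
print(f"attack rates {ar_exp:.0%} vs {ar_unexp:.0%}; "
      f"RR={rr:.1f} (95% CI {lo:.1f}-{hi:.1f})")
```

A confidence interval that excludes 1 indicates an association between eating the item and illness; repeating the calculation for each food served is how a suspected vehicle is singled out.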
Abstract:
Background: Amino acid tandem repeats are found in nearly one-fifth of human proteins. Abnormal expansion of these regions is associated with several human disorders. To gain further insight into the mutational mechanisms that operate in this type of sequence, we have analyzed a large number of mutation variants derived from human expressed sequence tags (ESTs). Results: We identified 137 polymorphic variants in 115 different amino acid tandem repeats. Of these, 77 contained amino acid substitutions and 60 contained gaps (expansions or contractions of the repeat unit). The analysis showed that at least roughly 21% of the repeats might be polymorphic in humans. We compared the mutations found in different types of amino acid repeats and in adjacent regions. Overall, repeats showed a fivefold increase in the number of gap mutations compared to adjacent regions, reflecting slippage within the repetitive structures. Gap and substitution mutations were distributed very differently among amino acid repeat types. Among repeats containing gap variants, we identified several disease and candidate disease genes. Conclusion: This is the first genome-wide report of the types of mutations occurring in the amino acid repeat component of the human proteome. We show that the mutational dynamics of different amino acid repeat types are very diverse. We provide a list of loci with highly variable repeat structures, some of which may be involved in disease.
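Detecting the repeats and labeling their variants are the first steps of such an analysis. A minimal Python sketch that finds single-amino-acid runs (homopeptides) with a regular expression and labels an allele pair as a gap or substitution variant; the minimum run length of 5 and the protein sequences are hypothetical choices, and the study's own repeat definition may differ (for example, by tolerating interruptions or covering longer repeat units).

```python
import re


def homopeptide_runs(protein: str, min_len: int = 5) -> list[tuple[int, str, int]]:
    """Return (0-based start, residue, run length) for each run of at least
    min_len identical residues."""
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(protein.upper())]


def variant_type(ref_repeat: str, alt_repeat: str) -> str:
    """Simplified labeling: a length change is a gap variant (expansion or
    contraction); equal length with differences is a substitution."""
    return "gap" if len(ref_repeat) != len(alt_repeat) else "substitution"


print(homopeptide_runs("MAQQQQQQPGAAAAAS"))  # [(2, 'Q', 6), (10, 'A', 5)]
print(variant_type("QQQQQ", "QQQQQQ"), variant_type("QQQQQ", "QQHQQ"))
```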
Abstract:
The gibbon genome exhibits extensive karyotypic diversity with an increased rate of chromosomal rearrangements during evolution. In an effort to understand the mechanistic origin and implications of these rearrangement events, we sequenced 24 synteny breakpoint regions in the white-cheeked gibbon (Nomascus leucogenys, NLE) in the form of high-quality BAC insert sequences (4.2 Mbp). While there is a significant deficit of breakpoints in genes, we identified seven human gene structures involved in signaling pathways (DEPDC4, GNG10), phospholipid metabolism (ENPP5, PLSCR2), beta-oxidation (ECH1), cellular structure and transport (HEATR4), and transcription (ZNF461), that have been disrupted in the NLE gibbon lineage. Notably, only three of these genes show the expected evolutionary signatures of pseudogenization. Sequence analysis of the breakpoints suggested both nonclassical nonhomologous end-joining (NHEJ) and replication-based mechanisms of rearrangement. A substantial number (11/24) of human-NLE gibbon breakpoints showed new insertions of gibbon-specific repeats and mosaic structures formed from disparate sequences including segmental duplications, LINE, SINE, and LTR elements. Analysis of these sites provides a model for a replication-dependent repair mechanism for double-strand breaks (DSBs) at rearrangement sites and insights into the structure and formation of primate segmental duplications at sites of genomic rearrangements during evolution.