210 results for Gene annotation
in University of Queensland eSpace - Australia
Abstract:
The current prediction of genes in the Plasmodium falciparum genome database relies upon a limited number of specially developed computer algorithms. We have re-annotated the sequence of chromosome 2 of P. falciparum by a computer-assisted manual analysis, which is described here. Of 161 newly predicted introns, we have experimentally confirmed 98. We regard 110 introns from the previously published analyses as probable; we delete 3, change 26 and add 135. We recognise 214 genes on chromosome 2. We have predicted introns in 121 genes. The increased complexity of gene structure on chromosome 2 is likely to be mirrored by the entire genome. (C) 2001 Elsevier Science B.V. All rights reserved.
Abstract:
The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue a complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs, and developed a Web-based annotation interface that simplifies the annotation procedures and reduces manual annotation errors. Automated coding sequence and function prediction was followed by manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Of these 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.
Abstract:
The RIKEN Mouse Gene Encyclopaedia Project, a systematic approach to determining the full coding potential of the mouse genome, involves collection and sequencing of full-length complementary DNAs and physical mapping of the corresponding genes to the mouse genome. We organized an international functional annotation meeting (FANTOM) to annotate the first 21,076 cDNAs to be analysed in this project. Here we describe the first RIKEN clone collection, which is one of the largest described for any organism. Analysis of these cDNAs extends known gene families and identifies new ones.
Abstract:
Only a small proportion of the mouse genome is transcribed into mature messenger RNA transcripts. There is an international collaborative effort to identify all full-length mRNA transcripts from the mouse, and to ensure that each is represented in a physical collection of clones. Here we report the manual annotation of 60,770 full-length mouse complementary DNA sequences. These are clustered into 33,409 'transcriptional units', contributing 90.1% of a newly established mouse transcriptome database. Of these transcriptional units, 4,258 are new protein-coding and 11,665 are new non-coding messages, indicating that non-coding RNA is a major component of the transcriptome. 41% of all transcriptional units showed evidence of alternative splicing. In protein-coding transcripts, 79% of splice variations altered the protein product. Whole-transcriptome analyses resulted in the identification of 2,431 sense-antisense pairs. The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.
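The abstract reports 2,431 sense-antisense pairs found by whole-transcriptome analysis. The core test behind such a count can be sketched as an interval-overlap check between transcripts mapped to opposite strands. This is a minimal illustration under our own assumptions, not the FANTOM2 procedure: coordinates are half-open `[start, end)`, any single shared base counts as overlap, and the `Transcript` type and helper names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def overlaps(a: Transcript, b: Transcript) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def sense_antisense_pairs(transcripts):
    """Return sorted name pairs of transcripts that overlap on opposite strands."""
    pairs = []
    for a, b in combinations(transcripts, 2):
        if a.strand != b.strand and overlaps(a, b):
            pairs.append(tuple(sorted((a.name, b.name))))
    return sorted(pairs)
```

The all-pairs loop is quadratic, which is fine for illustration; a genome-scale run would sort transcripts by position per chromosome (or use an interval tree) so that only nearby transcripts are compared.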
Abstract:
Manual curation has long been held to be the gold standard for functional annotation of DNA sequence. Our experience with the annotation of more than 20,000 full-length cDNA sequences revealed problems with this approach, including inaccurate and inconsistent assignment of gene names, as well as many good assignments that were difficult to reproduce using only computational methods. For the FANTOM2 annotation of more than 60,000 cDNA clones, we developed a number of methods and tools to circumvent some of these problems, including an automated annotation pipeline that provides high-quality preliminary annotation for each sequence by introducing an uninformative filter that eliminates uninformative annotations, controlled vocabularies to accurately reflect both the functional assignments and the evidence supporting them, and a highly refined, Web-based manual annotation tool that allows users to view a wide array of sequence analyses and to assign gene names and putative functions using a consistent nomenclature. The ultimate utility of our approach is reflected in the low rate of reassignment of automated assignments by manual curation. Based on these results, we propose a new standard for large-scale annotation, in which the initial automated annotations are manually investigated and then computational methods are iteratively modified and improved based on the results of manual curation.
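The "uninformative filter" described above discards hits whose descriptions carry no functional information before a name is assigned. A minimal sketch of that idea follows, assuming a pattern-based filter; the specific patterns, the `(description, score)` hit format and the helper names are hypothetical, since the abstract does not list the actual FANTOM2 rules.

```python
import re

# Hypothetical examples of uninformative descriptions; not the actual FANTOM2 rule set.
UNINFORMATIVE_PATTERNS = [
    r"\bhypothetical protein\b",
    r"\bunknown\b",
    r"\bunnamed protein product\b",
    r"\bRIKEN cDNA\b",
    r"\bexpressed sequence\b",
    r"^[A-Z]{2}\d{6}$",  # a bare accession-like string such as "AK012345"
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in UNINFORMATIVE_PATTERNS]


def is_informative(description: str) -> bool:
    """False for empty descriptions or those matching any uninformative pattern."""
    text = description.strip()
    if not text:
        return False
    return not any(p.search(text) for p in _COMPILED)


def best_annotation(hits):
    """Given (description, score) hits, return the top-scoring informative description, or None."""
    informative = [(d, s) for d, s in hits if is_informative(d)]
    if not informative:
        return None
    return max(informative, key=lambda h: h[1])[0]
```

Filtering before ranking means a lower-scoring but meaningful hit (here a named gene) wins over higher-scoring placeholder names. A `None` result is the case that would go to a curator.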
Abstract:
Motivation: The clustering of gene profiles across some experimental conditions of interest contributes significantly to the elucidation of unknown gene function, the validation of gene discoveries and the interpretation of biological processes. However, this clustering problem is not straightforward as the profiles of the genes are not all independently distributed and the expression levels may have been obtained from an experimental design involving replicated arrays. Ignoring the dependence between the gene profiles and the structure of the replicated data can result in important sources of variability in the experiments being overlooked in the analysis, with the consequent possibility of misleading inferences being made. We propose a random-effects model that provides a unified approach to the clustering of genes with correlated expression levels measured in a wide variety of experimental situations. Our model is an extension of the normal mixture model to account for the correlations between the gene profiles and to enable covariate information to be incorporated into the clustering process. Hence the model is applicable to longitudinal studies with or without replication, for example, time-course experiments by using time as a covariate, and to cross-sectional experiments by using categorical covariates to represent the different experimental classes. Results: We show that our random-effects model can be fitted by maximum likelihood via the EM algorithm, for which the E (expectation) and M (maximization) steps can be implemented in closed form. Hence our model can be fitted deterministically without the need for time-consuming Monte Carlo approximations. The effectiveness of our model-based procedure for the clustering of correlated gene profiles is demonstrated on three real datasets, representing typical microarray experimental designs, covering time-course, repeated-measurement and cross-sectional data.
In these examples, relevant clusters of the genes are obtained, which are supported by existing gene-function annotation. A synthetic dataset is considered too.
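The claim that the E and M steps have closed forms is easiest to see in the plain normal mixture that the random-effects model extends. The sketch below fits a univariate k-component normal mixture by EM. It deliberately omits the random effects, covariates and correlation structure of the proposed model, and the function name, quantile initialisation and variance floor are our own assumptions.

```python
import numpy as np


def em_gaussian_mixture(x, k, n_iter=200, min_var=1e-6):
    """Fit a univariate k-component normal mixture by EM (closed-form E and M steps).

    Simplified sketch: no random effects or covariates, unlike the model in the text.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    # Deterministic start: means at spread-out quantiles, common variance, equal weights.
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: tau[i, j] = posterior probability that x_i came from component j.
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        weighted = pi * dens
        tau = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: weighted proportions, means and variances, all in closed form.
        nk = tau.sum(axis=0)
        pi = nk / n
        mu = (tau * x[:, None]).sum(axis=0) / nk
        var = np.maximum((tau * (x[:, None] - mu) ** 2).sum(axis=0) / nk, min_var)
    # Hard cluster labels from the final posterior probabilities.
    return pi, mu, var, tau.argmax(axis=1)
```

Because every update is an explicit weighted average, each iteration is deterministic and cheap. This is the property the abstract contrasts with Monte Carlo fitting.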
Abstract:
The term secretome has been defined as a set of secreted proteins (Grimmond et al. [2003] Genome Res 13:1350-1359). The term secreted protein encompasses all proteins exported from the cell, including growth factors, extracellular proteinases, morphogens, and extracellular matrix molecules. Defining the genes encoding secreted proteins that change in expression during organogenesis, the dynamic secretome, is likely to point to key drivers of morphogenesis. Such secreted proteins are involved in the reciprocal interactions between the ureteric bud (UB) and the metanephric mesenchyme (MM) that occur during organogenesis of the metanephros. Some key metanephric secreted proteins have been identified, but many remain to be determined. In this study, microarray expression profiling of E10.5, E11.5, and E13.5 kidney and consensus bioinformatic analysis were used to define a dynamic secretome of early metanephric development. In situ hybridisation was used to confirm microarray results and clarify spatial expression patterns for these genes. Forty-one secreted factors were dynamically expressed between E10.5 and E13.5, and 25 of these factors had not previously been implicated in kidney development. A text-based anatomical ontology was used to spatially annotate the expression pattern of these genes in cultured metanephric explants.
Abstract:
Cdca4 (Hepp) was originally identified as a gene expressed specifically in hematopoietic progenitor cells as opposed to hematopoietic stem cells. More recently, it has been shown to stimulate p53 activity and also lead to p53-independent growth inhibition when overexpressed. We independently isolated the murine Cdca4 gene in a genomic expression-based screen for genes involved in mammalian craniofacial development, and show that Cdca4 is expressed in a spatio-temporally restricted pattern during mouse embryogenesis. In addition to expression in the facial primordia including the pharyngeal arches, Cdca4 is expressed in the developing limb buds, brain, spinal cord, dorsal root ganglia, teeth, eye and hair follicles. Along with a small number of proteins from a range of species, the predicted CDCA4 protein contains a novel SERTA motif in addition to cyclin A-binding and PHD bromodomain-binding regions of homology. While the function of the SERTA domain is unknown, proteins containing this domain have previously been linked to cell cycle progression and chromatin remodelling. Using in silico database mining we have extended the number of evolutionarily conserved orthologues of known SERTA domain proteins and identified an uncharacterised member of the SERTA domain family, SERTAD4, with orthologues to date in human, mouse, rat, dog, cow, Tetraodon and chicken. Immunolocalisation of transiently and stably transfected epitope-tagged CDCA4 protein in mammalian cells suggests that it resides predominantly in the nucleus throughout all stages of the cell cycle. (c) 2006 Elsevier B.V. All rights reserved.
Abstract:
In this paper, we describe the Vannotea system, an application designed to enable collaborating groups to discuss and annotate collections of high-quality images, video, audio or 3D objects. The system has been designed specifically to capture and share scholarly discourse and annotations about multimedia research data by teams of trusted colleagues within a research or academic environment. As such, it provides: authenticated access to a web browser search interface for discovering and retrieving media objects; a media replay window that can incorporate a variety of embedded plug-ins to render different scientific media formats; an annotation authoring, editing, searching and browsing tool; and session logging and replay capabilities. Annotations are personal remarks, interpretations, questions or references that can be attached to whole files, segments or regions. Vannotea enables annotations to be attached either synchronously (using Jabber message passing and audio/video conferencing) or asynchronously and stand-alone. The annotations are stored on an Annotea server, extended for multimedia content. Their access, retrieval and re-use are controlled via Shibboleth identity management and XACML access policies.
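The three attachment scopes named above (whole file, segment, region) suggest a simple data model. The sketch below is a hypothetical illustration of that model, not Vannotea's or Annotea's actual schema: the field names, the `(start, end)` seconds convention for segments and the `(x, y, width, height)` pixel convention for regions are all our own assumptions, and storage, Jabber messaging and Shibboleth/XACML access control are not modelled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Annotation:
    author: str
    media_id: str
    body: str
    segment: Optional[Tuple[float, float]] = None       # (start_s, end_s) in audio/video
    region: Optional[Tuple[int, int, int, int]] = None  # (x, y, width, height) in pixels

    def scope(self) -> str:
        """Classify what the annotation is attached to."""
        if self.segment is None and self.region is None:
            return "file"
        if self.region is None:
            return "segment"
        if self.segment is None:
            return "region"
        return "segment+region"


def annotations_at(annotations, media_id, t):
    """Annotations on media_id that apply at playback time t.

    Annotations without a time segment (whole-file or region-only) always apply.
    """
    return [
        a for a in annotations
        if a.media_id == media_id
        and (a.segment is None or a.segment[0] <= t <= a.segment[1])
    ]
```

A replay window could call `annotations_at` on each playback tick to decide which remarks to show alongside the media.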
Abstract:
The endosymbiotic bacterium Wolbachia pipientis infects a wide range of arthropods, in which it induces a variety of reproductive phenotypes, including cytoplasmic incompatibility (CI), parthenogenesis, male killing, and reversal of genetic sex determination. The recent sequencing and annotation of the first Wolbachia genome revealed an unusually high number of genes encoding ankyrin domain (ANK) repeats. These ANK genes are likely to be important in mediating the Wolbachia-host interaction. In this work we determined the distribution and expression of the different ANK genes found in the sequenced Wolbachia wMel genome in nine Wolbachia strains that induce different phenotypic effects in their hosts. A comparison of the ANK genes of wMel and the non-CI-inducing wAu Wolbachia strain revealed significant differences between the strains. This was reflected in sequence variability in shared genes that could result in alterations in the encoded proteins, such as motif deletions, amino acid insertions, and in some cases disruptions due to insertion of transposable elements and premature stops. In addition, one wMel ANK gene, which is part of an operon, was absent in the wAu genome. These variations are likely to affect the affinity, function, and cellular location of the predicted proteins encoded by these genes.
Abstract:
Wolbachia are maternally inherited intracellular bacteria that infect a wide range of arthropods and nematodes and are associated with various reproductive abnormalities in their hosts. Insect-associated Wolbachia form a monophyletic clade in the α-Proteobacteria and recently have been separated into two supergroups (A and B) and 19 groups. Our recent polymerase chain reaction (PCR) survey using wsp specific primers indicated that various strains of Wolbachia were present in mosquitoes collected from Southeast Asia. Here, we report the phylogenetic relationship of the Wolbachia strains found in these mosquitoes using wsp gene sequences. Our phylogenetic analysis revealed eight new Wolbachia strains, five in the A supergroup and three in the B supergroup. Most of the Wolbachia strains present in Southeast Asian mosquitoes belong to the established Mors, Con, and Pip groups.
Abstract:
Wolbachia endosymbiotic bacteria are widespread in arthropods and are also present in filarial nematodes. Almost all filarial species so far examined have been found to harbor these endosymbionts. The sequences of only three genes have been published for nematode Wolbachia (i.e., the genes coding for the proteins FtsZ and catalase and for 16S rRNA). Here we present the sequences of the genes coding for the Wolbachia surface protein (WSP) from the endosymbionts of eight species of filaria. Complete gene sequences were obtained from the endosymbionts of two different species, Dirofilaria immitis and Brugia malayi. These sequences allowed us to design general primers for amplification of the wsp gene from the Wolbachia of all filarial species examined. For these species, partial WSP sequences (about 600 base pairs) were obtained with these primers. Phylogenetic analysis groups these nematode wsp sequences into a coherent cluster. Within the nematode cluster, wsp-based Wolbachia phylogeny matches a previous phylogeny obtained with ftsZ gene sequences, with a good consistency of the phylogeny of hosts (nematodes) and symbionts (Wolbachia). In addition, different individuals of the same host species (Dirofilaria immitis and Wuchereria bancrofti) show identical wsp gene sequences.
Abstract:
The dnaA region of Wolbachia, an intracellular bacterial parasite of insects, is unique. A glnA cognate was found upstream of the dnaA gene, while neither of the two open reading frames detected downstream of dnaA has any homologue in the database. This unusual gene arrangement may reflect requirements associated with the unique ecological niche this agent occupies.
Abstract:
The maternally inherited intracellular symbiont Wolbachia pipientis is well known for inducing a variety of reproductive abnormalities in the diverse arthropod hosts it infects. It has been implicated in causing cytoplasmic incompatibility, parthenogenesis, and the feminization of genetic males in different hosts. The molecular mechanisms by which this fastidious intracellular bacterium causes these reproductive and developmental abnormalities have not yet been determined. In this paper, we report on (i) the purification of one of the most abundantly expressed Wolbachia proteins from infected Drosophila eggs and (ii) the subsequent cloning and characterization of the gene (wsp) that encodes it. The functionality of the wsp promoter region was also successfully tested in Escherichia coli. Comparison of sequences of this gene from different strains of Wolbachia revealed a high level of variability. This sequence variation correlated with the ability of certain Wolbachia strains to induce or rescue the cytoplasmic incompatibility phenotype in infected insects. As such, this gene will be a very useful tool for Wolbachia strain typing and phylogenetic analysis, as well as understanding the molecular basis of the interaction of Wolbachia with its host.
Abstract:
In the context of cancer diagnosis and treatment, we consider the problem of constructing an accurate prediction rule on the basis of a relatively small number of tumor tissue samples of known type containing the expression data on very many (possibly thousands) genes. Recently, results have been presented in the literature suggesting that it is possible to construct a prediction rule from only a few genes such that it has a negligible prediction error rate. However, in these results the test error or the leave-one-out cross-validated error is calculated without allowance for the selection bias. There is no allowance either because the rule is tested on tissue samples that were used in the first instance to select the genes being used in the rule, or because the cross-validation of the rule is not external to the selection process; that is, gene selection is not performed in training the rule at each stage of the cross-validation process. We describe how in practice the selection bias can be assessed and corrected for by either performing a cross-validation or applying the bootstrap external to the selection process. We recommend using 10-fold rather than leave-one-out cross-validation, and concerning the bootstrap, we suggest using the so-called .632+ bootstrap error estimate designed to handle overfitted prediction rules. Using two published data sets, we demonstrate that when correction is made for the selection bias, the cross-validated error is no longer zero for a subset of only a few genes.
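The difference between internal and external cross-validation can be illustrated with a small simulation on pure-noise data, where an honest error estimate should sit near 50%. This is a minimal sketch under our own assumptions, not the authors' procedure. Genes are ranked by a Welch-type t statistic and the classifier is a nearest-centroid rule. `select_genes`, `nearest_centroid_predict` and `cv_error` are hypothetical helper names, and each training fold is assumed to contain both classes.

```python
import numpy as np


def select_genes(X, y, k):
    """Indices of the k genes with the largest absolute Welch-type t statistic."""
    a, b = X[y == 0], X[y == 1]
    diff = a.mean(axis=0) - b.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.argsort(-np.abs(diff / se))[:k]


def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test sample to the class whose training centroid is closer."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)


def cv_error(X, y, k, n_folds=10, external=True, seed=0):
    """n-fold cross-validated error rate of 'select k genes, then nearest centroid'.

    external=True redoes gene selection on each training fold (unbiased);
    external=False selects once on all samples, so test samples leak into selection.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    genes_all = None if external else select_genes(X, y, k)
    errors = 0
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        genes = select_genes(X[train], y[train], k) if external else genes_all
        pred = nearest_centroid_predict(X[train][:, genes], y[train], X[test_idx][:, genes])
        errors += int((pred != y[test_idx]).sum())
    return errors / len(y)
```

On noise data with many more genes than samples, the internal (`external=False`) estimate typically comes out far below 50%, while the external estimate stays near chance. This is the selection bias described above.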