38 results for Annotation de génomes
in Repositório Institucional UNESP - Universidade Estadual Paulista "Julio de Mesquita Filho"
Abstract:
To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1,415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembly. This indicated that possibly 33,620 unique genes had been identified and that >90% of the sugarcane expressed genes were tagged.
Abstract:
Most of the tasks in genome annotation can be at least partially automated. Since this annotation is time-consuming, facilitating some parts of the process - thus freeing the specialist to carry out more valuable tasks - has been the motivation of many tools and annotation environments. In particular, annotation of protein function can benefit from knowledge about enzymatic processes. The use of sequence homology alone is not a good approach to derive this knowledge when there are only a few homologues of the sequence to be annotated. The alternative is to use motifs. This paper uses a symbolic machine learning approach to derive rules for the classification of enzymes according to the Enzyme Commission (EC). Our results show that, for the top class, the average global classification error is 3.13%. Our technique also produces a set of rules relating structural to functional information, which is important for understanding the three-dimensional structure of a protein and determining its biological function. © 2009 Springer Berlin Heidelberg.
Abstract:
The cellular and molecular characteristics of a cell line (BME26) derived from embryos of the cattle tick Rhipicephalus (Boophilus) microplus were studied. The cells contained glycogen inclusions, numerous mitochondria, and vesicles with heterogeneous electron densities dispersed throughout the cytoplasm. Vesicles contained lipids and sequestered palladium meso-porphyrin (Pd-mP) and rhodamine-hemoglobin, suggesting their involvement in the autophagic and endocytic pathways. The cells phagocytosed yeast and expressed genes encoding the antimicrobial peptides microplusin and defensin. A cDNA library was made and 898 unique mRNA sequences were obtained. Among them, 556 sequences were not significantly similar to any sequence found in public databases. Annotation using Gene Ontology revealed transcripts related to several different functional classes. We identified transcripts involved in the immune response, such as ferritin, serine proteases, protease inhibitors, antimicrobial peptides, heat shock protein, glutathione S-transferase, peroxidase, and NADPH oxidase. BME26 cells transfected with a plasmid carrying a red fluorescent protein reporter gene (DsRed2) transiently expressed DsRed2 for up to 5 weeks. We conclude that BME26 can be used to experimentally analyze diverse biological processes that occur in R. (B.) microplus, such as the innate immune response to tick-borne pathogens. © 2008 Elsevier Ltd. All rights reserved.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
A detailed genome mapping analysis of 213,636 expressed sequence tags (ESTs) derived from nontumor and tumor tissues of the oral cavity, larynx, pharynx, and thyroid was done. Transcripts matching known human genes were identified; potential new splice variants were flagged and subjected to manual curation, pointing to 788 putatively new alternative splicing isoforms, the majority (75%) being insertion events. A subset of 34 new splicing isoforms (5% of 788 events) was selected and 23 (68%) were confirmed by reverse transcription-PCR and DNA sequencing. Putative new genes were revealed, including six transcripts mapped to well-studied chromosomes such as 22, as well as transcripts that mapped to 253 intergenic regions. In addition, 2,251 noncoding intronic RNAs, possibly involved in transcriptional regulation, were found. A set of 250 candidate markers for loss of heterozygosity or gene amplification was selected by identifying transcripts that mapped to genomic regions previously known to be frequently amplified or deleted in head, neck, and thyroid tumors. Three of these markers were evaluated by quantitative reverse transcription-PCR in an independent set of individual samples. Along with detailed clinical data about tumor origin, the information reported here is now publicly available on a dedicated Web site as a resource for further biological investigation. This first in silico reconstruction of the head, neck, and thyroid transcriptomes points to a wealth of new candidate markers that can be used for future studies on the molecular basis of these tumors. Similar analysis is warranted for a number of other tumors for which large EST data sets are available.
Abstract:
Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22 with at least one ORESTES contig for 162 (65.6%) of the 247 known genes, for 67 (44.6%) of the 150 related genes, and for 45 of the 148 (30.4%) EST-predicted genes on this chromosome. Using a set of stringent criteria to validate our sequences, we identified a further 219 previously unannotated transcribed sequences on chromosome 22. Of these, 171 were in fact also defined by EST or full-length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus, despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48 transcribed sequences on chromosome 22 not defined by other sequences. All of the transcribed sequences defined by ORESTES coincided with DNA regions predicted as encoding exons by GENSCAN.
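The percentages and the final count in this abstract follow directly from the stated totals. As a minimal sketch, the arithmetic can be recomputed in Python; the variable names and the `percent` helper are chosen here for illustration and do not come from the paper.

```python
# Counts copied from the abstract; all names below are illustrative assumptions.
contigs_total = 81_429
contigs_on_chr22 = 1_181

# (genes with at least one ORESTES contig, genes in category) on chromosome 22.
gene_categories = {
    "known": (162, 247),
    "related": (67, 150),
    "EST-predicted": (45, 148),
}


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


contig_share = percent(contigs_on_chr22, contigs_total)
coverage = {
    name: percent(hit, total) for name, (hit, total) in gene_categories.items()
}

# Previously unannotated sequences, minus those also defined by GenBank entries.
new_sequences = 219
also_in_genbank = 171
orestes_only = new_sequences - also_in_genbank

print(f"contigs matching chromosome 22: {contig_share:.2f}%")
for name, value in coverage.items():
    print(f"{name} genes with an ORESTES contig: {value:.1f}%")
print(f"sequences defined only by ORESTES: {orestes_only}")
```

Standard rounding gives 44.7% for the related genes (67 of 150), whereas the abstract reports 44.6%; this would be consistent with the published figure having been truncated rather than rounded.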
Abstract:
Xanthomonas axonopodis pv. citri (Xac) causes citrus canker and the completion of the Xac genome sequence has opened up the possibility of investigating basic cellular mechanisms at the genomic level. Copper compounds have been extensively used in agriculture to control plant diseases. The copA and copB genes, identified by annotation of the Xac genome, encode homologues of proteins involved in copper resistance. A gene expression assay by Northern blotting revealed that copA and copB are expressed as a single transcript specifically induced by copper. Synthesis of the gene products was also induced by copper, reaching a maximum level at 4 h after addition of copper to the culture medium. CopA was a cytosolic protein and CopB was detected in the cytoplasmic membrane. The gene encoding CopA was disrupted by the insertion of a transposon, leading to mutant strains that were unable to grow in culture medium containing copper, even at the lowest CuSO4 concentration tested (0.25 mM), whereas the wild-type strain was able to grow in the presence of 1 mM copper. Cell suspensions of the wild-type and mutant strains in different copper concentrations were inoculated in lemon leaves to analyse their ability to induce citrus canker symptoms. Cells of mutant strains showed higher sensitivity than the wild-type strain in the presence of copper, i.e. they were not able to induce citrus canker symptoms at high copper concentrations and exhibited slower growth in planta.
Abstract:
Whereas genome sequencing defines the genetic potential of an organism, transcript sequencing defines the utilization of this potential and links the genome with most areas of biology. To exploit the information within the human genome in the fight against cancer, we have deposited some two million expressed sequence tags (ESTs) from human tumors and their corresponding normal tissues in the public databases. The data currently define approximately 23,500 genes, of which only approximately 1,250 are still represented only by ESTs. Examination of the EST coverage of known cancer-related (CR) genes reveals that <1% do not have corresponding ESTs, indicating that the representation of genes associated with commonly studied tumors is high. The careful recording of the origin of all ESTs we have produced has enabled detailed definition of where the genes they represent are expressed in the human body. More than 100,000 ESTs are available for seven tissues, indicating a surprising variability of gene usage that has led to the discovery of a significant number of genes with restricted expression, and that may thus be therapeutically useful. The ESTs also reveal novel nonsynonymous germline variants (although the one-pass nature of the data necessitates careful validation) and many alternatively spliced transcripts. Although widely exploited by the scientific community, vindicating our totally open source policy, the EST data generated still provide extensive information that remains to be systematically explored, and that may further facilitate progress toward both the understanding and treatment of human cancers.
Abstract:
Transposable elements are major components of plant genomes and they influence their evolution, acting as recombination hot spots, acquiring specific cell functions or becoming part of protein-coding regions. The latter is the subject of the present analysis. This study is a report on the annotation of transposable elements (TEs) in expressed sequences of Coffea arabica, Coffea canephora and Coffea racemosa, showing the occurrence of 383 ESTs and 142 unigenes with TE fragments in these three Coffea species. Based on selected unigenes, it was possible to suggest 26 putative proteins with TE-cassette insertions, demonstrating a likely contribution to protein variability. The genes for two of those proteins, the fertility restorer (FR) and the pyrophosphate-dependent phosphofructokinase (PPi-PFK) genes, were selected for evaluating the impact of TE-cassettes on host gene evolution in other plant genomes (Arabidopsis thaliana, Oryza sativa and Populus trichocarpa). This survey identified an FR gene in O. sativa harboring multiple insertions of LTR retrotransposons that originated new exons, although this does not necessarily indicate a case of molecular domestication. A possible transduction event of a fragment of the PPi-PFK beta-subunit gene mediated by Helitron ATREPX1 in Arabidopsis thaliana was also highlighted.