935 results for Databases, Nucleic Acid
Abstract:
Polymerases and nucleases are enzymes that process DNA and RNA. They are involved in crucial processes for cell life, performing the extension and cleavage of nucleic acid chains during genome replication and maintenance. Additionally, both classes of enzymes are often associated with several diseases, including cancer. To catalyze the reaction, most of them operate via the two-metal-ion mechanism. Thus, despite showing relevant differences in structure, function and catalytic properties, they share common catalytic elements, which comprise the two catalytic ions and their first-shell acidic residues. Notably, recent studies of different metalloenzymes revealed the recurrent presence of additional elements surrounding the active site, suggesting an extended two-metal-ion-centered architecture. However, whether these elements have a catalytic function, and what their role is, remains unclear. In this work, using state-of-the-art computational techniques, second- and third-shell elements are shown to act in metallonucleases by favoring substrate positioning and leaving group release. In particular, in hExo1 a transient third metal ion is recruited and positioned near the two-metal-ion site by a structurally conserved acidic residue to assist the leaving group departure. Similarly, in hFEN1 second- and third-shell Arg/Lys residues operate the phosphate steering mechanism through (i) substrate recruitment, (ii) precise cleavage localization, and (iii) leaving group release. Importantly, structural comparisons of hExo1, hFEN1 and other metallonucleases suggest that similar catalytic mechanisms may be shared by other enzymes. Overall, the results obtained provide an extended view of the parallel strategies adopted by metalloenzymes, which employ divalent metal ions or positively charged residues to ensure efficient and specific catalysis.
Furthermore, these outcomes may have implications for de novo enzyme engineering and/or drug design to modulate nucleic acid processing.
Abstract:
In traditional criminal investigation, uncertainties are often dealt with using a combination of common sense, practical considerations and experience, but rarely with tailored statistical models. For example, in some countries, a simple trace profile can be searched in the national DNA database (NDNAD) only if it has allelic information for six or more of the ten SGM Plus loci. A profile with less information cannot be searched. This requirement (a result at six or more loci) is not based on a statistical approach, but rather on the feeling that six or more would be sufficient. A statistical approach, however, could be more rigorous and objective, and would sensibly take into consideration factors such as the probability of adventitious matches relative to the actual database size and/or investigators' requirements. Therefore, this research was undertaken to establish scientific foundations pertaining to the use of partial SGM Plus loci profiles (or similar) for investigation.
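To make the idea of adventitious matches relative to database size concrete, the following is a minimal Python sketch and not the model developed in the research above. It assumes that loci are independent, that database profiles are unrelated to the trace, and that the per-locus genotype frequencies (all hypothetical here) are known. The function names are illustrative only.

```python
import math


def random_match_probability(per_locus_genotype_freqs):
    """Multiply per-locus genotype frequencies (assumes independent loci)."""
    return math.prod(per_locus_genotype_freqs)


def expected_adventitious_matches(match_probability, database_size):
    """Expected number of chance matches among unrelated database profiles."""
    return match_probability * database_size


def prob_at_least_one_adventitious_match(match_probability, database_size):
    """Probability of one or more chance matches (assumes independent profiles)."""
    return 1.0 - (1.0 - match_probability) ** database_size


# Hypothetical partial profile: six typed loci, each genotype at frequency 0.1.
rmp = random_match_probability([0.1] * 6)
expected = expected_adventitious_matches(rmp, 1_000_000)
p_any = prob_at_least_one_adventitious_match(rmp, 1_000_000)
print(rmp, expected, p_any)
```

Under these assumptions, the same partial profile becomes less discriminating as the database grows, which is why a fixed locus-count threshold cannot by itself control the chance-match rate.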
Abstract:
In this paper we included a very broad representation of grass family diversity (84% of tribes and 42% of genera). Phylogenetic inference was based on three plastid DNA regions (rbcL, matK and trnL-F), using maximum parsimony and Bayesian methods. Our results resolved most of the subfamily relationships within the major clades (BEP and PACCMAD) that had previously been unclear, including: (i) the BEP and PACCMAD sister relationship, (ii) the composition of clades and the sister relationship of Ehrhartoideae and Bambusoideae + Pooideae, (iii) the paraphyly of tribe Bambuseae, (iv) the position of Gynerium as sister to Panicoideae, and (v) the phylogenetic position of Micrairoideae. By tolerating a relatively large amount of missing data, we were able to increase taxon sampling substantially in our analyses, from 107 to 295 taxa. However, bootstrap support and, to a lesser extent, Bayesian posterior probabilities were generally lower in analyses involving missing data than in those not including them. We produced a fully resolved phylogenetic summary tree for the grass family at subfamily level and indicated the most likely relationships of all tribes included in our analysis.
Abstract:
The Eukaryotic Promoter Database (EPD) is an annotated non-redundant collection of eukaryotic POL II promoters, experimentally defined by a transcription start site (TSS). There may be multiple promoter entries for a single gene. The underlying experimental evidence comes from journal articles and, starting from release 73, from 5' ESTs of full-length cDNA clones used for so-called in silico primer extension. Access to promoter sequences is provided by pointers to TSS positions in nucleotide sequence entries. The annotation part of an EPD entry includes a description of the type and source of the initiation site mapping data, links to other biological databases and bibliographic references. EPD is structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets for comparative sequence analysis. Web-based interfaces have been developed that enable the user to view EPD entries in different formats, to select and extract promoter sequences according to a variety of criteria and to navigate to related databases exploiting different cross-references. Tools for analysing sequence motifs around TSSs defined in EPD are provided by the signal search analysis server. EPD can be accessed at http://www.epd.isb-sib.ch.
Abstract:
The ENCyclopedia Of DNA Elements (ENCODE) Project aims to identify all functional elements in the human genome sequence. The pilot phase of the Project is focused on a specified 30 megabases (approximately 1%) of the human genome sequence and is organized as an international consortium of computational and laboratory-based scientists working to develop and apply high-throughput approaches for detecting all sequence elements that confer biological function. The results of this pilot phase will guide future efforts to analyze the entire human genome.
Abstract:
The primary mission of UniProt is to support biological research by maintaining a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community. UniProt is produced by the UniProt Consortium which consists of groups from the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt is comprised of four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. UniProt is updated and distributed every 3 weeks and can be accessed online for searches or download at http://www.uniprot.org.
Abstract:
Selective pressures related to gene function and chromosomal architecture are acting on genome sequences and can be revealed, for instance, by appropriate genometric methods. Cumulative nucleotide skew analyses, i.e., GC, TA, and ORF orientation skews, predict the location of the origin of DNA replication for 88 out of 100 completely sequenced bacterial chromosomes. These methods appear fully reliable for proteobacteria, Gram-positives, and spirochetes as well as for euryarchaeotes. Based on this genome architecture information, coorientation analyses reveal that in prokaryotes, ribosomal RNA (rRNA) genes encoding the small and large ribosomal subunits are all transcribed in the same direction as DNA replication; that is, they are located along the leading strand. This result offers a simple and reliable method for circumscribing the region containing the origin of the DNA replication and reveals a strong selective pressure acting on the orientation of rRNA genes similar to the weaker one acting on the orientation of ORFs. Rate of coorientation of transfer RNA (tRNA) genes with DNA replication appears to be taxon-specific. Analyzing nucleotide biases such as GC and TA skews of genes and plotting one against the other reveals a taxonomic clusterization of species. All ribosomal RNA genes are enriched in Gs and depleted in Cs, the only so far known exception being the rRNA genes of deuterostomian mitochondria. However, this exception can be explained by the fact that in the chromosome of the human mitochondrion, the model of the deuterostomian organelle genome, DNA replication, and rRNA transcription proceed in opposite directions. A general rule is deduced from prokaryotic and mitochondrial genomes: ribosomal RNA genes that are transcribed in the same direction as the DNA replication are enriched in Gs, and those transcribed in the opposite direction are depleted in Gs.
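The cumulative GC skew analysis mentioned above can be sketched in a few lines of Python. This is an illustrative sketch and not the authors' implementation. It assumes the common convention that skew is computed as (G - C) / (G + C) per window and that the minimum of the cumulative curve marks the putative replication origin. The window size and the toy sequence are hypothetical.

```python
def cumulative_gc_skew(seq, window=1000):
    """Return the running sum of per-window GC skew, (G - C) / (G + C)."""
    seq = seq.upper()
    total = 0.0
    cumulative = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Windows without any G or C contribute zero skew.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        total += skew
        cumulative.append(total)
    return cumulative


def predicted_origin_window(cumulative):
    """Index of the window where the cumulative skew reaches its minimum."""
    return min(range(len(cumulative)), key=cumulative.__getitem__)


# Toy sequence: a C-rich stretch followed by a G-rich stretch.
toy = "C" * 2000 + "G" * 3000
curve = cumulative_gc_skew(toy, window=1000)
origin = predicted_origin_window(curve)
print(curve, origin)
```

On a real circular chromosome the curve depends on the arbitrary start coordinate of the sequence file, so the position of the minimum, not its absolute value, is the informative quantity.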
Abstract:
HTPSELEX is a public database providing access to primary and derived data from high-throughput SELEX experiments aimed at characterizing the binding specificity of transcription factors. The resource is primarily intended to serve computational biologists interested in building models of transcription factor binding sites from large sets of binding sequences. The guiding principle is to make available all information that is relevant for this purpose. For each experiment, we try to provide accurate information about the protein material used, details of the wet lab protocol, an archive of sequencing trace files, assembled clone sequences (concatemers) and complete sets of in vitro selected protein-binding tags. In addition, we offer in-house derived binding sites models. HTPSELEX also offers reasonably large SELEX libraries obtained with conventional low-throughput protocols. The FTP site contains the trace archives and database flatfiles. The web server offers user-friendly interfaces for viewing individual entries and quality-controlled download of SELEX sequence libraries according to a user-defined sequencing quality threshold. HTPSELEX is available from ftp://ftp.isrec.isb-sib.ch/pub/databases/htpselex/ and http://www.isrec.isb-sib.ch/htpselex.
Abstract:
The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.
Abstract:
In recent years, both homing endonucleases (HEases) and zinc-finger nucleases (ZFNs) have been engineered and selected for the targeting of desired human loci for gene therapy. However, enzyme engineering is lengthy and expensive, and the off-target effects of the manufactured endonucleases are difficult to predict. Moreover, enzymes selected to cleave a human DNA locus may not cleave the homologous locus in the genome of animal models because of sequence divergence, thus hampering attempts to assess the in vivo efficacy and safety of any engineered enzyme prior to its application in human trials. Here, we show that naturally occurring HEases that cleave desirable human targets can be found. Some of these enzymes are also shown to cleave the homologous sequence in the genome of animal models. In addition, the distribution of off-target effects may be more predictable for native HEases. Based on our experimental observations, we present the HomeBase algorithm, database and web server, which allow a high-throughput computational search and assignment of HEases for the targeting of specific loci in the human and other genomes. We experimentally validate the predicted target specificity of candidate fungal, bacterial and archaeal HEases using cell-free, yeast and archaeal assays.
Abstract:
The Complete Arabidopsis Transcriptome Micro Array (CATMA) database contains gene sequence tag (GST) and gene model sequences for over 70% of the predicted genes in the Arabidopsis thaliana genome as well as primer sequences for GST amplification and a wide range of supplementary information. All CATMA GST sequences are specific to the gene for which they were designed, and all gene models were predicted from a complete reannotation of the genome using uniform parameters. The database is searchable by sequence name, sequence homology or direct SQL query, and is available through the CATMA website at http://www.catma.org/.
Abstract:
The Gene Ontology (GO) Consortium (http://www.geneontology.org) (GOC) continues to develop, maintain and use a set of structured, controlled vocabularies for the annotation of genes, gene products and sequences. The GO ontologies are expanding both in content and in structure. Several new relationship types have been introduced and used, along with existing relationships, to create links between and within the GO domains. These improve the representation of biology, facilitate querying, and allow GO developers to systematically check for and correct inconsistencies within the GO. Gene product annotation using GO continues to increase both in the number of total annotations and in species coverage. GO tools, such as OBO-Edit, an ontology-editing tool, and AmiGO, the GOC ontology browser, have seen major improvements in functionality, speed and ease of use.
Abstract:
Massively parallel signature sequencing (MPSS) generates millions of short sequence tags corresponding to transcripts from a single RNA preparation. Most MPSS tags can be unambiguously assigned to genes, thereby generating a comprehensive expression profile of the tissue of origin. From the comparison of MPSS data from 32 normal human tissues, we identified 1,056 genes that are predominantly expressed in the testis. Further evaluation by using MPSS tags from cancer cell lines and EST data from a wide variety of tumors identified 202 of these genes as candidates for encoding cancer/testis (CT) antigens. Of these genes, the expression in normal tissues was assessed by RT-PCR in a subset of 166 intron-containing genes, and those with confirmed testis-predominant expression were further evaluated for their expression in 21 cancer cell lines. Thus, 20 CT or CT-like genes were identified, with several exhibiting expression in five or more of the cancer cell lines examined. One of these genes is a member of a CT gene family that we designated as CT45. The CT45 family comprises six highly similar (>98% cDNA identity) genes that are clustered in tandem within a 125-kb region on Xq26.3. CT45 was found to be frequently expressed in both cancer cell lines and lung cancer specimens. Thus, MPSS analysis has resulted in a significant extension of our knowledge of CT antigens, leading to the discovery of a distinctive X-linked CT-antigen gene family.
Abstract:
Ancient asexuals have been considered to be a contradiction of the basic tenets of evolutionary theory. Barred from rearranging genetic variation by recombination, their reduced number of gene arrangements is thought to hamper their response to changing environments. For the same reason, it should be difficult for them to avoid the build-up of deleterious mutations. Several groups of taxonomically diverse organisms are thought to be ancient asexuals, although clear evidence for or against the existence of recombination events is scarce. Several methods have recently been developed for predicting recombination events by analyzing aligned sequences of a given region of DNA that all originate from one species. The methods are based on phylogenetic, substitution, and compatibility analyses. Here we present the results of analyses of sequence data from different loci studied in several groups of evolutionarily distant species that are considered to be ancient asexuals, using seven different types of analysis. The groups of organisms were the arbuscular mycorrhizal fungi (Glomales), Darwinula stevensoni (Darwinuloidea crustacean ostracods) and the bdelloid rotifers (Bdelloidea), which are thought to have been asexual for the last 400, 25-100, and 35-40 Myr, respectively. The seven different analytical methods evaluated the evolutionary relationships among haplotypes, and these methods had previously been shown to be reliable for predicting the occurrence of recombination events. Despite the different degree of genetic variation among the different groups of organisms, at least some evidence for recombination was found in all species groups. In particular, predictions of recombination events in the arbuscular mycorrhizal fungi were frequent. Predictions of recombination were also found for sequence data that have previously been used to infer the absence of recombination in bdelloid rotifers. 
Although our results have to be taken with some caution because they could signal very ancient recombination events or possibly other genetic variation of nonrecombinant origin, they suggest that some cryptic recombination events may exist in these organisms.
Abstract:
Protein-coding genes evolve at different rates, and the influence of different parameters, from gene size to expression level, has been extensively studied. While in yeast gene expression level is the major causal factor of gene evolutionary rate, the situation is more complex in animals. Here we investigate these relations further, taking into account in particular gene expression in different organs as well as indirect correlations between parameters. We used RNA-seq data from two large datasets, covering 22 mouse tissues and 27 human tissues. Over all tissues, evolutionary rate correlates only weakly with levels and breadth of expression. The strongest explanatory factors of purifying selection are GC content, expression in many developmental stages, and expression in brain tissues. While the main component of evolutionary rate is purifying selection, we also find tissue-specific patterns for sites under neutral evolution and for positive selection. We observe fast evolution of genes expressed in testis, but also in other tissues, notably liver, which is explained by weak purifying selection rather than by positive selection.