992 results for Database accession number
Abstract:
Since the completion of the Saccharomyces cerevisiae genomic sequence in 1996 [Goffeau,A. et al. (1997) Nature, 387, 5], several creative and ambitious projects have been initiated to explore the functions of gene products or gene expression on a genome-wide scale. To help researchers take advantage of these projects, the Saccharomyces Genome Database (SGD) has created two new tools, Function Junction and Expression Connection. Together, the tools form a central resource for querying multiple large-scale analysis projects for data about individual genes. Function Junction provides information from diverse projects that shed light on the role a gene product plays in the cell, while Expression Connection delivers information produced by the ever-increasing number of microarray projects. WWW access to SGD is available at genome-www.stanford.edu/Saccharomyces/.
Abstract:
The Database of Interacting Proteins (DIP; http://dip.doe-mbi.ucla.edu) is a database that documents experimentally determined protein–protein interactions. Since January 2000 the number of protein–protein interactions in DIP has nearly tripled to 3472 and the number of proteins to 2659. New interactive tools have been developed to aid in the visualization, navigation and study of networks of protein interactions.
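The interaction networks that DIP's tools help users navigate can be pictured as an undirected graph with proteins as nodes and experimentally determined interactions as edges. The Python sketch below is a hypothetical illustration, not DIP's implementation: it counts distinct interactions and proteins from a list of pairs and collects the proteins reachable within a given number of hops. The protein identifiers and function names are made up.

```python
from collections import defaultdict, deque


def build_network(pairs):
    """Build an undirected adjacency map from (protein, protein) interaction pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def summarize(pairs):
    """Return (number of distinct interactions, number of distinct proteins)."""
    interactions = {frozenset(pair) for pair in pairs}  # (A, B) and (B, A) count once
    proteins = {p for pair in pairs for p in pair}
    return len(interactions), len(proteins)


def neighbourhood(graph, start, max_hops):
    """Breadth-first search: map each protein within max_hops of start to its hop distance."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for partner in graph.get(node, ()):
            if partner not in hops:
                hops[partner] = hops[node] + 1
                queue.append(partner)
    return hops


pairs = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P2", "P1")]
graph = build_network(pairs)
print(summarize(pairs))               # (3, 4)
print(neighbourhood(graph, "P1", 2))  # {'P1': 0, 'P2': 1, 'P3': 2}
```

Storing each interaction as an unordered pair means that the same interaction reported in both directions is counted only once.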
Abstract:
Aminoacyl-tRNA synthetases (AARSs) are at the center of the question of the origin of life. They constitute a family of enzymes integrating the two levels of cellular organization: nucleic acids and proteins. AARSs arose early in evolution and are believed to be a group of ancient proteins. They are responsible for attaching amino acid residues to their cognate tRNA molecules, which is the first step in protein synthesis. The role they play in a living cell is essential for the precise deciphering of the genetic code. For a long time, the evolutionary history of AARSs could not be analyzed because too few of their amino acid sequences were available. The emerging picture of synthetase evolution is a result of recent achievements in genomics [Woese,C., Olsen,G.J., Ibba,M. and Söll,D. (2000) Microbiol. Mol. Biol. Rev., 64, 202–236]. In this paper we present a short introduction to the AARS database. The updated database contains 1047 AARS primary structures from archaebacteria, eubacteria, mitochondria, chloroplasts and eukaryotic cells. It is a compilation of the amino acid sequences of all AARSs known to date, which are available as separate entries via the WWW at http://biobases.ibch.poznan.pl/aars/.
Abstract:
The Conserved Key Amino Acid Positions DataBase (CKAAPs DB) provides access to an analysis of structurally similar proteins with dissimilar sequences, in which key residues within a common fold are identified. The derivation and significance of CKAAPs, starting from pairwise structure alignments, are described fully in Reddy et al. [Reddy,B.V.B., Li,W.W., Shindyalov,I.N. and Bourne,P.E. (2000) Proteins, in press]. The CKAAPs identified from this theoretical analysis are provided to experimentalists and theoreticians for potential use in protein engineering and modeling. It has been suggested that CKAAPs may be crucial features for protein folding, structural stability and function. Over 170 substructures, as defined by the Combinatorial Extension (CE) database, which are found in approximately 3000 representative polypeptide chains, have been analyzed and are available in the CKAAPs DB. CKAAPs DB also provides CKAAPs of the representative set of proteins derived from the CE and FSSP databases. Thus the database contains over 5000 representative polypeptide chains, covering all known structures in the PDB. A web interface to a relational database permits fast retrieval of structure-sequence alignments, CKAAPs and associated statistics. Users may query by PDB ID, protein name, function and Enzyme Classification number. Users may also submit protein alignments of their own to obtain CKAAPs. An interface to display CKAAPs on each structure from a web browser is also being implemented. CKAAPs DB is maintained by the San Diego Supercomputer Center and is accessible at the URL http://ckaaps.sdsc.edu.
Abstract:
The database reported here is derived using the Combinatorial Extension (CE) algorithm, which compares pairs of protein polypeptide chains and provides a list of structurally similar proteins along with their structure alignments. Using CE, structure–structure alignments can provide insights into biological function. When a protein of known function is shown to be structurally similar to a protein of unknown function, a relationship might be inferred; a relationship not necessarily detectable from sequence comparison alone. Establishing structure–structure relationships in this way is of great importance as we enter an era of structural genomics, in which an increasing number of structures of unknown function is likely to be determined. Thus the CE database is an example of a useful tool in the annotation of protein structures of unknown function. Comparisons can be performed on the complete PDB or on a structurally representative subset of proteins. The source protein(s) can be from the PDB (updated monthly) or uploaded by the user. CE provides sequence alignments resulting from structural alignments and Cartesian coordinates for the aligned structures, which may be analyzed using the supplied Compare3D Java applet or downloaded for further local analysis. Searches can be run from the CE web site, http://cl.sdsc.edu/ce.html, or the database and software can be downloaded from the site for local use.
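Structural comparison of this kind rests on geometry rather than sequence. As a simplified, hypothetical illustration of a fragment-level comparison (not the published CE scoring, whose details differ), the Python sketch below scores two equal-length fragments by the mean absolute difference between their intramolecular distance matrices. Because internal distances do not change under rotation or translation, geometrically similar fragments score near zero whatever their orientation. The use of one coordinate per residue (for example C-alpha), the fragment length of 8 and the 3.0 cutoff are assumptions of this sketch.

```python
import math


def distance_matrix(coords):
    """All pairwise Euclidean distances within one fragment (rotation/translation invariant)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def fragment_score(coords_a, start_a, coords_b, start_b, length=8):
    """Mean absolute difference between the intramolecular distance matrices of two fragments."""
    frag_a = coords_a[start_a:start_a + length]
    frag_b = coords_b[start_b:start_b + length]
    if len(frag_a) < length or len(frag_b) < length:
        raise ValueError("fragment runs past the end of a chain")
    da, db = distance_matrix(frag_a), distance_matrix(frag_b)
    total = sum(abs(da[i][j] - db[i][j]) for i in range(length) for j in range(length))
    return total / length ** 2


def similar_fragment_pairs(coords_a, coords_b, length=8, cutoff=3.0):
    """List (score, start_a, start_b) for every fragment pair scoring below cutoff, best first."""
    pairs = []
    for i in range(len(coords_a) - length + 1):
        for j in range(len(coords_b) - length + 1):
            score = fragment_score(coords_a, i, coords_b, j, length)
            if score < cutoff:
                pairs.append((score, i, j))
    return sorted(pairs)


# A helix-like toy chain and a rotated (90 degrees about z) and translated copy of it.
chain_a = [(math.cos(t), math.sin(t), 0.5 * t) for t in range(10)]
chain_b = [(-y + 5.0, x + 5.0, z + 5.0) for x, y, z in chain_a]
best = similar_fragment_pairs(chain_a, chain_b)
```

A full structural aligner would go on to chain compatible fragment pairs into a longer alignment; this sketch stops at scoring individual pairs.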
Abstract:
Methylation of cytosine at the 5 position of the pyrimidine ring is a major modification of DNA in most organisms. In eukaryotes, the distribution and number of 5-methylcytosines (5mC) along the DNA are heritable but can also change with the developmental state of the cell and in response to changes in the environment. While DNA methylation probably has a number of functions, scientific interest has recently focused on the gene-silencing effect methylation can have in eukaryotic cells. In particular, the discovery of changes in methylation level during cancer development has increased interest in this field. Over the years, a vast amount of data has been generated at different levels of resolution, ranging from the 5mC content of total DNA to the methylation status of single nucleotides. We present here a database for DNA methylation data that attempts to unify these results in a common resource. The database is accessible via the WWW (http://www.methdb.de). It stores information about the origin of the investigated sample and the experimental procedure, and contains the DNA methylation data. Query masks allow searching by 5mC content, species, tissue, gene, sex, phenotype, sequence ID and DNA type. The output lists all available information, including the relative gene expression level. DNA methylation patterns and methylation profiles are shown both as a graphical representation and as G/A/T/C/5mC sequences or tables with sequence positions and methylation levels, respectively.
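The sequence-style output described above, in which 5mC is treated as a fifth base, can be sketched in Python as below. This is a hypothetical illustration, not MethDB's format: the symbol M for 5mC and the convention of reporting 5mC content as a percentage of all cytosines are assumptions, and the table records each cytosine's methylation state rather than a measured methylation level.

```python
METHYL_SYMBOL = "M"  # hypothetical one-letter code for 5-methylcytosine


def mark_methylation(sequence, methylated_positions):
    """Replace C by METHYL_SYMBOL at the given 1-based positions."""
    bases = list(sequence.upper())
    for pos in methylated_positions:
        if not 1 <= pos <= len(bases):
            raise ValueError(f"position {pos} is outside the sequence")
        if bases[pos - 1] != "C":
            raise ValueError(f"position {pos} is {bases[pos - 1]!r}, not a cytosine")
        bases[pos - 1] = METHYL_SYMBOL
    return "".join(bases)


def percent_5mc(marked_sequence):
    """5mC content as a percentage of all cytosines (methylated plus unmethylated)."""
    methylated = marked_sequence.count(METHYL_SYMBOL)
    cytosines = methylated + marked_sequence.count("C")
    return 100.0 * methylated / cytosines if cytosines else 0.0


def methylation_table(marked_sequence):
    """List (1-based position, base) rows for every cytosine, methylated or not."""
    return [(i + 1, base) for i, base in enumerate(marked_sequence)
            if base in ("C", METHYL_SYMBOL)]


marked = mark_methylation("ACGTCGA", [2])
print(marked)                     # AMGTCGA
print(percent_5mc(marked))        # 50.0
print(methylation_table(marked))  # [(2, 'M'), (5, 'C')]
```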
Abstract:
Despite a rise in anti-EU rhetoric and a growing assertiveness in Ankara’s relations with Brussels, Turkey will continue to seek closer integration with the European Union in the coming years. The current stalemate in the accession process has been a source of irritation to Recep Tayyip Erdoğan’s government. Nonetheless, a complete collapse of accession talks would be a much worse scenario for the ruling AKP. For now, the government is primarily interested in keeping the negotiation process alive rather than in gaining full membership any time soon. Erdoğan’s government will likely seek to continue the accession talks because the AKP is acutely aware of their importance for the country’s domestic politics, for its economy and, to a lesser extent, for Turkey’s international standing. The opportunity to capitalise on this process will encourage the Turkish government to avoid crises in its relations with the EU, or at least to mitigate the impact of any potential diplomatic fallout.
Abstract:
The present data set provides a tab-separated text file compressed in a zip archive. The file includes metadata for each TaraOceans V9 rDNA metabarcode, with the following fields: md5sum = unique identifier; lineage = taxonomic path associated with the metabarcode; pid = % identity to the closest reference barcode from V9_PR2; sequence = nucleotide sequence of the metabarcode; refs = identity of the best-hit reference sequence(s); TARA_xxx = number of occurrences of this barcode in each of the 334 samples; totab = total abundance of the barcode; cid = identifier of the OTU to which the barcode belongs; and taxogroup = high-level taxonomic assignment of this barcode. The file also includes three categories of functional annotations: (1) Chloroplast: yes, presence of a permanent chloroplast; no, absence of a permanent chloroplast; NA, undetermined. (2) Symbiont (small partner): parasite, the species is a parasite; commensal, the species is a commensal; mutualist, the species is a mutualist symbiont, most often a microalgal taxon involved in photosymbiosis; no, the species is not involved in a symbiosis as the small partner; NA, undetermined. (3) Symbiont (host): photo, the host species relies on a mutualistic microalgal photosymbiont to survive (obligatory photosymbiosis); photo_falc, same as photo, but a facultative relationship; photo_klep, the host species maintains chloroplasts from microalgal prey(s) to survive; photo_klep_falc, same as photo_klep, but facultative; Nfix, the host species must interact with a mutualistic symbiont providing N2 fixation to survive; Nfix_falc, same as Nfix, but facultative; no, the species is not involved in any mutualistic symbiosis; NA, undetermined.
For example, the collodarian/Brandtodinium symbiosis is annotated as Chloroplast "no", Symbiont (small) "no" and Symbiont (host) "photo" for the collodarian host, and as Chloroplast "yes", Symbiont (small) "mutualist" and Symbiont (host) "no" for the dinoflagellate microalgal endosymbiont.
chloroplast = "yes", "no" or "NA"; symbiont.small = "parasite", "commensal", "mutualist", "no" or "NA"; symbiont.host = "photo", "photo_falc", "photo_klep", "Nfix", "no" or "NA"; benef = "Nfix", "no" or "NA"; trophism = "Metazoa", "heterotroph", "NA", "photosymbiosis" or "phototroph", assigned according to the previous fields.
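A minimal Python sketch for loading the archive with only the standard library. It assumes that the zip contains a single tab-separated file whose header row uses the field names listed above (md5sum, the TARA_xxx sample columns, totab, symbiont.host and so on) and that the sample columns hold integer counts; the actual member name and exact headers may differ, and the helper names are hypothetical.

```python
import csv
import io
import zipfile


def read_metabarcodes(zip_source):
    """Read the first member of the zip archive as a tab-separated table of dict rows."""
    with zipfile.ZipFile(zip_source) as archive:
        member = archive.namelist()[0]
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text, delimiter="\t"))


def sample_total(row):
    """Sum the per-sample occurrence counts (columns whose names start with TARA_)."""
    return sum(int(value) for name, value in row.items() if name.startswith("TARA_"))


def barcodes_with(rows, field, value):
    """Return the md5sum identifiers of rows whose annotation field equals value."""
    return [row["md5sum"] for row in rows if row.get(field) == value]
```

If totab is the sum over the 334 samples, as its description suggests, comparing `sample_total(row)` with `int(row["totab"])` for every row is a quick consistency check on a download. Non-numeric entries such as NA in a sample column would make `int()` raise and would need handling.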
Abstract:
Prepared for the Illinois Hazardous Waste Research and Information Center, HWRIC project number 87-005.
Abstract:
Background: Protein tertiary structure can be partly characterized via the contact number of each amino acid, which measures how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment and is defined as the number of C-beta atoms in other residues within a sphere around the C-beta atom of the residue of interest. Contact number is partly conserved between protein folds and is thus useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from the primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate method for predicting contact number from protein primary sequence. Results: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient of 0.70 between predicted and observed contact numbers, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracy significantly, with the correlation coefficient reaching 0.73. If residues are classified as either contacted or non-contacted, the prediction accuracies are all greater than 77%, regardless of the choice of classification threshold. Conclusion: The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profiles, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary sequence and higher order consecutive protein structural and functional properties.
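The contact number definition above translates directly into code. The Python sketch below computes it from a list of C-beta coordinates, together with the Pearson correlation coefficient used to compare predicted and observed values and a two-state contacted/non-contacted labelling. The sphere radius and the classification threshold are left as parameters because the abstract does not give them. The sketch assumes every residue has a C-beta coordinate, the function names are hypothetical, and the prediction step itself (support vector regression on PSI-BLAST profiles) is not shown.

```python
import math


def contact_numbers(cb_coords, radius):
    """Count, for each residue, the C-beta atoms of other residues within `radius` of its C-beta."""
    return [
        sum(1 for j, other in enumerate(cb_coords)
            if j != i and math.dist(centre, other) <= radius)
        for i, centre in enumerate(cb_coords)
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient between predicted and observed values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def contacted(contact_nums, threshold):
    """Two-state labels: True if a residue's contact number reaches the threshold."""
    return [cn >= threshold for cn in contact_nums]


cb = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(contact_numbers(cb, radius=1.5))  # [1, 1, 0]
```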
Abstract:
Candida albicans is a pathogen that commonly infects patients who receive immunosuppressive drug therapy or long-term catheterization, or who suffer from acquired immune deficiency syndrome (AIDS). The major factor accounting for the pathogenicity of C. albicans is host immune status. Various virulence molecules, or factors, of C. albicans are also responsible for disease progression. Virulence proteins are published in public databases, but they normally lack detailed functional annotations. We have developed CandiVF, a specialized database of C. albicans virulence factors (http://antigen.i2r.a-star.edu.sg/Templar/DB/CandiVF/), to facilitate efficient extraction and analysis of data aimed at assisting research on immune responses, pathogenesis, prevention, and control of candidiasis. CandiVF contains a large number of annotated virulence proteins, including secretory, cell wall-associated, membrane, cytoplasmic, and nuclear proteins. The database has built-in bioinformatics tools, including keyword and BLAST search, visualization of 3D structures, HLA-DR epitope prediction, virulence descriptors, and a virulence factor ontology.
Abstract:
With rapid advances in video processing technologies and ever-faster growth in network bandwidth, the popularity of video content publishing and sharing has made similarity search an indispensable operation for retrieving videos of interest to users. Video similarity is usually measured by the percentage of similar frames shared by two video sequences, and each frame is typically represented as a high-dimensional feature vector. Unfortunately, the high complexity of video content poses the following major challenges for fast retrieval: (a) effective and compact video representations, (b) efficient similarity measurements, and (c) efficient indexing on the compact representations. In this paper, we propose a number of methods to achieve fast similarity search in very large video databases. First, each video sequence is summarized into a small number of clusters, each of which contains similar frames and is represented by a novel compact model called Video Triplet (ViTri). ViTri models a cluster as a tightly bounded hypersphere described by its position, radius, and density. The ViTri similarity is measured by the volume of intersection between two hyperspheres multiplied by the minimal density, i.e., the estimated number of similar frames shared by two clusters. The total number of similar frames is then estimated to derive the overall similarity between two video sequences. Hence the time complexity of the video similarity measure can be reduced greatly. To further reduce the number of similarity computations on ViTris, we introduce a new one-dimensional transformation technique which rotates and shifts the original axis system using PCA in such a way that the original inter-distance between two high-dimensional vectors can be maximally retained after mapping. An efficient B+-tree is then built on the transformed one-dimensional values of the ViTris' positions. Such a transformation enables the B+-tree to achieve its optimal performance by quickly filtering a large portion of non-similar ViTris. Our extensive experiments on real large video datasets demonstrate the effectiveness of our proposals, which outperform existing methods significantly.
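The two ingredients described above, the intersection-based ViTri similarity and the one-dimensional filter, can be sketched in Python as follows. The sketch makes simplifying assumptions that are not taken from the paper: the hypersphere intersection volume is obtained by numerical integration (slicing the overlap into (d-1)-dimensional balls) rather than a closed form, density is taken to mean frames per unit volume, and the one-dimensional transformation is approximated by projecting each ViTri centre onto the first principal component, with a sorted list searched by `bisect` standing in for the B+-tree. Because projecting onto a unit vector never increases distances, any indexed ViTri whose hypersphere can overlap the query's has a key within the query radius plus the largest indexed radius, so this filter never discards a true match.

```python
import bisect
import math


def ball_volume(dim, radius):
    """Volume of a dim-dimensional ball (dim = 0 gives 1, the 'volume' of a point)."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1) * radius ** dim


def intersection_volume(c1, r1, c2, r2, steps=4000):
    """Volume shared by two hyperspheres, integrated numerically along the line joining the centres."""
    dim, dist = len(c1), math.dist(c1, c2)
    if dist >= r1 + r2:
        return 0.0  # disjoint
    if dist <= abs(r1 - r2):
        return ball_volume(dim, min(r1, r2))  # one sphere lies inside the other
    lo, hi = max(-r1, dist - r2), min(r1, dist + r2)
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):  # midpoint rule over slices perpendicular to the centre line
        x = lo + (i + 0.5) * width
        rho_sq = min(r1 * r1 - x * x, r2 * r2 - (x - dist) ** 2)
        if rho_sq > 0:
            total += ball_volume(dim - 1, math.sqrt(rho_sq))
    return total * width


def vitri_similarity(a, b):
    """Estimated shared frames: intersection volume times the smaller density."""
    (c1, r1, d1), (c2, r2, d2) = a, b
    return intersection_volume(c1, r1, c2, r2) * min(d1, d2)


def first_principal_axis(points, iterations=100):
    """Mean and unit-length first principal direction of the points, via power iteration."""
    dim, n = len(points[0]), len(points)
    mean = [sum(p[k] for p in points) / n for k in range(dim)]
    centred = [[p[k] - mean[k] for k in range(dim)] for p in points]
    axis = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        new = [0.0] * dim  # unnormalised covariance times axis: sum of (p . axis) * p
        for p in centred:
            proj = sum(pk * ak for pk, ak in zip(p, axis))
            for k in range(dim):
                new[k] += proj * p[k]
        norm = math.sqrt(sum(v * v for v in new))
        if norm == 0.0:
            break  # degenerate data: keep the current unit vector
        axis = [v / norm for v in new]
    return mean, axis


class OneDimIndex:
    """Sorted 1-D keys (a stand-in for a B+-tree) over ViTri centres projected onto one axis."""

    def __init__(self, vitris):
        self.vitris = vitris
        self.mean, self.axis = first_principal_axis([c for c, _, _ in vitris])
        self.max_radius = max(r for _, r, _ in vitris)
        keyed = sorted((self.key(c), i) for i, (c, _, _) in enumerate(vitris))
        self.keys = [k for k, _ in keyed]
        self.ids = [i for _, i in keyed]

    def key(self, centre):
        return sum((x - m) * a for x, m, a in zip(centre, self.mean, self.axis))

    def candidates(self, query):
        """Ids of ViTris that may intersect the query; all others are skipped without distance checks."""
        centre, radius, _ = query
        k, reach = self.key(centre), radius + self.max_radius
        lo = bisect.bisect_left(self.keys, k - reach)
        hi = bisect.bisect_right(self.keys, k + reach)
        return [self.ids[j] for j in range(lo, hi)]
```

With real data the centres would be high-dimensional frame-feature vectors; only the ViTris returned by `candidates` then need the more expensive `vitri_similarity` computation.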
Abstract:
This thesis describes the development of a complete data visualisation system for large tabular databases, such as those commonly found in a business environment. A state-of-the-art 'cyberspace cell' data visualisation technique was investigated, and a powerful visualisation system using it was implemented. Although it allowed databases to be explored and conclusions to be drawn, it had several drawbacks, most of which were due to the three-dimensional nature of the visualisation. A novel two-dimensional generic visualisation system, known as MADEN, was then developed and implemented, based upon a 2-D matrix of 'density plots'. MADEN allows an entire high-dimensional database to be visualised in one window, while permitting close analysis in 'enlargement' windows. Selections of records can be made and examined, and dependencies between fields can be investigated in detail. MADEN was used as a tool for investigating and assessing many data processing algorithms, first data-reducing (clustering) methods, then dimensionality-reducing techniques. These included a new 'directed' form of principal components analysis, several novel applications of artificial neural networks, and discriminant analysis techniques which illustrated how groups within a database can be separated. To illustrate the power of the system, MADEN was used to explore customer databases from two financial institutions, resulting in a number of discoveries which would be of interest to a marketing manager. Finally, the database of results from the 1992 UK Research Assessment Exercise was analysed. Using MADEN allowed both universities and disciplines to be compared graphically, and yielded some startling revelations, including empirical evidence of the 'Oxbridge factor'.
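A 2-D matrix of density plots can be approximated by binning every pair of numeric fields into a two-dimensional histogram. The Python sketch below is a hypothetical illustration rather than MADEN's implementation: it rescales each field to its observed range, uses an assumed equal-width binning, and returns one grid of record counts per field pair, which a plotting layer could then shade by density.

```python
def density_matrix(records, bins=10):
    """Map each field pair (i, j), i < j, to a bins x bins grid of record counts."""
    n_fields = len(records[0])
    lows = [min(r[k] for r in records) for k in range(n_fields)]
    highs = [max(r[k] for r in records) for k in range(n_fields)]

    def bin_index(value, k):
        span = highs[k] - lows[k]
        if span == 0:
            return 0  # constant field: every record falls in the first bin
        # Equal-width bins; the maximum value is placed in the last bin.
        return min(int((value - lows[k]) / span * bins), bins - 1)

    matrix = {}
    for i in range(n_fields):
        for j in range(i + 1, n_fields):
            grid = [[0] * bins for _ in range(bins)]
            for record in records:
                grid[bin_index(record[i], i)][bin_index(record[j], j)] += 1
            matrix[(i, j)] = grid
    return matrix
```

Each grid row corresponds to a bin of the first field and each column to a bin of the second, so a strong dependency between two fields shows up as counts concentrated along a diagonal.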
Abstract:
This paper discusses the use of a Model developed by Aston Business School to record the workload of its academic staff. By developing a database to register annual activity in all areas of teaching, administration and research, the School has created a flexible tool that can facilitate both day-to-day managerial and longer-term strategic decisions. This paper gives a brief outline of the Model and discusses the factors which were taken into account when setting it up. Particular attention is paid to the uses made of the Model and the problems encountered in developing it. The paper concludes with an appraisal of the Model’s impact and of additional developments which are currently being considered. Aston Business School has had a Load Model in some form for many years. The Model has, however, been refined over the past five years into a form that can be used for a far greater number of purposes within the School. The Model is coordinated by a small group of academic and administrative staff, chaired by the Head of the School. This group is responsible for the annual cycle of collecting and inputting data, validating returns, carrying out analyses of the raw data, and presenting the material to different sections of the School. The authors of this paper are members of this steering group.