952 results for BIOINFORMATICS DATABASES
Abstract:
The study of the genome of Schistosoma mansoni, one of the etiologic agents of human schistosomiasis, is essential for a better understanding of the biology and development of this parasite. To obtain an overview of all catalogued S. mansoni gene sequences, we performed a clustering analysis of the parasite mRNA sequences available in public databases, using the PHRAP and CAP3 software. The consensus sequences generated by aligning the sequences in each cluster allowed us to identify, through database homology searches, the most highly expressed genes in the worm. We analyzed these genes and looked for a correlation between their high expression and the metabolism and biology of the parasite. We observed that most of these genes are related to the maintenance of basic cell functions, encoding products involved in the cytoskeleton, intracellular transport and energy metabolism. We present evidence that genes for aerobic energy metabolism are expressed in all the developmental stages analyzed. Some of the most highly expressed genes could not be identified by homology searches and may have parasite-specific functions.
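The consensus step described above can be illustrated with a simple majority-rule sketch. It assumes the members of a cluster have already been aligned to a common length with "-" as the gap character. PHRAP and CAP3 build their consensus differently (both can make use of base-quality values, for example), so this shows only the general idea, not either program; the function name and the 0.5 cutoff are illustrative choices.

```python
from collections import Counter

def majority_consensus(aligned, gap="-", min_fraction=0.5):
    """Majority-rule consensus of equal-length aligned sequences.

    A column's most common base is kept only if it makes up more than
    min_fraction of the non-gap characters; otherwise 'N' is emitted.
    Columns containing only gaps are skipped.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to a common length")
    consensus = []
    for column in zip(*aligned):
        bases = [c.upper() for c in column if c != gap]
        if not bases:
            continue  # gap-only column: nothing to report
        base, count = Counter(bases).most_common(1)[0]
        consensus.append(base if count / len(bases) > min_fraction else "N")
    return "".join(consensus)
```

For instance, aligning ACGT-A, ACGTTA and ACCT-A gives ACGTTA, since G outvotes C two to one in the third column.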
Abstract:
In the last decade, microsatellites have become one of the most useful genetic markers in a large number of organisms because of their abundance and high level of polymorphism. Microsatellites have been used for individual identification, paternity tests, forensic studies and population genetics. Data on microsatellite abundance come mainly from microsatellite-enriched libraries and DNA sequence databases. We searched more than 16,000 Schistosoma mansoni ESTs and 42,000 BAC end sequences in GenBank. In addition, we obtained 300 sequences from CA and AT microsatellite-enriched genomic libraries. The sequences were searched for simple repeats using the RepeatMasker software. Of 16,022 ESTs, we detected 481 (3%) that contained 622 microsatellites (434 perfect, 164 imperfect and 24 compound). Of these 481 ESTs, 194 were grouped into 63 clusters containing 2 to 15 ESTs each, and polymorphisms were observed in 16 clusters. The remaining 287 ESTs were orphan sequences. Of the 42,017 BAC end sequences, 1,598 (3.8%) contained microsatellites (2,335 perfect, 287 imperfect and 79 compound). Of these 1,598 BAC end sequences, 80 were grouped into 17 clusters containing 3 to 17 sequences each. Microsatellites were present in 67 of the 300 sequences from microsatellite-enriched libraries (55 perfect, 38 imperfect and 15 compound). Of all the observed loci, 55 were selected because they had the longest perfect repeats and flanking regions that allowed the design of primers for PCR amplification. Additionally, we describe two new polymorphic microsatellite loci.
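Detection of perfect repeats like those counted above can be sketched with a regular expression. This is not RepeatMasker, which the study actually used: it is a minimal illustration that finds only perfect tandem repeats of a 1 to 6 bp unit (imperfect and compound microsatellites are not handled), and the five-copy minimum is an assumed, arbitrary threshold.

```python
import re

def find_perfect_microsatellites(seq, min_unit=1, max_unit=6, min_copies=5):
    """Return (0-based start, repeat unit, copy number) for perfect tandem repeats.

    The lazy quantifier tries the shortest unit first at each start position,
    so CACACA... is reported with unit "CA" rather than "CACA".
    """
    pattern = re.compile(
        r"(([ACGT]{%d,%d}?)\2{%d,})" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(2)
        # The whole match is an exact multiple of the unit length.
        hits.append((match.start(), unit, len(match.group(1)) // len(unit)))
    return hits
```

Applied to GGTCACACACACACAGTT, the sketch reports a single (CA)6 repeat starting at offset 3.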
Abstract:
The analysis of genetic data for human immunodeficiency virus type 1 (HIV-1) and human T-cell lymphotropic virus type 1 (HTLV-1) is essential to improve treatment and public health strategies as well as to select strains for vaccine programs. However, analyzing large quantities of genetic data requires collaborative efforts in bioinformatics, computational biology, molecular biology, evolution, and medical science. The objective of this study was to review and improve the molecular epidemiology of HIV-1 and HTLV-1 viruses isolated in Brazil using the bioinformatic tools available in the bioinformatics unit of the Laboratório Avançado de Saúde Pública (Lasp). The analysis of HIV-1 isolates confirmed a heterogeneous distribution of the viral genotypes circulating in the country. The Brazilian HIV-1 epidemic is characterized by the presence of multiple subtypes (B, F1, C) and B/F1 recombinant viruses, whereas most of the HTLV-1 sequences were classified as belonging to the Transcontinental subgroup of the Cosmopolitan subtype. Despite the high variation among HIV-1 subtypes, protein glycosylation and phosphorylation domains were conserved in the pol, gag, and env genes of the Brazilian HIV-1 strains, suggesting constraints on HIV-1 evolution. As expected, the functional protein sites were highly conserved in the HTLV-1 env gene sequences. Furthermore, the presence of these functional sites in HIV-1 and HTLV-1 strains could aid the development of vaccines that pre-empt viral escape.
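Conservation of glycosylation sites is commonly assessed by scanning protein sequences for the N-linked glycosylation sequon, written in PROSITE syntax (entry PS00001) as N-{P}-[ST]-{P}. The abstract does not say which tool the Lasp unit used, so the following is only an assumed illustration of motif scanning on ungapped protein sequences; the example sequence is made up.

```python
import re

# PROSITE PS00001 (N-glycosylation site): N-{P}-[ST]-{P}.
# The zero-width lookahead lets overlapping sequons all be reported.
SEQUON = re.compile(r"(?=N[^P][ST][^P])")

def n_glycosylation_sites(protein):
    """Return 1-based positions of Asn residues that start an N-X-S/T-X sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]
```

Running such a scan on each strain's translated env, gag or pol sequence, and comparing the hits at aligned positions, would be one way to see which sites are shared across subtypes.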
Abstract:
BACKGROUND: Despite the continuous production of genome sequence for a number of organisms, reliable, comprehensive, and cost-effective gene prediction remains problematic. This is particularly true for genomes for which there is no large collection of known gene sequences, such as the recently published chicken genome. We used the chicken sequence to test comparative and homology-based gene-finding methods, followed by experimental validation, as an effective genome annotation method. RESULTS: We performed an experimental evaluation, by RT-PCR, of three different computational gene finders, Ensembl, SGP2 and TWINSCAN, applied to the chicken genome. A Venn diagram of the three sets of predictions was computed and each of its components was evaluated. The results showed that de novo comparative methods can identify up to about 700 chicken genes with no previous evidence of expression, and can correctly extend about 40% of homology-based predictions at the 5' end. CONCLUSIONS: De novo comparative gene prediction followed by experimental verification is effective at enhancing the annotation of newly sequenced genomes provided by standard homology-based methods.
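Splitting the output of several gene finders into the disjoint regions of a Venn diagram can be sketched with Python sets. The sketch assumes each prediction has already been reduced to a shared identifier; real gene-finder output consists of genomic coordinates, so matching predictions across Ensembl, SGP2 and TWINSCAN would first require an overlap criterion, which is not shown. The identifiers g1 to g5 are hypothetical.

```python
def venn_regions(named_sets):
    """Assign every item to the exact combination of sets that contain it.

    Returns a dict mapping frozenset(set names) -> set of items; only the
    non-empty regions of the Venn diagram appear in the result.
    """
    regions = {}
    for item in set().union(*named_sets.values()):
        members = frozenset(name for name, items in named_sets.items() if item in items)
        regions.setdefault(members, set()).add(item)
    return regions


predictions = {
    "Ensembl": {"g1", "g2", "g3"},
    "SGP2": {"g2", "g3", "g4"},
    "TWINSCAN": {"g3", "g4", "g5"},
}
regions = venn_regions(predictions)
```

Here regions[frozenset({"SGP2", "TWINSCAN"})] holds the predictions made by both comparative gene finders but not by Ensembl; each such region could then be sampled for RT-PCR validation.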
Abstract:
Expert curation and complete collection of mutations in genes that affect human health are essential for proper genetic healthcare and research. Expert curation is provided by the curators of gene-specific mutation databases, or locus-specific databases (LSDBs). Although there are over 700 such databases, they vary in content, completeness, time available for curation, and curator expertise. Curation and LSDBs have been discussed and written about, and protocols for them have been provided, for over 10 years, but there have been no formal recommendations for the ideal form of these entities. This work initiates a discussion on this topic to assist future efforts in human genetics. Further discussion is welcome.
Abstract:
[Contents]
- Introduction
- Selected existing genetic databases: distinctive features, ethical problems and the public debate
- The ethical debate: principles, values and interests; the ethical foundations of guidelines
- Selected issues of consensus and of controversy
- Ethical issues of human genetic databases and the future

This book compares the new area of biobanking with the tradition of ethically accepted classical research and highlights the distinctive features of existing databases and guidelines.
Abstract:
BACKGROUND. Bioinformatics is commonly presented as a well-assorted list of available web resources. Although diversity of services is positive in general, the proliferation of tools and their dispersion and heterogeneity complicate the integrated exploitation of this data-processing capacity. RESULTS. To facilitate the construction of software clients and make integrated use of this variety of tools, we present a modular programmatic application interface (MAPI) that provides the functionality needed for a uniform representation of Web Services metadata descriptors, including the management and invocation protocols of the services they represent. This document describes the main functionality of the framework and how it can be used to facilitate the deployment of new software under a unified structure of bioinformatics Web Services. A notable feature of MAPI is the organization of its functionality into modules associated with specific tasks. This means that only the modules needed by a client have to be installed, and that module functionality can be extended without rewriting the software client. CONCLUSIONS. The potential utility and versatility of the software library has been demonstrated by the implementation of several currently available clients that cover different aspects of integrated data processing, ranging from service discovery to service invocation, with advanced features such as workflow composition and asynchronous service calls to multiple types of Web Services, including those registered in repositories (e.g. GRID-based, SOAP, BioMOBY, R-bioconductor, and others).
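The modular design described for MAPI, in which a client installs only the modules it needs and functionality can be added without rewriting the client, resembles a plugin registry. The sketch below illustrates that general pattern only. MAPI's actual classes and methods are not described in the abstract, so every name here (ServiceClient, register, run, the in-memory SERVICES list and the discover module) is hypothetical.

```python
from typing import Any, Callable, Dict, List

class ServiceClient:
    """Hypothetical client whose capabilities come from separately registered modules."""

    def __init__(self) -> None:
        self._modules: Dict[str, Callable[..., Any]] = {}

    def register(self, task: str, handler: Callable[..., Any]) -> None:
        # Installing a module only adds an entry; the client code itself is unchanged.
        self._modules[task] = handler

    def installed(self) -> List[str]:
        return sorted(self._modules)

    def run(self, task: str, *args: Any, **kwargs: Any) -> Any:
        if task not in self._modules:
            raise KeyError(f"no module installed for task {task!r}")
        return self._modules[task](*args, **kwargs)


# Hypothetical service descriptors standing in for a real service repository.
SERVICES = [
    {"name": "runBlast", "protocol": "SOAP"},
    {"name": "getAminoAcidSequence", "protocol": "BioMOBY"},
]

def discover(protocol: str) -> List[str]:
    """Discovery module: list the services registered under one protocol."""
    return [s["name"] for s in SERVICES if s["protocol"] == protocol]


client = ServiceClient()
client.register("discover", discover)  # only the discovery module is installed
```

An invocation or workflow module would be added the same way, through another register call, leaving ServiceClient untouched.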
Abstract:
The primary mission of UniProt is to support biological research by maintaining a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community. UniProt is produced by the UniProt Consortium, which consists of groups from the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt comprises four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. UniProt is updated and distributed every 3 weeks and can be accessed online for searches or download at http://www.uniprot.org.
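Entries downloaded from http://www.uniprot.org in FASTA format carry a structured header of the form >sp|accession|entry_name protein name OS=... OX=... GN=... PE=... SV=... (sp for reviewed entries, tr for unreviewed ones). The following sketch parses such a header. The exact set of KEY=value fields has varied between releases and GN is absent for some entries, so the parser treats those fields generically; the function name is an illustrative choice, not part of any UniProt library.

```python
import re

# sp = reviewed (Swiss-Prot) entry, tr = unreviewed (TrEMBL) entry.
HEADER_RE = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry_name>\S+)\s+(?P<rest>.*)$"
)
# Keys that introduce the trailing KEY=value fields.
FIELD_RE = re.compile(r"\b(OS|OX|GN|PE|SV)=")

def parse_uniprot_header(line):
    """Split a UniProtKB FASTA header into a dict of its parts."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a UniProtKB FASTA header: {line!r}")
    rest = match.group("rest")
    keys = list(FIELD_RE.finditer(rest))
    # Everything before the first KEY= is the protein name.
    name_end = keys[0].start() if keys else len(rest)
    record = {
        "db": match.group("db"),
        "accession": match.group("accession"),
        "entry_name": match.group("entry_name"),
        "protein_name": rest[:name_end].strip(),
    }
    # Each value runs until the next KEY= (or the end of the line).
    for i, key in enumerate(keys):
        end = keys[i + 1].start() if i + 1 < len(keys) else len(rest)
        record[key.group(1)] = rest[key.end():end].strip()
    return record
```

For example, the header >sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2 yields accession P69905, protein name "Hemoglobin subunit alpha" and organism "Homo sapiens".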
Abstract:
Summary: Cancer is a leading cause of morbidity and mortality in Western countries (for example, colorectal cancer accounts for about 300,000 new cases and 200,000 deaths each year in Europe and the USA). Although many patients with cancer achieve complete macroscopic clearance of their disease after resection, radiotherapy and/or chemotherapy, many of them develop fatal recurrence. Vaccination with immunogenic peptide tumor antigens has shown encouraging progress in the last decade; immunotherapy might therefore constitute a fourth therapeutic option in the future. Here we dissect and critically evaluate the numerous steps of reverse immunology, a predictive procedure for identifying antigenic peptides from the sequence of a gene of interest. Bioinformatic algorithms were applied to mine sequence databases for tumor-specific transcripts. A quality assessment of publicly available sequence databanks allowed us to define the strengths and weaknesses of bioinformatics-based prediction of colon cancer-specific alternative splicing: new splice variants could be identified, but cancer-restricted expression could not be predicted significantly. Other sources of target transcripts, such as cancer-testis genes and transcripts reported to be overexpressed, were investigated quantitatively by polymerase chain reaction. Based on the relative expression of a defined set of housekeeping genes in colon cancer tissues, we characterized a precise procedure for accurate normalization and determined a threshold for defining significant overexpression of genes in cancer versus normal tissues. Further steps of reverse immunology were applied to a splice variant of the Melan-A gene. Since the C-termini of antigenic peptides are known to be produced directly by the proteasome, longer precursor and overlapping peptides encoded by the target sequence were synthesized chemically and digested in vitro with purified proteasome. The resulting fragments were identified by mass spectrometry to detect cleavage sites. Using this information and the available anchor motifs for defined HLA class I molecules, putative antigenic peptides could be predicted. Their relative affinity for HLA molecules was confirmed experimentally with functional competitive binding assays, and they were used to search patients' peripheral blood lymphocytes for specific cytolytic T lymphocytes (CTL). CTL clones specific for a splice variant of Melan-A could be isolated; although they recognized peptide-pulsed cells, they failed to lyse melanoma cells in functional assays of antigen recognition. In the conclusion, we discuss the advantages and bottlenecks of reverse immunology and compare the technical aspects of this approach with the more classical procedure of direct immunology, a technique introduced by Boon and colleagues more than 10 years ago to successfully clone tumor antigens.
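The prediction step that combines proteasomal cleavage data with HLA class I anchor motifs can be sketched as a filter over all 9-mers of a protein. The sketch assumes a simplified HLA-A*02:01 motif (Leu or Met at position 2; Val, Leu or Ile at the C-terminus) and keeps only peptides whose C-terminus coincides with an observed cleavage site. The sequence and cleavage positions in the example are hypothetical, and practical predictors use position-specific scoring rather than this all-or-nothing rule.

```python
# Simplified HLA-A*02:01 anchor residues (an assumption for illustration).
P2_ANCHORS = frozenset("LM")
C_TERMINAL_ANCHORS = frozenset("VLI")

def candidate_epitopes(protein, cleavage_sites, length=9,
                       p2=P2_ANCHORS, c_term=C_TERMINAL_ANCHORS):
    """Return (1-based start, peptide) for peptides that fit the anchor motif
    and whose C-terminus matches an observed proteasomal cleavage site.

    cleavage_sites holds 1-based residue positions after which a cut was seen.
    """
    hits = []
    for start in range(len(protein) - length + 1):
        peptide = protein[start:start + length]
        last_residue = start + length  # 1-based position of the C-terminus
        if (peptide[1] in p2 and peptide[-1] in c_term
                and last_residue in cleavage_sites):
            hits.append((start + 1, peptide))
    return hits


protein = "KLMAAAAAVEELAAAAAALK"  # hypothetical sequence
```

With cleavage observed after residues 9 and 19, both KLMAAAAAV and ELAAAAAAL pass; with a cut after residue 9 only, the second peptide is discarded because its C-terminus would not be generated by the proteasome.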
Abstract:
BACKGROUND: The Complete Arabidopsis Transcript MicroArray (CATMA) initiative combines the efforts of laboratories in eight European countries to deliver gene-specific sequence tags (GSTs) for the Arabidopsis research community. The CATMA initiative offers the power and flexibility to regularly update the GST collection according to evolving knowledge about the gene repertoire. These GST amplicons can easily be reamplified and shared, subsets can be picked at will to print dedicated arrays, and the GSTs can be cloned and used for other functional studies. This ongoing initiative has already produced approximately 24,000 GSTs that have been made publicly available for spotted microarray printing and RNA interference. RESULTS: GSTs from the CATMA version 2 repertoire (CATMAv2, created in 2002) were mapped onto the gene models from two independent Arabidopsis nuclear genome annotation efforts, TIGR5 and PSB-EuGène, to consolidate a list of genes that were targeted by previously designed CATMA tags. A total of 9,027 gene models were not tagged by any amplified CATMAv2 GST, and 2,533 amplified GSTs were no longer predicted to tag an updated gene model. To validate the efficacy of the GST mapping criteria and design rules, the predicted and experimentally observed hybridization characteristics associated with GST features were correlated in transcript profiling datasets obtained with the CATMAv2 microarray, confirming the reliability of this platform. To complete the CATMA repertoire, all 9,027 gene models for which no GST had yet been designed were processed with an adjusted version of the Specific Primer and Amplicon Design Software (SPADS). A total of 5,756 novel GSTs were designed and amplified by PCR from genomic DNA. Together with the pre-existing GST collection, this new addition constitutes the CATMAv3 repertoire. It comprises 30,343 unique amplified sequences that tag 24,202 and 23,009 protein-encoding nuclear gene models in the TAIR6 and EuGène genome annotations, respectively. To cover the remaining untagged genes, we identified 543 additional GSTs using less stringent design criteria and designed 990 sequence tags matching multiple members of gene families (Gene Family Tags or GFTs). These latter 1,533 features constitute the CATMAv4 addition. CONCLUSION: To update the CATMA GST repertoire, we designed 7,289 additional sequence tags, bringing the total number of tagged TAIR6-annotated Arabidopsis nuclear protein-coding genes to 26,173. This resource is used both for the production of spotted microarrays and for the large-scale cloning of hairpin RNA silencing vectors. All information about the resulting updated CATMA repertoire is available through the CATMA database at http://www.catma.org.
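Mapping existing tags onto an updated annotation, as described above, amounts to sorting each tag into one of three bins and listing the gene models left without any tag. In this sketch, a tag that hits exactly one current gene model is treated as gene-specific (a GST), one that hits several as a gene family tag (GFT), and one that hits none as no longer tagging any model. The tag-to-gene hits are assumed to come from an earlier sequence-similarity step that is not shown, and all tag and gene identifiers in the example are hypothetical.

```python
def classify_tags(tag_hits, gene_models):
    """Sort tags by how many current gene models they hit.

    tag_hits: dict mapping tag id -> set of gene model ids the tag matches.
    gene_models: iterable of gene model ids in the current annotation.
    Returns (specific, family, orphan, untagged).
    """
    genes = set(gene_models)
    specific, family, orphan = {}, {}, []
    tagged = set()
    for tag, hits in tag_hits.items():
        current = set(hits) & genes  # ignore models absent from the new annotation
        if not current:
            orphan.append(tag)
        elif len(current) == 1:
            specific[tag] = next(iter(current))
        else:
            family[tag] = current
        tagged |= current
    untagged = genes - tagged  # candidates for new tag design
    return specific, family, orphan, untagged
```

The untagged set is what a new design round would then target, in the way the 9,027 untagged gene models were passed to SPADS.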
Abstract:
The data indispensable for carrying out the comprehensive, multifaceted process of medical technology assessment (MTA) should be collected from a variety of sources. The authors distinguish between type "A" data, which are general data useful for assessment but collected without this specific aim, and type "B" data, which relate specifically to MTA. Registries of health care procedures or of diseases, as well as clinical databases, are cited as examples of type "B" data. Since demographic methods are important for evaluating the long-term effects of medical technologies, examples of sources of type "A" data are presented. Their significance for health policy making is discussed.
Abstract:
The primary mission of the Universal Protein Resource (UniProt) is to support biological research by maintaining a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community. UniProt is produced by the UniProt Consortium, which consists of groups from the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt comprises four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. UniProt is updated and distributed every 4 weeks and can be accessed online for searches or download at http://www.uniprot.org.