975 results for MULTILOCUS SEQUENCE-ANALYSIS


Relevance:

80.00%

Publisher:

Abstract:

The primary mission of the Universal Protein Resource (UniProt) is to support biological research by maintaining a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community. UniProt is produced by the UniProt Consortium, which consists of groups from the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt comprises four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. UniProt is updated and distributed every 4 weeks and can be accessed online for searches or download at http://www.uniprot.org.

Relevance:

80.00%

Publisher:

Abstract:

Background: The variety of DNA microarray formats and datasets presently available offers an unprecedented opportunity to perform insightful comparisons of heterogeneous data. Cross-species studies, in particular, have the power of identifying conserved, functionally important molecular processes. Validation of discoveries can now often be performed in readily available public data, which frequently requires cross-platform studies. Cross-platform and cross-species analyses require matching probes on different microarray formats. This can be achieved using the information in microarray annotations and additional molecular biology databases, such as orthology databases. Although annotations and other biological information are stored using modern database models (e.g. relational), they are very often distributed and shared as tables in text files, i.e. flat file databases. This common flat database format thus provides a simple and robust solution to flexibly integrate various sources of information and a basis for the combined analysis of heterogeneous gene expression profiles. Results: We provide annotationTools, a Bioconductor-compliant R package to annotate microarray experiments and integrate heterogeneous gene expression profiles using annotation and other molecular biology information available as flat file databases. First, annotationTools contains a specialized set of functions for mining this widely used database format in a systematic manner. It thus offers a straightforward solution for annotating microarray experiments. Second, building on these basic functions and relying on the combination of information from several databases, it provides tools to easily perform cross-species analyses of gene expression data. Here, we present two example applications of annotationTools that are of direct relevance for the analysis of heterogeneous gene expression profiles, namely a cross-platform mapping of probes and a cross-species mapping of orthologous probes using different orthology databases. We also show how to perform an explorative comparison of disease-related transcriptional changes in human patients and in a genetic mouse model. Conclusion: The R package annotationTools provides a simple solution to handle microarray annotation and orthology tables, as well as other flat molecular biology databases. Thereby, it allows easy integration and analysis of heterogeneous microarray experiments across different technological platforms or species.
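
The cross-species probe mapping described in this abstract can be pictured as a chain of lookups through flat tables: probe to gene, gene to ortholog, ortholog to probe. The Python sketch below illustrates that idea only; it is not the annotationTools API (an R package), and every column name, probe ID and gene ID in it is hypothetical.

```python
import csv
import io

def load_table(text, key_col, value_col):
    """Read a tab-delimited flat file into a dict: key_col -> set of value_col entries."""
    table = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        key, value = row[key_col], row[value_col]
        if key and value:
            table.setdefault(key, set()).add(value)
    return table

def map_orthologous_probes(probe_id, probe_to_gene, gene_to_ortholog, ortholog_to_probe):
    """Follow probe -> gene -> orthologous gene -> probe on the other platform."""
    genes = probe_to_gene.get(probe_id, set())
    ortholog_genes = set().union(*(gene_to_ortholog.get(g, set()) for g in genes))
    return set().union(*(ortholog_to_probe.get(g, set()) for g in ortholog_genes))

# Hypothetical flat files (all identifiers are made up for illustration).
HUMAN_ANNOTATION = "probe\tgene\nprobe_h1\tGENE_H1\nprobe_h2\tGENE_H2\n"
ORTHOLOGY = "human_gene\tmouse_gene\nGENE_H1\tGENE_M1\n"
MOUSE_ANNOTATION = "probe\tgene\nprobe_m1\tGENE_M1\nprobe_m2\tGENE_M1\nprobe_m3\tGENE_M9\n"

human_probe_to_gene = load_table(HUMAN_ANNOTATION, "probe", "gene")
human_to_mouse_gene = load_table(ORTHOLOGY, "human_gene", "mouse_gene")
mouse_gene_to_probe = load_table(MOUSE_ANNOTATION, "gene", "probe")

matches = map_orthologous_probes(
    "probe_h1", human_probe_to_gene, human_to_mouse_gene, mouse_gene_to_probe
)
print(sorted(matches))
```

Keeping sets at each step preserves one-to-many relations, which are common both between probes and genes and between orthologs; a probe without an annotated ortholog simply maps to an empty set.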

Relevance:

80.00%

Publisher:

Abstract:

Aims: The adaptive immune response against hepatitis C virus (HCV) is significantly shaped by the host's composition of HLA alleles. Thus, the HLA phenotype is a critical determinant of viral evolution during adaptive immune pressure. Potential associations of HLA class I alleles with polymorphisms of HCV immune escape variants are largely unknown. Methods: Direct sequence analysis of the genes encoding the HCV proteins E2, NS3 and NS5B was performed in a cohort of 159 patients with chronic HCV genotype 1 infection who were treated with pegylated interferon-alfa 2b and ribavirin for 48 weeks in a prospective controlled trial. HLA class I genotyping was performed by strand-specific reverse hybridization with the INNO-LiPA line probe assays for HLA-A and HLA-B and by strand-specific PCR-SSP. We analyzed each amino acid position of HCV proteins using an extension of Fisher's exact test for associations with HLA alleles. In addition, associations of specific HLA alleles with inflammatory activity, liver fibrosis, HCV RNA viral load and virologic treatment outcome were investigated. Results: Separate analyses of HCV subtype 1a and 1b isolates revealed substantially different patterns of HLA-restricted polymorphisms between subtypes. Only one polymorphism within NS5B (V2758x) was significantly associated with HLA B*15 in HCV genotype 1b infected patients (adjusted p=0.048). However, a number of HLA class I-restricted polymorphisms within novel putative HCV CD8+ T cell epitopes (genotype 1a: HLA-A*11 GTRTIASPK1086-1094 [NS3], HLA-B*07 WPAPQGARSL1111-1120 [NS3]; genotype 1b: HLA-A*24 HYAPRPCGI488-496 [E2], HLA-B*44 GENETDVLL530-538 [E2], HLA-B*15 RVFTEAMTRY2757-2766 [NS5B]) were observed with high predicted epitope binding scores assessed by the web-based software SYFPEITHI (>21). Most of the identified putative epitopes overlapped with previously published epitopes, indicating a high immunogenicity of the corresponding HCV protein region. In addition, certain HLA class I alleles were associated with inflammatory activity, stage of liver fibrosis, and sustained virologic response to antiviral therapy. Conclusions: HLA class I-restricted HCV sequence polymorphisms are rare. HCV polymorphisms identified within putative HCV CD8+ T cell epitopes in the present study differ in their genomic distribution between genotype 1a and 1b isolates, implying divergent adaptation to the host's immune pressure on the HCV subtype level.
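
To make the per-position association test concrete, the sketch below computes a plain two-sided Fisher's exact test on a 2x2 table (allele carried or not, residue polymorphic or not). It is only the textbook test: the abstract refers to an extension of it and to adjusted p-values, neither of which is reproduced here, and the patient counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)  # small tolerance for float ties
    )

# Hypothetical counts: rows = allele carriers / non-carriers,
# columns = polymorphism present / absent at one amino acid position.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```

Because one such test is run per amino acid position and per allele, some correction for multiple testing is needed before any single p-value is interpreted; this sketch leaves that step out.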

Relevance:

80.00%

Publisher:

Abstract:

Little is known about the relation between genome organization and gene expression in Leishmania. Bioinformatic analysis can be used to predict genes and find homologies with known proteins. A model was proposed in which genes are organized into large clusters and transcribed from only one strand, in the form of large polycistronic primary transcripts. To verify the validity of this model, we studied gene expression at the transcriptional, post-transcriptional and translational levels in a unique locus of 34 kb located on chr27 and represented by cosmid L979. Sequence analysis revealed 115 ORFs on either DNA strand. Using computer programs developed for Leishmania genes, only nine of these ORFs, localized on the same strand, were predicted to code for proteins, some of which show homologies with known proteins. Additionally, one pseudogene was identified. We verified the biological relevance of these predictions. mRNAs from nine predicted genes and proteins from seven were detected. Nuclear run-on analyses confirmed that the top strand is transcribed by RNA polymerase II and suggested that there is no polymerase entry site. Low levels of transcription were detected in regions of the bottom strand and stable transcripts were identified for four ORFs on this strand not predicted to be protein-coding. In conclusion, the transcriptional organization of the Leishmania genome is complex, raising the possibility that computer predictions may not be comprehensive.

Relevance:

80.00%

Publisher:

Abstract:

Owing to its special mode of evolution and central role in the adaptive immune system, the major histocompatibility complex (MHC) has become the focus of diverse disciplines such as immunology, evolutionary ecology, and molecular evolution. MHC evolution has been studied extensively in diverse vertebrate lineages over the last few decades, and it has been suggested that birds differ from the established mammalian norm. Mammalian MHC genes evolve independently, and duplication history (i.e., orthology) can usually be traced back within lineages. In birds, this has been observed in only 3 pairs of closely related species. Here we report strong evidence for the persistence of orthology of MHC genes throughout an entire avian order. Phylogenetic reconstructions of MHC class II B genes in 14 species of owls trace back orthology over tens of millions of years in exon 3. Moreover, exon 2 sequences from several species show closer relationships than sequences within species, resembling transspecies evolution typically observed in mammals. Thus, although previous studies suggested that long-term evolutionary dynamics of the avian MHC were characterized by high rates of concerted evolution, resulting in rapid masking of orthology, our results question the generality of this conclusion. The owl MHC thus opens new perspectives for a more comprehensive understanding of avian MHC evolution.

Relevance:

80.00%

Publisher:

Abstract:

Using a direct binding assay based on photoaffinity labeling, we studied the interaction of T cell receptor (TCR) with a Kd-bound photoreactive peptide derivative on living cells. The Kd-restricted Plasmodium berghei circumsporozoite (PbCS) peptide 253-260 (YIPSAEKI) was reacted NH2-terminally with biotin and at the TCR contact residue Lys259 with photoreactive iodo, 4-azido salicylic acid (IASA) to make biotin-YIPSAEK(IASA)I. Cytotoxic T lymphocyte (CTL) clones derived from mice immunized with this derivative recognized this conjugate, but not a related one lacking the IASA group nor the parental PbCS peptide. The clones were Kd restricted. Recognition experiments with variant conjugates, lacking substituents from IASA, revealed a diverse fine specificity pattern and indicated that this group interacted directly with the TCR. The TCR of four clones could be photoaffinity labeled by biotin-YIPSAEK(125IASA)I. This labeling was dependent on the conjugates binding to the Kd molecule and was selective for the TCR alpha (2 clones) or beta chain (1 clone), or was common for both chains (1 clone). TCR sequence analysis showed a preferential usage of J alpha TA28 containing alpha chains that were paired with V beta 1 expressing beta chains. The TCR that were photoaffinity labeled at the alpha chain expressed these J alpha and V beta segments. The tryptophan encoded by the J alpha TA28 segment is rarely found in other J alpha segments. Moreover, we show that the IASA group interacts preferentially with tryptophan in aqueous solution. We thus propose that for these CTL clones, labeling of the alpha chain occurs via the J alpha-encoded tryptophan residue.

Relevance:

80.00%

Publisher:

Abstract:

A defect in glucose sensing of the pancreatic beta-cells has been observed in several animal models of type II diabetes and has been correlated with a reduced gene expression of the glucose transporter type 2 (Glut2). In a transgenic mouse model, expression of Glut2 antisense RNA in pancreatic beta-cells has recently been shown to be associated with an impaired glucose-induced insulin secretion and the development of diabetes. To identify factors that may be involved in the specific decrease of Glut2 in the beta-cells of the diabetic animal, an attempt was made to localize the cis-elements and trans-acting factors involved in the control of Glut2 expression in the endocrine pancreas. It was demonstrated by transient transfection studies that only 338 base pairs (bp) of the murine Glut2 proximal promoter are needed for reporter gene expression in pancreatic islet-derived cell lines, whereas no activity was detected in nonpancreatic cells. Three cis-elements, GTI, GTII, and GTIII, have been identified by DNase I footprinting and gel retardation experiments within these 338 bp. GTI and GTIII bind distinct but ubiquitously expressed trans-acting factors. On the other hand, nuclear proteins specifically expressed in pancreatic cell lines interact with GTII, and their relative abundance correlates with endogenous Glut2 expression. These GTII-binding factors correspond to nuclear proteins of 180 and 90 kilodaltons as defined by Southwestern analysis. The 180-kilodalton factor is present in pancreatic beta-cell lines but not in an alpha-cell line. Mutation of the GTI or GTIII cis-elements decreases transcriptional activity directed by the 338-bp promoter, whereas mutation of GTII increases gene transcription. Thus, negative and positive regulatory sequences are identified within the proximal 338 bp of the GLUT2 promoter and may participate in the islet-specific expression of the gene by binding beta-cell specific trans-acting factors.

Relevance:

80.00%

Publisher:

Abstract:

Genomic plasticity of the human chromosome 8p23.1 region is highly influenced by two groups of complex segmental duplications (SDs), termed REPD and REPP, that mediate different kinds of rearrangements. Part of the difficulty in explaining the wide range of phenotypes associated with 8p23.1 rearrangements is that REPP and REPD are not yet well characterized, probably due to their polymorphic status. Here, we describe a novel primate-specific gene family, named FAM90A (family with sequence similarity 90), found within these SDs. According to the current human reference sequence assembly, the FAM90A family includes 24 members along the 8p23.1 region plus a single member on chromosome 12p13.31, showing copy number variation (CNV) between individuals. These genes can be classified into subfamilies I and II, which differ in their upstream and 5′-untranslated region sequences, but both share the same open reading frame and are ubiquitously expressed. Sequence analysis and comparative fluorescence in situ hybridization studies showed that FAM90A subfamily II underwent a large expansion in the hominoid lineage, whereas subfamily I members were likely generated sometime around the divergence of orangutan and African great apes by a fusion process. In addition, the analysis of the Ka/Ks ratios provides evidence of functional constraint of some FAM90A genes in all species. The characterization of the FAM90A gene family contributes to a better understanding of the structural polymorphism of the human 8p23.1 region and constitutes a good example of how SDs, CNVs and rearrangements within themselves can promote the formation of new gene sequences with potential functional consequences.

Relevance:

80.00%

Publisher:

Abstract:

Understanding the molecular mechanisms responsible for the regulation of the transcriptome present in eukaryotic cells is one of the most challenging tasks in the postgenomic era. In this regard, alternative splicing (AS) is a key phenomenon contributing to the production of different mature transcripts from the same primary RNA sequence. As a plethora of different transcript forms is available in databases, a first step to uncover the biology that drives AS is to identify the different types of reflected splicing variation. In this work, we present a general definition of the AS event along with a notation system that involves the relative positions of the splice sites. This nomenclature univocally and dynamically assigns a specific "AS code" to every possible pattern of splicing variation. On the basis of this definition and the corresponding codes, we have developed a computational tool (AStalavista) that automatically characterizes the complete landscape of AS events in a given transcript annotation of a genome, thus providing a platform to investigate the transcriptome diversity across genes, chromosomes, and species. Our analysis reveals that a substantial part (in human, more than a quarter) of the observed splicing variations are ignored in common classification pipelines. We have used AStalavista to investigate and to compare the AS landscape of different reference annotation sets in human and in other metazoan species and found that proportions of AS events change substantially depending on the annotation protocol, species-specific attributes, and coding constraints acting on the transcripts. The AStalavista system therefore provides a general framework to conduct specific studies investigating the occurrence, impact, and regulation of AS.

Relevance:

80.00%

Publisher:

Abstract:

A report of the 6th Georgia Tech-Oak Ridge National Lab International Conference on Bioinformatics 'In silico Biology: Gene Discovery and Systems Genomics', Atlanta, USA, 15-17 November, 2007.

Relevance:

80.00%

Publisher:

Abstract:

Background: We present the results of EGASP, a community experiment to assess the state of the art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a 'reference set' of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment. Results: The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified. Conclusions: This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence.
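
As a rough illustration of nucleotide-level accuracy, the sketch below compares predicted and annotated coding positions. It assumes the convention common in gene-prediction evaluations, where sensitivity is TP/(TP+FN) and "specificity" is TP/(TP+FP), that is, the fraction of predicted coding nucleotides that are annotated as coding. The exon coordinates are hypothetical, and this is not the EGASP evaluation code.

```python
def coding_positions(exons):
    """Expand inclusive (start, end) exon intervals into a set of positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions

def nucleotide_accuracy(annotated_exons, predicted_exons):
    """Return (sensitivity, specificity) at the coding-nucleotide level.

    Sensitivity = TP / (TP + FN); specificity = TP / (TP + FP), following
    the gene-prediction convention assumed in the lead-in.
    """
    annotated = coding_positions(annotated_exons)
    predicted = coding_positions(predicted_exons)
    true_positives = len(annotated & predicted)
    sensitivity = true_positives / len(annotated) if annotated else 0.0
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity

# Hypothetical gene: two annotated coding exons and one prediction that
# starts the first exon late and overextends the second one.
annotated_exons = [(100, 199), (300, 399)]
predicted_exons = [(120, 199), (300, 419)]

sn, sp = nucleotide_accuracy(annotated_exons, predicted_exons)
print(sn, sp)
```

A prediction can score well at this level while still missing exact exon boundaries, which is one reason nucleotide-level figures tend to be higher than transcript-level ones.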

Relevance:

80.00%

Publisher:

Abstract:

Selenoproteins are a diverse group of proteins usually misidentified and misannotated in sequence databases. The presence of an in-frame UGA (stop) codon in the coding sequence of selenoprotein genes precludes their identification and correct annotation. The in-frame UGA codons are recoded to cotranslationally incorporate selenocysteine, a rare selenium-containing amino acid. The development of ad hoc experimental and, more recently, computational approaches has allowed the efficient identification and characterization of the selenoproteomes of a growing number of species. Today, dozens of selenoprotein families have been described and more are being discovered in recently sequenced species, but the correct genomic annotation is not available for the majority of these genes. SelenoDB is a long-term project that aims to provide, through the collaborative effort of experimental and computational researchers, automatic and manually curated annotations of selenoprotein genes, proteins and SECIS elements. Version 1.0 of the database includes an initial set of eukaryotic genomic annotations, with special emphasis on the human selenoproteome, for immediate inspection by selenium researchers or incorporation into more general databases. SelenoDB is freely available at http://www.selenodb.org.
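
The annotation problem described above can be seen in a few lines of code: a translator that treats every UGA as a stop truncates the protein, while one that recodes in-frame UGA reads through to the true terminator. The Python sketch below is a toy illustration with a made-up sequence. Real selenoprotein prediction also depends on signals such as SECIS elements, which are not modeled here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def split_codons(cds):
    """Normalize RNA/DNA letters and split a coding sequence into full codons."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]

def in_frame_tga_positions(cds):
    """Return 0-based codon indices of TGA (UGA) codons before the final codon."""
    codons = split_codons(cds)
    return [i for i, codon in enumerate(codons[:-1]) if codon == "TGA"]

def protein_length(cds, recode_tga=False):
    """Count codons translated before the first terminating stop codon.

    With recode_tga=True, TGA is read as a sense codon (selenocysteine)
    instead of a stop; this is a simplification for illustration only.
    """
    length = 0
    for codon in split_codons(cds):
        if codon in STOP_CODONS and not (recode_tga and codon == "TGA"):
            break
        length += 1
    return length

# Hypothetical coding sequence with one internal UGA and a UAA terminator.
toy_cds = "AUGGCCUGAGGUUAA"

print(in_frame_tga_positions(toy_cds))
print(protein_length(toy_cds), protein_length(toy_cds, recode_tga=True))
```

The gap between the two lengths is the portion of the protein that a standard annotation pipeline would drop.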

Relevance:

80.00%

Publisher:

Abstract:

Background: Despite the continuous production of genome sequence for a number of organisms, reliable, comprehensive, and cost-effective gene prediction remains problematic. This is particularly true for genomes for which there is not a large collection of known gene sequences, such as the recently published chicken genome. We used the chicken sequence to test comparative and homology-based gene-finding methods followed by experimental validation as an effective genome annotation method. Results: We performed experimental evaluation by RT-PCR of three different computational gene finders, Ensembl, SGP2 and TWINSCAN, applied to the chicken genome. A Venn diagram was computed and each component of it was evaluated. The results showed that de novo comparative methods can identify up to about 700 chicken genes with no previous evidence of expression, and can correctly extend about 40% of homology-based predictions at the 5' end. Conclusions: De novo comparative gene prediction followed by experimental verification is effective at enhancing the annotation of the newly sequenced genomes provided by standard homology-based methods.
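
The Venn-diagram step mentioned in the Results can be sketched with plain set operations: each predicted gene is assigned to the region defined by exactly those gene finders that predicted it. The gene identifiers below are hypothetical, and the sketch assumes predictions from different programs have already been matched to shared identifiers, which in practice is the harder part.

```python
def venn_regions(predictions):
    """Partition gene IDs by the exact combination of programs predicting them.

    predictions: dict mapping program name -> set of predicted gene IDs.
    Returns a dict mapping frozenset of program names -> set of gene IDs
    predicted by exactly those programs.
    """
    regions = {}
    all_genes = set().union(*predictions.values())
    for gene in all_genes:
        programs = frozenset(
            name for name, genes in predictions.items() if gene in genes
        )
        regions.setdefault(programs, set()).add(gene)
    return regions

# Hypothetical predictions from the three gene finders named in the abstract.
predictions = {
    "Ensembl": {"g1", "g2", "g3"},
    "SGP2": {"g2", "g3", "g4"},
    "TWINSCAN": {"g3", "g4", "g5"},
}

regions = venn_regions(predictions)
for programs, genes in sorted(regions.items(), key=lambda item: sorted(item[0])):
    print(sorted(programs), sorted(genes))
```

Regions that exclude the homology-based program, such as the one shared only by the two comparative predictors, hold the candidates with no previous evidence and are therefore natural targets for experimental validation.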