963 results for Genome-specific Sequence


Relevance: 40.00%

Abstract:

Over the last few decades, bioinformatics has played a fundamental role in making sense of the huge amount of data produced. Once the complete sequence of a genome has been obtained, the crucial problem is to learn as much as possible about its coding regions. Protein sequence annotation is challenging and, given the size of the problem, only computational approaches can provide a feasible solution. As recently pointed out by the Critical Assessment of Function Annotations (CAFA), the most accurate methods are those based on the transfer-by-homology approach, and the most incisive contribution comes from cross-genome comparisons. This thesis describes a non-hierarchical sequence clustering method for automatic large-scale protein annotation, called “The Bologna Annotation Resource Plus” (BAR+). The method is based on an all-against-all alignment of more than 13 million protein sequences, characterized by a very stringent metric. BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) within clusters by means of statistical validation, even in the case of multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three-dimensional structure (when a template is available). This is made possible by cluster-specific HMM profiles, which can be used to calculate reliable template-to-target alignments even in the case of distantly related proteins (sequence identity < 30%). Other BAR+-based applications were developed during my doctorate, including the prediction of magnesium binding sites in human proteins, the classification of the ABC transporter superfamily, and the functional prediction (GO terms) of the CAFA targets. Remarkably, in the CAFA assessment, BAR+ placed among the ten most accurate methods. At present, BAR+ is freely available as a web server for functional and structural protein sequence annotation at http://bar.biocomp.unibo.it/bar2.0.
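
To make the idea of non-hierarchical clustering from an all-against-all alignment concrete, the following is a minimal sketch that treats each sufficiently similar pair as an edge and takes connected components as clusters. This is an illustrative assumption, not necessarily how BAR+ builds its clusters: the function name, the input tuple format, and the identity and coverage thresholds are all hypothetical placeholders.

```python
from collections import defaultdict


def cluster_by_similarity(ids, hits, min_identity=0.4, min_coverage=0.9):
    """Group sequence ids into connected components of a similarity graph.

    ids:  iterable of sequence identifiers.
    hits: iterable of (id_a, id_b, identity, coverage) tuples taken from an
          all-against-all alignment (hypothetical input format).
    The thresholds are illustrative placeholders, not values from the thesis.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in hits:
        # Only pairs passing both thresholds link two sequences.
        if identity >= min_identity and coverage >= min_coverage:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    clusters = defaultdict(set)
    for i in parent:
        clusters[find(i)].add(i)
    # Return clusters as sorted lists, ordered by their first member.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])
```

Once clusters exist, annotations such as GO or Pfam terms could be transferred among members of the same cluster; the statistical validation step mentioned in the abstract is not modelled here.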

Relevance: 40.00%

Abstract:

Marginal zone B-cell lymphomas (MZLs) have been divided into 3 distinct subtypes (extranodal MZLs of mucosa-associated lymphoid tissue [MALT] type, nodal MZLs, and splenic MZLs). Nevertheless, the relationship between the subtypes is still unclear. We performed a comprehensive analysis of genomic DNA copy number changes in a very large series of MZL cases with the aim of addressing this question. Samples from 218 MZL patients (25 nodal, 57 MALT, 134 splenic, and 2 not better specified MZLs) were analyzed with the Affymetrix Human Mapping 250K SNP arrays, and the data were combined with matched gene expression data in 33 of the 218 cases. MALT lymphoma showed significantly more frequent gains at 3p, 6p, and 18p, and del(6q23) (TNFAIP3/A20), whereas splenic MZLs were associated with del(7q31) and del(8p). Nodal MZLs did not show statistically significant differences compared with MALT lymphoma, while lacking the 7q losses characteristic of splenic MZLs. Gains of 3q and 18q were common to all 3 subtypes. del(8p) was often present together with del(17p) (TP53). Although del(17p) did not determine a worse outcome and del(8p) was only of borderline significance, the presence of both deletions had a highly significant negative impact on the outcome of splenic MZLs.

Relevance: 40.00%

Abstract:

Background: In protein sequence classification, identifying the sequence motifs or n-grams that can precisely discriminate between classes is a more interesting scientific question than the classification itself. A number of classification methods aim at accurate classification but fail to explain which sequence features actually contribute to the accuracy. We hypothesize that short subsequences (n-grams) can be used to explore the sequence landscape and to identify class-specific motifs that discriminate between classes during classification. Discriminative n-grams are short peptide sequences that are highly frequent in one class but are either minimally present or absent in other classes. In this study, we present a new substitution-based scoring function for identifying discriminative n-grams that are highly specific to a class. Results: We present a scoring function based on discriminative n-grams that can effectively discriminate between classes. The scoring function first harvests the entire set of 4- to 8-grams from the protein sequences of the different classes in the dataset. Similar n-grams of the same size are combined to form new n-grams, where similarity is defined by positive amino acid substitution scores in the BLOSUM62 matrix. Substitution resulted in a large increase in the number of discriminatory n-grams harvested. Because the dataset is unbalanced, the frequencies of the n-grams are normalized using a dampening factor, which gives more weight to n-grams that appear in fewer classes and vice versa. After the n-grams are normalized, the scoring function identifies discriminative 4- to 8-grams for each class that are frequent enough to be above a selection threshold. By mapping these discriminative n-grams back to the protein sequences, we obtained contiguous n-grams that represent short class-specific motifs in protein sequences. 
Our method fared well compared to an existing motif-finding method known as Wordspy. We have validated our enriched set of class-specific motifs against the functionally important motifs obtained from the NLSdb, Prosite and ELM databases. We demonstrate that this method is very generic and can thus be widely applied to detect class-specific motifs in many protein sequence classification tasks. Conclusion: The proposed scoring function and methodology are able to identify class-specific motifs using discriminative n-grams derived from the protein sequences. The use of amino acid substitution scores for similarity detection, and of the dampening factor to normalize the unbalanced datasets, has a significant effect on the performance of the scoring function. Our multipronged validation tests demonstrate that this method can detect class-specific motifs from a wide variety of protein sequence classes, with a potential application to detecting proteome-specific motifs of different organisms.
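
The harvesting and normalization steps described in this abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' scoring function: the abstract does not give the form of the dampening factor, so an IDF-like logarithmic weight is assumed here, the BLOSUM62-based merging of similar n-grams is omitted, and the names `harvest_ngrams` and `discriminative_ngrams` are hypothetical.

```python
import math
from collections import Counter


def harvest_ngrams(seq, n_min=4, n_max=8):
    """Yield every contiguous n-gram of length n_min..n_max found in seq."""
    for n in range(n_min, n_max + 1):
        for i in range(len(seq) - n + 1):
            yield seq[i:i + n]


def discriminative_ngrams(classes, threshold):
    """Score n-grams per class and keep those scoring above the threshold.

    classes: dict mapping a class label to a list of protein sequences.
    Returns a dict mapping each label to {ngram: score}.
    """
    counts = {
        label: Counter(g for s in seqs for g in harvest_ngrams(s))
        for label, seqs in classes.items()
    }
    n_classes = len(classes)
    # Number of classes in which each distinct n-gram occurs at least once.
    class_spread = Counter(g for c in counts.values() for g in c)

    result = {}
    for label, c in counts.items():
        total = sum(c.values()) or 1
        scored = {}
        for gram, k in c.items():
            # Assumed dampening factor (IDF-like): larger when the n-gram
            # occurs in fewer classes, zero when it occurs in all of them.
            damp = math.log(n_classes / class_spread[gram])
            score = (k / total) * damp
            if score > threshold:
                scored[gram] = score
        result[label] = scored
    return result
```

For example, calling `discriminative_ngrams({"A": ["MKKRKAAAA"], "B": ["MAAAAGGGG"]}, 0.0)` keeps `KKRK` for class A and discards `AAAA`, which occurs in both classes.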

Relevance: 40.00%

Abstract:

Desulfovibrio sp. A2 is an anaerobic Gram-negative sulfate-reducing bacterium with remarkable tolerance to copper. It was isolated from wastewater effluents of a zinc smelter in the Urals. Here, we report the 4.2-Mb draft genome sequence of Desulfovibrio sp. A2 and identify potential copper resistance mechanisms.

Relevance: 40.00%

Abstract:

We have sequenced the genome of Desulfosporosinus sp. OT, a Gram-positive, acidophilic sulfate-reducing Firmicute isolated from copper tailing sediment in the Norilsk mining-smelting area in Northern Siberia, Russia. This represents the first sequenced genome of a Desulfosporosinus species. The genome has a size of 5.7 Mb and encodes 6,222 putative proteins.

Relevance: 40.00%

Abstract:

Enterococcus hirae ATCC 9790 is a Gram-positive lactic acid bacterium that has been used in basic research for over 4 decades. Here we report the sequence and annotation of the 2.8-Mb genome of E. hirae and its endemic 29-kb plasmid pTG9790.

Relevance: 40.00%

Abstract:

Genome predictions based on selected genes would be a very welcome approach for taxonomic studies, including DNA-DNA similarity, G+C content and representative phylogeny of bacteria. At present, DNA-DNA hybridizations are still considered the gold standard in species descriptions. However, this method is time-consuming and troublesome, and datasets can vary significantly between experiments as well as between laboratories. For the same reasons, full matrix hybridizations are rarely performed, weakening the significance of the results obtained. The authors established a universal sequencing approach for the three genes recN, rpoA and thdF for the Pasteurellaceae, and determined whether the sequences could be used for predicting DNA-DNA relatedness within the family. The sequence-based similarity values calculated using a previously published formula proved most useful for species and genus separation, indicating that this method provides better resolution than hybridization and is free of experimental variation. With this method, cross-comparisons within the family across species and genus borders become easily possible. The three genes also serve as an indicator of the genome G+C content of a species. A mean divergence of around 1% from the classical method was observed, which in itself has poor reproducibility. Finally, the three genes can be used alone or in combination with already-established 16S rRNA, rpoB and infB gene-sequencing strategies in a multisequence-based phylogeny for the family Pasteurellaceae. It is proposed to use the three sequences as a taxonomic tool, replacing DNA-DNA hybridization.
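
The two basic quantities behind such sequence-based comparisons, G+C content and pairwise identity between aligned gene sequences, can be computed as in the sketch below. It assumes the sequences are already aligned and uses `-` as the gap character; the function names are hypothetical. The previously published formula that converts gene similarities into predicted DNA-DNA relatedness is not given in the abstract and is deliberately not reproduced here.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    acgt = gc + sum(seq.count(base) for base in "AT")
    return gc / acgt if acgt else 0.0


def percent_identity(aln_a, aln_b):
    """Percent identity over aligned columns that have no gap in either row."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(aln_a.upper(), aln_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)
```

Ignoring gapped columns is one common convention; other definitions of identity divide by the full alignment length or by the shorter sequence, and would give different values.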

Relevance: 40.00%

Abstract:

Background: Leishmania represent a complex of important human pathogens that belong to the systematic order Kinetoplastida. They are transmitted between their human and other mammalian hosts by different bloodsucking sandfly vectors. In their hosts, Leishmania undergo several differentiation steps, whose coordination and optimization crucially depend on numerous interactions between the parasites and the physiological environment presented by the fly and human hosts. Little is known so far about the signalling networks involved in these functions. In an attempt to better understand the role of cyclic nucleotide signalling in Leishmania differentiation and host-parasite interaction, we here present an initial study on the cyclic nucleotide-specific phosphodiesterases of Leishmania major. Results: This paper presents the identification of three class I cyclic-nucleotide-specific phosphodiesterases (PDEs) from L. major, whose catalytic domains exhibit considerable sequence conservation with, among others, all eleven human PDE families. In contrast to other protozoa such as Dictyostelium, or fungi such as Saccharomyces cerevisiae, Candida spp. or Neurospora, no genes for class II PDEs were found in the Leishmania genomes. LmjPDEA contains a class I catalytic domain at the C-terminus of the polypeptide, with no other discernible functional domains elsewhere. LmjPDEB1 and LmjPDEB2 are encoded by closely related, tandemly linked genes on chromosome 15. Both PDEs contain two GAF domains in their N-terminal region, and their almost identical catalytic domains are located at the C-terminus of the polypeptide. LmjPDEA, LmjPDEB1 and LmjPDEB2 were further characterized by functional complementation in a PDE-deficient S. cerevisiae strain. All three enzymes conferred complementation, demonstrating that all three can hydrolyze cAMP. Recombinant LmjPDEB1 and LmjPDEB2 were shown to be cAMP-specific, with Km values in the low micromolar range. 
Several PDE inhibitors were found to be active against these PDEs in vitro, and to inhibit cell proliferation. Conclusion: The genome of L. major contains only PDE genes that are predicted to code for class I PDEs, and none for class II PDEs. This is more similar to what is found in higher eukaryotes than it is to the situation in Dictyostelium or the fungi that concomitantly express class I and class II PDEs. Functional complementation demonstrated that LmjPDEA, LmjPDEB1 and LmjPDEB2 are capable of hydrolyzing cAMP. In vitro studies with recombinant LmjPDEB1 and LmjPDEB2 confirmed this and demonstrated that both are completely cAMP-specific. Both enzymes are inhibited by several commercially available PDE inhibitors. The observation that these inhibitors also interfere with cell growth in culture indicates that inhibition of the PDEs is fatal for the cell, suggesting an important role of cAMP signalling in the maintenance of cellular integrity and proliferation.

Relevance: 40.00%

Abstract:

Here we present the identification and cloning of the NcBSR4 gene, the putative Neospora caninum orthologue of the Toxoplasma gondii TgBSR4 gene. To isolate NcBSR4, genome walking PCR was performed on N. caninum genomic DNA using the sequence of the expressed sequence tag NcEST3c28h02.y1, which shares 44% identity with the TgBSR4 gene, as a framework. Nucleotide sequencing of the amplified DNA fragments revealed a single uninterrupted 1227 bp open reading frame that encodes a protein of 408 amino acids with 66% similarity to the TgBSR4 antigen. A putative 39-residue signal peptide was found at the NH2-terminus, followed by a hydrophilic region. At the COOH-terminus, a potential site for a glycosylphosphatidylinositol anchor was identified at amino acid 379. A polyclonal serum against recombinant NcBSR4 protein was raised in rabbits, and immunolabelling demonstrated stage-specific expression of the NcBSR4 antigen in N. caninum bradyzoites produced in vitro and in vivo. Furthermore, RT-PCR analysis showed a slight increase of NcBSR4 transcripts in bradyzoites generated during in vitro tachyzoite-to-bradyzoite stage conversion, suggesting that this gene is specifically expressed at the bradyzoite stage and that its transcription relies on the switch to this stage.
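
The relationship between the reported ORF length and protein length can be checked with simple arithmetic. The sketch below assumes, as the reported numbers suggest, that the 1227 bp figure includes the terminal stop codon; the function name is hypothetical.

```python
def protein_length_from_orf(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an ORF of orf_bp base pairs.

    Each codon is 3 bp; if the ORF length includes the stop codon,
    that codon does not contribute an amino acid.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons
```

Here 1227 bp corresponds to 409 codons, and subtracting the stop codon gives the 408 amino acids stated in the abstract.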

Relevance: 40.00%

Abstract:

We report a high-quality draft sequence of the genome of the horse (Equus caballus). The genome is relatively repetitive but has little segmental duplication. Chromosomes appear to have undergone few historical rearrangements: 53% of equine chromosomes show conserved synteny to a single human chromosome. Equine chromosome 11 is shown to have an evolutionarily new centromere devoid of centromeric satellite DNA, suggesting that centromeric function may arise before satellite repeat accumulation. Linkage disequilibrium, showing the influences of early domestication of large herds of female horses, is intermediate in length between dog and human, and there is long-range haplotype sharing among breeds.

Relevance: 40.00%

Abstract:

Avibacterium paragallinarum is an important pathogen of chicken livestock causing infectious coryza. Here, we report the draft genome sequence of the virulent A. paragallinarum serotype A strain JF4211 (2.8 Mbp and G+C content of 41%) and the two toxin operons discovered from the annotation of the genome.