969 resultados para coding sequence
Resumo:
The recently described cupin superfamily of proteins includes the germin and germinlike proteins, of which the cereal oxalate oxidase is the best characterized. This superfamily also includes seed storage proteins, in addition to several microbial enzymes and proteins with unknown function. All these proteins are characterized by the conservation of two central motifs, usually containing two or three histidine residues presumed to be involved with metal binding in the catalytic active site. The present study on the coding regions of Synechocystis PCC6803 identifies a previously unknown group of 12 related cupins, each containing the characteristic two-motif signature. This group comprises 11 single-domain proteins, ranging in length from 104 to 289 residues, and includes two phosphomannose isomerases and two epimerases involved in cell wall synthesis, a member of the pirin group of nuclear proteins, a possible transcriptional regulator, and a close relative-of a cytochrome c551 from Rhodococcus. Additionally, there is a duplicated, two-domain protein that has close similarity to an oxalate decarboxylase from the fungus Collybia velutipes and that is a putative progenitor of the storage proteins of land plants.
Resumo:
We describe a general likelihood-based 'mixture model' for inferring phylogenetic trees from gene-sequence or other character-state data. The model accommodates cases in which different sites in the alignment evolve in qualitatively distinct ways, but does not require prior knowledge of these patterns or partitioning of the data. We call this qualitative variability in the pattern of evolution across sites "pattern-heterogeneity" to distinguish it from both a homogenous process of evolution and from one characterized principally by differences in rates of evolution. We present studies to show that the model correctly retrieves the signals of pattern-heterogeneity from simulated gene-sequence data, and we apply the method to protein-coding genes and to a ribosomal 12S data set. The mixture model outperforms conventional partitioning in both these data sets. We implement the mixture model such that it can simultaneously detect rate- and pattern-heterogeneity. The model simplifies to a homogeneous model or a rate- variability model as special cases, and therefore always performs at least as well as these two approaches, and often considerably improves upon them. We make the model available within a Bayesian Markov-chain Monte Carlo framework for phylogenetic inference, as an easy-to-use computer program.
Resumo:
The General Packet Radio Service (GPRS) has been developed for the mobile radio environment to allow the migration from the traditional circuit switched connection to a more efficient packet based communication link particularly for data transfer. GPRS requires the addition of not only the GPRS software protocol stack, but also more baseband functionality for the mobile as new coding schemes have be en defined, uplink status flag detection, multislot operation and dynamic coding scheme detect. This paper concentrates on evaluating the performance of the GPRS coding scheme detection methods in the presence of a multipath fading channel with a single co-channel interferer as a function of various soft-bit data widths. It has been found that compressing the soft-bit data widths from the output of the equalizer to save memory can influence the likelihood decision of the coding scheme detect function and hence contribute to the overall performance loss of the system. Coding scheme detection errors can therefore force the channel decoder to either select the incorrect decoding scheme or have no clear decision which coding scheme to use resulting in the decoded radio block failing the block check sequence and contribute to the block error rate. For correct performance simulation, the performance of the full coding scheme detection must be taken into account.
Resumo:
Physiological and yield traits such as stomatal conductance (mmol m-2s-1), Leaf relative water content (RWC %) and grain yield per plant were studied in a separate experiment. Results revealed that five out of sixteen cultivars viz. Anmol, Moomal, Sarsabz, Bhitai and Pavan, appeared to be relatively more drought tolerant. Based on morphophysiological results, studies were continued to look at these cultivars for drought tolerance at molecular level. Initially, four well recognized primers for dehydrin genes (DHNs) responsible for drought induction in T. durum L., T. aestivum L. and O. sativa L. were used for profiling gene sequence of sixteen wheat cultivars. The primers amplified the DHN genes variably like Primer WDHN13 (T. aestivum L.) amplified the DHN gene in only seven cultivars whereas primer TdDHN15 (T. durum L.) amplified all the sixteen cultivars with even different DNA banding patterns some showing second weaker DNA bands. Third primer TdDHN16 (T. durum L.) has shown entirely different PCR amplification prototype, specially showing two strong DNA bands while fourth primer RAB16C (O. sativa L.) failed to amplify DHN gene in any of the cultivars. Examination of DNA sequences revealed several interesting features. First, it identified the two exon/one intron structure of this gene (complete sequences were not shown), a feature not previously described in the two database cDNA sequences available from T. aestivum L. (gi|21850). Secondly, the analysis identified several single nucleotide polymorphisms (SNPs), positions in gene sequence. Although complete gene sequence was not obtained for all the cultivars, yet there were a total of 38 variable positions in exonic (coding region) sequence, from a total gene length of 453 nucleotides. Matrix of SNP shows these 37 positions with individual sequence at positions given for each of the 14 cultivars (sequence of two cultivars was not obtained) included in this analysis. It demonstrated a considerable diversity for this gene with only three cultivars i.e. TJ-83, Marvi and TD-1 being similar to the consensus sequence. All other cultivars showed a unique combination of SNPs. In order to prove a functional link between these polymorphisms and drought tolerance in wheat, it would be necessary to conduct a more detailed study involving directed mutation of this gene and DHN gene expression.
Resumo:
Phylogenetic analyses of representative species from the five genera of Winteraceae (Drimys, Pseudowintera, Takhtajania, Tasmannia, and Zygogynum s.l.) were performed using ITS nuclear sequences and a combined data-set of ITS + psbA-trnH + rpS16 sequences (sampling of 30 and 15 species, respectively). Indel informativity using simple gap coding or gaps as a fifth character was examined in both data-sets. Parsimony and Bayesian analyses support the monophyly of Drimys, Tasmannia, and Zygogynum s.l., but do not support the monophyly of Belliolum, Zygogynum s.s., and Bubbia. Within Drimys, the combined data-set recovers two subclades. Divergence time estimates suggest that the splitting between Drimys and its sister clade (Pseudowintera + Zygogynum s.l.) occurred around the end of the Cretaceous; in contrast, the divergence between the two subclades within Drimys is more recent (15.5-18.5 MY) and coincides in time with the Andean uplift. Estimates suggest that the earliest divergences within Winteraceae could have predated the first events of Gondwana fragmentation. (C) 2009 Elsevier Inc. All rights reserved.
Resumo:
Musca domestica larvae display in anterior and middle midgut contents, a proteolytic activity with pH optimum of 3.0-3.5 and kinetic properties like cathepsin D. Three cDNAs coding for preprocathepsin D-like proteinases (ppCAD 1, ppCAD 2, ppCAD 3) were cloned from a M. domestica midgut cDNA library. The coded protein sequences included the signal peptide, propeptide and mature enzyme that has all conserved catalytic and substrate binding residues found in bovine lysosomal cathepsin D. Nevertheless, ppCAD 2 and ppCAD 3 lack the characteristic proline loop and glycosylation sites. A comparison among the sequences of cathepsin D-like enzymes from some vertebrates and those found in M. domestica and in the genomes of Aedes aegypti, Drosophila melanogaster, Tribolium castaneum, and Bombyx mori showed that only flies have enzymes lacking the proline loop (as defined by the motif: DxPxPx(G/A)P), thus resembling vertebrate pepsin. ppCAD 3 should correspond to the digestive cathepsin D-like proteinase (CAD) found in enzyme assays because: (1) it seems to be the most expressed CAD, based on the frequency of ESTs found. (2) The mRNA for CAD 3 is expressed only in the anterior and proximal middle midgut. (3) Recombinant procathepsin D-like proteinase (pCAD 3), after auto-activation has a pH optimum of 2.5-3.0 that is close to the luminal pH of M. domestica midgut. (4) Immunoblots of proteins from different tissues revealed with anti-pCAD 3 serum were positive only in samples of anterior and middle midgut tissue and contents. (5) CAD 3 is localized with immunogold inside secretory vesicles and around microvilli in anterior and middle midguit cells. The data support the view that on adapting to deal with a bacteria-rich food in an acid midgut region, M. domestica digestive CAD resulted from the same archetypical gene as the intracellular cathepsin D, paralleling what happened with vertebrates. The lack of the proline loop may be somehow associated with the extracellular role of both pepsin and digestive CAD 3. (C) 2009 Elsevier Ltd. All rights reserved.
Resumo:
Phylogenetic analyses of chloroplast DNA sequences, morphology, and combined data have provided consistent support for many of the major branches within the angiosperm, clade Dipsacales. Here we use sequences from three mitochondrial loci to test the existing broad scale phylogeny and in an attempt to resolve several relationships that have remained uncertain. Parsimony, maximum likelihood, and Bayesian analyses of a combined mitochondrial data set recover trees broadly consistent with previous studies, although resolution and support are lower than in the largest chloroplast analyses. Combining chloroplast and mitochondrial data results in a generally well-resolved and very strongly supported topology but the previously recognized problem areas remain. To investigate why these relationships have been difficult to resolve we conducted a series of experiments using different data partitions and heterogeneous substitution models. Usually more complex modeling schemes are favored regardless of the partitions recognized but model choice had little effect on topology or support values. In contrast there are consistent but weakly supported differences in the topologies recovered from coding and non-coding matrices. These conflicts directly correspond to relationships that were poorly resolved in analyses of the full combined chloroplast-mitochondrial data set. We suggest incongruent signal has contributed to our inability to confidently resolve these problem areas. (c) 2007 Elsevier Inc. All rights reserved.
Resumo:
Many of the controversies around the concept of homology rest on the subjectivity inherent to primary homology propositions. Dynamic homology partially solves this problem, but there has been up to now scant application of it outside of the molecular domain. This is probably because morphological and behavioural characters are rich in properties, connections and qualities, so that there is less space for conflicting character delimitations. Here we present a new method for the direct optimization of behavioural data, a method that relies on the richness of this database to delimit the characters, and on dynamic procedures to establish character state identity. We use between-species congruence in the data matrix and topological stability to choose the best cladogram. We test the methodology using sequences of predatory behaviour in a group of spiders that evolved the highly modified predatory technique of spitting glue onto prey. The cladogram recovered is fully compatible with previous analyses in the literature, and thus the method seems consistent. Besides the advantage of enhanced objectivity in character proposition, the new procedure allows the use of complex, context-dependent behavioural characters in an evolutionary framework, an important step towards the practical integration of the evolutionary and ecological perspectives on diversity. (C) The Willi Hennig Society 2010.
Resumo:
Immune evasion by Plasmodium falciparum is favored by extensive allelic diversity of surface antigens. Some of them, most notably the vaccine-candidate merozoite surface protein (MSP)-1, exhibit a poorly understood pattern of allelic dimorphism, in which all observed alleles group into two highly diverged allelic families with few or no inter-family recombinants. Here we describe contrasting levels and patterns of sequence diversity in genes encoding three MSP-1-associated surface antigens of P. falciparum, ranging from an ancient allelic dimorphism in the Msp-6 gene to a near lack of allelic divergence in Msp-9 to a more classical multi-allele polymorphism in Msp-7 Other members of the Msp-7 gene family exhibit very little polymorphism in non-repetitive regions. A comparison of P. falciparum Msp-6 sequences to an orthologous sequence from P. reichenowi provided evidence for distinct evolutionary histories of the 5` and 3` segments of the dimorphic region in PfMsp-6, consistent with one dimorphic lineage having arisen from recombination between now-extinct ancestral alleles. In addition. we uncovered two surprising patterns of evolution in repetitive sequence. Firsts in Msp-6, large deletions are associated with (nearly) identical sequence motifs at their borders. Second, a comparison of PfMsp-9 with the P. reichenowi ortholog indicated retention of a significant inter-unit diversity within an 18-base pair repeat within the coding region of P. falciparum, but homogenization in P. reichenowi. (C) 2009 Elsevier B.V. All rights reserved.
Resumo:
Thaumastocoris peregrinus is a recently introduced invertebrate pest of non-native Eucalyptus plantations in the Southern Hemisphere. It was first reported from South Africa in 2003 and in Argentina in 2005. Since then, populations have grown explosively and it has attained an almost ubiquitous distribution over several regions in South Africa on 26 Eucalyptus species. Here we address three key questions regarding this invasion, namely whether only one species has been introduced, whether there were single or multiple introductions into South Africa and South America and what the source of the introduction might have been. To answer these questions, bar-coding using mitochondrial DNA (COI) sequence diversity was used to characterise the populations of this insect from Australia, Argentina, Brazil, South Africa and Uruguay. Analyses revealed three cryptic species in Australia, of which only T. peregrinus is represented in South Africa and South America. Thaumastocoris peregrinus populations contained eight haplotypes, with a pairwise nucleotide distance of 0.2-0.9% from seventeen locations in Australia. Three of these haplotypes are shared with populations in South America and South Africa, but the latter regions do not share haplotypes. These data, together with the current distribution of the haplotypes and the known direction of original spread in these regions, suggest that at least three distinct introductions of the insect occurred in South Africa and South America before 2005. The two most common haplotypes in Sydney, one of which was also found in Brisbane, are shared with the non-native regions. Sydney populations of T. peregrinus, which have regularly reached outbreak levels in recent years, might thus have served as source of these three distinct introductions into other regions of the Southern Hemisphere.
Resumo:
open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription-PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTEs were assembled into 81,429 contigs. of these, 1,181 (1.45%) were found to match sequences in chromosome 22 with at least one ORESTES contig for 162 (65.6%) of the 247 known genes, for 67 (44.6%) of the 150 related genes, and for 45 of the 148 (30.4%) EST-predicted genes on this chromosome. Using a set of stringent criteria to validate our sequences, we identified a further 219 previously unannotated transcribed sequences on chromosome 22. of these, 171 were in fact also defined by EST or full length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTEs sequences defined 48 transcribed sequences on chromosome 22 not defined by other sequences. All of the transcribed sequences defined by ORESTEs coincided with DNA regions predicted as encoding exons by GENSCAN.
Resumo:
The data mining of Eucalyptus ESTs genome finds four clusters (EGCEST2257E11.g, EGBGRT3213F11.g, and EGCCFB1223H11.g) from highly conservative 14-3-3 protein family which modulates a wide variety of cellular processes. Multiple alignments were built from twenty four sequences of 14-3-3 proteins searched into the GenBank databases and into the four pools of Eucalyptus genome programs. The alignment has shown two regions highly conservative on the sequences corresponding to the motifs of protein phosphorylation and nine highly conservative regions on the sequence corresponding to the linkage regions of alpha helices structure based on three dimensional of dimer functional structure. The differences of amino acid into the structural and functional domains of 14-3-3 plant protein were identified and can explain the functional diversity of different isoforms. The phylogenic protein trees were built by the maximum parsimony and neighborjoining procedures of Clustal X alignments and PAUP software for phylogenic analysis.
Resumo:
The 3'-terminal 853 nt (and the putative 283 aa) sequence of the VP2-encoding gene from 29 field strains of porcine parvovirus (PPV) were determined and compared both to each other and with other published sequences. Sequences were examined using maximum-parsimony and statistical analyses for nucleotide diversity and sequence variability. Among the nucleotide sequences of the PPV field strains, 26 polymorphic sites were encountered; 22 polymorphic sites were detected in the putative amino acid sequence. Mapping polymorphic sites of protein data onto the three-dimensional (3D) structure of PPV VP2 revealed that almost all substitutions were located on the external surface of the viral capsid. Mapping amino acid substitutions to the alignment between PPV VP2 sequences and the 3D structure of canine parvovirus (CPV) capsid, many PPV substitutions were observed to map to regions of recognized antigenicity and/or to contain phenotypically important residues for CPV and other parvoviruses. In spite of the high sequence similarity, genetic analysis has shown the existence of at least two virus lineages among the samples. In conclusion, these results highlight the need for close surveillance on PPV genetic drift, with an assessment of its potential ability to modify the antigenic make-up of the virus.