931 results for DNA sequence
Abstract:
Short tandem DNA repeats and telomerase compose the telomere structure in the vast majority of eukaryotic organisms. However, such a conserved organisation has not been found in dipterans. While telomeric DNA in Drosophila is composed of specific retrotransposons, complex terminal tandem repeats are present in chromosomes of Anopheles and chironomid species. In the sciarid Rhynchosciara americana, tandemly arrayed short repeats (16 and 22 bp long) seem to reach chromosome ends. Moreover, in situ hybridisation data using homopolymeric RNA probes suggested the existence in this species of a third putative chromosome end repeat enriched with (dA).(dT) homopolymers. In this work, chromosome micro-dissection and PCR primed by homopolymeric primers were employed to clone these repeats. The repetitive unit, named T-14, is 14 bp long, 93% AT-rich, and appears organised in tandem arrays. It is localised in five non-centromeric ends and in four interstitial bands of R. americana chromosomes. To date, T-14 is the shortest repeat that has been characterised in chromosome ends of dipterans. As observed for short tandem repeats identified previously in chromosome ends of R. americana, the T-14 probe hybridised to bridges connecting non-homologous polytene chromosome ends, indicating a close association of T-14 repeats with the very end of the chromosomes. The results of this work suggest that R. americana represents an additional example of an organism with more than one DNA sequence that is able to reach chromosome termini.
Abstract:
The 3 angstrom resolution crystal structure of the Escherichia coli catabolite gene activator protein (CAP) complexed with a 30-base pair DNA sequence shows that the DNA is bent by 90°. This bend results almost entirely from two 40° kinks that occur between TG/CA base pairs at positions 5 and 6 on each side of the dyad axis of the complex. DNA sequence discrimination by CAP derives both from sequence-dependent distortion of the DNA helix and from direct hydrogen-bonding interactions between three protein side chains and the exposed edges of three base pairs in the major groove of the DNA. The structure of this transcription factor-DNA complex provides insights into possible mechanisms of transcription activation.
Abstract:
Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherent digital nature of these sequences. DSP methods have demonstrated early success in detecting coding regions in a gene. More recently, these methods have been used to establish DNA gene similarity. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transformation, which can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method that generates a genetic signature for any DNA sequence; these signatures are then used to generate relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which represents a set of highly similar sequences. We compare phylogenetic trees generated using our technique against trees generated using traditional alignment techniques for similarity and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.
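As a rough illustration of the DFT step that such methods build on, the following Python sketch converts a DNA sequence into four binary indicator sequences (one per base) and sums their DFT power spectra. The abstract does not define the ICD transformation itself, so the `adjacent_differences` helper is only a hypothetical stand-in for a difference between neighbouring coefficients, not the authors' method; all function names here are assumptions.

```python
import cmath


def indicator_sequences(seq):
    """Map a DNA string to four 0/1 sequences, one per base (assumed encoding)."""
    seq = seq.upper()
    return {base: [1.0 if c == base else 0.0 for c in seq] for base in "ACGT"}


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real-valued list."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def power_spectrum(seq):
    """Sum of squared DFT magnitudes over the four indicator sequences."""
    total = [0.0] * len(seq)
    for x in indicator_sequences(seq).values():
        for k, coeff in enumerate(dft(x)):
            total[k] += abs(coeff) ** 2
    return total


def adjacent_differences(spectrum):
    """Hypothetical stand-in: differences between neighbouring coefficients."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


if __name__ == "__main__":
    spectrum = power_spectrum("ATGATGATGATG")
    print([round(v, 3) for v in spectrum])
    print([round(v, 3) for v in adjacent_differences(spectrum)])
```

For a sequence with a perfect period of three, such as the toy input above, the non-zero power is concentrated at k = 0 and at multiples of N/3, which is the kind of periodicity that DSP-based coding-region detection looks for.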
Abstract:
In cattle, at least 39 variants of the 4 casein proteins (α(S1)-, β-, α(S2)- and κ-casein) have been described to date. Many of these variants are known to affect milk-production traits, cheese-processing properties, and the nutritive value of milk. They also provide valuable information for phylogenetic studies. So far, the majority of studies exploring the genetic variability of bovine caseins considered European taurine cattle breeds and were carried out at the protein level by electrophoretic techniques. This only allows the identification of variants that, due to amino acid exchanges, differ in their electric charge, molecular weight, or isoelectric point. In this study, the open reading frames of the casein genes CSN1S1, CSN2, CSN1S2, and CSN3 of 356 animals belonging to 14 taurine and 3 indicine cattle breeds were sequenced. With this approach, we identified 23 alleles, including 5 new DNA sequence variants, with a predicted effect on the protein sequence. The new variants were only found in indicine breeds and in one local Iranian breed, which has been phenotypically classified as a taurine breed. A multidimensional scaling approach based on available SNP chip data, however, revealed an admixture of taurine and indicine populations in this breed as well as in the local Iranian breed Golpayegani. Specific indicine casein alleles were also identified in a few European taurine breeds, indicating the introgression of indicine breeds into these populations. This study shows the existence of substantial undiscovered genetic variability of bovine casein loci, especially in indicine cattle breeds. The identification of new variants is a valuable tool for phylogenetic studies and investigations into the evolution of the milk protein genes.
Abstract:
The LIM domain-binding protein Ldb1 is an essential cofactor of LIM-homeodomain (LIM-HD) and LIM-only (LMO) proteins in development. The stoichiometry of Ldb1, LIM-HD, and LMO proteins is tightly controlled in the cell and is likely a critical determinant of their biological actions. Single-stranded DNA-binding proteins (SSBPs) were recently shown to interact with Ldb1 and are also important in developmental programs. We establish here that two mammalian SSBPs, SSBP2 and SSBP3, contribute to an erythroid DNA-binding complex that contains the transcription factors Tal1 and GATA-1, the LIM domain protein Lmo2, and Ldb1 and binds a bipartite E-box-GATA DNA sequence motif. In addition, SSBP2 was found to augment transcription of the Protein 4.2 (P4.2) gene, a direct target of the E-box-GATA-binding complex, in an Ldb1-dependent manner and to increase endogenous Ldb1 and Lmo2 protein levels, E-box-GATA DNA-binding activity, and P4.2 and beta-globin expression in erythroid progenitors. Finally, SSBP2 was demonstrated to inhibit Ldb1 and Lmo2 interaction with the E3 ubiquitin ligase RLIM, prevent RLIM-mediated Ldb1 ubiquitination, and protect Ldb1 and Lmo2 from proteasomal degradation. These results define a novel biochemical function for SSBPs in regulating the abundance of LIM domain and LIM domain-binding proteins.
Abstract:
Lyme disease Borrelia can infect humans and animals for months to years, despite the presence of an active host immune response. The vls antigenic variation system, which expresses the surface-exposed lipoprotein VlsE, plays a major role in B. burgdorferi immune evasion. Gene conversion between vls silent cassettes and the vlsE expression site occurs at high frequency during mammalian infection, resulting in sequence variation in the VlsE product. In this study, we examined vlsE sequence variation in B. burgdorferi B31 during mouse infection by analyzing 1,399 clones isolated from bladder, heart, joint, ear, and skin tissues of mice infected for 4 to 365 days. The median number of codon changes increased progressively in C3H/HeN mice from 4 to 28 days post infection, and no clones retained the parental vlsE sequence at 28 days. In contrast, the decrease in the number of clones with the parental vlsE sequence and the increase in the number of sequence changes occurred more gradually in severe combined immunodeficiency (SCID) mice. Clones containing a stop codon were isolated, indicating that continuous expression of full-length VlsE is not required for survival in vivo; also, these clones continued to undergo vlsE recombination. Analysis of clones with apparent single recombination events indicated that recombinations into vlsE are nonselective with regard to the silent cassette utilized, as well as the length and location of the recombination event. Sequence changes as small as one base pair were common. Fifteen percent of recovered vlsE variants contained "template-independent" sequence changes, which clustered in the variable regions of vlsE. We hypothesize that the increased frequency and complexity of vlsE sequence changes observed in clones recovered from immunocompetent mice (as compared with SCID mice) are due to rapid clearance of relatively invariant clones by variable region-specific anti-VlsE antibody responses.
Abstract:
The complete 50,237-bp DNA sequence of the conjugative and mobilizing multiresistance plasmid pRE25 from Enterococcus faecalis RE25 was determined. The plasmid had 58 putative open reading frames, 5 of which encode resistance to 12 antimicrobials. Chloramphenicol acetyltransferase and the 23S RNA methylase are identical to gene products of the broad-host-range plasmid pIP501 from Streptococcus agalactiae. In addition, a 30.5-kb segment is almost identical to pIP501. Genes encoding an aminoglycoside 6-adenylyltransferase, a streptothricin acetyltransferase, and an aminoglycoside phosphotransferase are arranged in tandem on a 7.4-kb fragment as previously reported in Tn5405 from Staphylococcus aureus and in pJH1 from E. faecalis. One interrupted and five complete IS elements as well as three replication genes were also identified. pRE25 was transferred by conjugation to E. faecalis, Listeria innocua, and Lactococcus lactis by means of a transfer region that appears similar to that of pIP501. It is concluded that pRE25 may contribute to the further spread of antibiotic-resistant microorganisms via food into the human community.
VERIFICATION OF DNA PREDICTED PROTEIN SEQUENCES BY ENZYME HYDROLYSIS AND MASS SPECTROMETRIC ANALYSIS
Abstract:
The focus of this thesis lies in the development of a sensitive method for the analysis of protein primary structure which can be easily used to confirm the DNA sequence of a protein's gene and determine the modifications which are made after translation. This technique involves the use of dipeptidyl aminopeptidase (DAP) and dipeptidyl carboxypeptidase (DCP) to hydrolyze the protein and the mass spectrometric analysis of the dipeptide products. Dipeptidyl carboxypeptidase was purified from human lung tissue and characterized with respect to its proteolytic activity. The results showed that the enzyme has a relatively unrestricted specificity, making it useful for the analysis of the C-terminus of proteins. Most of the dipeptide products were identified using gas chromatography/mass spectrometry (GC/MS). In order to analyze the peptides not hydrolyzed by DCP and DAP, as well as the dipeptides not identified by GC/MS, a FAB ion source was installed on a quadrupole mass spectrometer and its performance evaluated with a variety of compounds. Using these techniques, the sequences of the N-terminal and C-terminal regions and seven fragments of bacteriophage P22 tail protein have been verified. All of the dipeptides identified in these analyses were in the same DNA reading frame, thus ruling out the possibility of a single base being inserted or deleted from the DNA sequence. The verification of small sequences throughout the protein sequence also indicates that no large portions of the protein have been removed after translation.
Abstract:
DNA sequence variation is currently a major source of data for studying human origins, evolution, and demographic history, and for detecting linkage association of complex diseases. In this dissertation, I investigated DNA variation in worldwide populations from two ∼10 kb autosomal regions on 22q11.2 (noncoding) and 1q24 (introns). A total of 75 variant sites were found among 128 human sequences in the 22q11.2 region, yielding an estimate of 0.088% for nucleotide diversity (π), and a total of 52 variant sites were found among 122 human sequences in the 1q24 region with an estimated π value of 0.057%. The data from these two regions and a 10 kb noncoding region on Xq13.3 all show a strong excess of low-frequency variants in comparison to that expected from an equilibrium population, indicating a relatively recent population expansion. The effective population sizes estimated from the three regions were 11,000, 12,700, and 8,600, respectively, which are close to the commonly used value of 10,000. In each of the two autosomal regions, the age of the most recent common ancestor (MRCA) was estimated to be older than 1 million years among all the sequences and ∼600,000 years among non-African sequences, providing the first evidence from autosomal noncoding or intronic regions for a genetic history of humans much more ancient than the emergence of modern humans. The ancient genetic history of humans indicates no severe bottleneck during the evolution of humans in the last half million years; otherwise, much of the ancient genetic history would have been lost during a severe bottleneck. This study strongly suggests that both the “out of Africa” and the multiregional models are too simple for explaining the evolution of modern humans. A compilation of genome-wide data revealed that nucleotide diversity is highest in autosomal regions, intermediate in X-linked regions, and lowest in Y-linked regions. The data suggest the existence of background selection or selective sweep on Y-linked loci. In general, the nucleotide diversity in humans is low compared to that in chimpanzee and Drosophila populations.
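To make the comparison between nucleotide diversity and the number of variant sites concrete, the following Python sketch computes Watterson's per-site estimator θ_W = S / (a_n · L), with a_n = Σ_{i=1}^{n-1} 1/i, from the counts quoted in the abstract. This is a standard estimator and is not stated to be the one used in the dissertation. The region length of exactly 10,000 bp is an assumption (the abstract says only "∼10 kb"), as is treating every variant site as a segregating site, and the function name is hypothetical.

```python
def watterson_theta(variant_sites, n_sequences, length_bp):
    """Per-site Watterson estimator: S / (a_n * L), a_n = sum_{i=1}^{n-1} 1/i."""
    if n_sequences < 2:
        raise ValueError("need at least two sequences")
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return variant_sites / (a_n * length_bp)


if __name__ == "__main__":
    # Counts from the abstract; the 10,000 bp length is an assumed round figure.
    regions = {
        "22q11.2": {"S": 75, "n": 128, "pi": 0.00088},
        "1q24": {"S": 52, "n": 122, "pi": 0.00057},
    }
    for name, r in regions.items():
        theta = watterson_theta(r["S"], r["n"], 10_000)
        print(f"{name}: theta_W = {theta:.5f}, reported pi = {r['pi']:.5f}")
```

Under these assumptions the 22q11.2 estimate comes out near 0.14%, above the reported π of 0.088%. A π lower than θ_W is the pattern expected when low-frequency variants are in excess, which matches the abstract's interpretation of a relatively recent population expansion.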
Abstract:
Background: Zooplankton play an important role in our oceans, in biogeochemical cycling and providing a food source for commercially important fish larvae. However, difficulties in correctly identifying zooplankton hinder our understanding of their roles in marine ecosystem functioning, and can prevent detection of long-term changes in their community structure. The advent of massively parallel Next Generation Sequencing technology allows DNA sequence data to be recovered directly from whole community samples. Here we assess the ability of such sequencing to quantify the richness and diversity of a mixed zooplankton assemblage from a productive monitoring site in the Western English Channel. Methodology/Principal Findings: Plankton WP2 replicate net hauls (200 µm) were taken at the Western Channel Observatory long-term monitoring station L4 in September 2010 and January 2011. These samples were analysed by microscopy and metagenetic analysis of the 18S nuclear small subunit ribosomal RNA gene using the 454 pyrosequencing platform. Following quality control, a total of 419,042 sequences were obtained for all samples. The sequences clustered into 205 operational taxonomic units (OTUs) using a 97% similarity cut-off. Allocation of taxonomy by comparison with the National Centre for Biotechnology Information database identified 138 OTUs to species level, 11 to genus level and 1 to order; <2.5% of sequences were classified as unknowns. By comparison, a skilled microscopic analyst was able to routinely enumerate only 75 taxonomic groups. Conclusions: The percentage of OTUs assigned to major eukaryotic taxonomic groups broadly aligns between the metagenetic and morphological analyses and is dominated by Copepoda. However, the metagenetics reveals a previously hidden taxonomic richness, especially for Copepoda and meroplankton such as Bivalvia, Gastropoda and Polychaeta. It also reveals rare species and parasites. We conclude that Next Generation Sequencing of 18S amplicons is a powerful tool for estimating diversity and species richness of zooplankton communities.
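The idea of grouping reads into operational taxonomic units at a 97% similarity cut-off can be sketched in Python as follows. The abstract does not say which clustering algorithm or similarity measure was used, so this is only a minimal greedy centroid scheme under simplifying assumptions: reads are taken to be pre-aligned and of equal length, and similarity is plain per-position identity. The function names are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length, aligned reads."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length, pre-aligned reads")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, threshold=0.97):
    """Assign each read to the first centroid it matches, else start a new OTU."""
    centroids = []
    clusters = []
    for read in reads:
        for i, centroid in enumerate(centroids):
            if identity(read, centroid) >= threshold:
                clusters[i].append(read)
                break
        else:
            # No existing centroid is similar enough: this read founds a new OTU.
            centroids.append(read)
            clusters.append([read])
    return clusters


if __name__ == "__main__":
    reads = [
        "A" * 100,             # founds OTU 1
        "A" * 99 + "C",        # 99% identical to OTU 1's centroid
        "A" * 90 + "C" * 10,   # 90% identical, founds OTU 2
    ]
    for n, members in enumerate(greedy_otus(reads), start=1):
        print(f"OTU {n}: {len(members)} read(s)")
```

Greedy schemes of this kind depend on input order, because the first read of each cluster becomes its centroid; real pipelines typically sort reads (for example by abundance or length) before clustering.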
Abstract:
In Azotobacter vinelandii, deletion of the fdxA gene that encodes a well-characterized seven-iron ferredoxin (FdI) is known to lead to overexpression of the FdI redox partner, NADPH:ferredoxin reductase (FPR). Previous studies have established that this is an oxidative stress response in which the fpr gene is transcriptionally activated to the same extent in response to either addition of the superoxide propagator paraquat to the cells or fdxA deletion. In both cases, the activation occurs through a specific DNA sequence located upstream of the fpr gene. Here, we report the identification of the A. vinelandii protein that binds specifically to the paraquat-activatable fpr promoter region as the E1 subunit of the pyruvate dehydrogenase complex (PDHE1), a central enzyme in aerobic respiration. Sequence analysis shows that PDHE1, which was not previously suspected to be a DNA-binding protein, has a helix–turn–helix motif. The data presented here further show that FdI binds specifically to the DNA-bound PDHE1.
Abstract:
Although polyomavirus JC (JCV) is the proven pathogen of progressive multifocal leukoencephalopathy, a fatal demyelinating disease, this virus is ubiquitous as a usually harmless symbiote among human beings. JCV propagates in the adult kidney and excretes its progeny in urine, from which JCV DNA can readily be recovered. The main mode of transmission of JCV is from parents to children through long cohabitation. In this study, we collected a substantial number of urine samples from native inhabitants of 34 countries in Europe, Africa, and Asia. A 610-bp segment of JCV DNA was amplified from each urine sample, and its DNA sequence was determined. A worldwide phylogenetic tree subsequently constructed revealed the presence of nine subtypes, including minor ones. Five subtypes (EU, Af2, B1, SC, and CY) occupied rather large territories that overlapped with each other at their boundaries. All of Europe, northern Africa, and western Asia were the domain of EU, whereas the domain of Af2 included nearly all of Africa and southwestern Asia all the way to the northeastern edge of India. Partially overlapping domains in Asia were occupied by subtypes B1, SC, and CY. Of particular interest was the recovery of JCV subtypes in a pocket or pockets that were separated by great geographic distances from the main domains of those subtypes. Certain of these pockets can readily be explained by recent migrations of human populations carrying these subtypes. Overall, it appears that JCV genotyping promises to reveal previously unknown human migration routes, ancient as well as recent.
Abstract:
We have examined the effects on transcription initiation of promoter and enhancer strength and of the curvature of the DNA separating these entities on wild-type and mutated enhancer–promoter regions at the Escherichia coli σ54-dependent promoters glnAp2 and glnHp2 on supercoiled and linear DNA. Our results, together with previously reported observations by other investigators, show that the initiation of transcription on linear DNA requires a single intrinsic or induced bend in the DNA, as well as a promoter with high affinity for σ54-RNA polymerase, but on supercoiled DNA requires either such a bend or a high affinity promoter but not both. The examination of the DNA sequence of all nif gene activator- or nitrogen regulator I-σ54 promoters reveals that those lacking a binding site for the integration host factor have an intrinsic single bend in the DNA separating enhancer from promoter.
Molecular keys to speciation: DNA polymorphism and the control of genetic exchange in enterobacteria
Abstract:
Speciation involves the establishment of genetic barriers between closely related organisms. The extent of genetic recombination is a key determinant and a measure of genetic isolation. The results reported here reveal that genetic barriers can be established, eliminated, or modified by manipulating two systems which control genetic recombination, SOS and mismatch repair. The extent of genetic isolation between enterobacteria is a simple mathematical function of DNA sequence divergence. The function does not depend on hybrid DNA stability, but rather on the number of blocks of sequences identical in the two mating partners and sufficiently large to allow the initiation of recombination. Further, there is no obvious discontinuity in the function that could be used to define a level of divergence for distinguishing species.
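The quantity this abstract singles out, the number of identical blocks long enough to initiate recombination, can be illustrated with a short Python sketch that counts maximal runs of identical positions between two aligned sequences. The abstract gives neither the minimum block length nor the mathematical function relating block count to genetic isolation, so `min_len` is a hypothetical parameter and the sketch stops at the counting step; the sequences are assumed to be pre-aligned and gap-free.

```python
def identical_blocks(a, b, min_len):
    """Count maximal runs of identical positions that are at least min_len long."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length, aligned sequences")
    count = 0
    run = 0
    for x, y in zip(a, b):
        if x == y:
            run += 1
        else:
            # A mismatch closes the current run; keep it only if long enough.
            if run >= min_len:
                count += 1
            run = 0
    # Account for a run that extends to the end of the sequences.
    if run >= min_len:
        count += 1
    return count


if __name__ == "__main__":
    donor = "AAAAAAAAAA"
    recipient = "AAAACAAAAA"  # one mismatch splits the identity into runs of 4 and 5
    for min_len in (4, 5, 6):
        print(min_len, identical_blocks(donor, recipient, min_len))
```

As divergence increases, mismatches break the alignment into shorter runs, so fewer blocks clear any fixed length threshold.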
Abstract:
We have used two monovalent phage display libraries containing variants of the Zif268 DNA-binding domain to obtain families of zinc fingers that bind to alterations in the last 4 bp of the DNA sequence of the Zif268 consensus operator, GCG TGG GCG. Affinity selection was performed by altering the Zif268 operator three base pairs at a time, and simultaneously selecting for sets of 16 related DNA sequences. In this way, only four experiments were required to select for all possible 64 combinations of DNA triplet sequences. The results show that (i) for high-affinity DNA binding in the range observed for the Zif268 wild-type complex (Kd = 0.5–5 nM), finger 1 specifically requires the arginine at the carboxy terminus of its recognition helix that forms a bidentate hydrogen bond with the guanine base (G) in the crystal structure of Zif268 complexed to its DNA operator sequence GCG TGG GCG; (ii) when the guanine base (G) is replaced by A, C, or T, a lower-affinity family (Kd ⩾ 50 nM) can be detected that shows an overall tendency to bind G-rich DNA; (iii) the residues at position 2 on the finger 2 recognition helix do not appear to interact strongly with the complementary 5′ base in the finger 1 binding site; and (iv) unexpected substitutions at the amino terminus of finger 1 can occasionally result in specificity for the 3′ base in the finger 1 binding site. A DNA recognition directory was constructed for high-affinity zinc fingers that recognize all three bases in a DNA triplet for seven sequences of the type GNN. Similar approaches may be applied to other zinc fingers to broaden the scope of the directory.
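The arithmetic behind "four experiments" covering all 64 triplets in sets of 16 can be checked with a short Python sketch. The abstract does not say how the 64 triplets were divided among the four selections; the sketch assumes, purely for illustration, that each pool fixes one base at a chosen position and randomises the other two, which yields four pools of 16. The function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def all_triplets():
    """Every DNA triplet: 4^3 = 64 sequences."""
    return ["".join(p) for p in product(BASES, repeat=3)]


def pools_by_fixed_base(position=0):
    """Group triplets by the base at one position (assumed pooling scheme)."""
    pools = {base: [] for base in BASES}
    for triplet in all_triplets():
        pools[triplet[position]].append(triplet)
    return pools


if __name__ == "__main__":
    pools = pools_by_fixed_base(position=0)
    for base, members in pools.items():
        print(f"{base}NN: {len(members)} triplets, e.g. {members[:4]}")
```

Under this assumed scheme the pool keyed by G corresponds to the 16 triplets of the type GNN mentioned at the end of the abstract.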