923 resultados para Complete nucleotide sequence


Relevância:

30.00% 30.00%

Publicador:

Resumo:

The macronuclear genome of the ciliate Oxytricha trifallax displays an extreme and unique eukaryotic genome architecture with extensive genomic variation. During sexual genome development, the expressed, somatic macronuclear genome is whittled down to the genic portion of a small fraction (∼5%) of its precursor "silent" germline micronuclear genome by a process of "unscrambling" and fragmentation. The tiny macronuclear "nanochromosomes" typically encode single, protein-coding genes (a small portion, 10%, encode 2-8 genes), have minimal noncoding regions, and are differentially amplified to an average of ∼2,000 copies. We report the high-quality genome assembly of ∼16,000 complete nanochromosomes (∼50 Mb haploid genome size) that vary from 469 bp to 66 kb long (mean ∼3.2 kb) and encode ∼18,500 genes. Alternative DNA fragmentation processes ∼10% of the nanochromosomes into multiple isoforms that usually encode complete genes. Nucleotide diversity in the macronucleus is very high (SNP heterozygosity is ∼4.0%), suggesting that Oxytricha trifallax may have one of the largest known effective population sizes of eukaryotes. Comparison to other ciliates with nonscrambled genomes and long macronuclear chromosomes (on the order of 100 kb) suggests several candidate proteins that could be involved in genome rearrangement, including domesticated MULE and IS1595-like DDE transposases. The assembly of the highly fragmented Oxytricha macronuclear genome is the first completed genome with such an unusual architecture. This genome sequence provides tantalizing glimpses into novel molecular biology and evolution. For example, Oxytricha maintains tens of millions of telomeres per cell and has also evolved an intriguing expansion of telomere end-binding proteins. In conjunction with the micronuclear genome in progress, the O. trifallax macronuclear genome will provide an invaluable resource for investigating programmed genome rearrangements, complementing studies of rearrangements arising during evolution and disease.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Models of DNA sequence evolution and methods for estimating evolutionary distances are needed for studying the rate and pattern of molecular evolution and for inferring the evolutionary relationships of organisms or genes. In this dissertation, several new models and methods are developed.^ The rate variation among nucleotide sites: To obtain unbiased estimates of evolutionary distances, the rate heterogeneity among nucleotide sites of a gene should be considered. Commonly, it is assumed that the substitution rate varies among sites according to a gamma distribution (gamma model) or, more generally, an invariant+gamma model which includes some invariable sites. A maximum likelihood (ML) approach was developed for estimating the shape parameter of the gamma distribution $(\alpha)$ and/or the proportion of invariable sites $(\theta).$ Computer simulation showed that (1) under the gamma model, $\alpha$ can be well estimated from 3 or 4 sequences if the sequence length is long; and (2) the distance estimate is unbiased and robust against violations of the assumptions of the invariant+gamma model.^ However, this ML method requires a huge amount of computational time and is useful only for less than 6 sequences. Therefore, I developed a fast method for estimating $\alpha,$ which is easy to implement and requires no knowledge of tree. A computer program was developed for estimating $\alpha$ and evolutionary distances, which can handle the number of sequences as large as 30.^ Evolutionary distances under the stationary, time-reversible (SR) model: The SR model is a general model of nucleotide substitution, which assumes (i) stationary nucleotide frequencies and (ii) time-reversibility. It can be extended to SRV model which allows rate variation among sites. I developed a method for estimating the distance under the SR or SRV model, as well as the variance-covariance matrix of distances. Computer simulation showed that the SR method is better than a simpler method when the sequence length $L>1,000$ bp and is robust against deviations from time-reversibility. As expected, when the rate varies among sites, the SRV method is much better than the SR method.^ The evolutionary distances under nonstationary nucleotide frequencies: The statistical properties of the paralinear and LogDet distances under nonstationary nucleotide frequencies were studied. First, I developed formulas for correcting the estimation biases of the paralinear and LogDet distances. The performances of these formulas and the formulas for sampling variances were examined by computer simulation. Second, I developed a method for estimating the variance-covariance matrix of the paralinear distance, so that statistical tests of phylogenies can be conducted when the nucleotide frequencies are nonstationary. Third, a new method for testing the molecular clock hypothesis was developed in the nonstationary case. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

(1) A mathematical theory for computing the probabilities of various nucleotide configurations is developed, and the probability of obtaining the correct phylogenetic tree (model tree) from sequence data is evaluated for six phylogenetic tree-making methods (UPGMA, distance Wagner method, transformed distance method, Fitch-Margoliash's method, maximum parsimony method, and compatibility method). The number of nucleotides (m*) necessary to obtain the correct tree with a probability of 95% is estimated with special reference to the human, chimpanzee, and gorilla divergence. m* is at least 4,200, but the availability of outgroup species greatly reduces m* for all methods except UPGMA. m* increases if transitions occur more frequently than transversions as in the case of mitochondrial DNA. (2) A new tree-making method called the neighbor-joining method is proposed. This method is applicable either for distance data or character state data. Computer simulation has shown that the neighbor-joining method is generally better than UPGMA, Farris' method, Li's method, and modified Farris method on recovering the true topology when distance data are used. A related method, the simultaneous partitioning method, is also discussed. (3) The maximum likelihood (ML) method for phylogeny reconstruction under the assumption of both constant and varying evolutionary rates is studied, and a new algorithm for obtaining the ML tree is presented. This method gives a tree similar to that obtained by UPGMA when constant evolutionary rate is assumed, whereas it gives a tree similar to that obtained by the maximum parsimony tree and the neighbor-joining method when varying evolutionary rate is assumed. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The complete 50,237-bp DNA sequence of the conjugative and mobilizing multiresistance plasmid pRE25 from Enterococcus faecalis RE25 was determined. The plasmid had 58 putative open reading frames, 5 of which encode resistance to 12 antimicrobials. Chloramphenicol acetyltransferase and the 23S RNA methylase are identical to gene products of the broad-host-range plasmid pIP501 from Streptococcus agalactiae. In addition, a 30.5-kb segment is almost identical to pIP501. Genes encoding an aminoglycoside 6-adenylyltransferase, a streptothricin acetyltransferase, and an aminoglycoside phosphotransferase are arranged in tandem on a 7.4-kb fragment as previously reported in Tn5405 from Staphylococcus aureus and in pJH1 from E. faecalis. One interrupted and five complete IS elements as well as three replication genes were also identified. pRE25 was transferred by conjugation to E. faecalis, Listeria innocua, and Lactococcus lactis by means of a transfer region that appears similar to that of pIP501. It is concluded that pRE25 may contribute to the further spread of antibiotic-resistant microorganisms via food into the human community.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Supramolecular DNA assembly blends DNA building blocks with synthetic organic and inorganic molecules giving structural and functional advantages both to the initial self-assembly process and to the final construct. Synthetic molecules can bring a number of additional interactions into DNA nanotechnology. Incorporating extended aromatic molecules as connectors of DNA strands allows folding of these strands through π-π stacking (DNA “foldamers”). In previous work it was shown that short oligopyrenotides (phosphodiester-linked pyrene oligomers) behave as staircase-like foldamers, which cooperatively self-assemble into two-dimensional supramolecular polymers in aqueous medium. Herein, we demonstrate that a 10-mer DNA-sequence modified with 7 pyrene units (see illustration) forms dimensionally-defined supramolecular polymers under thermodynamic conditions in water. We present the self-assembly behavior, morphological studies, and the spectroscopic properties of the investigated DNA-sequences (illustrative AFM picture shown below).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Supramolecular DNA assembly blends DNA building blocks with synthetic organic molecules giving structural and functional advantages. Incorporating extended aromatic molecules as connectors of DNA strands allows folding of these strands through π-π stacking (DNA 'foldamers'). In previous work it was shown that short oligopyrenotides behave as staircase-like foldamers, which cooperatively self-assemble into 2D supramolecular polymers in aqueous medium. Herein, we demonstrate that 10-mer DNA-sequence conjugated with seven pyrene unites forms dimensionally-defined supramolecular polymers under thermodynamic conditions in water. We present the self-assembly behavior, morphologycal studies (AFM and TEM), and the spectroscopic properties (UV/vis, CD) of the investigated pyrene - conjugated DNA-sequence.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The fragmentation of electrospray-generated multiply deprotonated RNA and mixed-sequence RNA/DNA pentanucleotides upon low-energy collision-induced dissociation (CID) in a hybrid quadrupole time-of-flight mass spectrometer was investigated. The goal of unambiguous sequence identification of mixed-sequence RNA/DNA oligonucleotides requires detailed understanding of the gas-phase dissociation of this class of compounds. The two major dissociation events, base loss and backbone fragmentation, are discussed and the unique fragmentation behavior of oligoribonucleotides is demonstrated. Backbone fragmentation of the all-RNA pentanucleotides is characterized by abundant c-ions and their complementary y-ions as the major sequence-defining fragment ion series. In contrast to the dissociation of oligodeoxyribonucleotides, where backbone fragmentation is initiated by the loss of a nucleobase which subsequently leads to the formation of the w- and [a-base]-ions, backbone dissociation of oligoribonucleotides is essentially decoupled from base loss. The different behavior of RNA and DNA oligonucleotides is related to the presence of the 2'-hydroxyl substituent, which is the only structural alteration between the DNA and RNA pentanucleotides studied. CID of mixed-sequence RNA/DNA pentanucleotides results in a combination of the nucleotide-typical backbone fragmentation products, with abundant w-fragment ions generated by cleavage of the phosphodiester backbone adjacent to the deoxy building blocks, whereas backbone cleavage adjacent to ribonucleotides induces the formation of c- and y-ions. (C) 2002 American Society for Mass Spectrometry.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

BACKGROUND The copy number variation (CNV) in beta-defensin genes (DEFB) on human chromosome 8p23 has been proposed to contribute to the phenotypic differences in inflammatory diseases. However, determination of exact DEFB CN is a major challenge in association studies. Quantitative real-time PCR (qPCR), paralog ratio tests (PRT) and multiplex ligation-dependent probe amplification (MLPA) have been extensively used to determine DEFB CN in different laboratories, but inter-method inconsistencies were observed frequently. In this study we asked which one is superior among the three methods for DEFB CN determination. RESULTS We developed a clustering approach for MLPA and PRT to statistically correlate data from a single experiment. Then we compared qPCR, a newly designed PRT and MLPA for DEFB CN determination in 285 DNA samples. We found MLPA had the best convergence and clustering results of the raw data and the highest call rate. In addition, the concordance rates between MLPA or PRT and qPCR (32.12% and 37.99%, respectively) were unacceptably low with underestimated CN by qPCR. Concordance rate between MLPA and PRT (90.52%) was high but PRT systematically underestimated CN by one in a subset of samples. In these samples a sequence variant which caused complete PCR dropout of the respective DEFB cluster copies was found in one primer binding site of one of the targeted paralogous pseudogenes. CONCLUSION MLPA is superior to PRT and even more to qPCR for DEFB CN determination. Although the applied PRT provides in most cases reliable results, such a test is particularly sensitive to low-frequency sequence variations preferably accumulating in loci like pseudogenes which are most likely not under selective pressure. In the light of the superior performance of multiplex assays, the drawbacks of such single PRTs could be overcome by combining more test markers.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

BACKGROUND A cost-effective strategy to increase the density of available markers within a population is to sequence a small proportion of the population and impute whole-genome sequence data for the remaining population. Increased densities of typed markers are advantageous for genome-wide association studies (GWAS) and genomic predictions. METHODS We obtained genotypes for 54 602 SNPs (single nucleotide polymorphisms) in 1077 Franches-Montagnes (FM) horses and Illumina paired-end whole-genome sequencing data for 30 FM horses and 14 Warmblood horses. After variant calling, the sequence-derived SNP genotypes (~13 million SNPs) were used for genotype imputation with the software programs Beagle, Impute2 and FImpute. RESULTS The mean imputation accuracy of FM horses using Impute2 was 92.0%. Imputation accuracy using Beagle and FImpute was 74.3% and 77.2%, respectively. In addition, for Impute2 we determined the imputation accuracy of all individual horses in the validation population, which ranged from 85.7% to 99.8%. The subsequent inclusion of Warmblood sequence data further increased the correlation between true and imputed genotypes for most horses, especially for horses with a high level of admixture. The final imputation accuracy of the horses ranged from 91.2% to 99.5%. CONCLUSIONS Using Impute2, the imputation accuracy was higher than 91% for all horses in the validation population, which indicates that direct imputation of 50k SNP-chip data to sequence level genotypes is feasible in the FM population. The individual imputation accuracy depended mainly on the applied software and the level of admixture.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The 3' ends of animal replication-dependent histone mRNAs are formed by endonucleolytic cleavage of the primary transcripts downstream of a highly conserved RNA hairpin. The hairpin-binding protein (HBP) binds to this RNA element and is involved in histone RNA 3' processing. A minimal RNA-binding domain (RBD) of approximately 73 amino acids that has no similarity with other known RNA-binding motifs was identified in human HBP [Wang Z-F et al., Genes & Dev, 1996, 10:3028-3040]. The primary sequence identity between human and Caenorhabditis elegans RBDs is 55% compared to 38% for the full-length proteins. We analyzed whether differences between C. elegans and human HBP and hairpins are reflected in the specificity of RNA binding. The C. elegans HBP and its RBD recognize only their cognate RNA hairpins, whereas the human HBP or RBD can bind both the mammalian and the C. elegans hairpins. This selectivity of C. elegans HBP is mostly mediated by the first nucleotide in the loop, which is C in C. elegans and U in all other metazoans. By converting amino acids in the human RBD to the corresponding C. elegans residues at places where the latter deviates from the consensus, we could identify two amino acid segments that contribute to selectivity for the first nucleotide of the hairpin loop.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A complete reference genome of the Apis mellifera Filamentous virus (AmFV) was determined using Illumina Hiseq sequencing. The AmFV genome is a double stranded DNA molecule of approximately 498,500 nucleotides with a GC content of 50.8%. It encompasses 247 non-overlapping open reading frames (ORFs), equally distributed on both strands, which cover 65% of the genome. While most of the ORFs lacked threshold sequence alignments to reference protein databases, twenty-eight were found to display significant homologies with proteins present in other large double stranded DNA viruses. Remarkably, 13 ORFs had strong similarity with typical baculovirus domains such as PIFs (per os infectivity factor genes: pif-1, pif-2, pif-3 and p74) and BRO (Baculovirus Repeated Open Reading Frame). The putative AmFV DNA polymerase is of type B, but is only distantly related to those of the baculoviruses. The ORFs encoding proteins involved in nucleotide metabolism had the highest percent identity to viral proteins in GenBank. Other notable features include the presence of several collagen-like, chitin-binding, kinesin and pacifastin domains. Due to the large size of the AmFV genome and the inconsistent affiliation with other large double stranded DNA virus families infecting invertebrates, AmFV may belong to a new virus family.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Theoretical and empirical studies were conducted on the pattern of nucleotide and amino acid substitution in evolution, taking into account the effects of mutation at the nucleotide level and purifying selection at the amino acid level. A theoretical model for predicting the evolutionary change in electrophoretic mobility of a protein was also developed by using information on the pattern of amino acid substitution. The specific problems studied and the main results obtained are as follows: (1) Estimation of the pattern of nucleotide substitution in DNA nuclear genomes. The pattern of point mutations and nucleotide substitutions among the four different nucleotides are inferred from the evolutionary changes of pseudogenes and functional genes, respectively. Both patterns are non-random, the rate of change varying considerably with nucleotide pair, and that in both cases transitions occur somewhat more frequently than transversions. In protein evolution, substitution occurs more often between amino acids with similar physico-chemical properties than between dissimilar amino acids. (2) Estimation of the pattern of nucleotide substitution in RNA genomes. The majority of mutations in retroviruses accumulate at the reverse transcription stage. Selection at the amino acid level is very weak, and almost non-existent between synonymous codons. The pattern of mutation is very different from that in DNA genomes. Nevertheless, the pattern of purifying selection at the amino acid level is similar to that in DNA genomes, although selection intensity is much weaker. (3) Evaluation of the determinants of molecular evolutionary rates in protein-coding genes. Based on rates of nucleotide substitution for mammalian genes, the rate of amino acid substitution of a protein is determined by its amino acid composition. The content of glycine is shown to correlate strongly and negatively with the rate of substitution. Empirical formulae, called indices of mutability, are developed in order to predict the rate of molecular evolution of a protein from data on its amino acid sequence. (4) Studies on the evolutionary patterns of electrophoretic mobility of proteins. A theoretical model was constructed that predicts the electric charge of a protein at any given pH and its isoelectric point from data on its primary and quaternary structures. Using this model, the evolutionary change in electrophoretic mobilities of different proteins and the expected amount of electrophoretically hidden genetic variation were studied. In the absence of selection for the pI value, proteins will on the average evolve toward a mildly basic pI. (Abstract shortened with permission of author.) ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Historically morphological features were used as the primary means to classify organisms. However, the age of molecular genetics has allowed us to approach this field from the perspective of the organism's genetic code. Early work used highly conserved sequences, such as ribosomal RNA. The increasing number of complete genomes in the public data repositories provides the opportunity to look not only at a single gene, but at organisms' entire parts list. ^ Here the Sequence Comparison Index (SCI) and the Organism Comparison Index (OCI), algorithms and methods to compare proteins and proteomes, are presented. The complete proteomes of 104 sequenced organisms were compared. Over 280 million full Smith-Waterman alignments were performed on sequence pairs which had a reasonable expectation of being related. From these alignments a whole proteome phylogenetic tree was constructed. This method was also used to compare the small subunit (SSU) rRNA from each organism and a tree constructed from these results. The SSU rRNA tree by the SCI/OCI method looks very much like accepted SSU rRNA trees from sources such as the Ribosomal Database Project, thus validating the method. The SCI/OCI proteome tree showed a number of small but significant differences when compared to the SSU rRNA tree and proteome trees constructed by other methods. Horizontal gene transfer does not appear to affect the SCI/OCI trees until the transferred genes make up a large portion of the proteome. ^ As part of this work, the Database of Related Local Alignments (DaRLA) was created and contains over 81 million rows of sequence alignment information. DaRLA, while primarily used to build the whole proteome trees, can also be applied shared gene content analysis, gene order analysis, and creating individual protein trees. ^ Finally, the standard BLAST method for analyzing shared gene content was compared to the SCI method using 4 spirochetes. The SCI system performed flawlessly, finding all proteins from one organism against itself and finding all the ribosomal proteins between organisms. The BLAST system missed some proteins from its respective organism and failed to detect small ribosomal proteins between organisms. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Although more than 100 genes associated with inherited retinal disease have been mapped to chromosomal locations, less than half of these genes have been cloned. This text includes identification and evaluation of candidate genes for three autosomal dominant forms of inherited retinal degeneration: atypical vitelliform macular dystrophy (VMD1), cone-rod dystrophy (CORD), and retinitis pigmentosa (RP). ^ VMD1 is a disorder characterized by complete penetrance but extremely variable expressivity, and includes macular or peripheral retinal lesions and peripappilary abnormalitites. In 1984, linkage was reported between VMD1 and soluble glutamate-pyruvate transaminase GPT); however, placement of GPT to 8q24 on linkage maps had been debated, and VMD1 did not show linkage to microsatellite markers in that region. This study excluded linkage between the loci by cloning GPT, identifying the nucleotide substitution associated with the GPT sozymes, and by assaying VMD1 family samples with an RFLP designed to detect the substitution. In addition, linkage of VMD1 to the known dominant macular degeneration loci was excluded. ^ CORD is characterized by early onset of color-vision deficiency, and decreased visual acuity, However, this retinal degeneration progresses to no light perception, severe macular lesion, and “bone-spicule” accumulations in the peripheral retina. In this study, the disorder in a large Texan family was mapped to the CORD2 locus of 19q13, and a mutation in the retina/pineal-specific cone-rod homeobox gene (CRX) was identified as the disease cause. In addition, mutations in CRX were associated with significantly different retinal disease phenotypes, including retinitis pigmentosa and Leber congenital amaurosis. ^ Many of the mutations leading to inherited retinal disorders have been identified in genes like CRX, which are expressed predominantly in the retina and pineal gland. Therefore, a combination of database analysis and laboratory investigation was used to identify 26 novel retina/pineal-specific expressed sequence tag (EST) clusters as candidate genes for inherited retinal disorders. Eight of these genes were mapped into the candidate regions of inherited retinal degeneration loci. ^ Two of the eight clusters mapped into the retinitis pigmentosa RP13 candidate region of 17p13, and were both determined to represent a single gene that is highly expressed in photoreceptors. This gene, the Ah receptor-interacting like protein-1 (AIPL1), was cloned, characterized, and screened for mutations in RP13 patient DNA samples. ^