7 results for Simulation-based methods
in National Center for Biotechnology Information - NCBI
Abstract:
Elucidating the genetic basis of human phenotypes is a major goal of contemporary geneticists. Logically, two fundamental and contrasting approaches are available, one that begins with a phenotype and concludes with the identification of a responsible gene or genes; the other that begins with a gene and works toward identifying one or more phenotypes resulting from allelic variation of it. This paper provides a conceptual overview of phenotype-based vs. gene-based procedures with emphasis on gene-based methods. A key feature of a gene-based approach is that laboratory effort first is devoted to developing an assay for mutations in the gene under regard; the assay then is applied to the evaluation of large numbers of unrelated individuals with a variety of phenotypes that are deemed potentially resulting from alleles at the gene. No effort is directed toward chromosomally mapping the loci responsible for the phenotypes scanned. Example is made of my laboratory’s successful use of a gene-based approach to identify genes causing hereditary diseases of the retina such as retinitis pigmentosa. Reductions in the cost and improvements in the speed of scanning individuals for DNA sequence anomalies may make a gene-based approach an efficient alternative to phenotype-based approaches to correlating genes with phenotypes.
Abstract:
We examine the occurrence of the ≈300 known protein folds in different groups of organisms. To do this, we characterize a large fraction of the currently known protein sequences (≈140,000) in structural terms, by matching them to known structures via sequence comparison (or by secondary-structure class prediction for those without structural homologues). Overall, we find that an appreciable fraction of the known folds are present in each of the major groups of organisms (e.g., bacteria and eukaryotes share 156 of 275 folds), and most of the common folds are associated with many families of nonhomologous sequences (i.e., >10 sequence families for each common fold). However, different groups of organisms have characteristically distinct distributions of folds. So, for instance, some of the most common folds in vertebrates, such as globins or zinc fingers, are rare or absent in bacteria. Many of these differences in fold usage are biologically reasonable, such as the folds of metabolic enzymes being common in bacteria and those associated with extracellular transport and communication being common in animals. They also have important implications for database-based methods for fold recognition, suggesting that an unknown sequence from a plant is more likely to have a certain fold (e.g., a TIM barrel) than an unknown sequence from an animal.
Abstract:
On the causal hypothesis, most genetic determinants of disease are single-nucleotide polymorphisms (SNPs) that are likely to be selected as markers for positional cloning. On the proximity hypothesis, most disease determinants will not be included among markers but may be detected through linkage disequilibrium with other SNPs. In that event, allelic association among SNPs is an essential factor in positional cloning. Recent simulation based on monotonic population expansion suggests that useful association does not usually extend beyond 3 kb. This is contradicted by significant disequilibrium at much greater distances, with corresponding reduction in the number of SNPs required for a cost-effective genome scan. A plausible explanation is that cyclical expansions follow population bottlenecks that establish new disequilibria. Data on more than 1,000 locus pairs indicate that most disequilibria trace to the Neolithic, with no apparent difference between haplotypes that are random or selected through a major disease gene. Short duration may be characteristic of alleles contributing to disease susceptibility and haplotypes characteristic of particular ethnic groups. Alleles that are highly polymorphic in all ethnic groups may be older, neutral, or advantageous, in weak disequilibrium with nearby markers, and therefore less useful for positional cloning of disease genes. Significant disequilibrium at large distance makes the number of suitably chosen SNPs required for genome screening as small as 30,000, or 1 per 100 kb, with greater density (including less common SNPs) reserved for candidate regions.
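The closing figure of 30,000 SNPs at 1 per 100 kb can be checked with simple arithmetic. The sketch below assumes a haploid human genome of roughly 3 Gb, a value the abstract implies but does not state, and uses a hypothetical helper named `snps_for_genome_scan`. For contrast, it also computes the marker count if SNPs were spaced at the 3 kb limit suggested by the monotonic-expansion simulation. Treating that limit as a marker spacing is an assumption made here for illustration.

```python
def snps_for_genome_scan(genome_length_bp: int, spacing_bp: int) -> int:
    """Number of evenly spaced markers needed to cover a genome."""
    return genome_length_bp // spacing_bp


# Assumption: a haploid human genome of about 3 billion base pairs.
HUMAN_GENOME_BP = 3_000_000_000

# One SNP per 100 kb, as in the abstract.
print(snps_for_genome_scan(HUMAN_GENOME_BP, 100_000))  # 30000

# One SNP per 3 kb, if useful association really stopped at 3 kb.
print(snps_for_genome_scan(HUMAN_GENOME_BP, 3_000))  # 1000000
```

Under these assumptions, disequilibrium extending to about 100 kb cuts the marker count by a factor of roughly 33 relative to 3 kb spacing.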
Abstract:
We created a simulation based on experimental data from bacteriophage T7 that computes the developmental cycle of the wild-type phage and also of mutants that have an altered genome order. We used the simulation to compute the fitness of more than 10^5 mutants. We tested these computations by constructing and experimentally characterizing T7 mutants in which we repositioned gene 1, coding for T7 RNA polymerase. Computed protein synthesis rates for ectopic gene 1 strains were in moderate agreement with observed rates. Computed phage-doubling rates were close to observations for two of four strains, but significantly overestimated those of the other two. Computations indicate that the genome organization of wild-type T7 is nearly optimal for growth: only 2.8% of random genome permutations were computed to grow faster, the highest 31% faster, than wild type. Specific discrepancies between computations and observations suggest that a better understanding of the translation efficiency of individual mRNAs and the functions of qualitatively “nonessential” genes will be needed to improve the T7 simulation. In silico representations of biological systems can serve to assess and advance our understanding of the underlying biology. Iteration between computation, prediction, and observation should increase the rate at which biological hypotheses are formulated and tested.
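The comparison of wild type against random genome permutations can be illustrated with a small sampling sketch. It is not the authors' T7 simulation: `toy_fitness` is a hypothetical stand-in that simply penalizes genes for sitting far from an arbitrary preferred position, and the six-gene `wild_type` order is invented for the example. Only the sampling logic, which shuffles the gene order, scores it, and counts how often it beats wild type, mirrors the kind of calculation the abstract describes.

```python
import random


def toy_fitness(order):
    """Hypothetical fitness: higher when each gene sits near its preferred slot.

    Gene i is assumed to prefer position i. This is a placeholder, not a
    model of phage growth.
    """
    return -sum(abs(position - gene) for position, gene in enumerate(order))


def fraction_faster(wild_type, fitness, n_samples, seed=0):
    """Estimate the fraction of random gene orders that outscore wild type."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    baseline = fitness(wild_type)
    faster = 0
    for _ in range(n_samples):
        permutation = list(wild_type)
        rng.shuffle(permutation)
        if fitness(permutation) > baseline:
            faster += 1
    return faster / n_samples


# Invented six-gene order that is close to, but not at, the toy optimum.
wild_type = [1, 0, 2, 3, 5, 4]
estimate = fraction_faster(wild_type, toy_fitness, n_samples=10_000)
print(f"Fraction of sampled orders scoring above wild type: {estimate:.4f}")
```

A small fraction here plays the same role as the 2.8% reported in the abstract: it indicates how close the reference order is to the best achievable under the chosen fitness function.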
Abstract:
Gene expression profiling provides powerful analyses of transcriptional responses to cellular perturbation. In contrast to DNA array-based methods, reporter gene technology has been underused for this application. Here we describe a genomewide, genome-registered collection of Escherichia coli bioluminescent reporter gene fusions. DNA sequences from plasmid-borne, random fusions of E. coli chromosomal DNA to a Photorhabdus luminescens luxCDABE reporter allowed precise mapping of each fusion. The utility of this collection covering about 30% of the transcriptional units was tested by analyzing individual fusions representative of heat shock, SOS, OxyR, SoxRS, and cya/crp stress-responsive regulons. Each fusion strain responded as anticipated to environmental conditions known to activate the corresponding regulatory circuit. Thus, the collection mirrors E. coli's transcriptional wiring diagram. This genomewide collection of gene fusions provides an independent test of results from other gene expression analyses. Accordingly, a DNA microarray-based analysis of mitomycin C-treated E. coli indicated elevated expression of expected and unanticipated genes. Selected luxCDABE fusions corresponding to these up-regulated genes were used to confirm or contradict the DNA microarray results. The power of partnering gene fusion and DNA microarray technology to discover promoters and define operons was demonstrated when data from both suggested that a cluster of 20 genes encoding production of type I extracellular polysaccharide in E. coli form a single operon.
Abstract:
Phyllosphere microbial communities were evaluated on leaves of field-grown plant species by culture-dependent and -independent methods. Denaturing gradient gel electrophoresis (DGGE) with 16S rDNA primers generally indicated that microbial community structures were similar on different individuals of the same plant species, but unique on different plant species. Phyllosphere bacteria were identified from Citrus sinensis (cv. Valencia) by using DGGE analysis followed by cloning and sequencing of the dominant rDNA bands. Of the 17 unique sequences obtained, database queries showed only four strains that had been described previously as phyllosphere bacteria. Five of the 17 sequences had 16S similarities lower than 90% to database entries, suggesting that they represent previously undescribed species. In addition, three fungal species were also identified. Very different 16S rDNA DGGE banding profiles were obtained when replicate cv. Valencia leaf samples were cultured in BIOLOG EcoPlates for 4.5 days. All of these rDNA sequences had 97–100% similarity to those of known phyllosphere bacteria, but only two of them matched those identified by the culture-independent DGGE analysis. Like other studied ecosystems, microbial phyllosphere communities therefore are more complex than previously thought, based on conventional culture-based methods.
Abstract:
The field of natural language processing (NLP) has seen a dramatic shift in both research direction and methodology in the past several years. In the past, most work in computational linguistics tended to focus on purely symbolic methods. Recently, more and more work is shifting toward hybrid methods that combine new empirical corpus-based methods, including the use of probabilistic and information-theoretic techniques, with traditional symbolic methods. This work is made possible by the recent availability of linguistic databases that add rich linguistic annotation to corpora of natural language text. Already, these methods have led to a dramatic improvement in the performance of a variety of NLP systems with similar improvement likely in the coming years. This paper focuses on these trends, surveying in particular three areas of recent progress: part-of-speech tagging, stochastic parsing, and lexical semantics.