6 results for penalty-based genetic algorithm
in DigitalCommons@The Texas Medical Center
Abstract:
A plasmid-based genetic system was developed for the tail protein of the Salmonella typhimurium bacteriophage P22 and used to isolate and characterize tail protein mutants. The tail protein is a trimeric structural protein of the phage and an endorhamnosidase whose activity is essential for infection. The gene for the tail protein had previously been cloned into a plasmid expression vector and sequenced. A plate complementation assay for tail protein produced from the cloned gene was developed and used to isolate 27 tail protein mutants following mutagenesis of the cloned gene. These mutations were mapped into 12 deletion intervals using deletions that were made on plasmids in vitro and crossed onto P22. The base substitutions were determined by DNA sequencing. The majority of mutants had missense or nonsense mutations in the protein-coding portion of the gene; however, four of the mutants were in the putative transcription terminator. The oligomeric state of tail protein from the 15 missense mutants was investigated using SDS and nondenaturing polyacrylamide gel electrophoresis of cell lysates. Wild-type tail protein retains its trimeric structure in SDS gels at room temperature. Two of the mutant proteins also migrated as trimers in SDS gels, yet one of these had considerably faster mobility than the wild-type trimer. Its migration was the same as wild-type in a nondenaturing gel, so it is thought to be a trimer that is partially denatured by SDS. Four of the mutants produced proteins that migrate at the position of a monomer in an SDS gel but cannot be seen on a nondenaturing gel. These proteins are thought to be either monomers or soluble aggregates that cannot enter the nondenaturing gel. The remaining mutants produce protein that is degraded. The mutant tail protein that had normal trimeric mobility on SDS and nondenaturing gels was purified. This protein has essentially wild-type ability to attach to phage capsids, but its endorhamnosidase activity is only 4% of wild-type.
Abstract:
Linkage and association studies are major analytical tools in the search for susceptibility genes for complex diseases. With the availability of large collections of single nucleotide polymorphisms (SNPs), rapid progress in high-throughput genotyping technologies, and the ambitious goals of the International HapMap Project, genetic markers covering the whole genome are becoming available for genome-wide linkage and association studies. To avoid inflating the type I error rate in such studies, the significance level must be adjusted for each independent linkage and/or association test, which has led to suggested genome-wide significance cut-offs as low as 5 × 10⁻⁷ (e.g., a Bonferroni adjustment of α = 0.05 for on the order of 10⁵ independent tests). Almost no linkage and/or association study can meet such a stringent threshold with standard statistical methods, so new statistics with high power are urgently needed. This dissertation proposes and explores a class of novel test statistics applicable to both population-based and family-based genetic data by employing a completely new strategy: nonlinear transformations of the sample means are used to construct test statistics for linkage and association studies. Extensive simulation studies illustrate the properties of the nonlinear test statistics. Power calculations are performed using both analytical and empirical methods. Finally, real data sets are analyzed with the nonlinear test statistics. Results show that the nonlinear test statistics have correct type I error rates, and most of the studied nonlinear test statistics have higher power than the standard chi-square test. This dissertation introduces a new way to design novel test statistics with high power and may open new avenues for mapping susceptibility genes for complex diseases.
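To make the idea of a nonlinear transformation of sample means concrete, here is a minimal, hypothetical Python sketch contrasting the standard allele-frequency chi-square test with a test built on arcsine-square-root transformed sample frequencies. The transform is a classic variance-stabilizing choice used here purely as an illustration; it is not necessarily one of the dissertation's statistics, and the counts are invented.

```python
import numpy as np
from scipy import stats

def standard_test(x_cases, n_cases, x_ctrls, n_ctrls):
    """Chi-square test on the untransformed allele-frequency difference."""
    p1, p2 = x_cases / n_cases, x_ctrls / n_ctrls
    pbar = (x_cases + x_ctrls) / (n_cases + n_ctrls)
    z = (p1 - p2) / np.sqrt(pbar * (1 - pbar) * (1 / n_cases + 1 / n_ctrls))
    return z * z, stats.chi2.sf(z * z, df=1)

def arcsine_test(x_cases, n_cases, x_ctrls, n_ctrls):
    """Chi-square test on arcsine-sqrt transformed sample frequencies.

    g(p) = arcsin(sqrt(p)) is variance-stabilizing: by the delta method,
    Var[g(p_hat)] ~= 1 / (4n) regardless of the true frequency p.
    """
    g1 = np.arcsin(np.sqrt(x_cases / n_cases))
    g2 = np.arcsin(np.sqrt(x_ctrls / n_ctrls))
    z = (g1 - g2) / np.sqrt(1 / (4 * n_cases) + 1 / (4 * n_ctrls))
    return z * z, stats.chi2.sf(z * z, df=1)

# Hypothetical counts: minor-allele copies among 1000 case and 1000
# control chromosomes.
print(standard_test(60, 1000, 40, 1000))   # standard chi-square
print(arcsine_test(60, 1000, 40, 1000))    # nonlinear-transform variant
```

Both statistics are asymptotically 1-df chi-square under the null; the point of the nonlinear construction is that a well-chosen transform can improve power while preserving the type I error rate.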
Abstract:
Essential biological processes are governed by organized, dynamic interactions between multiple biomolecular systems. Complexes are formed to carry out a biological function and are disassembled once the process is completed. Examples of such processes include the translation of messenger RNA into protein by the ribosome, the folding of proteins by chaperonins, and the entry of viruses into host cells. Understanding these fundamental processes by characterizing the molecular mechanisms that enable them would allow the better design of therapies and drugs. Such molecular mechanisms may be revealed through the structural elucidation of the biomolecular assemblies at the core of these processes. Various experimental techniques may be applied to investigate the molecular architecture of biomolecular assemblies. High-resolution techniques, such as X-ray crystallography, can solve the atomic structure of the system, but are typically limited to biomolecules of reduced flexibility and dimensions. In particular, X-ray crystallography requires the sample to form a three-dimensional (3D) crystal lattice, which is technically difficult, if not impossible, to obtain, especially for large, dynamic systems. Often these techniques solve the structures of the individual components of an assembly but encounter difficulties when investigating the entire system. On the other hand, imaging techniques such as cryo-electron microscopy (cryo-EM) are able to depict large systems in a near-native environment, without requiring the formation of crystals. The structures solved by cryo-EM cover a wide range of resolutions, from very low levels of detail where only the overall shape of the system is visible, to high resolutions that approach, but do not yet reach, atomic level of detail. In this dissertation, several modeling methods are introduced to either integrate cryo-EM datasets with structural data from X-ray crystallography or to directly interpret the cryo-EM reconstruction. These computational techniques were developed with the goal of creating an atomic model for the cryo-EM data. Low-resolution reconstructions lack the level of detail needed for a direct atomic interpretation, i.e. one cannot reliably locate the atoms or amino-acid residues within the structure obtained by cryo-EM. One therefore needs to consider additional information, for example structural data from other sources such as X-ray crystallography, to enable such a high-resolution interpretation. Modeling techniques are thus developed to integrate the structural data from the different biophysical sources; examples include the work described in manuscripts I and II of this dissertation. At intermediate and high resolution, cryo-EM reconstructions depict consistent 3D folds, such as tubular features that in general correspond to alpha-helices. Such features can be annotated and later used to build an atomic model of the system; see manuscript III for an alternative approach. Three manuscripts are presented as part of this PhD dissertation, each introducing a computational technique that facilitates the interpretation of cryo-EM reconstructions. The first manuscript is an application paper that describes a heuristic to generate an atomic model for the protein envelope of the Rift Valley fever virus. The second manuscript introduces evolutionary tabu search strategies to enable the integration of multiple component atomic structures with the cryo-EM map of their assembly.
Finally, the third manuscript extends the latter technique and applies it to annotate consistent 3D patterns in intermediate-resolution cryo-EM reconstructions. The first manuscript, titled An assembly model for Rift Valley fever virus, was submitted for publication in the Journal of Molecular Biology. The cryo-EM structure of the Rift Valley fever virus was previously solved at 27 Å resolution by Dr. Freiberg and collaborators. This reconstruction shows the overall shape of the virus envelope, yet its limited level of detail prevents a direct atomic interpretation. High-resolution structures are not yet available for the entire virus or for the two component glycoproteins that form its envelope. However, homology models may be generated for these glycoproteins based on similar structures that are available at atomic resolution. The manuscript presents the steps required to identify an atomic model of the entire virus envelope, based on the low-resolution cryo-EM map of the envelope and the homology models of the two glycoproteins. Starting with the results of an exhaustive search to place the two glycoproteins, the model is built iteratively by running multiple multi-body refinements to hierarchically generate models for the different regions of the envelope. The resulting atomic model is supported by prior knowledge of virus biology and contains valuable information about the molecular architecture of the system. It provides the basis for further investigations seeking to reveal processes in which the virus is involved, such as assembly or fusion. The second manuscript was recently published in the Journal of Structural Biology (doi:10.1016/j.jsb.2009.12.028) under the title Evolutionary tabu search strategies for the simultaneous registration of multiple atomic structures in cryo-EM reconstructions. This manuscript introduces evolutionary tabu search strategies for multi-body registration. The technique is a hybrid approach that combines a genetic algorithm with a tabu search strategy to promote the proper exploration of the high-dimensional search space. As with the Rift Valley fever virus, it is common that the structure of a large multi-component assembly is available at low resolution from cryo-EM, while high-resolution structures are solved for the individual components but not for the entire system. Evolutionary tabu search strategies enable the building of an atomic model for the entire system by considering the different components simultaneously. Such a registration indirectly introduces spatial constraints, as all components need to be placed within the assembly, enabling them to be properly docked into the low-resolution map of the entire assembly. Along with the method description, the manuscript covers the validation, presenting the benefits of the technique in both synthetic and experimental test cases. The approach successfully docked multiple components at resolutions as low as 40 Å. The third manuscript is entitled Evolutionary Bidirectional Expansion for the Annotation of Alpha Helices in Electron Cryo-Microscopy Reconstructions and was submitted for publication in the Journal of Structural Biology. The modeling approach described in this manuscript applies the evolutionary tabu search strategies in combination with a bidirectional expansion to annotate secondary structure elements in intermediate-resolution cryo-EM reconstructions.
In particular, secondary structure elements such as alpha helices show consistent patterns in cryo-EM data and are visible as rod-like regions of high density. The evolutionary tabu search strategy is applied to identify the placement of the different alpha helices, while the bidirectional expansion characterizes their length and curvature. The manuscript presents the validation of the approach at resolutions ranging between 6 and 14 Å, a level of detail at which alpha helices are visible. At resolutions up to 12 Å, the method achieves sensitivities between 70% and 100% in experimental test cases, i.e. 70-100% of the alpha helices were correctly predicted in an automatic manner from the experimental data. The three manuscripts presented in this PhD dissertation cover different computational methods for the integration and interpretation of cryo-EM reconstructions. The methods were developed in the molecular modeling software Sculptor (http://sculptor.biomachina.org) and are available to the scientific community interested in multi-resolution modeling of cryo-EM data. The work spans a wide range of resolutions, covering multi-body refinement and registration at low resolution along with annotation of consistent patterns at high resolution. Such methods are essential for the modeling of cryo-EM data and may be applied in other fields where similar spatial problems are encountered, such as medical imaging.
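As an illustration of the hybrid strategy described above, the sketch below combines a genetic algorithm with a tabu list to register two copies of a component into a synthetic one-dimensional "density map" by maximizing normalized cross-correlation. It is a schematic toy, not Sculptor's actual implementation: the map, the translations-only search space, and all parameters are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1D stand-in for a low-resolution assembly map: two copies of
# one Gaussian component placed at unknown positions (ground truth below).
L, W = 64, 9                                   # map length, component width
component = np.exp(-0.5 * ((np.arange(W) - W // 2) / 2.0) ** 2)
true_pos = (12, 40)

def render(positions):
    """Sum one copy of the component at each proposed position."""
    m = np.zeros(L)
    for p in positions:
        m[p:p + W] += component
    return m

target = render(true_pos)                      # the "cryo-EM map"

def score(positions):
    """Normalized cross-correlation between rendered model and map."""
    m = render(positions)
    return float(m @ target / (np.linalg.norm(m) * np.linalg.norm(target)))

def mutate(pos, tabu, tries=50):
    """Randomly shift each component, rejecting tabu placements."""
    for _ in range(tries):
        cand = tuple(int(np.clip(p + rng.integers(-3, 4), 0, L - W))
                     for p in pos)
        if cand not in tabu:
            return cand
    return pos

def crossover(a, b):
    """Uniform crossover: each component's position comes from a parent."""
    return tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b))

pop = [tuple(int(v) for v in rng.integers(0, L - W, size=2))
       for _ in range(30)]
tabu = set()
for gen in range(150):
    pop.sort(key=score, reverse=True)
    tabu.update(pop[len(pop) // 2:])           # forbid revisiting poor ones
    elite = pop[:len(pop) // 2]
    pop = elite + [mutate(crossover(elite[rng.integers(len(elite))],
                                    elite[rng.integers(len(elite))]), tabu)
                   for _ in range(len(pop) - len(elite))]

best = max(pop, key=score)
print(best, round(score(best), 3))             # placements near (12, 40)
```

In the published method the search space is of course far larger (rotations and translations per component, in 3D density), but the tabu list plays the same role sketched here: steering the population away from previously explored, unpromising placements.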
Abstract:
The main objective of this study was to develop and validate a computer-based statistical algorithm, based on a multivariable logistic model, that could be translated into a simple scoring system for ascertaining stroke cases from hospital admission medical records data. This algorithm, the Risk Index Score (RISc), was developed using data collected prospectively by the Brain Attack Surveillance in Corpus Christi (BASIC) project. The validity of the RISc was evaluated by estimating the concordance of scoring-system stroke ascertainment with stroke ascertainment accomplished by physician review of hospital admission records. The goal of this study was to develop a rapid, simple, efficient, and accurate method to ascertain the incidence of stroke from routine hospital admission records for epidemiologic investigations.
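The following is a hedged sketch of how a multivariable logistic model can be translated into a simple integer scoring system of this kind. The predictor items, simulated data, and point-scaling rule are hypothetical illustrations, not the actual RISc variables or weights.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Hypothetical binary admission-record items (NOT the actual RISc
# variables): e.g. focal weakness, speech deficit, prior TIA, headache.
X = rng.integers(0, 2, size=(500, 4)).astype(float)
true_logit = -2.0 + X @ np.array([1.5, 1.0, 0.8, 0.3])
y = rng.random(500) < 1.0 / (1.0 + np.exp(-true_logit))  # simulated status

# Step 1: fit the multivariable logistic model.
model = LogisticRegression().fit(X, y)
b = model.coef_.ravel()

# Step 2: translate coefficients into integer points by scaling so the
# smallest absolute coefficient is worth about 1 point, then rounding.
points = np.round(b / np.abs(b).min()).astype(int)
scores = (X @ points).astype(int)

# Step 3: choose the score cut-off with the best agreement against the
# reference standard (in the study, physician review of the chart).
for cut in range(int(scores.max()) + 1):
    agree = ((scores >= cut) == y).mean()
    print(f"cut-off {cut}: concordance {agree:.2f}")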
Abstract:
Although many family-based genetic studies have collected dietary data, very few have used the dietary information in published findings. No single solution has been presented or discussed in the literature for the problem of applying factor analysis to dietary data from several related individuals in a given household. The standard statistical approach of factor analysis cannot be applied to the VIVA LA FAMILIA Study diet data to ascertain dietary patterns, since this population consists of three children from each family, so the dietary patterns of the related children may be correlated and non-independent. Addressing this problem in this project will enable us to describe the dietary patterns in Hispanic families and to explore the relationships between dietary patterns and childhood obesity.

In the VIVA LA FAMILIA Study, an overweight child was first identified, and then his/her siblings and parents were brought in for data collection, which included 24-hour recalls and a food frequency questionnaire (FFQ). Dietary intake data were collected using the FFQ and 24-hour recalls on 1030 Hispanic children from 319 families.

The design of the VIVA LA FAMILIA Study raises important and unique statistical considerations, since its participants are related to each other, the majority forming distinct nuclear families. Thus, the standard approach of factor analysis cannot be applied to these diet data to ascertain dietary patterns. In this project we propose to investigate whether the determinants of the correlation matrix of each family unit will allow us to adjust the original correlation matrix of the dietary intake data prior to ascertaining dietary intake patterns. If these methods prove appropriate, then in the future the dietary patterns among related individuals could be assessed by standard orthogonal principal component factor analysis.
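The determinant-based adjustment the project proposes is not spelled out in the abstract, so the sketch below only shows the pipeline such an adjustment would plug into: build a correlation matrix of dietary variables in which siblings are down-weighted so each family contributes roughly like one independent observation (an illustrative stand-in for the proposed adjustment, on simulated data), then extract dietary patterns by standard orthogonal principal-component factoring.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated intakes of 5 food groups for 3 children in each of 100
# families; siblings share a family-level dietary component.
n_fam, kids, n_foods = 100, 3, 5
shared = np.repeat(rng.normal(size=(n_fam, n_foods)), kids, axis=0)
X = shared + rng.normal(size=(n_fam * kids, n_foods))
fam_id = np.repeat(np.arange(n_fam), kids)

# Down-weight siblings so each family contributes roughly like one
# independent observation -- an illustrative stand-in for the proposed
# determinant-based adjustment, which the abstract does not spell out.
w = 1.0 / np.bincount(fam_id)[fam_id]
mu = np.average(X, axis=0, weights=w)
Xc = X - mu
cov = (Xc * w[:, None]).T @ Xc / w.sum()       # weighted covariance
d = np.sqrt(np.diag(cov))
R = cov / np.outer(d, d)                       # adjusted correlation matrix

# Standard orthogonal principal-component factoring of the adjusted R.
eigval, eigvec = np.linalg.eigh(R)
order = np.argsort(eigval)[::-1]
loadings = eigvec[:, order] * np.sqrt(eigval[order])
print("share of variance:", np.round(eigval[order] / n_foods, 3))
print("pattern 1 loadings:", np.round(loadings[:, 0], 2))
```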
Abstract:
Next-generation DNA sequencing platforms can effectively detect the entire spectrum of genomic variation and are emerging as a major tool for systematic exploration of the universe of variants and interactions in the entire genome. However, the data produced by next-generation sequencing technologies suffer from three basic problems: sequence errors, assembly errors, and missing data. Current statistical methods for genetic analysis are well suited to detecting the association of common variants but are less suited to rare variants. This poses a great challenge for sequence-based genetic studies of complex diseases.

This research dissertation utilized the genome continuum model as a general principle, and stochastic calculus and functional data analysis as tools, for developing novel and powerful statistical methods for the next generation of association studies of both qualitative and quantitative traits in the context of sequencing data, ultimately shifting the paradigm of association analysis from the current locus-by-locus analysis to collectively analyzing genome regions.

In this project, functional principal component (FPC) methods coupled with high-dimensional data reduction techniques were used to develop novel and powerful methods for testing the associations of the entire spectrum of genetic variation within a segment of the genome or a gene, regardless of whether the variants are common or rare.

Classical quantitative genetic methods suffer from high type I error rates and low power for rare variants. To overcome these limitations for resequencing data, this project used functional linear models with scalar response to develop statistics for identifying quantitative trait loci (QTLs) for both common and rare variants. To illustrate their applications, the functional linear models were applied to five quantitative traits in the Framingham Heart Study.

This project proposed a novel concept of gene-gene co-association, in which a gene or a genomic region is taken as the unit of association analysis, and used stochastic calculus to develop a unified framework for testing the association of multiple genes or genomic regions for both common and rare alleles. The proposed methods were applied to gene-gene co-association analysis of psoriasis in two independent GWAS datasets, which led to the discovery of networks significantly associated with psoriasis.
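A minimal sketch of a functional-principal-component association test of the kind described: each individual's genotype profile over a genomic region is treated as a "curve", the site-by-site covariance is eigendecomposed, a few leading component scores summarize each individual, and a Hotelling-type chi-square compares cases and controls. The data are simulated, and the smoothing of eigenfunctions that is central to true FPCA is omitted for brevity; raw eigenvectors stand in for smoothed eigenfunctions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Simulated genotypes: n individuals x m variant sites in one region,
# mostly rare variants; a few sites causally raise a latent trait.
n, m = 1000, 50
maf = rng.uniform(0.002, 0.05, size=m)
G = rng.binomial(2, maf, size=(n, m)).astype(float)
beta = np.zeros(m)
beta[::7] = 0.8
y = (G @ beta + rng.normal(size=n)) > 1.0      # case/control status

# Treat each genotype profile as a "curve" over the region and extract
# leading principal components of the site-by-site covariance.
Gc = G - G.mean(axis=0)
eigval, eigvec = np.linalg.eigh(Gc.T @ Gc / n)
k = 3
fpc = eigvec[:, np.argsort(eigval)[::-1][:k]]
scores = Gc @ fpc                              # k FPC scores per person

# Region-level test: Hotelling-type k-df chi-square comparing the mean
# FPC scores of cases and controls.
d = scores[y].mean(axis=0) - scores[~y].mean(axis=0)
S = (np.cov(scores[y], rowvar=False) / y.sum()
     + np.cov(scores[~y], rowvar=False) / (~y).sum())
T2 = float(d @ np.linalg.solve(S, d))
print(f"chi2({k}) = {T2:.2f}, p = {stats.chi2.sf(T2, df=k):.3g}")
```

The appeal of this style of test is the one named in the abstract: the unit of analysis is the whole region rather than each locus separately, so rare variants contribute jointly through the component scores instead of being tested one at a time.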