11 results for two-Gaussian mixture model
in DigitalCommons@The Texas Medical Center
Abstract:
Nuclear morphometry (NM) uses image analysis to measure features of the cell nucleus, which are classified as bulk properties, shape or form, and DNA distribution. Studies have used these measurements as diagnostic and prognostic indicators of disease, with inconclusive results. The distributional properties of these variables have not been systematically investigated, although much medical data exhibit nonnormal distributions. Measurements are made on several hundred cells per patient, so summary measures that reflect the underlying distribution are needed.
Distributional characteristics of 34 NM variables from prostate cancer cells were investigated using graphical and analytical techniques. The number of cells per sample ranged from 52 to 458. A small sample of patients with benign prostatic hyperplasia (BPH), representing non-cancer cells, was used for general comparison with the cancer cells.
Data transformations such as log, square root and 1/x did not yield normality as assessed by the Shapiro-Wilk test. A modulus transformation, used for distributions with abnormal kurtosis, also did not produce normality.
Kernel density plots of the 34 variables exhibited non-normality, and 18 variables also exhibited bimodality. A bimodality coefficient was calculated, and 3 variables (DNA concentration, shape and elongation) showed the strongest evidence of bimodality and were studied further.
Two analytical approaches were used to obtain a summary measure of each variable for each patient: cluster analysis to determine significant clusters, and mixture model analysis using a two-component Gaussian mixture with equal variances. The mixture component parameters were used to bootstrap the log likelihood ratio to determine the significant number of components (1 or 2). These summary measures were used as predictors of disease severity in several proportional odds logistic regression models.
The disease severity scale had 5 levels and was constructed from 3 components, extracapsular penetration (ECP), lymph node involvement (LN+) and seminal vesicle involvement (SV+), which are surrogate measures of prognosis. The summary measures were not strong predictors of disease severity. The mixture model results gave some indication of changes in the mean levels and proportions of the components at the lower severity levels.
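As an illustration of the summary-measure pipeline described above, the following Python sketch computes a bimodality coefficient, fits a single Gaussian and an equal-variance two-component Gaussian mixture by EM, and bootstraps the likelihood ratio statistic from the fitted one-component model. It is a minimal sketch under stated assumptions, not the dissertation's code: the bimodality coefficient uses the common sample-size-adjusted formula with 5/9 as a reference threshold, and the function names, quartile starting values and convergence tolerance are hypothetical choices.

```python
import math
import random


def bimodality_coefficient(x):
    """Sample bimodality coefficient; values above 5/9 are commonly read as a hint of bimodality."""
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    g1 = m3 / m2 ** 1.5                                          # moment skewness
    g2 = m4 / m2 ** 2 - 3.0                                      # moment excess kurtosis
    skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)                 # sample-size-adjusted skewness
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)    # adjusted excess kurtosis
    return (skew ** 2 + 1) / (kurt + 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def _log_norm(v, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (v - mu) ** 2 / var)


def _logaddexp(a, b):
    hi = max(a, b)
    return hi + math.log(math.exp(a - hi) + math.exp(b - hi))


def fit_one(x):
    """Maximum-likelihood single Gaussian: returns (mean, variance, log-likelihood)."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    return mu, var, sum(_log_norm(v, mu, var) for v in x)


def fit_two_equal_var(x, iters=200, tol=1e-6):
    """EM for a two-component Gaussian mixture with one shared variance.

    Returns (pi, mu1, mu2, var, log-likelihood); pi is the weight of component 1.
    """
    n = len(x)
    xs = sorted(x)
    mu1, mu2 = xs[n // 4], xs[(3 * n) // 4]      # start the means at the quartiles
    var = fit_one(x)[1]
    pi, prev = 0.5, -math.inf
    for _ in range(iters):
        # E-step: posterior probability that each point belongs to component 1
        resp = []
        for v in x:
            a = math.log(pi) + _log_norm(v, mu1, var)
            b = math.log(1 - pi) + _log_norm(v, mu2, var)
            resp.append(math.exp(a - _logaddexp(a, b)))
        # M-step: mixing weight, weighted means and pooled (shared) variance
        r1 = max(sum(resp), 1e-12)
        r2 = max(n - r1, 1e-12)
        pi = min(max(r1 / n, 1e-6), 1 - 1e-6)
        mu1 = sum(r * v for r, v in zip(resp, x)) / r1
        mu2 = sum((1 - r) * v for r, v in zip(resp, x)) / r2
        var = max(sum(r * (v - mu1) ** 2 + (1 - r) * (v - mu2) ** 2
                      for r, v in zip(resp, x)) / n, 1e-12)
        ll = sum(_logaddexp(math.log(pi) + _log_norm(v, mu1, var),
                            math.log(1 - pi) + _log_norm(v, mu2, var)) for v in x)
        if ll - prev < tol:
            break
        prev = ll
    return pi, mu1, mu2, var, ll


def bootstrap_lrt(x, n_boot=99, seed=0):
    """Parametric bootstrap p-value for 1 versus 2 components."""
    rng = random.Random(seed)
    mu, var, ll1 = fit_one(x)
    observed = 2 * (fit_two_equal_var(x)[-1] - ll1)
    exceed = 0
    for _ in range(n_boot):
        sim = [rng.gauss(mu, math.sqrt(var)) for _ in x]     # data simulated under H0
        stat = 2 * (fit_two_equal_var(sim)[-1] - fit_one(sim)[-1])
        exceed += stat >= observed
    return observed, (exceed + 1) / (n_boot + 1)


rng = random.Random(1)
sample = [rng.gauss(-3, 1) for _ in range(100)] + [rng.gauss(3, 1) for _ in range(100)]
print(bimodality_coefficient(sample))
print(fit_two_equal_var(sample))
```

Sharing one variance between the components keeps the two-component fit from collapsing onto a near-zero-variance component, which an unconstrained fit can do on small per-patient samples.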
Abstract:
Although E2F1 is deregulated in most human cancers through mutations in the p16-cyclin D-Rb pathway, it also exhibits tumor-suppressive activity. A transgenic mouse model overexpressing E2F1 under the control of the bovine keratin 5 (K5) promoter exhibits epidermal hyperplasia and spontaneously develops tumors in the skin and other epithelial tissues after one year of age. In a p53-deficient background, aberrant apoptosis in K5 E2F1 transgenic epidermis is reduced and tumorigenesis is accelerated. In sharp contrast, K5 E2F1 transgenic mice are resistant to papilloma formation in the DMBA/TPA two-stage carcinogenesis protocol. K5 E2F4 and K5 DP1 transgenic mice were also characterized; both display epidermal hyperplasia but do not develop spontaneous tumors, even in combination with p53 deficiency. These transgenic mice do not have increased levels of apoptosis in their skin and are more susceptible to papilloma formation in the two-stage carcinogenesis model. These studies show that deregulated proliferation does not necessarily lead to tumor formation and that the ability to suppress skin carcinogenesis is unique to E2F1. E2F1 can also suppress skin carcinogenesis when okadaic acid is used as the tumor promoter and when a pre-initiated mouse model is used, demonstrating that E2F1's tumor-suppressive activity is not specific to TPA and occurs at the promotion stage. E2F1 was thought to induce p53-dependent apoptosis through upregulation of the p19ARF tumor suppressor, which inhibits mdm2-mediated p53 degradation. Consistent with in vitro studies, overexpression of E2F1 in mouse skin results in transcriptional activation of p19ARF and accumulation of p53. Inactivation of either p19ARF or p53 restores the sensitivity of K5 E2F1 transgenic mice to DMBA/TPA carcinogenesis, demonstrating that an intact p19ARF-p53 pathway is necessary for E2F1 to suppress carcinogenesis.
Surprisingly, while p53 is required for E2F1 to induce apoptosis in mouse skin, p19ARF is not, and inactivation of p19ARF actually enhances E2F1-induced apoptosis and proliferation in transgenic epidermis. This indicates that ARF is important for E2F1-induced tumor suppression but not for apoptosis. Senescence is another potential mechanism of tumor suppression that involves p53 and p19ARF. K5 E2F1 transgenic mice initiated with DMBA and treated with TPA show an increased number of senescent cells in their epidermis. These experiments demonstrate that E2F1's unique tumor-suppressive activity in two-stage skin carcinogenesis can be genetically separated from E2F1-induced apoptosis, and suggest that senescence through the p19ARF-p53 pathway plays a role in tumor suppression by E2F1.
Abstract:
Mixture modeling is commonly used to model categorical latent variables that represent subpopulations whose membership is unknown but can be inferred from the data. In recent years, finite mixture models have also been applied to time-to-event data. However, the commonly used survival mixture model assumes that the effects of the covariates on failure times differ across latent classes while the covariate distribution is homogeneous. The aim of this dissertation is to develop a method for examining time-to-event data in the presence of unobserved heterogeneity within a mixture modeling framework. A joint model is developed that incorporates the latent survival trajectory along with the observed information for the joint analysis of a time-to-event variable, its discrete and continuous covariates, and a latent class variable. Both the effects of covariates on survival times and the distribution of the covariates are assumed to vary across latent classes. The unobservable survival trajectories are identified by estimating the probability that a subject belongs to a particular class based on the observed information. We applied this method to a Hodgkin lymphoma study with long-term follow-up and observed four distinct latent classes in terms of long-term survival and the distributions of prognostic factors. Results from simulation studies and from the Hodgkin lymphoma study demonstrated the superiority of our joint model over the conventional survival model. This flexible inference method provides more accurate estimation and accommodates unobservable heterogeneity among individuals while accounting for interactions between covariates.
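The class-membership step can be sketched as follows. This Python example assumes, purely for illustration, a constant (exponential) hazard within each class, one continuous covariate with a class-specific normal distribution and a class-specific effect on the log hazard, and one binary covariate with a class-specific probability; the `LatentClass` fields and all parameter values are hypothetical. The posterior probability of each class is proportional to the prior class weight times the class-specific likelihood of the (possibly censored) survival time and of the covariates.

```python
import math
from dataclasses import dataclass


@dataclass
class LatentClass:
    prior: float      # class weight pi_k
    alpha: float      # log baseline hazard in this class
    beta: float       # class-specific effect of covariate x on the log hazard
    x_mean: float     # class-specific mean of continuous covariate x
    x_sd: float       # class-specific SD of x
    p_z: float        # class-specific P(binary covariate z = 1)


def class_log_joint(c, t, event, x, z):
    """log[pi_k * f(t, event | x, k) * f(x | k) * f(z | k)] for one subject."""
    hazard = math.exp(c.alpha + c.beta * x)
    # exponential survival, right-censored: an event contributes h, everyone contributes S(t) = exp(-h t)
    log_surv = event * math.log(hazard) - hazard * t
    log_x = -math.log(c.x_sd * math.sqrt(2 * math.pi)) - 0.5 * ((x - c.x_mean) / c.x_sd) ** 2
    log_z = math.log(c.p_z if z else 1.0 - c.p_z)
    return math.log(c.prior) + log_surv + log_x + log_z


def posterior_membership(classes, t, event, x, z):
    """Posterior probability that the subject belongs to each latent class."""
    logs = [class_log_joint(c, t, event, x, z) for c in classes]
    top = max(logs)
    w = [math.exp(v - top) for v in logs]   # subtract the max for numerical stability
    total = sum(w)
    return [v / total for v in w]


# Hypothetical two-class example: long-term survivors vs. a high-risk class
classes = [
    LatentClass(prior=0.6, alpha=math.log(0.02), beta=0.0, x_mean=0.0, x_sd=1.0, p_z=0.3),
    LatentClass(prior=0.4, alpha=math.log(0.5), beta=0.5, x_mean=1.0, x_sd=1.0, p_z=0.7),
]
print(posterior_membership(classes, t=20.0, event=0, x=0.0, z=0))  # censored late
print(posterior_membership(classes, t=1.0, event=1, x=1.0, z=1))   # early event
```

In a full joint model the class-specific parameters would themselves be estimated, for example by EM, with posterior probabilities like these serving as the E-step weights.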
Abstract:
Complex diseases such as cancer result from multiple genetic changes and environmental exposures. Owing to the rapid development of genotyping and sequencing technologies, we are now able to assess the causal effects of many genetic and environmental factors more accurately. Genome-wide association studies have localized many causal genetic variants that predispose to certain diseases. However, these variants explain only a small portion of the heritability of these diseases. More advanced statistical models are urgently needed to identify and characterize additional genetic and environmental factors and their interactions, which will enable us to better understand the causes of complex diseases. In the past decade, thanks to increasing computational capabilities and novel statistical developments, Bayesian methods have been widely applied in genetics/genomics research and have demonstrated superiority over some conventional approaches in certain research areas. Gene-environment and gene-gene interaction studies are among the areas where Bayesian methods can fully exploit their advantages. This dissertation focuses on developing new Bayesian statistical methods for analyzing data with complex gene-environment and gene-gene interactions, as well as extending some existing methods for gene-environment interactions to other related areas. It includes three sections: (1) deriving a Bayesian variable selection framework for hierarchical gene-environment and gene-gene interactions; (2) developing Bayesian Natural and Orthogonal Interaction (NOIA) models for gene-environment interactions; and (3) extending the applications of two Bayesian statistical methods developed for gene-environment interaction studies to other related types of studies, such as adaptive borrowing of historical data.
We propose a Bayesian hierarchical mixture model framework that allows us to investigate genetic and environmental main effects, gene-by-gene interactions (epistasis) and gene-by-environment interactions in the same model. It is well known that, in many practical situations, there is a natural hierarchical structure between the main effects and interactions in a linear model. Here we propose a model that incorporates this hierarchical structure into the Bayesian mixture model, so that irrelevant interaction effects can be removed more efficiently, resulting in more robust, parsimonious and powerful models. We evaluate both 'strong hierarchical' and 'weak hierarchical' models, which specify that both or at least one, respectively, of the main effects of the interacting factors must be present for the interaction to be included in the model. Extensive simulation results show that the proposed strong and weak hierarchical mixture models control the proportion of false positive discoveries and provide a powerful approach to identifying predisposing main effects and interactions in studies with complex gene-environment and gene-gene interactions. We also compare these two models with an 'independent' model that does not impose the hierarchical constraint and observe their superior performance in most of the situations considered. The proposed models are applied to real data from lung cancer and cutaneous melanoma case-control studies of gene-environment interactions. Bayesian statistical models have the advantage of allowing useful prior information to be incorporated into the modeling process. Moreover, the Bayesian mixture model outperforms the multivariate logistic model in parameter estimation and variable selection in most cases.
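One common way to encode strong and weak hierarchy in a Bayesian variable-selection mixture is through the prior on the interaction inclusion indicators. The Python sketch below is an assumed formulation, not necessarily the dissertation's: an interaction can enter with probability p only if both (strong) or at least one (weak) of its parent main effects are included, while the independent model ignores the parents. The helper names are hypothetical, and p is assumed to lie strictly between 0 and 1. Counting the admissible models for m main effects shows how much the hierarchy shrinks the model space.

```python
import math
from itertools import combinations, product


def interaction_prior(gamma_j, gamma_k, p, mode):
    """P(interaction j:k is included | main-effect indicators gamma_j, gamma_k)."""
    if mode == "strong":            # both parents must be in the model
        return p if (gamma_j and gamma_k) else 0.0
    if mode == "weak":              # at least one parent must be in the model
        return p if (gamma_j or gamma_k) else 0.0
    if mode == "independent":       # no hierarchical constraint
        return p
    raise ValueError(f"unknown mode: {mode}")


def log_model_prior(mains, inters, p_main, p_int, mode):
    """Log prior of a full indicator configuration; -inf if it violates the hierarchy.

    mains: tuple of 0/1 main-effect indicators; inters: dict (j, k) -> 0/1.
    """
    lp = sum(math.log(p_main if g else 1 - p_main) for g in mains)
    for (j, k), g in inters.items():
        q = interaction_prior(mains[j], mains[k], p_int, mode)
        if g and q == 0.0:
            return -math.inf
        if q > 0.0:   # interactions that cannot enter are excluded with probability 1
            lp += math.log(q if g else 1 - q)
    return lp


def count_admissible_models(m, mode):
    """Number of indicator configurations with nonzero prior for m main effects."""
    pairs = list(combinations(range(m), 2))
    total = 0
    for mains in product([0, 1], repeat=m):
        free = sum(interaction_prior(mains[j], mains[k], 0.5, mode) > 0 for j, k in pairs)
        total += 2 ** free
    return total


for mode in ("independent", "weak", "strong"):
    print(mode, count_admissible_models(3, mode))
```

For three main effects and their three pairwise interactions this gives 64, 45 and 18 admissible models for the independent, weak and strong versions, respectively.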
Our proposed models impose hierarchical constraints that further improve the Bayesian mixture model by reducing the proportion of false positive findings among the identified interactions, while successfully identifying the reported associations. This is practically appealing for studies investigating causal factors among a moderate number of candidate genetic and environmental factors together with a relatively large number of interactions. The natural and orthogonal interaction (NOIA) models of genetic effects were previously developed to provide an analysis framework in which the estimates of effects for a quantitative trait are statistically orthogonal regardless of whether Hardy-Weinberg equilibrium (HWE) holds within loci. Ma et al. (2012) recently developed a NOIA model for gene-environment interaction studies and showed the advantages of using the model for detecting true main effects and interactions, compared with the usual functional model. In this project, we propose a novel Bayesian statistical model that combines the Bayesian hierarchical mixture model with the NOIA statistical model and the usual functional model. The proposed Bayesian NOIA model demonstrates more power in detecting non-null effects, with higher marginal posterior probabilities. We also review two Bayesian statistical models (a Bayesian empirical shrinkage-type estimator and Bayesian model averaging) that were developed for gene-environment interaction studies. Inspired by these Bayesian models, we develop two novel statistical methods that can handle related problems such as borrowing data from historical studies. Like the gene-environment interaction methods that inspired them, the proposed methods balance statistical efficiency and bias within a unified model.
Through extensive simulation studies, we compare the operating characteristics of the proposed models with those of existing models, including the hierarchical meta-analysis model. The results show that the proposed approaches adaptively borrow historical data in a data-driven way. These novel models may have a broad range of statistical applications in both genetic/genomic and clinical studies.
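The idea of adaptively borrowing historical data can be illustrated with a shrinkage rule modeled on the empirical Bayes-type estimators used in gene-environment studies. In this hypothetical Python sketch, a current-study estimate and a historical estimate are combined so that the weight on the current-only estimate grows with the squared discrepancy between the two, relative to the current estimate's variance. The function names and this particular weight are illustrative assumptions, not the dissertation's proposed methods.

```python
def inverse_variance_pool(est_a, var_a, est_b, var_b):
    """Fixed-effect (inverse-variance weighted) combination of two estimates."""
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    return (w_a * est_a + w_b * est_b) / (w_a + w_b)


def adaptive_borrow(current, var_current, historical, var_historical):
    """Shrinkage-type compromise between the current-only and the pooled estimate.

    The weight on the current-only estimate, d^2 / (d^2 + var_current), approaches 1
    when the historical estimate disagrees strongly with the current one (little
    borrowing) and 0 when the two agree (full pooling).
    """
    pooled = inverse_variance_pool(current, var_current, historical, var_historical)
    d2 = (current - historical) ** 2
    w = d2 / (d2 + var_current)
    return w * current + (1 - w) * pooled, w


print(adaptive_borrow(0.30, 0.01, 0.32, 0.01))   # similar studies: mostly pooled
print(adaptive_borrow(0.30, 0.01, 2.00, 0.01))   # conflicting history: mostly current
```

The data-driven weight is what trades efficiency (pooling) against bias (borrowing from a historical study that does not match the current one).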
Abstract:
A compromised blood-spinal cord barrier (BSCB) is a factor in the outcome following traumatic spinal cord injury (SCI). Vascular endothelial growth factor (VEGF) is a potent stimulator of angiogenesis and vascular permeability, and its role in SCI is controversial. Relatively little is known about the spatial and temporal changes in BSCB permeability following administration of VEGF in experimental SCI. Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) studies were performed to noninvasively follow spatial and temporal changes in BSCB permeability following acute administration of VEGF in experimental SCI over a post-injury period of 56 days. The DCE-MRI data were analyzed using a two-compartment pharmacokinetic model. Animals were assessed for open-field locomotion using the Basso-Beattie-Bresnahan score. These studies demonstrate that BSCB permeability was greater at all time points in the VEGF-treated animals than in saline controls, most significantly in the epicenter region of the injury. Although a significant temporal reduction in BSCB permeability was observed in the VEGF-treated animals, permeability remained elevated even during the chronic phase. VEGF treatment resulted in earlier improvement in locomotor ability during the chronic phase of SCI. This study suggests a beneficial role of acutely administered VEGF in hastening neurobehavioral recovery after SCI.
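The abstract does not name the two-compartment model used. A widely used choice for DCE-MRI is the (extended) Tofts model, in which the tissue concentration is C_t(t) = K^trans ∫_0^t C_p(τ) exp(-k_ep (t - τ)) dτ + v_p C_p(t), with k_ep = K^trans / v_e; K^trans is the transfer constant usually read as a permeability index. The Python sketch below evaluates that model by trapezoidal convolution on a uniform time grid, assuming the Tofts model as a representative example; the function name is hypothetical, and in practice K^trans and v_e (and v_p) would be fitted voxel-wise to the measured enhancement curves.

```python
import math


def tofts_tissue_curve(times, cp, ktrans, ve, vp=0.0):
    """Tissue concentration under the (extended) Tofts model.

    times: uniformly spaced sample times (min); cp: plasma concentration at those times;
    ktrans: transfer constant (1/min); ve: extravascular extracellular volume fraction;
    vp: plasma volume fraction (vp = 0 gives the standard Tofts model).
    """
    dt = times[1] - times[0]
    kep = ktrans / ve                          # efflux rate constant
    ct = []
    for i, t in enumerate(times):
        # trapezoidal rule for  integral_0^t cp(tau) * exp(-kep * (t - tau)) dtau
        f = [cp[j] * math.exp(-kep * (t - times[j])) for j in range(i + 1)]
        conv = dt * (sum(f) - 0.5 * (f[0] + f[-1])) if i > 0 else 0.0
        ct.append(ktrans * conv + vp * cp[i])
    return ct


# Check against the closed form for a constant plasma concentration of 1 and vp = 0:
# C_t(t) = ve * (1 - exp(-kep * t))
dt = 0.05
times = [i * dt for i in range(201)]           # 0 to 10 min
cp = [1.0] * len(times)
ct = tofts_tissue_curve(times, cp, ktrans=0.2, ve=0.4)
print(ct[-1], 0.4 * (1 - math.exp(-0.5 * times[-1])))
```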
Abstract:
SRI is unique among known photoreceptors in that it produces opposite signals depending on the color of light stimuli. Absorption of orange light (587 nm) triggers an attractant response by the cell, whereas absorption of orange light followed by near-UV light (373 nm) triggers a repellent response. Using behavioral mutants that exhibit aberrant color-sensing ability, we tested a two-conformation equilibrium model using FRET and EPR spectroscopy. The essence of the model as applied to SRI-HtrI is that the complex exists in a metastable two-conformer equilibrium, which is shifted in one direction by orange light absorption (producing an attractant signal) and in the opposite direction by a second, UV-violet photon (producing a repellent signal). First, using FRET we found that the E-F cytoplasmic loop of SRI moves toward the RAMP domain of the HtrI transducer during formation of the orange-light-activated signaling state of the complex. This is the first localization of a change in the physical relationship between the receptor and transducer subunits of the complex, and it provides a structural property of the two proposed conformers that we can monitor. Second, EPR spectra of a spin-label probe at this cytoplasmic position showed dark-state shifts in the mutants toward shorter or longer E-F loop-RAMP distances, explaining their behavior as mutations that cause pre-stimulus shifts toward one conformer or the other.
Next, we applied a novel electrophysiological method for monitoring the directionality of proton movement during photoactivation of SRI to investigate proton transfer in the photoactive site, from the chromophore to proton acceptors, in both the wild type and the aberrant color-response mutants. We observed an unexpected and critical difference between the two signaling conformations of the SRI-HtrI complex: the vectoriality (i.e., movement away from or toward the cytoplasm) of the light-induced proton transfer from the chromophore to the protein is opposite in the formation of the two conformations. Retinylidene proton transfer is a common critical process in rhodopsins, and these results are the first to show differences in vectoriality in a rhodopsin receptor and to demonstrate the functional importance of the direction of proton transfer.
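To make the two-conformer picture concrete, the Python sketch below combines the standard Förster relation E = 1 / (1 + (r/R0)^6) with a two-state equilibrium A <-> B whose populations follow from the free-energy difference. It assumes a static mixture of the two conformers, each with its own donor-acceptor distance; the distances, R0 and free energies in the usage lines are hypothetical, chosen only to show how a pre-stimulus shift toward the shorter-distance conformer raises the ensemble FRET efficiency.

```python
import math

R_GAS = 8.314  # gas constant, J / (mol K)


def fret_efficiency(r, r0):
    """Förster relation: transfer efficiency at donor-acceptor distance r (same units as r0)."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def fraction_b(delta_g, temp=298.15):
    """Equilibrium population of conformer B for A <-> B, with delta_g = G_B - G_A in J/mol."""
    k = math.exp(-delta_g / (R_GAS * temp))
    return k / (1.0 + k)


def ensemble_efficiency(delta_g, r_a, r_b, r0, temp=298.15):
    """Population-weighted FRET efficiency of a static two-conformer mixture."""
    fb = fraction_b(delta_g, temp)
    return (1.0 - fb) * fret_efficiency(r_a, r0) + fb * fret_efficiency(r_b, r0)


# Hypothetical numbers: conformer B brings the labeled sites closer (40 vs 55 angstrom)
for dg in (4000.0, 0.0, -4000.0):   # mutant-like shifts of the dark equilibrium
    print(dg, round(fraction_b(dg), 3), round(ensemble_efficiency(dg, 55.0, 40.0, 50.0), 3))
```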
Abstract:
Objectives. To investigate procedural gender equity by assessing predisposing, enabling and need predictors of gender differences in annual medical expenditures and utilization among hypertensive individuals in the U.S., and to estimate and compare lifetime medical expenditures among hypertensive men and women in the U.S.
Data source. The 2001-2004 Medical Expenditure Panel Survey (MEPS); the 1986-2000 National Health Interview Survey (NHIS); and the NHIS linked to mortality in the National Death Index through 2002 (2002 NHIS-NDI).
Study design. We estimated total medical expenditure using a four-equation regression model, specific medical expenditures using a two-equation regression model, and utilization using a negative binomial regression model. Procedural equity was assessed by applying the theoretical framework of Aday et al. Expenditures were estimated in 2004 dollars. We estimated hypertension-attributable medical expenditure and utilization among men and women.
To estimate lifetime expenditures from age 20 to 85+, we estimated medical expenditures with cross-sectional data and survival with prospective data. The four-equation regression model was used to estimate average annual medical expenditures, defined as the sum of inpatient stay, emergency room visit, outpatient visit, office-based visit, and prescription drug expenditures. Life tables were used to estimate the distribution of lifetime medical expenditures for hypertensive men and women at different ages; factors such as disease incidence, medical technology and health care costs were assumed to be fixed. Both total and hypertension-attributable expenditures were estimated for men and women.
Data collection. We used the 2001-2004 MEPS household component and medical condition files; the 1986-1996 NHIS person and condition files and the 1997-2000 NHIS sample adult files; and the 1986-2000 NHIS linked to mortality in the 2002 NHIS-NDI.
Principal findings. After controlling for predisposing, enabling and need factors, hypertensive men had significantly less utilization than hypertensive women for most measures. Similarly, hypertensive men had lower prescription drug (-9.3%), office-based (-7.2%) and total medical (-4.5%) expenditures than hypertensive women. However, men had more hypertension-attributable medical expenditures and utilization than women.
Expected total lifetime expenditure for an average life-table individual at age 20 was $188,300 for hypertensive men and $254,910 for hypertensive women. However, the lifetime expenditure attributable to hypertension was $88,033 for men and $40,960 for women.
Conclusion. Hypertensive women had more utilization and expenditure than hypertensive men for most measures, possibly indicating procedural inequity. However, the relatively higher hypertension-attributable health care of men indicates greater use of resources to treat hypertension-related diseases among men than among women. Similar results were found in the lifetime analyses.
Key words: gender, medical expenditures, utilization, hypertension-attributable, lifetime expenditure
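The life-table step can be sketched as a survival-weighted sum of annual expenditures: the expected lifetime expenditure from the starting age is the sum, over ages, of the expected person-years lived at each age times the average annual expenditure at that age. The Python example below assumes, for illustration, one-year age intervals, deaths occurring on average mid-year, and an optional discount rate; the function name and inputs are hypothetical and are not the MEPS/NHIS estimates. A hypertension-attributable lifetime figure could be approximated by passing attributable annual expenditures instead of total ones.

```python
def lifetime_expenditure(qx, annual_cost, discount=0.0):
    """Expected lifetime expenditure for a person alive at the first age of the table.

    qx[t]: probability of dying within age interval t (one-year intervals assumed);
    annual_cost[t]: mean annual expenditure of people alive in interval t;
    deaths are assumed to occur, on average, halfway through the interval.
    """
    alive = 1.0       # probability of being alive at the start of interval t
    total = 0.0
    for t, (q, cost) in enumerate(zip(qx, annual_cost)):
        person_years = alive * (1.0 - q / 2.0)
        total += person_years * cost / (1.0 + discount) ** t
        alive *= 1.0 - q
    return total


# Hypothetical inputs: constant 2% annual mortality and costs rising by 3% per year of age
qx = [0.02] * 65
costs = [3000.0 * 1.03 ** t for t in range(65)]
print(round(lifetime_expenditure(qx, costs)))
```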
Abstract:
Genome-wide association studies (GWAS) have rapidly become a standard method for disease gene discovery. Many recent GWAS indicate that, for most disorders, only a few common variants are implicated and the associated SNPs explain only a small fraction of the genetic risk. The current study incorporated gene network information into a gene-based analysis of GWAS data for Crohn's disease (CD). The purpose was to develop statistical models that boost the power to identify disease-associated genes and gene subnetworks by making maximal use of existing biological knowledge from multiple sources. The results revealed that a Markov random field (MRF)-based mixture model incorporating direct neighborhood information from a single gene network is not efficient in identifying CD-related genes from the GWAS data. Relying solely on direct neighborhood information may explain the low efficiency of these models. Alternative MRF models that look beyond direct neighbors need to be developed for this purpose in the future.
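A Markov random field mixture model of the kind evaluated here can be sketched with iterated conditional modes (ICM). In this hypothetical Python example, each gene's score is modeled as N(0, 1) if the gene is not associated and N(mu1, 1) if it is, and the prior log odds of association are h + beta × (number of directly connected genes currently labeled associated), so only direct-neighborhood information enters. The function name and parameter values are illustrative; every neighbor is assumed to have a score, and a real analysis would estimate h, beta and mu1 (for example by pseudo-likelihood or Gibbs sampling) rather than fix them.

```python
def icm_mrf_mixture(scores, neighbors, mu1=2.0, h=-1.0, beta=0.5, max_sweeps=50):
    """Label genes 1 (disease-associated) or 0 by iterated conditional modes.

    scores: dict gene -> gene-level statistic, N(0, 1) if z = 0 and N(mu1, 1) if z = 1.
    neighbors: dict gene -> list of directly connected genes in the network.
    Prior log odds of z_g = 1: h + beta * (number of neighbors currently labeled 1).
    """
    # start from the mixture likelihood alone (log LR > 0  <=>  score > mu1 / 2)
    z = {g: int(s > mu1 / 2) for g, s in scores.items()}
    for _ in range(max_sweeps):
        changed = False
        for g, s in scores.items():
            log_lr = mu1 * s - mu1 ** 2 / 2          # log N(s; mu1, 1) - log N(s; 0, 1)
            prior_logit = h + beta * sum(z[n] for n in neighbors.get(g, []))
            new = int(log_lr + prior_logit > 0)      # conditional mode of z_g
            if new != z[g]:
                z[g], changed = new, True
        if not changed:
            break
    return z


# Toy network a - b - c plus an isolated gene d; b and d have the same modest score
scores = {"a": 3.0, "b": 1.2, "c": 3.0, "d": 1.2}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
print(icm_mrf_mixture(scores, neighbors))
```

In this toy case genes b and d have identical scores, but only b is supported by associated neighbors, which is exactly the kind of direct-neighborhood evidence the models above rely on.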
Abstract:
Essential biological processes are governed by organized, dynamic interactions between multiple biomolecular systems. Complexes are formed to enable a biological function and are disassembled once the process is completed. Examples of such processes include the translation of messenger RNA into protein by the ribosome, the folding of proteins by chaperonins, and the entry of viruses into host cells. Understanding these fundamental processes by characterizing the molecular mechanisms that enable them would allow better design of therapies and drugs. Such molecular mechanisms may be revealed through the structural elucidation of the biomolecular assemblies at the core of these processes. Various experimental techniques can be applied to investigate the molecular architecture of biomolecular assemblies. High-resolution techniques, such as X-ray crystallography, can solve the atomic structure of a system but are typically limited to biomolecules of limited flexibility and size. In particular, X-ray crystallography requires the sample to form a three-dimensional (3D) crystal lattice, which is technically difficult, if not impossible, to obtain, especially for large, dynamic systems. These techniques often solve the structures of the individual components of an assembly but encounter difficulties when investigating the entire system. Imaging techniques such as cryo-electron microscopy (cryo-EM), on the other hand, can depict large systems in a near-native environment without requiring crystals. The structures solved by cryo-EM cover a wide range of resolutions, from very low levels of detail, where only the overall shape of the system is visible, to high resolutions that approach, but do not yet reach, atomic detail.
In this dissertation, several modeling methods are introduced either to integrate cryo-EM datasets with structural data from X-ray crystallography or to directly interpret the cryo-EM reconstruction. These computational techniques were developed with the goal of creating an atomic model for the cryo-EM data. Low-resolution reconstructions lack the level of detail needed for a direct atomic interpretation, i.e., one cannot reliably locate atoms or amino-acid residues within the structure obtained by cryo-EM. One therefore needs additional information, for example structural data from other sources such as X-ray crystallography, to enable such a high-resolution interpretation. Modeling techniques are thus developed to integrate structural data from different biophysical sources, as in the work described in manuscripts I and II of this dissertation. At intermediate and high resolution, cryo-EM reconstructions depict consistent 3D folds, such as tubular features that generally correspond to alpha helices. Such features can be annotated and later used to build the atomic model of the system, as in manuscript III. Three manuscripts are presented as part of this PhD dissertation, each introducing a computational technique that facilitates the interpretation of cryo-EM reconstructions. The first manuscript is an application paper that describes a heuristic for generating the atomic model of the protein envelope of the Rift Valley fever virus. The second manuscript introduces evolutionary tabu search strategies to enable the integration of multiple component atomic structures with the cryo-EM map of their assembly. Finally, the third manuscript further develops the latter technique and applies it to annotate consistent 3D patterns in intermediate-resolution cryo-EM reconstructions.
The first manuscript, titled An assembly model for Rift Valley fever virus, was submitted for publication in the Journal of Molecular Biology. The cryo-EM structure of the Rift Valley fever virus was previously solved at 27 Å resolution by Dr. Freiberg and collaborators. This reconstruction shows the overall shape of the virus envelope, yet its limited level of detail prevents direct atomic interpretation. High-resolution structures are not yet available for the entire virus, nor for the two component glycoproteins that form its envelope. However, homology models can be generated for these glycoproteins based on similar structures available at atomic resolution. The manuscript presents the steps required to build an atomic model of the entire virus envelope, based on the low-resolution cryo-EM map of the envelope and the homology models of the two glycoproteins. Starting from the results of an exhaustive search to place the two glycoproteins, the model is built iteratively by running multiple multi-body refinements to hierarchically generate models for the different regions of the envelope. The generated atomic model is supported by prior knowledge of virus biology and contains valuable information about the molecular architecture of the system. It provides the basis for further investigations of processes in which the virus is involved, such as assembly or fusion. The second manuscript was recently published in the Journal of Structural Biology (doi:10.1016/j.jsb.2009.12.028) under the title Evolutionary tabu search strategies for the simultaneous registration of multiple atomic structures in cryo-EM reconstructions. This manuscript introduces evolutionary tabu search strategies to enable multi-body registration. The technique is a hybrid approach that combines a genetic algorithm with a tabu search strategy to promote thorough exploration of the high-dimensional search space.
As with the Rift Valley fever virus, it is common for the structure of a large multi-component assembly to be available at low resolution from cryo-EM, while high-resolution structures are available for the individual components but not for the entire system. Evolutionary tabu search strategies enable the building of an atomic model for the entire system by considering the different components simultaneously. Such registration indirectly introduces spatial constraints, since all components must be placed within the assembly, enabling their proper docking in the low-resolution map of the entire assembly. Along with the method description, the manuscript covers its validation, presenting the benefit of the technique in both synthetic and experimental test cases. The approach successfully docked multiple components at resolutions up to 40 Å. The third manuscript is entitled Evolutionary Bidirectional Expansion for the Annotation of Alpha Helices in Electron Cryo-Microscopy Reconstructions and was submitted for publication in the Journal of Structural Biology. The modeling approach described in this manuscript applies the evolutionary tabu search strategies in combination with bidirectional expansion to annotate secondary structure elements in intermediate-resolution cryo-EM reconstructions. In particular, secondary structure elements such as alpha helices show consistent patterns in cryo-EM data and are visible as rod-like regions of high density. The evolutionary tabu search strategy is applied to identify the placement of the different alpha helices, while the bidirectional expansion characterizes their length and curvature. The manuscript presents the validation of the approach at resolutions between 6 and 14 Å, a level of detail at which alpha helices are visible. At resolutions up to 12 Å, the method achieved sensitivities of 70-100% in experimental test cases, i.e., 70-100% of the alpha helices were correctly predicted automatically in the experimental data.
The three manuscripts presented in this PhD dissertation cover different computational methods for the integration and interpretation of cryo-EM reconstructions. The methods were developed in the molecular modeling software Sculptor (http://sculptor.biomachina.org) and are available to the scientific community interested in multi-resolution modeling of cryo-EM data. The work spans a wide range of resolutions, covering multi-body refinement and registration at low resolution along with annotation of consistent patterns at high resolution. Such methods are essential for the modeling of cryo-EM data and may be applied in other fields where similar spatial problems are encountered, such as medical imaging.
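The hybrid idea behind evolutionary tabu search can be illustrated on a toy 2-D registration problem. The Python sketch below is a simplified stand-in, not the algorithm as implemented in Sculptor: each candidate solution places all components at once, and its fitness is the density of a hypothetical map at each component minus a clash penalty, mimicking the spatial constraint that all components must share the assembly. A genetic algorithm (truncation selection, crossover between whole components, Gaussian mutation of one component) generates offspring, and a tabu list of recently visited generation-best solutions rejects offspring that fall too close to them. The map, penalty, function names and all parameters are assumptions.

```python
import math
import random

PEAKS = [(0.0, 0.0), (4.0, 1.0), (1.0, 5.0)]   # hypothetical density peaks of a toy "map"


def density(x, y):
    return sum(math.exp(-((x - px) ** 2 + (y - py) ** 2)) for px, py in PEAKS)


def score(sol):
    """Fitness of a simultaneous placement: map density at each component minus clashes."""
    pts = [(sol[i], sol[i + 1]) for i in range(0, len(sol), 2)]
    fit = sum(density(x, y) for x, y in pts)
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            fit -= math.exp(-math.dist(pts[i], pts[j]) ** 2)   # penalize overlapping components
    return fit


def evolutionary_tabu_search(n_comp=3, pop_size=30, generations=150, sigma0=1.0,
                             tabu_radius=0.1, tabu_len=30, bounds=(-2.0, 7.0), seed=0):
    if n_comp < 2 or pop_size < 4:
        raise ValueError("need at least 2 components and a population of 4")
    rng = random.Random(seed)
    lo, hi = bounds
    dim = 2 * n_comp

    def clamp(v):
        return min(max(v, lo), hi)

    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    tabu, history = [], []
    best = max(pop, key=score)
    for gen in range(generations):
        sigma = sigma0 * (1 - gen / generations) + 0.05    # shrink mutation steps over time
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]                      # truncation selection (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            child = None
            for _ in range(20):
                a, b = rng.sample(parents, 2)
                cut = 2 * rng.randrange(1, n_comp)          # crossover between whole components
                cand = a[:cut] + b[cut:]
                k = 2 * rng.randrange(n_comp)               # mutate one component's position
                cand[k] = clamp(cand[k] + rng.gauss(0, sigma))
                cand[k + 1] = clamp(cand[k + 1] + rng.gauss(0, sigma))
                if not any(math.dist(cand, t) < tabu_radius for t in tabu):
                    child = cand
                    break
            if child is None:                               # all tabu: inject a random immigrant
                child = [rng.uniform(lo, hi) for _ in range(dim)]
            children.append(child)
        pop = parents + children
        gen_best = max(pop, key=score)
        if score(gen_best) > score(best):
            best = list(gen_best)
        tabu = (tabu + [list(gen_best)])[-tabu_len:]        # remember recently visited optima
        history.append(score(best))
    return best, score(best), history


best, best_score, history = evolutionary_tabu_search()
print([round(v, 2) for v in best], round(best_score, 3))
```

Scoring all components together is what lets the clash penalty steer components toward distinct peaks; docking each component independently would tend to put every component on the same strongest peak.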
Abstract:
The genomic era, brought about by recent advances in next-generation sequencing technology, makes genome-wide scans of natural selection a reality. Currently, almost all statistical tests and analytical methods for identifying genes under selection are applied on a gene-by-gene basis. Although these methods have the power to identify genes subject to strong selection, they have limited power to discover genes targeted by moderate or weak selection, which are crucial for understanding the molecular mechanisms of complex phenotypes and diseases. The recent availability and rapidly increasing completeness of gene network and protein-protein interaction databases open avenues for enhancing the power to discover genes under natural selection. The aim of this thesis is to explore and develop normal mixture model-based methods that leverage gene network information to enhance the power of discovering target genes of natural selection. The results show that the developed statistical method, which combines the posterior log odds of the standard normal mixture model and the Guilt-By-Association score of the gene network in a naïve Bayes framework, has the power to discover genes under moderate or weak selection that bridge the genes under strong selection, and it helps our understanding of the biology underlying complex diseases and related natural selection phenotypes.
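The naive Bayes combination can be sketched as adding two pieces of evidence on the log-odds scale. In this hypothetical Python example, the first term is the posterior log odds from a two-component normal mixture (null N(0, 1), selection component N(mu1, sd1^2)); the Guilt-By-Association (GBA) score is taken to be the fraction of a gene's network neighbors that belong to a seed set of strongly selected genes; and the GBA score is turned into a log likelihood ratio with Laplace-smoothed histograms from reference gene sets. The function names, the binning and the reference sets are illustrative assumptions, not the thesis's exact formulation.

```python
import math


def mixture_log_odds(s, pi1, mu1, sd1):
    """Posterior log odds that statistic s comes from the selection component of a
    normal mixture with null N(0, 1) and selection component N(mu1, sd1^2)."""
    log_f0 = -0.5 * s ** 2                                # shared -0.5*log(2*pi) cancels
    log_f1 = -math.log(sd1) - 0.5 * ((s - mu1) / sd1) ** 2
    return math.log(pi1 / (1 - pi1)) + log_f1 - log_f0


def gba_score(gene, neighbors, seeds):
    """Guilt-by-association: fraction of a gene's network neighbors that are seed genes."""
    nb = neighbors.get(gene, [])
    return sum(n in seeds for n in nb) / len(nb) if nb else 0.0


def network_log_lr(score, pos_scores, neg_scores, n_bins=5):
    """Log likelihood ratio of a GBA score from Laplace-smoothed histograms of the
    scores of reference 'selected' (pos) and 'not selected' (neg) genes."""
    def bin_of(v):
        return min(int(v * n_bins), n_bins - 1)

    def smoothed_freq(scores):
        hits = sum(bin_of(v) == bin_of(score) for v in scores)
        return (hits + 1) / (len(scores) + n_bins)

    return math.log(smoothed_freq(pos_scores)) - math.log(smoothed_freq(neg_scores))


def combined_log_odds(s, gba, pi1, mu1, sd1, pos_scores, neg_scores):
    # naive Bayes: given selection status, the test statistic and the GBA score are
    # treated as conditionally independent, so their evidence adds on the log-odds scale
    return mixture_log_odds(s, pi1, mu1, sd1) + network_log_lr(gba, pos_scores, neg_scores)


# Hypothetical usage: a gene with a modest statistic whose neighbors are mostly seeds
neighbors = {"g1": ["a", "b", "c", "d"]}
seeds = {"a", "b", "c"}
gba = gba_score("g1", neighbors, seeds)
pos, neg = [0.8, 0.9, 0.7, 0.6], [0.0, 0.1, 0.0, 0.2, 0.25]
print(mixture_log_odds(1.5, 0.1, 2.5, 1.0), combined_log_odds(1.5, gba, 0.1, 2.5, 1.0, pos, neg))
```

A gene whose own statistic is only moderately suggestive can thus cross a decision threshold when its network neighborhood is enriched for strongly selected genes.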