205 results for Genomic Selection
in Cambridge University Engineering Department Publications Database
Identifying cancer subtypes in glioblastoma by combining genomic, transcriptomic and epigenomic data
Abstract:
We present a nonparametric Bayesian method for disease subtype discovery in multi-dimensional cancer data. Our method can simultaneously analyse a wide range of data types, allowing for both agreement and disagreement between their underlying clustering structures. It includes feature selection and infers the most likely number of disease subtypes, given the data. We apply the method to 277 glioblastoma samples from The Cancer Genome Atlas, for which gene expression, copy number variation, methylation and microRNA data are available. We identify 8 distinct consensus subtypes and study their prognostic value for death, new tumour events, progression and recurrence. The consensus subtypes are prognostic of tumour recurrence (log-rank p-value of $3.6 \times 10^{-4}$ after correction for multiple hypothesis tests). This is driven principally by the methylation data (log-rank p-value of $2.0 \times 10^{-3}$), but the effect is strengthened by the other three data types, demonstrating the value of integrating multiple data types. Of particular note is a subtype of 47 patients characterised by very low levels of methylation. This subtype has very low rates of tumour recurrence and no new events in 10 years of follow-up. We also identify a small gene expression subtype of 6 patients that shows particularly poor survival outcomes. Additionally, we note a consensus subtype that shows a highly distinctive data signature and suggest that it is therefore a biologically distinct subtype of glioblastoma. The code is available from https://sites.google.com/site/multipledatafusion/
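The abstract reports log-rank p-values for the subtypes. As a rough illustration of what such a p-value summarises, the sketch below computes the standard two-group log-rank statistic from follow-up times and event/censoring flags. It is not the authors' code (that is at the URL above): the function name is hypothetical, it compares only two groups rather than eight subtypes, and it omits the multiple-testing correction, whose exact form the abstract does not state.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*: follow-up times; events_*: 1 if the event (e.g. recurrence) was
    observed, 0 if the subject was censored at that time.
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        at_risk = [(s, e, g) for s, e, g in data if s >= t]  # still under follow-up
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for s, e, _ in at_risk if s == t and e)  # events at time t
        d_a = sum(1 for s, e, g in at_risk if s == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a given the margins at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

For example, two groups with identical times and events give a statistic of 0 and a p-value of 1, while a group whose events all occur before any event in the other group yields a small p-value.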
Abstract:
Variable selection for regression is a classical statistical problem, motivated by concerns that too large a number of covariates may bring about overfitting and unnecessarily high measurement costs. Novel difficulties arise in streaming contexts, where the correlation structure of the process may be drifting, in which case it must be constantly tracked so that selections may be revised accordingly. A particularly interesting phenomenon is that non-selected covariates become missing variables, inducing bias on subsequent decisions. This raises an intricate exploration-exploitation tradeoff, whose dependence on the covariance tracking algorithm and the choice of variable selection scheme is too complex to be dealt with analytically. We hence capitalise on the strength of simulations to explore this problem, taking the opportunity to tackle the difficult task of simulating dynamic correlation structures. © 2008 IEEE.
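The abstract does not say which covariance tracking algorithm is used. A common choice for drifting streams is an exponentially weighted estimate with a forgetting factor; the sketch below assumes that choice (the function name and the forgetting factor `lam` are hypothetical) and makes the missing-variable effect concrete: covariates outside the currently selected set are not measured, so every covariance entry involving them stops being updated.

```python
def ewma_cov_update(mean, cov, x, observed, lam=0.95):
    """One streaming update of an exponentially weighted mean and covariance.

    mean: list of p running means; cov: p x p nested list; x: the new sample
    (values at unobserved positions are ignored); observed: indices measured
    at this step, e.g. the currently selected covariates; lam: forgetting
    factor in (0, 1). Entries involving unobserved covariates keep their
    previous values.
    """
    obs = sorted(observed)
    new_mean = list(mean)
    new_cov = [row[:] for row in cov]
    # Deviations from the previous mean, for observed covariates only.
    d = {i: x[i] - mean[i] for i in obs}
    for i in obs:
        new_mean[i] = mean[i] + (1 - lam) * d[i]  # = lam * mean + (1 - lam) * x
        for j in obs:
            new_cov[i][j] = lam * (cov[i][j] + (1 - lam) * d[i] * d[j])
    return new_mean, new_cov


# Example: covariate 1 has been dropped by the selection scheme.
mean, cov = [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]]
for x in ([1.0, 2.0], [2.0, 1.0], [1.5, 0.5]):
    mean, cov = ewma_cov_update(mean, cov, x, observed={0}, lam=0.9)
# cov[0][1] and cov[1][1] are still 0.0: nothing has been learned about
# covariate 1, which is the "missing variable" problem described above.
```

A smaller `lam` forgets old data faster and so follows drift more closely, at the price of noisier estimates; the stale entries are what a selection scheme would need to explore in order to refresh them.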
Abstract:
Sensor networks can be naturally represented as graphical models, where the edge set encodes the presence of sparsity in the correlation structure between sensors. Such graphical representations can be valuable for information mining purposes as well as for optimizing bandwidth and battery usage with minimal loss of estimation accuracy. We use a computationally efficient technique for estimating sparse graphical models which fits a sparse linear regression locally at each node of the graph via the Lasso estimator. Using a recently suggested online, temporally adaptive implementation of the Lasso, we propose an algorithm for streaming graphical model selection over sensor networks. With battery consumption minimization applications in mind, we use this algorithm as the basis of an adaptive querying scheme. We discuss implementation issues in the context of environmental monitoring using sensor networks, where the objective is short-term forecasting of local wind direction. The algorithm is tested on real UK weather data, and conclusions are drawn about certain tradeoffs inherent in decentralized sensor-network data analysis. © 2010 The Author. Published by Oxford University Press on behalf of The British Computer Society. All rights reserved.
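The node-wise Lasso idea can be sketched in batch form: regress each sensor on all the others with an L1 penalty and draw an edge wherever a coefficient is non-zero. This sketch assumes a plain cyclic coordinate-descent Lasso on centred data and the "OR" rule for symmetrising edges; it does not implement the online, temporally adaptive Lasso the abstract builds on, and the function names and penalty value are hypothetical.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent.

    Minimises (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 with no intercept,
    so X and y are assumed to be centred.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # current residual y - X b
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes b_j.
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def neighbourhood_graph(data, lam):
    """Edges (i, j), i < j, found by Lasso-regressing each node on all others.

    data: list of n rows, each holding p sensor readings.
    An edge is kept if either regression selects it (the "OR" rule).
    """
    n, p = len(data), len(data[0])
    means = [sum(row[k] for row in data) / n for k in range(p)]
    centred = [[row[k] - means[k] for k in range(p)] for row in data]
    edges = set()
    for node in range(p):
        others = [k for k in range(p) if k != node]
        X = [[row[k] for k in others] for row in centred]
        y = [row[node] for row in centred]
        for k, coef in zip(others, lasso_cd(X, y, lam)):
            if coef != 0.0:
                edges.add((min(node, k), max(node, k)))
    return edges
```

Larger values of `lam` give sparser graphs, which in a sensor network would translate into fewer neighbours to query.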
Abstract:
We present a stochastic simulation technique for subset selection in time series models, based on the use of indicator variables with the Gibbs sampler within a hierarchical Bayesian framework. As an example, the method is applied to the selection of subset linear AR models, in which only significant lags are included. Joint sampling of the indicators and parameters is found to speed convergence. We discuss the possibility of model mixing where the model is not well determined by the data, and the extension of the approach to include non-linear model terms.
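A minimal sketch of indicator-variable Gibbs sampling for subset AR selection, under simplifying assumptions that are ours rather than the paper's: the noise variance `sigma2` is known, each included coefficient has an independent N(0, tau2) prior, and each lag is included a priori with probability `prior_incl`. Each indicator is drawn jointly with its coefficient (the coefficient is integrated out to draw the indicator, then drawn given the indicator), in the spirit of the joint sampling the abstract reports. All function and parameter names are hypothetical.

```python
import math
import random


def simulate_ar(coefs, n, seed=1):
    """Simulate n values of y_t = sum_k coefs[k] * y_{t-k-1} + N(0, 1) noise."""
    rng = random.Random(seed)
    y = [0.0] * len(coefs)
    for _ in range(n + 100):  # the first 100 draws are discarded as warm-up
        y.append(sum(c * y[-k - 1] for k, c in enumerate(coefs)) + rng.gauss(0.0, 1.0))
    return y[len(coefs) + 100:]


def gibbs_subset_ar(y, max_lag, n_iter=1000, burn_in=200, sigma2=1.0, tau2=1.0,
                    prior_incl=0.5, seed=0):
    """Gibbs sampler over lag indicators; returns inclusion probabilities for lags 1..max_lag."""
    rng = random.Random(seed)
    target = y[max_lag:]
    # cols[k][i] is the lag-(k+1) predictor for target[i].
    cols = [[y[t - k - 1] for t in range(max_lag, len(y))] for k in range(max_lag)]
    col_sq = [sum(v * v for v in c) for c in cols]
    a = [0.0] * max_lag            # coefficients; 0.0 when a lag is excluded
    resid = list(target)           # target minus the current fitted values
    counts = [0] * max_lag
    for it in range(n_iter):
        for k in range(max_lag):
            x = cols[k]
            # x'r with lag k+1's own contribution added back into r.
            xr = sum(xi * ri for xi, ri in zip(x, resid)) + col_sq[k] * a[k]
            prec = col_sq[k] / sigma2 + 1.0 / tau2  # conditional posterior precision of a_k
            mean = (xr / sigma2) / prec
            # Log posterior odds of inclusion: log Bayes factor (a_k integrated out) + prior odds.
            log_odds = (-0.5 * math.log(tau2 * prec) + 0.5 * mean * mean * prec
                        + math.log(prior_incl / (1.0 - prior_incl)))
            p_incl = 1.0 / (1.0 + math.exp(-log_odds)) if log_odds > -700 else 0.0
            # Joint draw: indicator from its marginal, then coefficient given indicator.
            included = rng.random() < p_incl
            new_a = rng.gauss(mean, 1.0 / math.sqrt(prec)) if included else 0.0
            delta = new_a - a[k]
            for i in range(len(resid)):
                resid[i] -= x[i] * delta
            a[k] = new_a
            if it >= burn_in and included:
                counts[k] += 1
    return [c / (n_iter - burn_in) for c in counts]
```

With simulated data that has strong coefficients at lags 1 and 3 only, the inclusion probabilities for those two lags should be close to 1 and the others small. Model mixing, as discussed in the abstract, would correspond to averaging over the sampled subsets instead of keeping only the most frequent one.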
Abstract:
The Vi capsular polysaccharide is a virulence-associated factor expressed by Salmonella enterica serotype Typhi but absent from virtually all other Salmonella serotypes. In order to study this determinant in vivo, we characterised a Vi-positive S. Typhimurium (C5.507 Vi(+)), harbouring the Salmonella pathogenicity island (SPI)-7, which encodes the Vi locus. S. Typhimurium C5.507 Vi(+) colonised and persisted in mice at levels similar to those of the parent strain, S. Typhimurium C5. However, the innate immune response to infection with C5.507 Vi(+) differed markedly from that to SGB1, an isogenic derivative not expressing Vi. Infection with C5.507 Vi(+) resulted in a significant reduction in cellular trafficking of innate immune cells, including PMN and NK cells, compared with SGB1 Vi(-)-infected animals. C5.507 Vi(+) infection induced fewer TNF-α-, MIP-2- and perforin-producing cells than SGB1 Vi(-) infection. The modulating effect associated with Vi was not observed in MyD88(-/-) mice and was reduced in TLR4(-/-) mice. The presence of the Vi capsule also correlated with induction of the anti-inflammatory cytokine IL-10 in vivo, a factor that affected chemotaxis and the activation of immune cells in vitro.
Modelling and simulation techniques for supporting healthcare decision making: a selection framework