919 results for Naive Bayes classifier
Abstract:
Currently, no available pathological or molecular measures of tumor angiogenesis predict response to antiangiogenic therapies used in clinical practice. Recognizing that tumor endothelial cells (EC) and EC activation and survival signaling are the direct targets of these therapies, we sought to develop an automated platform for quantifying activity of critical signaling pathways and other biological events in EC of patient tumors by histopathology. Computer image analysis of EC in highly heterogeneous human tumors by a statistical classifier trained using examples selected by human experts performed poorly due to subjectivity and selection bias. We hypothesized that the analysis can be optimized by a more active process to aid experts in identifying informative training examples. To test this hypothesis, we incorporated a novel active learning (AL) algorithm into FARSIGHT image analysis software that aids the expert by seeking out informative examples for the operator to label. The resulting FARSIGHT-AL system identified EC with specificity and sensitivity consistently greater than 0.9 and outperformed traditional supervised classification algorithms. The system modeled individual operator preferences and generated reproducible results. Using the results of EC classification, we also quantified proliferation (Ki67) and activity in important signal transduction pathways (MAP kinase, STAT3) in immunostained human clear cell renal cell carcinoma and other tumors. FARSIGHT-AL enables characterization of EC in conventionally preserved human tumors in a more automated process suitable for testing and validating in clinical trials. The results of our study support a unique opportunity for quantifying angiogenesis in a manner that can now be tested for its ability to identify novel predictive and response biomarkers.
Abstract:
BACKGROUND: Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is paid to employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. RESULTS: Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), rhinovirus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of performing nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. CONCLUSIONS: Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data.
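To make concrete how an Indian Buffet Process prior leaves the number of factors open, the following minimal sketch draws a binary factor-usage matrix from the standard IBP generative scheme. It illustrates the prior only, not the inference procedure of the abstract above; the function name `sample_ibp` and the parameter values are hypothetical, chosen for illustration.

```python
import numpy as np

def sample_ibp(num_rows, alpha, rng):
    """Draw a binary matrix Z (rows = samples, columns = factors) from an IBP prior."""
    counts = []  # counts[k] = number of earlier rows that already use factor k
    rows = []
    for i in range(1, num_rows + 1):
        # Row i reuses existing factor k with probability counts[k] / i.
        row = [rng.random() < m / i for m in counts]
        # Row i also opens Poisson(alpha / i) brand-new factors.
        num_new = int(rng.poisson(alpha / i))
        counts = [m + int(z) for m, z in zip(counts, row)] + [1] * num_new
        rows.append(row + [True] * num_new)
    # Pad every row to the final number of factors.
    z = np.zeros((num_rows, len(counts)), dtype=int)
    for i, row in enumerate(rows):
        z[i, :len(row)] = row
    return z

if __name__ == "__main__":
    z = sample_ibp(num_rows=20, alpha=2.0, rng=np.random.default_rng(0))
    print("number of factors drawn:", z.shape[1])
```

The number of columns is not fixed in advance: it grows with the number of rows and with `alpha`, which is the property that lets such priors infer the number of factors from data.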
Abstract:
In regression analysis of counts, a lack of simple and efficient algorithms for posterior computation has made Bayesian approaches appear unattractive and thus underdeveloped. We propose a lognormal and gamma mixed negative binomial (NB) regression model for counts, and present efficient closed-form Bayesian inference; unlike conventional Poisson models, the proposed approach has two free parameters to include two different kinds of random effects, and allows the incorporation of prior information, such as sparsity in the regression coefficients. By placing a gamma distribution prior on the NB dispersion parameter r, and connecting a log-normal distribution prior with the logit of the NB probability parameter p, efficient Gibbs sampling and variational Bayes inference are both developed. The closed-form updates are obtained by exploiting conditional conjugacy via both a compound Poisson representation and a Polya-Gamma distribution based data augmentation approach. The proposed Bayesian inference can be implemented routinely, while being easily generalizable to more complex settings involving multivariate dependence structures. The algorithms are illustrated using real examples. Copyright 2012 by the author(s)/owner(s).
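The generative side of such a model can be sketched by simulation. The sketch below assumes the NB parametrisation in which the mean is r·p/(1−p), so that an NB draw equals a Poisson draw whose rate is gamma distributed with shape r and scale p/(1−p) = exp(ψ), where ψ = logit(p). The function name, the precision parameter `phi` of the log-normal term, and all numeric values are assumptions made for illustration; the sketch does not implement the Gibbs or variational updates.

```python
import numpy as np

def simulate_lgnb_counts(X, beta, r, phi, rng):
    """Simulate counts y_i ~ NB(r, p_i) with logit(p_i) = x_i^T beta + ln(eps_i)."""
    n = X.shape[0]
    # ln(eps_i) ~ Normal(0, 1/phi): the log-normal random effect (assumed form).
    log_eps = rng.normal(0.0, 1.0 / np.sqrt(phi), size=n)
    psi = X @ beta + log_eps          # psi_i = logit(p_i)
    # Gamma-Poisson mixture: lambda_i ~ Gamma(shape=r, scale=exp(psi_i)).
    lam = rng.gamma(shape=r, scale=np.exp(psi))
    return rng.poisson(lam)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(5, 2))
    print(simulate_lgnb_counts(X, np.array([0.5, -0.3]), r=3.0, phi=4.0, rng=rng))
```

In the full Bayesian treatment r would itself receive a gamma prior, as the abstract states; here it is held fixed to keep the sketch short.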
Abstract:
Learning multiple tasks across heterogeneous domains is a challenging problem since the feature space may not be the same for different tasks. We assume the data in multiple tasks are generated from a latent common domain via sparse domain transforms and propose a latent probit model (LPM) to jointly learn the domain transforms and the shared probit classifier in the common domain. To learn meaningful task relatedness and avoid over-fitting in classification, we introduce sparsity in the domain transform matrices, as well as in the common classifier. We derive theoretical bounds for the estimation error of the classifier in terms of the sparsity of domain transforms. An expectation-maximization algorithm is derived for learning the LPM. The effectiveness of the approach is demonstrated on several real datasets.
Abstract:
Antigenically evolving pathogens such as influenza viruses are difficult to control owing to their ability to evade host immunity by producing immune escape variants. Experimental studies have repeatedly demonstrated that viral immune escape variants emerge more often from immunized hosts than from naive hosts. This empirical relationship between host immune status and within-host immune escape is not fully understood theoretically, nor has its impact on antigenic evolution at the population level been evaluated. Here, we show that this relationship can be understood as a trade-off between the probability that a new antigenic variant is produced and the level of viraemia it reaches within a host. Scaling up this intra-host level trade-off to a simple population level model, we obtain a distribution for variant persistence times that is consistent with influenza A/H3N2 antigenic variant data. At the within-host level, our results show that target cell limitation, or a functional equivalent, provides a parsimonious explanation for how host immune status drives the generation of immune escape mutants. At the population level, our analysis also offers an alternative explanation for the observed tempo of antigenic evolution, namely that the production rate of immune escape variants is driven by the accumulation of herd immunity. Overall, our results suggest that disease control strategies should be further assessed by considering the impact that increased immunity--through vaccination--has on the production of new antigenic variants.
Abstract:
Immune responses are highly energy-dependent processes. Activated T cells increase glucose uptake and aerobic glycolysis to survive and function. Malnutrition and starvation limit nutrients and are associated with immune deficiency and increased susceptibility to infection. Although it is clear that immunity is suppressed in times of nutrient stress, mechanisms that link systemic nutrition to T cell function are poorly understood. We show in this study that fasting leads to persistent defects in T cell activation and metabolism, as T cells from fasted animals had low glucose uptake and decreased ability to produce inflammatory cytokines, even when stimulated in nutrient-rich media. To explore the mechanism of this long-lasting T cell metabolic defect, we examined leptin, an adipokine reduced in fasting that regulates systemic metabolism and promotes effector T cell function. We show that leptin is essential for activated T cells to upregulate glucose uptake and metabolism. This effect was cell intrinsic and specific to activated effector T cells, as naive T cells and regulatory T cells did not require leptin for metabolic regulation. Importantly, either leptin addition to cultured T cells from fasted animals or leptin injections to fasting animals was sufficient to rescue both T cell metabolic and functional defects. Leptin-mediated metabolic regulation was critical, as transgenic expression of the glucose transporter Glut1 rescued cytokine production of T cells from fasted mice. Together, these data demonstrate that induction of T cell metabolism upon activation is dependent on systemic nutritional status, and leptin links adipocytes to metabolically license activated T cells in states of nutritional sufficiency.
Abstract:
We recently developed an approach for testing the accuracy of network inference algorithms by applying them to biologically realistic simulations with known network topology. Here, we seek to determine the degree to which the network topology and data sampling regime influence the ability of our Bayesian network inference algorithm, NETWORKINFERENCE, to recover gene regulatory networks. NETWORKINFERENCE performed well at recovering feedback loops and multiple targets of a regulator with small amounts of data, but required more data to recover multiple regulators of a gene. When collecting the same number of data samples at different intervals from the system, the best recovery was produced by sampling intervals long enough that sampling covered propagation of regulation through the network but not so long that the intervals missed internal dynamics. These results further elucidate the possibilities and limitations of network inference based on biological data.
Abstract:
Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape.
We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.
We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.
Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). By incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of the protein binding landscape and that the most accurate inference comes from modeling all available datasets.
This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of the protein-DNA interaction landscape through data integration.
Abstract:
OBJECTIVES: To evaluate the immune reconstitution in HIV-1-infected children in whom highly active antiretroviral therapy (HAART) controlled viral replication and to assess the existence of a relation between the magnitude of this restoration and age. METHODS: All HIV-1-infected children in whom a new HAART decreased plasma viral load below 400 copies/ml after 3 months of therapy were prospectively enrolled in a study of their immune reconstitution. Viral load, lymphocyte phenotyping, determination of CD4+ and CD8+ T cell receptor repertoires and proliferative responses to mitogens and recall antigens were assessed every 3 months during 1 year. RESULTS: Nineteen children were evaluated. Naive and memory CD4+ percentages were already significantly increased after 3 months of HAART. In contrast to memory CD4+ percentages, naive CD4+ percentages continued to rise until 12 months. Age at baseline was inversely correlated with the magnitude of the rise in naive CD4+ cells after 3, 6 and 9 months of therapy but not after 12 months. Although memory and activated CD8+ cells were already decreasing after 3 months, abnormalities of the CD8 T cell receptor repertoire and activation of CD8+ cells persisted at 1 year. HAART increased the response to mitogens as early as 3 months after starting therapy. CONCLUSIONS: In children the recovery of naive CD4+ cells occurs more rapidly if treatment is started at a younger age, but after 1 year of viral replication control, patients of all ages have achieved the same level of restoration. Markers of chronic activation in CD8+ cells persist after 1 year of HAART.
Abstract:
The pragmatics of 'vegetarian' and 'carnivorous' exhibits an asymmetry that we bring out by analyzing a newspaper report about vegetarian dog-owners imposing a vegetarian diet on their pets. More fundamental is the problem of partonomy versus containment, for which we attempt a naive but formal analysis applied to ingestion and the food chain, an issue we derive from the same text. Our formal tools belong to commonsense modelling, a domain of artificial intelligence related to extra-linguistic knowledge and pragmatics. We first provide an interpretation of the events analyzed and express it graphically in a semantic-network-related representation; we then propose an alternative that we express in terms of a modal logic, avoiding the full representational power of Hayes's "ontology for liquids".
Abstract:
This paper describes an industrial application of case-based reasoning in engineering. The application involves an integration of case-based reasoning (CBR) retrieval techniques with a relational database. The database is specially designed as a repository of experiential knowledge, with the CBR application in mind, so that it includes qualitative search indices. The application is an intelligent assistant for design and material engineers in the submarine cable industry. The system consists of three components: a material classifier, a database of experiential knowledge, and a CBR system that is used to retrieve similar past cases based on component descriptions. Work has shown that an uncommon retrieval technique, hierarchical searching, represents several search indices well and that this technique aids the implementation of advanced techniques such as context-sensitive weights. The system is currently undergoing user testing at the Alcatel Submarine Cables site in Greenwich. Plans are for wider testing and deployment over several sites internationally.
Abstract:
This paper investigates the use of the acoustic emission (AE) monitoring technique to identify the damage mechanisms present in paper associated with its production process. The microscopic structure of paper consists of a random mesh of paper fibres connected by hydrogen bonds. This implies the existence of two damage mechanisms: the failure of a fibre-fibre bond and the failure of a fibre. This paper describes a hybrid mathematical model which couples the mechanics of the mass-spring model to the acoustic wave propagation model in order to generate the acoustic signal emitted by complex structures of paper fibres under strain. The derivation of the mass-spring model can be found in [1,2], with details of the acoustic wave equation found in [3,4]. The numerical implementation of the vibro-acoustic model is discussed in detail, with particular emphasis on the damping present in the numerical model. The hybrid model uses an implicit solver which intrinsically introduces artificial damping to the solution. The artificial damping is shown to affect the frequency response of the mass-spring model; therefore, certain restrictions on the simulation time step must be enforced so that the model produces physically accurate results. The hybrid mathematical model is used to simulate small fibre networks to provide information on the acoustic response of each damage mechanism. The simulated AEs are then analysed using a continuous wavelet transform (CWT), described in [5], which provides a two-dimensional time-frequency representation of the signal. The AEs from the two damage mechanisms show different characteristics in the CWT, so that it is possible to define a fibre-fibre bond failure by the following criteria: (i) the dominant frequency components of the AE must be at approximately 250 kHz or 750 kHz; (ii) the strongest frequency component may be at either approximately 250 kHz or 750 kHz; and (iii) the duration of the frequency component at approximately 250 kHz is longer than that of the frequency component at approximately 750 kHz. Similarly, the criteria for identifying a fibre failure are: (i) the dominant frequency component of the AE must be greater than 800 kHz; (ii) the duration of the dominant frequency component must be less than 5.00E-06 seconds; and (iii) the dominant frequency component must be present at the front of the AE. Essentially, the failure of a fibre-fibre bond produces a low frequency wave and the failure of a fibre produces a high frequency pulse. Using these theoretical criteria, it is now possible to train an intelligent classifier such as the Self-Organising Map (SOM) [6] using the experimental data. First, certain features must be extracted from the CWTs of the AEs for use in training the SOM. For this work, each CWT is divided into 200 windows of 5E-06 s in duration covering a 100 kHz frequency range. The power ratio for each window is then calculated and used as a feature. Having extracted the features from the AEs, the SOM can now be trained, but care is required so that both damage mechanisms are adequately represented in the training set. This is an issue with paper, as the failure of the fibre-fibre bonds is the prevalent damage mechanism. Once a suitable training set is found, the SOM can be trained and its performance analysed. For the SOM described in this work, there is a good chance that it will correctly classify the experimental AEs.
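The two sets of criteria stated in this abstract can be read as a simple rule-based labeller, sketched below. This is one possible interpretation, not the authors' implementation: the `FrequencyComponent` record, the tolerance `tol_khz` used for "approximately", and the cut-off `front_s` used for "at the front of the AE" are all hypothetical, and the sketch assumes the frequency components have already been extracted from the CWT.

```python
from dataclasses import dataclass

@dataclass
class FrequencyComponent:
    frequency_khz: float  # centre frequency of the component
    duration_s: float     # how long the component persists
    onset_s: float        # delay between the start of the AE and the component
    power: float          # strength of the component

def classify_ae(components, tol_khz=50.0, front_s=1e-6):
    """Label an AE as 'fibre failure', 'bond failure' or 'unknown'."""
    if not components:
        return "unknown"
    dominant = max(components, key=lambda c: c.power)

    # Fibre failure: short, high-frequency pulse at the front of the AE.
    if (dominant.frequency_khz > 800.0
            and dominant.duration_s < 5e-6
            and dominant.onset_s <= front_s):
        return "fibre failure"

    # Bond failure: components near 250 kHz and 750 kHz, the strongest at
    # either one, with the 250 kHz component lasting longer.
    low = [c for c in components if abs(c.frequency_khz - 250.0) <= tol_khz]
    high = [c for c in components if abs(c.frequency_khz - 750.0) <= tol_khz]
    if low and high and (dominant in low or dominant in high):
        if max(c.duration_s for c in low) > max(c.duration_s for c in high):
            return "bond failure"
    return "unknown"

if __name__ == "__main__":
    pulse = [FrequencyComponent(900.0, 2e-6, 0.0, 1.0)]
    print(classify_ae(pulse))
```

Labels produced this way could serve as targets or sanity checks when assembling a training set for a learned classifier such as the SOM mentioned above.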
Abstract:
Noise is one of the main factors degrading the quality of original multichannel remote sensing data, and its presence influences classification efficiency, object detection, etc. Thus, pre-filtering is often used to remove noise and improve performance on the final tasks of multichannel remote sensing. Recent studies indicate that a classical model of additive noise is not adequate for images formed by modern multichannel sensors operating in visible and infrared bands. However, this fact is often ignored by researchers designing noise removal methods and algorithms. Because of this, we focus on the classification of multichannel remote sensing images in the case of signal-dependent noise present in component images. Three approaches to filtering of multichannel images for the considered noise model are analysed, all based on the discrete cosine transform (DCT) in blocks. The study is carried out not only in terms of conventional efficiency metrics used in filtering (MSE) but also in terms of multichannel data classification accuracy (probability of correct classification, confusion matrix). The proposed classification system combines a pre-processing stage, in which a DCT-based filter processes the blocks of the multichannel remote sensing image, with a classification stage. Two modern classifiers are employed: a radial basis function neural network and a support vector machine. Simulations are carried out for a three-channel image from the Landsat TM sensor. Different cases of learning are considered: using noise-free samples of the test multichannel image, the noisy multichannel image and the pre-filtered one. It is shown that using the pre-filtered image for training produces better classification than training on the noisy image. It is demonstrated that the best results for both groups of quantitative criteria are provided if a proposed 3D discrete cosine transform filter equipped with a variance-stabilizing transform is applied.
The classification results obtained for data pre-filtered in different ways are in agreement for both considered classifiers. Comparison of classifier performance is carried out as well. The radial basis neural network classifier is less sensitive to noise in original images, but after pre-filtering the performance of both classifiers is approximately the same.
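As a rough illustration of DCT-based filtering in blocks, the sketch below applies hard thresholding of DCT coefficients to a single channel. It is a deliberately simplified stand-in, not the filters compared in the abstract: it assumes additive noise with a known standard deviation `sigma`, uses non-overlapping 8x8 blocks, and omits both the variance-stabilizing transform and the 3D (cross-channel) variant. The block size and the threshold factor `beta = 2.6` are assumed values.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ block @ C.T is the 2D DCT."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def block_dct_denoise(image, sigma, block=8, beta=2.6):
    """Hard-threshold DCT coefficients in non-overlapping blocks."""
    c = dct_matrix(block)
    out = image.astype(float).copy()
    rows, cols = image.shape
    # Pixels that do not fall inside a full block are left unchanged.
    for i in range(0, rows - block + 1, block):
        for j in range(0, cols - block + 1, block):
            coeffs = c @ out[i:i + block, j:j + block] @ c.T
            dc = coeffs[0, 0]                       # keep the block mean
            coeffs[np.abs(coeffs) < beta * sigma] = 0.0
            coeffs[0, 0] = dc
            out[i:i + block, j:j + block] = c.T @ coeffs @ c
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.full((32, 32), 100.0)
    noisy = clean + rng.normal(0.0, 5.0, size=clean.shape)
    denoised = block_dct_denoise(noisy, sigma=5.0)
    print(np.mean((noisy - clean) ** 2), np.mean((denoised - clean) ** 2))
```

For signal-dependent noise, a variance-stabilizing transform would be applied before this step and inverted afterwards so that a single threshold is meaningful across the image.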
Abstract:
The grading of crushed aggregate is usually carried out by sieving. We describe a new image-based approach to the automatic grading of such materials. The operational setting addressed is one in which the camera is located directly over a conveyor belt. Our approach characterizes the information content of each image, taking into account relative variation in the pixel data and the resolution scale. In feature space, we find very good class separation using a multidimensional linear classifier. The innovation in this work includes (i) introducing an effective image-based approach into this application area, and (ii) our supervised classification using wavelet entropy-based features.