3 resultados para Analysis Model

em Duke University


Relevância:

70.00% 70.00%

Publicador:

Resumo:

Multi-output Gaussian processes provide a convenient framework for multi-task problems. An illustrative and motivating example of a multi-task problem is multi-region electrophysiological time-series data, where experimentalists are interested in both power and phase coherence between channels. Recently, the spectral mixture (SM) kernel was proposed to model the spectral density of a single task in a Gaussian process framework. This work develops a novel covariance kernel for multiple outputs, called the cross-spectral mixture (CSM) kernel. This new, flexible kernel represents both the power and phase relationship between multiple observation channels. The expressive capabilities of the CSM kernel are demonstrated through implementation of 1) a Bayesian hidden Markov model, where the emission distribution is a multi-output Gaussian process with a CSM covariance kernel, and 2) a Gaussian process factor analysis model, where factor scores represent the utilization of cross-spectral neural circuits. Results are presented for measured multi-region electrophysiological data.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Constant technology advances have caused data explosion in recent years. Accord- ingly modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This phenomenon is particularly true for an- alyzing biological data. For example DNA sequence data can be viewed as categorical variables with each nucleotide taking four different categories. The gene expression data, depending on the quantitative technology, could be continuous numbers or counts. With the advancement of high-throughput technology, the abundance of such data becomes unprecedentedly rich. Therefore efficient statistical approaches are crucial in this big data era.

Previous statistical methods for big data often aim to find low dimensional struc- tures in the observed data. For example in a factor analysis model a latent Gaussian distributed multivariate vector is assumed. With this assumption a factor model produces a low rank estimation of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents. The mixture pro- portions of topics, represented by a Dirichlet distributed variable, is assumed. This dissertation proposes several novel extensions to the previous statistical methods that are developed to address challenges in big data. Those novel methods are applied in multiple real world applications including construction of condition specific gene co-expression networks, estimating shared topics among newsgroups, analysis of pro- moter sequences, analysis of political-economics risk data and estimating population structure from genotype data.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Intraoperative assessment of surgical margins is critical to ensuring residual tumor does not remain in a patient. Previously, we developed a fluorescence structured illumination microscope (SIM) system with a single-shot field of view (FOV) of 2.1 × 1.6 mm (3.4 mm2) and sub-cellular resolution (4.4 μm). The goal of this study was to test the utility of this technology for the detection of residual disease in a genetically engineered mouse model of sarcoma. Primary soft tissue sarcomas were generated in the hindlimb and after the tumor was surgically removed, the relevant margin was stained with acridine orange (AO), a vital stain that brightly stains cell nuclei and fibrous tissues. The tissues were imaged with the SIM system with the primary goal of visualizing fluorescent features from tumor nuclei. Given the heterogeneity of the background tissue (presence of adipose tissue and muscle), an algorithm known as maximally stable extremal regions (MSER) was optimized and applied to the images to specifically segment nuclear features. A logistic regression model was used to classify a tissue site as positive or negative by calculating area fraction and shape of the segmented features that were present and the resulting receiver operator curve (ROC) was generated by varying the probability threshold. Based on the ROC curves, the model was able to classify tumor and normal tissue with 77% sensitivity and 81% specificity (Youden's index). For an unbiased measure of the model performance, it was applied to a separate validation dataset that resulted in 73% sensitivity and 80% specificity. When this approach was applied to representative whole margins, for a tumor probability threshold of 50%, only 1.2% of all regions from the negative margin exceeded this threshold, while over 14.8% of all regions from the positive margin exceeded this threshold.