894 resultados para Gaussian mixture model
Resumo:
Cigarette smoke is a complex mixture of more than 4000 hazardous chemicals including the carcinogenic benzopyrenes. Nicotine, the most potent component of tobacco, is responsible for the addictive nature of cigarettes and is a major component of e-cigarette cartridges. Our study aims to investigate the toxicity of nicotine with special emphasis on the replacement of animals. Furthermore, we intend to study the effect of nicotine, cigarette smoke and e-cigarette vapours on human airways. In our current work, the BEAS 2B human bronchial epithelial cell line was used to analyse the effect of nicotine in isolation, on cell viability. Concentrations of nicotine from 1.1µM to 75µM were added to 5x105 cells per well in a 96 well plate and incubated for 24 hours. Cell titre blue results showed that all the nicotine treated cells were more metabolically active than the control wells (cells alone). These data indicate that, under these conditions, nicotine does not affect cell viability and in fact, suggests that there is a stimulatory effect of nicotine on metabolism. We are now furthering this finding by investigating the pro-inflammatory response of these cells to nicotine by measuring cytokine secretion via ELISA. Further work includes analysing nicotine exposure at different time points and on other epithelial cells lines like Calu-3.
Resumo:
The paper presents the simulation of the pyrolysis vapors condensation process using an Eulerian approach. The condensable volatiles produced by the fast pyrolysis of biomass in a 100 g/h bubbling fluidized bed reactor are condensed in a water cooled condenser. The vapors enter the condenser at 500 °C, and the water temperature is 15 °C. The properties of the vapor phase are calculated according to the mole fraction of its individual compounds. The saturated vapor pressure is calculated for the vapor mixture using a corresponding states correlation and assuming that the mixture of the condensable compounds behave as a pure fluid. Fluent 6.3 has been used as the simulation platform, while the condensation model has been incorporated to the main code using an external user defined function. © 2011 American Chemical Society.
Resumo:
Heterogeneous datasets arise naturally in most applications due to the use of a variety of sensors and measuring platforms. Such datasets can be heterogeneous in terms of the error characteristics and sensor models. Treating such data is most naturally accomplished using a Bayesian or model-based geostatistical approach; however, such methods generally scale rather badly with the size of dataset, and require computationally expensive Monte Carlo based inference. Recently within the machine learning and spatial statistics communities many papers have explored the potential of reduced rank representations of the covariance matrix, often referred to as projected or fixed rank approaches. In such methods the covariance function of the posterior process is represented by a reduced rank approximation which is chosen such that there is minimal information loss. In this paper a sequential Bayesian framework for inference in such projected processes is presented. The observations are considered one at a time which avoids the need for high dimensional integrals typically required in a Bayesian approach. A C++ library, gptk, which is part of the INTAMAP web service, is introduced which implements projected, sequential estimation and adds several novel features. In particular the library includes the ability to use a generic observation operator, or sensor model, to permit data fusion. It is also possible to cope with a range of observation error characteristics, including non-Gaussian observation errors. Inference for the covariance parameters is explored, including the impact of the projected process approximation on likelihood profiles. We illustrate the projected sequential method in application to synthetic and real datasets. Limitations and extensions are discussed. © 2010 Elsevier Ltd.
Resumo:
In questo studio, un multi-model ensemble è stato implementato e verificato, seguendo una delle priorità di ricerca del Subseasonal to Seasonal Prediction Project (S2S). Una regressione lineare è stata applicata ad un insieme di previsioni di ensemble su date passate, prodotte dai centri di previsione mensile del CNR-ISAC e ECMWF-IFS. Ognuna di queste contiene un membro di controllo e quattro elementi perturbati. Le variabili scelte per l'analisi sono l'altezza geopotenziale a 500 hPa, la temperatura a 850 hPa e la temperatura a 2 metri, la griglia spaziale ha risoluzione 1 ◦ × 1 ◦ lat-lon e sono stati utilizzati gli inverni dal 1990 al 2010. Le rianalisi di ERA-Interim sono utilizzate sia per realizzare la regressione, sia nella validazione dei risultati, mediante stimatori nonprobabilistici come lo scarto quadratico medio (RMSE) e la correlazione delle anomalie. Successivamente, tecniche di Model Output Statistics (MOS) e Direct Model Output (DMO) sono applicate al multi-model ensemble per ottenere previsioni probabilistiche per la media settimanale delle anomalie di temperatura a 2 metri. I metodi MOS utilizzati sono la regressione logistica e la regressione Gaussiana non-omogenea, mentre quelli DMO sono il democratic voting e il Tukey plotting position. Queste tecniche sono applicate anche ai singoli modelli in modo da effettuare confronti basati su stimatori probabilistici, come il ranked probability skill score, il discrete ranked probability skill score e il reliability diagram. Entrambe le tipologie di stimatori mostrano come il multi-model abbia migliori performance rispetto ai singoli modelli. Inoltre, i valori più alti di stimatori probabilistici sono ottenuti usando una regressione logistica sulla sola media di ensemble. Applicando la regressione a dataset di dimensione ridotta, abbiamo realizzato una curva di apprendimento che mostra come un aumento del numero di date nella fase di addestramento non produrrebbe ulteriori miglioramenti.
Resumo:
Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size, despite huge increases in the value of n typically seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n=all" is of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.
Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.
One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
Latent class models for the joint distribution of multivariate categorical, such as the PARAFAC decomposition, data play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.
In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis--Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis--Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.
The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo. The Markov Chain Monte Carlo method is the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov Chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.
Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
Resumo:
Constant technology advances have caused data explosion in recent years. Accord- ingly modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This phenomenon is particularly true for an- alyzing biological data. For example DNA sequence data can be viewed as categorical variables with each nucleotide taking four different categories. The gene expression data, depending on the quantitative technology, could be continuous numbers or counts. With the advancement of high-throughput technology, the abundance of such data becomes unprecedentedly rich. Therefore efficient statistical approaches are crucial in this big data era.
Previous statistical methods for big data often aim to find low dimensional struc- tures in the observed data. For example in a factor analysis model a latent Gaussian distributed multivariate vector is assumed. With this assumption a factor model produces a low rank estimation of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents. The mixture pro- portions of topics, represented by a Dirichlet distributed variable, is assumed. This dissertation proposes several novel extensions to the previous statistical methods that are developed to address challenges in big data. Those novel methods are applied in multiple real world applications including construction of condition specific gene co-expression networks, estimating shared topics among newsgroups, analysis of pro- moter sequences, analysis of political-economics risk data and estimating population structure from genotype data.
Resumo:
Diffuse intrinsic pontine glioma (DIPG) is a rare and incurable brain tumor that arises predominately in children and involves the pons, a structure that along with the midbrain and medulla makes up the brainstem. We have previously developed genetically engineered mouse models of brainstem glioma using the RCAS/Tv-a system by targeting PDGF-B overexpression, p53 loss, and H3.3K27M mutation to Nestin-expressing brainstem progenitor cells of the neonatal mouse. Here we describe a novel mouse model targeting these same genetic alterations to Pax3-expressing cells, which in the neonatal mouse pons consist of a Pax3+/Nestin+/Sox2+ population lining the fourth ventricle and a Pax3+/NeuN+ parenchymal population. Injection of RCAS-PDGF-B into the brainstem of Pax3-Tv-a mice at postnatal day 3 results in 40% of mice developing asymptomatic low-grade glioma. A mixture of low- and high-grade glioma results from injection of Pax3-Tv-a;p53(fl/fl) mice with RCAS-PDGF-B and RCAS-Cre, with or without RCAS-H3.3K27M. These tumors are Ki67+, Nestin+, Olig2+, and largely GFAP- and can arise anywhere within the brainstem, including the classic DIPG location of the ventral pons. Expression of the H3.3K27M mutation reduces overall H3K27me3 as compared with tumors without the mutation, similar to what has been previously shown in human and mouse tumors. Thus, we have generated a novel genetically engineered mouse model of DIPG, which faithfully recapitulates the human disease and represents a novel platform with which to study the biology and treatment of this deadly disease.
Resumo:
Objective
Pedestrian detection under video surveillance systems has always been a hot topic in computer vision research. These systems are widely used in train stations, airports, large commercial plazas, and other public places. However, pedestrian detection remains difficult because of complex backgrounds. Given its development in recent years, the visual attention mechanism has attracted increasing attention in object detection and tracking research, and previous studies have achieved substantial progress and breakthroughs. We propose a novel pedestrian detection method based on the semantic features under the visual attention mechanism.
Method
The proposed semantic feature-based visual attention model is a spatial-temporal model that consists of two parts: the static visual attention model and the motion visual attention model. The static visual attention model in the spatial domain is constructed by combining bottom-up with top-down attention guidance. Based on the characteristics of pedestrians, the bottom-up visual attention model of Itti is improved by intensifying the orientation vectors of elementary visual features to make the visual saliency map suitable for pedestrian detection. In terms of pedestrian attributes, skin color is selected as a semantic feature for pedestrian detection. The regional and Gaussian models are adopted to construct the skin color model. Skin feature-based visual attention guidance is then proposed to complete the top-down process. The bottom-up and top-down visual attentions are linearly combined using the proper weights obtained from experiments to construct the static visual attention model in the spatial domain. The spatial-temporal visual attention model is then constructed via the motion features in the temporal domain. Based on the static visual attention model in the spatial domain, the frame difference method is combined with optical flowing to detect motion vectors. Filtering is applied to process the field of motion vectors. The saliency of motion vectors can be evaluated via motion entropy to make the selected motion feature more suitable for the spatial-temporal visual attention model.
Result
Standard datasets and practical videos are selected for the experiments. The experiments are performed on a MATLAB R2012a platform. The experimental results show that our spatial-temporal visual attention model demonstrates favorable robustness under various scenes, including indoor train station surveillance videos and outdoor scenes with swaying leaves. Our proposed model outperforms the visual attention model of Itti, the graph-based visual saliency model, the phase spectrum of quaternion Fourier transform model, and the motion channel model of Liu in terms of pedestrian detection. The proposed model achieves a 93% accuracy rate on the test video.
Conclusion
This paper proposes a novel pedestrian method based on the visual attention mechanism. A spatial-temporal visual attention model that uses low-level and semantic features is proposed to calculate the saliency map. Based on this model, the pedestrian targets can be detected through focus of attention shifts. The experimental results verify the effectiveness of the proposed attention model for detecting pedestrians.
Resumo:
A large eddy simulation is performed to study the deflagration to detonation transition phenomenon in an obstructed channel containing premixed stoichiometric hydrogen–air mixture. Two-dimensional filtered reactive Navier–Stokes equations are solved utilizing the artificially thickened flame approach (ATF) for modeling sub-grid scale combustion. To include the effect of induction time, a 27-step detailed mechanism is utilized along with an in situ adaptive tabulation (ISAT) method to reduce the computational cost due to the detailed chemistry. The results show that in the slow flame propagation regime, the flame–vortex interaction and the resulting flame folding and wrinkling are the main mechanisms for the increase of the flame surface and consequently acceleration of the flame. Furthermore, at high speed, the major mechanisms responsible for flame propagation are repeated reflected shock–flame interactions and the resulting baroclinic vorticity. These interactions intensify the rate of heat release and maintain the turbulence and flame speed at high level. During the flame acceleration, it is seen that the turbulent flame enters the ‘thickened reaction zones’ regime. Therefore, it is necessary to utilize the chemistry based combustion model with detailed chemical kinetics to properly capture the salient features of the fast deflagration propagation.
Resumo:
Understanding how aquatic species grow is fundamental in fisheries because stock assessment often relies on growth dependent statistical models. Length-frequency-based methods become important when more applicable data for growth model estimation are either not available or very expensive. In this article, we develop a new framework for growth estimation from length-frequency data using a generalized von Bertalanffy growth model (VBGM) framework that allows for time-dependent covariates to be incorporated. A finite mixture of normal distributions is used to model the length-frequency cohorts of each month with the means constrained to follow a VBGM. The variances of the finite mixture components are constrained to be a function of mean length, reducing the number of parameters and allowing for an estimate of the variance at any length. To optimize the likelihood, we use a minorization–maximization (MM) algorithm with a Nelder–Mead sub-step. This work was motivated by the decline in catches of the blue swimmer crab (BSC) (Portunus armatus) off the east coast of Queensland, Australia. We test the method with a simulation study and then apply it to the BSC fishery data.
Resumo:
This thesis considers a three- dimensional numerical model based on 3-D Navier— Stokes and continuity equations involving various wind speeds (North west), water surface levels, horizontal shier stresses, eddy viscosity, densities of oil and gas condensate- water mixture flows. The model is used to simulate the prediction of the surface movement of oil and gas condensate slicks from spill accident in the north coasts of Persian Gulf.
Resumo:
The Li-ion rechargeable battery (LIB) is widely used as an energy storage device, but has significant limitations in battery cycle life and safety. During initial charging, decomposition of the ethylene carbonate (EC)-based electrolytes of the LIB leads to the formation of a passivating layer on the anode known as the solid electrolyte interphase (SEI). The formation of an SEI has great impact on the cycle life and safety of LIB, yet mechanistic aspects of SEI formation are not fully understood. In this dissertation, two surface science model systems have been created under ultra-high vacuum (UHV) to probe the very initial stage of SEI formation at the model carbon anode surfaces of LIB. The first model system, Model System I, is an lithium-carbonate electrolyte/graphite C(0001) system. I have developed a temperature programmed desorption/temperature programmed reaction spectroscopy (TPD/TPRS) instrument as part of my dissertation to study Model System I in quantitative detail. The binding strengths and film growth mechanisms of key electrolyte molecules on model carbon anode surfaces with varying extents of lithiation were measured by TPD. TPRS was further used to track the gases evolved from different reduction products in the early-stage SEI formation. The branching ratio of multiple reaction pathways was quantified for the first time and determined to be 70.% organolithium products vs. 30% inorganic lithium product. The obtained branching ratio provides important information on the distribution of lithium salts that form at the very onset of SEI formation. One of the key reduction products formed from EC in early-stage SEI formation is lithium ethylene dicarbonate (LEDC). Despite intensive studies, the LEDC structure in either the bulk or thin-film (SEI) form is unknown. To enable structural study, pure LEDC was synthesized and subject to synchrotron X-ray diffraction measurements (bulk material) and STM measurements (deposited films). To enable studies of LEDC thin films, Model System II, a lithium ethylene dicarbonate (LEDC)-dimethylformamide (DMF)/Ag(111) system was created by a solution microaerosol deposition technique. Produced films were then imaged by ultra-high vacuum scanning tunneling microscopy (UHV-STM). As a control, the dimethylformamide (DMF)-Ag(111) system was first prepared and its complex 2D phase behavior was mapped out as a function of coverage. The evolution of three distinct monolayer phases of DMF was observed with increasing surface pressure — a 2D gas phase, an ordered DMF phase, and an ordered Ag(DMF)2 complex phase. The addition of LEDC to this mixture, seeded the nucleation of the ordered DMF islands at lower surface pressures (DMF coverages), and was interpreted through nucleation theory. A structural model of the nucleation seed was proposed, and the implication of ionic SEI products, such as LEDC, in early-stage SEI formation was discussed.
Resumo:
Dissertação (mestrado)—Universidade de Brasília, Departamento de Administração, Programa de Pós-graduação em Administração, 2016.