926 results for Bayesian Mixture Model, Cavalieri Method, Trapezoidal Rule


Relevance: 40.00%

Abstract:

In this paper we develop a set of novel Markov Chain Monte Carlo algorithms for Bayesian smoothing of partially observed non-linear diffusion processes. The sampling algorithms developed herein use a deterministic approximation to the posterior distribution over paths as the proposal distribution for a mixture of an independence and a random walk sampler. The approximating distribution is sampled by simulating an optimized time-dependent linear diffusion process derived from the recently developed variational Gaussian process approximation method. The novel diffusion bridge proposal derived from the variational approximation allows the use of a flexible blocking strategy that further improves mixing, and thus the efficiency, of the sampling algorithms. The algorithms are tested on two diffusion processes: one with double-well potential drift and another with sine drift. The new algorithm's accuracy and efficiency are compared with state-of-the-art hybrid Monte Carlo based path sampling. It is shown that in practical, finite sample applications the algorithm is accurate except in the presence of large observation errors and low observation densities, which lead to a multi-modal structure in the posterior distribution over paths. More importantly, the variational approximation assisted sampling algorithm outperforms hybrid Monte Carlo in terms of computational efficiency, except when the diffusion process is densely observed with small errors, in which case both algorithms are equally efficient. © 2011 Springer-Verlag.
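The idea of mixing an independence sampler with a random walk sampler can be illustrated on a toy one-dimensional problem. The sketch below is not the path-space algorithm of the abstract: the bimodal target, the fixed Gaussian independence proposal (standing in for the variational approximation), and the names `log_target`, `log_gauss`, `mixture_mh`, and `p_indep` are all assumptions made for illustration.

```python
import math
import random


def log_target(x):
    # Unnormalized log-density of a toy bimodal target (assumption),
    # with modes near x = -1 and x = +1.
    return -2.0 * (x * x - 1.0) ** 2


def log_gauss(x, mean, std):
    # Log-density of a normal distribution, used by the independence proposal.
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def mixture_mh(n_steps, p_indep=0.5, rw_std=0.3, ind_std=1.5, seed=0):
    """Metropolis-Hastings with a random mixture of two proposal kernels."""
    rng = random.Random(seed)
    x = 0.0
    samples, accepted = [], 0
    for _ in range(n_steps):
        if rng.random() < p_indep:
            # Independence move: the proposal ignores the current state,
            # so the proposal densities enter the acceptance ratio.
            y = rng.gauss(0.0, ind_std)
            log_ratio = (log_target(y) - log_target(x)
                         + log_gauss(x, 0.0, ind_std) - log_gauss(y, 0.0, ind_std))
        else:
            # Random walk move: symmetric proposal, so the densities cancel.
            y = rng.gauss(x, rw_std)
            log_ratio = log_target(y) - log_target(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = y
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps
```

Calling `mixture_mh(5000)` returns the chain and its acceptance rate. The independence moves let the chain jump between modes, while the random walk moves explore each mode locally.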

Relevance: 40.00%

Abstract:

Bayesian adaptive methods have been extensively used in psychophysics to estimate the point at which performance on a task attains arbitrary percentage levels, although the statistical properties of these estimators have never been assessed. We used simulation techniques to determine the small-sample properties of Bayesian estimators of arbitrary performance points, specifically addressing the issues of bias and precision as a function of the target percentage level. The study covered three major types of psychophysical task (yes-no detection, 2AFC discrimination and 2AFC detection) and explored the entire range of target performance levels allowed for by each task. Other factors included in the study were the form and parameters of the actual psychometric function Psi, the form and parameters of the model function M assumed in the Bayesian method, and the location of Psi within the parameter space. Our results indicate that Bayesian adaptive methods render unbiased estimators of any arbitrary point on Psi only when M=Psi, and otherwise they yield bias whose magnitude can be considerable as the target level moves away from the midpoint of the range of Psi. The standard error of the estimator also increases as the target level approaches extreme values, whether or not M=Psi. Contrary to widespread belief, neither the performance level at which bias is null nor that at which standard error is minimal can be predicted by the sweat factor. A closed-form expression nevertheless gives a reasonable fit to data describing the dependence of standard error on number of trials and target level, which allows determination of the number of trials that must be administered to obtain estimates with prescribed precision.

Relevance: 40.00%

Abstract:

This work studied the drying kinetics of the organic fractions of municipal solid waste (MSW) samples with different initial moisture contents and presented a new method for determining drying kinetic parameters. A series of drying experiments at different temperatures was performed using a thermogravimetric technique. Based on the modified Page drying model and the general pattern search method, a new drying kinetic method was developed that uses multiple isothermal drying curves simultaneously. The new method fitted the experimental data more accurately than the traditional method. Drying kinetic behaviors under extrapolated conditions were also predicted and validated. The new method indicated that the drying activation energies for the samples with initial moisture contents of 31.1 and 17.2 % on a wet basis were 25.97 and 24.73 kJ mol−1, respectively. These results are useful for drying process simulation and industrial dryer design. The new method can also be applied to determine the drying parameters of other materials with high reliability.
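A drying curve of this kind can be sketched in a few lines. The code below assumes the commonly used form of the modified Page model, MR = exp(-(k t)^n), combined with an Arrhenius rate constant k = k0 exp(-Ea / (R T)); the abstract does not state the exact parameterization, so this form, the function names, and the values of `k0` and `n` are assumptions. Only the activation energy (25.97 kJ mol−1) is taken from the text.

```python
import math

R_GAS = 8.314  # universal gas constant, J mol^-1 K^-1


def rate_constant(temp_k, k0, ea_j_per_mol):
    # Arrhenius temperature dependence of the drying rate constant (assumed form).
    return k0 * math.exp(-ea_j_per_mol / (R_GAS * temp_k))


def moisture_ratio(t, temp_k, k0, ea_j_per_mol, n):
    # Modified Page model (assumed form): MR = exp(-(k * t) ** n).
    k = rate_constant(temp_k, k0, ea_j_per_mol)
    return math.exp(-((k * t) ** n))


# Hypothetical parameters for illustration; only EA comes from the abstract.
EA = 25970.0   # J mol^-1
K0 = 50.0      # pre-exponential factor, 1/min (hypothetical)
N_EXP = 1.2    # Page exponent (hypothetical)

curve_80c = [moisture_ratio(t, 353.15, K0, EA, N_EXP) for t in range(0, 181, 30)]
```

Because a single pair (`k0`, `Ea`) describes every temperature, several isothermal curves can share one parameter set, which is what makes a simultaneous fit across curves possible.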

Relevance: 40.00%

Abstract:

The cyclic phosphazene trimers [N3P3(OC6H5)5OC5H4N·Ti(Cp)2Cl][PF6] (3), [N3P3(OC6H4CH2CN·Ti(Cp)2Cl)6][PF6]6 (4), [N3P3(OC6H4-But)5(OC6H4CH2CN·Ti(Cp)2Cl)][PF6] (5), [N3P3(OC6H5)5C6H4CH2CN·Ru(Cp)(PPh3)2][PF6] (6), [N3P3(OC6H5)5C6H4CH2CN·Fe(Cp)(dppe)][PF6] (7) and N3P3(OC6H5)5OC5H4N·W(CO)5 (8) were prepared and characterized. As a model, the simple compounds [HOC5H5N·Ti(Cp)2Cl]PF6 (1) and [HOC6H4CH2CN·Ti(Cp)2Cl]PF6 (2) were also prepared and characterized. Pyrolysis of the organometallic cyclic trimers in air yields metallic nanostructured materials, which according to transmission and scanning electron microscopy (TEM/SEM), energy-dispersive X-ray microanalysis (EDX), and IR data, can be formulated as either a metal oxide, metal pyrophosphate or a mixture in some cases, depending on the nature and quantity of the metal, characteristics of the organic spacer and the auxiliary substituent attached to the phosphorus cycle. Atomic force microscopy (AFM) data indicate the formation of small island and striate nanostructures. A plausible formation mechanism which involves the formation of a cyclomatrix is proposed, and the pyrolysis of the organometallic cyclic phosphazene polymer as a new and general method for obtaining metallic nanostructured materials is discussed.

Relevance: 40.00%

Abstract:

Constant technological advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true for the analysis of biological data. For example, DNA sequence data can be viewed as categorical variables, with each nucleotide taking one of four categories. Gene expression data, depending on the quantification technology, can be continuous numbers or counts. With the advancement of high-throughput technology, such data have become unprecedentedly abundant. Efficient statistical approaches are therefore crucial in this big data era.

Previous statistical methods for big data often aim to find low-dimensional structures in the observed data. For example, a factor analysis model assumes a latent Gaussian-distributed multivariate vector. With this assumption, a factor model produces a low-rank estimate of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, which assumes that the mixture proportions of topics are represented by a Dirichlet-distributed variable. This dissertation proposes several novel extensions to these previous statistical methods, developed to address challenges in big data. The novel methods are applied in multiple real-world applications, including construction of condition-specific gene co-expression networks, estimation of shared topics among newsgroups, analysis of promoter sequences, analysis of political-economic risk data, and estimation of population structure from genotype data.

Relevance: 40.00%

Abstract:

The advances in three related areas of state-space modeling, sequential Bayesian learning, and decision analysis are addressed, with the statistical challenges of scalability and associated dynamic sparsity. The key theme that ties the three areas is Bayesian model emulation: solving challenging analysis/computational problems using creative model emulators. This idea defines theoretical and applied advances in non-linear, non-Gaussian state-space modeling, dynamic sparsity, decision analysis and statistical computation, across linked contexts of multivariate time series and dynamic networks studies. Examples and applications in financial time series and portfolio analysis, macroeconomics and internet studies from computational advertising demonstrate the utility of the core methodological innovations.

Chapter 1 summarizes the three areas/problems and the key idea of emulation in those areas. Chapter 2 discusses the sequential analysis of latent threshold models using emulating models that allow for analytical filtering to enhance the efficiency of posterior sampling. Chapter 3 examines the emulator model in decision analysis, or the synthetic model, that is equivalent to the loss function in the original minimization problem, and shows its performance in the context of sequential portfolio optimization. Chapter 4 describes a method for modeling streaming count data observed on a large network, which relies on emulating the whole, dependent network model by independent, conjugate sub-models customized to each set of flows. Chapter 5 reviews these advances and makes concluding remarks.

Relevance: 40.00%

Abstract:

Testing for differences within data sets is an important issue across various applications. Our work is primarily motivated by the analysis of microbiome composition, which has become increasingly relevant and important with the rise of DNA sequencing. We first review classical frequentist tests that are commonly used in tackling such problems. We then propose a Bayesian Dirichlet-multinomial framework for modeling the metagenomic data and for testing underlying differences between the samples. A parametric Dirichlet-multinomial model uses an intuitive hierarchical structure that allows for flexibility in characterizing both the within-group variation and the cross-group difference and provides very interpretable parameters. A computational method for evaluating the marginal likelihoods under the null and alternative hypotheses is also given. Through simulations, we show that our Bayesian model performs competitively against frequentist counterparts. We illustrate the method by analyzing metagenomic data from the Human Microbiome Project.
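The role of marginal likelihoods in such a test can be shown with the simplest conjugate case. The sketch below is not the hierarchical model of the abstract: it assumes a fixed Dirichlet prior `alpha`, a null hypothesis under which two count vectors share one composition, and an alternative under which each has its own. The function names are hypothetical.

```python
import math


def log_multivariate_beta(alpha):
    # log B(alpha) = sum_k log Gamma(alpha_k) - log Gamma(sum_k alpha_k)
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def log_multinomial_coef(counts):
    # log of N! / (x_1! ... x_K!)
    return math.lgamma(sum(counts) + 1) - sum(math.lgamma(c + 1) for c in counts)


def log_marginal(samples, alpha):
    """Log marginal likelihood of count vectors sharing one p ~ Dirichlet(alpha)."""
    pooled = [sum(col) for col in zip(*samples)]
    coef = sum(log_multinomial_coef(s) for s in samples)
    posterior = [a + c for a, c in zip(alpha, pooled)]
    return coef + log_multivariate_beta(posterior) - log_multivariate_beta(alpha)


def log_bayes_factor(x1, x2, alpha):
    # Alternative (separate compositions) versus null (shared composition).
    separate = log_marginal([x1], alpha) + log_marginal([x2], alpha)
    shared = log_marginal([x1, x2], alpha)
    return separate - shared
```

A positive log Bayes factor favors different compositions, and a negative one favors a shared composition. With richer priors the integrals are no longer available in closed form, which is where a dedicated computational method becomes necessary.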

Relevance: 40.00%

Abstract:

Bayesian methods offer a flexible and convenient probabilistic learning framework to extract interpretable knowledge from complex and structured data. Such methods can characterize dependencies among multiple levels of hidden variables and share statistical strength across heterogeneous sources. In the first part of this dissertation, we develop two dependent variational inference methods for full posterior approximation in non-conjugate Bayesian models through hierarchical mixture- and copula-based variational proposals, respectively. The proposed methods move beyond the widely used factorized approximation to the posterior and provide generic applicability to a broad class of probabilistic models with minimal model-specific derivations. In the second part of this dissertation, we design probabilistic graphical models to accommodate multimodal data, describe dynamical behaviors and account for task heterogeneity. In particular, the sparse latent factor model is able to reveal common low-dimensional structures from high-dimensional data. We demonstrate the effectiveness of the proposed statistical learning methods on both synthetic and real-world data.

Relevance: 40.00%

Abstract:

Surveys can collect important data that inform policy decisions and drive social science research. Large government surveys collect information from the U.S. population on a wide range of topics, including demographics, education, employment, and lifestyle. Analysis of survey data presents unique challenges. In particular, one needs to account for missing data, for complex sampling designs, and for measurement error. Conceptually, a survey organization could spend lots of resources getting high-quality responses from a simple random sample, resulting in survey data that are easy to analyze. However, this scenario often is not realistic. To address these practical issues, survey organizations can leverage the information available from other sources of data. For example, in longitudinal studies that suffer from attrition, they can use the information from refreshment samples to correct for potential attrition bias. They can use information from known marginal distributions or survey design to improve inferences. They can use information from gold standard sources to correct for measurement error.

This thesis presents novel approaches to combining information from multiple sources that address the three problems described above.

The first method addresses nonignorable unit nonresponse and attrition in a panel survey with a refreshment sample. Panel surveys typically suffer from attrition, which can lead to biased inference when basing analysis only on cases that complete all waves of the panel. Unfortunately, the panel data alone cannot inform the extent of the bias due to attrition, so analysts must make strong and untestable assumptions about the missing data mechanism. Many panel studies also include refreshment samples, which are data collected from a random sample of new individuals during some later wave of the panel. Refreshment samples offer information that can be utilized to correct for biases induced by nonignorable attrition while reducing reliance on strong assumptions about the attrition process. To date, these bias correction methods have not dealt with two key practical issues in panel studies: unit nonresponse in the initial wave of the panel and in the refreshment sample itself. As we illustrate, nonignorable unit nonresponse can significantly compromise the analyst's ability to use the refreshment samples for attrition bias correction. Thus, it is crucial for analysts to assess how sensitive their inferences, corrected for panel attrition, are to different assumptions about the nature of the unit nonresponse. We present an approach that facilitates such sensitivity analyses, both for suspected nonignorable unit nonresponse in the initial wave and in the refreshment sample. We illustrate the approach using simulation studies and an analysis of data from the 2007-2008 Associated Press/Yahoo News election panel study.

The second method incorporates informative prior beliefs about marginal probabilities into Bayesian latent class models for categorical data. The basic idea is to append synthetic observations to the original data such that (i) the empirical distributions of the desired margins match those of the prior beliefs, and (ii) the values of the remaining variables are left missing. The degree of prior uncertainty is controlled by the number of augmented records. Posterior inferences can be obtained via typical MCMC algorithms for latent class models, tailored to deal efficiently with the missing values in the concatenated data. We illustrate the approach using a variety of simulations based on data from the American Community Survey, including an example of how augmented records can be used to fit latent class models to data from stratified samples.
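The record-augmentation step of the second method can be sketched as follows. This covers only the data construction, not the MCMC for the latent class model. The function name `augment_with_prior_margin`, the dictionary-based record layout, and the use of `None` for missing values are assumptions made for illustration.

```python
def augment_with_prior_margin(records, variable, prior_probs, n_aug):
    """Append n_aug synthetic records whose `variable` margin matches prior_probs.

    All other variables in the synthetic records are left missing (None).
    A larger n_aug expresses a stronger (less uncertain) prior belief.
    """
    variables = list(records[0].keys())

    # Allocate integer counts per level by the largest-remainder rule,
    # so the counts sum exactly to n_aug.
    raw = {level: p * n_aug for level, p in prior_probs.items()}
    counts = {level: int(value) for level, value in raw.items()}
    leftover = n_aug - sum(counts.values())
    by_remainder = sorted(raw, key=lambda lv: raw[lv] - counts[lv], reverse=True)
    for level in by_remainder[:leftover]:
        counts[level] += 1

    synthetic = []
    for level, count in counts.items():
        for _ in range(count):
            row = {name: None for name in variables}
            row[variable] = level
            synthetic.append(row)
    return records + synthetic


# Hypothetical survey records and prior belief about the margin of "sex".
observed = [{"sex": "F", "edu": "BA"}, {"sex": "M", "edu": "HS"}]
augmented = augment_with_prior_margin(observed, "sex", {"F": 0.5, "M": 0.5}, 10)
```

The concatenated data can then be passed to a sampler that imputes the missing entries at each iteration, which is how the prior on the margin influences the posterior.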

The third method leverages the information from a gold standard survey to model reporting error. Survey data are subject to reporting error when respondents misunderstand the question or accidentally select the wrong response. Sometimes survey respondents knowingly select the wrong response, for example, by reporting a higher level of education than they actually have attained. We present an approach that allows an analyst to model reporting error by incorporating information from a gold standard survey. The analyst can specify various reporting error models and assess how sensitive their conclusions are to different assumptions about the reporting error process. We illustrate the approach using simulations based on data from the 1993 National Survey of College Graduates. We use the method to impute error-corrected educational attainments in the 2010 American Community Survey using the 2010 National Survey of College Graduates as the gold standard survey.

Relevance: 40.00%

Abstract:

A modified UNIFAC–VISCO group contribution method was developed for the correlation and prediction of the viscosity of ionic liquids as a function of temperature at 0.1 MPa. In this original approach, cations and anions were regarded as peculiar molecular groups. The significance of this approach comes from the ability to calculate the viscosity of mixtures of ionic liquids as well as pure ionic liquids. Binary interaction parameters for selected cations and anions were determined by fitting the experimental viscosity data available in the literature for selected ionic liquids. The temperature dependence of the viscosity of the cations and anions was fitted to a Vogel–Fulcher–Tammann (VFT) behavior. Binary interaction parameters and VFT-type fitting parameters were then used to determine the viscosity of pure ionic liquids and of mixtures of ionic liquids with different combinations of cations and anions to ensure the validity of the prediction method. Consequently, the viscosities of binary ionic liquid mixtures were calculated using this prediction method. In this work, the viscosity data of pure ionic liquids and of binary mixtures of ionic liquids are successfully calculated from 293.15 K to 363.15 K at 0.1 MPa. All calculated viscosity data showed excellent agreement with experimental data, with a relative absolute average deviation lower than 1.7%.
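The VFT temperature dependence mentioned above is often written as η = A exp(B / (T − T0)). The sketch below evaluates that form over the temperature range quoted in the abstract. It does not reproduce the UNIFAC–VISCO combinatorial and residual terms, and the parameter values `A`, `B`, and `T0` are hypothetical, not fitted values from the paper.

```python
import math


def vft_viscosity(temp_k, a, b, t0):
    # Vogel-Fulcher-Tammann form: eta = a * exp(b / (T - t0)), valid for T > t0.
    if temp_k <= t0:
        raise ValueError("temperature must exceed t0")
    return a * math.exp(b / (temp_k - t0))


# Hypothetical VFT parameters, for illustration only.
A, B, T0 = 0.1, 900.0, 160.0   # mPa s, K, K

# Temperature range taken from the abstract: 293.15 K to 363.15 K.
temperatures = [293.15 + 10.0 * i for i in range(8)]
viscosities = [vft_viscosity(t, A, B, T0) for t in temperatures]
```

With a positive `B`, the computed viscosity falls monotonically as temperature rises, which matches the qualitative behavior expected of ionic liquids.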

Relevance: 40.00%

Abstract:

A novel surrogate model is proposed in lieu of Computational Fluid Dynamics (CFD) solvers, for fast nonlinear aerodynamic and aeroelastic modeling. A nonlinear function is identified on selected interpolation points by a discrete empirical interpolation method (DEIM). The flow field is then reconstructed using a least-squares approximation of the flow modes extracted by proper orthogonal decomposition (POD). The aeroelastic reduced order model (ROM) is completed by introducing a nonlinear mapping function between displacements and the DEIM points. The proposed model is used to predict the aerodynamic forces due to forced motions of a NACA 0012 airfoil undergoing a prescribed pitching oscillation. To investigate aeroelastic problems at transonic conditions, pitch/plunge airfoil and cropped delta wing aeroelastic models are built using linear structural models. The presence of shock waves triggers the appearance of limit cycle oscillations (LCO), which the model is able to predict. For all cases tested, the new ROM shows the ability to replicate the nonlinear aerodynamic forces and structural displacements and to reconstruct the complete flow field with sufficient accuracy at a fraction of the cost of the full-order CFD model.

Relevance: 40.00%

Abstract:

A large eddy simulation is performed to study the deflagration to detonation transition phenomenon in an obstructed channel containing a premixed stoichiometric hydrogen–air mixture. Two-dimensional filtered reactive Navier–Stokes equations are solved utilizing the artificially thickened flame approach (ATF) for modeling sub-grid scale combustion. To include the effect of induction time, a 27-step detailed mechanism is utilized along with an in situ adaptive tabulation (ISAT) method to reduce the computational cost due to the detailed chemistry. The results show that in the slow flame propagation regime, the flame–vortex interaction and the resulting flame folding and wrinkling are the main mechanisms for the increase of the flame surface and consequently acceleration of the flame. Furthermore, at high speed, the major mechanisms responsible for flame propagation are repeated reflected shock–flame interactions and the resulting baroclinic vorticity. These interactions intensify the rate of heat release and maintain the turbulence and flame speed at a high level. During the flame acceleration, it is seen that the turbulent flame enters the ‘thickened reaction zones’ regime. Therefore, it is necessary to utilize a chemistry-based combustion model with detailed chemical kinetics to properly capture the salient features of the fast deflagration propagation.

Relevance: 40.00%

Abstract:

The viscosity of ionic liquids (ILs) has been modeled as a function of temperature and at atmospheric pressure using a new method based on the UNIFAC–VISCO method. This model extends the calculations previously reported by our group (see Zhao et al., J. Chem. Eng. Data 2016, 61, 2160–2169), which used 154 experimental viscosity data points of 25 ionic liquids for regression of a set of binary interaction parameters and ion Vogel–Fulcher–Tammann (VFT) parameters. Discrepancies in the experimental data of the same IL affect the quality of the correlation and thus the development of the predictive method. In this work, mathematical gnostics was used to analyze the experimental data from different sources and recommend one set of reliable data for each IL. These recommended data (819 data points in total) for 70 ILs were correlated using this model to obtain an extended set of binary interaction parameters and ion VFT parameters, with a regression accuracy of 1.4%. In addition, 966 experimental viscosity data points for 11 binary mixtures of ILs were collected from the literature to establish this model. The binary data consist of 128 training data points used for the optimization of binary interaction parameters and 838 test data points used for the comparison of the pure evaluated values. The relative average absolute deviation (RAAD) for training and test is 2.9% and 3.9%, respectively.
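The RAAD figures quoted above can be computed as sketched below, assuming the usual definition RAAD = (100 / N) Σ |η_calc − η_exp| / η_exp. The abstract does not spell out the formula, so this definition and the function name `raad_percent` are assumptions, and the data values are invented for illustration.

```python
def raad_percent(experimental, calculated):
    # Relative average absolute deviation in percent (assumed definition):
    # mean of |calc - exp| / exp over all data points, times 100.
    if len(experimental) != len(calculated) or not experimental:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(c - e) / e for e, c in zip(experimental, calculated))
    return 100.0 * total / len(experimental)


# Invented viscosity values (mPa s), for illustration only.
eta_exp = [1.0, 2.0]
eta_calc = [1.1, 1.8]
deviation = raad_percent(eta_exp, eta_calc)
```

Reporting the metric separately for training and test points, as the abstract does, shows how well the fitted interaction parameters carry over to data not used in the regression.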

Relevance: 40.00%

Abstract:

A novel surrogate model is proposed in lieu of computational fluid dynamics (CFD) code for fast nonlinear aerodynamic modeling. First, a nonlinear function is identified on selected interpolation points defined by the discrete empirical interpolation method (DEIM). The flow field is then reconstructed by a least-squares approximation of flow modes extracted by proper orthogonal decomposition (POD). The proposed model is applied to the prediction of limit cycle oscillations for a plunge/pitch airfoil and a delta wing with a linear structural model, and the results are validated against a time-accurate CFD-FEM code. The results show that the model is able to replicate the aerodynamic forces and flow fields with sufficient accuracy while requiring a fraction of the CFD cost.