926 results for Bayesian Mixture Model, Cavalieri Method, Trapezoidal Rule
Abstract:
The aim of this study was to determine the most informative sampling time(s) providing a precise prediction of tacrolimus area under the concentration-time curve (AUC). Fifty-four concentration-time profiles of tacrolimus from 31 adult liver transplant recipients were analyzed. Each profile contained 5 tacrolimus whole-blood concentrations (predose and 1, 2, 4, and 6 or 8 hours postdose), measured using liquid chromatography-tandem mass spectrometry. The concentration at 6 hours was interpolated for each profile, and 54 values of AUC(0-6) were calculated using the trapezoidal rule. The best sampling times were then determined using limited sampling strategies and sensitivity analysis. Linear mixed-effects modeling was performed to estimate regression coefficients of equations incorporating each concentration-time point (C0, C1, C2, C4, interpolated C5, and interpolated C6) as a predictor of AUC(0-6). Predictive performance was evaluated by assessment of the mean error (ME) and root mean square error (RMSE). Limited sampling strategy (LSS) equations with C2, C4, and C5 provided similar results for prediction of AUC(0-6) (R² = 0.869, 0.844, and 0.832, respectively). These 3 time points were superior to C0 in the prediction of AUC. The ME was similar for all time points; the RMSE was smallest for C2, C4, and C5. The highest sensitivity index was determined to be 4.9 hours postdose at steady state, suggesting that this time point provides the most information about the AUC(0-12). The results from limited sampling strategies and sensitivity analysis supported the use of a single blood sample at 5 hours postdose as a predictor of both AUC(0-6) and AUC(0-12). A jackknife procedure was used to evaluate the predictive performance of the model, and this demonstrated that collecting a sample at 5 hours after dosing could be considered as the optimal sampling time for predicting AUC(0-6).
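A minimal Python sketch of the two calculations named in this abstract, the trapezoidal-rule AUC(0-6) and a one-point limited sampling regression with ME/RMSE; the concentration profiles and the regression on C5 below are illustrative assumptions, not the study's data or fitted coefficients.

```python
# Minimal sketch: trapezoidal-rule AUC(0-6) and a one-point limited sampling
# strategy (LSS) regression. The concentration profiles below are illustrative
# placeholders, not data from the study.
import numpy as np

times = np.array([0.0, 1.0, 2.0, 4.0, 5.0, 6.0])            # hours post-dose
conc = np.array([[5.1, 14.2, 11.8, 8.0, 6.9, 6.1],          # one row per profile, ng/mL
                 [4.3, 16.0, 12.5, 7.4, 6.2, 5.5],
                 [6.0, 12.1, 10.9, 8.8, 7.5, 6.8]])

# AUC(0-6) for each profile by the trapezoidal rule
auc_0_6 = np.sum(0.5 * (conc[:, 1:] + conc[:, :-1]) * np.diff(times), axis=1)

# One-point LSS: linear regression of AUC(0-6) on a single concentration (here C5)
c5 = conc[:, times == 5.0].ravel()
slope, intercept = np.polyfit(c5, auc_0_6, 1)
pred = intercept + slope * c5

me = np.mean(pred - auc_0_6)                     # mean error (bias)
rmse = np.sqrt(np.mean((pred - auc_0_6) ** 2))   # root mean square error
print(f"AUC(0-6) = {auc_0_6}, ME = {me:.3f}, RMSE = {rmse:.3f}")
```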
Abstract:
Adsorption of binary mixtures onto activated carbon Norit R1 for the nitrogen-methane-carbon dioxide system was investigated over the pressure range up to 15 MPa. A new model is proposed to describe the experimental data. It is based on the assumption that an activated carbon can be characterized by the distribution function of elements of adsorption volume (EAV) over the solid-fluid potential. This function may be evaluated from pure-component isotherms using the equality of the chemical potentials in the adsorbed phase and in the bulk phase for each EAV. In the case of mixture adsorption, a simple combining rule is proposed that allows the adsorbed-phase density and composition in each EAV to be determined at a given pressure and bulk-phase composition. The adsorbed concentration of each adsorbate is the integral of its density over the set of EAV. Comparison with experimental data on binary mixtures has shown that the approach works reasonably well. In the case of high-pressure binary mixture adsorption, when only the total amount adsorbed was measured, the proposed model allows the partial amounts of the adsorbed components to be determined reliably.
Abstract:
Time-course experiments with microarrays are often used to study dynamic biological systems and genetic regulatory networks (GRNs) that model how genes influence each other in cell-level development of organisms. Inference for GRNs provides important insights into fundamental biological processes such as growth and is useful in disease diagnosis and genomic drug design. Due to the experimental design, multilevel data hierarchies are often present in time-course gene expression data. Most existing methods, however, ignore the dependency of the expression measurements over time and the correlation among gene expression profiles. Such independence assumptions are at odds with the regulatory interactions themselves, can result in overlooking important subject effects, and can lead to spurious inference for regulatory networks or mechanisms. In this paper, a multilevel mixed-effects model is adopted to incorporate data hierarchies in the analysis of time-course data, where temporal and subject effects are both assumed to be random. The method starts with the clustering of genes by fitting the mixture model within the multilevel random-effects model framework using the expectation-maximization (EM) algorithm. The network of regulatory interactions is then determined by searching for regulatory control elements (activators and inhibitors) shared by the clusters of co-expressed genes, based on a time-lagged correlation coefficient measure. The method is applied to two real time-course datasets from the budding yeast (Saccharomyces cerevisiae) genome. It is shown that the proposed method provides clusters of cell-cycle regulated genes that are supported by existing gene function annotations, and hence enables inference on regulatory interactions for the genetic network.
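A short sketch of the time-lagged correlation idea mentioned above, assuming synthetic expression series and a simple Pearson correlation at integer lags; this is only an illustration of the computation, not the paper's datasets or full network-inference procedure.

```python
# Sketch of a time-lagged correlation between a candidate regulator profile and
# a target gene profile; the expression series are synthetic and only
# illustrate the computation.
import numpy as np

rng = np.random.default_rng(0)
regulator = np.sin(np.linspace(0, 2 * np.pi, 20)) + 0.1 * rng.normal(size=20)
target = np.roll(regulator, 2) + 0.1 * rng.normal(size=20)   # responds ~2 time points later

def lagged_corr(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag]."""
    if lag == 0:
        return np.corrcoef(x, y)[0, 1]
    return np.corrcoef(x[:-lag], y[lag:])[0, 1]

# The lag with the strongest absolute correlation suggests a candidate regulatory link
scores = {lag: lagged_corr(regulator, target, lag) for lag in range(5)}
best_lag = max(scores, key=lambda k: abs(scores[k]))
print(scores, "best lag:", best_lag)
```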
Abstract:
It is well known that one of the obstacles to effective forecasting of exchange rates is heteroscedasticity (non-stationary conditional variance). The autoregressive conditional heteroscedastic (ARCH) model and its variants have been used to estimate a time dependent variance for many financial time series. However, such models are essentially linear in form and we can ask whether a non-linear model for variance can improve results just as non-linear models (such as neural networks) for the mean have done. In this paper we consider two neural network models for variance estimation. Mixture Density Networks (Bishop 1994, Nix and Weigend 1994) combine a Multi-Layer Perceptron (MLP) and a mixture model to estimate the conditional data density. They are trained using a maximum likelihood approach. However, it is known that maximum likelihood estimates are biased and lead to a systematic under-estimate of variance. More recently, a Bayesian approach to parameter estimation has been developed (Bishop and Qazaz 1996) that shows promise in removing the maximum likelihood bias. However, up to now, this model has not been used for time series prediction. Here we compare these algorithms with two other models to provide benchmark results: a linear model (from the ARIMA family), and a conventional neural network trained with a sum-of-squares error function (which estimates the conditional mean of the time series with a constant variance noise model). This comparison is carried out on daily exchange rate data for five currencies.
Abstract:
In this paper we present a novel method for emulating a stochastic, or random output, computer model and show its application to a complex rabies model. The method is evaluated both in terms of accuracy and computational efficiency on synthetic data and the rabies model. We address the issue of experimental design and provide empirical evidence on the effectiveness of utilizing replicate model evaluations compared to a space-filling design. We employ the Mahalanobis error measure to validate the heteroscedastic Gaussian process based emulator predictions for both the mean and (co)variance. The emulator allows efficient screening to identify important model inputs and better understanding of the complex behaviour of the rabies model.
Abstract:
In this paper, we propose a speech recognition engine using a hybrid of the Hidden Markov Model (HMM) and the Gaussian Mixture Model (GMM). Both models are trained independently; their likelihood values are then considered jointly and fed to a decision logic that provides the net likelihood as output. This hybrid model has been compared with the HMM alone. Training and testing were done using a database of 20 Hindi words spoken by 80 different speakers. The recognition rate achieved by the plain HMM is 83.5%, which increases to 85% using the hybrid HMM-GMM approach.
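A toy sketch of the score-fusion decision logic described above; the word list, the log-likelihood values, and the equal-weight fusion rule are all assumptions for illustration, not the paper's trained models or exact combination rule.

```python
# Sketch of a decision logic that fuses per-word HMM and GMM scores into a
# single "net likelihood". The log-likelihood numbers are placeholders and the
# equal-weight linear fusion is an assumed rule, not the paper's exact scheme.
vocab = ["ek", "do", "teen"]                                 # example vocabulary entries
hmm_loglik = {"ek": -210.4, "do": -198.7, "teen": -225.1}    # scores from HMM decoders
gmm_loglik = {"ek": -180.2, "do": -176.9, "teen": -190.5}    # scores from per-word GMMs

def net_loglik(word, w_hmm=0.5, w_gmm=0.5):
    """Weighted sum of the two model scores for one candidate word."""
    return w_hmm * hmm_loglik[word] + w_gmm * gmm_loglik[word]

recognized = max(vocab, key=net_loglik)                      # pick the highest net likelihood
print("recognized word:", recognized)
```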
Abstract:
Heterogeneous datasets arise naturally in most applications due to the use of a variety of sensors and measuring platforms. Such datasets can be heterogeneous in terms of the error characteristics and sensor models. Treating such data is most naturally accomplished using a Bayesian or model-based geostatistical approach; however, such methods generally scale rather badly with the size of the dataset and require computationally expensive Monte Carlo based inference. Recently, within the machine learning and spatial statistics communities, many papers have explored the potential of reduced rank representations of the covariance matrix, often referred to as projected or fixed rank approaches. In such methods the covariance function of the posterior process is represented by a reduced rank approximation which is chosen such that there is minimal information loss. In this paper a sequential Bayesian framework for inference in such projected processes is presented. The observations are considered one at a time, which avoids the need for the high dimensional integrals typically required in a Bayesian approach. A C++ library, gptk, which is part of the INTAMAP web service, is introduced; it implements projected sequential estimation and adds several novel features, in particular the ability to use a generic observation operator, or sensor model, to permit data fusion. It is also possible to cope with a range of observation error characteristics, including non-Gaussian observation errors. Inference for the covariance parameters is explored, including the impact of the projected process approximation on likelihood profiles. We illustrate the projected sequential method in application to synthetic and real datasets. Limitations and extensions are discussed.
Abstract:
Studies assume that socioeconomic status determines individuals' states of health, but how does health determine socioeconomic status? And how does this association vary depending on contextual differences? To answer these questions, our study uses an additive Bayesian Networks model to explain the interrelationships between health and socioeconomic determinants using complex and messy data. This model has been used to find the most probable structure in a network describing the interdependence of these factors across five European welfare state regimes. The advantage of this study is that it offers a specific picture of the complex interrelationship between socioeconomic determinants and health, producing a network that is controlled for sociodemographic factors such as gender and age. The present work provides a general framework to describe and understand the complex association between socioeconomic determinants and health.
Abstract:
Estimates of abundance or density are essential for wildlife management and conservation. There are few effective density estimates for the Buff-throated Partridge Tetraophasis szechenyii, a rare and elusive high-mountain Galliform species endemic to western China. In this study, we used the temporary emigration N-mixture model to estimate the density of this species, with data acquired from playback point count surveys around a sacred area protected under indigenous Tibetan culture, in Yajiang County, Sichuan, China, during April–June 2009. At 84 points with a 125-m radius, we recorded 53 partridge groups over three survey repeats. The best model indicated that detection probability was described by covariates of vegetation cover type, week of visit, time of day, and weather, all with weak effects, and that a partridge group was present during a sampling period with a constant probability. The abundance component was accounted for by vegetation association. Abundance was substantially higher in rhododendron shrubs, fir-larch forests, mixed spruce-larch-birch forests, and especially oak thickets than in pine forests. The model predicted a density of 5.14 groups/km², similar to the estimate of 4.7 – 5.3 groups/km² obtained via an intensive spot-mapping effort. The post-hoc estimate of individual density was 14.44 individuals/km², based on the estimated mean group size of 2.81. We suggest that the method we employed is applicable to estimating densities of Buff-throated Partridges over large areas. Given the importance of a mosaic habitat for this species, local logging should be regulated. Although the sacred conservation area had no effect on the abundance of Buff-throated Partridges, we suggest regulations linking the sacred mountain conservation area with the official conservation system, because sacred mountains facilitate strong local participation in land conservation.
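For orientation, a simplified sketch of a basic binomial N-mixture likelihood (a stripped-down relative of the temporary emigration model used in the study), fitted by maximum likelihood on simulated counts; the site number, detection probability, and abundance rate below are assumptions, not the partridge survey data.

```python
# Sketch of a basic binomial N-mixture model: repeated counts at each point are
# Binomial(N_i, p) with latent abundance N_i ~ Poisson(lambda). The data are
# simulated; this is not the study's temporary-emigration model or its estimates.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson, binom

rng = np.random.default_rng(1)
n_sites, n_visits, true_lam, true_p = 84, 3, 1.2, 0.4
N = rng.poisson(true_lam, n_sites)                        # latent group abundance per point
counts = rng.binomial(N[:, None], true_p, (n_sites, n_visits))

def neg_loglik(params, counts, n_max=30):
    lam = np.exp(params[0])                               # abundance rate (log scale)
    p = 1 / (1 + np.exp(-params[1]))                      # detection probability (logit scale)
    n_grid = np.arange(n_max + 1)
    prior = poisson.pmf(n_grid, lam)                      # P(N = n)
    ll = 0.0
    for y in counts:                                      # marginalize over latent N per site
        lik_n = prior * np.prod(binom.pmf(y[:, None], n_grid, p), axis=0)
        ll += np.log(lik_n.sum())
    return -ll

fit = minimize(neg_loglik, x0=[0.0, 0.0], args=(counts,), method="Nelder-Mead")
print("estimated groups per point:", np.exp(fit.x[0]))
```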
Abstract:
Continuous variables are one of the major data types collected by survey organizations. They can be incomplete, so that the data collectors need to fill in the missing values, or they can contain sensitive information that needs protection from re-identification. One approach to protecting continuous microdata is to sum the values over cells defined by different features. In this thesis, I present novel multiple imputation (MI) methods that can be applied to impute missing values and to synthesize confidential values for continuous and magnitude data.
The first method is for limiting the disclosure risk of continuous microdata whose marginal sums are fixed. The motivation for developing such a method comes from magnitude tables of non-negative integer values in economic surveys. I present approaches based on a mixture of Poisson distributions to describe the multivariate distribution so that the marginals of the synthetic data are guaranteed to sum to the original totals. At the same time, I present methods for assessing disclosure risks in releasing such synthetic magnitude microdata. An illustration on a survey of manufacturing establishments shows that the disclosure risks are low while the information loss is acceptable.
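A small sketch of the fixed-marginal-totals idea behind this first method: a vector of independent Poisson counts conditioned on its sum is multinomial, so synthetic counts can be drawn that reproduce the published total exactly. The rates below are illustrative, not the mixture components estimated in the thesis.

```python
# Sketch: Poisson counts conditioned on their sum are multinomial, so a
# synthetic draw can be forced to match the original marginal total exactly.
# The rates are hypothetical, not the thesis's fitted Poisson mixture.
import numpy as np

rng = np.random.default_rng(2)
rates = np.array([3.0, 1.5, 0.5, 5.0])        # hypothetical Poisson rates per record
original = rng.poisson(rates)                  # "confidential" counts
total = original.sum()                         # the published marginal total

# Synthetic draw with the same total: Multinomial(total, rates / rates.sum())
synthetic = rng.multinomial(total, rates / rates.sum())
assert synthetic.sum() == total
print("original:", original, "synthetic:", synthetic)
```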
The second method is for releasing synthetic continuous microdata by a nonstandard MI method. Traditionally, MI fits a model on the confidential values and then generates multiple synthetic datasets from this model; its disclosure risk tends to be high, especially when the original data contain extreme values. I present a nonstandard MI approach conditioned on protective intervals, whose basic idea is to estimate the model parameters from these intervals rather than from the confidential values. The encouraging results of simple simulation studies suggest the potential of this new approach for limiting the posterior disclosure risk.
The third method is for imputing missing values in continuous and categorical variables. It extends a hierarchically coupled mixture model with local dependence, but separates the variables into non-focused (e.g., almost fully observed) and focused (e.g., largely missing) ones. The sub-model structure of the focused variables is more complex than that of the non-focused ones; their cluster indicators are linked together by tensor factorization, and the focused continuous variables depend locally on non-focused values. The model properties suggest that moving strongly associated non-focused variables to the focused side can help to improve estimation accuracy, which is examined in several simulation studies. The method is applied to data from the American Community Survey.
Abstract:
An RVE-based stochastic numerical model is used to calculate the permeability of randomly generated porous media at different values of the fiber volume fraction for the case of transverse flow in a unidirectional ply. Analysis of the numerical results shows that the permeability is not normally distributed. With the aim of offering a new understanding of this particular topic, the permeability data are fitted using both a mixture model and a unimodal distribution. Our findings suggest that permeability can be fitted well using a mixture model based on the lognormal and power law distributions. In the case of a unimodal distribution, it is found using maximum-likelihood estimation (MLE) that the generalized extreme value (GEV) distribution provides the best fit. Finally, an expression for the permeability as a function of the fiber volume fraction based on the GEV distribution is discussed in light of the previous results.
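A minimal sketch of the GEV fit by maximum likelihood with scipy, assuming a synthetic, normalized permeability sample in place of the RVE simulation output described in the abstract.

```python
# Sketch of fitting a generalized extreme value (GEV) distribution by maximum
# likelihood; the sample is synthetic normalized permeability, standing in for
# the RVE simulation results.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
perm = rng.lognormal(mean=0.0, sigma=0.4, size=500)    # synthetic normalized permeability

shape, loc, scale = stats.genextreme.fit(perm)         # MLE fit of the GEV parameters
loglik = np.sum(stats.genextreme.logpdf(perm, shape, loc, scale))
print(f"GEV shape={shape:.3f}, loc={loc:.3f}, scale={scale:.3f}, log-likelihood={loglik:.1f}")
```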
Abstract:
This study presents the validation of the observations made by the fishery observer program known as the Programa Bitácoras de Pesca (PBP) during 2005 - 2011 in the distribution area where the industrial purse-seine vessels fishing the north-central stock of Peruvian anchoveta (Engraulis ringens) operate. In addition, for the same period and area, the magnitude of discards due to excess catch, discards of juveniles, and incidental catch of this fishery was estimated. A total of 3 768 trips out of 302 859 were observed, representing 1.2%. The data on discards due to excess catch, discards of juveniles, and incidental catch recorded on the observed trips were characterized by a high proportion of zeros. To validate the observations, a simulation study based on the Monte Carlo method was carried out using a negative binomial distribution model. This makes it possible to infer the optimal coverage level and to determine whether the information obtained by the observer program is reliable. From this analysis, it is concluded that the current observation levels should be increased until reaching a coverage of at least 10% of the total annual trips made by the industrial purse-seine vessels fishing the north-central stock of Peruvian anchoveta. The discards due to excess catch, discards of juveniles, and incidental catch were estimated using three methodologies: Bootstrap, General Linear Model (GLM), and Delta Model. Each methodology estimated different magnitudes with similar trends. The estimated magnitudes were compared using a Bayesian ANOVA, which showed little evidence that the estimated magnitudes of discards due to excess catch differed across methodologies; the same held for incidental catch, whereas for discards of juveniles there was substantial evidence of differences. The methodology that satisfied the assumptions and explained the most variability in the modelled variables was the Delta Model, which appears to be a better alternative for estimation given the high proportion of zeros in the data. The average estimates of discards due to excess catch, discards of juveniles, and incidental catch using the Delta Model were 252 580, 41 772, and 44 823 tonnes respectively, which together represented 5.74% of the landings. In addition, using the estimated magnitude of juvenile discards, a biomass projection exercise was carried out under the hypothetical scenario of no fishing mortality, assuming that the discarded juveniles measured only 8 to 11 cm; it showed that the biomass that will not be available to the fishery lies between 52 thousand and 93 thousand tonnes.
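A rough sketch of the kind of Monte Carlo coverage exercise described above: simulate zero-heavy per-trip discards with a negative binomial distribution, sample different observer coverage fractions, and check the precision of the expanded total. The mean, dispersion, and number of replicates are illustrative assumptions, not the values estimated from the PBP data.

```python
# Sketch of a Monte Carlo check of observer coverage: per-trip discards follow
# a zero-heavy negative binomial; for each coverage level we repeatedly sample
# trips and measure the spread of the expanded total. Parameters are illustrative.
import numpy as np

rng = np.random.default_rng(4)
n_trips = 302_859
mu, k = 1.2, 0.15                                   # assumed NB mean and dispersion
p = k / (k + mu)                                    # numpy NB parameterization: mean = k(1-p)/p
discards = rng.negative_binomial(k, p, n_trips)     # tonnes per trip (integer-valued, illustrative)

true_total = discards.sum()
for coverage in (0.012, 0.05, 0.10):                # current coverage vs. candidate levels
    n_obs = int(coverage * n_trips)
    reps = [discards[rng.choice(n_trips, n_obs, replace=False)].mean() * n_trips
            for _ in range(200)]
    rel_err = np.std(reps) / true_total
    print(f"coverage {coverage:.1%}: relative error of expanded total ≈ {rel_err:.2%}")
```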
Abstract:
Environmental impacts of wind energy facilities increasingly cause concern, a central issue being bats and birds killed by rotor blades. Two approaches have been employed to assess collision rates: carcass searches and surveys of animals prone to collisions. Carcass searches can provide an estimate of the actual number of animals killed, but because the time of death is not precisely known they offer little information on the relation between collision rates and, for example, weather parameters. In contrast, a density index of animals exposed to collision is sufficient to analyse the parameters influencing the collision rate; however, quantifying the collision rate from animal density indices (e.g. acoustic bat activity or bird migration traffic rates) remains difficult. We combine carcass search data with animal density indices in a mixture model to investigate collision rates. In a simulation study we show that the collision rates estimated by our model were at least as precise as conventional estimates based solely on carcass search data. Furthermore, if certain conditions are met, the model can be used to predict the collision rate from density indices alone, without data from carcass searches, which can reduce the time and effort required to estimate collision rates. We applied the model to bat carcass search data obtained at 30 wind turbines in 15 wind facilities in Germany, using acoustic bat activity and wind speed as predictors for the collision rate. The model estimates correlated well with conventional estimators. Our model can be used to predict the average collision rate. It enables an analysis of the effect of parameters such as rotor diameter or turbine type on the collision rate, and it can also be used in turbine-specific curtailment algorithms that predict the collision rate and reduce this rate with a minimal loss of energy production.
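A much-simplified sketch of the underlying idea of linking carcass counts to a density index: assume collisions per turbine are Poisson with a mean proportional to acoustic bat activity, estimate the proportionality constant from turbines with carcass searches, then predict collision rates from activity alone. This is an illustration only, not the authors' mixture model; the data and the constant are simulated assumptions.

```python
# Simplified illustration: collisions ~ Poisson(c * activity). The constant c is
# estimated from turbines with carcass searches, then used to predict collision
# rates from the activity index alone. All values are simulated placeholders.
import numpy as np

rng = np.random.default_rng(5)
n_turbines = 30
activity = rng.gamma(2.0, 20.0, n_turbines)       # acoustic bat passes (density index)
true_c = 0.01                                     # assumed collisions per unit activity
carcasses = rng.poisson(true_c * activity)        # counts from carcass searches

# Maximum-likelihood estimate of c under the proportional-Poisson assumption
c_hat = carcasses.sum() / activity.sum()

# Predict the collision rate at a new turbine from its activity index alone
new_activity = 55.0
print(f"c_hat = {c_hat:.4f}; predicted collisions: {c_hat * new_activity:.2f}")
```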
Abstract:
Long-term monitoring of acoustic environments is gaining popularity thanks to the relevant scientific and engineering insights that it provides. The increasing interest is due to the constant growth of storage capacity and of the computational power needed to process large amounts of data. In this perspective, machine learning (ML) provides a broad family of data-driven statistical techniques for dealing with large databases. Nowadays, the conventional practice of sound level meter measurements limits the global description of a sound scene to an energetic point of view: the equivalent continuous level Leq is the main metric used to define an acoustic environment, and finer analyses involve the use of statistical levels. However, acoustic percentiles rest on temporal assumptions that are not always reliable. A statistical approach based on the study of the occurrences of sound pressure levels brings a different perspective to the analysis of long-term monitoring. Depicting a sound scene through the most probable sound pressure level, rather than through portions of energy, gives more specific information about the activity carried out during the measurements, and the statistical mode of the occurrences can capture typical behaviours of specific kinds of sound sources. The present work proposes an ML-based method to identify, separate and measure coexisting sound sources in real-world scenarios. It is based on long-term monitoring and is addressed to acousticians focused on the analysis of environmental noise in manifold contexts. The method is based on clustering analysis: two algorithms, the Gaussian Mixture Model and K-means clustering, form the core of a process to investigate different active spaces monitored through sound level meters. The procedure has been applied in two different contexts, university lecture halls and offices, and shows robust and reliable results in describing the acoustic scenario; it could represent an important analytical tool for acousticians.
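A minimal sketch of clustering sound pressure level samples with a Gaussian Mixture Model and K-means, as named in the abstract; the two-source level stream below is simulated, not a real sound level meter recording.

```python
# Sketch: cluster sound-pressure-level samples with a GMM and with K-means to
# separate two coexisting "sources". The level data are simulated placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
background = rng.normal(42.0, 2.0, 3000)      # dB, quiet background
speech = rng.normal(63.0, 4.0, 1000)          # dB, lecture activity
levels = np.concatenate([background, speech]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(levels)
print("GMM component means (dB):", np.sort(gmm.means_.ravel()))

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(levels)
print("K-means centers (dB):", np.sort(km.cluster_centers_.ravel()))
```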
Abstract:
Universidade Estadual de Campinas. Faculdade de Educação Física