62 resultados para Estimateur de Bayes
em Queensland University of Technology - ePrints Archive
Resumo:
This paper proposes the use of the Bayes Factor to replace the Bayesian Information Criterion (BIC) as a criterion for speaker clustering within a speaker diarization system. The BIC is one of the most popular decision criteria used in speaker diarization systems today. However, it will be shown in this paper that the BIC is only an approximation to the Bayes factor of marginal likelihoods of the data given each hypothesis. This paper uses the Bayes factor directly as a decision criterion for speaker clustering, thus removing the error introduced by the BIC approximation. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show an improved clustering performance, leading to a 14.7% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.
Resumo:
In the study of traffic safety, expected crash frequencies across sites are generally estimated via the negative binomial model, assuming time invariant safety. Since the time invariant safety assumption may be invalid, Hauer (1997) proposed a modified empirical Bayes (EB) method. Despite the modification, no attempts have been made to examine the generalisable form of the marginal distribution resulting from the modified EB framework. Because the hyper-parameters needed to apply the modified EB method are not readily available, an assessment is lacking on how accurately the modified EB method estimates safety in the presence of the time variant safety and regression-to-the-mean (RTM) effects. This study derives the closed form marginal distribution, and reveals that the marginal distribution in the modified EB method is equivalent to the negative multinomial (NM) distribution, which is essentially the same as the likelihood function used in the random effects Poisson model. As a result, this study shows that the gamma posterior distribution from the multivariate Poisson-gamma mixture can be estimated using the NM model or the random effects Poisson model. This study also shows that the estimation errors from the modified EB method are systematically smaller than those from the comparison group method by simultaneously accounting for the RTM and time variant safety effects. Hence, the modified EB method via the NM model is a generalisable method for estimating safety in the presence of the time variant safety and the RTM effects.
Resumo:
This paper proposes the use of the Bayes Factor as a distance metric for speaker segmentation within a speaker diarization system. The proposed approach uses a pair of constant sized, sliding windows to compute the value of the Bayes Factor between the adjacent windows over the entire audio. Results obtained on the 2002 Rich Transcription Evaluation dataset show an improved segmentation performance compared to previous approaches reported in literature using the Generalized Likelihood Ratio. When applied in a speaker diarization system, this approach results in a 5.1% relative improvement in the overall Diarization Error Rate compared to the baseline.
Resumo:
Discrete Markov random field models provide a natural framework for representing images or spatial datasets. They model the spatial association present while providing a convenient Markovian dependency structure and strong edge-preservation properties. However, parameter estimation for discrete Markov random field models is difficult due to the complex form of the associated normalizing constant for the likelihood function. For large lattices, the reduced dependence approximation to the normalizing constant is based on the concept of performing computationally efficient and feasible forward recursions on smaller sublattices which are then suitably combined to estimate the constant for the whole lattice. We present an efficient computational extension of the forward recursion approach for the autologistic model to lattices that have an irregularly shaped boundary and which may contain regions with no data; these lattices are typical in applications. Consequently, we also extend the reduced dependence approximation to these scenarios enabling us to implement a practical and efficient non-simulation based approach for spatial data analysis within the variational Bayesian framework. The methodology is illustrated through application to simulated data and example images. The supplemental materials include our C++ source code for computing the approximate normalizing constant and simulation studies.
Resumo:
We present PAC-Bayes-Empirical-Bernstein inequality. The inequality is based on combination of PAC-Bayesian bounding technique with Empirical Bernstein bound. It allows to take advantage of small empirical variance and is especially useful in regression. We show that when the empirical variance is significantly smaller than the empirical loss PAC-Bayes-Empirical-Bernstein inequality is significantly tighter than PAC-Bayes-kl inequality of Seeger (2002) and otherwise it is comparable. PAC-Bayes-Empirical-Bernstein inequality is an interesting example of application of PAC-Bayesian bounding technique to self-bounding functions. We provide empirical comparison of PAC-Bayes-Empirical-Bernstein inequality with PAC-Bayes-kl inequality on a synthetic example and several UCI datasets.
Resumo:
A new transdimensional Sequential Monte Carlo (SMC) algorithm called SM- CVB is proposed. In an SMC approach, a weighted sample of particles is generated from a sequence of probability distributions which ‘converge’ to the target distribution of interest, in this case a Bayesian posterior distri- bution. The approach is based on the use of variational Bayes to propose new particles at each iteration of the SMCVB algorithm in order to target the posterior more efficiently. The variational-Bayes-generated proposals are not limited to a fixed dimension. This means that the weighted particle sets that arise can have varying dimensions thereby allowing us the option to also estimate an appropriate dimension for the model. This novel algorithm is outlined within the context of finite mixture model estimation. This pro- vides a less computationally demanding alternative to using reversible jump Markov chain Monte Carlo kernels within an SMC approach. We illustrate these ideas in a simulated data analysis and in applications.
Resumo:
Purpose: This study explored the spatial distribution of notified cryptosporidiosis cases and identified major socioeconomic factors associated with the transmission of cryptosporidiosis in Brisbane, Australia. Methods: We obtained the computerized data sets on the notified cryptosporidiosis cases and their key socioeconomic factors by statistical local area (SLA) in Brisbane for the period of 1996 to 2004 from the Queensland Department of Health and Australian Bureau of Statistics, respectively. We used spatial empirical Bayes rates smoothing to estimate the spatial distribution of cryptosporidiosis cases. A spatial classification and regression tree (CART) model was developed to explore the relationship between socioeconomic factors and the incidence rates of cryptosporidiosis. Results: Spatial empirical Bayes analysis reveals that the cryptosporidiosis infections were primarily concentrated in the northwest and southeast of Brisbane. A spatial CART model shows that the relative risk for cryptosporidiosis transmission was 2.4 when the value of the social economic index for areas (SEIFA) was over 1028 and the proportion of residents with low educational attainment in an SLA exceeded 8.8%. Conclusions: There was remarkable variation in spatial distribution of cryptosporidiosis infections in Brisbane. Spatial pattern of cryptosporidiosis seems to be associated with SEIFA and the proportion of residents with low education attainment.
Resumo:
In this thesis, the issue of incorporating uncertainty for environmental modelling informed by imagery is explored by considering uncertainty in deterministic modelling, measurement uncertainty and uncertainty in image composition. Incorporating uncertainty in deterministic modelling is extended for use with imagery using the Bayesian melding approach. In the application presented, slope steepness is shown to be the main contributor to total uncertainty in the Revised Universal Soil Loss Equation. A spatial sampling procedure is also proposed to assist in implementing Bayesian melding given the increased data size with models informed by imagery. Measurement error models are another approach to incorporating uncertainty when data is informed by imagery. These models for measurement uncertainty, considered in a Bayesian conditional independence framework, are applied to ecological data generated from imagery. The models are shown to be appropriate and useful in certain situations. Measurement uncertainty is also considered in the context of change detection when two images are not co-registered. An approach for detecting change in two successive images is proposed that is not affected by registration. The procedure uses the Kolmogorov-Smirnov test on homogeneous segments of an image to detect change, with the homogeneous segments determined using a Bayesian mixture model of pixel values. Using the mixture model to segment an image also allows for uncertainty in the composition of an image. This thesis concludes by comparing several different Bayesian image segmentation approaches that allow for uncertainty regarding the allocation of pixels to different ground components. Each segmentation approach is applied to a data set of chlorophyll values and shown to have different benefits and drawbacks depending on the aims of the analysis.
Resumo:
Speaker verification is the process of verifying the identity of a person by analysing their speech. There are several important applications for automatic speaker verification (ASV) technology including suspect identification, tracking terrorists and detecting a person’s presence at a remote location in the surveillance domain, as well as person authentication for phone banking and credit card transactions in the private sector. Telephones and telephony networks provide a natural medium for these applications. The aim of this work is to improve the usefulness of ASV technology for practical applications in the presence of adverse conditions. In a telephony environment, background noise, handset mismatch, channel distortions, room acoustics and restrictions on the available testing and training data are common sources of errors for ASV systems. Two research themes were pursued to overcome these adverse conditions: Modelling mismatch and modelling uncertainty. To directly address the performance degradation incurred through mismatched conditions it was proposed to directly model this mismatch. Feature mapping was evaluated for combating handset mismatch and was extended through the use of a blind clustering algorithm to remove the need for accurate handset labels for the training data. Mismatch modelling was then generalised by explicitly modelling the session conditions as a constrained offset of the speaker model means. This session variability modelling approach enabled the modelling of arbitrary sources of mismatch, including handset type, and halved the error rates in many cases. Methods to model the uncertainty in speaker model estimates and verification scores were developed to address the difficulties of limited training and testing data. The Bayes factor was introduced to account for the uncertainty of the speaker model estimates in testing by applying Bayesian theory to the verification criterion, with improved performance in matched conditions. Modelling the uncertainty in the verification score itself met with significant success. Estimating a confidence interval for the "true" verification score enabled an order of magnitude reduction in the average quantity of speech required to make a confident verification decision based on a threshold. The confidence measures developed in this work may also have significant applications for forensic speaker verification tasks.