970 results for Naive Bayes classifier


Relevance: 20.00%

Abstract:

This paper proposes using the Bayes factor in place of the Bayesian Information Criterion (BIC) as the criterion for speaker clustering within a speaker diarization system. The BIC is one of the most popular decision criteria used in speaker diarization systems today. However, this paper shows that the BIC is only an approximation to the Bayes factor, that is, to the ratio of the marginal likelihoods of the data under each hypothesis. The paper uses the Bayes factor directly as the decision criterion for speaker clustering, thus removing the error introduced by the BIC approximation. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show improved clustering performance, leading to a 14.7% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.
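
The relationship between the two criteria can be written out as follows. This is the standard large-sample (Schwarz) approximation, given here as background rather than quoted from the paper; the symbols $D$ (data), $M_i$ (hypothesis), $\theta_i$ (model parameters), $k_i$ (number of free parameters) and $N$ (number of data points) are notation assumed for illustration.

```latex
% Bayes factor between hypotheses M_1 and M_2: a ratio of marginal likelihoods
B_{12} = \frac{p(D \mid M_1)}{p(D \mid M_2)},
\qquad
p(D \mid M_i) = \int p(D \mid \theta_i, M_i)\, p(\theta_i \mid M_i)\, d\theta_i

% The BIC replaces each log marginal likelihood by a large-sample approximation
% built from the maximum-likelihood estimate \hat{\theta}_i
\log p(D \mid M_i) \approx \log p(D \mid \hat{\theta}_i, M_i) - \frac{k_i}{2} \log N
```

Comparing BIC values therefore approximates $\log B_{12}$, and the approximation error is what a direct Bayes factor computation avoids.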

Relevance: 20.00%

Abstract:

In the study of traffic safety, expected crash frequencies across sites are generally estimated via the negative binomial model, assuming time-invariant safety. Since the time-invariant safety assumption may be invalid, Hauer (1997) proposed a modified empirical Bayes (EB) method. Despite the modification, no attempts have been made to examine the generalisable form of the marginal distribution resulting from the modified EB framework. Because the hyper-parameters needed to apply the modified EB method are not readily available, it has not been assessed how accurately the modified EB method estimates safety in the presence of time-variant safety and regression-to-the-mean (RTM) effects. This study derives the closed-form marginal distribution and shows that the marginal distribution in the modified EB method is equivalent to the negative multinomial (NM) distribution, which is essentially the same as the likelihood function used in the random effects Poisson model. As a result, this study shows that the gamma posterior distribution from the multivariate Poisson-gamma mixture can be estimated using the NM model or the random effects Poisson model. This study also shows that the estimation errors from the modified EB method are systematically smaller than those from the comparison group method, because the modified EB method simultaneously accounts for the RTM and time-variant safety effects. Hence, the modified EB method via the NM model is a generalisable method for estimating safety in the presence of time-variant safety and RTM effects.
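
As a point of reference for the EB idea, the sketch below computes the posterior mean of a site's crash rate under the textbook single-period Poisson-gamma model. It is not Hauer's modified method or the NM model from the abstract; the function name `eb_estimate` and its parameters are hypothetical, and the gamma prior is assumed to have shape `dispersion` and mean `prior_mean`.

```python
def eb_estimate(observed, prior_mean, dispersion):
    """Posterior mean crash rate under a simple Poisson-gamma model.

    Assumed prior: Gamma(shape=dispersion, rate=dispersion / prior_mean),
    so the prior mean equals prior_mean. After one observed count the
    posterior is Gamma(dispersion + observed, dispersion / prior_mean + 1).
    """
    if prior_mean <= 0 or dispersion <= 0:
        raise ValueError("prior_mean and dispersion must be positive")
    # Weight on the prior mean; the remainder goes to the observed count.
    weight = dispersion / (dispersion + prior_mean)
    return weight * prior_mean + (1.0 - weight) * observed


if __name__ == "__main__":
    # A site with 10 observed crashes where similar sites average 4.
    print(eb_estimate(observed=10, prior_mean=4.0, dispersion=2.0))
```

The estimate always lies between the prior mean and the raw count, which is how the EB approach corrects for regression to the mean.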

Relevance: 20.00%

Abstract:

This paper proposes the use of the Bayes factor as a distance metric for speaker segmentation within a speaker diarization system. The proposed approach uses a pair of constant-sized sliding windows to compute the value of the Bayes factor between the adjacent windows over the entire audio. Results obtained on the 2002 Rich Transcription Evaluation dataset show improved segmentation performance compared to previous approaches reported in the literature that use the Generalized Likelihood Ratio. When applied in a speaker diarization system, this approach results in a 5.1% relative improvement in the overall Diarization Error Rate compared to the baseline.
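
The sliding-window scheme can be sketched as below. The distance used here is a one-dimensional Gaussian generalized likelihood ratio, standing in for the paper's Bayes factor, whose exact form the abstract does not give. All function names, the scalar "feature" signal and the window size are assumptions made for illustration.

```python
import math
import random


def gaussian_log_likelihood(samples):
    """Maximised log-likelihood of 1-D samples under a single Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    var = max(sum((x - mean) ** 2 for x in samples) / n, 1e-12)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def glr_distance(left, right):
    """Log GLR: one Gaussian per window versus one Gaussian for both."""
    return (gaussian_log_likelihood(left) + gaussian_log_likelihood(right)
            - gaussian_log_likelihood(left + right))


def sliding_distance(signal, window, step=1, distance=glr_distance):
    """Return (boundary_index, distance) for each pair of adjacent windows."""
    scores = []
    for start in range(0, len(signal) - 2 * window + 1, step):
        mid = start + window
        scores.append((mid, distance(signal[start:mid],
                                     signal[mid:mid + window])))
    return scores


def strongest_boundary(scores):
    """Index at which the two adjacent windows differ most."""
    return max(scores, key=lambda item: item[1])[0]


if __name__ == "__main__":
    random.seed(0)
    # Synthetic signal whose mean shifts at index 200 (a "speaker change").
    signal = ([random.gauss(0.0, 1.0) for _ in range(200)]
              + [random.gauss(5.0, 1.0) for _ in range(200)])
    print(strongest_boundary(sliding_distance(signal, window=50)))
```

Because `distance` is a parameter, a different metric such as a Bayes factor can be swapped in without changing the windowing logic.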

Relevance: 20.00%

Abstract:

Plasmodium spp. parasites cause malaria in 300 to 500 million individuals each year. Disease occurs during the blood stage of the parasite’s life cycle, where the parasite is thought to replicate exclusively within erythrocytes. Infected individuals can also suffer relapses after several years, from Plasmodium vivax and Plasmodium ovale surviving in hepatocytes. Plasmodium falciparum and Plasmodium malariae can also persist after the original bout of infection has apparently cleared in the blood, suggesting that host cells other than erythrocytes (but not hepatocytes) may harbor these blood-stage parasites, thereby assisting their escape from host immunity. Using blood-stage transgenic Plasmodium berghei expressing GFP (PbGFP) to track parasites in host cells, we found that the parasite had a tropism for CD317+ dendritic cells. Other studies using confocal microscopy, in vitro cultures, and cell transfer showed that blood-stage parasites could infect, survive, and replicate within CD317+ dendritic cells, and that small numbers of these cells released parasites infectious for erythrocytes in vivo. These data identify a unique survival strategy for blood-stage Plasmodium, which has significant implications for understanding the escape of Plasmodium spp. from immune surveillance and for vaccine development.

Relevance: 20.00%

Abstract:

The risk, or probability of error, of the classifier produced by the AdaBoost algorithm is investigated. In particular, we consider the stopping strategy to be used in AdaBoost to achieve universal consistency. We show that, provided AdaBoost is stopped after n^(1-ε) iterations (for sample size n and ε ∈ (0,1)), the sequence of risks of the classifiers it produces approaches the Bayes risk.
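
To give a feel for how slowly this stopping time grows relative to the sample size, the sketch below evaluates n^(1-ε) for a given n and ε. The function name is hypothetical, and rounding down to an integer (with a minimum of one iteration) is an assumption; the abstract does not say how a non-integer value is handled.

```python
import math


def adaboost_stopping_time(n, eps):
    """Number of boosting rounds n**(1 - eps), rounded down (assumed)."""
    if n < 1:
        raise ValueError("sample size n must be at least 1")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    return max(1, math.floor(n ** (1.0 - eps)))


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        print(n, adaboost_stopping_time(n, eps=0.3))
```

With ε = 0.3, a thousandfold increase in sample size raises the iteration budget by a factor of roughly 126, so the number of rounds grows without bound but more slowly than n.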

Relevance: 20.00%

Abstract:

Discrete Markov random field models provide a natural framework for representing images or spatial datasets. They model the spatial association present while providing a convenient Markovian dependency structure and strong edge-preservation properties. However, parameter estimation for discrete Markov random field models is difficult due to the complex form of the associated normalizing constant for the likelihood function. For large lattices, the reduced dependence approximation to the normalizing constant is based on the concept of performing computationally efficient and feasible forward recursions on smaller sublattices which are then suitably combined to estimate the constant for the whole lattice. We present an efficient computational extension of the forward recursion approach for the autologistic model to lattices that have an irregularly shaped boundary and which may contain regions with no data; these lattices are typical in applications. Consequently, we also extend the reduced dependence approximation to these scenarios enabling us to implement a practical and efficient non-simulation based approach for spatial data analysis within the variational Bayesian framework. The methodology is illustrated through application to simulated data and example images. The supplemental materials include our C++ source code for computing the approximate normalizing constant and simulation studies.
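
The difficulty with the normalizing constant is easiest to see by computing it directly on a tiny lattice. The sketch below is a brute-force baseline, not the forward recursion or the reduced dependence approximation described in the abstract. It assumes a common autologistic parametrisation with binary values in {0, 1}, a single abundance parameter `alpha`, a single interaction parameter `beta` and first-order (four-nearest-neighbour) dependence on a full rectangular lattice; the paper's own parametrisation may differ.

```python
import itertools
import math


def autologistic_log_z(rows, cols, alpha, beta):
    """Exact log normalizing constant by enumerating 2**(rows*cols) states."""
    sites = [(r, c) for r in range(rows) for c in range(cols)]
    # List each neighbouring pair once: right-hand and downward neighbours.
    edges = [((r, c), (r, c + 1))
             for r in range(rows) for c in range(cols - 1)]
    edges += [((r, c), (r + 1, c))
              for r in range(rows - 1) for c in range(cols)]

    log_weights = []
    for values in itertools.product((0, 1), repeat=len(sites)):
        x = dict(zip(sites, values))
        log_weights.append(alpha * sum(values)
                           + beta * sum(x[a] * x[b] for a, b in edges))

    # Log-sum-exp for numerical stability.
    top = max(log_weights)
    return top + math.log(sum(math.exp(w - top) for w in log_weights))


if __name__ == "__main__":
    print(autologistic_log_z(3, 3, alpha=-0.5, beta=0.8))
```

The loop visits every configuration, so the cost doubles with each added site. A 3 x 3 lattice needs 512 terms, while even a 10 x 10 lattice would need 2^100, which is why recursions on sublattices and approximations are needed in practice.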

Relevance: 20.00%

Abstract:

This study proposes a framework for model-based hot spot identification using the full Bayes (FB) technique. In comparison with the state-of-the-art approach [i.e., the empirical Bayes (EB) method], the advantage of the FB method is its capability to seamlessly integrate prior information and all available data into posterior distributions on which various ranking criteria can be based. With intersection crash data collected in Singapore, an empirical analysis was conducted to evaluate the following six approaches for hot spot identification: (a) naive ranking using raw crash data, (b) standard EB ranking, (c) FB ranking using a Poisson-gamma model, (d) FB ranking using a Poisson-lognormal model, (e) FB ranking using a hierarchical Poisson model, and (f) FB ranking using a hierarchical Poisson (AR-1) model. The results show that (a) when using the expected crash rate-related decision parameters, all model-based approaches perform significantly better in safety ranking than does the naive ranking method, and (b) the FB approach using hierarchical models significantly outperforms the standard EB approach in correctly identifying hazardous sites.

Relevance: 20.00%

Abstract:

This study proposes a full Bayes (FB) hierarchical modeling approach to traffic crash hotspot identification. The FB approach is able to account for all uncertainties associated with crash risk and various risk factors by estimating a posterior distribution of the site safety on which various ranking criteria can be based. Moreover, through hierarchical model specification, the FB approach can flexibly take into account various heterogeneities of crash occurrence due to spatiotemporal effects on traffic safety. Using Singapore intersection crash data (1997-2006), an empirical evaluation was conducted to compare the proposed FB approach to the state-of-the-art approaches. Results show that the Bayesian hierarchical models that accommodate site-specific effects and serial correlation have better goodness-of-fit than non-hierarchical models. Furthermore, all model-based approaches perform significantly better in safety ranking than the naive approach using raw crash counts. The FB hierarchical models were found to significantly outperform the standard EB approach in correctly identifying hotspots.