195 resultados para Naive Bayes classifier


Relevância:

20.00% 20.00%

Publicador:

Resumo:

In the study of traffic safety, expected crash frequencies across sites are generally estimated via the negative binomial model, assuming time invariant safety. Since the time invariant safety assumption may be invalid, Hauer (1997) proposed a modified empirical Bayes (EB) method. Despite the modification, no attempts have been made to examine the generalisable form of the marginal distribution resulting from the modified EB framework. Because the hyper-parameters needed to apply the modified EB method are not readily available, an assessment is lacking on how accurately the modified EB method estimates safety in the presence of the time variant safety and regression-to-the-mean (RTM) effects. This study derives the closed form marginal distribution, and reveals that the marginal distribution in the modified EB method is equivalent to the negative multinomial (NM) distribution, which is essentially the same as the likelihood function used in the random effects Poisson model. As a result, this study shows that the gamma posterior distribution from the multivariate Poisson-gamma mixture can be estimated using the NM model or the random effects Poisson model. This study also shows that the estimation errors from the modified EB method are systematically smaller than those from the comparison group method by simultaneously accounting for the RTM and time variant safety effects. Hence, the modified EB method via the NM model is a generalisable method for estimating safety in the presence of the time variant safety and the RTM effects.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper proposes the use of the Bayes Factor as a distance metric for speaker segmentation within a speaker diarization system. The proposed approach uses a pair of constant sized, sliding windows to compute the value of the Bayes Factor between the adjacent windows over the entire audio. Results obtained on the 2002 Rich Transcription Evaluation dataset show an improved segmentation performance compared to previous approaches reported in literature using the Generalized Likelihood Ratio. When applied in a speaker diarization system, this approach results in a 5.1% relative improvement in the overall Diarization Error Rate compared to the baseline.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Plasmodium spp. parasites cause malaria in 300 to 500 million individuals each year. Disease occurs during the blood-stage of the parasite’s life cycle, where the parasite is thought to replicate exclusively within erythrocytes. Infected individuals can also suffer relapses after several years, from Plasmodium vivax and Plasmodium ovale surviving in hepatocytes. Plasmodium falciparum and Plasmodium malariae can also persist after the original bout of infection has apparently cleared in the blood, suggesting that host cells other than erythrocytes (but not hepatocytes) may harbor these blood-stage parasites, thereby assisting their escape from host immunity. Using blood stage transgenic Plasmodium berghei-expressing GFP (PbGFP) to track parasites in host cells, we found that the parasite had a tropism for CD317+ dendritic cells. Other studies using confocal microscopy, in vitro cultures, and cell transfer studies showed that blood-stage parasites could infect, survive, and replicate within CD317+ dendritic cells, and that small numbers of these cells released parasites infectious for erythrocytes in vivo. These data have identified a unique survival strategy for blood-stage Plasmodium, which has significant implications for understanding the escape of Plasmodium spp. from immune-surveillance and for vaccine development.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The risk, or probability of error, of the classifier produced by the AdaBoost algorithm is investigated. In particular, we consider the stopping strategy to be used in AdaBoost to achieve universal consistency. We show that provided AdaBoost is stopped after n1-ε iterations---for sample size n and ε ∈ (0,1)---the sequence of risks of the classifiers it produces approaches the Bayes risk.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Discrete Markov random field models provide a natural framework for representing images or spatial datasets. They model the spatial association present while providing a convenient Markovian dependency structure and strong edge-preservation properties. However, parameter estimation for discrete Markov random field models is difficult due to the complex form of the associated normalizing constant for the likelihood function. For large lattices, the reduced dependence approximation to the normalizing constant is based on the concept of performing computationally efficient and feasible forward recursions on smaller sublattices which are then suitably combined to estimate the constant for the whole lattice. We present an efficient computational extension of the forward recursion approach for the autologistic model to lattices that have an irregularly shaped boundary and which may contain regions with no data; these lattices are typical in applications. Consequently, we also extend the reduced dependence approximation to these scenarios enabling us to implement a practical and efficient non-simulation based approach for spatial data analysis within the variational Bayesian framework. The methodology is illustrated through application to simulated data and example images. The supplemental materials include our C++ source code for computing the approximate normalizing constant and simulation studies.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This study proposes a framework of a model-based hot spot identification method by applying full Bayes (FB) technique. In comparison with the state-of-the-art approach [i.e., empirical Bayes method (EB)], the advantage of the FB method is the capability to seamlessly integrate prior information and all available data into posterior distributions on which various ranking criteria could be based. With intersection crash data collected in Singapore, an empirical analysis was conducted to evaluate the following six approaches for hot spot identification: (a) naive ranking using raw crash data, (b) standard EB ranking, (c) FB ranking using a Poisson-gamma model, (d) FB ranking using a Poisson-lognormal model, (e) FB ranking using a hierarchical Poisson model, and (f) FB ranking using a hierarchical Poisson (AR-1) model. The results show that (a) when using the expected crash rate-related decision parameters, all model-based approaches perform significantly better in safety ranking than does the naive ranking method, and (b) the FB approach using hierarchical models significantly outperforms the standard EB approach in correctly identifying hazardous sites.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This study proposes a full Bayes (FB) hierarchical modeling approach in traffic crash hotspot identification. The FB approach is able to account for all uncertainties associated with crash risk and various risk factors by estimating a posterior distribution of the site safety on which various ranking criteria could be based. Moreover, by use of hierarchical model specification, FB approach is able to flexibly take into account various heterogeneities of crash occurrence due to spatiotemporal effects on traffic safety. Using Singapore intersection crash data(1997-2006), an empirical evaluate was conducted to compare the proposed FB approach to the state-of-the-art approaches. Results show that the Bayesian hierarchical models with accommodation for site specific effect and serial correlation have better goodness-of-fit than non hierarchical models. Furthermore, all model-based approaches perform significantly better in safety ranking than the naive approach using raw crash count. The FB hierarchical models were found to significantly outperform the standard EB approach in correctly identifying hotspots.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background subtraction is a fundamental low-level processing task in numerous computer vision applications. The vast majority of algorithms process images on a pixel-by-pixel basis, where an independent decision is made for each pixel. A general limitation of such processing is that rich contextual information is not taken into account. We propose a block-based method capable of dealing with noise, illumination variations, and dynamic backgrounds, while still obtaining smooth contours of foreground objects. Specifically, image sequences are analyzed on an overlapping block-by-block basis. A low-dimensional texture descriptor obtained from each block is passed through an adaptive classifier cascade, where each stage handles a distinct problem. A probabilistic foreground mask generation approach then exploits block overlaps to integrate interim block-level decisions into final pixel-level foreground segmentation. Unlike many pixel-based methods, ad-hoc postprocessing of foreground masks is not required. Experiments on the difficult Wallflower and I2R datasets show that the proposed approach obtains on average better results (both qualitatively and quantitatively) than several prominent methods. We furthermore propose the use of tracking performance as an unbiased approach for assessing the practical usefulness of foreground segmentation methods, and show that the proposed approach leads to considerable improvements in tracking accuracy on the CAVIAR dataset.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Many state of the art vision-based Simultaneous Localisation And Mapping (SLAM) and place recognition systems compute the salience of visual features in their environment. As computing salience can be problematic in radically changing environments new low resolution feature-less systems have been introduced, such as SeqSLAM, all of which consider the whole image. In this paper, we implement a supervised classifier system (UCS) to learn the salience of image regions for place recognition by feature-less systems. SeqSLAM only slightly benefits from the results of training, on the challenging real world Eynsham dataset, as it already appears to filter less useful regions of a panoramic image. However, when recognition is limited to specific image regions performance improves by more than an order of magnitude by utilising the learnt image region saliency. We then investigate whether the region salience generated from the Eynsham dataset generalizes to another car-based dataset using a perspective camera. The results suggest the general applicability of an image region salience mask for optimizing route-based navigation applications.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A big challenge for classification on text is the noisy of text data. It makes classification quality low. Many classification process can be divided into two sequential steps scoring and threshold setting (thresholding). Therefore to deal with noisy data problem, it is important to describe positive feature effectively scoring and to set a suitable threshold. Most existing text classifiers do not concentrate on these two jobs. In this paper, we propose a novel text classifier with pattern-based scoring that describe positive feature effectively, followed by threshold setting. The thresholding is based on score of training set, make it is simple to implement in other scoring methods. Experiment shows that our pattern-based classifier is promising.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Classifier selection is a problem encountered by multi-biometric systems that aim to improve performance through fusion of decisions. A particular decision fusion architecture that combines multiple instances (n classifiers) and multiple samples (m attempts at each classifier) has been proposed in previous work to achieve controlled trade-off between false alarms and false rejects. Although analysis on text-dependent speaker verification has demonstrated better performance for fusion of decisions with favourable dependence compared to statistically independent decisions, the performance is not always optimal. Given a pool of instances, best performance with this architecture is obtained for certain combination of instances. Heuristic rules and diversity measures have been commonly used for classifier selection but it is shown that optimal performance is achieved for the `best combination performance' rule. As the search complexity for this rule increases exponentially with the addition of classifiers, a measure - the sequential error ratio (SER) - is proposed in this work that is specifically adapted to the characteristics of sequential fusion architecture. The proposed measure can be used to select a classifier that is most likely to produce a correct decision at each stage. Error rates for fusion of text-dependent HMM based speaker models using SER are compared with other classifier selection methodologies. SER is shown to achieve near optimal performance for sequential fusion of multiple instances with or without the use of multiple samples. The methodology applies to multiple speech utterances for telephone or internet based access control and to other systems such as multiple finger print and multiple handwriting sample based identity verification systems.