959 resultados para Posterior probability


Relevância:

100.00% 100.00%

Publicador:

Resumo:

We report an empirical study of n-gram posterior probability confidence measures for statistical machine translation (SMT). We first describe an efficient and practical algorithm for rapidly computing n-gram posterior probabilities from large translation word lattices. These probabilities are shown to be a good predictor of whether or not the n-gram is found in human reference translations, motivating their use as a confidence measure for SMT. Comprehensive n-gram precision and word coverage measurements are presented for a variety of different language pairs, domains and conditions. We analyze the effect on reference precision of using single or multiple references, and compare the precision of posteriors computed from k-best lists to those computed over the full evidence space of the lattice. We also demonstrate improved confidence by combining multiple lattices in a multi-source translation framework. © 2012 The Author(s).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The application of custom classification techniques and posterior probability modeling (PPM) using Worldview-2 multispectral imagery to archaeological field survey is presented in this paper. Research is focused on the identification of Neolithic felsite stone tool workshops in the North Mavine region of the Shetland Islands in Northern Scotland. Sample data from known workshops surveyed using differential GPS are used alongside known non-sites to train a linear discriminant analysis (LDA) classifier based on a combination of datasets including Worldview-2 bands, band difference ratios (BDR) and topographical derivatives. Principal components analysis is further used to test and reduce dimensionality caused by redundant datasets. Probability models were generated by LDA using principal components and tested with sites identified through geological field survey. Testing shows the prospective ability of this technique and significance between 0.05 and 0.01, and gain statistics between 0.90 and 0.94, higher than those obtained using maximum likelihood and random forest classifiers. Results suggest that this approach is best suited to relatively homogenous site types, and performs better with correlated data sources. Finally, by combining posterior probability models and least-cost analysis, a survey least-cost efficacy model is generated showing the utility of such approaches to archaeological field survey.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

A new Bayesian algorithm for retrieving surface rain rate from Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI) over the ocean is presented, along with validations against estimates from the TRMM Precipitation Radar (PR). The Bayesian approach offers a rigorous basis for optimally combining multichannel observations with prior knowledge. While other rain-rate algorithms have been published that are based at least partly on Bayesian reasoning, this is believed to be the first self-contained algorithm that fully exploits Bayes’s theorem to yield not just a single rain rate, but rather a continuous posterior probability distribution of rain rate. To advance the understanding of theoretical benefits of the Bayesian approach, sensitivity analyses have been conducted based on two synthetic datasets for which the “true” conditional and prior distribution are known. Results demonstrate that even when the prior and conditional likelihoods are specified perfectly, biased retrievals may occur at high rain rates. This bias is not the result of a defect of the Bayesian formalism, but rather represents the expected outcome when the physical constraint imposed by the radiometric observations is weak owing to saturation effects. It is also suggested that both the choice of the estimators and the prior information are crucial to the retrieval. In addition, the performance of the Bayesian algorithm herein is found to be comparable to that of other benchmark algorithms in real-world applications, while having the additional advantage of providing a complete continuous posterior probability distribution of surface rain rate.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This paper analyzes the inner relations between classical sub-scheme probability and statistic probability, subjective probability and objective probability, prior probability and posterior probability, transition probability and probability of utility, and further analysis the goal, method, and its practical economic purpose which represent by these various probability from the perspective of mathematics, so as to deeply understand there connotation and its relation with economic decision making, thus will pave the route for scientific predication and decision making.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We present a method for topological SLAM that specifically targets loop closing for edge-ordered graphs. Instead of using a heuristic approach to accept or reject loop closing, we propose a probabilistically grounded multi-hypothesis technique that relies on the incremental construction of a map/state hypothesis tree. Loop closing is introduced automatically within the tree expansion, and likely hypotheses are chosen based on their posterior probability after a sequence of sensor measurements. Careful pruning of the hypothesis tree keeps the growing number of hypotheses under control and a recursive formulation reduces storage and computational costs. Experiments are used to validate the approach.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Bactrocera dorsalis sensu stricto, B. papayae, B. philippinensis and B. carambolae are serious pest fruit fly species of the B. dorsalis complex that predominantly occur in south-east Asia and the Pacific. Identifying molecular diagnostics has proven problematic for these four taxa, a situation that cofounds biosecurity and quarantine efforts and which may be the result of at least some of these taxa representing the same biological species. We therefore conducted a phylogenetic study of these four species (and closely related outgroup taxa) based on the individuals collected from a wide geographic range; sequencing six loci (cox1, nad4-3′, CAD, period, ITS1, ITS2) for approximately 20 individuals from each of 16 sample sites. Data were analysed within maximum likelihood and Bayesian phylogenetic frameworks for individual loci and concatenated data sets for which we applied multiple monophyly and species delimitation tests. Species monophyly was measured by clade support, posterior probability or bootstrap resampling for Bayesian and likelihood analyses respectively, Rosenberg's reciprocal monophyly measure, P(AB), Rodrigo's (P(RD)) and the genealogical sorting index, gsi. We specifically tested whether there was phylogenetic support for the four 'ingroup' pest species using a data set of multiple individuals sampled from a number of populations. Based on our combined data set, Bactrocera carambolae emerges as a distinct monophyletic clade, whereas B. dorsalis s.s., B. papayae and B. philippinensis are unresolved. These data add to the growing body of evidence that B. dorsalis s.s., B. papayae and B. philippinensis are the same biological species, which poses consequences for quarantine, trade and pest management.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Approximate Bayesian Computation’ (ABC) represents a powerful methodology for the analysis of complex stochastic systems for which the likelihood of the observed data under an arbitrary set of input parameters may be entirely intractable – the latter condition rendering useless the standard machinery of tractable likelihood-based, Bayesian statistical inference [e.g. conventional Markov chain Monte Carlo (MCMC) simulation]. In this paper, we demonstrate the potential of ABC for astronomical model analysis by application to a case study in the morphological transformation of high-redshift galaxies. To this end, we develop, first, a stochastic model for the competing processes of merging and secular evolution in the early Universe, and secondly, through an ABC-based comparison against the observed demographics of massive (Mgal > 1011 M⊙) galaxies (at 1.5 < z < 3) in the Cosmic Assembly Near-IR Deep Extragalatic Legacy Survey (CANDELS)/Extended Groth Strip (EGS) data set we derive posterior probability densities for the key parameters of this model. The ‘Sequential Monte Carlo’ implementation of ABC exhibited herein, featuring both a self-generating target sequence and self-refining MCMC kernel, is amongst the most efficient of contemporary approaches to this important statistical algorithm. We highlight as well through our chosen case study the value of careful summary statistic selection, and demonstrate two modern strategies for assessment and optimization in this regard. Ultimately, our ABC analysis of the high-redshift morphological mix returns tight constraints on the evolving merger rate in the early Universe and favours major merging (with disc survival or rapid reformation) over secular evolution as the mechanism most responsible for building up the first generation of bulges in early-type discs.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

To further investigate susceptibility loci identified by genome-wide association studies, we genotyped 5,500 SNPs across 14 associated regions in 8,000 samples from a control group and 3 diseases: type 2 diabetes (T2D), coronary artery disease (CAD) and Graves' disease. We defined, using Bayes theorem, credible sets of SNPs that were 95% likely, based on posterior probability, to contain the causal disease-associated SNPs. In 3 of the 14 regions, TCF7L2 (T2D), CTLA4 (Graves' disease) and CDKN2A-CDKN2B (T2D), much of the posterior probability rested on a single SNP, and, in 4 other regions (CDKN2A-CDKN2B (CAD) and CDKAL1, FTO and HHEX (T2D)), the 95% sets were small, thereby excluding most SNPs as potentially causal. Very few SNPs in our credible sets had annotated functions, illustrating the limitations in understanding the mechanisms underlying susceptibility to common diseases. Our results also show the value of more detailed mapping to target sequences for functional studies. © 2012 Nature America, Inc. All rights reserved.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

So far, most Phase II trials have been designed and analysed under a frequentist framework. Under this framework, a trial is designed so that the overall Type I and Type II errors of the trial are controlled at some desired levels. Recently, a number of articles have advocated the use of Bavesian designs in practice. Under a Bayesian framework, a trial is designed so that the trial stops when the posterior probability of treatment is within certain prespecified thresholds. In this article, we argue that trials under a Bayesian framework can also be designed to control frequentist error rates. We introduce a Bayesian version of Simon's well-known two-stage design to achieve this goal. We also consider two other errors, which are called Bayesian errors in this article because of their similarities to posterior probabilities. We show that our method can also control these Bayesian-type errors. We compare our method with other recent Bayesian designs in a numerical study and discuss implications of different designs on error rates. An example of a clinical trial for patients with nasopharyngeal carcinoma is used to illustrate differences of the different designs.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This thesis which consists of an introduction and four peer-reviewed original publications studies the problems of haplotype inference (haplotyping) and local alignment significance. The problems studied here belong to the broad area of bioinformatics and computational biology. The presented solutions are computationally fast and accurate, which makes them practical in high-throughput sequence data analysis. Haplotype inference is a computational problem where the goal is to estimate haplotypes from a sample of genotypes as accurately as possible. This problem is important as the direct measurement of haplotypes is difficult, whereas the genotypes are easier to quantify. Haplotypes are the key-players when studying for example the genetic causes of diseases. In this thesis, three methods are presented for the haplotype inference problem referred to as HaploParser, HIT, and BACH. HaploParser is based on a combinatorial mosaic model and hierarchical parsing that together mimic recombinations and point-mutations in a biologically plausible way. In this mosaic model, the current population is assumed to be evolved from a small founder population. Thus, the haplotypes of the current population are recombinations of the (implicit) founder haplotypes with some point--mutations. HIT (Haplotype Inference Technique) uses a hidden Markov model for haplotypes and efficient algorithms are presented to learn this model from genotype data. The model structure of HIT is analogous to the mosaic model of HaploParser with founder haplotypes. Therefore, it can be seen as a probabilistic model of recombinations and point-mutations. BACH (Bayesian Context-based Haplotyping) utilizes a context tree weighting algorithm to efficiently sum over all variable-length Markov chains to evaluate the posterior probability of a haplotype configuration. Algorithms are presented that find haplotype configurations with high posterior probability. BACH is the most accurate method presented in this thesis and has comparable performance to the best available software for haplotype inference. Local alignment significance is a computational problem where one is interested in whether the local similarities in two sequences are due to the fact that the sequences are related or just by chance. Similarity of sequences is measured by their best local alignment score and from that, a p-value is computed. This p-value is the probability of picking two sequences from the null model that have as good or better best local alignment score. Local alignment significance is used routinely for example in homology searches. In this thesis, a general framework is sketched that allows one to compute a tight upper bound for the p-value of a local pairwise alignment score. Unlike the previous methods, the presented framework is not affeced by so-called edge-effects and can handle gaps (deletions and insertions) without troublesome sampling and curve fitting.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The relationship between site characteristics and understorey vegetation composition was analysed with quantitative methods, especially from the viewpoint of site quality estimation. Theoretical models were applied to an empirical data set collected from the upland forests of southern Finland comprising 104 sites dominated by Scots pine (Pinus sylvestris L.), and 165 sites dominated by Norway spruce (Picea abies (L.) Karsten). Site index H100 was used as an independent measure of site quality. A new model for the estimation of site quality at sites with a known understorey vegetation composition was introduced. It is based on the application of Bayes' theorem to the density function of site quality within the study area combined with the species-specific presence-absence response curves. The resulting posterior probability density function may be used for calculating an estimate for the site variable. Using this method, a jackknife estimate of site index H100 was calculated separately for pine- and spruce-dominated sites. The results indicated that the cross-validation root mean squared error (RMSEcv) of the estimates improved from 2.98 m down to 2.34 m relative to the "null" model (standard deviation of the sample distribution) in pine-dominated forests. In spruce-dominated forests RMSEcv decreased from 3.94 m down to 3.16 m. In order to assess these results, four other estimation methods based on understorey vegetation composition were applied to the same data set. The results showed that none of the methods was clearly superior to the others. In pine-dominated forests, RMSEcv varied between 2.34 and 2.47 m, and the corresponding range for spruce-dominated forests was from 3.13 to 3.57 m.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The lifetime calculation of large dense sensor networks with fixed energy resources and the remaining residual energy have shown that for a constant energy resource in a sensor network the fault rate at the cluster head is network size invariant when using the network layer with no MAC losses.Even after increasing the battery capacities in the nodes the total lifetime does not increase after a max limit of 8 times. As this is a serious limitation lots of research has been done at the MAC layer which allows to adapt to the specific connectivity, traffic and channel polling needs for sensor networks. There have been lots of MAC protocols which allow to control the channel polling of new radios which are available to sensor nodes to communicate. This further reduces the communication overhead by idling and sleep scheduling thus extending the lifetime of the monitoring application. We address the two issues which effects the distributed characteristics and performance of connected MAC nodes. (1) To determine the theoretical minimum rate based on joint coding for a correlated data source at the singlehop, (2a) to estimate cluster head errors using Bayesian rule for routing using persistence clustering when node densities are the same and stored using prior probability at the network layer, (2b) to estimate the upper bound of routing errors when using passive clustering were the node densities at the multi-hop MACS are unknown and not stored at the multi-hop nodes a priori. In this paper we evaluate many MAC based sensor network protocols and study the effects on sensor network lifetime. A renewable energy MAC routing protocol is designed when the probabilities of active nodes are not known a priori. From theoretical derivations we show that for a Bayesian rule with known class densities of omega1, omega2 with expected error P* is bounded by max error rate of P=2P* for single-hop. We study the effects of energy losses using cross-layer simulation of - large sensor network MACS setup, the error rate which effect finding sufficient node densities to have reliable multi-hop communications due to unknown node densities. The simulation results show that even though the lifetime is comparable the expected Bayesian posterior probability error bound is close or higher than Pges2P*.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper we propose a new algorithm for learning polyhedral classifiers. In contrast to existing methods for learning polyhedral classifier which solve a constrained optimization problem, our method solves an unconstrained optimization problem. Our method is based on a logistic function based model for the posterior probability function. We propose an alternating optimization algorithm, namely, SPLA1 (Single Polyhedral Learning Algorithm1) which maximizes the loglikelihood of the training data to learn the parameters. We also extend our method to make it independent of any user specified parameter (e.g., number of hyperplanes required to form a polyhedral set) in SPLA2. We show the effectiveness of our approach with experiments on various synthetic and real world datasets and compare our approach with a standard decision tree method (OC1) and a constrained optimization based method for learning polyhedral sets.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We consider a small extent sensor network for event detection, in which nodes periodically take samples and then contend over a random access network to transmit their measurement packets to the fusion center. We consider two procedures at the fusion center for processing the measurements. The Bayesian setting, is assumed, that is, the fusion center has a prior distribution on the change time. In the first procedure, the decision algorithm at the fusion center is network-oblivious and makes a decision only when a complete vector of measurements taken at a sampling instant is available. In the second procedure, the decision algorithm at the fusion center is network-aware and processes measurements as they arrive, but in a time-causal order. In this case, the decision statistic depends on the network delays, whereas in the network-oblivious case, the decision statistic does not. This yields a Bayesian change-detection problem with a trade-off between the random network delay and the decision delay that is, a higher sampling rate reduces the decision delay but increases the random access delay. Under periodic sampling, in the network-oblivious case, the structure of the optimal stopping rule is the same as that without the network, and the optimal change detection delay decouples into the network delay and the optimal decision delay without the network. In the network-aware case, the optimal stopping problem is analyzed as a partially observable Markov decision process, in which the states of the queues and delays in the network need to be maintained. A sufficient decision statistic is the network state and the posterior probability of change having occurred, given the measurements received and the state of the network. The optimal regimes are studied using simulation.