977 results for Probabilistic methods


Relevance:

30.00% 30.00%

Publisher:

Abstract:

To recognize faces in video, face appearances have been widely modeled as piece-wise local linear models which linearly approximate the smooth yet non-linear low-dimensional face appearance manifolds. The choice of representations of the local models is crucial. Most of the existing methods learn each local model individually, meaning that they only anticipate variations within each class. In this work, we propose to represent local models as Gaussian distributions which are learned simultaneously using the heteroscedastic probabilistic linear discriminant analysis (PLDA). Each gallery video is therefore represented as a collection of such distributions. With the PLDA, not only are the within-class variations estimated during training, but the separability between classes is also maximized, leading to improved discrimination. The heteroscedastic PLDA itself is adapted from the standard PLDA to approximate face appearance manifolds more accurately. Instead of assuming a single global within-class covariance, the heteroscedastic PLDA learns different within-class covariances specific to each local model. In the recognition phase, a probe video is matched against gallery samples through the fusion of point-to-model distances. Experiments on the Honda and MoBo datasets have shown the merit of the proposed method, which achieves better performance than the state-of-the-art technique.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Whole-image descriptors have recently been shown to be remarkably robust to perceptual change, especially compared to local features. However, whole-image-based localization systems typically rely on heuristic methods for determining appropriate matching thresholds in a particular environment. These environment-specific tuning requirements and the lack of a meaningful interpretation of these arbitrary thresholds limit the general applicability of these systems. In this paper we present a Bayesian model of probability for whole-image descriptors that can be seamlessly integrated into localization systems designed for probabilistic visual input. We demonstrate this method using CAT-Graph, an appearance-based visual localization system originally designed for a FAB-MAP-style probabilistic input. We show that using whole-image descriptors as visual input extends CAT-Graph’s functionality to environments that experience a greater amount of perceptual change. We also present a method of estimating whole-image probability models in an online manner, removing the need for a prior training phase. We show that this online, automated training method can perform comparably to pre-trained, manually tuned local descriptor methods.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

As the level of autonomy in Unmanned Aircraft Systems (UAS) increases, there is an imperative need for developing methods to assess robust autonomy. This paper focuses on the computations that lead to a set of measures of robust autonomy. These measures are the probabilities that selected performance indices related to the mission requirements and airframe capabilities remain within regions of acceptable performance.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

A common problem with the use of tensor modeling in generating quality recommendations for large datasets is scalability. In this paper, we propose the Tensor-based Recommendation using Probabilistic Ranking method, which generates the reconstructed tensor using block-striped parallel matrix multiplication and then probabilistically calculates the preferences of users to rank the recommended items. Empirical analysis on two real-world datasets shows that the proposed method is scalable for large tensor datasets and is able to outperform the benchmarking methods in terms of accuracy.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

In a tag-based recommender system, the multi-dimensional correlation should be modeled effectively for finding quality recommendations. Recently, a few researchers have used tensor models in recommendation to represent and analyze latent relationships inherent in multi-dimensional data. A common approach is to build the tensor model, decompose it and then directly use the reconstructed tensor to generate the recommendation based on the maximum values of tensor elements. In order to improve the accuracy and scalability, we propose an implementation of the n-mode block-striped (matrix) product for scalable tensor reconstruction and probabilistically ranking the candidate items generated from the reconstructed tensor. With testing on real-world datasets, we demonstrate that the proposed method outperforms the benchmarking methods in terms of recommendation accuracy and scalability.
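
As a rough illustration of the reconstruction step described above, the following sketch computes a mode-1 tensor-matrix product one row stripe at a time. It assumes that "block-striped" means splitting the matrix into horizontal stripes that can be processed independently; the function names and the tiny tensor are hypothetical, and the probabilistic ranking step is not shown.

```python
def mode1_product(tensor, matrix):
    """Mode-1 product: Y[p][j][k] = sum_i matrix[p][i] * tensor[i][j][k]."""
    n_i, n_j, n_k = len(tensor), len(tensor[0]), len(tensor[0][0])
    return [
        [
            [sum(row[i] * tensor[i][j][k] for i in range(n_i)) for k in range(n_k)]
            for j in range(n_j)
        ]
        for row in matrix
    ]


def striped_mode1_product(tensor, matrix, n_stripes):
    """Same product, computed one horizontal stripe of `matrix` at a time.

    Each stripe needs only its own rows of `matrix`, so stripes could be
    handed to separate workers; they run sequentially here for clarity.
    """
    stripe_len = -(-len(matrix) // n_stripes)  # ceiling division
    result = []
    for start in range(0, len(matrix), stripe_len):
        result.extend(mode1_product(tensor, matrix[start:start + stripe_len]))
    return result


# Hypothetical 2 x 2 x 2 tensor and 3 x 2 factor matrix.
tensor = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
matrix = [[1, 0], [0, 1], [1, 1]]
direct = mode1_product(tensor, matrix)
striped = striped_mode1_product(tensor, matrix, n_stripes=2)
```

Because the stripes do not share any output entries, the striped result matches the direct product exactly, whatever the number of stripes.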

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Conceptual combination performs a fundamental role in creating the broad range of compound phrases utilised in everyday language. While the systematicity and productivity of language provide a strong argument in favour of assuming compositionality, this very assumption is still regularly questioned in both cognitive science and philosophy. This article provides a novel probabilistic framework for assessing whether the semantics of conceptual combinations are compositional, and so can be considered as a function of the semantics of the constituent concepts, or not. Rather than adjudicating between different grades of compositionality, the framework presented here contributes formal methods for determining a clear dividing line between compositional and non-compositional semantics. Compositionality is equated with a joint probability distribution modelling how the constituent concepts in the combination are interpreted. Marginal selectivity is emphasised as a pivotal probabilistic constraint for the application of the Bell/CH and CHSH systems of inequalities (referred to collectively as Bell-type). Non-compositionality is then equated with either a failure of marginal selectivity, or, in the presence of marginal selectivity, with a violation of Bell-type inequalities. In both non-compositional scenarios, the conceptual combination cannot be modelled using a joint probability distribution with variables corresponding to the interpretation of the individual concepts. The framework is demonstrated by applying it to an empirical scenario of twenty-four non-lexicalised conceptual combinations.
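
The two-step test outlined in this abstract can be sketched roughly as follows: first check marginal selectivity, then check the four CHSH inequalities. The data layout (one joint table per pair of interpretations, with outcomes coded as +1 or -1), the function names and the example tables are assumptions made for illustration only and do not come from the article.

```python
CONTEXTS = [("A1", "B1"), ("A1", "B2"), ("A2", "B1"), ("A2", "B2")]


def marginal(table, axis, value):
    """P(outcome on `axis` == value) within one context's joint table."""
    return sum(p for outcome, p in table.items() if outcome[axis] == value)


def marginal_selectivity_holds(tables, tol=1e-9):
    """A variable's marginal must not depend on which partner it is paired with."""
    for axis in (0, 1):
        for name in {ctx[axis] for ctx in tables}:
            margs = [marginal(t, axis, +1)
                     for ctx, t in tables.items() if ctx[axis] == name]
            if max(margs) - min(margs) > tol:
                return False
    return True


def expectation(table):
    """Expected value of the product of the two +1/-1 outcomes."""
    return sum(a * b * p for (a, b), p in table.items())


def chsh_values(tables):
    """The four CHSH sums, each with a single minus sign in a different place."""
    e = [expectation(tables[ctx]) for ctx in CONTEXTS]
    return [sum(e) - 2 * e_i for e_i in e]


def is_compositional(tables, tol=1e-9):
    """Marginal selectivity holds and no CHSH sum exceeds 2 in magnitude."""
    return marginal_selectivity_holds(tables) and all(
        abs(s) <= 2 + tol for s in chsh_values(tables))


# Hypothetical tables: independent uniform outcomes in every context ...
uniform = {ctx: {(a, b): 0.25 for a in (1, -1) for b in (1, -1)} for ctx in CONTEXTS}
# ... and a maximally non-classical set (perfect correlation in three
# contexts, perfect anti-correlation in the fourth).
correlated = {(1, 1): 0.5, (-1, -1): 0.5}
anticorrelated = {(1, -1): 0.5, (-1, 1): 0.5}
extreme = {ctx: correlated for ctx in CONTEXTS[:3]}
extreme[CONTEXTS[3]] = anticorrelated
```

In the second example every marginal equals 0.5, so marginal selectivity holds, yet one CHSH sum reaches 4, which is the second kind of non-compositional scenario the abstract describes.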

Relevance:

30.00% 30.00%

Publisher:

Abstract:

In this dissertation, I present an overall methodological framework for studying linguistic alternations, focusing specifically on lexical variation in denoting a single meaning, that is, synonymy. As the practical example, I employ the synonymous set of the four most common Finnish verbs denoting THINK, namely ajatella, miettiä, pohtia and harkita ‘think, reflect, ponder, consider’. As a continuation of previous work, I describe in considerable detail the extension of statistical methods from dichotomous linguistic settings (e.g., Gries 2003; Bresnan et al. 2007) to polytomous ones, that is, those concerning more than two possible alternative outcomes. The applied statistical methods are arranged into a succession of stages with increasing complexity, proceeding from univariate via bivariate to multivariate techniques. As the central multivariate method, I argue for the use of polytomous logistic regression and demonstrate its practical application to the studied phenomenon, thus extending the work by Bresnan et al. (2007), who applied simple (binary) logistic regression to a dichotomous structural alternation in English. The results of the various statistical analyses confirm that a wide range of contextual features across different categories are indeed associated with the use and selection of the selected think lexemes; however, a substantial part of these features are not exemplified in current Finnish lexicographical descriptions. The multivariate analysis results indicate that the semantic classifications of syntactic argument types are on average the most distinctive feature category, followed by overall semantic characterizations of the verb chains, and then syntactic argument types alone, with morphological features pertaining to the verb chain and extra-linguistic features relegated to the last position. 
In terms of overall performance of the multivariate analysis and modeling, the prediction accuracy seems to reach a ceiling at a Recall rate of roughly two-thirds of the sentences in the research corpus. The analysis of these results suggests a limit to what can be explained and determined within the immediate sentential context and applying the conventional descriptive and analytical apparatus based on currently available linguistic theories and models. The results also support Bresnan’s (2007) and others’ (e.g., Bod et al. 2003) probabilistic view of the relationship between linguistic usage and the underlying linguistic system, in which only a minority of linguistic choices are categorical, given the known context – represented as a feature cluster – that can be analytically grasped and identified. Instead, most contexts exhibit degrees of variation as to their outcomes, resulting in proportionate choices over longer stretches of usage in texts or speech.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Many statistical forecast systems are available to interested users. In order to be useful for decision-making, these systems must be based on evidence of underlying mechanisms. Once causal connections between the mechanisms and their statistical manifestation have been firmly established, the forecasts must also provide some quantitative evidence of ‘quality’. However, the quality of statistical climate forecast systems (forecast quality) is an ill-defined and frequently misunderstood property. Often, providers and users of such forecast systems are unclear about what ‘quality’ entails and how to measure it, leading to confusion and misinformation. Here we present a generic framework to quantify aspects of forecast quality using an inferential approach to calculate nominal significance levels (p-values) that can be obtained either by directly applying non-parametric statistical tests such as Kruskal-Wallis (KW) or Kolmogorov-Smirnov (KS) or by using Monte-Carlo methods (in the case of forecast skill scores). Once converted to p-values, these forecast quality measures provide a means to objectively evaluate and compare temporal and spatial patterns of forecast quality across datasets and forecast systems. Our analysis demonstrates the importance of providing p-values rather than adopting some arbitrarily chosen significance levels such as p < 0.05 or p < 0.01, which is still common practice. This is illustrated by applying non-parametric tests (such as KW and KS) and skill scoring methods (LEPS and RPSS) to the 5-phase Southern Oscillation Index classification system using historical rainfall data from Australia, the Republic of South Africa and India. The selection of quality measures is solely based on their common use and does not constitute endorsement. We found that non-parametric statistical tests can be adequate proxies for skill measures such as LEPS or RPSS. 
The framework can be implemented anywhere, regardless of dataset, forecast system or quality measure. Eventually such inferential evidence should be complemented by descriptive statistical methods in order to fully assist in operational risk management.
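
The Monte Carlo route to a p-value mentioned in this abstract might look roughly like the sketch below: the observed statistic is compared with its distribution under random reassignment of observations to forecast classes. The Kruskal-Wallis H statistic is used here only as a stand-in (a skill score could be substituted for `statistic`), the tie correction is omitted, and the rainfall numbers are invented for illustration rather than taken from the study.

```python
import random


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (without the tie correction factor)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        total += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def monte_carlo_p_value(groups, statistic, n_perm=999, seed=0):
    """Share of random regroupings whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if statistic(shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)  # add-one form keeps p above zero


# Invented seasonal rainfall totals grouped by three forecast classes.
rain_by_class = [[1, 2, 3, 4, 5], [11, 12, 13, 14, 15], [21, 22, 23, 24, 25]]
p_value = monte_carlo_p_value(rain_by_class, kruskal_wallis_h)
```

Reporting `p_value` itself, rather than whether it falls below a fixed cut-off, is in line with the recommendation made in the abstract.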

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Genetics, the science of heredity and variation in living organisms, has a central role in medicine, in breeding crops and livestock, and in studying fundamental topics of biological sciences such as evolution and cell functioning. Currently the field of genetics is undergoing rapid development because of recent advances in technologies by which molecular data can be obtained from living organisms. To extract the most information from such data, the analyses need to be carried out using statistical models that are tailored to take account of the particular genetic processes. In this thesis we formulate and analyze Bayesian models for genetic marker data of contemporary individuals. The major focus is on the modeling of the unobserved recent ancestry of the sampled individuals (say, for tens of generations or so), which is carried out by using explicit probabilistic reconstructions of the pedigree structures accompanied by the gene flows at the marker loci. For such a recent history, the recombination process is the major genetic force that shapes the genomes of the individuals, and it is included in the model by assuming that the recombination fractions between the adjacent markers are known. The posterior distribution of the unobserved history of the individuals is studied conditionally on the observed marker data by using a Markov chain Monte Carlo (MCMC) algorithm. The example analyses consider estimation of the population structure, relatedness structure (both at the level of whole genomes and at each marker separately), and haplotype configurations. For situations where the pedigree structure is partially known, an algorithm to create an initial state for the MCMC algorithm is given. Furthermore, the thesis includes an extension of the model for the recent genetic history to situations where a quantitative phenotype has also been measured from the contemporary individuals. 
In that case the goal is to identify positions on the genome that affect the observed phenotypic values. This task is carried out within the Bayesian framework, where the number and the relative effects of the quantitative trait loci are treated as random variables whose posterior distribution is studied conditionally on the observed genetic and phenotypic data. In addition, the thesis contains an extension of a widely-used haplotyping method, the PHASE algorithm, to settings where genetic material from several individuals has been pooled together, and the allele frequencies of each pool are determined in a single genotyping.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Many downscaling techniques have been developed in the past few years for projection of station-scale hydrological variables from large-scale atmospheric variables simulated by general circulation models (GCMs) to assess the hydrological impacts of climate change. This article compares the performances of three downscaling methods, viz. conditional random field (CRF), K-nearest neighbour (KNN) and support vector machine (SVM) methods in downscaling precipitation in the Punjab region of India, belonging to the monsoon regime. The CRF model is a recently developed method for downscaling hydrological variables in a probabilistic framework, while the SVM model is a popular machine learning tool useful in terms of its ability to generalize and capture nonlinear relationships between predictors and predictand. The KNN model is an analogue-type method that queries days similar to a given feature vector from the training data and classifies future days by random sampling from a weighted set of K closest training examples. The models are applied for downscaling monsoon (June to September) daily precipitation at six locations in Punjab. Model performances with respect to reproduction of various statistics such as dry and wet spell length distributions, daily rainfall distribution, and intersite correlations are examined. It is found that the CRF and KNN models perform slightly better than the SVM model in reproducing most daily rainfall statistics. These models are then used to project future precipitation at the six locations. Output from the Canadian global climate model (CGCM3) GCM for three scenarios, viz. A1B, A2, and B1 is used for projection of future precipitation. The projections show a change in probability density functions of daily rainfall amount and changes in the wet and dry spell distributions of daily precipitation. Copyright (C) 2011 John Wiley & Sons, Ltd.
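
The KNN step described in this abstract (find the K most similar training days, then sample one of them at random with weights) could be sketched as below. The 1/rank weighting is an assumption chosen for illustration, since the abstract does not state the kernel used, and the feature vectors and rainfall values are invented.

```python
import math
import random


def knn_resample(features, history, k, rng):
    """Draw one precipitation value for a day described by `features`.

    `history` is a list of (feature_vector, precipitation) pairs from the
    training period. Neighbours are ranked by Euclidean distance and the
    j-th closest is drawn with weight proportional to 1/j (an assumption).
    """
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], features))[:k]
    weights = [1.0 / rank for rank in range(1, len(nearest) + 1)]
    return rng.choices(nearest, weights=weights, k=1)[0][1]


# Invented training days: (large-scale predictors, station rainfall in mm).
history = [
    ((0.0, 0.0), 0.0),
    ((1.0, 1.0), 5.2),
    ((1.1, 0.9), 7.5),
    ((5.0, 5.0), 30.0),
]
rng = random.Random(42)
draws = [knn_resample((1.0, 1.0), history, k=2, rng=rng) for _ in range(20)]
```

Repeating the draw for each simulated day yields an ensemble of rainfall sequences, which is what allows spell lengths and daily distributions to be compared across methods.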

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Latent variable methods, such as PLCA (Probabilistic Latent Component Analysis), have been successfully used for analysis of non-negative signal representations. In this paper, we formulate PLCS (Probabilistic Latent Component Segmentation), which models each time frame of a spectrogram as a spectral distribution. Given the signal spectrogram, the segmentation boundaries are estimated using a maximum-likelihood approach. For an efficient solution, the algorithm imposes a hard constraint that each segment is modelled by a single latent component. The hard constraint facilitates the solution of ML boundary estimation using dynamic programming. The PLCS framework does not impose a parametric assumption, unlike earlier ML segmentation techniques. PLCS can be naturally extended to model coarticulation between successive phones. Experiments on the TIMIT corpus show that the proposed technique is promising compared to most state-of-the-art speech segmentation algorithms.
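
To make the dynamic-programming idea concrete, here is a simplified sketch, not the PLCS algorithm itself. It assumes the number of segments is known in advance and takes each segment's single component to be the normalised sum of its frames, which is the maximum-likelihood estimate for one shared distribution; the function names and the toy spectrogram are hypothetical.

```python
import math


def segment_cost(frames, start, end):
    """Negative log-likelihood of frames[start:end] under one shared distribution."""
    n_bins = len(frames[0])
    totals = [sum(frames[t][f] for t in range(start, end)) for f in range(n_bins)]
    mass = sum(totals)
    # The ML estimate of the segment's distribution is totals / mass.
    return -sum(c * math.log(c / mass) for c in totals if c > 0)


def ml_segmentation(frames, n_segments):
    """Return the boundary indices that minimise the total segment cost."""
    n = len(frames)
    inf = float("inf")
    # best[k][t]: lowest cost of splitting the first t frames into k segments.
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for end in range(k, n + 1):
            for start in range(k - 1, end):
                cost = best[k - 1][start] + segment_cost(frames, start, end)
                if cost < best[k][end]:
                    best[k][end], back[k][end] = cost, start
    # Walk the back-pointers from the last segment to the first.
    boundaries, end = [], n
    for k in range(n_segments, 1, -1):
        end = back[k][end]
        boundaries.append(end)
    return boundaries[::-1]


# Toy "spectrogram": each frame is a vector of non-negative bin magnitudes.
frames = [[8, 1, 1]] * 3 + [[1, 8, 1]] * 3 + [[1, 1, 8]] * 2
boundaries = ml_segmentation(frames, n_segments=3)
```

The hard one-component-per-segment constraint is what makes the cost of a segment depend only on its own frames, so the total cost decomposes and the recursion over `best` applies.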

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Monte Carlo simulation methods involving splitting of Markov chains have been used in the evaluation of multi-fold integrals in different application areas. In this paper we examine the performance of these methods in the context of evaluating reliability integrals, from the point of view of characterizing the sampling fluctuations. The methods discussed include the Au-Beck subset simulation, the Holmes-Diaconis-Ross method, and the generalized splitting algorithm. A few improvisations based on the first-order reliability method are suggested to select algorithmic parameters of the latter two methods. The bias and sampling variance of the alternative estimators are discussed. Also, an approximation to the sampling distribution of some of these estimators is obtained. Illustrative examples involving component and series system reliability analyses are presented with a view to bringing out the relative merits of alternative methods. (C) 2015 Elsevier Ltd. All rights reserved.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Methods for generating a new population are a fundamental component of estimation of distribution algorithms (EDAs). They serve to transfer the information contained in the probabilistic model to the new generated population. In EDAs based on Markov networks, methods for generating new populations usually discard information contained in the model to gain in efficiency. Other methods like Gibbs sampling use information about all interactions in the model but are computationally very costly. In this paper we propose new methods for generating new solutions in EDAs based on Markov networks. We introduce approaches based on inference methods for computing the most probable configurations and model-based template recombination. We show that the application of different variants of inference methods can increase the EDAs’ convergence rate and reduce the number of function evaluations needed to find the optimum of binary and non-binary discrete functions.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Introduction: The National Oceanic and Atmospheric Administration’s Biogeography Branch has conducted surveys of reef fish in the Caribbean since 1999. Surveys were initially undertaken to identify essential fish habitat, but later were used to characterize and monitor reef fish populations and benthic communities over time. The Branch’s goals are to develop knowledge and products on the distribution and ecology of living marine resources and provide resource managers, scientists and the public with an improved ecosystem basis for making decisions. The Biogeography Branch monitors reef fishes and benthic communities in three study areas: (1) St. John, USVI, (2) Buck Island, St. Croix, USVI, and (3) La Parguera, Puerto Rico. In addition, the Branch has characterized the reef fish and benthic communities in the Flower Garden Banks National Marine Sanctuary, Gray’s Reef National Marine Sanctuary and around the island of Vieques, Puerto Rico. Reef fish data are collected using a stratified random sampling design and stringent measurement protocols. Over time, the sampling design has changed in order to meet different management objectives (i.e. identification of essential fish habitat vs. monitoring), but the designs have always remained:
• Probabilistic – to allow inferences to a larger targeted population,
• Objective – to satisfy management objectives, and
• Stratified – to reduce sampling costs and obtain population estimates for strata.
There are two aspects of the sampling design which are now under consideration and are the focus of this report: first, the application of a sample frame, identified as a set of points or grid elements from which a sample is selected; and second, the application of subsampling in a two-stage sampling design. To evaluate these considerations, the pros and cons of implementing a sampling frame and subsampling are discussed.
Particular attention is paid to the impacts of each design on accuracy (bias), feasibility and sampling cost (precision). Further, this report presents an analysis of data to determine the optimal number of subsamples to collect if subsampling were used. (PDF contains 19 pages)
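
A stratified random draw from a sample frame, as discussed in this abstract, could be sketched as follows. The proportional allocation rule, the stratum names and the grid-element ids are assumptions made for illustration; the report's actual allocation and frame are not given here, and the second (subsampling) stage is not shown.

```python
import random


def stratified_sample(frame, n_total, rng):
    """Select grid elements at random within each stratum of a sample frame.

    `frame` maps a stratum name to the list of grid-element ids it contains.
    Allocation is proportional to stratum size, with at least one element
    per stratum, so rounding can make the total differ slightly from n_total.
    """
    frame_size = sum(len(cells) for cells in frame.values())
    sample = {}
    for stratum, cells in frame.items():
        n = max(1, round(n_total * len(cells) / frame_size))
        sample[stratum] = rng.sample(cells, min(n, len(cells)))
    return sample


# Hypothetical frame: 60 hard-bottom and 40 soft-bottom grid elements.
frame = {"hard_bottom": list(range(60)), "soft_bottom": list(range(100, 140))}
sample = stratified_sample(frame, n_total=10, rng=random.Random(1))
```

Because every grid element in the frame has a known, non-zero chance of selection, estimates from such a sample can be extended to the whole targeted population, which is the "probabilistic" property listed above.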