74 results for Trimmed likelihood


Relevance:

10.00%

Abstract:

This paper investigates a method of automatic pronunciation scoring for use in computer-assisted language learning (CALL) systems. The method utilizes a likelihood-based "Goodness of Pronunciation" (GOP) measure which is extended to include individual thresholds for each phone based on both averaged native confidence scores and on rejection statistics provided by human judges. Further improvements are obtained by incorporating models of the subject's native language and by augmenting the recognition networks to include expected pronunciation errors. The various GOP measures are assessed using a specially recorded database of non-native speakers which has been annotated to mark phone-level pronunciation errors. Since pronunciation assessment is highly subjective, a set of four performance measures has been designed, each of them measuring different aspects of how well computer-derived phone-level scores agree with human scores. These performance measures are used to cross-validate the reference annotations and to assess the basic GOP algorithm and its refinements. The experimental results suggest that a likelihood-based pronunciation scoring metric can achieve usable performance, especially after applying the various enhancements.
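
As a rough illustration of the kind of measure described above (not the paper's exact formulation), the sketch below computes a duration-normalised log-likelihood ratio per phone and flags phones whose score falls below a phone-specific threshold. All function names, threshold values and numbers are hypothetical.

    import numpy as np

    def gop_score(forced_loglik, free_loglik, num_frames):
        """Duration-normalised log-likelihood ratio for one phone segment.
        forced_loglik: log p(O_p | intended phone), from forced alignment
        free_loglik:   log p(O_p | best phone), from an unconstrained phone loop
        num_frames:    number of acoustic frames in the segment
        """
        return (forced_loglik - free_loglik) / max(num_frames, 1)

    def assess_utterance(segments, thresholds):
        """Flag likely mispronunciations.
        segments:   list of (phone, forced_loglik, free_loglik, num_frames)
        thresholds: dict mapping phone -> phone-specific rejection threshold,
                    e.g. derived from native-speaker score statistics
        """
        verdicts = []
        for phone, forced, free, n in segments:
            score = gop_score(forced, free, n)
            verdicts.append((phone, score, score < thresholds.get(phone, -2.0)))
        return verdicts

    # Toy usage with made-up numbers: the second phone is flagged as mispronounced.
    segments = [("ae", -310.2, -305.7, 32), ("th", -290.5, -200.1, 25)]
    print(assess_utterance(segments, {"ae": -1.0, "th": -1.0}))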

Relevance:

10.00%

Abstract:

In spite of over two decades of intense research, illumination and pose invariance remain prohibitively challenging aspects of face recognition for most practical applications. The objective of this work is to recognize faces using video sequences both for training and recognition input, in a realistic, unconstrained setup in which lighting, pose and user motion pattern have a wide variability and face images are of low resolution. In particular there are three areas of novelty: (i) we show how a photometric model of image formation can be combined with a statistical model of generic face appearance variation, learnt offline, to generalize in the presence of extreme illumination changes; (ii) we use the smoothness of geodesically local appearance manifold structure and a robust same-identity likelihood to achieve invariance to unseen head poses; and (iii) we introduce an accurate video sequence "reillumination" algorithm to achieve robustness to face motion patterns in video. We describe a fully automatic recognition system based on the proposed method and an extensive evaluation on 171 individuals and over 1300 video sequences with extreme illumination, pose and head motion variation. On this challenging data set our system consistently demonstrated a nearly perfect recognition rate (over 99.7%), significantly outperforming state-of-the-art commercial software and methods from the literature. © Springer-Verlag Berlin Heidelberg 2006.

Relevance:

10.00%

Abstract:

For many realistic scenarios, there are multiple factors that affect the clean speech signal. This work describes approaches for handling two such factors, speaker and background noise differences, simultaneously. A new adaptation scheme is proposed. Here the acoustic models are first adapted to the target speaker via an MLLR transform. This is followed by adaptation to the target noise environment via model-based vector Taylor series (VTS) compensation. These speaker and noise transforms are jointly estimated using maximum likelihood. Experiments on the AURORA4 task demonstrate that this adaptation scheme provides improved performance over VTS-based noise adaptation. In addition, this framework enables the speaker and noise to be factorised, allowing the speaker transform estimated in one noise condition to be successfully used in a different noise condition. © 2011 IEEE.
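
To make the adaptation order concrete, here is a minimal sketch, assuming a single Gaussian mean, of applying a speaker (MLLR) mean transform followed by zeroth-order VTS mean compensation. For simplicity the compensation is written in the log-filterbank domain rather than the cepstral domain used in practice, and the transforms, noise and channel means below are made-up stand-ins rather than estimated quantities.

    import numpy as np

    def mllr_adapt_mean(mean, A, b):
        """Apply a (hypothetical) MLLR mean transform: mu_s = A mu + b."""
        return A @ mean + b

    def vts_compensate_mean(clean_mean, noise_mean, channel_mean):
        """Zeroth-order VTS mean compensation in the log-filterbank domain:
        mu_y = mu_x + mu_h + log(1 + exp(mu_n - mu_x - mu_h)).
        (Real systems apply this in the cepstral domain via a DCT.)
        """
        return clean_mean + channel_mean + np.log1p(np.exp(noise_mean - clean_mean - channel_mean))

    # Toy usage: speaker adaptation first, then noise compensation of the
    # speaker-adapted mean, mirroring the adaptation order described above.
    dim = 24
    mu = np.random.randn(dim)
    A, b = np.eye(dim) * 1.05, 0.1 * np.ones(dim)   # hypothetical MLLR transform
    mu_n, mu_h = np.full(dim, -1.0), np.zeros(dim)  # hypothetical noise/channel means
    mu_speaker = mllr_adapt_mean(mu, A, b)
    mu_compensated = vts_compensate_mean(mu_speaker, mu_n, mu_h)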

Relevance:

10.00%

Abstract:

Recently there has been interest in structured discriminative models for speech recognition. In these models, sentence posteriors are directly modelled given a set of features extracted from the observation sequence and the hypothesised word sequence. In previous work these discriminative models have been combined with features derived from generative models for noise-robust recognition of continuous digits. This paper extends this work to medium to large vocabulary tasks. The form of the score-space extracted using the generative models, and the parameter tying of the discriminative model, are both discussed. Update formulae for both conditional maximum likelihood and minimum Bayes' risk training are described. Experimental results are presented on small and medium to large vocabulary noise-corrupted speech recognition tasks: AURORA 2 and 4. © 2011 IEEE.
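
A minimal sketch of the modelling idea, under assumed notation: sentence posteriors are a log-linear function of score-space features (e.g. log-likelihoods from noise-compensated generative models) over an N-best list, and conditional maximum likelihood training follows the gradient of the reference hypothesis' log-posterior. The feature values and hypothesis set below are invented for illustration.

    import numpy as np

    def sentence_posteriors(features, alpha):
        """Log-linear posteriors over competing hypotheses.
        features: (num_hypotheses, num_features) score-space features
        alpha:    discriminative parameter vector
        """
        scores = features @ alpha
        scores -= scores.max()          # numerical stability
        p = np.exp(scores)
        return p / p.sum()

    def cml_gradient(features, alpha, ref_index):
        """Gradient of the conditional log-likelihood of the reference hypothesis:
        phi(reference) minus the posterior expectation of phi."""
        p = sentence_posteriors(features, alpha)
        return features[ref_index] - p @ features

    # Toy usage: three hypotheses, two score-space features each, one ascent step.
    phi = np.array([[-10.0, -12.0],
                    [-11.0, -10.5],
                    [-15.0, -14.0]])
    alpha = np.ones(2)
    alpha += 0.1 * cml_gradient(phi, alpha, ref_index=0)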

Relevance:

10.00%

Abstract:

Novel statistical models are proposed and developed in this paper for automated multiple-pitch estimation problems. Point estimates of the parameters of partial frequencies of a musical note are modeled as realizations from a non-homogeneous Poisson process defined on the frequency axis. When several notes are combined, the processes for the individual notes combine to give a new Poisson process whose likelihood is easy to compute. This model avoids the data-association step of linking the harmonics of each note with the corresponding partials and is ideal for efficient Bayesian inference of unknown multiple fundamental frequencies in a signal. © 2011 IEEE.
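
The key computational convenience, that superposed Poisson processes simply add their intensity functions, can be sketched as follows: the log-likelihood of the observed partial frequencies is the sum of log-intensities at those frequencies minus the integrated intensity over the band. The Gaussian-bump intensity, the background rate and all numerical values are assumptions for illustration, not the paper's exact model.

    import numpy as np

    def intensity(freqs, f0s, num_harmonics=10, width=5.0, weight=2.0, background=1e-3):
        """Combined intensity of the superposed processes: a small background rate
        plus a Gaussian bump at each harmonic of each candidate fundamental."""
        lam = np.full_like(freqs, background, dtype=float)
        for f0 in f0s:
            for h in range(1, num_harmonics + 1):
                lam += weight * np.exp(-0.5 * ((freqs - h * f0) / width) ** 2) / (width * np.sqrt(2 * np.pi))
        return lam

    def log_likelihood(observed_partials, f0s, f_grid):
        """Poisson process log-likelihood of the observed partial frequencies."""
        lam_obs = intensity(np.asarray(observed_partials, dtype=float), f0s)
        lam_grid = intensity(f_grid, f0s)
        return np.sum(np.log(lam_obs)) - np.trapz(lam_grid, f_grid)

    # Toy usage: partials consistent with notes at 220 Hz and 330 Hz.
    partials = [220.0, 440.0, 660.0, 330.0, 990.0]
    grid = np.linspace(20.0, 4000.0, 8000)
    print(log_likelihood(partials, [220.0, 330.0], grid))
    print(log_likelihood(partials, [220.0], grid))   # typically lower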

Relevance:

10.00%

Abstract:

Purpose: Advocates and critics of target-setting in the workplace seem unable to reach beyond their own well-entrenched battle lines. While the advocates of goal-directed behaviour point to what they see as demonstrable advantages, the critics of target-setting highlight equally demonstrable disadvantages. Indeed, the academic literature on this topic is currently mired in controversy, with neither side seemingly capable of envisaging a better way forward. This paper seeks to break the current deadlock and move thinking forward in this important aspect of performance measurement and management by outlining a new, more fruitful approach, based on both theory and practical experience.

Design/methodology/approach: The topic was approached in three phases: assembling and reading key academic and other literature on the subject of target-setting and goal-directed behaviour, with a view to understanding, in depth, the arguments advanced by the advocates and critics of target-setting; comparing these published arguments with the authors' own experiential findings, in order to bring the essence of disagreement into much sharper focus; and then bringing to bear the academic and practical experience to identify the essential elements of a new, more fruitful approach offering all the benefits of goal-directed behaviour with none of the typical disadvantages of target-setting.

Findings: The research led to three key findings: the advocates of goal-directed behaviour and the critics of target-setting each make valid points, as seen from their own current perspectives; the likelihood of these two communities, left to themselves, ever reaching a new synthesis seems vanishingly small (with leading thinkers in the goal-directed behaviour community already acknowledging this); and, between the three authors, it was discovered that their unusual combination of academic study and practical experience enabled them to see things differently. Hence, they would like to share their new thinking more widely.

Research limitations/implications: The authors fully accept that their paper is informed by extensive practical experience and, as yet, there have been no opportunities to test their findings, conclusions and recommendations through rigorous academic research. However, they hope that the paper will move thinking forward in this arena, thereby informing future academic research.

Practical implications: The authors hope that the practical implications of the paper will be significant, as it outlines a novel way for organisations to capture the benefits of goal-directed behaviour with none of the disadvantages typically associated with target-setting.

Social implications: Given that increased efficiency and effectiveness in the management of organisations would be good for society, the authors think the paper has interesting social implications.

Originality/value: Leading thinkers in the field of goal-directed behaviour, such as Locke and Latham, and leading critics of target-setting, such as Ordóñez et al., continue to argue with one another - much like, at the turn of the nineteenth century, proponents of the "wave theory of light" and proponents of the "particle theory of light" were similarly at loggerheads. Just as this furious scientific debate was ultimately resolved by Taylor's experiment, showing that light could behave both as a particle and a wave at the same time, the authors believe that the paper demonstrates that goal-directed behaviour and target-setting can successfully co-exist. © Emerald Group Publishing Limited.

Relevance:

10.00%

Abstract:

We present a new approach based on Discriminant Analysis to map a high-dimensional image feature space onto a subspace which has the following advantages: (1) each dimension corresponds to a semantic likelihood, (2) it admits an efficient and simple multiclass classifier, and (3) it is low dimensional. This mapping is learnt from a given set of labeled images with class ground truth. In the new space a classifier is naturally derived which performs as well as a linear SVM. We will show that projecting images into this new space provides a database browsing tool which is meaningful to the user. Results are presented on a remote sensing database with eight classes, made available online. The output semantic space is a low-dimensional feature space which opens perspectives for other recognition tasks. © 2005 IEEE.
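
A hedged sketch of the general idea using scikit-learn's LinearDiscriminantAnalysis (not necessarily the authors' exact formulation): project high-dimensional image features onto an at most (C-1)-dimensional discriminant subspace for C classes and classify directly in that space. The data below are synthetic stand-ins for real image features.

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)

    # Stand-in data: 800 images, 512-D features, 8 classes with separable means.
    num_classes, dim = 8, 512
    X = rng.normal(size=(800, dim))
    y = rng.integers(0, num_classes, size=800)
    X += 2.0 * rng.normal(size=(num_classes, dim))[y]

    lda = LinearDiscriminantAnalysis(n_components=num_classes - 1)
    Z = lda.fit_transform(X, y)   # 7-D discriminant ("semantic") subspace
    print(Z.shape)                # (800, 7)
    print(lda.score(X, y))        # accuracy of the simple classifier in that space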

Relevance:

10.00%

Abstract:

We propose a novel model for the spatio-temporal clustering of trajectories based on motion, which applies to challenging street-view video sequences of pedestrians captured by a mobile camera. A key contribution of our work is the introduction of novel probabilistic region trajectories, motivated by the non-repeatability of segmentation of frames in a video sequence. Hierarchical image segments are obtained using a state-of-the-art hierarchical segmentation algorithm and connected across adjacent frames in a directed acyclic graph. The region trajectories and measures of confidence are extracted from this graph using a dynamic programming-based optimisation. Our second main contribution is a Bayesian framework with a twofold goal: to learn the optimal, in a maximum likelihood sense, Random Forests classifier of motion patterns based on video features, and to construct a unique graph from region trajectories of different frames, lengths and hierarchical levels. Finally, we demonstrate the use of Isomap for effective spatio-temporal clustering of the region trajectories of pedestrians. We support our claims with experimental results on new and existing challenging video sequences. © 2011 IEEE.
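
The off-the-shelf components named above, a Random Forests motion classifier and an Isomap embedding for clustering, can be sketched with scikit-learn as below, assuming each region trajectory has already been reduced to a fixed-length motion descriptor; the descriptors, labels and parameter choices are placeholders rather than the paper's pipeline.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.manifold import Isomap
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)

    # Stand-in descriptors: one fixed-length motion feature vector per region trajectory.
    features = rng.normal(size=(300, 32))
    labels = rng.integers(0, 2, size=300)   # e.g. pedestrian vs background motion

    # Classify the motion pattern of each region trajectory.
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(features, labels)
    pedestrian_mask = clf.predict(features) == 1

    # Embed the pedestrian trajectories with Isomap and cluster them spatio-temporally.
    embedded = Isomap(n_neighbors=10, n_components=2).fit_transform(features[pedestrian_mask])
    clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(embedded)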

Relevance:

10.00%

Abstract:

This paper describes a structured SVM framework suitable for noise-robust medium/large vocabulary speech recognition. Several theoretical and practical extensions to previous work on small vocabulary tasks are detailed. The joint feature space based on word models is extended to allow context-dependent triphone models to be used. Interpreting the structured SVM as a large-margin log-linear model illustrates the implicit assumption that the prior of the discriminative parameters is a zero-mean Gaussian. However, depending on the definition of the likelihood feature space, a non-zero prior may be more appropriate. A general Gaussian prior is incorporated into the large-margin training criterion in a form that allows the cutting plane algorithm to be directly applied. To further speed up the training process, the 1-slack algorithm, caching of competing hypotheses and parallelization strategies are also proposed. The performance of structured SVMs is evaluated on a noise-corrupted medium vocabulary speech recognition task: AURORA 4. © 2011 IEEE.
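
One common way to write a large-margin criterion with a general Gaussian prior N(mu, Sigma) over the discriminative parameters alpha is sketched below; the notation is assumed and may differ from the paper's, but setting mu = 0 and Sigma = I recovers the usual structured SVM objective.

    \min_{\alpha}\;
    \frac{1}{2}(\alpha-\mu)^{\mathsf T}\Sigma^{-1}(\alpha-\mu)
    + C \sum_{u} \Big[ \max_{w \neq w_u^{\mathrm{ref}}}
    \big\{ \mathcal{L}(w, w_u^{\mathrm{ref}})
    - \alpha^{\mathsf T}\big(\phi(O_u, w_u^{\mathrm{ref}}) - \phi(O_u, w)\big) \big\} \Big]_{+}

Here [.]_+ denotes the hinge function, \phi(O, w) the joint feature space and \mathcal{L} the loss between hypotheses.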

Relevance:

10.00%

Abstract:

An increasingly common scenario in building speech synthesis and recognition systems is training on inhomogeneous data. This paper proposes a new framework for estimating hidden Markov models on data containing both multiple speakers and multiple languages. The proposed framework, speaker and language factorization, attempts to factorize speaker-/language-specific characteristics in the data and then model them using separate transforms. Language-specific factors in the data are represented by transforms based on cluster mean interpolation with cluster-dependent decision trees. Acoustic variations caused by speaker characteristics are handled by transforms based on constrained maximum-likelihood linear regression. Experimental results on statistical parametric speech synthesis show that the proposed framework enables data from multiple speakers in different languages to be used to: train a synthesis system; synthesize speech in a language using speaker characteristics estimated in a different language; and adapt to a new language. © 2012 IEEE.
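
A compact way to write the factorization described above, using assumed symbols rather than the paper's notation: language-specific interpolation weights combine cluster means, while a speaker-specific constrained MLLR transform acts on the observations.

    \mu_{m}^{(l)} = \sum_{c=1}^{C} \lambda_{c}^{(l)}\, \mu_{m,c},
    \qquad
    p\big(o_t \mid m, s, l\big) = \big|A^{(s)}\big|\,
    \mathcal{N}\!\big(A^{(s)} o_t + b^{(s)};\; \mu_{m}^{(l)},\, \Sigma_{m}\big)

Here \lambda^{(l)} are language-dependent cluster weights and (A^{(s)}, b^{(s)}) is the speaker-dependent constrained MLLR transform, so the language and speaker factors can be estimated and re-combined independently.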

Relevance:

10.00%

Abstract:

Do hospitals experience safety tipping points as utilization increases, and if so, what are the implications for hospital operations management? We argue that safety tipping points occur when managerial escalation policies are exhausted and workload variability buffers are depleted. Front-line clinical staff is forced to ration resources and, at the same time, becomes more error prone as a result of elevated stress hormone levels. We confirm the existence of safety tipping points for in-hospital mortality using the discharge records of 82,280 patients across six high-mortality-risk conditions from 256 clinical departments of 83 German hospitals. Focusing on survival during the first seven days following admission, we estimate a mortality tipping point at an occupancy level of 92.5%. Among the 17% of patients in our sample who experienced occupancy above the tipping point during the first seven days of their hospital stay, high occupancy accounted for one in seven deaths. The existence of a safety tipping point has important implications for hospital management. First, flexible capacity expansion is more cost-effective for safety improvement than rigid capacity, because it will only be used when occupancy reaches the tipping point. In the context of our sample, flexible staffing saves more than 40% of the cost of a fully staffed capacity expansion, while achieving the same reduction in mortality. Second, reducing the variability of demand by pooling capacity in hospital clusters can greatly increase safety in a hospital system, because it reduces the likelihood that a patient will experience occupancy levels beyond the tipping point. Pooling the capacity of nearby hospitals in our sample reduces the number of deaths due to high occupancy by 34%.

Relevance:

10.00%

Abstract:

In standard Gaussian Process regression, input locations are assumed to be noise-free. We present a simple yet effective GP model for training on input points corrupted by i.i.d. Gaussian noise. To make computations tractable we use a local linear expansion about each input point. This allows the input noise to be recast as output noise proportional to the squared gradient of the GP posterior mean. The input noise variances are inferred from the data as extra hyperparameters. They are trained alongside other hyperparameters by the usual method of maximisation of the marginal likelihood. Training uses an iterative scheme, which alternates between optimising the hyperparameters and calculating the posterior gradient. Analytic predictive moments can then be found for Gaussian distributed test points. We compare our model to others over a range of different regression problems and show that it improves over current methods.
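
A minimal one-dimensional sketch of the core idea, with fixed kernel hyperparameters and a simple fixed-point loop instead of full marginal-likelihood optimisation: fit a standard GP, measure the slope of its posterior mean at the training inputs, inflate the output noise by slope-squared times the input-noise variance, and refit. Variable names and values are illustrative assumptions.

    import numpy as np

    def rbf(a, b, lengthscale=1.0, signal_var=1.0):
        """Squared-exponential kernel matrix between 1-D input vectors a and b."""
        d = a[:, None] - b[None, :]
        return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)

    def fit_gp_with_input_noise(x, y, out_var=0.01, in_var=0.05, lengthscale=1.0, iterations=3):
        """Iteratively refit a GP, replacing the output noise with
        out_var + (posterior-mean slope)^2 * in_var at each training point."""
        noise = np.full(len(x), out_var)
        K = rbf(x, x, lengthscale=lengthscale)
        for _ in range(iterations):
            alpha = np.linalg.solve(K + np.diag(noise), y)
            # Slope of the GP posterior mean at each training input (RBF kernel gradient).
            dK = -(x[:, None] - x[None, :]) / lengthscale**2 * K
            slope = dK @ alpha
            noise = out_var + slope**2 * in_var
        return alpha, noise

    # Toy usage: noisy inputs to a sine function.
    rng = np.random.default_rng(0)
    x_true = np.linspace(0.0, 6.0, 60)
    x_obs = x_true + rng.normal(scale=np.sqrt(0.05), size=60)
    y_obs = np.sin(x_true) + rng.normal(scale=0.1, size=60)
    alpha, noise = fit_gp_with_input_noise(x_obs, y_obs)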

Relevance:

10.00%

Abstract:

Speech recognition systems typically contain many Gaussian distributions, and hence a large number of parameters. This makes them both slow to decode speech and large to store. Techniques have been proposed to decrease the number of parameters. One approach is to share parameters between multiple Gaussians, thus reducing the total number of parameters and allowing for shared likelihood calculation. Gaussian tying and subspace clustering are two related techniques which take this approach to system compression. These techniques can decrease the number of parameters with no noticeable drop in performance for single systems. However, multiple acoustic models are often used in real speech recognition systems. This paper considers the application of Gaussian tying and subspace compression to multiple systems. Results show that two speech recognition systems can be modelled using the same number of Gaussians as just one system, with little effect on individual system performance. Copyright © 2009 ISCA.
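
A toy illustration of the tying idea (not the specific algorithm evaluated in the paper): pool the Gaussians from two systems, cluster their parameter vectors into a single shared codebook of roughly one system's size, and let every original Gaussian index into that codebook. The sizes, the stacked mean/log-variance representation and the use of k-means are assumptions made for the sketch.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)

    # Stand-in Gaussian parameters from two acoustic models: stacked [mean, log-variance].
    dim, n_a, n_b = 39, 500, 500
    gauss_a = rng.normal(size=(n_a, 2 * dim))
    gauss_b = rng.normal(size=(n_b, 2 * dim))
    pooled = np.vstack([gauss_a, gauss_b])

    # One shared codebook roughly the size of a single system.
    codebook_size = 500
    km = KMeans(n_clusters=codebook_size, n_init=1, random_state=0).fit(pooled)

    # Each original Gaussian in either system is replaced by its tied codebook entry.
    tied_index_a = km.labels_[:n_a]
    tied_index_b = km.labels_[n_a:]
    shared_codebook = km.cluster_centers_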

Relevance:

10.00%

Abstract:

Chapter 14, "Understandable by Design: How Can Products be Designed to Align with User Experience?", by A. Mieczakowski, PM Langdon, RH Bracewell, JJ Patmore and PJ Clarkson. 14.1 Introduction: Understanding users increases the likelihood that ...

Relevance:

10.00%

Abstract:

We present a multispectral photometric stereo method for capturing the geometry of deforming surfaces. A novel photometric calibration technique allows calibration of scenes containing multiple piecewise-constant chromaticities. This method estimates per-pixel photometric properties, then uses a RANSAC-based approach to estimate the dominant chromaticities in the scene. A likelihood term is developed linking surface normal, image intensity and photometric properties, which allows the estimation of the number of chromaticities present in a scene to be framed as a model estimation problem. The Bayesian Information Criterion is applied to automatically estimate the number of chromaticities present during calibration. A two-camera stereo system provides low-resolution geometry, allowing the likelihood term to be used in segmenting new images into regions of constant chromaticity. This segmentation is carried out in a Markov Random Field framework and allows the correct photometric properties to be used at each pixel to estimate a dense normal map. Results are shown on several challenging real-world sequences, demonstrating state-of-the-art results using only two cameras and three light sources. Quantitative evaluation is provided against synthetic ground truth data. © 2011 IEEE.
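
The BIC-based choice of the number of chromaticities can be illustrated in spirit (not with the paper's specific likelihood term) using scikit-learn's GaussianMixture: fit mixtures with increasing numbers of components to per-pixel chromaticity estimates and keep the lowest-BIC model. The synthetic chromaticity samples below are stand-ins for calibration data.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)

    # Stand-in per-pixel chromaticity estimates: three piecewise-constant chromaticities
    # plus noise (2-D, e.g. normalised r and g channels).
    true_centres = np.array([[0.6, 0.3], [0.3, 0.5], [0.45, 0.45]])
    samples = np.vstack([c + 0.01 * rng.normal(size=(500, 2)) for c in true_centres])

    # Choose the number of chromaticities by minimum BIC.
    best_k, best_bic = None, np.inf
    for k in range(1, 7):
        gmm = GaussianMixture(n_components=k, random_state=0).fit(samples)
        bic = gmm.bic(samples)
        if bic < best_bic:
            best_k, best_bic = k, bic
    print(best_k)   # expected: 3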