Biblioteca Digital

We present a new approach to spoken language modeling for language identification (LID) using the Lempel-Ziv-Welch (LZW) algorithm. The LZW technique is applicable to any kind of tokenization of the speech signal. Because of the efficiency of LZW algorithm to obtain variable length symbol strings in the training data, the LZW codebook captures the essentials of a language effectively. We develop two new deterministic measures for LID based on the LZW algorithm namely: (i) Compression ratio score (LZW-CR) and (ii) weighted discriminant score (LZW-WDS). To assess these measures, we consider error-free tokenization of speech as well as artificially induced noise in the tokenization. It is shown that for a 6 language LID task of OGI-TS database with clean tokenization, the new model (LZW-WDS) performs slightly better than the conventional bigram model. For noisy tokenization, which is the more realistic case, LZW-WDS significantly outperforms the bigram technique

Veja mais

Technological measures for noise control

Relevância:

20.00% 20.00%

Publicador:

Veja mais

Design Creativity - Definition, Influences, Measures and Methods

Relevância:

20.00% 20.00%

Publicador:

Veja mais

Combination of similarity measures for time series classification using genetic algorithms

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Time series classification deals with the problem of classification of data that is multivariate in nature. This means that one or more of the attributes is in the form of a sequence. The notion of similarity or distance, used in time series data, is significant and affects the accuracy, time, and space complexity of the classification algorithm. There exist numerous similarity measures for time series data, but each of them has its own disadvantages. Instead of relying upon a single similarity measure, our aim is to find the near optimal solution to the classification problem by combining different similarity measures. In this work, we use genetic algorithms to combine the similarity measures so as to get the best performance. The weightage given to different similarity measures evolves over a number of generations so as to get the best combination. We test our approach on a number of benchmark time series datasets and present promising results.

Veja mais

Estimation of voice-onset time in continuous speech using temporal measures

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper proposes an automatic acoustic-phonetic method for estimating voice-onset time of stops. This method requires neither transcription of the utterance nor training of a classifier. It makes use of the plosion index for the automatic detection of burst onsets of stops. Having detected the burst onset, the onset of the voicing following the burst is detected using the epochal information and a temporal measure named the maximum weighted inner product. For validation, several experiments are carried out on the entire TIMIT database and two of the CMU Arctic corpora. The performance of the proposed method compares well with three state-of-the-art techniques. (C) 2014 Acoustical Society of America

Veja mais

Holistic Measures for Evaluating Prediction Models in Smart Grids

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The performance of prediction models is often based on ``abstract metrics'' that estimate the model's ability to limit residual errors between the observed and predicted values. However, meaningful evaluation and selection of prediction models for end-user domains requires holistic and application-sensitive performance measures. Inspired by energy consumption prediction models used in the emerging ``big data'' domain of Smart Power Grids, we propose a suite of performance measures to rationally compare models along the dimensions of scale independence, reliability, volatility and cost. We include both application independent and dependent measures, the latter parameterized to allow customization by domain experts to fit their scenario. While our measures are generalizable to other domains, we offer an empirical analysis using real energy use data for three Smart Grid applications: planning, customer education and demand response, which are relevant for energy sustainability. Our results underscore the value of the proposed measures to offer a deeper insight into models' behavior and their impact on real applications, which benefit both data mining researchers and practitioners.

Veja mais

Global response sensitivity analysis using probability distance measures and generalization of Sobol's analysis

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The study introduces two new alternatives for global response sensitivity analysis based on the application of the L-2-norm and Hellinger's metric for measuring distance between two probabilistic models. Both the procedures are shown to be capable of treating dependent non-Gaussian random variable models for the input variables. The sensitivity indices obtained based on the L2-norm involve second order moments of the response, and, when applied for the case of independent and identically distributed sequence of input random variables, it is shown to be related to the classical Sobol's response sensitivity indices. The analysis based on Hellinger's metric addresses variability across entire range or segments of the response probability density function. The measure is shown to be conceptually a more satisfying alternative to the Kullback-Leibler divergence based analysis which has been reported in the existing literature. Other issues addressed in the study cover Monte Carlo simulation based methods for computing the sensitivity indices and sensitivity analysis with respect to grouped variables. Illustrative examples consist of studies on global sensitivity analysis of natural frequencies of a random multi-degree of freedom system, response of a nonlinear frame, and safety margin associated with a nonlinear performance function. (C) 2015 Elsevier Ltd. All rights reserved.

Veja mais