Digital Library

991 results for Raphael, 1483-1520

Continuous F0 in the source-excitation generation for HMM-based TTS: Do we need voiced/unvoiced classification?

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Most HMM-based TTS systems use a hard voiced/unvoiced classification to produce a discontinuous F0 signal, which is used for the generation of the source excitation. When a mixed source excitation is used, this decision can be based on two different sources of information: the state-specific MSD-prior of the F0 models, and/or the frame-specific features generated by the aperiodicity model. This paper examines the meaning of these variables in the synthesis process, their interaction, and how they affect the perceived quality of the generated speech. The results of several perceptual experiments show that when using mixed excitation, subjects consistently prefer samples with very few or no false unvoiced errors, whereas a reduction in the rate of false voiced errors does not produce any perceptual improvement. This suggests that rather than using any form of hard voiced/unvoiced classification, e.g., the MSD-prior, it is better for synthesis to use a continuous F0 signal and rely on the frame-level soft voiced/unvoiced decision of the aperiodicity model. © 2011 IEEE.

Investigation of acoustic units for LVCSR systems

Relevance:

10.00% 10.00%

Publisher:

Abstract:

One important issue in designing state-of-the-art LVCSR systems is the choice of acoustic units. Context-dependent (CD) phones remain the dominant form of acoustic unit. They can capture the co-articulatory effect in speech via explicit modelling. However, for other more complicated phonological processes, they rely on the implicit modelling ability of the underlying statistical models. Alternatively, it is possible to construct acoustic models based on higher-level linguistic units, for example syllables, to explicitly capture these complex patterns. When sufficient training data is available, this approach may show an advantage over implicit acoustic modelling. In this paper a wide range of acoustic units is investigated to improve LVCSR system performance. Significant error rate reductions of up to 7.1% relative (0.8% absolute) were obtained on a state-of-the-art Mandarin Chinese broadcast audio recognition task using word- and syllable-position-dependent triphone and quinphone models. © 2011 IEEE.

Rapid joint speaker and noise compensation for robust speech recognition

Relevance:

10.00% 10.00%

Publisher:

Abstract:

For speech recognition, mismatches between training and testing conditions for speaker and noise are normally handled separately. The work presented in this paper aims at jointly applying speaker adaptation and model-based noise compensation by embedding speaker adaptation as part of the noise mismatch function. The proposed method gives a faster and more nearly optimal adaptation than compensating for these two factors separately. It is also more consistent with respect to the basic assumptions of speaker and noise adaptation. Experimental results show significant and consistent gains from the proposed method. © 2011 IEEE.

Joint modelling of voicing label and continuous F0 for HMM based speech synthesis

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Fundamental frequency, or F0, is critical for high-quality HMM-based speech synthesis. Traditionally, F0 values are considered to depend on a binary voicing decision such that they are continuous in voiced regions and undefined in unvoiced regions. The multi-space distribution HMM (MSDHMM) has been used for modelling the discontinuous F0. Recently, a continuous F0 modelling framework has been proposed and shown to be effective, where continuous F0 observations are assumed to always exist and voicing labels are explicitly modelled by an independent stream. In this paper, a refined continuous F0 modelling approach is proposed. Here, F0 values are assumed to be dependent on voicing labels and both are jointly modelled in a single stream. Due to the enforced dependency, the new method can effectively reduce the voicing classification error. Subjective listening tests also demonstrate that the new approach can yield significant improvements in the naturalness of the synthesised speech. A dynamic random unvoiced F0 generation method is also investigated. Experiments show that it has a significant effect on the quality of synthesised speech. © 2011 IEEE.

Constrained discriminative mapping transforms for unsupervised speaker adaptation

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Discriminative mapping transforms (DMTs) are an approach for robustly adding discriminative training to unsupervised linear adaptation transforms. In unsupervised adaptation, DMTs are more robust to unreliable transcriptions than directly estimating adaptation transforms in a discriminative fashion. They were previously proposed for use with MLLR transforms, with the associated need to explicitly transform the model parameters. In this work the DMT is extended to CMLLR transforms. As these operate in the feature space, it is only necessary to apply a different linear transform at the front-end rather than modifying the model parameters. This is useful for rapidly changing speakers/environments. The performance of DMTs with CMLLR was evaluated on the WSJ 20k task. Experimental results show that DMTs based on constrained linear transforms yield 3% to 6% relative gain over MLE transforms in unsupervised speaker adaptation. © 2011 IEEE.
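
The point that a feature-space transform is applied at the front-end, leaving the model parameters untouched, can be illustrated with a minimal sketch. It assumes the usual affine form x' = A x + b for a constrained (feature-space) transform; the function name `apply_feature_transform` and the numeric values of `A` and `b` are hypothetical and chosen only for illustration. How the transform is estimated, whether by maximum likelihood or discriminatively, is not shown.

```python
import numpy as np

def apply_feature_transform(features, A, b):
    """Apply an affine feature-space transform x' = A x + b to every frame."""
    # features: (n_frames, dim), A: (dim, dim), b: (dim,)
    # The acoustic model is not modified; only the observations change.
    return features @ A.T + b

# Hypothetical 2-dimensional frames and transform (illustrative values only)
frames = np.array([[1.0, 2.0], [0.0, -1.0]])
A = np.array([[2.0, 0.0], [0.0, 0.5]])
b = np.array([1.0, -1.0])
adapted = apply_feature_transform(frames, A, b)
```

Switching to a different speaker or environment then amounts to swapping `A` and `b`, which is why this form suits rapidly changing conditions.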

Structured discriminative models for noise robust continuous speech recognition

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Recently there has been interest in structured discriminative models for speech recognition. In these models sentence posteriors are directly modelled, given a set of features extracted from the observation sequence, and hypothesised word sequence. In previous work these discriminative models have been combined with features derived from generative models for noise-robust speech recognition for continuous digits. This paper extends this work to medium to large vocabulary tasks. The form of the score-space extracted using the generative models, and parameter tying of the discriminative model, are both discussed. Update formulae for both conditional maximum likelihood and minimum Bayes' risk training are described. Experimental results are presented on small and medium to large vocabulary noise-corrupted speech recognition tasks: AURORA 2 and 4. © 2011 IEEE.

Decision tree-based context clustering based on cross validation and hierarchical priors

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The standard, ad-hoc stopping criteria used in decision tree-based context clustering are known to be sub-optimal and require parameters to be tuned. This paper proposes a new approach for decision tree-based context clustering based on cross validation and hierarchical priors. Combination of cross validation and hierarchical priors within decision tree-based context clustering offers better model selection and more robust parameter estimation than conventional approaches, with no tuning parameters. Experimental results on HMM-based speech synthesis show that the proposed approach achieved significant improvements in naturalness of synthesized speech over the conventional approaches. © 2011 IEEE.

Enhanced Poisson sum representation for alpha-stable processes

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In this paper we present Poisson sum series representations for α-stable (αS) random variables and α-stable processes, in particular concentrating on continuous-time autoregressive (CAR) models driven by α-stable Lévy processes. Our representations aim to provide a conditionally Gaussian framework, which will allow parameter estimation using Rao-Blackwellised versions of state-of-the-art Bayesian computational methods such as particle filters and Markov chain Monte Carlo (MCMC). To overcome the issues due to truncation of the series, novel residual approximations are developed. Simulations demonstrate the potential of these Poisson sum representations for inference in otherwise intractable α-stable models. © 2011 IEEE.

Structured precision modelling with Cholesky basis superposition for speech recognition

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Structured precision modelling is an important approach to improving the intra-frame correlation modelling of the standard HMM, where Gaussian mixture models with diagonal covariance matrices are used. Previous work has focused on direct structured representation of the precision matrices. In this paper, a new framework is proposed in which the structure of the Cholesky square root of the precision matrix is investigated, referred to as Cholesky Basis Superposition (CBS). Each Cholesky matrix associated with a particular Gaussian distribution is represented as a linear combination of a set of Gaussian-independent basis upper-triangular matrices. Efficient optimization methods are derived for both combination weights and basis matrices. Experiments on a Chinese dictation task showed that the proposed approach significantly outperformed both direct structured precision modelling with a similar number of parameters and full covariance modelling. © 2011 IEEE.
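
The idea of building each Gaussian's Cholesky factor from shared basis matrices can be sketched as follows. This is an illustration under stated assumptions rather than the paper's formulation: it assumes the convention P = CᵀC for an upper-triangular factor C, and the name `cbs_precision`, the random bases, and the random weights are all hypothetical. The optimization of weights and bases described in the abstract is not shown.

```python
import numpy as np

def cbs_precision(weights, bases):
    """Build a precision matrix from a weighted sum of upper-triangular bases."""
    # C = sum_k w_k * U_k, with each U_k upper-triangular and shared across Gaussians
    cholesky = np.tensordot(weights, bases, axes=1)
    # P = C^T C is symmetric positive semi-definite by construction
    return cholesky.T @ cholesky

rng = np.random.default_rng(0)
dim, n_bases = 4, 3
# Hypothetical shared bases: np.triu zeroes the lower triangle of each matrix
bases = np.triu(rng.standard_normal((n_bases, dim, dim)))
# Hypothetical per-Gaussian combination weights
weights = rng.standard_normal(n_bases)
precision = cbs_precision(weights, bases)
```

Parameterising the factor rather than the precision matrix itself means any choice of weights yields a symmetric positive semi-definite result, so no explicit constraint is needed during optimization.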

Exploring Indonesian aquaculture futures

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Aquaculture is the fastest-growing food production sector globally, with production projected to double within the next 15–20 years. Future growth of aquaculture is essential to providing sustainable supplies of fish in national, regional and global fish food systems; creating jobs; and maintaining fish at affordable levels for resource-poor consumers. To ensure that the anticipated growth of aquaculture remains both economically and ecologically sustainable, we need to better understand the likely patterns of growth, as well as the opportunities and challenges, that these trends present. This knowledge will enable us to better prioritize investments that will help ensure the sustainable development of the sector. In Indonesia, WorldFish and partners have applied a unique methodology to evaluate growth trajectories for aquaculture under various scenarios, as well as the opportunities and challenges these represent. Indonesia is currently the fourth largest aquaculture producer globally, and the sector needs to grow to meet future fish demand. The study overlapped economic and environmental models with quantitative and participatory approaches to understand the future of aquaculture in Indonesia. Such analyses, while not definitive, have provided new understanding of the future supply and demand for seafood in Indonesia stretching to 2030. The learning from this research provides a foundation for future interventions in Indonesian fish food systems, as well as a suite of methodologies that can be applied more widely for insightful analyses of aquaculture growth trajectories in other countries or regions.

Effects of KI encapsulation in single-walled carbon nanotubes by Raman and optical absorption spectroscopy.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The effect of KI encapsulation in narrow (HiPCO) single-walled carbon nanotubes is studied via Raman spectroscopy and optical absorption. The analysis of the data explores the interplay between strain and structural modifications, bond-length changes, charge transfer, and electronic density of states. KI encapsulation appears to be consistent with both charge transfer and strain that shrink both the C-C bonds and the overall nanotube along the axial direction. The charge transfer in larger semiconducting nanotubes is low and comparable with some cases of electrochemical doping, while optical transitions between pairs of singularities of the density of states are quenched for narrow metallic nanotubes. Stronger changes in the density of states occur in some energy ranges and are attributed to polarization van der Waals interactions caused by the ionic encapsulate. Unlike doping with other species, such as atoms and small molecules, encapsulation of inorganic compounds via the molten-phase route provides stable effects due to maximal occupation of the nanotube inner space.

A task and software independent CAE course

Relevance:

10.00% 10.00%

Publisher:

A study of two stochastic search methods for structural control

Relevance:

10.00% 10.00%

Publisher:

Statistical image modelling using interscale phase relationships of complex wavelet coefficients

Relevance:

10.00% 10.00%

Publisher:

Abstract:

A novel method for modelling the statistics of 2D photographic images useful in image restoration is defined. The new method is based on the Dual Tree Complex Wavelet Transform (DT-CWT) but a phase rotation is applied to the coefficients to create complex coefficients whose phase is shift-invariant at multiscale edge and ridge features. This is in addition to the magnitude shift invariance achieved by the DT-CWT. The increased correlation between coefficients adjacent in space and scale provides an improved mechanism for signal estimation. © 2006 IEEE.

Observation of infertility after placement of a copper contraceptive device in the vas deferens of rhesus monkeys (M. mulatta)

Relevance:

10.00% 10.00%

Publisher:

Abstract:

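To explore the prospects for clinical use of a copper-bead contraceptive device in humans, 63 healthy male and female monkeys with a history of fertility were selected and divided into an experimental group (n = 28; ratio = 1:3) and a control group (n = 25; ratio = 3–4) for a comparative study. In the experimental group, a copper-bead contraceptive device was placed inside the vas deferens of each male monkey, 4 to 6 cm from the epididymis. Starting 25 days after the operation, infertility trials were carried out at the ratios given above. Results: the infertility rate was 75.7% in the first year, and 57.1% and 56.4% in the second and third years respectively. However, the contraceptive effect in the experimental group varied greatly between individuals; four male monkeys (57.1%) remained infertile in the third year after placement. Many problems were identified during the trial and require further investigation.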
