2 resultados para Automatic speech recognition

em Bucknell University Digital Commons - Pensilvania - USA


Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present a new approach for corpus-based speech enhancement that significantly improves over a method published by Xiao and Nickel in 2010. Corpus-based enhancement systems do not merely filter an incoming noisy signal, but resynthesize its speech content via an inventory of pre-recorded clean signals. The goal of the procedure is to perceptually improve the sound of speech signals in background noise. The proposed new method modifies Xiao's method in four significant ways. Firstly, it employs a Gaussian mixture model (GMM) instead of a vector quantizer in the phoneme recognition front-end. Secondly, the state decoding of the recognition stage is supported with an uncertainty modeling technique. With the GMM and the uncertainty modeling it is possible to eliminate the need for noise dependent system training. Thirdly, the post-processing of the original method via sinusoidal modeling is replaced with a powerful cepstral smoothing operation. And lastly, due to the improvements of these modifications, it is possible to extend the operational bandwidth of the procedure from 4 kHz to 8 kHz. The performance of the proposed method was evaluated across different noise types and different signal-to-noise ratios. The new method was able to significantly outperform traditional methods, including the one by Xiao and Nickel, in terms of PESQ scores and other objective quality measures. Results of subjective CMOS tests over a smaller set of test samples support our claims.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

WE INVESTIGATED HOW WELL STRUCTURAL FEATURES such as note density or the relative number of changes in the melodic contour could predict success in implicit and explicit memory for unfamiliar melodies. We also analyzed which features are more likely to elicit increasingly confident judgments of "old" in a recognition memory task. An automated analysis program computed structural aspects of melodies, both independent of any context, and also with reference to the other melodies in the testset and the parent corpus of pop music. A few features predicted success in both memory tasks, which points to a shared memory component. However, motivic complexity compared to a large corpus of pop music had different effects on explicit and implicit memory. We also found that just a few features are associated with different rates of "old" judgments, whether the items were old or new. Rarer motives relative to the testset predicted hits and rarer motives relative to the corpus predicted false alarms. This data-driven analysis provides further support for both shared and separable mechanisms in implicit and explicit memory retrieval, as well as the role of distinctiveness in true and false judgments of familiarity.