158 results for speech signals
Abstract:
This paper studies single-channel speech separation, assuming unknown, arbitrary temporal dynamics for the speech signals to be separated. A data-driven approach is described, which matches each mixed speech segment against a composite training segment to separate the underlying clean speech segments. To improve separation accuracy, the new approach seeks and separates the longest mixed speech segments with matching composite training segments. Lengthening the mixed speech segments being matched reduces the uncertainty of the constituent training segments, and hence the separation error. For convenience, we call the new approach Composition of Longest Segments, or CLOSE. The CLOSE method includes a data-driven approach to model long-range temporal dynamics of speech signals, and a statistical approach to identify the longest mixed speech segments with matching composite training segments. Experiments are conducted on the Wall Street Journal database, for separating mixtures of two simultaneous large-vocabulary speech utterances spoken by two different speakers. The results are evaluated using various objective and subjective measures, including the challenge of large-vocabulary continuous speech recognition. It is shown that the new separation approach leads to significant improvement in all these measures.
Abstract:
A modified comb filtering technique is proposed which can be used to reduce framing noise generated when speech signals are transform-coded or vector-quantized. Application of this filter to 9.6 kbit/s speech in a vector transform coder has been found to improve the perceptual quality of the coded speech.
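For orientation, here is a minimal sketch of a plain feedforward comb filter, y[n] = x[n] + gain * x[n - delay]. It is not the modified filter the abstract proposes, whose details are not given here. The function name `comb_filter` and its parameters are assumptions made for illustration only.

```python
def comb_filter(x, delay, gain):
    """Plain feedforward comb filter (illustrative; not the paper's modified version).

    Computes y[n] = x[n] + gain * x[n - delay], treating x[n] as 0 for n < 0.
    """
    if delay <= 0:
        raise ValueError("delay must be a positive number of samples")
    y = []
    for n in range(len(x)):
        # Samples before the start of the signal contribute nothing.
        delayed = x[n - delay] if n >= delay else 0.0
        y.append(x[n] + gain * delayed)
    return y


# A component that repeats every `delay` samples is cancelled when gain = -1.
periodic = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
cancelled = comb_filter(periodic, delay=2, gain=-1.0)
```

With `gain = -1` the filter places notches at multiples of the sampling rate divided by `delay`, which is why any component with that period vanishes after the first `delay` samples.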
Abstract:
In this letter, a standard postnonlinear blind source separation algorithm is proposed, based on the MISEP method, which is widely used in linear and nonlinear independent component analysis. To best suit a wide class of postnonlinear mixtures, we adapt the MISEP method to incorporate a priori information about the mixtures. In particular, a group of three-layered perceptrons and a linear network are used as the unmixing system to separate sources in the postnonlinear mixtures, and another group of three-layered perceptrons is used as the auxiliary network. The learning algorithm for the unmixing system is then obtained by maximizing the output entropy of the auxiliary network. The proposed method is applied to postnonlinear blind source separation of both simulated signals and real speech signals, and the experimental results demonstrate its effectiveness and efficiency in comparison with existing methods.
Abstract:
The use of bit-level systolic arrays in the design of a vector quantized transformed subband coding system for speech signals is described. It is shown how the major components of this system can be decomposed into a small number of highly regular building blocks that interface directly to one another. These include circuits for the computation of the discrete cosine transform, the inverse discrete cosine transform, and vector quantization codebook search.
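The three building blocks named in this abstract can be sketched in software as follows. This is a plain floating-point illustration, not the bit-level systolic hardware design the paper describes. The function names `dct_ii`, `idct_ii` and `nearest_codeword`, the unnormalized DCT scaling, and the squared-error distance are all assumptions chosen for the example.

```python
import math


def dct_ii(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi * (n + 0.5) * k / N)."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def idct_ii(c):
    """Inverse of dct_ii above (a DCT-III scaled by 2/N)."""
    n = len(c)
    return [
        (2.0 / n) * (
            c[0] / 2.0
            + sum(c[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(1, n))
        )
        for i in range(n)
    ]


def nearest_codeword(vector, codebook):
    """Full-search VQ: index of the codeword with the smallest squared error."""
    best_index, best_dist = 0, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


# Transform a short frame, quantize it against a toy codebook, and invert.
frame = [1.0, 2.0, 3.0, 4.0]
coeffs = dct_ii(frame)
toy_codebook = [[0.0] * 4, coeffs, [5.0] * 4]  # hypothetical codebook
chosen = nearest_codeword(coeffs, toy_codebook)
reconstructed = idct_ii(toy_codebook[chosen])
```

The full search compares the input against every codeword. That exhaustive, identical per-codeword work is the kind of regular computation that maps naturally onto repeated hardware cells.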
Abstract:
The subjective performance of the G.722 7-kHz wideband speech-coding recommendation using music signals is described. A number of audible distortions specific to music signals were found to be present in real-time evaluations of the coder. As a result, three modifications are proposed which are found to improve the performance for music signals. These modifications are compatible with the G.722 system configuration. The results obtained clearly demonstrate the very high coding efficiency of subband ADPCM (adaptive differential pulse-code modulation) in comparison with digital companding and ADM schemes when applied to music signals.
Abstract:
Taking as a point of departure recent scholarly interest in the geographies of spoken communication, this paper situates the cultivation of a scientific voice in a range of nineteenth-century contexts and locations. An examination of two of the century’s most celebrated science lecturers, Michael Faraday and Thomas Henry Huxley, offers a basis for more general claims about historical relations between science, speech and space. The paper begins with a survey of the ‘ecologies’ of public speaking in which advocates of science sought to carve out an effective niche. It then turns to a reconstruction of the varying and variously interpreted assumptions about authoritative and authentic speech that shaped how the platform performances of Faraday and Huxley were constructed, contested and remediated in print. Particular attention is paid to sometimes clashing ideals of vocal performance and paralinguistic communication. This signals an interest in the performative dimensions of science lectures rather more than their specific cognitive content. In exploring these concerns, the paper argues that ‘finding a scientific voice’ was a fundamentally geographical enterprise driven by attempts to make science resonate with a wider oratorical culture without losing distinctive appeal and special authority.
Abstract:
Research on speech and emotion is moving from a period of exploratory research into one where there is a prospect of substantial applications, notably in human-computer interaction. Progress in the area relies heavily on the development of appropriate databases. This paper addresses the issues that need to be considered in developing databases of emotional speech, and shows how the challenge of developing appropriate databases is being addressed in three major recent projects - the Belfast project, the Reading-Leeds project and the CREST-ESP project. From these and other studies the paper draws together the tools and methods that have been developed, addresses the problems that arise and indicates the future directions for the development of emotional speech databases.