Digital Library

115 results for Decoding Speech Prosody

A Corpus-Based Approach to Speech Enhancement from Nonstationary Noise

Relevance: 20.00%

Abstract:

Temporal dynamics and speaker characteristics are two important features of speech that distinguish speech from noise. In this paper, we propose a method to maximally extract these two features of speech for speech enhancement. We demonstrate that this can reduce the requirement for prior information about the noise, which can be difficult to estimate for fast-varying noise. Given noisy speech, the new approach estimates clean speech by recognizing long segments of the clean speech as whole units. In the recognition, clean speech sentences, taken from a speech corpus, are used as examples. Matching segments are identified between the noisy sentence and the corpus sentences. The estimate is formed by using the longest matching segments found in the corpus sentences. Longer speech segments as whole units contain more distinct dynamics and richer speaker characteristics, and can be identified more accurately from noise than shorter speech segments. Therefore, estimation based on the longest recognized segments increases the noise immunity and hence the estimation accuracy. The new approach consists of a statistical model to represent up to sentence-long temporal dynamics in the corpus speech, and an algorithm to identify the longest matching segments between the noisy sentence and the corpus sentences. The algorithm is made more robust to noise uncertainty by introducing missing-feature based noise compensation into the corpus sentences. Experiments have been conducted on the TIMIT database for speech enhancement from various types of nonstationary noise including song, music, and crosstalk speech. The new approach has shown improved performance over conventional enhancement algorithms in both objective and subjective evaluations.
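The longest-matching-segment idea described in this abstract can be illustrated with a toy sketch. It is not the authors' statistical model: frames are reduced to plain numbers, a "match" is an assumed absolute-difference threshold `tol`, and the function names `longest_matching_segment` and `best_corpus_match` are hypothetical.

```python
def longest_matching_segment(noisy, corpus_sentence, tol=0.5):
    """Return (length, noisy_start, corpus_start) of the longest run of
    consecutive frames that agree within tol.

    Toy stand-in for model-based matching; tol is an assumed threshold.
    """
    best = (0, 0, 0)
    # prev[j] = length of the matching run ending at noisy[i-2], corpus[j-1]
    prev = [0] * (len(corpus_sentence) + 1)
    for i in range(1, len(noisy) + 1):
        curr = [0] * (len(corpus_sentence) + 1)
        for j in range(1, len(corpus_sentence) + 1):
            if abs(noisy[i - 1] - corpus_sentence[j - 1]) <= tol:
                # Extend the run that ended one frame earlier in both sequences.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best[0]:
                    best = (curr[j], i - curr[j], j - curr[j])
        prev = curr
    return best


def best_corpus_match(noisy, corpus, tol=0.5):
    """Pick the corpus sentence that holds the longest matching segment.

    Returns (sentence_index, length, noisy_start, corpus_start).
    """
    results = [(longest_matching_segment(noisy, s, tol), k)
               for k, s in enumerate(corpus)]
    (length, n_start, c_start), k = max(results, key=lambda r: r[0][0])
    return k, length, n_start, c_start
```

In this sketch a longer run is preferred over several short ones, which mirrors the abstract's argument that longer segments are identified more reliably in noise.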

Combining missing-feature theory, speech enhancement, and speaker-dependent/-independent modeling for speech separation

Relevance: 20.00%

Abstract:

This paper considers the separation and recognition of overlapped speech sentences assuming single-channel observation. A system based on a combination of several different techniques is proposed. The system uses a missing-feature approach for improving crosstalk/noise robustness, a Wiener filter for speech enhancement, hidden Markov models for speech reconstruction, and speaker-dependent/-independent modeling for speaker and speech recognition. We develop the system on the Speech Separation Challenge database, involving a task of separating and recognizing two mixed sentences without assuming advance knowledge of the speakers' identities or of the signal-to-noise ratio. The paper is an extended version of a previous conference paper submitted for the challenge.

Informational masking in young and elderly listeners for speech masked by simultaneous speech and noise

Relevance: 20.00%

Abstract:

Three experiments measured the effects of age on informational masking of speech by competing speech. The experiments were designed to minimize the energetic contributions of the competing speech so that informational masking could be measured with no large corrections for energetic masking. Experiment 1 used a "speech-in-speech-in-noise" design, in which the competing speech was presented in noise at a signal-to-noise ratio (SNR) of -4 dB. This ensured that the noise primarily contributed the energetic masking but the competing speech contributed the informational masking. Equal amounts of informational masking (3 dB) were observed for young and elderly listeners, although less was found for hearing-impaired listeners. Experiment 2 tested a range of SNRs in this design and showed that informational masking increased with SNR up to about an SNR of -4 dB, but decreased thereafter. Experiment 3 further reduced the energetic contribution of the competing speech by filtering it into different frequency bands from the target speech. The elderly listeners again showed approximately the same amount of informational masking (4-5 dB), although some elderly listeners had particular difficulty understanding these stimuli in any condition. On the whole, these results suggest that young and elderly listeners were equally susceptible to informational masking. © 2009 Acoustical Society of America.
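For readers unfamiliar with the SNR figures quoted in this abstract, the following minimal sketch shows how a noise signal can be scaled to reach a target SNR in dB, using the usual definition SNR = 10·log10(P_signal / P_noise). The function names and the sample values are assumptions for illustration and are not taken from the study.

```python
import math


def mean_power(signal):
    """Average squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def scale_noise_for_snr(speech, noise, target_db):
    """Scale noise so that 10*log10(P_speech / P_noise) equals target_db."""
    target_noise_power = mean_power(speech) / (10 ** (target_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [gain * x for x in noise]


def measure_snr_db(speech, noise):
    """SNR in dB between two sample sequences."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))
```

At -4 dB the noise carries roughly 2.5 times the power of the signal (10^0.4 ≈ 2.51), which is why the noise dominates the energetic masking in such a design.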

EEG decoding of semantic category reveals distributed representations for single concepts

Relevance: 20.00%

Abstract:

Achieving a clearer picture of categorial distinctions in the brain is essential for our understanding of the conceptual lexicon, but much more fine-grained investigations are required in order for this evidence to contribute to lexical research. Here we present a collection of advanced data-mining techniques that allows the category of individual concepts to be decoded from single trials of EEG data. Neural activity was recorded while participants silently named images of mammals and tools, and category could be detected in single trials with an accuracy well above chance, both when considering data from single participants, and when group-training across participants. By aggregating across all trials, single concepts could be correctly assigned to their category with an accuracy of 98%. The pattern of classifications made by the algorithm confirmed that the neural patterns identified are due to conceptual category, and not any of a series of processing-related confounds. The time intervals, frequency bands and scalp locations that proved most informative for prediction permit physiological interpretation: the widespread activation shortly after appearance of the stimulus (from 100 ms) is consistent both with accounts of multi-pass processing, and distributed representations of categories. These methods provide an alternative to fMRI for fine-grained, large-scale investigations of the conceptual lexicon. © 2010 Elsevier Inc.

Selecting Corpus-Semantic Models for Neurolinguistic Decoding

Relevance: 20.00%

Parallels between machine and brain decoding

Relevance: 20.00%

Abstract:

We report some existing work, inspired by analogies between human thought and machine computation, showing that the informational state of a digital computer can be decoded in a similar way to brain decoding. We then discuss some proposed work that would leverage this analogy to shed light on the amount of information that may be missed by the technical limitations of current neuroimaging technologies. © 2012 Springer-Verlag.

Decoding semantics across fMRI sessions with different stimulus modalities: a practical MVPA study

Relevance: 20.00%

Abstract:

Both embodied and symbolic accounts of conceptual organization would predict partial sharing and partial differentiation between the neural activations seen for concepts activated via different stimulus modalities. But cross-participant and cross-session variability in BOLD activity patterns makes analyses of such patterns with MVPA methods challenging. Here, we examine the effect of cross-modal and individual variation on the machine learning analysis of fMRI data recorded during a word property generation task. We present the same set of living and non-living concepts (land mammals or work tools) to a cohort of Japanese participants in two sessions: the first using auditory presentation of spoken words; the second using visual presentation of words written in Japanese characters. Classification accuracies confirmed that these semantic categories could be detected in single trials, with within-session predictive accuracies of 80-90%. However, cross-session prediction (learning from auditory-task data to classify data from the written-word task, or vice versa) suffered from a performance penalty, achieving 65-75% (still individually significant at p ≪ 0.05). We carried out several follow-on analyses to investigate the reason for this shortfall, concluding that distributional differences in neither time nor space alone could account for it. Rather, combined spatio-temporal patterns of activity need to be identified for successful cross-session learning, and this suggests that feature selection strategies could be modified to take advantage of this.

Decoding Word Semantics from Magnetoencephalography Time Series Transformations

Relevance: 20.00%

An analysis of the masking of speech by competing speech using self-report data (L)

Relevance: 20.00%

Abstract:

Many of the items in the “Speech, Spatial, and Qualities of Hearing” scale questionnaire [S. Gatehouse and W. Noble, Int. J. Audiol. 43, 85–99 (2004)] are concerned with speech understanding in a variety of backgrounds, both speech and nonspeech. To study if this self-report data reflected informational masking, previously collected data on 414 people were analyzed. The lowest scores (greatest difficulties) were found for the two items in which there were two speech targets, with successively higher scores for competing speech (six items), energetic masking (one item), and no masking (three items). The results suggest significant masking by competing speech in everyday listening situations.

Importance of temporal-envelope speech cues in different spectral areas

Relevance: 20.00%

Political Liberalism, Free Speech and Public Reason

Relevance: 20.00%

Abstract:

In this paper, I critically assess John Rawls' repeated claim that the duty of civility is only a moral duty and should not be enforced by law. In the first part of the paper, I examine and reject the view that Rawls' position may be due to the practical difficulties that the legal enforcement of the duty of civility might entail. I thus claim that Rawls' position must be driven by deeper normative reasons grounded in a conception of free speech. In the second part of the paper, I therefore examine various arguments for free speech and critically assess whether they are consistent with Rawls' political liberalism. I first focus on the arguments from truth and self-fulfilment. Both arguments, I argue, rely on comprehensive doctrines and therefore cannot provide a freestanding political justification for free speech. Freedom of speech, I claim, can be justified instead on the basis of Rawls' political conception of the person and of the two moral powers. However, Rawls' wide view of public reason already allows scope for the kind of free speech necessary for the exercise of the two moral powers and therefore cannot explain Rawls' opposition to the legal enforcement of the duty of civility. Such opposition, I claim, can only be explained on the basis of a defence of unconstrained freedom of speech grounded in the ideas of democracy and political legitimacy. Yet, I conclude, while public reason and the duty of civility are essential to political liberalism, unconstrained freedom of speech is not. Rawls and political liberals could therefore renounce unconstrained freedom of speech, and endorse the legal enforcement of the duty of civility, while remaining faithful to political liberalism.

Enabling Complexity-Performance Trade-Offs for Successive Cancellation Decoding of Polar Codes

Relevance: 20.00%

Abstract:

Polar codes are one of the most recent advancements in coding theory and they have attracted significant interest. While they are provably capacity achieving over various channels, they have seen limited practical applications. Unfortunately, the successive nature of successive cancellation based decoders hinders fine-grained adaptation of the decoding complexity to design constraints and operating conditions. In this paper, we propose a systematic method for enabling complexity-performance trade-offs by constructing polar codes based on an optimization problem which minimizes the complexity under a suitably defined mutual information based performance constraint. Moreover, a low-complexity greedy algorithm is proposed in order to solve the optimization problem efficiently for very large code lengths.
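As background to this abstract, the sketch below shows the conventional reliability-based construction of a polar code for a binary erasure channel (BEC), where the Bhattacharyya parameter recursion is exact and mutual information equals 1 - Z. This is not the optimization-based construction proposed in the paper; the function names, and the choice of the BEC as the design channel, are assumptions made for illustration.

```python
def bec_bhattacharyya(n_levels, erasure_prob):
    """Bhattacharyya parameters of the 2**n_levels synthetic channels
    obtained by polarizing a BEC with the given erasure probability."""
    z = [erasure_prob]
    for _ in range(n_levels):
        nxt = []
        for zi in z:
            nxt.append(2 * zi - zi * zi)  # degraded ("minus") channel
            nxt.append(zi * zi)           # upgraded ("plus") channel
        z = nxt
    return z


def information_set(n_levels, erasure_prob, k):
    """Indices (0-based) of the k most reliable synthetic channels,
    i.e. those with the smallest Bhattacharyya parameter."""
    z = bec_bhattacharyya(n_levels, erasure_prob)
    ranked = sorted(range(len(z)), key=lambda i: z[i])
    return sorted(ranked[:k])
```

For example, `information_set(3, 0.5, 4)` selects four of the eight positions to carry data, and the remaining positions are frozen. A construction that also accounts for decoding complexity, as the abstract proposes, would replace this pure reliability ranking with its own selection criterion.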

THE APPLICATION OF ARTIFICIAL NEURAL NETWORK TECHNIQUES TO LOW BIT-RATE SPEECH CODING

Relevance: 20.00%

IMPROVED POSTFILTERING TECHNIQUE FOR BLOCK QUANTIZATION OF SPEECH

Relevance: 20.00%

Insults, free speech and offensiveness

Relevance: 20.00%

Abstract:

This article examines what is wrong with some expressive acts, ‘insults’. Their putative wrongfulness is distinguished from the causing of indirect harms, aggregated harms, contextual harms, and damaging misrepresentations. The article clarifies what insults are, making use of work by Neu and Austin, and argues that their wrongfulness cannot lie in the hurt that is caused to those at whom such acts are directed. Rather it must lie in what they seek to do, namely to denigrate the other. The causing of offence is at most evidence that an insult has been communicated; it is not independent grounds of proscription or constraint. The victim of an insult may know that she has been insulted but not accept or agree with the insult, and thereby submit to the insulter. Hence insults need not, as Waldron argues they do, occasion dignitary harms. They do not of themselves subvert their victims' equal moral status. The claim that hateful speech endorses inequality should not be conflated with a claim that such speech directly subverts equality.

Thus, ‘wounding words’ should not unduly trouble the liberal defender of free speech either on the grounds of preventing offence or on those of avoiding dignitary harms.
