906 results for Speech in Noise
Abstract:
This dissertation consists of four articles and an introduction. The five parts address the same topic, nonverbal predication in Erzya, from different perspectives. The work is at the same time linguistic typology and Uralic studies. The findings based on a large corpus of empirical Erzya data, which was collected using several different methods and included recordings of the spoken language, made it possible for the present study to apply, then test and finally discuss the previous theories based on cross-linguistic data. Erzya makes use of multiple predication patterns which vary from totally analytic to the morphologically very complex. Nonverbal predicate clause types are classified on the basis of propositional acts in clauses denoting class-membership, identity, property and location. The predicates of these clauses are nouns, adjectives and locational expressions, respectively. The following three predication strategies in Erzya nonverbal predication can be identified: i. the zero-copula construction, ii. the predicative suffix construction and iii. the copula construction. It has been suggested that verbs and nouns cannot be clearly distinguished on morphological grounds when functioning as predicates in Erzya. This study shows that even though predicativity must not be considered a sufficient tool for defining parts of speech in any language, the Erzya lexical classes of adjective, noun and verb can be distinguished from each other also in predicate position. The relative frequency and degree of obligation for using the predicative suffix construction decreases when moving left to right on the scale verb > adjective/locative > noun (> identificational statement). The predicative suffix is the main pattern in the present tense over the whole domain of nonverbal predication in Standard Erzya, but if it is replaced it is most likely to be with a zero-copula construction in a nominal predication.
This study exploits the theory of (a)symmetry for the first time in order to describe verbal vs. nonverbal predication. It is shown that the asymmetry of paradigms and constructions differentiates the lexical classes. Asymmetrical structures are motivated by functional level asymmetry. Variation in predication as such adds to the complexity of the grammar. When symmetric structures are employed, the functional complexity of grammar decreases, even though morphological complexity increases. The genre affects the employment of predication strategies in Erzya. There are differences in the relative frequency of the patterns, and some patterns are totally lacking from some of the data. The clearest difference is that the past tense predicative suffix construction occurs relatively frequently in Standard Erzya, while it occurs infrequently in the other data. Also, the predicative suffixes of the present tense are used more regularly in written Standard Erzya than in any other genre. The genre also affects the incidence of the translative in uľ(ń)ems copula constructions. In translations from Russian to Erzya the translative case is employed relatively frequently in comparison to other data. This study reveals differences between the two Mordvinic languages Erzya and Moksha. The predicative suffixes (bound person markers) of the present tense are used more regularly in Moksha in all kinds of nonverbal predicate clauses compared to Erzya. It should further be observed that identificational statements are encoded with a predicative suffix in Moksha, but seldom in Erzya. Erzya clauses are more frequently encoded using zero-constructions, displaying agreement in number only.
Abstract:
It has been shown that the conventional practice of designing a compensated hot-wire amplifier with a fixed ceiling-to-floor ratio results in a considerable and unnecessary increase in noise level at compensation settings other than the optimum (which is at the maximum compensation at the highest frequency of interest). The optimum ceiling-to-floor ratio has been estimated to be between 1.5 and 2.0 ωmaxM. Applying these considerations to an amplifier in which the ceiling-to-floor ratio is optimized at each compensation setting (for a given amplifier bandwidth) shows the usefulness of the method in improving the signal-to-noise ratio.
Abstract:
Traditional subspace based speech enhancement (SSE) methods use linear minimum mean square error (LMMSE) estimation, which is optimal if the Karhunen-Loeve transform (KLT) coefficients of speech and noise are Gaussian distributed. In this paper, we investigate the use of a Gaussian mixture (GM) density for modeling the non-Gaussian statistics of the clean speech KLT coefficients. Using a Gaussian mixture model (GMM), the optimum minimum mean square error (MMSE) estimator is found to be nonlinear, and the traditional LMMSE estimator is shown to be a special case. Experimental results show that the proposed method provides better enhancement performance than the traditional subspace based methods. Index Terms: Subspace based speech enhancement, Gaussian mixture density, MMSE estimation.
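The claim that the GMM-based MMSE estimator is nonlinear and contains the LMMSE estimator as a special case can be illustrated with a minimal scalar sketch. It assumes a single coefficient observed as y = s + n, with s drawn from a Gaussian mixture and n zero-mean Gaussian noise; the function names `gmm_mmse` and `lmmse` and all parameter values are hypothetical and are not taken from the paper, which works on KLT coefficient vectors.

```python
import numpy as np

def gmm_mmse(y, weights, means, variances, noise_var):
    """MMSE estimate of s from y = s + n, with s ~ Gaussian mixture
    and n ~ N(0, noise_var). Illustrative scalar model only."""
    y = np.atleast_1d(np.asarray(y, dtype=float))[:, None]   # shape (N, 1)
    w = np.asarray(weights, dtype=float)[None, :]            # shape (1, K)
    mu = np.asarray(means, dtype=float)[None, :]
    var = np.asarray(variances, dtype=float)[None, :]
    tot = var + noise_var          # variance of y under each mixture component
    # Log posterior probability of each component given y (up to a constant).
    log_r = np.log(w) - 0.5 * np.log(2 * np.pi * tot) - 0.5 * (y - mu) ** 2 / tot
    log_r -= log_r.max(axis=1, keepdims=True)                # numerical stability
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)
    # Each component contributes its own linear (Wiener-type) estimate.
    per_component = mu + (var / tot) * (y - mu)
    # The posterior-weighted sum is nonlinear in y because r depends on y.
    return (r * per_component).sum(axis=1)

def lmmse(y, signal_var, noise_var):
    """Linear MMSE estimate for a zero-mean Gaussian signal in Gaussian noise."""
    return signal_var / (signal_var + noise_var) * np.asarray(y, dtype=float)
```

With a single zero-mean component the posterior weight is always 1, so `gmm_mmse` collapses to the fixed gain used by `lmmse`; with two or more components the weights change with y and the estimate is no longer a scaled copy of the observation.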
Abstract:
Effective feature extraction for robust speech recognition is a widely addressed topic, and there is currently much effort to invoke non-stationary signal models instead of the quasi-stationary signal models that lead to standard features such as LPC or MFCC. Joint amplitude modulation and frequency modulation (AM-FM) is a classical non-parametric approach to non-stationary signal modeling, and new feature sets for automatic speech recognition (ASR) have recently been derived from a multi-band AM-FM representation of the signal. We consider several of these representations and compare their performance for robust speech recognition in noise, using the AURORA-2 database. We show that the proposed FEPSTRUM representation is more effective than the others. We also propose an improvement to FEPSTRUM based on the Teager energy operator (TEO) and show that it can selectively outperform even FEPSTRUM.
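For readers unfamiliar with the Teager energy operator mentioned above, the following is a minimal sketch of its standard discrete-time form, ψ[x(n)] = x(n)² − x(n−1)·x(n+1). It shows only the operator itself; how the authors combine it with FEPSTRUM is not described in the abstract, and the function name `teager_energy` is a hypothetical choice.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].
    Returns an array two samples shorter than x (the end points are dropped)."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]
```

For a pure sinusoid A·cos(Ωn + φ) the operator returns the constant A²·sin²(Ω), which is why it is often used to track amplitude and frequency jointly in AM-FM analysis.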
Abstract:
In this paper we propose a postprocessing technique for a spectrogram diffusion based harmonic/percussion decomposition algorithm. The proposed technique removes harmonic instrument leakages in the percussion enhanced outputs of the baseline algorithm. The technique uses median filtering and an adaptive detection of percussive segments in subbands, followed by piecewise signal reconstruction using envelope properties, to ensure that percussion is enhanced while harmonic leakages are suppressed. A new binary mask is created for the percussion signal which, when applied to the original signal, improves harmonic versus percussion separation. We compare our algorithm with two recent techniques and show that, on a database of polyphonic Indian music, the postprocessing algorithm improves the harmonic versus percussion decomposition significantly.
Abstract:
In this paper, we propose a new sub-band approach to estimate the glottal activity. The method is based on the spectral harmonicity and the sub-band temporal properties of voiced speech. We propose a method to represent the glottal excitation signal using the sub-band temporal envelope. Instants of maximum glottal excitation, or Glottal Closure Instants (GCI), are extracted from the estimated glottal excitation pattern, and the result is compared with a standard GCI computation method, DYPSA [1]. The performance of the algorithm is also evaluated on noisy signals, and it is shown that the GCI estimates of the proposed method vary less under noisy conditions than those of DYPSA. The algorithm is evaluated on the CMU-ARCTIC database.
Abstract:
We present a study of low-frequency noise, or 1/f noise, in degenerately doped Si:P and Ge:P delta-layers at low temperatures. For the Si:P delta-layers we find that the noise is several orders of magnitude lower than that of bulk Si:P systems in the metallic regime and is one of the lowest values reported for doped semiconductors. The noise in Ge:P delta-layers, measured as a function of perpendicular magnetic field, shows a factor-of-two reduction in magnitude at the scale of B_φ, where B_φ is the phase-breaking field. We show that this is a characteristic feature of universal conductance fluctuations.
Abstract:
The authors report a detailed investigation of the flicker noise (1/f noise) in graphene films obtained from chemical vapour deposition (CVD) and chemical reduction of graphene oxide. The authors find that in polycrystalline graphene films grown by CVD, the grain boundaries and other structural defects are the dominant source of noise: they act as charged trap centres, resulting in a large increase in noise compared with exfoliated graphene. A study of the kinetics of defects in hydrazine-reduced graphene oxide (RGO) films as a function of the extent of reduction showed that longer hydrazine treatment times introduce strong localised crystal defects in RGO, whereas RGO with shorter hydrazine treatment showed the presence of a large number of mobile defects, leading to higher noise amplitude.
Abstract:
Speech enhancement in stationary noise is addressed using the ideal channel selection framework. In order to estimate the binary mask, we propose to classify each time-frequency (T-F) bin of the noisy signal as speech or noise using Discriminative Random Fields (DRF). The DRF function contains two terms: an enhancement function and a smoothing term. On each T-F bin, we propose to use an enhancement function based on a likelihood ratio test for speech presence, while an Ising model is used as the smoothing function for spectro-temporal continuity in the estimated binary mask. The effect of the smoothing function over successive iterations is found to reduce musical noise, as opposed to using only the enhancement function. The binary mask is inferred from the noisy signal using the Iterated Conditional Modes (ICM) algorithm. Sentences from the NOIZEUS corpus are evaluated from 0 dB to 15 dB Signal to Noise Ratio (SNR) in 4 kinds of additive noise settings: additive white Gaussian noise, car noise, street noise and pink noise. The reconstructed speech using the proposed technique is evaluated in terms of average segmental SNR, Perceptual Evaluation of Speech Quality (PESQ) and Mean Opinion Score (MOS).
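The interplay of a per-bin likelihood-ratio term and an Ising smoothing term under ICM inference can be sketched as follows. This is a generic ICM loop on a 4-connected time-frequency grid, not the authors' implementation: the input `llr` (a per-bin log-likelihood ratio, positive when speech is more likely), the smoothing weight `beta`, the neighbourhood, and the function name `icm_binary_mask` are all assumptions made for illustration.

```python
import numpy as np

def icm_binary_mask(llr, beta=1.0, n_iter=5):
    """Estimate a binary T-F mask with Iterated Conditional Modes.
    llr  : (F, T) array, log-likelihood ratio per bin (> 0 favours speech).
    beta : weight of the Ising smoothing term (hypothetical parameter).
    Maximises sum_i 0.5 * llr_i * x_i + beta * sum_<i,j> x_i * x_j, x in {-1, +1}."""
    llr = np.asarray(llr, dtype=float)
    mask = np.where(llr > 0, 1, -1)        # initialise from the unary term alone
    n_freq, n_time = llr.shape
    for _ in range(n_iter):
        changed = False
        for f in range(n_freq):
            for t in range(n_time):
                # Sum of the current labels of the 4-connected neighbours.
                nb = 0
                if f > 0:
                    nb += mask[f - 1, t]
                if f < n_freq - 1:
                    nb += mask[f + 1, t]
                if t > 0:
                    nb += mask[f, t - 1]
                if t < n_time - 1:
                    nb += mask[f, t + 1]
                # Objective gain of choosing +1 over -1 with neighbours fixed.
                score = llr[f, t] + 2.0 * beta * nb
                new_label = 1 if score > 0 else -1
                if new_label != mask[f, t]:
                    mask[f, t] = new_label
                    changed = True
        if not changed:                    # reached a local optimum
            break
    return (mask > 0).astype(int)          # 1 = speech, 0 = noise
```

With `beta=0` the result is a plain likelihood-ratio threshold; increasing `beta` removes isolated bins whose label disagrees with their neighbours, which is the kind of spectro-temporal smoothing the abstract associates with reduced musical noise.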
Abstract:
We wished to replicate evidence that an experimental paradigm of speech illusions is associated with psychotic experiences. Fifty-four patients with a first episode of psychosis (FEP) and 150 healthy subjects were examined in an experimental paradigm assessing the presence of speech illusion in neutral white noise. Socio-demographic, cognitive function and family history data were collected. The Positive and Negative Syndrome Scale (PANSS) was administered in the patient group and the Structured Interview for Schizotypy-Revised (SIS-R), and the Community Assessment of Psychic Experiences (CAPE) in the control group. Patients had a much higher rate of speech illusions (33.3% versus 8.7%, ORadjusted: 5.1, 95% CI: 2.3-11.5), which was only partly explained by differences in IQ (ORadjusted: 3.4, 95% CI: 1.4-8.3). Differences were particularly marked for signals in random noise that were perceived as affectively salient (ORadjusted: 9.7, 95% CI: 1.8-53.9). Speech illusion tended to be associated with positive symptoms in patients (ORadjusted: 3.3, 95% CI: 0.9-11.6), particularly affectively salient illusions (ORadjusted: 8.3, 95% CI: 0.7-100.3). In controls, speech illusions were not associated with positive schizotypy (ORadjusted: 1.1, 95% CI: 0.3-3.4) or self-reported psychotic experiences (ORadjusted: 1.4, 95% CI: 0.4-4.6). Experimental paradigms indexing the tendency to detect affectively salient signals in noise may be used to identify liability to psychosis.
Abstract:
For many realistic scenarios, there are multiple factors that affect the clean speech signal. In this work approaches to handling two such factors, speaker and background noise differences, simultaneously are described. A new adaptation scheme is proposed. Here the acoustic models are first adapted to the target speaker via an MLLR transform. This is followed by adaptation to the target noise environment via model-based vector Taylor series (VTS) compensation. These speaker and noise transforms are jointly estimated, using maximum likelihood. Experiments on the AURORA4 task demonstrate that this adaptation scheme provides improved performance over VTS-based noise adaptation. In addition, this framework enables the speech and noise to be factorised, allowing the speaker transform estimated in one noise condition to be successfully used in a different noise condition. © 2011 IEEE.
Abstract:
While cochlear implants (CIs) usually provide high levels of speech recognition in quiet, speech recognition in noise remains challenging. To overcome these difficulties, it is important to understand how implanted listeners separate a target signal from interferers. Stream segregation has been studied extensively in both normal and electric hearing, as a function of place of stimulation. However, the effects of pulse rate, independent of place, on the perceptual grouping of sequential sounds in electric hearing have not yet been investigated. A rhythm detection task was used to measure stream segregation. The results of this study suggest that while CI listeners can segregate streams based on differences in pulse rate alone, the amount of stream segregation observed decreases as the base pulse rate increases. Further investigation of the perceptual dimensions encoded by the pulse rate and the effect of sequential presentation of different stimulation rates on perception could be beneficial for the future development of speech processing strategies for CIs.
Abstract:
A modified comb filtering technique is proposed which can be used to reduce the framing noise generated when speech signals are transform-coded or vector-quantized. Application of this filter to 9.6 kbit/s speech in a vector transform coder has been found to improve the perceptual quality of the coded speech.