982 results for Noise-vocoded Speech


Relevance:

100.00%

Publisher:

Abstract:

The authors examined whether background noise can be habituated to in the laboratory by using memory for prose tasks in 3 experiments. Experiment 1 showed that background speech can be habituated to after 20 min exposure and that meaning and repetition had no effect on the degree of habituation seen. Experiment 2 showed that office noise without speech can also be habituated to. Finally, Experiment 3 showed that a 5-min period of quiet, but not a change in voice, was sufficient to partially restore the disruptive effects of the background noise previously habituated to. These results are interpreted in light of current theories regarding the effects of background noise and habituation; practical implications for office planning are discussed.

Relevance:

100.00%

Publisher:

Abstract:

Three experiments measured constancy in speech perception, using natural-speech messages or noise-band vocoder versions of them. The eight vocoder-bands had equally log-spaced center-frequencies and the shapes of corresponding “auditory” filters. Consequently, the bands had the temporal envelopes that arise in these auditory filters when the speech is played. The “sir” or “stir” test-words were distinguished by degrees of amplitude modulation, and played in the context: “next you’ll get _ to click on.” Listeners identified test-words appropriately, even in the vocoder conditions where the speech had a “noise-like” quality. Constancy was assessed by comparing the identification of test-words with low or high levels of room reflections across conditions where the context had either a low or a high level of reflections. Constancy was obtained with both the natural and the vocoded speech, indicating that the effect arises through temporal-envelope processing. Two further experiments assessed perceptual weighting of the different bands, both in the test word and in the context. The resulting weighting functions both increase monotonically with frequency, following the spectral characteristics of the test-word’s [s]. It is suggested that these two weighting functions are similar because they both come about through the perceptual grouping of the test-word’s bands.
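The band layout described in this abstract (eight bands with equally log-spaced center frequencies) can be illustrated with a minimal sketch. The frequency range of 250 to 4000 Hz and the placement of band edges at geometric midpoints are assumptions made for illustration only; the abstract does not state the actual range, and the study used auditory-filter shapes rather than the hard band edges shown here.

```python
import math

N_BANDS = 8                    # number of bands, from the abstract
F_LOW, F_HIGH = 250.0, 4000.0  # ASSUMED lowest and highest centre frequency (Hz)

def log_spaced_centres(f_low, f_high, n_bands):
    """Return centre frequencies equally spaced on a logarithmic axis."""
    step = math.log(f_high / f_low) / (n_bands - 1)
    return [f_low * math.exp(i * step) for i in range(n_bands)]

def band_edges(centres):
    """Hypothetical band edges at the geometric midpoints between centres."""
    half_step = math.sqrt(centres[1] / centres[0])
    return [centres[0] / half_step] + [c * half_step for c in centres]

centres = log_spaced_centres(F_LOW, F_HIGH, N_BANDS)
edges = band_edges(centres)
for i, c in enumerate(centres):
    print(f"band {i + 1}: {edges[i]:7.1f} - {edges[i + 1]:7.1f} Hz, centre {c:7.1f} Hz")
```

In a noise-band vocoder, the temporal envelope extracted from each such band would then modulate a noise carrier limited to the same band; that stage is omitted here.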

Relevance:

100.00%

Publisher:

Abstract:

INTRODUCTION: The Rondo is a single-unit cochlear implant (CI) audio processor comprising the same components as its behind-the-ear predecessor, the Opus 2. Replacing the Opus 2 with the Rondo shifts the microphone position toward the back of the head. This study aimed to investigate the influence of the Rondo wearing position on speech intelligibility in noise. METHODS: Speech intelligibility in noise was measured in 4 spatial configurations with 12 experienced CI users using the German adaptive Oldenburg sentence test. A physical model and a numerical model were used to enable a comparison of the observations. RESULTS: No statistically significant differences in speech intelligibility were found in the situations in which the signal came from the front and the noise came from the frontal, ipsilateral, or contralateral side. The signal-to-noise ratio (SNR) was significantly better with the Opus 2 in the case with the noise presented from the back (4.4 dB, p < 0.001). The differences in the SNR were significantly worse with the Rondo processors placed further behind the ear than closer to the ear. CONCLUSION: The study indicates that CI users with the receiver/stimulator implanted in positions further behind the ear are expected to have greater difficulty in noisy situations when wearing the single-unit audio processor.

Relevance:

90.00%

Publisher:

Abstract:

Traditional speech enhancement methods optimise signal-level criteria such as signal-to-noise ratio, but such approaches are sub-optimal for noise-robust speech recognition. Likelihood-maximising (LIMA) frameworks, on the other hand, optimise the parameters of speech enhancement algorithms based on state sequences generated by a speech recogniser for utterances of known transcriptions. Previous applications of LIMA frameworks have generated a set of global enhancement parameters for all model states without taking into account the distribution of model occurrence, making optimisation susceptible to favouring frequently occurring models, in particular silence. In this paper, we demonstrate the existence of highly disproportionate phonetic distributions on two corpora with distinct speech tasks, and propose to normalise the influence of each phone based on a priori occurrence probabilities. Likelihood analysis and speech recognition experiments verify this approach for improving ASR performance in noisy environments.
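One way to picture the normalisation idea in this abstract is to weight each frame's log-likelihood by the inverse of its phone's prior, so that a frequent model such as silence no longer dominates the objective. The sketch below is an illustration under that assumption, not the paper's formulation: the function names, the phone labels and the log-likelihood values are all hypothetical.

```python
from collections import Counter

def phone_priors(frame_phones):
    """Estimate a priori occurrence probabilities from frame-level labels."""
    counts = Counter(frame_phones)
    total = len(frame_phones)
    return {phone: count / total for phone, count in counts.items()}

def plain_objective(frame_loglikes):
    """Unnormalised objective: every frame counts equally."""
    return sum(frame_loglikes)

def normalised_objective(frame_loglikes, frame_phones, priors):
    """ASSUMED scheme: weight each frame by 1 / (prior * n_frames).

    With priors estimated from the same frames, each phone then
    contributes its mean log-likelihood, regardless of frequency.
    """
    n = len(frame_phones)
    return sum(ll / (priors[p] * n) for ll, p in zip(frame_loglikes, frame_phones))

# Hypothetical alignment: silence covers 6 of 10 frames.
phones = ["sil"] * 6 + ["ah"] * 2 + ["t"] * 2
loglikes = [-1.0] * 6 + [-4.0] * 2 + [-3.0] * 2

priors = phone_priors(phones)
print("plain:     ", plain_objective(loglikes))
print("normalised:", normalised_objective(loglikes, phones, priors))
```

In the plain sum, silence supplies six of the ten terms; after weighting, each of the three phones supplies exactly one term.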

Relevance:

90.00%

Publisher:

Abstract:

Model-based compensation schemes are a powerful approach for noise-robust speech recognition. Recently there have been a number of investigations into adaptive training and into estimating the noise models used for model adaptation. This paper examines the use of EM-based schemes for both canonical models and noise estimation, including discriminative adaptive training. One issue that arises when estimating the noise model is a mismatch between the noise estimation approximation and the final model compensation scheme. This paper proposes FA-style compensation where this mismatch is eliminated, though at the expense of a sensitivity to the initial noise estimates. EM-based discriminative adaptive training is evaluated on in-car and Aurora4 tasks. FA-style compensation is then evaluated in an incremental mode on the in-car task. © 2011 IEEE.

Relevance:

90.00%

Publisher:

Abstract:

Model-based approaches to handling additive background noise and channel distortion, such as Vector Taylor Series (VTS), have been intensively studied and extended in a number of ways. In previous work, VTS has been extended to handle both reverberant and background noise, yielding the Reverberant VTS (RVTS) scheme. In this work, rather than assuming the observation vector is generated by the reverberation of a sequence of background-noise-corrupted speech vectors, as in RVTS, the observation vector is modelled as a superposition of the background noise and the reverberation of clean speech. This yields a new compensation scheme, RVTS Joint (RVTSJ), which allows an easy formulation for joint estimation of both additive and reverberation noise parameters. These two compensation schemes were evaluated and compared on a simulated reverberant noise-corrupted AURORA4 task. Both yielded large gains over the VTS baseline system, with RVTSJ outperforming the previous RVTS scheme. © 2011 IEEE.

Relevance:

90.00%

Publisher:

Abstract:

Interference by siren background-noise with speech transmitted from radio equipment (3) of an emergency-service vehicle is reduced by apparatus (1) that subtracts (43) an estimate n_k of the correlated siren-noise component from the contaminated signal y_k supplied by the cab-microphone (2). The estimate n_k is computed by FIR (finite impulse response) filtering of a siren-reference signal x_k supplied by a unit (4) from one or more microphones located on or near the siren, or from the electric waveform driving the siren. The filter-coefficients w_k are adjusted according to an LMS (least mean square) adaptive algorithm that is applied to the correlated-noise component e_k of the fed-back noise-reduced signal, so as to bring about iterative cancellation, with close frequency-tracking, of the siren noise.
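The cancellation loop described in this abstract follows the general shape of an LMS adaptive noise canceller: an FIR filter shapes the siren reference into a noise estimate, the estimate is subtracted from the microphone signal, and the residual drives the coefficient update. The sketch below is a textbook LMS loop, not the patented apparatus. The tap count, the step size `mu`, the fixed-frequency siren, the two-tap acoustic path and the noise-like stand-in for speech are all assumptions chosen for a small demonstration; real sirens sweep in frequency.

```python
import math
import random

def lms_cancel(y, x, n_taps=8, mu=0.01):
    """Subtract an adaptively FIR-filtered copy of reference x from y.

    y: contaminated microphone signal, x: siren reference.
    Returns the noise-reduced signal e. n_taps and mu are assumed values.
    """
    w = [0.0] * n_taps    # filter coefficients
    buf = [0.0] * n_taps  # recent reference samples, newest first
    e = []
    for yk, xk in zip(y, x):
        buf = [xk] + buf[:-1]
        nk = sum(wi * bi for wi, bi in zip(w, buf))  # siren-noise estimate
        ek = yk - nk                                  # noise-reduced output
        # LMS update: move coefficients along the error/reference correlation
        w = [wi + 2.0 * mu * ek * bi for wi, bi in zip(w, buf)]
        e.append(ek)
    return e

def power(sig):
    """Mean squared value of a signal segment."""
    return sum(s * s for s in sig) / len(sig)

def delayed(sig, k, d):
    """Sample k - d of sig, or 0.0 before the signal starts."""
    return sig[k - d] if k >= d else 0.0

rng = random.Random(0)
n = 2000
x = [math.sin(2 * math.pi * 0.05 * k) for k in range(n)]  # assumed siren reference
speech = [rng.uniform(-0.1, 0.1) for _ in range(n)]       # stand-in for speech
# Hypothetical acoustic path from siren to cab microphone: two delayed taps.
y = [speech[k] + 0.8 * delayed(x, k, 2) + 0.3 * delayed(x, k, 3) for k in range(n)]

e = lms_cancel(y, x)
print("input power (last 500 samples): ", power(y[-500:]))
print("output power (last 500 samples):", power(e[-500:]))
```

After the coefficients settle, the output power should drop toward that of the speech stand-in alone, since only the component correlated with the reference is removed.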

Relevance:

90.00%

Publisher:

Abstract:

Speech perception runs smoothly and automatically when there is silence in the background, but when the speech signal is degraded by background noise or by reverberation, effortful cognitive processing is needed to compensate for the signal distortion. Previous research has typically investigated the effects of signal-to-noise ratio (SNR) and reverberation time in isolation, whilst few studies have looked at their interaction. In this study, we probed how reverberation time and SNR influence recall of words presented in participants' first (L1) and second (L2) language. A total of 72 children (10 years old) participated in this study. The to-be-recalled wordlists were played back with two different reverberation times (0.3 and 1.2 s) crossed with two different SNRs (+3 dBA and +12 dBA). Children recalled fewer words when the spoken words were presented in L2 than when they were presented in L1. Words that were presented with a high SNR (+12 dBA) were recalled better than words presented with a low SNR (+3 dBA). Reverberation time interacted with SNR such that at +12 dB the shorter reverberation time improved recall, but at +3 dB it impaired recall. The effects of the physical sound variables (SNR and reverberation time) did not interact with language. © 2016 Hurtig, Keus van de Poll, Pekkola, Hygge, Ljung and Sörqvist.

Relevance:

80.00%

Publisher:

Abstract:

Vector Taylor Series (VTS) model-based compensation is a powerful approach for noise-robust speech recognition. An important extension to this approach is VTS adaptive training (VAT), which allows canonical models to be estimated on diverse noise-degraded training data. These canonical models can be estimated using EM-based approaches, allowing simple extensions to discriminative VAT (DVAT). However, to ensure a diagonal corrupted speech covariance matrix, the Jacobian (loading matrix) relating the noise and clean speech is diagonalised. In this work, an approach for yielding optimal diagonal loading matrices based on minimising the expected KL-divergence between the diagonal loading matrix and "correct" distributions is proposed. The performance of DVAT using the standard and optimal diagonalisation was evaluated on both in-car collected data and the Aurora4 task. © 2012 IEEE.

Relevance:

80.00%

Publisher:

Abstract:

The voice, as teachers' working tool, can be affected by prolonged use, abuse or misuse, which trigger functional limitations of occupational origin. One of the most frequent symptoms among those who use their voice heavily for occupational purposes is laryngeal fatigue (LF), or vocal tiredness due to muscular weakening. This quasi-experimental, longitudinal pre-/post-test study evaluated the effect of voice use, analysing sociodemographic, health and work variables, lifestyles and occupational risk factors, but mainly the effect of prolonged voice use on physical-acoustic variables after a working day, in 99 teachers at a higher-education institution in Colombia, compared with workers with lower vocal use. A vocal symptom questionnaire was administered to control for bias, pre- and post-workday recordings were taken from each worker with the Speech Analizer® software, and the subjective changes after a working day were reported for each worker. Changes in the physical-acoustic variables were found as an effect of prolonged voice use after a working day in both groups of participants, with the effect being more significant in teachers than in administrative (non-teaching) staff. The risk of presenting voice disorders was directly associated with exposure to occupational risk factors and to factors associated with individuals' health conditions and lifestyle, the consequences of which were greater for the teacher group: since the voice is their main working tool, use was greater and so was the probability of triggering vocal symptoms derived from laryngeal fatigue. The mean fundamental frequency (fo) for sustained phonation of the vowel /a/, which represents a sound of neutral pitch or the habitual pitch, showed significant differences between groups (p=0.048).
In this case, the teacher group showed an increase in fo at post-test, compared with a non-significant change for the administrative group after prolonged voice use. Consequently, there were differences in the values recorded for maximum fo (p=0.025), minimum fo (p=0.011) and fo range (p=0.012) in sustained production of the vowel /a/. For the administrative group, the significant differences were due to a decrease in fo, range, and maximum and minimum frequency in the three vowels (/a/, /i/, /o/), in contrast to what occurred in the teacher group. Significant differences between groups were also found in voice intensity (p=0.001), with a decrease in loudness at post-test, in the mean as well as the minimum, maximum and range of intensity, in sustained phonation of the vowel /a/ for the teacher group; no statistical significance was found in the administrative group for these variables. The phenomenon of laryngeal fatigue, associated with the effects that arise from continuous vocal demand, was demonstrated through objective measurements and verifiable results, distinguishing the impact between the variables of job position and gender.

Relevance:

80.00%

Publisher:

Abstract:

Graduate Program in Electrical Engineering - FEIS

Relevance:

40.00%

Publisher:

Abstract:

Visual noise insensitivity is important to audio-visual speech recognition (AVSR). Visual noise can take a number of forms, such as varying frame rate, occlusion, lighting or speaker variability. The use of a high-dimensional secondary classifier on the word likelihood scores from both the audio and video modalities is investigated for the purposes of adaptive fusion. Preliminary results are presented demonstrating performance above the catastrophic fusion boundary for our confidence measure, irrespective of the type of visual noise presented to it. Our experiments were restricted to small-vocabulary applications.

Relevance:

40.00%

Publisher:

Abstract:

Balcony acoustic treatments can mitigate the effects of community road traffic noise. To investigate further, a theoretical study into the effects of balcony acoustic treatment combinations on speech interference and transmission is conducted for various street geometries. Nine different balcony types are investigated using a combined specular and diffuse reflection computer model. Diffusion in the model is calculated using the radiosity technique. The balcony types include a standard balcony with or without a ceiling and with various combinations of parapet, ceiling absorption and ceiling shield. A total of 70 balcony and street geometrical configurations are analyzed with each balcony type, resulting in 630 scenarios. In each scenario the reverberation time, speech interference level (SIL) and speech transmission index (STI) are calculated. These indicators are compared to determine trends based on the effects of propagation path, inclusion of opposite buildings and difference with a reference position outside the balcony. The results demonstrate trends in SIL and STI with different balcony types. It is found that an acoustically treated balcony reduces speech interference. A parapet provides the largest improvement, followed by absorption on the ceiling. The largest reductions in speech interference arise when a combination of balcony acoustic treatments is applied.
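The scenario count and the SIL indicator mentioned in this abstract can be checked with a short sketch. It assumes the common four-band definition of SIL (the arithmetic mean of the octave-band sound pressure levels at 500, 1000, 2000 and 4000 Hz); the abstract does not say which variant the study used, and the band levels below are hypothetical values, not results from the study.

```python
SIL_BANDS_HZ = (500, 1000, 2000, 4000)  # ASSUMED four-band SIL definition

def speech_interference_level(octave_levels_db):
    """Arithmetic mean of octave-band levels (dB) over the SIL bands."""
    return sum(octave_levels_db[band] for band in SIL_BANDS_HZ) / len(SIL_BANDS_HZ)

# Scenario count from the abstract: 9 balcony types x 70 configurations.
n_balcony_types = 9
n_configurations = 70
n_scenarios = n_balcony_types * n_configurations

# Hypothetical octave-band levels (dB) at one balcony position.
levels = {500: 60.0, 1000: 58.0, 2000: 55.0, 4000: 51.0}
sil = speech_interference_level(levels)

print("scenarios:", n_scenarios)
print("SIL (dB): ", sil)
```

A treatment that lowers the levels in these four bands lowers the SIL by the average of the per-band reductions.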

Relevance:

40.00%

Publisher:

Abstract:

Residential balcony design influences speech interference levels caused by road traffic noise and a simplified design methodology is needed for optimising balcony acoustic treatments. This research comprehensively assesses speech interference levels and benefits of nine different balcony designs situated in urban street canyons through the use of a combined direct, specular reflection and diffuse reflection path theoretical model. This thesis outlines the theory, analysis and results that lead up to the presentation of a practical design guide which can be used to predict the acoustic effects of balcony geometry and acoustic treatments in streets with variable geometry and acoustic characteristics.