71 results for Speech signals
Abstract:
Perceptual voice analysis is a subjective process. However, despite reports of varying degrees of intrajudge and interjudge reliability, it is widely used in clinical voice evaluation. One way to improve the reliability of this procedure is to provide judges with signals as external standards so that comparisons can be made in relation to these anchor signals. The present study used a Klatt speech synthesizer to create a set of speech signals with varying degrees of three different voice qualities based on a Cantonese sentence. The primary objective of the study was to determine, by means of a perceptual study, whether different abnormal voice qualities could be synthesized using the built-in synthesis parameters. The second objective was to determine the relationship between acoustic characteristics of the synthesized signals and perceptual judgment. Twenty Cantonese-speaking speech pathologists with at least three years of clinical experience in perceptual voice evaluation were asked to undertake two tasks. The first was to decide whether the voice quality of the synthesized signals was normal or not. The second was to decide whether the abnormal signals should be described as rough, breathy, or vocal fry. The results showed that signals generated with a small degree of aspiration noise were perceived as breathy, while signals with a small degree of flutter or double pulsing were perceived as rough. When the flutter or double pulsing increased further, tremor and vocal fry, rather than roughness, were perceived. Furthermore, the amount of aspiration noise, flutter, or double pulsing required for male voice stimuli differed from that required for female voice stimuli at a similar level of perceptual breathiness and roughness. These findings showed that changes in perceived vocal quality could be achieved by systematic modifications of synthesis parameters. This opens up the possibility of using synthesized voice signals as external standards or anchors to improve the reliability of clinical perceptual voice evaluation. (C) 2002 Acoustical Society of America.
Abstract:
In recent years, acoustic perturbation measurement has gained clinical and research popularity owing to the ready availability of commercial acoustic analysis software packages. However, because the measurement itself depends critically on the accuracy of frequency tracking from the voice signal, researchers argue that perturbation measures are not suitable for analysing dysphonic voice samples, which are aperiodic in nature. This study compares the fundamental frequency, relative amplitude perturbation, shimmer percent and noise-to-harmonic ratio between a group of dysphonic and non-dysphonic subjects. One hundred and twelve dysphonic subjects (93 females and 19 males) and 41 non-dysphonic subjects (35 females and 6 males) participated in the study. All 153 voice samples were categorized into type I (periodic or nearly periodic), type II (signals with subharmonic frequencies that approach the fundamental frequency) and type III (aperiodic) signals. Only the type I (periodic and nearly periodic) voice signals were acoustically analysed for perturbation measures. Results revealed that the dysphonic female group presented significantly lower fundamental frequency and significantly higher relative amplitude perturbation and shimmer percent values than the non-dysphonic female group. However, none of these three measures was able to differentiate between male dysphonic and male non-dysphonic subjects. The noise-to-harmonic ratio failed to differentiate between the dysphonic and non-dysphonic voices for both gender groups. These results question the sensitivity of acoustic perturbation measures in detecting dysphonia and suggest that contemporary acoustic perturbation measures are not suitable for analysing dysphonic voice signals, even those that are nearly periodic. Copyright (C) 2005 S. Karger AG, Basel.
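As a rough illustration of the two amplitude and period perturbation measures named in this abstract, the following minimal sketch uses commonly cited textbook definitions: a three-point smoothing window for relative average perturbation, and the mean absolute cycle-to-cycle amplitude difference for shimmer percent. The abstract does not state the formulas or the software the study used, so the function names `rap_percent` and `shimmer_percent` and the exact definitions are assumptions.

```python
from statistics import mean


def rap_percent(periods):
    """Relative average perturbation (%), assuming a 3-point smoothing window.

    `periods` is a list of consecutive pitch-period durations, in any unit.
    """
    if len(periods) < 3:
        raise ValueError("need at least three periods")
    # Deviation of each interior period from the mean of itself and its neighbours.
    deviations = [
        abs(periods[i] - (periods[i - 1] + periods[i] + periods[i + 1]) / 3)
        for i in range(1, len(periods) - 1)
    ]
    return 100 * mean(deviations) / mean(periods)


def shimmer_percent(amplitudes):
    """Shimmer (%), assumed here to be the mean absolute difference between
    consecutive cycle peak amplitudes, relative to the mean amplitude."""
    if len(amplitudes) < 2:
        raise ValueError("need at least two amplitudes")
    diffs = [abs(a - b) for a, b in zip(amplitudes, amplitudes[1:])]
    return 100 * mean(diffs) / mean(amplitudes)
```

Both functions return zero for a perfectly periodic signal, and both presuppose that the pitch periods were extracted correctly, which is the dependence on frequency tracking that the abstract raises.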
Abstract:
The primary objective of this study was to assess the lingual kinematic strategies used by younger and older adults to increase rate of speech. It was hypothesised that the strategies used by the older adults would differ from the young adults either as a direct result of, or in response to a need to compensate for, age-related changes in the tongue. Electromagnetic articulography was used to examine the tongue movements of eight young (M = 26.7 years) and eight older (M = 67.1 years) females during repetitions of /ta/ and /ka/ at a controlled moderate rate and then as fast as possible. The younger and older adults were found to significantly reduce consonant durations and increase syllable repetition rate by similar proportions. To achieve these reduced durations both groups appeared to use the same strategy, that of reducing the distances travelled by the tongue. Further comparisons at each rate, however, suggested a speed-accuracy trade-off and increased speech monitoring in the older adults. The results may assist in differentiating articulatory changes associated with normal aging from pathological changes found in disorders that affect the older population.
Abstract:
Spectral peak resolution was investigated in normal hearing (NH), hearing impaired (HI), and cochlear implant (CI) listeners. The task involved discriminating between two rippled noise stimuli in which the frequency positions of the log-spaced peaks and valleys were interchanged. The ripple spacing was varied adaptively from 0.13 to 11.31 ripples/octave, and the minimum ripple spacing at which a reversal in peak and trough positions could be detected was determined as the spectral peak resolution threshold for each listener. Spectral peak resolution was best, on average, in NH listeners, poorest in CI listeners, and intermediate for HI listeners. There was a significant relationship between spectral peak resolution and both vowel and consonant recognition in quiet across the three listener groups. The results indicate that the degree of spectral peak resolution required for accurate vowel and consonant recognition in quiet backgrounds is around 4 ripples/octave, and that spectral peak resolution poorer than around 1–2 ripples/octave may result in highly degraded speech recognition. These results suggest that efforts to improve spectral peak resolution for HI and CI users may lead to improved speech recognition.
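The peak and valley interchange described in this abstract can be pictured with a minimal sketch of a log-spaced spectral ripple. It assumes a sinusoidal envelope on a log2 frequency axis, with the interchanged stimulus obtained by a half-cycle phase shift. The function name `ripple_envelope_db`, the 30 dB peak-to-valley depth, and the 100 Hz reference frequency are illustrative assumptions, not values taken from the study.

```python
import math


def ripple_envelope_db(freq_hz, ripples_per_octave, depth_db=30.0,
                       ref_hz=100.0, inverted=False):
    """Spectral envelope level (dB, relative to the mean) at `freq_hz`.

    The ripple is sinusoidal on a log2 frequency axis, so peaks are
    log-spaced. `inverted=True` shifts the ripple by half a cycle,
    swapping peak and valley positions. The depth and reference
    frequency defaults are assumptions made for illustration.
    """
    octaves = math.log2(freq_hz / ref_hz)
    phase = math.pi if inverted else 0.0
    return (depth_db / 2) * math.sin(
        2 * math.pi * ripples_per_octave * octaves + phase
    )


# A quarter octave above the reference, a 1 ripple/octave stimulus is at a peak.
peak_hz = 100.0 * 2 ** 0.25
standard = ripple_envelope_db(peak_hz, 1.0)
swapped = ripple_envelope_db(peak_hz, 1.0, inverted=True)
```

At every frequency the two envelopes are mirror images, so a listener can tell them apart only by resolving where the peaks fall, which becomes harder as the number of ripples per octave grows.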
Abstract:
The purpose of this study was to explore the potential advantages, both theoretical and applied, of preserving low-frequency acoustic hearing in cochlear implant patients. Several hypotheses are presented that predict that residual low-frequency acoustic hearing along with electric stimulation for high frequencies will provide an advantage over traditional long-electrode cochlear implants for the recognition of speech in competing backgrounds. A simulation experiment in normal-hearing subjects demonstrated a clear advantage for preserving low-frequency residual acoustic hearing for speech recognition in a background of other talkers, but not in steady noise. Three subjects with an implanted "short-electrode" cochlear implant and preserved low-frequency acoustic hearing were also tested on speech recognition in the same competing backgrounds and compared to a larger group of traditional cochlear implant users. Each of the three short-electrode subjects performed better than any of the traditional long-electrode implant subjects for speech recognition in a background of other talkers, but not in steady noise, in general agreement with the simulation studies. When compared to a subgroup of traditional implant users matched according to speech recognition ability in quiet, the short-electrode patients showed a 9-dB advantage in the multitalker background. These experiments provide strong preliminary support for retaining residual low-frequency acoustic hearing in cochlear implant patients. The results are consistent with the idea that better perception of voice pitch, which can aid in separating voices in a background of other talkers, was responsible for this advantage.
Abstract:
The differences in spectral shape resolution abilities among cochlear implant (CI) listeners, and between CI and normal-hearing (NH) listeners, when listening with the same number of channels (12), were investigated. In addition, the effect of the number of channels on spectral shape resolution was examined. The stimuli were rippled noise signals with various ripple frequency-spacings. An adaptive 4IFC procedure was used to determine the threshold for resolvable ripple spacing, which was the spacing at which an interchange in peak and valley positions could be discriminated. The results showed poorer spectral shape resolution in CI compared to NH listeners (average thresholds of approximately 3000 and 400 Hz, respectively), and wide variability among CI listeners (range of approximately 800 to 8000 Hz). There was a significant relationship between spectral shape resolution and vowel recognition. The spectral shape resolution thresholds of NH listeners increased as the number of channels increased from 1 to 16, while the CI listeners showed a performance plateau at 4–6 channels, which is consistent with previous results using speech recognition measures. These results indicate that this test may provide a measure of CI performance which is time efficient and non-linguistic, and therefore, if verified, may provide a useful contribution to the prediction of speech perception in adults and children who use CIs.
Abstract:
The purpose of the present study was to examine the benefits of providing audible speech to listeners with sensorineural hearing loss when the speech is presented in background noise. Previous studies have shown that when listeners have a severe hearing loss in the higher frequencies, providing audible speech (in a quiet background) to these higher frequencies usually results in no improvement in speech recognition. In the present experiments, speech was presented in a background of multitalker babble to listeners with various severities of hearing loss. The signal was low-pass filtered at numerous cutoff frequencies and speech recognition was measured as additional high-frequency speech information was provided to the hearing-impaired listeners. It was found in all cases, regardless of hearing loss or frequency range, that providing audible speech resulted in an increase in recognition score. The change in recognition as the cutoff frequency was increased, along with the amount of audible speech information in each condition (articulation index), was used to calculate the "efficiency" of providing audible speech. Efficiencies were positive for all degrees of hearing loss. However, the gains in recognition were small, and the maximum score obtained by any listener was low, due to the noise background. An analysis of error patterns showed that due to the limited speech audibility in a noise background, even severely impaired listeners used additional speech audibility in the high frequencies to improve their perception of the "easier" features of speech including voicing.
Abstract:
A narrow absorption feature in an atomic or molecular gas (such as iodine or methane) is used as the frequency reference in many stabilized lasers. As part of the stabilization scheme an optical frequency dither is applied to the laser. In optical heterodyne experiments, this dither is transferred to the RF beat signal, reducing the spectral power density and hence the signal-to-noise ratio relative to that in the absence of dither. We removed the dither by mixing the raw beat signal with a dithered local oscillator signal. When the dither waveform is matched to that of the reference laser, the output signal from the mixer is rendered dither free. Application of this method to a Winters iodine-stabilized helium-neon laser reduced the bandwidth of the beat signal from 6 MHz to 390 kHz, thereby lowering the detection threshold from 5 pW of laser power to 3 pW. In addition, a simple signal detection model is developed which predicts similar threshold reductions.
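The cancellation described in this abstract can be checked numerically with a minimal sketch. It assumes a sinusoidal frequency dither and an idealized complex mixer that keeps only the difference-frequency term; a real mixer also produces a sum-frequency term that would have to be filtered out. The function names and the frequencies used are illustrative assumptions, not values from the experiment.

```python
import cmath
import math


def dithered_phase(t, carrier_hz, dither_hz, deviation_hz):
    """Phase (rad) at time t of a tone whose frequency is sinusoidally dithered."""
    beta = deviation_hz / dither_hz  # frequency-modulation index
    return 2 * math.pi * carrier_hz * t + beta * math.sin(2 * math.pi * dither_hz * t)


def mixed_output(t, beat_hz, lo_hz, dither_hz, deviation_hz):
    """Difference-frequency term of an idealized mixer (assumption).

    The beat signal and the local oscillator carry the same dither
    waveform, so the dither cancels in the phase difference.
    """
    beat = cmath.exp(1j * dithered_phase(t, beat_hz, dither_hz, deviation_hz))
    lo = cmath.exp(1j * dithered_phase(t, lo_hz, dither_hz, deviation_hz))
    return beat * lo.conjugate()


# Illustrative numbers only: 1 kHz beat, 900 Hz local oscillator,
# 10 Hz dither with 50 Hz peak frequency deviation.
times = [k * 1e-3 for k in range(100)]
residuals = [
    abs(mixed_output(t, 1000.0, 900.0, 10.0, 50.0)
        - cmath.exp(2j * math.pi * 100.0 * t))
    for t in times
]
```

With matched dither the mixer output is a pure 100 Hz tone, so every residual is zero to within rounding error. If the local oscillator's dither amplitude or phase is mismatched, the residuals become nonzero, which corresponds to the matching condition the abstract states.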
Abstract:
The purpose of this paper is to provide a cross-linguistic survey of the variation of coding strategies that are available for the grammatical distinction between direct and indirect speech representation with a particular focus on the expression of indirect reported speech. Cross-linguistic data from a sample of 42 languages will be provided to illustrate the range of available grammatical coding strategies.
Abstract:
Parkinson's disease (PD) is a neurodegenerative movement disorder primarily due to basal ganglia dysfunction. While much research has been conducted on Parkinsonian deficits in the traditional arena of musculoskeletal limb movement, research in other functional motor tasks is lacking. The present study examined articulation in PD with increasingly complex sequences of articulatory movement. Of interest was whether dysfunction would affect articulation in the same manner as in limb-movement impairment. In particular, since very similar (homogeneous) articulatory sequences (the tongue twister effect) are more difficult for healthy individuals to achieve than dissimilar (heterogeneous) gestures, while the reverse may apply for skeletal movements in PD, we asked which factor would dominate when PD patients articulated various grades of artificial tongue twisters: the influence of disease or a possible difference between the two motor systems. Execution was especially impaired when articulation involved a sequence of motor programs that were heterogeneous in terms of place of articulation. The results are suggestive of a hypokinesic tendency in complex sequential articulatory movement as in limb movement. It appears that PD patients do show abnormalities in articulatory movement which are similar to those of the musculoskeletal system. The present study suggests that an underlying disease effect modulates movement impairment across different functional motor systems. (C) 1998 Academic Press.
Abstract:
Present fundamental knowledge of fluid turbulence has been established primarily from hot- and cold-wire measurements. Unfortunately, these measurements necessarily suffer from contamination by noise, since no reliable method has previously been available to optimally filter noise from the measured signals. This limitation has profoundly impeded progress in understanding turbulence. We address this limitation by presenting a simple, fast-converging iterative scheme to digitally filter signals optimally and to determine the Kolmogorov scales definitively. The efficacy of the scheme is demonstrated by its application to the instantaneous velocity measured in a turbulent jet.