61 results for acoustic speech recognition system
in University of Queensland eSpace - Australia
Abstract:
The purpose of this study was to explore the potential advantages, both theoretical and applied, of preserving low-frequency acoustic hearing in cochlear implant patients. Several hypotheses are presented that predict that residual low-frequency acoustic hearing along with electric stimulation for high frequencies will provide an advantage over traditional long-electrode cochlear implants for the recognition of speech in competing backgrounds. A simulation experiment in normal-hearing subjects demonstrated a clear advantage for preserving low-frequency residual acoustic hearing for speech recognition in a background of other talkers, but not in steady noise. Three subjects with an implanted "short-electrode" cochlear implant and preserved low-frequency acoustic hearing were also tested on speech recognition in the same competing backgrounds and compared to a larger group of traditional cochlear implant users. Each of the three short-electrode subjects performed better than any of the traditional long-electrode implant subjects for speech recognition in a background of other talkers, but not in steady noise, in general agreement with the simulation studies. When compared to a subgroup of traditional implant users matched according to speech recognition ability in quiet, the short-electrode patients showed a 9-dB advantage in the multitalker background. These experiments provide strong preliminary support for retaining residual low-frequency acoustic hearing in cochlear implant patients. The results are consistent with the idea that better perception of voice pitch, which can aid in separating voices in a background of other talkers, was responsible for this advantage.
Abstract:
Spectral peak resolution was investigated in normal hearing (NH), hearing impaired (HI), and cochlear implant (CI) listeners. The task involved discriminating between two rippled noise stimuli in which the frequency positions of the log-spaced peaks and valleys were interchanged. The ripple spacing was varied adaptively from 0.13 to 11.31 ripples/octave, and the minimum ripple spacing at which a reversal in peak and trough positions could be detected was determined as the spectral peak resolution threshold for each listener. Spectral peak resolution was best, on average, in NH listeners, poorest in CI listeners, and intermediate for HI listeners. There was a significant relationship between spectral peak resolution and both vowel and consonant recognition in quiet across the three listener groups. The results indicate that the degree of spectral peak resolution required for accurate vowel and consonant recognition in quiet backgrounds is around 4 ripples/octave, and that spectral peak resolution poorer than around 1–2 ripples/octave may result in highly degraded speech recognition. These results suggest that efforts to improve spectral peak resolution for HI and CI users may lead to improved speech recognition.
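The rippled-noise stimuli described in this abstract can be sketched in code. This is a minimal illustration, not the authors' implementation: the sampling rate, bandwidth, and ripple depth below are assumptions, since the abstract does not give the signal-generation details.

```python
import numpy as np

def rippled_noise(ripples_per_octave, fs=22050, dur=0.5,
                  f_lo=100.0, f_hi=5000.0, phase=0.0, depth_db=30.0):
    """Noise with a sinusoidal spectral ripple on a log-frequency axis.

    ripples_per_octave : ripple density (the study varied 0.13-11.31)
    phase              : shifting by pi interchanges peak and valley positions
    depth_db           : peak-to-valley ripple depth (a hypothetical value)
    """
    n = int(fs * dur)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    # Random complex spectrum (white noise), then shaped by the ripple
    spec = np.random.randn(len(freqs)) + 1j * np.random.randn(len(freqs))
    band = (freqs >= f_lo) & (freqs <= f_hi)
    gain_db = np.zeros_like(freqs)
    gain_db[band] = (depth_db / 2.0) * np.sin(
        2 * np.pi * ripples_per_octave * np.log2(freqs[band] / f_lo) + phase)
    spec *= 10.0 ** (gain_db / 20.0)
    spec[~band] = 0.0          # band-limit the stimulus
    x = np.fft.irfft(spec, n)
    return x / np.max(np.abs(x))

# The two discrimination stimuli differ only by a half-cycle phase shift,
# which swaps the spectral peaks and valleys.
std = rippled_noise(2.0, phase=0.0)
inv = rippled_noise(2.0, phase=np.pi)
```

In the adaptive task, the ripple density would be raised until the listener can no longer tell `std` from `inv`, giving the spectral peak resolution threshold.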
Abstract:
The purpose of the present study was to examine the benefits of providing audible speech to listeners with sensorineural hearing loss when the speech is presented in a background noise. Previous studies have shown that when listeners have a severe hearing loss in the higher frequencies, providing audible speech (in a quiet background) to these higher frequencies usually results in no improvement in speech recognition. In the present experiments, speech was presented in a background of multitalker babble to listeners with various severities of hearing loss. The signal was low-pass filtered at numerous cutoff frequencies and speech recognition was measured as additional high-frequency speech information was provided to the hearing-impaired listeners. It was found in all cases, regardless of hearing loss or frequency range, that providing audible speech resulted in an increase in recognition score. The change in recognition as the cutoff frequency was increased, along with the amount of audible speech information in each condition (articulation index), was used to calculate the "efficiency" of providing audible speech. Efficiencies were positive for all degrees of hearing loss. However, the gains in recognition were small, and the maximum score obtained by any listener was low, due to the noise background. An analysis of error patterns showed that due to the limited speech audibility in a noise background, even severely impaired listeners used additional speech audibility in the high frequencies to improve their perception of the "easier" features of speech, including voicing.
Abstract:
I investigated the genetic relationship between male and female components of the mate recognition system and how this relationship influenced the subsequent evolution of the two traits, in a series of replicate populations of interspecific hybrids. Thirty populations of hybrids between Drosophila serrata and Drosophila birchii were established and maintained for 24 generations. At the fifth generation after hybridization, the mating success of hybrid individuals with the D. serrata parent was determined. The genetic correlation between male and female components of the mate recognition system, as a consequence of pleiotropy or tight physical linkage, was found to be significant but low (r = 0.388). This result suggested that pleiotropy may play only a minor role in the evolution of mate recognition in this system. At the twenty-fourth generation after hybridization, the mating success of the hybrids was again determined. The evolution of male and female components was investigated by analyzing the direction of evolution of each hybrid line with respect to its initial position in relation to the genetic regression. Male and female components appeared to converge on a single equilibrium point, rather than evolving along trajectories with slope equal to the genetic regression, toward a line of equilibria.
Abstract:
This paper discusses methodological issues in the development of a multitiered, phonetic annotation system, intended to capture pronunciation variation in the speech of second language learners and to serve in the construction of a database for training ASR models to recognize major pronunciation variants in the assessment of accented English.
Abstract:
Compression amplification significantly alters the acoustic speech signal in comparison to linear amplification. The central hypothesis of the present study was that the compression settings of a two-channel aid that best preserved the acoustic properties of speech compared to linear amplification would yield the best perceptual results, and that the compression settings that most altered the acoustic properties of speech compared to linear would yield significantly poorer speech perception. On the basis of initial acoustic analysis of the test stimuli recorded through a hearing aid, two different compression amplification settings were chosen for the perceptual study. Participants were 74 adults with mild to moderate sensorineural hearing impairment. Overall, the speech perception results supported the hypothesis. A further aim of the study was to determine if variation in participants' speech perception with compression amplification (compared to linear amplification) could be explained by the individual characteristics of age, degree of loss, dynamic range, temporal resolution, and frequency selectivity; however, no significant relationships were found.
Abstract:
This paper presents a corpus-based descriptive analysis of the most prevalent transfer effects and connected speech processes observed in a comparison of 11 Vietnamese English speakers (6 females, 5 males) and 12 Australian English speakers (6 males, 6 females) over 24 grammatical paraphrase items. The phonetic processes are segmentally labelled in terms of IPA diacritic features using the EMU speech database system, with the aim of labelling departures from native-speaker pronunciation. An analysis of prosodic features was made using the ToBI framework. The results show many phonetic and prosodic processes which make non-native speakers' speech distinct from that of native speakers. The corpus-based methodology of analysing foreign accent may have implications for the evaluation of non-native accent, accented speech recognition and computer-assisted pronunciation learning.
Abstract:
The evolution of a positive genetic correlation between male and female components of mate recognition systems will result as a consequence of assortative mating and, in particular, is central to a number of theories of sexual selection. Although the existence of such genetic correlations has been investigated in a number of taxa, it has yet to be shown that such correlations evolve and whether they may evolve as rapidly as suggested by sexual selection models. In this study, I used a hybridization experiment to disrupt natural mate recognition systems and then observed the subsequent evolutionary dynamics of the genetic correlation between male and female components for 56 generations in hybrids between Drosophila serrata and Drosophila birchii. The genetic correlation between male and female components evolved from 0.388 at generation 5 to 1.017 at generation 37 and then declined to -0.040 after a further 19 generations. These results indicated that the genetic basis of the mate recognition system in the hybrid populations evolved rapidly. The initial rapid increase in the genetic correlation was consistent with the classic assumption that male and female components will coevolve under sexual selection. The subsequent decline in genetic correlation may be attributable to the fixation of major genes or, alternatively, may be a result of a cyclic evolutionary change in mate recognition.
Abstract:
The interaction between natural and sexual selection is central to many theories of how mate choice and reproductive isolation evolve, but their joint effect on the evolution of mate recognition has not, to my knowledge, been investigated in an evolutionary experiment. Natural and sexual selection were manipulated in interspecific hybrid populations of Drosophila to determine their effects on the evolution of a mate recognition system comprised of cuticular hydrocarbons (CHCs). The effect of natural selection in isolation indicated that CHCs were costly for males and females to produce. The effect of sexual selection in isolation indicated that females preferred males with a particular CHC composition. However, the interaction between natural and sexual selection had a greater effect on the evolution of the mate recognition system than either process in isolation. When natural and sexual selection were permitted to operate in combination, male CHCs became exaggerated to a greater extent than in the presence of sexual selection alone, and female CHCs evolved against the direction of natural selection. This experiment demonstrated that the interaction between natural and sexual selection is critical in determining the direction and magnitude of the evolutionary response of the mate recognition system.
Abstract:
The differences in spectral shape resolution abilities among cochlear implant (CI) listeners, and between CI and normal-hearing (NH) listeners, when listening with the same number of channels (12), were investigated. In addition, the effect of the number of channels on spectral shape resolution was examined. The stimuli were rippled noise signals with various ripple frequency-spacings. An adaptive 4IFC procedure was used to determine the threshold for resolvable ripple spacing, which was the spacing at which an interchange in peak and valley positions could be discriminated. The results showed poorer spectral shape resolution in CI compared to NH listeners (average thresholds of approximately 3000 and 400 Hz, respectively), and wide variability among CI listeners (range of approximately 800 to 8000 Hz). There was a significant relationship between spectral shape resolution and vowel recognition. The spectral shape resolution thresholds of NH listeners increased as the number of channels increased from 1 to 16, while the CI listeners showed a performance plateau at 4–6 channels, which is consistent with previous results using speech recognition measures. These results indicate that this test may provide a measure of CI performance which is time efficient and non-linguistic, and therefore, if verified, may provide a useful contribution to the prediction of speech perception in adults and children who use CIs.
Abstract:
Automatic signature verification is a well-established and active area of research with numerous applications such as bank check verification, ATM access, etc. This paper proposes a novel approach to the problem of automatic off-line signature verification and forgery detection. The proposed approach is based on fuzzy modeling that employs the Takagi-Sugeno (TS) model. Signature verification and forgery detection are carried out using angle features extracted using a box approach. Each feature corresponds to a fuzzy set. The features are fuzzified by an exponential membership function involved in the TS model, which is modified to include structural parameters. The structural parameters are devised to take account of possible variations due to handwriting styles and to reflect moods. The membership functions constitute weights in the TS model. The optimization of the output of the TS model with respect to the structural parameters yields the solution for the parameters. We have also derived two TS models, by considering a rule for each input feature in the first formulation (multiple rules) and by considering a single rule for all input features in the second formulation. In this work, we have found that the TS model with multiple rules is better than the TS model with a single rule for detecting three types of forgeries (random, skilled and unskilled) from a large database of sample signatures, in addition to verifying genuine signatures. We have also devised three approaches, viz., an innovative approach and two intuitive approaches using the TS model with multiple rules, for improved performance. (C) 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
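The abstract does not give the exact form of the exponential membership function or the structural parameters, so the following is only a hypothetical sketch of the general idea: each angle feature is fuzzified by an exponential membership whose spread acts as a trainable structural parameter, and the memberships gate a Takagi-Sugeno consequent. All names and the specific functional form here are assumptions, not the paper's formulation.

```python
import numpy as np

def exp_membership(x, c, s):
    """Hypothetical exponential membership: closeness of feature x to
    prototype c. The 'structural' parameter s would be optimized to absorb
    variation due to handwriting style and mood."""
    return np.exp(-((x - c) ** 2) / (2.0 * s ** 2 + 1e-12))

def ts_single_rule_score(features, prototypes, s_params, weights):
    """Single-rule TS sketch: a linear consequent over the input features,
    gated by the product (fuzzy AND) of all feature memberships."""
    mu = np.prod([exp_membership(x, c, s)
                  for x, c, s in zip(features, prototypes, s_params)])
    return mu * np.dot(weights, features)
```

In the paper's multiple-rules variant, each feature would instead contribute its own rule, and verification would compare the aggregated model output against a threshold learned from genuine samples.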
Abstract:
These are the full proceedings of the conference.
Abstract:
Despite growing clinical use, cervical auscultation suffers from a lack of research-based data. One of the strongest criticisms of cervical auscultation is that there has been little research to demonstrate how dysphagic swallowing sounds are different from normal swallowing sounds. In order to answer this question, however, one first needs to document the acoustic characteristics of normal, nondysphagic swallowing sounds. This article provides the first normative database of normal swallowing sounds for the adult population. The current investigation documents the acoustic characteristics of normal swallowing sounds for individuals from 18 to more than 60 years of age over a range of thin liquid volumes. Previous research has shown the normal swallow to be a dynamic event. The normal swallow is sensitive to aging of the oropharyngeal system, and also to the volume of bolus swallowed. The current investigation found that the acoustic signals generated during swallowing were sensitive to an individual's age and to the volume of the bolus swallowed. There were also some gender-specific differences in the acoustic profile of the swallowing sound. It is anticipated that the results will provide a catalyst for further research into cervical auscultation.
Abstract:
Purpose: This pilot study explored the feasibility and effectiveness of an Internet-based telerehabilitation application for the assessment of motor speech disorders in adults with acquired neurological impairment. Method: Using a counterbalanced, repeated measures research design, 2 speech-language pathologists assessed 19 speakers with dysarthria on a battery of perceptual assessments. The assessments included a 19-item version of the Frenchay Dysarthria Assessment (FDA; P. Enderby, 1983), the Assessment of Intelligibility of Dysarthric Speech (K. M. Yorkston & D. R. Beukelman, 1981), perceptual analysis of a speech sample, and an overall rating of severity of the dysarthria. One assessment was conducted in the traditional face-to-face manner, whereas the other assessment was conducted using an online, custom-built telerehabilitation application. This application enabled real-time videoconferencing at 128 kb/s and the transfer of store-and-forward audio and video data between the speaker and speech-language pathologist sites. The assessment methods were compared using the J. M. Bland and D. G. Altman (1986, 1999) limits-of-agreement method and percentage level of agreement between the 2 methods. Results: Measurements of severity of dysarthria, percentage intelligibility in sentences, and most perceptual ratings made in the telerehabilitation environment were found to fall within the clinically acceptable criteria. However, several ratings on the FDA were not comparable between the environments, and explanations for these results were explored. Conclusions: The online assessment of motor speech disorders using an Internet-based telerehabilitation system is feasible. This study suggests that with additional refinement of the technology and assessment protocols, reliable assessment of motor speech disorders over the Internet is possible. Future research methods are outlined.
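The Bland and Altman limits-of-agreement method cited in this abstract is straightforward to compute: the bias between two measurement methods plus and minus 1.96 standard deviations of the paired differences. A minimal sketch follows; the rating values are made up for illustration and are not data from the study.

```python
import numpy as np

def limits_of_agreement(a, b):
    """Bland-Altman (1986) limits of agreement between two measurement
    methods: mean difference (bias) +/- 1.96 SD of the differences."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bias = d.mean()
    sd = d.std(ddof=1)  # sample standard deviation of the differences
    return bias - 1.96 * sd, bias, bias + 1.96 * sd

# Hypothetical face-to-face vs. online severity ratings for six speakers
face = [3, 4, 2, 5, 3, 4]
online = [3, 4, 3, 5, 2, 4]
lo, bias, hi = limits_of_agreement(face, online)
```

If most paired differences fall inside `[lo, hi]` and that interval is narrower than the clinically acceptable criterion, the two assessment environments can be judged comparable, which is the logic applied to the telerehabilitation ratings above.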
Abstract:
Previous investigations employing electropalatography (EPG) have identified articulatory timing deficits in individuals with acquired dysarthria. However, this technology is yet to be applied to the articulatory timing disturbance present in Parkinson's disease (PD). As a result, the current investigation aimed to use EPG to comprehensively examine the temporal aspects of articulation in a group of nine individuals with PD at sentence, word and segment level. This investigation followed on from a prior study (McAuliffe, Ward and Murdoch) and, similarly, aimed to compare the results of the participants with PD to a group of aged (n=7) and young controls (n=8) to determine if ageing contributed to any articulatory timing deficits observed. Participants were required to read aloud the phrase "I saw a ___ today" with the EPG palate in-situ. Target words included the consonants /l/, /s/ and /t/ in initial position in both the /i/ and /a/ vowel environments. Perceptual investigation of speech rate was conducted in addition to objective measurement of sentence, word and segment duration. Segment durations included the total segment length and duration of the approach, closure/constriction and release phases of EPG consonant production. Results of the present study revealed impaired speech rate, perceptually, in the group with PD. However, this was not confirmed objectively. Electropalatographic investigation of segment durations indicated that, in general, the group with PD demonstrated segment durations consistent with the control groups. Only one significant difference was noted, with the group with PD exhibiting significantly increased duration of the release phase for /la/ when compared to both the control groups. It is, therefore, possible that EPG failed to detect lingual movement impairment as it does not measure the complete tongue movement towards and away from the hard palate.
Furthermore, the contribution of individual variation to the present findings should not be overlooked.