195 results for intelligibility
Abstract:
BACKGROUND AND OBJECTIVE: In the Swiss version of the Freiburg speech intelligibility test, five test words from the original German recording which are rarely used in Switzerland have been exchanged. Furthermore, differences in the transfer functions between headphone and loudspeaker presentation are not taken into account during calibration. New settings for the levels of the individual test words in the recommended recording and small changes in calibration procedures led us to verify the currently used normative values. PATIENTS AND METHODS: Speech intelligibility was measured in 20 subjects with normal hearing using monosyllabic words and numbers via headphones and loudspeakers. RESULTS: On average, 50% speech intelligibility was reached at levels which were 7.5 dB lower under free-field conditions than for headphone presentation. The average difference between numbers and monosyllabic words was found to be 9.6 dB, which is considerably lower than the 14 dB of the current normative curves. CONCLUSIONS: There is good agreement between our measurements and the normative values for tests using monosyllabic words and headphones, but not for numbers or free-field measurements.
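As a rough illustration of how a 50% intelligibility level can be read off a set of measured scores, the sketch below interpolates linearly between adjacent test levels. The function name `level_at_percent` and all scores are hypothetical and chosen only for illustration; they are not data from the study, and the study's own fitting procedure is not described in the abstract.

```python
def level_at_percent(levels_db, scores_pct, target=50.0):
    """Linearly interpolate the presentation level (dB) at which the
    score reaches `target` percent. Assumes levels are in ascending order."""
    points = list(zip(levels_db, scores_pct))
    for (l0, s0), (l1, s1) in zip(points, points[1:]):
        if s0 <= target <= s1 and s1 != s0:
            return l0 + (target - s0) * (l1 - l0) / (s1 - s0)
    raise ValueError("target score is not bracketed by the data")


# Hypothetical scores, for illustration only (not from the study).
levels = [10, 15, 20, 25, 30]            # presentation levels in dB
headphone_scores = [5, 20, 45, 75, 95]   # percent correct
free_field_scores = [20, 50, 80, 95, 100]

headphone_level = level_at_percent(levels, headphone_scores)
free_field_level = level_at_percent(levels, free_field_scores)
print(headphone_level - free_field_level)  # level difference in dB
```

A difference between two such 50% levels is the kind of quantity the abstract reports when comparing headphone and free-field presentation.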
Abstract:
INTRODUCTION The Rondo is a single-unit cochlear implant (CI) audio processor comprising the same components as its behind-the-ear predecessor, the Opus 2. Replacing the Opus 2 with the Rondo shifts the microphone position toward the back of the head. This study aimed to investigate the influence of the Rondo wearing position on speech intelligibility in noise. METHODS Speech intelligibility in noise was measured in 4 spatial configurations with 12 experienced CI users using the German adaptive Oldenburg sentence test. A physical model and a numerical model were used to enable a comparison of the observations. RESULTS No statistically significant differences in speech intelligibility were found in the situations in which the signal came from the front and the noise came from the frontal, ipsilateral, or contralateral side. The signal-to-noise ratio (SNR) was significantly better with the Opus 2 in the case with the noise presented from the back (4.4 dB, p < 0.001). The differences in the SNR were significantly worse with the Rondo processors placed further behind the ear than closer to the ear. CONCLUSION The study indicates that CI users with the receiver/stimulator implanted in positions further behind the ear are expected to have greater difficulty in noisy situations when wearing the single-unit audio processor.
Abstract:
OBJECTIVE To evaluate the speech intelligibility in noise with a new cochlear implant (CI) processor that uses a pinna effect imitating directional microphone system. STUDY DESIGN Prospective experimental study. SETTING Tertiary referral center. PATIENTS Ten experienced, unilateral CI recipients with bilateral severe-to-profound hearing loss. INTERVENTION All participants performed speech in noise tests with the Opus 2 processor (omnidirectional microphone mode only) and the newer Sonnet processor (omnidirectional and directional microphone mode). MAIN OUTCOME MEASURE The speech reception threshold (SRT) in noise was measured in four spatial settings. The test sentences were always presented from the front. The noise was arriving either from the front (S0N0), the ipsilateral side of the CI (S0NIL), the contralateral side of the CI (S0NCL), or the back (S0N180). RESULTS The directional mode improved the SRTs by 3.6 dB (p < 0.01), 2.2 dB (p < 0.01), and 1.3 dB (p < 0.05) in the S0N180, S0NIL, and S0NCL situations, when compared with the Sonnet in the omnidirectional mode. There was no statistically significant difference in the S0N0 situation. No differences between the Opus 2 and the Sonnet in the omnidirectional mode were observed. CONCLUSION Speech intelligibility with the Sonnet system was statistically different to speech recognition with the Opus 2 system suggesting that CI users might profit from the pinna effect imitating directionality mode in noisy environments.
Abstract:
The objective of this study was to evaluate the effects of posteroventral pallidotomy on perceptual and physiological measures of articulatory function and speech intelligibility in Parkinson disease (PD). The study examined 11 participants with PD who underwent posteroventral pallidotomy. Physiological measures of lip and tongue function and perceptual measures of speech intelligibility were obtained prepallidotomy and 3 months postpallidotomy. The participants with PD were also assessed on the Unified Parkinson's Disease Rating Scale (UPDRS Part III). In addition, the study included a group of 16 participants with PD who did not undergo pallidotomy and a group of 30 nonneurologically impaired participants. Analyses of physiological articulatory function and speech intelligibility did not reveal corresponding improvements in motor speech function as observed in general limb motor function postpallidotomy. Overall, individual reliable change analyses revealed that the majority of surgical PD participants demonstrated no reliable change on perceptual and physiological measures of articulation. The current study revealed preliminary evidence that articulatory function and speech intelligibility did not change following posteroventral pallidotomy in a group of individuals with PD.
Abstract:
Noise-vocoded (NV) speech is often regarded as conveying phonetic information primarily through temporal-envelope cues rather than spectral cues. However, listeners may infer the formant frequencies in the vocal-tract output, a key source of phonetic detail, from across-band differences in amplitude when speech is processed through a small number of channels. The potential utility of this spectral information was assessed for NV speech created by filtering sentences into six frequency bands, and using the amplitude envelope of each band (≤30 Hz) to modulate a matched noise-band carrier (N). Bands were paired, corresponding to F1 (≈N1 + N2), F2 (≈N3 + N4) and the higher formants (F3' ≈ N5 + N6), such that the frequency contour of each formant was implied by variations in relative amplitude between bands within the corresponding pair. Three-formant analogues (F0 = 150 Hz) of the NV stimuli were synthesized using frame-by-frame reconstruction of the frequency and amplitude of each formant. These analogues were less intelligible than the NV stimuli or analogues created using contours extracted from spectrograms of the original sentences, but more intelligible than when the frequency contours were replaced with constant (mean) values. Across-band comparisons of amplitude envelopes in NV speech can provide phonetically important information about the frequency contours of the underlying formants.
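The noise-vocoding steps described in this abstract (band-pass filtering, envelope extraction, modulation of a matched noise band, summation) can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the band edges, the single second-order (biquad) filters, the rectify-and-smooth envelope detector, and all function names are illustrative choices that the abstract does not specify. Only the six-band layout and the 30 Hz envelope cutoff follow the text.

```python
import math
import random


def biquad(x, b, a):
    """Apply a second-order IIR filter (direct form I) to the list x."""
    b0, b1, b2 = (c / a[0] for c in b)
    a1, a2 = a[1] / a[0], a[2] / a[0]
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y.append(yn)
    return y


def bandpass(x, fs, lo, hi):
    """Second-order band-pass between lo and hi Hz (unity peak gain)."""
    f0 = math.sqrt(lo * hi)          # geometric centre frequency
    q = f0 / (hi - lo)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    return biquad(x, (alpha, 0.0, -alpha),
                  (1 + alpha, -2 * math.cos(w0), 1 - alpha))


def lowpass(x, fs, fc, q=0.7071):
    """Second-order low-pass with cutoff fc Hz."""
    w0 = 2 * math.pi * fc / fs
    alpha = math.sin(w0) / (2 * q)
    c = math.cos(w0)
    return biquad(x, ((1 - c) / 2, 1 - c, (1 - c) / 2),
                  (1 + alpha, -2 * c, 1 - alpha))


def noise_vocode(signal, fs, edges, env_cutoff=30.0, seed=0):
    """Replace each band of `signal` with envelope-modulated band noise."""
    rng = random.Random(seed)
    out = [0.0] * len(signal)
    for lo, hi in zip(edges, edges[1:]):
        band = bandpass(signal, fs, lo, hi)
        # Envelope: full-wave rectification followed by smoothing.
        env = lowpass([abs(v) for v in band], fs, env_cutoff)
        noise = [rng.uniform(-1.0, 1.0) for _ in signal]
        carrier = bandpass(noise, fs, lo, hi)   # matched noise band
        for i, (e, c) in enumerate(zip(env, carrier)):
            out[i] += max(e, 0.0) * c           # clamp filter undershoot
    return out


# Illustrative input: a 500 Hz tone for 0.1 s followed by 0.1 s of silence.
fs = 16000
edges = [200.0 * (7000.0 / 200.0) ** (k / 6) for k in range(7)]  # assumed
tone = [math.sin(2 * math.pi * 500 * n / fs) if n < 1600 else 0.0
        for n in range(3200)]
vocoded = noise_vocode(tone, fs, edges)
```

A practical vocoder would normally use steeper filters than single biquads. The sketch is only meant to show where the across-band amplitude differences discussed in the abstract come from: each band's output level is set by that band's own envelope.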
Abstract:
This thesis examines the main aim of teaching pronunciation in second language acquisition in the Syrian context. In other words, it investigates the desirable end point: a native-like accent or intelligible pronunciation. This thesis also investigates the factors that affect native-like pronunciation and intelligible accent, and analyses English language teaching methods. The currently used English pronunciation course is also examined in detail. The aim is to find out the learners' aim in pronunciation, the best teaching method for achieving that aim, and the most appropriate course book for fulfilling it. In order to find out learners' aim in pronunciation, qualitative research is undertaken. The research draws on some aspects of case study and is supported by a questionnaire to gather data. The result of this research can be regarded as an attempt to bring the Syrian context into line with current trends in the teaching of English pronunciation. The results show that learners are satisfied with intelligible pronunciation. The currently used teaching method (the grammar-translation method) may be better replaced by the communicative approach, which is more appropriate. It would also be more effective to replace the currently used book with a new one that corresponds to that aim. The current theories and issues in teaching English pronunciation that support learners' intelligibility will be taken into account in the newly proposed course book.
Abstract:
An important aspect of speech perception is the ability to group or select formants using cues in the acoustic source characteristics; for example, fundamental frequency (F0) differences between formants promote their segregation. This study explored the role of more radical differences in source characteristics. Three-formant (F1+F2+F3) synthetic speech analogues were derived from natural sentences. In Experiment 1, F1+F3 were generated by passing a harmonic glottal source (F0 = 140 Hz) through second-order resonators (H1+H3); in Experiment 2, F1+F3 were tonal (sine-wave) analogues (T1+T3). F2 could take either form (H2 or T2). In some conditions, the target formants were presented alone, either monaurally or dichotically (left ear = F1+F3; right ear = F2). In others, they were accompanied by a competitor for F2 (F1+F2C+F3; F2), which listeners must reject to optimize recognition. Competitors (H2C or T2C) were created using the time-reversed frequency and amplitude contours of F2. Dichotic presentation of F2 and F2C ensured that the impact of the competitor arose primarily through informational masking. In the absence of F2C, the effect of a source mismatch between F1+F3 and F2 was relatively modest. When F2C was present, intelligibility was lowest when F2 was tonal and F2C was harmonic, irrespective of which type matched F1+F3. This finding suggests that source type and context, rather than similarity, govern the phonetic contribution of a formant. It is proposed that wideband harmonic analogues are more effective informational maskers than narrowband tonal analogues, and so become dominant in across-frequency integration of phonetic information when placed in competition.
Abstract:
The role of source properties in across-formant integration was explored using three-formant (F1+F2+F3) analogues of natural sentences (targets). In experiment 1, F1+F3 were harmonic analogues (H1+H3) generated using a monotonous buzz source and second-order resonators; in experiment 2, F1+F3 were tonal analogues (T1+T3). F2 could take either form (H2 or T2). Target formants were always presented monaurally; the receiving ear was assigned randomly on each trial. In some conditions, only the target was present; in others, a competitor for F2 (F2C) was presented contralaterally. Buzz-excited or tonal competitors were created using the time-reversed frequency and amplitude contours of F2. Listeners must reject F2C to optimize keyword recognition. Whether or not a competitor was present, there was no effect of source mismatch between F1+F3 and F2. The impact of adding F2C was modest when it was tonal but large when it was harmonic, irrespective of whether F2C matched F1+F3. This pattern was maintained when harmonic and tonal counterparts were loudness-matched (experiment 3). Source type and competition, rather than acoustic similarity, governed the phonetic contribution of a formant. Contrary to earlier research using dichotic targets, requiring across-ear integration to optimize intelligibility, H2C was an equally effective informational masker for H2 as for T2.
Abstract:
The category of the 'at-risk youth' currently underpins a good deal of youth policy, and in particular, education policy. Primarily, the category is centred around a range of programmes associated with the need for state intervention, intervention which largely occurs 'at a distance' within domains such as the school and the family. While it is argued that in some ways, the 'at-risk youth' simply replaces older characterisations used in the policing of the young, it will also be argued that the preventative policies associated with 'risk' are constituted in terms of factors rather than individuals; that prevention is no longer primarily based upon personal expertise, but rather upon the gathering and collation of statistical knowledge which identifies 'risks' within given populations; and that 'risk' permits a greater number of young people to be brought into the field of regulatory strategies. Importantly, the category of the 'at-risk youth' underpins crucial sections of policy documents such as the Finn Report (into credentialling/education and vocational competency). In this case, youth is deemed to be 'at-risk' of not making the transition to adulthood successfully. It will be argued that not only is the Finn Report significant in the administrative and cultural shaping of the category of 'youth', but also by employing the notion of 'risk', the Report puts in place yet another element of an effective network of governmental intelligibility covering the young. Finally, it will be argued that young women, as a specific example of a 'risk' group (vis-a-vis obtaining certain types of employment), require particular forms of intervention, primarily through changing the vocational aspirations of their parents.
Abstract:
The category of the 'at-risk' youth currently underpins a good deal of youth policy. Primarily, it centres around a range of programs associated with the need for state intervention. The 'at-risk' youth tenuously appears at the intersection of a variety of knowledges/problematisations, such as vocational guidance, youth welfare, family management, and so on. Whilst it is argued that in some ways, the 'at-risk' youth simply replaces older characterisations used in the policing of the young, it will also be argued that the preventative policies associated with 'risk' are constituted in terms of factors rather than individuals, that prevention is no longer primarily based upon personal expertise, but rather upon the gathering and collation of statistical knowledge which identifies 'risks' within given populations, and that 'risk' legitimates unlimited governmental intervention. Importantly, the category of the 'at-risk' youth underpins crucial sections of policy documents such as the Finn Report (into credentialling/education and vocational competency). In this case, youth is deemed to be 'at-risk' of not making the transition to adulthood successfully. It will be argued that not only is the Finn Report significant in the administrative and cultural shaping of the category of 'youth', but also by employing the notion of 'risk', the Report puts in place yet another element of an effective network of governmental intelligibility covering the young. Finally, it will be argued that young women, as a specific example of a 'risk' group (vis-a-vis obtaining certain types of employment), require particular forms of intervention, primarily through changing the vocational aspirations of their parents.
Abstract:
Automatic Speech Recognition (ASR) has matured into a technology which is becoming more common in our everyday lives, and is emerging as a necessity to minimise driver distraction when operating in-car systems such as navigation and infotainment. In "noise-free" environments, word recognition performance of these systems has been shown to approach 100%; however, this performance degrades rapidly as the level of background noise is increased. Speech enhancement is a popular method for making ASR systems more robust. Single-channel spectral subtraction was originally designed to improve human speech intelligibility and many attempts have been made to optimise this algorithm in terms of signal-based metrics such as maximised Signal-to-Noise Ratio (SNR) or minimised speech distortion. Such metrics are used to assess enhancement performance for intelligibility, not speech recognition, making them sub-optimal for ASR applications. This research investigates two methods for closely coupling subtractive-type enhancement algorithms with ASR: (a) a computationally-efficient Mel-filterbank noise subtraction technique based on likelihood-maximisation (LIMA), and (b) introducing phase spectrum information to enable spectral subtraction in the complex frequency domain. Likelihood-maximisation uses gradient-descent to optimise parameters of the enhancement algorithm to best fit the acoustic speech model given a word sequence known a priori. Whilst this technique is shown to improve the ASR word accuracy performance, it is also identified to be particularly sensitive to non-noise mismatches between the training and testing data. Phase information has long been ignored in spectral subtraction as it is deemed to have little effect on human intelligibility. In this work it is shown that phase information is important in obtaining highly accurate estimates of clean speech magnitudes which are typically used in ASR feature extraction.
Phase Estimation via Delay Projection is proposed based on the stationarity of sinusoidal signals, and demonstrates the potential to produce improvements in ASR word accuracy in a wide range of SNR. Throughout the dissertation, consideration is given to practical implementation in vehicular environments, which resulted in two novel contributions: a LIMA framework which takes advantage of the grounding procedure common to speech dialogue systems, and a resource-saving formulation of frequency-domain spectral subtraction for realisation in field-programmable gate array hardware. The techniques proposed in this dissertation were evaluated using the Australian English In-Car Speech Corpus which was collected as part of this work. This database is the first of its kind within Australia and captures real in-car speech of 50 native Australian speakers in seven driving conditions common to Australian environments.
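For orientation, the conventional single-channel magnitude spectral subtraction that this dissertation takes as its starting point can be sketched as follows. This is a minimal sketch of the textbook baseline, which reuses the noisy phase unchanged; it is not the LIMA technique or the Phase Estimation via Delay Projection method proposed in the dissertation. The function names, the over-subtraction factor `alpha`, the spectral `floor`, and the naive DFT are assumptions made for illustration.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2), fine for a short frame)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def spectral_subtract(noisy_frame, noise_mag, alpha=1.0, floor=0.01):
    """Subtract an estimated noise magnitude from each bin of one frame.

    noise_mag holds one noise magnitude estimate per DFT bin. The phase of
    the noisy signal is kept as-is, and a spectral floor prevents negative
    magnitudes after subtraction.
    """
    cleaned = []
    for y_k, n_k in zip(dft(noisy_frame), noise_mag):
        mag = max(abs(y_k) - alpha * n_k, floor * abs(y_k))
        cleaned.append(cmath.rect(mag, cmath.phase(y_k)))
    return idft(cleaned)


# Illustrative frame: a sinusoid plus a small constant offset standing in
# for noise, with a flat (assumed) noise magnitude estimate.
frame = [math.sin(2 * math.pi * 4 * t / 32) + 0.05 for t in range(32)]
enhanced = spectral_subtract(frame, noise_mag=[0.5] * 32)
```

Because only magnitudes are modified, any error in the noisy phase carries over to the output, which is the limitation the abstract points to when it argues that phase information matters for accurate clean-speech magnitude estimates.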
Abstract:
This thesis critically analyses sperm donation practices from a child-centred perspective. It examines the effects, both personal and social, of disrupting the unity of biological and social relatedness in families affected by donor conception. It examines how disruption is facilitated by a process of mediation which is detailed using a model provided by Sunderland (2002). This model identifies mediating movements - alienation, translation, re-contextualisation and absorption - which help to explain the powerful and dominating material, and social and political processes which occur in biotechnology, or in reproductive technology in this case. The understanding of such movements and mediation of meanings is inspired by the complementary work of Silverstone (1999) and Sunderland. This model allows for a more critical appreciation of the movement of meaning from previously inalienable aspects of life to alienable products through biotechnology (Sunderland, 2002). Once this mediation in donor conception is subjected to critical examination here, it is then approached from different angles of investigation. The thesis posits that two conflicting notions of the self are being applied to fertility-frustrated adults and the offspring of reproductive interventions. Adults using reproductive interventions receive support to maximise their genetic continuity, but in so doing they create and dismiss the corresponding genetic discontinuity produced for the offspring. The offspring’s kinship and identity are then framed through an experimental postmodernist notion, presenting them as social rather than innate constructs. The adults using the reproductive intervention, on the other hand, have their identity and kinship continuity framed and supported as normative, innate, and based on genetic connection. 
This use of shifting frameworks is presented as unjust and harmful, creating double standards and a corrosion of kinship values, connection and intelligibility between generations; indeed, it is put forward as adult-centric. The analysis of other forms of human kinship dislocation provided by this thesis explores an under-utilised resource which is used to counter the commonly held opinion that any disruption of social and genetic relatedness for donor offspring is insignificant. The experiences of adoption and the stolen generations are used to inform understanding of the personal and social effects of such kinship disruption and potential reunion for donor offspring. These examples, along with laws governing international human rights, further strengthen the appeal here for normative principles and protections based on collective knowledge and standards to be applied to children of reproductive technology. The thesis presents the argument that the framing and regulation of reproductive technology is excessively influenced by industry providers and users. The interests of these parties collide with and corrode any accurate assessments and protections afforded to the children of reproductive technology. The thesis seeks to counter such encroachments and concludes by presenting these protections, frameworks, and human experiences as resources which can help to address the problems created for the offspring of such reproductive interventions, thereby illustrating why these reproductive interventions should be discontinued.
Abstract:
This thesis investigates aspects of encoding the speech spectrum at low bit rates, with extensions to the effect of such coding on automatic speaker identification. Vector quantization (VQ) is a technique for jointly quantizing a block of samples at once, in order to reduce the bit rate of a coding system. The major drawback in using VQ is the complexity of the encoder. Recent research has indicated the potential applicability of the VQ method to speech when product code vector quantization (PCVQ) techniques are utilized. The focus of this research is the efficient representation, calculation and utilization of the speech model as stored in the PCVQ codebook. In this thesis, several VQ approaches are evaluated, and the efficacy of two training algorithms is compared experimentally. It is then shown that these product-code vector quantization algorithms may be augmented with lossless compression algorithms, thus yielding an improved overall compression rate. An approach using a statistical model for the vector codebook indices for subsequent lossless compression is introduced. This coupling of lossy compression and lossless compression enables further compression gain. It is demonstrated that this approach is able to reduce the bit rate requirement from the current 24 bits per 20 millisecond frame to below 20, using a standard spectral distortion metric for comparison. Several fast-search VQ methods for use in speech spectrum coding have been evaluated. The usefulness of fast-search algorithms is highly dependent upon the source characteristics and, although previous research has been undertaken for coding of images using VQ codebooks trained with the source samples directly, the product-code structured codebooks for speech spectrum quantization place new constraints on the search methodology. The second major focus of the research is an investigation of the effect of low-rate spectral compression methods on the task of automatic speaker identification.
The motivation for this aspect of the research arose from a need to simultaneously preserve the speech quality and intelligibility and to provide for machine-based automatic speaker recognition using the compressed speech. This is important because there are several emerging applications of speaker identification where compressed speech is involved. Examples include mobile communications where the speech has been highly compressed, or where a database of speech material has been assembled and stored in compressed form. Although these two application areas have the same objective - that of maximizing the identification rate - the starting points are quite different. On the one hand, the speech material used for training the identification algorithm may or may not be available in compressed form. On the other hand, the new test material on which identification is to be based may only be available in compressed form. Using the spectral parameters which have been stored in compressed form, two main classes of speaker identification algorithm are examined. Some studies have been conducted in the past on bandwidth-limited speaker identification, but the use of short-term spectral compression deserves separate investigation. Combining the major aspects of the research, some important design guidelines for the construction of an identification model when based on the use of compressed speech are put forward.
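The idea of coupling a product-code vector quantizer with lossless coding of its indices can be illustrated with a small sketch. It assumes a split-vector form of product code, a full nearest-neighbour search, and a first-order empirical entropy as a stand-in for the statistical index model; the thesis does not specify any of these details in its abstract, and the codebooks and function names below are toy, hypothetical values.

```python
import math
from collections import Counter


def nearest(codebook, vec):
    """Index of the codeword with the smallest squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2
                                 for c, v in zip(codebook[i], vec)))


def split_vq_encode(vec, codebooks, sizes):
    """Quantize each sub-vector with its own codebook (a simple product code)."""
    indices, start = [], 0
    for codebook, size in zip(codebooks, sizes):
        indices.append(nearest(codebook, vec[start:start + size]))
        start += size
    return indices


def split_vq_decode(indices, codebooks):
    """Concatenate the selected codewords to rebuild the full vector."""
    out = []
    for index, codebook in zip(indices, codebooks):
        out.extend(codebook[index])
    return out


def entropy_bits(symbols):
    """Empirical first-order entropy in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# Toy codebooks (hypothetical): a 2-entry book for the first two
# components and a 4-entry book for the third component.
codebooks = [[(0.0, 0.0), (1.0, 1.0)], [(0.0,), (0.5,), (1.0,), (2.0,)]]
sizes = [2, 1]

indices = split_vq_encode((0.9, 1.2, 0.6), codebooks, sizes)
reconstructed = split_vq_decode(indices, codebooks)
print(indices, reconstructed)
```

With fixed-rate coding, an index into the 4-entry codebook always costs 2 bits. If some indices occur far more often than others, `entropy_bits` on the observed index stream falls below 2, and an entropy coder can approach that lower rate without adding distortion, which is the kind of gain the abstract attributes to combining lossy and lossless compression.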