961 results for automatic speech recognition
Abstract:
Semantic priming occurs when a subject recognises a target word faster when it is preceded by a related word than when it is preceded by an unrelated word. The effect is attributed to automatic or controlled processing mechanisms, elicited respectively by short or long interstimulus intervals (ISIs) between primes and targets. We employed event-related functional magnetic resonance imaging (fMRI) to investigate blood oxygen level dependent (BOLD) responses associated with automatic semantic priming, using an experimental design identical to that used in standard behavioural priming tasks. Prime-target semantic strength was manipulated by using lexically ambiguous primes (e.g., bank) and target words related to either the dominant or the subordinate meaning of the ambiguous prime. Subjects made speeded lexical decisions (word/nonword) on dominant-related, subordinate-related, and unrelated word pairs presented in random order with a short ISI. The major finding was a pattern of reduced activity in middle temporal regions for the dominant versus unrelated comparison and in inferior prefrontal regions for the subordinate versus unrelated comparison. These findings are consistent with both a dual process model of semantic priming and recent repetition priming data suggesting that reductions in BOLD responses represent neural priming associated with automatic semantic activation. The findings implicate the left middle temporal cortex and inferior prefrontal cortex in more automatic aspects of semantic processing.
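As an illustration of how a behavioural priming effect is usually quantified, the sketch below computes, for each related condition, the mean lexical-decision reaction time (RT) for unrelated pairs minus the mean RT for related pairs; a positive value means the related prime sped up recognition. The condition labels, the RT values, and the function name `priming_effects` are hypothetical and are not data or analysis code from the study.

```python
from statistics import mean

def priming_effects(rts_by_condition, baseline="unrelated"):
    """Return the priming effect (ms) for every non-baseline condition.

    Effect = mean RT(baseline) - mean RT(condition); positive values
    indicate faster target recognition after a related prime.
    """
    base = mean(rts_by_condition[baseline])
    return {
        cond: base - mean(rts)
        for cond, rts in rts_by_condition.items()
        if cond != baseline
    }

# Hypothetical correct-response RTs (ms) for one subject at a short ISI.
rts = {
    "dominant": [540, 560, 550],
    "subordinate": [580, 590, 570],
    "unrelated": [600, 610, 590],
}
print(priming_effects(rts))
```

With these toy numbers the dominant condition shows a larger effect than the subordinate one, the ordering one would expect from prime-target semantic strength.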
Abstract:
Many attempts have been made to overcome the problems involved in character recognition, and these have resulted in the manufacture of character-reading machines. An investigation into a new approach to character recognition is described, in which the features used for recognition are Fourier coefficients. These are generated optically by convolving characters with periodic gratings. The development of hardware that automatically measures the contrast and position of the periodic shadows produced by the convolution is described. Fourier coefficients of character sets were measured, and many of them are tabulated. Their analysis revealed that a few low-frequency sampling points could be selected to recognise sets of numerals. A limited treatment of the effect of typeface variations on the coefficient values is given; this culminated in the location of six sampling frequencies used as features to recognise numerals in two type fonts. Finally, the construction of two character recognition machines is compared and contrasted. The first is a pilot plant based on a test-bed optical Fourier analyser, while the second is a more streamlined machine designed for high-speed reading. Reasons indicating that the latter machine would be the more suitable one to adapt for industrial and commercial applications are discussed.
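The optical convolution with periodic gratings yields, in effect, the strength and phase of selected spatial-frequency components of each character. A digital analogue, assuming each character is available as a small binary image, is to take its 2-D discrete Fourier transform (NumPy's `numpy.fft.fft2`) and keep the magnitudes of a few low-frequency coefficients as the feature vector. The image size, the chosen frequencies, and the function name `low_freq_features` are illustrative assumptions; they are not the six sampling frequencies reported in the thesis.

```python
import numpy as np

def low_freq_features(image, freqs):
    """Return |F(u, v)| / |F(0, 0)| for each (u, v) in freqs.

    image: 2-D array (1 = ink, 0 = background).
    freqs: (row frequency, column frequency) index pairs, in cycles per image.
    Dividing by the DC term (total ink) means that scaling the image
    intensity by a constant leaves the features unchanged.
    """
    spectrum = np.fft.fft2(np.asarray(image, dtype=float))
    dc = abs(spectrum[0, 0])
    if dc == 0:
        raise ValueError("blank image: no ink to normalise by")
    return np.array([abs(spectrum[u, v]) / dc for u, v in freqs])

# Hypothetical 5x5 glyphs: a vertical bar ("1") and a hollow square ("0").
one = np.zeros((5, 5))
one[:, 2] = 1
zero = np.ones((5, 5))
zero[1:4, 1:4] = 0
freqs = [(0, 1), (1, 0), (1, 1)]
print(low_freq_features(one, freqs))
print(low_freq_features(zero, freqs))
```

The vertical bar varies only along the columns, so all of its energy at these frequencies falls on the (0, 1) component, whereas the symmetric hollow square has equal (0, 1) and (1, 0) components; even three coefficients separate the two glyphs.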
Abstract:
How speech is separated perceptually from other speech remains poorly understood. Recent research suggests that the ability of an extraneous formant to impair intelligibility depends on the modulation of its frequency contour, but not of its amplitude contour. This study further examined the effect of formant-frequency variation on intelligibility by manipulating the rate of formant-frequency change. Target sentences were synthetic three-formant (F1 + F2 + F3) analogues of natural utterances. Perceptual organization was probed by presenting stimuli dichotically (F1 + F2C + F3C; F2 + F3), where F2C + F3C constitute a competitor for F2 and F3 that listeners must reject to optimize recognition. Competitors were derived using formant-frequency contours extracted from extended passages spoken by the same talker and processed to alter the rate of formant-frequency variation, such that rate scale factors relative to the target sentences were 0, 0.25, 0.5, 1, 2, and 4 (0 = constant frequencies). Competitor amplitude contours were either constant, or time-reversed and rate-adjusted in parallel with the frequency contour. Adding a competitor typically reduced intelligibility; this reduction increased with competitor rate until the rate was at least twice that of the target sentences. The similarity of the results for the two amplitude conditions confirmed that formant amplitude contours do not influence across-formant grouping. The findings indicate that competitor efficacy is not tuned to the rate of the target sentences; most probably, it depends primarily on the overall rate of frequency variation in the competitor formants. This suggests that, when the speech of concurrent talkers is segregated, differences in speech rate may not be a significant cue for across-frequency grouping of formants.
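The abstract does not describe how the rate of formant-frequency variation was altered. One simple way to obtain rate scale factors like those listed, assuming the competitor is cut from a longer contour sampled at a fixed frame rate, is to read the source contour at `rate` times the normal speed, interpolating between frames. The function name `rate_scaled_contour`, the linear interpolation, and the use of the source's geometric mean as the constant value for rate 0 are all assumptions for illustration.

```python
import numpy as np

def rate_scaled_contour(source_hz, n_frames, rate):
    """Return n_frames of a contour whose rate of change is `rate` times
    that of source_hz (same frame rate for input and output).

    rate = 0 gives a constant contour; using the geometric mean of the
    source as that constant is an assumption, not the study's method.
    """
    source_hz = np.asarray(source_hz, dtype=float)
    if rate == 0:
        return np.full(n_frames, np.exp(np.mean(np.log(source_hz))))
    # Output frame i reads the source at fractional frame position i * rate.
    positions = np.arange(n_frames) * rate
    if positions[-1] > len(source_hz) - 1:
        raise ValueError("source contour too short for this rate")
    return np.interp(positions, np.arange(len(source_hz)), source_hz)

# Hypothetical extended F2 contour (Hz), one value per 10-ms frame.
frames = np.arange(400)
f2_source = 1500 + 300 * np.sin(2 * np.pi * frames / 40)
contours = {r: rate_scaled_contour(f2_source, 100, r)
            for r in (0, 0.25, 0.5, 1, 2, 4)}
```

Reading from a longer passage, rather than time-compressing the target sentence itself, keeps the output duration fixed while the rate changes, which is why extended passages are needed for rates above 1.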
Abstract:
This thesis describes the investigation of an adaptive method of attenuation control for digital speech signals in an analogue-digital environment, and its effects on the transmission performance of a national telecommunication network. The first part gives the design of a digital automatic gain control that can operate on a P.C.M. signal in its companded form and whose operation is based on counting the peaks of the digital speech signal that exceed certain threshold levels. A study was made of a digital automatic gain control (d.a.g.c.) in open-loop and closed-loop configurations. The former was adopted as the means of carrying out the automatic control of attenuation. It was simulated and tested, both objectively and subjectively. The final part is the assessment of the effects on telephone connections of a d.a.g.c. that introduces gains of 6 dB or 12 dB. This work used a Telephone Connection Assessment Model developed at The University of Aston in Birmingham. The subjective tests showed that the d.a.g.c. benefits listeners when the speech level is very low. The benefit is not great when speech is only a little quieter than preferred. The assessment showed that, when a standard British Telecom earphone is used, insertion of gain is desirable if the speech voltage across the earphone terminals is below an upper limit of -38 dBV. Listeners commented on the presence of an adaptive-like effect during the tests. This could be why they voted against the insertion of gain at levels only a little quieter than preferred, when they might otherwise have judged it to be desirable. A telephone connection with a d.a.g.c. inserted has a degree of difficulty less than half that of a connection without it, and its Excellent-plus-Good score is 10-30% greater.
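The thesis states the principle of the d.a.g.c. (count peaks of the companded signal above threshold levels, open-loop control, gains of 6 dB or 12 dB) but not its thresholds, block length, or decision rule. The sketch below illustrates an open-loop controller of that kind: the gain is chosen from the incoming block alone, with no feedback from the output level. It uses the standard continuous A-law characteristic (A = 87.6) as a stand-in for 8-bit companded PCM codes; the thresholds, the minimum peak count, and all function names are hypothetical.

```python
import numpy as np

A = 87.6  # A-law companding constant used in European PCM telephony

def alaw_compress(x):
    """Continuous A-law characteristic: linear [-1, 1] -> compressed [-1, 1]."""
    x = np.clip(np.asarray(x, dtype=float), -1.0, 1.0)
    ax = np.abs(x)
    y = np.where(
        ax < 1 / A,
        A * ax / (1 + np.log(A)),
        # np.maximum avoids log(0); this branch is only selected when A*ax >= 1
        (1 + np.log(np.maximum(A * ax, 1e-12))) / (1 + np.log(A)),
    )
    return np.sign(x) * y

HIGH_THRESHOLD = 0.5   # hypothetical, on the compressed scale
LOW_THRESHOLD = 0.35   # hypothetical
MIN_PEAKS = 5          # hypothetical peaks per block

def count_peaks(y, threshold):
    """Count local maxima of |y| that exceed threshold."""
    m = np.abs(y)
    mid = m[1:-1]
    is_peak = (mid > m[:-2]) & (mid >= m[2:]) & (mid > threshold)
    return int(np.count_nonzero(is_peak))

def choose_gain_db(block):
    """Open-loop decision made on the companded form of the input block."""
    y = alaw_compress(block)
    if count_peaks(y, HIGH_THRESHOLD) >= MIN_PEAKS:
        return 0    # level adequate: no gain
    if count_peaks(y, LOW_THRESHOLD) >= MIN_PEAKS:
        return 6    # somewhat quiet
    return 12       # very quiet

def apply_gain(block, gain_db):
    """Apply the gain in the linear domain (a simplification)."""
    return np.asarray(block, dtype=float) * 10 ** (gain_db / 20)

def dbv_to_volts(dbv):
    """dBV is referenced to 1 V: V = 10 ** (dBV / 20)."""
    return 10 ** (dbv / 20)

# The -38 dBV limit quoted above is about 12.6 mV across the earphone terminals.
print(f"{dbv_to_volts(-38) * 1000:.1f} mV")
```

Counting peaks on the compressed scale suits a companded signal: because A-law is roughly logarithmic, equal steps on that scale correspond to roughly equal steps in dB, so fixed thresholds behave like fixed level boundaries.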
Abstract:
In an isolated syllable, a formant will tend to be segregated perceptually if its fundamental frequency (F0) differs from that of the other formants. This study explored whether similar results are found for sentences, and specifically whether differences in F0 (ΔF0) also influence across-formant grouping in circumstances where the exclusion or inclusion of the manipulated formant critically determines speech intelligibility. Three-formant (F1 + F2 + F3) analogues of almost continuously voiced natural sentences were synthesized using a monotone glottal source (F0 = 150 Hz). Perceptual organization was probed by presenting stimuli dichotically (F1 + F2C + F3; F2), where F2C is a competitor for F2 that listeners must resist to optimize recognition. Competitors were created using time-reversed frequency and amplitude contours of F2, and F0 was manipulated (ΔF0 = ±8, ±2, or 0 semitones relative to the other formants). Adding F2C typically reduced intelligibility, and this reduction was greatest when ΔF0 = 0. There was an additional effect of absolute F0 for F2C, such that competitor efficacy was greater for higher F0s. However, competitor efficacy was not due to energetic masking of F3 by F2C. The results are consistent with the proposal that a grouping “primitive” based on common F0 influences the fusion and segregation of concurrent formants in sentence perception.
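The ΔF0 values are given in semitones relative to the 150-Hz source. Assuming equal-tempered semitones (each a frequency ratio of 2^(1/12), so 12 semitones make an octave), the competitor F0s implied by the listed ΔF0 values can be computed directly; the function name `shift_semitones` is illustrative.

```python
def shift_semitones(f0_hz, semitones):
    """Return f0_hz shifted by an equal-tempered interval (12 semitones = 1 octave)."""
    return f0_hz * 2 ** (semitones / 12)

BASE_F0 = 150.0  # Hz, the monotone source F0 stated in the abstract
for delta in (-8, -2, 0, 2, 8):
    print(f"ΔF0 = {delta:+d} semitones -> F2C F0 = {shift_semitones(BASE_F0, delta):.1f} Hz")
# Prints 94.5, 133.6, 150.0, 168.4 and 238.1 Hz respectively.
```

Because the scale is logarithmic, +8 and -8 semitones are not symmetric in hertz (about +88 Hz versus -56 Hz), which is worth keeping in mind when interpreting the reported effect of absolute F0.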
Abstract:
Speech comprises dynamic and heterogeneous acoustic elements, yet it is heard as a single perceptual stream even when accompanied by other sounds. The relative contributions of grouping “primitives” and of speech-specific grouping factors to the perceptual coherence of speech are unclear, and the acoustical correlates of the latter remain unspecified. The parametric manipulations possible with simplified speech signals, such as sine-wave analogues, make them attractive stimuli to explore these issues. Given that the factors governing perceptual organization are generally revealed only where competition operates, the second-formant competitor (F2C) paradigm was used, in which the listener must resist competition to optimize recognition [Remez et al., Psychol. Rev. 101, 129-156 (1994)]. Three-formant (F1+F2+F3) sine-wave analogues were derived from natural sentences and presented dichotically (one ear = F1+F2C+F3; opposite ear = F2). Different versions of F2C were derived from F2 using separate manipulations of its amplitude and frequency contours. F2Cs with time-varying frequency contours were highly effective competitors, regardless of their amplitude characteristics. In contrast, F2Cs with constant frequency contours were completely ineffective. Competitor efficacy was not due to energetic masking of F3 by F2C. These findings indicate that modulation of the frequency, but not the amplitude, contour is critical for across-formant grouping.
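Sine-wave analogues replace each formant with a single sinusoid that follows the formant's frequency and amplitude contours. The sketch below, assuming contours sampled at the audio rate, synthesizes such a tone by integrating instantaneous frequency into phase, builds two F2C variants of the kinds contrasted above (F2's frequency contour with a flattened amplitude, and a constant frequency with F2's amplitude contour), and assembles the dichotic stimulus. The sample rate, the use of the geometric mean for the constant frequency, the placeholder F1 and F3 tones, and all function names are hypothetical, not the procedure of Remez et al. or of this study.

```python
import numpy as np

FS = 16000  # audio sample rate in Hz (assumed)

def sine_tone(freq_hz, amp):
    """Sinusoid whose frequency and amplitude follow per-sample contours."""
    freq_hz = np.asarray(freq_hz, dtype=float)
    phase = 2 * np.pi * np.cumsum(freq_hz) / FS  # integrate instantaneous frequency
    return np.asarray(amp, dtype=float) * np.sin(phase)

def f2_competitors(f2_freq, f2_amp):
    """Two hypothetical F2C variants derived from F2's contours."""
    f2_freq = np.asarray(f2_freq, dtype=float)
    f2_amp = np.asarray(f2_amp, dtype=float)
    gm = np.exp(np.mean(np.log(f2_freq)))  # geometric mean frequency of F2
    return {
        # F2's frequency contour, amplitude flattened to its mean
        "varying_freq": sine_tone(f2_freq, np.full_like(f2_amp, f2_amp.mean())),
        # constant frequency at F2's geometric mean, F2's amplitude contour kept
        "constant_freq": sine_tone(np.full_like(f2_freq, gm), f2_amp),
    }

def dichotic(f1, f2, f3, f2c):
    """Stereo signal: one ear gets F1 + F2C + F3, the opposite ear gets F2."""
    return np.column_stack([f1 + f2c + f3, f2])

# Hypothetical 0.5-s contours; a real analogue would track measured formants.
t = np.arange(int(0.5 * FS)) / FS
f2_freq = 1200 + 400 * np.sin(2 * np.pi * 3 * t)
f2_amp = 0.5 + 0.5 * np.sin(2 * np.pi * 4 * t) ** 2
f1 = sine_tone(np.full_like(t, 500.0), 0.5)    # placeholder F1
f3 = sine_tone(np.full_like(t, 2500.0), 0.3)   # placeholder F3
f2 = sine_tone(f2_freq, f2_amp)
competitors = f2_competitors(f2_freq, f2_amp)
stereo = dichotic(f1, f2, f3, competitors["varying_freq"])
```

Integrating frequency (rather than computing `sin(2*pi*f(t)*t)`) keeps the phase continuous, so the tone glides smoothly as the contour changes.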
Abstract:
How speech is separated perceptually from other speech remains poorly understood. Recent research indicates that the ability of an extraneous formant to impair intelligibility depends on the variation of its frequency contour. This study explored the effects of manipulating the depth and pattern of that variation. Three formants (F1+F2+F3) constituting synthetic analogues of natural sentences were distributed across the two ears, together with a competitor for F2 (F2C) that listeners must reject to optimize recognition (left = F1+F2C; right = F2+F3). The frequency contours of F1 to F3 were each scaled to 50% of their natural depth, with little effect on intelligibility. Competitors were created either by inverting the frequency contour of F2 about its geometric mean (a plausibly speech-like pattern) or by using a regular and arbitrary frequency contour (triangle wave, not plausibly speech-like) matched to the average rate and depth of variation for the inverted F2C. Adding a competitor typically reduced intelligibility; this reduction depended on the depth of F2C variation, being greatest for 100%-depth, intermediate for 50%-depth, and least for 0%-depth (constant) F2Cs. This suggests that competitor impact depends on the overall depth of frequency variation, not depth relative to that for the target formants. The absence of tuning (i.e., no minimum in intelligibility for the 50% case) suggests that the ability to reject an extraneous formant does not depend on similarity in the depth of formant-frequency variation. Furthermore, triangle-wave competitors were as effective as their more speech-like counterparts, suggesting that the selection of formants from the ensemble also does not depend on speech-specific constraints. © 2014 The Author(s).
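The abstract names two contour operations without giving formulas: scaling the depth of formant-frequency variation and inverting F2's contour about its geometric mean g. A natural reading, assumed here, is that both act on log frequency about g: depth scaling by k maps f(t) to g·(f(t)/g)^k (k = 0.5 for 50% depth, k = 0 for a constant), and inversion maps f(t) to g²/f(t), which mirrors the contour on a log-frequency axis while preserving its geometric mean and depth. The function names are hypothetical.

```python
import numpy as np

def geometric_mean(f_hz):
    return float(np.exp(np.mean(np.log(f_hz))))

def scale_depth(f_hz, k):
    """Scale log-frequency excursions about the geometric mean g by k:
    f -> g * (f / g) ** k. k = 1 leaves the contour unchanged, k = 0.5 gives
    50% depth and k = 0 a constant at g. The geometric mean is preserved."""
    f_hz = np.asarray(f_hz, dtype=float)
    g = geometric_mean(f_hz)
    return g * (f_hz / g) ** k

def invert_about_gm(f_hz):
    """Mirror the contour about g on a log-frequency axis: f -> g**2 / f.
    Rises become falls; geometric mean and log-frequency depth are preserved."""
    f_hz = np.asarray(f_hz, dtype=float)
    g = geometric_mean(f_hz)
    return g ** 2 / f_hz

f2 = np.array([1000.0, 1500.0, 2000.0, 1200.0])  # hypothetical F2 contour (Hz)
f2c_full = invert_about_gm(f2)                    # 100%-depth inverted competitor
f2c_half = invert_about_gm(scale_depth(f2, 0.5))  # 50%-depth inverted competitor
```

Working in log frequency keeps the manipulation perceptually symmetric: a rise of a given musical interval above g becomes a fall of the same interval below it.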
Abstract:
How speech is separated perceptually from other speech remains poorly understood. In a series of experiments, perceptual organisation was probed by presenting three-formant (F1+F2+F3) analogues of target sentences dichotically, together with a competitor for F2 (F2C), or for F2+F3, which listeners must reject to optimise recognition. To control for energetic masking, the competitor was always presented in the opposite ear to the corresponding target formant(s). Sine-wave speech was used initially, and different versions of F2C were derived from F2 using separate manipulations of its amplitude and frequency contours. F2Cs with time-varying frequency contours were highly effective competitors, whatever their amplitude characteristics, whereas constant-frequency F2Cs were ineffective. Subsequent studies used synthetic-formant speech to explore the effects of manipulating the rate and depth of formant-frequency change in the competitor. Competitor efficacy was not tuned to the rate of formant-frequency variation in the target sentences; rather, the reduction in intelligibility increased with competitor rate relative to the rate for the target sentences. Therefore, differences in speech rate may not be a useful cue for separating the speech of concurrent talkers. Effects of competitors whose depth of formant-frequency variation was scaled by a range of factors were explored using competitors derived either by inverting the frequency contour of F2 about its geometric mean (plausibly speech-like pattern) or by using a regular and arbitrary frequency contour (triangle wave, not plausibly speech-like) matched to the average rate and depth of variation for the inverted F2C. Competitor efficacy depended on the overall depth of frequency variation, not depth relative to that for the other formants. Furthermore, the triangle-wave competitors were as effective as their more speech-like counterparts. 
Overall, the results suggest that formant-frequency variation is critical for the across-frequency grouping of formants but that this grouping does not depend on speech-specific constraints. © Springer Science+Business Media New York 2013.
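How the "average rate and depth of variation" were measured for matching the triangle-wave competitor is not stated. Assuming depth is the RMS deviation s of log frequency and rate is the mean absolute rate of change v of log frequency, a triangle wave on a log-frequency axis can be matched analytically: a triangle of amplitude a has RMS a/√3 and a constant slope magnitude 4a/T for period T, so a = √3·s and T = 4a/v. The matching criteria and the function name `matched_triangle` are assumptions for illustration.

```python
import numpy as np

def matched_triangle(ref_hz, frame_s):
    """Triangle-wave frequency contour (Hz) matched to ref_hz in mean log
    frequency, RMS log-frequency deviation (depth) and mean absolute
    log-frequency slope (rate)."""
    logf = np.log(np.asarray(ref_hz, dtype=float))
    mean_log = logf.mean()
    s = logf.std()                                # depth, natural-log units
    v = np.mean(np.abs(np.diff(logf))) / frame_s  # rate, log units per second
    if v == 0:
        return np.full(len(logf), np.exp(mean_log))  # constant reference
    a = np.sqrt(3.0) * s  # unit triangle has RMS 1/sqrt(3), so amplitude a gives RMS s
    period = 4 * a / v    # slope magnitude of the triangle is 4a/period
    t = np.arange(len(logf)) * frame_s
    phase = (t / period) % 1.0
    tri = 4 * np.abs(phase - 0.5) - 1  # unit triangle wave in [-1, 1]
    return np.exp(mean_log + a * tri)

frame_s = 0.01                                         # hypothetical 10-ms frames
t = np.arange(200) * frame_s
ref = 1500 * np.exp(0.2 * np.sin(2 * np.pi * 2 * t))  # stand-in for an inverted F2C
tri = matched_triangle(ref, frame_s)
```

The match is approximate for contours that do not span a whole number of triangle periods, and discrete sampling slightly flattens the turning points.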
Abstract:
Although speech-based input has been nominated as a key potential interaction technique for supporting today's mobile technology users, its widespread commercialisation is currently being impeded by unacceptable recognition error rates. Developing effective speech-based solutions for use in mobile contexts is challenging, given the varying extent of background noise. The research presented in this paper is part of an ongoing investigation into how best to incorporate speech-based input within mobile data collection applications. Specifically, this paper reports on a comparison of three different commercially available microphones in terms of their efficacy in facilitating mobile, speech-based data entry. We describe in detail our novel evaluation design, as well as the results we obtained.
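Recognition error rates for speech-based data entry are commonly reported as word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. The abstract does not say which metric the microphone comparison used, so this is offered only as one common way to score such trials; the function name and example phrases are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein dynamic programme."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j]: edit distance between the first i-1 reference words and first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            sub = prev[j - 1] + (ref[i - 1] != hyp[j - 1])      # match or substitution
            cur[j] = min(sub, prev[j] + 1, cur[j - 1] + 1)      # deletion, insertion
        prev = cur
    return prev[-1] / len(ref)

# Two substitutions out of five reference words -> WER 0.4.
print(word_error_rate("record tree height twelve metres",
                      "record three height twelve meters"))
```

Scoring each microphone's recognizer output against the same reference transcripts with such a metric makes the error rates directly comparable across conditions.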
Abstract:
The paper reports preliminary results of ongoing research aimed at developing an automatic procedure for recognizing the discourse-compositional structure of scientific and technical texts, which is required in many NLP applications. The procedure uses as discourse markers various domain-independent words and expressions that are specific to scientific and technical texts and that organize scientific discourse. The paper discusses features of scientific discourse and of the common scientific lexicon comprising such words and expressions. Methodological issues in developing a computer dictionary of the common scientific lexicon are considered, and the basic principles of its organization are described. The main steps of the discourse-analysis procedure, which is based on the dictionary and on surface syntactic analysis, are outlined.
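The core idea of the procedure, matching a dictionary of domain-independent scientific expressions against the text to locate discourse-compositional segments, can be sketched as a greedy longest-match lookup over tokens. The dictionary entries, the category labels (`GOAL`, `METHOD`, `RESULT`, `CONCLUSION`), and the function name are invented for illustration; the abstract does not list the paper's dictionary or categories, and the surface syntactic analysis step is omitted.

```python
import re

# Hypothetical fragment of a common-scientific-lexicon dictionary:
# multi-word marker (as a token tuple) -> discourse function it signals.
MARKERS = {
    ("the", "aim", "of", "this", "paper"): "GOAL",
    ("in", "this", "paper"): "GOAL",
    ("we", "propose"): "METHOD",
    ("the", "results", "show"): "RESULT",
    ("in", "conclusion"): "CONCLUSION",
}
MAX_LEN = max(len(k) for k in MARKERS)

def tag_sentence(sentence):
    """Return (marker, label) pairs found by greedy longest match on lower-cased tokens."""
    tokens = re.findall(r"\w+", sentence.lower())
    found, i = [], 0
    while i < len(tokens):
        # Try the longest possible marker first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in MARKERS:
                found.append((" ".join(key), MARKERS[key]))
                i += n
                break
        else:
            i += 1  # no marker starts here; move on one token
    return found

print(tag_sentence("In this paper we propose a new parser."))
```

Longest match matters because markers nest: "the aim of this paper" should be recognized as one unit rather than leaving a shorter overlapping entry to fire inside it.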