904 results for audio-visual automatic speech recognition
Abstract:
Many attempts have been made to overcome the problems involved in character recognition, and these have resulted in the manufacture of character reading machines. An investigation into a new approach to character recognition is described. The features used for recognition are Fourier coefficients, which are generated optically by convolving characters with periodic gratings. The development of hardware to enable automatic measurement of the contrast and position of the periodic shadows produced by the convolution is described. Fourier coefficients of character sets were measured, many of which are tabulated. Their analysis revealed that a few low-frequency sampling points could be selected to recognise sets of numerals. A limited treatment shows the effect of typeface variations on the values of the coefficients, culminating in the location of six sampling frequencies used as features to recognise numerals in two type fonts. Finally, the construction of two character recognition machines is compared and contrasted. The first is a pilot plant based on a test-bed optical Fourier analyser, while the second is a more streamlined machine designed for high-speed reading. Reasons why the latter machine would be the more suitable to adapt for industrial and commercial applications are discussed.
Abstract:
Parkinson's disease (PD) is a common disorder of middle-aged and elderly people, in which there is degeneration of the extra-pyramidal motor system. In some patients, the disease is associated with a range of visual signs and symptoms, including defects in visual acuity, colour vision, the blink reflex, pupil reactivity, saccadic and smooth pursuit movements and visual evoked potentials. In addition, there may be psychophysical changes, disturbances of complex visual functions such as visuospatial orientation and facial recognition, and chronic visual hallucinations. Some of the treatments associated with PD may have adverse ocular reactions. If visual problems are present, they can have an important effect on overall motor function, and quality of life of patients can be improved by accurate diagnosis and correction of such defects. Moreover, visual testing is useful in separating PD from other movement disorders with visual symptoms, such as dementia with Lewy bodies (DLB), multiple system atrophy (MSA) and progressive supranuclear palsy (PSP). Although not central to PD, visual signs and symptoms can be an important though obscure aspect of the disease and should not be overlooked.
Abstract:
This paper presents a case study of the use of a visual interactive modelling (VIM) system to investigate issues involved in the management of a hospital ward. VIM systems are seen to offer the learner the opportunity to explore operational management issues from a varied perspective and to provide an interactive system in which the learner receives feedback on the consequences of their actions. However, maximising the potential learning experience for students requires recognising that they need a task structure which helps them to understand the concepts involved. These factors can be incorporated into the visual interactive model by providing an interface customised to guide the student through the experimentation. Recent developments of VIM systems in terms of their connectivity with the programming language Visual Basic facilitate this customisation.
Abstract:
How speech is separated perceptually from other speech remains poorly understood. Recent research suggests that the ability of an extraneous formant to impair intelligibility depends on the modulation of its frequency, but not its amplitude, contour. This study further examined the effect of formant-frequency variation on intelligibility by manipulating the rate of formant-frequency change. Target sentences were synthetic three-formant (F1 + F2 + F3) analogues of natural utterances. Perceptual organization was probed by presenting stimuli dichotically (F1 + F2C + F3C; F2 + F3), where F2C + F3C constitute a competitor for F2 and F3 that listeners must reject to optimize recognition. Competitors were derived using formant-frequency contours extracted from extended passages spoken by the same talker and processed to alter the rate of formant-frequency variation, such that rate scale factors relative to the target sentences were 0, 0.25, 0.5, 1, 2, and 4 (0 = constant frequencies). Competitor amplitude contours were either constant, or time-reversed and rate-adjusted in parallel with the frequency contour. Adding a competitor typically reduced intelligibility; this reduction increased with competitor rate until the rate was at least twice that of the target sentences. Similarity in the results for the two amplitude conditions confirmed that formant amplitude contours do not influence across-formant grouping. The findings indicate that competitor efficacy is not tuned to the rate of the target sentences; most probably, it depends primarily on the overall rate of frequency variation in the competitor formants. This suggests that, when segregating the speech of concurrent talkers, differences in speech rate may not be a significant cue for across-frequency grouping of formants.
Abstract:
According to some models of visual selective attention, objects in a scene activate corresponding neural representations, which compete for perceptual awareness and motor behavior. During a visual search for a target object, top-down control exerted by working memory representations of the target's defining properties resolves competition in favor of the target. These models, however, ignore the existence of associative links among object representations. Here we show that such associations can strongly influence deployment of attention in humans. In the context of visual search, objects associated with the target were both recalled more often and recognized more accurately than unrelated distractors. Notably, both target and associated objects competitively weakened recognition of unrelated distractors and slowed responses to a luminance probe. Moreover, in a speeded search protocol, associated objects rendered search both slower and less accurate. Finally, the first saccades after onset of the stimulus array were more often directed toward associated than control items.
Abstract:
This thesis describes the investigation of an adaptive method of attenuation control for digital speech signals in an analogue-digital environment and its effects on the transmission performance of a national telecommunication network. The first part gives the design of a digital automatic gain control (d.a.g.c.), able to operate upon a P.C.M. signal in its companded form, whose operation is based upon counting the peaks of the digital speech signal above certain threshold levels. The d.a.g.c. was studied in both open-loop and closed-loop configurations. The former was adopted as the means for carrying out the automatic control of attenuation. It was simulated and tested, both objectively and subjectively. The final part is the assessment of the effects on telephone connections of a d.a.g.c. that introduces gains of 6 dB or 12 dB. This work used a Telephone Connection Assessment Model developed at The University of Aston in Birmingham. The subjective tests showed that the d.a.g.c. gives an advantage to listeners when the speech level is very low. The benefit is not great when speech is only a little quieter than preferred. The assessment showed that, when a standard British Telecom earphone is used, insertion of gain is desirable if the speech voltage across the earphone terminals is below an upper limit of -38 dBV. People commented upon the presence of an adaptive-like effect during the tests. This could be the reason why they voted against the insertion of gain at levels only a little quieter than preferred, when they might otherwise have judged it to be desirable. A telephone connection with a d.a.g.c. inserted has a degree of difficulty less than half of that without it. The combined Excellent plus Good score is 10-30% greater.
Abstract:
Spatial generalization skills in school children aged 8-16 were studied with regard to unfamiliar objects that had been previously learned in a cross-modal priming and learning paradigm. We observed a developmental dissociation with younger children recognizing objects only from previously learnt perspectives whereas older children generalized acquired object knowledge to new viewpoints as well. Haptic and - to a lesser extent - visual priming improved spatial generalization in all but the youngest children. The data supports the idea of dissociable, view-dependent and view-invariant object representations with different developmental trajectories that are subject to modulatory effects of priming. Late-developing areas in the parietal or the prefrontal cortex may account for the retarded onset of view-invariant object recognition. © 2006 Elsevier B.V. All rights reserved.
Abstract:
In an isolated syllable, a formant will tend to be segregated perceptually if its fundamental frequency (F0) differs from that of the other formants. This study explored whether similar results are found for sentences, and specifically whether differences in F0 (ΔF0) also influence across-formant grouping in circumstances where the exclusion or inclusion of the manipulated formant critically determines speech intelligibility. Three-formant (F1 + F2 + F3) analogues of almost continuously voiced natural sentences were synthesized using a monotonous glottal source (F0 = 150 Hz). Perceptual organization was probed by presenting stimuli dichotically (F1 + F2C + F3; F2), where F2C is a competitor for F2 that listeners must resist to optimize recognition. Competitors were created using time-reversed frequency and amplitude contours of F2, and F0 was manipulated (ΔF0 = ±8, ±2, or 0 semitones relative to the other formants). Adding F2C typically reduced intelligibility, and this reduction was greatest when ΔF0 = 0. There was an additional effect of absolute F0 for F2C, such that competitor efficacy was greater for higher F0s. However, competitor efficacy was not due to energetic masking of F3 by F2C. The results are consistent with the proposal that a grouping “primitive” based on common F0 influences the fusion and segregation of concurrent formants in sentence perception.
Abstract:
Background - It is well established that the left inferior frontal gyrus plays a key role in the cerebral cortical network that supports reading and visual word recognition. Less clear is when in time this contribution begins. We used magnetoencephalography (MEG), which has both good spatial and excellent temporal resolution, to address this question. Methodology/Principal Findings - MEG data were recorded during a passive viewing paradigm, chosen to emphasize the stimulus-driven component of the cortical response, in which right-handed participants were presented words, consonant strings, and unfamiliar faces to central vision. Time-frequency analyses showed a left-lateralized inferior frontal gyrus (pars opercularis) response to words between 100–250 ms in the beta frequency band that was significantly stronger than the response to consonant strings or faces. The left inferior frontal gyrus response to words peaked at ~130 ms. This response was significantly later in time than the left middle occipital gyrus, which peaked at ~115 ms, but not significantly different from the peak response in the left mid fusiform gyrus, which peaked at ~140 ms, at a location coincident with the fMRI–defined visual word form area (VWFA). Significant responses were also detected to words in other parts of the reading network, including the anterior middle temporal gyrus, the left posterior middle temporal gyrus, the angular and supramarginal gyri, and the left superior temporal gyrus. Conclusions/Significance - These findings suggest very early interactions between the vision and language domains during visual word recognition, with speech motor areas being activated at the same time as the orthographic word-form is being resolved within the fusiform gyrus. This challenges the conventional view of a temporally serial processing sequence for visual word recognition in which letter forms are initially decoded, interact with their phonological and semantic representations, and only then gain access to a speech code.
Abstract:
Speech comprises dynamic and heterogeneous acoustic elements, yet it is heard as a single perceptual stream even when accompanied by other sounds. The relative contributions of grouping “primitives” and of speech-specific grouping factors to the perceptual coherence of speech are unclear, and the acoustical correlates of the latter remain unspecified. The parametric manipulations possible with simplified speech signals, such as sine-wave analogues, make them attractive stimuli to explore these issues. Given that the factors governing perceptual organization are generally revealed only where competition operates, the second-formant competitor (F2C) paradigm was used, in which the listener must resist competition to optimize recognition [Remez et al., Psychol. Rev. 101, 129-156 (1994)]. Three-formant (F1+F2+F3) sine-wave analogues were derived from natural sentences and presented dichotically (one ear = F1+F2C+F3; opposite ear = F2). Different versions of F2C were derived from F2 using separate manipulations of its amplitude and frequency contours. F2Cs with time-varying frequency contours were highly effective competitors, regardless of their amplitude characteristics. In contrast, F2Cs with constant frequency contours were completely ineffective. Competitor efficacy was not due to energetic masking of F3 by F2C. These findings indicate that modulation of the frequency, but not the amplitude, contour is critical for across-formant grouping.
Abstract:
Hemispheric differences in the learning and generalization of pattern categories were explored in two experiments involving sixteen patients with unilateral posterior cerebral lesions in the left (LH) or right (RH) hemisphere. In each experiment participants were first trained to criterion in a supervised learning paradigm to categorize a set of patterns that consisted of either simple geometric forms (Experiment 1) or unfamiliar grey-level images (Experiment 2). They were then tested for their ability to generalize acquired categorical knowledge to contrast-reversed versions of the learning patterns. The results showed that RH lesions impeded category learning of unfamiliar grey-level images more severely than LH lesions, whereas this relationship appeared reversed for categories defined by simple geometric forms. With regard to generalization to contrast reversal, categorization performance of LH and RH patients was unaffected in the case of simple geometric forms. However, generalization to contrast-reversed grey-level images distinctly deteriorated for patients with LH lesions relative to those with RH lesions, with the latter (but not the former) being consistently unable to identify the pattern manipulation. These findings suggest a differential use of contrast information in the representation of pattern categories in the two hemispheres. Such specialization appears in line with previous distinctions between a predominantly left-hemispheric, abstract-analytical and a right-hemispheric, specific-holistic representation of object categories, and their prediction of a mandatory representation of contrast polarity in the RH. Some implications for the well-established dissociation of visual disorders for the recognition of faces and letters are discussed.