27 resultados para audiovisual speech perception
Resumo:
Advanced Research Projects Agency (ONR N00014-92-J-4015); National Science Foundation (IRI-90-24877); Office of Naval Research (N00014-91-J-4100)
Resumo:
A neural network model of early visual processing offers an explanation of brightness effects often associated with illusory contours. Top-down feedback from the model's analog of visual cortical complex cells to model lateral geniculate nucleus (LGN) cells are used to enhance contrast at line ends and other areas of boundary discontinuity. The result is an increase in perceived brightness outside a dark line end, akin to what Kennedy (1979) termed "brightness buttons" in his analysis of visual illusions. When several lines form a suitable configuration, as in an Ehrenstein pattern, the perceptual effect of enhanced brightness can be quite strong. Model simulations show the generation of brightness buttons. With the LGN model circuitry embedded in a larger model of preattentive vision, simulations using complex inputs show the interaction of the brightness buttons with real and illusory contours.
Resumo:
A neural network model is presented to account for the three dimensional perception of visual space by way of an analog Gestalt-like perceptual mechanism.
Resumo:
A neural model of peripheral auditory processing is described and used to separate features of coarticulated vowels and consonants. After preprocessing of speech via a filterbank, the model splits into two parallel channels, a sustained channel and a transient channel. The sustained channel is sensitive to relatively stable parts of the speech waveform, notably synchronous properties of the vocalic portion of the stimulus it extends the dynamic range of eighth nerve filters using coincidence deteectors that combine operations of raising to a power, rectification, delay, multiplication, time averaging, and preemphasis. The transient channel is sensitive to critical features at the onsets and offsets of speech segments. It is built up from fast excitatory neurons that are modulated by slow inhibitory interneurons. These units are combined over high frequency and low frequency ranges using operations of rectification, normalization, multiplicative gating, and opponent processing. Detectors sensitive to frication and to onset or offset of stop consonants and vowels are described. Model properties are characterized by mathematical analysis and computer simulations. Neural analogs of model cells in the cochlear nucleus and inferior colliculus are noted, as are psychophysical data about perception of CV syllables that may be explained by the sustained transient channel hypothesis. The proposed sustained and transient processing seems to be an auditory analog of the sustained and transient processing that is known to occur in vision.
Resumo:
This article describes a neural network model that addresses the acquisition of speaking skills by infants and subsequent motor equivalent production of speech sounds. The model learns two mappings during a babbling phase. A phonetic-to-orosensory mapping specifies a vocal tract target for each speech sound; these targets take the form of convex regions in orosensory coordinates defining the shape of the vocal tract. The babbling process wherein these convex region targets are formed explains how an infant can learn phoneme-specific and language-specific limits on acceptable variability of articulator movements. The model also learns an orosensory-to-articulatory mapping wherein cells coding desired movement directions in orosensory space learn articulator movements that achieve these orosensory movement directions. The resulting mapping provides a natural explanation for the formation of coordinative structures. This mapping also makes efficient use of redundancy in the articulator system, thereby providing the model with motor equivalent capabilities. Simulations verify the model's ability to compensate for constraints or perturbations applied to the articulators automatically and without new learning and to explain contextual variability seen in human speech production.
Resumo:
How do visual form and motion processes cooperate to compute object motion when each process separately is insufficient? Consider, for example, a deer moving behind a bush. Here the partially occluded fragments of motion signals available to an observer must be coherently grouped into the motion of a single object. A 3D FORMOTION model comprises five important functional interactions involving the brain’s form and motion systems that address such situations. Because the model’s stages are analogous to areas of the primate visual system, we refer to the stages by corresponding anatomical names. In one of these functional interactions, 3D boundary representations, in which figures are separated from their backgrounds, are formed in cortical area V2. These depth-selective V2 boundaries select motion signals at the appropriate depths in MT via V2-to-MT signals. In another, motion signals in MT disambiguate locally incomplete or ambiguous boundary signals in V2 via MT-to-V1-to-V2 feedback. The third functional property concerns resolution of the aperture problem along straight moving contours by propagating the influence of unambiguous motion signals generated at contour terminators or corners. Here, sparse “feature tracking signals” from, e.g., line ends, are amplified to overwhelm numerically superior ambiguous motion signals along line segment interiors. In the fourth, a spatially anisotropic motion grouping process takes place across perceptual space via MT-MST feedback to integrate veridical feature-tracking and ambiguous motion signals to determine a global object motion percept. The fifth property uses the MT-MST feedback loop to convey an attentional priming signal from higher brain areas back to V1 and V2. The model's use of mechanisms such as divisive normalization, endstopping, cross-orientation inhibition, and longrange cooperation is described. Simulated data include: the degree of motion coherence of rotating shapes observed through apertures, the coherent vs. element motion percepts separated in depth during the chopsticks illusion, and the rigid vs. non-rigid appearance of rotating ellipses.
Resumo:
This article describes further evidence for a new neural network theory of biological motion perception that is called a Motion Boundary Contour System. This theory clarifies why parallel streams Vl-> V2 and Vl-> MT exist for static form and motion form processing among the areas Vl, V2, and MT of visual cortex. The Motion Boundary Contour System consists of several parallel copies, such that each copy is activated by a different range of receptive field sizes. Each copy is further subdivided into two hierarchically organized subsystems: a Motion Oriented Contrast Filter, or MOC Filter, for preprocessing moving images; and a Cooperative-Competitive Feedback Loop, or CC Loop, for generating emergent boundary segmentations of the filtered signals. The present article uses the MOC Filter to explain a variety of classical and recent data about short-range and long-range apparent motion percepts that have not yet been explained by alternative models. These data include split motion; reverse-contrast gamma motion; delta motion; visual inertia; group motion in response to a reverse-contrast Ternus display at short interstimulus intervals; speed-up of motion velocity as interfiash distance increases or flash duration decreases; dependence of the transition from element motion to group motion on stimulus duration and size; various classical dependencies between flash duration, spatial separation, interstimulus interval, and motion threshold known as Korte's Laws; and dependence of motion strength on stimulus orientation and spatial frequency. These results supplement earlier explanations by the model of apparent motion data that other models have not explained; a recent proposed solution of the global aperture problem, including explanations of motion capture and induced motion; an explanation of how parallel cortical systems for static form perception and motion form perception may develop, including a demonstration that these parallel systems are variations on a common cortical design; an explanation of why the geometries of static form and motion form differ, in particular why opposite orientations differ by 90°, whereas opposite directions differ by 180°, and why a cortical stream Vl -> V2 -> MT is needed; and a summary of how the main properties of other motion perception models can be assimilated into different parts of the Motion Boundary Contour System design.
Synchronized Oscillations During Cooperative Feature Lining in a Cortical Model of Visual Perception
Resumo:
A neural network model of synchronized oscillations in visual cortex is presented to account for recent neurophysiological findings that such synchronization may reflect global properties of the stimulus. In these experiments, synchronization of oscillatory firing responses to moving bar stimuli occurred not only for nearby neurons, but also occurred between neurons separated by several cortical columns (several mm of cortex) when these neurons shared some receptive field preferences specific to the stimuli. These results were obtained for single bar stimuli and also across two disconnected, but colinear, bars moving in the same direction. Our model and computer simulations obtain these synchrony results across both single and double bar stimuli using different, but formally related, models of preattentive visual boundary segmentation and attentive visual object recognition, as well as nearest-neighbor and randomly coupled models.
Resumo:
This article describes further evidence for a new neural network theory of biological motion perception. The theory clarifies why parallel streams Vl --> V2, Vl --> MT, and Vl --> V2 --> MT exist for static form and motion form processing among the areas Vl, V2, and MT of visual cortex. The theory suggests that the static form system (Static BCS) generates emergent boundary segmentations whose outputs are insensitive to direction-ofcontrast and insensitive to direction-of-motion, whereas the motion form system (Motion BCS) generates emergent boundary segmentations whose outputs are insensitive to directionof-contrast but sensitive to direction-of-motion. The theory is used to explain classical and recent data about short-range and long-range apparent motion percepts that have not yet been explained by alternative models. These data include beta motion; split motion; gamma motion and reverse-contrast gamma motion; delta motion; visual inertia; the transition from group motion to element motion in response to a Ternus display as the interstimulus interval (ISI) decreases; group motion in response to a reverse-contrast Ternus display even at short ISIs; speed-up of motion velocity as interflash distance increases or flash duration decreases; dependence of the transition from element motion to group motion on stimulus duration and size; various classical dependencies between flash duration, spatial separation, ISI, and motion threshold known as Korte's Laws; dependence of motion strength on stimulus orientation and spatial frequency; short-range and long-range form-color interactions; and binocular interactions of flashes to different eyes.
Resumo:
A neural network model of synchronized oscillator activity in visual cortex is presented in order to account for recent neurophysiological findings that such synchronization may reflect global properties of the stimulus. In these recent experiments, it was reported that synchronization of oscillatory firing responses to moving bar stimuli occurred not only for nearby neurons, but also occurred between neurons separated by several cortical columns (several mm of cortex) when these neurons shared some receptive field preferences specific to the stimuli. These results were obtained not only for single bar stimuli but also across two disconnected, but colinear, bars moving in the same direction. Our model and computer simulations obtain these synchrony results across both single and double bar stimuli. For the double bar case, synchronous oscillations are induced in the region between the bars, but no oscillations are induced in the regions beyond the stimuli. These results were achieved with cellular units that exhibit limit cycle oscillations for a robust range of input values, but which approach an equilibrium state when undriven. Single and double bar synchronization of these oscillators was achieved by different, but formally related, models of preattentive visual boundary segmentation and attentive visual object recognition, as well as nearest-neighbor and randomly coupled models. In preattentive visual segmentation, synchronous oscillations may reflect the binding of local feature detectors into a globally coherent grouping. In object recognition, synchronous oscillations may occur during an attentive resonant state that triggers new learning. These modelling results support earlier theoretical predictions of synchronous visual cortical oscillations and demonstrate the robustness of the mechanisms capable of generating synchrony.
Resumo:
How do human observers perceive a coherent pattern of motion from a disparate set of local motion measures? Our research has examined how ambiguous motion signals along straight contours are spatially integrated to obtain a globally coherent perception of motion. Observers viewed displays containing a large number of apertures, with each aperture containing one or more contours whose orientations and velocities could be independently specified. The total pattern of the contour trajectories across the individual apertures was manipulated to produce globally coherent motions, such as rotations, expansions, or translations. For displays containing only straight contours extending to the circumferences of the apertures, observers' reports of global motion direction were biased whenever the sampling of contour orientations was asymmetric relative to the direction of motion. Performance was improved by the presence of identifiable features, such as line ends or crossings, whose trajectories could be tracked over time. The reports of our observers were consistent with a pooling process involving a vector average of measures of the component of velocity normal to contour orientation, rather than with the predictions of the intersection-of-constraints analysis in velocity space.
Resumo:
A model of pitch perception, called the Spatial Pitch Network or SPINET model, is developed and analyzed. The model neurally instantiates ideas front the spectral pitch modeling literature and joins them to basic neural network signal processing designs to simulate a broader range of perceptual pitch data than previous spectral models. The components of the model arc interpreted as peripheral mechanical and neural processing stages, which arc capable of being incorporated into a larger network architecture for separating multiple sound sources in the environment. The core of the new model transforms a spectral representation of an acoustic source into a spatial distribution of pitch strengths. The SPINET model uses a weighted "harmonic sieve" whereby the strength of activation of a given pitch depends upon a weighted sum of narrow regions around the harmonics of the nominal pitch value, and higher harmonics contribute less to a pitch than lower ones. Suitably chosen harmonic weighting functions enable computer simulations of pitch perception data involving mistuned components, shifted harmonics, and various types of continuous spectra including rippled noise. It is shown how the weighting functions produce the dominance region, how they lead to octave shifts of pitch in response to ambiguous stimuli, and how they lead to a pitch region in response to the octave-spaced Shepard tone complexes and Deutsch tritones without the use of attentional mechanisms to limit pitch choices. An on-center off-surround network in the model helps to produce noise suppression, partial masking and edge pitch. Finally, it is shown how peripheral filtering and short term energy measurements produce a model pitch estimate that is sensitive to certain component phase relationships.