14 resultados para UNCONDITIONED STIMULUS
em Massachusetts Institute of Technology
Resumo:
Tsunoda et al. (2001) recently studied the nature of object representation in monkey inferotemporal cortex using a combination of optical imaging and extracellular recordings. In particular, they examined IT neuron responses to complex natural objects and "simplified" versions thereof. In that study, in 42% of the cases, optical imaging revealed a decrease in the number of activation patches in IT as stimuli were "simplified". However, in 58% of the cases, "simplification" of the stimuli actually led to the appearance of additional activation patches in IT. Based on these results, the authors propose a scheme in which an object is represented by combinations of active and inactive columns coding for individual features. We examine the patterns of activation caused by the same stimuli as used by Tsunoda et al. in our model of object recognition in cortex (Riesenhuber 99). We find that object-tuned units can show a pattern of appearance and disappearance of features identical to the experiment. Thus, the data of Tsunoda et al. appear to be in quantitative agreement with a simple object-based representation in which an object's identity is coded by its similarities to reference objects. Moreover, the agreement of simulations and experiment suggests that the simplification procedure used by Tsunoda (2001) is not necessarily an accurate method to determine neuronal tuning.
Resumo:
The HMAX model has recently been proposed by Riesenhuber & Poggio as a hierarchical model of position- and size-invariant object recognition in visual cortex. It has also turned out to model successfully a number of other properties of the ventral visual stream (the visual pathway thought to be crucial for object recognition in cortex), and particularly of (view-tuned) neurons in macaque inferotemporal cortex, the brain area at the top of the ventral stream. The original modeling study only used ``paperclip'' stimuli, as in the corresponding physiology experiment, and did not explore systematically how model units' invariance properties depended on model parameters. In this study, we aimed at a deeper understanding of the inner workings of HMAX and its performance for various parameter settings and ``natural'' stimulus classes. We examined HMAX responses for different stimulus sizes and positions systematically and found a dependence of model units' responses on stimulus position for which a quantitative description is offered. Interestingly, we find that scale invariance properties of hierarchical neural models are not independent of stimulus class, as opposed to translation invariance, even though both are affine transformations within the image plane.
Resumo:
We discuss a variety of object recognition experiments in which human subjects were presented with realistically rendered images of computer-generated three-dimensional objects, with tight control over stimulus shape, surface properties, illumination, and viewpoint, as well as subjects' prior exposure to the stimulus objects. In all experiments recognition performance was: (1) consistently viewpoint dependent; (2) only partially aided by binocular stereo and other depth information, (3) specific to viewpoints that were familiar; (4) systematically disrupted by rotation in depth more than by deforming the two-dimensional images of the stimuli. These results are consistent with recently advanced computational theories of recognition based on view interpolation.
Resumo:
Brightness judgments are a key part of the primate brain's visual analysis of the environment. There is general consensus that the perceived brightness of an image region is based not only on its actual luminance, but also on the photometric structure of its neighborhood. However, it is unclear precisely how a region's context influences its perceived brightness. Recent research has suggested that brightness estimation may be based on a sophisticated analysis of scene layout in terms of transparency, illumination and shadows. This work has called into question the role of low-level mechanisms, such as lateral inhibition, as explanations for brightness phenomena. Here we describe experiments with displays for which low-level and high-level analyses make qualitatively different predictions, and with which we can quantitatively assess the trade-offs between low-level and high-level factors. We find that brightness percepts in these displays are governed by low-level stimulus properties, even when these percepts are inconsistent with higher-level interpretations of scene layout. These results point to the important role of low-level mechanisms in determining brightness percepts.
Resumo:
Object recognition in the visual cortex is based on a hierarchical architecture, in which specialized brain regions along the ventral pathway extract object features of increasing levels of complexity, accompanied by greater invariance in stimulus size, position, and orientation. Recent theoretical studies postulate a non-linear pooling function, such as the maximum (MAX) operation could be fundamental in achieving such invariance. In this paper, we are concerned with neurally plausible mechanisms that may be involved in realizing the MAX operation. Four canonical circuits are proposed, each based on neural mechanisms that have been previously discussed in the context of cortical processing. Through simulations and mathematical analysis, we examine the relative performance and robustness of these mechanisms. We derive experimentally verifiable predictions for each circuit and discuss their respective physiological considerations.
Resumo:
Human object recognition is generally considered to tolerate changes of the stimulus position in the visual field. A number of recent studies, however, have cast doubt on the completeness of translation invariance. In a new series of experiments we tried to investigate whether positional specificity of short-term memory is a general property of visual perception. We tested same/different discrimination of computer graphics models that were displayed at the same or at different locations of the visual field, and found complete translation invariance, regardless of the similarity of the animals and irrespective of direction and size of the displacement (Exp. 1 and 2). Decisions were strongly biased towards same decisions if stimuli appeared at a constant location, while after translation subjects displayed a tendency towards different decisions. Even if the spatial order of animal limbs was randomized ("scrambled animals"), no deteriorating effect of shifts in the field of view could be detected (Exp. 3). However, if the influence of single features was reduced (Exp. 4 and 5) small but significant effects of translation could be obtained. Under conditions that do not reveal an influence of translation, rotation in depth strongly interferes with recognition (Exp. 6). Changes of stimulus size did not reduce performance (Exp. 7). Tolerance to these object transformations seems to rely on different brain mechanisms, with translation and scale invariance being achieved in principle, while rotation invariance is not.
Resumo:
In macaque inferotemporal cortex (IT), neurons have been found to respond selectively to complex shapes while showing broad tuning ("invariance") with respect to stimulus transformations such as translation and scale changes and a limited tuning to rotation in depth. Training monkeys with novel, paperclip-like objects, Logothetis et al. could investigate whether these invariance properties are due to experience with exhaustively many transformed instances of an object or if there are mechanisms that allow the cells to show response invariance also to previously unseen instances of that object. They found object-selective cells in anterior IT which exhibited limited invariance to various transformations after training with single object views. While previous models accounted for the tuning of the cells for rotations in depth and for their selectivity to a specific object relative to a population of distractor objects, the model described here attempts to explain in a biologically plausible way the additional properties of translation and size invariance. Using the same stimuli as in the experiment, we find that model IT neurons exhibit invariance properties which closely parallel those of real neurons. Simulations show that the model is capable of unsupervised learning of view-tuned neurons. The model also allows to make experimentally testable predictions regarding novel stimulus transformations and combinations of stimuli.
Resumo:
Baylis & Driver (Nature Neuroscience, 2001) have recently presented data on the response of neurons in macaque inferotemporal cortex (IT) to various stimulus transformations. They report that neurons can generalize over contrast and mirror reversal, but not over figure-ground reversal. This finding is taken to demonstrate that ``the selectivity of IT neurons is not determined simply by the distinctive contours in a display, contrary to simple edge-based models of shape recognition'', citing our recently presented model of object recognition in cortex (Riesenhuber & Poggio, Nature Neuroscience, 1999). In this memo, I show that the main effects of the experiment can be obtained by performing the appropriate simulations in our simple feedforward model. This suggests for IT cell tuning that the possible contributions of explicit edge assignment processes postulated in (Baylis & Driver, 2001) might be smaller than expected.
Resumo:
In a recent experiment, Freedman et al. recorded from inferotemporal (IT) and prefrontal cortices (PFC) of monkeys performing a "cat/dog" categorization task (Freedman 2001 and Freedman, Riesenhuber, Poggio, Miller 2001). In this paper we analyze the tuning properties of view-tuned units in our HMAX model of object recognition in cortex (Riesenhuber 1999) using the same paradigm and stimuli as in the experiment. We then compare the simulation results to the monkey inferotemporal neuron population data. We find that view-tuned model IT units that were trained without any explicit category information can show category-related tuning as observed in the experiment. This suggests that the tuning properties of experimental IT neurons might primarily be shaped by bottom-up stimulus-space statistics, with little influence of top-down task-specific information. The population of experimental PFC neurons, on the other hand, shows tuning properties that cannot be explained just by stimulus tuning. These analyses are compatible with a model of object recognition in cortex (Riesenhuber 2000) in which a population of shape-tuned neurons provides a general basis for neurons tuned to different recognition tasks.
Resumo:
Different theoretical models have tried to investigate the feasibility of recurrent neural mechanisms for achieving direction selectivity in the visual cortex. The mathematical analysis of such models has been restricted so far to the case of purely linear networks. We present an exact analytical solution of the nonlinear dynamics of a class of direction selective recurrent neural models with threshold nonlinearity. Our mathematical analysis shows that such networks have form-stable stimulus-locked traveling pulse solutions that are appropriate for modeling the responses of direction selective cortical neurons. Our analysis shows also that the stability of such solutions can break down giving raise to a different class of solutions ("lurching activity waves") that are characterized by a specific spatio-temporal periodicity. These solutions cannot arise in models for direction selectivity with purely linear spatio-temporal filtering.
Resumo:
abstract With many visual speech animation techniques now available, there is a clear need for systematic perceptual evaluation schemes. We describe here our scheme and its application to a new video-realistic (potentially indistinguishable from real recorded video) visual-speech animation system, called Mary 101. Two types of experiments were performed: a) distinguishing visually between real and synthetic image- sequences of the same utterances, ("Turing tests") and b) gauging visual speech recognition by comparing lip-reading performance of the real and synthetic image-sequences of the same utterances ("Intelligibility tests"). Subjects that were presented randomly with either real or synthetic image-sequences could not tell the synthetic from the real sequences above chance level. The same subjects when asked to lip-read the utterances from the same image-sequences recognized speech from real image-sequences significantly better than from synthetic ones. However, performance for both, real and synthetic, were at levels suggested in the literature on lip-reading. We conclude from the two experiments that the animation of Mary 101 is adequate for providing a percept of a talking head. However, additional effort is required to improve the animation for lip-reading purposes like rehabilitation and language learning. In addition, these two tasks could be considered as explicit and implicit perceptual discrimination tasks. In the explicit task (a), each stimulus is classified directly as a synthetic or real image-sequence by detecting a possible difference between the synthetic and the real image-sequences. The implicit perceptual discrimination task (b) consists of a comparison between visual recognition of speech of real and synthetic image-sequences. Our results suggest that implicit perceptual discrimination is a more sensitive method for discrimination between synthetic and real image-sequences than explicit perceptual discrimination.
Resumo:
The question of how shape is represented is of central interest to understanding visual processing in cortex. While tuning properties of the cells in early part of the ventral visual stream, thought to be responsible for object recognition in the primate, are comparatively well understood, several different theories have been proposed regarding tuning in higher visual areas, such as V4. We used the model of object recognition in cortex presented by Riesenhuber and Poggio (1999), where more complex shape tuning in higher layers is the result of combining afferent inputs tuned to simpler features, and compared the tuning properties of model units in intermediate layers to those of V4 neurons from the literature. In particular, we investigated the issue of shape representation in visual area V1 and V4 using oriented bars and various types of gratings (polar, hyperbolic, and Cartesian), as used in several physiology experiments. Our computational model was able to reproduce several physiological findings, such as the broadening distribution of the orientation bandwidths and the emergence of a bias toward non-Cartesian stimuli. Interestingly, the simulation results suggest that some V4 neurons receive input from afferents with spatially separated receptive fields, leading to experimentally testable predictions. However, the simulations also show that the stimulus set of Cartesian and non-Cartesian gratings is not sufficiently complex to probe shape tuning in higher areas, necessitating the use of more complex stimulus sets.
Resumo:
Numerous psychophysical experiments have shown an important role for attentional modulations in vision. Behaviorally, allocation of attention can improve performance in object detection and recognition tasks. At the neural level, attention increases firing rates of neurons in visual cortex whose preferred stimulus is currently attended to. However, it is not yet known how these two phenomena are linked, i.e., how the visual system could be "tuned" in a task-dependent fashion to improve task performance. To answer this question, we performed simulations with the HMAX model of object recognition in cortex [45]. We modulated firing rates of model neurons in accordance with experimental results about effects of feature-based attention on single neurons and measured changes in the model's performance in a variety of object recognition tasks. It turned out that recognition performance could only be improved under very limited circumstances and that attentional influences on the process of object recognition per se tend to display a lack of specificity or raise false alarm rates. These observations lead us to postulate a new role for the observed attention-related neural response modulations.
Resumo:
The Japanese economy entered a long recession in spring 1997. Its economic growth has been much lower than in the US and the EU despite large fiscal stimulus packages, a monetary policy which has brought interest rates to zero since 1999, injections of public money to recapitalize banks, and programs of liberalization and deregulation. How could all these policies have failed to bring the Japanese economy back on a sustainable growth path? This paper argues that the failure of Japan's efforts to restore a sound economic environment is the result of having deliberately chosen inappropriate and inadequate monetary and fiscal instruments to tackle the macroeconomic and structural problems that have burdened the Japanese economy since the burst of the financial bubble at the beginning of the 90s. These choices were deliberate, since the "right" policies (in primis the resolution of the banking crisis) presented unbearable political costs, not only for the ruling parties, but also for the bureaucratic and business elites. The misfortunes of the Japanese economy during the long recession not only allow us to draw important economic policy lessons, but also stimulate reflections on the disruptive role on economic policies caused by powerful vested interests when an economy needs broad and deep structural changes. The final part of the paper focuses on ways to tackle Japan's banking crisis. In particular, it explores the Scandinavian solution, which, mutatis mutandis, might serve Japanese policy-makers well.