997 resultados para Aisberg-2004-10
Resumo:
If we are provided a face database with only one example view per person, is it possible to recognize new views of them under a variety of different poses, especially views rotated in depth from the original example view? We investigate using prior knowledge about faces plus each single example view to generate virtual views of each person, or views of the face as seen from different poses. Prior knowledge of faces is represented in an example-based way, using 2D views of a prototype face seen rotating in depth. The synthesized virtual views are evaluated as example views in a view-based approach to pose-invariant face recognition. They are shown to improve the recognition rate over the scenario where only the single real view is used.
Resumo:
The correspondence problem in computer vision is basically a matching task between two or more sets of features. In this paper, we introduce a vectorized image representation, which is a feature-based representation where correspondence has been established with respect to a reference image. This representation has two components: (1) shape, or (x, y) feature locations, and (2) texture, defined as the image grey levels mapped onto the standard reference image. This paper explores an automatic technique for "vectorizing" face images. Our face vectorizer alternates back and forth between computation steps for shape and texture, and a key idea is to structure the two computations so that each one uses the output of the other. A hierarchical coarse-to-fine implementation is discussed, and applications are presented to the problems of facial feature detection and registration of two arbitrary faces.
Resumo:
This paper addresses the problem of nonlinear multivariate root finding. In an earlier paper we described a system called Newton which finds roots of systems of nonlinear equations using refinements of interval methods. The refinements are inspired by AI constraint propagation techniques. Newton is competative with continuation methods on most benchmarks and can handle a variety of cases that are infeasible for continuation methods. This paper presents three "cuts" which we believe capture the essential theoretical ideas behind the success of Newton. This paper describes the cuts in a concise and abstract manner which, we believe, makes the theoretical content of our work more apparent. Any implementation will need to adopt some heuristic control mechanism. Heuristic control of the cuts is only briefly discussed here.
Resumo:
MIT SchMUSE (pronounced "shmooz") is a concurrent, distributed, delegation-based object-oriented interactive environment with persistent storage. It is designed to run in a "capricious" network environment, where servers can migrate from site to site and can regularly become unavailable. Our design introduces a new form of unique identifiers called "globally unique tickets" that provide globally unique time/space stamps for objects and classes without being location specific. Object location is achieved by a distributed hierarchical lazy lookup mechanism that we call "realm resolution." We also introduce a novel mechanism called "message deferral" for enhanced reliability in the face of remote delegation. We conclude with a comparison to related work and a projection of future work on MIT SchMUSE.
Resumo:
Template matching by means of cross-correlation is common practice in pattern recognition. However, its sensitivity to deformations of the pattern and the broad and unsharp peaks it produces are significant drawbacks. This paper reviews some results on how these shortcomings can be removed. Several techniques (Matched Spatial Filters, Synthetic Discriminant Functions, Principal Components Projections and Reconstruction Residuals) are reviewed and compared on a common task: locating eyes in a database of faces. New variants are also proposed and compared: least squares Discriminant Functions and the combined use of projections on eigenfunctions and the corresponding reconstruction residuals. Finally, approximation networks are introduced in an attempt to improve filter design by the introduction of nonlinearity.
Resumo:
The Saliency Network proposed by Shashua and Ullman is a well-known approach to the problem of extracting salient curves from images while performing gap completion. This paper analyzes the Saliency Network. The Saliency Network is attractive for several reasons. First, the network generally prefers long and smooth curves over short or wiggly ones. While computing saliencies, the network also fills in gaps with smooth completions and tolerates noise. Finally, the network is locally connected, and its size is proportional to the size of the image. Nevertheless, our analysis reveals certain weaknesses with the method. In particular, we show cases in which the most salient element does not lie on the perceptually most salient curve. Furthermore, in some cases the saliency measure changes its preferences when curves are scaled uniformly. Also, we show that for certain fragmented curves the measure prefers large gaps over a few small gaps of the same total size. In addition, we analyze the time complexity required by the method. We show that the number of steps required for convergence in serial implementations is quadratic in the size of the network, and in parallel implementations is linear in the size of the network. We discuss problems due to coarse sampling of the range of possible orientations. We show that with proper sampling the complexity of the network becomes cubic in the size of the network. Finally, we consider the possibility of using the Saliency Network for grouping. We show that the Saliency Network recovers the most salient curve efficiently, but it has problems with identifying any salient curve other than the most salient one.
Resumo:
The MOS transistor physical model as described in [3] is presented here as a network model. The goal is to obtain an accurate model, suitable for simulation, free from certain problems reported in the literature [13], and conceptually as simple as possible. To achieve this goal the original model had to be extended and modified. The paper presents the derivation of the network model from physical equations, including the corrections which are required for simulation and which compensate for simplifications introduced in the original physical model. Our intrinsic MOS model consists of three nonlinear voltage-controlled capacitors and a dependent current source. The charges of the capacitors and the current of the current source are functions of the voltages $V_{gs}$, $V_{bs}$, and $V_{ds}$. The complete model consists of the intrinsic model plus the parasitics. The apparent simplicity of the model is a result of hiding information in the characteristics of the nonlinear components. The resulted network model has been checked by simulation and analysis. It is shown that the network model is suitable for simulation: It is defined for any value of the voltages; the functions involved are continuous and satisfy Lipschitz conditions with no jumps at region boundaries; Derivatives have been computed symbolically and are available for use by the Newton-Raphson method. The model"s functions can be measured from the terminals. It is also shown that small channel effects can be included in the model. Higher frequency effects can be modeled by using a network consisting of several sections of the basic lumped model. Future plans include a detailed comparison of the network model with models such as SPICE level 3 and a comparison of the multi- section higher frequency model with experiments.
Resumo:
I describe an exploration criterion that attempts to minimize the error of a learner by minimizing its estimated squared bias. I describe experiments with locally-weighted regression on two simple kinematics problems, and observe that this "bias-only" approach outperforms the more common "variance-only" exploration approach, even in the presence of noise.
Resumo:
Binocular rivalry refers to the alternating perceptions experienced when two dissimilar patterns are stereoscopically viewed. To study the neural mechanism that underlies such competitive interactions, single cells were recorded in the visual areas V1, V2, and V4, while monkeys reported the perceived orientation of rivaling sinusoidal grating patterns. A number of neurons in all areas showed alternating periods of excitation and inhibition that correlated with the perceptual dominance and suppression of the cell"s preferred orientation. The remaining population of cells were not influenced by whether or not the optimal stimulus orientation was perceptually suppressed. Response modulation during rivalry was not correlated with cell attributes such as monocularity, binocularity, or disparity tuning. These results suggest that the awareness of a visual pattern during binocular rivalry arises through interactions between neurons at different levels of visual pathways, and that the site of suppression is unlikely to correspond to a particular visual area, as often hypothesized on the basis of psychophysical observations. The cell-types of modulating neurons and their overwhelming preponderance in higher rather than in early visual areas also suggests -- together with earlier psychophysical evidence -- the possibility of a common mechanism underlying rivalry as well as other bistable percepts, such as those experienced with ambiguous figures.
Resumo:
When stimuli presented to the two eyes differ considerably, stable binocular fusion fails, and the subjective percept alternates between the two monocular images, a phenomenon known as binocular rivalry. The influence of attention over this perceptual switching has long been studied, and although there is evidence that attention can affect the alternation rate, its role in the overall dynamics of the rivalry process remains unclear. The present study investigated the relationship between the attention paid to the rivalry stimulus, and the dynamics of the perceptual alternations. Specifically, the temporal course of binocular rivalry was studied as the subjects performed difficult nonvisual and visual concurrent tasks, directing their attention away from the rivalry stimulus. Periods of complete perceptual dominance were compared for the attended condition, where the subjects reported perceptual changes, and the unattended condition, where one of the simultaneous tasks was performed. During both the attended and unattended conditions, phases of rivalry dominance were obtained by analyzing the subject"s optokinetic nystagmus recorded by an electrooculogram, where the polarity of the nystagmus served as an objective indicator of the perceived direction of motion. In all cases, the presence of a difficult concurrent task had little or no effect on the statistics of the alternations, as judged by two classic tests of rivalry, although the overall alternation rate showed a small but significant increase with the concurrent task. It is concluded that the statistical patterns of rivalry alternations are not governed by attentional shifts or decision-making on the part of the subject.
Resumo:
We provide a theory of the three-dimensional interpretation of a class of line-drawings called p-images, which are interpreted by the human vision system as parallelepipeds ("boxes"). Despite their simplicity, p-images raise a number of interesting vision questions: *Why are p-images seen as three-dimensional objects? Why not just as flatimages? *What are the dimensions and pose of the perceived objects? *Why are some p-images interpreted as rectangular boxes, while others are seen as skewed, even though there is no obvious distinction between the images? *When p-images are rotated in three dimensions, why are the image-sequences perceived as distorting objects---even though structure-from-motion would predict that rigid objects would be seen? *Why are some three-dimensional parallelepipeds seen as radically different when viewed from different viewpoints? We show that these and related questions can be answered with the help of a single mathematical result and an associated perceptual principle. An interesting special case arises when there are right angles in the p-image. This case represents a singularity in the equations and is mystifying from the vision point of view. It would seem that (at least in this case) the vision system does not follow the ordinary rules of geometry but operates in accordance with other (and as yet unknown) principles.
Resumo:
Compliant control is a standard method for performing fine manipulation tasks, like grasping and assembly, but it requires estimation of the state of contact between the robot arm and the objects involved. Here we present a method to learn a model of the movement from measured data. The method requires little or no prior knowledge and the resulting model explicitly estimates the state of contact. The current state of contact is viewed as the hidden state variable of a discrete HMM. The control dependent transition probabilities between states are modeled as parametrized functions of the measurement We show that their parameters can be estimated from measurements concurrently with the estimation of the parameters of the movement in each state of contact. The learning algorithm is a variant of the EM procedure. The E step is computed exactly; solving the M step exactly would require solving a set of coupled nonlinear algebraic equations in the parameters. Instead, gradient ascent is used to produce an increase in likelihood.
Resumo:
We develop a mean field theory for sigmoid belief networks based on ideas from statistical mechanics. Our mean field theory provides a tractable approximation to the true probability distribution in these networks; it also yields a lower bound on the likelihood of evidence. We demonstrate the utility of this framework on a benchmark problem in statistical pattern recognition -- the classification of handwritten digits.
Resumo:
We present techniques for computing upper and lower bounds on the likelihoods of partial instantiations of variables in sigmoid and noisy-OR networks. The bounds determine confidence intervals for the desired likelihoods and become useful when the size of the network (or clique size) precludes exact computations. We illustrate the tightness of the obtained bounds by numerical experiments.
Resumo:
Does knowledge of language consist of symbolic rules? How do children learn and use their linguistic knowledge? To elucidate these questions, we present a computational model that acquires phonological knowledge from a corpus of common English nouns and verbs. In our model the phonological knowledge is encapsulated as boolean constraints operating on classical linguistic representations of speech sounds in term of distinctive features. The learning algorithm compiles a corpus of words into increasingly sophisticated constraints. The algorithm is incremental, greedy, and fast. It yields one-shot learning of phonological constraints from a few examples. Our system exhibits behavior similar to that of young children learning phonological knowledge. As a bonus the constraints can be interpreted as classical linguistic rules. The computational model can be implemented by a surprisingly simple hardware mechanism. Our mechanism also sheds light on a fundamental AI question: How are signals related to symbols?