941 results for Pattern Recognition, Visual
Abstract:
This thesis presents a biologically plausible model of an attentional mechanism for forming position- and scale-invariant representations of objects in the visual world. The model relies on a set of control neurons to dynamically modify the synaptic strengths of intra-cortical connections so that information from a windowed region of primary visual cortex (V1) is selectively routed to higher cortical areas. Local spatial relationships (i.e., topography) within the attentional window are preserved as information is routed through the cortex, thus enabling attended objects to be represented in higher cortical areas within an object-centered reference frame that is position and scale invariant. The representation in V1 is modeled as a multiscale stack of sample nodes with progressively lower resolution at higher eccentricities. Large changes in the size of the attentional window are accomplished by switching between different levels of the multiscale stack, while positional shifts and small changes in scale are accomplished by translating and rescaling the window within a single level of the stack. The control signals for setting the position and size of the attentional window are hypothesized to originate from neurons in the pulvinar and in the deep layers of visual cortex. The dynamics of these control neurons are governed by simple differential equations that can be realized by neurobiologically plausible circuits. In pre-attentive mode, the control neurons receive their input from a low-level "saliency map" representing potentially interesting regions of a scene. During the pattern recognition phase, control neurons are driven by the interaction between top-down (memory) and bottom-up (retinal input) sources. The model respects key neurophysiological, neuroanatomical, and psychophysical data relating to attention, and it makes a variety of experimentally testable predictions.
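The routing idea in this abstract, a window that is translated and rescaled while the order of the samples inside it is preserved, can be illustrated with a minimal one-dimensional sketch. This is not the thesis's neural circuit: the function names `route_window` and `embed`, the plain index lookup, and the toy signals are all assumptions made for illustration only.

```python
def route_window(signal, center, width, n_out):
    """Resample the window [center - width/2, center + width/2) of `signal`
    onto n_out output nodes, keeping the samples in their original order."""
    out = []
    for k in range(n_out):
        # Position of output node k inside the window, in input coordinates.
        pos = center - width / 2 + (k + 0.5) * width / n_out
        # Truncate to a sample index and clamp it to the signal bounds.
        idx = min(max(int(pos), 0), len(signal) - 1)
        out.append(signal[idx])
    return out


def embed(pattern, offset, repeat, length):
    """Place `pattern` at `offset` in a zero signal, stretched by `repeat`."""
    signal = [0] * length
    for i, value in enumerate(pattern):
        for r in range(repeat):
            signal[offset + i * repeat + r] = value
    return signal


pattern = [1, 3, 2, 5]
small = embed(pattern, offset=4, repeat=1, length=32)    # spans indices 4..7
large = embed(pattern, offset=10, repeat=3, length=32)   # spans indices 10..21

# Hypothetical control settings: the window is placed over each object.
routed_small = route_window(small, center=6, width=4, n_out=4)
routed_large = route_window(large, center=16, width=12, n_out=4)
```

Both routed outputs equal the original pattern, so the same object at a different position and size yields the same object-centered representation once the window is set correctly. How the window position and size are found (the control-neuron dynamics) is outside this sketch.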
Abstract:
Ultrafast temporal pattern generation and recognition with femtosecond laser technology is presented, analyzed, and experimentally implemented. Ultrafast temporal pattern generation and recognition are realized by taking advantage of two well-known techniques: the space-time conversion technique and the ultrafast pulse measurement technique. Here the temporal pattern for the designed multiple pulses, optimized with a preassumed Gaussian spectral distribution of an ultrashort pulse, is described. By simulating a Gaussian spectral distribution, we find that the uniformity of the generated multiple ultrafast temporal pulses is related to the number of modulation periods repeated in the mask in the spectral plane. Moreover, the change of Gaussian spectral phases with the wavelengths in the modulated phase plate is considered. Experiments on ultrafast temporal pattern recognition by the frequency-resolved optical gating (FROG) characterization technique are also given. © 2004 Society of Photo-Optical Instrumentation Engineers.
Abstract:
First responders are in danger when they perform tasks in damaged buildings after earthquakes. Structural collapse due to the failure of critical load-bearing structural members (e.g. columns) during a post-earthquake event such as an aftershock can make first responders victims, because they are unable to assess the impact of the damage inflicted on load-bearing members. The writers here propose a method that can provide first responders with a crude but quick estimate of the damage inflicted on load-bearing members. Under the proposed method, critical structural members (reinforced concrete columns in this study) are identified from digital visual data and the damage superimposed on these structural members is detected with the help of Visual Pattern Recognition techniques. The correlation of the two (e.g. the position, orientation and size of a crack on the surface of a column) is used to query a case-based reasoning knowledge base, which contains a priori classified states of columns according to the damage inflicted on them. When query results indicate the column's damage state is severe, the method assumes that a structural collapse is likely and first responders are warned to evacuate.
Abstract:
This paper presents a complete system for expressive visual text-to-speech (VTTS), which is capable of producing expressive output, in the form of a 'talking head', given an input text and a set of continuous expression weights. The face is modeled using an active appearance model (AAM), and several extensions are proposed which make it more applicable to the task of VTTS. The model allows for normalization with respect to both pose and blink state which significantly reduces artifacts in the resulting synthesized sequences. We demonstrate quantitative improvements in terms of reconstruction error over a million frames, as well as in large-scale user studies, comparing the output of different systems. © 2013 IEEE.
Abstract:
Cook, Anthony; Gibbens, M.J., (2006) 'Constructing Visual Taxonomies by Shape', 18th International Conference on Pattern Recognition (ICPR'06) Volume 2, pp. 732 - 735 RAE2008
Abstract:
In order to use virtual reality as a sport analysis tool, we need to be sure that an immersed athlete reacts realistically in a virtual environment. This has been validated for a real handball goalkeeper facing a virtual thrower. However, we do not yet know which visual variables induce realistic motor behavior in the immersed handball goalkeeper. In this study, we used virtual reality to dissociate the visual information related to the movements of the player from the visual information related to the trajectory of the ball. Thus, the aim is to evaluate the relative influence of these different visual information sources on the goalkeeper's motor behavior. We tested 10 handball goalkeepers who had to predict the final position of the virtual ball in the goal when facing the following: only the throwing action of the attacking player (TA condition), only the resulting ball trajectory (BA condition), and both the throwing action of the attacking player and the resulting ball trajectory (TB condition). Here we show that performance was better in the BA and TB conditions, but contrary to expectations, performance was substantially worse in the TA condition. A significant effect of ball landing zone does, however, suggest that the relative importance of visual information from the player and from the ball depends on the targeted zone in the goal. In some cases, body-based cues embedded in the throwing actions may have a minor influence on the ball trajectory and vice versa. Kinematic analysis was then combined with these results to determine why such differences occur depending on the ball landing zone and consequently how it can clarify the role of different sources of visual information in the motor behavior of an athlete immersed in a virtual environment.
Abstract:
This paper investigated using lip movements as a behavioural biometric for person authentication. The system was trained, evaluated and tested using the XM2VTS dataset, following the Lausanne Protocol configuration II. Features were selected from the DCT coefficients of the greyscale lip image. This paper investigated the number of DCT coefficients selected, the selection process, and static and dynamic feature combinations. Using a Gaussian Mixture Model - Universal Background Model framework, an Equal Error Rate of 2.20% was achieved during evaluation, and on an unseen test set a False Acceptance Rate of 1.7% and a False Rejection Rate of 3.0% were achieved. This compares favourably with face authentication results on the same dataset whilst not being susceptible to spoofing attacks.
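The error measures quoted in this abstract can be made concrete with a small sketch of how an Equal Error Rate is commonly estimated from verification scores. The score lists below are made up for illustration and are unrelated to XM2VTS; the function names and the threshold sweep over observed scores are assumptions, not the paper's procedure.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FAR, FRR) when scores >= threshold are accepted."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Try every observed score as a threshold and return (EER, threshold)
    at the point where FAR and FRR are closest to each other."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


genuine_scores = [0.9, 0.8, 0.75, 0.6, 0.4]   # made-up client scores
impostor_scores = [0.1, 0.2, 0.3, 0.5, 0.7]   # made-up impostor scores
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
```

When a threshold chosen this way on an evaluation set is then held fixed on a separate test set, the False Acceptance Rate and False Rejection Rate generally no longer coincide, which would be consistent with the two different test-set figures reported above.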
Abstract:
Models of visual perception are based on image representations in cortical area V1 and higher areas which contain many cell layers for feature extraction. Basic simple, complex and end-stopped cells provide input for line, edge and keypoint detection. In this paper we present an improved method for multi-scale line/edge detection based on simple and complex cells. We illustrate the line/edge representation for object reconstruction, and we present models for multi-scale face (object) segregation and recognition that can be embedded into feedforward dorsal and ventral data streams (the “what” and “where” subsystems) with feedback streams from higher areas for obtaining translation, rotation and scale invariance.
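As a rough illustration of how simple and complex cells can feed line/edge detection, the sketch below uses the textbook quadrature-pair (energy) model in one dimension: an even and an odd Gabor filter stand in for simple cells, their modulus stands in for a complex cell, and the dominant component labels the event. This is a generic model, not the improved method of the paper; the kernel parameters and the function names are assumptions.

```python
import math


def gabor_pair(sigma, wavelength):
    """Return (even, odd) 1D Gabor kernels sharing one Gaussian envelope."""
    radius = int(3 * sigma)
    xs = range(-radius, radius + 1)
    env = [math.exp(-x * x / (2 * sigma * sigma)) for x in xs]
    even = [e * math.cos(2 * math.pi * x / wavelength) for e, x in zip(env, xs)]
    odd = [e * math.sin(2 * math.pi * x / wavelength) for e, x in zip(env, xs)]
    mean = sum(even) / len(even)       # remove DC so flat regions give zero
    even = [v - mean for v in even]
    return even, odd


def respond(signal, kernel, center):
    """Correlate `kernel` with `signal` around index `center`."""
    radius = len(kernel) // 2
    return sum(k * signal[center + i - radius] for i, k in enumerate(kernel))


def classify(signal, center, sigma=4.0, wavelength=16.0):
    """Return (label, energy) for the event at `center`."""
    even, odd = gabor_pair(sigma, wavelength)
    r_even = respond(signal, even, center)   # symmetric "simple cell"
    r_odd = respond(signal, odd, center)     # antisymmetric "simple cell"
    energy = math.hypot(r_even, r_odd)       # "complex cell" modulus
    label = "line" if abs(r_even) > abs(r_odd) else "edge"
    return label, energy


n = 101
line = [0.0] * n
line[50] = 1.0                       # one-sample bright line
edge = [0.0] * 50 + [1.0] * 51       # dark-to-bright step at index 50
flat = [1.0] * n                     # no event at all

line_label, line_energy = classify(line, 50)
edge_label, edge_energy = classify(edge, 50)
flat_label, flat_energy = classify(flat, 50)
```

A multi-scale version would repeat the same test over several values of `sigma` and `wavelength`; the label for `flat` is not meaningful because its energy is essentially zero.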
Abstract:
Object recognition requires that templates with canonical views are stored in memory. Such templates must somehow be normalised. In this paper we present a novel method for obtaining 2D translation, rotation and size invariance. Cortical simple, complex and end-stopped cells provide multi-scale maps of lines, edges and keypoints. These maps are combined such that objects are characterised. Dynamic routing in neighbouring neural layers allows feature maps of input objects and stored templates to converge. We illustrate the construction of group templates and the invariance method for object categorisation and recognition in the context of a cortical architecture, which can be applied in computer vision.
Abstract:
Positioning a robot with respect to objects by using data provided by a camera is a well-known technique called visual servoing. In order to perform a task, the object must exhibit visual features which can be extracted from different points of view. Visual servoing is therefore object-dependent, as it depends on the object's appearance, and the positioning task cannot be performed in the presence of non-textured objects or objects for which extracting visual features is too complex or too costly. This paper proposes a solution to tackle this limitation inherent to current visual servoing techniques. Our proposal is based on the coded structured light approach as a reliable and fast way to solve the correspondence problem. In this case, a coded light pattern is projected, providing robust visual features independently of the object's appearance.
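To show why a coded pattern makes correspondence easy, here is a minimal sketch of one common coding strategy: a stripe pattern whose colours follow a de Bruijn sequence, so that any short run of neighbouring stripes identifies its position in the projector. The abstract does not say which code the authors use; the choice of a de Bruijn code, three colours, a window of three stripes, and the helper names are all assumptions.

```python
def de_bruijn(k, n):
    """Return a cyclic de Bruijn sequence B(k, n) over symbols 0..k-1."""
    a = [0] * (k * n)
    seq = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq


def build_decoder(stripes, n):
    """Map each window of n consecutive stripe colours to its start column."""
    return {tuple(stripes[i:i + n]): i for i in range(len(stripes) - n + 1)}


colours, window = 3, 3
stripes = de_bruijn(colours, window)
stripes += stripes[:window - 1]      # unroll the cycle so every window appears
decoder = build_decoder(stripes, window)

# A camera that sees any three neighbouring stripes can look up where they
# sit in the projected pattern, regardless of the object's own texture.
observed = tuple(stripes[10:13])
column = decoder[observed]
```

Because every window is unique, the lookup is unambiguous, which is what makes the features independent of the object's appearance in this toy setting. Real systems also have to cope with colour misclassification and occluded stripes, which this sketch ignores.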
Abstract:
This paper presents a video surveillance framework that robustly and efficiently detects abandoned objects in surveillance scenes. The framework is based on a novel threat assessment algorithm which combines the concept of ownership with automatic understanding of social relations in order to infer abandonment of objects. Implementation is achieved through development of a logic-based inference engine based on Prolog. Threat detection performance is evaluated by testing against a range of datasets describing realistic situations, and the results demonstrate a reduction in the number of false alarms generated. The proposed system represents the approach employed in the EU SUBITO project (Surveillance of Unattended Baggage and the Identification and Tracking of the Owner).