874 resultados para Recontextualised found object
Resumo:
As a key issue in spatial cognitive developmental research, the coding of object location plays an important role in children's cognitive development. The development of location coding is a precondition for children's adaptation to their environments, and the development of corresponding ability could enhance children's adaptation ability and improve their synthetic diathesis. In this paper, under the improved paradigm of object searching, 7-, 9- and 11-year-olds of urban primary school students were involved in two studies including the total of four experiments. The children were examined upon the ability to encode target location in terms of the distance between two landmarks, three points on a line, the intersection of two lines, or the corresponding points on two parallel lines. The experiments were designed to explore the primary school children's cognitive developmental process upon spatial object location and the correlative restricting factors. From the studies, the following conclusions were drawn: 1)The ability of 7-year-olds to represent target location in terms of the relationships of points and lines is in the inceptive stage and appears unstable. Meanwhile, the same ability of 9-year-olds is in a state of fast developing. The 9-year-olds' performance depends on how difficult the task is. It is stable when task is easy while unstable when task becomes difficult. The ability of 11-year-olds reaches much-developed state and the group's performance is independent of the difficulty of tasks. 2) The correlate coefficient is significant between Raven Standard Inference ability levels and the performance of representing target location in terms of the relationships of points. Those children with good performance in Raven Standard Inference Test have good performance in target location coding. The case is true for all different age groups. As of the task in terms of the relationships of lines, the correlate coefficient between Raven Standard Inference ability levels and children's performance of representing target location is found significant only for the 7-year-olds' group. The case is not true for the groups of 9- and 11-year-olds. It is also found that the correlate coefficient is significant between the sum of performance and Raven Standard Inference ability levels, and that is true for all age groups. 3) Effects from task variable exist upon children's above-mentioned cognitive performance. The effects are different according to different difficulty levels of tasks. Also, they are different according to the different ages. 4) The subjects who failed in the 'no cues for encoding given' situation were able to improve their performances when the cues of encoding were given. Therefore it is possible to improve the primary school children's corresponding cognitive performance by providing the cues of encoding. 5) Two kinds of efficient strategies were used to solve the problem. They are trial-comparison strategy and anticipation-directed strategy.
Resumo:
The time-courses of orthographic, phonological and semantic processing of Chinese characters were investigated systematically with multi-channel event-related potentials (ERPs). New evidences concerning whether phonology or semantics is processed first and whether phonology mediates semantic access were obtained, supporting and developing the new concept of repetition, overlapping, and alternating processing in Chinese character recognition. Statistic parameter mapping based on physiological double dissociation has been developed. Seven experiments were conducted: I) deciding which type of structure, left-right or non-left-right, the character displayed on the screen was; 2) deciding whether or not there was a vowel/a/in the pronunciation of the character; 3) deciding which classification, natural object or non-natural object, the character was; 4) deciding which color, red or green, the character was; 5) deciding which color, red or green, the non-character was; 6) fixing on the non-character; 7) fixing on the crosslet. The main results are: 1. N240 and P240:N240 and P240 localized at occipital and prefrontal respectively were found in experiments 1, 2, 3, and 4, but not in experiments 5, 6, or 7. The difference between the former 4 and the latter 3 experiments was only their stimuli: the former's were true Chinese characters while the latter's were non-characters or crosslet. Thus Chinese characters were related to these two components, which reflected unique processing of Chinese characters peaking at about 240 msec. 2. Basic visual feature analysis: In comparison with experiment 7 there was a common cognitive process in experiments 1, 2, 4, and 6 - basic visual feature analysis. The corresponding ERP amplitude increase in most sites started from about 60 msec. 3. Orthography: The ERP differences located at the main processing area of orthography (occipital) between experiments 1, 2, 3, 4 and experiment 5 started from about 130 msec. This was the category difference between Chinese characters and non-characters, which revealed that orthographic processing started from about 130 msec. The ERP differences between the experiments 1, 2, 3 and the experiment 4 occurred in 210-250, 230-240, and 190-250 msec respectively, suggesting orthography was processed again. These were the differences between language and non-language tasks, which revealed a higher level processing than that in the above mentioned 130 msec. All the phenomena imply that the orthographic processing does not finished in one time of processing; the second time of processing is not a simple repetition, but a higher level one. 4. Phonology: The ERPs of experiment 2 (phonological task) were significantly stronger than those of experiment 3 (semantic task) at the main processing areas of phonology (temporal and left prefrontal) starting from about 270 msec, which revealed phonologic processing. The ERP differences at left frontal between experiment 2 and experiment 1 (orthographic task) started from about 250 msec. When comparing phonological task with experiment 4 (character color decision), the ERP differences at left temporal and prefrontal started from about 220 msec. Thus phonological processing may start before 220 msec. 5. Semantic: The ERPs of experiment 3 (semantic task) were significantly stronger than those of experiment 2 (phonological task) at the main processing areas of semantics (parietal and occipital) starting from about 290 msec, which revealed semantic processing. The ERP differences at these areas between experiment 3 and experiment 4 (character color decision) started from about 270 msec. The ERP differences between experiment 3 and experiment 1 (orthographic task) started from about 260 msec. Thus semantic processing may start before 260 msec. 6. Overlapping of phonological and semantic processing: From about 270 to 350 msec, the ERPs of experiment 2 (phonological task) were significantly larger than those of experiment 3 (semantic task) at the main processing areas of phonology (temporal and left prefrontal); while from about 290-360 msec, the ERPs of experiment 3 were significantly larger than those of experiment 2 at the main processing areas of semantics (frontal, parietal, and occipital). Thus phonological processing may start earlier than semantic and their time-courses may alternate, which reveals parallel processing. 7. Semantic processing needs part phonology: When experiment 1 (orthographic task) served as baseline, the ERPs of experiment 2 and 3 (phonological and semantic tasks) significantly increased at the main processing areas of phonology (left temporal and frontal) starting from about 250 msec. The ERPs of experiment 3, besides, increased significantly at the main processing areas of semantics (parietal and frontal) starting from about 260 msec. When experiment 4 (character color decision) served as baseline, the ERPs of experiment 2 and 3 significantly increased at phonological areas (left temporal and frontal) starting from about 220 msec. The ERPs of experiment 3, similarly, increased significantly at semantic areas (parietal and frontal) starting from about270 msec. Hence, before semantic processing, a part of phonological information may be required. The conclusion could be got from above results in the present experimental conditions: 1. The basic visual feature processing starts from about 60 msec; 2. Orthographic processing starts from about 130 msec, and repeats at about 240 msec. The second processing is not simple repetition of the first one, but a higher level processing; 3. Phonological processing begins earlier than semantic, and their time-courses overlap; 4. Before semantic processing, a part of phonological information may be required; 5. The repetition, overlapping, and alternating of the orthographic, phonological and semantic processing of Chinese characters could exist in cognition. Thus the problem of whether phonology mediates semantics access is not a simple, but a complicated issue.
Resumo:
We consider the problem of matching model and sensory data features in the presence of geometric uncertainty, for the purpose of object localization and identification. The problem is to construct sets of model feature and sensory data feature pairs that are geometrically consistent given that there is uncertainty in the geometry of the sensory data features. If there is no geometric uncertainty, polynomial-time algorithms are possible for feature matching, yet these approaches can fail when there is uncertainty in the geometry of data features. Existing matching and recognition techniques which account for the geometric uncertainty in features either cannot guarantee finding a correct solution, or can construct geometrically consistent sets of feature pairs yet have worst case exponential complexity in terms of the number of features. The major new contribution of this work is to demonstrate a polynomial-time algorithm for constructing sets of geometrically consistent feature pairs given uncertainty in the geometry of the data features. We show that under a certain model of geometric uncertainty the feature matching problem in the presence of uncertainty is of polynomial complexity. This has important theoretical implications by demonstrating an upper bound on the complexity of the matching problem, an by offering insight into the nature of the matching problem itself. These insights prove useful in the solution to the matching problem in higher dimensional cases as well, such as matching three-dimensional models to either two or three-dimensional sensory data. The approach is based on an analysis of the space of feasible transformation parameters. This paper outlines the mathematical basis for the method, and describes the implementation of an algorithm for the procedure. Experiments demonstrating the method are reported.
Resumo:
We report a series of psychophysical experiments that explore different aspects of the problem of object representation and recognition in human vision. Contrary to the paradigmatic view which holds that the representations are three-dimensional and object-centered, the results consistently support the notion of view-specific representations that include at most partial depth information. In simulated experiments that involved the same stimuli shown to the human subjects, computational models built around two-dimensional multiple-view representations replicated our main psychophysical results, including patterns of generalization errors and the time course of perceptual learning.
Resumo:
A scheme for recognizing 3D objects from single 2D images is introduced. The scheme proceeds in two stages. In the first stage, the categorization stage, the image is compared to prototype objects. For each prototype, the view that most resembles the image is recovered, and, if the view is found to be similar to the image, the class identity of the object is determined. In the second stage, the identification stage, the observed object is compared to the individual models of its class, where classes are expected to contain objects with relatively similar shapes. For each model, a view that matches the image is sought. If such a view is found, the object's specific identity is determined. The advantage of categorizing the object before it is identified is twofold. First, the image is compared to a smaller number of models, since only models that belong to the object's class need to be considered. Second, the cost of comparing the image to each model in a classis very low, because correspondence is computed once for the whoel class. More specifically, the correspondence and object pose computed in the categorization stage to align the prototype with the image are reused in the identification stage to align the individual models with the image. As a result, identification is reduced to a series fo simple template comparisons. The paper concludes with an algorithm for constructing optimal prototypes for classes of objects.
Resumo:
In order to recognize an object in an image, we must determine the best transformation from object model to the image. In this paper, we show that for features from coplanar surfaces which undergo linear transformations in space, there exist projections invariant to the surface motions up to rotations in the image field. To use this property, we propose a new alignment approach to object recognition based on centroid alignment of corresponding feature groups. This method uses only a single pair of 2D model and data. Experimental results show the robustness of the proposed method against perturbations of feature positions.
Resumo:
This paper describes the main features of a view-based model of object recognition. The model tries to capture general properties to be expected in a biological architecture for object recognition. The basic module is a regularization network in which each of the hidden units is broadly tuned to a specific view of the object to be recognized.
Resumo:
How does the brain recognize three-dimensional objects? We trained monkeys to recognize computer rendered objects presented from an arbitrarily chosen training view, and subsequently tested their ability to generalize recognition for other views. Our results provide additional evidence in favor of with a recognition model that accomplishes view-invariant performance by storing a limited number of object views or templates together with the capacity to interpolate between the templates (Poggio and Edelman, 1990).
Resumo:
The need to generate new views of a 3D object from a single real image arises in several fields, including graphics and object recognition. While the traditional approach relies on the use of 3D models, we have recently introduced techniques that are applicable under restricted conditions but simpler. The approach exploits image transformations that are specific to the relevant object class and learnable from example views of other "prototypical" objects of the same class. In this paper, we introduce such a new technique by extending the notion of linear class first proposed by Poggio and Vetter. For linear object classes it is shown that linear transformations can be learned exactly from a basis set of 2D prototypical views. We demonstrate the approach on artificial objects and then show preliminary evidence that the technique can effectively "rotate" high- resolution face images from a single 2D view.
Resumo:
We present a unifying framework in which "object-independent" modes of variation are learned from continuous-time data such as video sequences. These modes of variation can be used as "generators" to produce a manifold of images of a new object from a single example of that object. We develop the framework in the context of a well-known example: analyzing the modes of spatial deformations of a scene under camera movement. Our method learns a close approximation to the standard affine deformations that are expected from the geometry of the situation, and does so in a completely unsupervised (i.e. ignorant of the geometry of the situation) fashion. We stress that it is learning a "parameterization", not just the parameter values, of the data. We then demonstrate how we have used the same framework to derive a novel data-driven model of joint color change in images due to common lighting variations. The model is superior to previous models of color change in describing non-linear color changes due to lighting.
Resumo:
While navigating in an environment, a vision system has to be able to recognize where it is and what the main objects in the scene are. In this paper we present a context-based vision system for place and object recognition. The goal is to identify familiar locations (e.g., office 610, conference room 941, Main Street), to categorize new environments (office, corridor, street) and to use that information to provide contextual priors for object recognition (e.g., table, chair, car, computer). We present a low-dimensional global image representation that provides relevant information for place recognition and categorization, and how such contextual information introduces strong priors that simplify object recognition. We have trained the system to recognize over 60 locations (indoors and outdoors) and to suggest the presence and locations of more than 20 different object types. The algorithm has been integrated into a mobile system that provides real-time feedback to the user.
Resumo:
This memo describes the initial results of a project to create a self-supervised algorithm for learning object segmentation from video data. Developmental psychology and computational experience have demonstrated that the motion segmentation of objects is a simpler, more primitive process than the detection of object boundaries by static image cues. Therefore, motion information provides a plausible supervision signal for learning the static boundary detection task and for evaluating performance on a test set. A video camera and previously developed background subtraction algorithms can automatically produce a large database of motion-segmented images for minimal cost. The purpose of this work is to use the information in such a database to learn how to detect the object boundaries in novel images using static information, such as color, texture, and shape. This work was funded in part by the Office of Naval Research contract #N00014-00-1-0298, in part by the Singapore-MIT Alliance agreement of 11/6/98, and in part by a National Science Foundation Graduate Student Fellowship.
Resumo:
Recovering a volumetric model of a person, car, or other object of interest from a single snapshot would be useful for many computer graphics applications. 3D model estimation in general is hard, and currently requires active sensors, multiple views, or integration over time. For a known object class, however, 3D shape can be successfully inferred from a single snapshot. We present a method for generating a ``virtual visual hull''-- an estimate of the 3D shape of an object from a known class, given a single silhouette observed from an unknown viewpoint. For a given class, a large database of multi-view silhouette examples from calibrated, though possibly varied, camera rigs are collected. To infer a novel single view input silhouette's virtual visual hull, we search for 3D shapes in the database which are most consistent with the observed contour. The input is matched to component single views of the multi-view training examples. A set of viewpoint-aligned virtual views are generated from the visual hulls corresponding to these examples. The 3D shape estimate for the input is then found by interpolating between the contours of these aligned views. When the underlying shape is ambiguous given a single view silhouette, we produce multiple visual hull hypotheses; if a sequence of input images is available, a dynamic programming approach is applied to find the maximum likelihood path through the feasible hypotheses over time. We show results of our algorithm on real and synthetic images of people.
Resumo:
We consider the problem of detecting a large number of different classes of objects in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, at multiple locations and scales. This can be slow and can require a lot of training data, since each classifier requires the computation of many different image features. In particular, for independently trained detectors, the (run-time) computational complexity, and the (training-time) sample complexity, scales linearly with the number of classes to be detected. It seems unlikely that such an approach will scale up to allow recognition of hundreds or thousands of objects. We present a multi-class boosting procedure (joint boosting) that reduces the computational and sample complexity, by finding common features that can be shared across the classes (and/or views). The detectors for each class are trained jointly, rather than independently. For a given performance level, the total number of features required, and therefore the computational cost, is observed to scale approximately logarithmically with the number of classes. The features selected jointly are closer to edges and generic features typical of many natural structures instead of finding specific object parts. Those generic features generalize better and reduce considerably the computational cost of an algorithm for multi-class object detection.
Resumo:
We seek to both detect and segment objects in images. To exploit both local image data as well as contextual information, we introduce Boosted Random Fields (BRFs), which uses Boosting to learn the graph structure and local evidence of a conditional random field (CRF). The graph structure is learned by assembling graph fragments in an additive model. The connections between individual pixels are not very informative, but by using dense graphs, we can pool information from large regions of the image; dense models also support efficient inference. We show how contextual information from other objects can improve detection performance, both in terms of accuracy and speed, by using a computational cascade. We apply our system to detect stuff and things in office and street scenes.