895 results for Object recognition
Abstract:
The automatic interpretation of conventional traffic signs is very complex and time consuming. The paper concerns an automatic warning system for driving assistance. It does not interpret the standard traffic signs on the roadside; the proposal is to incorporate into the existing signs another type of traffic sign whose information will be more easily interpreted by a processor. The information to be added is extensive, and therefore the most important objective is the robustness of the system. The basic proposal of this new philosophy is that the co-pilot system for automatic warning and driving assistance can interpret the information contained in the new sign with greater ease, whilst the human driver only has to interpret the "classic" sign. One of the codings that has been tested with good results, and which seems to us easy to implement, has a rectangular shape and 4 vertical bars of different colours. The size of these signs is equivalent to the size of the conventional signs (approximately 0.4 m²). The colour information from the sign can be easily interpreted by the proposed processor, and the interpretation is much easier and quicker than that of the pictographs of the classic signs.
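As a rough illustration of why a four-bar colour code is easy for a processor to read, the sketch below splits one scanline across the sign into four equal segments and assigns each segment to the nearest colour in a small palette. The palette, the function names and the equal-width split are all assumptions made for this example; the abstract does not specify the colours used or the decoding procedure.

```python
# Hypothetical palette: the abstract does not say which colours the bars use.
PALETTE = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
}

def nearest_colour(rgb):
    """Return the palette name with the smallest squared RGB distance."""
    return min(
        PALETTE,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, PALETTE[name])),
    )

def decode_bars(row, bars=4):
    """Decode one scanline (list of RGB tuples) into a list of bar colours.

    Assumes the scanline spans exactly the sign and the bars have equal width.
    """
    width = len(row) // bars
    names = []
    for b in range(bars):
        segment = row[b * width:(b + 1) * width]
        mean = tuple(sum(px[ch] for px in segment) / len(segment) for ch in range(3))
        names.append(nearest_colour(mean))
    return names

# Toy scanline with slightly noisy colours.
scanline = [(250, 10, 5)] * 3 + [(10, 240, 20)] * 3 + [(0, 10, 250)] * 3 + [(240, 250, 10)] * 3
print(decode_bars(scanline))
```

Averaging each segment before classification is one simple way to tolerate pixel noise; a real system would also have to locate the sign and cope with lighting changes, which this sketch ignores.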
Abstract:
Vision is probably our most dominant sense, and the one from which we derive most of our information about the world around us. Through vision we can perceive what things are like, where they are and how they move. From the images we perceive with our visual system we can extract features such as colour, texture and shape, and thanks to this information we are able to recognise objects even when they are observed under completely different conditions. For example, we can identify the same object when we observe it from different viewpoints, distances, lighting conditions, etc. Computer Vision attempts to emulate the human visual system by means of an image capture system, a computer, and a set of programs. The desired goal is none other than to develop a system that can understand an image in a way similar to how a person would. This thesis focuses on texture analysis for the purpose of surface recognition. The main motivation is to solve the problem of classifying textured surfaces that have been captured under different conditions, such as camera distance or illumination direction. In this way, the classification errors caused by these changes in capture conditions are reduced. This work presents in detail a texture recognition system that allows us to classify images of different surfaces captured under different conditions. The proposed system is based on a 3D model of the surface (including colour and shape information) obtained using the technique known as 4-Source Colour Photometric Stereo (CPS). This information is subsequently used by a texture prediction method to generate new 2D images of the textures under new conditions.
These generated virtual images form the basis of our recognition system, since they are used as reference models for our texture classifier. The proposed recognition system combines Co-occurrence Matrices for texture feature extraction with the Nearest Neighbour Classifier. This classifier also allows us to approximate the illumination direction present in the images used to test the recognition system. That is, we are able to predict the illumination angle under which the test images were captured. The results obtained in the various experiments carried out demonstrate the viability of both the texture prediction system and the recognition system.
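The pairing of co-occurrence features with a nearest-neighbour classifier can be sketched as follows. This is a minimal toy version under stated assumptions, not the thesis's implementation: the single pixel offset, the three features (contrast, energy, homogeneity), the four grey levels and the `(surface, illumination angle)` labels are all chosen for illustration. Because each reference carries its illumination angle, the nearest reference yields an angle estimate as well as a surface class.

```python
import math

def glcm(image, dx=1, dy=0, levels=4):
    """Normalised grey-level co-occurrence matrix for one pixel offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    total = total or 1  # avoid division by zero on degenerate images
    return [[v / total for v in row] for row in counts]

def texture_features(image, levels=4):
    """Return (contrast, energy, homogeneity) computed from the GLCM."""
    p = glcm(image, levels=levels)
    idx = [(i, j) for i in range(levels) for j in range(levels)]
    contrast = sum(p[i][j] * (i - j) ** 2 for i, j in idx)
    energy = sum(p[i][j] ** 2 for i, j in idx)
    homogeneity = sum(p[i][j] / (1 + abs(i - j)) for i, j in idx)
    return (contrast, energy, homogeneity)

def classify(query_features, references):
    """1-nearest-neighbour: references is a list of (label, feature_vector)."""
    return min(references, key=lambda ref: math.dist(query_features, ref[1]))[0]

# Toy reference images; labels are (surface, illumination angle in degrees).
vertical = [[0, 3, 0, 3]] * 4
horizontal = [[0] * 4, [3] * 4, [0] * 4, [3] * 4]
references = [
    (("stripes", 0), texture_features(vertical)),
    (("stripes", 90), texture_features(horizontal)),
]
query = [[3, 0, 3, 0]] * 4
print(classify(texture_features(query), references))
```

In the system described above, the reference feature vectors would come from the virtual images produced by the texture prediction step rather than from hand-made arrays.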
Abstract:
Garment information tracking is required for clean room garment management. In this paper, we present a robust camera-based system that uses Optical Character Recognition (OCR) techniques to perform garment label recognition. In the system, a camera is used for image capturing; an adaptive thresholding algorithm is employed to generate binary images; Connected Component Labelling (CCL) is then adopted for object detection in the binary image as a part of finding the ROI (Region of Interest); Artificial Neural Networks (ANNs) with the BP (Back Propagation) learning algorithm are used for digit recognition; and finally the system is verified by a system database. The system has been tested. The results show that it is capable of coping with variance of lighting, digit twisting, background complexity, and font orientations. The system performance in terms of the digit recognition rate has met the design requirement. It has achieved real-time and error-free garment information tracking during the testing.
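The first two stages of the pipeline, adaptive thresholding followed by connected component labelling, can be sketched in a few lines. This is a minimal illustration under assumptions: the local-mean threshold rule, the window radius, the offset and 4-connectivity are choices made here, since the abstract does not state which variants the authors used. The neural network and database stages are not shown.

```python
from collections import deque

def adaptive_threshold(gray, radius=2, offset=10):
    """Mark a pixel as foreground (1) if it is darker than its local mean minus offset."""
    rows, cols = len(gray), len(gray[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                gray[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            if gray[r][c] < sum(window) / len(window) - offset:
                out[r][c] = 1
    return out

def connected_components(binary):
    """Return bounding boxes (top, left, bottom, right) of 4-connected regions."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue = deque([(r, c)])
                top, left, bottom, right = r, c, r, c
                while queue:  # breadth-first flood fill of one component
                    y, x = queue.popleft()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < rows and 0 <= nx < cols and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes

# Toy image: bright background (200) with two dark strokes (40).
image = [[200] * 7 for _ in range(5)]
for y, x in [(1, 1), (2, 1), (1, 5), (2, 5), (3, 5)]:
    image[y][x] = 40
print(connected_components(adaptive_threshold(image)))
```

Each bounding box is a candidate region of interest that a later stage could crop and pass to the digit classifier.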
Abstract:
Light Detection And Ranging (LIDAR) is an important modality in terrain and land surveying for many environmental, engineering and civil applications. This paper presents the framework for a recently developed unsupervised classification algorithm called Skewness Balancing for object and ground point separation in airborne LIDAR data. The main advantages of the algorithm are threshold-freedom and independence from LIDAR data format and resolution, while preserving object and terrain details. The framework for Skewness Balancing has been built in this contribution with a prediction model in which unknown LIDAR tiles can be categorised as “hilly” or “moderate” terrains. Accuracy assessment of the model is carried out using cross-validation with an overall accuracy of 95%. An extension to the algorithm is developed to address the overclassification issue for hilly terrain. For moderate terrain, the results show that from the classified tiles detached objects (buildings and vegetation) and attached objects (bridges and motorway junctions) are separated from bare earth (ground, roads and yards) which makes Skewness Balancing ideal to be integrated into geographic information system (GIS) software packages.
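The core idea behind skewness balancing, as it is usually described, is that bare-earth elevations are roughly symmetric around their mean, while objects add a tail of high points that makes the distribution positively skewed. The sketch below removes the highest point repeatedly until the skewness is no longer positive. It is a simplified illustration under that assumption: the abstract does not spell out the steps, and the prediction model and the extension for hilly terrain are not represented.

```python
def skewness(values):
    """Population skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5

def skewness_balancing(elevations):
    """Split elevations into (ground, objects) without a user-supplied threshold."""
    ground = sorted(elevations)
    objects = []
    # Peel off the highest remaining point while the distribution is right-skewed.
    while len(ground) > 2 and skewness(ground) > 0:
        objects.append(ground.pop())
    return ground, objects

# Toy tile: six terrain returns and two returns from a roof (hypothetical values).
tile = [10.0, 9.6, 10.1, 9.9, 18.0, 10.1, 22.0, 10.0]
ground, objects = skewness_balancing(tile)
print(ground, objects)
```

No height threshold appears anywhere in the loop, which is the sense in which the method is threshold-free. Recomputing the skewness from scratch on every iteration is quadratic and only suitable for a toy example.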
Abstract:
A new class of shape features for region classification and high-level recognition is introduced. The novel Randomised Region Ray (RRR) features can be used to train binary decision trees for object category classification using an abstract representation of the scene. In particular, we address the problem of human detection using an over-segmented input image. We therefore do not rely on pixel values for training; instead, we design and train specialised classifiers on the sparse set of semantic regions which compose the image. Thanks to the abstract nature of the input, the trained classifier has the potential to be fast and applicable to extreme imagery conditions. We demonstrate and evaluate its performance in people detection using a pedestrian dataset.
Abstract:
This paper presents a video surveillance framework that robustly and efficiently detects abandoned objects in surveillance scenes. The framework is based on a novel threat assessment algorithm which combines the concept of ownership with automatic understanding of social relations in order to infer abandonment of objects. Implementation is achieved through development of a logic-based inference engine based on Prolog. Threat detection performance is evaluated against a range of datasets describing realistic situations, and the results demonstrate a reduction in the number of false alarms generated. The proposed system represents the approach employed in the EU SUBITO project (Surveillance of Unattended Baggage and the Identification and Tracking of the Owner).
Abstract:
Pictorial representations of three-dimensional objects are often used to investigate animal cognitive abilities; however, investigators rarely evaluate whether the animals conceptualize the two-dimensional image as the object it is intended to represent. We tested for picture recognition in lion-tailed macaques by presenting five monkeys with digitized images of familiar foods on a touch screen. Monkeys viewed images of two different foods and learned that they would receive a piece of the one they touched first. After demonstrating that they would reliably select images of their preferred foods on one set of foods, animals were transferred to images of a second set of familiar foods. We assumed that if the monkeys recognized the images, they would spontaneously select images of their preferred foods on the second set of foods. Three monkeys selected images of their preferred foods significantly more often than chance on their first transfer session. In an additional test of the monkeys' picture recognition abilities, animals were presented with pairs of food images containing a medium-preference food paired with either a high-preference food or a low-preference food. The same three monkeys selected the medium-preference foods significantly more often when they were paired with low-preference foods and significantly less often when those same foods were paired with high-preference foods. Our novel design provided convincing evidence that macaques recognized the content of two-dimensional images on a touch screen. Results also suggested that the animals understood the connection between the two-dimensional images and the three-dimensional objects they represented.
Abstract:
PURPOSE: We aimed at further elucidating whether aphasic patients' difficulties in understanding non-canonical sentence structures, such as Passive or Object-Verb-Subject sentences, can be attributed to impaired morphosyntactic cue recognition, and to problems in integrating competing interpretations. METHODS: A sentence-picture matching task with canonical and non-canonical spoken sentences was performed using concurrent eye tracking. Accuracy, reaction time, and eye tracking data (fixations) of 50 healthy subjects and 12 aphasic patients were analysed. RESULTS: Patients showed increased error rates and reaction times, as well as delayed fixation preferences for target pictures in non-canonical sentences. Patients' fixation patterns differed from healthy controls and revealed deficits in recognizing and immediately integrating morphosyntactic cues. CONCLUSION: Our study corroborates the notion that difficulties in understanding syntactically complex sentences are attributable to a processing deficit encompassing delayed and therefore impaired recognition and integration of cues, as well as increased competition between interpretations.
Abstract:
Federal Highway Administration, Office of Safety and Traffic Operations Research and Development, McLean, Va.
Abstract:
A recently proposed colour based tracking algorithm has been established to track objects in real circumstances [Zivkovic, Z., Krose, B. 2004. An EM-like algorithm for color-histogram-based object tracking. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 798-803]. To improve the performance of this technique in complex scenes, in this paper we propose a new algorithm for optimally adapting the ellipse outlining the objects of interest. This paper presents a Lagrangian based method to integrate a regularising component into the covariance matrix to be computed. Technically, we intend to reduce the residuals between the estimated probability distribution and the expected one. We argue that, by doing this, the shape of the ellipse can be properly adapted in the tracking stage. Experimental results show that the proposed method has favourable performance in shape adaptation and object localisation.
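To make the link between a covariance matrix and the tracking ellipse concrete, the sketch below computes a weighted mean and covariance of pixel positions, adds a regularising term to the covariance, and reads the ellipse axes and orientation from its eigenvalues. The regulariser here is a plain multiple of the identity matrix, chosen only for illustration; it is an assumption and not the Lagrangian-derived component of the paper, whose form the abstract does not give. The function name and the `reg` parameter are likewise hypothetical.

```python
import math

def ellipse_from_weights(points, weights, reg=0.0):
    """Fit an ellipse to weighted 2D points.

    Returns (centre, (major, minor) axis standard deviations, angle in radians).
    `reg` is added to the diagonal of the covariance (illustrative regulariser).
    """
    total = sum(weights)
    mx = sum(w * x for (x, _), w in zip(points, weights)) / total
    my = sum(w * y for (_, y), w in zip(points, weights)) / total
    sxx = sum(w * (x - mx) ** 2 for (x, _), w in zip(points, weights)) / total + reg
    syy = sum(w * (y - my) ** 2 for (_, y), w in zip(points, weights)) / total + reg
    sxy = sum(w * (x - mx) * (y - my) for (x, y), w in zip(points, weights)) / total
    # Closed-form eigenvalues of the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    root = math.hypot((sxx - syy) / 2, sxy)
    major = math.sqrt(half_trace + root)
    minor = math.sqrt(max(half_trace - root, 0.0))
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)  # orientation of the major axis
    return (mx, my), (major, minor), angle

# Three collinear pixels: without regularisation the ellipse collapses to a line.
pixels = [(0, 0), (2, 0), (4, 0)]
print(ellipse_from_weights(pixels, [1, 1, 1]))
print(ellipse_from_weights(pixels, [1, 1, 1], reg=0.25))
```

In a histogram-based tracker the weights would come from how well each pixel's colour matches the target model, so a few badly weighted pixels can distort or collapse the ellipse; a regularising term is one way to keep its shape well conditioned.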
Abstract:
The present study examines the effect of the goodness of view on the minimal exposure time required to recognize depth-rotated objects. In a previous study, Verfaillie and Boutsen (1995) derived scales of goodness of view, using a new corpus of images of depth-rotated objects. In the present experiment, a subset of this corpus (five views of 56 objects) is used to determine the recognition exposure time for each view, by increasing exposure time across successive presentations until the object is recognized. The results indicate that, for two thirds of the objects, good views are recognized more frequently and have lower recognition exposure times than bad views.
Abstract:
We present a video-based system which interactively captures the geometry of a 3D object in the form of a point cloud, then recognizes and registers known objects in this point cloud in a matter of seconds (fig. 1). In order to achieve interactive speed, we exploit both efficient inference algorithms and parallel computation, often on a GPU. The system can be broken down into two distinct phases: geometry capture, and object inference. We now discuss these in further detail. © 2011 IEEE.
Abstract:
Inhibition of return (IOR) effects, in which participants detect a target in a cued box more slowly than one in an uncued box, suggest that behavior is aided by inhibition of recently attended irrelevant locations. To investigate the controversial question of whether inhibition can be applied to object identity in these tasks, in the present research we presented faces upright or inverted during cue and/or target sequences. IOR was greater when both cue and target faces were upright than when cue and/or target faces were inverted. Because the only difference between the conditions was the ease of facial recognition, this result indicates that inhibition was applied to object identity. Interestingly, inhibition of object identity affected IOR both when encoding a cue face and retrieving information about a target face. Accordingly, we propose that episodic retrieval of inhibition associated with object identity may mediate behavior in cuing tasks.
Abstract:
This dissertation establishes a novel system for human face learning and recognition based on incremental multilinear Principal Component Analysis (PCA). Most of the existing face recognition systems need training data during the learning process. The system proposed in this dissertation utilizes an unsupervised or weakly supervised learning approach, in which the learning phase requires a minimal amount of training data. It also overcomes the inability of traditional systems to adapt to the testing phase, as the decision process for the newly acquired images continues to rely on that same old training data set. Consequently, when a new training set is to be used, the traditional approach requires that the entire eigensystem be generated again. However, as a means to speed up this computational process, the proposed method uses the eigensystem generated from the old training set together with the new images to generate the new eigensystem more effectively in a so-called incremental learning process. In the empirical evaluation phase, there are two key factors that are essential in evaluating the performance of the proposed method: (1) recognition accuracy and (2) computational complexity. In order to establish the most suitable algorithm for this research, a comparative analysis of the best performing methods was carried out first. The results of the comparative analysis supported the initial use of multilinear PCA in our research. To address the computational complexity of the subspace update procedure, a novel incremental algorithm, which combines the traditional sequential Karhunen-Loeve (SKL) algorithm with the newly developed incremental modified fast PCA algorithm, was established. In order to utilize the multilinear PCA in the incremental process, a new unfolding method was developed to affix the newly added data at the end of the previous data.
The results of the incremental process based on these two methods bear out these new theoretical improvements. Some object tracking results using video images are also provided as another challenging task to demonstrate the soundness of this incremental multilinear learning method.