841 resultados para Object based video
Resumo:
The goal of the project is to analyze, experiment, and develop intelligent, interactive and multilingual Text Mining technologies, as a key element of the next generation of search engines, systems with the capacity to find "the need behind the query". This new generation will provide specialized services and interfaces according to the search domain and type of information needed. Moreover, it will integrate textual search (websites) and multimedia search (images, audio, video), it will be able to find and organize information, rather than generating ranked lists of websites.
Resumo:
Comunicación presentada en el XI Workshop of Physical Agents, Valencia, 9-10 septiembre 2010.
Resumo:
In this work, we present a multi-camera surveillance system based on the use of self-organizing neural networks to represent events on video. The system processes several tasks in parallel using GPUs (graphic processor units). It addresses multiple vision tasks at various levels, such as segmentation, representation or characterization, analysis and monitoring of the movement. These features allow the construction of a robust representation of the environment and interpret the behavior of mobile agents in the scene. It is also necessary to integrate the vision module into a global system that operates in a complex environment by receiving images from multiple acquisition devices at video frequency. Offering relevant information to higher level systems, monitoring and making decisions in real time, it must accomplish a set of requirements, such as: time constraints, high availability, robustness, high processing speed and re-configurability. We have built a system able to represent and analyze the motion in video acquired by a multi-camera network and to process multi-source data in parallel on a multi-GPU architecture.
Resumo:
The use of RGB-D sensors for mapping and recognition tasks in robotics or, in general, for virtual reconstruction has increased in recent years. The key aspect of these kinds of sensors is that they provide both depth and color information using the same device. In this paper, we present a comparative analysis of the most important methods used in the literature for the registration of subsequent RGB-D video frames in static scenarios. The analysis begins by explaining the characteristics of the registration problem, dividing it into two representative applications: scene modeling and object reconstruction. Then, a detailed experimentation is carried out to determine the behavior of the different methods depending on the application. For both applications, we used standard datasets and a new one built for object reconstruction.
Resumo:
This paper describes a study and analysis of surface normal-base descriptors for 3D object recognition. Specifically, we evaluate the behaviour of descriptors in the recognition process using virtual models of objects created from CAD software. Later, we test them in real scenes using synthetic objects created with a 3D printer from the virtual models. In both cases, the same virtual models are used on the matching process to find similarity. The difference between both experiments is in the type of views used in the tests. Our analysis evaluates three subjects: the effectiveness of 3D descriptors depending on the viewpoint of camera, the geometry complexity of the model and the runtime used to do the recognition process and the success rate to recognize a view of object among the models saved in the database.
Resumo:
Traditional visual servoing systems do not deal with the topic of moving objects tracking. When these systems are employed to track a moving object, depending on the object velocity, visual features can go out of the image, causing the fail of the tracking task. This occurs specially when the object and the robot are both stopped and then the object starts the movement. In this work, we have employed a retina camera based on Address Event Representation (AER) in order to use events as input in the visual servoing system. The events launched by the camera indicate a pixel movement. Event visual information is processed only at the moment it occurs, reducing the response time of visual servoing systems when they are used to track moving objects.
Resumo:
Background: The pupillary light reflex characterizes the direct and consensual response of the eye to the perceived brightness of a stimulus. It has been used as indicator of both neurological and optic nerve pathologies. As with other eye reflexes, this reflex constitutes an almost instantaneous movement and is linked to activation of the same midbrain area. The latency of the pupillary light reflex is around 200 ms, although the literature also indicates that the fastest eye reflexes last 20 ms. Therefore, a system with sufficiently high spatial and temporal resolutions is required for accurate assessment. In this study, we analyzed the pupillary light reflex to determine whether any small discrepancy exists between the direct and consensual responses, and to ascertain whether any other eye reflex occurs before the pupillary light reflex. Methods: We constructed a binocular video-oculography system two high-speed cameras that simultaneously focused on both eyes. This was then employed to assess the direct and consensual responses of each eye using our own algorithm based on Circular Hough Transform to detect and track the pupil. Time parameters describing the pupillary light reflex were obtained from the radius time-variation. Eight healthy subjects (4 women, 4 men, aged 24–45) participated in this experiment. Results: Our system, which has a resolution of 15 microns and 4 ms, obtained time parameters describing the pupillary light reflex that were similar to those reported in previous studies, with no significant differences between direct and consensual reflexes. Moreover, it revealed an incomplete reflex blink and an upward eye movement at around 100 ms that may correspond to Bell’s phenomenon. Conclusions: Direct and consensual pupillary responses do not any significant temporal differences. The system and method described here could prove useful for further assessment of pupillary and blink reflexes. The resolution obtained revealed the existence reported here of an early incomplete blink and an upward eye movement.
Resumo:
Mathematical morphology addresses the problem of describing shapes in an n-dimensional space using the concepts of set theory. A series of standardized morphological operations are defined, and they are applied to the shapes to transform them using another shape called the structuring element. In an industrial environment, the process of manufacturing a piece is based on the manipulation of a primitive object via contact with a tool that transforms the object progressively to obtain the desired design. The analogy with the morphological operation of erosion is obvious. Nevertheless, few references about the relation between the morphological operations and the process of design and manufacturing can be found. The non-deterministic nature of classic mathematical morphology makes it very difficult to adapt their basic operations to the dynamics of concepts such as the ordered trajectory. A new geometric model is presented, inspired by the classic morphological paradigm, which can define objects and apply morphological operations that transform these objects. The model specializes in classic morphological operations, providing them with the determinism inherent in dynamic processes that require an order of application, as is the case for designing and manufacturing objects in professional computer-aided design and manufacturing (CAD/CAM) environments. The operators are boundary-based so that only the points in the frontier are handled. As a consequence, the process is more efficient and more suitable for use in CAD/CAM systems.
Resumo:
Plane model extraction from three-dimensional point clouds is a necessary step in many different applications such as planar object reconstruction, indoor mapping and indoor localization. Different RANdom SAmple Consensus (RANSAC)-based methods have been proposed for this purpose in recent years. In this study, we propose a novel method-based on RANSAC called Multiplane Model Estimation, which can estimate multiple plane models simultaneously from a noisy point cloud using the knowledge extracted from a scene (or an object) in order to reconstruct it accurately. This method comprises two steps: first, it clusters the data into planar faces that preserve some constraints defined by knowledge related to the object (e.g., the angles between faces); and second, the models of the planes are estimated based on these data using a novel multi-constraint RANSAC. We performed experiments in the clustering and RANSAC stages, which showed that the proposed method performed better than state-of-the-art methods.
Resumo:
In this project, we propose the implementation of a 3D object recognition system which will be optimized to operate under demanding time constraints. The system must be robust so that objects can be recognized properly in poor light conditions and cluttered scenes with significant levels of occlusion. An important requirement must be met: the system must exhibit a reasonable performance running on a low power consumption mobile GPU computing platform (NVIDIA Jetson TK1) so that it can be integrated in mobile robotics systems, ambient intelligence or ambient assisted living applications. The acquisition system is based on the use of color and depth (RGB-D) data streams provided by low-cost 3D sensors like Microsoft Kinect or PrimeSense Carmine. The range of algorithms and applications to be implemented and integrated will be quite broad, ranging from the acquisition, outlier removal or filtering of the input data and the segmentation or characterization of regions of interest in the scene to the very object recognition and pose estimation. Furthermore, in order to validate the proposed system, we will create a 3D object dataset. It will be composed by a set of 3D models, reconstructed from common household objects, as well as a handful of test scenes in which those objects appear. The scenes will be characterized by different levels of occlusion, diverse distances from the elements to the sensor and variations on the pose of the target objects. The creation of this dataset implies the additional development of 3D data acquisition and 3D object reconstruction applications. The resulting system has many possible applications, ranging from mobile robot navigation and semantic scene labeling to human-computer interaction (HCI) systems based on visual information.
Resumo:
Many applications including object reconstruction, robot guidance, and. scene mapping require the registration of multiple views from a scene to generate a complete geometric and appearance model of it. In real situations, transformations between views are unknown and it is necessary to apply expert inference to estimate them. In the last few years, the emergence of low-cost depth-sensing cameras has strengthened the research on this topic, motivating a plethora of new applications. Although they have enough resolution and accuracy for many applications, some situations may not be solved with general state-of-the-art registration methods due to the signal-to-noise ratio (SNR) and the resolution of the data provided. The problem of working with low SNR data, in general terms, may appear in any 3D system, then it is necessary to propose novel solutions in this aspect. In this paper, we propose a method, μ-MAR, able to both coarse and fine register sets of 3D points provided by low-cost depth-sensing cameras, despite it is not restricted to these sensors, into a common coordinate system. The method is able to overcome the noisy data problem by means of using a model-based solution of multiplane registration. Specifically, it iteratively registers 3D markers composed by multiple planes extracted from points of multiple views of the scene. As the markers and the object of interest are static in the scenario, the transformations obtained for the markers are applied to the object in order to reconstruct it. Experiments have been performed using synthetic and real data. The synthetic data allows a qualitative and quantitative evaluation by means of visual inspection and Hausdorff distance respectively. The real data experiments show the performance of the proposal using data acquired by a Primesense Carmine RGB-D sensor. The method has been compared to several state-of-the-art methods. The results show the good performance of the μ-MAR to register objects with high accuracy in presence of noisy data outperforming the existing methods.
Resumo:
Sensing techniques are important for solving problems of uncertainty inherent to intelligent grasping tasks. The main goal here is to present a visual sensing system based on range imaging technology for robot manipulation of non-rigid objects. Our proposal provides a suitable visual perception system of complex grasping tasks to support a robot controller when other sensor systems, such as tactile and force, are not able to obtain useful data relevant to the grasping manipulation task. In particular, a new visual approach based on RGBD data was implemented to help a robot controller carry out intelligent manipulation tasks with flexible objects. The proposed method supervises the interaction between the grasped object and the robot hand in order to avoid poor contact between the fingertips and an object when there is neither force nor pressure data. This new approach is also used to measure changes to the shape of an object’s surfaces and so allows us to find deformations caused by inappropriate pressure being applied by the hand’s fingers. Test was carried out for grasping tasks involving several flexible household objects with a multi-fingered robot hand working in real time. Our approach generates pulses from the deformation detection method and sends an event message to the robot controller when surface deformation is detected. In comparison with other methods, the obtained results reveal that our visual pipeline does not use deformations models of objects and materials, as well as the approach works well both planar and 3D household objects in real time. In addition, our method does not depend on the pose of the robot hand because the location of the reference system is computed from a recognition process of a pattern located place at the robot forearm. The presented experiments demonstrate that the proposed method accomplishes a good monitoring of grasping task with several objects and different grasping configurations in indoor environments.
Resumo:
The purpose of the current dissertation is to identify the features of effective interventions by exploring the experiences of youth with ASD who participate in such interventions, through two intervention studies (Studies 1 and 2) and one interview study (Study 3). Studies 1 and 2 were designed to support the development of social competence of youth with ASD through Structured Play with LEGO TM (Study 1, 12 youths with ASD, ages 7–12) and Minecraft TM (Study 2, 4 youths with ASD, ages 11–13). Over the course of the sessions, the play of the youth developed from parallel play (children playing alone, without interacting) to co-operative play (playing together with shared objectives). The results of Study 2 showed that rates of initiations and levels of engagement increased from the first session to the final session. In Study 3, 12 youths with ASD (ages 10–14) and at least one of their parents were interviewed to explore what children and their parents want from programs designed to improve social competence, which activities and practices were perceived to promote social competence by the participants, and which factors affected their decisions regarding these programs. The adolescents and parents looked for programs that supported social development and emotional wellbeing, but did not always have access to the programs they would have preferred, with factors such as cost and location reducing their options. Three overarching themes emerged through analysis of the three studies: (a) interests of the youth; (b) structure, both through interactions and instruction; and (c) naturalistic settings. Adolescents generally engage more willingly in interventions that incorporate their interests, such as play with Minecraft TM in Study 2. Additionally, Structured Play and structured instruction were crucial components of providing safe and supportive contexts for the development of social competence. Finally, skills learned in naturalistic settings tend to be applied more successfully in everyday situations. The themes are analysed through the lens of Vygotsky’s (1978) perspectives on learning, play, and development. Implications of the results for practitioners and researchers are discussed.
Resumo:
The human turn-taking system regulates the smooth and precise exchange of speaking turns during face-to-face interaction. Recent studies investigated the processing of ongoing turns during conversation by measuring the eye movements of noninvolved observers. The findings suggest that humans shift their gaze in anticipation to the next speaker before the start of the next turn. Moreover, there is evidence that the ability to timely detect turn transitions mainly relies on the lexico-syntactic content provided by the conversation. Consequently, patients with aphasia, who often experience deficits in both semantic and syntactic processing, might encounter difficulties to detect and timely shift their gaze at turn transitions. To test this assumption, we presented video vignettes of natural conversations to aphasic patients and healthy controls, while their eye movements were measured. The frequency and latency of event-related gaze shifts, with respect to the end of the current turn in the videos, were compared between the two groups. Our results suggest that, compared with healthy controls, aphasic patients have a reduced probability to shift their gaze at turn transitions but do not show significantly increased gaze shift latencies. In healthy controls, but not in aphasic patients, the probability to shift the gaze at turn transition was increased when the video content of the current turn had a higher lexico-syntactic complexity. Furthermore, the results from voxel-based lesion symptom mapping indicate that the association between lexico-syntactic complexity and gaze shift latency in aphasic patients is predicted by brain lesions located in the posterior branch of the left arcuate fasciculus. Higher lexico-syntactic processing demands seem to lead to a reduced gaze shift probability in aphasic patients. This finding may represent missed opportunities for patients to place their contributions during everyday conversation.
Resumo:
"Submitted to the Illinois General Assembly pursuant to section 21-1101(j) of the Illinois Public Utilities Act."