15 results for Audio-visual Speech Recognition, Visual Feature Extraction, Free-parts, Monolithic, ROI
at Universidad de Alicante
Abstract:
This thesis explores the role of multimodality in language learners’ comprehension and, more specifically, the effects on students’ audio-visual comprehension when different orchestrations of modes appear in the visualization of vodcasts. Firstly, I describe the state of the art of its three main areas of concern, namely the evolution of meaning-making, Information and Communication Technology (ICT), and audio-visual comprehension. One of the most important contributions of the theoretical overview is the proposed integrative model of audio-visual comprehension, which attempts to explain how students process information received from different inputs. Secondly, I present a study based on the following research questions: ‘Which modes are orchestrated throughout the vodcasts?’, ‘Are there any multimodal ensembles that are more beneficial for students’ audio-visual comprehension?’, and ‘What are the students’ attitudes towards audio-visual (e.g., vodcasts) compared to traditional audio (e.g., audio tracks) comprehension activities?’. Along with these research questions, I have formulated two hypotheses: audio-visual comprehension improves when a greater number of modes is orchestrated, and students have a more positive attitude towards vodcasts than towards traditional audio when carrying out comprehension activities. The study includes a multimodal discourse analysis, audio-visual comprehension tests, and student questionnaires. The multimodal discourse analysis of two British Council language-learning vodcasts, entitled English is GREAT and Camden Fashion, carried out with the multimodal annotation tool ELAN, shows that there is a variety of multimodal ensembles of two, three and four modes. The audio-visual comprehension tests were given to 40 Spanish students of English as a foreign language after they had watched the vodcasts. These tests contain questions related to specific orchestrations of modes appearing in the vodcasts. The statistical analysis of the test results, using repeated-measures ANOVA, reveals that students obtain better audio-visual comprehension results when the multimodal ensembles consist of a greater number of orchestrated modes. Finally, the data compiled from the questionnaires show that students have a more positive attitude towards vodcasts than towards traditional audio listening. The results of the audio-visual comprehension tests and questionnaires support the two hypotheses of this study.
Abstract:
Paper submitted to the 39th International Symposium on Robotics ISR 2008, Seoul, South Korea, October 15-17, 2008.
Abstract:
In many classification problems, it is necessary to consider the specific location in an n-dimensional space from which features have been calculated. For example, considering the location of features extracted from specific areas of a two-dimensional space, such as an image, could improve the understanding of a scene for a video surveillance system. In the same way, the same features extracted from different locations could mean different actions for a 3D HCI system. In this paper, we present a self-organizing feature map able to preserve the topology of the locations in an n-dimensional space from which the feature vectors have been extracted. The main contribution is the implicit preservation of the topology of the original space, since considering the locations of the extracted features and their topology can ease the solution of certain problems. Specifically, the paper proposes the n-dimensional constrained self-organizing map preserving the input topology (nD-SOM-PINT). Features in adjacent areas of the n-dimensional space used to extract the feature vectors lie explicitly in adjacent areas of the nD-SOM-PINT, constraining the neural network structure and learning. As a case study, the neural network has been instantiated to represent and classify features, such as trajectories extracted from a sequence of images, at a high level of semantic understanding. Experiments have been thoroughly carried out on the CAVIAR datasets (Corridor, Frontal and Inria), taking into account the global behaviour of an individual, in order to validate the ability to preserve the topology of the two-dimensional space and obtain high-performance trajectory classification, in contrast to ignoring the location of features. Moreover, a brief example has been included to validate the nD-SOM-PINT proposal in a domain other than individual trajectories. Results confirm the high accuracy of the nD-SOM-PINT, which outperforms previous methods aimed at classifying the same datasets.
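The location-constrained map itself is specific to the paper; as a rough illustration of the underlying mechanism by which a standard self-organizing map preserves input topology through its grid neighbourhood, the following minimal NumPy sketch may help (all function and parameter names are hypothetical, not from the paper):

```python
import numpy as np

def train_som(data, grid_w=8, grid_h=8, epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Train a small 2D self-organizing map on `data` (N x d)."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    weights = rng.random((grid_h, grid_w, d))
    # Grid coordinates of every unit, used by the neighbourhood function.
    gy, gx = np.mgrid[0:grid_h, 0:grid_w]
    coords = np.stack([gy, gx], axis=-1).astype(float)
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)
        sigma = sigma0 * (1 - epoch / epochs) + 1e-3
        for x in data[rng.permutation(n)]:
            # Best-matching unit: the unit whose weight vector is closest to x.
            dists = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            # The Gaussian neighbourhood on the grid pulls units adjacent to
            # the BMU towards x as well, which is what keeps adjacent inputs
            # mapped to adjacent units, i.e. preserves topology.
            gdist2 = ((coords - np.array(bmu)) ** 2).sum(axis=-1)
            h = np.exp(-gdist2 / (2 * sigma ** 2))[..., None]
            weights += lr * h * (x - weights)
    return weights
```

The nD-SOM-PINT additionally constrains which units may win for features extracted from a given region of the input space; that constraint is not reproduced here.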
Abstract:
Paper presented at the X Workshop of Physical Agents, Cáceres, 10-11 September 2009.
Abstract:
Traditional visual servoing systems have been widely studied in recent years. These systems control the position of the camera attached to the robot end-effector, guiding it from any position to the desired one. These controllers can be improved by using the event-based control paradigm. The system proposed in this paper is based on the idea of activating the visual controller only when something significant has occurred in the system (e.g. when a visual feature could be lost because it is about to leave the frame). Different event triggers have been defined in the image space in order to activate or deactivate the visual controller. The tests implemented to validate the proposal have shown that this new scheme prevents visual features from leaving the image while considerably reducing system complexity. In the future, events can be used to change different parameters of the visual servoing system.
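The paper's own trigger definitions are not reproduced in the abstract; as a minimal sketch of one plausible image-space trigger of this kind (a feature approaching the image border, so the controller should be activated before it is lost), with all names and the margin value being assumptions:

```python
import numpy as np

def border_event(features_px, width, height, margin=30):
    """Hypothetical event trigger for event-based visual servoing.

    features_px: (N, 2) pixel coordinates of the tracked visual features.
    Returns True if any feature lies within `margin` pixels of the image
    border, i.e. it risks leaving the frame and the controller should run.
    """
    f = np.asarray(features_px, dtype=float)
    near_left_or_top = (f < margin).any()
    near_right = (f[:, 0] > width - margin).any()
    near_bottom = (f[:, 1] > height - margin).any()
    return bool(near_left_or_top or near_right or near_bottom)
```

Between events, the visual controller stays inactive, which is where the reduction in system complexity comes from.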
Abstract:
Cognitive theories have shown that human thought is embodied; that is, we access reality through our senses and cannot escape them. To understand and handle abstract concepts, we use metaphorical projections based on bodily sensations, hence the ubiquity of metaphor in everyday language. Although this claim has been extensively tested through the analysis of verbal corpora in different languages, there is hardly any research on audiovisual corpora. If primary metaphors form part of our cognitive unconscious, are inherent to human beings and are a consequence of the nature of the brain, they should also generate visual metaphors. In this article, a series of examples is analysed and discussed to test this.
Abstract:
3D sensors provide valuable information for mobile robotic tasks such as scene classification or object recognition, but these sensors often produce noisy data that makes it impossible to apply classical keypoint detection and feature extraction techniques. Therefore, noise removal and downsampling have become essential steps in 3D data processing. In this work, we propose the use of a 3D filtering and downsampling technique based on a Growing Neural Gas (GNG) network. The GNG method is able to deal with outliers present in the input data and to represent 3D spaces, obtaining an induced Delaunay triangulation of the input space. Experiments show how state-of-the-art keypoint detectors improve their performance when the GNG output representation is used as input data. Descriptors extracted at the improved keypoints achieve better matching in robotics applications such as 3D scene registration.
Abstract:
Feature vectors can be anything from simple surface normals to more complex feature descriptors. Feature extraction is important for solving various computer vision problems, e.g. registration, object recognition and scene understanding. Most of these techniques cannot be computed online due to their complexity and the context in which they are applied; therefore, computing these features in real time for many points in the scene is infeasible. In this work, a hardware-based implementation of 3D feature extraction and 3D object recognition is proposed to accelerate these methods and therefore the entire pipeline of RGBD-based computer vision systems where such features are typically used. The use of a GPU as a general-purpose processor can achieve considerable speed-ups compared with a CPU implementation. In this work, advantageous results are obtained using the GPU to accelerate the computation of a 3D descriptor based on the calculation of 3D semi-local surface patches of partial views. This allows descriptor computation at several points of a scene in real time. The benefits of the accelerated descriptor have been demonstrated in object recognition tasks. Source code will be made publicly available as a contribution to the open-source Point Cloud Library.
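The paper's specific descriptor is not detailed in the abstract; what makes such features GPU-friendly is that each point's computation is independent of the others. A CPU sketch of that data-parallel structure, using batched PCA normal estimation as the simplest example of a per-point surface feature (names and the neighbourhood scheme are illustrative assumptions, not the paper's method):

```python
import numpy as np

def batched_pca_normals(points, neighbors_idx):
    """Estimate a surface normal at every point from its neighbours at once.

    Each per-point PCA is independent, which is exactly the structure a GPU
    kernel would exploit; here NumPy's batched eigendecomposition plays that
    role. points: (N, 3); neighbors_idx: (N, k) indices into points.
    """
    nbrs = points[neighbors_idx]                          # (N, k, 3)
    centered = nbrs - nbrs.mean(axis=1, keepdims=True)
    cov = np.einsum('nki,nkj->nij', centered, centered)   # (N, 3, 3) covariances
    eigvals, eigvecs = np.linalg.eigh(cov)                # batched over N
    return eigvecs[:, :, 0]   # eigenvector of the smallest eigenvalue = normal
```

On a GPU, the same batched formulation maps one point (or one covariance) to one thread block, which is where the reported speed-ups over a sequential CPU loop come from.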
Abstract:
From a gender perspective, political actions on work-family protection and awareness should promote the sharing of responsibilities between the sexes. Alongside political action and specific measures, the project of equal opportunities needs a long-term strategy based on education for equality. This article proposes the methodological outline of a study based on these premises. It presents and explains the protocol used for the analysis of the audio-visual advertising campaigns on work-life balance broadcast by the Woman’s Institute. The evaluation of the actions focuses on their effectiveness from the point of view of the mass media. It provides some data that illustrate the proposed study. Finally, it considers the limitations of the available sources of information.
Abstract:
Paper presented at the XI Workshop of Physical Agents, Valencia, 9-10 September 2010.
Abstract:
Image-Based Visual Servoing (IBVS) is a vision-based robotic control scheme. This scheme uses only the visual information obtained from a camera to guide a robot from any pose to a desired one. However, IBVS requires the estimation of different parameters that cannot be obtained directly from the image. These parameters range from the intrinsic camera parameters (which can be obtained from a previous camera calibration) to the distance along the optical axis between the camera and the visual features, i.e. the depth. This paper presents a comparative study of the performance of IBVS when the depth is estimated in three different ways using a low-cost RGB-D sensor such as the Kinect. The visual servoing system has been developed on ROS (Robot Operating System), a meta-operating system for robots. The experiments show that computing the depth value for each visual feature improves the system performance.
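The classic IBVS control law makes the role of the per-feature depth Z explicit: Z appears in the interaction matrix, so a better depth estimate yields a better velocity command. A minimal sketch of that textbook formulation (function names and the gain value are illustrative; this is not the paper's implementation):

```python
import numpy as np

def interaction_matrix(x, y, Z):
    """Classic interaction matrix for one normalized image point (x, y) at
    depth Z; it maps camera velocity (vx, vy, vz, wx, wy, wz) to image motion.
    Note Z appears only in the translational columns."""
    return np.array([
        [-1 / Z, 0, x / Z, x * y, -(1 + x * x), y],
        [0, -1 / Z, y / Z, 1 + y * y, -x * y, -x],
    ])

def ibvs_velocity(features, desired, depths, lam=0.5):
    """Camera velocity command v = -lam * L^+ (s - s*).

    `depths` holds one Z per feature, e.g. read per pixel from an RGB-D
    sensor such as the Kinect instead of being assumed constant.
    """
    L = np.vstack([interaction_matrix(x, y, Z)
                   for (x, y), Z in zip(features, desired and depths)])
    err = (np.asarray(features) - np.asarray(desired)).ravel()
    return -lam * np.linalg.pinv(L) @ err
```

With the error s - s* at zero the commanded velocity is zero, and errors in Z distort mainly the translational part of the command, which is why the depth-estimation strategy matters.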
Abstract:
Several recent works deal with 3D data in mobile robotic problems, e.g. mapping or egomotion. The data come from sensors such as stereo vision systems, time-of-flight cameras or 3D lasers, which provide a huge amount of unorganized 3D data. In this paper, we describe an efficient method to build complete 3D models using a Growing Neural Gas (GNG). The GNG is applied to the raw 3D data and reduces both the underlying error and the number of points while keeping the topology of the 3D data. The GNG output is then used in a 3D feature extraction method. We have performed an in-depth study that quantitatively shows that the use of GNG improves the 3D feature extraction method. We also show that our method can be applied to any kind of 3D data. The 3D features obtained are used as input to an Iterative Closest Point (ICP)-like method to compute the 6DoF movement performed by a mobile robot. A comparison with standard ICP shows that the use of GNG improves the results. Final results of 3D mapping from the computed egomotion are also shown.
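For reference, the standard point-to-point ICP that the paper compares against alternates nearest-neighbour matching with a closed-form (Kabsch/SVD) rigid-transform fit. A minimal brute-force sketch (names are illustrative; real implementations use k-d trees and outlier rejection):

```python
import numpy as np

def best_rigid_transform(A, B):
    """Least-squares R, t with R @ A_i + t ~= B_i (Kabsch/SVD),
    for paired point sets A, B of shape (N, 3)."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cb - R @ ca

def icp(src, dst, iters=20):
    """Brute-force point-to-point ICP aligning `src` onto `dst`;
    returns the accumulated rotation and translation."""
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        # Match every current point to its nearest neighbour in dst.
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        matched = dst[d2.argmin(axis=1)]
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

Running such a scheme on GNG nodes instead of raw points means far fewer, less noisy correspondences per iteration, which is the intuition behind the reported improvement.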
Abstract:
Several recent works deal with 3D data in mobile robotic problems, e.g. mapping. The data come from sensors (time-of-flight cameras, the Kinect or 3D lasers) that provide a huge amount of unorganized 3D data. In this paper we detail an efficient approach to build complete 3D models using a soft computing method, the Growing Neural Gas (GNG). As neural models deal easily with noise, imprecision, uncertainty and partial data, the GNG provides better results than other approaches. The GNG obtained is then applied to a sequence. We present a comprehensive study of the GNG parameters to ensure the best result at the lowest time cost. From this GNG structure, we propose to calculate planar patches, thus obtaining a fast method to compute the movement performed by a mobile robot by means of a 3D model registration algorithm. Final results of 3D mapping are also shown.
Abstract:
New low-cost sensors and open, free libraries for 3D image processing are making important advances in robot vision applications possible, such as three-dimensional object recognition, semantic mapping, navigation and localization of robots, and human detection and/or gesture recognition for human-machine interaction. In this paper, a novel method for recognizing and tracking the fingers of a human hand is presented. The method is based on point clouds from range images captured by an RGBD sensor. It works in real time and does not require visual markers, camera calibration or previous knowledge of the environment. Moreover, it works successfully even when multiple objects appear in the scene or the ambient light changes. Furthermore, the method was designed to serve as a human interface for remotely controlling domestic or industrial devices. In this paper, the method was tested by operating a robotic hand. Firstly, the human hand was recognized and the fingers were detected. Secondly, the movement of the fingers was analysed and mapped so as to be imitated by a robotic hand.
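The paper's detection pipeline is not spelled out in the abstract; a common baseline for this kind of marker-free pipeline is to segment the hand by depth and take the points farthest from the hand centroid as fingertip candidates. A toy sketch under that assumption (all names, thresholds and the heuristic itself are illustrative, not the paper's method):

```python
import numpy as np

def fingertip_candidates(cloud, max_depth=0.8, k=5):
    """Toy fingertip detector on a point cloud (N x 3, metres, camera frame).

    Segment the near points (assumed to be the hand, closer than `max_depth`
    to the sensor), then return the k points farthest from the hand centroid
    as fingertip candidates.
    """
    hand = cloud[cloud[:, 2] < max_depth]      # depth-based hand segmentation
    centroid = hand.mean(axis=0)
    dist = np.linalg.norm(hand - centroid, axis=1)
    return hand[np.argsort(dist)[-k:]]         # extremal points = candidates
```

A real system would add clustering to isolate the hand from other near objects and track the candidates over time before mapping finger motion onto the robotic hand.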
Abstract:
During grasping and intelligent robotic manipulation tasks, the camera position relative to the scene changes dramatically because the robot moves to adapt its path and correctly grasp objects. This is because the camera is mounted at the robot effector. For this reason, in this type of environment, a visual recognition system must be implemented that recognizes and obtains, automatically and autonomously, the positions of objects in the scene. Furthermore, in industrial environments, all objects manipulated by robots are made of the same material and cannot be differentiated by features such as texture or color. In this work, first, a study and analysis of 3D recognition descriptors has been carried out for application in these environments. Second, a visual recognition system built on a specific distributed client-server architecture is proposed for the recognition of industrial objects that lack these appearance features. Our system has been implemented to overcome recognition problems when objects can only be recognized by their geometric shape and the simplicity of the shapes could create ambiguity. Finally, some real tests are performed and illustrated to verify the satisfactory performance of the proposed system.