49 resultados para hand-drawn visual language recognition

em Universidad Politécnica de Madrid


Relevância:

100.00% 100.00%

Publicador:

Resumo:

A new language recognition technique based on the application of the philosophy of the Shifted Delta Coefficients (SDC) to phone log-likelihood ratio features (PLLR) is described. The new methodology allows the incorporation of long-span phonetic information at a frame-by-frame level while dealing with the temporal length of each phone unit. The proposed features are used to train an i-vector based system and tested on the Albayzin LRE 2012 dataset. The results show a relative improvement of 33.3% in Cavg in comparison with different state-of-the-art acoustic i-vector based systems. On the other hand, the integration of parallel phone ASR systems where each one is used to generate multiple PLLR coefficients which are stacked together and then projected into a reduced dimension are also presented. Finally, the paper shows how the incorporation of state information from the phone ASR contributes to provide additional improvements and how the fusion with the other acoustic and phonotactic systems provides an important improvement of 25.8% over the system presented during the competition.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes a novel approach to phonotactic LID, where instead of using soft-counts based on phoneme lattices, we use posteriogram to obtain n-gram counts. The high-dimensional vectors of counts are reduced to low-dimensional units for which we adapted the commonly used term i-vectors. The reduction is based on multinomial subspace modeling and is designed to work in the total-variability space. The proposed technique was tested on the NIST 2009 LRE set with better results to a system based on using soft-counts (Cavg on 30s: 3.15% vs 3.43%), and with very good results when fused with an acoustic i-vector LID system (Cavg on 30s acoustic 2.4% vs 1.25%). The proposed technique is also compared with another low dimensional projection system based on PCA. In comparison with the original soft-counts, the proposed technique provides better results, reduces the problems due to sparse counts, and avoids the process of using pruning techniques when creating the lattices.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a description of our system for the Albayzin 2012 LRE competition. One of the main characteristics of this evaluation was the reduced number of available files for training the system, especially for the empty condition where no training data set was provided but only a development set. In addition, the whole database was created from online videos and around one third of the training data was labeled as noisy files. Our primary system was the fusion of three different i-vector based systems: one acoustic system based on MFCCs, a phonotactic system using trigrams of phone-posteriorgram counts, and another acoustic system based on RPLPs that improved robustness against noise. A contrastive system that included new features based on the glottal source was also presented. Official and postevaluation results for all the conditions using the proposed metrics for the evaluation and the Cavg metric are presented in the paper.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents new techniques with relevant improvements added to the primary system presented by our group to the Albayzin 2012 LRE competition, where the use of any additional corpora for training or optimizing the models was forbidden. In this work, we present the incorporation of an additional phonotactic subsystem based on the use of phone log-likelihood ratio features (PLLR) extracted from different phonotactic recognizers that contributes to improve the accuracy of the system in a 21.4% in terms of Cavg (we also present results for the official metric during the evaluation, Fact). We will present how using these features at the phone state level provides significant improvements, when used together with dimensionality reduction techniques, especially PCA. We have also experimented with applying alternative SDC-like configurations on these PLLR features with additional improvements. Also, we will describe some modifications to the MFCC-based acoustic i-vector system which have also contributed to additional improvements. The final fused system outperformed the baseline in 27.4% in Cavg.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A more natural, intuitive, user-friendly, and less intrusive Human–Computer interface for controlling an application by executing hand gestures is presented. For this purpose, a robust vision-based hand-gesture recognition system has been developed, and a new database has been created to test it. The system is divided into three stages: detection, tracking, and recognition. The detection stage searches in every frame of a video sequence potential hand poses using a binary Support Vector Machine classifier and Local Binary Patterns as feature vectors. These detections are employed as input of a tracker to generate a spatio-temporal trajectory of hand poses. Finally, the recognition stage segments a spatio-temporal volume of data using the obtained trajectories, and compute a video descriptor called Volumetric Spatiograms of Local Binary Patterns (VS-LBP), which is delivered to a bank of SVM classifiers to perform the gesture recognition. The VS-LBP is a novel video descriptor that constitutes one of the most important contributions of the paper, which is able to provide much richer spatio-temporal information than other existing approaches in the state of the art with a manageable computational cost. Excellent results have been obtained outperforming other approaches of the state of the art.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes the language identification (LID) system developed by the Patrol team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state of the art detection capabilities on audio from highly degraded communication channels. We show that techniques originally developed for LID on telephone speech (e.g., for the NIST language recognition evaluations) remain effective on the noisy RATS data, provided that careful consideration is applied when designing the training and development sets. In addition, we show significant improvements from the use of Wiener filtering, neural network based and language dependent i-vector modeling, and fusion.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Se plantea desarrollar una herramienta que ofrezca un soporte eficiente para la creación y el diseño de protocolos biológicos a los investigadores en biología sintética. Partiendo de este objetivo, se definen dos cometidos principales: Realizar un estudio de las herramientas existentes que ofrezcan soporte al diseño y aquellas pensadas para diseñar protocolos biológicos, el fin de este estudio es descubrir las funcionalidades que implementan estas herramientas para mejorarlas. Además, se ha de desarrollar una herramienta web que, mediante un lenguaje visual, permita diseñar y crear protocolos de biología sintética, guardándolos en un formato de archivo independiente del lenguaje. En este documento se encuentra, en primer lugar, la definición de objetivos y la descripción del método de desarrollo seguido durante la implementación del proyecto; después, el marco teórico, donde se exponen las herramientas estudiadas y las similitudes y diferencias con la idea que se tiene de la aplicación, y también las herramientas de desarrollo web con las que se va a implementar el proyecto. A continuación, se muestran los resultados obtenidos, mediante la definición de requisitos, así como una exposición de la propia herramienta. Por último, se encuentra la estrategia de validación que se ha seguido en el desarrollo del proyecto y se exponen las conclusiones obtenidas de estas validaciones; también se incluyen al final las conclusiones del proyecto y las líneas futuras de desarrollo.---ABSTRACT---It is planned to develop a tool that provides efficient support for the creation and design of biological protocols researchers in synthetic biology. Based on this goal, two main tasks are defined: Conduct a study of existing tools that provide design support and those intended to design biological protocols, the purpose of this study is to discover the functionalities that implement these tools to improve them. Furthermore, it has to develop a web tool that, through a visual language, allowing design and create synthetic biology protocols, storing them in an independent language file format. In this document is located, first, the definition of objectives and description of the development method followed during project implementation; then the theoretical framework where tools and studied the similarities and differences with the idea we have of the application are discussed, and development tools with which they will implement the project. Then the results obtained, by defining requirements as well as an exhibition of the own tool. Finally, the validation strategy that has been followed in the development of the project and the conclusions drawn from these validations are exposed; also, it is included at the end of the project conclusions and future lines of development.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La Gestión de Recursos Humanos a través de Internet es un problema latente y presente actualmente en cualquier sitio web dedicado a la búsqueda de empleo. Este problema también está presente en AFRICA BUILD Portal. AFRICA BUILD Portal es una emergente red socio-profesional nacida con el ánimo de crear comunidades virtuales que fomenten la educación e investigación en el área de la salud en países africanos. Uno de los métodos para fomentar la educación e investigación es mediante la movilidad de estudiantes e investigadores entre instituciones, apareciendo así, el citado problema de la gestión de recursos humanos. Por tanto, este trabajo se centra en solventar el problema de la gestión de recursos humanos en el entorno específico de AFRICA BUILD Portal. Para solventar este problema, el objetivo es desarrollar un sistema de recomendación que ayude en la gestión de recursos humanos en lo que concierne a la selección de las mejores ofertas y demandas de movilidad. Caracterizando al sistema de recomendación como un sistema semántico el cual ofrecerá las recomendaciones basándose en las reglas y restricciones impuestas por el dominio. La aproximación propuesta se basa en seguir el enfoque de los sistemas de Matchmaking semánticos. Siguiendo este enfoque, por un lado, se ha empleado un razonador de lógica descriptiva que ofrece inferencias útiles en el cálculo de las recomendaciones y por otro lado, herramientas de procesamiento de lenguaje natural para dar soporte al proceso de recomendación. Finalmente para la integración del sistema de recomendación con AFRICA BUILD Portal se han empleado diversas tecnologías web. Los resultados del sistema basados en la comparación de recomendaciones creadas por el sistema y por usuarios reales han mostrado un funcionamiento y rendimiento aceptable. Empleando medidas de evaluación de sistemas de recuperación de información se ha obtenido una precisión media del sistema de un 52%, cifra satisfactoria tratándose de un sistema semántico. Pudiendo concluir que con la solución implementada se ha construido un sistema estable y modular posibilitando: por un lado, una fácil evolución que debería ir encaminada a lograr un rendimiento mayor, incrementando su precisión y por otro lado, dejando abiertas nuevas vías de crecimiento orientadas a la explotación del potencial de AFRICA BUILD Portal mediante la Web 3.0. ---ABSTRACT---The Human Resource Management through Internet is currently a latent problem shown in any employment website. This problem has also appeared in AFRICA BUILD Portal. AFRICA BUILD Portal is an emerging socio-professional network with the objective of creating virtual communities to foster the capacity for health research and education in African countries. One way to foster this capacity of research and education is through the mobility of students and researches between institutions, thus appearing the Human Resource Management problem. Therefore, this dissertation focuses on solving the Human Resource Management problem in the specific environment of AFRICA BUILD Portal. To solve this problem, the objective is to develop a recommender system which assists the management of Human Resources with respect to the selection of the best mobility supplies and demands. The recommender system is a semantic system which will provide the recommendations according to the domain rules and restrictions. The proposed approach is based on semantic matchmaking solutions. So, this approach on the one hand uses a Description Logics reasoning engine which provides useful inferences to the recommendation process and on the other hand uses Natural Language Processing techniques to support the recommendation process. Finally, Web technologies are used in order to integrate the recommendation system into AFRICA BUILD Portal. The results of evaluating the system are based on the comparison between recommendations created by the system and by real users. These results have shown an acceptable behavior and performance. The average precision of the system has been obtained by evaluation measures for information retrieval systems, so the average precision of the system is at 52% which may be considered as a satisfactory result taking into account that the system is a semantic system. To conclude, it could be stated that the implemented system is stable and modular. This fact on the one hand allows an easy evolution that should aim to achieve a higher performance by increasing its average precision and on the other hand keeps open new ways to increase the functionality of the system oriented to exploit the potential of AFRICA BUILD Portal through Web 3.0.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La principal aportación de esta tesis doctoral ha sido la propuesta y evaluación de un sistema de traducción automática que permite la comunicación entre personas oyentes y sordas. Este sistema está formado a su vez por dos sistemas: un traductor de habla en español a Lengua de Signos Española (LSE) escrita y que posteriormente se representa mediante un agente animado; y un generador de habla en español a partir de una secuencia de signos escritos mediante glosas. El primero de ellos consta de un reconocedor de habla, un módulo de traducción entre lenguas y un agente animado que representa los signos en LSE. El segundo sistema está formado por una interfaz gráfica donde se puede especificar una secuencia de signos mediante glosas (palabras en mayúscula que representan los signos), un módulo de traducción entre lenguas y un conversor texto-habla. Para el desarrollo del sistema de traducción, en primer lugar se ha generado un corpus paralelo de 7696 frases en español con sus correspondientes traducciones a LSE. Estas frases pertenecen a cuatro dominios de aplicación distintos: la renovación del Documento Nacional de Identidad, la renovación del permiso de conducir, un servicio de información de autobuses urbanos y la recepción de un hotel. Además, se ha generado una base de datos con más de 1000 signos almacenados en cuatro sistemas distintos de signo-escritura. En segundo lugar, se ha desarrollado un módulo de traducción automática que integra dos técnicas de traducción con una estructura jerárquica: la primera basada en memoria y la segunda estadística. Además, se ha implementado un módulo de pre-procesamiento de las frases en español que, mediante su incorporación al módulo de traducción estadística, permite mejorar significativamente la tasa de traducción. En esta tesis también se ha mejorado la versión de la interfaz de traducción de LSE a habla. Por un lado, se han incorporado nuevas características que mejoran su usabilidad y, por otro, se ha integrado un traductor de lenguaje SMS (Short Message Service – Servicio de Mensajes Cortos) a español, que permite especificar la secuencia a traducir en lenguaje SMS, además de mediante una secuencia de glosas. El sistema de traducción propuesto se ha evaluado con usuarios reales en dos dominios de aplicación: un servicio de información de autobuses de la Empresa Municipal de Transportes de Madrid y la recepción del Hotel Intur Palacio San Martín de Madrid. En la evaluación estuvieron implicadas personas sordas y empleados de los dos servicios. Se extrajeron medidas objetivas (obtenidas por el sistema automáticamente) y subjetivas (mediante cuestionarios a los usuarios). Los resultados fueron muy positivos gracias a la opinión de los usuarios de la evaluación, que validaron el funcionamiento del sistema de traducción y dieron información valiosa para futuras líneas de trabajo. Por otro lado, tras la integración de cada uno de los módulos de los dos sistemas de traducción (habla-LSE y LSE-habla), los resultados de la evaluación y la experiencia adquirida en todo el proceso, una aportación importante de esta tesis doctoral es la propuesta de metodología de desarrollo de sistemas de traducción de habla a lengua de signos en los dos sentidos de la comunicación. En esta metodología se detallan los pasos a seguir para desarrollar el sistema de traducción para un nuevo dominio de aplicación. Además, la metodología describe cómo diseñar cada uno de los módulos del sistema para mejorar su flexibilidad, de manera que resulte más sencillo adaptar el sistema desarrollado a un nuevo dominio de aplicación. Finalmente, en esta tesis se analizan algunas técnicas para seleccionar las frases de un corpus paralelo fuera de dominio para entrenar el modelo de traducción cuando se quieren traducir frases de un nuevo dominio de aplicación; así como técnicas para seleccionar qué frases del nuevo dominio resultan más interesantes que traduzcan los expertos en LSE para entrenar el modelo de traducción. El objetivo es conseguir una buena tasa de traducción con la menor cantidad posible de frases. ABSTRACT The main contribution of this thesis has been the proposal and evaluation of an automatic translation system for improving the communication between hearing and deaf people. This system is made up of two systems: a Spanish into Spanish Sign Language (LSE – Lengua de Signos Española) translator and a Spanish generator from LSE sign sequences. The first one consists of a speech recognizer, a language translation module and an avatar that represents the sign sequence. The second one is made up an interface for specifying the sign sequence, a language translation module and a text-to-speech conversor. For the translation system development, firstly, a parallel corpus has been generated with 7,696 Spanish sentences and their LSE translations. These sentences are related to four different application domains: the renewal of the Identity Document, the renewal of the driver license, a bus information service and a hotel reception. Moreover, a sign database has been generated with more than 1,000 signs described in four different signwriting systems. Secondly, it has been developed an automatic translation module that integrates two translation techniques in a hierarchical structure: the first one is a memory-based technique and the second one is statistical. Furthermore, a pre processing module for the Spanish sentences has been implemented. By incorporating this pre processing module into the statistical translation module, the accuracy of the translation module improves significantly. In this thesis, the LSE into speech translation interface has been improved. On the one hand, new characteristics that improve its usability have been incorporated and, on the other hand, a SMS language into Spanish translator has been integrated, that lets specifying in SMS language the sequence to translate, besides by specifying a sign sequence. The proposed translation system has been evaluated in two application domains: a bus information service of the Empresa Municipal de Transportes of Madrid and the Hotel Intur Palacio San Martín reception. This evaluation has involved both deaf people and services employees. Objective measurements (given automatically by the system) and subjective measurements (given by user questionnaires) were extracted during the evaluation. Results have been very positive, thanks to the user opinions during the evaluation that validated the system performance and gave important information for future work. Finally, after the integration of each module of the two translation systems (speech- LSE and LSE-speech), obtaining the evaluation results and considering the experience throughout the process, a methodology for developing speech into sign language (and vice versa) into a new domain has been proposed in this thesis. This methodology includes the steps to follow for developing the translation system in a new application domain. Moreover, this methodology proposes the way to improve the flexibility of each system module, so that the adaptation of the system to a new application domain can be easier. On the other hand, some techniques are analyzed for selecting the out-of-domain parallel corpus sentences in order to train the translation module in a new domain; as well as techniques for selecting which in-domain sentences are more interesting for translating them (by LSE experts) in order to train the translation model.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La zona de Madrid al Este del Retiro ha estado indefectiblemente condicionada en su tardío desarrollo urbano por su posición a espaldas del Real Sitio. La construcción hacia 1640 de las tapias que rodeaban los reales jardines transformó la red de caminos que partían hacia oriente, aisló los terrenos ubicados más al Este de la ciudad, con la que ya sólo se podrían comunicar por las carreteras de Aragón y Valencia, y condenó las expectativas de desarrollo urbano reduciendo los precios de las propiedades, lo cual determinó durante décadas los usos y la arquitectura de la zona. El Anteproyecto de Ensanche de Carlos María de Castro constituye el germen a partir del cual, durante un lento proceso de casi cien años, fue configurándose la ciudad que hoy conocemos. La identificación en el Archivo de Villa del primer plano general del Ensanche trazado por Castro, del cual anteriores trabajos advirtieron de su existencia aunque se desconocía su localización, es la principal aportación de esta investigación. Por un lado, este primer plano general del Ensanche manuscrito es, por sí mismo, un documento de indudable importancia en la historia del urbanismo madrileño. En segundo lugar, el análisis de su contenido arroja nueva luz sobre la propuesta original de Castro, parcialmente censurada por la Dirección General de Obras Públicas antes de la aprobación del plan en 1860. Especialmente en lo referente a la zona de Madrid al Este del Retiro, proyectada como barrio obrero del Ensanche, este documento ha aportado un enfoque desconocido hasta ahora sobre el paisaje urbano concebido por Castro para la más ambiciosa propuesta planteada en mucho tiempo al problema de la vivienda obrera. Finalmente, el análisis de la factura del plano revela la superposición de varias capas de dibujo, evidenciando que durante un tiempo fue un documento vivo, utilizado como plano de trabajo por el equipo de Castro durante aproximadamente diez años, hasta la destitución del ingeniero en 1868. Posteriores análisis del plano sobre otros ámbitos de la ciudad arrojarán sin duda nuevos datos sobre el proceso proyectual del conjunto del Ensanche. Pero la dinámica de lo real, sintetizable en múltiples factores de índole social, económica y legislativa, transformó durante las primeras décadas de andadura del Ensanche la ciudad proyectada por Castro al Este del Retiro. El dibujo de la ciudad, entendido como herramienta de análisis y empleado con éxito en trabajos de investigación realizados por otros autores en la misma línea, ha permitido deducir la reconstitución gráfica del estado de la ciudad en diferentes momentos singulares del desarrollo urbanístico de la zona, así como de la propuesta original de barrio obrero de Castro. No hay que olvidar que, a pesar del escaso interés que suscitaba entre los inversores inmobiliarios el ámbito geográfico de estudio de esta tesis, fue objeto, durante casi un siglo, de numerosas propuestas de ordenación y urbanización que, aunque no llegaron a materializarse, fueron configurando una suerte de desarrollo virtual de la ciudad paralelo al devenir de la realidad. De esta forma, el dibujo se constituye en esta tesis como fuente de información, herramienta de pensamiento y resultado de la investigación en sí mismo, ilustrando y contribuyendo al mejor conocimiento de la forma urbana. ABSTRACT The area of Madrid to the East of the Retiro has been inevitably conditioned in its late urban development by its position behind the Royal Site. The construction of the walls surrounding the royal gardens around 1640 transformed the network of roads departing eastward, isolated land located to the East of the city, with which already only could communicate by roads of Aragon and Valencia, condemned the expectations of urban development by reducing the prices of the properties, and determined for decades uses and architecture of the area. The Carlos María de Castro preliminary design of City Expansion is the germ from which, during a slow process of almost one hundred years, the city which we know today was setting up. The discovery in the City Archive of the City Expansion first drawing traced by Castro, which previous investigations warned of its existence although its location was unknown, is the main contribution of this research. Firstly, this hand drawn general plan of the city expansion is by itself a document of undoubted importance in the history of Madrid urbanism. Secondly, the analysis of its content sheds new light on Castro´s original proposal, partially censored by the Dirección General de Obras Públicas before the approval of the plan in 1860. Especially concerning the area of Madrid to the East of the Retiro, projected as a workingclass district of the City Expansion, this document has provided an unknown up to now approach on the urban landscape designed by Castro for the more ambitious proposal put forward in a long time to the problem of worker housing. Finally, analysis of hand drawn plan reveals the superposition of several layers of drawing, demonstrating that for a time it was a living document, used as a work plan by the Castro team for approximately ten years, until the dismissal of the engineer in 1868. Subsequent analysis of the drawing on other areas of the city will have no doubt new data on the design process of the whole City Expansion. But the dynamics of reality, synthesizable on multiple factors in social, economic and legislative, transformed during the first decades of existence of the City Expansion designed by Castro to the East of the Retiro. Drawing of the city, understood as a tool of analysis and used successfully in research works done by other authors on the same line, has allowed to deduct graphic reconstitution of the city status in different and singular moments in the urban development of the area, as well as the original Castro´s proposal of working-class district. It should not be forgotten that, despite the lack of interest which raised among investors the geographic scope of this thesis study, it was the object, for nearly a century, of numerous proposals for urbanization which, although they didn´t materialize, were setting up a sort of virtual development of the city parallel to the becoming of the reality. In this way, drawing is used in the thesis as a source of information, tool of thought and outcome of the research itself, illustrating and contributing to a better understanding of urban form.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

The aim of this Master Thesis is the analysis, design and development of a robust and reliable Human-Computer Interaction interface, based on visual hand-gesture recognition. The implementation of the required functions is oriented to the simulation of a classical hardware interaction device: the mouse, by recognizing a specific hand-gesture vocabulary in color video sequences. For this purpose, a prototype of a hand-gesture recognition system has been designed and implemented, which is composed of three stages: detection, tracking and recognition. This system is based on machine learning methods and pattern recognition techniques, which have been integrated together with other image processing approaches to get a high recognition accuracy and a low computational cost. Regarding pattern recongition techniques, several algorithms and strategies have been designed and implemented, which are applicable to color images and video sequences. The design of these algorithms has the purpose of extracting spatial and spatio-temporal features from static and dynamic hand gestures, in order to identify them in a robust and reliable way. Finally, a visual database containing the necessary vocabulary of gestures for interacting with the computer has been created.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This article proposes an innovative biometric technique based on the idea of authenticating a person on a mobile device by gesture recognition. To accomplish this aim, a user is prompted to be recognized by a gesture he/she performs moving his/her hand while holding a mobile device with an accelerometer embedded. As users are not able to repeat a gesture exactly in the air, an algorithm based on sequence alignment is developed to correct slight differences between repetitions of the same gesture. The robustness of this biometric technique has been studied within 2 different tests analyzing a database of 100 users with real falsifications. Equal Error Rates of 2.01 and 4.82% have been obtained in a zero-effort and an active impostor attack, respectively. A permanence evaluation is also presented from the analysis of the repetition of the gestures of 25 users in 10 sessions over a month. Furthermore, two different gesture databases have been developed: one made up of 100 genuine identifying 3-D hand gestures and 3 impostors trying to falsify each of them and another with 25 volunteers repeating their identifying 3- D hand gesture in 10 sessions over a month. These databases are the most extensive in published studies, to the best of our knowledge.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

New forms of natural interactions between human operators and UAVs (Unmanned Aerial Vehicle) are demanded by the military industry to achieve a better balance of the UAV control and the burden of the human operator. In this work, a human machine interface (HMI) based on a novel gesture recognition system using depth imagery is proposed for the control of UAVs. Hand gesture recognition based on depth imagery is a promising approach for HMIs because it is more intuitive, natural, and non-intrusive than other alternatives using complex controllers. The proposed system is based on a Support Vector Machine (SVM) classifier that uses spatio-temporal depth descriptors as input features. The designed descriptor is based on a variation of the Local Binary Pattern (LBP) technique to efficiently work with depth video sequences. Other major consideration is the especial hand sign language used for the UAV control. A tradeoff between the use of natural hand signs and the minimization of the inter-sign interference has been established. Promising results have been achieved in a depth based database of hand gestures especially developed for the validation of the proposed system.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

La medida de calidad de vídeo sigue siendo necesaria para definir los criterios que caracterizan una señal que cumpla los requisitos de visionado impuestos por el usuario. Las nuevas tecnologías, como el vídeo 3D estereoscópico o formatos más allá de la alta definición, imponen nuevos criterios que deben ser analizadas para obtener la mayor satisfacción posible del usuario. Entre los problemas detectados durante el desarrollo de esta tesis doctoral se han determinado fenómenos que afectan a distintas fases de la cadena de producción audiovisual y tipo de contenido variado. En primer lugar, el proceso de generación de contenidos debe encontrarse controlado mediante parámetros que eviten que se produzca el disconfort visual y, consecuentemente, fatiga visual, especialmente en lo relativo a contenidos de 3D estereoscópico, tanto de animación como de acción real. Por otro lado, la medida de calidad relativa a la fase de compresión de vídeo emplea métricas que en ocasiones no se encuentran adaptadas a la percepción del usuario. El empleo de modelos psicovisuales y diagramas de atención visual permitirían ponderar las áreas de la imagen de manera que se preste mayor importancia a los píxeles que el usuario enfocará con mayor probabilidad. Estos dos bloques se relacionan a través de la definición del término saliencia. Saliencia es la capacidad del sistema visual para caracterizar una imagen visualizada ponderando las áreas que más atractivas resultan al ojo humano. La saliencia en generación de contenidos estereoscópicos se refiere principalmente a la profundidad simulada mediante la ilusión óptica, medida en términos de distancia del objeto virtual al ojo humano. Sin embargo, en vídeo bidimensional, la saliencia no se basa en la profundidad, sino en otros elementos adicionales, como el movimiento, el nivel de detalle, la posición de los píxeles o la aparición de caras, que serán los factores básicos que compondrán el modelo de atención visual desarrollado. Con el objetivo de detectar las características de una secuencia de vídeo estereoscópico que, con mayor probabilidad, pueden generar disconfort visual, se consultó la extensa literatura relativa a este tema y se realizaron unas pruebas subjetivas preliminares con usuarios. De esta forma, se llegó a la conclusión de que se producía disconfort en los casos en que se producía un cambio abrupto en la distribución de profundidades simuladas de la imagen, aparte de otras degradaciones como la denominada “violación de ventana”. A través de nuevas pruebas subjetivas centradas en analizar estos efectos con diferentes distribuciones de profundidades, se trataron de concretar los parámetros que definían esta imagen. Los resultados de las pruebas demuestran que los cambios abruptos en imágenes se producen en entornos con movimientos y disparidades negativas elevadas que producen interferencias en los procesos de acomodación y vergencia del ojo humano, así como una necesidad en el aumento de los tiempos de enfoque del cristalino. En la mejora de las métricas de calidad a través de modelos que se adaptan al sistema visual humano, se realizaron también pruebas subjetivas que ayudaron a determinar la importancia de cada uno de los factores a la hora de enmascarar una determinada degradación. Los resultados demuestran una ligera mejora en los resultados obtenidos al aplicar máscaras de ponderación y atención visual, los cuales aproximan los parámetros de calidad objetiva a la respuesta del ojo humano. ABSTRACT Video quality assessment is still a necessary tool for defining the criteria to characterize a signal with the viewing requirements imposed by the final user. New technologies, such as 3D stereoscopic video and formats of HD and beyond HD oblige to develop new analysis of video features for obtaining the highest user’s satisfaction. Among the problems detected during the process of this doctoral thesis, it has been determined that some phenomena affect to different phases in the audiovisual production chain, apart from the type of content. On first instance, the generation of contents process should be enough controlled through parameters that avoid the occurrence of visual discomfort in observer’s eye, and consequently, visual fatigue. It is especially necessary controlling sequences of stereoscopic 3D, with both animation and live-action contents. On the other hand, video quality assessment, related to compression processes, should be improved because some objective metrics are adapted to user’s perception. The use of psychovisual models and visual attention diagrams allow the weighting of image regions of interest, giving more importance to the areas which the user will focus most probably. These two work fields are related together through the definition of the term saliency. Saliency is the capacity of human visual system for characterizing an image, highlighting the areas which result more attractive to the human eye. Saliency in generation of 3DTV contents refers mainly to the simulated depth of the optic illusion, i.e. the distance from the virtual object to the human eye. On the other hand, saliency is not based on virtual depth, but on other features, such as motion, level of detail, position of pixels in the frame or face detection, which are the basic features that are part of the developed visual attention model, as demonstrated with tests. Extensive literature involving visual comfort assessment was looked up, and the development of new preliminary subjective assessment with users was performed, in order to detect the features that increase the probability of discomfort to occur. With this methodology, the conclusions drawn confirmed that one common source of visual discomfort was when an abrupt change of disparity happened in video transitions, apart from other degradations, such as window violation. New quality assessment was performed to quantify the distribution of disparities over different sequences. The results confirmed that abrupt changes in negative parallax environment produce accommodation-vergence mismatches derived from the increasing time for human crystalline to focus the virtual objects. On the other side, for developing metrics that adapt to human visual system, additional subjective tests were developed to determine the importance of each factor, which masks a concrete distortion. Results demonstrated slight improvement after applying visual attention to objective metrics. This process of weighing pixels approximates the quality results to human eye’s response.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Geographic Information Systems are developed to handle enormous volumes of data and are equipped with numerous functionalities intended to capture, store, edit, organise, process and analyse or represent the geographically referenced information. On the other hand, industrial simulators for driver training are real-time applications that require a virtual environment, either geospecific, geogeneric or a combination of the two, over which the simulation programs will be run. In the final instance, this environment constitutes a geographic location with its specific characteristics of geometry, appearance, functionality, topography, etc. The set of elements that enables the virtual simulation environment to be created and in which the simulator user can move, is usually called the Visual Database (VDB). The main idea behind the work being developed approaches a topic that is of major interest in the field of industrial training simulators, which is the problem of analysing, structuring and describing the virtual environments to be used in large driving simulators. This paper sets out a methodology that uses the capabilities and benefits of Geographic Information Systems for organising, optimising and managing the visual Database of the simulator and for generally enhancing the quality and performance of the simulator.