14 resultados para Audio-visual content classification
em Universidad Politécnica de Madrid
Resumo:
This paper describes the participation of DAEDALUS at ImageCLEF 2011 Plant Identification task. The task is evaluated as a supervised classification problem over 71 tree species from the French Mediterranean area used as class labels, based on visual content from scan, scan-like and natural photo images. Our approach to this task is to build a classifier based on the detection of keypoints from the images extracted using Lowe’s Scale Invariant Feature Transform (SIFT) algorithm. Although our overall classification score is very low as compared to other participant groups, the main conclusion that can be drawn is that SIFT keypoints seem to work significantly better for photos than for the other image types, so our approach may be a feasible strategy for the classification of this kind of visual content.
Resumo:
Desde la explosión de crecimiento de internet que comenzó en los años 90, se han ido creando y poniendo a disposición de los usuarios diversas herramientas para compartir información y servicios de diversas formas, desde el nacimiento del primer navegador hasta nuestros días, donde hay infinidad de lenguajes aplicables al ámbito web. En esta fase de crecimiento, en primer lugar, de cara a usuarios individuales, saldrían herramientas que permitirían a cada cual hacer su web personal, con sus contenidos expuestos. Más adelante se fue generando el fenómeno “comunidad”, con, por ejemplo, foros, o webs en las que había múltiples usuarios que disfrutaban de contenidos o servicios que la web ofreciese. Este crecimiento del mundo web en lo comunitario ha avanzado en muchas ramas,entre ellas, por supuesto, la educacional, surgiendo plataformas como la que es base del proyecto que a continuación se presenta, y herramienta básica y prácticamente ya imprescindible en la enseñanza universitaria: Moodle. Moodle es una herramienta diseñada para compatir recursos y diseñar actividades para el usuario potencial, complementando su aprendizaje en aula, o incluso siendo una vía autónoma de aprendizaje en sí misma. Se ha realizado un estudio sobre el estado de saludo de los contenidos que se exponen en Moodle, y se ha encontrado que una gran mayoría de los cursos que se pueden visitar tienen un gran número de carencias. Por un lado, hay pocos con material original explotado exclusivamente para el curso, y, si tienen material original, no se ha observado una especial atención por la maquetación. Por otro lado, hay muchos otros sin material original, y, en ambos casos, no se ha encontrado ningún curso que ofrezca material audiovisual exclusivo para el curso, presentando algunos en su lugar material audiovisual encontrado en la red (Youtube, etc). A la vista de estos hechos, se ha realizado un proyecto que intenta aportar soluciones ante estas carencias, y se presenta un curso procedente de diversas referencias bibliográficas, para la parte textual, y material audiovisual original e inédito que también se ha explotado específicamente para este curso. Este material ha sido por un lado vídeo, que se ha visionado, editado y subtitulado con software de libre distribución, y por otro lado, audio, que complementa un completo glosario que se ha añadido como extra al curso y cuyo planteamiento no se ha encontrado en ningún curso online de los revisados. Todo esto se ha envuelto en una maquetación cuidada que ha sido fruto del estudio de los lenguajes web html y CSS, de forma que, por un lado, el curso sea un lugar agradable en el que aprender dentro de internet, y por otro, se pudiesen realizar ciertas operaciones que sin estos conocimientos habrían sido imposibles, como la realización del glosario o la incrustación de imágenes y vídeos. A su vez, se ha tratado de dar un enfoque didáctico a toda la memoria del proyecto, de forma que pueda ser de utilidad a un usuario futuro que quisiese profundizar en los usos de Moodle, introducirse en el lenguaje web, o introducirse en el mundo de la edición de vídeo. ABSTRACT: Since the explosion of Internet growth beginning in the 90s, many tools have been created and made available for users to share information and services in various ways, from the birth of the first browser until today, where there are plenty of web programming languages. This growth stage would give individual users tools that would allow everyone to make an own personal website, with their contents exposed. Later, the "community" phenomenon appeared with, for example, forums, or websites where multiple users enjoyed the content or web services that those websites offered. Also, this growth in the web community world has progressed in many fields, including education, with the emerge of platforms such as the one that this project uses as its basis, and which is the basic and imperative tool in college education: Moodle. Moodle is a tool designed to share resources and design activities for the potential user, completing class learning, or even letting this user learn in an autonomous way. In this project a study on the current situation of the content present in Moodle courses around the net has been carried out, and it has been found that most of them lack of original material exploited exclusively for the courses, and if they have original material, there has been not observed concern on the layout where that material lies. On the other hand, there are many other with non original material, and in both cases, there has not been found any course that offers audio- visual material made specifically for the course, instead of presenting some audiovisual material found on the net (Youtube, etc). In view of these facts, the project presented here seeks to provide solutions to these shortcomings, presenting a course with original material exploited from various references, and unpublished audioevisual material which also has been exploited specifically for this course. This material is, on one hand, video, which has been viewed, edited and subtitled with free software, and on the other, audio, which complements a comprehensive glossary that has been added as an extra feature to the course and whose approach was not found in any of the online courses reviewed. All of this has been packaged in a neat layout that has been the result of the study of web languages HTML and CSS, so that first, the course was a pleasant place to learn on the internet, and second, certain operations could be performed which without this knowledge would have been impossible, as the glossary design or embedding images and videos. Furthermore, a didactic approach has been adopted to the entire project memory, so it can be useful to a future user who wanted to go deeper on the uses of Moodle, containing an intro into the web language, or in the world video editing.
Resumo:
Esta investigación se centra en el estudio de la dimensión audiovisual de la arquitectura, como aproximación intersensorial a la aprehensión e ideación del espacio. Poniendo en evidencia la complejidad de la relación hombre-medio, se plantea la necesidad de desarrollar nuevas metodologías y herramientas que tengan en cuenta dicha complejidad y que favorezcan el desarrollo del proyecto. Nos mueve en esta investigación la convicción de que los cambios rápidos y profundos que caracterizan nuestros tiempos en todos los ámbitos, social, económico, político… entrañan inevita-blemente nuevos modos de conocimiento y experimentación del espacio, y por tanto nuevos ejes de investigación. La creciente valoración, en todos los campos del conocimiento, de los aspectos subjetivos y sensoriales, el desarrollo de las tecnologías que ha cambiado completamente nuestras relaciones interpersonales y con el entorno, las nuevas capacidades de análisis, grabación y conservación y manipulación de datos y por ultimo, aunque no menos importante, la puesta a disposición democrá¬tica y global de todo el saber a través de Internet, imponen otra aproximación al hacer, concebir y vivir la arquitectura. Esta investigación se centra en un análisis crítico del estado de la cuestión, construyendo nue¬vas redes de relación entre disciplinas, que permitan plantear la dimensión audiovisual como un nuevo eje de investigación dentro de la arquitectura, poniendo en evidencia la necesidad de desa¬rrollar análisis de forma trasversal e interdisciplinar. Hemos prestado particular atención a la evolución de lo sonoro y su aproximación cualitativa a la arquitectura, mostrando como el sonido, con su capacidad de introducir el tiempo y los aspectos dinámicos (el movimiento, la presencia del cuerpo…), no es simplemente otro canal sensorial en la aprehensión del espacio, ya que su interacción con lo visual genera un espacio-tiempo indisociable, propio, característico de cada momento y lugar. A partir de este planteamiento se ha hecho una revisión metodológica dirigida a utilizar el reco¬rrido como herramienta de análisis, que permita estudiar la relación entre el espacio, la acción y la percepción audio-visual, cruzando para ello los datos correspondientes a la morfología del espacio, con los datos de la experiencia perceptiva individual y con los de los usos colectivos del espacio, utilizándose finalmente el video como un herramienta, no sólo de representación de lo real, sino también como instrumento de análisis, que permite tomar datos (grabaciones audio, video, obser¬vaciones…), aislarlos, estudiarlos, clasificarlos, ordenarlos, y finalmente, restituirlos mediante el montaje. Se ha realizado una primera experimentación “in situ” que ha servido para explorar la aplicación del método, planteando nuevas preguntas y abriendo líneas de análisis para ulteriores investigacio¬nes. ABSTRACT This research is focused on the study of the audiovisual dimension of architecture, as an in¬tersensorial approach to space apprehension and design. It is posed the necessity to develop new methodologies and tools that keep this complexity, as a contribution to the development of a project, by means of putting into evidence the sophistication of the relationship between man and media The research moves us to the conviction that the quick and relevant changes that confer a distinc-tion to these contemporary times all over the social, economic and political environments, involve, unavoidably, new ways of knowledge and experimentation on space, and therefore, new trends of research. The growing valuation of subjective and sensorial aspects all over the fields of the knowledge and the development of the technologies that have changed completely our interpersonal and environmental relationships, the new tools for analysis, recording, conservation and manipulation of data and, last but not least, the setting to democratic and global availability of the whole knowledge through Inter¬net, impose another approach to the making, conception and experience of architecture. This research deals with a critical analysis of the state–of- the-art of the matter, modelling new webs of relationship among disciplines that allow to outline the audiovisual dimension as a new focus of research on architecture, putting evidence into practice as it is necessary to develop any analysis in a transversal and interdisciplinary way. It is paid a special attention to the evolution of sound objects and their qualitative approach to ar¬chitecture, showing how sound, with its capacity to transmit time and dynamic aspects of things (movement, the presence of the body), it is not simply another sensorial channel in the apprehension of space, since its interaction with the visual thing generates an undetachable association of space and time, an specific one of every moment and place. Starting from this position a methodological revision has been made leading to use a walk as a tool for analysis that allows to study the relationship among the space, the action and the audio-visual perception, by means of crossing data corresponding to the morphology of space, with the data of a perceptive experience from the perspective of an individual observer and with those of the collective uses of the space, as video has been finally used as a tool, not only as a representation of the real thing, but also as a tool for analysis that allows to take isolated data (audio recordings, video, obser¬vations), to be studied, classified, and put into their appropriate place, and finally, to restore them by means of a multimedia set up. A first experimentation in situ has been carried out, being useful to explore a method of appli¬cation, outlining new questions and beginning with new ways of analysis for further research.
Resumo:
The increase of multimedia services delivered over packet-based networks has entailed greater quality expectations of the end-users. This has led to an intensive research on techniques for evaluating the quality of experience perceived by the viewers of audiovisual content, considering the different degradations that it could suffer along the broadcasting system. In this paper, a comprehensive study of the impact of transmission errors affecting video and audio in IPTV is presented. With this aim, subjective assessment tests were carried out proposing a novel methodology trying to keep as close as possible home environment viewing conditions. Also 3DTV content in side-by-side format has been used in the experiments to compare the impact of the degradations. The results provide a better understanding of the effects of transmission errors, and show that the QoE related to the first approach of 3DTV is acceptable, but the visual discomfort that it causes should be reduced.
Resumo:
A land classification method was designed for the Community of Madrid (CM), which has lands suitable for either agriculture use or natural spaces. The process started from an extensive previous CM study that contains sets of land attributes with data for 122 types and a minimum-requirements method providing a land quality classification (SQ) for each land. Borrowing some tools from Operations Research (OR) and from Decision Science, that SQ has been complemented by an additive valuation method that involves a more restricted set of 13 representative attributes analysed using Attribute Valuation Functions to obtain a quality index, QI, and by an original composite method that uses a fuzzy set procedure to obtain a combined quality index, CQI, that contains relevant information from both the SQ and the QI methods.
Resumo:
The use of fly ash (FA) as an admixture to concrete is broadly extended for two main reasons: the reduction of costs that supposes the substitution of cement and the micro structural changes motivated by the mineral admixture. Regarding this second point, there is a consensus that considers that the ash generates a more compact concrete and a reduction in the size of the pore. However, the measure in which this contributes to the pozzolanic activity or as filler is not well defined. There is also no justification to the influence of the physical parameters, fineness of the grain and free water, in its behavior. This work studies the use of FA as a partial substitute of the cement in concretes of different workability (dry and wet) and the influence in the reactivity of the ash. The concrete of dry consistency which serves as reference uses a cement dose of 250 Kg/m 3 and the concrete of fluid consistency utilized a dose of cement of 350 Kg/m 3 . Two trademark of Portland Cement Type 1 were used. The first reached the resistant class for its fineness of grain and the second one for its composition. Moreover, three doses of FA have been used, and the water/binder ratio was constant in all the mixtures. We have studied the mechanical properties and the micro-structure of the concretes by means of compressive strength tests, mercury intrusion porosimetry (MIP) and thermal analysis (TA). The results of compressive strength tests allow us to observe that concrete mixtures with cements of the same classification and similar dosage of binder do not present the same mechanical behavior. These results show that the effective water/binder ratio has a major role in the development of the mechanical properties of concrete. The study of different dosages using TA, thermo-gravimetry and differential thermal analysis, revealed that the portlandite content is not restrictive in any of the dosages studied. Again, this proves that the rheology of the material influences the reaction rate and content of hydrated cement products. We conclude that the available free water is determinant in the efficiency of pozzolanic reaction. It is so that in accordance to the availability of free water, the ashes can react as an active admixture or simply change the porous distribution. The MIP shows concretes that do not exhibit significant changes in their mechanical behavior, but have suffered significant variation in their porous structure
Resumo:
This paper describes the participation of DAEDALUS at ImageCLEF 2011 Medical Retrieval task. We have focused on multimodal (or mixed) experiments that combine textual and visual retrieval. The main objective of our research has been to evaluate the effect on the medical retrieval process of the existence of an extended corpus that is annotated with the image type, associated to both the image itself and also to its textual description. For this purpose, an image classifier has been developed to tag each document with its class (1st level of the hierarchy: Radiology, Microscopy, Photograph, Graphic, Other) and subclass (2nd level: AN, CT, MR, etc.). For the textual-based experiments, several runs using different semantic expansion techniques have been performed. For the visual-based retrieval, different runs are defined by the corpus used in the retrieval process and the strategy for obtaining the class and/or subclass. The best results are achieved in runs that make use of the image subclass based on the classification of the sample images. Although different multimodal strategies have been submitted, none of them has shown to be able to provide results that are at least comparable to the ones achieved by the textual retrieval alone. We believe that we have been unable to find a metric for the assessment of the relevance of the results provided by the visual and textual processes
Resumo:
In this paper, two techniques to control UAVs (Unmanned Aerial Vehicles), based on visual information are presented. The first one is based on the detection and tracking of planar structures from an on-board camera, while the second one is based on the detection and 3D reconstruction of the position of the UAV based on an external camera system. Both strategies are tested with a VTOL (Vertical take-off and landing) UAV, and results show good behavior of the visual systems (precision in the estimation and frame rate) when estimating the helicopter¿s position and using the extracted information to control the UAV.
Resumo:
Esta Tesis Doctoral aborda el estudio de algunas técnicas no destructivas para la clasificación de madera de pino silvestre (Pinus sylvestris L.) de procedencia española y de gruesa escuadría para uso estructural. Para la estimación del módulo de elasticidad y de la resistencia se han aplicado técnicas basadas en la propagación de una onda a través de la madera: onda de ultrasonidos (Sylvatest) o de impacto (Microsecond Timer) en dirección longitudinal, o vibración en dirección longitudinal y transversal (PLG). Para la estimación de la densidad se han utilizado métodos puntuales basados en el penetrómetro (Pilodyn) y en la resistencia al arranque de un tornillo. Las variables obtenidas han sido relacionadas con los resultados de la clasificación visual y con las propiedades de la madera determinadas mediante ensayo mecánico. Además, se ha estudiado la influencia de la humedad de la madera en la velocidad de propagación de la onda para definir factores de corrección a los equipos comerciales utilizados en esta Tesis Doctoral. La muestra de estudio está formada por 244 piezas procedentes de El Espinar, Segovia, con dimensiones nominales 150 x 200 x 4000 mm (218 piezas) y 100 x 150 x 3000 mm (26 piezas). De todas las piezas se tomaron datos de dimensiones, contenido de humedad y clasificación visual según la norma UNE 56544. En las primeras 218 vigas se aplicaron las técnicas de ultrasonidos, onda de impacto y vibraciones, se determinó la densidad de cada pieza completa y se ensayaron según la norma UNE-EN 408 para obtener el módulo de elasticidad global (en todos los casos) y local (en un porcentaje), así como de la tensión de rotura. Se extrajeron tres rebanadas para los ensayos puntuales y para el cálculo de la densidad. En las otras 26 piezas se repitieron los ensayos (transmisión de onda, vibración y clasificación visual) durante el proceso de secado natural, desde que la madera se encontraba húmeda (en torno al 40 %) hasta la humedad de equilibrio higroscópico (en torno al 9%). Respecto a la clasificación visual no se han observado diferencias significativas entre la calidad MEG o las rechazadas. Se han estudiado las consecuencias del secado (principalmente las deformaciones) y no se ha encontrado justificación para que estos defectos penalicen la clasificación. Para la densidad, el mayor R2 obtenido ha sido de un 47% a partir del uso combinado de los dos equipos puntuales (penetrómetro y arranque de tornillo). Para el módulo de elasticidad y la tensión de rotura, la mejor relación se ha obtenido a partir de la técnica de vibración longitudinal, con unos coeficientes de 79% y un 52% respectivamente. Se ha estimado que el aumento de un punto porcentual en el contenido de humedad de la madera produce una pérdida de velocidad de onda del 0,58% para Sylvatest y Microsecond Timer, y del 0,71% para PLG. Estos valores son generalizables para un rango de humedades entre 9 y 25 %. Abstract This Doctoral Thesis approach the study of some non-destructive techniques as a classification method for structural use of Scots pine wood of Spanish origin with large cross section. To estimate the modulus of elasticity and strength have been used techniques based on the propagation of a wave through the timber: ultrasonic wave (Sylvatest) or stress wave (Microsecond Timer) in longitudinal direction, or vibration in longitudinal and transversal direction (PLG). Local probing methods have been applied to estimate the density, based on penetrometer (Pilodyn) and the screw withdrawal resistance meter. The different variables obtained were compared with the results of the visual grading and the values of the properties of the wood determined by the standardized test of the pieces. Furthermore, the influence of the moisture content of the wood on the velocity of propagation of the waves through the timber has been analyzed in order to establish a correction factor for the commercial devices used in this Doctoral Thesis. The sample tested consists of 244 pieces from El Espinar, Segovia, with nominal dimensions 150 x 200 x 4000 mm (218 pieces) and 100 x 150 x 3000 mm (26 pieces). Data collection about dimensions, moisture content and visual grading according to the UNE 56544 standard were carried out on all the pieces. The first 218 pieces were tested by non destructive techniques based on ultrasonic wave, stress wave and vibration, the density was measured on each piece and bending test according to the UNE-EN 408 standard was carried out for calculating the global modulus of elasticity (all the pieces) and the local one (only a representative group), as well as the bending strength. Three slices were removed for implementing the local probing and to calculate the density. In the other 26 pieces the tests (wave transmission, vibration and visual grading) were repeated during the natural drying process, from wet timber (around 40 % moisture content) up to the equilibrium moisture content (around 9%). Regarding the visual grading no significant differences were observed between MEG or rejected pieces. The effects of drying (deformations) have been studied, and justification for the specification hasn't been found. To estimate the density, the greater R2 obtained was 47% by using both penetrometer and screw withdrawal. For the modulus of elasticity and bending strength, the best relationship has been found with the longitudinal vibration, with coefficients of 79% and 52% respectively. It has been estimated that an increase of a point of the moisture content of the wood produces a decrease on the velocity obtained from ultrasonic or stress wave of 0,58%, and 0,71 % for the one obtained from vibration. Those values can be generalized for a range of moisture content from 9 to 25 %.
Resumo:
In the spinal cord of the anesthetized cat, spontaneous cord dorsum potentials (CDPs) appear synchronously along the lumbo-sacral segments. These CDPs have different shapes and magnitudes. Previous work has indicated that some CDPs appear to be specially associated with the activation of spinal pathways that lead to primary afferent depolarization and presynaptic inhibition. Visual detection and classification of these CDPs provides relevant information on the functional organization of the neural networks involved in the control of sensory information and allows the characterization of the changes produced by acute nerve and spinal lesions. We now present a novel feature extraction approach for signal classification, applied to CDP detection. The method is based on an intuitive procedure. We first remove by convolution the noise from the CDPs recorded in each given spinal segment. Then, we assign a coefficient for each main local maximum of the signal using its amplitude and distance to the most important maximum of the signal. These coefficients will be the input for the subsequent classification algorithm. In particular, we employ gradient boosting classification trees. This combination of approaches allows a faster and more accurate discrimination of CDPs than is obtained by other methods.
Resumo:
In this paper, the fusion of probabilistic knowledge-based classification rules and learning automata theory is proposed and as a result we present a set of probabilistic classification rules with self-learning capability. The probabilities of the classification rules change dynamically guided by a supervised reinforcement process aimed at obtaining an optimum classification accuracy. This novel classifier is applied to the automatic recognition of digital images corresponding to visual landmarks for the autonomous navigation of an unmanned aerial vehicle (UAV) developed by the authors. The classification accuracy of the proposed classifier and its comparison with well-established pattern recognition methods is finally reported.
Resumo:
El presente proyecto fin de carrera, realizado por el ingeniero técnico en telecomunicaciones Pedro M. Matamala Lucas, es la fase final de desarrollo de un proyecto de mayor magnitud correspondiente al software de vídeo forense SAVID. El propósito del proyecto en su totalidad es la creación de una herramienta informática capacitada para realizar el análisis de ficheros de vídeo, codificados y comprimidos por el sistema DV –Digital Video-. El objetivo del análisis, es aportar información acerca de si la cinta magnética presenta indicios de haber sido manipulada con una edición posterior a su grabación original, además, de mostrar al usuario otros datos de interés como las especificaciones técnicas de la señal de vídeo y audio. Por lo tanto, se facilitará al usuario, analista de vídeo forense, información que le ayude a valorar la originalidad del contenido del soporte que es sujeto del análisis. El objetivo específico de esta fase final, es la creación de la interfaz de usuario del software, que informa tanto del código binario de los sectores significativos, como de su interpretación tras el análisis. También permitirá al usuario el reporte de los resultados, además de otras funcionalidades que le permitan la navegación por los sectores del código que han sido modificados como efecto colateral de la edición de la cinta magnética original. Otro objetivo importante del proyecto ha sido la investigación de metodologías y técnicas de desarrollo de software para su posterior implementación, buscando con esto, una mayor eficiencia en la gestión del tiempo y una mayor calidad de software con el fin de garantizar su evolución y sostenibilidad en el futuro. Se ha hecho hincapié en las metodologías ágiles que han ido ganando relevancia en el sector de las tecnologías de la información en las últimas décadas, sustituyendo a metodologías clásicas como el desarrollo en cascada. Su flexibilidad durante el ciclo de vida del software, permite obtener mejores resultados cuando las especificaciones no están del todo definidas, ajustándose de este modo a las condiciones del proyecto. Resumiendo las especificaciones técnicas del software, C++ es el lenguaje de programación orientado a objetos con el que se ha desarrollado, utilizándose la tecnología MFC -Microsoft Foundation Classes- para la implementación. Es un proyecto MFC de tipo cuadro de dialogo,creado, compilado y publicado, con la herramienta de desarrollo integrado Microsoft Visual Studio 2010. La arquitectura con la que se ha estructurado es la arquetípica de tres capas, compuesta por la interfaz de usuario, capa de negocio y capa de acceso a datos. Se ha visto necesario configurar el proyecto con compatibilidad con CLR –Common Languages Runtime- para poder implementar la funcionalidad de creación de reportes. Acompañando a la aplicación informática, se presenta la memoria del proyecto y sus anexos correspondientes a los documentos EDRF –Especificaciones Detalladas de Requisitos funcionales-, EIU –Especificaciones de Interfaz de Usuario , DT -Diseño Técnico- y Guía de Usuario. SUMMARY. This dissertation, carried out by the telecommunications engineer Pedro M. Matamala Lucas, is in its final stage and is part of a larger project for the software of forensic video called SAVID. The purpose of the entire project is the creation of a software tool capable of analyzing video files that are coded and compressed by the DV -Digital Video- System. The objective of the analysis is to provide information on whether the magnetic tape shows signs of having been tampered with after the editing of the original recording, and also to show the user other relevant data and technical specifications of the video signal and audio. Therefore the user, forensic video analyst, will have information to help assess the originality of the content of the media that is subject to analysis. The specific objective of this final phase is the creation of the user interface of the software that provides information about the binary code of the significant sectors and also its interpretation after analysis. It will also allow the user to report the results, and other features that will allow browsing through the sections of the code that have been modified as a secondary effect of the original magnetic tape being tampered. Another important objective of the project is the investigation of methodologies and software development techniques to be used in deployment, with the aim of greater efficiency in time management and enhanced software quality in order to ensure its development and maintenance in the future. Agile methodologies, which have become important in the field of information technology in recent decades, have been used during the execution of the project, replacing classical methodologies such as Waterfall Development. The flexibility, as the result of using by agile methodologies, during the software life cycle, produces better results when the specifications are not fully defined, thus conforming to the initial conditions of the project. Summarizing the software technical specifications, C + + the programming language – which is object oriented and has been developed using technology MFC- Microsoft Foundation Classes for implementation. It is a project type dialog box, created, compiled and released with the integrated development tool Microsoft Visual Studio 2010. The architecture is structured in three layers: the user interface, business layer and data access layer. It has been necessary to configure the project with the support CLR -Common Languages Runtime – in order to implement the reporting functionality. The software application is submitted with the project report and its annexes to the following documents: Functional Requirements Specifications - Detailed User Interface Specifications, Technical Design and User Guide.
Resumo:
One of the main challenges for intelligent vehicles is the capability of detecting other vehicles in their environment, which constitute the main source of accidents. Specifically, many methods have been proposed in the literature for video-based vehicle detection. Most of them perform supervised classification using some appearance-related feature, in particular, symmetry has been extensively utilized. However, an in-depth analysis of the classification power of this feature is missing. As a first contribution of this paper, a thorough study of the classification performance of symmetry is presented within a Bayesian decision framework. This study reveals that the performance of symmetry-based classification is very limited. Therefore, as a second contribution, a new gradient-based descriptor is proposed for vehicle detection. This descriptor exploits the known rectangular structure of vehicle rears within a Histogram of Gradients (HOG)-based framework. Experiments show that the proposed descriptor outperforms largely symmetry as a feature for vehicle verification, achieving classification rates over 90%.
Resumo:
La medida de calidad de vídeo sigue siendo necesaria para definir los criterios que caracterizan una señal que cumpla los requisitos de visionado impuestos por el usuario. Las nuevas tecnologías, como el vídeo 3D estereoscópico o formatos más allá de la alta definición, imponen nuevos criterios que deben ser analizadas para obtener la mayor satisfacción posible del usuario. Entre los problemas detectados durante el desarrollo de esta tesis doctoral se han determinado fenómenos que afectan a distintas fases de la cadena de producción audiovisual y tipo de contenido variado. En primer lugar, el proceso de generación de contenidos debe encontrarse controlado mediante parámetros que eviten que se produzca el disconfort visual y, consecuentemente, fatiga visual, especialmente en lo relativo a contenidos de 3D estereoscópico, tanto de animación como de acción real. Por otro lado, la medida de calidad relativa a la fase de compresión de vídeo emplea métricas que en ocasiones no se encuentran adaptadas a la percepción del usuario. El empleo de modelos psicovisuales y diagramas de atención visual permitirían ponderar las áreas de la imagen de manera que se preste mayor importancia a los píxeles que el usuario enfocará con mayor probabilidad. Estos dos bloques se relacionan a través de la definición del término saliencia. Saliencia es la capacidad del sistema visual para caracterizar una imagen visualizada ponderando las áreas que más atractivas resultan al ojo humano. La saliencia en generación de contenidos estereoscópicos se refiere principalmente a la profundidad simulada mediante la ilusión óptica, medida en términos de distancia del objeto virtual al ojo humano. Sin embargo, en vídeo bidimensional, la saliencia no se basa en la profundidad, sino en otros elementos adicionales, como el movimiento, el nivel de detalle, la posición de los píxeles o la aparición de caras, que serán los factores básicos que compondrán el modelo de atención visual desarrollado. Con el objetivo de detectar las características de una secuencia de vídeo estereoscópico que, con mayor probabilidad, pueden generar disconfort visual, se consultó la extensa literatura relativa a este tema y se realizaron unas pruebas subjetivas preliminares con usuarios. De esta forma, se llegó a la conclusión de que se producía disconfort en los casos en que se producía un cambio abrupto en la distribución de profundidades simuladas de la imagen, aparte de otras degradaciones como la denominada “violación de ventana”. A través de nuevas pruebas subjetivas centradas en analizar estos efectos con diferentes distribuciones de profundidades, se trataron de concretar los parámetros que definían esta imagen. Los resultados de las pruebas demuestran que los cambios abruptos en imágenes se producen en entornos con movimientos y disparidades negativas elevadas que producen interferencias en los procesos de acomodación y vergencia del ojo humano, así como una necesidad en el aumento de los tiempos de enfoque del cristalino. En la mejora de las métricas de calidad a través de modelos que se adaptan al sistema visual humano, se realizaron también pruebas subjetivas que ayudaron a determinar la importancia de cada uno de los factores a la hora de enmascarar una determinada degradación. Los resultados demuestran una ligera mejora en los resultados obtenidos al aplicar máscaras de ponderación y atención visual, los cuales aproximan los parámetros de calidad objetiva a la respuesta del ojo humano. ABSTRACT Video quality assessment is still a necessary tool for defining the criteria to characterize a signal with the viewing requirements imposed by the final user. New technologies, such as 3D stereoscopic video and formats of HD and beyond HD oblige to develop new analysis of video features for obtaining the highest user’s satisfaction. Among the problems detected during the process of this doctoral thesis, it has been determined that some phenomena affect to different phases in the audiovisual production chain, apart from the type of content. On first instance, the generation of contents process should be enough controlled through parameters that avoid the occurrence of visual discomfort in observer’s eye, and consequently, visual fatigue. It is especially necessary controlling sequences of stereoscopic 3D, with both animation and live-action contents. On the other hand, video quality assessment, related to compression processes, should be improved because some objective metrics are adapted to user’s perception. The use of psychovisual models and visual attention diagrams allow the weighting of image regions of interest, giving more importance to the areas which the user will focus most probably. These two work fields are related together through the definition of the term saliency. Saliency is the capacity of human visual system for characterizing an image, highlighting the areas which result more attractive to the human eye. Saliency in generation of 3DTV contents refers mainly to the simulated depth of the optic illusion, i.e. the distance from the virtual object to the human eye. On the other hand, saliency is not based on virtual depth, but on other features, such as motion, level of detail, position of pixels in the frame or face detection, which are the basic features that are part of the developed visual attention model, as demonstrated with tests. Extensive literature involving visual comfort assessment was looked up, and the development of new preliminary subjective assessment with users was performed, in order to detect the features that increase the probability of discomfort to occur. With this methodology, the conclusions drawn confirmed that one common source of visual discomfort was when an abrupt change of disparity happened in video transitions, apart from other degradations, such as window violation. New quality assessment was performed to quantify the distribution of disparities over different sequences. The results confirmed that abrupt changes in negative parallax environment produce accommodation-vergence mismatches derived from the increasing time for human crystalline to focus the virtual objects. On the other side, for developing metrics that adapt to human visual system, additional subjective tests were developed to determine the importance of each factor, which masks a concrete distortion. Results demonstrated slight improvement after applying visual attention to objective metrics. This process of weighing pixels approximates the quality results to human eye’s response.