841 results for Object based video
Abstract:
Assessing video quality is a complex task. Most pixel-based metrics correlate poorly with subjective results, yet quality algorithms must match human perception when analyzing a video sequence. To analyze the perceived quality impact of specific video artifacts within a given region of interest, we present a novel methodology for generating test sequences that allow the impact of each individual distortion to be analyzed. From the results of subjective assessment it is possible to create psychovisual models that weight pixels belonging to different regions of interest, distributed by color, position, motion or content. The subjective assessment results demonstrate the need for new metrics adapted to the human visual system.
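The idea of weighting pixels by region of interest can be illustrated with a weighted PSNR. This is only a minimal sketch: the function name `weighted_psnr`, the weight values and the use of PSNR as the base metric are assumptions for illustration, not the model described in the abstract.

```python
import math

def weighted_psnr(reference, distorted, weights, peak=255.0):
    """PSNR in which each pixel's squared error is scaled by a perceptual weight.

    All three arguments are 2D lists of equal shape. The weights are
    hypothetical region-of-interest weights (larger means more important).
    """
    weighted_error = 0.0
    weight_sum = 0.0
    for ref_row, dis_row, w_row in zip(reference, distorted, weights):
        for r, d, w in zip(ref_row, dis_row, w_row):
            weighted_error += w * (r - d) ** 2
            weight_sum += w
    if weight_sum == 0:
        raise ValueError("weights must not all be zero")
    wmse = weighted_error / weight_sum
    if wmse == 0:
        return math.inf  # no visible error in any weighted pixel
    return 10.0 * math.log10(peak ** 2 / wmse)

# A single distorted pixel, scored with uniform and ROI-heavy weights.
reference = [[100, 100], [100, 100]]
distorted = [[110, 100], [100, 100]]
uniform = [[1, 1], [1, 1]]
roi_heavy = [[3, 1], [0, 0]]  # the distorted pixel lies in the region of interest
psnr_uniform = weighted_psnr(reference, distorted, uniform)
psnr_roi = weighted_psnr(reference, distorted, roi_heavy)
```

With these inputs the ROI-heavy score is lower than the uniform one, because the error falls on a pixel that the weights mark as important.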
Abstract:
In order to cater for users' quality of experience (QoE) requirements, HTTP adaptive streaming (HAS) based video services have recently become popular. User QoE feedback can be instrumental in improving the capabilities of such services. Perceptual quality experiments that involve humans are considered the most valid method of assessing QoE. Besides lab-based subjective experiments, crowdsourcing-based subjective assessment of video quality is gaining popularity as an alternative method. This paper presents insights from a study that investigates perceptual preferences across various adaptive video streaming scenarios through crowdsourcing-based subjective quality assessment.
Abstract:
The importance of vision-based systems for Sense-and-Avoid is increasing as remotely piloted and autonomous UAVs become part of non-segregated airspace. The development and evaluation of these systems demand flight scenario images, which are expensive and risky to obtain. Augmented Reality techniques now allow real flight scenario images to be composited with 3D aircraft models, producing realistic images for system development and benchmarking at a much lower cost and risk. With the techniques presented in this paper, 3D aircraft models are first positioned in a simulated 3D scene with controlled illumination and rendering parameters. Realistic simulated images are then obtained using an image processing algorithm that fuses the images rendered from the 3D scene with images from real UAV flights, taking on-board camera vibrations into account. Since the intruder and camera poses are user-defined, ground truth data is available. These ground truth annotations make it possible to develop and quantitatively evaluate aircraft detection and tracking algorithms. This paper presents the software developed to create a public dataset of 24 videos together with their annotations and some tracking application results.
Abstract:
Video quality measurement remains necessary for defining the criteria that characterize a signal meeting the viewing requirements imposed by the user. New technologies, such as stereoscopic 3D video and formats beyond high definition, impose new criteria that must be analyzed to achieve the highest possible user satisfaction. Among the problems detected during this doctoral thesis are phenomena that affect different phases of the audiovisual production chain and varied types of content. First, the content generation process should be controlled through parameters that prevent visual discomfort and, consequently, visual fatigue, especially for stereoscopic 3D content, both animation and live action. Second, quality measurement in the video compression phase uses metrics that are sometimes not adapted to the user's perception. Psychovisual models and visual attention diagrams make it possible to weight image regions so that more importance is given to the pixels the user is most likely to look at. These two blocks of work are related through the definition of the term saliency. Saliency is the capacity of the visual system to characterize a viewed image by weighting the areas that are most attractive to the human eye. In stereoscopic content generation, saliency refers mainly to the depth simulated by the optical illusion, measured as the distance from the virtual object to the human eye. In two-dimensional video, however, saliency is not based on depth but on other features, such as motion, level of detail, pixel position or the appearance of faces, which are the basic factors of the visual attention model developed.

To detect the characteristics of a stereoscopic video sequence that are most likely to cause visual discomfort, the extensive literature on the subject was reviewed and preliminary subjective tests were carried out with users. These led to the conclusion that discomfort occurred when there was an abrupt change in the distribution of simulated depths in the image, for instance at video transitions, apart from other degradations such as the so-called window violation. New subjective tests, focused on analyzing these effects with different depth distributions, sought to pin down the parameters that define such images. The results show that abrupt changes occur in scenes with motion and high negative disparities, which interfere with the accommodation and vergence processes of the human eye and increase the time the crystalline lens needs to focus. To improve quality metrics through models adapted to the human visual system, further subjective tests were carried out to determine the importance of each factor in masking a given degradation. The results show a slight improvement when weighting and visual attention masks are applied, which bring objective quality parameters closer to the response of the human eye.
Abstract:
Three experiments assessed the development of children's part and configural (part-relational) processing in object recognition during adolescence. In total, 312 school children aged 7-16 years and 80 adults were tested in 3-alternative forced choice (3-AFC) tasks. They judged the correct appearance of upright and inverted presented familiar animals, artifacts, and newly learned multipart objects, which had been manipulated either in terms of individual parts or part relations. Manipulation of part relations was constrained to either metric (animals, artifacts, and multipart objects) or categorical (multipart objects only) changes. For animals and artifacts, even the youngest children were close to adult levels for the correct recognition of an individual part change. By contrast, it was not until 11-12 years of age that they achieved similar levels of performance with regard to altered metric part relations. For the newly learned multipart objects, performance was equivalent throughout the tested age range for upright presented stimuli in the case of categorical part-specific and part-relational changes. In the case of metric manipulations, the results confirmed the data pattern observed for animals and artifacts. Together, the results provide converging evidence, with studies of face recognition, for a surprisingly late consolidation of configural-metric relative to part-based object recognition.
Abstract:
We investigate the problem of obtaining a dense reconstruction in real-time, from a live video stream. In recent years, multi-view stereo (MVS) has received considerable attention and a number of methods have been proposed. However, most methods operate under the assumption of a relatively sparse set of still images as input and unlimited computation time. Video based MVS has received less attention despite the fact that video sequences offer significant benefits in terms of usability of MVS systems. In this paper we propose a novel video based MVS algorithm that is suitable for real-time, interactive 3D modeling with a hand-held camera. The key idea is a per-pixel, probabilistic depth estimation scheme that updates posterior depth distributions with every new frame. The current implementation is capable of updating 15 million distributions/s. We evaluate the proposed method against the state-of-the-art real-time MVS method and show improvement in terms of accuracy. © 2011 Elsevier B.V. All rights reserved.
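The per-frame posterior update can be sketched for a single pixel as a recursive Bayesian filter over a discrete set of depth hypotheses. The abstract does not give the parametrization used in the paper, so everything here is an assumption for illustration: the histogram representation, the Gaussian-plus-uniform measurement model, and the names `update_depth_posterior`, `sigma` and `inlier_prob`.

```python
import math

def update_depth_posterior(posterior, depths, measurement, sigma=0.05, inlier_prob=0.7):
    """One Bayesian update of a discrete depth posterior for a single pixel.

    Assumed measurement model: with probability inlier_prob the measurement is
    the true depth plus Gaussian noise, otherwise it is an outlier drawn
    uniformly from the depth range.
    """
    uniform = 1.0 / (depths[-1] - depths[0])
    updated = []
    for prior, depth in zip(posterior, depths):
        gauss = math.exp(-0.5 * ((measurement - depth) / sigma) ** 2) / (
            sigma * math.sqrt(2.0 * math.pi))
        likelihood = inlier_prob * gauss + (1.0 - inlier_prob) * uniform
        updated.append(prior * likelihood)
    total = sum(updated)
    return [value / total for value in updated]  # renormalize to sum to 1

# Depth hypotheses from 0.5 to 1.5 with a flat prior.
depths = [0.5 + 0.1 * i for i in range(11)]
posterior = [1.0 / len(depths)] * len(depths)

# Three consistent measurements near 1.0 and one outlier at 1.4,
# one measurement per incoming video frame.
for measurement in [1.0, 1.02, 1.4, 0.98]:
    posterior = update_depth_posterior(posterior, depths, measurement)

best_index = max(range(len(depths)), key=lambda i: posterior[i])
```

Because the outlier term keeps every hypothesis at a non-zero likelihood, the single bad measurement at 1.4 does not eliminate the hypothesis near 1.0, which ends up with the highest posterior mass.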
Abstract:
A Case-Based Reasoning (CBR) tool is software that can be used to develop applications requiring the case-based reasoning methodology. CBR shells are a kind of application generator with a graphical user interface. They can be used by non-programmers, but extending these tools or integrating new components into them is not possible. In this paper we analyze three object-oriented CBR framework development environments: CBR*Tools, CAT-CBR, and JColibri. These frameworks work as open software development environments and facilitate the reuse of their design as well as their implementations.
Abstract:
Many object recognition techniques perform some flavour of point pattern matching between a model and a scene. Such points are usually selected by a feature detection algorithm that is robust to a class of image transformations, and a suitable descriptor is computed over them in order to obtain reliable matches. Moreover, some approaches take an additional step by casting the correspondence problem as a matching between graphs defined over the feature points. The motivation is that the relational model adds discriminative power; however, the overall effectiveness strongly depends on the ability to build a graph that is stable with respect to changes in both the object appearance and the spatial distribution of interest points. In fact, widely used graph-based representations have been shown to suffer from some limitations, especially with respect to changes in the Euclidean organization of the feature points. In this paper we introduce a technique to build relational structures over corner points that does not depend on the spatial distribution of the features. © 2012 ICPR Org Committee.
Abstract:
This dissertation presents a study and experimental research on asymmetric coding of stereoscopic video. A review of 3D technologies, video formats and coding is first presented, and then particular emphasis is given to asymmetric coding of 3D content and to methods, based on subjective measures, for evaluating the performance of asymmetric coding. The research objective was defined as an extension of the current concept of asymmetric coding for stereo video. To achieve this objective, the first step consists of defining regions in the spatial dimension of the auxiliary view with different perceptual relevance within the stereo pair, which are identified by a binary mask. These regions are then encoded with better quality (lower quantisation) for the most relevant ones and worse quality (higher quantisation) for those with lower perceptual relevance. The actual estimation of the relevance of a given region is based on a measure of disparity according to the absolute difference between views. To allow encoding of a stereo sequence using this method, a reference H.264/MVC encoder (JM) has been modified to allow additional configuration parameters and inputs. The final encoder is still standard compliant. In order to show the viability of the method, subjective assessment tests were performed over a wide range of objective qualities of the auxiliary view. The results of these tests allow us to establish three main results. First, it is shown that the proposed method can be more efficient than traditional asymmetric coding when encoding stereo video at higher qualities/rates. The method can also be used to extend the threshold at which uniform asymmetric coding methods start to have an impact on the subjective quality perceived by the observers. Finally, the issue of eye dominance is addressed. Results from stereo still images displayed over a short period of time showed it has little or no impact on the proposed method.
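The mask-driven quantisation described above can be sketched as follows. This is an illustration under stated assumptions, not the modified JM encoder: it assumes that blocks with a larger mean absolute difference between views are the more relevant ones (the abstract does not say which direction applies), and the block size, threshold, QP values and function names are all hypothetical.

```python
def relevance_mask(base_view, aux_view, block=2, threshold=10.0):
    """Binary mask per block: 1 where the mean absolute difference between
    the two views exceeds the threshold, 0 elsewhere."""
    rows, cols = len(aux_view), len(aux_view[0])
    mask = []
    for top in range(0, rows, block):
        mask_row = []
        for left in range(0, cols, block):
            diffs = [abs(base_view[y][x] - aux_view[y][x])
                     for y in range(top, min(top + block, rows))
                     for x in range(left, min(left + block, cols))]
            mask_row.append(1 if sum(diffs) / len(diffs) > threshold else 0)
        mask.append(mask_row)
    return mask

def qp_map(mask, qp_relevant=26, qp_other=34):
    """Assign a lower quantisation parameter (better quality) to relevant blocks."""
    return [[qp_relevant if flag else qp_other for flag in row] for row in mask]

# Two 2x4 luma views that differ only in their right half.
base = [[100, 100, 100, 100], [100, 100, 100, 100]]
aux = [[100, 100, 140, 140], [100, 100, 140, 140]]
mask = relevance_mask(base, aux)
qps = qp_map(mask)
```

Here the right-hand block is flagged as relevant and receives the lower QP, while the identical left-hand block is quantised more coarsely.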
Abstract:
Recently, blood oxygen level-dependent (BOLD) functional magnetic resonance imaging (fMRI) has become a routine clinical procedure for localization of language and motor brain regions and has been replacing more invasive preoperative procedures. However, the fMRI results from these tasks are not always reproducible, even for the same patient. Evaluating the reproducibility of language and speech mapping is especially complicated due to the complex brain circuitry that may become activated during the functional task. Non-language areas such as sensory, attention, decision-making, and motor brain regions may also be activated in addition to the specific language regions during a traditional sentence-completion task. In this study, I test a new approach, which utilizes 4-minute video-based tasks, to map language and speech brain regions for patients undergoing brain surgery. Results from 35 subjects have shown that the video-based task activates Wernicke's area, as well as Broca's area, in most subjects. The computed laterality indices, which indicate the dominant hemisphere for that functional task, have indicated left dominance for the video-based tasks. This study has shown that the video-based task may be an alternative method for localization of language and speech brain regions for patients who are unable to complete the sentence-completion task.
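The abstract does not state how the laterality indices were computed. A commonly used definition is LI = (L - R) / (L + R), where L and R are activation measures (for example, counts of activated voxels) in the left and right hemispheres. The sketch below assumes that definition; the function name and the voxel counts are illustrative only.

```python
def laterality_index(left_voxels, right_voxels):
    """Laterality index in [-1, 1]; positive values indicate left dominance."""
    total = left_voxels + right_voxels
    if total == 0:
        raise ValueError("no activated voxels in either hemisphere")
    return (left_voxels - right_voxels) / total

# Hypothetical activation counts for one subject.
li = laterality_index(300, 100)  # 0.5, i.e. left-dominant under this definition
```

Under this convention a value near +1 means almost all activation is in the left hemisphere, a value near -1 means the right, and values near 0 indicate bilateral activation.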
Abstract:
Objective
Pedestrian detection in video surveillance systems has always been a hot topic in computer vision research. These systems are widely used in train stations, airports, large commercial plazas, and other public places. However, pedestrian detection remains difficult because of complex backgrounds. Given its development in recent years, the visual attention mechanism has attracted increasing attention in object detection and tracking research, and previous studies have achieved substantial progress and breakthroughs. We propose a novel pedestrian detection method based on semantic features under the visual attention mechanism.
Method
The proposed semantic feature-based visual attention model is a spatial-temporal model that consists of two parts: the static visual attention model and the motion visual attention model. The static visual attention model in the spatial domain is constructed by combining bottom-up with top-down attention guidance. Based on the characteristics of pedestrians, the bottom-up visual attention model of Itti is improved by intensifying the orientation vectors of elementary visual features to make the visual saliency map suitable for pedestrian detection. In terms of pedestrian attributes, skin color is selected as a semantic feature for pedestrian detection. The regional and Gaussian models are adopted to construct the skin color model. Skin feature-based visual attention guidance is then proposed to complete the top-down process. The bottom-up and top-down visual attentions are linearly combined, using weights obtained from experiments, to construct the static visual attention model in the spatial domain. The spatial-temporal visual attention model is then constructed via the motion features in the temporal domain. Based on the static visual attention model in the spatial domain, the frame difference method is combined with optical flow to detect motion vectors. Filtering is applied to process the field of motion vectors. The saliency of motion vectors can be evaluated via motion entropy to make the selected motion feature more suitable for the spatial-temporal visual attention model.
Result
Standard datasets and practical videos are selected for the experiments. The experiments are performed on a MATLAB R2012a platform. The experimental results show that our spatial-temporal visual attention model demonstrates favorable robustness under various scenes, including indoor train station surveillance videos and outdoor scenes with swaying leaves. Our proposed model outperforms the visual attention model of Itti, the graph-based visual saliency model, the phase spectrum of quaternion Fourier transform model, and the motion channel model of Liu in terms of pedestrian detection. The proposed model achieves a 93% accuracy rate on the test video.
Conclusion
This paper proposes a novel pedestrian detection method based on the visual attention mechanism. A spatial-temporal visual attention model that uses low-level and semantic features is proposed to calculate the saliency map. Based on this model, the pedestrian targets can be detected through focus of attention shifts. The experimental results verify the effectiveness of the proposed attention model for detecting pedestrians.
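Two of the building blocks named in the method, the linear combination of bottom-up and top-down saliency and the frame difference step, can be sketched as below. The weights, the threshold and the function names are hypothetical: the abstract says the weights were obtained from experiments but does not report them, and the optical flow, filtering and motion entropy stages are not reproduced here.

```python
def combine_saliency(bottom_up, top_down, w_bottom_up=0.6):
    """Pixel-wise linear combination of two saliency maps (2D lists in [0, 1])."""
    w_top_down = 1.0 - w_bottom_up
    return [[w_bottom_up * b + w_top_down * t for b, t in zip(b_row, t_row)]
            for b_row, t_row in zip(bottom_up, top_down)]

def frame_difference(previous, current, threshold=15):
    """Binary motion mask: 1 where the intensity change exceeds the threshold."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(p_row, c_row)]
            for p_row, c_row in zip(previous, current)]

# Static saliency from a bottom-up map and a skin-color (top-down) map.
static_map = combine_saliency([[0.2, 0.8]], [[1.0, 0.0]], w_bottom_up=0.5)

# Motion mask from two consecutive grayscale frames.
motion_mask = frame_difference([[10, 10]], [[12, 40]])
```

In this toy input the first pixel is salient mainly because of the top-down (skin color) cue, while only the second pixel changes enough between frames to count as motion.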
Abstract:
2016 was the breakout year of the virtual reality industry. In the field of virtual reality, 3D surveying plays an important role, and 3D surveying technology has received increasing attention. This project aims to establish and optimize a WebGL three-dimensional broadcast platform combined with streaming media technology. Its application background is a streaming media server and panoramic video broadcast in the browser. It also discusses the architecture from the streaming media server to the panoramic media player and analyzes the relevant theoretical problems. This paper focuses on the debugging of the streaming media platform, the structure of the WebGL player environment, the analysis of different types of sphere model, and the 3D mapping technology. The main work comprises the following points. First, a streaming service platform was built on the Easy Darwin open-source streaming media server; it receives an RTSP stream at the streaming media server and forwards HLS video segments to clients. Second, a WebGL panoramic video player was written using the Three.js library, with jQuery playback controls in the browser, yielding an HTML5 panoramic video player. Third, the latitude-longitude sphere model from the Three.js library was analyzed with respect to the WebGL rendering method, and the drawbacks of this model and the opportunities for improvement were identified. Fourth, on the basis of the Schneider transform principle, a Schneider sphere projection model was established, and the output OBJ file was converted to a JS file for the media player to read. This achieved real-time, high-precision panoramic video playback without plugins. Finally, the whole project is summarized, and directions for future optimization and market expansion are put forward.
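The latitude-longitude sphere model maps each texture coordinate of an equirectangular panorama frame to a point on a sphere. The sketch below shows that mapping in Python; the axis convention and the function name `equirect_to_sphere` are assumptions for illustration and do not necessarily match the conventions of Three.js, and the Schneider projection model mentioned above is not reproduced.

```python
import math

def equirect_to_sphere(u, v, radius=1.0):
    """Map equirectangular texture coordinates (u, v) in [0, 1] to a 3D point.

    Assumed convention: u spans longitude from -pi to pi, v spans latitude
    from +pi/2 (top of the image) to -pi/2 (bottom), and y is the up axis.
    """
    longitude = (u - 0.5) * 2.0 * math.pi
    latitude = (0.5 - v) * math.pi
    x = radius * math.cos(latitude) * math.sin(longitude)
    y = radius * math.sin(latitude)
    z = radius * math.cos(latitude) * math.cos(longitude)
    return x, y, z

center = equirect_to_sphere(0.5, 0.5)  # center of the image
top = equirect_to_sphere(0.5, 0.0)     # top edge of the image
```

Every texel along the top edge collapses to the same pole of the sphere, which is one reason latitude-longitude models sample the image unevenly near the poles.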