907 results for video classification and cataloging


Relevance:

40.00%

Publisher:

Abstract:

In this paper we propose an innovative method for the automatic detection and tracking of road traffic signs using an onboard stereo camera. It combines monocular and stereo analysis strategies to increase the reliability of the detections, so that it can boost the performance of any traffic sign recognition scheme. First, an adaptive color- and appearance-based detection is applied at single-camera level to generate a set of traffic sign hypotheses. In turn, stereo information allows for sparse 3D reconstruction of potential traffic signs through a SURF-based matching strategy. Namely, the plane that best fits the cloud of 3D points traced back from feature matches is estimated using a RANSAC-based approach to improve robustness to outliers. Temporal consistency of the 3D information is ensured through a Kalman-based tracking stage. This also allows for the generation of a predicted 3D traffic sign model, which is in turn used to enhance the previously mentioned color-based detector through a feedback loop, thus improving detection accuracy. The proposed solution has been tested with real sequences under several illumination conditions and in both urban areas and highways, achieving very high detection rates in challenging environments, including rapid motion and significant perspective distortion.
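The RANSAC-based plane estimation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the distance threshold, the iteration count and the synthetic point cloud are all assumptions chosen for illustration, and the points stand in for 3D positions recovered from stereo feature matches.

```python
import random


def plane_from_points(p, q, r):
    """Return (unit_normal, d) for the plane n.x + d = 0 through three points,
    or None if the points are (nearly) collinear."""
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    # Cross product of the two edge vectors gives the plane normal.
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = (nx * nx + ny * ny + nz * nz) ** 0.5
    if norm < 1e-12:
        return None
    n = (nx / norm, ny / norm, nz / norm)
    d = -(n[0] * p[0] + n[1] * p[1] + n[2] * p[2])
    return n, d


def ransac_plane(points, threshold=0.05, iterations=200, seed=0):
    """Keep the candidate plane supported by the most inliers.
    threshold and iterations are illustrative values, not from the paper."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue
        n, d = model
        inliers = [pt for pt in points
                   if abs(n[0] * pt[0] + n[1] * pt[1] + n[2] * pt[2] + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Synthetic cloud: 40 points close to the plane z = 5 plus 10 gross outliers.
_rng = random.Random(1)
plane_points = [(_rng.uniform(-1, 1), _rng.uniform(-1, 1), 5.0 + _rng.uniform(-0.002, 0.002))
                for _ in range(40)]
outliers = [(_rng.uniform(-1, 1), _rng.uniform(-1, 1), _rng.uniform(0, 10))
            for _ in range(10)]
model, inliers = ransac_plane(plane_points + outliers)
```

A practical pipeline would usually refine the winning plane with a least-squares fit over its inliers; that step is omitted here for brevity.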

Relevance:

40.00%

Publisher:

Abstract:

We present a novel framework for the analysis and optimization of encoding latency for multiview video. Firstly, we characterize the elements that have an influence in the encoding latency performance: (i) the multiview prediction structure and (ii) the hardware encoder model. Then, we provide algorithms to find the encoding latency of any arbitrary multiview prediction structure. The proposed framework relies on the directed acyclic graph encoder latency (DAGEL) model, which provides an abstraction of the processing capacity of the encoder by considering an unbounded number of processors. Using graph theoretic algorithms, the DAGEL model allows us to compute the encoding latency of a given prediction structure, and determine the contribution of the prediction dependencies to it. As an example of DAGEL application, we propose an algorithm to reduce the encoding latency of a given multiview prediction structure up to a target value. In our approach, a minimum number of frame dependencies are pruned, until the latency target value is achieved, thus minimizing the degradation of the rate-distortion performance due to the removal of the prediction dependencies. Finally, we analyze the latency performance of the DAGEL derived prediction structures in multiview encoders with limited processing capacity.
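As a rough illustration of computing latency on a directed acyclic graph with an unbounded number of processors, the sketch below treats each frame as a node, each prediction dependency as an edge, and lets a frame start encoding as soon as all its reference frames are done. This is a simplification and not the DAGEL model itself: the frame names, the unit encoding costs and the definition of latency as the longest path are assumptions made for the example.

```python
from graphlib import TopologicalSorter


def finish_times(deps, cost):
    """deps maps each frame to the set of reference frames it is predicted from;
    cost maps each frame to its (assumed) encoding time.
    With unbounded processors a frame starts when its last reference finishes."""
    finish = {}
    for frame in TopologicalSorter(deps).static_order():
        start = max((finish[ref] for ref in deps.get(frame, ())), default=0.0)
        finish[frame] = start + cost[frame]
    return finish


# Hypothetical prediction structure: two views (v0, v1), three frames each.
deps = {
    "v0_f0": set(),
    "v0_f2": {"v0_f0"},
    "v0_f1": {"v0_f0", "v0_f2"},
    "v1_f0": {"v0_f0"},
    "v1_f2": {"v1_f0", "v0_f2"},
    "v1_f1": {"v1_f0", "v1_f2", "v0_f1"},
}
cost = {frame: 1.0 for frame in deps}  # unit encoding time per frame (assumption)

finish = finish_times(deps, cost)
latency = max(finish.values())  # length of the longest dependency chain
```

In this toy structure the longest chain is v0_f0, v0_f2, v0_f1 (or v1_f2), v1_f1, so removing dependencies on that chain is what would shorten the latency.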

Relevance:

40.00%

Publisher:

Abstract:

Analysis of minimally invasive surgical videos is a powerful tool to drive new solutions for achieving reproducible training programs, objective and transparent assessment systems and navigation tools to assist surgeons and improve patient safety. This paper presents how video analysis contributes to the development of new cognitive and motor training and assessment programs as well as new paradigms for image-guided surgery.

Relevance:

40.00%

Publisher:

Abstract:

In recent years, the increasing sophistication of embedded multimedia systems and wireless communication technologies has promoted the widespread use of video streaming applications. It was reported in 2013 that young people aged between 13 and 24 spend around 16.7 hours a week watching online video through social media, business websites, and video streaming sites. Video applications have already become part of people's daily lives. Traditionally, video streaming research has focused on performance improvement, namely throughput increase and response time reduction. However, most mobile devices are battery-powered, and battery technology advances at a much slower pace than either multimedia or hardware developments. Since battery developments cannot satisfy the expanding power demand of mobile devices, research on video applications has increasingly turned to energy-efficient designs. How to use the limited battery energy budget efficiently has become a major research challenge. In addition, next-generation video standards push toward diversification and personalization. Therefore, it is desirable to have mechanisms that implement energy optimizations with greater flexibility and scalability. In this context, the main goal of this dissertation is to find an energy management and optimization mechanism that reduces the energy consumption of video decoders based on the idea of functional-oriented reconfiguration. System battery life is prolonged as the result of a trade-off between energy consumption and video quality. Functional-oriented reconfiguration takes advantage of the similarities among standards to build video decoders by reconnecting existing functional units. If a feedback channel from the decoder to the encoder is available, the former can signal to the latter changes in either the encoding parameters or the encoding algorithms for energy-saving adaptation.
The proposed energy optimization and management mechanism is carried out at the decoder end. This mechanism consists of an energy-aware manager, implemented as an additional block of the reconfiguration engine, an energy estimator, integrated into the decoder, and, if available, a feedback channel connected to the encoder end. The energy-aware manager checks the battery level, selects the new decoder description and signals the reconfiguration engine to build a new decoder. It is worth noting that the analysis of the energy consumption is fundamental for the success of the energy management and optimization mechanism. In this thesis, an energy estimation method driven by platform event monitoring is proposed. In addition, an event filter is suggested to automate the selection of the most appropriate events that affect the energy consumption. Finally, a detailed study of the influence of the training data on the model accuracy is presented. The modeling methodology of the energy estimator has been evaluated on different underlying platforms, single-core and multi-core, with different workload characteristics. All the results show good accuracy and low online computation overhead. The modifications to the reconfiguration engine required to implement the energy-aware manager have been assessed under different scenarios. The results indicate that the battery lifetime of the system can be lengthened in two different use cases.
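The interplay between an event-driven energy estimator and an energy-aware manager can be sketched as follows. This is only an illustration under stated assumptions: the linear per-event energy model, the event names, the weights, the decoder descriptions and the selection rule (pick the best-quality decoder whose estimated energy fits the remaining battery budget) are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class DecoderDescription:
    name: str
    quality_rank: int        # higher means better video quality (assumption)
    events_per_frame: dict   # expected platform event counts per decoded frame


# Hypothetical linear model: joules contributed by one occurrence of each event.
EVENT_WEIGHTS = {"instructions": 2.0e-10, "l2_misses": 5.0e-9}
STATIC_ENERGY_PER_FRAME = 1.0e-3  # joules, assumed constant overhead


def estimate_frame_energy(events):
    """Estimate the energy of decoding one frame from monitored event counts."""
    return STATIC_ENERGY_PER_FRAME + sum(
        EVENT_WEIGHTS[name] * count for name, count in events.items())


def select_decoder(candidates, battery_joules, frames_remaining):
    """Choose the highest-quality decoder that can finish playback within the
    battery budget; fall back to the cheapest decoder if none can."""
    affordable = [
        d for d in candidates
        if estimate_frame_energy(d.events_per_frame) * frames_remaining <= battery_joules
    ]
    if affordable:
        return max(affordable, key=lambda d: d.quality_rank)
    return min(candidates, key=lambda d: estimate_frame_energy(d.events_per_frame))


full = DecoderDescription("full_quality", 2, {"instructions": 5e7, "l2_misses": 2e5})
light = DecoderDescription("low_energy", 1, {"instructions": 2e7, "l2_misses": 1e5})

choice_high_battery = select_decoder([full, light], battery_joules=200.0, frames_remaining=10000)
choice_low_battery = select_decoder([full, light], battery_joules=100.0, frames_remaining=10000)
```

In a real system the weights would be fitted from training measurements, and the chosen description would be handed to the reconfiguration engine; both steps are outside the scope of this sketch.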

Relevance:

40.00%

Publisher:

Abstract:

Low-cost systems that can obtain a high-quality foreground segmentation almost independently of the existing illumination conditions for indoor environments are very desirable, especially for security and surveillance applications. In this paper, a novel foreground segmentation algorithm that uses only a Kinect depth sensor is proposed to satisfy the aforementioned system characteristics. This is achieved by combining a mixture of Gaussians-based background subtraction algorithm with a new Bayesian network that robustly predicts the foreground/background regions between consecutive time steps. The Bayesian network explicitly exploits the intrinsic characteristics of the depth data by means of two dynamic models that estimate the spatial and depth evolution of the foreground/background regions. The most remarkable contribution is the depth-based dynamic model that predicts the changes in the foreground depth distribution between consecutive time steps. This is a key difference with regard to visible imagery, where the color/gray distribution of the foreground is typically assumed to be constant. Experiments carried out on two different depth-based databases demonstrate that the proposed combination of algorithms is able to obtain a more accurate segmentation of the foreground/background than other state-of-the-art approaches.

Relevance:

40.00%

Publisher:

Abstract:

HTTP adaptive streaming (HAS) has become a reliable distribution technology that offers significant advantages in terms of both user-perceived Quality of Experience (QoE) and resource utilization for content and network service providers. By trading off video quality, HAS can adapt to the available bandwidth and display requirements, so that it can deliver video content to a variety of devices over the Internet. However, little is known so far about how the adaptation techniques affect the end user's visual experience. Therefore, this paper presents a comparative analysis of different bitrate adaptation strategies in adaptive streaming of monoscopic and stereoscopic video. The analysis is based on a subjective experiment that tested the end-user response to video quality variations, taking visual comfort into account. The experimental outcomes provide good insight into the factors that can influence the QoE of different adaptation strategies.
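To make the idea of bitrate adaptation concrete, here is a minimal sketch of one common family of strategies: throughput-based selection. It is not one of the strategies evaluated in the paper. The bitrate ladder, the safety margin and the use of a harmonic mean over recent chunk throughputs are assumptions chosen for illustration.

```python
from statistics import harmonic_mean

# Hypothetical set of available representations, in kbit/s.
BITRATE_LADDER_KBPS = [600, 1000, 3000, 5000]


def select_bitrate(recent_throughputs_kbps, ladder=BITRATE_LADDER_KBPS, safety=0.8):
    """Pick the highest representation that fits within a safety fraction of the
    estimated throughput; fall back to the lowest one if nothing fits."""
    # The harmonic mean damps the effect of short throughput spikes.
    budget = safety * harmonic_mean(recent_throughputs_kbps)
    eligible = [b for b in sorted(ladder) if b <= budget]
    return eligible[-1] if eligible else min(ladder)


steady = select_bitrate([4000, 4000, 4000])   # stable link
congested = select_bitrate([500, 500])        # below the lowest representation
fluctuating = select_bitrate([8000, 2000])    # mixed measurements
```

A client would call such a function before requesting each chunk; how large and how frequent the resulting quality switches are is exactly the kind of factor whose perceptual impact a subjective study can measure.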

Relevance:

40.00%

Publisher:

Abstract:

The importance of vision-based systems for Sense-and-Avoid is increasing as remotely piloted and autonomous UAVs become part of the non-segregated airspace. The development and evaluation of these systems demand flight scenario images, which are expensive and risky to obtain. Augmented Reality techniques now allow real flight scenario images to be composited with 3D aircraft models to produce realistic images that are useful for system development and benchmarking at a much lower cost and risk. With the techniques presented in this paper, 3D aircraft models are first positioned in a simulated 3D scene with controlled illumination and rendering parameters. Realistic simulated images are then obtained using an image processing algorithm that fuses the images obtained from the 3D scene with images from real UAV flights, taking into account on-board camera vibrations. Since the intruder and camera poses are user-defined, ground truth data is available. These ground truth annotations make it possible to develop and quantitatively evaluate aircraft detection and tracking algorithms. This paper presents the software developed to create a public dataset of 24 videos, together with their annotations and some tracking application results.

Relevance:

40.00%

Publisher:

Abstract:

Video quality assessment remains necessary for defining the criteria that characterize a signal meeting the viewing requirements imposed by the end user. New technologies, such as stereoscopic 3D video and formats beyond high definition, impose new criteria that must be analyzed to achieve the highest possible user satisfaction. Among the problems detected during this doctoral thesis, some phenomena were found to affect different phases of the audiovisual production chain and different types of content. First, the content generation process should be controlled through parameters that prevent visual discomfort and, consequently, visual fatigue, especially for stereoscopic 3D content, both animation and live action. Second, quality assessment of the video compression stage relies on objective metrics that are sometimes not adapted to the user's perception. Psychovisual models and visual attention maps make it possible to weight image regions so that more importance is given to the pixels the user is most likely to look at. These two fields of work are related through the definition of the term saliency. Saliency is the capacity of the human visual system to characterize a viewed image by weighting the areas that are most attractive to the human eye. In the generation of stereoscopic content, saliency refers mainly to the depth simulated by the optical illusion, measured as the distance from the virtual object to the human eye. In two-dimensional video, however, saliency is not based on depth but on other features, such as motion, level of detail, pixel position in the frame, and the presence of faces, which are the basic factors of the visual attention model developed, as demonstrated with tests.
To detect the features of a stereoscopic video sequence that are most likely to cause visual discomfort, the extensive literature on the subject was reviewed and preliminary subjective tests with users were carried out. These led to the conclusion that discomfort arose when the distribution of simulated depths in the image changed abruptly in video transitions, apart from other degradations such as the so-called “window violation”. New subjective tests, focused on analyzing these effects with different depth distributions, sought to pin down the parameters that define such images. The results show that the abrupt changes occur in scenes with motion and high negative disparities, which interfere with the accommodation and vergence processes of the human eye and increase the time the crystalline lens needs to focus. For the improvement of quality metrics through models adapted to the human visual system, further subjective tests helped determine the importance of each factor in masking a given degradation. The results show a slight improvement when weighting and visual attention masks are applied to objective metrics, bringing the objective quality scores closer to the response of the human eye.
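The idea of weighting pixels by visual attention when computing an objective metric can be illustrated with a saliency-weighted PSNR. This is a generic sketch and not the metric developed in the thesis: the choice of PSNR as the base metric, the function name and the toy 2x2 images and weight maps are assumptions for illustration.

```python
import math


def weighted_psnr(reference, distorted, weights, peak=255.0):
    """PSNR in which each pixel's squared error is scaled by a saliency weight.
    All three arguments are equally sized 2D lists of numbers."""
    weighted_error = 0.0
    total_weight = 0.0
    for ref_row, dist_row, w_row in zip(reference, distorted, weights):
        for r, d, w in zip(ref_row, dist_row, w_row):
            weighted_error += w * (r - d) ** 2
            total_weight += w
    if total_weight == 0:
        raise ValueError("saliency weights must not all be zero")
    wmse = weighted_error / total_weight
    return math.inf if wmse == 0 else 10.0 * math.log10(peak ** 2 / wmse)


reference = [[100, 100], [100, 100]]
distorted = [[100, 110], [100, 100]]       # a single distorted pixel

uniform = [[1.0, 1.0], [1.0, 1.0]]         # plain PSNR
attended = [[0.1, 0.7], [0.1, 0.1]]        # viewer assumed to look at the distorted pixel

psnr_uniform = weighted_psnr(reference, distorted, uniform)
psnr_attended = weighted_psnr(reference, distorted, attended)
```

With these toy inputs the attended score is lower than the uniform one, reflecting that a distortion in a region the viewer is likely to look at is penalized more heavily.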

Relevance:

40.00%

Publisher:

Abstract:

HTTP adaptive streaming (HAS) has become widespread in multimedia services because it allows service providers to improve network resource utilization and the user's Quality of Experience (QoE). With this technology, video playback interruptions are reduced, since the HAS client takes the network and server status, as well as the capability of the user device, into account to adapt the quality to the current conditions. Adaptation can be done using different strategies. To provide optimal QoE, the perceptual impact of adaptation strategies should be studied from the user's point of view. However, the time-varying video quality caused by adaptation, which usually takes place over a long interval, introduces a new type of impairment that makes the subjective evaluation of adaptive streaming systems challenging. The contribution of this paper is two-fold. First, it investigates the testing methodology for evaluating HAS QoE by comparing the subjective experimental outcomes obtained from the standardized ACR method and a semi-continuous method developed to evaluate long sequences. In addition, the influence of using audiovisual stimuli to evaluate video-related impairments is examined. Second, the impact of some technical adaptation factors, including the quality switching amplitude and chunk size, in combination with a wide range of commercial content types, is investigated. The results of this study provide good insight into appropriate testing methods for evaluating HAS QoE, as well as into designing switching strategies with optimal visual quality.

Relevance:

40.00%

Publisher:

Abstract:

Acknowledgements: We would like to thank Erik Rexstad and Rob Williams for useful reviews of this manuscript. The collection of visual and acoustic data was funded by the UK Department of Energy & Climate Change, the Scottish Government, Collaborative Offshore Wind Research into the Environment (COWRIE) and Oil & Gas UK. Digital aerial surveys were funded by Moray Offshore Renewables Ltd and additional funding for analysis of the combined datasets was provided by Marine Scotland. Collaboration between the University of Aberdeen and Marine Scotland was supported by MarCRF. We thank colleagues at the University of Aberdeen, Moray First Marine, NERI, Hi-Def Aerial Surveying Ltd and Ravenair for essential support in the field, particularly Tim Barton, Bill Ruck, Rasmus Nielson and Dave Rutter. Thanks also to Andy Webb, David Borchers, Len Thomas, Kelly McLeod, David L. Miller, Dinara Sadykova and Thomas Cornulier for advice on survey design and statistical approaches. Data Accessibility: Data are available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.cf04g

Relevance:

40.00%

Publisher:

Abstract:

Objective: To evaluate the efficacy of vision therapy in two cases of intermittent exotropia (IX(T)) by complementing the clinical examination with 3-D video-oculography (VOG), and to show the potential applicability of this technology for this purpose. Methods: We report the binocular alignment changes occurring after vision therapy in a 36-year-old woman with an IX(T) of 25 prism diopters (Δ) at far and 18 Δ at near, and in a 10-year-old child with 8 Δ of IX(T) in primary position associated with 6 Δ of left eye hypotropia. Both patients presented good corrected visual acuity in both eyes. VOG analysis revealed instability of the ocular deviation, as well as the presence of vertical and torsional components. Binocular vision therapy was prescribed and performed, including different types of vergence, accommodation, and diplopia awareness training. Results: After therapy, excellent ranges of fusional vergence and a “to-the-nose” near point of convergence were obtained. The 3-D VOG examination (SensoMotoric Instruments, Teltow, Germany) confirmed the compensation of the deviation with a high level of stability of binocular alignment. Significant improvement was observed after therapy in the vertical and torsional components, which became more stable. Patients were very satisfied with the outcome obtained by vision therapy. Conclusion: 3-D VOG is a useful technique for providing an objective record of the compensation of the ocular deviation and the stability of the binocular alignment achieved after vision therapy in cases of IX(T), providing a detailed analysis of vertical and torsional improvements.

Relevance:

40.00%

Publisher:

Abstract:

Background: The pupillary light reflex characterizes the direct and consensual response of the eye to the perceived brightness of a stimulus. It has been used as an indicator of both neurological and optic nerve pathologies. As with other eye reflexes, this reflex constitutes an almost instantaneous movement and is linked to activation of the same midbrain area. The latency of the pupillary light reflex is around 200 ms, although the literature also indicates that the fastest eye reflexes last 20 ms. Therefore, a system with sufficiently high spatial and temporal resolutions is required for accurate assessment. In this study, we analyzed the pupillary light reflex to determine whether any small discrepancy exists between the direct and consensual responses, and to ascertain whether any other eye reflex occurs before the pupillary light reflex. Methods: We constructed a binocular video-oculography system with two high-speed cameras that simultaneously focused on both eyes. This was then employed to assess the direct and consensual responses of each eye using our own algorithm, based on the Circular Hough Transform, to detect and track the pupil. Time parameters describing the pupillary light reflex were obtained from the radius time-variation. Eight healthy subjects (4 women, 4 men, aged 24–45) participated in this experiment. Results: Our system, which has a resolution of 15 microns and 4 ms, obtained time parameters describing the pupillary light reflex that were similar to those reported in previous studies, with no significant differences between direct and consensual reflexes. Moreover, it revealed an incomplete reflex blink and an upward eye movement at around 100 ms that may correspond to Bell’s phenomenon. Conclusions: Direct and consensual pupillary responses do not show any significant temporal differences. The system and method described here could prove useful for further assessment of pupillary and blink reflexes. 
The resolution obtained made it possible to detect the early incomplete blink and the upward eye movement reported here.

Relevance:

40.00%

Publisher:

Abstract:

Complementary programs

Relevance:

40.00%

Publisher:

Abstract:

Software for video-based multi-point frequency measuring and mapping: http://hdl.handle.net/10045/53429