994 results for video sequence matching


Relevance:

80.00%

Publisher:

Abstract:

The purpose of this study is to prove the convergence of the simultaneous estimation of optical flow and object state (SEOS) method. SEOS uses dynamic object parameter information when calculating optical flow to track a moving object in a video stream. Optical flow estimation in SEOS requires minimizing an error function that contains the object's physical parameter data. When this function is discretized, the Euler-Lagrange equations form a system of linear equations. The system is arranged so that its coefficient matrix is symmetric positive definite, which proves the convergence of the Gauss-Seidel iterative method. The system of linear equations produced by SEOS can alternatively be solved by Jacobi iterative schemes. Because symmetric positive definiteness alone is not sufficient for Jacobi convergence, the convergence of SEOS under block diagonal Jacobi iteration is proved by analysing the Euclidean norm of the Jacobi matrix. In this paper, we also investigate the use of SEOS for tracking individual objects within a video sequence. The illustrations provided show the effectiveness of SEOS for localizing objects within a video sequence and generating optical flow results.
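The contrast between the two iterative schemes can be illustrated on a toy problem. The sketch below is not the SEOS system; it uses a hypothetical 3x3 symmetric positive definite matrix chosen so that Gauss-Seidel converges while plain (point) Jacobi diverges, which is why a separate argument is needed for the Jacobi case. All names and values are assumptions made for illustration.

```python
def gauss_seidel(A, b, iters):
    """Gauss-Seidel sweeps: each update uses the newest values of x."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x


def jacobi(A, b, iters):
    """Point Jacobi sweeps: each update uses only the previous iterate."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
    return x


def residual_norm(A, b, x):
    """Euclidean norm of b - A x."""
    n = len(b)
    r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(v * v for v in r) ** 0.5


# Hypothetical test system (not from the paper). A is symmetric with
# eigenvalues 2.6, 0.2, 0.2, so it is positive definite, but the point
# Jacobi iteration matrix has spectral radius 1.6 and therefore diverges.
A = [[1.0, 0.8, 0.8],
     [0.8, 1.0, 0.8],
     [0.8, 0.8, 1.0]]
b = [1.0, 1.0, 1.0]

gs_residual = residual_norm(A, b, gauss_seidel(A, b, 100))
jacobi_residual = residual_norm(A, b, jacobi(A, b, 100))
print(gs_residual, jacobi_residual)
```

The Gauss-Seidel residual shrinks towards zero while the Jacobi residual grows without bound on this matrix.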

Relevance:

80.00%

Publisher:

Abstract:

The proliferation of multimedia content and the demand for new audio or video services have fostered the development of a new era based on multimedia information, which has enabled the evolution of Wireless Multimedia Sensor Networks (WMSNs) and Flying Ad-Hoc Networks (FANETs). Live multimedia services require real-time video transmission with a low frame loss rate and tolerable end-to-end delay and jitter in order to disseminate video with Quality of Experience (QoE) support. Hence, a key principle of a QoE-aware approach is to protect high-priority frames by transmitting them with a minimum packet loss ratio and minimum network overhead. Moreover, multimedia content must be transmitted from a given source to the destination via intermediate nodes with high reliability in a large-scale scenario. The routing service must cope with dynamic topologies caused by node failure or mobility, as well as wireless channel changes, so that it continues to operate during multimedia transmission. Finally, understanding user satisfaction when watching a video sequence is becoming a key requirement for delivering multimedia content with QoE support. With this goal in mind, solutions involving multimedia transmission must take the video characteristics into account to improve the quality of the delivered video. The main research contributions of this thesis are driven by the research question of how to provide multimedia distribution with high energy efficiency, reliability, robustness, scalability, and QoE support over wireless ad hoc networks. The thesis addresses several problem domains with contributions on different layers of the communication stack. At the application layer, we introduce a QoE-aware packet redundancy mechanism to reduce the impact of the unreliable and lossy nature of the wireless environment on the dissemination of live multimedia content. At the network layer, we introduce two routing protocols, namely video-aware Multi-hop and multi-path hierarchical routing protocol for Efficient VIdeo transmission for static WMSN scenarios (MEVI), and cross-layer link quality and geographical-aware beaconless OR protocol for multimedia FANET scenarios (XLinGO). Both protocols enable multimedia dissemination with energy efficiency, reliability, and QoE support. This is achieved by combining multiple cross-layer metrics in the routing decision in order to establish reliable routes.

Relevance:

80.00%

Publisher:

Abstract:

Graduate Program in Computer Science - IBILCE

Relevance:

80.00%

Publisher:

Abstract:

[ES] In this paper we address the problem of inserting virtual content in a video sequence. The method we propose uses only image information. We perform primitive tracking, camera calibration, real and virtual camera synchronisation, and finally rendering to insert the virtual content in the real video sequence. To simplify the calibration step we assume that cameras are mounted on a tripod (which is a common situation in practice). The primitive tracking procedure, which uses lines and circles as primitives, is performed by means of a CART (Classification and Regression Tree). Finally, the virtual and real camera synchronisation and the rendering are performed using OpenGL (Open Graphics Library) functions. We have applied the proposed method to sports event scenarios, specifically soccer matches. To illustrate its performance, it has been applied to real HD (High Definition) video sequences. The quality of the proposed method is validated by inserting virtual elements in such HD video sequences.

Relevance:

80.00%

Publisher:

Abstract:

An algorithm for the real-time registration of a retinal video sequence captured with a scanning digital ophthalmoscope (SDO) to a retinal composite image is presented. This method is designed for a computer-assisted retinal laser photocoagulation system to compensate for retinal motion and hence enhance the accuracy, speed, and patient safety of retinal laser treatments. The procedure combines intensity-based and feature-based registration techniques. For the registration of an individual frame, the translational frame-to-frame motion between the preceding and current frames is detected by normalized cross-correlation. Next, vessel points on the current video frame are identified and an initial transformation estimate is constructed from the calculated translation vector and the quadratic registration matrix of the previous frame. The vessel points are then iteratively matched to the segmented vessel centerline of the composite image to refine the initial transformation and register the video frame to the composite image. Criteria for image quality and algorithm convergence are introduced; they determine whether single frames are excluded from the registration process and trigger a loss-of-tracking signal if necessary. The algorithm was successfully applied to ten different video sequences recorded from patients. It achieved an average accuracy of 2.47 ± 2.0 pixels (∼23.2 ± 18.8 μm) for 2764 evaluated video frames and demonstrated that it meets the clinical requirements.
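The frame-to-frame translation step can be sketched as an exhaustive search for the integer shift that maximizes zero-mean normalized cross-correlation. This is only an illustration under stated assumptions: the search strategy, the `max_shift` parameter, and the synthetic random texture are all hypothetical and are not taken from the paper, which does not describe its implementation here.

```python
import math
import random


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0 else 0.0


def estimate_translation(prev, curr, max_shift):
    """Search integer shifts (dy, dx) so that curr[y + dy][x + dx]
    best matches prev[y][x] under NCC. Returns (shift, score)."""
    h, w = len(prev), len(prev[0])
    rows = range(max_shift, h - max_shift)
    cols = range(max_shift, w - max_shift)
    template = [prev[y][x] for y in rows for x in cols]
    best_score, best_shift = -2.0, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            window = [curr[y + dy][x + dx] for y in rows for x in cols]
            score = ncc(template, window)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift, best_score


# Synthetic check: a random texture moved down 1 row and right 2 columns.
rng = random.Random(0)
h, w = 16, 16
prev_frame = [[rng.random() for _ in range(w)] for _ in range(h)]
curr_frame = [[prev_frame[(y - 1) % h][(x - 2) % w] for x in range(w)]
              for y in range(h)]
shift, score = estimate_translation(prev_frame, curr_frame, max_shift=3)
print(shift, round(score, 3))
```

A real-time system would normally use a faster correlation (for example, FFT-based) rather than this brute-force loop; the brute-force form is used here only because it makes the definition explicit.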

Relevance:

80.00%

Publisher:

Abstract:

The general objective of this work is the correct operation of a facial recognition system composed of several modules implemented in different languages. One module is written in Python and determines the gender of the face or faces that appear in an image or in a frame of a video sequence. The other module, written in C++, recognizes each part of the face (eyes, nose, mouth) and the orientation of the head (right, left). The first part of this report covers the reimplementation of all the parts of a facial analyzer, which together constitute the first module. These parts are an analyzer, composed in turn of a tracker (Tracker) and a processor (Processor), and a viewer class for displaying the results. The tracker finds the face and its parts and passes them to the processor, which analyzes the face obtained by the tracker and determines its gender. This module was originally written entirely in C with OpenCV 1.0 and has been rewritten in Python with OpenCV 2.4. The second part explains how to implement communication between the first module, written in Python, and the second, written in C++. It also analyzes several tools for executing C++ code from Python programs: PyBindGen, Cython, and Boost. It discusses which of them is most convenient to use depending on the programmer's needs. Finally, the results section shows the operation of the system with the two modules integrated, and how the points of interest, the gender, and the orientation of the face are displayed on screen using images taken with a webcam.

Relevance:

80.00%

Publisher:

Abstract:

A more natural, intuitive, user-friendly, and less intrusive human–computer interface for controlling an application through hand gestures is presented. For this purpose, a robust vision-based hand-gesture recognition system has been developed, and a new database has been created to test it. The system is divided into three stages: detection, tracking, and recognition. The detection stage searches every frame of a video sequence for potential hand poses using a binary Support Vector Machine (SVM) classifier with Local Binary Patterns as feature vectors. These detections are used as input to a tracker that generates a spatio-temporal trajectory of hand poses. Finally, the recognition stage segments a spatio-temporal volume of data using the obtained trajectories and computes a video descriptor called Volumetric Spatiograms of Local Binary Patterns (VS-LBP), which is delivered to a bank of SVM classifiers to perform the gesture recognition. The VS-LBP is a novel video descriptor and one of the most important contributions of the paper; it provides much richer spatio-temporal information than other existing state-of-the-art approaches at a manageable computational cost. Excellent results have been obtained, outperforming other state-of-the-art approaches.
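For readers unfamiliar with Local Binary Patterns, the following sketch computes the basic 8-neighbour LBP code and a 256-bin histogram of codes. It illustrates only the plain LBP feature; it is not the VS-LBP descriptor, and the bit ordering and the `>=` comparison are common conventions assumed here rather than details taken from the paper.

```python
def lbp_code(img, y, x):
    """8-bit LBP code of pixel (y, x): one bit per neighbour, set when the
    neighbour is >= the centre. The bit order (clockwise from top-left)
    is an arbitrary choice made for this sketch."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    centre = img[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if img[y + dy][x + dx] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            hist[lbp_code(img, y, x)] += 1
    return hist


# Two tiny hypothetical patches: a bright peak and a flat region.
patch = [[5, 5, 5],
         [5, 9, 5],
         [5, 5, 5]]
flat = [[7] * 4 for _ in range(4)]
peak_code = lbp_code(patch, 1, 1)   # no neighbour reaches the centre value
flat_hist = lbp_histogram(flat)     # every neighbour equals the centre
```

A histogram of this kind, computed over a candidate window, is the sort of feature vector that could be passed to a binary classifier in the detection stage.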

Relevance:

80.00%

Publisher:

Abstract:

Behaviour analysis of construction safety systems is of fundamental importance for avoiding accidental injuries. Traditionally, dynamic actions in civil engineering have been measured with accelerometers, but high-speed cameras and image processing techniques can play an important role in this area. Here, we propose using morphological image filtering and the Hough transform on high-speed video sequences as tools for dynamic measurements in that field. The presented method is applied to obtain the trajectory and acceleration of a cylindrical ballast falling from a building and caught by a thread net. Results show that safety recommendations given in construction codes can be potentially dangerous for workers.
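One simple way to go from a tracked trajectory to an acceleration estimate is a central second difference over the per-frame positions. The sketch below assumes this approach for illustration only; the abstract does not say how acceleration was derived, and the frame rate and free-fall data are hypothetical.

```python
def accelerations(positions, dt):
    """Central second differences:
    a[i] = (p[i+1] - 2*p[i] + p[i-1]) / dt**2 for interior samples."""
    return [
        (positions[i + 1] - 2.0 * positions[i] + positions[i - 1]) / dt ** 2
        for i in range(1, len(positions) - 1)
    ]


# Hypothetical data: ideal free fall sampled at an assumed 1000 frames/s.
g = 9.81                      # m/s^2
dt = 1.0 / 1000.0             # seconds between frames
track = [0.5 * g * (k * dt) ** 2 for k in range(50)]   # metres fallen
acc = accelerations(track, dt)
print(min(acc), max(acc))
```

On measured trajectories, second differences amplify pixel-level noise, so the positions would normally be smoothed or fitted before differentiating.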

Relevance:

80.00%

Publisher:

Abstract:

This work presents two new, simple systems for analyzing human gait using a depth camera (Microsoft Kinect) placed in front of a subject walking on a conventional treadmill, capable of distinguishing healthy gait from impaired gait. The first system relies on the fact that normal gait typically exhibits a smooth depth signal at each pixel, with fewer high frequencies, which makes it possible to estimate a map indicating the location and magnitude of the high-frequency energy (HFSE). The second system analyzes the parts of the body that show an irregular movement pattern, in terms of periodicity, during walking. We assume that the gait of a healthy subject exhibits, everywhere on the body and throughout the gait cycles, a depth signal with a noise-free periodic pattern. From the video sequence of each subject, we estimate a map showing the areas of gait irregularity (also called aperiodic noise energy). The HFSE map or the map visualizing the aperiodic noise energy can be used as a good indicator of a possible pathology in an early, fast, and reliable diagnostic tool, or can provide information about the presence and extent of the patient's disease or problems (orthopedic, muscular, or neurological). Although the resulting maps are informative and highly discriminative for direct visual classification, even for a non-specialist, the proposed systems can also automatically detect healthy individuals and those with locomotor problems.
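The idea of a per-pixel high-frequency energy map can be sketched as follows: treat each pixel's depth values over time as a 1D signal, take its discrete Fourier transform, and sum the squared magnitudes above a cutoff bin. This is a minimal sketch under assumptions; the cutoff, the direct DFT, and the synthetic two-pixel "video" are hypothetical and are not taken from the work itself.

```python
import cmath
import math


def high_frequency_energy(signal, cutoff):
    """Sum of |X[k]|^2 over DFT bins k = cutoff .. N/2 of a real signal."""
    n = len(signal)
    energy = 0.0
    for k in range(cutoff, n // 2 + 1):
        coeff = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        energy += abs(coeff) ** 2
    return energy


def energy_map(depth_frames, cutoff):
    """Per-pixel high-frequency energy of a list of 2D depth frames."""
    h, w = len(depth_frames[0]), len(depth_frames[0][0])
    return [[high_frequency_energy([f[y][x] for f in depth_frames], cutoff)
             for x in range(w)] for y in range(h)]


# Synthetic 1x2 "video": pixel 0 moves smoothly, pixel 1 has added jitter.
n = 64
smooth = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
jitter = [s + 0.3 * math.sin(2 * math.pi * 20 * t / n)
          for t, s in enumerate(smooth)]
frames = [[[smooth[t], jitter[t]]] for t in range(n)]
hf_map = energy_map(frames, cutoff=8)
print(hf_map)
```

In this toy case the smooth pixel has essentially zero energy above the cutoff, while the jittery pixel stands out clearly, which is the kind of contrast such a map is meant to visualize.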

Relevance:

50.00%

Publisher:

Abstract:

We introduce a viewpoint-invariant representation of moving object trajectories that can be used in video database applications. It is assumed that trajectories lie on a surface that can be locally approximated with a plane. Raw trajectory data is first locally approximated with a cubic spline via least squares fitting. For each sampled point of the obtained curve, a projective invariant feature is computed using a small number of points in its neighborhood. The resulting sequence of invariant features computed along the entire trajectory forms the view-invariant descriptor of the trajectory itself. Time parametrization is exploited to compute cross ratios without ambiguity due to point ordering. Similarity between descriptors of different trajectories is measured with a distance that takes into account the statistical properties of the cross ratio and its symmetry with respect to the point at infinity. In experiments, an overall correct classification rate of about 95% was obtained on a dataset of 58 trajectories of players in soccer video, and an overall correct classification rate of about 80% was obtained on matching partial segments of trajectories collected from two overlapping views of outdoor scenes with moving people and cars.
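The projective invariance of the cross ratio, on which the descriptor rests, can be checked numerically. The sketch below uses the simplest setting, four collinear points described by 1D coordinates and a 1D projective map; the paper works with points on a planar trajectory, so this is a simplified illustration, and the point values and map coefficients are hypothetical.

```python
from fractions import Fraction


def cross_ratio(a, b, c, d):
    """Cross ratio (a, b; c, d) of four distinct collinear points given by
    their 1D coordinates along the line."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))


def projective_map(x, p, q, r, s):
    """1D projective transformation x -> (p*x + q) / (r*x + s)."""
    return (p * x + q) / (r * x + s)


# Exact rational arithmetic avoids any floating-point doubt in the check.
points = [Fraction(0), Fraction(1), Fraction(2), Fraction(4)]
warped = [projective_map(x, 2, 1, 1, 3) for x in points]
before = cross_ratio(*points)
after = cross_ratio(*warped)
print(before, after)
```

The value is unchanged by the map, but it does depend on the order of the four points, which is why the abstract notes that time parametrization is used to fix the ordering.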

Relevance:

40.00%

Publisher:

Abstract:

We address the problem of face recognition on video by employing the recently proposed probabilistic linear discriminant analysis (PLDA). The PLDA has been shown to be robust against pose and expression in image-based face recognition. In this research, the method is extended and applied to video where image set to image set matching is performed. We investigate two approaches of computing similarities between image sets using the PLDA: the closest pair approach and the holistic sets approach. To better model face appearances in video, we also propose the heteroscedastic version of the PLDA which learns the within-class covariance of each individual separately. Our experiments on the VidTIMIT and Honda datasets show that the combination of the heteroscedastic PLDA and the closest pair approach achieves the best performance.
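The closest pair approach to set-to-set matching can be sketched independently of the scoring model: the similarity of two image sets is taken as the best score over all cross-set image pairs. In the sketch below the pairwise score is a placeholder (negative Euclidean distance between feature vectors), not a PLDA score, and the gallery, probe, and function names are hypothetical.

```python
import math


def similarity(u, v):
    """Placeholder pairwise score: negative Euclidean distance.
    The paper uses a PLDA-based score; this stand-in is an assumption."""
    return -math.dist(u, v)


def closest_pair_score(set_a, set_b, score=similarity):
    """Similarity of two image sets = best score over all cross-set pairs."""
    return max(score(u, v) for u in set_a for v in set_b)


def identify(probe_set, gallery):
    """Return the gallery identity whose image set best matches the probe."""
    return max(gallery,
               key=lambda name: closest_pair_score(probe_set, gallery[name]))


# Hypothetical 2D features for two enrolled identities and one probe video.
gallery = {
    "subject_a": [(0.0, 0.0), (0.2, 0.1)],
    "subject_b": [(5.0, 5.0), (4.8, 5.3)],
}
probe = [(4.9, 5.1), (2.5, 2.5)]
best = identify(probe, gallery)
print(best)
```

Because only the single best pair counts, one well-matched frame is enough to decide the identity, which makes the approach tolerant of poor frames elsewhere in the video.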