750 results for Video Sequences
Abstract:
This Master's thesis addresses the automatic recognition of fish in video sequences. This is an urgent issue for many organizations engaged in fish farming in Finland and Russia, because automating the monitoring and counting of individual species is a turning point for the industry. The difficulties and specific features of the problem were identified in order to find a solution and to propose recommendations for the components of an automated fish recognition system. Background subtraction, Kalman filtering and the Viola-Jones method were implemented for the detection, tracking and estimation of fish parameters. Both the experimental results and the choice of appropriate methods depend strongly on the quality and type of the video used as input data. Practical experiments demonstrated that not all methods produce good results on real data, whereas on synthetic data they operate satisfactorily.
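As an illustration of the tracking step named in this abstract, the following is a minimal sketch of a constant-velocity Kalman filter applied to one image coordinate of a detected object. It is not the thesis implementation: the function name `kalman_track`, the noise variances `q` and `r`, and the initial covariance are assumptions chosen only for the example.

```python
def kalman_track(measurements, dt=1.0, q=1e-3, r=1.0):
    """Constant-velocity Kalman filter for one image coordinate.

    The state is (position, velocity); only the position is measured.
    q and r are illustrative process and measurement noise variances.
    """
    p, v = 0.0, 0.0                      # initial state guess (assumed)
    a, b, c, d = 100.0, 0.0, 0.0, 100.0  # covariance P = [[a, b], [c, d]]
    estimates = []
    for z in measurements:
        # Predict: x = F x, P = F P F^T + Q, with F = [[1, dt], [0, 1]].
        p = p + dt * v
        a, b, c, d = (a + dt * (b + c) + dt * dt * d + q,
                      b + dt * d,
                      c + dt * d,
                      d + q)
        # Update with the measured position z (H = [1, 0]).
        s = a + r                # innovation variance
        k0, k1 = a / s, c / s    # Kalman gain
        y = z - p                # innovation
        p, v = p + k0 * y, v + k1 * y
        a, b, c, d = ((1 - k0) * a, (1 - k0) * b,
                      c - k1 * a, d - k1 * b)
        estimates.append((p, v))
    return estimates


# Hypothetical centroid x-coordinates of an object moving 2 pixels per frame.
centroids = [2.0 * k for k in range(1, 51)]
track = kalman_track(centroids)
```

In a full pipeline the measurements would come from the detection stage (for example, centroids of foreground blobs after background subtraction), and a second filter of the same form could track the vertical coordinate.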
Abstract:
Facial expression recognition is one of the most challenging research areas in the image recognition field and has been actively studied since the 1970s. Smile recognition, for instance, has been studied because the smile is considered an important facial expression in human communication and is therefore likely to be useful for human–machine interaction. Moreover, if a smile can be detected and its intensity estimated, new applications will become possible in the future.
Abstract:
This paper presents a mapping method for wide row crop fields. The resulting map shows the crop rows and the weeds present in the inter-row spacing. Because field videos are acquired with a camera mounted on top of an agricultural vehicle, a method for image sequence stabilization was needed and was consequently designed and developed. The proposed stabilization method uses the centers of some crop rows in the image sequence as features to be tracked, which compensates for the lateral movement (sway) of the camera and leaves the pitch unchanged. A region of interest is selected using the tracked features, and an inverse perspective technique transforms the selected region into a bird’s-eye view that is centered on the image and that enables map generation. The algorithm has been tested on several video sequences of different fields recorded at different times and under different lighting conditions, with good initial results. Indeed, lateral displacements of up to 66% of the inter-row spacing were suppressed through the stabilization process, and crop rows in the resulting maps appear straight.
Abstract:
Assessing video quality is a complex task. While most pixel-based metrics do not show enough correlation between objective and subjective results, algorithms need to correspond to human perception when analyzing quality in a video sequence. To analyze the perceived quality resulting from specific video artifacts in a given region of interest, we present a novel methodology for generating test sequences that allow the impact of each individual distortion to be analyzed. From the results of subjective assessment it is possible to create psychovisual models based on weighting pixels that belong to different regions of interest, distributed by color, position, motion or content. The results obtained in subjective assessment demonstrate the need for new metrics adapted to the human visual system.
Abstract:
One of the most efficient approaches to generating the side information (SI) in distributed video codecs is motion compensated frame interpolation, where the current frame is estimated based on past and future reference frames. However, this approach leads to significant spatial and temporal variations in the correlation noise between the source at the encoder and the SI at the decoder. In such a scenario, it would be useful to design an architecture where the SI can be generated more robustly at the block level, avoiding the creation of SI frame regions with lower correlation, which are largely responsible for some coding efficiency losses. In this paper, a flexible framework to generate SI at the block level in two modes is presented: the first mode corresponds to a motion compensated interpolation (MCI) technique, while the second corresponds to a motion compensated quality enhancement (MCQE) technique, where a low quality Intra block sent by the encoder is used to generate the SI by performing motion estimation with the help of the reference frames. For blocks where MCI produces SI with lower correlation, the novel MCQE mode can be advantageous overall from the rate-distortion point of view, even if some rate has to be invested in the low quality Intra coding blocks. The overall solution is evaluated in terms of RD performance, with improvements of up to 2 dB, especially for high motion video sequences and long Group of Pictures (GOP) sizes.
Abstract:
INTRODUCTION: Video records are widely used to analyze performance in alpine skiing at professional or amateur level. Parts of these analyses require the labeling of some movements (i.e. determining when specific events occur). Although differences among coaches, and for the same coach between different dates, are expected, they have never been quantified. Moreover, knowing these differences is essential to determine which parameters can be used reliably. This study aimed to quantify the precision and the repeatability of labeling by alpine skiing coaches of various levels, as is done in other fields (Koo et al, 2005). METHODS: Software similar to commercial products was designed to allow video analyses. 15 coaches divided into 3 groups (5 amateur coaches (G1), 5 professional instructors (G2) and 5 semi-professional coaches (G3)) were enrolled. They were asked to label 15 timing parameters (TP) according to the Swiss ski manual (Terribilini et al, 2001) for each curve. TP included phases (initiation, steering I-II) and body and ski movements (e.g. rotation, weighting, extension, balance). Three video sequences sampled at 25 Hz were used and one curve per video was labeled. The first video was used to familiarize the analyzer with the software. The two other videos, corresponding to slalom and giant slalom, were considered for the analysis. G1 performed the analysis twice (A1 and A2) at different dates, and TP were randomized between both analyses. Reference TP were taken as the median of G2 and G3 at A1. The precision was defined as the RMS difference between individual TP and reference TP, whereas the repeatability was calculated as the RMS difference between individual TP at A1 and at A2. RESULTS AND DISCUSSION: Precisions of +/-5.6, +/-3.0 and +/-2.0 frames were obtained for G1, G2 and G3, respectively. These results, which show that G2 was more precise than G1 and G3 more precise than G2, are in accordance with group levels. The repeatability for G1 was +/-3.1 frames. Furthermore, differences in precision among TP were observed for G2 and G3, with largest differences of +/-5.9 frames for "body counter rotation movement in steering phase II" and of 0.8 frame for "ski unweighting in initiation phase". CONCLUSION: This study quantified the ability of coaches to label video in terms of precision and repeatability. The best precision was obtained for G3 and was +/-0.08s, which corresponds to +/-6.5% of the curve cycle. Regarding the repeatability, we obtained a result of +/-0.12s for G1, corresponding to +/-12% of the curve cycle. The repeatability of G2 and G3 is expected to be lower than the precision of G1, and the corresponding repeatability will be assessed soon. In conclusion, our results indicate that the labeling of video records is reliable for some TP, whereas caution is required for others. REFERENCES Koo S, Gold MD, Andriacchi TP. (2005). Osteoarthritis, 13, 782-789. Terribilini M, et al. (2001). Swiss Ski manual, 29-46. IASS, Lucerne.
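The precision and repeatability measures defined in this abstract can be sketched as follows. The function names and the label values are hypothetical; only the definitions (median reference, RMS difference) and the 25 Hz sampling rate come from the abstract. At 25 Hz, 2.0 frames correspond to 0.08 s and 3.1 frames to about 0.12 s, which matches the reported figures.

```python
import math
import statistics

FPS = 25.0  # sampling rate of the video sequences stated in the abstract


def reference_timing(expert_labels):
    """Median label (in frames) per timing parameter across expert coaches."""
    return [statistics.median(per_param) for per_param in zip(*expert_labels)]


def rms_difference(labels, reference):
    """RMS difference, in frames, between two equally long label lists."""
    squared = [(a - b) ** 2 for a, b in zip(labels, reference)]
    return math.sqrt(sum(squared) / len(squared))


def frames_to_seconds(frames, fps=FPS):
    """Convert a duration in frames to seconds."""
    return frames / fps


# Hypothetical frame indices for three timing parameters (not study data).
experts = [[10, 42, 80], [12, 40, 81], [11, 41, 79]]
reference = reference_timing(experts)
precision = rms_difference([14, 37, 80], reference)
```

Repeatability uses the same `rms_difference` function, applied to one coach's labels from the first and second analysis instead of to the reference.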
Abstract:
This paper presents a parallel Two-Pass Hexagonal (TPA) algorithm, consisting of the Linear Hashtable Motion Estimation Algorithm (LHMEA) and Hexagonal Search (HEXBS), for motion estimation. In the TPA, Motion Vectors (MV) are generated by the first-pass LHMEA and are used as predictors for the second-pass HEXBS motion estimation, which searches only a small number of Macroblocks (MBs). We introduce the hashtable into video processing and complete a parallel implementation. We propose and evaluate parallel implementations of the LHMEA of TPA on clusters of workstations for real-time video compression. The paper discusses how parallel video coding on load-balanced multiprocessor systems can help, especially for motion estimation, and how load balancing improves performance. The performance of the algorithm is evaluated using standard video sequences, and the results are compared to current algorithms.
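The second pass described in this abstract can be illustrated with a generic hexagon-based search that starts from a predictor. This is a sketch under stated assumptions, not the paper's implementation: LHMEA is not reproduced, so the `predictor` argument merely stands in for a first-pass motion vector, and the names `sad_cost`, `hexagon_search` and `max_steps` are hypothetical.

```python
import math

LARGE_HEXAGON = [(2, 0), (-2, 0), (1, 2), (-1, 2), (1, -2), (-1, -2)]
SMALL_PATTERN = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def sad_cost(cur, ref, bx, by, size):
    """Return a function mapping a displacement (dx, dy) to the sum of
    absolute differences between the block at (bx, by) in `cur` and the
    displaced block in `ref`."""
    height, width = len(ref), len(ref[0])

    def cost(dx, dy):
        x0, y0 = bx + dx, by + dy
        if x0 < 0 or y0 < 0 or x0 + size > width or y0 + size > height:
            return math.inf  # candidate block falls outside the frame
        return sum(abs(cur[by + j][bx + i] - ref[y0 + j][x0 + i])
                   for j in range(size) for i in range(size))

    return cost


def hexagon_search(cost, predictor=(0, 0), max_steps=32):
    """Refine `predictor` with a large-hexagon descent, then a small pattern."""
    best, best_cost = predictor, cost(*predictor)
    for _ in range(max_steps):
        candidates = [(best[0] + dx, best[1] + dy) for dx, dy in LARGE_HEXAGON]
        winner = min(candidates, key=lambda mv: cost(*mv))
        winner_cost = cost(*winner)
        if winner_cost >= best_cost:
            break  # the centre is the minimum: switch to the small pattern
        best, best_cost = winner, winner_cost
    centre = best
    for dx, dy in SMALL_PATTERN:
        mv = (centre[0] + dx, centre[1] + dy)
        mv_cost = cost(*mv)
        if mv_cost < best_cost:
            best, best_cost = mv, mv_cost
    return best, best_cost


# Hypothetical 8x8 frame compared with itself: the motion vector is zero.
frame = [[(3 * x + 5 * y) % 17 for x in range(8)] for y in range(8)]
mv, sad = hexagon_search(sad_cost(frame, frame, bx=2, by=2, size=4))
```

A good predictor shortens the descent, because the large hexagon then needs few steps before the centre becomes the minimum. Like any pattern search, this can stop in a local minimum on real video content.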
Abstract:
This paper presents an improved parallel Two-Pass Hexagonal (TPA) algorithm, consisting of the Linear Hashtable Motion Estimation Algorithm (LHMEA) and Hexagonal Search (HEXBS), for motion estimation. Motion Vectors (MV) are generated by the first-pass LHMEA and used as predictors for the second-pass HEXBS motion estimation, which searches only a small number of Macroblocks (MBs). We introduced the hashtable into video processing and completed a parallel implementation. The hashtable structure of LHMEA is improved compared to the original TPA and LHMEA. We propose and evaluate parallel implementations of the LHMEA of TPA on clusters of workstations for real-time video compression. The implementation includes spatial and temporal approaches. The performance of the algorithm is evaluated using standard video sequences, and the results are compared to current algorithms.
Abstract:
This paper presents an empirical study of affine invariant feature detectors for matching on video sequences of people with non-rigid surface deformation. Recent advances in feature detection and wide baseline matching have focused on static scenes. Video frames of human movement capture highly non-rigid deformation such as loose hair, cloth creases, skin stretching and free-flowing clothing. This study evaluates the performance of six widely used feature detectors for sparse temporal correspondence on single view and multiple view video sequences. Quantitative evaluation is performed of both the number of features detected and their temporal matching, with and without ground truth correspondence. Recall-accuracy analysis of feature matching is reported for temporal correspondence on single view and multiple view sequences of people with variation in clothing and movement. This analysis shows that existing feature detection and matching algorithms are unreliable for fast movement with common clothing.
Abstract:
This paper introduces a database of freely available stereo-3D content designed to facilitate research in stereo post-production. It describes the structure and content of the database and provides some details about how the material was gathered. The database includes examples of many of the scenarios characteristic of broadcast footage. Material was gathered at different locations, including a studio with controlled lighting and both indoor and outdoor on-location sites with more restricted lighting control. The database also includes video sequences with accompanying 3D audio data recorded in an Ambisonics format. An intended consequence of gathering the material is that the database contains examples of degradations that would be commonly present in real-world scenarios. This paper describes one such artefact, caused by uneven exposure in the stereo views, which leads to saturation in the over-exposed view. An algorithm for the restoration of this artefact is proposed in order to highlight the usefulness of the database.
Abstract:
The development and evaluation of new algorithms and protocols for Wireless Multimedia Sensor Networks (WMSNs) are usually supported by a discrete event network simulator, of which OMNeT++ is one of the most important. However, experiments involving multimedia transmission and video flows with different characteristics, genres, group of pictures lengths, and coding techniques must also be evaluated based on Quality of Experience (QoE) metrics to reflect the user's perception. Such experiments require the evaluation of video-related information, i.e., frame type, received/lost, delay, jitter, decoding errors, as well as inter- and intra-frame dependency of received/distorted videos. However, existing OMNeT++ frameworks for WMSNs support neither video transmissions with QoE-awareness nor a large set of mobility traces to enable evaluations under different multimedia/mobile situations. In this paper, we propose a Mobile MultiMedia Wireless Sensor Network OMNeT++ framework (M3WSN) to support transmission, control and evaluation of real video sequences in mobile WMSNs.
Abstract:
A reliable and robust routing service for Flying Ad-Hoc Networks (FANETs) must be able to adapt to topology changes. User experience when watching live video sequences must also be satisfactory, even in scenarios with buffer overflow and a high packet loss ratio. In this paper, we introduce a Cross-layer Link quality and Geographical-aware beaconless opportunistic routing protocol (XLinGO). It enhances the transmission of multiple simultaneous video flows over FANETs by creating and keeping reliable persistent multi-hop routes. XLinGO considers a set of cross-layer and human-related information for routing decisions, such as performance metrics and Quality of Experience (QoE). Performance evaluation shows that XLinGO achieves multimedia dissemination with QoE support and robustness in multi-hop, multi-flow, and mobile network environments.
Abstract:
In Video over IP services, perceived video quality depends heavily on parameters such as video coding and network Quality of Service. This paper proposes a model for the estimation of perceived video quality in video streaming and broadcasting services that combines the aforementioned parameters with others that depend mainly on the information contents of the video sequences. These fitting parameters are derived from the Spatial and Temporal Information contents of the sequences. The model does not require reference to the original video sequence, so it can be used for online, real-time monitoring of perceived video quality in Video over IP services. Furthermore, this paper proposes a measurement workbench designed to acquire both training data for model fitting and test data for model validation. Preliminary results show good correlation between measured and predicted values.
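The Spatial and Temporal Information measures mentioned in this abstract are commonly computed as defined in ITU-T Recommendation P.910. The abstract does not state which definition the authors used, so the sketch below assumes the P.910 form: SI is the maximum over time of the spatial standard deviation of the Sobel-filtered frame, and TI is the maximum over time of the spatial standard deviation of successive frame differences. Frames are assumed to be 2-D lists of luma values of at least 3x3 pixels, and the function names are hypothetical.

```python
import math
import statistics


def sobel_magnitude(frame):
    """Sobel gradient magnitude at every interior pixel of a 2-D luma frame."""
    mags = []
    for y in range(1, len(frame) - 1):
        for x in range(1, len(frame[0]) - 1):
            gx = (frame[y - 1][x + 1] + 2 * frame[y][x + 1] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y][x - 1] - frame[y + 1][x - 1])
            gy = (frame[y + 1][x - 1] + 2 * frame[y + 1][x] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y - 1][x] - frame[y - 1][x + 1])
            mags.append(math.hypot(gx, gy))
    return mags


def spatial_information(frames):
    """SI: maximum over time of the spatial std. dev. of the Sobel magnitude."""
    return max(statistics.pstdev(sobel_magnitude(f)) for f in frames)


def temporal_information(frames):
    """TI: maximum over time of the spatial std. dev. of frame differences."""
    stds = [statistics.pstdev([b - a
                               for row_a, row_b in zip(prev, cur)
                               for a, b in zip(row_a, row_b)])
            for prev, cur in zip(frames, frames[1:])]
    return max(stds, default=0.0)  # a single frame has no temporal change


# Hypothetical 4x4 clip: a flat frame followed by a frame with a vertical edge.
flat = [[0] * 4 for _ in range(4)]
split = [[0, 0, 10, 10] for _ in range(4)]
ti = temporal_information([flat, split])
```

Both values need only the received sequence, which is consistent with the abstract's statement that the model does not require the original video as a reference.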
Abstract:
The paper proposes a model for estimation of perceived video quality in IPTV, taking as input both video coding and network Quality of Service parameters. It includes some fitting parameters that depend mainly on the information contents of the video sequences. A method to derive them from the Spatial and Temporal Information contents of the sequences is proposed. The model may be used for near real-time monitoring of IPTV video quality.