786 resultados para Video cameras
Resumo:
This paper presents an approach for the automatic calibration of low-cost cameras which are assumed to be restricted in their freedom of movement to either pan or tilt movements. Camera parameters, including focal length, principal point, lens distortion parameter and the angle and axis of rotation, can be recovered from a minimum set of two images of the camera, provided that the axis of rotation between the two images goes through the camera’s optical center and is parallel to either the vertical (panning) or horizontal (tilting) axis of the image. Previous methods for auto-calibration of cameras based on pure rotations fail to work in these two degenerate cases. In addition, our approach includes a modified RANdom SAmple Consensus (RANSAC) algorithm, as well as improved integration of the radial distortion coefficient in the computation of inter-image homographies. We show that these modifications are able to increase the overall efficiency, reliability and accuracy of the homography computation and calibration procedure using both synthetic and real image sequences
Resumo:
Audio-visualspeechrecognition, or the combination of visual lip-reading with traditional acoustic speechrecognition, has been previously shown to provide a considerable improvement over acoustic-only approaches in noisy environments, such as that present in an automotive cabin. The research presented in this paper will extend upon the established audio-visualspeechrecognition literature to show that further improvements in speechrecognition accuracy can be obtained when multiple frontal or near-frontal views of a speaker's face are available. A series of visualspeechrecognition experiments using a four-stream visual synchronous hidden Markov model (SHMM) are conducted on the four-camera AVICAR automotiveaudio-visualspeech database. We study the relative contribution between the side and central orientated cameras in improving visualspeechrecognition accuracy. Finally combination of the four visual streams with a single audio stream in a five-stream SHMM demonstrates a relative improvement of over 56% in word recognition accuracy when compared to the acoustic-only approach in the noisiest conditions of the AVICAR database.
Resumo:
Facial expression is an important channel of human social communication. Facial expression recognition (FER) aims to perceive and understand emotional states of humans based on information in the face. Building robust and high performance FER systems that can work in real-world video is still a challenging task, due to the various unpredictable facial variations and complicated exterior environmental conditions, as well as the difficulty of choosing a suitable type of feature descriptor for extracting discriminative facial information. Facial variations caused by factors such as pose, age, gender, race and occlusion, can exert profound influence on the robustness, while a suitable feature descriptor largely determines the performance. Most present attention on FER has been paid to addressing variations in pose and illumination. No approach has been reported on handling face localization errors and relatively few on overcoming facial occlusions, although the significant impact of these two variations on the performance has been proved and highlighted in many previous studies. Many texture and geometric features have been previously proposed for FER. However, few comparison studies have been conducted to explore the performance differences between different features and examine the performance improvement arisen from fusion of texture and geometry, especially on data with spontaneous emotions. The majority of existing approaches are evaluated on databases with posed or induced facial expressions collected in laboratory environments, whereas little attention has been paid on recognizing naturalistic facial expressions on real-world data. This thesis investigates techniques for building robust and high performance FER systems based on a number of established feature sets. It comprises of contributions towards three main objectives: (1) Robustness to face localization errors and facial occlusions. An approach is proposed to handle face localization errors and facial occlusions using Gabor based templates. Template extraction algorithms are designed to collect a pool of local template features and template matching is then performed to covert these templates into distances, which are robust to localization errors and occlusions. (2) Improvement of performance through feature comparison, selection and fusion. A comparative framework is presented to compare the performance between different features and different feature selection algorithms, and examine the performance improvement arising from fusion of texture and geometry. The framework is evaluated for both discrete and dimensional expression recognition on spontaneous data. (3) Evaluation of performance in the context of real-world applications. A system is selected and applied into discriminating posed versus spontaneous expressions and recognizing naturalistic facial expressions. A database is collected from real-world recordings and is used to explore feature differences between standard database images and real-world images, as well as between real-world images and real-world video frames. The performance evaluations are based on the JAFFE, CK, Feedtum, NVIE, Semaine and self-collected QUT databases. The results demonstrate high robustness of the proposed approach to the simulated localization errors and occlusions. Texture and geometry have different contributions to the performance of discrete and dimensional expression recognition, as well as posed versus spontaneous emotion discrimination. These investigations provide useful insights into enhancing robustness and achieving high performance of FER systems, and putting them into real-world applications.
Resumo:
From a law enforcement standpoint, the ability to search for a person matching a semantic description (i.e. 1.8m tall, red shirt, jeans) is highly desirable. While a significant research effort has focused on person re-detection (the task of identifying a previously observed individual in surveillance video), these techniques require descriptors to be built from existing image or video observations. As such, person re-detection techniques are not suited to situations where footage of the person of interest is not readily available, such as a witness reporting a recent crime. In this paper, we present a novel framework that is able to search for a person based on a semantic description. The proposed approach uses size and colour cues, and does not require a person detection routine to locate people in the scene, improving utility in crowded conditions. The proposed approach is demonstrated with a new database that will be made available to the research community, and we show that the proposed technique is able to correctly localise a person in a video based on a simple semantic description.
Resumo:
Thermal-infrared imagery is relatively robust to many of the failure conditions of visual and laser-based SLAM systems, such as fog, dust and smoke. The ability to use thermal-infrared video for localization is therefore highly appealing for many applications. However, operating in thermal-infrared is beyond the capacity of existing SLAM implementations. This paper presents the first known monocular SLAM system designed and tested for hand-held use in the thermal-infrared modality. The implementation includes a flexible feature detection layer able to achieve robust feature tracking in high-noise, low-texture thermal images. A novel approach for structure initialization is also presented. The system is robust to irregular motion and capable of handling the unique mechanical shutter interruptions common to thermal-infrared cameras. The evaluation demonstrates promising performance of the algorithm in several environments.
Resumo:
In this paper a real-time vision based power line extraction solution is investigated for active UAV guidance. The line extraction algorithm starts from ridge points detected by steerable filters. A collinear line segments fitting algorithm is followed up by considering global and local information together with multiple collinear measurements. GPU boosted algorithm implementation is also investigated in the experiment. The experimental result shows that the proposed algorithm outperforms two baseline line detection algorithms and is able to fitting long collinear line segments. The low computational cost of the algorithm make suitable for real-time applications.
Resumo:
Quality based frame selection is a crucial task in video face recognition, to both improve the recognition rate and to reduce the computational cost. In this paper we present a framework that uses a variety of cues (face symmetry, sharpness, contrast, closeness of mouth, brightness and openness of the eye) to select the highest quality facial images available in a video sequence for recognition. Normalized feature scores are fused using a neural network and frames with high quality scores are used in a Local Gabor Binary Pattern Histogram Sequence based face recognition system. Experiments on the Honda/UCSD database shows that the proposed method selects the best quality face images in the video sequence, resulting in improved recognition performance.
Resumo:
Studies dedicated to understanding the relationship between gaming and mental health, have traditionally focused on the effects of depression, anxiety, obsessive usage, aggression, obesity, and faltering ‘real life’ relationships. The complexity of game genre and personality aside, this review aims to define a space for a positive relationship between videogame play and wellbeing by applying current videogame research to the criteria that defines the wellbeing construct ‘flourishing’. Self- determination theory (SDT), and flow provide context, and areas of overlap are explored.
Resumo:
While a rich body of literature in television and film studies and media policy studies has tended to focus on the media activities in the formal sector, we know much less about informal media activities, its influence on state policies, as well as the dynamics between the formal and the informal sectors. This article examines these issues with reference to a particularly revealing period following a large-scale government crackdown on peer-to-peer video sharing sites in China in 2008. By analyzing the aim and consequences of the state action, I point to the counter-productive effect in terms of cultural loss and the resurgence of offline piracy; and show the positive impact on forcing the informal into the formal sector, and pressuring the formal to innovate. Meanwhile, an increasing rapprochement between professional and user-created content leads to a new relationship between formal and informal sectors. This case demonstrates the importance of considering the dynamics between the two sectors. It also offers compelling evidence of the role of the informal sector in engendering state action, which in turn impacted on the co-evolution of the formal and the informal sectors.
Resumo:
This proposal combines ethnographic techniques and discourse studies to investigating a collective of people engaged with audiovisual productions who collaborate in Curta Favela’s workshops in Rio de Janeiro’s favelas. ‘Favela’ is often translated simply as ‘slum’ or ‘shantytown’, but these terms connote negative characteristics such as shortage, poverty, and deprivation referring to favelas which end up stigmatizing these low income suburbs. Curta Favela (Favela Shorts) is an independent project which all participants join to use photography and participatory audiovisual production as a tool for social change and raising consciousness. As cameras are not affordable for favelas dwellers, Curta Favela’s volunteers teach favela residents how they can use their mobile phones and compact cameras to take pictures and make movies, and afterwards, how they can edit the data using free editing video software programs and publish it on the Internet. To record audio, they use their mp3 or mobile phones. The main aim of this study is to shed light not only on how this project operates, but also to highlight how collective intelligence can be used as a way of fighting against the lack of basic resources.
Resumo:
This proposal combines ethnographic techniques and discourse studies to investigate a collective of people engaged with audiovisual productions who collaborate in Curta Favela’s workshops in Rio de Janeiro’s favelas. ‘Favela’ is often translated simply as ‘slum’ or ‘shantytown’, but these terms connote negative characteristics such as shortage, poverty, and deprivation which end up stigmatizing these low income suburbs. Curta Favela (Favela Shorts) is an independent project in which all participants join to use photography and participatory audiovisual production as tools for social change and to raise consciousness. As cameras are not affordable for favela dwellers, Curta Favela’s volunteers teach favela residents how they can use their mobile phones and compact cameras to take pictures and make movies, and afterwards, how they can edit the data using free editing video software programs and publish it on the Internet. To record audio, they use their mp3 or mobile phones. The main aim of this study is to shed light not only on how this project operates, but also to highlight how collective intelligence can be used as a way of fighting against a lack of basic resources.
Resumo:
Video-based training combined with flotation tank recovery may provide an additional stimulus for improving shooting in basketball. A pre-post controlled trial was conducted to assess the effectiveness of a 3 wk intervention combining video-based training and flotation tank recovery on three-point shooting performance in elite female basketball players. Players were assigned to an experimental (n=10) and control group (n=9). A 3 wk intervention consisted of 2 x 30 min float sessions a week which included 10 min of video-based training footage, followed by a 3 wk retention phase. A total of 100 three-point shots were taken from 5 designated positions on the court at each week to assess three-point shooting performance. There was no clear difference in the mean change in the number of successful three-point shots between the groups (-3%; ±18%, mean; ±90% confidence limits). Video-based training combined with flotation recovery had little effect on three-point shooting performance.
Resumo:
To recognize faces in video, face appearances have been widely modeled as piece-wise local linear models which linearly approximate the smooth yet non-linear low dimensional face appearance manifolds. The choice of representations of the local models is crucial. Most of the existing methods learn each local model individually meaning that they only anticipate variations within each class. In this work, we propose to represent local models as Gaussian distributions which are learned simultaneously using the heteroscedastic probabilistic linear discriminant analysis (PLDA). Each gallery video is therefore represented as a collection of such distributions. With the PLDA, not only the within-class variations are estimated during the training, the separability between classes is also maximized leading to an improved discrimination. The heteroscedastic PLDA itself is adapted from the standard PLDA to approximate face appearance manifolds more accurately. Instead of assuming a single global within-class covariance, the heteroscedastic PLDA learns different within-class covariances specific to each local model. In the recognition phase, a probe video is matched against gallery samples through the fusion of point-to-model distances. Experiments on the Honda and MoBo datasets have shown the merit of the proposed method which achieves better performance than the state-of-the-art technique.
Resumo:
The increasing demand for mobile video has attracted much attention from both industry and researchers. To satisfy users and to facilitate the usage of mobile video, providing optimal quality to the users is necessary. As a result, quality of experience (QoE) becomes an important focus in measuring the overall quality perceived by the end-users, from the aspects of both objective system performance and subjective experience. However, due to the complexity of user experience and diversity of resources (such as videos, networks and mobile devices), it is still challenging to develop QoE models for mobile video that can represent how user-perceived value varies with changing conditions. Previous QoE modelling research has two main limitations: aspects influencing QoE are insufficiently considered; and acceptability as the user value is seldom studied. Focusing on the QoE modelling issues, two aims are defined in this thesis: (i) investigating the key influencing factors of mobile video QoE; and (ii) establishing QoE prediction models based on the relationships between user acceptability and the influencing factors, in order to help provide optimal mobile video quality. To achieve the first goal, a comprehensive user study was conducted. It investigated the main impacts on user acceptance: video encoding parameters such as quantization parameter, spatial resolution, frame rate, and encoding bitrate; video content type; mobile device display resolution; and user profiles including gender, preference for video content, and prior viewing experience. Results from both quantitative and qualitative analysis revealed the significance of these factors, as well as how and why they influenced user acceptance of mobile video quality. Based on the results of the user study, statistical techniques were used to generate a set of QoE models that predict the subjective acceptability of mobile video quality by using a group of the measurable influencing factors, including encoding parameters and bitrate, content type, and mobile device display resolution. Applying the proposed QoE models into a mobile video delivery system, optimal decisions can be made for determining proper video coding parameters and for delivering most suitable quality to users. This would lead to consistent user experience on different mobile video content and efficient resource allocation. The findings in this research enhance the understanding of user experience in the field of mobile video, which will benefit mobile video design and research. This thesis presents a way of modelling QoE by emphasising user acceptability of mobile video quality, which provides a strong connection between technical parameters and user-desired quality. Managing QoE based on acceptability promises the potential for adapting to the resource limitations and achieving an optimal QoE in the provision of mobile video content.