56 resultados para Face recognition from video
Resumo:
In this paper we present a new event recognition framework, based on the Dempster-Shafer theory of evidence, which combines the evidence from multiple atomic events detected by low-level computer vision analytics. The proposed framework employs evidential network modelling of composite events. This approach can effectively handle the uncertainty of the detected events, whilst inferring high-level events that have semantic meaning with high degrees of belief. Our scheme has been comprehensively evaluated against various scenarios that simulate passenger behaviour on public transport platforms such as buses and trains. The average accuracy rate of our method is 81% in comparison to 76% by a standard rule-based method.
Resumo:
CCTV systems are broadly deployed in the present world. To ensure
in-time reaction for intelligent surveillance, it is a fundamental task for real-world
applications to determine the gender of people of interest. However, normal video
algorithms for gender profiling (usually face profiling) have three drawbacks.
First, the profiling result is always uncertain. Second, for a time-lasting gender
profiling algorithm, the result is not stable. The degree of certainty usually varies, sometimes even to the extent that a male is classified as a female, and vice versa. Third, for a robust profiling result in cases were a person’s face is not visible, other features, such as body shape, are required. These algorithms may provide different recognition results - at the very least, they will provide different degrees of certainties. To overcome these problems, in this paper, we introduce an evidential approach that makes use of profiling results from multiple algorithms over a period of time. Experiments show that this approach does provide better results than single profiling results and classic fusion results.
Resumo:
This paper presents a novel method of audio-visual fusion for person identification where both the speech and facial modalities may be corrupted, and there is a lack of prior knowledge about the corruption. Furthermore, we assume there is a limited amount of training data for each modality (e.g., a short training speech segment and a single training facial image for each person). A new representation and a modified cosine similarity are introduced for combining and comparing bimodal features with limited training data as well as vastly differing data rates and feature sizes. Optimal feature selection and multicondition training are used to reduce the mismatch between training and testing, thereby making the system robust to unknown bimodal corruption. Experiments have been carried out on a bimodal data set created from the SPIDRE and AR databases with variable noise corruption of speech and occlusion in the face images. The new method has demonstrated improved recognition accuracy.
Resumo:
Introduction
The use of video capture of lectures in Higher Education is not a recent occurrence with web based learning technologies including digital recording of live lectures becoming increasing commonly offered by universities throughout the world (Holliman and Scanlon, 2004). However in the past decade the increase in technical infrastructural provision including the availability of high speed broadband has increased the potential and use of videoed lecture capture. This had led to a variety of lecture capture formats including pod casting, live streaming or delayed broadcasting of whole or part of lectures.
Additionally in the past five years there has been a significant increase in the popularity of online learning, specifically via Massive Open Online Courses (MOOCs) (Vardi, 2014). One of the key aspects of MOOCs is the simulated recording of lecture like activities. There has been and continues to be much debate on the consequences of the popularity of MOOCs, especially in relation to its potential uses within established University programmes.
There have been a number of studies dedicated to the effects of videoing lectures.
The clustered areas of research in video lecture capture have the following main themes:
• Staff perceptions including attendance, performance of students and staff workload
• Reinforcement versus replacement of lectures
• Improved flexibility of learning
• Facilitating engaging and effective learning experiences
• Student usage, perception and satisfaction
• Facilitating students learning at their own pace
Most of the body of the research has concentrated on student and faculty perceptions, including academic achievement, student attendance and engagement (Johnston et al, 2012).
Generally the research has been positive in review of the benefits of lecture capture for both students and faculty. This perception coupled with technical infrastructure improvements and student demand may well mean that the use of video lecture capture will continue to increase in frequency in the next number of years in tertiary education. However there is a relatively limited amount of research in the effects of lecture capture specifically in the area of computer programming with Watkins 2007 being one of few studies . Video delivery of programming solutions is particularly useful for enabling a lecturer to illustrate the complex decision making processes and iterative nature of the actual code development process (Watkins et al 2007). As such research in this area would appear to be particularly appropriate to help inform debate and future decisions made by policy makers.
Research questions and objectives
The purpose of the research was to investigate how a series of lecture captures (in which the audio of lectures and video of on-screen projected content were recorded) impacted on the delivery and learning of a programme of study in an MSc Software Development course in Queen’s University, Belfast, Northern Ireland. The MSc is conversion programme, intended to take graduates from non-computing primary degrees and upskill them in this area. The research specifically targeted the Java programming module within the course. It also analyses and reports on the empirical data from attendances and various video viewing statistics. In addition, qualitative data was collected from staff and student feedback to help contextualise the quantitative results.
Methodology, Methods and Research Instruments Used
The study was conducted with a cohort of 85 post graduate students taking a compulsory module in Java programming in the first semester of a one year MSc in Software Development. A pre-course survey of students found that 58% preferred to have available videos of “key moments” of lectures rather than whole lectures. A large scale study carried out by Guo concluded that “shorter videos are much more engaging” (Guo 2013). Of concern was the potential for low audience retention for videos of whole lectures.
The lecturers recorded snippets of the lecture directly before or after the actual physical delivery of the lecture, in a quiet environment and then upload the video directly to a closed YouTube channel. These snippets generally concentrated on significant parts of the theory followed by theory related coding demonstration activities and were faithful in replication of the face to face lecture. Generally each lecture was supported by two to three videos of durations ranging from 20 – 30 minutes.
Attendance
The MSc programme has several attendance based modules of which Java Programming was one element. In order to assess the consequence on attendance for the Programming module a control was established. The control used was a Database module which is taken by the same students and runs in the same semester.
Access engagement
The videos were hosted on a closed YouTube channel made available only to the students in the class. The channel had enabled analytics which reported on the following areas for all and for each individual video; views (hits), audience retention, viewing devices / operating systems used and minutes watched.
Student attitudes
Three surveys were taken in regard to investigating student attitudes towards the videoing of lectures. The first was before the start of the programming module, then at the mid-point and subsequently after the programme was complete.
The questions in the first survey were targeted at eliciting student attitudes towards lecture capture before they had experienced it in the programme. The midpoint survey gathered data in relation to how the students were individually using the system up to that point. This included feedback on how many videos an individual had watched, viewing duration, primary reasons for watching and the result on attendance, in addition to probing for comments or suggestions. The final survey on course completion contained questions similar to the midpoint survey but in summative view of the whole video programme.
Conclusions and Outcomes
The study confirmed findings of other such investigations illustrating that there is little or no effect on attendance at lectures. The use of the videos appears to help promote continual learning but they are particularly accessed by students at assessment periods. Students respond positively to the ability to access lectures digitally, as a means of reinforcing learning experiences rather than replacing them. Feedback from students was overwhelmingly positive indicating that the videos benefited their learning. Also there are significant benefits to part recording of lectures rather than recording whole lectures. The behaviour viewing trends analytics suggest that despite the increase in the popularity of online learning via MOOCs and the promotion of video learning on mobile devices in fact in this study the vast majority of students accessed the online videos at home on laptops or desktops However, in part, this is likely due to the nature of the taught subject, that being programming.
The research involved prerecording the lecture in smaller timed units and then uploading for distribution to counteract existing quality issues with recording entire live lectures. However the advancement and consequential improvement in quality of in situ lecture capture equipment may well help negate the need to record elsewhere. The research has also highlighted an area of potentially very significant use for performance analysis and improvement that could have major implications for the quality of teaching. A study of the analytics of the viewings of the videos could well provide a quick response formative feedback mechanism for the lecturer. If a videoed lecture either recorded live or later is a true reflection of the face to face lecture an analysis of the viewing patterns for the video may well reveal trends that correspond with the live delivery.
Resumo:
Do patterns in the YouTube viewing analytics of Lecture Capture videos point to areas of potential teaching and learning performance enhancement? The goal of this action based research project was to capture and quantitatively analyse the viewing behaviours and patterns of a series of video lecture captures across several computing modules in Queen’s University, Belfast, Northern Ireland. The research sought to establish if a quantitative analysis of viewing behaviours coupled with a qualitative evaluation of the material provided from the students could be correlated to provide generalised patterns that could then be used to understand the learning experience of students during face to face lectures and, thereby, present opportunities to reflectively enhance lecturer performance and the students’ overall learning experience and, ultimately, their level of academic attainment.
Resumo:
CCTV (Closed-Circuit TeleVision) systems are broadly deployed in the present world. To ensure in-time reaction for intelligent surveillance, it is a fundamental task for real-world applications to determine the gender of people of interest. However, normal video algorithms for gender profiling (usually face profiling) have three drawbacks. First, the profiling result is always uncertain. Second, the profiling result is not stable. The degree of certainty usually varies over time, sometimes even to the extent that a male is classified as a female, and vice versa. Third, for a robust profiling result in cases that a person’s face is not visible, other features, such as body shape, are required. These algorithms may provide different recognition results - at the very least, they will provide different degrees of certainties. To overcome these problems, in this paper, we introduce an Dempster-Shafer (DS) evidential approach that makes use of profiling results from multiple algorithms over a period of time, in particular, Denoeux’s cautious rule is applied for fusing mass functions through time lines. Experiments show that this approach does provide better results than single profiling results and classic fusion results. Furthermore, it is found that if severe mis-classification has occurred at the beginning of the time line, the combination can yield undesirable results. To remedy this weakness, we further propose three extensions to the evidential approach proposed above incorporating notions of time-window, time-attenuation, and time-discounting, respectively. These extensions also applies Denoeux’s rule along with time lines and take the DS approach as a special case. Experiments show that these three extensions do provide better results than their predecessor when mis-classifications occur.
Resumo:
In this paper we propose a novel recurrent neural networkarchitecture for video-based person re-identification.Given the video sequence of a person, features are extracted from each frame using a convolutional neural network that incorporates a recurrent final layer, which allows information to flow between time-steps. The features from all time steps are then combined using temporal pooling to give an overall appearance feature for the complete sequence. The convolutional network, recurrent layer, and temporal pooling layer, are jointly trained to act as a feature extractor for video-based re-identification using a Siamese network architecture.Our approach makes use of colour and optical flow information in order to capture appearance and motion information which is useful for video re-identification. Experiments are conduced on the iLIDS-VID and PRID-2011 datasets to show that this approach outperforms existing methods of video-based re-identification.
https://github.com/niallmcl/Recurrent-Convolutional-Video-ReID
Project Source Code
Resumo:
Situational awareness is achieved naturally by the human senses of sight and hearing in combination. Automatic scene understanding aims at replicating this human ability using microphones and cameras in cooperation. In this paper, audio and video signals are fused and integrated at different levels of semantic abstractions. We detect and track a speaker who is relatively unconstrained, i.e., free to move indoors within an area larger than the comparable reported work, which is usually limited to round table meetings. The system is relatively simple: consisting of just 4 microphone pairs and a single camera. Results show that the overall multimodal tracker is more reliable than single modality systems, tolerating large occlusions and cross-talk. System evaluation is performed on both single and multi-modality tracking. The performance improvement given by the audio–video integration and fusion is quantified in terms of tracking precision and accuracy as well as speaker diarisation error rate and precision–recall (recognition). Improvements vs. the closest works are evaluated: 56% sound source localisation computational cost over an audio only system, 8% speaker diarisation error rate over an audio only speaker recognition unit and 36% on the precision–recall metric over an audio–video dominant speaker recognition method.
Resumo:
This paper presents the novel theory for performing multi-agent activity recognition without requiring large training corpora. The reduced need for data means that robust probabilistic recognition can be performed within domains where annotated datasets are traditionally unavailable. Complex human activities are composed from sequences of underlying primitive activities. We do not assume that the exact temporal ordering of primitives is necessary, so can represent complex activity using an unordered bag. Our three-tier architecture comprises low-level video tracking, event analysis and high-level inference. High-level inference is performed using a new, cascading extension of the Rao–Blackwellised Particle Filter. Simulated annealing is used to identify pairs of agents involved in multi-agent activity. We validate our framework using the benchmarked PETS 2006 video surveillance dataset and our own sequences, and achieve a mean recognition F-Score of 0.82. Our approach achieves a mean improvement of 17% over a Hidden Markov Model baseline.