610 resultados para video data
Resumo:
Facial expression is an important channel of human social communication. Facial expression recognition (FER) aims to perceive and understand emotional states of humans based on information in the face. Building robust and high performance FER systems that can work in real-world video is still a challenging task, due to the various unpredictable facial variations and complicated exterior environmental conditions, as well as the difficulty of choosing a suitable type of feature descriptor for extracting discriminative facial information. Facial variations caused by factors such as pose, age, gender, race and occlusion, can exert profound influence on the robustness, while a suitable feature descriptor largely determines the performance. Most present attention on FER has been paid to addressing variations in pose and illumination. No approach has been reported on handling face localization errors and relatively few on overcoming facial occlusions, although the significant impact of these two variations on the performance has been proved and highlighted in many previous studies. Many texture and geometric features have been previously proposed for FER. However, few comparison studies have been conducted to explore the performance differences between different features and examine the performance improvement arisen from fusion of texture and geometry, especially on data with spontaneous emotions. The majority of existing approaches are evaluated on databases with posed or induced facial expressions collected in laboratory environments, whereas little attention has been paid on recognizing naturalistic facial expressions on real-world data. This thesis investigates techniques for building robust and high performance FER systems based on a number of established feature sets. It comprises of contributions towards three main objectives: (1) Robustness to face localization errors and facial occlusions. An approach is proposed to handle face localization errors and facial occlusions using Gabor based templates. Template extraction algorithms are designed to collect a pool of local template features and template matching is then performed to covert these templates into distances, which are robust to localization errors and occlusions. (2) Improvement of performance through feature comparison, selection and fusion. A comparative framework is presented to compare the performance between different features and different feature selection algorithms, and examine the performance improvement arising from fusion of texture and geometry. The framework is evaluated for both discrete and dimensional expression recognition on spontaneous data. (3) Evaluation of performance in the context of real-world applications. A system is selected and applied into discriminating posed versus spontaneous expressions and recognizing naturalistic facial expressions. A database is collected from real-world recordings and is used to explore feature differences between standard database images and real-world images, as well as between real-world images and real-world video frames. The performance evaluations are based on the JAFFE, CK, Feedtum, NVIE, Semaine and self-collected QUT databases. The results demonstrate high robustness of the proposed approach to the simulated localization errors and occlusions. Texture and geometry have different contributions to the performance of discrete and dimensional expression recognition, as well as posed versus spontaneous emotion discrimination. These investigations provide useful insights into enhancing robustness and achieving high performance of FER systems, and putting them into real-world applications.
Resumo:
This paper describes an innovative platform that facilitates the collection of objective safety data around occurrences at railway level crossings using data sources including forward-facing video, telemetry from trains and geo-referenced asset and survey data. This platform is being developed with support by the Australian rail industry and the Cooperative Research Centre for Rail Innovation. The paper provides a description of the underlying accident causation model, the development methodology and refinement process as well as a description of the data collection platform. The paper concludes with a brief discussion of benefits this project is expected to provide the Australian rail industry.
Resumo:
Mobile video, as an emerging market and a promising research field, has attracted much attention from both industry and researchers. Considering the quality of user-experience as the crux of mobile video services, this chapter aims to provide a guide to user-centered studies of mobile video quality. This will benefit future research in better understanding user needs and experiences, designing effective research, and providing solid solutions to improve the quality of mobile video. This chapter is organized in three main parts: (1) a review of recent user studies from the perspectives of research focuses, user study methods, and data analysis methods; (2) an example of conducting a user study of mobile video research, together with the discussion on a series of relative issues, such as participants, materials and devices, study procedure, and analysis results, and; (3) a conclusion with an open discussion about challenges and opportunities in mobile video related research, and associated potential future improvements.
Resumo:
The ability to identify and assess user engagement with transmedia productions is vital to the success of individual projects and the sustainability of this mode of media production as a whole. It is essential that industry players have access to tools and methodologies that offer the most complete and accurate picture of how audiences/users engage with their productions and which assets generate the most valuable returns of investment. Drawing upon research conducted with Hoodlum Entertainment, a Brisbane-based transmedia producer, this project involved an initial assessment of the way engagement tends to be understood, why standard web analytics tools are ill-suited to measuring it, how a customised tool could offer solutions, and why this question of measuring engagement is so vital to the future of transmedia as a sustainable industry. Working with data provided by Hoodlum Entertainment and Foxtel Marketing, the outcome of the study was a prototype for a custom data visualisation tool that allowed access, manipulation and presentation of user engagement data, both historic and predictive. The prototyped interfaces demonstrate how the visualization tool would collect and organise data specific to multiplatform projects by aggregating data across a number of platform reporting tools. Such a tool is designed to encompass not only platforms developed by the transmedia producer but also sites developed by fans. This visualisation tool accounted for multiplatform experience projects whose top level is comprised of people, platforms and content. People include characters, actors, audience, distributors and creators. Platforms include television, Facebook and other relevant social networks, literature, cinema and other media that might be included in the multiplatform experience. Content refers to discreet media texts employed within the platform, such as tweet, a You Tube video, a Facebook post, an email, a television episode, etc. Core content is produced by the creators’ multiplatform experiences to advance the narrative, while complimentary content generated by audience members offers further contributions to the experience. Equally important is the timing with which the components of the experience are introduced and how they interact with and impact upon each other. Being able to combine, filter and sort these elements in multiple ways we can better understand the value of certain components of a project. It also offers insights into the relationship between the timing of the release of components and user activity associated with them, which further highlights the efficacy (or, indeed, failure) of assets as catalysts for engagement. In collaboration with Hoodlum we have developed a number of design scenarios experimenting with the ways in which data can be visualised and manipulated to tell a more refined story about the value of user engagement with certain project components and activities. This experimentation will serve as the basis for future research.
Resumo:
The balance between player competence and the challenge presented by a task has been acknowledged as a major factor in providing optimal experience in video games. While Dynamic Difficulty Adjustment (DDA) presents methods for adjusting difficulty in real-time during singleplayer games, little research has explored its application in competitive multiplayer games where challenge is dictated by the competence of human opponents. By conducting a formal review of 180 existing competitive multiplayer games, it was found that a large number of modern games are utilizing DDA techniques to balance challenge between human opponents. From this data, we propose a preliminary framework for classifying Multiplayer Dynamic Difficulty Adjustment (mDDA) instances.
Resumo:
Social media platforms are of interest to interactive entertainment companies for a number of reasons. They can operate as a platform for deploying games, as a tool for communicating with customers and potential customers, and can provide analytics on how players utilize the; game providing immediate feedback on design decisions and changes. However, as ongoing research with Australian developer Halfbrick, creators of $2 , demonstrates, the use of these platforms is not universally seen as a positive. The incorporation of Big Data into already innovative development practices has the potential to cause tension between designers, whilst the platform also challenges the traditional business model, relying on micro-transactions rather than an up-front payment and a substantial shift in design philosophy to take advantage of the social aspects of platforms such as Facebook.
Resumo:
Facial expression recognition (FER) systems must ultimately work on real data in uncontrolled environments although most research studies have been conducted on lab-based data with posed or evoked facial expressions obtained in pre-set laboratory environments. It is very difficult to obtain data in real-world situations because privacy laws prevent unauthorized capture and use of video from events such as funerals, birthday parties, marriages etc. It is a challenge to acquire such data on a scale large enough for benchmarking algorithms. Although video obtained from TV or movies or postings on the World Wide Web may also contain ‘acted’ emotions and facial expressions, they may be more ‘realistic’ than lab-based data currently used by most researchers. Or is it? One way of testing this is to compare feature distributions and FER performance. This paper describes a database that has been collected from television broadcasts and the World Wide Web containing a range of environmental and facial variations expected in real conditions and uses it to answer this question. A fully automatic system that uses a fusion based approach for FER on such data is introduced for performance evaluation. Performance improvements arising from the fusion of point-based texture and geometry features, and the robustness to image scale variations are experimentally evaluated on this image and video dataset. Differences in FER performance between lab-based and realistic data, between different feature sets, and between different train-test data splits are investigated.
Resumo:
Video stimulated recall interviewing is a research technique in which subjects view a video sequence of their behaviour and are then invited to reflect on their decision-making processes during the videoed event. Despite its popularity, this technique raises methodological issues for researchers, particularly novice researchers in education. The paper reports that while stimulated recall is a valuable technique for investigating decision making processes in relation to specific events, it is not a technique that lends itself as a universal technique for research. This paper recounts one study in educational research where stimulated recall interview was used successfully as a useful tool for collecting data with an adapted version of SRI procedure.
Resumo:
Using Media-Access-Control (MAC) address for data collection and tracking is a capable and cost effective approach as the traditional ways such as surveys and video surveillance have numerous drawbacks and limitations. Positioning cell-phones by Global System for Mobile communication was considered an attack on people's privacy. MAC addresses just keep a unique log of a WiFi or Bluetooth enabled device for connecting to another device that has not potential privacy infringements. This paper presents the use of MAC address data collection approach for analysis of spatio-temporal dynamics of human in terms of shared space utilization. This paper firstly discuses the critical challenges and key benefits of MAC address data as a tracking technology for monitoring human movement. Here, proximity-based MAC address tracking is postulated as an effective methodology for analysing the complex spatio-temporal dynamics of human movements at shared zones such as lounge and office areas. A case study of university staff lounge area is described in detail and results indicates a significant added value of the methodology for human movement tracking. By analysis of MAC address data in the study area, clear statistics such as staff’s utilisation frequency, utilisation peak periods, and staff time spent is obtained. The analyses also reveal staff’s socialising profiles in terms of group and solo gathering. The paper is concluded with a discussion on why MAC address tracking offers significant advantages for tracking human behaviour in terms of shared space utilisation with respect to other and more prominent technologies, and outlines some of its remaining deficiencies.
Resumo:
Design process phases of development, evaluation and implementation were used to create a garment to simultaneously collect reliable data of speech production and intensity of movement of toddlers (18-36 months). A series of prototypes were developed and evaluated that housed accelerometer-based motion sensors and a digital transmitter with microphone. The approved test garment was a top constructed from loop-faced fabric with interior pockets to house devices. Extended side panels allowed for sizing. In total, 56 toddlers (28 male; 28 female; 16-36 months of age) participated in the study providing pilot and baseline data. The test garment was effective in collecting data as evaluated for accuracy and reliability using ANOVA for accelerometer data, transcription of video for type of movement, and number and length of utterances for speech production. The data collection garment has been implemented in various studies across disciplines.
Resumo:
The session explores the potential for “Patron Driven Acquisition” (PDA) as a model for the acquisition of online video. Today, PDA has become a standard model of acquisition in the eBook market, more effectively aligning spend with use and increased return on investment (ROI). PDA is an unexplored model for acquisition of video, for which library collection development is complicated by higher storage and delivery costs, labor overheads for content selection and acquisition, and a dynamic film industry in which media and the technology that supports it is changing daily. Queensland University of Technology (QUT) and La Trobe University in Australia launched a research project in collaboration with Kanopy to explore the opportunity for PDA of video. The study relied on three data sources: (1) national surveys to compare the video purchasing and use practices of colleges, (2) on-campus pilot projects of PDA models to assess user engagement and behavior, and (3) testing of various user applications and features to support the model. The study incorporates usage statistics and survey data and builds upon a peer-reviewed research paper presented at the VALA 2014 conference in Melbourne, Australia. This session will be conducted by the researchers and will graphically present the results from the study. It will map out a future for video PDA, and how libraries can more cost-effectively acquire and maximize the discoverability of online video. The presenters will also solicit input and welcome questions from audience members.
Resumo:
This paper describes a safety data recording and analysis system that has been developed to capture safety occurrences including precursors using high-definition forward-facing video from train cabs and data from other train-borne systems. The paper describes the data processing model and how events detected through data analysis are related to an underlying socio-technical model of accident causation. The integrated approach to safety data recording and analysis insures systemic factors that condition, influence or potentially contribute to an occurrence are captured both for safety occurrences and precursor events, providing a rich tapestry of antecedent causal factors that can significantly improve learning around accident causation. This can ultimately provide benefit to railways through the development of targeted and more effective countermeasures, better risk models and more effective use and prioritization of safety funds. Level crossing occurrences are a key focus in this paper with data analysis scenarios describing causal factors around near-miss occurrences. The paper concludes with a discussion on how the system can also be applied to other types of railway safety occurrences.
Resumo:
This study constructs performance prediction models to estimate the end-user perceived video quality on mobile devices for the latest video encoding techniques –VP9 and H.265. Both subjective and objective video quality assessments were carried out for collecting data and selecting the most desirable predictors. Using statistical regression, two models were generated to achieve 94.5% and 91.5% of prediction accuracies respectively, depending on whether the predictor derived from the objective assessment is involved. These proposed models can be directly used by media industries for video quality estimation, and will ultimately help them to ensure a positive end-user quality of experience on future mobile devices after the adaptation of the latest video encoding technologies.
Resumo:
Viewer interests, evoked by video content, can potentially identify the highlights of the video. This paper explores the use of facial expressions (FE) and heart rate (HR) of viewers captured using camera and non-strapped sensor for identifying interesting video segments. The data from ten subjects with three videos showed that these signals are viewer dependent and not synchronized with the video contents. To address this issue, new algorithms are proposed to effectively combine FE and HR signals for identifying the time when viewer interest is potentially high. The results show that, compared with subjective annotation and match report highlights, ‘non-neutral’ FE and ‘relatively higher and faster’ HR is able to capture 60%-80% of goal, foul, and shot-on-goal soccer video events. FE is found to be more indicative than HR of viewer’s interests, but the fusion of these two modalities outperforms each of them.
Resumo:
In recent years, rapid advances in information technology have led to various data collection systems which are enriching the sources of empirical data for use in transport systems. Currently, traffic data are collected through various sensors including loop detectors, probe vehicles, cell-phones, Bluetooth, video cameras, remote sensing and public transport smart cards. It has been argued that combining the complementary information from multiple sources will generally result in better accuracy, increased robustness and reduced ambiguity. Despite the fact that there have been substantial advances in data assimilation techniques to reconstruct and predict the traffic state from multiple data sources, such methods are generally data-driven and do not fully utilize the power of traffic models. Furthermore, the existing methods are still limited to freeway networks and are not yet applicable in the urban context due to the enhanced complexity of the flow behavior. The main traffic phenomena on urban links are generally caused by the boundary conditions at intersections, un-signalized or signalized, at which the switching of the traffic lights and the turning maneuvers of the road users lead to shock-wave phenomena that propagate upstream of the intersections. This paper develops a new model-based methodology to build up a real-time traffic prediction model for arterial corridors using data from multiple sources, particularly from loop detectors and partial observations from Bluetooth and GPS devices.