52 results for audio-visual information
Abstract:
In order to use virtual reality as a sport analysis tool, we need to be sure that an immersed athlete reacts realistically in a virtual environment. This has been validated for a real handball goalkeeper facing a virtual thrower. However, we do not yet know which visual variables induce realistic motor behavior in the immersed handball goalkeeper. In this study, we used virtual reality to dissociate the visual information related to the movements of the player from the visual information related to the trajectory of the ball. The aim is thus to evaluate the relative influence of these different sources of visual information on the goalkeeper's motor behavior. We tested 10 handball goalkeepers who had to predict the final position of the virtual ball in the goal when facing the following: only the throwing action of the attacking player (TA condition), only the resulting ball trajectory (BA condition), or both the throwing action of the attacking player and the resulting ball trajectory (TB condition). Here we show that performance was better in the BA and TB conditions but, contrary to expectations, substantially worse in the TA condition. A significant effect of ball landing zone does, however, suggest that the relative importance of visual information from the player and from the ball depends on the targeted zone in the goal. In some cases, body-based cues embedded in the throwing action may have a minor influence on the ball trajectory, and vice versa. Kinematic analysis was then combined with these results to determine why such differences occur depending on the ball landing zone and, consequently, to clarify the role of different sources of visual information in the motor behavior of an athlete immersed in a virtual environment.
Abstract:
Proprioceptive information from the foot/ankle provides important information regarding body sway for balance control, especially in situations where visual information is degraded or absent. Given known increases in catastrophic injury due to falls with older age, understanding the neural basis of proprioceptive processing for balance control is particularly important for older adults. In the present study, we linked neural activity in response to stimulation of key foot proprioceptors (i.e., muscle spindles) with balance ability across the lifespan. Twenty young and twenty older human adults underwent proprioceptive mapping; foot tendon vibration was compared with vibration of a nearby bone in an fMRI environment to determine regions of the brain that were active in response to muscle spindle stimulation. Several body sway metrics were also calculated for the same participants on an eyes-closed balance task. Based on regression analyses, multiple clusters of voxels were identified showing a significant relationship between muscle spindle stimulation-induced neural activity and maximum center of pressure excursion in the anterior-posterior direction. In this case, increased activation was associated with greater balance performance in parietal, frontal, and insular cortical areas, as well as structures within the basal ganglia. These correlated regions were age- and foot-stimulation side-independent and largely localized to right-sided areas of the brain thought to be involved in monitoring stimulus-driven shifts of attention. These findings support the notion that, beyond fundamental peripheral reflex mechanisms, central processing of proprioceptive signals from the foot is critical for balance control.
Abstract:
Catching a ball involves a dynamic transformation of visual information about ball motion into motor commands for moving the hand to the right place at the right time. We previously formulated a neural model for this transformation to account for the consistent leftward movement biases observed in our catching experiments. According to the model, these biases arise within the representation of target motion as well as within the transformation from a gaze-centered to a body-centered movement command. Here, we examine the validity of the latter aspect of our model in a catching task involving gaze fixation. Gaze fixation should systematically influence biases in catching movements, because in the model movement commands are only generated in the direction perpendicular to the gaze direction. Twelve participants caught balls while gazing at a fixation point positioned either straight ahead or 14 degrees to the right. Four participants were excluded because they could not adequately maintain fixation. We again observed a consistent leftward movement bias, but the catching movements were unaffected by fixation direction. This result refutes our proposal that the leftward bias partly arises within the visuomotor transformation, and suggests instead that the bias predominantly arises within the early representation of target motion, specifically through an imbalance in the represented radial and azimuthal target motion.
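To make the gaze-perpendicular constraint concrete, here is a minimal numerical sketch (not the authors' neural model; the function name, vectors, and values are illustrative): a desired hand-movement vector is projected onto the plane perpendicular to the gaze direction, so no command component is generated along the line of gaze.

```python
import numpy as np

def gaze_perpendicular_command(desired_move, gaze_dir):
    """Remove the component of a desired hand-movement vector that
    lies along the gaze direction, leaving only the part of the
    command perpendicular to gaze (illustrative sketch)."""
    g = np.asarray(gaze_dir, dtype=float)
    g = g / np.linalg.norm(g)              # unit gaze vector
    d = np.asarray(desired_move, dtype=float)
    return d - np.dot(d, g) * g            # subtract gaze-parallel part

# Gaze straight ahead along y; desired move up-right-forward.
print(gaze_perpendicular_command([0.2, 0.5, 0.1], [0.0, 1.0, 0.0]))
# -> [0.2 0.  0.1]: the gaze-parallel component is suppressed.
```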
Abstract:
A software system, recently developed by the authors for the efficient capture, editing, and delivery of audio-visual web lectures, was used to create a series of lectures for a first-year undergraduate course in Dynamics. These web lectures were developed to serve as an extra study resource for students attending lectures, not as a replacement. A questionnaire was produced to obtain feedback from students. The overall response was very favorable, and numerous requests were made for other lecturers to adopt this technology. Despite the students' approval of this added resource, there was no significant improvement in overall examination performance.
Abstract:
In this paper we present, for the first time, results showing the effect of speaker head-pose angle on automatic lip-reading performance over a wide range of closely spaced angles. We analyse the effect head pose has upon the features themselves and show that, by selecting coefficients with minimum variance with respect to pose angle, recognition performance can be improved when train and test pose angles differ. Experiments are conducted using the initial phase of a unique multi-view audio-visual database designed specifically for research and development of pose-invariant lip-reading systems. We first show that it is the higher-order horizontal spatial frequency components that become most detrimental as the pose deviates. Second, we assess the performance of different feature selection masks across a range of pose angles, including a new mask based on Minimum Cross-Pose Variance coefficients. We report a relative improvement of 50% in Word Error Rate when using our selection mask over a common energy-based selection during profile-view lip-reading.
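A minimal sketch of the coefficient-selection idea, assuming visual speech features are stored per pose angle (the array shapes, names, and the frame-averaging step are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def min_cross_pose_variance_mask(features, n_keep):
    """features: array of shape (n_poses, n_frames, n_coeffs) holding
    visual speech features recorded at several head-pose angles.
    Returns a boolean mask keeping the n_keep coefficients whose
    mean value varies least across pose angles."""
    pose_means = features.mean(axis=1)        # (n_poses, n_coeffs)
    cross_pose_var = pose_means.var(axis=0)   # variance w.r.t. pose
    mask = np.zeros(features.shape[-1], dtype=bool)
    mask[np.argsort(cross_pose_var)[:n_keep]] = True
    return mask

# Example: 5 pose angles, 200 frames, 64 DCT-style coefficients.
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 200, 64))
mask = min_cross_pose_variance_mask(feats, n_keep=20)
robust_feats = feats[..., mask]               # pose-robust coefficients
```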
Abstract:
This paper presents a novel method of audio-visual feature-level fusion for person identification where both the speech and facial modalities may be corrupted and there is a lack of prior knowledge about the corruption. Furthermore, we assume there is a limited amount of training data for each modality (e.g., a short training speech segment and a single training facial image for each person). A new multimodal feature representation and a modified cosine similarity are introduced to combine and compare bimodal features with limited training data, as well as vastly differing data rates and feature sizes. Optimal feature selection and multicondition training are used to reduce the mismatch between training and testing, thereby making the system robust to unknown bimodal corruption. Experiments have been carried out on a bimodal dataset created from the SPIDRE speaker recognition database and the AR face recognition database, with variable noise corruption of speech and occlusion in the face images. The system's speaker identification performance on the SPIDRE database and facial identification performance on the AR database are comparable with the literature. Combining both modalities using the new method of multimodal fusion leads to significantly improved accuracy over the unimodal systems, even when both modalities have been corrupted. The new method also shows improved identification accuracy compared with bimodal systems based on multicondition model training or missing-feature decoding alone.
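The abstract does not spell out the multimodal representation or the modification to cosine similarity, so the following is only a plausible sketch: each modality's feature vector is unit-normalised before concatenation, so that the much larger facial feature vector cannot dominate the fused representation, and similarity is then an ordinary cosine score. All names and dimensions are illustrative.

```python
import numpy as np

def fuse(speech_feat, face_feat):
    """Concatenate per-modality unit-normalised features so that
    modalities with very different dimensionalities contribute
    equally to the fused vector (one plausible form; the paper's
    exact representation is not given in the abstract)."""
    s = speech_feat / np.linalg.norm(speech_feat)
    f = face_feat / np.linalg.norm(face_feat)
    return np.concatenate([s, f])

def cosine_score(a, b):
    """Plain cosine similarity between two fused vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Example: 39-dim speech features vs 512-dim face features.
rng = np.random.default_rng(1)
probe = fuse(rng.normal(size=39), rng.normal(size=512))
gallery = fuse(rng.normal(size=39), rng.normal(size=512))
print(cosine_score(probe, gallery))
```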
Abstract:
Existing referencing systems frequently prove inadequate for the citation of moving image and sound media such as vidcasts, streaming television, sound files, un-catalogued archive footage, amateur content hosted online, or non-broadcast radio recordings. In 2009 and 2010, a British working group funded by the Higher Education Funding Council for England (HEFCE) and co-ordinated by the British Universities Film and Video Council investigated this problem. This report documents the early stages of the project.
Abstract:
This paper presents a new perceptual watermarking model for the Discrete Shearlet Transform (DST). The DST provides an optimal representation [10] of image features based on multi-resolution and multi-directional analysis. This property can be exploited for watermark embedding to achieve imperceptibility by modelling the human visual system with Chou's model, in which a spatial just-noticeable distortion (JND) profile is adapted to fit the sub-band structure. Combining the DST with the JND profile improves robustness against certain attacks while minimising distortion, by assigning a visibility threshold of distortion to each DST sub-band coefficient in the case of grey-scale image watermarking.
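A minimal sketch of JND-weighted embedding, assuming the DST sub-band coefficients and per-coefficient JND thresholds are already available as arrays (the paper's exact embedding rule is not given in the abstract, so the additive rule and the names below are assumptions):

```python
import numpy as np

def embed_watermark(coeffs, jnd, bits, alpha=0.9):
    """Additively embed +/-1 watermark bits into transform
    coefficients, scaling each change by that coefficient's
    just-noticeable-distortion threshold so the change stays
    below the visibility limit (illustrative embedding rule)."""
    bits = np.asarray(bits, dtype=float)          # entries in {-1, +1}
    return coeffs + alpha * jnd * bits

# Example: 1000 sub-band coefficients with per-coefficient JND.
rng = np.random.default_rng(2)
coeffs = rng.normal(scale=10.0, size=1000)        # stand-in DST coefficients
jnd = np.abs(rng.normal(scale=0.5, size=1000))    # stand-in JND profile
wm = rng.choice([-1.0, 1.0], size=1000)
marked = embed_watermark(coeffs, jnd, wm)
```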
Abstract:
This paper presents a new Flexible Macroblock Ordering (FMO) type for the H.264 Advanced Video Coding (AVC) standard, which can more efficiently flag the position and shape of regions of interest (ROIs) in each frame. In H.264/AVC, seven FMO types have been defined, all of which are designed for error resilience. Most previous work on ROI processing has adopted Type-2 (foreground & background) or Type-6 (explicit) to flag the position and shape of the ROI. However, only rectangular shapes are allowed in Type-2, so for non-rectangular ROIs, non-ROI macroblocks may be wrongly flagged as being within the ROI, which can seriously affect subsequent processing of the ROI. In Type-6, each macroblock in a frame uses fixed-length bits to indicate its slice group, and each ROI is generally assigned its own slice group identity. Although this FMO type can accurately flag the position and shape of the ROI, it incurs a significant bitrate overhead. The proposed new FMO type uses the smallest rectangle that covers the ROI to indicate its position, and a spiral binary mask within the rectangle to indicate its shape. This technique can accurately flag the ROI while providing significant savings in bitrate overhead: compared with Type-6, an 80% to 90% reduction can be obtained at the same accuracy.
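A minimal sketch of the rectangle-plus-spiral-mask signalling, assuming a binary macroblock map as input (the spiral's start corner and direction are assumptions; the abstract does not specify them):

```python
import numpy as np

def spiral_order(mask):
    """Read a 2-D binary mask in clockwise spiral order,
    starting from the top-left corner."""
    m = mask.tolist()
    out = []
    while m:
        out.extend(m.pop(0))                 # top row, left to right
        if m and m[0]:
            for row in m:
                out.append(row.pop())        # right column, downwards
        if m:
            out.extend(m.pop()[::-1])        # bottom row, right to left
        if m and m[0]:
            for row in m[::-1]:
                out.append(row.pop(0))       # left column, upwards
    return out

def flag_roi(roi_map):
    """roi_map: 2-D boolean array over macroblocks (True = ROI).
    Returns the smallest covering rectangle and the spiral-scanned
    shape bits inside it (illustrative layout of the signalled data)."""
    ys, xs = np.nonzero(roi_map)
    top, left, bottom, right = ys.min(), xs.min(), ys.max(), xs.max()
    rect = roi_map[top:bottom + 1, left:right + 1]
    return (top, left, bottom, right), spiral_order(rect.astype(int))
```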
Abstract:
This paper presents a novel method of audio-visual fusion for person identification where both the speech and facial modalities may be corrupted, and there is a lack of prior knowledge about the corruption. Furthermore, we assume there is a limited amount of training data for each modality (e.g., a short training speech segment and a single training facial image for each person). A new representation and a modified cosine similarity are introduced for combining and comparing bimodal features with limited training data as well as vastly differing data rates and feature sizes. Optimal feature selection and multicondition training are used to reduce the mismatch between training and testing, thereby making the system robust to unknown bimodal corruption. Experiments have been carried out on a bimodal data set created from the SPIDRE and AR databases with variable noise corruption of speech and occlusion in the face images. The new method has demonstrated improved recognition accuracy.
Abstract:
Through the concept of sonic resonance, the project Cidade Museu – Museum City explores five derelict or transitional spaces in the city of Viseu. The activation and capture of these spaces develops an audio-visual memory that reflects architectures, stories and experiences, while creating a sense of place through sounds and images.
The project brings together musicians with a background in contemporary music, electroacoustic music and improvisation and a visual artist focusing on photography and video.
Each member of the collective explores the selected spaces in order to activate them with their respective instruments and through sound projection, in an iterative process in which the source of activation gradually gives way to the characteristics of each space, its resonances and acoustics. In this performance, the museum city (a nickname for the city of Viseu) exposes the contrast between the grandeur and multi-faceted architecture of Viseu's Cathedral and the spaces that spread throughout the city waiting for a new future.
The performance in the Cathedral (Sé) features a trio ensemble, an eight-channel sound system and video projection, presenting the audio recordings and images made in each of the five spaces. The audience is invited to explore the relations between the various buildings and their stories while being immersed in their resonances and visual projections.
The performance explores the following spaces in Viseu: the old Orfeão (music hall), an old wine cellar, a mansion home to the national road services, a house with its grounds in Rua Silva Gaio and an old slaughterhouse.
Abstract:
Gait period estimation is an important step in the gait recognition framework. In this paper, we propose a new gait cycle detection method based on the angles of the extreme points of both legs. In addition, to further improve the estimation of the gait period, the proposed algorithm divides the gait sequence into sections before identifying the maximum values. The proposed algorithm is scale invariant and less dependent on silhouette shape. Its performance was evaluated using the OU-ISIR speed-variation gait database. The experimental results show that the proposed method achieved 90.2% gait recognition accuracy, outperforming previous methods in the literature, of which the second best achieved only 67.65% accuracy.
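A minimal sketch of the sectioning idea, assuming a per-frame signal of the angle between the extreme points of the two legs is already available (the signal, the section count, and roughly one peak per section are illustrative assumptions):

```python
import numpy as np

def gait_period(leg_angle, n_sections=8):
    """Estimate the gait period from a per-frame leg-angle signal.
    The sequence is split into sections and one maximum is taken
    per section, making the peak picking less sensitive to local
    silhouette noise (illustrative version of the sectioning idea)."""
    leg_angle = np.asarray(leg_angle, dtype=float)
    sections = np.array_split(np.arange(len(leg_angle)), n_sections)
    peaks = [idx[np.argmax(leg_angle[idx])] for idx in sections]
    return float(np.mean(np.diff(peaks)))   # mean peak spacing in frames

# Example: synthetic leg-angle signal with a 30-frame cycle.
t = np.arange(240)
noise = 0.05 * np.random.default_rng(3).normal(size=240)
angle = np.sin(2 * np.pi * t / 30) + noise
print(gait_period(angle, n_sections=8))     # ~30 frames per cycle
```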
Abstract:
This paper proposes a novel method of detecting packed executable files using steganalysis, primarily targeting the detection of malware obfuscated through packing. Given that over 80% of malware in the wild is packed, detection accuracy and low false negative rates are important properties of malware detection methods. Experimental results outlined in this paper reveal that the proposed approach achieves an overall detection accuracy of greater than 99%, a false negative rate of 1%, and a false positive rate of 0%.
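The abstract does not specify the steganalysis features, so the following sketch substitutes a well-known proxy: the byte-level Shannon entropy of a file, which tends toward its maximum for packed or encrypted executables. This is a stand-in for illustration, not the paper's method; the function names and threshold are assumptions.

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte
    (0 = constant file, 8 = uniform). Packed or encrypted code
    tends toward near-maximal entropy."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def looks_packed(path: str, threshold: float = 7.2) -> bool:
    """Crude packing indicator: flag files whose overall byte entropy
    exceeds a threshold. A stand-in proxy, not the paper's
    steganalysis features (which the abstract does not specify)."""
    with open(path, "rb") as fh:
        return byte_entropy(fh.read()) > threshold
```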