291 results for Speaker Recognition
Abstract:
Visual information in the form of the speaker's lip movements has been shown to improve the performance of speech recognition and search applications. In our previous work, we proposed cross-database training of synchronous hidden Markov models (SHMMs) to make use of large, publicly available external audio databases in addition to the relatively small audio-visual database at hand. In this work, the cross-database training approach is improved by performing an additional audio adaptation step, which enables the audio-visual SHMMs to benefit from the audio observations of the external audio models before the visual modality is added to them. The proposed approach outperforms the baseline cross-database training approach in clean and noisy environments in terms of both phone recognition accuracy and spoken term detection (STD) accuracy.
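The abstract does not give implementation details for the audio adaptation step. As a rough, hypothetical sketch of the kind of adaptation commonly used with HMM acoustic models, the snippet below performs MAP-style adaptation of Gaussian state means towards observations from the audio-visual corpus; the function name, the relevance factor `tau` and the toy data are illustrative assumptions, not the authors' procedure.

```python
import numpy as np

def map_adapt_means(prior_means, frames, posteriors, tau=10.0):
    """MAP-style adaptation of Gaussian mean vectors.

    prior_means : (num_states, dim) means trained on the external audio database
    frames      : (num_frames, dim) audio observations from the audio-visual corpus
    posteriors  : (num_frames, num_states) state occupation probabilities
    tau         : relevance factor controlling how far means move from the prior
    """
    occ = posteriors.sum(axis=0)                       # soft frame counts per state
    weighted = posteriors.T @ frames                   # (num_states, dim) weighted sums
    ml_means = weighted / np.maximum(occ[:, None], 1e-8)
    alpha = (occ / (occ + tau))[:, None]               # interpolation weight per state
    return alpha * ml_means + (1.0 - alpha) * prior_means

# toy usage: 3 states, 13-dimensional features, 100 frames
rng = np.random.default_rng(0)
prior = rng.normal(size=(3, 13))
obs = rng.normal(size=(100, 13))
post = rng.dirichlet(np.ones(3), size=100)
adapted = map_adapt_means(prior, obs, post)
print(adapted.shape)  # (3, 13)
```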
Abstract:
Spoken term detection (STD) is the task of looking up a spoken term in a large volume of speech segments. To provide fast search, the speech segments are first indexed into an intermediate representation using speech recognition engines, which provide multiple hypotheses for each segment. Approximate matching techniques are usually applied at the search stage to compensate for the poor performance of automatic speech recognition engines during indexing. Recently, using visual information in addition to audio information has been shown to improve phone recognition performance, particularly in noisy environments. In this paper, we make use of visual information in the form of the speaker's lip movements at the indexing stage and investigate its effect on STD performance. In particular, we investigate whether gains in phone recognition accuracy carry through the approximate matching stage to provide similar gains in the final audio-visual STD system over a traditional audio-only approach. We also investigate the effect of using visual information on STD performance in different noise environments.
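To make the indexing-and-approximate-matching pipeline concrete, here is a heavily simplified sketch: the index is reduced to a single 1-best phone string (real systems keep multiple hypotheses, e.g. lattices), and a query phone sequence is matched against it with a fixed-length sliding window and an edit-distance tolerance. All names and the toy phone strings are invented for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution
        prev = cur
    return prev[-1]

def approximate_std_search(indexed_phones, query_phones, max_cost=1):
    """Return start positions where the query matches within max_cost edits."""
    n, m = len(indexed_phones), len(query_phones)
    hits = []
    for start in range(n - m + 1):
        window = indexed_phones[start:start + m]
        if edit_distance(window, query_phones) <= max_cost:
            hits.append(start)
    return hits

# toy usage: recogniser output with one phone error ('d' recognised as 't')
index = "sil k ae t s ih t sil".split()
query = "k ae d".split()          # query term as a phone sequence
print(approximate_std_search(index, query))  # [1]
```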
Abstract:
Purpose: Optical blur and ageing are known to affect driving performance but their effects on drivers' eye movements are poorly understood. This study examined the effects of optical blur and age on eye movement patterns and performance on the DriveSafe slide recognition test, which is purported to predict fitness to drive. Methods: Twenty young (27.1 ± 4.6 years) and 20 older (73.3 ± 5.7 years) visually normal drivers performed the DriveSafe under two visual conditions: best-corrected vision and with +2.00 DS blur. The DriveSafe is a Visual Recognition Slide Test that consists of brief presentations of static, real-world driving scenes containing different road users (pedestrians, bicycles and vehicles). Participants reported the types, relative positions and direction of travel of the road users in each image; the score was the number of correctly reported items (maximum score of 128). Eye movements were recorded while participants performed the DriveSafe test using a Tobii TX300 eye-tracking system. Results: There was a significant main effect of blur on DriveSafe scores (best-corrected: 114.9 vs blur: 93.2; p < 0.001). There was also a significant age and blur interaction on the DriveSafe scores (p < 0.001) such that the young drivers were more negatively affected by blur than the older drivers (reductions of 22% and 13% respectively; p < 0.001): with best-corrected vision, the young drivers performed better than the older drivers (DriveSafe scores: 118.4 vs 111.5; p = 0.001), while with blur, the young drivers performed worse than the older drivers (88.6 vs 95.9; p = 0.009). For the eye movement patterns, blur significantly reduced the number of fixations on road users (best-corrected: 5.1 vs blur: 4.5; p < 0.001), fixation duration on road users (2.0 s vs 1.8 s; p < 0.001) and saccade amplitudes (7.4° vs 6.7°; p < 0.001). A main effect of age on eye movements was also found, with older drivers making smaller saccades than the young drivers (6.7° vs 7.4°; p < 0.001). Conclusions: Blur reduced DriveSafe scores for both age groups and this effect was greater for the young drivers. The decrease in the number of fixations and fixation duration on road users, as well as the reduction in saccade amplitudes under the blurred condition, highlights the difficulty experienced in performing the task in the presence of optical blur, suggesting that uncorrected refractive errors may have a detrimental impact on aspects of driving performance.
Abstract:
Automatic speech recognition from multiple distant microphones poses significant challenges because of noise and reverberation. The quality of speech acquisition may vary between microphones because of speaker movements and channel distortions. This paper proposes a channel selection approach for selecting reliable channels based on a selection criterion operating in the short-term modulation spectrum domain. The proposed approach quantifies the relative strength of the speech modulations obtained from each microphone and from beamforming. The new technique is compared experimentally in real reverberant conditions in terms of perceptual evaluation of speech quality (PESQ) measures and word error rate (WER). Using circular microphone arrays, an overall improvement in recognition rate is observed with delay-and-sum and superdirective beamformers compared with the case where a channel is selected at random.
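The paper's selection criterion operates in the short-term modulation spectrum domain; the exact measure is not reproduced here. As a loose sketch of the idea, the snippet below scores each channel by the energy of its temporal-envelope modulations in an assumed 2-16 Hz band (Hilbert envelope plus a band-pass filter) and picks the strongest channel; the band limits, filter order and toy signals are assumptions.

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def modulation_energy(x, fs, band=(2.0, 16.0)):
    """Energy of the temporal-envelope modulations in a given band (Hz).

    Slow envelope modulations around 2-16 Hz are dominated by speech;
    noise and reverberation tend to flatten or smear them.
    """
    envelope = np.abs(hilbert(x))                    # temporal envelope
    envelope = envelope - envelope.mean()
    sos = butter(2, band, btype="bandpass", fs=fs, output="sos")
    mod = sosfiltfilt(sos, envelope)
    return float(np.sum(mod ** 2))

def select_channel(channels, fs):
    """Pick the microphone whose signal carries the strongest speech modulations."""
    scores = [modulation_energy(ch, fs) for ch in channels]
    return int(np.argmax(scores)), scores

# toy usage: channel 0 is clean-ish, channel 1 is heavily buried in noise
fs = 16000
t = np.arange(fs * 2) / fs
clean = np.sin(2 * np.pi * 200 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 4 * t))
noisy = clean * 0.2 + 0.8 * np.random.default_rng(0).normal(size=t.size)
best, scores = select_channel([clean, noisy], fs)
print(best, [round(s, 2) for s in scores])
```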
Abstract:
Highly efficient loading of bone morphogenetic protein-2 (BMP-2) onto carriers with desirable performance is still a major challenge in the field of bone regeneration. To date, the changes in the structure and bioactivity of BMP-2 induced by nanoscaled surfaces remain poorly understood. Here, the effect of nanoscaled surfaces on the adsorption and bioactivity of BMP-2 was investigated with a series of hydroxyapatite surfaces (HAPs): an HAP crystal-coated surface (HAP), an HAP crystal-coated polished surface (HAP-Pol), and a sintered HAP crystal-coated surface (HAP-Sin). The adsorption dynamics of recombinant human BMP-2 (rhBMP-2) and the accessibility of the binding epitopes of adsorbed rhBMP-2 for BMP receptors (BMPRs) were examined by quartz crystal microbalance with dissipation. Moreover, the bioactivity of adsorbed rhBMP-2 and the BMP-induced Smad signaling were investigated with C2C12 model cells. A noticeably high mass uptake of rhBMP-2 and enhanced recognition of adsorbed rhBMP-2 by BMPR-IA were found on the HAP-Pol surface. For the rhBMP-2-adsorbed HAPs, both ALP activity and Smad signaling increased in the order HAP-Sin < HAP < HAP-Pol. Furthermore, hybrid molecular dynamics and steered molecular dynamics simulations confirmed that BMP-2 anchored tightly on the HAP-Pol surface in a relatively loose conformation, whereas the HAP-Sin surface induced a compact conformation of BMP-2. In conclusion, nanostructured HAPs can modulate how rhBMP-2 is adsorbed, and thus its recognition by BMPR-IA and its bioactivity. These findings provide useful guidance for the future design and fabrication of rhBMP-2-based scaffolds/implants.
Abstract:
Background: Pollens of the subtropical grasses Bahia (Paspalum notatum), Johnson (Sorghum halepense), and Bermuda (Cynodon dactylon) are common causes of respiratory allergies in subtropical regions worldwide. Objective: To evaluate IgE cross-reactivity of grass pollen (GP) found in subtropical and temperate areas. Methods: Case and control serum samples from 83 individuals from the subtropical region of Queensland were tested for IgE reactivity with GP extracts by enzyme-linked immunosorbent assay. A randomly sampled subset of 21 serum samples from patients with subtropical GP allergy was examined by ImmunoCAP and cross-inhibition assays. Results: Fifty-four patients with allergic rhinitis and GP allergy had higher IgE reactivity with P notatum and C dactylon than with a mixture of 5 temperate GPs. For 90% of the 21 GP-allergic serum samples, P notatum-, S halepense-, or C dactylon-specific IgE concentrations were higher than temperate GP-specific IgE, and GP-specific IgE correlated more strongly with subtropical GP (r = 0.771-0.950) than with temperate GP (r = 0.317-0.677). In most patients (71%-100%), IgE reactivity with P notatum, S halepense, or C dactylon GP was inhibited more effectively by subtropical GP than by temperate GP. When the temperate GP mixture achieved 50% inhibition of IgE with subtropical GP, there was a 39- to 67-fold difference in the concentrations giving 50% inhibition, and significant differences in maximum inhibition for S halepense and P notatum GP relative to temperate GP. Conclusion: Patients living in a subtropical region had species-specific IgE recognition of subtropical GP. Most GP-allergic patients in Queensland would benefit from allergen-specific immunotherapy with a standardized content of subtropical GP allergens.
Abstract:
We propose a novel multiview fusion scheme for recognizing human identity based on gait biometric data. The gait data are acquired from multi-camera video surveillance datasets. Experiments on the publicly available CASIA dataset show the potential of the proposed fusion scheme for the development and implementation of automatic identity recognition systems.
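The abstract does not state at which level the multiview fusion operates. As one plausible reading, the sketch below fuses per-view match scores with a weighted sum before making the identity decision; the view weights and toy scores are invented purely for illustration.

```python
import numpy as np

def fuse_view_scores(view_scores, weights=None):
    """Weighted-sum fusion of per-view match scores.

    view_scores : (num_views, num_gallery_subjects) similarity scores,
                  one row per camera view, already normalised to [0, 1]
    weights     : optional per-view reliability weights
    """
    view_scores = np.asarray(view_scores, dtype=float)
    if weights is None:
        weights = np.ones(view_scores.shape[0])
    weights = np.asarray(weights, dtype=float)
    fused = weights @ view_scores / weights.sum()    # fused score per subject
    return int(np.argmax(fused)), fused

# toy usage: 3 camera views, 4 gallery subjects; subject 2 wins after fusion
scores = [[0.2, 0.5, 0.7, 0.1],
          [0.3, 0.4, 0.8, 0.2],
          [0.1, 0.6, 0.6, 0.3]]
identity, fused = fuse_view_scores(scores, weights=[1.0, 1.5, 1.0])
print(identity, np.round(fused, 2))
```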
What triggers problem recognition? An exploration on young Australian male problematic online gamers
Abstract:
Help-seeking is a complex decision-making process that begins with problem recognition. However, little is understood about how the help-seeking process is conceptualised and what triggers problem recognition. This research proposes the use of the Critical Incident Technique (CIT) to examine and classify incidents that serve as key triggers of problem recognition among young Australian male problematic online gamers. The research provides a classification of five different types of triggers that will aid social marketers in developing effective social marketing interventions focused on early detection, prevention and treatment.
Abstract:
Pattern recognition is a promising approach for the identification of structural damage using measured dynamic data. Much of the research on pattern recognition has employed artificial neural networks (ANNs) and genetic algorithms as systematic ways of matching pattern features. The selection of a damage-sensitive and noise-insensitive pattern feature is important for all structural damage identification methods. Accordingly, a neural network-based damage detection method using frequency response function (FRF) data is presented in this paper. This method can effectively consider uncertainties in the measured data from which training patterns are generated. The proposed method reduces the dimensionality of the initial FRF data, transforms it into new damage indices, and employs an ANN for the actual damage localization and quantification using recognized damage patterns from the algorithm. In civil engineering applications, the measurement of dynamic response under field conditions always contains noise components from environmental factors. In order to evaluate the performance of the proposed strategy with noise-polluted data, noise-contaminated measurements are also introduced to the proposed algorithm. ANNs with an optimal architecture give minimum training and testing errors and provide precise damage detection results. In order to maximize damage detection performance, the optimal ANN architecture is identified by selecting the number of hidden layers and the number of neurons per hidden layer through a trial-and-error method. In real testing, the number of measurement points and the measurement locations used to obtain the structural response are critical for damage detection. Therefore, optimal sensor placement to improve damage identification is also investigated herein. A finite element model of a two-storey framed structure is used to train the neural network. The method shows accurate performance and gives low error with simulated and noise-contaminated data for single and multiple damage cases. As a result, the proposed method can be used for structural health monitoring and damage detection, particularly for cases where the measurement data is very large. Furthermore, it is suggested that an optimal ANN architecture can detect damage occurrence with good accuracy and can provide damage quantification with reasonable accuracy under varying levels of damage.
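The paper's specific FRF-derived damage indices are not reproduced here. As a hedged sketch of the overall idea (reduce high-dimensional FRF measurements to compact indices, then map them to damage location and severity with an ANN), the snippet below substitutes PCA for the dimension-reduction step and trains a small scikit-learn MLP on synthetic, noise-contaminated stand-in data; the architecture, noise level and data generator are all assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: each sample is a flattened FRF magnitude vector and
# the target is a per-element damage severity (0 = undamaged).
rng = np.random.default_rng(0)
num_samples, frf_len, num_elements = 200, 512, 4
damage = rng.uniform(0.0, 0.5, size=(num_samples, num_elements))
frf = rng.normal(size=(num_samples, frf_len)) + damage @ rng.normal(size=(num_elements, frf_len))
frf_noisy = frf + 0.05 * rng.normal(size=frf.shape)   # simulated measurement noise

# Reduce the high-dimensional FRF data to a compact set of "damage indices"
# (here simply PCA components), then map those indices to damage with an MLP.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=20),
    MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
)
model.fit(frf_noisy[:150], damage[:150])
pred = model.predict(frf_noisy[150:])
print("mean abs error:", float(np.abs(pred - damage[150:]).mean()))
```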
Abstract:
In the field of face recognition, sparse representation (SR) has received considerable attention during the past few years, with a focus on holistic descriptors in closed-set identification applications. The underlying assumption in such SR-based methods is that each class in the gallery has sufficient samples and the query lies on the subspace spanned by the gallery of the same class. Unfortunately, such an assumption is easily violated in the face verification scenario, where the task is to determine if two faces (where one or both have not been seen before) belong to the same person. In this study, the authors propose an alternative approach to SR-based face verification, where SR encoding is performed on local image patches rather than the entire face. The obtained sparse signals are pooled via averaging to form multiple region descriptors, which then form an overall face descriptor. Owing to the deliberate loss of spatial relations within each region (caused by averaging), the resulting descriptor is robust to misalignment and various image deformations. Within the proposed framework, they evaluate several SR encoding techniques: l1-minimisation, Sparse Autoencoder Neural Network (SANN) and an implicit probabilistic technique based on Gaussian mixture models. Thorough experiments on AR, FERET, exYaleB, BANCA and ChokePoint datasets show that the local SR approach obtains considerably better and more robust performance than several previous state-of-the-art holistic SR methods, on both the traditional closed-set identification task and the more applicable face verification task. The experiments also show that l1-minimisation-based encoding has a considerably higher computational cost when compared with SANN-based and probabilistic encoding, but leads to higher recognition rates.
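The snippet below is a simplified sketch of a local SR descriptor in the spirit described above: overlapping patches are sparse-coded against a dictionary, the absolute codes are average-pooled within a coarse region grid, and the region descriptors are concatenated. For self-containment it uses OMP from scikit-learn with a random dictionary rather than the l1, SANN or GMM encoders evaluated by the authors; the patch size, region grid and sparsity level are assumptions.

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.decomposition import SparseCoder

def local_sr_descriptor(face, dictionary, region_grid=(3, 3), patch_size=(8, 8)):
    """Local sparse-representation face descriptor (simplified sketch)."""
    patches = extract_patches_2d(face, patch_size)              # (n_patches, 8, 8)
    flat = patches.reshape(len(patches), -1)
    flat = flat - flat.mean(axis=1, keepdims=True)              # remove patch DC
    coder = SparseCoder(dictionary=dictionary,
                        transform_algorithm="omp",
                        transform_n_nonzero_coefs=5)
    codes = np.abs(coder.transform(flat))                       # (n_patches, n_atoms)

    # Map each patch to its top-left position to assign it to a region, then
    # average-pool the codes inside each region (discarding spatial detail).
    rows = face.shape[0] - patch_size[0] + 1
    cols = face.shape[1] - patch_size[1] + 1
    rr, cc = np.divmod(np.arange(len(flat)), cols)
    descriptor = []
    for i in range(region_grid[0]):
        for j in range(region_grid[1]):
            mask = ((rr * region_grid[0] // rows) == i) & ((cc * region_grid[1] // cols) == j)
            descriptor.append(codes[mask].mean(axis=0))
    return np.concatenate(descriptor)

# toy usage: random dictionary of 64 atoms over 8x8 patches, random 32x32 "face"
rng = np.random.default_rng(0)
atoms = rng.normal(size=(64, 64))
atoms /= np.linalg.norm(atoms, axis=1, keepdims=True)
face = rng.random((32, 32))
desc = local_sr_descriptor(face, atoms)
print(desc.shape)  # (576,) = 3 x 3 regions x 64 atoms
```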
Abstract:
This thesis investigates the use of fusion techniques and mathematical modelling to increase the robustness of iris recognition systems against iris image quality degradation, pupil size changes and partial occlusion. The proposed techniques improve recognition accuracy and enhance security. They can be further developed for better iris recognition in less constrained environments that do not require user cooperation. A framework to analyse the consistency of different regions of the iris is also developed. This can be applied to improve recognition systems using partial iris images, and cancelable biometric signatures or biometric based cryptography for privacy protection.
Abstract:
This chapter interrogates what recognition of prior learning (RPL) can and does mean in the higher education sector—a sector in the grip of the widening participation agenda and an open access age. The chapter discusses how open learning is making inroads into recognition processes and examines two studies in open learning recognition. A case study relating to e-portfolio-style RPL for entry into a Graduate Certificate in Policy and Governance at a metropolitan university in Queensland is described. In the first instance, candidates who do not possess a relevant Bachelor degree need to demonstrate skills in governmental policy work in order to be eligible to gain entry to a Graduate Certificate (at Australian Qualifications Framework Level 8) (Australian Qualifications Framework Council, 2013, p. 53). The chapter acknowledges the benefits and limitations of recognition in open learning and those of more traditional RPL, anticipating future developments in both (or their convergence).
Abstract:
This research has made contributions to the area of spoken term detection (STD), defined as the process of finding all occurrences of a specified search term in a large collection of speech segments. The use of visual information in the form of the speaker's lip movements in addition to audio, together with the topic of the speech segments and the expected frequency of words in the target speech domain, is proposed. By using this complementary information, improvements in STD performance have been achieved, enabling efficient search for keywords in large collections of multimedia documents.
Abstract:
This paper describes a vision-only system for place recognition in environments that are traversed at different times of day, when changing conditions drastically affect visual appearance, and at different speeds, where places aren't visited at a consistent linear rate. The major contribution is the removal of wheel-based odometry from the previously presented algorithm (SMART), allowing the technique to operate on any camera-based device; in our case a mobile phone. While we show that the direct application of visual odometry to our night-time datasets does not achieve a level of performance typically needed, the VO requirements of SMART are orthogonal to typical usage: firstly, only the magnitude of the velocity is required, and secondly, the calculated velocity signal only needs to be repeatable in any one part of the environment over day and night cycles, but not necessarily globally consistent. Our results show that the smoothing effect of motion constraints is highly beneficial for achieving a locally consistent, lighting-independent velocity estimate. We also show that the advantage of our patch-based technique used previously for frame recognition, surprisingly, does not transfer to VO, where SIFT demonstrates equally good performance. Nevertheless, we present the SMART system using only vision, which performs sequence-based place recognition in extreme low-light conditions where standard 6-DOF VO fails, and which improves place recognition performance over odometry-less benchmarks, approaching that of wheel odometry.
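The sketch below illustrates sequence-based place matching of the general kind described (patch-normalised low-resolution frames compared over short sequences). It is not the SMART implementation; it assumes the frames have already been resampled to roughly constant spatial spacing using the velocity-magnitude estimate, so only a simple 1:1 alignment is searched, and all parameters and toy frames are illustrative.

```python
import numpy as np

def patch_normalise(img, cell=4):
    """Per-cell contrast normalisation of a downsampled greyscale frame."""
    h, w = img.shape
    out = np.empty_like(img, dtype=float)
    for r in range(0, h, cell):
        for c in range(0, w, cell):
            block = img[r:r + cell, c:c + cell].astype(float)
            out[r:r + cell, c:c + cell] = (block - block.mean()) / (block.std() + 1e-6)
    return out

def sequence_match(db_frames, query_frames):
    """Best-matching database start index for a short query sequence."""
    db = [patch_normalise(f) for f in db_frames]
    q = [patch_normalise(f) for f in query_frames]
    n, m = len(db), len(q)
    costs = [sum(np.abs(db[s + k] - q[k]).mean() for k in range(m))
             for s in range(n - m + 1)]
    return int(np.argmin(costs)), costs

# toy usage: the query is a noisy copy of database frames 5..9
rng = np.random.default_rng(0)
db = [rng.random((16, 16)) for _ in range(20)]
query = [f + 0.05 * rng.normal(size=f.shape) for f in db[5:10]]
start, costs = sequence_match(db, query)
print(start)  # 5
```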
Abstract:
Deep convolutional network models have dominated recent work in human action recognition as well as image classification. However, these methods are often unduly influenced by the image background, learning and exploiting the presence of background cues in typical computer vision datasets. For robotics applications, the degree of variation and novelty in action backgrounds is far greater than in computer vision datasets. To address this challenge, we propose an "action region proposal" method that, informed by optical flow, extracts image regions likely to contain actions for input into the network during both training and testing. In a range of experiments, we demonstrate that manually segmenting the background is not enough, but that through active action region proposals during training and testing, state-of-the-art or better performance can be achieved on individual spatial and temporal video components. Finally, we show that by focusing attention through action region proposals, we can further improve upon the existing state of the art in spatio-temporally fused action recognition performance.
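The abstract does not specify how the action region proposals are generated from optical flow. A minimal, assumed illustration using OpenCV's Farnebäck dense flow: pixels whose flow magnitude exceeds a threshold are boxed (with padding) to form a single proposal per frame pair; the threshold, padding and synthetic frames are purely illustrative, not the authors' method.

```python
import cv2
import numpy as np

def action_region_proposal(prev_gray, curr_gray, mag_thresh=1.0, pad=8):
    """Bounding box around the dominant motion region between two frames.

    Dense optical flow is thresholded on magnitude and the tight bounding box
    of the moving pixels (with padding) is returned as the action proposal.
    """
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=2)
    ys, xs = np.nonzero(magnitude > mag_thresh)
    if len(xs) == 0:
        return None                                   # no salient motion
    h, w = curr_gray.shape
    x0, x1 = max(xs.min() - pad, 0), min(xs.max() + pad, w - 1)
    y0, y1 = max(ys.min() - pad, 0), min(ys.max() + pad, h - 1)
    return x0, y0, x1, y1

# toy usage: a bright square moving a few pixels between two synthetic frames
prev_frame = np.zeros((120, 160), dtype=np.uint8)
curr_frame = np.zeros((120, 160), dtype=np.uint8)
prev_frame[40:70, 50:80] = 255
curr_frame[44:74, 56:86] = 255
print(action_region_proposal(prev_frame, curr_frame))
```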