969 results for optical character recognition
Abstract:
Speaker verification is the process of verifying the identity of a person by analysing their speech. There are several important applications for automatic speaker verification (ASV) technology, including suspect identification, tracking terrorists and detecting a person’s presence at a remote location in the surveillance domain, as well as person authentication for phone banking and credit card transactions in the private sector. Telephones and telephony networks provide a natural medium for these applications. The aim of this work is to improve the usefulness of ASV technology for practical applications in the presence of adverse conditions. In a telephony environment, background noise, handset mismatch, channel distortions, room acoustics and restrictions on the available testing and training data are common sources of errors for ASV systems. Two research themes were pursued to overcome these adverse conditions: modelling mismatch and modelling uncertainty. To address the performance degradation incurred through mismatched conditions, it was proposed to model this mismatch directly. Feature mapping was evaluated for combating handset mismatch and was extended through the use of a blind clustering algorithm to remove the need for accurate handset labels for the training data. Mismatch modelling was then generalised by explicitly modelling the session conditions as a constrained offset of the speaker model means. This session variability modelling approach enabled the modelling of arbitrary sources of mismatch, including handset type, and halved the error rates in many cases. Methods to model the uncertainty in speaker model estimates and verification scores were developed to address the difficulties of limited training and testing data. The Bayes factor was introduced to account for the uncertainty of the speaker model estimates in testing by applying Bayesian theory to the verification criterion, with improved performance in matched conditions.
Modelling the uncertainty in the verification score itself met with significant success. Estimating a confidence interval for the "true" verification score enabled an order of magnitude reduction in the average quantity of speech required to make a confident verification decision based on a threshold. The confidence measures developed in this work may also have significant applications for forensic speaker verification tasks.
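The idea of stopping as soon as a confidence interval for the score clears the threshold can be illustrated with a minimal sketch. It is not the thesis's estimator: it assumes independent per-frame scores and a normal approximation (real speech frames are correlated), and the function name, the `z` value, the `min_frames` guard and the score values are all hypothetical.

```python
import math

def sequential_decision(frame_scores, threshold, z=1.96, min_frames=10):
    """Return (decision, frames_used) using a running confidence interval.

    Assumption: frame scores are independent, so the standard error of the
    mean score is sample_std / sqrt(n). This is an illustrative simplification.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's algorithm)
    for score in frame_scores:
        n += 1
        delta = score - mean
        mean += delta / n
        m2 += delta * (score - mean)
        if n < min_frames:
            continue  # too few frames for a meaningful interval
        std_err = math.sqrt(m2 / (n - 1) / n)
        lower = mean - z * std_err
        upper = mean + z * std_err
        if lower > threshold:
            return "accept", n  # whole interval lies above the threshold
        if upper < threshold:
            return "reject", n  # whole interval lies below the threshold
    return "undecided", n

# Made-up per-frame scores with mean 1.0 (a stand-in for log-likelihood ratios).
target_scores = [3.0, -1.0] * 500
decision, frames_used = sequential_decision(target_scores, threshold=0.0)
print(decision, frames_used, "of", len(target_scores), "frames")
```

In this toy case the decision is reached after a small fraction of the available frames, which is the kind of saving in required speech that the abstract describes.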
Abstract:
The effective daylighting of multistorey commercial building interiors poses an interesting problem for designers in Australia’s tropical and subtropical context. Given that a building exterior receives adequate sun and skylight, as dictated by location-specific factors such as weather, siting and external obstructions, the availability of daylight throughout its interior is dependent on certain building characteristics: the distance from a window façade (room depth), ceiling or window head height, window size and the visible transmittance of daylighting apertures. The daylighting of general stock, multistorey commercial buildings is made difficult by their design limitations with respect to some of these characteristics. The admission of daylight to these interiors is usually exclusively by vertical windows. Using conventional glazing, such windows can only admit sun and skylight to a depth of approximately 2 times the window height. This penetration depth is typically much less than the depth of the office interiors, so that core areas of these buildings receive little or no daylight. This issue is particularly relevant where deep, open plan office layouts prevail. The resulting interior daylight pattern is a relatively narrow perimeter zone bathed in (sometimes too intense) light, contrasted with a poorly daylit core zone. The broad luminance range this may present to a building occupant’s visual field can be a source of discomfort glare. Furthermore, the need in most tropical and subtropical regions to restrict solar heat gains to building interiors for much of the year has resulted in the widespread use of heavily tinted or reflective glazing on commercial building façades. This strategy reduces the amount of solar radiation admitted to the interior, thereby decreasing daylight levels proportionately throughout. However, this technique does little to improve the way light is distributed throughout the office space.
Where clear skies dominate weather conditions, direct sunlight may at different times of day or year pass unobstructed through vertical windows, causing disability or discomfort glare for building occupants; its admission to an interior must therefore be appropriately controlled. Any daylighting system to be applied to multistorey commercial buildings must consider these design obstacles, and attempt to improve the distribution of daylight throughout these deep, sidelit office spaces without causing glare conditions. The research described in this thesis delineates first the design optimisation and then the actual prototyping and manufacture process of a daylighting device to be applied to such multistorey buildings in tropical and subtropical environments.
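The rule of thumb quoted above (daylight reaching roughly 2 times the window height) can be turned into a quick worked example. This is only a sketch: the function name and the room dimensions are hypothetical, and the factor of 2 is the approximate figure from the abstract, not a precise model.

```python
def daylit_fraction(window_height_m, room_depth_m, depth_factor=2.0):
    """Fraction of the room depth reached by daylight under the rule of thumb.

    depth_factor=2.0 is the approximate figure quoted in the abstract;
    the dimensions passed in below are made-up illustration values.
    """
    penetration_m = depth_factor * window_height_m
    return min(penetration_m, room_depth_m) / room_depth_m

# Hypothetical deep-plan office: 2.7 m window height, 12 m from façade to core.
fraction = daylit_fraction(window_height_m=2.7, room_depth_m=12.0)
print(f"{fraction:.0%} of the room depth is within the daylit perimeter zone")
```

With these assumed dimensions less than half of the floor depth falls inside the daylit perimeter zone, which illustrates why the core of a deep, sidelit office receives little daylight.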
Abstract:
Probabilistic robotics, most often applied to the problem of simultaneous localisation and mapping (SLAM), requires measures of uncertainty to accompany observations of the environment. This paper describes how uncertainty can be characterised for a vision system that locates coloured landmarks in a typical laboratory environment. The paper describes a model of the uncertainty in segmentation, the internal camera model and the mounting of the camera on the robot. It explains the implementation of the system on a laboratory robot, and provides experimental results that show the coherence of the uncertainty model.
Abstract:
In this paper we propose a new method for utilising phase information by complementing it with traditional magnitude-only spectral subtraction speech enhancement through Complex Spectrum Subtraction (CSS). The proposed approach has the following advantages over traditional magnitude-only spectral subtraction: (a) it introduces complementary information to the enhancement algorithm; (b) it reduces the total number of algorithmic parameters; and (c) it is designed for improving clean speech magnitude spectra and is therefore suitable for both automatic speech recognition (ASR) and speech perception applications. Oracle-based ASR experiments verify this approach, showing an average of 20% relative word accuracy improvements when accurate estimates of the phase spectrum are available. Based on sinusoidal analysis and assuming stationarity between observations (which is shown to be better approximated as the frame rate is increased), this paper also proposes a novel method for acquiring the phase information called Phase Estimation via Delay Projection (PEDEP). Further oracle ASR experiments validate the potential for the proposed PEDEP technique in ideal conditions. Realistic implementation of CSS with PEDEP shows performance comparable to state-of-the-art spectral subtraction techniques in a range of 15-20 dB signal-to-noise ratio environments. These results clearly demonstrate the potential for using phase spectra in spectral subtractive enhancement applications, and at the same time highlight the need for deriving more accurate phase estimates in a wider range of noise conditions.
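Why phase matters in the oracle setting can be shown with a generic toy comparison. This is not the authors' CSS or PEDEP formulation, which the abstract does not specify: it assumes a single frame of synthetic complex spectra, an oracle noise magnitude in both cases, and an oracle noise phase in the complex case; all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins = 257  # hypothetical number of frequency bins in one frame

# Synthetic complex spectra standing in for one frame of speech and noise.
clean = rng.normal(size=n_bins) + 1j * rng.normal(size=n_bins)
noise = 0.5 * (rng.normal(size=n_bins) + 1j * rng.normal(size=n_bins))
noisy = clean + noise

noise_mag = np.abs(noise)      # oracle noise magnitude (assumed known)
noise_phase = np.angle(noise)  # oracle noise phase (assumed known)

# Magnitude-only subtraction: subtract magnitudes, floor at zero.
mag_only_est = np.maximum(np.abs(noisy) - noise_mag, 0.0)

# Complex subtraction: remove the noise as a complex quantity using its phase.
complex_est = np.abs(noisy - noise_mag * np.exp(1j * noise_phase))

# Mean absolute error of each estimate of the clean magnitude spectrum.
err_mag = np.mean(np.abs(mag_only_est - np.abs(clean)))
err_cplx = np.mean(np.abs(complex_est - np.abs(clean)))
print(f"magnitude-only error: {err_mag:.4f}, complex error: {err_cplx:.2e}")
```

With oracle phase the complex subtraction recovers the clean magnitude up to rounding error, while the magnitude-only estimate does not; in practice the gain depends on how accurately the phase can be estimated, which is the limitation the abstract highlights.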
Abstract:
Uncooperative iris identification systems at a distance and on the move often suffer from poor resolution and poor focus of the captured iris images. The lack of pixel resolution and well-focused images significantly degrades the iris recognition performance. This paper proposes a new approach to incorporate the focus score into a reconstruction-based super-resolution process to generate a high resolution iris image from a low resolution and focus-inconsistent video sequence of an eye. A reconstruction-based technique, which can incorporate middle and high frequency components from multiple low resolution frames into one desired super-resolved frame without introducing false high frequency components, is used. A new focus assessment approach is proposed for uncooperative iris at a distance and on the move to improve performance for variations in lighting, size and occlusion. A novel fusion scheme is then proposed to incorporate the proposed focus score into the super-resolution process. The experiments conducted on the Multiple Biometric Grand Challenge portal database show that our proposed approach achieves an EER of 2.1%, outperforming the existing state-of-the-art averaging signal-level fusion approach by 19.2% and the robust mean super-resolution approach by 8.7%.
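For readers unfamiliar with the equal error rate (EER) quoted above, the following is a minimal sketch of how it is commonly computed from match scores. It is not the paper's evaluation code: it assumes that higher scores mean a more likely genuine match, and the function name and score lists are made up for illustration.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Approximate EER: the operating point where false accepts equal false rejects.

    Assumption: a higher score means a more likely genuine match. Distance-based
    scores (lower is better) would need to be negated first.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    best_gap, eer = np.inf, 1.0
    # Sweep every observed score as a candidate decision threshold.
    for t in np.sort(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= t)  # impostor comparisons wrongly accepted
        frr = np.mean(genuine < t)    # genuine comparisons wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Made-up scores for illustration only.
genuine_scores = [0.2, 0.6, 0.7, 0.8]
impostor_scores = [0.1, 0.3, 0.4, 0.65]
print(equal_error_rate(genuine_scores, impostor_scores))
```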
Abstract:
Purpose: To investigate the short-term influence of imposed monocular defocus upon human optical axial length (the distance from anterior cornea to retinal pigment epithelium) and ocular biometrics. Methods: Twenty-eight young adult subjects (14 myopes and 14 emmetropes) had eye biometrics measured before and then 30 and 60 minutes after exposure to monocular (right eye) defocus. Four different monocular defocus conditions were tested, each on a separate day: control (no defocus), myopic (+3 D defocus), hyperopic (-3 D defocus) and diffuse (0.2 density Bangerter filter) defocus. The fellow eye was optimally corrected (no defocus). Results: Imposed defocus caused small but significant changes in optical axial length (p<0.0001). A significant increase in optical axial length (mean change +8 ± 14 μm, p=0.03) occurred following hyperopic defocus, and a significant reduction in optical axial length (mean change -13 ± 14 μm, p=0.0001) was found following myopic defocus. A small increase in optical axial length was observed following diffuse defocus (mean change +6 ± 13 μm, p=0.053). Choroidal thickness also exhibited some significant changes with certain defocus conditions. No significant difference was found between myopes and emmetropes in the changes in optical axial length or choroidal thickness with defocus. Conclusions: Significant changes in optical axial length occur in human subjects following 60 minutes of monocular defocus. The bi-directional optical axial length changes observed in response to defocus imply that the human visual system is capable of detecting the presence and sign of defocus and altering optical axial length to move the retina towards the image plane.
Abstract:
This chapter reports on research work that aims to overcome some limitations of conventional community engagement for urban planning. Adaptive and human-centred design approaches that are well established in human-computer interaction (such as personas and design scenarios) as well as creative writing and dramatic character development methods (such as the Stanislavsky System and the Meisner Technique) are as yet largely unexplored in the rather conservative and long-term design context of urban planning. Based on these approaches, we have been trialling a set of performance-based workshop activities to gain insights into participants’ desires and requirements that may inform the future design of apartments and apartment buildings in inner-city Brisbane. The focus of these workshops is to analyse the behaviour and lifestyle of apartment dwellers and generate residential personas that become boundary objects in the cross-disciplinary discussions of urban design and planning teams. Dramatisation and embodied interaction of use cases form part of the strategies we employed to engage participants and elicit community feedback.
Abstract:
Voice recognition is one of the key enablers for reducing driver distraction as in-vehicle systems become more and more complex. With the integration of voice recognition in vehicles, safety and usability are improved as the driver’s eyes and hands are not required to operate system controls. Whilst speaker-independent voice recognition is well developed, performance in high noise environments (e.g. vehicles) is still limited. La Trobe University and Queensland University of Technology have developed a low-cost hardware-based speech enhancement system for automotive environments based on spectral subtraction and delay–sum beamforming techniques. The enhancement algorithms have been optimised using authentic Australian English collected under typical driving conditions. Performance tests conducted using speech data collected under a variety of vehicle noise conditions demonstrate a word recognition rate improvement in the order of 10% or more under the noisiest conditions. The system is currently developed to a proof-of-concept stage, and there is potential for even greater performance improvement.
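The delay-and-sum beamforming mentioned above can be sketched in software as follows. This is a generic illustration, not the hardware system described: it assumes known integer sample delays per microphone, and the microphone count, delays, noise level and test signal are all hypothetical.

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Align each microphone channel by its known delay, then average.

    channels: array of shape (n_mics, n_samples).
    delays: integer sample delays per microphone (assumed known here).
    np.roll wraps samples around the ends, which is acceptable for this toy
    periodic signal but not for real recordings.
    """
    channels = np.asarray(channels, dtype=float)
    aligned = [np.roll(ch, -d) for ch, d in zip(channels, delays)]
    return np.mean(aligned, axis=0)

rng = np.random.default_rng(0)
n_samples = 4000
# Stand-in for speech: a sinusoid with a whole number of periods in the frame.
speech = np.sin(2 * np.pi * 50 * np.arange(n_samples) / n_samples)
delays = [0, 3, 6, 9]  # hypothetical arrival delays at four microphones

# Each microphone hears a delayed copy of the speech plus independent noise.
mics = np.stack([
    np.roll(speech, d) + rng.normal(scale=0.5, size=n_samples) for d in delays
])

enhanced = delay_and_sum(mics, delays)
noise_power_in = np.mean((mics[0] - speech) ** 2)
noise_power_out = np.mean((enhanced - speech) ** 2)
print(f"noise power: single mic {noise_power_in:.3f}, beamformed {noise_power_out:.3f}")
```

Because the speech adds coherently after alignment while the independent noise averages out, the residual noise power drops as more microphones are combined. Real vehicle noise is partly correlated across microphones, so actual gains would be smaller than in this idealised case.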