938 results for Accuracy.


Relevance:

10.00% 10.00%

Publisher:

Abstract:

This paper presents preliminary results in establishing a strategy for predicting Zenith Tropospheric Delay (ZTD) and relative ZTD (rZTD) between Continuously Operating Reference Stations (CORS) in near real-time. It is anticipated that the predicted ZTD or rZTD can assist network-based Real-Time Kinematic (RTK) performance over long inter-station distances, ultimately enabling a cost-effective method of delivering precise positioning services to sparsely populated regional areas, such as Queensland. This research first investigates two ZTD solutions: 1) the post-processed IGS ZTD solution and 2) the near Real-Time ZTD solution. The near Real-Time solution is obtained through the GNSS processing software package (Bernese) that has been deployed for this project. The predictability of the near Real-Time Bernese solution is analyzed and compared to the post-processed IGS solution, which acts as the benchmark solution. The predictability analyses were conducted with prediction intervals of 15, 30, 45, and 60 minutes to determine the error with respect to timeliness. The predictability of ZTD and rZTD is characterized by using the previously estimated ZTD as the predicted ZTD of the current epoch. This research has shown that both the ZTD and rZTD prediction errors are random in nature; the STD grows from a few millimeters to the sub-centimeter level as the prediction interval increases from 15 to 60 minutes. Additionally, the rZTD predictability shows very little dependency on the length of the tested baselines of up to 1000 kilometers. Finally, the comparison of the near Real-Time Bernese solution with the IGS solution has shown a slight degradation in the prediction accuracy. The less accurate NRT solution has an STD error of 1 cm within a delay of 50 minutes. However, some larger errors of up to 10 cm are observed.
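The prediction scheme described in this abstract (reusing the previously estimated ZTD as the prediction for the current epoch) can be sketched as a persistence forecast. The following is a minimal illustration only: the function names, the 15-minute sampling and the sample values are assumptions made for the example, not data or code from the paper.

```python
from statistics import pstdev


def persistence_errors(ztd, lag):
    """Error at each epoch when the estimate from `lag` epochs earlier is used as the prediction."""
    return [ztd[i] - ztd[i - lag] for i in range(lag, len(ztd))]


def persistence_std(ztd, lag):
    """Population standard deviation of the persistence prediction errors."""
    return pstdev(persistence_errors(ztd, lag))


def relative_ztd(ztd_a, ztd_b):
    """Relative ZTD between two stations, epoch by epoch."""
    return [a - b for a, b in zip(ztd_a, ztd_b)]


# Hypothetical ZTD estimates in metres, assumed to be sampled every 15 minutes.
station_a = [2.400, 2.402, 2.401, 2.405, 2.408, 2.406, 2.410, 2.409]
station_b = [2.350, 2.351, 2.353, 2.352, 2.356, 2.357, 2.355, 2.359]

for lag in (1, 2, 3, 4):  # 15, 30, 45 and 60 minutes under the sampling assumption
    std_mm = persistence_std(station_a, lag) * 1000
    rel_mm = persistence_std(relative_ztd(station_a, station_b), lag) * 1000
    print(f"{15 * lag} min: ZTD STD = {std_mm:.1f} mm, rZTD STD = {rel_mm:.1f} mm")
```

Evaluating the same statistic at several lags is what allows the error to be related to the prediction interval.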

Abstract:

This paper examines the validity of a Gabor filter bank for feature extraction from solder joint images on Printed Circuit Boards (PCBs). A distance measure based on the Mahalanobis Cosine metric is also presented for classification of five different types of solder joints. From the experimental results, this methodology achieved high accuracy and well-generalised performance. This can be an effective method to reduce cost and improve quality in the production of PCBs in the manufacturing industry.
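The abstract does not define the Mahalanobis Cosine metric, so the sketch below uses one common formulation as an assumption: feature vectors are taken to be already expressed along principal components with known variances, each component is scaled by the inverse standard deviation, and the distance is one minus the cosine of the angle between the scaled vectors. The nearest-class-mean rule and all names and values are hypothetical.

```python
import math


def mahalanobis_cosine_distance(u, v, variances):
    """1 - cosine similarity of u and v after scaling each component by 1/sqrt(variance)."""
    m = [a / math.sqrt(s) for a, s in zip(u, variances)]
    n = [b / math.sqrt(s) for b, s in zip(v, variances)]
    dot = sum(a * b for a, b in zip(m, n))
    norm = math.sqrt(sum(a * a for a in m)) * math.sqrt(sum(b * b for b in n))
    return 1.0 - dot / norm


def classify(sample, class_means, variances):
    """Assign the label whose mean vector is closest to the sample (assumed decision rule)."""
    return min(
        class_means,
        key=lambda label: mahalanobis_cosine_distance(sample, class_means[label], variances),
    )


# Hypothetical two-dimensional features for two of the solder joint classes.
means = {"good": [1.0, 0.2], "bridged": [0.1, 1.0]}
print(classify([0.9, 0.3], means, variances=[1.0, 4.0]))
```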

Abstract:

The eyelids play an important role in lubricating and protecting the surface of the eye. Each blink serves to spread fresh tears, remove debris and replenish the smooth optical surface of the eye. Yet little is known about how the eyelids contact the ocular surface and what pressure distribution exists between the eyelids and cornea. As the principal refractive component of the eye, the cornea is a major element of the eye’s optics. The optical properties of the cornea are known to be susceptible to the pressure exerted by the eyelids. Abnormal eyelids, due to disease, have altered pressure on the ocular surface due to changes in the shape, thickness or position of the eyelids. Normal eyelids also cause corneal distortions that are most often noticed when they are resting closer to the corneal centre (for example during reading). There were many reports of monocular diplopia after reading due to corneal distortion, but prior to videokeratoscopes these localised changes could not be measured. This thesis has measured the influence of eyelid pressure on the cornea after short-term near tasks and techniques were developed to quantify eyelid pressure and its distribution. The profile of the wave-like eyelid-induced corneal changes and the refractive effects of these distortions were investigated. Corneal topography changes due to both the upper and lower eyelids were measured for four tasks involving two angles of vertical downward gaze (20° and 40°) and two near work tasks (reading and steady fixation). After examining the depth and shape of the corneal changes, conclusions were reached regarding the magnitude and distribution of upper and lower eyelid pressure for these task conditions. The degree of downward gaze appears to alter the upper eyelid pressure on the cornea, with deeper changes occurring after greater angles of downward gaze. 
Although the lower eyelid was further from the corneal centre in large angles of downward gaze, its effect on the cornea was greater than that of the upper eyelid. Eyelid tilt, curvature, and position were found to be influential in the magnitude of eyelid-induced corneal changes. Refractively these corneal changes are clinically and optically significant with mean spherical and astigmatic changes of about 0.25 D after only 15 minutes of downward gaze (40° reading and steady fixation conditions). Due to the magnitude of these changes, eyelid pressure in downward gaze offers a possible explanation for some of the day-to-day variation observed in refraction. Considering the magnitude of these changes and previous work on their regression, it is recommended that sustained tasks performed in downward gaze should be avoided for at least 30 minutes before corneal and refractive assessment requiring high accuracy. Novel procedures were developed to use a thin (0.17 mm) tactile piezoresistive pressure sensor mounted on a rigid contact lens to measure eyelid pressure. A hydrostatic calibration system was constructed to convert raw digital output of the sensors to actual pressure units. Conditioning the sensor prior to use regulated the measurement response and sensor output was found to stabilise about 10 seconds after loading. The influences of various external factors on sensor output were studied. While the sensor output drifted slightly over several hours, it was not significant over the measurement time of 30 seconds used for eyelid pressure, as long as the length of the calibration and measurement recordings were matched. The error associated with calibrating at room temperature but measuring at ocular surface temperature led to a very small overestimation of pressure. To optimally position the sensor-contact lens combination under the eyelid margin, an in vivo measurement apparatus was constructed. 
Using this system, eyelid pressure increases were observed when the upper eyelid was placed on the sensor and a significant increase was apparent when the eyelid pressure was increased by pulling the upper eyelid tighter against the eye. For a group of young adult subjects, upper eyelid pressure was measured using this piezoresistive sensor system. Three models of contact between the eyelid and ocular surface were used to calibrate the pressure readings. The first model assumed contact between the eyelid and pressure sensor over more than the pressure cell width of 1.14 mm. Using thin pressure sensitive carbon paper placed under the eyelid, a contact imprint was measured and this width used for the second model of contact. Lastly as Marx’s line has been implicated as the region of contact with the ocular surface, its width was measured and used as the region of contact for the third model. The mean eyelid pressures calculated using these three models for the group of young subjects were 3.8 ± 0.7 mmHg (whole cell), 8.0 ± 3.4 mmHg (imprint width) and 55 ± 26 mmHg (Marx’s line). The carbon imprints using Pressurex-micro confirmed previous suggestions that a band of the eyelid margin has primary contact with the ocular surface and provided the best estimate of the contact region and hence eyelid pressure. Although it is difficult to directly compare the results with previous eyelid pressure measurement attempts, the eyelid pressure calculated using this model was slightly higher than previous manometer measurements but showed good agreement with the eyelid force estimated using an eyelid tensiometer. The work described in this thesis has shown that the eyelids have a significant influence on corneal shape, even after short-term tasks (15 minutes). Instrumentation was developed using piezoresistive sensors to measure eyelid pressure. 
Measurements for the upper eyelid combined with estimates of the contact region between the cornea and the eyelid enabled quantification of the upper eyelid pressure for a group of young adult subjects. These techniques will allow further investigation of the interaction between the eyelids and the surface of the eye.
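The effect of the three contact models on the reported pressures can be illustrated with simple arithmetic. The sketch assumes that the force on the sensor is fixed, so that the pressure over a narrower band of contact scales inversely with the contact width; this scaling is an assumption made for illustration, not the calibration procedure of the thesis. The cell width (1.14 mm) and the whole-cell mean (3.8 mmHg) come from the abstract, while the narrower contact widths are hypothetical.

```python
def rescale_pressure(cell_pressure_mmhg, cell_width_mm, contact_width_mm):
    """Pressure over the contact band, assuming the same force acts on a narrower width."""
    if contact_width_mm <= 0:
        raise ValueError("contact width must be positive")
    if contact_width_mm >= cell_width_mm:
        # Contact covers the whole cell: the reading is used unchanged.
        return cell_pressure_mmhg
    return cell_pressure_mmhg * cell_width_mm / contact_width_mm


CELL_WIDTH_MM = 1.14   # from the abstract
WHOLE_CELL_MMHG = 3.8  # group mean for the whole-cell model, from the abstract

# Hypothetical contact widths, chosen only to show the direction of the effect.
for label, width_mm in [("whole cell", 1.14), ("imprint band", 0.5), ("narrow line", 0.1)]:
    pressure = rescale_pressure(WHOLE_CELL_MMHG, CELL_WIDTH_MM, width_mm)
    print(f"{label}: {pressure:.1f} mmHg")
```

Under this assumption, a narrower contact region always gives a higher pressure estimate, which matches the ordering of the three reported means.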

Abstract:

The wavelet packet transform decomposes a signal into a set of bases for time–frequency analysis. This decomposition creates an opportunity for implementing distributed data mining, where features are extracted from different wavelet packet bases and serve as feature vectors for applications. This paper presents a novel approach for integrated machine fault diagnosis based on localised wavelet packet bases of vibration signals. The best basis is first determined according to its classification capability. Data mining is then applied to extract features, and local decisions are drawn using Bayesian inference. A final conclusion is reached using a weighted average method in data fusion. A case study on rolling element bearing diagnosis shows that this approach can greatly improve the accuracy of diagnosis.
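The final fusion step (a weighted average over local decisions) might look like the sketch below. It assumes that each local decision is a posterior probability per fault class and that each wavelet packet basis has a scalar weight, for example its classification capability; the class names, weights and probabilities are hypothetical and not taken from the paper.

```python
def fuse_decisions(local_posteriors, weights):
    """Weighted average of per-basis class posteriors; returns (best label, fused posteriors)."""
    total = sum(weights)
    classes = local_posteriors[0].keys()
    fused = {
        c: sum(w * p[c] for w, p in zip(weights, local_posteriors)) / total
        for c in classes
    }
    return max(fused, key=fused.get), fused


# Hypothetical local decisions from two wavelet packet bases.
posteriors = [
    {"healthy": 0.8, "outer_race_fault": 0.2},
    {"healthy": 0.4, "outer_race_fault": 0.6},
]
label, fused = fuse_decisions(posteriors, weights=[3.0, 1.0])
print(label, fused)
```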

Abstract:

Soft biometrics are characteristics that can be used to describe, but not uniquely identify, an individual. These include traits such as height, weight, gender, hair, skin and clothing colour. Unlike traditional biometrics (e.g. face, voice), which require cooperation from the subject, soft biometrics can be acquired by surveillance cameras at range without any user cooperation. Whilst these traits cannot provide robust authentication, they can be used to provide coarse authentication or identification at long range, locate a subject who has been previously seen or who matches a description, as well as aid in object tracking. In this paper we propose three-part (head, torso, legs) height and colour soft biometric models, and demonstrate their verification performance on a subset of the PETS 2006 database. We show that these models, whilst not as accurate as traditional biometrics, can still achieve acceptable rates of accuracy in situations where traditional biometrics cannot be applied.

Abstract:

The load–frequency control (LFC) problem has been one of the major subjects in power systems. In practice, LFC systems use proportional–integral (PI) controllers. However, since these controllers are designed using a linear model, the non-linearities of the system are not accounted for and they are incapable of achieving good dynamic performance over a wide range of operating conditions in a multi-area power system. Because of the distributed nature of a multi-area power system, a strategy for solving this problem using a multi-agent reinforcement learning (MARL) approach is presented. It consists of two agents in each power area: the estimator agent provides the area control error (ACE) signal based on the frequency bias estimation, and the controller agent uses reinforcement learning to control the power system, with genetic algorithm optimisation used to tune its parameters. This method does not depend on any knowledge of the system and it admits considerable flexibility in defining the control objective. Also, by finding the ACE signal based on the frequency bias estimation, the LFC performance is improved, and by using MARL, parallel computation is realised, leading to a high degree of scalability. Here, to illustrate the accuracy of the proposed approach, a three-area power system example is given with two scenarios.
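For orientation, the ACE signal and the conventional PI baseline mentioned in this abstract can be sketched as follows. The abstract does not state the ACE formula; the code assumes the definition commonly used in the LFC literature, ACE = ΔP_tie + β·Δf, and a simple discrete-time PI law. The gains, the time step and the input values are hypothetical, and the sketch does not implement the paper's MARL controller.

```python
def area_control_error(delta_p_tie, delta_f, beta):
    """ACE for one area: tie-line power deviation plus frequency bias times frequency deviation."""
    return delta_p_tie + beta * delta_f


class PIController:
    """Discrete-time PI controller acting on the ACE signal (conventional baseline)."""

    def __init__(self, kp, ki, dt):
        self.kp = kp
        self.ki = ki
        self.dt = dt
        self.integral = 0.0

    def step(self, ace):
        # Accumulate the integral of ACE, then return the control action.
        self.integral += ace * self.dt
        return -(self.kp * ace + self.ki * self.integral)


# Hypothetical per-unit deviations for one control area.
controller = PIController(kp=0.3, ki=0.2, dt=1.0)
ace = area_control_error(delta_p_tie=0.01, delta_f=-0.02, beta=0.425)
print(controller.step(ace))
```

In the approach described by the abstract, an estimator agent supplies the frequency bias used in the ACE and a reinforcement learning agent takes the place of the fixed-gain PI law.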

Abstract:

Secondary tasks such as cell phone calls or interaction with automated speech dialog systems (SDSs) increase the driver’s cognitive load as well as the probability of driving errors. This study analyzes speech production variations due to cognitive load and emotional state of drivers in real driving conditions. Speech samples were acquired from 24 female and 17 male subjects (approximately 8.5 h of data) while talking to a co-driver and communicating with two automated call centers, with emotional states (neutral, negative) and the number of necessary SDS query repetitions also labeled. A consistent shift in a number of speech production parameters (pitch, first formant center frequency, spectral center of gravity, spectral energy spread, and duration of voiced segments) was observed when comparing SDS interaction against co-driver interaction; further increases were observed when considering negative emotion segments and the number of requested SDS query repetitions. A mel frequency cepstral coefficient based Gaussian mixture classifier trained on 10 male and 10 female sessions provided 91% accuracy in the open test set task of distinguishing co-driver interactions from SDS interactions, suggesting, together with the acoustic analysis, that it is possible to monitor the level of driver distraction directly from their speech.

Abstract:

This paper presents an evaluation of airborne sensors for use in vegetation management in powerline corridors. Three integral stages in the management process are addressed: the detection of trees, relative positioning with respect to the nearest powerline, and vegetation height estimation. Image data, including multi-spectral and high resolution, are analyzed along with LiDAR data captured from fixed wing aircraft. Ground truth data is then used to establish the accuracy and reliability of each sensor, thus providing a quantitative comparison of sensor options. Tree detection was achieved through crown delineation using a Pulse-Coupled Neural Network (PCNN) and morphologic reconstruction applied to multi-spectral imagery. Through testing it was shown to achieve a detection rate of 96%, while the accuracy in segmenting groups of trees and single trees correctly was shown to be 75%. Relative positioning using LiDAR achieved an RMSE of 1.4 m and 2.1 m for cross track distance and along track position respectively, while Direct Georeferencing achieved an RMSE of 3.1 m in both instances. The estimation of pole and tree heights measured with LiDAR had an RMSE of 0.4 m and 0.9 m respectively, while Stereo Matching achieved 1.5 m and 2.9 m. Overall a small number of poles were missed, with detection rates of 98% and 95% for LiDAR and Stereo Matching.
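The two accuracy measures reported in this abstract, RMSE against ground truth and detection rate, are computed as in the short sketch below. The sample heights and counts are hypothetical and serve only to show the calculation.

```python
import math


def rmse(estimates, ground_truth):
    """Root-mean-square error between sensor estimates and ground truth values."""
    squared = [(e - t) ** 2 for e, t in zip(estimates, ground_truth)]
    return math.sqrt(sum(squared) / len(squared))


def detection_rate(detected, total):
    """Percentage of ground truth objects that were detected."""
    return 100.0 * detected / total


# Hypothetical tree heights in metres: sensor estimates versus surveyed values.
estimated_heights = [12.1, 8.4, 15.9, 10.2]
surveyed_heights = [11.5, 9.0, 15.0, 10.0]
print(f"RMSE = {rmse(estimated_heights, surveyed_heights):.2f} m")
print(f"Detection rate = {detection_rate(48, 50):.0f}%")
```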

Abstract:

The performance of iris recognition systems is significantly affected by the segmentation accuracy, especially in non-ideal iris images. This paper proposes an improved method to localise non-circular iris images quickly and accurately. Shrinking and expanding active contour methods are consolidated when localising inner and outer iris boundaries. First, the pupil region is roughly estimated based on histogram thresholding and morphological operations. Thereafter, a shrinking active contour model is used to precisely locate the inner iris boundary. Finally, the estimated inner iris boundary is used as an initial contour for an expanding active contour scheme to find the outer iris boundary. The proposed scheme is robust in finding the exact iris boundaries of non-circular and off-angle irises. In addition, occlusions of the iris images from eyelids and eyelashes are automatically excluded from the detected iris region. Experimental results on CASIA v3.0 iris databases indicate the accuracy of the proposed technique.

Abstract:

Purpose: The classic study of Sumby and Pollack (1954, JASA, 26(2), 212-215) demonstrated that visual information aided speech intelligibility under noisy auditory conditions. Their work showed that visual information is especially useful under low signal-to-noise conditions where the auditory signal leaves greater margins for improvement. We investigated whether simulated cataracts interfered with the ability of participants to use visual cues to help disambiguate the auditory signal in the presence of auditory noise. Methods: Participants in the study were screened to ensure normal visual acuity (mean of 20/20) and normal hearing (auditory threshold ≤ 20 dB HL). Speech intelligibility was tested under an auditory-only condition and two visual conditions: normal vision and simulated cataracts. The light scattering effects of cataracts were imitated using cataract-simulating filters. Participants wore blacked-out glasses in the auditory-only condition and lens-free frames in the normal auditory-visual condition. Individual sentences were spoken by a live speaker in the presence of prerecorded four-person background babble set to a speech-to-noise ratio (SNR) of -16 dB. The SNR was determined in a preliminary experiment to support 50% correct identification of sentences under the auditory-only condition. The speaker was trained to match the rate, intensity and inflections of a prerecorded audio track of everyday speech sentences. The speaker was blind to the visual conditions of the participant to control for bias. Participants’ speech intelligibility was measured by comparing the accuracy of their written account of what they believed the speaker to have said to the actual spoken sentence. Results: Relative to the normal vision condition, speech intelligibility was significantly poorer when participants wore simulated cataracts. Conclusions: The results suggest that cataracts may interfere with the acquisition of visual cues to speech perception.
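To make the -16 dB speech-to-noise ratio concrete, the sketch below computes the gain that would have to be applied to a noise signal so that a digital mixture reaches a target SNR, using SNR(dB) = 20·log10(RMS_speech / RMS_noise). The study itself used a live speaker with prerecorded babble, so this digital mixing is only an illustration; the function names and sample values are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def noise_gain_for_snr(speech, noise, snr_db):
    """Gain for the noise so that speech RMS over scaled-noise RMS equals the target SNR."""
    return rms(speech) / (rms(noise) * 10 ** (snr_db / 20))


def mix_at_snr(speech, noise, snr_db):
    """Add scaled noise to speech, sample by sample, at the requested SNR."""
    gain = noise_gain_for_snr(speech, noise, snr_db)
    return [s + gain * n for s, n in zip(speech, noise)]


# Hypothetical short signals; real recordings would hold many thousands of samples.
speech = [0.2, -0.1, 0.3, -0.25]
babble = [0.05, 0.04, -0.06, -0.03]
print(noise_gain_for_snr(speech, babble, snr_db=-16.0))
```

At a negative SNR the scaled noise has a higher RMS level than the speech, which is why the auditory-only condition supports only about 50% correct identification.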

Abstract:

Purpose: To investigate whether wearing different presbyopic vision corrections alters the pattern of eye and head movements when viewing dynamic driving-related traffic scenes. Methods: Participants included 20 presbyopes (mean age: 56±5.7 years) who had no experience of wearing presbyopic vision corrections (i.e. all were single vision wearers). Eye and head movements were recorded while wearing five different vision corrections: single vision lenses (SV), progressive addition spectacle lenses (PALs), bifocal spectacle lenses (BIF), monovision (MV) and multifocal contact lenses (MTF CL) in random order. Videotape recordings of traffic scenes of suburban roads and expressways (with edited targets) were presented as dynamic driving-related stimuli, and digital numeric display panels were included as near visual stimuli (simulating speedometer and radio). Eye and head movements were recorded using the faceLAB™ system and the accuracy of target identification was also recorded. Results: The magnitude of eye movements while viewing the driving-related traffic scenes was greater when wearing BIF and PALs than MV and MTF CL (p≤0.013). The magnitude of head movements was greater when wearing SV, BIF and PALs than MV and MTF CL (p<0.0001) and the number of saccades was significantly higher for BIF and PALs than MV (p≤0.043). Target recognition accuracy was poorer for all vision corrections when the near stimulus was located at eccentricities inferiorly and to the left, rather than directly below the primary position of gaze (p=0.008), and PALs gave better performance than MTF CL (p=0.043). Conclusions: Different presbyopic vision corrections alter eye and head movement patterns. In particular, the larger magnitude of eye and head movements and greater number of saccades associated with the spectacle presbyopic corrections may impact on driving performance.

Abstract:

The accuracy of cause-of-death statistics substantially depends on the quality of cause-of-death information in death certificates, primarily completed by medical doctors. Deficiencies in cause-of-death certification have been observed across the world, and over time. Although educational interventions aimed at improving the quality of death certification exist, their intended impacts are rarely evaluated. This review aims to provide empirical evidence that could guide the modification of existing educational programs, or the development of new interventions, which are necessary to improve the capacity of certifiers as well as the quality of cause-of-death certification, and thereby, the quality of mortality statistics.

Abstract:

Road curves are an important feature of road infrastructure and many serious crashes occur on road curves. In Queensland, the number of fatalities is twice as high on curves as on straight roads. Therefore, there is a need to reduce drivers’ exposure to crash risk on road curves. Road crashes in Australia and in the Organisation for Economic Co-operation and Development (OECD) have plateaued in the last five years (2004 to 2008) and the road safety community is desperately seeking innovative interventions to reduce the number of crashes. However, designing an innovative and effective intervention may prove to be difficult as it relies on providing theoretical foundation, coherence, understanding, and structure to both the design and validation of the efficiency of the new intervention. Researchers from multiple disciplines have developed various models to determine the contributing factors for crashes on road curves with a view towards reducing the crash rate. However, most of the existing methods are based on statistical analysis of contributing factors described in government crash reports. In order to further explore the contributing factors related to crashes on road curves, this thesis designs a novel method to analyse and validate these contributing factors. The use of crash claim reports from an insurance company is proposed for analysis using data mining techniques. To the best of our knowledge, this is the first attempt to use data mining techniques to analyse crashes on road curves. A text mining technique is employed because the reports consist of thousands of textual descriptions from which text mining is able to identify the contributing factors. Besides identifying the contributing factors, limited studies to date have investigated the relationships between these factors, especially for crashes on road curves. Thus, this study proposes the use of the rough set analysis technique to determine these relationships. 
The results from this analysis are used to assess the effect of these contributing factors on crash severity. The findings obtained through the use of data mining techniques presented in this thesis have been found to be consistent with existing identified contributing factors. Furthermore, this thesis has identified new contributing factors towards crashes and the relationships between them. A significant pattern related to crash severity is the time of the day, where severe road crashes occur more frequently in the evening or night time. Tree collision is another common pattern, where crashes that occur in the morning and involve hitting a tree are likely to have a higher crash severity. Another factor that influences crash severity is the age of the driver. Most age groups face a high crash severity except for drivers between 60 and 100 years old, who have the lowest crash severity. The significant relationship identified between contributing factors consists of the time of the crash, the manufactured year of the vehicle, the age of the driver and hitting a tree. Having identified new contributing factors and relationships, a validation process is carried out using a traffic simulator in order to determine their accuracy. The validation process indicates that the results are accurate. This demonstrates that data mining techniques are a powerful tool in road safety research, and can be usefully applied within the Intelligent Transport System (ITS) domain. The research presented in this thesis provides an insight into the complexity of crashes on road curves. The findings of this research have important implications for both practitioners and academics. For road safety practitioners, the results from this research illustrate practical benefits for the design of interventions for road curves that will potentially help in decreasing related injuries and fatalities. 
For academics, this research opens up a new research methodology to assess crash severity, related to road crashes on curves.

Abstract:

This study is the first to investigate the effect of prolonged reading on reading performance and visual functions in students with low vision. The study focuses on one of the most common modes of achieving adequate magnification for reading by students with low vision, their close reading distance (proximal or relative distance magnification). Close reading distances impose high demands on near visual functions, such as accommodation and convergence. Previous research on accommodation in children with low vision shows that their accommodative responses are reduced compared to normal vision. In addition, there is an increased lag of accommodation for higher stimulus levels as may occur at close reading distance. Reduced accommodative responses in low vision and higher lag of accommodation at close reading distances together could impact on reading performance of students with low vision especially during prolonged reading tasks. The presence of convergence anomalies could further affect reading performance. Therefore, the aims of the present study were 1) To investigate the effect of prolonged reading on reading performance in students with low vision 2) To investigate the effect of prolonged reading on visual functions in students with low vision. This study was conducted as cross-sectional research on 42 students with low vision and a comparison group of 20 students with normal vision, aged 7 to 20 years. The students with low vision had vision impairments arising from a range of causes and represented a typical group of students with low vision, with no significant developmental delays, attending school in Brisbane, Australia. All participants underwent a battery of clinical tests before and after a prolonged reading task. 
An initial reading-specific history and pre-task measurements that included Bailey-Lovie distance and near visual acuities, Pelli-Robson contrast sensitivity, ocular deviations, sensory fusion, ocular motility, near point of accommodation (pull-away method), accuracy of accommodation (Monocular Estimation Method (MEM) retinoscopy) and Near Point of Convergence (NPC) (push-up method) were recorded for all participants. Reading performance measures were Maximum Oral Reading Rates (MORR), Near Text Visual Acuity (NTVA) and acuity reserves using Bailey-Lovie text charts. Symptoms of visual fatigue were assessed using the Convergence Insufficiency Symptom Survey (CISS) for all participants. Pre-task measurements of reading performance and accuracy of accommodation and NPC were compared with post-task measurements, to test for any effects of prolonged reading. The prolonged reading task involved reading a storybook silently for at least 30 minutes. The task was controlled for print size, contrast, difficulty level and content of the reading material. Silent Reading Rate (SRR) was recorded every 2 minutes during prolonged reading. Symptom scores and visual fatigue scores were also obtained for all participants. A visual fatigue analogue scale (VAS) was used to assess visual fatigue during the task, once at the beginning, once at the middle and once at the end of the task. In addition to the subjective assessments of visual fatigue, tonic accommodation was monitored using a photorefractor (PlusoptiX CR03™) every 6 minutes during the task, as an objective assessment of visual fatigue. Reading measures were done at the habitual reading distance of students with low vision and at 25 cm for students with normal vision. The initial history showed that the students with low vision read for significantly shorter periods at home compared to the students with normal vision. 
The working distances of participants with low vision ranged from 3 to 25 cm and half of them were not using any optical devices for magnification. Nearly half of the participants with low vision were able to resolve 8-point print (1M) at 25 cm. Half of the participants in the low vision group had ocular deviations and suppression at near. Reading rates were significantly reduced in students with low vision compared to those of students with normal vision. In addition, there were a significantly larger number of participants in the low vision group who could not sustain the 30-minute task compared to the normal vision group. However, there were no significant changes in reading rates during or following prolonged reading in either the low vision or normal vision groups. Individual changes in reading rates were independent of their baseline reading rates, indicating that the changes in reading rates during prolonged reading cannot be predicted from a typical clinical assessment of reading using brief reading tasks. Contrary to previous reports, the silent reading rates of the students with low vision were significantly lower than their oral reading rates, although oral and silent reading were assessed using different methods. Although the visual acuity, contrast sensitivity, near point of convergence and accuracy of accommodation were significantly poorer for the low vision group compared to those of the normal vision group, there were no significant changes in any of these visual functions following prolonged reading in either group. Interestingly, a few students with low vision (n = 10) were found to be reading at a distance closer than their near point of accommodation. This suggests a decreased sensitivity to blur. 
Further evaluation revealed that the equivalent intrinsic refractive errors (an estimate of the spherical dioptric defocus which would be expected to yield a patient’s visual acuity in normal subjects) were significantly larger for the low vision group compared to those of the normal vision group. As expected, accommodative responses were significantly reduced for the low vision group compared to the expected norms, which is consistent with their close reading distances, reduced visual acuity and contrast sensitivity. For those in the low vision group who had an accommodative error exceeding their equivalent intrinsic refractive errors, a significant decrease in MORR was found following prolonged reading. The silent reading rates however were not significantly affected by accommodative errors in the present study. Suppression also had a significant impact on the changes in reading rates during prolonged reading. The participants who did not have suppression at near showed significant decreases in silent reading rates during and following prolonged reading. This impact of binocular vision at near on prolonged reading was possibly due to the high demands on convergence. The significant predictors of MORR in the low vision group were age, NTVA, reading interest and reading comprehension, accounting for 61.7% of the variance in MORR. SRR was not significantly influenced by any factors, except for the duration of the reading task sustained; participants with higher reading rates were able to sustain a longer reading duration. In students with normal vision, age was the only predictor of MORR. Participants with low vision also reported significantly greater visual fatigue compared to the normal vision group. Measures of tonic accommodation however were little influenced by visual fatigue in the present study. Visual fatigue analogue scores were found to be significantly associated with reading rates in students with low vision and normal vision. 
However, the patterns of association between visual fatigue and reading rates were different for SRR and MORR. The participants with low vision with higher symptom scores had lower SRRs and participants with higher visual fatigue had lower MORRs. As hypothesized, visual functions such as accuracy of accommodation and convergence did have an impact on prolonged reading in students with low vision, for students whose accommodative errors were greater than their equivalent intrinsic refractive errors, and for those who did not suppress one eye. Those students with low vision who have accommodative errors higher than their equivalent intrinsic refractive errors might significantly benefit from reading glasses. Similarly, considering prisms or occlusion for those without suppression might reduce the convergence demands in these students while using their close reading distances. The impact of these prescriptions on reading rates, reading interest and visual fatigue is an area of promising future research. Most importantly, it is evident from the present study that a combination of factors such as accommodative errors, near point of convergence and suppression should be considered when prescribing reading devices for students with low vision. Considering these factors would also assist rehabilitation specialists in identifying those students who are likely to experience difficulty in prolonged reading, which is otherwise not reflected during typical clinical reading assessments.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Automatic Speech Recognition (ASR) has matured into a technology which is becoming more common in our everyday lives, and is emerging as a necessity to minimise driver distraction when operating in-car systems such as navigation and infotainment. In “noise-free” environments, word recognition performance of these systems has been shown to approach 100%; however, this performance degrades rapidly as the level of background noise is increased. Speech enhancement is a popular method for making ASR systems more robust. Single-channel spectral subtraction was originally designed to improve human speech intelligibility and many attempts have been made to optimise this algorithm in terms of signal-based metrics such as maximised Signal-to-Noise Ratio (SNR) or minimised speech distortion. Such metrics are used to assess enhancement performance for intelligibility, not speech recognition, making them sub-optimal for ASR applications. This research investigates two methods for closely coupling subtractive-type enhancement algorithms with ASR: (a) a computationally-efficient Mel-filterbank noise subtraction technique based on likelihood-maximisation (LIMA), and (b) introducing phase spectrum information to enable spectral subtraction in the complex frequency domain. Likelihood-maximisation uses gradient descent to optimise parameters of the enhancement algorithm to best fit the acoustic speech model given a word sequence known a priori. Whilst this technique is shown to improve the ASR word accuracy performance, it is also identified to be particularly sensitive to non-noise mismatches between the training and testing data. Phase information has long been ignored in spectral subtraction as it is deemed to have little effect on human intelligibility. In this work it is shown that phase information is important in obtaining highly accurate estimates of clean speech magnitudes which are typically used in ASR feature extraction. 
Phase Estimation via Delay Projection is proposed based on the stationarity of sinusoidal signals, and demonstrates the potential to produce improvements in ASR word accuracy in a wide range of SNR. Throughout the dissertation, consideration is given to practical implementation in vehicular environments which resulted in two novel contributions – a LIMA framework which takes advantage of the grounding procedure common to speech dialogue systems, and a resource-saving formulation of frequency-domain spectral subtraction for realisation in field-programmable gate array hardware. The techniques proposed in this dissertation were evaluated using the Australian English In-Car Speech Corpus which was collected as part of this work. This database is the first of its kind within Australia and captures real in-car speech of 50 native Australian speakers in seven driving conditions common to Australian environments.
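
For readers unfamiliar with the baseline technique the abstract builds on, the following is a minimal sketch of conventional single-channel magnitude spectral subtraction on one frame. It is not the dissertation's LIMA or complex-domain method. The function names, the naive DFT, and the `floor` parameter are all assumptions chosen for illustration. Note that it reuses the noisy phase unchanged, which is the simplification the abstract argues against.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each output sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def spectral_subtract(noisy_frame, noise_magnitude, floor=0.01):
    """Subtract a per-bin noise magnitude estimate, keeping the noisy phase.

    `floor` is a hypothetical spectral-floor factor that prevents
    negative magnitudes after subtraction.
    """
    cleaned = []
    for value, noise in zip(dft(noisy_frame), noise_magnitude):
        magnitude = abs(value)
        phase = cmath.phase(value)  # noisy phase is reused unchanged
        new_magnitude = max(magnitude - noise, floor * magnitude)
        cleaned.append(cmath.rect(new_magnitude, phase))
    return idft(cleaned)
```

In practice the noise magnitude would be estimated from speech-free frames and the processing applied to overlapping windowed frames with a fast Fourier transform; the single-frame form above only shows the subtraction step.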