270 results for SPEECH THERAPY
Abstract:
In an automotive environment, the performance of a speech recognition system is affected by environmental noise if the speech signal is acquired directly from a microphone. Speech enhancement techniques are therefore necessary to improve speech recognition performance. In this paper, a field-programmable gate array (FPGA) implementation of dual-microphone delay-and-sum beamforming (DASB) for speech enhancement is presented. As the first step towards a cost-effective solution, the implementation described in this paper uses a relatively high-end FPGA device to facilitate the verification of various design strategies and parameters. Experimental results show that the proposed design can produce output waveforms close to those generated by a theoretical (floating-point) model, with modest usage of FPGA resources. Speech recognition experiments are also conducted on enhanced in-car speech waveforms produced by the FPGA in order to compare recognition performance against that of the floating-point model running on a PC.
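A minimal sketch of the beamformer described above, assuming a far-field source, an integer steering delay known in advance, and 16-bit Q15 arithmetic for the fixed-point path. The function names (`delay_and_sum`, `to_q15`, `delay_and_sum_q15`) are hypothetical and are not taken from the paper's FPGA design, which may use fractional delays, different word lengths, or additional filtering. The fixed-point version mirrors the kind of floating-point versus hardware comparison the abstract reports.

```python
import numpy as np


def delay_and_sum(x1, x2, delay):
    """Floating-point two-microphone delay-and-sum beamformer.

    Assumes the target reaches microphone 2 `delay` samples after
    microphone 1 (a non-negative integer steering delay).
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    if not 0 <= delay < len(x2):
        raise ValueError("delay must satisfy 0 <= delay < len(x2)")
    aligned = np.zeros_like(x2)
    # Advance channel 2 by `delay` samples; the last `delay` samples become zero.
    aligned[: len(x2) - delay] = x2[delay:]
    # Average the time-aligned channels: the target adds coherently,
    # while uncorrelated noise does not.
    return 0.5 * (x1 + aligned)


def to_q15(x):
    """Quantise samples in [-1, 1) to Q15 fixed point (round to nearest, saturate)."""
    scaled = np.round(np.asarray(x, dtype=float) * 32768.0)
    return np.clip(scaled, -32768, 32767).astype(np.int16)


def delay_and_sum_q15(q1, q2, delay):
    """Fixed-point version: integer shift and add, then halve with an arithmetic right shift."""
    if not 0 <= delay < len(q2):
        raise ValueError("delay must satisfy 0 <= delay < len(q2)")
    aligned = np.zeros_like(q2)
    aligned[: len(q2) - delay] = q2[delay:]
    # Widen to 32 bits before adding so the sum cannot overflow int16.
    total = q1.astype(np.int32) + aligned.astype(np.int32)
    return (total >> 1).astype(np.int16)
```

Feeding the same noisy two-channel input to both functions and comparing the outputs gives a rough measure of quantisation error; with round-to-nearest inputs and a truncating right shift, the difference in this sketch stays within about one least-significant bit.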
Abstract:
Secondary tasks such as cell phone calls or interaction with automated speech dialog systems (SDSs) increase the driver’s cognitive load as well as the probability of driving errors. This study analyzes speech production variations due to cognitive load and emotional state of drivers in real driving conditions. Speech samples were acquired from 24 female and 17 male subjects (approximately 8.5 h of data) while talking to a co-driver and communicating with two automated call centers, with emotional states (neutral, negative) and the number of necessary SDS query repetitions also labeled. A consistent shift in a number of speech production parameters (pitch, first formant center frequency, spectral center of gravity, spectral energy spread, and duration of voiced segments) was observed when comparing SDS interaction against co-driver interaction; further increases were observed when considering negative emotion segments and the number of requested SDS query repetitions. A Gaussian mixture model classifier based on mel-frequency cepstral coefficients (MFCCs), trained on 10 male and 10 female sessions, provided 91% accuracy in the open-test-set task of distinguishing co-driver interactions from SDS interactions, suggesting, together with the acoustic analysis, that it is possible to monitor the level of driver distraction directly from the driver’s speech.
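Two of the parameters listed above, spectral center of gravity and spectral energy spread, are often computed as the power-weighted mean and standard deviation of frequency within a short frame. The sketch below assumes those definitions, a Hann window, and a single frame; the study's exact definitions, frame sizes, and any pitch or formant tracking are not reproduced, and the function name is hypothetical.

```python
import numpy as np


def spectral_centroid_and_spread(frame, sample_rate):
    """Power-weighted mean frequency (centroid) and standard deviation (spread), in Hz."""
    frame = np.asarray(frame, dtype=float)
    # A Hann window reduces leakage caused by the frame edges.
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = power.sum()
    if total == 0.0:
        return 0.0, 0.0  # silent frame: no meaningful centroid
    centroid = float(np.sum(freqs * power) / total)
    spread = float(np.sqrt(np.sum((freqs - centroid) ** 2 * power) / total))
    return centroid, spread
```

Averaging these per-frame values over voiced segments would give one way to compare SDS and co-driver interactions, though the study may aggregate differently.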
Abstract:
Purpose: The classic study of Sumby and Pollack (1954, JASA, 26(2), 212-215) demonstrated that visual information aided speech intelligibility under noisy auditory conditions. Their work showed that visual information is especially useful under low signal-to-noise conditions, where the auditory signal leaves greater margins for improvement. We investigated whether simulated cataracts interfered with the ability of participants to use visual cues to help disambiguate the auditory signal in the presence of auditory noise. Methods: Participants in the study were screened to ensure normal visual acuity (mean of 20/20) and normal hearing (auditory threshold ≤ 20 dB HL). Speech intelligibility was tested under an auditory-only condition and two visual conditions: normal vision and simulated cataracts. The light-scattering effects of cataracts were imitated using cataract-simulating filters. Participants wore blacked-out glasses in the auditory-only condition and lens-free frames in the normal auditory-visual condition. Individual sentences were spoken by a live speaker in the presence of prerecorded four-person background babble set to a speech-to-noise ratio (SNR) of -16 dB. This SNR was determined in a preliminary experiment to support 50% correct identification of sentences under the auditory-only condition. The speaker was trained to match the rate, intensity and inflections of a prerecorded audio track of everyday speech sentences. The speaker was blind to the participant’s visual condition to control for bias. Participants’ speech intelligibility was measured by comparing the accuracy of their written account of what they believed the speaker had said with the actual spoken sentence. Results: Relative to the normal vision condition, speech intelligibility was significantly poorer when participants wore simulated cataracts. Conclusions: The results suggest that cataracts may interfere with the acquisition of visual cues to speech perception.
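The SNR arithmetic can be sketched for digitally mixed signals. This is an offline analogue under stated assumptions: in the study the sentences were spoken live, so the -16 dB ratio would have been set by level calibration rather than by scaling arrays. The helper names are hypothetical.

```python
import numpy as np


def scale_noise_to_snr(speech, noise, target_snr_db):
    """Return `noise` scaled so that 10*log10(P_speech / P_noise) equals target_snr_db."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Required noise power is P_speech / 10^(SNR/10); the gain is the amplitude ratio.
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (target_snr_db / 10.0)))
    return gain * noise


def snr_db(speech, noise):
    """Signal-to-noise ratio in dB computed from mean signal powers."""
    p_speech = np.mean(np.asarray(speech, dtype=float) ** 2)
    p_noise = np.mean(np.asarray(noise, dtype=float) ** 2)
    return 10.0 * np.log10(p_speech / p_noise)
```

At -16 dB the babble power is 10^1.6 ≈ 39.8 times the speech power, so its RMS amplitude is about 6.3 times that of the speech.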
Abstract:
The purpose of this chapter is to describe the use of caricatured contrasting scenarios (Bødker, 2000) and how they can be used to consider potential designs for disruptive technologies. The disruptive technology in this case is Automatic Speech Recognition (ASR) software in workplace settings. The particular workplace is the Magistrates Court of the Australian Capital Territory.
Caricatured contrasting scenarios are ideally suited to exploring how ASR might be implemented in a particular setting because they allow potential implementations to be “sketched” quickly and with little effort. This sketching of potential interactions, with emphasis on both positive and negative outcomes, allows the benefits and pitfalls of design decisions to become apparent.
A brief description of the Court is given, along with the reasons for choosing the Court for this case study. The work of the Court is framed as taking place in two modes: front of house, where the courtroom itself is located, and backstage, where documents are processed and the business of the court is recorded and encoded into various systems.
Caricatured contrasting scenarios describing the introduction of ASR to the front of house are presented and then analysed. These scenarios show that the introduction of ASR to the court would be highly problematic.
The final section describes how ASR could be re-imagined in order to make it useful for the court. A final scenario is presented that describes how this re-imagined ASR could be integrated into both the front of house and backstage of the court in a way that could strengthen both processes.
Abstract:
Automatic Speech Recognition (ASR) has matured into a technology that is becoming more common in our everyday lives, and it is emerging as a necessity for minimising driver distraction when operating in-car systems such as navigation and infotainment. In “noise-free” environments, the word recognition performance of these systems has been shown to approach 100%; however, this performance degrades rapidly as the level of background noise increases. Speech enhancement is a popular method for making ASR systems more robust. Single-channel spectral subtraction was originally designed to improve human speech intelligibility, and many attempts have been made to optimise this algorithm in terms of signal-based metrics such as maximised Signal-to-Noise Ratio (SNR) or minimised speech distortion. Such metrics assess enhancement performance for intelligibility, not speech recognition, which makes them sub-optimal for ASR applications. This research investigates two methods for closely coupling subtractive-type enhancement algorithms with ASR: (a) a computationally efficient Mel-filterbank noise subtraction technique based on likelihood-maximisation (LIMA), and (b) introducing phase spectrum information to enable spectral subtraction in the complex frequency domain. Likelihood-maximisation uses gradient descent to optimise the parameters of the enhancement algorithm to best fit the acoustic speech model given a word sequence known a priori. Whilst this technique is shown to improve ASR word accuracy, it is also found to be particularly sensitive to non-noise mismatches between the training and testing data. Phase information has long been ignored in spectral subtraction because it is deemed to have little effect on human intelligibility. In this work it is shown that phase information is important in obtaining highly accurate estimates of clean speech magnitudes, which are typically used in ASR feature extraction.
Phase Estimation via Delay Projection is proposed based on the stationarity of sinusoidal signals, and it demonstrates the potential to improve ASR word accuracy over a wide range of SNRs. Throughout the dissertation, consideration is given to practical implementation in vehicular environments, which resulted in two novel contributions: a LIMA framework that takes advantage of the grounding procedure common to speech dialogue systems, and a resource-saving formulation of frequency-domain spectral subtraction for realisation in field-programmable gate array hardware. The techniques proposed in this dissertation were evaluated using the Australian English In-Car Speech Corpus, which was collected as part of this work. This database is the first of its kind within Australia and captures real in-car speech of 50 native Australian speakers in seven driving conditions common to Australian environments.
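Classic single-channel power spectral subtraction, the baseline this work builds on, can be sketched per frame as below. It subtracts a noise power estimate from each frequency bin, applies a spectral floor, and reuses the noisy phase; that last step is exactly the simplification the dissertation questions. Neither LIMA nor Phase Estimation via Delay Projection is reproduced here, framing, windowing and overlap-add are omitted, and the parameter names `alpha` and `floor` are illustrative assumptions.

```python
import numpy as np


def estimate_noise_psd(noise_frames):
    """Average power spectrum over noise-only frames (shape: n_frames x frame_len)."""
    noise_frames = np.asarray(noise_frames, dtype=float)
    return np.mean(np.abs(np.fft.rfft(noise_frames, axis=1)) ** 2, axis=0)


def spectral_subtract_frame(noisy_frame, noise_psd, alpha=1.0, floor=0.01):
    """Power spectral subtraction on one frame, reusing the noisy phase.

    alpha: over-subtraction factor.
    floor: spectral floor as a fraction of the noisy power, which keeps
           the clean power estimate non-negative.
    """
    noisy_frame = np.asarray(noisy_frame, dtype=float)
    spectrum = np.fft.rfft(noisy_frame)
    power = np.abs(spectrum) ** 2
    clean_power = np.maximum(power - alpha * noise_psd, floor * power)
    # Recombine the estimated magnitude with the unmodified noisy phase.
    clean_spectrum = np.sqrt(clean_power) * np.exp(1j * np.angle(spectrum))
    return np.fft.irfft(clean_spectrum, n=len(noisy_frame))
```

Subtracting in the complex domain, as the dissertation proposes, would replace the reused noisy phase with an estimated one; how that estimate is formed is specific to the dissertation and is not sketched here.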
Abstract:
The purpose of this study was to evaluate the comparative cost of treating alcohol dependence with either cognitive behavioral therapy (CBT) alone or CBT combined with naltrexone (CBT+naltrexone). Two hundred ninety-eight alcohol-dependent outpatients who were consecutively treated for alcohol dependence participated in this study. One hundred seven (36%) patients received adjunctive pharmacotherapy (CBT+naltrexone). The Drug Abuse Treatment Cost Analysis Program was used to estimate treatment costs. Adjunctive pharmacotherapy (CBT+naltrexone) introduced an additional treatment cost and was 54% more expensive than CBT alone. When treatment abstinence rates (36.1% CBT; 62.6% CBT+naltrexone) were applied to cost-effectiveness ratios, CBT+naltrexone demonstrated an advantage over CBT alone. There were no differences between groups on a preference-based health measure (SF-6D). In this treatment center, to achieve 100 abstainers over a 12-week program, 280 patients would require CBT alone, compared with 160 for CBT+naltrexone. The dominant choice was CBT+naltrexone, based on modest economic advantages and significant efficiencies in the numbers needed to treat.
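The numbers-needed figures and the cost comparison follow from simple arithmetic on the reported abstinence rates. The sketch below assumes the 54% premium applies per treated patient, which the abstract does not state explicitly; the function names are hypothetical.

```python
def patients_needed(target_abstainers, abstinence_rate):
    """Patients who must be treated to expect `target_abstainers` abstainers."""
    return target_abstainers / abstinence_rate


def relative_cost_per_abstainer(cost_ratio, rate_new, rate_ref):
    """Cost per abstainer of the new treatment relative to the reference treatment.

    cost_ratio: per-patient cost of the new treatment divided by that of the reference.
    """
    return (cost_ratio / rate_new) / (1.0 / rate_ref)


n_cbt = patients_needed(100, 0.361)    # CBT alone
n_combo = patients_needed(100, 0.626)  # CBT+naltrexone
ratio = relative_cost_per_abstainer(1.54, 0.626, 0.361)
```

Here 100 / 0.361 ≈ 277 and 100 / 0.626 ≈ 160, in line with the rounded 280 and 160 reported. Under the stated assumption, each abstainer under CBT+naltrexone costs roughly 0.89 times as much as under CBT alone.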
Abstract:
This naturalistic study investigated the mechanisms of change in measures of negative thinking and in 24-h urinary metabolites of noradrenaline (norepinephrine), dopamine and serotonin in a sample of 43 depressed hospital patients attending an eight-session group cognitive behavior therapy program. Most participants (91%) were taking antidepressant medication throughout the therapy period, as prescribed by their treating psychiatrists. The sample was divided into outcome categories (19 Responders and 24 Non-responders) on the basis of a clinically reliable change index [Jacobson, N.S., & Truax, P., 1991. Clinical significance: a statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12–19.] applied to Beck Depression Inventory scores at the end of therapy. Results of repeated-measures analysis of variance (ANOVA) indicated that all measures of negative thinking improved significantly during therapy, and significantly more so in the Responders, as expected. The treatment had a significant impact on urinary adrenaline and metadrenaline excretion; however, these changes occurred in both Responders and Non-responders. Acute treatment did not significantly influence the six other monoamine metabolites. In summary, changes in urinary monoamine levels during combined treatment for depression were not associated with self-reported changes in mood symptoms.
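The reliable change index cited above (Jacobson & Truax, 1991) is RC = (x₂ - x₁) / S_diff, with S_diff = √(2·SE²) and SE = s₁√(1 - r_xx), where s₁ is the pre-treatment standard deviation and r_xx the reliability of the measure; |RC| > 1.96 is conventionally taken as reliable change. The sketch below implements that formula. The standard deviation and reliability used in the example are hypothetical, not the values used in the study, and any additional clinical-significance criteria the study applied are not modelled.

```python
import math


def reliable_change_index(pre, post, sd_pre, reliability):
    """Jacobson & Truax (1991) reliable change index.

    sd_pre: standard deviation of pre-treatment scores.
    reliability: test-retest reliability r_xx of the measure.
    """
    se = sd_pre * math.sqrt(1.0 - reliability)  # standard error of measurement
    s_diff = math.sqrt(2.0 * se ** 2)           # standard error of the difference
    return (post - pre) / s_diff


def reliably_improved(pre, post, sd_pre, reliability, z=1.96):
    """For measures where lower is better (such as the BDI), improvement means RC < -z."""
    return reliable_change_index(pre, post, sd_pre, reliability) < -z


# Hypothetical example: BDI falls from 30 to 20 with sd_pre = 7.5 and r_xx = 0.86.
rc = reliable_change_index(30, 20, 7.5, 0.86)
```

With these illustrative values RC ≈ -2.52, so the drop would count as reliable improvement, whereas a drop from 30 to 27 would not.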
Abstract:
This paper summarises results from an evaluation of the adequacy and utility of the Australian Competency Standards for Entry-Level Occupational Therapists © (OT AUSTRALIA, 1994a). It comprised a two-part study, incorporating an online survey of key national stakeholders (n = 26), and 13 focus groups (n = 152) conducted throughout Australia with occupational therapy clinicians, academics, OT AUSTRALIA association and Occupational Therapy Registration Board representatives, as well as university program accreditors. The key recommendations were that: (i) urgent revision to reflect contemporary practice, paradigms, approaches and frameworks is required; (ii) the standards should exemplify basic competence at graduation (not within two years following); (iii) a revision cycle of five years is required; (iv) the Australian Qualifications Framework should be retained, preceded by an introduction describing the scope and nature of occupational therapy practice in the national context; (v) access to the standards should be free and unrestricted to occupational therapists, students and the public via the OT AUSTRALIA (national) website; (vi) the standards should incorporate a succinct executive summary and additional tools or templates formatted to enable occupational therapists to develop professional portfolios and create working documents specific to their workplace; and (vii) language must accommodate contextual variation while striking an appropriate balance between providing instruction and encouraging innovation in practice.
Abstract:
Acoustically, car cabins are extremely noisy, and as a consequence audio-only in-car voice recognition systems perform poorly. As the visual modality is immune to acoustic noise, using visual lip information from the driver is seen as a viable strategy for circumventing this problem through audio-visual automatic speech recognition (AVASR). However, implementing AVASR requires a system that can accurately locate and track the driver’s face and lip area in real time. In this paper we present such an approach using the Viola-Jones algorithm. Using the AVICAR [1] in-car database, we show that the Viola-Jones approach is a suitable method for locating and tracking the driver’s lips for an audio-visual speech recognition system, despite visual variability in illumination and head pose.
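The Viola-Jones detector owes its speed to two ideas: an integral image, which makes any rectangle sum cost four lookups, and Haar-like features built from such sums, which a boosted cascade of classifiers then evaluates. The sketch below shows only those first two pieces; AdaBoost training, the cascade itself, and the paper's face-to-lip localisation and tracking are not reproduced, and the function names are hypothetical. In practice a pretrained cascade loaded through OpenCV's `cv2.CascadeClassifier` is a common starting point.

```python
import numpy as np


def integral_image(img):
    """Zero-padded integral image: ii[y, x] == img[:y, :x].sum()."""
    img = np.asarray(img, dtype=np.int64)
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii, y, x, h, w):
    """Sum of the h-by-w rectangle with top-left corner (y, x), using four lookups."""
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])


def two_rect_feature(ii, y, x, h, w):
    """Two-rectangle Haar-like feature: upper half minus lower half (h must be even).

    Large magnitudes indicate a horizontal intensity edge inside the window.
    """
    if h % 2:
        raise ValueError("h must be even")
    half = h // 2
    return rect_sum(ii, y, x, half, w) - rect_sum(ii, y + half, x, half, w)
```

Because each feature costs a constant number of lookups regardless of its size, the detector can scan many window positions and scales per video frame, which is what makes real-time tracking plausible.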
Abstract:
Non-driving-related cognitive load and variations in emotional state may impact a driver’s capability to control a vehicle and introduce driving errors. The availability of reliable cognitive load and emotion detection in drivers would benefit the design of active safety systems and other intelligent in-vehicle interfaces. In this study, speech produced by 68 subjects while driving in urban areas is analyzed. A particular focus is on speech production differences between two secondary cognitive tasks, namely interactions with a co-driver and calls to automated spoken dialog systems (SDS), and between two emotional states during the SDS interactions, neutral and negative. A number of speech parameters are found to vary across the cognitive task and emotion classes. The suitability of selected cepstral- and production-based features for automatic cognitive task and emotion classification is investigated. A fusion of GMM and SVM classifiers yields an accuracy of 94.3% in cognitive task classification and 81.3% in emotion classification.
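The abstract does not say how the GMM and SVM outputs were fused. One common option, assumed here purely for illustration, is linear score-level fusion: normalise each classifier's scores using statistics from development data, then take a weighted sum. The weight, the decision threshold, and the function names are hypothetical.

```python
import numpy as np


def norm_stats(dev_scores):
    """Mean and standard deviation of a classifier's scores on a development set."""
    dev_scores = np.asarray(dev_scores, dtype=float)
    std = dev_scores.std()
    if std == 0.0:
        raise ValueError("development scores have zero variance")
    return dev_scores.mean(), std


def fuse_scores(gmm_scores, svm_scores, gmm_stats, svm_stats, weight=0.5):
    """Weighted sum of z-normalised GMM log-likelihood ratios and SVM decision values.

    The fused score is then thresholded (e.g., at 0, or at a value tuned on
    development data) to decide between the two classes.
    """
    g = (np.asarray(gmm_scores, dtype=float) - gmm_stats[0]) / gmm_stats[1]
    s = (np.asarray(svm_scores, dtype=float) - svm_stats[0]) / svm_stats[1]
    return weight * g + (1.0 - weight) * s
```

Normalising before summing keeps one classifier's larger numeric range from dominating; the weight would normally be tuned on held-out data rather than fixed at 0.5.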
Abstract:
Acoustically, car cabins are extremely noisy, and as a consequence existing audio-only speech recognition systems for voice-based control of vehicle functions, such as the GPS-based navigator, perform poorly. Audio-only speech recognition systems fail to make use of the visual modality of speech (e.g., lip movements). As the visual modality is immune to acoustic noise, utilising this visual information in conjunction with an audio-only speech recognition system has the potential to improve the accuracy of the system. The field of recognising speech using both auditory and visual inputs is known as Audio-Visual Automatic Speech Recognition (AVASR). Research in the AVASR field has been ongoing for the past twenty-five years, with notable progress being made. However, practical AVASR systems for use in a variety of real-world applications have not yet emerged. The main reason is that most research to date has neglected to address variability in the visual domain, such as illumination and viewpoint, in the design of the visual front-end of the AVASR system. In this paper we present an AVASR system in a real-world car environment using the AVICAR database [1], a publicly available in-car database, and we show that using visual speech in conjunction with the audio modality is a better approach to improving the robustness and effectiveness of voice-only recognition systems in car cabin environments.
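The abstract does not specify how the audio and visual streams are combined. A common baseline, assumed here for illustration, is feature-level (early) fusion: visual features, which arrive at the video frame rate, are interpolated to the audio frame rate and concatenated with the acoustic features. The 100 Hz audio and 30 Hz video rates and the function names are assumptions, not details of the AVICAR setup or of the paper's system.

```python
import numpy as np


def resample_features(feats, src_rate, dst_rate, n_dst):
    """Linearly interpolate each feature dimension onto a new frame rate.

    np.interp holds the first/last value for times outside the source range.
    """
    feats = np.asarray(feats, dtype=float)
    t_src = np.arange(feats.shape[0]) / src_rate
    t_dst = np.arange(n_dst) / dst_rate
    return np.column_stack(
        [np.interp(t_dst, t_src, feats[:, d]) for d in range(feats.shape[1])]
    )


def early_fusion(audio_feats, visual_feats, audio_rate=100.0, video_rate=30.0):
    """Concatenate audio features with visual features upsampled to the audio frame rate."""
    audio_feats = np.asarray(audio_feats, dtype=float)
    visual_up = resample_features(visual_feats, video_rate, audio_rate, audio_feats.shape[0])
    return np.hstack([audio_feats, visual_up])
```

Early fusion is the simplest option; decision-level fusion, which weights separate audio and visual recognisers according to noise conditions, is a frequently used alternative.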
Abstract:
Aims To identify self-care activities undertaken and determine relationships between self-efficacy, depression, quality of life, social support and adherence to compression therapy in a sample of patients with chronic venous insufficiency. Background Up to 70% of venous leg ulcers recur after healing. Compression hosiery is a primary strategy to prevent recurrence; however, problems with adherence to this strategy are well documented, and an improved understanding of how psychosocial factors influence patients with chronic venous insufficiency will help guide effective preventive strategies. Design Cross-sectional survey and retrospective medical record review. Method All patients previously diagnosed with a venous leg ulcer that had healed 12–36 months prior to the study were invited to participate. Data on health, psychosocial variables and self-care activities were obtained from a self-report survey, and data on medical and previous ulcer history were obtained from medical records. Multiple linear regression modelling was used to determine the independent influences of psychosocial factors on adherence to compression therapy. Results In a sample of 122 participants, the most frequently identified self-care activities were application of topical skin treatments, wearing compression hosiery and covering legs to prevent trauma. Compression hosiery was worn for a median of 4 days/week (range 0–7). After adjustment for all variables and potential confounders in a multivariable regression model, wearing compression hosiery was found to be significantly positively associated with participants’ knowledge of the cause of their condition (p=0.002), higher self-efficacy scores (p=0.026) and lower depression scores (p=0.009). Conclusion In this sample, depression, self-efficacy and knowledge were found to be significantly related to adherence to compression therapy.
Relevance to clinical practice These findings support the need to screen for and treat depression in this population. In addition, strategies to improve patient knowledge and self-efficacy may positively influence adherence to compression therapy.
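The multivariable analysis described above can be sketched as an ordinary least-squares fit with an intercept that returns coefficients and their t-statistics. This is a generic sketch on synthetic data: the predictors (standing in for, say, knowledge, self-efficacy and depression scores) and the outcome are hypothetical, confounder selection is not modelled, and p-values would additionally require the t distribution with the residual degrees of freedom (for example via `scipy.stats.t`).

```python
import numpy as np


def fit_ols(X, y):
    """OLS with an intercept. Returns (coefficients, t-statistics), intercept first."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    design = np.column_stack([np.ones(X.shape[0]), X])
    coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    dof = design.shape[0] - design.shape[1]
    if dof <= 0:
        raise ValueError("need more observations than coefficients")
    sigma2 = resid @ resid / dof  # residual variance estimate
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return coef, coef / np.sqrt(np.diag(cov))


# Synthetic example: outcome driven by the first two of three hypothetical predictors.
rng = np.random.default_rng(4)
X_demo = rng.standard_normal((100, 3))
y_demo = 4.0 + X_demo @ np.array([1.5, -0.8, 0.0]) + 0.5 * rng.standard_normal(100)
coef_demo, t_demo = fit_ols(X_demo, y_demo)
```

Each coefficient is the association of one predictor with the outcome holding the others fixed, which is the sense in which the abstract reports "independent influences".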
Abstract:
What really changed for Australian Aboriginal and Torres Strait Islander people between Paul Keating’s Redfern Park Speech (Keating 1992) and Kevin Rudd’s Apology to the stolen generations (Rudd 2008)? What will change between the Apology and the next speech of an Australian Prime Minister? The two speeches were intricately linked, and they were both personal and political. But do they really signify change at the political level? This paper reflects my attempt to turn the gaze away from Aboriginal and Torres Strait Islander people, and back to where the speeches originated: the Australian Labor Party (ALP). I question whether the changes foreshadowed in the two speeches – including changes by the Australian public and within Australian society – are evident in the internal mechanisms of the ALP. I also seek to understand why non-Indigenous women seem to have given in to the existing ways of the ALP instead of challenging the status quo which keeps Aboriginal and Torres Strait Islander peoples marginalised. I believe that, without a thorough examination and a change in the ALP’s practices, the domination and subjugation of Indigenous peoples will continue – within the Party, through the Australian political process and, therefore, through governments.