Biblioteca Digital

In an automotive environment, the performance of a speech recognition system is affected by environmental noise if the speech signal is acquired directly from a microphone. Speech enhancement techniques are therefore necessary to improve the speech recognition performance. In this paper, a field-programmable gate array (FPGA) implementation of dual-microphone delay-and-sum beamforming (DASB) for speech enhancement is presented. As the first step towards a cost-effective solution, the implementation described in this paper uses a relatively high-end FPGA device to facilitate the verification of various design strategies and parameters. Experimental results show that the proposed design can produce output waveforms close to those generated by a theoretical (floating-point) model with modest usage of FPGA resources. Speech recognition experiments are also conducted on enhanced in-car speech waveforms produced by the FPGA in order to compare recognition performance with the floating-point representation running on a PC.
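As a rough illustration of the delay-and-sum idea named in this abstract (not the paper's FPGA design), the sketch below aligns two microphone signals by an integer sample delay and averages them. The function name `delay_and_sum`, the integer-delay restriction, and the zero-padding at the record edges are all assumptions made for this example.

```python
def delay_and_sum(mic_a, mic_b, delay_b):
    """Average two microphone signals after delaying mic_b by delay_b samples.

    Assumption of this sketch: delay_b is a whole number of samples.
    Practical designs often need fractional delays.
    """
    n = min(len(mic_a), len(mic_b))
    out = []
    for i in range(n):
        j = i - delay_b
        # Samples that fall outside the recording are treated as silence.
        b = mic_b[j] if 0 <= j < n else 0.0
        out.append(0.5 * (mic_a[i] + b))
    return out


# Hypothetical usage: the source reaches mic_b two samples before mic_a.
speech = [0.0, 1.0, 2.0, 3.0, 4.0, 0.0, 0.0]
aligned = delay_and_sum([0.0, 0.0] + speech[:-2], speech, 2)
```

Once the two channels are aligned, the speech component adds coherently, while noise that is uncorrelated between the microphones is partly averaged out.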

Impact of cognitive load and frustration on drivers’ speech [Abstract]

Relevance: 20.00%

Abstract:

Secondary tasks such as cell phone calls or interaction with automated speech dialog systems (SDSs) increase the driver’s cognitive load as well as the probability of driving errors. This study analyzes speech production variations due to cognitive load and emotional state of drivers in real driving conditions. Speech samples were acquired from 24 female and 17 male subjects (approximately 8.5 h of data) while talking to a co-driver and communicating with two automated call centers, with emotional states (neutral, negative) and the number of necessary SDS query repetitions also labeled. A consistent shift in a number of speech production parameters (pitch, first formant center frequency, spectral center of gravity, spectral energy spread, and duration of voiced segments) was observed when comparing SDS interaction against co-driver interaction; further increases were observed when considering negative emotion segments and the number of requested SDS query repetitions. A mel frequency cepstral coefficient based Gaussian mixture classifier trained on 10 male and 10 female sessions provided 91% accuracy in the open test set task of distinguishing co-driver interactions from SDS interactions, suggesting, together with the acoustic analysis, that it is possible to monitor the level of driver distraction directly from their speech.

Simulated cataracts and their effect on speech intelligibility

Relevance: 20.00%

Abstract:

Purpose: The classic study of Sumby and Pollack (1954, JASA, 26(2), 212-215) demonstrated that visual information aided speech intelligibility under noisy auditory conditions. Their work showed that visual information is especially useful under low signal-to-noise conditions where the auditory signal leaves greater margins for improvement. We investigated whether simulated cataracts interfered with the ability of participants to use visual cues to help disambiguate the auditory signal in the presence of auditory noise. Methods: Participants in the study were screened to ensure normal visual acuity (mean of 20/20) and normal hearing (auditory threshold ≤ 20 dB HL). Speech intelligibility was tested under an auditory-only condition and two visual conditions: normal vision and simulated cataracts. The light scattering effects of cataracts were imitated using cataract-simulating filters. Participants wore blacked-out glasses in the auditory-only condition and lens-free frames in the normal auditory-visual condition. Individual sentences were spoken by a live speaker in the presence of prerecorded four-person background babble set to a speech-to-noise ratio (SNR) of -16 dB. The SNR was determined in a preliminary experiment to support 50% correct identification of sentences under the auditory-only condition. The speaker was trained to match the rate, intensity and inflections of a prerecorded audio track of everyday speech sentences. The speaker was blind to the visual conditions of the participant to control for bias. Participants’ speech intelligibility was measured by comparing the accuracy of their written account of what they believed the speaker to have said to the actual spoken sentence. Results: Relative to the normal vision condition, speech intelligibility was significantly poorer when participants wore simulated cataracts. Conclusions: The results suggest that cataracts may interfere with the acquisition of visual cues to speech perception.
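To make the -16 dB figure concrete: with SNR defined as 10·log10(P_speech / P_noise), a negative value means the babble carries more power than the speech. The sketch below computes the gain that would bring a noise recording to a target SNR against a speech recording. It is only an arithmetic illustration; the abstract does not describe how the study calibrated its levels, and the names `mean_power` and `noise_gain_for_snr` are hypothetical.

```python
import math


def mean_power(x):
    """Mean squared amplitude of a sampled signal."""
    return sum(v * v for v in x) / len(x)


def noise_gain_for_snr(speech, noise, snr_db):
    """Gain to apply to `noise` so the speech-to-noise power ratio is snr_db."""
    target_ratio = 10.0 ** (snr_db / 10.0)  # linear power ratio
    return math.sqrt(mean_power(speech) / (mean_power(noise) * target_ratio))


# Hypothetical signals, for illustration only.
speech = [1.0, -1.0, 1.0, -1.0]
babble = [0.5, -0.5, 0.5, -0.5]
gain = noise_gain_for_snr(speech, babble, -16.0)
scaled = [gain * v for v in babble]
measured_db = 10.0 * math.log10(mean_power(speech) / mean_power(scaled))
```

At -16 dB the noise power is 10^1.6 times the speech power, roughly forty times.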

Contrasting scenarios : embracing speech recognition

Relevance: 20.00%

Abstract:

The purpose of this chapter is to describe the use of caricatured contrasting scenarios (Bødker, 2000) and how they can be used to consider potential designs for disruptive technologies. The disruptive technology in this case is Automatic Speech Recognition (ASR) software in workplace settings. The particular workplace is the Magistrates Court of the Australian Capital Territory. Caricatured contrasting scenarios are ideally suited to exploring how ASR might be implemented in a particular setting because they allow potential implementations to be “sketched” quickly and with little effort. This sketching of potential interactions and the emphasis of both positive and negative outcomes allows the benefits and pitfalls of design decisions to become apparent. A brief description of the Court is given, describing the reasons for choosing the Court for this case study. The work of the Court is framed as taking place in two modes: front of house, where the courtroom itself is, and backstage, where documents are processed and the business of the court is recorded and encoded into various systems. Caricatured contrasting scenarios describing the introduction of ASR to the front of house are presented and then analysed. These scenarios show that the introduction of ASR to the court would be highly problematic. The final section describes how ASR could be re-imagined in order to make it useful for the court. A final scenario is presented that describes how this re-imagined ASR could be integrated into both the front of house and backstage of the court in a way that could strengthen both processes.

Thermal analysis and hot stage Raman spectroscopy of the basic copper arsenate mineral : euchroite

Relevance: 20.00%

Abstract:

The thermal analysis of euchroite shows two mass loss steps in the temperature ranges 100 to 105 °C and 185 to 205 °C. These mass loss steps are attributed to dehydration and dehydroxylation of the mineral. Hot stage Raman spectroscopy (HSRS) has been used to study the thermal stability of the mineral euchroite, a mineral involved in a complex set of equilibria between the copper hydroxy arsenates: euchroite Cu₂(AsO₄)(OH)·3H₂O → olivenite Cu₂(AsO₄)(OH) → strashimirite Cu₈(AsO₄)₄(OH)₄·5H₂O → arhbarite Cu₂Mg(AsO₄)(OH)₃. Hot stage Raman spectroscopy involves the collection of Raman spectra as a function of temperature. HSRS shows that the mineral euchroite decomposes between 125 and 175 °C with the loss of water. At 125 °C, Raman bands are observed at 858 cm⁻¹, assigned to the ν₁ AsO₄³⁻ symmetric stretching vibration, and at 801, 822 and 871 cm⁻¹, assigned to the ν₃ AsO₄³⁻ (A1) antisymmetric stretching vibration. A distinct band shift is observed upon heating to 275 °C. At 275 °C the four Raman bands are resolved at 762, 810, 837 and 862 cm⁻¹. Further heating results in the diminution of the intensity in the Raman spectra, and this is attributed to sublimation of the arsenate mineral. Hot stage Raman spectroscopy is a most useful technique for studying the thermal stability of minerals, especially when only very small amounts of mineral are available.

Thermogravimetric analysis and hot stage Raman spectroscopy of cubic indium hydroxide

Relevance: 20.00%

Abstract:

The transition of cubic indium hydroxide to cubic indium oxide has been studied by thermogravimetric analysis complemented with hot stage Raman spectroscopy. Thermal analysis shows the transition of In(OH)₃ to In₂O₃ occurs at 219 °C. The structure and morphology of In(OH)₃ synthesised using a soft chemical route at low temperatures were confirmed by X-ray diffraction and scanning electron microscopy. A topotactical relationship exists between the micro/nano-cubes of In(OH)₃ and In₂O₃. The Raman spectrum of In(OH)₃ is characterised by an intense sharp band at 309 cm⁻¹ attributed to the ν₁ In-O symmetric stretching mode, bands at 1137 and 1155 cm⁻¹ attributed to In-OH δ deformation modes, and bands at 3083, 3215, 3123 and 3262 cm⁻¹ assigned to the OH stretching vibrations. Upon thermal treatment of In(OH)₃, new Raman bands are observed at 125, 295, 488 and 615 cm⁻¹ attributed to In₂O₃. Changes in the structure of In(OH)₃ with thermal treatment are readily followed by hot stage Raman spectroscopy.

Robust speech recognition using speech enhancement

Relevance: 20.00%

Abstract:

Automatic Speech Recognition (ASR) has matured into a technology which is becoming more common in our everyday lives, and is emerging as a necessity to minimise driver distraction when operating in-car systems such as navigation and infotainment. In “noise-free” environments, word recognition performance of these systems has been shown to approach 100%; however, this performance degrades rapidly as the level of background noise is increased. Speech enhancement is a popular method for making ASR systems more robust. Single-channel spectral subtraction was originally designed to improve human speech intelligibility, and many attempts have been made to optimise this algorithm in terms of signal-based metrics such as maximised Signal-to-Noise Ratio (SNR) or minimised speech distortion. Such metrics are used to assess enhancement performance for intelligibility, not speech recognition, therefore making them sub-optimal for ASR applications. This research investigates two methods for closely coupling subtractive-type enhancement algorithms with ASR: (a) a computationally-efficient Mel-filterbank noise subtraction technique based on likelihood-maximisation (LIMA), and (b) introducing phase spectrum information to enable spectral subtraction in the complex frequency domain. Likelihood-maximisation uses gradient-descent to optimise parameters of the enhancement algorithm to best fit the acoustic speech model given a word sequence known a priori. Whilst this technique is shown to improve the ASR word accuracy performance, it is also identified to be particularly sensitive to non-noise mismatches between the training and testing data. Phase information has long been ignored in spectral subtraction as it is deemed to have little effect on human intelligibility. In this work it is shown that phase information is important in obtaining highly accurate estimates of clean speech magnitudes which are typically used in ASR feature extraction. Phase Estimation via Delay Projection is proposed based on the stationarity of sinusoidal signals, and demonstrates the potential to produce improvements in ASR word accuracy across a wide range of SNRs. Throughout the dissertation, consideration is given to practical implementation in vehicular environments, which resulted in two novel contributions: a LIMA framework which takes advantage of the grounding procedure common to speech dialogue systems, and a resource-saving formulation of frequency-domain spectral subtraction for realisation in field-programmable gate array hardware. The techniques proposed in this dissertation were evaluated using the Australian English In-Car Speech Corpus, which was collected as part of this work. This database is the first of its kind within Australia and captures real in-car speech of 50 native Australian speakers in seven driving conditions common to Australian environments.
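For readers unfamiliar with the baseline this abstract builds on, the sketch below shows conventional magnitude-only spectral subtraction on a single frame: a noise magnitude estimate is subtracted from each frequency bin and the noisy phase is reused unchanged. This is the textbook variant, not the LIMA or phase-estimation methods proposed in the dissertation. The function names, the spectral `floor` parameter, and the naive O(n²) DFT are assumptions chosen to keep the example self-contained.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse DFT, returning only the real part of each sample."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def spectral_subtract(noisy_frame, noise_mag, floor=0.01):
    """Subtract a per-bin noise magnitude estimate, keeping the noisy phase.

    `floor` (an assumption of this sketch) keeps a small fraction of the
    noisy magnitude so that no bin goes negative.
    """
    cleaned = []
    for y, n_mag in zip(dft(noisy_frame), noise_mag):
        mag = max(abs(y) - n_mag, floor * abs(y))
        cleaned.append(cmath.rect(mag, cmath.phase(y)))  # reuse noisy phase
    return idft(cleaned)


# Hypothetical frame and a flat noise estimate, for illustration only.
frame = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
enhanced = spectral_subtract(frame, [0.5] * len(frame))
```

Reusing the noisy phase is exactly the simplification the abstract questions: only the magnitudes are corrected, so any error in the phase carries through to the enhanced frame.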

Delirium in early-stage Alzheimer's disease : enhancing cognitive reserve as a possible preventive measure

Relevance: 20.00%

Abstract:

Delirium is a disorder of acute onset with fluctuating symptoms and is characterized by inattention, disorganized thinking, and altered levels of consciousness. The risk for delirium is greatest in individuals with dementia, and the incidence of both is increasing worldwide because of the aging of our population. Although several clinical trials have tested interventions for delirium prevention in individuals without dementia, little is known about the mechanisms for the prevention of delirium in early-stage Alzheimer’s disease (AD). The purpose of this article is to explore ways of preventing delirium and slowing the rate of cognitive decline in early-stage AD by enhancing cognitive reserve. An agenda for future research on interventions to prevent delirium in individuals with early-stage AD is also presented.

Effectiveness of community-based nonpharmacological interventions for early-stage dementia : conclusions and recommendations

Relevance: 20.00%

Abstract:

In 2007, a comprehensive review of the extant research on nonpharmacological interventions for persons with early-stage dementia was conducted. More than 150 research reports, centered on six major domains, were included: early-stage support groups, cognitive training and enhancement programs, exercise programs, exemplar programs, health promotion programs, and “other” programs not fitting into previous categories. Theories of neural regeneration and plasticity were most often used to support the tested interventions. Recommendations for practice, research, and health policy are outlined, including evidence-based, nonpharmacological treatment protocols for persons with mild cognitive impairment and early-stage dementia. A tested, community-based, multimodal treatment program is also described. Overall, findings identify well-supported nonpharmacological treatments for persons with early-stage dementia and implications for a national health care agenda to optimize outcomes for this growing population of older adults.

Tango Femme : placing the lesbian centre stage

Relevance: 20.00%

Abstract:

The play Tango Femme places the lesbian centre stage by creating characters, narrative and drama in the world of same-sex dancing. The accompanying exegesis examines the problems and issues associated with creating lesbian characters in theatre, using a synthesized, practice led methodology. During the process of imagining, constructing and writing my case study play, I have investigated lesbian theatre productions and companies in order to make sense of my personal experiences in the theatre world. I have also reflected on the lesbian as represented in mainstream theatre and popular culture. Through journal writing and contemplation, I have sought to identify difficulties inherent in writing this type of play, using my own journey as a focus. My study illuminates the historical and sociological circumstances in the eighties and nineties in Australia and concludes that as a lesbian playwright I was caught between a rock and a hard place: the rock being lesbian theatre on a community level, as defined and attended primarily by separatist lesbians, and the hard place being mainstream theatre, located within the dominant, heteronormative discourse. The play Tango Femme has developed in conversation with my reflective practice and research and is written in the space outside the master narrative as "an instance of lesbian discourse" (Davy 1996, p.153).

Lip detection for audio-visual speech recognition in-car environment

Relevance: 20.00%

Abstract:

Acoustically, car cabins are extremely noisy and, as a consequence, audio-only in-car voice recognition systems perform poorly. As the visual modality is immune to acoustic noise, using the visual lip information from the driver through audio-visual automatic speech recognition (AVASR) is seen as a viable strategy for circumventing this problem. However, implementing AVASR requires a system that can accurately locate and track the driver’s face and lip area in real time. In this paper we present such an approach using the Viola-Jones algorithm. Using the AVICAR [1] in-car database, we show that the Viola-Jones approach is a suitable method of locating and tracking the driver’s lips for an audio-visual speech recognition system, despite the visual variability of illumination and head pose.
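The Viola-Jones detector mentioned here evaluates rectangular Haar-like features, and it does so quickly by precomputing an integral image so that any box sum costs four lookups. The sketch below shows that building block only, not a trained cascade and not the paper's lip tracker. The function names and the tiny test image are hypothetical.

```python
def integral_image(img):
    """Return ii where ii[r][c] is the sum of img rows < r and columns < c."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]  # zero-padded border
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle, using four integral-image lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Haar-like edge feature: upper half of the window minus lower half."""
    half = height // 2
    upper = box_sum(ii, top, left, half, width)
    lower = box_sum(ii, top + half, left, half, width)
    return upper - lower


# Hypothetical 4x3 grayscale patch, for illustration only.
patch = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9],
         [1, 1, 1]]
ii = integral_image(patch)
edge_response = two_rect_feature(ii, 0, 0, 4, 3)
```

A full detector combines many such features in a boosted cascade; the constant-time box sum is what makes real-time tracking plausible.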

997 results for stage speech