109 results for speech databases
Abstract:
This paper investigates the use of lip information, in conjunction with speech information, for robust speaker verification in the presence of background noise. It has been previously shown in our own work, and in the work of others, that features extracted from a speaker's moving lips hold speaker dependencies which are complementary to speech features. We demonstrate that the fusion of lip and speech information allows for a highly robust speaker verification system which outperforms either sub-system alone. We present a new technique for determining the weighting to be applied to each modality so as to optimize the performance of the fused system. Given a correct weighting, lip information is shown to be highly effective for reducing the false acceptance and false rejection error rates in the presence of background noise.
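The abstract does not describe the weighting technique itself. As a rough illustration of what a per-modality weighting means, the sketch below uses plain linear late fusion of two already-normalised scores. The function names, the `speech_weight` parameter and the decision threshold are hypothetical and are not taken from the paper.

```python
def fuse_scores(speech_score, lip_score, speech_weight):
    """Linear late fusion of two normalised verification scores.

    speech_weight is the (assumed) weight given to the speech modality;
    the lip modality receives the remainder, 1 - speech_weight.
    """
    if not 0.0 <= speech_weight <= 1.0:
        raise ValueError("speech_weight must lie in [0, 1]")
    return speech_weight * speech_score + (1.0 - speech_weight) * lip_score


def accept_claim(speech_score, lip_score, speech_weight, threshold):
    """Accept the identity claim when the fused score reaches the threshold."""
    return fuse_scores(speech_score, lip_score, speech_weight) >= threshold
```

In noisy audio one would expect a smaller `speech_weight` to be appropriate, so that the lip score dominates the decision. How that weight is chosen is the contribution of the paper and is not reproduced here.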
Abstract:
Investigates the use of temporal lip information, in conjunction with speech information, for robust, text-dependent speaker identification. We propose that significant speaker-dependent information can be obtained from moving lips, enabling speaker recognition systems to be highly robust in the presence of noise. The fusion structure for the audio and visual information is based around the use of multi-stream hidden Markov models (MSHMM), with audio and visual features forming two independent data streams. Recent work with multi-modal MSHMMs has been performed successfully for the task of speech recognition. The use of temporal lip information for speaker identification has been performed previously (T.J. Wark et al., 1998); however, this has been restricted to output fusion via single-stream HMMs. We present an extension to this previous work, and show that an MSHMM is a valid structure for multi-modal speaker identification.
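For readers unfamiliar with multi-stream HMMs, a common formulation combines the per-stream state emission likelihoods as a weighted sum in the log domain. The sketch below shows that combination for one state with two independent streams. The diagonal-Gaussian emission model, the `state` dictionary layout and the `audio_weight` parameter are assumptions made for illustration and are not details given in the abstract.

```python
import math


def diag_gaussian_logpdf(obs, mean, var):
    """Log-density of a diagonal-covariance Gaussian at obs."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(obs, mean, var)
    )


def multistream_log_emission(audio_obs, visual_obs, state, audio_weight):
    """Stream-weighted log emission score for a single HMM state.

    The two streams are treated as independent, and the visual stream
    receives the weight 1 - audio_weight (an assumed convention).
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    audio_ll = diag_gaussian_logpdf(
        audio_obs, state["audio_mean"], state["audio_var"]
    )
    visual_ll = diag_gaussian_logpdf(
        visual_obs, state["visual_mean"], state["visual_var"]
    )
    return audio_weight * audio_ll + (1.0 - audio_weight) * visual_ll
```

This differs from output fusion, where each modality is scored by its own single-stream HMM and only the final scores are combined.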
Abstract:
Investigates the use of lip information, in conjunction with speech information, for robust speaker verification in the presence of background noise. We have previously shown (Int. Conf. on Acoustics, Speech and Signal Proc., vol. 6, pp. 3693-3696, May 1998) that features extracted from a speaker's moving lips hold speaker dependencies which are complementary to speech features. We demonstrate that the fusion of lip and speech information allows for a highly robust speaker verification system which outperforms either subsystem individually. We present a new technique for determining the weighting to be applied to each modality so as to optimize the performance of the fused system. Given a correct weighting, lip information is shown to be highly effective for reducing the false acceptance and false rejection error rates in the presence of background noise.
Abstract:
The use of visual features in the form of lip movements to improve the performance of acoustic speech recognition has been shown to work well, particularly in noisy acoustic conditions. However, it is not known whether this technique can outperform speech recognition incorporating well-known acoustic enhancement techniques, such as spectral subtraction or multi-channel beamforming. This is an important question to answer, especially in an automotive environment, for the design of an efficient human-vehicle computer interface. We perform a variety of speech recognition experiments on a challenging automotive speech dataset, and the results show that synchronous HMM-based audio-visual fusion can outperform traditional single-channel as well as multi-channel acoustic speech enhancement techniques. We also show that further improvement in recognition performance can be obtained by fusing speech-enhanced audio with the visual modality, demonstrating the complementary nature of the two robust speech recognition approaches.
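As background on one of the acoustic baselines named above, spectral subtraction in its simplest magnitude-domain form subtracts a noise estimate from each frequency bin of a frame and floors the result. The sketch below operates on a single frame of precomputed magnitudes. The `over_subtraction` and `floor` parameters and their defaults are illustrative assumptions, not the settings used in the paper.

```python
def spectral_subtract(noisy_mag, noise_mag, over_subtraction=1.0, floor=0.01):
    """Magnitude-domain spectral subtraction for one frame.

    noisy_mag: magnitudes of the noisy frame, one value per frequency bin.
    noise_mag: estimated noise magnitudes for the same bins.
    Each bin is floored at floor * noisy magnitude so that no bin
    becomes negative after subtraction.
    """
    if len(noisy_mag) != len(noise_mag):
        raise ValueError("spectra must have the same number of bins")
    cleaned = []
    for noisy, noise in zip(noisy_mag, noise_mag):
        residual = noisy - over_subtraction * noise
        cleaned.append(max(residual, floor * noisy))
    return cleaned
```

A full enhancer would also need framing, a transform to and from the frequency domain, and a way to estimate the noise spectrum, all of which are omitted here.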
Abstract:
A database will be protected under Australian law if it is a literary work; expressed in material form; meets the originality test; and has a relevant connection with Australia. Facts and data in themselves are not protected by copyright. However, a collection of data, a dataset, or a database may be protected by copyright if it is sufficiently original. Whether a work is sufficiently original to be protected by copyright depends on whether it has been produced by the application of independent intellectual effort by the author/s, which may involve the exercise of skill, judgement, or creativity in selecting, presenting, or arranging the information. This summary synthesises recent cases regarding originality in factual compilations.
Abstract:
Audio-visual speech recognition, or the combination of visual lip-reading with traditional acoustic speech recognition, has been previously shown to provide a considerable improvement over acoustic-only approaches in noisy environments, such as that present in an automotive cabin. The research presented in this paper extends the established audio-visual speech recognition literature to show that further improvements in speech recognition accuracy can be obtained when multiple frontal or near-frontal views of a speaker's face are available. A series of visual speech recognition experiments using a four-stream visual synchronous hidden Markov model (SHMM) are conducted on the four-camera AVICAR automotive audio-visual speech database. We study the relative contribution of the side and central orientated cameras in improving visual speech recognition accuracy. Finally, combining the four visual streams with a single audio stream in a five-stream SHMM demonstrates a relative improvement of over 56% in word recognition accuracy when compared to the acoustic-only approach in the noisiest conditions of the AVICAR database.
Abstract:
Limited research is available on how well visual cues integrate with auditory cues to improve speech intelligibility in persons with visual impairments, such as cataracts. We investigated whether simulated cataracts interfered with participants' ability to use visual cues to help disambiguate a spoken message in the presence of spoken background noise. We tested 21 young adults with normal visual acuity and hearing sensitivity. Speech intelligibility was tested under three conditions: auditory only with no visual input, auditory-visual with normal viewing, and auditory-visual with simulated cataracts. Central Institute for the Deaf (CID) Everyday Speech Sentences were spoken by a live talker, mimicking a pre-recorded audio track, in the presence of pre-recorded four-person background babble at a signal-to-noise ratio (SNR) of -13 dB. The talker was masked to the experimental conditions to control for experimenter bias. Relative to the normal vision condition, speech intelligibility was significantly poorer in the simulated cataract condition (t(20) = 4.17, p < .01, Cohen's d = 1.0). These results suggest that cataracts can interfere with speech perception, which may occur through a reduction in visual cues, less effective integration, or a combination of the two effects. These novel findings contribute to our understanding of the association between two common sensory problems in adults: reduced contrast sensitivity associated with cataracts and reduced face-to-face communication in noise.
Abstract:
Background Emergency department (ED) crowding caused by access block is an increasing public health issue and has been associated with impaired healthcare delivery, negative patient outcomes and increased staff workload. Aim To investigate the impact of opening a new ED on patient and healthcare service outcomes. Methods A 24-month time series analysis was employed using deterministically linked data from the ambulance service and three ED and hospital admission databases in Queensland, Australia. Results Total volume of ED presentations increased 18%, while local population growth increased by 3%. Healthcare service and patient outcomes at the two pre-existing hospitals did not improve. These outcomes included ambulance offload time: (Hospital A PRE: 10 min, POST: 10 min, P < 0.001; Hospital B PRE: 10 min, POST: 15 min, P < 0.001); ED length of stay: (Hospital A PRE: 242 min, POST: 246 min, P < 0.001; Hospital B PRE: 182 min, POST: 210 min, P < 0.001); and access block: (Hospital A PRE: 41%, POST: 46%, P < 0.001; Hospital B PRE: 23%, POST: 40%, P < 0.001). Time series modelling indicated that the effect was worst at the hospital furthest away from the new ED. Conclusions An additional ED within the region saw an increase in the total volume of presentations at a rate far greater than local population growth, suggesting it either provided an unmet need or a shifting of activity from one sector to another. Future studies should examine patient decision making regarding reasons for presenting to a new or pre-existing ED. There is an inherent need to take a ‘whole of health service area’ approach to solve crowding issues.
Abstract:
Sound tagging has been studied for years. Among all sound types, music, speech, and environmental sound are the three most active research areas. This survey aims to provide an overview of state-of-the-art developments in these areas. We begin by discussing what tagging means in each sound area. Some examples of sound tagging applications are introduced to illustrate the significance of this research. Typical tagging techniques include manual, automatic, and semi-automatic approaches. After reviewing work in music, speech, and environmental sound tagging, we compare them and state the research progress to date. Research gaps are identified for each research area, and the common features and differences between the three areas are also identified. Published datasets, tools used by researchers, and evaluation measures frequently applied in the analysis are listed. Finally, we summarise the worldwide distribution of countries active in sound tagging research over the years.
Abstract:
Balcony acoustic treatments can mitigate the effects of community road traffic noise. To investigate further, a theoretical study into the effects of balcony acoustic treatment combinations on speech interference and transmission is conducted for various street geometries. Nine different balcony types are investigated using a combined specular and diffuse reflection computer model. Diffusion in the model is calculated using the radiosity technique. The balcony types include a standard balcony with or without a ceiling and with various combinations of parapet, ceiling absorption and ceiling shield. A total of 70 balcony and street geometrical configurations are analyzed with each balcony type, resulting in 630 scenarios. In each scenario the reverberation time, speech interference level (SIL) and speech transmission index (STI) are calculated. These indicators are compared to determine trends based on the effects of propagation path, inclusion of opposite buildings and difference with a reference position outside the balcony. The results demonstrate trends in SIL and STI with different balcony types. It is found that an acoustically treated balcony reduces speech interference. A parapet provides the largest improvement, followed by absorption on the ceiling. The largest reductions in speech interference arise when a combination of balcony acoustic treatments is applied.
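The abstract does not say which SIL variant the study uses. A widely used definition takes the arithmetic mean of the octave-band sound pressure levels centred on 500, 1000, 2000 and 4000 Hz, and the sketch below assumes that four-band form. The function name and the example levels in the usage note are hypothetical.

```python
# Octave-band centre frequencies used by the four-band SIL definition
# (assumed here; the study may use a different variant).
SIL_BANDS_HZ = (500, 1000, 2000, 4000)


def speech_interference_level(band_levels_db):
    """Arithmetic mean of the octave-band levels (dB) in the SIL bands.

    band_levels_db maps octave-band centre frequency in Hz to the sound
    pressure level in dB. Bands outside SIL_BANDS_HZ are ignored.
    """
    missing = [f for f in SIL_BANDS_HZ if f not in band_levels_db]
    if missing:
        raise ValueError(f"missing octave bands: {missing}")
    return sum(band_levels_db[f] for f in SIL_BANDS_HZ) / len(SIL_BANDS_HZ)
```

For example, made-up band levels of 60, 62, 58 and 52 dB in the four bands give an SIL of 58 dB. Comparing this value with and without a treatment would quantify the reduction in speech interference for one scenario.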
Abstract:
Background Aphasia is an acquired language disorder that can present a significant barrier to patient involvement in healthcare decisions. Speech-language pathologists (SLPs) are viewed as experts in the field of communication. However, many SLP students do not receive practical training in techniques to communicate with people with aphasia (PWA) until they encounter PWA during clinical education placements. Methods This study investigated the confidence and knowledge of SLP students in communicating with PWA prior to clinical placements using a customised questionnaire. Confidence in communicating with PWA was assessed using a 100-point visual analogue scale. Linear and logistic regressions were used to examine the association between confidence and age, and between confidence and course type (graduate-entry masters or undergraduate), respectively. Knowledge of strategies to assist communication with PWA was examined by asking respondents to list specific strategies that could assist communication with PWA. Results SLP students were not confident with the prospect of communicating with PWA, reporting a median of 29 points (inter-quartile range 17–47) on the visual analogue confidence scale. Only four (8.2%) respondents rated their confidence greater than 55 (out of 100). Regression analyses indicated no relationship between confidence and students' age (p = 0.31, r-squared = 0.02), or between confidence and course type (p = 0.22, pseudo r-squared = 0.03). Students displayed limited knowledge about communication strategies. Thematic analysis of strategies revealed four overarching themes: Physical, Verbal Communication, Visual Information and Environmental Changes. While most students identified potential use of resources (such as images and written information), fewer students identified strategies to alter their verbal communication (such as reduced speech rate). Conclusions SLP students who had received aphasia-related theoretical coursework, but had not commenced clinical placements with PWA, were not confident in their ability to communicate with PWA. Students may benefit from an educational intervention or curriculum modification to incorporate practical training in effective strategies to communicate with PWA, before they encounter PWA in clinical settings. Ensuring students have confidence and knowledge of potential communication strategies to assist communication with PWA may allow them to focus their learning experiences in more specific clinical domains, such as clinical reasoning, rather than building foundation interpersonal communication skills.
Abstract:
Background Recent initiatives within an Australian public healthcare service have seen a focus on increasing the research capacity of its workforce. One of the key initiatives involves encouraging clinicians to be research generators rather than solely research consumers. As a result, baseline data of current research capacity are essential to determine whether initiatives encouraging clinicians to undertake research have been effective. Speech pathologists have previously been shown to be interested in conducting research within their clinical role; therefore they are well positioned to benefit from such initiatives. The present study examined the current research interest, confidence and experience of speech language pathologists (SLPs) in a public healthcare workforce, as well as factors that predicted clinician research engagement. Methods Data were collected via an online survey emailed to an estimated 330 SLPs working within Queensland, Australia. The survey consisted of 30 questions relating to current levels of interest, confidence and experience performing specific research tasks, as well as how frequently SLPs had performed these tasks in the last 5 years. Results Although 158 SLPs responded to the survey, complete data were available for only 137. Respondents were more confident and experienced with basic research tasks (e.g., finding literature) and less confident and experienced with complex research tasks (e.g., analysing and interpreting results, publishing results). For most tasks, SLPs displayed higher levels of interest in the task than confidence and experience. Research engagement was predicted by highest qualification obtained, current job classification level and overall interest in research. Conclusions Respondents generally reported levels of interest in research higher than their confidence and experience, with many respondents reporting limited experience in most research tasks. Therefore SLPs have the potential to benefit from research capacity building activities to increase their research skills in order to meet organisational research engagement objectives. However, these findings must be interpreted with the caveats that the response rate was relatively low and that participants were recruited from a single state-wide health service, and therefore may not be representative of the wider SLP workforce.
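The "relatively low response rate" can be approximated from the counts reported in the abstract. The sketch below does that arithmetic. Because the 330 invitees are themselves described as an estimate, the resulting percentages are approximate, and the function name is hypothetical.

```python
def response_rate(responses, invited):
    """Response rate as a percentage of the people invited."""
    if invited <= 0:
        raise ValueError("invited must be positive")
    return 100.0 * responses / invited


# Counts taken from the abstract; 330 is the authors' estimate of invitees.
any_response = response_rate(158, 330)       # roughly 47.9 %
complete_response = response_rate(137, 330)  # roughly 41.5 %
```

On these figures, just under half of the invited SLPs responded, and about two in five provided complete data.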
Abstract:
Postgraduate candidates in the creative arts encounter unique challenges when writing an exegesis (the written document that accompanies creative work as a thesis). As practitioner-researchers, they must adopt a dual perspective, looking out towards an established field of research, exemplars and theories, as well as inwards towards their experiential creative processes and practice. This dual orientation provides clear benefits, for it enables them to situate the research within its field and make objective claims for the research methodologies and outcomes while maintaining an intimate, voiced relationship with the practice. However, a dual orientation introduces considerable complexities in the writing. It requires a reconciliation of multi-perspectival subject positions: the disinterested academic posture of the observer/ethnographer/analyst/theorist at times; and the invested, subjective stance of the practitioner/producer at others. It requires the author to negotiate a range of writing styles and speech genres, from the formal, polemical style of the theorist to the personal, questioning and emotive voice of reflexivity. Moreover, these multi-variant orientations, subject positions, styles and voices must be integrated into a unified and coherent text. In this chapter I offer a conceptual framework and strategies for approaching this relatively new genre of thesis. I begin by summarizing the characteristics of what has begun to emerge as the predominant model of exegesis (the dual-oriented ‘Connective’ exegesis). Framing it against theoretical and philosophical understandings of polyvocality and matrixicality, I go on to point to recent textual models that provide precedents for connecting differently oriented perspectives, subjectivities and voices. I then turn to emergent archives of practice-led research to explain how the challenge of writing a ‘Connective’ exegesis has so far been resolved by higher degree research (HDR) candidates. Exemplars illustrate a range of strategies they have used to compose a multi-perspectival text, reconcile the divergent subject positions of the practitioner researcher, and harmonize the speech genres of a polyvocal text.
Abstract:
Residential balcony design influences speech interference levels caused by road traffic noise and a simplified design methodology is needed for optimising balcony acoustic treatments. This research comprehensively assesses speech interference levels and benefits of nine different balcony designs situated in urban street canyons through the use of a combined direct, specular reflection and diffuse reflection path theoretical model. This thesis outlines the theory, analysis and results that lead up to the presentation of a practical design guide which can be used to predict the acoustic effects of balcony geometry and acoustic treatments in streets with variable geometry and acoustic characteristics.