999 resultados para Poetic Speech


Relevância:

20.00% 20.00%

Publicador:

Resumo:

The performance of automatic speech recognition systems deteriorates in the presence of noise. One known solution is to incorporate video information with an existing acoustic speech recognition system. We investigate the performance of the individual acoustic and visual sub-systems and then examine different ways in which the integration of the two systems may be performed. The system is to be implemented in real time on a Texas Instruments' TMS320C80 DSP.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper investigates the use of lip information, in conjunction with speech information, for robust speaker verification in the presence of background noise. It has been previously shown in our own work, and in the work of others, that features extracted from a speaker's moving lips hold speaker dependencies which are complementary with speech features. We demonstrate that the fusion of lip and speech information allows for a highly robust speaker verification system which outperforms the performance of either sub-system. We present a new technique for determining the weighting to be applied to each modality so as to optimize the performance of the fused system. Given a correct weighting, lip information is shown to be highly effective for reducing the false acceptance and false rejection error rates in the presence of background noise

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Investigates the use of temporal lip information, in conjunction with speech information, for robust, text-dependent speaker identification. We propose that significant speaker-dependent information can be obtained from moving lips, enabling speaker recognition systems to be highly robust in the presence of noise. The fusion structure for the audio and visual information is based around the use of multi-stream hidden Markov models (MSHMM), with audio and visual features forming two independent data streams. Recent work with multi-modal MSHMMs has been performed successfully for the task of speech recognition. The use of temporal lip information for speaker identification has been performed previously (T.J. Wark et al., 1998), however this has been restricted to output fusion via single-stream HMMs. We present an extension to this previous work, and show that a MSHMM is a valid structure for multi-modal speaker identification

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Investigates the use of lip information, in conjunction with speech information, for robust speaker verification in the presence of background noise. We have previously shown (Int. Conf. on Acoustics, Speech and Signal Proc., vol. 6, pp. 3693-3696, May 1998) that features extracted from a speaker's moving lips hold speaker dependencies which are complementary with speech features. We demonstrate that the fusion of lip and speech information allows for a highly robust speaker verification system which outperforms either subsystem individually. We present a new technique for determining the weighting to be applied to each modality so as to optimize the performance of the fused system. Given a correct weighting, lip information is shown to be highly effective for reducing the false acceptance and false rejection error rates in the presence of background noise

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The use of visual features in the form of lip movements to improve the performance of acoustic speech recognition has been shown to work well, particularly in noisy acoustic conditions. However, whether this technique can outperform speech recognition incorporating well-known acoustic enhancement techniques, such as spectral subtraction, or multi-channel beamforming is not known. This is an important question to be answered especially in an automotive environment, for the design of an efficient human-vehicle computer interface. We perform a variety of speech recognition experiments on a challenging automotive speech dataset and results show that synchronous HMM-based audio-visual fusion can outperform traditional single as well as multi-channel acoustic speech enhancement techniques. We also show that further improvement in recognition performance can be obtained by fusing speech-enhanced audio with the visual modality, demonstrating the complementary nature of the two robust speech recognition approaches.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Audio-visualspeechrecognition, or the combination of visual lip-reading with traditional acoustic speechrecognition, has been previously shown to provide a considerable improvement over acoustic-only approaches in noisy environments, such as that present in an automotive cabin. The research presented in this paper will extend upon the established audio-visualspeechrecognition literature to show that further improvements in speechrecognition accuracy can be obtained when multiple frontal or near-frontal views of a speaker's face are available. A series of visualspeechrecognition experiments using a four-stream visual synchronous hidden Markov model (SHMM) are conducted on the four-camera AVICAR automotiveaudio-visualspeech database. We study the relative contribution between the side and central orientated cameras in improving visualspeechrecognition accuracy. Finally combination of the four visual streams with a single audio stream in a five-stream SHMM demonstrates a relative improvement of over 56% in word recognition accuracy when compared to the acoustic-only approach in the noisiest conditions of the AVICAR database.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Limited research is available on how well visual cues integrate with auditory cues to improve speech intelligibility in persons with visual impairments, such as cataracts. We investigated whether simulated cataracts interfered with participants’ ability to use visual cues to help disambiguate a spoken message in the presence of spoken background noise. We tested 21 young adults with normal visual acuity and hearing sensitivity. Speech intelligibility was tested under three conditions: auditory only with no visual input, auditory-visual with normal viewing, and auditory-visual with simulated cataracts. Central Institute for the Deaf (CID) Everyday Speech Sentences were spoken by a live talker, mimicking a pre-recorded audio track, in the presence of pre-recorded four-person background babble at a signal-to-noise ratio (SNR) of -13 dB. The talker was masked to the experimental conditions to control for experimenter bias. Relative to the normal vision condition, speech intelligibility was significantly poorer, [t (20) = 4.17, p < .01, Cohen’s d =1.0], in the simulated cataract condition. These results suggest that cataracts can interfere with speech perception, which may occur through a reduction in visual cues, less effective integration or a combination of the two effects. These novel findings contribute to our understanding of the association between two common sensory problems in adults: reduced contrast sensitivity associated with cataracts and reduced face-to-face communication in noise.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Sound tagging has been studied for years. Among all sound types, music, speech, and environmental sound are three hottest research areas. This survey aims to provide an overview about the state-of-the-art development in these areas.We discuss about the meaning of tagging in different sound areas at the beginning of the journey. Some examples of sound tagging applications are introduced in order to illustrate the significance of this research. Typical tagging techniques include manual, automatic, and semi-automatic approaches.After reviewing work in music, speech and environmental sound tagging, we compare them and state the research progress to date. Research gaps are identified for each research area and the common features and discriminations between three areas are discovered as well. Published datasets, tools used by researchers, and evaluation measures frequently applied in the analysis are listed. In the end, we summarise the worldwide distribution of countries dedicated to sound tagging research for years.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Abstraction in its resistance to evident meaning has the capacity to interrupt or at least provide tools with which to question an overly compliant reception of the information to which we are subject. It does so by highlighting a latency or potentiality inherent in materiality that points to the possibility of a critical resistance to this ceaseless flow of sound/image/data. This resistance has been remarked on in differing ways by a number of commentators such as Lyotard, in his exploration of the avant-garde and the sublime for example. This joint paper will initially map the collaborative project by Daniel Mafe and Andrew Brown, Affecting Interference which conjoins painting with digital sound and animations into a single, large scale, immersive exhibition/installation. The work acts as an interstitial point between contrasting approaches to abstraction: the visual and aural, the digital and analogue. The paper will then explore the ramifications of this through the examination of abstraction as ‘noise’, that is as that raw inassimilable materiality, within which lays the creative possibility to forge and embrace the as-yet-unthought and almost-forgotten. It does so by establishing a space for a more poetic and slower paced critical engagement for the viewing and receiving information or data. This slowing of perception through the suspension of easy recognition runs counter to our current ‘high performance’ culture, and it’s requisite demand for speedy assimilation of content, representing instead the poetic encounter with a potentiality or latency inherent in the nameless particularity of that which is.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Balcony acoustic treatments can mitigate the effects of community road traffic noise. To further investigate, a theoretical study into the effects of balcony acoustic treatment combinations on speech interference and transmission is conducted for various street geometries. Nine different balcony types are investigated using a combined specular and diffuse reflection computer model. Diffusion in the model is calculated using the radiosity technique. The balcony types include a standard balcony with or without a ceiling and with various combinations of parapet, ceiling absorption and ceiling shield. A total of 70 balcony and street geometrical configurations are analyzed with each balcony type, resulting in 630 scenarios. In each scenario the reverberation time, speech interference level (SIL) and speech transmission index (STI) are calculated. These indicators are compared to determine trends based on the effects of propagation path, inclusion of opposite buildings and difference with a reference position outside the balcony. The results demonstrate trends in SIL and STI with different balcony types. It is found that an acoustically treated balcony reduces speech interference. A parapet provides the largest improvement, followed by absorption on the ceiling. The largest reductions in speech interference arise when a combination of balcony acoustic treatments are applied.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background Aphasia is an acquired language disorder that can present a significant barrier to patient involvement in healthcare decisions. Speech-language pathologists (SLPs) are viewed as experts in the field of communication. However, many SLP students do not receive practical training in techniques to communicate with people with aphasia (PWA) until they encounter PWA during clinical education placements. Methods This study investigated the confidence and knowledge of SLP students in communicating with PWA prior to clinical placements using a customised questionnaire. Confidence in communicating with people with aphasia was assessed using a 100-point visual analogue scale. Linear, and logistic, regressions were used to examine the association between confidence and age, as well as confidence and course type (graduate-entry masters or undergraduate), respectively. Knowledge of strategies to assist communication with PWA was examined by asking respondents to list specific strategies that could assist communication with PWA. Results SLP students were not confident with the prospect of communicating with PWA; reporting a median 29-points (inter-quartile range 17–47) on the visual analogue confidence scale. Only, four (8.2%) of respondents rated their confidence greater than 55 (out of 100). Regression analyses indicated no relationship existed between confidence and students‘ age (p = 0.31, r-squared = 0.02), or confidence and course type (p = 0.22, pseudo r-squared = 0.03). Students displayed limited knowledge about communication strategies. Thematic analysis of strategies revealed four overarching themes; Physical, Verbal Communication, Visual Information and Environmental Changes. While most students identified potential use of resources (such as images and written information), fewer students identified strategies to alter their verbal communication (such as reduced speech rate). Conclusions SLP students who had received aphasia related theoretical coursework, but not commenced clinical placements with PWA, were not confident in their ability to communicate with PWA. Students may benefit from an educational intervention or curriculum modification to incorporate practical training in effective strategies to communicate with PWA, before they encounter PWA in clinical settings. Ensuring students have confidence and knowledge of potential communication strategies to assist communication with PWA may allow them to focus their learning experiences in more specific clinical domains, such as clinical reasoning, rather than building foundation interpersonal communication skills.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background Recent initiatives within an Australia public healthcare service have seen a focus on increasing the research capacity of their workforce. One of the key initiatives involves encouraging clinicians to be research generators rather than solely research consumers. As a result, baseline data of current research capacity are essential to determine whether initiatives encouraging clinicians to undertake research have been effective. Speech pathologists have previously been shown to be interested in conducting research within their clinical role; therefore they are well positioned to benefit from such initiatives. The present study examined the current research interest, confidence and experience of speech language pathologists (SLPs) in a public healthcare workforce, as well as factors that predicted clinician research engagement. Methods Data were collected via an online survey emailed to an estimated 330 SLPs working within Queensland, Australia. The survey consisted of 30 questions relating to current levels of interest, confidence and experience performing specific research tasks, as well as how frequently SLPs had performed these tasks in the last 5 years. Results Although 158 SLPs responded to the survey, complete data were available for only 137. Respondents were more confident and experienced with basic research tasks (e.g., finding literature) and less confident and experienced with complex research tasks (e.g., analysing and interpreting results, publishing results). For most tasks, SLPs displayed higher levels of interest in the task than confidence and experience. Research engagement was predicted by highest qualification obtained, current job classification level and overall interest in research. Conclusions Respondents generally reported levels of interest in research higher than their confidence and experience, with many respondents reporting limited experience in most research tasks. Therefore SLPs have potential to benefit from research capacity building activities to increase their research skills in order to meet organisational research engagement objectives. However, these findings must be interpreted with the caveats that a relatively low response rate occurred and participants were recruited from a single state-wide health service, and therefore may not be representative of the wider SLP workforce.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Postgraduate candidates in the creative arts encounter unique challenges when writing an exegesis (the written document that accompanies creative work as a thesis). As a practitioner-researcher, they must adopt a dual perspective–looking out towards an established field of research, exemplars and theories, as well as inwards towards their experiential creative processes and practice. This dual orientation provides clear benefits, for it enables them to situate the research within its field and make objective claims for the research methodologies and outcomes while maintaining an intimate, voiced relationship with the practice. However, a dual orientation introduces considerable complexities in the writing. It requires a reconciliation of multi-perspectival subject positions: the disinterested academic posture of the observer/ethnographer/analyst/theorist at times; and the invested, subjective stance the practitioner/producer at others. It requires the author to negotiate a range of writing styles and speech genres–from the formal, polemical style of the theorist to the personal, questioning and emotive voice of reflexivity. Moreover, these multi-variant orientations, subject positions, styles and voices must be integrated into a unified and coherent text. In this chapter I offer a conceptual framework and strategies for approaching this relatively new genre of thesis. I begin by summarizing the characteristics of what has begun to emerge as the predominant model of exegesis (the dual-oriented ‘Connective’ exegesis). Framing it against theoretical and philosophical understandings of polyvocality and matrixicality, I go on to point to recent textual models that provide precedents for connecting differently oriented perspectives, subjectivities and voices. I then turn to emergent archives of practice-led research to explain how the challenge of writing a ‘Connective’ exegesis has so far been resolved by higher degree research (HDR) candidates. Exemplars illustrate a range of strategies they have used to compose a multi-perspectival text, reconcile the divergent subject positions of the practitioner researcher, and harmonize the speech genres of a ployvocal text.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Residential balcony design influences speech interference levels caused by road traffic noise and a simplified design methodology is needed for optimising balcony acoustic treatments. This research comprehensively assesses speech interference levels and benefits of nine different balcony designs situated in urban street canyons through the use of a combined direct, specular reflection and diffuse reflection path theoretical model. This thesis outlines the theory, analysis and results that lead up to the presentation of a practical design guide which can be used to predict the acoustic effects of balcony geometry and acoustic treatments in streets with variable geometry and acoustic characteristics.