39 resultados para Speech, Intelligibility of

em Queensland University of Technology - ePrints Archive


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Limited research is available on how well visual cues integrate with auditory cues to improve speech intelligibility in persons with visual impairments, such as cataracts. We investigated whether simulated cataracts interfered with participants’ ability to use visual cues to help disambiguate a spoken message in the presence of spoken background noise. We tested 21 young adults with normal visual acuity and hearing sensitivity. Speech intelligibility was tested under three conditions: auditory only with no visual input, auditory-visual with normal viewing, and auditory-visual with simulated cataracts. Central Institute for the Deaf (CID) Everyday Speech Sentences were spoken by a live talker, mimicking a pre-recorded audio track, in the presence of pre-recorded four-person background babble at a signal-to-noise ratio (SNR) of -13 dB. The talker was masked to the experimental conditions to control for experimenter bias. Relative to the normal vision condition, speech intelligibility was significantly poorer, [t (20) = 4.17, p < .01, Cohen’s d =1.0], in the simulated cataract condition. These results suggest that cataracts can interfere with speech perception, which may occur through a reduction in visual cues, less effective integration or a combination of the two effects. These novel findings contribute to our understanding of the association between two common sensory problems in adults: reduced contrast sensitivity associated with cataracts and reduced face-to-face communication in noise.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Purpose: The classic study of Sumby and Pollack (1954, JASA, 26(2), 212-215) demonstrated that visual information aided speech intelligibility under noisy auditory conditions. Their work showed that visual information is especially useful under low signal-to-noise conditions where the auditory signal leaves greater margins for improvement. We investigated whether simulated cataracts interfered with the ability of participants to use visual cues to help disambiguate the auditory signal in the presence of auditory noise. Methods: Participants in the study were screened to ensure normal visual acuity (mean of 20/20) and normal hearing (auditory threshold ≤ 20 dB HL). Speech intelligibility was tested under an auditory only condition and two visual conditions: normal vision and simulated cataracts. The light scattering effects of cataracts were imitated using cataract-simulating filters. Participants wore blacked-out glasses in the auditory only condition and lens-free frames in the normal auditory-visual condition. Individual sentences were spoken by a live speaker in the presence of prerecorded four-person background babble set to a speech-to-noise ratio (SNR) of -16 dB. The SNR was determined in a preliminary experiment to support 50% correct identification of sentence under the auditory only conditions. The speaker was trained to match the rate, intensity and inflections of a prerecorded audio track of everyday speech sentences. The speaker was blind to the visual conditions of the participant to control for bias.Participants’ speech intelligibility was measured by comparing the accuracy of their written account of what they believed the speaker to have said to the actual spoken sentence. Results: Relative to the normal vision condition, speech intelligibility was significantly poorer when participants wore simulated catarcts. Conclusions: The results suggest that cataracts may interfere with the acquisition of visual cues to speech perception.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Intelligible and accurate risk-based decision-making requires a complex balance of information from different sources, appropriate statistical analysis of this information and consequent intelligent inference and decisions made on the basis of these analyses. Importantly, this requires an explicit acknowledgement of uncertainty in the inputs and outputs of the statistical model. The aim of this paper is to progress a discussion of these issues in the context of several motivating problems related to the wider scope of agricultural production. These problems include biosecurity surveillance design, pest incursion, environmental monitoring and import risk assessment. The information to be integrated includes observational and experimental data, remotely sensed data and expert information. We describe our efforts in addressing these problems using Bayesian models and Bayesian networks. These approaches provide a coherent and transparent framework for modelling complex systems, combining the different information sources, and allowing for uncertainty in inputs and outputs. While the theory underlying Bayesian modelling has a long and well established history, its application is only now becoming more possible for complex problems, due to increased availability of methodological and computational tools. Of course, there are still hurdles and constraints, which we also address through sharing our endeavours and experiences.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Postgraduate candidates in the creative arts encounter unique challenges when writing an exegesis (the written document that accompanies creative work as a thesis). As a practitioner-researcher, they must adopt a dual perspective–looking out towards an established field of research, exemplars and theories, as well as inwards towards their experiential creative processes and practice. This dual orientation provides clear benefits, for it enables them to situate the research within its field and make objective claims for the research methodologies and outcomes while maintaining an intimate, voiced relationship with the practice. However, a dual orientation introduces considerable complexities in the writing. It requires a reconciliation of multi-perspectival subject positions: the disinterested academic posture of the observer/ethnographer/analyst/theorist at times; and the invested, subjective stance the practitioner/producer at others. It requires the author to negotiate a range of writing styles and speech genres–from the formal, polemical style of the theorist to the personal, questioning and emotive voice of reflexivity. Moreover, these multi-variant orientations, subject positions, styles and voices must be integrated into a unified and coherent text. In this chapter I offer a conceptual framework and strategies for approaching this relatively new genre of thesis. I begin by summarizing the characteristics of what has begun to emerge as the predominant model of exegesis (the dual-oriented ‘Connective’ exegesis). Framing it against theoretical and philosophical understandings of polyvocality and matrixicality, I go on to point to recent textual models that provide precedents for connecting differently oriented perspectives, subjectivities and voices. I then turn to emergent archives of practice-led research to explain how the challenge of writing a ‘Connective’ exegesis has so far been resolved by higher degree research (HDR) candidates. Exemplars illustrate a range of strategies they have used to compose a multi-perspectival text, reconcile the divergent subject positions of the practitioner researcher, and harmonize the speech genres of a ployvocal text.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis presents an original approach to parametric speech coding at rates below 1 kbitsjsec, primarily for speech storage applications. Essential processes considered in this research encompass efficient characterization of evolutionary configuration of vocal tract to follow phonemic features with high fidelity, representation of speech excitation using minimal parameters with minor degradation in naturalness of synthesized speech, and finally, quantization of resulting parameters at the nominated rates. For encoding speech spectral features, a new method relying on Temporal Decomposition (TD) is developed which efficiently compresses spectral information through interpolation between most steady points over time trajectories of spectral parameters using a new basis function. The compression ratio provided by the method is independent of the updating rate of the feature vectors, hence allows high resolution in tracking significant temporal variations of speech formants with no effect on the spectral data rate. Accordingly, regardless of the quantization technique employed, the method yields a high compression ratio without sacrificing speech intelligibility. Several new techniques for improving performance of the interpolation of spectral parameters through phonetically-based analysis are proposed and implemented in this research, comprising event approximated TD, near-optimal shaping event approximating functions, efficient speech parametrization for TD on the basis of an extensive investigation originally reported in this thesis, and a hierarchical error minimization algorithm for decomposition of feature parameters which significantly reduces the complexity of the interpolation process. Speech excitation in this work is characterized based on a novel Multi-Band Excitation paradigm which accurately determines the harmonic structure in the LPC (linear predictive coding) residual spectra, within individual bands, using the concept 11 of Instantaneous Frequency (IF) estimation in frequency domain. The model yields aneffective two-band approximation to excitation and computes pitch and voicing with high accuracy as well. New methods for interpolative coding of pitch and gain contours are also developed in this thesis. For pitch, relying on the correlation between phonetic evolution and pitch variations during voiced speech segments, TD is employed to interpolate the pitch contour between critical points introduced by event centroids. This compresses pitch contour in the ratio of about 1/10 with negligible error. To approximate gain contour, a set of uniformly-distributed Gaussian event-like functions is used which reduces the amount of gain information to about 1/6 with acceptable accuracy. The thesis also addresses a new quantization method applied to spectral features on the basis of statistical properties and spectral sensitivity of spectral parameters extracted from TD-based analysis. The experimental results show that good quality speech, comparable to that of conventional coders at rates over 2 kbits/sec, can be achieved at rates 650-990 bits/sec.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Automatic Speech Recognition (ASR) has matured into a technology which is becoming more common in our everyday lives, and is emerging as a necessity to minimise driver distraction when operating in-car systems such as navigation and infotainment. In “noise-free” environments, word recognition performance of these systems has been shown to approach 100%, however this performance degrades rapidly as the level of background noise is increased. Speech enhancement is a popular method for making ASR systems more ro- bust. Single-channel spectral subtraction was originally designed to improve hu- man speech intelligibility and many attempts have been made to optimise this algorithm in terms of signal-based metrics such as maximised Signal-to-Noise Ratio (SNR) or minimised speech distortion. Such metrics are used to assess en- hancement performance for intelligibility not speech recognition, therefore mak- ing them sub-optimal ASR applications. This research investigates two methods for closely coupling subtractive-type enhancement algorithms with ASR: (a) a computationally-efficient Mel-filterbank noise subtraction technique based on likelihood-maximisation (LIMA), and (b) in- troducing phase spectrum information to enable spectral subtraction in the com- plex frequency domain. Likelihood-maximisation uses gradient-descent to optimise parameters of the enhancement algorithm to best fit the acoustic speech model given a word se- quence known a priori. Whilst this technique is shown to improve the ASR word accuracy performance, it is also identified to be particularly sensitive to non-noise mismatches between the training and testing data. Phase information has long been ignored in spectral subtraction as it is deemed to have little effect on human intelligibility. In this work it is shown that phase information is important in obtaining highly accurate estimates of clean speech magnitudes which are typically used in ASR feature extraction. Phase Estimation via Delay Projection is proposed based on the stationarity of sinusoidal signals, and demonstrates the potential to produce improvements in ASR word accuracy in a wide range of SNR. Throughout the dissertation, consideration is given to practical implemen- tation in vehicular environments which resulted in two novel contributions – a LIMA framework which takes advantage of the grounding procedure common to speech dialogue systems, and a resource-saving formulation of frequency-domain spectral subtraction for realisation in field-programmable gate array hardware. The techniques proposed in this dissertation were evaluated using the Aus- tralian English In-Car Speech Corpus which was collected as part of this work. This database is the first of its kind within Australia and captures real in-car speech of 50 native Australian speakers in seven driving conditions common to Australian environments.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The internet by its very nature challenges an individual’s notions of propriety, moral acuity and social correctness. A tension will always exist between the censorship of obscene and sensitive information and the freedom to publish and/or access such information. Freedom of expression and communication on the internet is not a static concept: ‘Its continual regeneration is the product of particular combinations of political, legal, cultural and philosophical conditions’.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Poem published in Free Speech segment of NTEU Newletter.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Twitter and other social networking sites play an ever more present role in the spread of current events. The dynamics of information dissemination through digital network structures are still relatively unexplored, however. At what time an issue is taken up by whom? Who forwards a message when to whom else? What role do individual communication participants, existing digital communities or the technical foundations of each network platform play in the spread of news? In this chapter we discuss, using the example of a video on a current sociopolitical issue in Australia that was shared on Twitter, a number of new methods for the dynamic visualisation and analysis of communication processes. Our method combines temporal and spatial analytical approaches and provides new insights into the spread of news in digital networks. [Social media dienen immer häufger als Disseminationsmechanismen für Medieninhalte. Auf Twitter ermöglicht besonders die Retweet-Funktion den schnellen und weitläufgen Transfer von Nachrichten. In diesem Beitrag etablieren neue methodische Ansätze zur Erfassung, Visualisierung und Analyse von Retweet-Ketten. Insbesondere heben wir hervor, wie bestehende Netzwerkanalysemethoden ergänzt werden können, um den Ablauf der Weiterleitung sowohl temporal als auch spatial zu erfassen. Unsere Fallstudie demonstriert die verbreitung des videoclips einer am 9. Oktober 2012 spontan gehaltenen Wutrede der australischen Premierministerin Julia Gillard, in der sie Oppositionsführer Tony Abbott als Frauenhasser brandmarkte. Durch die Erfassung von Hintergrunddaten zu den jeweiligen NutzerInnen, die sich an der Weiterleitung des Videoclips beteiligten, erstellen wir ein detailliertes Bild des Disseminationsablaufs im vorliegenden Fall. So lassen sich die wichtigsten AkteurInnen und der Ablauf der Weiterleitung darstellen und analysieren. Daraus entstehen Einblicke in die allgemeinen verbreitungsmuster von Nachrichten auf Twitter].

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This article applies a Wittgensteinian approach to the examination of the intelligibility of religious belief, in the wake of the recent attack on the Judeo-Christian religion by Richard Dawkins's book The God Delusion. The article attempts to show that Dawkins has confused religion with superstition, and that while Dawkins's arguments are decisive in the case of superstition, they do not successfully show religion to be a delusion. Religious belief in God is not like belief in the existence of a planet, and genuine religious faith is not like the belief in something for which there is not yet enough evidence, like belief in dark matter. The Christian doctrines of the resurrection and eternal life are misconstrued if they are understood as factual claims because they are then merely shallow superstitions, and not the great religious riddles they are meant to be.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Teacher professional standards have become a key policy mechanism for the reform of teaching and education in recent years. While standards policies claim to improve the quality of teaching and learning in schools today, this paper argues that a disjunction exists between the stated intentions of such programmes and the intelligibility of the practices of government in which they are invested. To this effect, the paper conducts an analytics of government of the recently released National Professional Standards for Teachers (Australian Institute for Teaching and School Leadership, 2011) arguing that the explicit, calculated rationality of the programme exists within a wider field of effects. Such analysis has the critical consequence of calling into question the claims of the programmers themselves thus breaching the self-evidence on which the standards rest.