16 results for intelligibility
at Queensland University of Technology - ePrints Archive
Abstract:
Purpose: The classic study of Sumby and Pollack (1954, JASA, 26(2), 212-215) demonstrated that visual information aided speech intelligibility under noisy auditory conditions. Their work showed that visual information is especially useful under low signal-to-noise conditions, where the auditory signal leaves greater margins for improvement. We investigated whether simulated cataracts interfered with the ability of participants to use visual cues to help disambiguate the auditory signal in the presence of auditory noise. Methods: Participants in the study were screened to ensure normal visual acuity (mean of 20/20) and normal hearing (auditory threshold ≤ 20 dB HL). Speech intelligibility was tested under an auditory-only condition and two visual conditions: normal vision and simulated cataracts. The light-scattering effects of cataracts were imitated using cataract-simulating filters. Participants wore blacked-out glasses in the auditory-only condition and lens-free frames in the normal auditory-visual condition. Individual sentences were spoken by a live speaker in the presence of prerecorded four-person background babble set to a speech-to-noise ratio (SNR) of -16 dB. The SNR was determined in a preliminary experiment to support 50% correct identification of sentences under the auditory-only condition. The speaker was trained to match the rate, intensity and inflections of a prerecorded audio track of everyday speech sentences. The speaker was blind to the visual conditions of the participant to control for bias. Participants’ speech intelligibility was measured by comparing the accuracy of their written account of what they believed the speaker to have said with the actual spoken sentence. Results: Relative to the normal vision condition, speech intelligibility was significantly poorer when participants wore simulated cataracts. Conclusions: The results suggest that cataracts may interfere with the acquisition of visual cues to speech perception.
Abstract:
This thesis presents an original approach to parametric speech coding at rates below 1 kbit/sec, primarily for speech storage applications. The essential processes considered in this research encompass efficient characterization of the evolving configuration of the vocal tract to follow phonemic features with high fidelity, representation of speech excitation using minimal parameters with minor degradation in the naturalness of synthesized speech, and finally, quantization of the resulting parameters at the nominated rates. For encoding speech spectral features, a new method relying on Temporal Decomposition (TD) is developed which efficiently compresses spectral information through interpolation between the most steady points along the time trajectories of spectral parameters, using a new basis function. The compression ratio provided by the method is independent of the updating rate of the feature vectors, and hence allows high resolution in tracking significant temporal variations of speech formants with no effect on the spectral data rate. Accordingly, regardless of the quantization technique employed, the method yields a high compression ratio without sacrificing speech intelligibility. Several new techniques for improving the performance of the interpolation of spectral parameters through phonetically-based analysis are proposed and implemented in this research, comprising event-approximated TD, near-optimal shaping of event-approximating functions, efficient speech parametrization for TD on the basis of an extensive investigation originally reported in this thesis, and a hierarchical error minimization algorithm for decomposition of feature parameters which significantly reduces the complexity of the interpolation process.
Speech excitation in this work is characterized based on a novel Multi-Band Excitation paradigm which accurately determines the harmonic structure in the LPC (linear predictive coding) residual spectra, within individual bands, using the concept of Instantaneous Frequency (IF) estimation in the frequency domain. The model yields an effective two-band approximation to excitation and computes pitch and voicing with high accuracy as well. New methods for interpolative coding of pitch and gain contours are also developed in this thesis. For pitch, relying on the correlation between phonetic evolution and pitch variations during voiced speech segments, TD is employed to interpolate the pitch contour between critical points introduced by event centroids. This compresses the pitch contour by a ratio of about 1/10 with negligible error. To approximate the gain contour, a set of uniformly-distributed Gaussian event-like functions is used which reduces the amount of gain information to about 1/6 with acceptable accuracy. The thesis also addresses a new quantization method applied to spectral features on the basis of the statistical properties and spectral sensitivity of the parameters extracted from TD-based analysis. The experimental results show that good quality speech, comparable to that of conventional coders operating at rates over 2 kbits/sec, can be achieved at rates of 650-990 bits/sec.
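The core TD idea above, reconstructing a full parameter trajectory by interpolating target vectors between event centroids, can be sketched as follows. This is a minimal illustration under stated assumptions: Gaussian event shapes (which the thesis uses only for the gain contour, deriving near-optimal shapes for spectral events) and toy target vectors; `td_reconstruct` and its parameters are hypothetical names, not the thesis's actual algorithm.

```python
import numpy as np

def gaussian_events(n_frames, centroids, width):
    """Event-approximating functions: one Gaussian per event centroid.

    Gaussian shapes are an illustrative assumption; the thesis derives
    near-optimal event shapes rather than fixing them in advance.
    """
    t = np.arange(n_frames)[:, None]          # (n_frames, 1)
    c = np.asarray(centroids)[None, :]        # (1, n_events)
    phi = np.exp(-0.5 * ((t - c) / width) ** 2)
    return phi / phi.sum(axis=1, keepdims=True)  # normalise across events

def td_reconstruct(targets, centroids, n_frames, width=4.0):
    """Interpolate spectral parameter targets between event centroids.

    targets: (n_events, n_params) spectral vectors at the steady points.
    Returns the full (n_frames, n_params) trajectory as a weighted sum
    of event functions -- only the targets and centroids need storing.
    """
    phi = gaussian_events(n_frames, centroids, width)  # (n_frames, n_events)
    return phi @ np.asarray(targets)

# Toy example: 3 events over 30 frames, 2-dimensional parameters.
traj = td_reconstruct(targets=[[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]],
                      centroids=[2, 15, 27], n_frames=30)
```

At each event centroid the trajectory passes close to its target vector, and between centroids it blends smoothly; the compression comes from transmitting only the event targets and centroid positions instead of every frame.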
Abstract:
Intelligible and accurate risk-based decision-making requires a complex balance of information from different sources, appropriate statistical analysis of this information and consequent intelligent inference and decisions made on the basis of these analyses. Importantly, this requires an explicit acknowledgement of uncertainty in the inputs and outputs of the statistical model. The aim of this paper is to progress a discussion of these issues in the context of several motivating problems related to the wider scope of agricultural production. These problems include biosecurity surveillance design, pest incursion, environmental monitoring and import risk assessment. The information to be integrated includes observational and experimental data, remotely sensed data and expert information. We describe our efforts in addressing these problems using Bayesian models and Bayesian networks. These approaches provide a coherent and transparent framework for modelling complex systems, combining the different information sources, and allowing for uncertainty in inputs and outputs. While the theory underlying Bayesian modelling has a long and well established history, its application is only now becoming more possible for complex problems, due to increased availability of methodological and computational tools. Of course, there are still hurdles and constraints, which we also address through sharing our endeavours and experiences.
Abstract:
Limited research is available on how well visual cues integrate with auditory cues to improve speech intelligibility in persons with visual impairments, such as cataracts. We investigated whether simulated cataracts interfered with participants’ ability to use visual cues to help disambiguate a spoken message in the presence of spoken background noise. We tested 21 young adults with normal visual acuity and hearing sensitivity. Speech intelligibility was tested under three conditions: auditory only with no visual input, auditory-visual with normal viewing, and auditory-visual with simulated cataracts. Central Institute for the Deaf (CID) Everyday Speech Sentences were spoken by a live talker, mimicking a pre-recorded audio track, in the presence of pre-recorded four-person background babble at a signal-to-noise ratio (SNR) of -13 dB. The talker was masked to the experimental conditions to control for experimenter bias. Relative to the normal vision condition, speech intelligibility was significantly poorer, [t(20) = 4.17, p < .01, Cohen’s d = 1.0], in the simulated cataract condition. These results suggest that cataracts can interfere with speech perception, which may occur through a reduction in visual cues, less effective integration or a combination of the two effects. These novel findings contribute to our understanding of the association between two common sensory problems in adults: reduced contrast sensitivity associated with cataracts and reduced face-to-face communication in noise.
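The reported statistics follow the standard repeated-measures form: a paired t on within-subject difference scores, with Cohen's d as mean difference over the SD of differences. A minimal sketch, using made-up scores for five listeners purely for illustration (the study itself tested 21 participants on CID sentences):

```python
import numpy as np

def paired_t_and_d(normal, cataract):
    """Paired t statistic and Cohen's d for within-subject differences.

    d here is mean difference / SD of differences, one common convention
    for repeated-measures designs; df = n - 1.
    """
    diff = np.asarray(normal, float) - np.asarray(cataract, float)
    n = diff.size
    sd = diff.std(ddof=1)                  # sample SD of the differences
    t = diff.mean() / (sd / np.sqrt(n))    # paired t statistic
    d = diff.mean() / sd                   # effect size
    return t, d

# Illustrative (fabricated) % intelligibility scores per listener.
normal   = [72, 65, 68, 70, 75]
cataract = [60, 58, 66, 61, 57]
t, d = paired_t_and_d(normal, cataract)
```

With 21 participants, a t(20) of 4.17 comfortably exceeds the two-tailed .01 criterion, which is what licenses the "significantly poorer" claim above.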
Abstract:
The category of the `at-risk youth' currently underpins a good deal of youth policy, and in particular, education policy. Primarily, the category is centred around a range of programmes associated with the need for state intervention, intervention which largely occurs `at a distance' within domains such as the school and the family. While it is argued that in some ways, the `at-risk youth' simply replaces older characterisations used in the policing of the young, it will also be argued that the preventative policies associated with `risk' are constituted in terms of factors rather than individuals; that prevention is no longer primarily based upon personal expertise, but rather upon the gathering and collation of statistical knowledge which identifies `risks' within given populations; and that `risk' permits a greater number of young people to be brought into the field of regulatory strategies. Importantly, the category of the `at-risk youth' underpins crucial sections of policy documents such as the Finn Report (into credentialling/education and vocational competency). In this case, youth is deemed to be `at-risk' of not making the transition to adulthood successfully. It will be argued that not only is the Finn Report significant in the administrative and cultural shaping of the category of `youth', but also by employing the notion of `risk', the Report puts in place yet another element of an effective network of governmental intelligibility covering the young. Finally, it will be argued that young women, as a specific example of a `risk' group (vis-a-vis obtaining certain types of employment), require particular forms of intervention, primarily through changing the vocational aspirations of their parents.
Abstract:
The category of the `at-risk' youth currently underpins a good deal of youth policy. Primarily, it centres around a range of programs associated with the need for state intervention. The `at-risk' youth tenuously appears at the intersection of a variety of knowledges/problematisations, such as vocational guidance, youth welfare, family management, and so on. Whilst it is argued that in some ways, the `at-risk' youth simply replaces older characterisations used in the policing of the young, it will also be argued that the preventative policies associated with `risk' are constituted in terms of factors rather than individuals, that prevention is no longer primarily based upon personal expertise, but rather upon the gathering and collation of statistical knowledge which identifies `risks' within given populations, and that `risk' legitimates unlimited governmental intervention. Importantly, the category of the `at-risk' youth underpins crucial sections of policy documents such as the Finn Report (into credentialling/education and vocational competency). In this case, youth is deemed to be `at-risk' of not making the transition to adulthood successfully. It will be argued that not only is the Finn Report significant in the administrative and cultural shaping of the category of `youth', but also by employing the notion of `risk', the Report puts in place yet another element of an effective network of governmental intelligibility covering the young. Finally, it will be argued that young women, as a specific example of a `risk' group (vis-a-vis obtaining certain types of employment), require particular forms of intervention, primarily through changing the vocational aspirations of their parents.
Abstract:
Automatic Speech Recognition (ASR) has matured into a technology which is becoming more common in our everyday lives, and is emerging as a necessity to minimise driver distraction when operating in-car systems such as navigation and infotainment. In “noise-free” environments, the word recognition performance of these systems has been shown to approach 100%; however, this performance degrades rapidly as the level of background noise is increased. Speech enhancement is a popular method for making ASR systems more robust. Single-channel spectral subtraction was originally designed to improve human speech intelligibility, and many attempts have been made to optimise this algorithm in terms of signal-based metrics such as maximised Signal-to-Noise Ratio (SNR) or minimised speech distortion. Such metrics are used to assess enhancement performance for intelligibility, not speech recognition, therefore making them sub-optimal for ASR applications. This research investigates two methods for closely coupling subtractive-type enhancement algorithms with ASR: (a) a computationally-efficient Mel-filterbank noise subtraction technique based on likelihood-maximisation (LIMA), and (b) introducing phase spectrum information to enable spectral subtraction in the complex frequency domain. Likelihood-maximisation uses gradient descent to optimise parameters of the enhancement algorithm to best fit the acoustic speech model given a word sequence known a priori. Whilst this technique is shown to improve ASR word accuracy performance, it is also identified to be particularly sensitive to non-noise mismatches between the training and testing data. Phase information has long been ignored in spectral subtraction as it is deemed to have little effect on human intelligibility. In this work it is shown that phase information is important in obtaining highly accurate estimates of the clean speech magnitudes which are typically used in ASR feature extraction.
Phase Estimation via Delay Projection is proposed based on the stationarity of sinusoidal signals, and demonstrates the potential to produce improvements in ASR word accuracy across a wide range of SNRs. Throughout the dissertation, consideration is given to practical implementation in vehicular environments, which resulted in two novel contributions: a LIMA framework which takes advantage of the grounding procedure common to speech dialogue systems, and a resource-saving formulation of frequency-domain spectral subtraction for realisation in field-programmable gate array hardware. The techniques proposed in this dissertation were evaluated using the Australian English In-Car Speech Corpus, which was collected as part of this work. This database is the first of its kind within Australia and captures real in-car speech of 50 native Australian speakers in seven driving conditions common to Australian environments.
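The baseline that both contributions build on, single-channel magnitude spectral subtraction, can be sketched in a few lines. This is the textbook form with an over-subtraction factor and a spectral floor, not the LIMA-optimised or complex-domain variants the dissertation develops; the function name and parameters are illustrative.

```python
import numpy as np

def spectral_subtract(noisy_mag, noise_mag, alpha=1.0, beta=0.01):
    """Textbook magnitude spectral subtraction with a spectral floor.

    noisy_mag: magnitude spectrum of one noisy frame.
    noise_mag: noise magnitude estimate (e.g. averaged over speech pauses).
    alpha scales the subtraction (alpha > 1 over-subtracts);
    beta sets the floor that limits "musical noise" artifacts.
    Phase is discarded entirely -- the very simplification the thesis
    argues hurts the accuracy of the clean-magnitude estimate for ASR.
    """
    clean = noisy_mag - alpha * noise_mag
    return np.maximum(clean, beta * noisy_mag)

# Toy frame: subtracting a flat noise pedestal; the last bin would go
# negative and is clamped to the floor instead.
noisy = np.array([5.0, 3.0, 1.2, 0.9])
noise = np.array([1.0, 1.0, 1.0, 1.0])
clean = spectral_subtract(noisy, noise)
```

The clamping step is where signal-based tuning (maximise SNR, minimise distortion) and ASR-oriented tuning diverge: the floor that sounds best to a listener is not necessarily the one that yields the best cepstral features for the recogniser.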
Abstract:
This thesis critically analyses sperm donation practices from a child-centred perspective. It examines the effects, both personal and social, of disrupting the unity of biological and social relatedness in families affected by donor conception. It examines how disruption is facilitated by a process of mediation which is detailed using a model provided by Sunderland (2002). This model identifies mediating movements - alienation, translation, re-contextualisation and absorption - which help to explain the powerful and dominating material, and social and political processes which occur in biotechnology, or in reproductive technology in this case. The understanding of such movements and mediation of meanings is inspired by the complementary work of Silverstone (1999) and Sunderland. This model allows for a more critical appreciation of the movement of meaning from previously inalienable aspects of life to alienable products through biotechnology (Sunderland, 2002). Once this mediation in donor conception is subjected to critical examination here, it is then approached from different angles of investigation. The thesis posits that two conflicting notions of the self are being applied to fertility-frustrated adults and the offspring of reproductive interventions. Adults using reproductive interventions receive support to maximise their genetic continuity, but in so doing they create and dismiss the corresponding genetic discontinuity produced for the offspring. The offspring’s kinship and identity are then framed through an experimental postmodernist notion, presenting them as social rather than innate constructs. The adults using the reproductive intervention, on the other hand, have their identity and kinship continuity framed and supported as normative, innate, and based on genetic connection. 
This use of shifting frameworks is presented as unjust and harmful, creating double standards and a corrosion of kinship values, connection and intelligibility between generations; indeed, it is put forward as adult-centric. The analysis of other forms of human kinship dislocation provided by this thesis explores an under-utilised resource which is used to counter the commonly held opinion that any disruption of social and genetic relatedness for donor offspring is insignificant. The experiences of adoption and the stolen generations are used to inform understanding of the personal and social effects of such kinship disruption and potential reunion for donor offspring. These examples, along with laws governing international human rights, further strengthen the appeal here for normative principles and protections based on collective knowledge and standards to be applied to children of reproductive technology. The thesis presents the argument that the framing and regulation of reproductive technology is excessively influenced by industry providers and users. The interests of these parties collide with and corrode any accurate assessments and protections afforded to the children of reproductive technology. The thesis seeks to counter such encroachments and concludes by presenting these protections, frameworks, and human experiences as resources which can help to address the problems created for the offspring of such reproductive interventions, thereby illustrating why these reproductive interventions should be discontinued.
Abstract:
This thesis investigates aspects of encoding the speech spectrum at low bit rates, with extensions to the effect of such coding on automatic speaker identification. Vector quantization (VQ) is a technique for jointly quantizing a block of samples at once, in order to reduce the bit rate of a coding system. The major drawback in using VQ is the complexity of the encoder. Recent research has indicated the potential applicability of the VQ method to speech when product code vector quantization (PCVQ) techniques are utilized. The focus of this research is the efficient representation, calculation and utilization of the speech model as stored in the PCVQ codebook. In this thesis, several VQ approaches are evaluated, and the efficacy of two training algorithms is compared experimentally. It is then shown that these product-code vector quantization algorithms may be augmented with lossless compression algorithms, thus yielding an improved overall compression rate. An approach using a statistical model of the vector codebook indices for subsequent lossless compression is introduced. This coupling of lossy and lossless compression enables further compression gain. It is demonstrated that this approach is able to reduce the bit rate requirement from the current 24 bits per 20 millisecond frame to below 20, using a standard spectral distortion metric for comparison. Several fast-search VQ methods for use in speech spectrum coding have been evaluated. The usefulness of fast-search algorithms is highly dependent upon the source characteristics and, although previous research has been undertaken for coding of images using VQ codebooks trained with the source samples directly, the product-code structured codebooks for speech spectrum quantization place new constraints on the search methodology. The second major focus of the research is an investigation of the effect of low-rate spectral compression methods on the task of automatic speaker identification.
The motivation for this aspect of the research arose from a need to simultaneously preserve the speech quality and intelligibility and to provide for machine-based automatic speaker recognition using the compressed speech. This is important because there are several emerging applications of speaker identification where compressed speech is involved. Examples include mobile communications where the speech has been highly compressed, or where a database of speech material has been assembled and stored in compressed form. Although these two application areas have the same objective - that of maximizing the identification rate - the starting points are quite different. On the one hand, the speech material used for training the identification algorithm may or may not be available in compressed form. On the other hand, the new test material on which identification is to be based may only be available in compressed form. Using the spectral parameters which have been stored in compressed form, two main classes of speaker identification algorithm are examined. Some studies have been conducted in the past on bandwidth-limited speaker identification, but the use of short-term spectral compression deserves separate investigation. Combining the major aspects of the research, some important design guidelines for the construction of an identification model when based on the use of compressed speech are put forward.
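The encoder-complexity point behind PCVQ can be made concrete: splitting a spectral vector into sub-vectors, each with its own small codebook, replaces one huge joint search with several cheap ones at the same bit budget. A minimal sketch with random stand-in codebooks (real ones would be trained, e.g. with a generalized Lloyd algorithm); `pcvq_encode` and its split are illustrative, not the thesis's actual configuration:

```python
import numpy as np

def nearest(codebook, x):
    """Full-search nearest neighbour: index of the closest codeword."""
    return int(np.argmin(((codebook - x) ** 2).sum(axis=1)))

def pcvq_encode(x, codebooks, split_points):
    """Product-code VQ: quantise each sub-vector against its own codebook.

    The transmitted "index" is the tuple of per-part indices, so the
    effective codebook is the Cartesian product of the small ones.
    """
    parts = np.split(np.asarray(x, float), split_points)
    return tuple(nearest(cb, p) for cb, p in zip(codebooks, parts))

rng = np.random.default_rng(0)
# Two sub-codebooks of 16 entries each: 4 + 4 = 8 bits per frame, but
# only 32 distance computations instead of 256 for an equivalent-rate
# joint codebook over the full 5-dimensional vector.
codebooks = [rng.normal(size=(16, 2)), rng.normal(size=(16, 3))]
idx = pcvq_encode([0.1, -0.2, 0.3, 0.0, -0.1], codebooks, split_points=[2])
```

The index tuples produced this way are also the natural input to the lossless stage described above: a statistical model over index sequences can entropy-code them below the nominal bit count.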
Abstract:
This article applies a Wittgensteinian approach to the examination of the intelligibility of religious belief, in the wake of the recent attack on the Judeo-Christian religion by Richard Dawkins's book The God Delusion. The article attempts to show that Dawkins has confused religion with superstition, and that while Dawkins's arguments are decisive in the case of superstition, they do not successfully show religion to be a delusion. Religious belief in God is not like belief in the existence of a planet, and genuine religious faith is not like the belief in something for which there is not yet enough evidence, like belief in dark matter. The Christian doctrines of the resurrection and eternal life are misconstrued if they are understood as factual claims because they are then merely shallow superstitions, and not the great religious riddles they are meant to be.
Abstract:
Teacher professional standards have become a key policy mechanism for the reform of teaching and education in recent years. While standards policies claim to improve the quality of teaching and learning in schools today, this paper argues that a disjunction exists between the stated intentions of such programmes and the intelligibility of the practices of government in which they are invested. To this effect, the paper conducts an analytics of government of the recently released National Professional Standards for Teachers (Australian Institute for Teaching and School Leadership, 2011) arguing that the explicit, calculated rationality of the programme exists within a wider field of effects. Such analysis has the critical consequence of calling into question the claims of the programmers themselves thus breaching the self-evidence on which the standards rest.
Abstract:
In this study, I investigate the model of English language teacher education developed in Cuba. It includes features that would be considered innovative, contemporary, good practice anywhere in the Western world, as well as having distinctly Cuban elements. English is widely taught in Cuba in the education system and on television by Cuban teachers who are prepared in five-year courses at pedagogical universities by bilingual Cuban teacher educators. This case study explores the identity and pedagogy of six English language teacher educators at Cuba’s largest university of pedagogical sciences. Postcolonial theory provides a framework for examining how the Cuban pedagogy of English language teacher education resists the negative representation of Cuba in hegemonic Western discourse; and challenges neoliberal Western dogma. Postcolonial concepts of representation, resistance and hybridity are used in this examination. Cuban teacher education features a distinctive ‘pedagogy of tenderness’. Teacher educators build on caring relationships and institutionalised values of solidarity, collectivism and collaboration. Communicative English language teaching strategies are contextualised to enhance the pedagogical and communicative competence of student teachers, and intercultural intelligibility is emphasised. The collaborative pedagogy of Cuban English language teacher education features peer observation, mentoring and continuing professional development; as well as extensive pre-service classroom teaching and research skill development for student teachers. Being Cuban and bilingual are significant aspects of the professional identity of case members, who regard their profession as a vocation and who are committed to preparing good English language teachers.
Abstract:
Many governments in western democracies conduct the work of leading their societies forward through policy generation and implementation. Despite government attempts at extensive negotiation, collaboration and debate, the general populace in these same countries frequently express feelings of disempowerment and undue pressure to be compliant, often leading to disengagement. Here we outline Plan B: a process for examining how policies that emerge from good intentions are frequently interpreted as burdensome or irrelevant by those on whom they have an impact. Using a case study of professional standards for teachers in Australia, we describe how we distilled Foucault’s notions of archaeology into a research approach centring on the creation of ‘polyhedrons of intelligibility’ as an alternative approach by which both policy makers and those affected by their policies may understand how their respective causes are supported and adversely affected.