52 results for Text-to-speech
at Queensland University of Technology - ePrints Archive
Abstract:
Concept mapping involves determining relevant concepts from a free-text input, where concepts are defined in an external reference ontology. This is an important process that underpins many applications for clinical information reporting, derivation of phenotypic descriptions, and a number of state-of-the-art medical information retrieval methods. Concept mapping can be cast into an information retrieval (IR) problem: free-text mentions are treated as queries and concepts from a reference ontology as the documents to be indexed and retrieved. This paper presents an empirical investigation applying general-purpose IR techniques for concept mapping in the medical domain. A dataset used for evaluating medical information extraction is adapted to measure the effectiveness of the considered IR approaches. Standard IR approaches used here are contrasted with the effectiveness of two established benchmark methods specifically developed for medical concept mapping. The empirical findings show that the IR approaches are comparable with one benchmark method but well below the best benchmark.
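As a rough illustration of the IR casting described above, the sketch below indexes a toy set of ontology concepts as "documents" and retrieves them for a free-text mention used as a query. The mini-ontology, concept names, and TF-IDF/cosine scoring are illustrative assumptions, not the paper's actual pipeline or dataset.

```python
# A minimal sketch of concept mapping cast as retrieval (illustrative only).
# The toy "ontology" below is a made-up stand-in for a reference ontology
# such as SNOMED CT; the paper's actual methods and data differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Each concept is treated as a "document": (concept_id, descriptive text).
concepts = [
    ("C001", "myocardial infarction heart attack"),
    ("C002", "type 2 diabetes mellitus"),
    ("C003", "chronic obstructive pulmonary disease"),
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(text for _, text in concepts)

def map_mention(mention, top_k=2):
    """Treat a free-text mention as a query; rank concepts by cosine score."""
    query_vec = vectorizer.transform([mention])
    scores = linear_kernel(query_vec, doc_matrix).ravel()
    ranked = sorted(zip((cid for cid, _ in concepts), scores),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

print(map_mention("patient suffered a heart attack"))
```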
Abstract:
This paper describes an interactive installation work set in a large dome space. The installation is an audio and physical re-rendition of an interactive writing work. In the original work, the user interacted via keyboard and screen while online. This rendition of the work retains the online interaction, but also places the interaction within a physical space, where the main 'conversation' takes place by the participant-audience speaking through microphones and listening through headphones. The work now also includes voice and SMS input, using speech-to-text and text-to-speech conversion technologies, and audio and displayed text for output. These additions allow the participant-audience to co-author the work while they participate in audible conversation with keyword-triggering characters (bots). Communication in the space can be person-to-computer via microphone, keyboard, and phone; person-to-person via machine and within the physical space; computer-to-computer; and computer-to-person via audio and projected text.
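Purely as a hedged sketch of the kind of keyword-triggering characters mentioned above, the following shows a minimal text loop in which bots respond when their trigger words appear in participant input. The bot names, triggers, and replies are invented for illustration; the installation itself wraps speech-to-text input and text-to-speech output around logic of this general kind.

```python
# A minimal sketch of keyword-triggered "characters" (bots) responding to
# participant text. Names, triggers, and replies are invented; the real
# installation adds speech-to-text and text-to-speech front ends.
BOTS = {
    "echo":   {"triggers": {"hello", "hi"},    "reply": "Hello. Who is speaking?"},
    "oracle": {"triggers": {"future", "time"}, "reply": "The future is co-authored."},
}

def respond(utterance: str) -> list[str]:
    """Return replies from every bot whose trigger word occurs in the input."""
    words = set(utterance.lower().split())
    return [bot["reply"] for bot in BOTS.values() if bot["triggers"] & words]

# Canned utterances stand in for live microphone/SMS input.
for line in ["hello there", "what does the future hold?"]:
    for reply in respond(line):
        print(reply)  # stands in for text-to-speech output
```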
Abstract:
"An Introduction to Public Health is about the discipline of public health and the nature and scope of public health activity set within the challenges of the twenty first century. It is an introductory text to the principles and practice of public health written in a way that is easy to understand. Of what relevance is public health to the many allied health disciplines who contribute to it? How might an understanding of public health contribute to a range of health professionals who use the principles and practices of public health in their professional activities? These are the questions that this book addresses. An Introduction to Public Health leads the reader on a journey of discovery that concludes with not only an understanding of the nature and scope of public health but the challenges that face the field into the future." Provided by publisher.
Abstract:
Purpose: The classic study of Sumby and Pollack (1954, JASA, 26(2), 212-215) demonstrated that visual information aided speech intelligibility under noisy auditory conditions. Their work showed that visual information is especially useful under low signal-to-noise conditions, where the auditory signal leaves greater margins for improvement. We investigated whether simulated cataracts interfered with the ability of participants to use visual cues to help disambiguate the auditory signal in the presence of auditory noise. Methods: Participants in the study were screened to ensure normal visual acuity (mean of 20/20) and normal hearing (auditory threshold ≤ 20 dB HL). Speech intelligibility was tested under an auditory-only condition and two visual conditions: normal vision and simulated cataracts. The light-scattering effects of cataracts were imitated using cataract-simulating filters. Participants wore blacked-out glasses in the auditory-only condition and lens-free frames in the normal auditory-visual condition. Individual sentences were spoken by a live speaker in the presence of prerecorded four-person background babble set to a speech-to-noise ratio (SNR) of -16 dB. The SNR was determined in a preliminary experiment to support 50% correct identification of sentences under the auditory-only condition. The speaker was trained to match the rate, intensity and inflections of a prerecorded audio track of everyday speech sentences. The speaker was blind to the visual conditions of the participant to control for bias. Participants’ speech intelligibility was measured by comparing the accuracy of their written account of what they believed the speaker to have said to the actual spoken sentence. Results: Relative to the normal vision condition, speech intelligibility was significantly poorer when participants wore simulated cataracts. Conclusions: The results suggest that cataracts may interfere with the acquisition of visual cues to speech perception.
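As a side note on the -16 dB figure, the snippet below shows one standard way to scale a noise (babble) track so that a speech-plus-noise mixture hits a target SNR in dB. The signals here are synthetic placeholders, and this is not taken from the study's own setup.

```python
# A minimal sketch of mixing speech and babble at a target SNR in dB.
# The signals are synthetic placeholders; the study used live speech and
# prerecorded four-person babble.
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so that 10*log10(P_speech / P_noise) == snr_db, then mix."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Required noise power for the target SNR, and the matching amplitude gain.
    target_noise_power = p_speech / (10.0 ** (snr_db / 10.0))
    gain = np.sqrt(target_noise_power / p_noise)
    return speech + gain * noise

rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.linspace(0, 1, 16000))  # placeholder "speech"
babble = rng.standard_normal(16000)                          # placeholder "babble"
mixture = mix_at_snr(speech, babble, snr_db=-16.0)           # noise dominates at -16 dB
```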
Abstract:
Automatic Speech Recognition (ASR) has matured into a technology which is becoming more common in our everyday lives, and is emerging as a necessity to minimise driver distraction when operating in-car systems such as navigation and infotainment. In "noise-free" environments, word recognition performance of these systems has been shown to approach 100%; however, this performance degrades rapidly as the level of background noise is increased. Speech enhancement is a popular method for making ASR systems more robust. Single-channel spectral subtraction was originally designed to improve human speech intelligibility, and many attempts have been made to optimise this algorithm in terms of signal-based metrics such as maximised Signal-to-Noise Ratio (SNR) or minimised speech distortion. Such metrics assess enhancement performance for intelligibility, not speech recognition, making them sub-optimal for ASR applications.

This research investigates two methods for closely coupling subtractive-type enhancement algorithms with ASR: (a) a computationally-efficient Mel-filterbank noise subtraction technique based on likelihood-maximisation (LIMA), and (b) introducing phase spectrum information to enable spectral subtraction in the complex frequency domain. Likelihood-maximisation uses gradient-descent to optimise parameters of the enhancement algorithm to best fit the acoustic speech model given a word sequence known a priori. Whilst this technique is shown to improve ASR word accuracy, it is also identified to be particularly sensitive to non-noise mismatches between the training and testing data. Phase information has long been ignored in spectral subtraction as it is deemed to have little effect on human intelligibility. In this work it is shown that phase information is important in obtaining highly accurate estimates of the clean speech magnitudes typically used in ASR feature extraction. Phase Estimation via Delay Projection is proposed based on the stationarity of sinusoidal signals, and demonstrates the potential to produce improvements in ASR word accuracy across a wide range of SNRs.

Throughout the dissertation, consideration is given to practical implementation in vehicular environments, which resulted in two novel contributions: a LIMA framework which takes advantage of the grounding procedure common to speech dialogue systems, and a resource-saving formulation of frequency-domain spectral subtraction for realisation in field-programmable gate array hardware. The techniques proposed in this dissertation were evaluated using the Australian English In-Car Speech Corpus, which was collected as part of this work. This database is the first of its kind within Australia and captures real in-car speech of 50 native Australian speakers in seven driving conditions common to Australian environments.
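For readers unfamiliar with the baseline technique, below is a minimal sketch of single-channel magnitude spectral subtraction, the classical method this thesis builds on, with a simple noise estimate taken from leading frames and the noisy phase reused at resynthesis. Frame sizes, the oversubtraction factor, and the spectral floor are illustrative choices, not the thesis's own parameters.

```python
# A minimal sketch of single-channel magnitude spectral subtraction.
# Parameters (frame length, oversubtraction, floor) are illustrative, not
# those used in the thesis; the noisy phase is reused at resynthesis,
# which is exactly the simplification the thesis's phase work questions.
import numpy as np

def spectral_subtract(noisy, frame_len=512, hop=256, noise_frames=10,
                      alpha=2.0, floor=0.01):
    window = np.hanning(frame_len)
    n_frames = 1 + (len(noisy) - frame_len) // hop
    frames = np.stack([noisy[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spectra), np.angle(spectra)

    # Estimate the noise magnitude from the first few (assumed speech-free) frames.
    noise_mag = mag[:noise_frames].mean(axis=0)

    # Oversubtract the noise estimate and clamp to a spectral floor.
    clean_mag = np.maximum(mag - alpha * noise_mag, floor * noise_mag)

    # Resynthesise with the *noisy* phase via overlap-add.
    clean_frames = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    out = np.zeros(len(noisy))
    for i, frame in enumerate(clean_frames):
        out[i * hop:i * hop + frame_len] += frame * window
    return out
```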
Abstract:
While close-talking microphones give the best signal quality and produce the highest accuracy from current Automatic Speech Recognition (ASR) systems, the speech signal enhanced by a microphone array has been shown to be an effective alternative in a noisy environment. The use of microphone arrays, in contrast to close-talking microphones, alleviates the feeling of discomfort and distraction to the user. For this reason, microphone arrays are popular and have been used in a wide range of applications such as teleconferencing, hearing aids, speaker tracking, and as the front-end to speech recognition systems.

With advances in sensor and sensor network technology, there is considerable potential for applications that employ ad-hoc networks of microphone-equipped devices collaboratively as a virtual microphone array. By allowing such devices to be distributed throughout the users’ environment, the microphone positions are no longer constrained to traditional fixed geometrical arrangements. This flexibility in data acquisition allows different audio scenes to be captured to give a complete picture of the working environment. In such ad-hoc deployments of microphone sensors, however, the lack of information about the location of devices and active speakers poses technical challenges for array signal processing algorithms, which must be addressed to allow deployment in real-world applications. While not an ad-hoc sensor network, conditions approaching this have in effect been imposed in recent National Institute of Standards and Technology (NIST) ASR evaluations on distant microphone recordings of meetings. The NIST evaluation data comes from multiple sites, each with different and often loosely specified distant microphone configurations.

This research investigates how microphone array methods can be applied to ad-hoc microphone arrays. A particular focus is on devising methods that are robust to unknown microphone placements, in order to improve the overall speech quality and recognition performance provided by the beamforming algorithms. In ad-hoc situations, microphone positions and likely source locations are not known, and beamforming must be achieved blindly. There are two general approaches to blindly estimating the steering vector for beamforming. The first is direct estimation without regard to the microphone and source locations. The alternative is to first determine the unknown microphone positions through array calibration methods and then use the traditional geometrical formulation for the steering vector. Following these two major approaches investigated in this thesis, a novel clustered approach is proposed, which includes clustering the microphones and selecting clusters based on their proximity to the speaker. Novel experiments demonstrate that the proposed method of automatically selecting clusters of microphones (i.e., a subarray), closely located both to each other and to the desired speech source, may in fact provide more robust speech enhancement and recognition than the full array.
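To make the geometric formulation of the steering vector concrete, here is a minimal delay-and-sum beamformer sketch for known microphone and source positions. The array geometry, sample rate, and sound speed are assumed values, and the thesis's blind and clustered methods go well beyond this known-geometry case.

```python
# A minimal delay-and-sum beamformer with a geometric steering vector,
# assuming known microphone and source positions. Geometry, sample rate,
# and sound speed are illustrative; the thesis addresses the harder case
# where these are unknown (blind estimation, array calibration, clustering).
import numpy as np

C = 343.0      # speed of sound, m/s
FS = 16000     # sample rate, Hz

def delay_and_sum(mic_signals, mic_positions, source_position):
    """Align each channel by its propagation delay to the source, then average."""
    dists = np.linalg.norm(mic_positions - source_position, axis=1)
    delays = (dists - dists.min()) / C              # relative delays in seconds
    n = mic_signals.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / FS)
    out = np.zeros(n)
    for sig, tau in zip(mic_signals, delays):
        # Steering vector element: a pure phase shift exp(+j 2 pi f tau)
        spectrum = np.fft.rfft(sig) * np.exp(2j * np.pi * freqs * tau)
        out += np.fft.irfft(spectrum, n=n)
    return out / len(mic_signals)

# Example: 4 microphones on a line, source off to one side (made-up geometry).
mics = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [0.3, 0.0]])
source = np.array([1.0, 0.5])
signals = np.random.default_rng(1).standard_normal((4, 1024))
enhanced = delay_and_sum(signals, mics, source)
```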
Abstract:
This thesis investigates aspects of encoding the speech spectrum at low bit rates, with extensions to the effect of such coding on automatic speaker identification. Vector quantization (VQ) is a technique for jointly quantizing a block of samples at once, in order to reduce the bit rate of a coding system. The major drawback in using VQ is the complexity of the encoder. Recent research has indicated the potential applicability of the VQ method to speech when product code vector quantization (PCVQ) techniques are utilized. The focus of this research is the efficient representation, calculation and utilization of the speech model as stored in the PCVQ codebook.

In this thesis, several VQ approaches are evaluated, and the efficacy of two training algorithms is compared experimentally. It is then shown that these product-code vector quantization algorithms may be augmented with lossless compression algorithms, thus yielding an improved overall compression rate. An approach using a statistical model of the vector codebook indices for subsequent lossless compression is introduced. This coupling of lossy and lossless compression enables further compression gain. It is demonstrated that this approach is able to reduce the bit rate requirement from the current 24 bits per 20-millisecond frame to below 20, using a standard spectral distortion metric for comparison. Several fast-search VQ methods for use in speech spectrum coding have been evaluated. The usefulness of fast-search algorithms is highly dependent upon the source characteristics and, although previous research has been undertaken for coding of images using VQ codebooks trained with the source samples directly, the product-code structured codebooks for speech spectrum quantization place new constraints on the search methodology.

The second major focus of the research is an investigation of the effect of low-rate spectral compression methods on the task of automatic speaker identification. The motivation for this aspect of the research arose from a need to simultaneously preserve speech quality and intelligibility and to provide for machine-based automatic speaker recognition using the compressed speech. This is important because there are several emerging applications of speaker identification where compressed speech is involved. Examples include mobile communications, where the speech has been highly compressed, or where a database of speech material has been assembled and stored in compressed form. Although these two application areas have the same objective, that of maximizing the identification rate, the starting points are quite different. On the one hand, the speech material used for training the identification algorithm may or may not be available in compressed form. On the other hand, the new test material on which identification is to be based may only be available in compressed form. Using the spectral parameters which have been stored in compressed form, two main classes of speaker identification algorithm are examined. Some studies have been conducted in the past on bandwidth-limited speaker identification, but the use of short-term spectral compression deserves separate investigation. Combining the major aspects of the research, some important design guidelines for the construction of an identification model based on the use of compressed speech are put forward.
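As background for the vector quantization discussion, the sketch below trains a small codebook with Lloyd's algorithm (the generalised Lloyd / k-means procedure commonly used for VQ) and quantizes vectors to codebook indices. The data, codebook size, and iteration count are illustrative assumptions; the thesis's product-code structure and lossless index compression are not shown here.

```python
# A minimal sketch of VQ codebook training with Lloyd's algorithm (k-means)
# and quantization of vectors to codebook indices. Data, codebook size, and
# iteration count are illustrative; the thesis's product-code structure and
# lossless index compression are not shown.
import numpy as np

def train_codebook(data, codebook_size=16, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    codebook = data[rng.choice(len(data), codebook_size, replace=False)]
    for _ in range(iters):
        # Nearest-neighbour assignment of every vector to a codeword.
        dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Centroid update; keep the old codeword if a cell is empty.
        for k in range(codebook_size):
            members = data[labels == k]
            if len(members):
                codebook[k] = members.mean(axis=0)
    return codebook

def quantize(data, codebook):
    """Return the index of the nearest codeword for each input vector."""
    dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)

vectors = np.random.default_rng(1).standard_normal((1000, 10))  # stand-in spectra
cb = train_codebook(vectors)
indices = quantize(vectors, cb)   # these indices are what lossless coding targets
```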
Abstract:
The availability and use of online counseling approaches has increased rapidly over the last decade. While research has suggested a range of potential affordances and limitations of online counseling modalities, very few studies have offered detailed examinations of how counselors and clients manage asynchronous email counseling exchanges. In this paper we examine email exchanges involving clients and counselors through Kids Helpline, a national Australian counseling service that offers free online, email and telephone counseling for young people up to the age of 25. We employ tools from the traditions of ethnomethodology and conversation analysis to analyze the ways in which counselors from Kids Helpline request that their clients call them, and hence change the modality of their counseling relationship, from email to telephone counseling. This paper identifies three multi-layered approaches that counselors use in these emails as they negotiate the potentially delicate task of requesting and persuading a client to change the trajectory of their counseling relationship from text to talk without placing that relationship in jeopardy.
Abstract:
Why is public health important? An Introduction to Public Health is about the discipline of public health, the nature and scope of public health activity, and the challenges that face public health in the twenty-first century. The book is designed as an introductory text to the principles and practice of public health. This is a complex and multifaceted area. What we have tried to do in this book is make public health easy to understand without making it simplistic. As many authors have stated, public health is essentially about the organised efforts of society to promote, protect and restore the public’s health (Brownson 2011, Last 2001, Schneider 2011, Turnock 2012, Winslow 1920). It is multidisciplinary in nature, and it is influenced by genetic, physical, social, cultural, economic and political determinants of health. How do we define public health, and what are the disciplines that contribute to public health? How has the area changed over time? Are there health issues in the twenty-first century that change the focus and activity of public health? Yes, there are! There are many challenges facing public health now and in the future, just as there have been over the course of the history of organised public health efforts, dating from around 1850 in the Western world. Of what relevance is public health to the many health disciplines that contribute to it? How might an understanding of public health contribute to a range of health professionals who use the principles and practices of public health in their professional activities? These are the questions that this book addresses. An Introduction to Public Health leads the reader on a journey of discovery that concludes with an understanding of the nature and scope of public health and the challenges facing the field into the future. In this edition we have included one new chapter, ‘Public health and social policy’, in order to broaden our understanding of the policy influences on public health. The book is designed for a range of students undertaking health courses where there is a focus on advancing the health of the population. While it is imperative that people wanting to be public health professionals understand the theory and practice of public health, many other health workers contribute to effective public health practice. The book would also be relevant to a range of undergraduate students who want an introductory understanding of public health and its practice.
Abstract:
This paper explores how game authoring tools can teach processes that transform everyday places into engaging learning spaces. It discusses the motivation inherent in playing games and creating games for others, and how this stimulates an iterative process of creation and reflection and evokes a natural desire to engage in learning. The use of MiLK at the Adelaide Botanic Gardens is offered as a case in point. MiLK is an authoring tool that allows students and teachers to create and share SMS games for mobile phones. A group of South Australian high school students used MiLK to play a game, create their own games and play each other’s games during a day at the gardens. This paper details the learning processes involved in these activities and how the students, without prompting, reflected on their learning, conducted peer assessment, and engaged in a two-way discussion with their teacher about new technologies and their implications for learning. The paper concludes with a discussion of the needs and requirements of 21st century learners and how MiLK can support constructivist and connectivist teaching methods that engage learners and will produce an appropriately skilled future workforce.
Abstract:
What are the ethical and political implications when the very foundations of life — things of awe and spiritual significance — are translated into products accessible to few people? This book critically analyses this historic recontextualisation. Through mediation — when meaning moves ‘from one text to another, from one discourse to another’ — biotechnology is transformed into analysable data and into public discourses. This unique book links biotechnology with media and citizenship. Like any other ‘commodity’, biological products have been commodified. Because enormous speculative investment rests on this, risk will be understated and benefit will be overstated. Benefits will be unfairly distributed. Already, the bioprospecting of Southern megadiverse nations, legally sanctioned by U.S. property rights conventions, has led to wealth and health benefits in the North. Crucial to this development are biotechnological discourses that shift meanings from a “language of life” into technocratic discourses infused with neo-liberal economic assumptions that promise progress and benefits for all. Crucial, too, is the mass media’s representation of biotechnology for an audience with poor scientific literacy. Yet even apparently benign biotechnology spawned by the Human Genome Project, such as prenatal screening, has eugenic possibilities, and genetic codes for illness are eagerly sought by insurance companies seeking to exclude certain people. These issues raise important questions about a citizenship that is founded on moral responsibility for the wellbeing of society now and into the future. After all, biotechnology is very much concerned with the essence of life itself. This book provides a space for alternative and dissident voices beyond the hype that surrounds biotechnology.
Abstract:
Margaret Kettle examines grammar, its image problem and some new developments aimed at improving its teaching and learning in the TESOL classroom.
Abstract:
Our students come from diverse backgrounds and need flexibility in their learning. First-year students tend to worry when they miss lectures or parts of lectures. Having the lecture as an online resource allows students to miss a lecture without stressing about it, and to be more relaxed in the lecture, knowing that anything they may miss will be available later.
The resource: The Windows-based program from Blueberry Software (not Blackberry!), BB Flashback, allows simultaneous recording of the computer screen together with the audio, as well as webcam recording. Editing capabilities include adding pause buttons, graphics and text to the file before exporting it as a Flash file. Any diagrams drawn on the board or shown via visualiser can be photographed and easily incorporated. The audio can be extracted from the file if required, to be posted as a podcast. Exporting modes other than Flash are also available, allowing vodcasting if you wish.
What you will need:
- the recording software: it can be installed on the lecture hall computer just prior to the lecture if needed
- a computer: either the ones in lecture halls, especially if fitted with audio recording, or a laptop (I have used audio recording via Bluetooth for mobility).
Feedback from students has been positive and will be presented on the poster.
Abstract:
This thesis is a problematisation of the teaching of art to young children. To problematise a domain of social endeavour is, in Michel Foucault's terms, to ask how we come to believe that "something ... can and must be thought" (Foucault, 1985:7). The aim is to document what counts (i.e., what is sayable, thinkable, feelable) as proper art teaching in Queensland at this point of historical time. In this sense, the thesis is a departure from more recognisable research on 'more effective' teaching, including critical studies of art teaching and early childhood teaching. It treats 'good teaching' as an effect of moral training made possible through disciplinary discourses organised around certain epistemic rules at a particular place and time. There are four key tasks accomplished within the thesis. The first is to describe an event which is not easily resolved by means of orthodox theories or explanations, either liberal-humanist or critical ones. The second is to indicate how poststructuralist understandings of the self and social practice enable fresh engagements with uneasy pedagogical moments. What follows this discussion is the documentation of an empirical investigation that was made into texts generated by early childhood teachers, artists and parents about what constitutes 'good practice' in art teaching. Twenty-two participants produced text to tell and re-tell the meaning of 'proper' art education, from different subject positions. Rather than attempting to capture 'typical' representations of art education in the early years, a pool of 'exemplary' teachers, artists and parents was chosen, using "purposeful sampling", and from this pool, three videos were filmed and later discussed by the audience of participants. The fourth aspect of the thesis involves developing a means of analysing these texts in such a way as to allow a 're-description' of the field of art teaching by attempting to foreground the epistemic rules through which such teacher-generated texts come to count as true, i.e., as propriety in art pedagogy. This analysis drew on Donna Haraway's (1995) understanding of 'ironic' categorisation to hold the tensions within the propositions inside the categories of analysis rather than setting these up as discursive oppositions. The analysis is therefore ironic in the sense that Richard Rorty (1989) understands the term to apply to social scientific research. Three 'ironic' categories were argued to inform the discursive construction of 'proper' art teaching. It is argued that a teacher should (a) teach without teaching; (b) manufacture the natural; and (c) train for creativity. These ironic categories work to undo modernist assumptions about theory/practice gaps and finding a 'balance' between oppositional binary terms. They were produced through a discourse-theoretical reading of the texts generated by the participants in the study, texts that these same individuals use as a means of discipline and self-training as they work to teach properly. In arguing the usefulness of such approaches to empirical data analysis, the thesis challenges early childhood research in arts education, in relation to its capacity to deal with ambiguity and to acknowledge contradiction in the work of teachers and in their explanations for what they do. It works as a challenge at a range of levels: at the level of theorising, of method and of analysis.
In opening up thinking about normalised categories, and questioning traditional Western philosophy and the grand narratives of early childhood art pedagogy, it makes a space for re-thinking art pedagogy as "a game of truth and error" (Foucault, 1985). In doing so, it opens up a space for thinking how art education might be otherwise.