32 results for Subtraction
in Queensland University of Technology - ePrints Archive
Abstract:
In this paper we propose a new method for utilising phase information by complementing it with traditional magnitude-only spectral subtraction speech enhancement through Complex Spectrum Subtraction (CSS). The proposed approach has the following advantages over traditional magnitude-only spectral subtraction: (a) it introduces complementary information to the enhancement algorithm; (b) it reduces the total number of algorithmic parameters; and (c) it is designed for improving clean speech magnitude spectra and is therefore suitable for both automatic speech recognition (ASR) and speech perception applications. Oracle-based ASR experiments verify this approach, showing an average of 20% relative word accuracy improvements when accurate estimates of the phase spectrum are available. Based on sinusoidal analysis and assuming stationarity between observations (which is shown to be better approximated as the frame rate is increased), this paper also proposes a novel method for acquiring the phase information called Phase Estimation via Delay Projection (PEDEP). Further oracle ASR experiments validate the potential for the proposed PEDEP technique in ideal conditions. Realistic implementation of CSS with PEDEP shows performance comparable to state-of-the-art spectral subtraction techniques in a range of 15-20 dB signal-to-noise ratio environments. These results clearly demonstrate the potential for using phase spectra in spectral subtractive enhancement applications, and at the same time highlight the need for deriving more accurate phase estimates in a wider range of noise conditions.
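The contrast between magnitude-only and complex-domain subtraction can be illustrated for a single frequency bin. The following is a minimal sketch only, not the formulation used in the paper: the function names, the spectral floor and the example values are all hypothetical, and the noise phase is taken from the true noise as an "oracle" estimate, which mirrors the oracle condition mentioned in the abstract.

```python
import cmath

def magnitude_subtraction(noisy, noise_mag, floor=0.0):
    """Magnitude-only subtraction for one bin (hypothetical helper).

    Subtracts the noise magnitude estimate from the noisy magnitude
    and reuses the noisy phase unchanged.
    """
    mag = max(abs(noisy) - noise_mag, floor)
    return cmath.rect(mag, cmath.phase(noisy))

def complex_subtraction(noisy, noise_mag, noise_phase):
    """Complex-domain subtraction for one bin (hypothetical helper).

    Requires a phase estimate for the noise as well as its magnitude.
    """
    return noisy - cmath.rect(noise_mag, noise_phase)

# Illustrative values for a single frequency bin (assumed, not from the paper).
clean = cmath.rect(1.0, 0.3)
noise = cmath.rect(0.8, 2.0)
noisy = clean + noise

mag_only = magnitude_subtraction(noisy, abs(noise))
# Oracle condition: the exact noise phase is assumed to be known.
css = complex_subtraction(noisy, abs(noise), cmath.phase(noise))

# Error in the recovered clean-speech magnitude for each approach.
mag_error = abs(abs(mag_only) - abs(clean))
css_error = abs(abs(css) - abs(clean))
print(mag_error, css_error)
```

With an exact phase estimate the complex subtraction recovers the clean magnitude up to rounding error, whereas the magnitude-only result depends on the phase difference between speech and noise. In practice the phase must be estimated, which is the role the abstract assigns to PEDEP.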
Abstract:
Early-number is a rich fabric of interconnected ideas that is often misunderstood and thus taught in ways that do not lead to rich understanding. In this presentation, a visual language is used to describe the organisation of this domain of knowledge. This visual language is based upon Piaget’s notion of reflective abstraction (Dubinsky, 1991; Piaget, 1977/2001), and thus captures the epistemological associations that link the problems, concepts and representations of the domain. The constructs of this visual language are introduced and then applied to the early-number domain. The introduction to this visual language may prompt reflection upon its suitability and significance to the description of other domains of knowledge. Through such a process of analysis and description, the visual language may serve as a scaffold for enhancing pedagogical content knowledge and thus ultimately improve learning outcomes.
Abstract:
Speech recognition in car environments has been identified as a valuable means for reducing driver distraction when operating non-critical in-car systems. Likelihood-maximising (LIMA) frameworks optimise speech enhancement algorithms based on recognised state sequences rather than traditional signal-level criteria such as maximising signal-to-noise ratio. Previously presented LIMA frameworks require calibration utterances to generate optimised enhancement parameters which are used for all subsequent utterances. Sub-optimal recognition performance occurs in noise conditions which are significantly different from those present during the calibration session - a serious problem in rapidly changing noise environments. We propose a dialog-based design which allows regular optimisation iterations in order to track the changing noise conditions. Experiments using Mel-filterbank spectral subtraction are performed to determine the optimisation requirements for vehicular environments and show that minimal optimisation assists real-time operation with improved speech recognition accuracy. It is also shown that the proposed design is able to provide improved recognition performance over frameworks incorporating a calibration session.
Abstract:
Kindergartens in China offer structured full-day programs for children aged 3-6. Although formal schooling does not commence until age 7, the mathematics program in kindergartens is specifically focused on developing young children’s facility with simple addition and subtraction. This study used interviews to explore young Chinese children’s strategies for solving basic addition facts as well as their intuitive understanding of addition. Results indicate that teacher-directed teaching methods have a strong impact on young children’s cognitions in relation to addition.
Abstract:
Acquiring accurate silhouettes has many applications in computer vision. This is usually done through motion detection, or a simple background subtraction under highly controlled environments (i.e. chroma-key backgrounds). Lighting and contrast issues in typical outdoor or office environments make accurate segmentation very difficult in these scenes. In this paper, gradients are used in conjunction with intensity and colour to provide a robust segmentation of motion, after which graph cuts are utilised to refine the segmentation. The results presented using the ETISEO database demonstrate that an improved segmentation is achieved through the combined use of motion detection and graph cuts, particularly in complex scenes.
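For orientation, the "simple background subtraction" that the abstract treats as the baseline can be sketched as a per-pixel threshold on the difference between a frame and a background model. This is an illustrative sketch only: the function name, the threshold value and the tiny greyscale images are assumptions, and it does not include the gradient, colour or graph-cut stages that the paper proposes.

```python
def background_subtract(frame, background, threshold=30):
    """Return a binary foreground mask (hypothetical helper).

    A pixel is marked as foreground (1) when its absolute intensity
    difference from the background model exceeds the threshold.
    """
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]

# Assumed 3x3 greyscale example: one bright pixel against a flat background.
background = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
frame = [[10, 10, 10], [10, 200, 10], [10, 10, 12]]

mask = background_subtract(frame, background)
print(mask)
```

A fixed intensity threshold of this kind is sensitive to the lighting and contrast changes described in the abstract, which is the motivation it gives for adding gradient information and a graph-cut refinement.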
Abstract:
Automatic Speech Recognition (ASR) has matured into a technology which is becoming more common in our everyday lives, and is emerging as a necessity to minimise driver distraction when operating in-car systems such as navigation and infotainment. In “noise-free” environments, word recognition performance of these systems has been shown to approach 100%; however, this performance degrades rapidly as the level of background noise is increased. Speech enhancement is a popular method for making ASR systems more robust. Single-channel spectral subtraction was originally designed to improve human speech intelligibility and many attempts have been made to optimise this algorithm in terms of signal-based metrics such as maximised Signal-to-Noise Ratio (SNR) or minimised speech distortion. Such metrics are used to assess enhancement performance for intelligibility, not speech recognition, therefore making them sub-optimal for ASR applications. This research investigates two methods for closely coupling subtractive-type enhancement algorithms with ASR: (a) a computationally-efficient Mel-filterbank noise subtraction technique based on likelihood-maximisation (LIMA), and (b) introducing phase spectrum information to enable spectral subtraction in the complex frequency domain. Likelihood-maximisation uses gradient-descent to optimise parameters of the enhancement algorithm to best fit the acoustic speech model given a word sequence known a priori. Whilst this technique is shown to improve the ASR word accuracy performance, it is also identified to be particularly sensitive to non-noise mismatches between the training and testing data. Phase information has long been ignored in spectral subtraction as it is deemed to have little effect on human intelligibility. In this work it is shown that phase information is important in obtaining highly accurate estimates of clean speech magnitudes which are typically used in ASR feature extraction.
Phase Estimation via Delay Projection is proposed based on the stationarity of sinusoidal signals, and demonstrates the potential to produce improvements in ASR word accuracy across a wide range of SNRs. Throughout the dissertation, consideration is given to practical implementation in vehicular environments which resulted in two novel contributions – a LIMA framework which takes advantage of the grounding procedure common to speech dialogue systems, and a resource-saving formulation of frequency-domain spectral subtraction for realisation in field-programmable gate array hardware. The techniques proposed in this dissertation were evaluated using the Australian English In-Car Speech Corpus which was collected as part of this work. This database is the first of its kind within Australia and captures real in-car speech of 50 native Australian speakers in seven driving conditions common to Australian environments.
Abstract:
Traditional speech enhancement methods optimise signal-level criteria such as signal-to-noise ratio, but these approaches are sub-optimal for noise-robust speech recognition. Likelihood-maximising (LIMA) frameworks are an alternative that optimises parameters of enhancement algorithms based on state sequences generated for utterances with known transcriptions. Previous reports of LIMA frameworks have shown significant promise for improving speech recognition accuracies under additive background noise for a range of speech enhancement techniques. In this paper we discuss the drawbacks of the LIMA approach when multiple layers of acoustic mismatch are present – namely background noise and speaker accent. Experimentation using LIMA-based Mel-filterbank noise subtraction on American and Australian English in-car speech databases supports this discussion, demonstrating that inferior speech recognition performance occurs when a second layer of mismatch is seen during evaluation.
Mental computation: the identification of associated cognitive, metacognitive and affective factors
Abstract:
In this study, the feasibility of difference imaging for improving the contrast of electronic portal imaging device (EPID) images is investigated. The difference imaging technique consists of the acquisition of two EPID images (with and without the placement of an additional layer of attenuating medium on the surface of the EPID) and the subtraction of one of these images from the other. The resulting difference image shows improved contrast, compared to a standard EPID image, since it is generated by lower-energy photons. Results of this study show that, firstly, this method can produce images exhibiting greater contrast than is seen in standard megavoltage EPID images and that, secondly, the optimal thickness of attenuating material for producing a maximum contrast enhancement may vary with phantom thickness and composition. Further studies of the possibilities and limitations of the difference imaging technique, and the physics behind it, are therefore recommended.
Abstract:
Voice recognition is one of the key enablers to reduce driver distraction as in-vehicle systems become more and more complex. With the integration of voice recognition in vehicles, safety and usability are improved as the driver’s eyes and hands are not required to operate system controls. Whilst speaker-independent voice recognition is well developed, performance in high-noise environments (e.g. vehicles) is still limited. La Trobe University and Queensland University of Technology have developed a low-cost hardware-based speech enhancement system for automotive environments based on spectral subtraction and delay–sum beamforming techniques. The enhancement algorithms have been optimised using authentic Australian English collected under typical driving conditions. Performance tests conducted using speech data collected under a variety of vehicle noise conditions demonstrate a word recognition rate improvement in the order of 10% or more under the noisiest conditions. The system is currently developed to a proof-of-concept stage, and there is potential for even greater performance improvement.
Abstract:
Mathematics education literature has called for an abandonment of ontological and epistemological ideologies that have often divided theory-based practice. Instead, a consilience of theories has been sought which would leverage the strengths of each learning theory and so positively impact upon contemporary educational practice. This research activity is based upon Popper’s notion of three knowledge worlds which differentiates the knowledge shared in a community from the personal knowledge of the individual, and Bereiter’s characterisation of understanding as the individual’s relationship to tool-like knowledge. Using these notions, a re-conceptualisation of knowledge and understanding and a subsequent re-consideration of learning theories are proposed as a way to address the challenge set by literature. Referred to as the alternative theoretical framework, the proposed theory accounts for the scaffolded transformation of each individual’s unique understanding, whilst acknowledging the existence of a body of domain knowledge shared amongst participants in a scientific community of practice. The alternative theoretical framework is embodied within an operational model that is accompanied by a visual nomenclature with which to describe consensually developed shared knowledge and personal understanding. This research activity has sought to iteratively evaluate this proposed theory through the practical application of the operational model and visual nomenclature to the domain of early-number counting, addition and subtraction. This domain of mathematical knowledge has been comprehensively analysed and described. Through this process, the viability of the proposed theory as a tool with which to discuss and thus improve the knowledge and understanding within the domain of mathematics has been validated.
Putting the proposed theory into practice has led to the theory’s refinement and the subsequent achievement of a solid theoretical base for the future development of educational tools to support teaching and learning practice, including computer-mediated learning environments. Such future activity, using the proposed theory, will advance contemporary mathematics educational practice by bringing together the strengths of cognitivist, constructivist and post-constructivist learning theories.