204 results for Janet Cardiff
Abstract:
Traditional speech enhancement methods optimise signal-level criteria such as signal-to-noise ratio, but these approaches are sub-optimal for noise-robust speech recognition. Likelihood-maximising (LIMA) frameworks are an alternative that optimises the parameters of enhancement algorithms based on state sequences generated for utterances with known transcriptions. Previous reports of LIMA frameworks have shown significant promise for improving speech recognition accuracy under additive background noise for a range of speech enhancement techniques. In this paper we discuss the drawbacks of the LIMA approach when multiple layers of acoustic mismatch are present, namely background noise and speaker accent. Experimentation using LIMA-based Mel-filterbank noise subtraction on American and Australian English in-car speech databases supports this discussion, demonstrating that inferior speech recognition performance occurs when a second layer of mismatch is seen during evaluation.
Abstract:
Traditional speech enhancement methods optimise signal-level criteria such as signal-to-noise ratio, but such approaches are sub-optimal for noise-robust speech recognition. Likelihood-maximising (LIMA) frameworks, on the other hand, optimise the parameters of speech enhancement algorithms based on state sequences generated by a speech recogniser for utterances of known transcriptions. Previous applications of LIMA frameworks have generated a set of global enhancement parameters for all model states without taking into account the distribution of model occurrence, making optimisation susceptible to favouring frequently occurring models, in particular silence. In this paper, we demonstrate the existence of highly disproportionate phonetic distributions on two corpora with distinct speech tasks, and propose to normalise the influence of each phone based on a priori occurrence probabilities. Likelihood analysis and speech recognition experiments verify this approach for improving ASR performance in noisy environments.
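The abstract does not give the normalisation formula, so the following is only a minimal sketch of one way that occurrence-based weighting could work. It assumes frame-level phone labels and per-frame log-likelihoods are available, and it weights each frame by 1 / (K * prior), where K is the number of phones. The function names, the toy data and the weighting rule itself are illustrative assumptions, not the authors' method.

```python
from collections import Counter


def phone_priors(frame_phones):
    """Estimate a priori phone occurrence probabilities from frame labels."""
    counts = Counter(frame_phones)
    total = sum(counts.values())
    return {phone: n / total for phone, n in counts.items()}


def normalised_log_likelihood(frame_phones, frame_loglikes, priors):
    """Sum frame log-likelihoods with each frame weighted by 1 / (K * prior).

    With this (assumed) weighting, every one of the K phones carries the
    same total weight, regardless of how many frames it occupies.
    """
    k = len(priors)
    return sum(ll / (k * priors[p])
               for p, ll in zip(frame_phones, frame_loglikes))


# Toy data: silence ("sil") dominates the frame count.
frames = ["sil"] * 6 + ["ah"] * 2 + ["t"] * 2
loglikes = [-1.0] * 6 + [-4.0] * 2 + [-4.0] * 2

priors = phone_priors(frames)
plain = sum(loglikes)  # unweighted objective, dominated by silence frames
weighted = normalised_log_likelihood(frames, loglikes, priors)
```

In the toy data, silence supplies 60% of the frames in the unweighted sum, whereas in the weighted sum each of the three phones contributes its mean log-likelihood with equal weight.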
The impact of the educational setting on the aesthetic dimension : a study of three drama classrooms
Abstract:
To successfully navigate their habitats, many mammals use a combination of two mechanisms, path integration and calibration using landmarks, which together enable them to estimate their location and orientation, or pose. In large natural environments, both these mechanisms are characterized by uncertainty: the path integration process is subject to the accumulation of error, while landmark calibration is limited by perceptual ambiguity. It remains unclear how animals form coherent spatial representations in the presence of such uncertainty. Navigation research using robots has determined that uncertainty can be effectively addressed by maintaining multiple probabilistic estimates of a robot's pose. Here we show how conjunctive grid cells in dorsocaudal medial entorhinal cortex (dMEC) may maintain multiple estimates of pose using a brain-based robot navigation system known as RatSLAM. Based both on rodent spatially-responsive cells and functional engineering principles, the cells at the core of the RatSLAM computational model have similar characteristics to rodent grid cells, which we demonstrate by replicating the seminal Moser experiments. We apply the RatSLAM model to a new experimental paradigm designed to examine the responses of a robot or animal in the presence of perceptual ambiguity. Our computational approach enables us to observe short-term population coding of multiple location hypotheses, a phenomenon which would not be easily observable in rodent recordings. We present behavioral and neural evidence demonstrating that the conjunctive grid cells maintain and propagate multiple estimates of pose, enabling the correct pose estimate to be resolved over time even without uniquely identifying cues. While recent research has focused on the grid-like firing characteristics, accuracy and representational capacity of grid cells, our results identify a possible critical and unique role for conjunctive grid cells in filtering sensory uncertainty. 
We anticipate our study to be a starting point for animal experiments that test navigation in perceptually ambiguous environments.
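The idea of maintaining several pose hypotheses until later cues resolve them can be illustrated with a generic discrete Bayes (histogram) filter. This is a textbook robotics technique, not the RatSLAM model or its conjunctive grid cell network. The circular track, the cue labels and the `p_hit`/`p_miss` values below are all assumptions chosen for the example.

```python
def normalise(belief):
    """Rescale a belief vector so that it sums to one."""
    total = sum(belief)
    return [b / total for b in belief]


def sense(belief, world, observation, p_hit=0.9, p_miss=0.1):
    """Landmark calibration: reweight each cell by how well it matches the cue."""
    return normalise([b * (p_hit if cue == observation else p_miss)
                      for b, cue in zip(belief, world)])


def move(belief, step):
    """Path integration: shift the belief around a circular track (noise-free here)."""
    n = len(belief)
    return [belief[(i - step) % n] for i in range(n)]


# Circular track with two identical cues "A" (cells 0 and 3), so a single
# observation of "A" is perceptually ambiguous.
world = ["A", "B", "C", "A", "D", "E"]
belief = [1 / len(world)] * len(world)

belief = sense(belief, world, "A")   # two hypotheses survive: cell 0 or cell 3
two_peaks = sorted(range(len(belief)), key=belief.__getitem__)[-2:]

belief = move(belief, 1)             # both hypotheses are propagated forward
belief = sense(belief, world, "D")   # only cell 4 is an "A" followed by a "D"
best = max(range(len(belief)), key=belief.__getitem__)
```

Neither cue is unique in isolation with respect to the first observation, yet the sequence of observations combined with the motion update is enough to single out one cell.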
Abstract:
Interacting with technology within a vehicle environment using a voice interface can greatly reduce the effects of driver distraction. Most current approaches to this problem utilise only the audio signal, making them susceptible to acoustic noise. An obvious way to circumvent this is to use the visual modality as well. However, capturing, storing and distributing audio-visual data in a vehicle environment is very costly and difficult. One current dataset available for such research is the AVICAR [1] database. Unfortunately, this database is largely unusable due to a timing mismatch between the two streams, and no protocol is available. We have overcome these problems by re-synchronising the streams on the phone-number portion of the dataset and establishing a protocol for further research. This paper presents the first audio-visual results on this dataset for speaker-independent speech recognition. We hope this will serve as a catalyst for future research in this area.
Abstract:
This paper presents a new rat animat, a rat-sized bio-inspired robot platform currently being developed for embodied cognition and neuroscience research. The rodent animat is 150 mm x 80 mm x 70 mm and has a differential drive, visual, proximity, and odometry sensors, an x86 PC, and an LCD interface. The rat animat has a bio-inspired rodent navigation and mapping system called RatSLAM which demonstrates the capabilities of the platform and framework. A case study is presented of the robot's ability to learn the spatial layout of a figure-of-eight laboratory environment, including its ability to close physical loops based on visual input and odometry. A firing field plot similar to rodent 'non-conjunctive grid cells' is shown by plotting the activity of an internal network. Having a rodent animat the size of a real rat allows exploration of embodiment issues such as how the robot's sensori-motor systems and cognitive abilities interact. The initial observations concern the limitations of the design as well as its strengths. For example, the visual sensor has a narrower field of view and is located much closer to the ground than for other robots in the lab, which alters the salience of visual cues and the effectiveness of different visual filtering techniques. The small size of the robot relative to corridors and open areas affects the possible trajectories of the robot. These perspective and size issues affect the formation and use of the cognitive map, and hence the navigation abilities of the rat animat.
Abstract:
Objective: This study investigated: (i) the prevalence of ureaplasmas in semen and washed semen and (ii) the effect of ureaplasmas on semen andrology parameters. Design: Prospective study. Setting: IVF unit, private hospital, Brisbane, Australia. Patient(s): Three hundred and forty-three men participating in an assisted reproductive technology (ART) treatment cycle. Intervention(s): Semen and washed semen tested by culture, PCR assays and indirect immunofluorescent antibody assays. Statistical differences were determined by a t-test, Wilcoxon or Pearson’s Chi-square test where appropriate. Main Outcome Measure(s): The prevalence of ureaplasmas in semen and washed semen and the effect of these microorganisms on semen andrology parameters. Result(s): Ureaplasmas were detected in 73/343 (22%) semen samples and 29/343 (8.5%) washed semen samples. Ureaplasmas adherent to the surface of spermatozoa were demonstrated by indirect immunofluorescent antibody testing. U. parvum serovar 6 (36.6%) and U. urealyticum (30%) were the most prevalent isolates in washed semen. A comparison of the semen andrology parameters of washed semen ureaplasma positive and negative groups demonstrated a lower proportion of non-motile sperm in the washed semen ureaplasma positive group. Conclusion(s): Ureaplasmas are not always removed from semen by a standard ART washing procedure and can remain adherent to the surface of spermatozoa.
Abstract:
This chapter considers the complex literate repertoires of 21st century children in multicultural primary classrooms in Adelaide, South Australia. It draws on the curricular and pedagogical work of two experienced primary school teachers who explore culture, race and class by positioning children as textual producers across a variety of media. In particular we discuss two child-authored texts: A is for Arndale, a local alphabet book co-authored by children aged between eight and ten, and Cooking Afghani Style, a magazine-style film produced by a multi-aged class of children (aged eight to thirteen) recently arrived in Australia. In the process of making these texts, primary children engaged in reading as a cultural practice, re-reading and re-writing their neighbourhoods and identities (both individual and collective). This involved frequent excursions to key local sites, both familiar and unfamiliar to the children. They investigated how diverse children experienced and lived their lives in particular places within changing communities.
Abstract:
Traditional approaches to the use of machine learning algorithms do not provide a method to learn multiple tasks in one shot on an embodied robot. It is proposed that grounding actions within the sensory space leads to the development of action-state relationships which can be re-used despite a change in task. A novel approach called an Experience Network is developed and assessed on a real-world robot required to perform three separate tasks. After grounded representations were developed in the initial task, only minimal further learning was required to perform the second and third tasks.