88 results for Speech in Noise
Abstract:
Investigates the use of temporal lip information, in conjunction with speech information, for robust, text-dependent speaker identification. We propose that significant speaker-dependent information can be obtained from moving lips, enabling speaker recognition systems to be highly robust in the presence of noise. The fusion structure for the audio and visual information is based on multi-stream hidden Markov models (MSHMMs), with audio and visual features forming two independent data streams. Recent work has successfully applied multi-modal MSHMMs to the task of speech recognition. Temporal lip information has been used for speaker identification previously (T.J. Wark et al., 1998); however, this was restricted to output fusion via single-stream HMMs. We present an extension to this previous work, and show that an MSHMM is a valid structure for multi-modal speaker identification.
Abstract:
Investigates the use of lip information, in conjunction with speech information, for robust speaker verification in the presence of background noise. We have previously shown (Int. Conf. on Acoustics, Speech and Signal Proc., vol. 6, pp. 3693-3696, May 1998) that features extracted from a speaker's moving lips hold speaker dependencies which are complementary to speech features. We demonstrate that the fusion of lip and speech information allows for a highly robust speaker verification system which outperforms either subsystem individually. We present a new technique for determining the weighting to be applied to each modality so as to optimize the performance of the fused system. Given a correct weighting, lip information is shown to be highly effective for reducing the false acceptance and false rejection error rates in the presence of background noise.
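The abstract does not say how the modality weighting is determined, so the following is only a generic sketch of weighted score fusion, not the authors' technique. It assumes each trial yields one audio score and one lip score, fuses them as a convex combination, and picks the weight by a plain grid search over held-out trials. The function names, the fixed decision threshold, and the grid search itself are assumptions made for illustration.

```python
def fuse(audio_score, lip_score, weight):
    """Convex combination of two per-trial scores; weight is the audio share."""
    return weight * audio_score + (1.0 - weight) * lip_score


def error_rates(scores, is_client, threshold):
    """Return (false acceptance rate, false rejection rate) at a threshold."""
    impostor = [s for s, c in zip(scores, is_client) if not c]
    client = [s for s, c in zip(scores, is_client) if c]
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in client) / len(client)
    return far, frr


def best_weight(audio, lip, is_client, threshold=0.0, steps=21):
    """Grid-search the audio weight that minimises FAR + FRR (hypothetical criterion)."""
    best = None
    for i in range(steps):
        w = i / (steps - 1)
        fused = [fuse(a, l, w) for a, l in zip(audio, lip)]
        far, frr = error_rates(fused, is_client, threshold)
        # Keep the first weight that achieves the lowest total error.
        if best is None or far + frr < best[1]:
            best = (w, far + frr)
    return best
```

In noisy conditions the audio scores separate clients from impostors less well, so a search of this kind would be expected to shift weight towards the lip stream.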
Abstract:
The use of visual features in the form of lip movements to improve the performance of acoustic speech recognition has been shown to work well, particularly in noisy acoustic conditions. However, it is not known whether this technique can outperform speech recognition incorporating well-known acoustic enhancement techniques, such as spectral subtraction or multi-channel beamforming. This is an important question, especially in an automotive environment, for the design of an efficient human-vehicle computer interface. We perform a variety of speech recognition experiments on a challenging automotive speech dataset, and the results show that synchronous HMM-based audio-visual fusion can outperform traditional single-channel as well as multi-channel acoustic speech enhancement techniques. We also show that further improvement in recognition performance can be obtained by fusing speech-enhanced audio with the visual modality, demonstrating the complementary nature of the two robust speech recognition approaches.
Abstract:
This chapter focuses on the interactions and roles between delays and intrinsic noise effects within cellular pathways and regulatory networks. We address these aspects by focusing on genetic regulatory networks that share a common network motif, namely the negative feedback loop, leading to oscillatory gene expression and protein levels. In this context, we discuss computational simulation algorithms for addressing the interplay of delays and noise within the signaling pathways based on biological data. We address implementational issues associated with efficiency and robustness. In a molecular biology setting we present two case studies of temporal models for the Hes1 gene (Monk, 2003; Hirata et al., 2002), known to act as a molecular clock, and the Her1/Her7 regulatory system controlling the periodic somite segmentation in vertebrate embryos (Giudicelli and Lewis, 2004; Horikawa et al., 2006).
Abstract:
Delays are an important feature in temporal models of genetic regulation due to slow biochemical processes, such as transcription and translation. In this paper, we show how to model intrinsic noise effects in a delayed setting by either using a delay stochastic simulation algorithm (DSSA) or, for larger and more complex systems, a generalized Binomial τ-leap method (Bτ-DSSA). As a particular application, we apply these ideas to modeling somite segmentation in zebrafish across a number of cells in which two linked oscillatory genes (her1 and her7) are synchronized via Notch signaling between the cells.
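To illustrate the idea of a delay stochastic simulation, here is a minimal sketch for a single species with delayed, self-repressed production and first-order degradation. It is a toy model, not the her1/her7 system from the paper: the Hill-type repression, all parameter values, and the function name `delay_ssa` are assumptions chosen for illustration. It uses the common rejection-style scheme in which a sampled waiting time is discarded whenever a previously scheduled delayed reaction completes first.

```python
import heapq
import random


def delay_ssa(t_end, tau=10.0, k=5.0, K=20.0, n=4, d=0.1, x0=0, seed=0):
    """Simulate one species X with delayed production and linear degradation.

    Production is initiated at rate k / (1 + (X/K)^n) and adds one molecule
    tau time units later; degradation removes one molecule at rate d * X.
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    pending = []  # min-heap of completion times of initiated productions
    times, counts = [t], [x]
    while t < t_end:
        a_prod = k / (1.0 + (x / K) ** n)  # always > 0 for k > 0
        a_deg = d * x
        a0 = a_prod + a_deg
        dt = rng.expovariate(a0)  # candidate waiting time to the next initiation
        if pending and pending[0] <= t + dt:
            # A delayed production finishes first: jump to it, update the
            # state, and resample the waiting time from the new state.
            t = heapq.heappop(pending)
            x += 1
        else:
            t += dt
            if rng.random() * a0 < a_prod:
                heapq.heappush(pending, t + tau)  # product appears after tau
            else:
                x -= 1  # degradation takes effect immediately
        times.append(t)
        counts.append(x)
    return times, counts
```

Calling `delay_ssa(500.0)` returns event times and molecule counts that can be plotted to inspect whether the delayed negative feedback produces noisy oscillations for a given parameter choice.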
Abstract:
The phase of an analytic signal constructed from the autocorrelation function of a signal contains significant information about the shape of the signal. Using Bedrosian's (1963) theorem for the Hilbert transform it is proved that this phase is robust to multiplicative noise if the signal is baseband and the spectra of the signal and the noise do not overlap. Higher-order spectral features are interpreted in this context and shown to extract nonlinear phase information while retaining robustness. The significance of the result is that prior knowledge of the spectra is not required.
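For reference, Bedrosian's theorem for the Hilbert transform of a product can be stated as below. The symbols f, g, F, G and the cutoff a are notation chosen here rather than taken from the abstract, and how the two factors map onto the signal and the multiplicative noise depends on the setup used in the paper.

```latex
\mathcal{H}\{f(t)\,g(t)\} = f(t)\,\mathcal{H}\{g(t)\}
\quad\text{if}\quad
F(\omega) = 0 \ \text{for}\ |\omega| > a
\quad\text{and}\quad
G(\omega) = 0 \ \text{for}\ |\omega| < a .
```

Here F and G are the Fourier transforms of f and g. When the spectra do not overlap in this way, the low-frequency factor passes through the Hilbert transform unchanged, which is why such a factor would scale only the magnitude of the analytic signal and leave its phase unaffected.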
Abstract:
This paper presents results on the robustness of higher-order spectral features to Gaussian, Rayleigh, and uniformly distributed noise. Based on cluster plots and accuracy results for various signal-to-noise conditions, the higher-order spectral features are shown to be better than moment invariant features.
Abstract:
Vernier acuity, a form of visual hyperacuity, is amongst the most precise forms of spatial vision. Under optimal conditions Vernier thresholds are much finer than the inter-photoreceptor distance. Achievement of such high precision is based substantially on cortical computations, most likely in the primary visual cortex. Using stimuli with added positional noise, we show that Vernier processing is reduced with advancing age across a wide range of noise levels. Using an ideal observer model, we are able to characterize the mechanisms underlying age-related loss, and show that the reduction in Vernier acuity can be mainly attributed to the reduction in efficiency of sampling, with no significant change in the level of internal position noise, or spatial distortion, in the visual system.
Abstract:
Audio-visual speech recognition, or the combination of visual lip-reading with traditional acoustic speech recognition, has been previously shown to provide a considerable improvement over acoustic-only approaches in noisy environments, such as that present in an automotive cabin. The research presented in this paper will extend upon the established audio-visual speech recognition literature to show that further improvements in speech recognition accuracy can be obtained when multiple frontal or near-frontal views of a speaker's face are available. A series of visual speech recognition experiments using a four-stream visual synchronous hidden Markov model (SHMM) are conducted on the four-camera AVICAR automotive audio-visual speech database. We study the relative contribution between the side and central orientated cameras in improving visual speech recognition accuracy. Finally, combination of the four visual streams with a single audio stream in a five-stream SHMM demonstrates a relative improvement of over 56% in word recognition accuracy when compared to the acoustic-only approach in the noisiest conditions of the AVICAR database.
Abstract:
Controlling traffic flow and physical distribution is considered as one measure for decreasing road traffic noise in a city. To apply this measure effectively, a model for predicting traffic flow across the citywide road network is necessary. In this study, an existing model named AVENUE was used as the traffic flow prediction model. The traffic flow model was integrated with a sound power model for road vehicles and a sound propagation model to establish a new road traffic noise prediction model. As a case study, the prediction model was applied to the road network of Tsukuba city in Japan and a noise map of the city was produced. To examine the calculation accuracy of the noise map, the calculated noise values at the main roads were compared with measured values. The results suggest that a high-accuracy noise map of the city could be produced using the noise prediction model developed in this study.
Abstract:
In many applications of active noise control (ANC), an online secondary path modelling method using white noise as a training signal is required to ensure convergence of the system. The modelling accuracy and the convergence rate increase when white noise with a larger variance is used; however, a larger variance also increases the residual noise, which degrades the performance of the system. The proposed algorithm exploits the advantages of white noise with a larger variance to model the secondary path, but stops the injection at the optimum point to improve the performance of the system. In this approach, instead of injecting white noise continuously, the algorithm reactivates the injection only when a sudden change in the secondary path during operation requires the secondary path estimate to be adjusted. Comparative simulation results presented in this paper indicate the effectiveness of the proposed method.
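The injection-then-stop idea can be sketched in isolation as follows. This is a simplified illustration under strong assumptions, not the proposed algorithm: the secondary path is a short FIR filter, there is no primary noise or control filter in the loop, the path estimate is adapted with plain LMS, and injection stops when a smoothed error power falls below a fixed fraction of the injected noise variance. The stopping rule, the parameter values, and the name `model_secondary_path` are hypothetical, and the reactivation on a sudden path change is not implemented here.

```python
import random


def model_secondary_path(true_path, mu=0.05, sigma=1.0, stop_ratio=1e-3,
                         window=200, max_steps=20000, seed=0):
    """Identify an FIR secondary path with injected white noise and LMS.

    Returns the estimated taps and the step at which injection was stopped.
    """
    rng = random.Random(seed)
    taps = len(true_path)
    est = [0.0] * taps   # current estimate of the secondary path
    buf = [0.0] * taps   # most recent injected samples, newest first
    err_pow = 0.0        # exponentially smoothed modelling-error power
    for step in range(1, max_steps + 1):
        v = rng.gauss(0.0, sigma)                        # injected white noise
        buf = [v] + buf[:-1]
        d = sum(s * b for s, b in zip(true_path, buf))   # true path output
        y = sum(w * b for w, b in zip(est, buf))         # model output
        e = d - y
        est = [w + mu * e * b for w, b in zip(est, buf)]  # LMS update
        err_pow += (e * e - err_pow) / window
        # Hypothetical stopping rule: the model error has become negligible
        # relative to the injected noise power, so injection can cease.
        if step > window and err_pow < stop_ratio * sigma * sigma:
            return est, step
    return est, max_steps
```

For example, `model_secondary_path([0.5, 0.3, 0.1])` returns an estimate close to the true taps together with the step at which injection ended. In a full system, a monitor on the residual error would be needed to restart injection after the secondary path changes.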
Abstract:
Urban road traffic noise is an ongoing and increasing problem across much of the world. Consequently, a large amount of effort is expended in attempts to address this problem, especially in the area of acoustic design of buildings. Acoustic design policies developed by government authorities will typically focus on the transport noise reduction required through a building façade to meet a specified internal noise level. The significance of balcony acoustic treatments has been highlighted in recent decades, yet this area has potentially been considered less important than the need for acoustic isolation of building façades. This paper outlines recent research that has been conducted in determining the significance of balcony acoustic treatments in mitigating urban road traffic noise. It summarizes recent literature, some of which focuses on technological advances in the knowledge of balcony acoustic design and some of which discusses the overall aims and benefits of balcony acoustic design. The aim of this paper is to promote the use of balcony acoustic design as a significant element in the overall solution towards mitigating road traffic noise in modern cities.
Abstract:
Background: Recent initiatives within an Australian public healthcare service have seen a focus on increasing the research capacity of their workforce. One of the key initiatives involves encouraging clinicians to be research generators rather than solely research consumers. As a result, baseline data on current research capacity are essential to determine whether initiatives encouraging clinicians to undertake research have been effective. Speech pathologists have previously been shown to be interested in conducting research within their clinical role; therefore they are well positioned to benefit from such initiatives. The present study examined the current research interest, confidence and experience of speech language pathologists (SLPs) in a public healthcare workforce, as well as factors that predicted clinician research engagement. Methods: Data were collected via an online survey emailed to an estimated 330 SLPs working within Queensland, Australia. The survey consisted of 30 questions relating to current levels of interest, confidence and experience performing specific research tasks, as well as how frequently SLPs had performed these tasks in the last 5 years. Results: Although 158 SLPs responded to the survey, complete data were available for only 137. Respondents were more confident and experienced with basic research tasks (e.g., finding literature) and less confident and experienced with complex research tasks (e.g., analysing and interpreting results, publishing results). For most tasks, SLPs displayed higher levels of interest in the task than confidence and experience. Research engagement was predicted by highest qualification obtained, current job classification level and overall interest in research. Conclusions: Respondents generally reported levels of interest in research higher than their confidence and experience, with many respondents reporting limited experience in most research tasks. Therefore SLPs have the potential to benefit from research capacity building activities to increase their research skills in order to meet organisational research engagement objectives. However, these findings must be interpreted with the caveats that the response rate was relatively low and that participants were recruited from a single state-wide health service, and therefore may not be representative of the wider SLP workforce.
Abstract:
This thesis studied the source of instability in optical phase modulators used in high-accuracy laser measurement systems. Identifying the nonlinear origin of the amplitude noise helped to further reduce this instability in applications that rely on phase modulators. This outcome will have a positive impact on the development of new methods for amplitude noise suppression.
Abstract:
Our results demonstrate that photorefractive residual amplitude modulation (RAM) noise in electro-optic modulators (EOMs) can be reduced by modifying the incident beam intensity distribution. Here we report an order of magnitude reduction in RAM when beams with uniform intensity (flat-top) profiles, generated with an LCOS-SLM, are used instead of the usual fundamental Gaussian mode (TEM00). RAM arises from the photorefractive amplified scatter noise off the defects and impurities within the crystal. A reduction in RAM is observed with increasing intensity uniformity (flatness), which is attributed to a reduction in space charge field on the beam axis. The level of RAM reduction that can be achieved is physically limited by clipping at EOM apertures, with the observed results agreeing well with a simple model. These results are particularly important in applications where the reduction of residual amplitude modulation to 10^-6 is essential.