158 results for speech signals
Abstract:
Self-potential (SP) and spectral induced polarization responses associated with microbial processes involved in sulphate reduction have been monitored in a Perspex Winogradsky column filled with glass beads and growth medium. A salt bridge is used as an electrolytic contact between the experiment and control columns. Equally spaced SP electrodes are used in combination with Ag-AgCl electrodes to compare electrodic and SP signals associated with the microbial processes involved in sulphate reduction. This study reveals that the magnitude of SP varies from 5 to -2 mV and the electrodic potential from 0 to -20 mV at the time of domination (day 39) of sulphate-reducing bacteria; these values are very small in comparison to those measured by fixing both measuring and reference Ag-AgCl electrodes in the experiment column. We observed that the real and imaginary parts of the complex conductivities increase with increasing production of H2S and CO in the experiment column. Both real and imaginary parts of the surface complex conductivity vary at low frequencies in a manner similar to the typical growth curve of a bacterial population. Sodium lactate as a carbon source, dissolved in Lagan River water, was flushed into the column for biostimulation on the 144th day. The dissolved oxygen in the flushed fluid might have killed the anaerobes in the column, and a decrease in complex conductivities similar to the death phase of bacteria was observed for one week. The results obtained from this experiment should contribute to further understanding of the biogeophysical responses involved in complex environments.
Read More: http://library.seg.org/doi/abs/10.1190/segj092009-001.57
Abstract:
Research has been undertaken to investigate the use of artificial neural network (ANN) techniques to improve the performance of a low bit-rate vector transform coder. Considerable improvements in the perceptual quality of the coded speech have been obtained. New ANN-based methods for vector quantiser (VQ) design and for the adaptive updating of VQ codebook are introduced for use in speech coding applications.
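The abstract does not describe the ANN-based codebook method in detail, so the following is only a minimal sketch of the general idea, assuming a simple competitive-learning rule: each input vector is encoded as the index of its nearest codeword, and that winning codeword is nudged toward the input. The names `nearest_codeword`, `update_codebook`, and `learning_rate` are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codeword(codebook: List[List[float]], vector: Sequence[float]) -> int:
    """VQ encoding step: return the index of the closest codeword."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(codebook[i], vector))


def update_codebook(codebook: List[List[float]], vector: Sequence[float],
                    learning_rate: float = 0.1) -> int:
    """Assumed competitive-learning update: move only the winning
    codeword a fraction `learning_rate` of the way toward the input."""
    winner = nearest_codeword(codebook, vector)
    codebook[winner] = [c + learning_rate * (x - c)
                        for c, x in zip(codebook[winner], vector)]
    return winner
```

In a coder, the encoder would transmit only the index returned by `nearest_codeword`; an adaptive scheme would need the decoder to apply the same update so that both codebooks stay in step.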
Abstract:
The subjective performance of the G.722 7-kHz wideband speech coding recommendation using music signals is described. A number of audible distortions specific to music signals were found to be present in real-time evaluations of the coder. As a result, three modifications are proposed which are found to improve the performance for music signals. These modifications are compatible with the G.722 system configuration. Modifications made to G.722 to alleviate the most serious aspects of the noise modulation are described: (1) an adaptive bit allocation scheme is used to reduce short and long-term nonoptimality; (2) spectral noise shaping is incorporated, significantly enhancing the subjective performance of certain modes; and (3) backward block adaptive prediction is used.
Abstract:
We derive and employ a semiclassical Langevin equation obtained from path integrals to describe the ionic dynamics of a molecular junction in the presence of electrical current. The electronic environment serves as an effective nonequilibrium bath. The bath results in random forces describing Joule heating, current-induced forces including the nonconservative wind force, dissipative frictional forces, and an effective Lorentz-type force due to the Berry phase of the nonequilibrium electrons. Using a generic two-level molecular model, we highlight the importance of both current-induced forces and Joule heating for the stability of the system. We compare the impact of the different forces, and the wide-band approximation for the electronic structure on our result. We examine the current-induced instabilities (excitation of runaway "waterwheel" modes) and investigate the signature of these in the Raman signals.
Abstract:
There is considerable interest in creating embedded speech recognition hardware using the weighted finite state transducer (WFST) technique, but there are performance and memory usage challenges. Two system optimization techniques are presented to address these: one improves token propagation by removing the WFST epsilon input arcs; the other, a one-pass adaptive pruning algorithm, gives a dramatic reduction in the active nodes to be computed. Results for memory and bandwidth are given for a 5,000-word vocabulary, giving better practical performance than conventional WFST; this is then exploited in an adaptive pruning algorithm that reduces the active nodes from 30,000 to 4,000 with only a 2 percent sacrifice in speech recognition accuracy. These optimizations lead to a simpler design with deterministic performance.
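The abstract does not spell out the pruning rule. As an illustration only, the sketch below shows a common form of pruning in token-passing decoders, assumed here rather than taken from the paper: tokens outside a cost beam around the best token are dropped, and the survivors are then capped at a fixed number of active nodes. The names `prune_tokens`, `beam`, and `max_active` are hypothetical.

```python
import heapq
from typing import Dict


def prune_tokens(tokens: Dict[int, float], beam: float,
                 max_active: int) -> Dict[int, float]:
    """Prune active decoder tokens (state id -> accumulated cost, lower is better).

    Step 1: beam pruning keeps tokens within `beam` of the best cost.
    Step 2: if too many remain, keep only the `max_active` cheapest.
    """
    if not tokens:
        return {}
    best_cost = min(tokens.values())
    survivors = {state: cost for state, cost in tokens.items()
                 if cost <= best_cost + beam}
    if len(survivors) > max_active:
        kept = heapq.nsmallest(max_active, survivors.items(),
                               key=lambda item: item[1])
        survivors = dict(kept)
    return survivors
```

The cap in step 2 bounds the work done per frame, which is one way a decoder can obtain the kind of deterministic performance the abstract mentions.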
Abstract:
This paper presents the maximum weighted stream posterior (MWSP) model as a robust and efficient stream integration method for audio-visual speech recognition in environments where the audio or video streams may be subjected to unknown and time-varying corruption. A significant advantage of MWSP is that it does not require any specific measurements of the signal in either stream to calculate appropriate stream weights during recognition, and as such it is modality-independent. This also means that MWSP complements and can be used alongside many of the other approaches that have been proposed in the literature for this problem. For evaluation we used the large XM2VTS database for speaker-independent audio-visual speech recognition. The extensive tests include both clean and corrupted utterances, with corruption added to the video stream, the audio stream, or both, using a variety of types (e.g., MPEG-4 video compression) and levels of noise. The experiments show that this approach gives excellent performance in comparison to another well-known dynamic stream weighting approach and also compared to any fixed-weighted integration approach, both in clean conditions and when noise is added to either stream. Furthermore, our experiments show that the MWSP approach dynamically selects suitable integration weights on a frame-by-frame basis according to the level of noise in the streams and also according to the naturally fluctuating relative reliability of the modalities even in clean conditions. The MWSP approach is shown to maintain robust recognition performance in all tested conditions, while requiring no prior knowledge about the type or level of noise.
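One way to picture frame-by-frame weight selection without any noise measurement is the simplified sketch below. It is an assumption-laden reading of the idea, not the paper's model: for each frame, a small grid of candidate audio weights is tried, the two streams' log-likelihoods are combined linearly, and the weight whose combined scores yield the highest class posterior is chosen. The function names and the weight grid are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def posteriors(log_scores: Sequence[float]) -> List[float]:
    """Normalise log scores into posteriors (softmax, shifted for stability)."""
    peak = max(log_scores)
    exps = [math.exp(s - peak) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_stream_weight(
    audio_ll: Sequence[float],
    video_ll: Sequence[float],
    candidate_weights: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0),
) -> Tuple[float, int, float]:
    """Return (audio weight, winning class index, its posterior) for one frame.

    Assumed rule: pick the weight that maximises the largest class
    posterior of the weighted combination of the two streams.
    """
    best = (candidate_weights[0], 0, -1.0)
    for w in candidate_weights:
        combined = [w * a + (1.0 - w) * v for a, v in zip(audio_ll, video_ll)]
        post = posteriors(combined)
        top = max(range(len(post)), key=post.__getitem__)
        if post[top] > best[2]:
            best = (w, top, post[top])
    return best
```

When one stream is uninformative (flat likelihoods, as heavy noise might produce), this rule shifts the weight toward the other stream, which matches the behaviour the abstract describes qualitatively.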
Abstract:
Lipopolysaccharide (LPS) is a glycolipid present in the outer membrane of all Gram-negative bacteria, and it is one of the signature molecules recognized by the receptors of the innate immune system. In addition to its lipid A portion (the endotoxin), its O-chain polysaccharide (the O-antigen) plays a critical role in the bacterium-host interplay and, in a number of bacterial pathogens, it is a virulence factor. We present evidence that, in Yersinia enterocolitica serotype O:8, a complex signalling network regulates O-antigen expression in response to temperature. Northern blotting and reporter fusion analyses indicated that temperature regulates the O-antigen expression at the transcriptional level. Promoter cloning showed that the O-antigen gene cluster contains two transcriptional units under the control of promoters P(wb1) and P(wb2). The activity of both promoters is under temperature regulation and is repressed in bacteria grown at 37 degrees C. We demonstrate that the RosA/RosB efflux pump/potassium antiporter system and Wzz, the O-antigen chain length determinant, are indirectly involved in the regulation mainly affecting the activity of promoter P(wb2). The rosAB transcription, under the control of P(ros), is activated at 37 degrees C, and P(wb2) is repressed through the signals generated by the RosAB system activation, i.e. decreased [K+] and increased [H+]. The wzz transcription is under the control of P(wb2), and we show that, at 37 degrees C, overexpression of Wzz downregulates slightly the P(wb1) and P(wb2) activities and more strongly the P(ros) activity, with the net result that more O-antigen is produced. Finally, we demonstrate that overexpression of Wzz causes membrane stress that activates the CpxAR two-component signal transduction system.
Abstract:
Temporal dynamics and speaker characteristics are two important features of speech that distinguish speech from noise. In this paper, we propose a method to maximally extract these two features of speech for speech enhancement. We demonstrate that this can reduce the requirement for prior information about the noise, which can be difficult to estimate for fast-varying noise. Given noisy speech, the new approach estimates clean speech by recognizing long segments of the clean speech as whole units. In the recognition, clean speech sentences, taken from a speech corpus, are used as examples. Matching segments are identified between the noisy sentence and the corpus sentences. The estimate is formed by using the longest matching segments found in the corpus sentences. Longer speech segments as whole units contain more distinct dynamics and richer speaker characteristics, and can be identified more accurately from noise than shorter speech segments. Therefore, estimation based on the longest recognized segments increases the noise immunity and hence the estimation accuracy. The new approach consists of a statistical model to represent up to sentence-long temporal dynamics in the corpus speech, and an algorithm to identify the longest matching segments between the noisy sentence and the corpus sentences. The algorithm is made more robust to noise uncertainty by introducing missing-feature based noise compensation into the corpus sentences. Experiments have been conducted on the TIMIT database for speech enhancement from various types of nonstationary noise including song, music, and crosstalk speech. The new approach has shown improved performance over conventional enhancement algorithms in both objective and subjective evaluations.
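The paper's approach relies on a statistical model and noise compensation that the abstract does not detail. The toy sketch below only illustrates the search for a longest matching segment, under the strong assumption that each frame has already been reduced to a discrete label and that a match means exact equality. The functions `longest_common_segment` and `longest_match_in_corpus` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def longest_common_segment(a: Sequence, b: Sequence) -> Tuple[int, int, int]:
    """Longest contiguous run shared by a and b.

    Returns (length, start index in a, start index in b), using the
    classic dynamic programme over suffix match lengths.
    """
    best_len, best_a, best_b = 0, 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_a, best_b = i - curr[j], j - curr[j]
        prev = curr
    return best_len, best_a, best_b


def longest_match_in_corpus(noisy: Sequence,
                            corpus: List[Sequence]) -> Tuple[int, int, int, int]:
    """Return (length, noisy start, corpus sentence index, corpus start)
    for the longest segment of `noisy` found in any corpus sentence."""
    best = (0, 0, -1, 0)
    for index, sentence in enumerate(corpus):
        length, noisy_start, corpus_start = longest_common_segment(noisy, sentence)
        if length > best[0]:
            best = (length, noisy_start, index, corpus_start)
    return best
```

A real system would replace the equality test with a likelihood-based match score, which is what would make a longer segment easier to tell apart from noise than a short one.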
Abstract:
This paper considers the separation and recognition of overlapped speech sentences assuming single-channel observation. A system based on a combination of several different techniques is proposed. The system uses a missing-feature approach for improving crosstalk/noise robustness, a Wiener filter for speech enhancement, hidden Markov models for speech reconstruction, and speaker-dependent/-independent modeling for speaker and speech recognition. We develop the system on the Speech Separation Challenge database, involving a task of separating and recognizing two mixed sentences without assuming advance knowledge of the identity of the speakers or of the signal-to-noise ratio. The paper is an extended version of a previous conference paper submitted for the challenge.
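Of the components listed, the Wiener filter is the easiest to illustrate in isolation. The sketch below shows the textbook per-bin gain, speech power divided by speech-plus-noise power, assuming the noise power spectrum is already known and using simple spectral subtraction to estimate the speech power. How the paper estimates these quantities is not stated in the abstract, and `wiener_gains` is a hypothetical name.

```python
from typing import List, Sequence


def wiener_gains(noisy_power: Sequence[float], noise_power: Sequence[float],
                 floor: float = 0.0) -> List[float]:
    """Per-frequency-bin Wiener gain G = S / (S + N).

    S is estimated as max(noisy power - noise power, 0); `floor` is an
    optional lower bound on the gain.
    """
    gains = []
    for observed, noise in zip(noisy_power, noise_power):
        speech = max(observed - noise, 0.0)
        denominator = speech + noise
        gain = speech / denominator if denominator > 0.0 else 0.0
        gains.append(max(gain, floor))
    return gains
```

Multiplying each bin of the noisy spectrum by its gain attenuates bins dominated by noise while leaving high-SNR bins nearly untouched.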
Abstract:
Three experiments measured the effects of age on informational masking of speech by competing speech. The experiments were designed to minimize the energetic contributions of the competing speech so that informational masking could be measured with no large corrections for energetic masking. Experiment 1 used a "speech-in-speech-in-noise" design, in which the competing speech was presented in noise at a signal-to-noise ratio (SNR) of -4 dB. This ensured that the noise primarily contributed the energetic masking but the competing speech contributed the informational masking. Equal amounts of informational masking (3 dB) were observed for young and elderly listeners, although less was found for hearing-impaired listeners. Experiment 2 tested a range of SNRs in this design and showed that informational masking increased with SNR up to about an SNR of -4 dB, but decreased thereafter. Experiment 3 further reduced the energetic contribution of the competing speech by filtering it into different frequency bands from the target speech. The elderly listeners again showed approximately the same amount of informational masking (4-5 dB), although some elderly listeners had particular difficulty understanding these stimuli in any condition. On the whole, these results suggest that young and elderly listeners were equally susceptible to informational masking. © 2009 Acoustical Society of America.
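For readers unfamiliar with how a stimulus is set to a given SNR, the following sketch scales a noise waveform so that the signal-to-noise power ratio equals a target value in dB, such as the -4 dB used in Experiment 1. It is a generic illustration with hypothetical function names, not the authors' stimulus-generation procedure.

```python
import math
from typing import List, Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Average squared amplitude of a waveform."""
    return sum(v * v for v in samples) / len(samples)


def snr_db(signal: Sequence[float], noise: Sequence[float]) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(Ps / Pn)."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


def scale_noise_to_snr(signal: Sequence[float], noise: Sequence[float],
                       target_snr_db: float) -> List[float]:
    """Return a scaled copy of `noise` so that snr_db(signal, result)
    equals `target_snr_db`."""
    target_noise_power = mean_power(signal) / (10.0 ** (target_snr_db / 10.0))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [gain * v for v in noise]
```

A negative target such as -4 dB means the noise carries more power than the signal, here about 2.5 times as much.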
Abstract:
AIMS/HYPOTHESIS: Premature death of retinal pericytes is a pathophysiological hallmark of diabetic retinopathy. Among the mechanisms proposed for pericyte death is exposure to AGE, which accumulate during diabetes. The current study used an in vitro model, whereby retinal pericytes were exposed to AGE-modified substrate and the mechanisms underlying pericyte death were explored. METHODS: Pericytes were isolated from bovine retinal capillaries and propagated on AGE-modified basement membrane (BM) extract or non-modified native BM. The extent of AGE modification was analysed. Proliferative responses of retinal pericytes propagated on AGE-modified BM were investigated using a 5-bromo-2-deoxy-uridine-based assay. The effect of extrinsically added platelet-derived growth factor (PDGF) isoforms on these proliferative responses was also analysed alongside mRNA expression of the PDGF receptors. Apoptotic death of retinal pericytes grown on AGE-modified BM was investigated using terminal deoxynucleotidyl transferase-mediated dUTP nick end-labelling, mitochondrial membrane depolarisation and by morphological assessment. We also measured both the ability of PDGF to reverse Akt dephosphorylation that was mediated by AGE-modified BM, and increased pericyte apoptosis. RESULTS: Retinal pericytes exposed to AGE-modified BM showed reduced proliferative responses in comparison to controls (p
Abstract:
Many of the items in the “Speech, Spatial, and Qualities of Hearing” scale questionnaire [S. Gatehouse and W. Noble, Int. J. Audiol. 43, 85–99 (2004)] are concerned with speech understanding in a variety of backgrounds, both speech and nonspeech. To study whether these self-report data reflected informational masking, previously collected data on 414 people were analyzed. The lowest scores (greatest difficulties) were found for the two items in which there were two speech targets, with successively higher scores for competing speech (six items), energetic masking (one item), and no masking (three items). The results suggest significant masking by competing speech in everyday listening situations.