986 resultados para speech signals
Resumo:
An approach to pattern recognition using invariant parameters based on higher-order spectra is presented. In particular, bispectral invariants are used to classify one-dimensional shapes. The bispectrum, which is translation invariant, is integrated along straight lines passing through the origin in bifrequency space. The phase of the integrated bispectrum is shown to be scale- and amplification-invariant. A minimal set of these invariants is selected as the feature vector for pattern classification. Pattern recognition using higher-order spectral invariants is fast, suited for parallel implementation, and works for signals corrupted by Gaussian noise. The classification technique is shown to distinguish two similar but different bolts given their one-dimensional profiles
Resumo:
Visual noise insensitivity is important to audio visual speech recognition (AVSR). Visual noise can take on a number of forms such as varying frame rate, occlusion, lighting or speaker variabilities. The use of a high dimensional secondary classifier on the word likelihood scores from both the audio and video modalities is investigated for the purposes of adaptive fusion. Preliminary results are presented demonstrating performance above the catastrophic fusion boundary for our confidence measure irrespective of the type of visual noise presented to it. Our experiments were restricted to small vocabulary applications.
Resumo:
The performance of automatic speech recognition systems deteriorates in the presence of noise. One known solution is to incorporate video information with an existing acoustic speech recognition system. We investigate the performance of the individual acoustic and visual sub-systems and then examine different ways in which the integration of the two systems may be performed. The system is to be implemented in real time on a Texas Instruments' TMS320C80 DSP.
Resumo:
This paper investigates the use of lip information, in conjunction with speech information, for robust speaker verification in the presence of background noise. It has been previously shown in our own work, and in the work of others, that features extracted from a speaker's moving lips hold speaker dependencies which are complementary with speech features. We demonstrate that the fusion of lip and speech information allows for a highly robust speaker verification system which outperforms the performance of either sub-system. We present a new technique for determining the weighting to be applied to each modality so as to optimize the performance of the fused system. Given a correct weighting, lip information is shown to be highly effective for reducing the false acceptance and false rejection error rates in the presence of background noise
Resumo:
Investigates the use of temporal lip information, in conjunction with speech information, for robust, text-dependent speaker identification. We propose that significant speaker-dependent information can be obtained from moving lips, enabling speaker recognition systems to be highly robust in the presence of noise. The fusion structure for the audio and visual information is based around the use of multi-stream hidden Markov models (MSHMM), with audio and visual features forming two independent data streams. Recent work with multi-modal MSHMMs has been performed successfully for the task of speech recognition. The use of temporal lip information for speaker identification has been performed previously (T.J. Wark et al., 1998), however this has been restricted to output fusion via single-stream HMMs. We present an extension to this previous work, and show that a MSHMM is a valid structure for multi-modal speaker identification
Resumo:
Investigates the use of lip information, in conjunction with speech information, for robust speaker verification in the presence of background noise. We have previously shown (Int. Conf. on Acoustics, Speech and Signal Proc., vol. 6, pp. 3693-3696, May 1998) that features extracted from a speaker's moving lips hold speaker dependencies which are complementary with speech features. We demonstrate that the fusion of lip and speech information allows for a highly robust speaker verification system which outperforms either subsystem individually. We present a new technique for determining the weighting to be applied to each modality so as to optimize the performance of the fused system. Given a correct weighting, lip information is shown to be highly effective for reducing the false acceptance and false rejection error rates in the presence of background noise
Resumo:
The use of visual features in the form of lip movements to improve the performance of acoustic speech recognition has been shown to work well, particularly in noisy acoustic conditions. However, whether this technique can outperform speech recognition incorporating well-known acoustic enhancement techniques, such as spectral subtraction, or multi-channel beamforming is not known. This is an important question to be answered especially in an automotive environment, for the design of an efficient human-vehicle computer interface. We perform a variety of speech recognition experiments on a challenging automotive speech dataset and results show that synchronous HMM-based audio-visual fusion can outperform traditional single as well as multi-channel acoustic speech enhancement techniques. We also show that further improvement in recognition performance can be obtained by fusing speech-enhanced audio with the visual modality, demonstrating the complementary nature of the two robust speech recognition approaches.
Resumo:
In spite of significant research in the development of efficient algorithms for three carrier ambiguity resolution, full performance potential of the additional frequency signals cannot be demonstrated effectively without actual triple frequency data. In addition, all the proposed algorithms showed their difficulties in reliable resolution of the medium-lane and narrow-lane ambiguities in different long-range scenarios. In this contribution, we will investigate the effects of various distance-dependent biases, identifying the tropospheric delay to be the key limitation for long-range three carrier ambiguity resolution. In order to achieve reliable ambiguity resolution in regional networks with the inter-station distances of hundreds of kilometers, a new geometry-free and ionosphere-free model is proposed to fix the integer ambiguities of the medium-lane or narrow-lane observables over just several minutes without distance constraint. Finally, the semi-simulation method is introduced to generate the third frequency signals from dual-frequency GPS data and experimentally demonstrate the research findings of this paper.
Resumo:
This paper presents techniques which can lead to diagnosis of faults in a small size multi-cylinder diesel engine. Preliminary analysis of the acoustic emission (AE) signals is outline, including time-frequency analysis and selection of optimum frequency band.The results of applying mean field independent component analysis (MFICA) to separate the AE root mean square (RMS) signals and the effects of changing parameter values are also outlined. The results on separation of RMS signals show thsi technique has the potential of increasing the probability to successfully identify the AE events associated with the various mechanical events within the combustion process of multi-cylinder diesel engines.
Resumo:
Sequence data often have competing signals that are detected by network programs or Lento plots. Such data can be formed by generating sequences on more than one tree, and combining the results, a mixture model. We report that with such mixture models, the estimates of edge (branch) lengths from maximum likelihood (ML) methods that assume a single tree are biased. Based on the observed number of competing signals in real data, such a bias of ML is expected to occur frequently. Because network methods can recover competing signals more accurately, there is a need for ML methods allowing a network. A fundamental problem is that mixture models can have more parameters than can be recovered from the data, so that some mixtures are not, in principle, identifiable. We recommend that network programs be incorporated into best practice analysis, along with ML and Bayesian trees.
Resumo:
This paper discusses commonly encountered diesel engine problems and the underlying combustion related faults. Also discussed are the methods used in previous studies to simulate diesel engine faults and the initial results of an experimental simulation of a common combustion related diesel engine fault, namely diesel engine misfire. This experimental fault simulation represents the first step towards a comprehensive investigation and analysis into the characteristics of acoustic emission signals arising from combustion related diesel engine faults. Data corresponding to different engine running conditions was captured using in-cylinder pressure, vibration and acoustic emission transducers along with both crank-angle encoder and top-dead centre signals. Using these signals, it was possible to characterise the diesel engine in-cylinder pressure profiles and the effect of different combustion conditions on both vibration and acoustic emission signals.
Resumo:
Audio-visualspeechrecognition, or the combination of visual lip-reading with traditional acoustic speechrecognition, has been previously shown to provide a considerable improvement over acoustic-only approaches in noisy environments, such as that present in an automotive cabin. The research presented in this paper will extend upon the established audio-visualspeechrecognition literature to show that further improvements in speechrecognition accuracy can be obtained when multiple frontal or near-frontal views of a speaker's face are available. A series of visualspeechrecognition experiments using a four-stream visual synchronous hidden Markov model (SHMM) are conducted on the four-camera AVICAR automotiveaudio-visualspeech database. We study the relative contribution between the side and central orientated cameras in improving visualspeechrecognition accuracy. Finally combination of the four visual streams with a single audio stream in a five-stream SHMM demonstrates a relative improvement of over 56% in word recognition accuracy when compared to the acoustic-only approach in the noisiest conditions of the AVICAR database.
Resumo:
Acoustic emission (AE) is the phenomenon where stress waves are generated due to rapid release of energy within a material caused by sources such as crack initiation or growth. AE technique involves recording the stress waves by means of sensors and subsequent analysis of the recorded signals to gather information about the nature of the source. Though AE technique is one of the popular non destructive evaluation (NDE) techniques for structural health monitoring of mechanical, aerospace and civil structures; several challenges still exist in successful application of this technique. Presence of spurious noise signals can mask genuine damage‐related AE signals; hence a major challenge identified is finding ways to discriminate signals from different sources. Analysis of parameters of recorded AE signals, comparison of amplitudes of AE wave modes and investigation of uniqueness of recorded AE signals have been mentioned as possible criteria for source differentiation. This paper reviews common approaches currently in use for source discrimination, particularly focusing on structural health monitoring of civil engineering structural components such as beams; and further investigates the applications of some of these methods by analyzing AE data from laboratory tests.