940 results for noisy speaker verification


Relevance:

20.00%

Publisher:

Abstract:

This paper is a continuation of the paper titled “Concurrent multi-scale modeling of civil infrastructure for analyses on structural deteriorating—Part I: Modeling methodology and strategy”, with the emphasis on model updating and verification for the developed concurrent multi-scale model. The sensitivity-based parameter updating method was applied, and important issues such as the selection of reference data and model parameters and the model updating procedures for the multi-scale model were investigated based on a sensitivity analysis of the selected model parameters. Experimental modal data, as well as static responses in terms of component nominal stresses and hot-spot stresses at the locations of concern, were used for dynamic-response-oriented and static-response-oriented model updating, respectively. The updated multi-scale model was further verified to act as the baseline model, which is assumed to be the available finite-element model closest to the real condition of the structure, for subsequent arbitrary numerical simulation. The comparison of dynamic and static responses between the results calculated by the final model and the measured data indicated that the updating and verification methods applied in this paper are reliable and accurate for the multi-scale model of a frame-like structure. General procedures for multi-scale model updating and verification were finally proposed for nonlinear physics-based modeling of large civil infrastructure, and they were applied to the model verification of a long-span bridge as an engineering application of the proposed procedures.

Relevance:

20.00%

Publisher:

Abstract:

Tzeng et al. proposed a new threshold multi-proxy multi-signature scheme with threshold verification. In their scheme, a subset of original signers authenticates a designated proxy group to sign on behalf of the original group. A message m has to be signed by a subset of proxy signers who can represent the proxy group. Then, the proxy signature is sent to the verifier group. A subset of verifiers in the verifier group can also represent the group to authenticate the proxy signature. Subsequently, two improved schemes were proposed to eliminate the security leak of Tzeng et al.’s scheme. In this paper, we point out the security leakage of all three schemes and propose a novel threshold multi-proxy multi-signature scheme with threshold verification.

Relevance:

20.00%

Publisher:

Abstract:

The use of the PC and Internet for placing telephone calls will present new opportunities to capture vast amounts of un-transcribed speech for a particular speaker. This paper investigates how to best exploit this data for speaker-dependent speech recognition. Supervised and unsupervised experiments in acoustic model and language model adaptation are presented. Using one hour of automatically transcribed speech per speaker with a word error rate of 36.0%, unsupervised adaptation resulted in an absolute gain of 6.3%, equivalent to 70% of the gain from the supervised case, with additional adaptation data likely to yield further improvements. LM adaptation experiments suggested that although there seems to be a small degree of speaker idiolect, adaptation to the speaker alone, without considering the topic of the conversation, is in itself unlikely to improve transcription accuracy.
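The relationship between the two reported gains can be worked through with a short calculation. This is only a sketch of the arithmetic implied by the abstract: the supervised gain is not stated there, so the value below is derived from the quoted 6.3% and 70% figures, and the variable names are illustrative.

```python
# Figures quoted in the abstract (absolute percentage points).
unsupervised_gain = 6.3          # absolute gain from unsupervised adaptation
fraction_of_supervised = 0.70    # "equivalent to 70% of the gain from the supervised case"

# Implied supervised gain; derived here, not reported in the abstract.
supervised_gain = unsupervised_gain / fraction_of_supervised

print(f"Implied supervised gain: about {supervised_gain:.1f} points absolute")
```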

Relevance:

20.00%

Publisher:

Abstract:

The term self-selected (i.e., individual or comfortable walking pace or speed) is commonly used in the literature (Frost, Dowling, Bar-Or, & Dyson, 1997; Jeng, Liao, Lai, & Hou, 1997; Wergel-Kolmert & Wohlfart, 1999; Maltais, Bar-Or, Pienynowski, & Galea, 2003; Browning & Kram, 2005; Browning, Baker, Herron, & Kram, 2006; Hills, Byrne, Wearing, & Armstrong, 2006) and is identified as the most efficient walking speed, with increased efficiency defined by lower oxygen uptake (VO₂) per unit mechanical work (Hoyt & Taylor, 1981; Taylor, Heglund, & Maloiy, 1982; Hreljac, 1993). [...] assessing individual and group differences in metabolic energy expenditure using oxygen uptake requires individuals to be comfortable with, and able to accommodate to, the equipment.

Relevance:

20.00%

Publisher:

Abstract:

Privacy enhancing protocols (PEPs) are a family of protocols that allow secure exchange and management of sensitive user information. They are important in preserving users’ privacy in today’s open environment. Proof of the correctness of PEPs is necessary before they can be deployed. However, the traditional provable security approach, though well established for verifying cryptographic primitives, is not applicable to PEPs. We apply the formal method of Coloured Petri Nets (CPNs) to construct an executable specification of a representative PEP, namely the Private Information Escrow Bound to Multiple Conditions Protocol (PIEMCP). Formal semantics of the CPN specification allow us to reason about various security properties of PIEMCP using state space analysis techniques. This investigation provides us with preliminary insights for modeling and verification of PEPs in general, demonstrating the benefit of applying the CPN-based formal approach to proving the correctness of PEPs.

Relevance:

20.00%

Publisher:

Abstract:

This paper proposes the use of the Bayes Factor to replace the Bayesian Information Criterion (BIC) as a criterion for speaker clustering within a speaker diarization system. The BIC is one of the most popular decision criteria used in speaker diarization systems today. However, it will be shown in this paper that the BIC is only an approximation to the Bayes factor of marginal likelihoods of the data given each hypothesis. This paper uses the Bayes factor directly as a decision criterion for speaker clustering, thus removing the error introduced by the BIC approximation. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show an improved clustering performance, leading to a 14.7% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.
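For context, the conventional BIC-based merge decision that the abstract takes as its starting point can be sketched as follows. This is the commonly used delta-BIC with one full-covariance Gaussian per cluster, not the Bayes factor criterion proposed in the paper. The function names, the single-Gaussian modelling choice, and the `penalty_weight` parameter are assumptions made for illustration.

```python
import numpy as np

def n_log_det_cov(x):
    """Return N * log|Sigma| for a full-covariance Gaussian fitted to x (N x d)."""
    n, d = x.shape
    cov = np.cov(x, rowvar=False, bias=True).reshape(d, d)
    _, logdet = np.linalg.slogdet(cov)
    return n * logdet

def delta_bic(x, y, penalty_weight=1.0):
    """Conventional delta-BIC between two clusters of feature vectors.

    Positive values favour two separate speakers; negative values favour merging.
    """
    z = np.vstack([x, y])
    n, d = z.shape
    # Log-likelihood gain from modelling the clusters separately.
    gain = 0.5 * (n_log_det_cov(z) - n_log_det_cov(x) - n_log_det_cov(y))
    # Penalty for the extra mean vector and covariance matrix.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return gain - penalty

# Synthetic stand-ins for acoustic features (assumed data, not from the paper).
rng = np.random.default_rng(0)
same_a = rng.normal(0.0, 1.0, size=(300, 4))
same_b = rng.normal(0.0, 1.0, size=(300, 4))
other = rng.normal(3.0, 1.0, size=(300, 4))

score_same = delta_bic(same_a, same_b)   # expected to favour merging
score_diff = delta_bic(same_a, other)    # expected to favour separate speakers
```

The penalty term is where the BIC approximation enters, which is the part the abstract says is replaced by computing the Bayes factor directly.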

Relevance:

20.00%

Publisher:

Abstract:

Purpose: The classic study of Sumby and Pollack (1954, JASA, 26(2), 212-215) demonstrated that visual information aided speech intelligibility under noisy auditory conditions. Their work showed that visual information is especially useful under low signal-to-noise conditions where the auditory signal leaves greater margins for improvement. We investigated whether simulated cataracts interfered with the ability of participants to use visual cues to help disambiguate the auditory signal in the presence of auditory noise. Methods: Participants in the study were screened to ensure normal visual acuity (mean of 20/20) and normal hearing (auditory threshold ≤ 20 dB HL). Speech intelligibility was tested under an auditory-only condition and two visual conditions: normal vision and simulated cataracts. The light-scattering effects of cataracts were imitated using cataract-simulating filters. Participants wore blacked-out glasses in the auditory-only condition and lens-free frames in the normal auditory-visual condition. Individual sentences were spoken by a live speaker in the presence of prerecorded four-person background babble set to a speech-to-noise ratio (SNR) of -16 dB. The SNR was determined in a preliminary experiment to support 50% correct identification of sentences under the auditory-only condition. The speaker was trained to match the rate, intensity and inflections of a prerecorded audio track of everyday speech sentences. The speaker was blind to the visual conditions of the participant to control for bias. Participants’ speech intelligibility was measured by comparing the accuracy of their written account of what they believed the speaker to have said to the actual spoken sentence. Results: Relative to the normal vision condition, speech intelligibility was significantly poorer when participants wore simulated cataracts. Conclusions: The results suggest that cataracts may interfere with the acquisition of visual cues to speech perception.
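The meaning of a -16 dB speech-to-noise ratio can be illustrated with a short digital mixing sketch. The study itself used a live speaker against prerecorded babble, so this is only an illustration of the SNR definition. The function name `mix_at_snr` and the synthetic stand-in signals are assumptions.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that 10*log10(P_speech / P_noise) equals snr_db, then add it."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    scaled_noise = noise * np.sqrt(target_noise_power / noise_power)
    return speech + scaled_noise, scaled_noise

# Stand-in signals (assumed): a tone in place of speech, white noise in place of babble.
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
babble = rng.normal(size=16000)

mixture, scaled = mix_at_snr(speech, babble, snr_db=-16.0)
achieved_snr_db = 10 * np.log10(np.mean(speech ** 2) / np.mean(scaled ** 2))
```

At -16 dB the noise power is roughly 40 times the speech power, which is why the auditory signal alone supports only about 50% correct identification in the reported setup.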

Relevance:

20.00%

Publisher:

Abstract:

While close-talking microphones give the best signal quality and produce the highest accuracy from current Automatic Speech Recognition (ASR) systems, the speech signal enhanced by a microphone array has been shown to be an effective alternative in a noisy environment. The use of microphone arrays, in contrast to close-talking microphones, alleviates the feeling of discomfort and distraction to the user. For this reason, microphone arrays are popular and have been used in a wide range of applications such as teleconferencing, hearing aids, speaker tracking, and as the front-end to speech recognition systems. With advances in sensor and sensor network technology, there is considerable potential for applications that employ ad-hoc networks of microphone-equipped devices collaboratively as a virtual microphone array. By allowing such devices to be distributed throughout the users’ environment, the microphone positions are no longer constrained to traditional fixed geometrical arrangements. This flexibility in the means of data acquisition allows different audio scenes to be captured to give a complete picture of the working environment. In such ad-hoc deployments of microphone sensors, however, the lack of information about the location of devices and active speakers poses technical challenges for array signal processing algorithms, which must be addressed to allow deployment in real-world applications. While not an ad-hoc sensor network, conditions approaching this have in effect been imposed in recent National Institute of Standards and Technology (NIST) ASR evaluations on distant microphone recordings of meetings. The NIST evaluation data comes from multiple sites, each with different and often loosely specified distant microphone configurations. This research investigates how microphone array methods can be applied to ad-hoc microphone arrays.
A particular focus is on devising methods that are robust to unknown microphone placements in order to improve the overall speech quality and recognition performance provided by the beamforming algorithms. In ad-hoc situations, microphone positions and likely source locations are not known and beamforming must be achieved blindly. There are two general approaches that can be employed to blindly estimate the steering vector for beamforming. The first is direct estimation without regard to the microphone and source locations. The alternative is to first determine the unknown microphone positions through array calibration methods and then to use the traditional geometrical formulation for the steering vector. Following these two major approaches investigated in this thesis, a novel clustered approach is proposed, which clusters the microphones and selects the clusters based on their proximity to the speaker. Novel experiments are conducted to demonstrate that the proposed method of automatically selecting clusters of microphones (i.e., a subarray), closely located both to each other and to the desired speech source, may in fact provide more robust speech enhancement and recognition than the full array could.
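The "traditional geometrical formulation for the steering vector" mentioned above can be sketched for the case where microphone positions are known, for example after calibration. This is a minimal far-field, narrowband delay-and-sum sketch, not the thesis's blind estimation or clustering method. The function names, the example array geometry, and the speed-of-sound value are assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def steering_vector(mic_positions, source_direction, freq_hz):
    """Far-field steering vector for microphones at mic_positions (M x 3, metres)."""
    u = np.asarray(source_direction, dtype=float)
    u = u / np.linalg.norm(u)
    # A plane wave from direction u reaches microphone m earlier by (p_m . u) / c.
    delays = -(mic_positions @ u) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq_hz * delays)

def delay_and_sum_weights(mic_positions, look_direction, freq_hz):
    """Weights that phase-align the look direction and average the channels."""
    a = steering_vector(mic_positions, look_direction, freq_hz)
    return a / len(a)

def beam_response(weights, mic_positions, direction, freq_hz):
    """Complex array response w^H a(direction) at one frequency."""
    return np.vdot(weights, steering_vector(mic_positions, direction, freq_hz))

# Hypothetical 4-microphone line array with 5 cm spacing along the x axis.
mics = np.array([[0.00, 0.0, 0.0], [0.05, 0.0, 0.0], [0.10, 0.0, 0.0], [0.15, 0.0, 0.0]])
w = delay_and_sum_weights(mics, look_direction=[1.0, 0.0, 0.0], freq_hz=2000.0)

on_target = beam_response(w, mics, [1.0, 0.0, 0.0], 2000.0)
off_target = beam_response(w, mics, [0.0, 1.0, 0.0], 2000.0)
```

When positions are unknown, as in the ad-hoc case, the `delays` term cannot be computed from geometry, which is what motivates the blind estimation approaches described in the abstract.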

Relevance:

20.00%

Publisher:

Abstract:

For several reasons, the Fourier phase domain is less favored than the magnitude domain in signal processing and modeling of speech. To correctly analyze the phase, several factors must be considered and compensated for, including the effect of the step size, windowing function and other processing parameters. Building on a review of these factors, this paper investigates a spectral representation based on the Instantaneous Frequency Deviation, but in which the step size between processing frames is used in calculating phase changes, rather than the traditional single sample interval. Reflecting these longer intervals, the term delta-phase spectrum is used to distinguish this from instantaneous derivatives. Experiments show that mel-frequency cepstral coefficient features derived from the delta-phase spectrum (termed Mel-Frequency delta-phase features) can produce broadly similar performance to equivalent magnitude domain features for both voice activity detection and speaker recognition tasks. Further, it is shown that the fusion of the magnitude and phase representations yields performance benefits over either in isolation.
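One way to picture a phase change measured across the frame step, rather than across a single sample, is the following sketch. It computes the frame-to-frame phase difference in each frequency bin and subtracts the advance expected for that bin's centre frequency over one hop. This is a generic formulation in the style of a phase vocoder and may differ from the paper's exact definition. The function name, window choice, and frame parameters are assumptions.

```python
import numpy as np

def delta_phase_spectrum(signal, frame_len=512, hop=128):
    """Frame-to-frame phase change per bin, minus the advance expected at each bin centre."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    phase = np.angle(np.fft.rfft(frames, axis=1))
    bins = np.arange(frame_len // 2 + 1)
    # Phase advance of a sinusoid at the bin centre over one hop of `hop` samples.
    expected_advance = 2 * np.pi * bins * hop / frame_len
    deviation = np.diff(phase, axis=0) - expected_advance
    # Wrap the deviation into [-pi, pi).
    return (deviation + np.pi) % (2 * np.pi) - np.pi

# Test tone placed exactly on bin 32 (1000 Hz at a 16 kHz sampling rate).
fs = 16000
tone = np.sin(2 * np.pi * 1000.0 * np.arange(4096) / fs)
dps = delta_phase_spectrum(tone)
```

For a tone that sits exactly on a bin centre, the deviation in that bin is close to zero; a tone slightly off centre produces a deviation proportional to its frequency offset and to the hop size.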

Relevance:

20.00%

Publisher:

Abstract:

This paper proposes the use of the Bayes Factor as a distance metric for speaker segmentation within a speaker diarization system. The proposed approach uses a pair of constant-sized sliding windows to compute the value of the Bayes Factor between the adjacent windows over the entire audio. Results obtained on the 2002 Rich Transcription Evaluation dataset show an improved segmentation performance compared to previous approaches reported in the literature using the Generalized Likelihood Ratio. When applied in a speaker diarization system, this approach results in a 5.1% relative improvement in the overall Diarization Error Rate compared to the baseline.
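The sliding-window procedure described above can be sketched with the Generalized Likelihood Ratio as the distance, since that is the earlier approach the abstract compares against. The Bayes Factor itself is not implemented here. Diagonal-covariance Gaussians, the window length, the function names, and the synthetic features are all assumptions.

```python
import numpy as np

def glr_distance(left, right):
    """Generalized likelihood ratio between two windows under diagonal Gaussians."""
    def cost(x):
        # 0.5 * N * sum over dimensions of log variance (constant terms cancel).
        return 0.5 * x.shape[0] * np.sum(np.log(np.var(x, axis=0)))
    both = np.vstack([left, right])
    return cost(both) - cost(left) - cost(right)

def sliding_window_distances(features, window=100):
    """Distance between the two adjacent windows around every candidate boundary."""
    boundaries = np.arange(window, features.shape[0] - window + 1)
    scores = np.array(
        [glr_distance(features[t - window : t], features[t : t + window]) for t in boundaries]
    )
    return boundaries, scores

# Synthetic feature stream with a speaker change at frame 500 (assumed data).
rng = np.random.default_rng(0)
features = np.vstack(
    [rng.normal(0.0, 1.0, size=(500, 3)), rng.normal(5.0, 1.0, size=(400, 3))]
)

boundaries, scores = sliding_window_distances(features)
detected_change = boundaries[np.argmax(scores)]
```

In a real system, peaks in `scores` above a threshold would be taken as candidate speaker change points; swapping `glr_distance` for a Bayes Factor computation is the substitution the abstract proposes.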