819 results for Classification error rate
Abstract:
Parallel combinatory orthogonal frequency division multiplexing (PC-OFDM) yields a lower maximum peak-to-average power ratio (PAR), high bandwidth efficiency and a lower bit error rate (BER) on Gaussian channels compared to OFDM systems. However, PC-OFDM does not improve the statistics of PAR significantly. In this chapter, the use of a set of fixed permutations to improve the statistics of the PAR of a PC-OFDM signal is presented. For this technique, interleavers are used to produce K-1 permuted sequences from the same information sequence. The sequence with the lowest PAR among the K sequences is chosen for transmission. The PAR of a PC-OFDM signal can be further reduced by 3-4 dB with this technique. Mathematical expressions for the complementary cumulative density function (CCDF) of the PAR of the PC-OFDM signal and the interleaved PC-OFDM signal are also presented.
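The selection step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: it uses plain OFDM with QPSK symbols rather than PC-OFDM, it permutes subcarrier symbols instead of the information sequence, and the function names `papr_db` and `select_lowest_papr` are hypothetical. Signalling the chosen permutation to the receiver is not shown.

```python
import cmath
import math
import random


def papr_db(freq_symbols):
    """PAR in dB of the time-domain signal obtained by a naive inverse DFT."""
    n = len(freq_symbols)
    powers = []
    for t in range(n):
        sample = sum(
            s * cmath.exp(2j * cmath.pi * k * t / n)
            for k, s in enumerate(freq_symbols)
        ) / n
        powers.append(abs(sample) ** 2)
    mean_power = sum(powers) / n
    return 10 * math.log10(max(powers) / mean_power)


def select_lowest_papr(symbols, k_candidates, seed=0):
    """Return (index of the best candidate, list of PAR values) over K candidates."""
    rng = random.Random(seed)  # fixed seed, so the permutation set is fixed (assumption)
    n = len(symbols)
    permutations = [list(range(n))]  # candidate 0 is the unpermuted sequence
    for _ in range(k_candidates - 1):
        perm = list(range(n))
        rng.shuffle(perm)
        permutations.append(perm)
    candidates = [[symbols[i] for i in perm] for perm in permutations]
    paprs = [papr_db(c) for c in candidates]
    best = min(range(k_candidates), key=lambda i: paprs[i])
    return best, paprs


# Usage: 64 random QPSK symbols, K = 4 candidate sequences.
rng = random.Random(1)
qpsk = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(64)]
best, paprs = select_lowest_papr(qpsk, k_candidates=4)
print(best, [round(p, 2) for p in paprs])
```

Because the unpermuted sequence is always one of the K candidates, the selected PAR can never be worse than that of the original signal.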
Abstract:
In this paper, we present a microphone array beamforming approach to blind speech separation. Unlike previous beamforming approaches, our system does not require a priori knowledge of the microphone placement and speaker location, making the system directly comparable to other blind source separation methods which require no prior knowledge of recording conditions. Microphone location is automatically estimated using an assumed noise field model, and speaker locations are estimated using cross-correlation-based methods. The system is evaluated on the data provided for the PASCAL Speech Separation Challenge 2 (SSC2), achieving a word error rate of 58% on the evaluation set.
Abstract:
This paper presents a method of voice activity detection (VAD) for high-noise scenarios, using a noise-robust voiced speech detection feature. The developed method is based on the fusion of two systems. The first system utilises the maximum peak of the normalised time-domain autocorrelation function (MaxPeak). The second system uses a novel combination of cross-correlation and zero-crossing rate of the normalised autocorrelation to approximate a measure of signal pitch and periodicity (CrossCorr) that is hypothesised to be noise robust. The scores output by the two systems are then merged using weighted sum fusion to create the proposed autocorrelation zero-crossing rate (AZR) VAD. The accuracy of AZR was compared to state-of-the-art and standardised VAD methods, and AZR was shown to outperform the best performing system with an average relative improvement of 24.8% in half-total error rate (HTER) on the QUT-NOISE-TIMIT database, created using real recordings from high-noise environments.
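Weighted sum fusion and the HTER metric mentioned in this abstract can be illustrated with a short sketch. It assumes the usual definition of HTER as the mean of the miss rate and the false alarm rate; the scores, the weight and the threshold are made-up values, and the function names are hypothetical rather than taken from the paper.

```python
def fuse_scores(scores_a, scores_b, weight_a=0.5):
    """Weighted sum fusion of two per-frame score streams."""
    return [weight_a * a + (1 - weight_a) * b for a, b in zip(scores_a, scores_b)]


def half_total_error_rate(decisions, labels):
    """HTER = (miss rate + false alarm rate) / 2 for binary speech/non-speech frames."""
    misses = sum(1 for d, l in zip(decisions, labels) if l and not d)
    false_alarms = sum(1 for d, l in zip(decisions, labels) if d and not l)
    n_speech = sum(1 for l in labels if l)
    n_nonspeech = len(labels) - n_speech
    return 0.5 * (misses / n_speech + false_alarms / n_nonspeech)


def relative_improvement(baseline_error, new_error):
    """Fraction by which the error rate drops relative to the baseline."""
    return (baseline_error - new_error) / baseline_error


# Usage with illustrative (made-up) scores for four frames.
fused = fuse_scores([0.9, 0.2, 0.8, 0.1], [0.7, 0.6, 0.4, 0.3], weight_a=0.5)
decisions = [s > 0.5 for s in fused]  # threshold of 0.5 is an assumption
labels = [True, True, True, False]  # ground truth: speech in the first three frames
print(half_total_error_rate(decisions, labels))
```

Under this definition, a 24.8% relative improvement means the new HTER is 0.752 times the HTER of the system it is compared with.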
Abstract:
This paper proposes the use of the Bayes Factor as a distance metric for speaker segmentation within a speaker diarization system. The proposed approach uses a pair of constant-sized sliding windows to compute the value of the Bayes Factor between the adjacent windows over the entire audio. Results obtained on the 2002 Rich Transcription Evaluation dataset show an improved segmentation performance compared to previous approaches reported in the literature using the Generalized Likelihood Ratio. When applied in a speaker diarization system, this approach results in a 5.1% relative improvement in the overall Diarization Error Rate compared to the baseline.
Abstract:
This paper proposes the use of eigenvoice modeling techniques with the Cross Likelihood Ratio (CLR) as a criterion for speaker clustering within a speaker diarization system. The CLR has previously been shown to be a robust decision criterion for speaker clustering using Gaussian Mixture Models. Recently, eigenvoice modeling techniques have become increasingly popular, due to their ability to adequately represent a speaker based on sparse training data, as well as their improved capture of differences in speaker characteristics. This paper hence proposes that it would be beneficial to capitalize on the advantages of eigenvoice modeling in a CLR framework. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show an improved clustering performance, resulting in a 35.1% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.
Abstract:
The Commonwealth Scientific and Industrial Research Organization (CSIRO) has recently conducted a technology demonstration of a novel fixed wireless broadband access system in rural Australia. The system is based on multi-user multiple-input multiple-output orthogonal frequency division multiplexing (MU-MIMO-OFDM). It demonstrated an uplink of six simultaneous users with distances ranging from 10 m to 8.5 km from a central tower, achieving 20 bits/s/Hz spectrum efficiency. This paper reports on the analysis of channel capacity and bit error probability simulation based on the measured MU-MIMO-OFDM channels obtained during the demonstration, and their comparison with the results based on channels simulated by a novel geometric-optics-based channel model suitable for MU-MIMO-OFDM in rural areas. Despite its simplicity, the model was found to predict channel capacity and bit error probability accurately for a typical MU-MIMO-OFDM deployment scenario.
Abstract:
Statistical dependence between classifier decisions is often shown to improve performance over statistically independent decisions. Though the solution for favourable dependence between two classifier decisions has been derived, the theoretical analysis for the general case of 'n' client and impostor decision fusion has not been presented before. This paper presents the expressions developed for favourable dependence of multi-instance and multi-sample fusion schemes that employ 'AND' and 'OR' rules. The expressions are experimentally evaluated by considering the proposed architecture for text-dependent speaker verification using HMM-based digit-dependent speaker models. The improvement in fusion performance is found to be higher when digit combinations with favourable client and impostor decisions are used for speaker verification. The total error rate of 20% for fusion of independent decisions is reduced to 2.1% for fusion of decisions that are favourable for both clients and impostors. The expressions developed here are also applicable to other biometric modalities, such as fingerprints and handwriting samples, for reliable identity verification.
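For reference, the independent-decision baseline that this abstract compares against can be sketched as below. The sketch covers only statistically independent decisions, using standard probability rules; the paper's expressions for favourable dependence are not reproduced here. The function names and the example error rates are hypothetical.

```python
import math


def and_rule(false_accept_rates, false_reject_rates):
    """Accept only if every decision accepts (independent decisions assumed)."""
    far = math.prod(false_accept_rates)  # an impostor must fool all n decisions
    frr = 1 - math.prod(1 - r for r in false_reject_rates)  # one rejection is enough
    return far, frr


def or_rule(false_accept_rates, false_reject_rates):
    """Accept if any decision accepts (independent decisions assumed)."""
    far = 1 - math.prod(1 - f for f in false_accept_rates)  # one acceptance is enough
    frr = math.prod(false_reject_rates)  # a client must be rejected by all n decisions
    return far, frr


# Usage: two decisions, each with 10% false acceptance and 20% false rejection (made-up values).
print(and_rule([0.1, 0.1], [0.2, 0.2]))
print(or_rule([0.1, 0.1], [0.2, 0.2]))
```

The 'AND' rule trades a lower false acceptance rate for a higher false rejection rate, and the 'OR' rule does the opposite, which is why the combination of decisions matters for the total error rate.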
Abstract:
This paper proposes the use of Bayesian approaches with the cross likelihood ratio (CLR) as a criterion for speaker clustering within a speaker diarization system, using eigenvoice modeling techniques. The CLR has previously been shown to be an effective decision criterion for speaker clustering using Gaussian mixture models. Recently, eigenvoice modeling has become an increasingly popular technique, due to its ability to adequately represent a speaker based on sparse training data, as well as to provide an improved capture of differences in speaker characteristics. The integration of eigenvoice modeling into the CLR framework to capitalize on the advantages of both techniques has also been shown to be beneficial for the speaker clustering task. Building on that success, this paper proposes the use of Bayesian methods to compute the conditional probabilities used in the CLR, thus effectively combining the eigenvoice-CLR framework with the advantages of a Bayesian approach to the diarization problem. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show an improved clustering performance, resulting in a 33.5% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.
Abstract:
Speaker diarization determines instances of the same speaker within a recording. Extending this task to a collection of recordings for linking together segments spoken by a unique speaker requires speaker linking. In this paper we propose a speaker linking system using linkage clustering and state-of-the-art speaker recognition techniques. We evaluate our approach against two baseline linking systems using agglomerative cluster merging (AC) and agglomerative clustering with model retraining (ACR). We demonstrate that our linking method, using complete-linkage clustering, provides a relative improvement of 20% and 29% in attribution error rate (AER), over the AC and ACR systems, respectively.
Speaker attribution of multiple telephone conversations using a complete-linkage clustering approach
Abstract:
In this paper we propose and evaluate a speaker attribution system using a complete-linkage clustering method. Speaker attribution refers to the annotation of a collection of spoken audio based on speaker identities. This can be achieved using diarization and speaker linking. The main challenge associated with attribution is achieving computational efficiency when dealing with large audio archives. Traditional agglomerative clustering methods with model merging and retraining are not feasible for this purpose. This has motivated the use of linkage clustering methods without retraining. We first propose a diarization system using complete-linkage clustering and show that it outperforms traditional agglomerative and single-linkage clustering based diarization systems with a relative improvement of 40% and 68%, respectively. We then propose a complete-linkage speaker linking system to achieve attribution and demonstrate a 26% relative improvement in attribution error rate (AER) over the single-linkage speaker linking approach.
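Complete-linkage clustering without model retraining, as referred to in this abstract, can be sketched as follows. This is a minimal illustration that assumes a precomputed symmetric matrix of pairwise distances between speaker segments and a stopping threshold; how those distances are obtained from speaker models is not shown, and the function name and values are hypothetical.

```python
def complete_linkage(distances, threshold):
    """Agglomerative clustering where cluster distance is the largest pairwise distance."""
    clusters = [[i] for i in range(len(distances))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: the farthest pair of members defines the distance.
                d = max(distances[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break  # no remaining pair of clusters is close enough to merge
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Usage: segments 0 and 1 are close, segments 2 and 3 are close (made-up distances).
distances = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]
print(complete_linkage(distances, threshold=0.5))  # [[0, 1], [2, 3]]
```

Only the original pairwise distances are reused at every merge, so no speaker model has to be merged or retrained, which is the property that makes linkage methods attractive for large archives.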
Abstract:
Speaker diarization is the process of annotating an input audio with information that attributes temporal regions of the audio signal to their respective sources, which may include both speech and non-speech events. For speech regions, the diarization system also specifies the locations of speaker boundaries and assigns relative speaker labels to each homogeneous segment of speech. In short, speaker diarization systems effectively answer the question of ‘who spoke when’. There are several important applications for speaker diarization technology, such as facilitating speaker indexing systems to allow users to directly access the relevant segments of interest within a given audio, and assisting with other downstream processes such as summarizing and parsing. When combined with automatic speech recognition (ASR) systems, the metadata extracted from a speaker diarization system can provide complementary information for ASR transcripts including the location of speaker turns and relative speaker segment labels, making the transcripts more readable. Speaker diarization output can also be used to localize the instances of specific speakers to pool data for model adaptation, which in turn boosts transcription accuracies. Speaker diarization therefore plays an important role as a preliminary step in automatic transcription of audio data. The aim of this work is to improve the usefulness and practicality of speaker diarization technology, through the reduction of diarization error rates. In particular, this research is focused on the segmentation and clustering stages within a diarization system. Although particular emphasis is placed on the broadcast news audio domain and systems developed throughout this work are also trained and tested on broadcast news data, the techniques proposed in this dissertation are also applicable to other domains including telephone conversations and meeting audio.
Three main research themes were pursued: heuristic rules for speaker segmentation, modelling uncertainty in speaker model estimates, and modelling uncertainty in eigenvoice speaker modelling. The use of heuristic approaches for the speaker segmentation task was first investigated, with emphasis placed on minimizing missed boundary detections. A set of heuristic rules was proposed to govern the detection and heuristic selection of candidate speaker segment boundaries. A second pass, using the same heuristic algorithm with a smaller window, was also proposed with the aim of improving detection of boundaries around short speaker segments. Compared to single-threshold-based methods, the proposed heuristic approach was shown to provide improved segmentation performance, leading to a reduction in the overall diarization error rate. Methods to model the uncertainty in speaker model estimates were developed to address the difficulties associated with making segmentation and clustering decisions with limited data in the speaker segments. The Bayes factor, derived specifically for multivariate Gaussian speaker modelling, was introduced to account for the uncertainty of the speaker model estimates. The use of the Bayes factor also enabled the incorporation of prior information regarding the audio to aid segmentation and clustering decisions. The idea of modelling uncertainty in speaker model estimates was also extended to the eigenvoice speaker modelling framework for the speaker clustering task. Building on the application of Bayesian approaches to the speaker diarization problem, the proposed approach takes into account the uncertainty associated with the explicit estimation of the speaker factors. The proposed decision criteria, based on Bayesian theory, were shown to generally outperform their non-Bayesian counterparts.
Abstract:
In cooperative communication systems, several wireless communication terminals collaborate to form a virtual multiple-antenna array system and exploit spatial diversity to achieve better performance. This thesis proposes a practical slotted protocol for cooperative communication systems with half-duplex single antennas. The performance of the proposed slotted cooperative communication protocol is evaluated in terms of the pairwise error probability and the bit error rate. The proposed protocol achieves the multiple-input single-output performance bound with a novel relay ordering and scheduling strategy.
Abstract:
Neutrophils serve as an intriguing model for the study of innate immune cellular activity induced by physiological stress. We measured changes in the transcriptome of circulating neutrophils following an experimental exercise trial (EXTRI) consisting of 1 h of intense cycling immediately followed by 1 h of intense running. Blood samples were taken at baseline, 3 h, 48 h, and 96 h post-EXTRI from eight healthy, endurance-trained, male subjects. RNA was extracted from isolated neutrophils. Differential gene expression was evaluated using Illumina microarrays and validated with quantitative PCR. Gene set enrichment analysis identified enriched molecular signatures chosen from the Molecular Signatures Database. Blood concentrations of muscle damage indexes, neutrophils, interleukin (IL)-6 and IL-10 were increased (P < 0.05) 3 h post-EXTRI. Upregulated groups of functionally related genes 3 h post-EXTRI included gene sets associated with the recognition of tissue damage, the IL-1 receptor, and Toll-like receptor (TLR) pathways (familywise error rate, P value < 0.05). The core enrichment for these pathways included TLRs, low-affinity immunoglobulin receptors, S100 calcium binding protein A12, and negative regulators of innate immunity, e.g., IL-1 receptor antagonist, and IL-1 receptor associated kinase-3. Plasma myoglobin changes correlated with neutrophil TLR4 gene expression (r = 0.74; P < 0.05). Neutrophils had returned to their nonactivated state 48 h post-EXTRI, indicating that their initial proinflammatory response was transient and rapidly counterregulated. This study provides novel insight into the signaling mechanisms underlying the neutrophil responses to endurance exercise, suggesting that their transcriptional activity was particularly induced by damage-associated molecule patterns, hypothetically originating from the leakage of muscle components into the circulation.
Abstract:
Iris-based identity verification is highly reliable, but it can also be subject to attacks. Pupil dilation or constriction stimulated by the application of drugs are examples of sample presentation security attacks which can lead to higher false rejection rates. Suspects on a watch list can potentially circumvent the iris-based system using such methods. This paper investigates a new approach using multiple parts of the iris (instances) and multiple iris samples in a sequential decision fusion framework that can yield robust performance. Results are presented and compared with the standard full iris-based approach for a number of iris degradations. An advantage of the proposed fusion scheme is that the trade-off between detection errors can be controlled by setting parameters such as the number of instances and the number of samples used in the system. The system can then be operated to match security threat levels. It is shown that for optimal values of these parameters, the fused system also has a lower total error rate.
Abstract:
In this paper we analyse the effects of highway traffic flow parameters, such as vehicle arrival rate and density, on the performance of Amplify and Forward (AF) cooperative vehicular networks along a multi-lane highway under the free flow state. We derive analytical expressions for connectivity performance and verify them with Monte Carlo simulations. When AF cooperative relaying is employed together with Maximum Ratio Combining (MRC) at the receivers, the average route error rate shows a 10-20 fold improvement compared to direct communication. A 4-8 fold increase in the maximum number of traversable hops can also be observed at different vehicle densities when AF cooperative communication is used to strengthen communication routes. However, the theoretical upper bound on the maximum number of hops promises higher performance gains.