188 resultados para Bit error rate
Resumo:
This paper proposes the use of eigenvoice modeling techniques with the Cross Likelihood Ratio (CLR) as a criterion for speaker clustering within a speaker diarization system. The CLR has previously been shown to be a robust decision criterion for speaker clustering using Gaussian Mixture Models. Recently, eigenvoice modeling techniques have become increasingly popular, due to its ability to adequately represent a speaker based on sparse training data, as well as an improved capture of differences in speaker characteristics. This paper hence proposes that it would be beneficial to capitalize on the advantages of eigenvoice modeling in a CLR framework. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show an improved clustering performance, resulting in a 35.1% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.
Resumo:
Statistical dependence between classifier decisions is often shown to improve performance over statistically independent decisions. Though the solution for favourable dependence between two classifier decisions has been derived, the theoretical analysis for the general case of 'n' client and impostor decision fusion has not been presented before. This paper presents the expressions developed for favourable dependence of multi-instance and multi-sample fusion schemes that employ 'AND' and 'OR' rules. The expressions are experimentally evaluated by considering the proposed architecture for text-dependent speaker verification using HMM based digit dependent speaker models. The improvement in fusion performance is found to be higher when digit combinations with favourable client and impostor decisions are used for speaker verification. The total error rate of 20% for fusion of independent decisions is reduced to 2.1% for fusion of decisions that are favourable for both client and impostors. The expressions developed here are also applicable to other biometric modalities, such as finger prints and handwriting samples, for reliable identity verification.
Resumo:
This paper proposes the use of Bayesian approaches with the cross likelihood ratio (CLR) as a criterion for speaker clustering within a speaker diarization system, using eigenvoice modeling techniques. The CLR has previously been shown to be an effective decision criterion for speaker clustering using Gaussian mixture models. Recently, eigenvoice modeling has become an increasingly popular technique, due to its ability to adequately represent a speaker based on sparse training data, as well as to provide an improved capture of differences in speaker characteristics. The integration of eigenvoice modeling into the CLR framework to capitalize on the advantage of both techniques has also been shown to be beneficial for the speaker clustering task. Building on that success, this paper proposes the use of Bayesian methods to compute the conditional probabilities in computing the CLR, thus effectively combining the eigenvoice-CLR framework with the advantages of a Bayesian approach to the diarization problem. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show an improved clustering performance, resulting in a 33.5% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.
Resumo:
Speaker diarization determines instances of the same speaker within a recording. Extending this task to a collection of recordings for linking together segments spoken by a unique speaker requires speaker linking. In this paper we propose a speaker linking system using linkage clustering and state-of-the-art speaker recognition techniques. We evaluate our approach against two baseline linking systems using agglomerative cluster merging (AC) and agglomerative clustering with model retraining (ACR). We demonstrate that our linking method, using complete-linkage clustering, provides a relative improvement of 20% and 29% in attribution error rate (AER), over the AC and ACR systems, respectively.
Speaker attribution of multiple telephone conversations using a complete-linkage clustering approach
Resumo:
In this paper we propose and evaluate a speaker attribution system using a complete-linkage clustering method. Speaker attribution refers to the annotation of a collection of spoken audio based on speaker identities. This can be achieved using diarization and speaker linking. The main challenge associated with attribution is achieving computational efficiency when dealing with large audio archives. Traditional agglomerative clustering methods with model merging and retraining are not feasible for this purpose. This has motivated the use of linkage clustering methods without retraining. We first propose a diarization system using complete-linkage clustering and show that it outperforms traditional agglomerative and single-linkage clustering based diarization systems with a relative improvement of 40% and 68%, respectively. We then propose a complete-linkage speaker linking system to achieve attribution and demonstrate a 26% relative improvement in attribution error rate (AER) over the single-linkage speaker linking approach.
Resumo:
Speaker diarization is the process of annotating an input audio with information that attributes temporal regions of the audio signal to their respective sources, which may include both speech and non-speech events. For speech regions, the diarization system also specifies the locations of speaker boundaries and assign relative speaker labels to each homogeneous segment of speech. In short, speaker diarization systems effectively answer the question of ‘who spoke when’. There are several important applications for speaker diarization technology, such as facilitating speaker indexing systems to allow users to directly access the relevant segments of interest within a given audio, and assisting with other downstream processes such as summarizing and parsing. When combined with automatic speech recognition (ASR) systems, the metadata extracted from a speaker diarization system can provide complementary information for ASR transcripts including the location of speaker turns and relative speaker segment labels, making the transcripts more readable. Speaker diarization output can also be used to localize the instances of specific speakers to pool data for model adaptation, which in turn boosts transcription accuracies. Speaker diarization therefore plays an important role as a preliminary step in automatic transcription of audio data. The aim of this work is to improve the usefulness and practicality of speaker diarization technology, through the reduction of diarization error rates. In particular, this research is focused on the segmentation and clustering stages within a diarization system. Although particular emphasis is placed on the broadcast news audio domain and systems developed throughout this work are also trained and tested on broadcast news data, the techniques proposed in this dissertation are also applicable to other domains including telephone conversations and meetings audio. Three main research themes were pursued: heuristic rules for speaker segmentation, modelling uncertainty in speaker model estimates, and modelling uncertainty in eigenvoice speaker modelling. The use of heuristic approaches for the speaker segmentation task was first investigated, with emphasis placed on minimizing missed boundary detections. A set of heuristic rules was proposed, to govern the detection and heuristic selection of candidate speaker segment boundaries. A second pass, using the same heuristic algorithm with a smaller window, was also proposed with the aim of improving detection of boundaries around short speaker segments. Compared to single threshold based methods, the proposed heuristic approach was shown to provide improved segmentation performance, leading to a reduction in the overall diarization error rate. Methods to model the uncertainty in speaker model estimates were developed, to address the difficulties associated with making segmentation and clustering decisions with limited data in the speaker segments. The Bayes factor, derived specifically for multivariate Gaussian speaker modelling, was introduced to account for the uncertainty of the speaker model estimates. The use of the Bayes factor also enabled the incorporation of prior information regarding the audio to aid segmentation and clustering decisions. The idea of modelling uncertainty in speaker model estimates was also extended to the eigenvoice speaker modelling framework for the speaker clustering task. Building on the application of Bayesian approaches to the speaker diarization problem, the proposed approach takes into account the uncertainty associated with the explicit estimation of the speaker factors. The proposed decision criteria, based on Bayesian theory, was shown to generally outperform their non- Bayesian counterparts.
Resumo:
Neutrophils serve as an intriguing model for the study of innate immune cellular activity induced by physiological stress. We measured changes in the transcriptome of circulating neutrophils following an experimental exercise trial (EXTRI) consisting of 1 h of intense cycling immediately followed by 1 h of intense running. Blood samples were taken at baseline, 3 h, 48 h, and 96 h post-EXTRI from eight healthy, endurance-trained, male subjects. RNA was extracted from isolated neutrophils. Differential gene expression was evaluated using Illumina microarrays and validated with quantitative PCR. Gene set enrichment analysis identified enriched molecular signatures chosen from the Molecular Signatures Database. Blood concentrations of muscle damage indexes, neutrophils, interleukin (IL)-6 and IL-10 were increased (P < 0.05) 3 h post-EXTRI. Upregulated groups of functionally related genes 3 h post-EXTRI included gene sets associated with the recognition of tissue damage, the IL-1 receptor, and Toll-like receptor (TLR) pathways (familywise error rate, P value < 0.05). The core enrichment for these pathways included TLRs, low-affinity immunoglobulin receptors, S100 calcium binding protein A12, and negative regulators of innate immunity, e.g., IL-1 receptor antagonist, and IL-1 receptor associated kinase-3. Plasma myoglobin changes correlated with neutrophil TLR4 gene expression (r = 0.74; P < 0.05). Neutrophils had returned to their nonactivated state 48 h post-EXTRI, indicating that their initial proinflammatory response was transient and rapidly counterregulated. This study provides novel insight into the signaling mechanisms underlying the neutrophil responses to endurance exercise, suggesting that their transcriptional activity was particularly induced by damage-associated molecule patterns, hypothetically originating from the leakage of muscle components into the circulation.
Resumo:
Iris based identity verification is highly reliable but it can also be subject to attacks. Pupil dilation or constriction stimulated by the application of drugs are examples of sample presentation security attacks which can lead to higher false rejection rates. Suspects on a watch list can potentially circumvent the iris based system using such methods. This paper investigates a new approach using multiple parts of the iris (instances) and multiple iris samples in a sequential decision fusion framework that can yield robust performance. Results are presented and compared with the standard full iris based approach for a number of iris degradations. An advantage of the proposed fusion scheme is that the trade-off between detection errors can be controlled by setting parameters such as the number of instances and the number of samples used in the system. The system can then be operated to match security threat levels. It is shown that for optimal values of these parameters, the fused system also has a lower total error rate.
Resumo:
In this paper we analyse the effects of highway traffic flow parameters like vehicle arrival rate and density on the performance of Amplify and Forward (AF) cooperative vehicular networks along a multi-lane highway under free flow state. We derive analytical expressions for connectivity performance and verify them with Monte-Carlo simulations. When AF cooperative relaying is employed together with Maximum Ratio Combining (MRC) at the receivers the average route error rate shows 10-20 fold improvement compared to direct communication. A 4-8 fold increase in maximum number of traversable hops can also be observed at different vehicle densities when AF cooperative communication is used to strengthen communication routes. However the theorical upper bound of maximum number of hops promises higher performance gains.
Resumo:
Speaker attribution is the task of annotating a spoken audio archive based on speaker identities. This can be achieved using speaker diarization and speaker linking. In our previous work, we proposed an efficient attribution system, using complete-linkage clustering, for conducting attribution of large sets of two-speaker telephone data. In this paper, we build on our proposed approach to achieve a robust system, applicable to multiple recording domains. To do this, we first extend the diarization module of our system to accommodate multi-speaker (>2) recordings. We achieve this through using a robust cross-likelihood ratio (CLR) threshold stopping criterion for clustering, as opposed to the original stopping criterion of two speakers used for telephone data. We evaluate this baseline diarization module across a dataset of Australian broadcast news recordings, showing a significant lack of diarization accuracy without previous knowledge of the true number of speakers within a recording. We thus propose applying an additional pass of complete-linkage clustering to the diarization module, demonstrating an absolute improvement of 20% in diarization error rate (DER). We then evaluate our proposed multi-domain attribution system across the broadcast news data, demonstrating achievable attribution error rates (AER) as low as 17%.
Resumo:
Re-programming of gene expression is fundamental for skeletal muscle adaptations in response to endurance exercise. This study investigated the time-course dependent changes in the muscular transcriptome following an endurance exercise trial consisting of 1 h of intense cycling immediately followed by 1 h of intense running. Skeletal muscle samples were taken at baseline, 3 h, 48 h, and 96 h post-exercise from eight healthy, endurance-trained, male individuals. RNA was extracted from muscle. Differential gene expression was evaluated using Illumina microarrays and validated with qPCR. Gene set enrichment analysis identified enriched molecular signatures chosen from the Molecular Signatures Database. Three h post-exercise, 102 gene sets were up-regulated [family wise error rate (FWER), P < 0.05]; including groups of genes related with leukocyte migration, immune and chaperone activation, and cyclic AMP responsive element binding protein (CREB) 1-signaling. Forty-eight h post-exercise, among 19 enriched gene sets (FWER, P < 0.05), two gene sets related to actin cytoskeleton remodeling were up-regulated. Ninety-six h post-exercise, 83 gene sets were enriched (FWER, P < 0.05), 80 of which were up-regulated; including gene groups related to chemokine signaling, cell stress management, and extracellular matrix remodeling. These data provide comprehensive insights into the molecular pathways involved in acute stress, recovery, and adaptive muscular responses to endurance exercise. The novel 96 h post-exercise transcriptome indicates substantial transcriptional activity, potentially associated with the prolonged presence of leukocytes in the muscles. This suggests that muscular recovery, from a transcriptional perspective, is incomplete 96 h after endurance exercise involving muscle damage.
Resumo:
Fusion techniques can be used in biometrics to achieve higher accuracy. When biometric systems are in operation and the threat level changes, controlling the trade-off between detection error rates can reduce the impact of an attack. In a fused system, varying a single threshold does not allow this to be achieved, but systematic adjustment of a set of parameters does. In this paper, fused decisions from a multi-part, multi-sample sequential architecture are investigated for that purpose in an iris recognition system. A specific implementation of the multi-part architecture is proposed and the effect of the number of parts and samples in the resultant detection error rate is analysed. The effectiveness of the proposed architecture is then evaluated under two specific cases of obfuscation attack: miosis and mydriasis. Results show that robustness to such obfuscation attacks is achieved, since lower error rates than in the case of the non-fused base system are obtained.
Resumo:
In this paper we propose a novel scheme for carrying out speaker diarization in an iterative manner. We aim to show that the information obtained through the first pass of speaker diarization can be reused to refine and improve the original diarization results. We call this technique speaker rediarization and demonstrate the practical application of our rediarization algorithm using a large archive of two-speaker telephone conversation recordings. We use the NIST 2008 SRE summed telephone corpora for evaluating our speaker rediarization system. This corpus contains recurring speaker identities across independent recording sessions that need to be linked across the entire corpus. We show that our speaker rediarization scheme can take advantage of inter-session speaker information, linked in the initial diarization pass, to achieve a 30% relative improvement over the original diarization error rate (DER) after only two iterations of rediarization.
Resumo:
In this paper we present a novel scheme for improving speaker diarization by making use of repeating speakers across multiple recordings within a large corpus. We call this technique speaker re-diarization and demonstrate that it is possible to reuse the initial speaker-linked diarization outputs to boost diarization accuracy within individual recordings. We first propose and evaluate two novel re-diarization techniques. We demonstrate their complementary characteristics and fuse the two techniques to successfully conduct speaker re-diarization across the SAIVT-BNEWS corpus of Australian broadcast data. This corpus contains recurring speakers in various independent recordings that need to be linked across the dataset. We show that our speaker re-diarization approach can provide a relative improvement of 23% in diarization error rate (DER), over the original diarization results, as well as improve the estimated number of speakers and the cluster purity and coverage metrics.
Resumo:
We present a novel method for improving hierarchical speaker clustering in the tasks of speaker diarization and speaker linking. In hierarchical clustering, a tree can be formed that demonstrates various levels of clustering. We propose a ratio that expresses the impact of each cluster on the formation of this tree and use this to rescale cluster scores. This provides score normalisation based on the impact of each cluster. We use a state-of-the-art speaker diarization and linking system across the SAIVT-BNEWS corpus to show that our proposed impact ratio can provide a relative improvement of 16% in diarization error rate (DER).