110 results for Likelihood Ratio
in Queensland University of Technology - ePrints Archive
Abstract:
This paper proposes the use of eigenvoice modeling techniques with the Cross Likelihood Ratio (CLR) as a criterion for speaker clustering within a speaker diarization system. The CLR has previously been shown to be a robust decision criterion for speaker clustering using Gaussian Mixture Models. Recently, eigenvoice modeling techniques have become increasingly popular due to their ability to adequately represent a speaker based on sparse training data, as well as their improved capture of differences in speaker characteristics. This paper therefore proposes capitalizing on the advantages of eigenvoice modeling within a CLR framework. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show improved clustering performance, resulting in a 35.1% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.
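The CLR compares how well each cluster's model explains the other cluster's data, relative to a universal background model (UBM). A common form in the diarization literature is CLR(i, j) = (1/N_i) log[p(X_i | λ_j) / p(X_i | λ_UBM)] + (1/N_j) log[p(X_j | λ_i) / p(X_j | λ_UBM)], where higher values suggest the two clusters share a speaker. The Python sketch below assumes single 1-D Gaussians in place of the GMM or eigenvoice speaker models, and a Gaussian fit to all pooled data as a stand-in UBM; it illustrates the criterion only and is not the paper's system.

```python
import math
from statistics import fmean, pvariance

def fit_gaussian(xs):
    """Maximum-likelihood mean and variance of 1-D samples (assumed stand-in for a GMM)."""
    return fmean(xs), pvariance(xs)

def avg_loglik(xs, model):
    """Average per-sample log-likelihood of xs under a 1-D Gaussian (mean, var)."""
    mu, var = model
    return fmean(-0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var) for x in xs)

def clr(xi, xj, ubm):
    """Cross likelihood ratio between clusters xi and xj, each term normalised by the UBM.
    Averaging over samples implements the 1/N_i and 1/N_j factors."""
    mi, mj = fit_gaussian(xi), fit_gaussian(xj)
    return (avg_loglik(xi, mj) - avg_loglik(xi, ubm)) + (avg_loglik(xj, mi) - avg_loglik(xj, ubm))
```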
Abstract:
This paper presents a novel technique for segmenting an audio stream into homogeneous regions according to speaker identities, background noise, music, environmental and channel conditions. Audio segmentation is useful in audio diarization systems, which aim to annotate an input audio stream with information that attributes temporal regions of the audio to their specific sources. The segmentation method introduced in this paper uses the Generalized Likelihood Ratio (GLR), computed between two adjacent sliding windows over preprocessed speech. This approach is inspired by the popular segmentation method proposed in the pioneering work of Chen and Gopalakrishnan, which uses the Bayesian Information Criterion (BIC) with an expanding search window. This paper aims to identify and address the shortcomings associated with such an approach. The proposed segmentation strategy is evaluated on the 2002 Rich Transcription (RT-02) Evaluation dataset, achieving a miss rate of 19.47% and a false alarm rate of 16.94% at the optimal threshold.
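For two adjacent windows, the GLR compares the hypothesis that both were produced by one model with the hypothesis that each window has its own model. With maximum-likelihood Gaussian estimates, the log ratio reduces to (N/2) log|Σ| − (N₁/2) log|Σ₁| − (N₂/2) log|Σ₂|. A minimal sketch, assuming scalar features, one Gaussian per window, and a hypothetical window size and threshold; candidate change points are local maxima of the GLR curve above the threshold.

```python
import math
from statistics import pvariance

def glr_distance(left, right):
    """Log GLR between 'one Gaussian for both windows' and 'one Gaussian per window'
    (ML estimates, 1-D features). Larger values favour a change point between them."""
    both = left + right
    n, n1, n2 = len(both), len(left), len(right)
    return 0.5 * (n * math.log(pvariance(both))
                  - n1 * math.log(pvariance(left))
                  - n2 * math.log(pvariance(right)))

def glr_curve(features, win):
    """GLR distance at each boundary t, comparing features[t-win:t] with features[t:t+win]."""
    return {t: glr_distance(features[t - win:t], features[t:t + win])
            for t in range(win, len(features) - win + 1)}

def detect_changes(features, win, threshold):
    """Boundaries where the GLR curve has a local maximum above the (hypothetical) threshold."""
    curve = glr_curve(features, win)
    return [t for t, d in curve.items()
            if d > threshold
            and d >= curve.get(t - 1, -math.inf)
            and d >= curve.get(t + 1, -math.inf)]
```

Fixed-size windows keep every GLR value on the same scale; the expanding-window BIC approach referred to above instead grows the search window until a change is found.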
Abstract:
This paper proposes the use of Bayesian approaches with the cross likelihood ratio (CLR) as a criterion for speaker clustering within a speaker diarization system, using eigenvoice modeling techniques. The CLR has previously been shown to be an effective decision criterion for speaker clustering using Gaussian mixture models. Recently, eigenvoice modeling has become an increasingly popular technique due to its ability to adequately represent a speaker based on sparse training data, as well as to provide an improved capture of differences in speaker characteristics. Integrating eigenvoice modeling into the CLR framework to capitalize on the advantages of both techniques has also been shown to benefit the speaker clustering task. Building on that success, this paper proposes the use of Bayesian methods to compute the conditional probabilities used in the CLR, thus effectively combining the eigenvoice-CLR framework with the advantages of a Bayesian approach to the diarization problem. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show improved clustering performance, resulting in a 33.5% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.
Abstract:
Hazard perception in driving is one of the few driving-specific skills associated with crash involvement. However, this relationship has only been examined in studies where the majority of individuals were younger than 65. We present the first data revealing an association between hazard perception and self-reported crash involvement in drivers aged 65 and over. In a sample of 271 drivers, we found that individuals whose mean response time to traffic hazards was slower than 6.68 seconds (the ROC-curve derived pass mark for the test) were 2.32 times (95% CI 1.46, 3.22) more likely to have been involved in a self-reported crash within the previous five years than those with faster response times. This likelihood ratio became 2.37 (95% CI 1.49, 3.28) when driving exposure was controlled for. As a comparison, individuals who failed a test of useful field of view were 2.70 (95% CI 1.44, 4.44) times more likely to crash than those who passed. The hazard perception test and the useful field of view measure accounted for separate variance in crash involvement. These findings indicate that hazard perception testing and training could potentially be useful in road safety interventions for this age group.
Abstract:
This paper proposes the use of the Bayes Factor as a distance metric for speaker segmentation within a speaker diarization system. The proposed approach uses a pair of constant-sized sliding windows to compute the value of the Bayes Factor between the adjacent windows over the entire audio. Results obtained on the 2002 Rich Transcription Evaluation dataset show improved segmentation performance compared to previous approaches reported in the literature using the Generalized Likelihood Ratio. When applied in a speaker diarization system, this approach results in a 5.1% relative improvement in the overall Diarization Error Rate compared to the baseline.
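The Bayes factor replaces the GLR's maximised likelihoods with marginal likelihoods, integrating the unknown model parameters over a prior, so extra parameters are penalised automatically. The sketch below is one simple instance, assuming 1-D features with known variance sigma2 and a conjugate normal prior N(mu0, tau2) on each window's mean, so the marginal likelihood has a closed form. These modelling choices and default values are illustrative assumptions, not the paper's configuration.

```python
import math

def log_marginal(xs, sigma2, mu0, tau2):
    """Log marginal likelihood of xs under x_k ~ N(mu, sigma2), mu ~ N(mu0, tau2),
    with the unknown mean integrated out analytically."""
    n = len(xs)
    r = [x - mu0 for x in xs]
    s1, s2 = sum(r), sum(v * v for v in r)
    # Marginal covariance is sigma2*I + tau2*11^T; use its closed-form determinant
    # and Sherman-Morrison inverse instead of building the matrix.
    logdet = n * math.log(sigma2) + math.log1p(n * tau2 / sigma2)
    quad = (s2 - tau2 * s1 * s1 / (sigma2 + n * tau2)) / sigma2
    return -0.5 * (n * math.log(2 * math.pi) + logdet + quad)

def log_bayes_factor(left, right, sigma2=1.0, mu0=0.0, tau2=10.0):
    """log BF of 'each window has its own mean' versus 'both windows share one mean'.
    Positive values favour a speaker change between the adjacent windows."""
    return (log_marginal(left, sigma2, mu0, tau2)
            + log_marginal(right, sigma2, mu0, tau2)
            - log_marginal(left + right, sigma2, mu0, tau2))
```

Sliding this score over the audio with a constant window size yields a curve whose peaks can be thresholded, in the same way as a GLR curve.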
Abstract:
In this paper we extend the concept of speaker annotation within a single recording, or speaker diarization, to a collection-wide approach we call speaker attribution. Accordingly, speaker attribution is the task of clustering the presumably homogeneous inter-session clusters obtained using diarization according to common cross-recording identities. The result of attribution is a collection of spoken audio across multiple recordings attributed to speaker identities. In this paper, an attribution system is proposed that uses mean-only MAP adaptation of a combined-gender universal background model (UBM) to model clusters from a perfect diarization system, as well as a joint factor analysis (JFA) based system with session variability compensation. The normalized cross-likelihood ratio is calculated for each pair of clusters to construct an attribution matrix, and the complete-linkage algorithm is employed to cluster the inter-session clusters. A matched cluster purity and coverage of 87.1% was obtained on the NIST 2008 SRE corpus.
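Complete linkage treats two groups as similar only if every cross pair of members is similar, which makes it conservative about merging different speakers. A minimal sketch operating on a precomputed, symmetric score matrix (for example, normalised CLR scores, where higher means more similar); the stopping threshold is a hypothetical parameter.

```python
def complete_linkage(sim, threshold):
    """Agglomerative complete-linkage clustering on a symmetric similarity matrix.

    The similarity of two groups is the minimum pairwise score between their
    members. The most similar pair of groups is merged until no pair exceeds
    `threshold`. Returns a list of sorted index lists."""
    groups = [[i] for i in range(len(sim))]
    while len(groups) > 1:
        best, pair = None, None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                score = min(sim[i][j] for i in groups[a] for j in groups[b])
                if best is None or score > best:
                    best, pair = score, (a, b)
        if best <= threshold:
            break
        a, b = pair
        groups[a] = sorted(groups[a] + groups[b])
        del groups[b]  # b > a, so index a is unaffected
    return groups
```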
Abstract:
The progesterone receptor (PR) is a candidate gene for the development of endometriosis, a complex disease with strong hormonal features, common in women of reproductive age. We typed the 306 base pair Alu insertion (AluIns) polymorphism in intron G of PR in 101 individuals, estimated linkage disequilibrium (LD) between five single-nucleotide polymorphisms (SNPs) across the PR locus in 980 Australian triads (endometriosis case and two parents), and used transmission disequilibrium testing (TDT) for association with endometriosis. The five SNPs showed strong pairwise LD, and the AluIns was highly correlated with the proximal SNPs rs1042839 (Δ² = 0.877, D′ = 1.00, P < 0.0001) and rs500760 (Δ² = 0.438, D′ = 0.942, P < 0.0001). TDT showed weak evidence of allelic association between endometriosis and rs500760 (P = 0.027), but not in the expected direction. We identified a common susceptibility haplotype GGGCA across the five SNPs (P = 0.0167) in the whole sample, but likelihood ratio testing of haplotype transmission and non-transmission of the AluIns and flanking SNPs showed no significant pattern. Further, analysis of our results pooled with those from two previous studies suggested that neither the T2 allele of the AluIns nor the T1/T2 genotype was associated with endometriosis.
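For reference, the classic single-marker TDT counts, over heterozygous parents, how often an allele is transmitted (b) versus not transmitted (c) to affected offspring, and compares (b − c)²/(b + c) with a chi-square distribution on 1 degree of freedom. The sketch below implements only that classic statistic, with hypothetical counts in the check; it does not reproduce the haplotype likelihood-ratio analysis described in the abstract.

```python
import math

def tdt(transmitted, untransmitted):
    """Classic transmission disequilibrium test for one biallelic marker.

    transmitted / untransmitted: how often heterozygous parents passed / did not
    pass the allele of interest to an affected child. Returns (chi2, p) for the
    McNemar-type statistic with 1 degree of freedom."""
    b, c = transmitted, untransmitted
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of chi-square(1): P(Z^2 > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```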
Abstract:
Background: Queensland men aged 50 years and older are at high risk for melanoma. Early detection via skin self-examination (SSE), particularly whole-body SSE, followed by presentation to a doctor with suspicious lesions, may decrease morbidity and mortality from melanoma. Prevalence of whole-body SSE (wbSSE) is lower in older Queensland men than in other population subgroups. With the exception of the present study, no previous research has investigated the determinants of wbSSE in older men, or interventions to increase the behaviour in this population. Furthermore, although past SSE intervention studies in other populations have cited health behaviour models in developing their interventions, no study has tested these models in full.
The Skin Awareness Study: A recent randomised trial, the Skin Awareness Study, tested the impact of a video-delivered intervention compared to written materials alone on wbSSE in men aged 50 years or older (n=930). Men were recruited from the general population and interviewed by telephone at baseline and at 13 months. The proportion of men who reported wbSSE rose from 10% to 31% in the control group and from 11% to 36% in the intervention group.
Current research: The current research was a secondary analysis of data collected for the Skin Awareness Study. The objectives were as follows:
• To describe how men who did not take up any SSE during the study period differed from those who did take up examining their skin.
• To determine whether the intervention program was successful in affecting the constructs of the Health Belief Model it targeted (self-efficacy, perceived threat, and outcome expectations), and whether this in turn influenced wbSSE.
• To determine whether the Health Action Process Approach (HAPA) was a better predictor of wbSSE behaviour than the Health Belief Model (HBM).
Methods: For objective 1, men who did not report any past SSE at baseline (n=308) were categorised as having ‘taken up SSE’ (reported SSE at study end) or ‘resisted SSE’ (reported no SSE at study end). Bivariate logistic regression, followed by multivariable regression, investigated the association between participant characteristics measured at baseline and resisting SSE. For objective 2, proxy measures of self-efficacy, perceived threat, and outcome expectations were selected. To determine whether these mediated the effect of the intervention on the outcome, a mediator analysis following the Baron and Kenny approach, modified for use with structural equation modelling (SEM), was performed with all participants who completed interviews at both time points (n=830). For objective 3, only control group participants were included (n=410). Proxy measures of all HBM and HAPA constructs were selected, and SEM was used to build up models and test the significance of each hypothesised pathway. A likelihood ratio test compared the HAPA to the HBM.
Results: Amongst men who did not report any SSE at baseline, 27% did not take up any SSE by the end of the study. In multivariable analyses, resisting SSE was associated with having more freckly skin (p=0.027); being unsure about the statement ‘if I saw something suspicious on my skin, I’d go to the doctor straight away’ (p=0.028); not intending to perform SSE (p=0.015); having lower SSE self-efficacy (p<0.001); and having no recommendation for SSE from a doctor (p=0.002). In the mediator analysis, none of the tested variables mediated the relationship between the intervention and wbSSE. With regard to health behaviour models, the HBM did not predict wbSSE well overall. Only the construct of self-efficacy was a significant predictor of future wbSSE (p=0.001), while neither perceived threat (p=0.584) nor outcome expectations (p=0.220) were.
By contrast, when the HAPA constructs were added, all three HBM variables predicted intention to perform SSE, which in turn predicted future behaviour (p=0.015). The HAPA construct of volitional self-efficacy was also associated with wbSSE (p=0.046). The HAPA was a significantly better model than the HBM (p<0.001).
Limitations: Items selected to measure HBM and HAPA model constructs for objectives 2 and 3 may not have accurately reflected each construct.
Conclusions: This research added to the evidence base on how best to target interventions to older men and on the appropriateness of particular health behaviour models for guiding interventions. Findings indicate that, to overcome resistance, men with more negative pre-existing attitudes to SSE (not intending to do it, lower initial self-efficacy) may need to be targeted with more intensive interventions in the future. Involving general practitioners in recommending SSE to their patients in this population, alongside disseminating an intervention, may increase its success. Comparison of the HBM and HAPA showed that while two of the three HBM variables examined did not directly predict future wbSSE, all three were associated with intention to self-examine skin. This suggests that, in this population, intervening on these variables may increase intention to examine skin, but not necessarily the behaviour itself. Future interventions could focus on increasing both the motivational variables of perceived threat and outcome expectations and a combination of action and volitional self-efficacy, with the aim of increasing intention as well as its translation into taking up and maintaining regular wbSSE.
Abstract:
Motorcycles are particularly vulnerable in right-angle crashes at signalized intersections. The objective of this study is to explore how variations in roadway characteristics, environmental factors, traffic factors, maneuver types, human factors, and driver demographics influence the right-angle crash vulnerability of motorcycles at intersections. The problem is modeled using a mixed logit model with a binary choice formulation to differentiate how an at-fault vehicle collides with a not-at-fault motorcycle in comparison with other collision types. The mixed logit formulation allows randomness in the parameters and hence accounts for the underlying heterogeneity potentially inherent in driver behavior and other unobserved variables. A likelihood ratio test reveals that the mixed logit model is indeed better than the standard logit model. Night-time riding shows a positive association with the vulnerability of motorcyclists. Moreover, motorcyclists are particularly vulnerable on single-lane roads, in the curb and median lanes of multi-lane roads, and on one-way and two-way roads relative to divided highways. Drivers who deliberately run red lights, as well as those who are careless towards motorcyclists, especially when making turns at intersections, increase the vulnerability of motorcyclists. Drivers appear more restrained when there is a passenger on board, which decreases their crash potential with motorcyclists. The presence of red light cameras also significantly decreases the right-angle crash vulnerability of motorcyclists. The findings of this study would be helpful in developing more targeted countermeasures for traffic enforcement, driver/rider training and/or education, and safety awareness programs to reduce the vulnerability of motorcyclists.
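A likelihood ratio test compares a restricted model nested within a fuller one: the statistic 2(LL_full − LL_restricted) is referred to a chi-square distribution whose degrees of freedom equal the number of extra parameters. A sketch using only the Python standard library, assuming the two maximised log-likelihoods are already available; the numbers in the check are hypothetical. When the restriction fixes variance parameters at zero (as when a mixed logit collapses to a standard logit), the null value lies on the parameter boundary and the plain chi-square reference is only an approximation.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square(df) variable, computed from the
    series for the regularised lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = 1.0 / s
    total = term
    k = 1
    while term > total * 1e-15:
        term *= z / (s + k)
        total += term
        k += 1
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return max(0.0, 1.0 - lower)

def likelihood_ratio_test(loglik_restricted, loglik_full, extra_params):
    """LR statistic 2*(LL_full - LL_restricted) and its asymptotic chi-square p-value."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    return stat, chi2_sf(stat, extra_params)
```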
Abstract:
Speaker diarization determines instances of the same speaker within a recording. Extending this task to a collection of recordings, to link together segments spoken by the same speaker, requires speaker linking. In this paper we propose a speaker linking system using linkage clustering and state-of-the-art speaker recognition techniques. We evaluate our approach against two baseline linking systems using agglomerative cluster merging (AC) and agglomerative clustering with model retraining (ACR). We demonstrate that our linking method, using complete-linkage clustering, provides relative improvements of 20% and 29% in attribution error rate (AER) over the AC and ACR systems, respectively.
Abstract:
This research makes a major contribution that enables efficient searching and indexing of large archives of spoken audio based on speaker identity. It introduces a novel technique dubbed “speaker attribution”, which is the task of automatically determining ‘who spoke when?’ in recordings and then automatically linking the unique speaker identities within each recording across multiple recordings. The outcome of the research will also have a significant impact on improving the performance of automatic speech recognition systems through the extracted speaker identities.
Abstract:
Speaker attribution is the task of annotating a spoken audio archive based on speaker identities. This can be achieved using speaker diarization and speaker linking. In our previous work, we proposed an efficient attribution system, using complete-linkage clustering, for conducting attribution of large sets of two-speaker telephone data. In this paper, we build on our proposed approach to achieve a robust system applicable to multiple recording domains. To do this, we first extend the diarization module of our system to accommodate multi-speaker (>2) recordings. We achieve this by using a robust cross-likelihood ratio (CLR) threshold stopping criterion for clustering, instead of the original two-speaker stopping criterion used for telephone data. We evaluate this baseline diarization module on a dataset of Australian broadcast news recordings, showing a significant lack of diarization accuracy when the true number of speakers within a recording is not known in advance. We therefore propose applying an additional pass of complete-linkage clustering to the diarization module, demonstrating an absolute improvement of 20% in diarization error rate (DER). We then evaluate our proposed multi-domain attribution system on the broadcast news data, demonstrating attribution error rates (AER) as low as 17%.
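A threshold stopping criterion lets the number of clusters emerge from the data rather than being fixed at two. A minimal sketch, assuming each initial segment is a list of scalar features, 1-D Gaussians stand in for the speaker models, the pooled data stand in for the UBM, and the threshold is hypothetical; it covers only CLR-stopped merging, not the additional complete-linkage pass.

```python
import math
from statistics import fmean, pvariance

def avg_loglik(xs, data_for_model):
    """Average 1-D Gaussian log-likelihood of xs under an ML fit to data_for_model."""
    mu, var = fmean(data_for_model), pvariance(data_for_model)
    return fmean(-0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var) for x in xs)

def clr(xi, xj, pooled):
    """Cross likelihood ratio with the pooled data acting as the background model."""
    return ((avg_loglik(xi, xj) - avg_loglik(xi, pooled))
            + (avg_loglik(xj, xi) - avg_loglik(xj, pooled)))

def cluster_with_clr_stop(segments, threshold):
    """Greedy agglomerative clustering: merge the pair of clusters with the highest
    CLR, stopping once the best CLR falls below `threshold`."""
    pooled = [x for seg in segments for x in seg]  # stand-in for a UBM
    clusters = [list(seg) for seg in segments]
    while len(clusters) > 1:
        score, a, b = max((clr(clusters[i], clusters[j], pooled), i, j)
                          for i in range(len(clusters))
                          for j in range(i + 1, len(clusters)))
        if score < threshold:
            break
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    return clusters
```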
Abstract:
In this paper we present a truncated differential analysis of reduced-round LBlock by computing the differential distribution of every nibble of the state. The log-likelihood ratio (LLR) statistical test is used as a tool to apply the distinguishing and key-recovery attacks. To build the distinguisher, all possible differences are traced through the cipher and the truncated differential probability distribution is determined for every output nibble. We prepend and append additional rounds to the truncated differential distribution to apply the key-recovery attack. By exploiting properties of the key schedule, we obtain a large overlap of key bits used in the beginning and final rounds. This allows us to significantly increase the differential probabilities and hence reduce the attack complexity. We validate the analysis by implementing the attack on LBlock reduced to 12 rounds. Finally, we apply single-key and related-key attacks to 18- and 21-round LBlock, respectively.
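An LLR distinguisher scores observed output-nibble differences by how much more likely they are under the predicted truncated differential distribution than under the uniform distribution expected for a random permutation: LLR = Σ_v n_v log(p_v / (1/16)). The sketch assumes the 16-entry distribution has already been derived (the one in the check is made up) and uses a decision threshold of zero for illustration; deriving the real LBlock distribution requires tracing differences through the cipher, which is not shown.

```python
import math
from collections import Counter

def llr_statistic(observed, model_probs, n_values=16):
    """Log-likelihood ratio of observed nibble differences under a truncated-differential
    model (dict: difference -> probability) versus the uniform distribution."""
    counts = Counter(observed)
    # log(p_v / (1 / n_values)) == log(p_v * n_values)
    return sum(c * math.log(model_probs[v] * n_values) for v, c in counts.items())

def distinguish(observed, model_probs, threshold=0.0):
    """True means 'looks like the reduced-round cipher', False means 'looks random'."""
    return llr_statistic(observed, model_probs) > threshold
```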
Abstract:
The quick detection of an abrupt unknown change in the conditional distribution of a dependent stochastic process has numerous applications. In this paper, we pose a minimax robust quickest change detection problem for cases where there is uncertainty about the post-change conditional distribution. Our minimax robust formulation is based on the popular Lorden criterion of optimal quickest change detection. Under a condition on the set of possible post-change distributions, we show that the widely known cumulative sum (CUSUM) rule is asymptotically minimax robust under our Lorden minimax robust formulation as the false alarm constraint becomes stricter. We also establish general asymptotic bounds on the detection delay of misspecified CUSUM rules (i.e., CUSUM rules designed with post-change distributions that differ from those of the observed sequence). We exploit these bounds to compare the delay performance of asymptotically minimax robust, asymptotically optimal, and other misspecified CUSUM rules. In simulation examples, we illustrate that asymptotically minimax robust CUSUM rules can provide better detection delay performance at greatly reduced computational effort compared to competing generalised likelihood ratio procedures.
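The CUSUM rule accumulates the log-likelihood ratio between the post-change and pre-change densities, resets at zero, and raises an alarm at a threshold h that controls the false alarm rate. The sketch assumes i.i.d. Gaussian observations with a known variance and a designed mean shift, which is much simpler than the dependent, uncertain post-change setting studied in the paper. Passing a mu1 that differs from the true post-change mean gives a misspecified rule in the abstract's sense.

```python
def cusum_stop(xs, mu0, mu1, sigma, h):
    """Page's CUSUM rule for a shift in the mean of i.i.d. Gaussian data.

    Accumulates log f1(x)/f0(x), clipped at zero, and returns the first index at
    which the statistic reaches h (None if it never does)."""
    s = 0.0
    for k, x in enumerate(xs):
        llr = ((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)
        s = max(0.0, s + llr)
        if s >= h:
            return k
    return None
```

On a noiseless toy sequence that shifts from 0 to 1 at index 50, a rule designed for mu1=1 fires on the tenth post-change sample, while one designed for mu1=2 accumulates no evidence at x=1 and never fires.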
Abstract:
In this paper we propose a novel scheme for carrying out speaker diarization in an iterative manner. We aim to show that the information obtained through the first pass of speaker diarization can be reused to refine and improve the original diarization results. We call this technique speaker rediarization and demonstrate the practical application of our rediarization algorithm using a large archive of two-speaker telephone conversation recordings. We use the NIST 2008 SRE summed telephone corpus to evaluate our speaker rediarization system. This corpus contains recurring speaker identities across independent recording sessions that need to be linked across the entire corpus. We show that our speaker rediarization scheme can take advantage of inter-session speaker information, linked in the initial diarization pass, to achieve a 30% relative improvement over the original diarization error rate (DER) after only two iterations of rediarization.