940 resultados para noisy speaker verification


Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we extend the concept of speaker annotation within a single-recording, or speaker diarization, to a collection wide approach we call speaker attribution. Accordingly, speaker attribution is the task of clustering expectantly homogenous intersession clusters obtained using diarization according to common cross-recording identities. The result of attribution is a collection of spoken audio across multiple recordings attributed to speaker identities. In this paper, an attribution system is proposed using mean-only MAP adaptation of a combined-gender UBM to model clusters from a perfect diarization system, as well as a JFA-based system with session variability compensation. The normalized cross-likelihood ratio is calculated for each pair of clusters to construct an attribution matrix and the complete linkage algorithm is employed to conduct clustering of the inter-session clusters. A matched cluster purity and coverage of 87.1% was obtained on the NIST 2008 SRE corpus.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents a novel technique for the tracking of moving lips for the purpose of speaker identification. In our system, a model of the lip contour is formed directly from chromatic information in the lip region. Iterative refinement of contour point estimates is not required. Colour features are extracted from the lips via concatenated profiles taken around the lip contour. Reduction of order in lip features is obtained via principal component analysis (PCA) followed by linear discriminant analysis (LDA). Statistical speaker models are built from the lip features based on the Gaussian mixture model (GMM). Identification experiments performed on the M2VTS1 database, show encouraging results

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Investigates the use of temporal lip information, in conjunction with speech information, for robust, text-dependent speaker identification. We propose that significant speaker-dependent information can be obtained from moving lips, enabling speaker recognition systems to be highly robust in the presence of noise. The fusion structure for the audio and visual information is based around the use of multi-stream hidden Markov models (MSHMM), with audio and visual features forming two independent data streams. Recent work with multi-modal MSHMMs has been performed successfully for the task of speech recognition. The use of temporal lip information for speaker identification has been performed previously (T.J. Wark et al., 1998), however this has been restricted to output fusion via single-stream HMMs. We present an extension to this previous work, and show that a MSHMM is a valid structure for multi-modal speaker identification

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We introduce a genetic programming (GP) approach for evolving genetic networks that demonstrate desired dynamics when simulated as a discrete stochastic process. Our representation of genetic networks is based on a biochemical reaction model including key elements such as transcription, translation and post-translational modifications. The stochastic, reaction-based GP system is similar but not identical with algorithmic chemistries. We evolved genetic networks with noisy oscillatory dynamics. The results show the practicality of evolving particular dynamics in gene regulatory networks when modelled with intrinsic noise.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper proposes the use of eigenvoice modeling techniques with the Cross Likelihood Ratio (CLR) as a criterion for speaker clustering within a speaker diarization system. The CLR has previously been shown to be a robust decision criterion for speaker clustering using Gaussian Mixture Models. Recently, eigenvoice modeling techniques have become increasingly popular, due to its ability to adequately represent a speaker based on sparse training data, as well as an improved capture of differences in speaker characteristics. This paper hence proposes that it would be beneficial to capitalize on the advantages of eigenvoice modeling in a CLR framework. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show an improved clustering performance, resulting in a 35.1% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Radiotherapy is a cancer treatment modality in which a dose of ionising radiation is delivered to a tumour. The accurate calculation of the dose to the patient is very important in the design of an effective therapeutic strategy. This study aimed to systematically examine the accuracy of the radiotherapy dose calculations performed by clinical treatment planning systems by comparison againstMonte Carlo simulations of the treatment delivery. A suite of software tools known as MCDTK (Monte Carlo DICOM ToolKit) was developed for this purpose, and is capable of: • Importing DICOM-format radiotherapy treatment plans and producing Monte Carlo simulation input files (allowing simple simulation of complex treatments), and calibrating the results; • Analysing the predicted doses of and deviations between the Monte Carlo simulation results and treatment planning system calculations in regions of interest (tumours and organs-at-risk) and generating dose-volume histograms, so that conformity with dose prescriptions can be evaluated. The code has been tested against various treatment planning systems, linear acceleratormodels and treatment complexities. Six clinical head and neck cancer treatments were simulated and the results analysed using this software. The deviations were greatest where the treatment volume encompassed tissues on both sides of an air cavity. This was likely due to the method the planning system used to model low density media.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The quality assurance of stereotactic radiotherapy and radiosurgery treatments requires the use of small-field dose measurements that can be experimentally challenging. This study used Monte Carlo simulations to establish that PAGAT dosimetry gel can be used to provide accurate, high resolution, three-dimensional dose measurements of stereotactic radiotherapy fields. A small cylindrical container (4 cm height, 4.2 cm diameter) was filled with PAGAT gel, placed in the parietal region inside a CIRS head phantom, and irradiated with a 12 field stereotactic radiotherapy plan. The resulting three-dimensional dose measurement was read out using an optical CT scanner and compared with the treatment planning prediction of the dose delivered to the gel during the treatment. A BEAMnrc DOSXYZnrc simulation of this treatment was completed, to provide a standard against which the accuracy of the gel measurement could be gauged. The three dimensional dose distributions obtained from Monte Carlo and from the gel measurement were found to be in better agreement with each other than with the dose distribution provided by the treatment planning system's pencil beam calculation. Both sets of data showed close agreement with the treatment planning system's dose distribution through the centre of the irradiated volume and substantial disagreement with the treatment planning system at the penumbrae. The Monte Carlo calculations and gel measurements both indicated that the treated volume was up to 3 mm narrower, with steeper penumbrae and more variable out-of-field dose, than predicted by the treatment planning system. The Monte Carlo simulations allowed the accuracy of the PAGAT gel dosimeter to be verified in this case, allowing PAGAT gel to be utilised in the measurement of dose from stereotactic and other radiotherapy treatments, with greater confidence in the future.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

To provide privacy protection, cryptographic primitives are frequently applied to communication protocols in an open environment (e.g. the Internet). We call these protocols privacy enhancing protocols (PEPs) which constitute a class of cryptographic protocols. Proof of the security properties, in terms of the privacy compliance, of PEPs is desirable before they can be deployed. However, the traditional provable security approach, though well-established for proving the security of cryptographic primitives, is not applicable to PEPs. We apply the formal language of Coloured Petri Nets (CPNs) to construct an executable specification of a representative PEP, namely the Private Information Escrow Bound to Multiple Conditions Protocol (PIEMCP). Formal semantics of the CPN specification allow us to reason about various privacy properties of PIEMCP using state space analysis techniques. This investigation provides insights into the modelling and analysis of PEPs in general, and demonstrates the benefit of applying a CPN-based formal approach to the privacy compliance verification of PEPs.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The method on concurrent multi-scale model of structural behavior (CMSM-of-SB) for the purpose of structural health monitoring including model updating and validating has been studied. The detailed process of model updating and validating is discussed in terms of reduced scale specimen of the steel box girder in longitudinal stiffening truss of a long span bridge. Firstly, some influence factors affecting the accuracy of the CMSM-of-SB including the boundary restraint regidity, the geometry and material parameters on the toe of the weld and its neighbor are analyzed using sensitivity method. Then, sensitivity-based model updating technology is adopted to update the developed CMSM-of-SB and model verification is carried out through calculating and comparing stresses on different locations under various loading from dynamic characteristic and static response. It can be concluded that the CMSM-of-SB based on the substructure method is valid.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Audio-visualspeechrecognition, or the combination of visual lip-reading with traditional acoustic speechrecognition, has been previously shown to provide a considerable improvement over acoustic-only approaches in noisy environments, such as that present in an automotive cabin. The research presented in this paper will extend upon the established audio-visualspeechrecognition literature to show that further improvements in speechrecognition accuracy can be obtained when multiple frontal or near-frontal views of a speaker's face are available. A series of visualspeechrecognition experiments using a four-stream visual synchronous hidden Markov model (SHMM) are conducted on the four-camera AVICAR automotiveaudio-visualspeech database. We study the relative contribution between the side and central orientated cameras in improving visualspeechrecognition accuracy. Finally combination of the four visual streams with a single audio stream in a five-stream SHMM demonstrates a relative improvement of over 56% in word recognition accuracy when compared to the acoustic-only approach in the noisiest conditions of the AVICAR database.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

"Defrauding land titles systems impacts upon us all. Those who deal in land include ordinary citizens, big business, small business, governments, not-for-profit organisation, deceased estates...Fraud here touches almost everybody." the thesis presented in this paper is that the current and disparate steps taken by jurisdictions to alleviate land fraud associated with identity-based crimes are inadequate. The centrepiece of the analysis is the consideration of two scenarios that have recently occurred. One is the typical scenario where a spouse forges the partner's signature to obtain a mortgage from a financial institution. The second is atypical. It involves a sophisticated overseas fraud duping many stakeholders involved in the conveyancing process. After outlining these scenarios, we will examine how identity verification requirements of the United Kingdom, Ontario, the Australian states, and New Zealand would have been applied to these two frauds. Our conclusion is that even though some jurisdictions may have prevented the frauds from occurring, the current requirements are inadequate. We use the lessons learnt to propose what we consider core principles for identity verification in land transactions.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper proposes the use of Bayesian approaches with the cross likelihood ratio (CLR) as a criterion for speaker clustering within a speaker diarization system, using eigenvoice modeling techniques. The CLR has previously been shown to be an effective decision criterion for speaker clustering using Gaussian mixture models. Recently, eigenvoice modeling has become an increasingly popular technique, due to its ability to adequately represent a speaker based on sparse training data, as well as to provide an improved capture of differences in speaker characteristics. The integration of eigenvoice modeling into the CLR framework to capitalize on the advantage of both techniques has also been shown to be beneficial for the speaker clustering task. Building on that success, this paper proposes the use of Bayesian methods to compute the conditional probabilities in computing the CLR, thus effectively combining the eigenvoice-CLR framework with the advantages of a Bayesian approach to the diarization problem. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show an improved clustering performance, resulting in a 33.5% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Speaker diarization determines instances of the same speaker within a recording. Extending this task to a collection of recordings for linking together segments spoken by a unique speaker requires speaker linking. In this paper we propose a speaker linking system using linkage clustering and state-of-the-art speaker recognition techniques. We evaluate our approach against two baseline linking systems using agglomerative cluster merging (AC) and agglomerative clustering with model retraining (ACR). We demonstrate that our linking method, using complete-linkage clustering, provides a relative improvement of 20% and 29% in attribution error rate (AER), over the AC and ACR systems, respectively.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we propose and evaluate a speaker attribution system using a complete-linkage clustering method. Speaker attribution refers to the annotation of a collection of spoken audio based on speaker identities. This can be achieved using diarization and speaker linking. The main challenge associated with attribution is achieving computational efficiency when dealing with large audio archives. Traditional agglomerative clustering methods with model merging and retraining are not feasible for this purpose. This has motivated the use of linkage clustering methods without retraining. We first propose a diarization system using complete-linkage clustering and show that it outperforms traditional agglomerative and single-linkage clustering based diarization systems with a relative improvement of 40% and 68%, respectively. We then propose a complete-linkage speaker linking system to achieve attribution and demonstrate a 26% relative improvement in attribution error rate (AER) over the single-linkage speaker linking approach.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Internet services are important part of daily activities for most of us. These services come with sophisticated authentication requirements which may not be handled by average Internet users. The management of secure passwords for example creates an extra overhead which is often neglected due to usability reasons. Furthermore, password-based approaches are applicable only for initial logins and do not protect against unlocked workstation attacks. In this paper, we provide a non-intrusive identity verification scheme based on behavior biometrics where keystroke dynamics based-on free-text is used continuously for verifying the identity of a user in real-time. We improved existing keystroke dynamics based verification schemes in four aspects. First, we improve the scalability where we use a constant number of users instead of whole user space to verify the identity of target user. Second, we provide an adaptive user model which enables our solution to take the change of user behavior into consideration in verification decision. Next, we identify a new distance measure which enables us to verify identity of a user with shorter text. Fourth, we decrease the number of false results. Our solution is evaluated on a data set which we have collected from users while they were interacting with their mail-boxes during their daily activities.