893 resultados para noisy speaker verification
Resumo:
This paper investigates advanced channel compensation techniques for the purpose of improving i-vector speaker verification performance in the presence of high intersession variability using the NIST 2008 and 2010 SRE corpora. The performance of four channel compensation techniques: (a) weighted maximum margin criterion (WMMC), (b) source-normalized WMMC (SN-WMMC), (c) weighted linear discriminant analysis (WLDA), and; (d) source-normalized WLDA (SN-WLDA) have been investigated. We show that, by extracting the discriminatory information between pairs of speakers as well as capturing the source variation information in the development i-vector space, the SN-WLDA based cosine similarity scoring (CSS) i-vector system is shown to provide over 20% improvement in EER for NIST 2008 interview and microphone verification and over 10% improvement in EER for NIST 2008 telephone verification, when compared to SN-LDA based CSS i-vector system. Further, score-level fusion techniques are analyzed to combine the best channel compensation approaches, to provide over 8% improvement in DCF over the best single approach, (SN-WLDA), for NIST 2008 interview/ telephone enrolment-verification condition. Finally, we demonstrate that the improvements found in the context of CSS also generalize to state-of-the-art GPLDA with up to 14% relative improvement in EER for NIST SRE 2010 interview and microphone verification and over 7% relative improvement in EER for NIST SRE 2010 telephone verification.
Resumo:
Speaker diarization is the process of annotating an input audio with information that attributes temporal regions of the audio signal to their respective sources, which may include both speech and non-speech events. For speech regions, the diarization system also specifies the locations of speaker boundaries and assign relative speaker labels to each homogeneous segment of speech. In short, speaker diarization systems effectively answer the question of ‘who spoke when’. There are several important applications for speaker diarization technology, such as facilitating speaker indexing systems to allow users to directly access the relevant segments of interest within a given audio, and assisting with other downstream processes such as summarizing and parsing. When combined with automatic speech recognition (ASR) systems, the metadata extracted from a speaker diarization system can provide complementary information for ASR transcripts including the location of speaker turns and relative speaker segment labels, making the transcripts more readable. Speaker diarization output can also be used to localize the instances of specific speakers to pool data for model adaptation, which in turn boosts transcription accuracies. Speaker diarization therefore plays an important role as a preliminary step in automatic transcription of audio data. The aim of this work is to improve the usefulness and practicality of speaker diarization technology, through the reduction of diarization error rates. In particular, this research is focused on the segmentation and clustering stages within a diarization system. Although particular emphasis is placed on the broadcast news audio domain and systems developed throughout this work are also trained and tested on broadcast news data, the techniques proposed in this dissertation are also applicable to other domains including telephone conversations and meetings audio. Three main research themes were pursued: heuristic rules for speaker segmentation, modelling uncertainty in speaker model estimates, and modelling uncertainty in eigenvoice speaker modelling. The use of heuristic approaches for the speaker segmentation task was first investigated, with emphasis placed on minimizing missed boundary detections. A set of heuristic rules was proposed, to govern the detection and heuristic selection of candidate speaker segment boundaries. A second pass, using the same heuristic algorithm with a smaller window, was also proposed with the aim of improving detection of boundaries around short speaker segments. Compared to single threshold based methods, the proposed heuristic approach was shown to provide improved segmentation performance, leading to a reduction in the overall diarization error rate. Methods to model the uncertainty in speaker model estimates were developed, to address the difficulties associated with making segmentation and clustering decisions with limited data in the speaker segments. The Bayes factor, derived specifically for multivariate Gaussian speaker modelling, was introduced to account for the uncertainty of the speaker model estimates. The use of the Bayes factor also enabled the incorporation of prior information regarding the audio to aid segmentation and clustering decisions. The idea of modelling uncertainty in speaker model estimates was also extended to the eigenvoice speaker modelling framework for the speaker clustering task. Building on the application of Bayesian approaches to the speaker diarization problem, the proposed approach takes into account the uncertainty associated with the explicit estimation of the speaker factors. The proposed decision criteria, based on Bayesian theory, was shown to generally outperform their non- Bayesian counterparts.
Resumo:
The aim of this work is to develop software that is capable of back projecting primary fluence images obtained from EPID measurements through phantom and patient geometries in order to calculate 3D dose distributions. In the first instance, we aim to develop a tool for pretreatment verification in IMRT. In our approach, a Geant4 application is used to back project primary fluence values from each EPID pixel towards the source. Each beam is considered to be polyenergetic, with a spectrum obtained from Monte Carlo calculations for the LINAC in question. At each step of the ray tracing process, the energy differential fluence is corrected for attenuation and beam divergence. Subsequently, the TERMA is calculated and accumulated to an energy differential 3D TERMA distribution. This distribution is then convolved with monoenergetic point spread kernels, thus generating energy differential 3D dose distributions. The resulting dose distributions are accumulated to yield the total dose distribution, which can then be used for pre-treatment verification of IMRT plans. Preliminary results were obtained for a test EPID image comprised of 100 9 100 pixels of unity fluence. Back projection of this field into a 30 cm9 30 cm 9 30 cm water phantom was performed, with TERMA distributions obtained in approximately 10 min (running on a single core of a 3 GHz processor). Point spread kernels for monoenergetic photons in water were calculated using a separate Geant4 application. Following convolution and summation, the resulting 3D dose distribution produced familiar build-up and penumbral features. In order to validate the dose model we will use EPID images recorded without any attenuating material in the beam for a number of MLC defined square fields. The dose distributions in water will be calculated and compared to TPS predictions.
Resumo:
This research makes a major contribution which enables efficient searching and indexing of large archives of spoken audio based on speaker identity. It introduces a novel technique dubbed as “speaker attribution” which is the task of automatically determining ‘who spoke when?’ in recordings and then automatically linking the unique speaker identities within each recording across multiple recordings. The outcome of the research will also have significant impact in improving the performance of automatic speech recognition systems through the extracted speaker identities.
Resumo:
Iris based identity verification is highly reliable but it can also be subject to attacks. Pupil dilation or constriction stimulated by the application of drugs are examples of sample presentation security attacks which can lead to higher false rejection rates. Suspects on a watch list can potentially circumvent the iris based system using such methods. This paper investigates a new approach using multiple parts of the iris (instances) and multiple iris samples in a sequential decision fusion framework that can yield robust performance. Results are presented and compared with the standard full iris based approach for a number of iris degradations. An advantage of the proposed fusion scheme is that the trade-off between detection errors can be controlled by setting parameters such as the number of instances and the number of samples used in the system. The system can then be operated to match security threat levels. It is shown that for optimal values of these parameters, the fused system also has a lower total error rate.
Resumo:
Background and purpose: The purpose of the work presented in this paper was to determine whether patient positioning and delivery errors could be detected using electronic portal images of intensity modulated radiotherapy (IMRT). Patients and methods: We carried out a series of controlled experiments delivering an IMRT beam to a humanoid phantom using both the dynamic and multiple static field method of delivery. The beams were imaged, the images calibrated to remove the IMRT fluence variation and then compared with calibrated images of the reference beams without any delivery or position errors. The first set of experiments involved translating the position of the phantom both laterally and in a superior/inferior direction a distance of 1, 2, 5 and 10 mm. The phantom was also rotated 1 and 28. For the second set of measurements the phantom position was kept fixed and delivery errors were introduced to the beam. The delivery errors took the form of leaf position and segment intensity errors. Results: The method was able to detect shifts in the phantom position of 1 mm, leaf position errors of 2 mm, and dosimetry errors of 10% on a single segment of a 15 segment IMRT step and shoot delivery (significantly less than 1% of the total dose). Conclusions: The results of this work have shown that the method of imaging the IMRT beam and calibrating the images to remove the intensity modulations could be a useful tool in verifying both the patient position and the delivery of the beam.
Resumo:
Purpose: Electronic Portal Imaging Devices (EPIDs) are available with most linear accelerators (Amonuk, 2002), the current technology being amorphous silicon flat panel imagers. EPIDs are currently used routinely in patient positioning before radiotherapy treatments. There has been an increasing interest in using EPID technology tor dosimetric verification of radiotherapy treatments (van Elmpt, 2008). A straightforward technique involves the EPID panel being used to measure the fluence exiting the patient during a treatment which is then compared to a prediction of the fluence based on the treatment plan. However, there are a number of significant limitations which exist in this Method: Resulting in a limited proliferation ot this technique in a clinical environment. In this paper, we aim to present a technique of simulating IMRT fields using Monte Carlo to predict the dose in an EPID which can then be compared to the measured dose in the EPID. Materials: Measurements were made using an iView GT flat panel a-SI EPfD mounted on an Elekta Synergy linear accelerator. The images from the EPID were acquired using the XIS software (Heimann Imaging Systems). Monte Carlo simulations were performed using the BEAMnrc and DOSXVZnrc user codes. The IMRT fieids to be delivered were taken from the treatment planning system in DICOMRT format and converted into BEAMnrc and DOSXYZnrc input files using an in-house application (Crowe, 2009). Additionally. all image processing and analysis was performed using another in-house application written using the Interactive Data Language (IDL) (In Visual Information Systems). Comparison between the measured and Monte Carlo EPID images was performed using a gamma analysis (Low, 1998) incorporating dose and distance to agreement criteria. Results: The fluence maps recorded by the EPID were found to provide good agreement between measured and simulated data. Figure 1 shows an example of measured and simulated IMRT dose images and profiles in the x and y directions. "A technique for the quantitative evaluation of dose distributions", Med Phys, 25(5) May 1998 S. Crowe, 1. Kairn, A. Fielding, "The Development of a Monte Carlo system to verify Radiotherapy treatment dose calculations", Radiotherapy & Oncology, Volume 92, Supplement 1, August 2009, Pages S71-S71.
Resumo:
Introduction: Recent advances in the planning and delivery of radiotherapy treatments have resulted in improvements in the accuracy and precision with which therapeutic radiation can be administered. As the complexity of the treatments increases it becomes more difficult to predict the dose distribution in the patient accurately. Monte Carlo (MC) methods have the potential to improve the accuracy of the dose calculations and are increasingly being recognised as the ‘gold standard’ for predicting dose deposition in the patient [1]. This project has three main aims: 1. To develop tools that enable the transfer of treatment plan information from the treatment planning system (TPS) to a MC dose calculation engine. 2. To develop tools for comparing the 3D dose distributions calculated by the TPS and the MC dose engine. 3. To investigate the radiobiological significance of any errors between the TPS patient dose distribution and the MC dose distribution in terms of Tumour Control Probability (TCP) and Normal Tissue Complication Probabilities (NTCP). The work presented here addresses the first two aims. Methods: (1a) Plan Importing: A database of commissioned accelerator models (Elekta Precise and Varian 2100CD) has been developed for treatment simulations in the MC system (EGSnrc/BEAMnrc). Beam descriptions can be exported from the TPS using the widespread DICOM framework, and the resultant files are parsed with the assistance of a software library (PixelMed Java DICOM Toolkit). The information in these files (such as the monitor units, the jaw positions and gantry orientation) is used to construct a plan-specific accelerator model which allows an accurate simulation of the patient treatment field. (1b) Dose Simulation: The calculation of a dose distribution requires patient CT images which are prepared for the MC simulation using a tool (CTCREATE) packaged with the system. Beam simulation results are converted to absolute dose per- MU using calibration factors recorded during the commissioning process and treatment simulation. These distributions are combined according to the MU meter settings stored in the exported plan to produce an accurate description of the prescribed dose to the patient. (2) Dose Comparison: TPS dose calculations can be obtained using either a DICOM export or by direct retrieval of binary dose files from the file system. Dose difference, gamma evaluation and normalised dose difference algorithms [2] were employed for the comparison of the TPS dose distribution and the MC dose distribution. These implementations are spatial resolution independent and able to interpolate for comparisons. Results and Discussion: The tools successfully produced Monte Carlo input files for a variety of plans exported from the Eclipse (Varian Medical Systems) and Pinnacle (Philips Medical Systems) planning systems: ranging in complexity from a single uniform square field to a five-field step and shoot IMRT treatment. The simulation of collimated beams has been verified geometrically, and validation of dose distributions in a simple body phantom (QUASAR) will follow. The developed dose comparison algorithms have also been tested with controlled dose distribution changes. Conclusion: The capability of the developed code to independently process treatment plans has been demonstrated. A number of limitations exist: only static fields are currently supported (dynamic wedges and dynamic IMRT will require further development), and the process has not been tested for planning systems other than Eclipse and Pinnacle. The tools will be used to independently assess the accuracy of the current treatment planning system dose calculation algorithms for complex treatment deliveries such as IMRT in treatment sites where patient inhomogeneities are expected to be significant. Acknowledgements: Computational resources and services used in this work were provided by the HPC and Research Support Group, Queensland University of Technology, Brisbane, Australia. Pinnacle dose parsing made possible with the help of Paul Reich, North Coast Cancer Institute, North Coast, New South Wales.
Resumo:
We have taken a new method of calibrating portal images of IMRT beams and used this to measure patient set-up accuracy and delivery errors, such as leaf errors and segment intensity errors during treatment. A calibration technique was used to remove the intensity modulations from the images leaving equivalent open field images that show patient anatomy that can be used for verification of the patient position. The images of the treatment beam can also be used to verify the delivery of the beam in terms of multileaf collimator leaf position and dosimetric errors. A series of controlled experiments delivering an IMRT anterior beam to the head and neck of a humanoid phantom were undertaken. A 2mm translation in the position of the phantom could be detected. With intentional introduction of delivery errors into the beam this method allowed us to detect leaf positioning errors of 2mm and variation in monitor units of 1%. The method was then applied to the case of a patient who received IMRT treatment to the larynx and cervical nodes. The anterior IMRT beam was imaged during four fractions and the images calibrated and investigated for the characteristic signs of patient position error and delivery error that were shown in the control experiments. No significant errors were seen. The method of imaging the IMRT beam and calibrating the images to remove the intensity modulations can be a useful tool in verifying both the patient position and the delivery of the beam.
Resumo:
Application of "advanced analysis" methods suitable for non-linear analysis and design of steel frame structures permits direct and accurate determination of ultimate system strengths, without resort to simplified elastic methods of analysis and semi-empirical specification equations. However, the application of advanced analysis methods has previously been restricted to steel frames comprising only compact sections that are not influenced by the effects of local buckling. A refined plastic hinge method suitable for practical advanced analysis of steel frame structures comprising non-compact sections is presented in a companion paper. The method implicitly accounts for the effects of gradual cross-sectional yielding, longitudinal spread of plasticity, initial geometric imperfections, residual stresses, and local buckling. The accuracy and precision of the method for the analysis of steel frames comprising non-compact sections is established in this paper by comparison with a comprehensive range of analytical benchmark frame solutions. The refined plastic hinge method is shown to be more accurate and precise than the conventional individual member design methods based on elastic analysis and specification equations.
Resumo:
Recently, vision-based systems have been deployed in professional sports to track the ball and players to enhance analysis of matches. Due to their unobtrusive nature, vision-based approaches are preferred to wearable sensors (e.g. GPS or RFID sensors) as it does not require players or balls to be instrumented prior to matches. Unfortunately, in continuous team sports where players need to be tracked continuously over long-periods of time (e.g. 35 minutes in field-hockey or 45 minutes in soccer), current vision-based tracking approaches are not reliable enough to provide fully automatic solutions. As such, human intervention is required to fix-up missed or false detections. However, in instances where a human can not intervene due to the sheer amount of data being generated - this data can not be used due to the missing/noisy data. In this paper, we investigate two representations based on raw player detections (and not tracking) which are immune to missed and false detections. Specifically, we show that both team occupancy maps and centroids can be used to detect team activities, while the occupancy maps can be used to retrieve specific team activities. An evaluation on over 8 hours of field hockey data captured at a recent international tournament demonstrates the validity of the proposed approach.
Resumo:
Speaker attribution is the task of annotating a spoken audio archive based on speaker identities. This can be achieved using speaker diarization and speaker linking. In our previous work, we proposed an efficient attribution system, using complete-linkage clustering, for conducting attribution of large sets of two-speaker telephone data. In this paper, we build on our proposed approach to achieve a robust system, applicable to multiple recording domains. To do this, we first extend the diarization module of our system to accommodate multi-speaker (>2) recordings. We achieve this through using a robust cross-likelihood ratio (CLR) threshold stopping criterion for clustering, as opposed to the original stopping criterion of two speakers used for telephone data. We evaluate this baseline diarization module across a dataset of Australian broadcast news recordings, showing a significant lack of diarization accuracy without previous knowledge of the true number of speakers within a recording. We thus propose applying an additional pass of complete-linkage clustering to the diarization module, demonstrating an absolute improvement of 20% in diarization error rate (DER). We then evaluate our proposed multi-domain attribution system across the broadcast news data, demonstrating achievable attribution error rates (AER) as low as 17%.
Resumo:
Techniques to improve the automated analysis of natural and spontaneous facial expressions have been developed. The outcome of the research has applications in several fields including national security (eg: expression invariant face recognition); education (eg: affect aware interfaces); mental and physical health (eg: depression and pain recognition).
Resumo:
Background: A recent study by Dhillon et al. [12], identified both angioinvasion and mTOR as prognostic biomarkers for poor survival in early stage NSCLC. The aim of this study was to verify the above study by examining the angioinvasion and mTOR expression profile in a cohort of early stage NSCLC patients and correlate the results to patient clinico-pathological data and survival. Methods: Angioinvasion was routinely recorded by the pathologist at the initial assessment of the tumor following resection. mTOR was evaluated in 141 early stage (IA-IIB) NSCLC patients (67 - squamous; 60 - adenocarcinoma; 14 - others) using immunohistochemistry (IHC) analysis with an immunohistochemical score (IHS) calculated (% positive cells × staining intensity). Intensity was scored as follows: 0 (negative); 1+ (weak); 2+ (moderate); 3+ (strong). The range of scores was 0-300. Based on the previous study a cut-off score of 30 was used to define positive versus negative patients. The impact of angioinvasion and mTOR expression on prognosis was then evaluated. Results: 101 of the 141 tumors studied expressed mTOR. There was no difference in mTOR expression between squamous cell carcinoma and adenocarcinoma. Angioinvasion (p= 0.024) and mTOR staining (p= 0.048) were significant univariate predictors of poor survival. Both remained significant after multivariate analysis (p= 0.037 and p= 0.020, respectively). Conclusions: Our findings verify angioinvasion and mTOR expression as new biomarkers for poor outcome in patients with early stage NSCLC. mTOR expressing patients may benefit from novel therapies targeting the mTOR survival pathway. © 2011 Elsevier Ireland Ltd.