996 results for text-dependent speaker verification
Abstract:
Many audio watermarking methods are currently available, and most of them are oriented towards copyright protection and copy protection. This motivated the development of a speaker verification scheme that guarantees non-repudiation services, and this thesis is its outcome. The research presented here examines the field of audio watermarking and produces a speaker verification scheme that addresses issues related to non-repudiation to a great extent. The work aimed at developing novel audio watermarking schemes based on the Fast Fourier Transform (FFT) or the Fast Walsh-Hadamard Transform (FWHT). Mel-Frequency Cepstral Coefficients (MFCC), regarded as the best parametric representation of acoustic signals, are used together with a few other key acoustic characteristics to build the new schemes. The audio watermark depends entirely on these acoustic features, hence its name, FeatureMark, and it is central to this work. In any watermarking scheme, the quality of the extracted watermark depends critically on pre-processing; in this work, framing and windowing techniques are used. Non-repudiation is the central theme of the audio watermarking schemes proposed in this work. The signal spectrum is modified in a variety of ways by selecting appropriate FFT/FWHT coefficients, and the watermarking schemes were evaluated for imperceptibility, robustness and capacity. The proposed schemes proved effective in preserving sound quality, in retrieving the embedded FeatureMark, and in their capacity to hold the mark bits. The robustness of these marking schemes is achieved with synchronization codes: a Barker code in the FFT-based FeatureMarking scheme and a Walsh code in the FWHT-based FeatureMarking scheme.
Another important feature of this scheme is the use of encryption when preparing the FeatureMark, which scrambles the signal features so that they remain unrevealed. A comparative study with existing watermarking schemes, together with experiments evaluating imperceptibility, robustness and capacity, indicates that the proposed schemes can be regarded as efficient audio watermarking schemes. The four new digital audio watermarking algorithms perform remarkably well, opening further opportunities for research.
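The abstract names two ingredients that can be shown in isolation: a fast Walsh-Hadamard transform, whose coefficients are modified to carry mark bits, and a Barker-code synchronization search. The Python sketch below is illustrative only. The quantization-index-modulation (QIM) rule in the hypothetical `embed_bit`/`extract_bit` helpers, the coefficient index `k` and the step size are assumptions, not the thesis's embedding rule. FeatureMark generation (MFCC extraction and encryption) is not shown.

```python
import math

def fwht(x):
    """Unnormalised fast Walsh-Hadamard transform (Hadamard/natural ordering).
    The input length must be a power of two."""
    a = list(x)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                u, v = a[j], a[j + h]
                a[j], a[j + h] = u + v, u - v  # butterfly
        h *= 2
    return a

def ifwht(X):
    # The Hadamard matrix is its own inverse up to a factor 1/n.
    n = len(X)
    return [v / n for v in fwht(X)]

def embed_bit(frame, k, bit, step=0.5):
    """Hypothetical QIM-style embedding: move FWHT coefficient k onto the
    lattice step*m (bit 0) or step*m + step/2 (bit 1), then invert."""
    X = fwht(frame)
    offset = 0.0 if bit == 0 else step / 2
    X[k] = step * round((X[k] - offset) / step) + offset
    return ifwht(X)

def extract_bit(frame, k, step=0.5):
    """Return the bit whose lattice lies closer to FWHT coefficient k."""
    c = fwht(frame)[k]
    d0 = abs(c - step * round(c / step))
    d1 = abs(c - (step * round((c - step / 2) / step) + step / 2))
    return 0 if d0 <= d1 else 1

# 13-chip Barker code in +/-1 form (aperiodic autocorrelation sidelobes <= 1).
BARKER13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]

def find_sync(bits, code=BARKER13):
    """Return (offset, correlation) where a +/-1 stream best matches the code."""
    best_off, best_corr = -1, -math.inf
    for off in range(len(bits) - len(code) + 1):
        corr = sum(b * c for b, c in zip(bits[off:off + len(code)], code))
        if corr > best_corr:
            best_off, best_corr = off, corr
    return best_off, best_corr
```

In a full scheme, the sync code would be embedded ahead of the mark bits. After cropping or shifting, the decoder could then run `find_sync` on the extracted bit stream to realign before reading the FeatureMark. The low sidelobes of the Barker code make the correct offset stand out sharply.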
Abstract:
This paper presents a study on wavelets and their characteristics for the specific purpose of serving as a feature extraction tool for speaker verification (SV), considering a Radial Basis Function (RBF) classifier, which is a particular type of Artificial Neural Network (ANN). Examining characteristics such as support-size, frequency and phase responses, amongst others, we show how Discrete Wavelet Transforms (DWTs), particularly the ones which derive from Finite Impulse Response (FIR) filters, can be used to extract important features from a speech signal which are useful for SV. Lastly, an SV algorithm based on the concepts presented is described.
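To illustrate the kind of features discussed, the sketch below uses the Haar wavelet, the shortest FIR wavelet (a 2-tap low-pass/high-pass filter pair). It decomposes a speech frame into subbands and takes the log-energy of each subband as a feature vector. It also shows the Gaussian activation of a single RBF hidden unit. The choice of wavelet, the number of levels and the log-energy features are assumptions for illustration, not the algorithm described in the paper.

```python
import math

def haar_step(x):
    """One Haar DWT level: 2-tap FIR low-pass/high-pass filtering followed by
    downsampling by 2. A trailing odd sample is dropped."""
    s = 1 / math.sqrt(2)
    approx = [s * (x[i] + x[i + 1]) for i in range(0, len(x) - 1, 2)]
    detail = [s * (x[i] - x[i + 1]) for i in range(0, len(x) - 1, 2)]
    return approx, detail

def dwt_log_energies(frame, levels=3, eps=1e-12):
    """Log-energy of each detail subband, then of the final approximation band."""
    feats = []
    a = list(frame)
    for _ in range(levels):
        a, d = haar_step(a)
        feats.append(math.log(sum(v * v for v in d) + eps))
    feats.append(math.log(sum(v * v for v in a) + eps))
    return feats

def rbf(x, centre, width):
    """Gaussian radial basis function: activation of one RBF-network hidden unit."""
    dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-dist2 / (2 * width ** 2))
```

The Haar filters are orthonormal, so the subband energies add up to the frame energy. Longer FIR wavelets (for example higher-order Daubechies filters) trade a larger support for better frequency selectivity. That trade-off matches the support-size and frequency-response characteristics the paper examines.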
Abstract:
Speaker Recognition, Speaker Verification, Sparse Kernel Logistic Regression, Support Vector Machine
Abstract:
Phonation distortion leaves relevant marks in a speaker's biometric profile, so dysphonic voice production may be used for biometric speaker characterization. In the present paper, phonation features derived from glottal source (GS) parameterization, after vocal tract inversion, are proposed for dysphonic voice characterization in speaker verification tasks. The parameters derived from the glottal source are matched in a forensic evaluation framework by defining a distance-based metric. The phonation segments used in the study are derived from fillers, long vowels and other phonation segments produced in spontaneous telephone conversations. Phonated segments from a telephone database of 100 male native Spanish speakers are combined in a 10-fold cross-validation task to produce the set of quality measurements outlined in the paper. Shimmer, a mucosal wave correlate, vocal fold cover biomechanical parameter unbalance and a subset of the GS cepstral profile produce accuracy rates as high as 99.57% over a wide threshold interval (62.08-75.04%). An Equal Error Rate (EER) of 0.64% is achieved. The proposed metric framework is shown to behave more fairly than classical likelihood ratios in supporting the hypothesis of the defense versus that of the prosecution, thus offering more reliable evaluation scoring. Possible applications are speaker verification and dysphonic voice grading.
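Two of the quantities mentioned are simple to compute. The sketch below assumes the commonly used "local" shimmer definition: the mean absolute difference between the peak amplitudes of consecutive glottal cycles, divided by the mean amplitude. This may differ from the glottal-source estimate used in the paper. The sketch also estimates the EER by sweeping a decision threshold over genuine and impostor scores. Scores are assumed to be similarities (higher means same speaker). With a distance-based metric such as the paper's, negate the distances first.

```python
def local_shimmer(amplitudes):
    """Mean |A[i] - A[i+1]| over consecutive cycles, divided by the mean amplitude."""
    diffs = [abs(a - b) for a, b in zip(amplitudes, amplitudes[1:])]
    return (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))

def equal_error_rate(genuine, impostor):
    """Return (EER estimate, threshold) at the threshold where FAR and FRR are closest.

    FAR: fraction of impostor scores accepted (score >= threshold).
    FRR: fraction of genuine scores rejected (score < threshold)."""
    best = None
    for thr in sorted(set(genuine) | set(impostor)):
        far = sum(s >= thr for s in impostor) / len(impostor)
        frr = sum(s < thr for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, thr)
    return best[1], best[2]
```

With few trials, FAR and FRR rarely cross exactly, so averaging them at the closest threshold is only an approximation. Interpolating between neighbouring thresholds is a common refinement.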
Abstract:
This Master's thesis examines speaker recognition and its usefulness for verifying a user's identity as part of value-added services in the telephone network. Telephone-controlled services have usually been based on tone-dialling signals sent with the telephone keypad. The user's identity has been confirmed, for example, by means of a user ID and a secret PIN code. In the future, services may be based on speech recognition, in which case verifying the user by voice also appears sensible. The thesis first presents various biometric identification methods. It then examines voice-based speaker verification in more detail. In the practical part of the work, a prototype speaker verification application suitable for telephone network services was implemented. The aim was to investigate the possible uses of the technology and to gain experience in implementing a speaker verification service with the future in mind. The prototype was implemented in Java.
Abstract:
Parkinson’s disease (PD) is a neurological disorder of increasing prevalence in ageing societies. The motor and non-motor symptoms of PD advance as the disease progresses and occur with varying frequency and duration. To capture the full extent of a patient’s condition, repeated assessments are needed so that the medical prescription can be adjusted. In clinical studies, symptoms are assessed using the Unified Parkinson’s Disease Rating Scale (UPDRS). On the one hand, UPDRS rating is subjective and relies on clinical expertise. On the other hand, it requires patients to be physically present in the clinic, which implies high logistical costs. Another limitation of clinical assessment is that observation in hospital may not accurately represent a patient’s situation at home. For these reasons, the practical frequency of tracking PD symptoms may under-represent the true time scale of PD fluctuations and may result in an inaccurate overall assessment. Current technologies for at-home PD treatment are based on data-driven approaches whose results are difficult to interpret and reproduce. The overall objective of this thesis is to develop and evaluate unobtrusive computer methods that enable remote monitoring of patients with PD. It investigates novel signal and image processing techniques, combining first-principles and data-driven models, for extracting clinically useful information from audio recordings of speech (texts read aloud) and from video recordings of gait and finger-tapping motor examinations. The aim is to map PD symptom severities estimated with the novel computer methods onto clinical ratings based on UPDRS part III (motor examination). A web-based test battery system, consisting of self-assessment of symptoms and motor function tests, was previously built for a touch-screen mobile device.
A comprehensive speech framework was developed for this device to analyse text-dependent running speech by: (1) extracting novel signal features that can represent PD deficits in each individual component of the speech system, (2) mapping between clinical ratings and feature estimates of speech symptom severity, and (3) classifying UPDRS part III severity levels using speech features and statistical machine learning tools. A novel speech processing method called cepstral separation difference discriminated between speech symptom severities better than existing PD speech features. For finger tapping, videos of the rapid finger-tapping examination were processed using a novel computer-vision (CV) algorithm. The algorithm extracts symptom information from video-based tapping signals through motion analysis of the index finger, and it includes a face-detection module for signal calibration. This algorithm discriminated between UPDRS part III severity levels of finger tapping with high classification rates. Further analysis was performed on novel CV-based gait features, constructed using a standard human model, to discriminate between healthy and Parkinsonian gait. The findings of this study suggest that PD symptom severity levels can be discriminated with high accuracy by combining first-principles (feature) and data-driven (classification) approaches. On the one hand, processing audio and video recordings allows clinical staff to monitor speech, gait and finger-tapping examinations remotely. On the other hand, the first-principles approach makes the symptom estimates easier for clinicians to understand. We have demonstrated that the selected speech, gait and finger-tapping features discriminated between symptom severity levels, as well as between healthy controls and PD patients, with high classification rates.
The findings support the suitability of these methods as decision-support tools for PD assessment.
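The thesis's computer-vision algorithm is not described in enough detail to reproduce. The last step, turning a tapping signal into symptom-related numbers, can still be sketched. The code below assumes a one-dimensional, already calibrated tapping signal (for example, index-finger displacement over time) sampled at `fs` Hz. It detects taps as local maxima above a threshold with a minimum spacing between them. It then reports the tapping rate and the coefficient of variation of the inter-tap intervals. The threshold, the minimum gap and both summary features are hypothetical choices.

```python
import math

def detect_taps(signal, fs, threshold, min_gap_s=0.1):
    """Indices of local maxima at or above `threshold`, at least `min_gap_s` seconds apart."""
    min_gap = int(min_gap_s * fs)
    peaks = []
    for i in range(1, len(signal) - 1):
        if signal[i] >= threshold and signal[i - 1] < signal[i] >= signal[i + 1]:
            if not peaks or i - peaks[-1] >= min_gap:
                peaks.append(i)
    return peaks

def tapping_summary(signal, fs, threshold):
    """Tap count, mean tapping rate (Hz) and coefficient of variation of inter-tap intervals."""
    peaks = detect_taps(signal, fs, threshold)
    intervals = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
    if not intervals:
        return {"taps": len(peaks), "rate_hz": 0.0, "iti_cv": float("nan")}
    mean = sum(intervals) / len(intervals)
    sd = math.sqrt(sum((x - mean) ** 2 for x in intervals) / len(intervals))
    return {"taps": len(peaks), "rate_hz": 1 / mean, "iti_cv": sd / mean}
```

Features such as these are directly interpretable (slower or more irregular tapping gives a lower rate or a higher variability). That property is the kind of first-principles description the abstract contrasts with purely data-driven outputs.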
Abstract:
One of the biggest challenges in speech synthesis is producing contextually appropriate, natural-sounding synthetic voices. This means that a Text-To-Speech (TTS) system must be able to analyse text beyond sentence boundaries in order to select, or even modulate, the speaking style according to the broader context. Our current architecture is based on a two-step approach: text genre identification, followed by speaking-style synthesis according to the detected discourse genre. For the final implementation, four genres and their corresponding speaking styles were considered: broadcast news, live sports commentary, interviews and political speeches. In the final TTS evaluation, the four speaking styles were transplanted onto the neutral voices of other speakers not included in the training database. When the transplanted styles were compared with the neutral voices, transplantation was significantly preferred, and the similarity to the target speaker was as high as 78%.
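The two-step architecture (identify the genre, then select the matching speaking style) can be sketched with a small multinomial naive Bayes text classifier. This is an assumption for illustration, since the abstract does not say which classifier the authors used. The style identifiers in the hypothetical `STYLE_FOR_GENRE` table are placeholders for whatever voice models a synthesizer would actually load.

```python
import math
from collections import Counter

# Hypothetical genre -> speaking-style table; the style identifiers are placeholders.
STYLE_FOR_GENRE = {
    "news": "broadcast_news_style",
    "sport": "live_sport_commentary_style",
    "interview": "interview_style",
    "political": "political_speech_style",
}

class NaiveBayesGenre:
    """Multinomial naive Bayes over whitespace tokens with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.doc_counts = Counter(labels)
        self.word_counts = {g: Counter() for g in sorted(self.doc_counts)}
        self.vocab = set()
        for doc, g in zip(docs, labels):
            words = doc.lower().split()
            self.word_counts[g].update(words)
            self.vocab.update(words)
        return self

    def predict(self, doc):
        words = doc.lower().split()
        n_docs = sum(self.doc_counts.values())
        best, best_lp = None, -math.inf
        for g, counts in self.word_counts.items():
            total = sum(counts.values())
            lp = math.log(self.doc_counts[g] / n_docs)  # log prior of the genre
            for w in words:  # smoothed log-likelihood of each token
                lp += math.log((counts[w] + 1) / (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = g, lp
        return best

def choose_style(text, model):
    """Step 1: identify the genre. Step 2: look up the speaking style to synthesise."""
    return STYLE_FOR_GENRE[model.predict(text)]
```

A real genre identifier would use richer features than whitespace tokens and far more training text per genre. The sketch only shows the separation between the genre decision and the style lookup, which lets new styles be added without retraining the classifier.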
Abstract:
The use of kilometre-scale ensembles in operational forecasting poses new challenges for forecast interpretation and evaluation, which must account for uncertainty at the convective scale. A new neighbourhood-based method is presented for evaluating and characterising local variations in predictability in convective-scale ensembles. The spatial scales over which ensemble forecasts agree (agreement scales, $S^A$) are calculated at each grid point $ij$, providing a map of the spatial agreement between forecasts. By comparing the average agreement scale obtained from pairs of ensemble members, $S^{A(mm)}_{ij}$, with that obtained between members and radar observations, $S^{A(mo)}_{ij}$, this approach allows the location-dependent spatial spread-skill relationship of the ensemble to be assessed. The properties of the agreement scales are demonstrated using an idealised experiment. To demonstrate the method in an operational context, $S^{A(mm)}_{ij}$ and $S^{A(mo)}_{ij}$ are calculated for six convective cases run with the Met Office UK Ensemble Prediction System. $S^{A(mm)}_{ij}$ highlights predictability differences between cases, which can be linked to physical processes. Maps of $S^{A(mm)}_{ij}$ summarise the spatial predictability in a compact and physically meaningful way that is useful both for forecasting and for model interpretation. Comparing $S^{A(mm)}_{ij}$ with $S^{A(mo)}_{ij}$ demonstrates the case-by-case and temporal variability of the spatial spread-skill, which can again be linked to physical processes.
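The agreement scale at a grid point is the smallest neighbourhood size over which two fields agree. The minimal Python sketch below rests on several assumptions:

- neighbourhood means over a (2S+1)×(2S+1) square, clipped at the domain edge;
- a normalised squared difference $D = (a-b)^2/(a^2+b^2)$ between the two smoothed values;
- a criterion $D \le \alpha + (1-\alpha)S/S_{\lim}$ with $\alpha = 0.5$ and $S_{\lim} \ge 1$.

This follows our reading of the method. The exact criterion, and the treatment of points where both fields are zero (counted here as agreement), should be checked against the paper. Fields are assumed non-negative (for example, precipitation), so $D \le 1$ and the criterion is always met at $S_{\lim}$.

```python
def neighbourhood_mean(field, i, j, s):
    """Mean of `field` over the (2s+1)x(2s+1) square centred on (i, j), clipped at the edges."""
    rows, cols = len(field), len(field[0])
    vals = [field[r][c]
            for r in range(max(0, i - s), min(rows, i + s + 1))
            for c in range(max(0, j - s), min(cols, j + s + 1))]
    return sum(vals) / len(vals)

def agreement_scale(f1, f2, i, j, s_lim, alpha=0.5):
    """Smallest scale s in 0..s_lim at which the smoothed fields agree at (i, j)."""
    for s in range(s_lim + 1):
        a = neighbourhood_mean(f1, i, j, s)
        b = neighbourhood_mean(f2, i, j, s)
        denom = a * a + b * b
        # Assumption: two zero values are treated as agreeing (D = 0).
        d = 0.0 if denom == 0 else (a - b) ** 2 / denom
        if d <= alpha + (1 - alpha) * s / s_lim:
            return s
    return s_lim

def agreement_scale_map(f1, f2, s_lim, alpha=0.5):
    """Agreement scale at every grid point of two equally sized 2D fields."""
    return [[agreement_scale(f1, f2, i, j, s_lim, alpha) for j in range(len(f1[0]))]
            for i in range(len(f1))]
```

Averaging `agreement_scale_map` over all member pairs approximates $S^{A(mm)}_{ij}$. Averaging it over member-radar pairs approximates $S^{A(mo)}_{ij}$. Comparing the two maps point by point gives the location-dependent spread-skill comparison.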
Abstract:
The usage of intensity modulated radiotherapy (IMRT) treatments necessitates a significant amount of patient-specific quality assurance (QA). This research has investigated the precision and accuracy of Kodak EDR2 film measurements for IMRT verifications, the use of comparisons between 2D dose calculations and measurements to improve treatment plan beam models, and the dosimetric impact of delivery errors. New measurement techniques and software were developed and used clinically at M. D. Anderson Cancer Center. The software implemented two new dose comparison parameters, the 2D normalized agreement test (NAT) and the scalar NAT index. A single-film calibration technique using multileaf collimator (MLC) delivery was developed. EDR2 film's optical density response was found to be sensitive to several factors: radiation time, length of time between exposure and processing, and phantom material. Precision of EDR2 film measurements was found to be better than 1%. For IMRT verification, EDR2 film measurements agreed with ion chamber results to 2%/2mm accuracy for single-beam fluence map verifications and to 5%/2mm for transverse plane measurements of complete plan dose distributions. The same system was used to quantitatively optimize the radiation field offset and MLC transmission beam modeling parameters for Varian MLCs. While scalar dose comparison metrics can work well for optimization purposes, the influence of external parameters on the dose discrepancies must be minimized. The ability of 2D verifications to detect delivery errors was tested with simulated data. The dosimetric characteristics of delivery errors were compared to patient-specific clinical IMRT verifications. For the clinical verifications, the NAT index and percent of pixels failing the gamma index were exponentially distributed and dependent upon the measurement phantom but not the treatment site. 
Delivery errors affecting all beams in the treatment plan were flagged by the NAT index, although delivery errors impacting only one beam could not be differentiated from routine clinical verification discrepancies. Clinical use of this system will flag outliers, allow physicists to examine their causes, and perhaps improve the level of agreement between radiation dose distribution measurements and calculations. The principles used to design and evaluate this system are extensible to future multidimensional dose measurements and comparisons.
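The abstract reports the percentage of pixels failing the gamma index. The sketch below shows how that quantity is computed, using a one-dimensional version of the standard global gamma index. For each measured point, gamma is the minimum over calculated points of $\sqrt{\Delta r^2/\Delta d^2 + \Delta D^2/\Delta D_M^2}$, and a value above 1 counts as a failure. Two simplifying assumptions apply. The search runs only over the shared grid, without interpolation, which overestimates gamma on coarse grids. The dose criterion is taken relative to the maximum calculated dose. The NAT and NAT index introduced in the thesis are not reproduced here.

```python
import math

def gamma_1d(positions, measured, calculated, dose_pct, dta_mm):
    """Global 1D gamma index on a shared grid.

    positions: sample positions in mm (same grid for both profiles)
    dose_pct: dose-difference criterion as % of the maximum calculated dose
    dta_mm: distance-to-agreement criterion in mm"""
    dd = dose_pct / 100.0 * max(calculated)
    gammas = []
    for xm, dm in zip(positions, measured):
        g2 = min(((xc - xm) / dta_mm) ** 2 + ((dc - dm) / dd) ** 2
                 for xc, dc in zip(positions, calculated))
        gammas.append(math.sqrt(g2))
    return gammas

def percent_failing(gammas):
    """Percentage of points with gamma > 1."""
    return 100.0 * sum(g > 1.0 for g in gammas) / len(gammas)
```

A profile shifted by 1 mm passes a 2%/2mm gamma test everywhere. The same shift fails with a 0.5 mm distance criterion, which illustrates how strongly the pass rate depends on the chosen criteria.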
Abstract:
The main question addressed in this thesis is the improvement of automatic speaker recognition systems through a new front-end module that we have called Gender Dependent Extended Biometric Parameterisation (GDEBP). This front-end does not constitute a complete break with the classical parameterisation techniques used in speaker recognition, but rather a new way of obtaining these parameters while introducing some complementary ones. Specifically, we propose a gender-dependent parameterisation: since, as is well known, male and female voices have different characteristics, using different parameters to model these distinguishing characteristics should provide a better characterisation of speakers.
Additionally, we propose a new set of biometric parameters extracted from the components that result from deconstructing the voice into its glottal source estimate (closely related to the phonation process and the organs involved, and therefore to the physical characteristics of the speaker) and its vocal tract estimate (closely related to acoustic articulation, and therefore to the spoken message). These biometric parameters complement the classical MFCC extracted from the power spectral density of speech as a whole. To check the validity of this proposal, we establish different practical scenarios using different databases, from which we conclude that GDEBP generates a more accurate description of speakers than classical approaches based on gender-independent MFCC. Specifically, we propose scenarios based on text-constrained and text-independent tests using the HESPERIA and ALBAYZIN databases. The work is completed by participation in two international speaker recognition evaluations, NIST SRE (2010 and 2012) and MOBIO 2013, with diverse results. In the first case, owing to the nature of the NIST databases, we obtain results close to the state of the art while confirming our hypothesis, whereas in the MOBIO evaluation our system obtained the best recognition rate for female speakers. Although the study of classification systems is beyond the scope of this thesis, we found it necessary to analyse the performance of different classification systems in order to verify their effect on the proposed parameterisation. In particular, we have addressed speaker recognition systems based on the GMM-UBM paradigm, supervectors and i-vectors. The results presented confirm that selecting a set of parameters that describes the speakers more accurately is as important as selecting the classification method used by the biometric system.
In this sense, the proposed parameterisation constitutes a step forward in improving speaker recognition systems, since even when using relatively simple classification systems, really competitive recognition rates are achieved.
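To make the GMM-UBM back-end concrete, the sketch below scores a sequence of feature frames by the average log-likelihood ratio between a speaker GMM and a universal background model (UBM), both with diagonal covariances. In a gender-dependent set-up such as GDEBP, one would presumably select the feature set and UBM that match the speaker's gender before scoring. That selection step is an assumption and is not shown. Model training (EM for the UBM, MAP adaptation for the speaker) is also omitted. Models are given as lists of hypothetical `(weight, mean, variance)` tuples.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def log_gmm(x, gmm):
    """log p(x) for a GMM given as a list of (weight, mean, var), via log-sum-exp."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def llr_score(frames, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio: speaker model versus UBM."""
    return sum(log_gmm(x, speaker_gmm) - log_gmm(x, ubm) for x in frames) / len(frames)
```

A positive score favours the claimed speaker and a negative score favours the background population. In practice, the accept/reject threshold is tuned on development data, for example at the EER operating point.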
Abstract:
ABSTRACT: The ability of Antarctic krill Euphausia superba Dana to withstand the overwintering period is critical to their success. Laboratory evidence suggests that krill may shrink in body length during this time in response to the low availability of food. Nevertheless, verification that krill can shrink in the natural environment is lacking because winter data are difficult to obtain. One of the few sources of winter krill population data is from commercial vessels. We examined length-frequency data of adult krill (>35 mm total body length) obtained from commercial vessels in the Scotia-Weddell region and compared our results with those obtained from a combination of science and commercial sampling operations carried out in this region at other times of the year. Our analyses revealed body-length shrinkage in adult females but not males during overwinter, based on both the tracking of modal size classes over seasons and sex-ratio patterns. Other explanatory factors, such as differential mortality, immigration and emigration, could not explain the observed differences. The same pattern was also observed at South Georgia and in the Western Antarctic Peninsula. Fitted seasonally modulated von Bertalanffy growth functions predicted a pattern of overwintering shrinkage in all body-length classes of females, but only stagnation in growth in males. This shrinkage most likely reflects morphometric changes resulting from the contraction of the ovaries and is not necessarily an outcome of winter hardship. The sex-dependent changes that we observed need to be incorporated into life cycle and population dynamic models of this species, particularly those used in managing the fishery. KEY WORDS: Southern Ocean · Population dynamics · Production · Life cycle · Fishery
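The seasonally modulated von Bertalanffy growth function can be written, in the widely used form of Somers (1988), as $L(t) = L_\infty\{1 - \exp[-K(t - t_0) - S(t) + S(t_0)]\}$ with $S(t) = \frac{CK}{2\pi}\sin(2\pi(t - t_s))$. Differentiating shows that $dL/dt$ has the sign of $1 + C\cos(2\pi(t - t_s))$. Growth therefore stops momentarily once a year when $C = 1$, and length actually decreases for part of the year only when $C > 1$. The sketch below evaluates the curve. The parameter values in the test are illustrative, not the fitted krill values, and the paper may parameterise the function differently.

```python
import math

def seasonal_vbgf(t, l_inf, k, t0, c, ts):
    """Seasonally oscillating von Bertalanffy length at age t (years), Somers form."""
    def s(x):
        return c * k / (2 * math.pi) * math.sin(2 * math.pi * (x - ts))
    return l_inf * (1 - math.exp(-k * (t - t0) - s(t) + s(t0)))

def shrinks_during_year(l_inf, k, t0, c, ts, start=1.0, years=2.0, dt=0.01):
    """True if the curve decreases anywhere on a fine time grid."""
    n = round(years / dt)
    lengths = [seasonal_vbgf(start + i * dt, l_inf, k, t0, c, ts) for i in range(n + 1)]
    return any(b < a for a, b in zip(lengths, lengths[1:]))
```

Fitting separate parameter sets by sex, with $C > 1$ for females and $C \le 1$ for males, is one way to reproduce the shrinkage-versus-stagnation contrast reported above.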
Abstract:
OBJECTIVE: Describe the overall transmission of malaria through a compartmental model, considering the human host and mosquito vector. METHODS: A mathematical model was developed based on the following parameters: human host immunity, assuming the existence of acquired immunity and immunological memory, which boosts the protective response upon reinfection; mosquito vector, taking into account that the average period of development from egg to adult mosquito and the extrinsic incubation period of parasites (transformation of infected but non-infectious mosquitoes into infectious mosquitoes) are dependent on the ambient temperature. RESULTS: The steady state equilibrium values obtained with the model allowed the calculation of the basic reproduction ratio in terms of the model's parameters. CONCLUSIONS: The model allowed the calculation of the basic reproduction ratio, one of the most important epidemiological variables.
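The paper's model, with acquired immunity and temperature-dependent mosquito development, is not specified here in enough detail to reproduce. The role of a temperature-dependent extrinsic incubation period (EIP) in the basic reproduction ratio can, however, be illustrated with the classical Ross-Macdonald expression $R_0 = \frac{m a^2 b c\, e^{-\mu n}}{r \mu}$ and a degree-day model for the EIP, $n = \mathrm{DD}/(T - T_{\min})$. The sketch assumes the commonly cited values for Plasmodium falciparum (111 degree-days above 16 °C). Some authors define $R_0$ as the square root of this expression.

```python
import math

def eip_days(temp_c, degree_days=111.0, t_min=16.0):
    """Extrinsic incubation period (days) from a degree-day model.
    Defaults are commonly cited values for P. falciparum (an assumption here)."""
    if temp_c <= t_min:
        return math.inf  # the parasite never completes development
    return degree_days / (temp_c - t_min)

def r0_ross_macdonald(m, a, b, c, mu, r, n):
    """Classical Ross-Macdonald basic reproduction number.

    m: mosquitoes per human       a: bites per mosquito per day
    b: mosquito->human transmission probability
    c: human->mosquito transmission probability
    mu: mosquito mortality rate per day
    r: human recovery rate per day
    n: extrinsic incubation period (days)"""
    return m * a ** 2 * b * c * math.exp(-mu * n) / (r * mu)
```

The factor $e^{-\mu n}$ is the probability that a mosquito survives the EIP. This is why warmer temperatures, which shorten the EIP, raise $R_0$ in this simple model.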
Abstract:
The effect of colour group on morbidity due to Schistosoma mansoni was examined in two endemic areas in the State of Minas Gerais, Brazil. Of the 2773 eligible inhabitants, 1971 (71.1%) participated in the study: 545 (27.6%) were classified as white, 719 (36.5%) as intermediate and 707 (35.9%) as black. Within each colour group, the signs and symptoms of individuals who excreted S. mansoni eggs (cases) were compared with those of individuals with no eggs in the faeces (controls). The odds ratios were adjusted for age, gender, previous treatment for schistosomiasis, endemic area and household quality. There was no evidence that colour modified the association with diarrhoea, bloody faeces or abdominal pain. A modifying effect of colour on hepatomegaly was evident among the most heavily infected individuals (> 400 eggs per gram of faeces, epg): the adjusted odds ratios for a palpable liver at the mid-clavicular and mid-sternal lines were smaller among blacks (5.4 and 6.5, respectively) than among whites (10.6 and 12.9) and intermediates (10.4 and 10.1, respectively). These results suggest some degree of protection against hepatomegaly among the most heavily infected black individuals in the areas studied.
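The study reports odds ratios adjusted for several covariates, which requires a multivariable model (such as logistic regression). As a simpler illustration of the underlying quantity, the sketch below computes a crude (unadjusted) odds ratio with a Woolf log-scale 95% confidence interval from a 2×2 table. Effect modification by colour would be examined by computing the ratio separately within each colour stratum and comparing the estimates. The cell counts in the test are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    a: cases with the sign      b: cases without the sign
    c: controls with the sign   d: controls without the sign
    All four cells must be non-zero."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi
```

Because the interval is symmetric on the log scale, it is asymmetric around the odds ratio itself. Stratum-specific intervals that barely overlap, as might be expected for odds ratios of about 5 versus about 10, are a first hint of effect modification, though a formal interaction test is more reliable.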
Abstract:
Dissertation submitted for the degree of Master in Informatics Engineering