967 results for Speaker Recognition, Text-constrained, Multilingual, Speaker Verification, HMMs


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Many audio watermarking methods are currently available, most of them aimed at copyright protection and copy protection. This motivated the development of a speaker verification scheme that guarantees non-repudiation services, and this thesis is its outcome. The research presented in this thesis examines the field of audio watermarking, and the outcome is a speaker verification scheme that addresses issues related to non-repudiation to a great extent. This work aimed at developing novel audio watermarking schemes based on the fundamental ideas of the Fast Fourier Transform (FFT) or the Fast Walsh-Hadamard Transform (FWHT). Mel-Frequency Cepstral Coefficients (MFCC), the best parametric representation of acoustic signals, are employed along with a few other key acoustic characteristics in crafting the new schemes. The audio watermark created depends entirely on the acoustic features, hence the name FeatureMark, and is central to this work. In any watermarking scheme, the quality of the extracted watermark depends exclusively on the pre-processing, which in this work involves framing and windowing techniques. Non-repudiation is of central importance in the audio watermarking schemes proposed in this work. The signal spectrum is modified in a variety of ways by selecting appropriate FFT/FWHT coefficients, and the watermarking schemes were evaluated for imperceptibility, robustness and capacity. The proposed schemes are unequivocally effective in maintaining the sound quality, in retrieving the embedded FeatureMark, and in their capacity to hold the mark bits. Robustness of these marking schemes is achieved with the help of synchronization codes: Barker code with the FFT-based FeatureMarking scheme and Walsh code with the FWHT-based FeatureMarking scheme. Another important feature of this scheme is the use of an encryption scheme in preparing its FeatureMark, which scrambles the signal features and helps keep them unrevealed. A comparative study with existing watermarking schemes, together with the experiments evaluating imperceptibility, robustness and capacity, indicates that the proposed schemes can serve as a baseline for efficient audio watermarking. The four new digital audio watermarking algorithms perform remarkably well, opening opportunities for further research.
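
The role of a synchronization code can be illustrated with a small sketch. It is not the thesis's embedding scheme: it only shows how a 13-bit Barker sequence, which has low autocorrelation sidelobes, lets a detector locate the start of a mark in a stream of ±1 values. The payload bits, the leading noise values and the function names are hypothetical.

```python
# Length-13 Barker sequence in +1/-1 form.
BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def autocorrelation_sidelobes(code):
    """Aperiodic autocorrelation of the code for every non-zero shift."""
    n = len(code)
    return [sum(code[i] * code[i + k] for i in range(n - k)) for k in range(1, n)]


def find_sync(stream, code=BARKER_13):
    """Return (offset, score) of the window that correlates best with the code."""
    best_offset, best_score = -1, float("-inf")
    for offset in range(len(stream) - len(code) + 1):
        window = stream[offset:offset + len(code)]
        score = sum(s * c for s, c in zip(window, code))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


# Hypothetical data: a few unrelated values, then the sync code, then mark bits.
payload = [1, -1, -1, 1, 1, -1, 1, 1]
stream = [-1, 1, -1, -1, 1] + BARKER_13 + payload

offset, score = find_sync(stream)
recovered = stream[offset + len(BARKER_13):]
```

Because a perfect match scores 13 while shifted copies of the code score at most 1 in magnitude against themselves, the correlation peak marks where the payload begins.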

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper presents a study on wavelets and their characteristics for the specific purpose of serving as a feature extraction tool for speaker verification (SV), considering a Radial Basis Function (RBF) classifier, which is a particular type of Artificial Neural Network (ANN). Examining characteristics such as support-size, frequency and phase responses, amongst others, we show how Discrete Wavelet Transforms (DWTs), particularly the ones which derive from Finite Impulse Response (FIR) filters, can be used to extract important features from a speech signal which are useful for SV. Lastly, an SV algorithm based on the concepts presented is described.
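
As a rough illustration of DWT-based feature extraction, the sketch below applies the Haar wavelet, the simplest FIR-derived DWT, and uses sub-band energies as a feature vector. The choice of Haar, the number of levels, the synthetic frame and the function names are assumptions for illustration; the paper's own wavelets and features are not specified here.

```python
import math


def haar_step(signal):
    """One DWT level with the 2-tap Haar FIR filters, followed by downsampling by 2."""
    scale = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def subband_energies(frame, levels):
    """Energy of each detail band, plus the final approximation band."""
    energies = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies


# Hypothetical 64-sample frame standing in for a windowed speech segment.
frame = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
features = subband_energies(frame, levels=3)
```

Since the Haar transform is orthonormal, the sub-band energies sum to the energy of the frame, so the feature vector describes how that energy is distributed across scales. Such a vector could then be passed to a classifier such as the RBF network mentioned above.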

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Speaker Recognition, Speaker Verification, Sparse Kernel Logistic Regression, Support Vector Machine

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Speech recognition involves three processes: extraction of acoustic indices from the speech signal, estimation of the probability that the observed index string was caused by a hypothesized utterance segment, and determination of the recognized utterance via a search among hypothesized alternatives. This paper is not concerned with the first process. Estimation of the probability of an index string involves a model of index production by any given utterance segment (e.g., a word). Hidden Markov models (HMMs) are used for this purpose [Makhoul, J. & Schwartz, R. (1995) Proc. Natl. Acad. Sci. USA 92, 9956-9963]. Their parameters are state transition probabilities and output probability distributions associated with the transitions. The Baum algorithm that obtains the values of these parameters from speech data via their successive reestimation will be described in this paper. The recognizer wishes to find the most probable utterance that could have caused the observed acoustic index string. That probability is the product of two factors: the probability that the utterance will produce the string and the probability that the speaker will wish to produce the utterance (the language model probability). Even if the vocabulary size is moderate, it is impossible to search for the utterance exhaustively. One practical algorithm is described [Viterbi, A. J. (1967) IEEE Trans. Inf. Theory IT-13, 260-267] that, given the index string, has a high likelihood of finding the most probable utterance.
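
The search for the most probable explanation of an observation string can be sketched with the Viterbi algorithm on a toy HMM. This is a simplification: outputs are attached to states rather than to transitions, and the two states, three symbols and all probabilities are invented for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path and its joint probability."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prob, prev = max((best[-1][p][0] * trans_p[p][s], p) for p in states)
            column[s] = (prob * emit_p[s][obs], prev)
        best.append(column)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Hypothetical toy model.
states = ["s1", "s2"]
start_p = {"s1": 0.6, "s2": 0.4}
trans_p = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
emit_p = {
    "s1": {"x": 0.5, "y": 0.4, "z": 0.1},
    "s2": {"x": 0.1, "y": 0.3, "z": 0.6},
}

path, prob = viterbi(["x", "y", "z"], states, start_p, trans_p, emit_p)
```

A real recognizer would work with log probabilities to avoid underflow and would prune unlikely paths, because the search space is far too large to explore exhaustively.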

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In the past decade, tremendous advances in the state of the art of automatic speech recognition by machine have taken place. A reduction in the word error rate by more than a factor of 5 and an increase in recognition speeds by several orders of magnitude (brought about by a combination of faster recognition search algorithms and more powerful computers), have combined to make high-accuracy, speaker-independent, continuous speech recognition for large vocabularies possible in real time, on off-the-shelf workstations, without the aid of special hardware. These advances promise to make speech recognition technology readily available to the general public. This paper focuses on the speech recognition advances made through better speech modeling techniques, chiefly through more accurate mathematical modeling of speech sounds.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Speech interface technology, which includes automatic speech recognition, synthetic speech, and natural language processing, is beginning to have a significant impact on business and personal computer use. Today, powerful and inexpensive microprocessors and improved algorithms are driving commercial applications in computer command, consumer, data entry, speech-to-text, telephone, and voice verification. Robust speaker-independent recognition systems for command and navigation in personal computers are now available; telephone-based transaction and database inquiry systems using both speech synthesis and recognition are coming into use. Large-vocabulary speech interface systems for document creation and read-aloud proofing are expanding beyond niche markets. Today's applications represent a small preview of a rich future for speech interface technology that will eventually replace keyboards with microphones and loud-speakers to give easy accessibility to increasingly intelligent machines.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This dissertation develops an innovative approach towards less-constrained iris biometrics. Two major contributions are made in this research endeavor: (1) an award-winning segmentation algorithm was designed for the less-constrained environment, where images are acquired from subjects on the move under visible lighting conditions, and (2) a pioneering iris biometrics method was developed that couples segmentation and recognition of the iris based on video of moving persons under different acquisition scenarios. The first part of the dissertation introduces a robust and fast segmentation approach using still images contained in the UBIRIS (version 2) noisy iris database. The results show accuracy estimated at 98% when using 500 randomly selected images from the UBIRIS.v2 partial database, and estimated at 97% in the Noisy Iris Challenge Evaluation (NICE.I), an international competition that involved 97 participants from 35 countries, in which this research group ranked sixth. This accuracy is achieved with a processing speed nearing real time. The second part of this dissertation presents an innovative segmentation and recognition approach using video-based iris images. Following the segmentation stage, which delineates the iris region through a novel segmentation strategy, some pioneering experiments on the recognition stage of less-constrained video iris biometrics have been carried out. In video-based, less-constrained iris recognition, the test or subject iris videos/images and the enrolled iris images are acquired with different acquisition systems. In the matching step, the verification/identification result is obtained by comparing the similarity distance between the encoded signature from the test images and each signature in the dataset from the enrolled iris images. With the improvements gained, the results proved to be highly accurate under the more challenging unconstrained environment. This has led to a false acceptance rate (FAR) of 0% and a false rejection rate (FRR) of 17.64% for 85 tested users with 305 test images from the video, which shows great promise and high practical implications for iris biometrics research and system design.
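
For readers unfamiliar with the two error rates, the sketch below shows how FAR and FRR follow from distance scores at a fixed decision threshold. The distances and the threshold are made up and are not the dissertation's data. A comparison is accepted when its distance does not exceed the threshold.

```python
def far_frr(genuine_distances, impostor_distances, threshold):
    """Return (FAR, FRR) for a matcher that accepts when distance <= threshold."""
    false_rejects = sum(d > threshold for d in genuine_distances)
    false_accepts = sum(d <= threshold for d in impostor_distances)
    far = false_accepts / len(impostor_distances)
    frr = false_rejects / len(genuine_distances)
    return far, frr


# Hypothetical distances between encoded iris signatures.
genuine = [0.21, 0.25, 0.30, 0.41, 0.28]   # same-eye comparisons
impostor = [0.45, 0.47, 0.52, 0.44, 0.49]  # different-eye comparisons

far, frr = far_frr(genuine, impostor, threshold=0.35)
```

With these numbers no impostor is accepted (FAR of 0) while one of five genuine comparisons is rejected (FRR of 0.2). A strict threshold trades a higher FRR for a lower FAR, the same kind of trade-off as in the reported result.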

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Phonation distortion leaves relevant marks in a speaker's biometric profile. Dysphonic voice production may be used for biometrical speaker characterization. In the present paper, phonation features derived from the glottal source (GS) parameterization, after vocal tract inversion, are proposed for dysphonic voice characterization in Speaker Verification tasks. The glottal source derived parameters are matched in a forensic evaluation framework defining a distance-based metric specification. The phonation segments used in the study are derived from fillers, long vowels, and other phonation segments produced in spontaneous telephone conversations. Phonated segments from a telephonic database of 100 male Spanish native speakers are combined in a 10-fold cross-validation task to produce the set of quality measurements outlined in the paper. Shimmer, mucosal wave correlate, vocal fold cover biomechanical parameter unbalance and a subset of the GS cepstral profile produce accuracy rates as high as 99.57 for a wide threshold interval (62.08-75.04%). An Equal Error Rate of 0.64% can be attained. The proposed metric framework is shown to behave more fairly than classical likelihood ratios in supporting the hypothesis of the defense vs. that of the prosecution, thus offering a more reliable evaluation scoring. Possible applications are Speaker Verification and Dysphonic Voice Grading.
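
The Equal Error Rate is the operating point at which the false acceptance and false rejection rates coincide. A minimal sketch of how it can be estimated from similarity scores follows. The scores are hypothetical and higher means "more likely the same speaker". This is not the paper's distance-based metric framework.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Sweep every observed score as a threshold and return the rate where FAR and FRR are closest."""
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(genuine_scores) | set(impostor_scores)):
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical similarity scores.
genuine = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor = [0.7, 0.5, 0.3, 0.2, 0.1]

eer = equal_error_rate(genuine, impostor)
```

On this toy data the two rates meet at a threshold of 0.6, giving an EER of 0.2. With real score sets the curves rarely cross exactly at an observed threshold, which is why the sketch averages the two rates at the closest point.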

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Human-machine interaction by voice covers many research areas. Notable among them are speech recognition, speech synthesis and identification, speaker verification and identification, and voice activation (commands) of robotic systems. Recognizing speech is natural and simple for people, but it is a complex task for machines, for which several methodologies and techniques exist, among them Neural Networks. The aim of this work is to develop a tool in Matlab for the recognition and identification of words spoken by a speaker, from a set of possible words, with good reliability within pre-established margins. The system is independent of the speaker who utters the word; that is, this speaker will not have taken part in the system's training process. An interface has been designed that allows the acquisition of the voice signal and its processing by means of neural networks and other techniques. By adding a control component to the system, it could be used to give commands to a robot such as the Alfa6Uvic or any other device.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This Master's thesis examines speaker recognition and its usefulness in verifying a user's identity as part of value-added services in the telephone network. Services controlled by telephone have usually been based on DTMF (touch-tone) selections sent with the telephone keys. The user's identity has been verified, for example, on the basis of a user ID and a secret PIN code. In the future, services may be based on speech recognition, in which case verifying the user by voice also seems sensible. The thesis first introduces various biometric identification methods. It then examines voice-based speaker verification in more detail. In the practical part of the work, a prototype of a speaker verification application suitable for telephone network services was implemented. The purpose of the work was to explore the possible uses of the technology and to gather experience in implementing a speaker verification service with the future in mind. Java was used as the programming language in implementing the prototype.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

As the telecommunications industry evolves over the next decade to provide the products and services that people will desire, several key technologies will become commonplace. Two of these, automatic speech recognition and text-to-speech synthesis, will provide users with more freedom on when, where, and how they access information. While these technologies are currently in their infancy, their capabilities are rapidly increasing and their deployment in today's telephone network is expanding. The economic impact of just one application, the automation of operator services, is well over $100 million per year. Yet there still are many technical challenges that must be resolved before these technologies can be deployed ubiquitously in products and services throughout the worldwide telephone network. These challenges include: (i) High level of accuracy. The technology must be perceived by the user as highly accurate, robust, and reliable. (ii) Easy to use. Speech is only one of several possible input/output modalities for conveying information between a human and a machine, much like a computer terminal or Touch-Tone pad on a telephone. It is not the final product. Therefore, speech technologies must be hidden from the user. That is, the burden of using the technology must be on the technology itself. (iii) Quick prototyping and development of new products and services. The technology must support the creation of new products and services based on speech in an efficient and timely fashion. In this paper I present a vision of the voice-processing industry with a focus on the areas with the broadest base of user penetration: speech recognition, text-to-speech synthesis, natural language processing, and speaker recognition technologies. The current and future applications of these technologies in the telecommunications industry will be examined in terms of their strengths, limitations, and the degree to which user needs have been or have yet to be met. 
Although noteworthy gains have been made in areas with potentially small user bases and in the more mature speech-coding technologies, these subjects are outside the scope of this paper.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Phosphatidylserine (PS) is distributed almost entirely in the inner leaflet of the erythrocyte membrane bilayer, and appears to be maintained by a 32 kDa integral membrane protein (PS translocase). The expression of PS on the outer leaflet may serve as a recognition signal for macrophages, since insertion of PS into erythrocytes enhances their adherence to macrophages and clearance from the circulation. Therefore I have hypothesized that erythroid cells display PS on their outer leaflet early in differentiation and upon aging. Analysis of murine erythroleukemia cells (MELC, undifferentiated erythroid progenitor cells) showed high levels of PS on the outer leaflet that decreased during differentiation, correlating with the pattern of macrophage adherence. The activity of the PS translocase during differentiation appears to be unchanged although the equilibrium distribution of PS differs. This difference may be due to qualitative changes in the PS translocase. $^{125}$I-Bolton/Hunter-labeled-pyridyldithioethylamine ($^{125}$I-B/H-PDA), a radiolabeled probe for the PS translocase, labeled a 32 kDa protein in mature erythrocytes whereas in MELC a 45 kDa protein as well as a 32 kDa protein was identified. The abundance of the 45 kDa protein in relation to the 32 kDa protein declined during differentiation, possibly indicating this protein was a precursor of the 32 kDa protein. Analysis of the 45 kDa protein by N-glycosidase F and endoproteinase cleavage suggested this protein was not a glycosylated form of the 32 kDa protein but appeared to share some structural homology. Aged murine erythrocytes had elevated levels of PS on their outer leaflet, as well as decreased PS translocase activity. $^{125}$I-B/H-PDA labeled a 32 kDa protein in both normal and aged erythrocytes. However, the latter cells also contained a 28 kDa protein. Experimental evidence suggests that the appearance of the 28 kDa protein may be due to increased oxidation of aged erythrocytes. Examination of PS distribution showed that the levels of PS on the outer leaflet were elevated early in differentiation, decreased during the mature state, and returned to high levels as the erythrocyte aged. In conclusion, the levels of outer leaflet PS correlated with the differentiation status and macrophage recognition of erythroid cells.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We have used two monovalent phage display libraries containing variants of the Zif268 DNA-binding domain to obtain families of zinc fingers that bind to alterations in the last 4 bp of the DNA sequence of the Zif268 consensus operator, GCG TGG GCG. Affinity selection was performed by altering the Zif268 operator three base pairs at a time, and simultaneously selecting for sets of 16 related DNA sequences. In this way, only four experiments were required to select for all possible 64 combinations of DNA triplet sequences. The results show that (i) for high-affinity DNA binding in the range observed for the Zif268 wild-type complex (Kd = 0.5–5 nM), finger 1 specifically requires the arginine at the carboxy terminus of its recognition helix that forms a bidentate hydrogen-bond with the guanine base (G) in the crystal structure of Zif268 complexed to its DNA operator sequence GCG TGG GCG; (ii) when the guanine base (G) is replaced by A, C, or T, a lower-affinity family (Kd ⩾ 50 nM) can be detected that shows an overall tendency to bind G-rich DNA; (iii) the residues at position 2 on the finger 2 recognition helix do not appear to interact strongly with the complementary 5′ base in the finger 1 binding site; and (iv) unexpected substitutions at the amino terminus of finger 1 can occasionally result in specificity for the 3′ base in the finger 1 binding site. A DNA recognition directory was constructed for high-affinity zinc fingers that recognize all three bases in a DNA triplet for seven sequences of the type GNN. Similar approaches may be applied to other zinc fingers to broaden the scope of the directory.
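
The counting behind "four experiments" can be checked with a short enumeration. How the 16-sequence sets were actually composed is not stated in the abstract. The sketch assumes, purely for illustration, that each set fixes the first base of the triplet and varies the other two.

```python
from itertools import product

BASES = "ACGT"

# All possible DNA triplets: 4 ** 3 = 64.
triplets = ["".join(t) for t in product(BASES, repeat=3)]

# Assumed grouping: one pool per fixed first base, 16 triplets in each pool.
pools = {base: [t for t in triplets if t[0] == base] for base in BASES}

# The GNN family discussed in the abstract corresponds to the pool with first base G.
gnn_family = pools["G"]
```

Under this assumption four pools of 16 sequences cover all 64 triplets exactly once, which matches the arithmetic in the abstract.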

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Pax proteins, characterized by the presence of a paired domain, play key regulatory roles during development. The paired domain is a bipartite DNA-binding domain that contains two helix–turn–helix domains joined by a linker region. Each of the subdomains, the PAI and RED domains, has been shown to be a distinct DNA-binding domain. The PAI domain is the most critical, but in specific circumstances, the RED domain is involved in DNA recognition. We describe a Pax protein, originally called Lune, that is the product of the Drosophila eye gone gene (eyg). It is unique among Pax proteins, because it contains only the RED domain. eyg seems to play a role both in the organogenesis of the salivary gland during embryogenesis and in the development of the eye. A high-affinity binding site for the Eyg RED domain was identified by using systematic evolution of ligands by exponential enrichment techniques. This binding site is related to a binding site previously identified for the RED domain of the Pax-6 5a isoform. Eyg also contains another DNA-binding domain, a Prd-class homeodomain (HD), whose palindromic binding site is similar to other Prd-class HDs. The ability of Pax proteins to use the PAI, RED, and HD, or combinations thereof, may be one mechanism that allows them to be used at different stages of development to regulate various developmental processes through the activation of specific target genes.