18 resultados para speaker recognition systems
em Indian Institute of Science - Bangalore - Índia
Resumo:
Abstract-The success of automatic speaker recognition in laboratory environments suggests applications in forensic science for establishing the Identity of individuals on the basis of features extracted from speech. A theoretical model for such a verification scheme for continuous normaliy distributed featureIss developed. The three cases of using a) single feature, b)multipliendependent measurements of a single feature, and c)multpleindependent features are explored.The number iofndependent features needed for areliable personal identification is computed based on the theoretcal model and an expklatory study of some speech featues.
Resumo:
The increasing use of 3D modeling of Human Face in Face Recognition systems, User Interfaces, Graphics, Gaming and the like has made it an area of active study. Majority of the 3D sensors rely on color coded light projection for 3D estimation. Such systems fail to generate any response in regions covered by Facial Hair (like beard, mustache), and hence generate holes in the model which have to be filled manually later on. We propose the use of wavelet transform based analysis to extract the 3D model of Human Faces from a sinusoidal white light fringe projected image. Our method requires only a single image as input. The method is robust to texture variations on the face due to space-frequency localization property of the wavelet transform. It can generate models to pixel level refinement as the phase is estimated for each pixel by a continuous wavelet transform. In cases of sparse Facial Hair, the shape distortions due to hairs can be filtered out, yielding an estimate for the underlying face. We use a low-pass filtering approach to estimate the face texture from the same image. We demonstrate the method on several Human Faces both with and without Facial Hairs. Unseen views of the face are generated by texture mapping on different rotations of the obtained 3D structure. To the best of our knowledge, this is the first attempt to estimate 3D for Human Faces in presence of Facial hair structures like beard and mustache without generating holes in those areas.
Resumo:
An adaptive learning scheme, based on a fuzzy approximation to the gradient descent method for training a pattern classifier using unlabeled samples, is described. The objective function defined for the fuzzy ISODATA clustering procedure is used as the loss function for computing the gradient. Learning is based on simultaneous fuzzy decisionmaking and estimation. It uses conditional fuzzy measures on unlabeled samples. An exponential membership function is assumed for each class, and the parameters constituting these membership functions are estimated, using the gradient, in a recursive fashion. The induced possibility of occurrence of each class is useful for estimation and is computed using 1) the membership of the new sample in that class and 2) the previously computed average possibility of occurrence of the same class. An inductive entropy measure is defined in terms of induced possibility distribution to measure the extent of learning. The method is illustrated with relevant examples.
Resumo:
For the problem of speaker adaptation in speech recognition, the performance depends on the availability of adaptation data. In this paper, we have compared several existing speaker adaptation methods, viz. maximum likelihood linear regression (MLLR), eigenvoice (EV), eigenspace-based MLLR (EMLLR), segmental eigenvoice (SEV) and hierarchical eigenvoice (HEV) based methods. We also develop a new method by modifying the existing HEV method for achieving further performance improvement in a limited available data scenario. In the sense of availability of adaptation data, the new modified HEV (MHEV) method is shown to perform better than all the existing methods throughout the range of operation except the case of MLLR at the availability of more adaptation data.
Resumo:
In this paper we propose a hypothetical scheme for recognizing the alphanumerics. The scheme is based on the known physiological structure of the visual cortex and the concept of a short Lino extractor nouron (SLEN). We assumo four basic typca of such units for extracting vertical, horizontal, right and left inclined straight line segments. The patterns reconstructed from the scheme show perfect agreement with the test patterns. The model indicates that the recognition of letters T and H requires extraction of the largest number of features.
Resumo:
We are addressing a new problem of improving automatic speech recognition performance, given multiple utterances of patterns from the same class. We have formulated the problem of jointly decoding K multiple patterns given a single Hidden Markov Model. It is shown that such a solution is possible by aligning the K patterns using the proposed Multi Pattern Dynamic Time Warping algorithm followed by the Constrained Multi Pattern Viterbi Algorithm The new formulation is tested in the context of speaker independent isolated word recognition for both clean and noisy patterns. When 10 percent of speech is affected by a burst noise at -5 dB Signal to Noise Ratio (local), it is shown that joint decoding using only two noisy patterns reduces the noisy speech recognition error rate to about 51 percent, when compared to the single pattern decoding using the Viterbi Algorithm. In contrast a simple maximization of individual pattern likelihoods, provides only about 7 percent reduction in error rate.
Resumo:
Design of speaker identification schemes for a small number of speakers (around 10) with a high degree of accuracy in controlled environment is a practical proposition today. When the number of speakers is large (say 50–100), many of these schemes cannot be directly extended, as both recognition error and computation time increase monotonically with population size. The feature selection problem is also complex for such schemes. Though there were earlier attempts to rank order features based on statistical distance measures, it has been observed only recently that the best two independent measurements are not the same as the combination in two's for pattern classification. We propose here a systematic approach to the problem using the decision tree or hierarchical classifier with the following objectives: (1) Design of optimal policy at each node of the tree given the tree structure i.e., the tree skeleton and the features to be used at each node. (2) Determination of the optimal feature measurement and decision policy given only the tree skeleton. Applicability of optimization procedures such as dynamic programming in the design of such trees is studied. The experimental results deal with the design of a 50 speaker identification scheme based on this approach.
Resumo:
The machine replication of human reading has been the subject of intensive research for more than three decades. A large number of research papers and reports have already been published on this topic. Many commercial establishments have manufactured recognizers of varying capabilities. Handheld, desk-top, medium-size and large systems costing as high as half a million dollars are available, and are in use for various applications. However, the ultimate goal of developing a reading machine having the same reading capabilities of humans still remains unachieved. So, there still is a great gap between human reading and machine reading capabilities, and a great amount of further effort is required to narrow-down this gap, if not bridge it. This review is organized into six major sections covering a general overview (an introduction), applications of character recognition techniques, methodologies in character recognition, research work in character recognition, some practical OCRs and the conclusions.
Resumo:
Polycyclic aromatic hydrocarbons (PAHs) are environmental pollutants as well as well-known carcinogens. Therefore, it is important to develop an effective receptor for the detection and quantification of such molecules in solution. In view of this, a 1,3-dinaphthalimide derivative of calix4]arene (L) has been synthesized and characterized, and the structure has been established by single crystal XRD. In the crystal lattice, intermolecular arm-to-arm pi center dot center dot center dot pi overlap dominates and thus L becomes a promising receptor for providing interactions with the aromatic species in solution, which can be monitored by following the changes that occur in its fluorescence and absorption spectra. On the basis of the solution studies carried out with about 17 derivatives of the aromatic guest molecular systems, it may be concluded that the changes that occur in the fluorescence intensity seem to be proportional to the number of aromatic rings present and thus proportional to the extent of pi center dot center dot center dot pi interaction present between the naphthalimide moieties and the aromatic portion of the guest molecule. Though the nonaromatic portion of the guest species affects the fluorescence quenching, the trend is still based on the number of rings present in these. Four guest aldehydes are bound to L with K-ass of 2000-6000 M-1 and their minimum detection limit is in the range of 8-35 mu M. The crystal structure of a naphthaldehyde complex, L.2b, exhibits intermolecular arm-to-arm as well as arm-to-naphthaldehyde pi center dot center dot center dot pi interactions. Molecular dynamics studies of L carried out in the presence of aromatic aldehydes under vacuum as well as in acetonitrile resulted in exhibiting interactions observed in the solid state and hence the changes observed in the fluorescence and absorption spectra are attributable for such interactions. Complex formation has also been delineated through ESI MS studies. Thus L is a promising receptor that can recognize PAHs by providing spectral changes proportional to the aromatic conjugation of the guest and the extent of aromatic pi center dot center dot center dot pi interactions present between L and the guest.
Resumo:
This work describes an online handwritten character recognition system working in combination with an offline recognition system. The online input data is also converted into an offline image, and parallely recognized by both online and offline strategies. Features are proposed for offline recognition and a disambiguation step is employed in the offline system for the samples for which the confidence level of the classifier is low. The outputs are then combined probabilistically resulting in a classifier out-performing both individual systems. Experiments are performed for Kannada, a South Indian Language, over a database of 295 classes. The accuracy of the online recognizer improves by 11% when the combination with offline system is used.
Resumo:
Acoustic modeling using mixtures of multivariate Gaussians is the prevalent approach for many speech processing problems. Computing likelihoods against a large set of Gaussians is required as a part of many speech processing systems and it is the computationally dominant phase for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. We express the likelihood computation as a multiplication of matrices representing augmented feature vectors and Gaussian parameters. The computational gain of this approach over traditional methods is by exploiting the structure of these matrices and efficient implementation of their multiplication. In particular, we explore direct low-rank approximation of the Gaussian parameter matrix and indirect derivation of low-rank factors of the Gaussian parameter matrix by optimum approximation of the likelihood matrix. We show that both the methods lead to similar speedups but the latter leads to far lesser impact on the recognition accuracy. Experiments on 1,138 work vocabulary RM1 task and 6,224 word vocabulary TIMIT task using Sphinx 3.7 system show that, for a typical case the matrix multiplication based approach leads to overall speedup of 46 % on RM1 task and 115 % for TIMIT task. Our low-rank approximation methods provide a way for trading off recognition accuracy for a further increase in computational performance extending overall speedups up to 61 % for RM1 and 119 % for TIMIT for an increase of word error rate (WER) from 3.2 to 3.5 % for RM1 and for no increase in WER for TIMIT. We also express pairwise Euclidean distance computation phase in Dynamic Time Warping (DTW) in terms of matrix multiplication leading to saving of approximately of computational operations. In our experiments using efficient implementation of matrix multiplication, this leads to a speedup of 5.6 in computing the pairwise Euclidean distances and overall speedup up to 3.25 for DTW.
Resumo:
Benzimidazole derivatives are well known for their antibacterial, antiviral, anticonvulsant, antihistaminic, anthelmintic and antidepressant activities. Benzimidazole's unique base-selective DNA recognition property has been studied widely. However, most of the early benzimidazole systems have been targeted towards the binding of duplex DNA. Here we have shown the evolution and progress of the design and synthesis of new benzimidazole systems towards selective recognition of the double-stranded DNA first. Then in order to achieve selective recognition of the G-quadruplex DNA and utilize their potential as future anti-cancer drug candidates, we have demonstrated their selective cytotoxicity towards the cancer cells and potent telomerase inhibition ability.
Resumo:
In this paper, we discuss the issues related to word recognition in born-digital word images. We introduce a novel method of power-law transformation on the word image for binarization. We show the improvement in image binarization and the consequent increase in the recognition performance of OCR engine on the word image. The optimal value of gamma for a word image is automatically chosen by our algorithm with fixed stroke width threshold. We have exhaustively experimented our algorithm by varying the gamma and stroke width threshold value. By varying the gamma value, we found that our algorithm performed better than the results reported in the literature. On the ICDAR Robust Reading Systems Challenge-1: Word Recognition Task on born digital dataset, as compared to the recognition rate of 61.5% achieved by TH-OCR after suitable pre-processing by Yang et. al. and 63.4% by ABBYY Fine Reader (used as baseline by the competition organizers without any preprocessing), we achieved 82.9% using Omnipage OCR applied on the images after being processed by our algorithm.