87 results for Letters in word recognition
Abstract:
Joint decoding of multiple speech patterns so as to improve speech recognition performance is important, especially in the presence of noise. In this paper, we propose a Multi-Pattern Viterbi Algorithm (MPVA) to jointly decode and recognize multiple speech patterns for automatic speech recognition (ASR). The MPVA is a generalization of the Viterbi algorithm to jointly decode multiple patterns given a Hidden Markov Model (HMM). Unlike the previously proposed two-stage Constrained Multi-Pattern Viterbi Algorithm (CMPVA), the MPVA is a single-stage algorithm. MPVA has the advantage that it can be extended to connected word recognition (CWR) and continuous speech recognition (CSR) problems. MPVA is shown to provide better speech recognition performance than the earlier techniques: using only two repetitions of noisy speech patterns (-5 dB SNR, 10% burst noise), the word error rate using MPVA decreased by 28.5% compared to using individual decoding. (C) 2010 Elsevier B.V. All rights reserved.
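For orientation, the single-pattern Viterbi algorithm that the MPVA is said to generalize can be sketched as follows. This is the textbook algorithm for a discrete HMM, not the MPVA itself; the state names, observation symbols and probabilities are hypothetical toy values chosen only for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, best_probability) for one observation sequence."""
    # best[t][s] = probability of the most likely path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s] = predecessor of s on that most likely path

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor that maximises the path probability into s.
            prev = max(states, key=lambda p: best[t - 1][p] * trans_p[p][s])
            best[t][s] = best[t - 1][prev] * trans_p[prev][s] * emit_p[s][observations[t]]
            back[t][s] = prev

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical two-state toy model (not taken from the paper).
STATES = ("Healthy", "Fever")
START = {"Healthy": 0.6, "Fever": 0.4}
TRANS = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
         "Fever": {"Healthy": 0.4, "Fever": 0.6}}
EMIT = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
        "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

path, prob = viterbi(("normal", "cold", "dizzy"), STATES, START, TRANS, EMIT)
```

Decoding each repetition of a word separately with a routine like this corresponds to the "individual decoding" baseline mentioned in the abstract; the joint decoding step itself is not reproduced here.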
Abstract:
The machine replication of human reading has been the subject of intensive research for more than three decades. A large number of research papers and reports have already been published on this topic. Many commercial establishments have manufactured recognizers of varying capabilities. Handheld, desktop, medium-size and large systems costing as much as half a million dollars are available, and are in use for various applications. However, the ultimate goal of developing a reading machine with the same reading capabilities as humans remains unachieved. There is still a great gap between human reading and machine reading capabilities, and a great amount of further effort is required to narrow this gap, if not bridge it. This review is organized into six major sections covering a general overview (an introduction), applications of character recognition techniques, methodologies in character recognition, research work in character recognition, some practical OCRs and the conclusions.
Abstract:
The problem of estimating the three-dimensional rotational parameters of a rigid body from its monocular image data has been considered using the method of moment invariants. Second- and third-order moment invariants are used to construct the feature vector for the scale and orientation independent identification of the camera view axis direction in the body-fixed reference frame. The camera rotation angle about the view axis is derived from second-order central moments. The relative attitude of the rigid body is then expressed in terms of quaternion parameters to model the outputs of a video sensor in attitude control simulations. Experimental results and simulation outputs are presented using the mathematical model of a spacecraft.
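As a rough illustration of how an in-plane rotation angle can be read off second-order central moments, the sketch below uses the standard principal-axis formula theta = 0.5 * atan2(2*mu11, mu20 - mu02). Whether the paper uses exactly this expression is an assumption; the function names and the tiny binary images are hypothetical.

```python
import math


def central_moments(image):
    """Second-order central moments (mu20, mu02, mu11) of a binary image.

    `image` is a list of rows; x is the column index, y is the row index.
    Assumes at least one non-zero (foreground) pixel.
    """
    pts = [(x, y) for y, row in enumerate(image) for x, v in enumerate(row) if v]
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    mu20 = sum((x - cx) ** 2 for x, _ in pts)
    mu02 = sum((y - cy) ** 2 for _, y in pts)
    mu11 = sum((x - cx) * (y - cy) for x, y in pts)
    return mu20, mu02, mu11


def in_plane_angle(image):
    """Orientation of the principal axis, in radians, in image coordinates."""
    mu20, mu02, mu11 = central_moments(image)
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)


# Hypothetical test shapes: a diagonal stroke and a horizontal stroke.
diagonal = [[1, 0, 0],
            [0, 1, 0],
            [0, 0, 1]]
horizontal = [[0, 0, 0],
              [1, 1, 1],
              [0, 0, 0]]

angle_diag = in_plane_angle(diagonal)
angle_horiz = in_plane_angle(horizontal)
```

Because the moments are taken about the centroid, the angle does not change when the shape is translated within the image.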
Abstract:
Sesbania mosaic virus (SeMV) is a single-stranded positive-sense RNA plant virus that belongs to the genus Sobemovirus. The mechanism of cell-to-cell movement in sobemoviruses has not been well studied. To identify the virus-encoded ancillary proteins of SeMV that may assist in cell-to-cell movement of the virus, all the proteins encoded by the SeMV genome were cloned into the yeast Matchmaker system 3 and interaction studies were performed. Two proteins, namely viral protein genome-linked (VPg) and a 10-kDa protein (P10) encoded by ORF 2a, were identified as possible interacting partners in addition to the viral coat protein (CP). Further characterization of these interactions revealed that the movement protein (MP) recognizes cognate RNA through interaction with VPg, which is covalently linked to the 5' end of the RNA. Analysis of the deletion mutants delineated the domains of MP involved in the interaction with VPg and P10. This study implicates for the first time that VPg might play an important role in specific recognition of the viral genome by MP in SeMV and sheds light on the possible role of P10 in viral movement.
Abstract:
EcoP15I DNA methyltransferase recognizes the sequence 5'-CAGCAG-3' and transfers a methyl group to N-6 of the second adenine residue in the recognition sequence. All N-6 adenine methyltransferases contain two highly conserved sequences, FxGxG (motif I), postulated to form part of the S-adenosyl-L-methionine binding site and (D/N/S)PP(Y/F) (motif IV) involved in catalysis. We have altered the second glycine residue in motif I to arginine and serine, and substituted tyrosine in motif IV with tryptophan in EcoP15I DNA methyltransferase, using site-directed mutagenesis. The mutant enzymes were overexpressed, purified and characterized by biochemical methods. The mutations in motif I completely abolished AdoMet binding but left target DNA recognition unaltered. Although the mutation in motif IV resulted in loss of enzyme activity, we observed enhanced crosslinking of S-adenosyl-L-methionine and DNA. This implies that DNA and AdoMet binding sites are close to motif IV. Taken together, these results reinforce the importance of motif I in AdoMet binding and motif IV in catalysis. Additionally, limited proteolysis and UV crosslinking experiments with EcoP15I DNA methyltransferase imply that DNA binds in a cleft formed by two domains in the protein. Methylation protection analysis provides evidence for the fact that EcoP15I DNA MTase makes contacts in the major groove of its substrate DNA. Interestingly, hypermethylation of the guanine residue next to the target adenine residue indicates that the protein probably flips out the target adenine residue. (C) 1996 Academic Press Limited
Abstract:
The DNA-binding properties of the EcoP15I DNA methyltransferase (M.EcoP15I; MTase) were studied using electrophoretic mobility shift assays. We show by molecular size-exclusion chromatography and dimethyl suberimidate crosslinking that M.EcoP15I is a dimer in solution. While M.EcoP15I binds approximately threefold more tightly to its recognition sequence, 5'-CAGCAG-3', than to non-specific sequences in the presence of AdoMet or its analogs, the discrimination between specific and non-specific sequences increases significantly in the presence of ATP. These results suggest for the first time a role for ATP in DNA recognition by type-III restriction-modification enzymes. Furthermore, we show that although c2 EcoPI mutant MTases are defective in AdoMet binding, they are still able to bind DNA in a sequence-specific manner.
Abstract:
Molecular complexes of melamine with hydroxy and dihydroxybenzoic acids have been analyzed to assess the collective role of the hydroxyl (OH) and carboxyl (COOH) functionalities in the recognition process. In most cases, solvents of crystallization play a major role in self-assembly and structure stabilization. Hydrated compounds generate linear chains of melamine molecules with pendant acid molecules, resulting in a zipper architecture. However, anhydrous and solvated compounds generate tetrameric units consisting of melamine dimers together with acid molecules. These tetramers in turn interweave to form a Lincoln log arrangement in the crystal. The salt/co-crystal formation in these complexes cannot be predicted a priori on the basis of ΔpKa values, as there exists a salt-to-co-crystal continuum.
Abstract:
A fundamental task in bioinformatics involves a transfer of knowledge from one protein molecule onto another by way of recognizing similarities. Such similarities are obtained at different levels, that of sequence, whole fold, or important substructures. Comparison of binding sites is important to understand functional similarities among the proteins and also to understand drug cross-reactivities. Current methods in literature have their own merits and demerits, warranting exploration of newer concepts and algorithms, especially for large-scale comparisons and for obtaining accurate residue-wise mappings. Here, we report the development of a new algorithm, PocketAlign, for obtaining structural superpositions of binding sites. The software is available as a web-service at http://proline.physicslisc.emetin/pocketalign/. The algorithm encodes shape descriptors in the form of geometric perspectives, supplemented by chemical group classification. The shape descriptor considers several perspectives with each residue as the focus and captures relative distribution of residues around it in a given site. Residue-wise pairings are computed by comparing the set of perspectives of the first site with that of the second, followed by a greedy approach that incrementally combines residue pairings into a mapping. The mappings in different frames are then evaluated by different metrics encoding the extent of alignment of individual geometric perspectives. Different initial seed alignments are computed, each subsequently extended by detecting consequential atomic alignments in a three-dimensional grid, and the best 500 stored in a database. Alignments are then ranked, and the top scoring alignments reported, which are then streamed into Pymol for visualization and analyses. The method is validated for accuracy and sensitivity and benchmarked against existing methods. 
An advantage of PocketAlign, compared to some of the existing tools for binding site comparison in the literature, is that it explores different schemes for identifying an alignment and thus has a better potential to capture similarities in ligand recognition abilities. By finding a detailed alignment of a pair of sites, PocketAlign provides insights into why two sites are similar and which residues and atoms contribute to the similarity.
Abstract:
In data mining, an important goal is to generate an abstraction of the data. Such an abstraction helps in reducing the space and search time requirements of the overall decision making process. Further, it is important that the abstraction is generated from the data with a small number of disk scans. We propose a novel data structure, pattern count tree (PC-tree), that can be built by scanning the database only once. PC-tree is a minimal size complete representation of the data and it can be used to represent dynamic databases with the help of knowledge that is either static or changing. We show that further compactness can be achieved by constructing the PC-tree on segmented patterns. We exploit the flexibility offered by rough sets to realize a rough PC-tree and use it for efficient and effective rough classification. To be consistent with the sizes of the branches of the PC-tree, we use upper and lower approximations of feature sets in a manner different from the conventional rough set theory. We conducted experiments using the proposed classification scheme on a large-scale hand-written digit data set. We use the experimental results to establish the efficacy of the proposed approach. (C) 2002 Elsevier Science B.V. All rights reserved.
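The single-scan construction described above can be pictured with a simplified prefix tree in which every node keeps a count of the patterns passing through it. This is only a sketch under that assumption: the paper's actual node layout, segmentation scheme and rough-set extension are not reproduced, and the class and function names are hypothetical.

```python
class PCNode:
    """One node of a simplified pattern count tree."""

    def __init__(self):
        self.count = 0       # number of inserted patterns that pass through this node
        self.children = {}   # feature value -> child PCNode


def build_pc_tree(patterns):
    """Build the tree in a single pass over the patterns (one 'database scan')."""
    root = PCNode()
    for pattern in patterns:
        node = root
        for feature in pattern:
            # Patterns with a common prefix share the same chain of nodes.
            node = node.children.setdefault(feature, PCNode())
            node.count += 1
    return root


def prefix_count(root, prefix):
    """Number of inserted patterns that start with `prefix` (0 if none)."""
    node = root
    for feature in prefix:
        node = node.children.get(feature)
        if node is None:
            return 0
    return node.count


def node_total(node):
    """Number of nodes below `node`, a rough measure of compactness."""
    return sum(1 + node_total(child) for child in node.children.values())


# Hypothetical toy database of three feature patterns.
tree = build_pc_tree([(1, 2, 3), (1, 2, 4), (1, 2, 3)])
```

In this toy example nine feature values are stored in four nodes, which is the kind of prefix sharing that makes such a tree compact.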
Abstract:
This paper presents a new application of two dimensional Principal Component Analysis (2DPCA) to the problem of online character recognition in Tamil Script. A novel set of features employing polynomial fits and quartiles in combination with conventional features are derived for each sample point of the Tamil character obtained after smoothing and resampling. These are stacked to form a matrix, using which a covariance matrix is constructed. A subset of the eigenvectors of the covariance matrix is employed to get the features in the reduced sub space. Each character is modeled as a separate subspace and a modified form of the Mahalanobis distance is derived to classify a given test character. Results indicate that the recognition accuracy using the 2DPCA scheme shows an approximate 3% improvement over the conventional PCA technique.
Abstract:
This paper describes the efforts at MILE lab, IISc, to create a 100,000-word database each in Kannada and Tamil for the design and development of Online Handwritten Recognition. It has been collected from over 600 users in order to capture the variations in writing style. We describe features of the scripts and how the number of symbols was reduced to be able to effectively train the data for recognition. The list of words includes all the characters, Kannada and Indo-Arabic numerals, punctuation marks and other symbols. A semi-automated tool for the annotation of data from stroke to word level is used. It segments each word into stroke groups and also acts as a validation mechanism for segmentation. The tool displays the strokes, stroke groups and aksharas of a word and hence can be used to study the various styles of writing and delayed strokes, and to assign quality tags to the words. The tool is currently being used for annotating Tamil and Kannada data. The output is stored in a standard XML format.
Abstract:
The following topics were dealt with: document analysis and recognition; multimedia document processing; character recognition; document image processing; cheque processing; form processing; music processing; document segmentation; electronic documents; character classification; handwritten character recognition; information retrieval; postal automation; font recognition; Indian language OCR; handwriting recognition; performance evaluation; graphics recognition; oriental character recognition; and word recognition
Abstract:
We address the classical problem of delta feature computation, and interpret the operation involved in terms of Savitzky-Golay (SG) filtering. Features such as the mel-frequency cepstral coefficients (MFCCs), obtained based on short-time spectra of the speech signal, are commonly used in speech recognition tasks. In order to incorporate the dynamics of speech, auxiliary delta and delta-delta features, which are computed as temporal derivatives of the original features, are used. Typically, the delta features are computed in a smooth fashion using local least-squares (LS) polynomial fitting on each feature vector component trajectory. In the light of the original work of Savitzky and Golay, and a recent article by Schafer in IEEE Signal Processing Magazine, we interpret the dynamic feature vector computation for arbitrary derivative orders as SG filtering with a fixed impulse response. This filtering equivalence brings in significantly lower latency with no loss in accuracy, as validated by results on a TIMIT phoneme recognition task. The SG filters involved in dynamic parameter computation can be viewed as modulation filters, proposed by Hermansky.
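The equivalence for the first-order (delta) case can be checked numerically with a small sketch. It compares the usual regression formula for delta features with a fixed set of filter taps h[n] = n / sum(n^2) over n = -N..N, which is the first-derivative Savitzky-Golay filter for a window of 2N+1 samples. The window half-width N = 2, the edge handling (repeating the end samples) and the function names are assumptions made for illustration, not details from the paper.

```python
def delta_regression(x, N=2):
    """Delta features via the least-squares regression formula."""
    denom = 2 * sum(n * n for n in range(1, N + 1))
    last = len(x) - 1

    def at(i):
        # Repeat the first/last sample beyond the edges (an assumed convention).
        return x[min(max(i, 0), last)]

    return [sum(n * (at(t + n) - at(t - n)) for n in range(1, N + 1)) / denom
            for t in range(len(x))]


def sg_first_derivative_taps(N=2):
    """Fixed impulse response of the first-derivative SG filter, window 2N+1."""
    denom = sum(n * n for n in range(-N, N + 1))
    return [n / denom for n in range(-N, N + 1)]


def apply_taps(x, taps):
    """Apply the taps as a sliding weighted sum: d[t] = sum_n taps[n] * x[t + n]."""
    N = len(taps) // 2
    last = len(x) - 1

    def at(i):
        return x[min(max(i, 0), last)]

    return [sum(taps[n + N] * at(t + n) for n in range(-N, N + 1))
            for t in range(len(x))]


# Hypothetical feature trajectory: a ramp with slope 3.
trajectory = [3.0 * t + 1.0 for t in range(10)]
by_regression = delta_regression(trajectory)
by_filtering = apply_taps(trajectory, sg_first_derivative_taps())
```

Both routes give the same values at every frame, and away from the edges the estimated slope equals the true slope of the ramp.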
Abstract:
Scenic word images undergo degradations due to motion blur, uneven illumination, shadows and defocusing, which lead to difficulty in segmentation. As a result, the recognition results reported on the scenic word image datasets of ICDAR have been low. We introduce a novel technique, where we choose the middle row of the image as a sub-image and segment it first. Then, the labels from this segmented sub-image are used to propagate labels to other pixels in the image. This approach, which is unique and distinct from the existing methods, results in improved segmentation. Bayesian classification and Max-flow methods have been independently used for label propagation. This midline-based approach limits the impact of degradations that happen to the image. The segmented text image is recognized using the trial version of Omnipage OCR. We have tested our method on ICDAR 2003 and ICDAR 2011 datasets. Our word recognition results of 64.5% and 71.6% are better than those of methods in the literature and also methods that competed in the Robust Reading competition. Our method makes an implicit assumption that degradation is not present in the middle row.
Abstract:
Restriction-modification (R-M) systems are ubiquitous and are often considered primitive immune systems in bacteria. Their diversity and prevalence across the prokaryotic kingdom are an indication of their success as a defense mechanism against invading genomes. However, their cellular defense function does not adequately explain the basis for their immaculate specificity in sequence recognition and nonuniform distribution, ranging from none to too many, in diverse species. The present review deals with new developments which provide insights into the roles of these enzymes in other aspects of cellular function. In this review, emphasis is placed on novel hypotheses and various findings that have not yet been dealt with in a critical review. Emerging studies indicate their role in various cellular processes other than host defense, virulence, and even controlling the rate of evolution of the organism. We also discuss how R-M systems could have successfully evolved and be involved in additional cellular portfolios, thereby increasing the relative fitness of their hosts in the population.