15 resultados para Word Classes

em Indian Institute of Science - Bangalore - Índia


Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we propose a novel heuristic approach to segment recognizable symbols from online Kannada word data and perform recognition of the entire word. Two different estimates of first derivative are extracted from the preprocessed stroke groups and used as features for classification. Estimate 2 proved better resulting in 88% accuracy, which is 3% more than that achieved with estimate 1. Classification is performed by statistical dynamic space warping (SDSW) classifier which uses X, Y co-ordinates and their first derivatives as features. Classifier is trained with data from 40 writers. 295 classes are handled covering Kannada aksharas, with Kannada numerals, Indo-Arabic numerals, punctuations and other special symbols like $ and #. Classification accuracies obtained are 88% at the akshara level and 80% at the word level, which shows the scope for further improvement in segmentation algorithm

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The nature of binding of 7-nitrobenz-2-oxa-1,3-diazol-4-yl-colcemid (NBD-colcemid), an environment-sensitive fluorescent analogue of colchicine, to tubulin was tested. This article reports the first fluorometric study where two types of binding site of colchincine analogue on tubulin were detected. Binding of NBD-colcemid to one of these sites equilibrates slsowly. NBD-colcemid competes with colchicine for this site. Binding of NBD-colcemid to this site also causes inhibition of tubulin self-assembly. In contrast, NBD-colcemid binding to the other site is characterised by rapid equilibration and lack of competition with colchicine. Nevertheless, binding to this site is highly specific for the cholchicine nucleus, as alkyl-NBD analogues have no significant binding activity. Fast-reaction-kinetic studies gave 1.76 × 105 M–1 s–1 for the association and 0.79 s–1 for the dissociation rate constants for the binding of NBD-colcemid to the fast site of tubulin. The association rate constants for the two phases of the slow site are 0.016 × 10–4 M–1 s–1 and 3.5 × 10–4 M–1 respectively. These two sites may be related to the two sites of colchicine reported earlier, with binding characteristics altered by the increased hydrophobic nature of NBD-colcemid.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Equivalence of certain classes of second-order non-linear distributed parameter systems and corresponding linear third-order systems is established through a differential transformation technique. As linear systems are amenable to analysis through existing techniques, this study is expected to offer a method of tackling certain classes of non-linear problems which may otherwise prove to be formidable in nature.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A fast algorithm for the computation of maximum compatible classes (mcc) among the internal states of an incompletely specified sequential machine is presented in this paper. All the maximum compatible classes are determined by processing compatibility matrices of progressingly diminishing order, whose total number does not exceed (p + m), where p is the largest cardinality among these classes, and m is the number of such classes. Consequently the algorithm is specially suitable for the state minimization of very large sequential machines as encountered in vlsi circuits and systems.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A study of radio intensity variations at seven frequencies in the range 0.3 to 90 GHz for compact extragalactic radio sources classified as BL Lacs and high- and low-optical polarization quasars (HPQs and LPQs) is presented. This include the results of flux-density monitoring of 33 compact sources for three years at 327 MHz with the Ooty Synthesis Radio Telescope. The degrees of 'short-term' (tau less than about 1 yr) variability for the three optical types are found to be indistinguishable at low frequencies (less than 1 GHz), pointing to an extrinsic origin for the low-frequency variability. At high frequencies, a distinct dependence on optical type is present, the variability increasing from LPQs, through HPQs to BL Lacs. This trend persists even when only sources with ultra-flat radio spectra (alpha greater than -0.2) are considered. Implications of this for the phenomenon of high-frequency variability and the proposed unification schemes for different optical types of active galactic nuclei are discussed.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We examine three hierarchies of circuit classes and show they are closed under complementation. (1) The class of languages recognized by a family of polynomial size skew circuits with width O(w), are closed under complement. (2) The class of languages recognized by family of polynomial size circuits with width O(w) and polynomial tree-size, are closed under complement. (3) The class of languages recognized by a family of polynomial size, O(log(n)) depth, bounded AND fan-in with OR fan-in f (f⩾log(n)) circuits are closed under complement. These improve upon the results of (i) Immerman (1988) and Szelepcsenyi (1988), who show that 𝒩L𝒪𝒢 is closed under complementation, and (ii) Borodin et al. (1989), who show that L𝒪𝒢𝒞ℱL is closed under complement

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Parallel sub-word recognition (PSWR) is a new model that has been proposed for language identification (LID) which does not need elaborate phonetic labeling of the speech data in a foreign language. The new approach performs a front-end tokenization in terms of sub-word units which are designed by automatic segmentation, segment clustering and segment HMM modeling. We develop PSWR based LID in a framework similar to the parallel phone recognition (PPR) approach in the literature. This includes a front-end tokenizer and a back-end language model, for each language to be identified. Considering various combinations of the statistical evaluation scores, it is found that PSWR can perform as well as PPR, even with broad acoustic sub-word tokenization, thus making it an efficient alternative to the PPR system.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The constant increase in the number of solved protein structures is of great help in understanding the basic principles behind protein folding and evolution. 3-D structural knowledge is valuable in designing and developing methods for comparison, modelling and prediction of protein structures. These approaches for structure analysis can be directly implicated in studying protein function and for drug design. The backbone of a protein structure favours certain local conformations which include alpha-helices, beta-strands and turns. Libraries of limited number of local conformations (Structural Alphabets) were developed in the past to obtain a useful categorization of backbone conformation. Protein Block (PB) is one such Structural Alphabet that gave a reasonable structure approximation of 0.42 angstrom. In this study, we use PB description of local structures to analyse conformations that are preferred sites for structural variations and insertions, among group of related folds. This knowledge can be utilized in improving tools for structure comparison that work by analysing local structure similarities. Conformational differences between homologous proteins are known to occur often in the regions comprising turns and loops. Interestingly, these differences are found to have specific preferences depending upon the structural classes of proteins. Such class-specific preferences are mainly seen in the all-beta class with changes involving short helical conformations and hairpin turns. A test carried out on a benchmark dataset also indicates that the use of knowledge on the class specific variations can improve the performance of a PB based structure comparison approach. The preference for the indel sites also seem to be confined to a few backbone conformations involving beta-turns and helix C-caps. These are mainly associated with short loops joining the regular secondary structures that mediate a reversal in the chain direction. Rare beta-turns of type I' and II' are also identified as preferred sites for insertions.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Network Intrusion Detection Systems (NIDS) intercept the traffic at an organization's network periphery to thwart intrusion attempts. Signature-based NIDS compares the intercepted packets against its database of known vulnerabilities and malware signatures to detect such cyber attacks. These signatures are represented using Regular Expressions (REs) and strings. Regular Expressions, because of their higher expressive power, are preferred over simple strings to write these signatures. We present Cascaded Automata Architecture to perform memory efficient Regular Expression pattern matching using existing string matching solutions. The proposed architecture performs two stage Regular Expression pattern matching. We replace the substring and character class components of the Regular Expression with new symbols. We address the challenges involved in this approach. We augment the Word-based Automata, obtained from the re-written Regular Expressions, with counter-based states and length bound transitions to perform Regular Expression pattern matching. We evaluated our architecture on Regular Expressions taken from Snort rulesets. We were able to reduce the number of automata states between 50% to 85%. Additionally, we could reduce the number of transitions by a factor of 3 leading to further reduction in the memory requirements.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we discuss the issues related to word recognition in born-digital word images. We introduce a novel method of power-law transformation on the word image for binarization. We show the improvement in image binarization and the consequent increase in the recognition performance of OCR engine on the word image. The optimal value of gamma for a word image is automatically chosen by our algorithm with fixed stroke width threshold. We have exhaustively experimented our algorithm by varying the gamma and stroke width threshold value. By varying the gamma value, we found that our algorithm performed better than the results reported in the literature. On the ICDAR Robust Reading Systems Challenge-1: Word Recognition Task on born digital dataset, as compared to the recognition rate of 61.5% achieved by TH-OCR after suitable pre-processing by Yang et. al. and 63.4% by ABBYY Fine Reader (used as baseline by the competition organizers without any preprocessing), we achieved 82.9% using Omnipage OCR applied on the images after being processed by our algorithm.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

N-gram language models and lexicon-based word-recognition are popular methods in the literature to improve recognition accuracies of online and offline handwritten data. However, there are very few works that deal with application of these techniques on online Tamil handwritten data. In this paper, we explore methods of developing symbol-level language models and a lexicon from a large Tamil text corpus and their application to improving symbol and word recognition accuracies. On a test database of around 2000 words, we find that bigram language models improve symbol (3%) and word recognition (8%) accuracies and while lexicon methods offer much greater improvements (30%) in terms of word recognition, there is a large dependency on choosing the right lexicon. For comparison to lexicon and language model based methods, we have also explored re-evaluation techniques which involve the use of expert classifiers to improve symbol and word recognition accuracies.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We have benchmarked the maximum obtainable recognition accuracy on five publicly available standard word image data sets using semi-automated segmentation and a commercial OCR. These images have been cropped from camera captured scene images, born digital images (BDI) and street view images. Using the Matlab based tool developed by us, we have annotated at the pixel level more than 3600 word images from the five data sets. The word images binarized by the tool, as well as by our own midline analysis and propagation of segmentation (MAPS) algorithm are recognized using the trial version of Nuance Omnipage OCR and these two results are compared with the best reported in the literature. The benchmark word recognition rates obtained on ICDAR 2003, Sign evaluation, Street view, Born-digital and ICDAR 2011 data sets are 83.9%, 89.3%, 79.6%, 88.5% and 86.7%, respectively. The results obtained from MAPS binarized word images without the use of any lexicon are 64.5% and 71.7% for ICDAR 2003 and 2011 respectively, and these values are higher than the best reported values in the literature of 61.1% and 41.2%, respectively. MAPS results of 82.8% for BDI 2011 dataset matches the performance of the state of the art method based on power law transform.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we report a breakthrough result on the difficult task of segmentation and recognition of coloured text from the word image dataset of ICDAR robust reading competition challenge 2: reading text in scene images. We split the word image into individual colour, gray and lightness planes and enhance the contrast of each of these planes independently by a power-law transform. The discrimination factor of each plane is computed as the maximum between-class variance used in Otsu thresholding. The plane that has maximum discrimination factor is selected for segmentation. The trial version of Omnipage OCR is then used on the binarized words for recognition. Our recognition results on ICDAR 2011 and ICDAR 2003 word datasets are compared with those reported in the literature. As baseline, the images binarized by simple global and local thresholding techniques were also recognized. The word recognition rate obtained by our non-linear enhancement and selection of plance method is 72.8% and 66.2% for ICDAR 2011 and 2003 word datasets, respectively. We have created ground-truth for each image at the pixel level to benchmark these datasets using a toolkit developed by us. The recognition rate of benchmarked images is 86.7% and 83.9% for ICDAR 2011 and 2003 datasets, respectively.