18 results for Bagam script
Abstract:
Simple formalized rules are proposed for the automatic phonetic transcription of Tamil words into Roman script. These rules are syntax-directed and require a one-symbol look-ahead facility, so they can easily be automated on a digital computer. Some suggestions are also put forth for linearizing Tamil script so that it can be handled by modern machinery.
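The rules themselves are not reproduced in the abstract. As a rough illustration of why a one-symbol look-ahead is enough for part of the task, the sketch below handles only the inherent vowel of Tamil consonants. A consonant letter followed by the pulli (virama) gets no vowel, one followed by a vowel sign takes that vowel, and any other consonant takes "a". The code-point table and the Roman values are a small, simplified set chosen for this example. This is an assumption, not the paper's rule set, and it ignores phenomena such as contextual voicing of stops.

```python
# Illustrative sketch only: a hypothetical, simplified table, not the paper's rules.
CONSONANTS = {"\u0b95": "k", "\u0ba4": "t", "\u0bae": "m", "\u0bb2": "l", "\u0baa": "p"}
VOWEL_SIGNS = {"\u0bbe": "aa", "\u0bbf": "i", "\u0bc1": "u"}
INDEPENDENT_VOWELS = {"\u0b85": "a", "\u0b87": "i", "\u0b89": "u"}
PULLI = "\u0bcd"  # virama: suppresses the inherent vowel


def transcribe(word: str) -> str:
    """Left-to-right scan; each decision looks at most one symbol ahead."""
    out = []
    i = 0
    while i < len(word):
        sym = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else None  # the one-symbol look-ahead
        if sym in CONSONANTS:
            out.append(CONSONANTS[sym])
            if nxt == PULLI:              # dead consonant: no vowel
                i += 2
            elif nxt in VOWEL_SIGNS:      # consonant + dependent vowel sign
                out.append(VOWEL_SIGNS[nxt])
                i += 2
            else:                         # bare consonant: inherent vowel
                out.append("a")
                i += 1
        elif sym in INDEPENDENT_VOWELS:
            out.append(INDEPENDENT_VOWELS[sym])
            i += 1
        else:
            raise ValueError(f"no rule for {sym!r}")
    return "".join(out)
```

Under these simplified rules, `transcribe("\u0b85\u0bae\u0bcd\u0bae\u0bbe")` (அம்மா) returns `"ammaa"`.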
Abstract:
We report a hierarchical blind script identifier for 11 different Indian scripts. At the first level of the hierarchy, the 11 scripts are divided into groups; at the subsequent level, the script within each group is recognized. The various nodes of this tree use different feature-classifier combinations. A database of 20,000 words of different font styles and sizes is collected and used for each script. The effectiveness of Gabor and Discrete Cosine Transform features has been evaluated independently using nearest neighbor, linear discriminant and support vector machine classifiers. The minimum and maximum accuracies obtained with this hierarchical mechanism are 92.2% and 97.6%, respectively.
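The two-level structure can be sketched as follows. This is a minimal illustration under stated assumptions. The script labels, group assignments and 2-D feature vectors are hypothetical placeholders; real inputs would be Gabor or DCT features of word images. Every node here also uses a 1-nearest-neighbour rule, whereas the paper uses different feature-classifier combinations at different nodes.

```python
import math
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, script label)


def nearest_label(x: Sequence[float], train: List[Sample]) -> str:
    """1-nearest-neighbour label under Euclidean distance."""
    return min(train, key=lambda s: math.dist(x, s[0]))[1]


class HierarchicalClassifier:
    """Level 1 picks a script group; level 2 picks the script within it.

    Assumption: the same 1-NN rule is used at every node for simplicity.
    """

    def __init__(self, group_of: Dict[str, str], train: List[Sample]):
        # Level 1 is trained on group labels; level 2 keeps one training set per group.
        self.level1 = [(f, group_of[lbl]) for f, lbl in train]
        self.level2: Dict[str, List[Sample]] = {}
        for f, lbl in train:
            self.level2.setdefault(group_of[lbl], []).append((f, lbl))

    def predict(self, x: Sequence[float]) -> str:
        group = nearest_label(x, self.level1)
        return nearest_label(x, self.level2[group])
```

Splitting the decision this way lets each second-level node specialize in separating a few similar scripts, which is the motivation for a hierarchy.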
Abstract:
We present a fractal coding method to recognize online handwritten Tamil characters and propose a novel technique to reduce the time taken for coding and decoding. This technique exploits redundancy in the data, thereby achieving better compression and lower memory usage. It also reduces the encoding time and causes little distortion during reconstruction. Experiments have been conducted using these fractal codes to classify the online handwritten Tamil characters from the IWFHR 2006 competition dataset. In one approach, we use the fractal coding and decoding process itself: a recognition accuracy of 90% is achieved by using DTW for distortion evaluation during classification and encoding, compared with 78% using a nearest neighbor classifier. In other experiments, we use the fractal code, fractal dimensions and features derived from fractal codes as features in separate classifiers. While the fractal code is successful as a feature, the other two features fail to capture the wide within-class variations.
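The abstract does not say how the fractal dimensions were computed. One common estimator is box counting, sketched below for a set of pen positions assumed to be normalized to the unit square. The choice of estimator and of the box scales is an assumption made for illustration.

```python
import math
from typing import Sequence, Tuple


def box_counting_dimension(points: Sequence[Tuple[float, float]],
                           scales: Sequence[int] = (2, 4, 8, 16, 32)) -> float:
    """Box-counting dimension estimate (assumed estimator, not the paper's).

    Points are assumed to lie in [0, 1] x [0, 1]. For each scale s the square is
    cut into s x s boxes and N(s) occupied boxes are counted; the dimension is
    the least-squares slope of log N(s) against log s.
    """
    xs, ys = [], []
    for s in scales:
        # Clamp coordinates equal to 1.0 into the last box.
        boxes = {(min(int(x * s), s - 1), min(int(y * s), s - 1)) for x, y in points}
        xs.append(math.log(s))
        ys.append(math.log(len(boxes)))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den
```

A densely sampled straight stroke gives a value of 1 and a filled region gives 2. This range is narrow, which is consistent with the abstract's finding that such a scalar feature struggles to capture within-class variation.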
Abstract:
This paper describes a semi-automatic tool for annotating multi-script text in natural scene images. To our knowledge, this is the first tool that deals with multi-script text or text of arbitrary orientation. The procedure involves manual seed selection followed by a region growing process to segment each word present in the image. The user can vary the region-growing threshold to ensure pixel-accurate character segmentation. The text present in the image is tagged word by word. A virtual keyboard interface has also been designed for entering the ground truth in ten Indic scripts besides English; a keyboard interface can easily be generated for any other script, thereby expanding the scope of the toolkit. Optionally, each segmented word can be further labeled into its constituent characters/symbols, using polygonal masks to split or merge the segmented words into valid characters/symbols. The ground truth is represented by a pixel-level segmented image and a '.txt' file containing the number of words in the image, the word bounding boxes, the script and the ground-truth Unicode text. The toolkit, developed in MATLAB, can be used to generate ground truth and annotations for any generic document image. It is thus useful to researchers in the document image processing community for evaluating the performance of document analysis and recognition techniques. The multi-script annotation toolkit (MAST) is available for free download.
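The seed-and-grow step can be sketched on a grayscale image as a breadth-first search over 4-connected neighbours. The inclusion rule used here (intensity within `threshold` of the seed pixel) is an assumption for illustration. MAST is written in MATLAB, and its actual growing criterion may differ, for example by operating on color.

```python
from collections import deque
from typing import List, Set, Tuple


def region_grow(img: List[List[int]], seed: Tuple[int, int],
                threshold: int) -> Set[Tuple[int, int]]:
    """Grow a 4-connected region from `seed` (row, col).

    Assumed criterion: a neighbour joins if |intensity - seed intensity| <= threshold.
    """
    rows, cols = len(img), len(img[0])
    ref = img[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region
                    and abs(img[nr][nc] - ref) <= threshold):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region
```

Raising `threshold` admits more pixels into the region. This is the knob the abstract describes the user adjusting to reach a pixel-accurate segmentation.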
Abstract:
This paper describes a new method of color text localization in generic scene images containing text of different scripts and arbitrary orientations. A representative set of colors is first identified using edge information and used to initialize an unsupervised clustering algorithm. Text components are identified in each color layer using a combination of a support vector machine and a neural network classifier, trained on a set of low-level features derived from geometric, boundary, stroke and gradient information. Experiments on camera-captured images with variable fonts, sizes and colors, irregular layout, non-uniform illumination and multiple scripts illustrate the robustness of the method. On a database of 100 images, the proposed method yields a precision of 0.8 and a recall of 0.86. The method is also compared with others in the literature on the ICDAR 2003 robust reading competition dataset.
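The abstract does not name the clustering algorithm. The sketch below assumes k-means (Lloyd's algorithm) on RGB values, with the initial centers passed in by the caller. In the described method those centers would come from the edge-based color selection, which is not reproduced here. Pixels that share a final label form one color layer.

```python
import math
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]


def kmeans_colors(pixels: Sequence[Color], init_centers: Sequence[Color],
                  iters: int = 20) -> Tuple[List[Color], List[int]]:
    """Lloyd's k-means in RGB space (assumed algorithm), seeded by the caller."""
    centers = [tuple(c) for c in init_centers]
    labels = [0] * len(pixels)
    for _ in range(iters):
        # Assignment step: each pixel goes to its nearest center.
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in pixels]
        # Update step: each center moves to the mean of its pixels
        # (an emptied cluster keeps its previous center).
        new_centers = []
        for k, c in enumerate(centers):
            members = [p for p, lab in zip(pixels, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(ch) / len(members) for ch in zip(*members)))
            else:
                new_centers.append(c)
        if new_centers == centers:  # converged: labels match the final centers
            break
        centers = new_centers
    return centers, labels
```

Seeding matters for k-means, which is a plausible reason the method derives its initial colors from edge information rather than choosing them at random.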
Abstract:
In this work, we describe a system that recognizes open-vocabulary, isolated, online handwritten Tamil words, and we extend it to recognize a paragraph of writing. We explain each step of the process in detail: segmentation, preprocessing, feature extraction, classification and bigram-based post-processing. On our database of 45,000 handwritten words collected on a tablet PC, we obtain symbol-level accuracies of 78.5% without and 85.3% with post-processing using symbol-level language models. The corresponding word-level accuracies are 40.1% and 59.6%. We also propose a line and word segmentation strategy, which gives promising results of 100% line segmentation and 98.1% word segmentation accuracy in initial trials on 40 handwritten paragraphs. The two modules have been combined into a full-fledged page recognition system for online handwritten Tamil data. To the authors' knowledge, this is the first attempt at recognizing open-vocabulary, online handwritten paragraphs in any Indian language.
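Bigram post-processing of this kind is commonly implemented as a Viterbi search over the classifier's candidate lists. The sketch below makes two assumptions. First, the classifier returns a few (symbol, probability) candidates per position. Second, classifier and bigram log-probabilities are simply added with equal weight. The symbols, probabilities and the `floor` value for unseen bigrams are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Candidates = List[Tuple[str, float]]  # (symbol, classifier probability)


def bigram_decode(lattice: List[Candidates],
                  bigram: Dict[Tuple[str, str], float],
                  floor: float = 1e-6) -> List[str]:
    """Viterbi search maximising sum(log P_classifier) + sum(log P_bigram).

    Assumption: equal weighting of the two scores; unseen bigrams get `floor`.
    """
    # best[sym] = (score, path) of the best path ending in `sym` so far.
    best = {sym: (math.log(p), [sym]) for sym, p in lattice[0]}
    for cands in lattice[1:]:
        new_best = {}
        for sym, p in cands:
            score, path = max(
                ((s + math.log(bigram.get((prev, sym), floor)), prev_path)
                 for prev, (s, prev_path) in best.items()),
                key=lambda t: t[0])
            new_best[sym] = (score + math.log(p), path + [sym])
        best = new_best
    return max(best.values(), key=lambda t: t[0])[1]
```

The language model can overturn the classifier's first choice. For example, a slightly less likely first symbol wins when it forms a much more probable bigram with the next one, and this is how post-processing raises the symbol- and word-level accuracies.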
Abstract:
Para-Bose commutation relations are related to the SL(2,R) Lie algebra. The irreducible representation $\mathcal{D}_\alpha$ of the para-Bose system is obtained as the direct sum $D_\beta \oplus D_{\beta+1/2}$ of representations of the SL(2,R) Lie algebra. The position and momentum eigenstates are then obtained in this representation $\mathcal{D}_\alpha$ using the matrix mechanical method, and their orthogonality, completeness and overlap are derived. The momentum eigenstates are also derived using the wave mechanical method, by specifying the domain of definition of the momentum operator in addition to giving it a formal differential expression. Through careful consideration in this manner, we find that the two apparently different solutions obtained by Ohnuki and Kamefuchi in this context are actually unitarily equivalent. Journal of Mathematical Physics is copyrighted by The American Institute of Physics.
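One standard way to see the connection stated in the first sentence is through the bilinears of a single para-Bose operator $a$. This is not necessarily the construction used in the paper. Green's trilinear relations give the first line below. Expanding the second of them yields $[a, (a^\dagger)^2] = 2a^\dagger$, and hence $[a^2, (a^\dagger)^2] = 2\{a, a^\dagger\}$. The operators $K_0, K_\pm$ therefore close on the sl(2,R) algebra:

```latex
\begin{aligned}
&[\{a^\dagger, a\}, a] = -2a, \qquad [\{a^\dagger, a\}, a^\dagger] = 2a^\dagger, \\
&K_0 = \tfrac{1}{4}\{a^\dagger, a\}, \qquad K_+ = \tfrac{1}{2}(a^\dagger)^2, \qquad K_- = \tfrac{1}{2}a^2, \\
&[K_0, K_\pm] = \pm K_\pm, \qquad [K_-, K_+] = 2K_0 .
\end{aligned}
```

Since $a$ changes the eigenvalue of $K_0$ by $\tfrac{1}{2}$ while $K_\pm$ change it by $1$, the para-Bose space splits into two sl(2,R) ladders offset by $\tfrac{1}{2}$. This is consistent with the direct-sum form of the representation above.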
Abstract:
It was proposed earlier [P. L. Sachdev, K. R. C. Nair, and V. G. Tikekar, J. Math. Phys. 27, 1506 (1986)] that the Euler Painlevé equation $y y'' + a y'^2 + f(x) y y' + g(x) y^2 + b y' + c = 0$ represents the generalized Burgers equations (GBEs) in the same manner as the Painlevé equations represent equations of KdV type. The GBE with a damping term was treated there in some detail. In this paper, another GBE, $u_t + u^\alpha u_x + \frac{J u}{2t} = \frac{\delta}{2} u_{xx}$ (the nonplanar Burgers equation), is considered. Its self-similar form is again found to be governed by the Euler Painlevé equation. The ranges of the parameter $\alpha$ for which solutions of the connection problem for the self-similar equation exist are obtained numerically and confirmed via integral relations derived from the ODEs. Special exact analytic solutions of the nonplanar Burgers equation are also obtained. These generalize the well-known single-hump solutions of the Burgers equation to other geometries ($J = 1, 2$); the nonlinear convection term, however, is not quadratic in these cases. This study strengthens the conjecture regarding the importance of the Euler Painlevé equation for GBEs. Journal of Mathematical Physics is copyrighted by The American Institute of Physics.
Abstract:
This paper addresses the problem of resolving ambiguities between frequently confused online Tamil character pairs by employing script-specific algorithms as a post-classification step. Robust structural cues and temporal information of the preprocessed character are used extensively in the design of these algorithms. The methods robustly and automatically extract the discriminative sub-strokes of confused characters for further analysis. Experimental validation on the IWFHR database indicates error rates of less than 3% for the confused characters. These post-processing steps thus have good potential to improve the performance of online handwritten Tamil character recognition.
Abstract:
Thermal analysis and interrupted quench experiments have been carried out to study the formation of β-FeSiAl5 and (Be-Fe)-BeSiFe2Al8 phases in Al-7Si-0.3Mg alloy with and without Be addition. In the base alloy with 0.6% Fe (without Be addition), a needle- and plate-shaped β phase is present in the interdendritic regions and is formed by a ternary eutectic reaction. In the Be-added alloy with 0.6% Fe, a Be-Fe phase with Chinese-script and polygonal morphologies grows along with the primary α-Al dendrites, leading to superior mechanical properties. It is proposed that this Be-Fe phase is formed by a peritectic reaction. Be addition also results in some grain refinement.
Abstract:
This paper presents the design of a full-fledged OCR system for printed Kannada text. Machine recognition of Kannada characters is difficult owing to the similarity in shape of different characters, the complexity of the script and the non-unique representation of diacritics. The document image is subjected to line segmentation, word segmentation and zone detection. Using the zonal information, base characters, vowel modifiers and consonant conjuncts are separated. A knowledge-based approach is employed for recognizing the base characters. Various features are employed for recognizing the characters, including the coefficients of the Discrete Cosine Transform, the Discrete Wavelet Transform and the Karhunen-Loève Transform. These features are fed to different classifiers. Structural features are used at subsequent levels to discriminate between confused characters; their use increases the recognition rate from 93% to 98%. Apart from the classical nearest neighbour pattern classification technique, Artificial Neural Network (ANN) based classifiers such as Backpropagation and Radial Basis Function (RBF) networks have also been studied. The ANN classifiers are trained in supervised mode using the transform features. The highest recognition rate of 99% is obtained with the RBF network using the second-level approximation coefficients of Haar wavelets as features on presegmented base characters.
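The best-performing features, the second-level approximation coefficients of the Haar wavelet, can be computed as below. This sketch assumes a grayscale character image whose sides are divisible by 4. It also uses the averaging convention for the Haar low-pass band; the orthonormal convention differs only by a constant factor of 2 per level. A 32 × 32 image, for instance, yields 64 features.

```python
from typing import List

Image = List[List[float]]


def haar_approx(img: Image) -> Image:
    """One level of 2-D Haar analysis, keeping only the approximation (LL) band.

    Assumption: averaging convention (mean of each 2x2 block); the orthonormal
    convention would multiply each coefficient by 2 per level.
    """
    rows, cols = len(img), len(img[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even")
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]


def level2_features(img: Image) -> List[float]:
    """Flatten the second-level approximation band into a feature vector."""
    ll2 = haar_approx(haar_approx(img))
    return [v for row in ll2 for v in row]
```

The resulting vector is a 4× downsampled, smoothed version of the character. It keeps the coarse shape that separates base characters while suppressing pixel-level noise.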
Abstract:
As research becomes increasingly interdisciplinary, literature searches are often carried out on more than one CD-ROM database. This retrieves duplicate records, because the same literature is covered (indexed) in more than one database, and the retrieval software does not identify such duplicates. Three programs have been written to identify the duplicate records; they are executed from a shell script to minimize manual intervention. The fields used (extracted) to identify the duplicate records include the article title, year, volume number, issue number and pagination. When executed, the shell script prompts for an input file that may contain duplicate records. The programs identify the duplicate records and write them to a new file.
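The paper implements duplicate detection as three programs driven by a shell script. The same matching idea can be sketched as a single Python function. The field names and the normalization are assumptions for illustration, not the paper's exact matching rules. The normalization lowercases the title, collapses punctuation and removes whitespace in page ranges.

```python
import re
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]  # hypothetical field names: title, year, volume, issue, pages


def match_key(rec: Record) -> Tuple[str, ...]:
    """Normalized key built from title, year, volume, issue and pages."""
    # Assumed normalization: case-fold and collapse non-alphanumeric runs in titles.
    title = re.sub(r"[^a-z0-9]+", " ", rec.get("title", "").lower()).strip()
    pages = re.sub(r"\s+", "", rec.get("pages", ""))
    return (title, rec.get("year", "").strip(), rec.get("volume", "").strip(),
            rec.get("issue", "").strip(), pages)


def find_duplicates(records: Iterable[Record]) -> List[Record]:
    """Return every record whose key already appeared earlier in the input."""
    seen = set()
    dups = []
    for rec in records:
        key = match_key(rec)
        if key in seen:
            dups.append(rec)
        else:
            seen.add(key)
    return dups
```

Writing the returned records to a new file would mirror the last step described in the abstract.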
Abstract:
This paper presents a new application of two-dimensional Principal Component Analysis (2DPCA) to online character recognition in Tamil script. A novel set of features employing polynomial fits and quartiles, in combination with conventional features, is derived for each sample point of the Tamil character obtained after smoothing and resampling. These are stacked to form a matrix, from which a covariance matrix is constructed. A subset of the eigenvectors of the covariance matrix is used to obtain features in the reduced subspace. Each character is modeled as a separate subspace, and a modified form of the Mahalanobis distance is derived to classify a given test character. Results indicate that the 2DPCA scheme improves recognition accuracy by approximately 3% over the conventional PCA technique.
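The core 2DPCA computations can be sketched in plain Python. These are the image covariance matrix $G = \frac{1}{M}\sum_j (A_j - \bar{A})^\top (A_j - \bar{A})$ over $M$ training matrices, its dominant eigenvector, and the projection $Y = A x$. Computing a single eigenvector by power iteration is a simplification chosen here. The paper keeps a subset of eigenvectors per character subspace and classifies with a modified Mahalanobis distance, neither of which is shown.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = sample points, columns = per-point features


def image_covariance(samples: List[Matrix]) -> Matrix:
    """2DPCA covariance G = (1/M) * sum_j (A_j - mean)^T (A_j - mean), size n x n."""
    M, m, n = len(samples), len(samples[0]), len(samples[0][0])
    mean = [[sum(A[i][k] for A in samples) / M for k in range(n)] for i in range(m)]
    G = [[0.0] * n for _ in range(n)]
    for A in samples:
        D = [[A[i][k] - mean[i][k] for k in range(n)] for i in range(m)]
        for p in range(n):
            for q in range(n):
                G[p][q] += sum(D[i][p] * D[i][q] for i in range(m)) / M
    return G


def top_eigenvector(G: Matrix, iters: int = 200) -> List[float]:
    """Dominant eigenvector of a symmetric PSD matrix by power iteration.

    Simplification: the all-ones start vector fails if it is orthogonal to
    the dominant eigenvector; a full eigensolver would be used in practice.
    """
    n = len(G)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(G[p][q] * v[q] for q in range(n)) for p in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def project(A: Matrix, x: List[float]) -> List[float]:
    """Feature column Y = A x in the reduced subspace."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in A]
```

Because $G$ is only $n \times n$ (the number of per-point features), it is far smaller than the covariance matrix of conventional PCA on flattened vectors. This is the usual argument for 2DPCA when training data are limited.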
Abstract:
In this paper, we present an unrestricted Kannada online handwritten character recognizer that is viable for real-time applications. Apart from all the aksharas of the Kannada script, it handles Kannada and Indo-Arabic numerals, punctuation marks and special symbols such as $, & and #. The dataset contains the handwriting of 69 people from four different locations, making the recognition writer-independent. For the DTW classifier, using smoothed first derivatives as features raised the accuracy to 89%, compared with 85% for preprocessed coordinates, but this was too slow. To overcome this, we used Statistical Dynamic Time Warping (SDTW) and achieved 46 times faster classification with comparable accuracy (88%), making it fast enough for practical applications. The reported accuracies are raw symbol recognition results from the classifier, so there is good scope for improvement in actual applications, where domain constraints such as a fixed vocabulary, language models and post-processing can be employed. A working demo for recognizing Kannada words is also available on a tablet PC.
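A plain DTW nearest-neighbour classifier on smoothed first-derivative features can be sketched as follows. Three choices here are assumptions: the regression-style derivative with window `K = 2`, the clamping at stroke ends, and the Euclidean local cost. The faster SDTW variant described in the abstract is not shown.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def smoothed_derivative(trace: Sequence[Point], K: int = 2) -> List[Point]:
    """Regression-style first derivative over a +/-K window (assumed smoothing).

    Indices beyond the stroke ends are clamped to the first/last point.
    """
    n = len(trace)
    denom = 2 * sum(k * k for k in range(1, K + 1))
    out = []
    for t in range(n):
        dx = dy = 0.0
        for k in range(1, K + 1):
            a, b = trace[min(t + k, n - 1)], trace[max(t - k, 0)]
            dx += k * (a[0] - b[0])
            dy += k * (a[1] - b[1])
        out.append((dx / denom, dy / denom))
    return out


def dtw(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping, Euclidean local cost."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def classify(trace: Sequence[Point], templates: List[Tuple[str, Sequence[Point]]]) -> str:
    """Label of the template with the smallest DTW distance on derivative features."""
    feats = smoothed_derivative(trace)
    return min(templates, key=lambda lt: dtw(feats, smoothed_derivative(lt[1])))[0]
```

The quadratic DTW table, computed against every template, is what makes this classifier slow. That cost is what a statistical variant such as SDTW is meant to reduce.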
Abstract:
This research focuses on understanding the role of microstructural variables and processing parameters in obtaining optimized dual-phase structures in medium-carbon low-alloy steels. Tempered martensite structures produced at 300, 500 and 650 °C were cold rolled to deformations ranging from 20 to 80%. Intercritical annealing was then performed at 740, 760 and 780 °C for durations ranging from 60 seconds to 60 minutes before quenching in water. The transformation behaviour was studied with the aid of optical microscopy and hardness curves. The results show that the microstructural condition, the deformation and the intercritical temperature influenced the chronological order of the competing stress relaxation and decomposition reactions, which interfered with the rate of the expected α → γ transformation. The three distinct transformation trends observed are analyzed systematically. It was also observed that the initial microstructures tempered at 300 and 500 °C were unsuitable for producing dual-phase structures with optimized strength characteristics.