998 resultados para script processing
Resumo:
一次开发多语言使用是国际化软件开发的主要目标。但是世界上的文字多种多样,它们的书写方向也有所不同,除了水平从左向右书写的英文、水平从右往左书写的阿拉伯文外,还有类似蒙古文这样垂直排列的文字,这对计算机图形用户界面提出了更高的要求,现有的计算机系统将这类垂直排列的文字沿水平方向输出,极不符合少数民族人民的习惯。在分析现有Qt库对类似阿拉伯文这样从右向左书写的文字的部分支持机制的基础上,我们设计并实现了支持四种方向模式的国际化的图形用户界面,现在它已经能够适应世界上几乎所有的文字。这对于软件国际化以及民族语言信息处理有重要意义。
Resumo:
复杂文字在显示输出的过程中,表现出极为复杂的语言特征.为此提出了一种基于谓词规则的复杂文字处理模型,模型以谓词规则的方法给出了复杂文字字形布局特征的形式化描述,按照复杂文字处理的过程,设计了实现该模型的软件体系结构,将复杂文字的语言特征从程序控制逻辑中隔离出来,提高了系统的灵活性,便于增加新的复杂文字的支持.在研制蒙古文、藏文、维吾尔文办公套件的应用中表明,该模型是实用有效的.
Resumo:
文档处理是文字处理的关键组成部分,针对多语言混合排版的需求,本文提出了基于“框”的支持不同方向的多语言文本布局的文档处理模型。该模型把时文本布局方向的处理封装在文档格式化模块中,将多文本布局方向的问题规约为文本布局方向为从左向右(水平)的文档格式化的问题,并设计了多文本布局方向文档格式化的递归算法。该模型可以很好支持包括我国民族文字蒙古文、维吾尔文、藏文在内的各种不同书写方向文字的文本布局。
Resumo:
To date, studies have focused on the acquisition of alphabetic second languages (L2s) in alphabetic first language (L1) users, demonstrating significant transfer effects. The present study examined the process from a reverse perspective, comparing logographic (Mandarin-Chinese) and alphabetic (English) L1 users in the acquisition of an artificial logographic script, in order to determine whether similar language-specific advantageous transfer effects occurred. English monolinguals, English-French bilinguals and Chinese-English bilinguals learned a small set of symbols in an artificial logographic script and were subsequently tested on their ability to process this script in regard to three main perspectives: L2 reading, L2 working memory (WM), and inner processing strategies. In terms of L2 reading, a lexical decision task on the artificial symbols revealed markedly faster response times in the Chinese-English bilinguals, indicating a logographic transfer effect suggestive of a visual processing advantage. A syntactic decision task evaluated the degree to which the new language was mastered beyond the single word level. No L1-specific transfer effects were found for artificial language strings. In order to investigate visual processing of the artificial logographs further, a series of WM experiments were conducted. Artificial logographs were recalled under concurrent auditory and visuo-spatial suppression conditions to disrupt phonological and visual processing, respectively. No L1-specific transfer effects were found, indicating no visual processing advantage of the Chinese-English bilinguals. However, a bilingual processing advantage was found indicative of a superior ability to control executive functions. In terms of L1 WM, the Chinese-English bilinguals outperformed the alphabetic L1 users when processing L1 words, indicating a language experience-specific advantage. Questionnaire data on the cognitive strategies that were deployed during the acquisition and processing of the artificial logographic script revealed that the Chinese-English bilinguals rated their inner speech as lower than the alphabetic L1 users, suggesting that they were transferring their phonological processing skill set to the acquisition and use of an artificial script. Overall, evidence was found to indicate that language learners transfer specific L1 orthographic processing skills to L2 logographic processing. Additionally, evidence was also found indicating that a bilingual history enhances cognitive performance in L2.
Resumo:
Studies of orthographic skills transfer between languages focus mostly on working memory (WM) ability in alphabetic first language (L1) speakers when learning another, often alphabetically congruent, language. We report two studies that, instead, explored the transferability of L1 orthographic processing skills in WM in logographic-L1 and alphabetic-L1 speakers. English-French bilingual and English monolingual (alphabetic-L1) speakers, and Chinese-English (logographic-L1) speakers, learned a set of artificial logographs and associated meanings (Study 1). The logographs were used in WM tasks with and without concurrent articulatory or visuo-spatial suppression. The logographic-L1 bilinguals were markedly less affected by articulatory suppression than alphabetic-L1 monolinguals (who did not differ from their bilingual peers). Bilinguals overall were less affected by spatial interference, reflecting superior phonological processing skills or, conceivably, greater executive control. A comparison of span sizes for meaningful and meaningless logographs (Study 2) replicated these findings. However, the logographic-L1 bilinguals’ spans in L1 were measurably greater than those of their alphabetic-L1 (bilingual and monolingual) peers; a finding unaccounted for by faster articulation rates or differences in general intelligence. The overall pattern of results suggests an advantage (possibly perceptual) for logographic-L1 speakers, over and above the bilingual advantage also seen elsewhere in third language (L3) acquisition.
Resumo:
The problem of determining the script and language of a document image has a number of important applications in the field of document analysis, such as indexing and sorting of large collections of such images, or as a precursor to optical character recognition (OCR). In this paper, we investigate the use of texture as a tool for determining the script of a document image, based on the observation that text has a distinct visual texture. An experimental evaluation of a number of commonly used texture features is conducted on a newly created script database, providing a qualitative measure of which features are most appropriate for this task. Strategies for improving classification results in situations with limited training data and multiple font types are also proposed.
Resumo:
This paper addresses the problem of resolving ambiguities in frequently confused online Tamil character pairs by employing script specific algorithms as a post classification step. Robust structural cues and temporal information of the preprocessed character are extensively utilized in the design of these algorithms. The methods are quite robust in automatically extracting the discriminative sub-strokes of confused characters for further analysis. Experimental validation on the IWFHR Database indicates error rates of less than 3 % for the confused characters. Thus, these post processing steps have a good potential to improve the performance of online Tamil handwritten character recognition.
Resumo:
This paper describes a semi-automatic tool for annotation of multi-script text from natural scene images. To our knowledge, this is the maiden tool that deals with multi-script text or arbitrary orientation. The procedure involves manual seed selection followed by a region growing process to segment each word present in the image. The threshold for region growing can be varied by the user so as to ensure pixel-accurate character segmentation. The text present in the image is tagged word-by-word. A virtual keyboard interface has also been designed for entering the ground truth in ten Indic scripts, besides English. The keyboard interface can easily be generated for any script, thereby expanding the scope of the toolkit. Optionally, each segmented word can further be labeled into its constituent characters/symbols. Polygonal masks are used to split or merge the segmented words into valid characters/symbols. The ground truth is represented by a pixel-level segmented image and a '.txt' file that contains information about the number of words in the image, word bounding boxes, script and ground truth Unicode. The toolkit, developed using MATLAB, can be used to generate ground truth and annotation for any generic document image. Thus, it is useful for researchers in the document image processing community for evaluating the performance of document analysis and recognition techniques. The multi-script annotation toolokit (MAST) is available for free download.
Resumo:
In this work, we describe a system, which recognises open vocabulary, isolated, online handwritten Tamil words and extend it to recognize a paragraph of writing. We explain in detail each step involved in the process: segmentation, preprocessing, feature extraction, classification and bigram-based post-processing. On our database of 45,000 handwritten words obtained through tablet PC, we have obtained symbol level accuracy of 78.5% and 85.3% without and with the usage of post-processing using symbol level language models, respectively. Word level accuracies for the same are 40.1% and 59.6%. A line and word level segmentation strategy is proposed, which gives promising results of 100% line segmentation and 98.1% word segmentation accuracies on our initial trials of 40 handwritten paragraphs. The two modules have been combined to obtain a full-fledged page recognition system for online handwritten Tamil data. To the knowledge of the authors, this is the first ever attempt on recognition of open vocabulary, online handwritten paragraphs in any Indian language.
Resumo:
In this paper, we propose a handwritten character recognition system for Malayalam language. The feature extraction phase consists of gradient and curvature calculation and dimensionality reduction using Principal Component Analysis. Directional information from the arc tangent of gradient is used as gradient feature. Strength of gradient in curvature direction is used as the curvature feature. The proposed system uses a combination of gradient and curvature feature in reduced dimension as the feature vector. For classification, discriminative power of Support Vector Machine (SVM) is evaluated. The results reveal that SVM with Radial Basis Function (RBF) kernel yield the best performance with 96.28% and 97.96% of accuracy in two different datasets. This is the highest accuracy ever reported on these datasets
Resumo:
The objective of the present study was to evaluate the efficiency of the process of biodigestion of the protein concentrate resulting from the ultrafiltration of the effluent from a slaughterhouse freezer of Nile tilapia. Bench digesters were used with excrements and water (control) in comparison with a mixture of cattle manure and effluent from the stages of filleting and bleeding of tilapias. The effluent obtained in the continuous process (bleeding + filleting) was the one with highest accumulated population from the 37th day, as well as greatest daily production. Gases composition did not differ between the protein concentrates, but the gas obtained with the use of the effluent from the filleting stage presented highest methane gas average (78.05%) in comparison with those obtained in the bleeding stage (69.95%) and in the continuous process (70.02%) or by the control method (68.59%).
Resumo:
The influence of weight (W) category of the rainbow trout on processing yield and chemical composition of the entire eviscerated fish and fish fillet was analyzed. A completely randomized design was employed for processing variables (W1 = 300 to 370 g and W2 = 371 to 440) coupled to a 2 x 2 factorial scheme for the chemical composition (W1 and W2 and forms of presentation: fillet and whole eviscerated fish). W1 showed higher yield for entire eviscerated fish (83.00%) and head (13.27%), but a lower yield for the viscera (17.00%), when compared to W2. We did not affect abdominal muscle yield, fillet with or without skin, skin percentage and residues. There were significant differences between W for moisture (W1 = 72.30% and W2 = 71.15%) and lipids (CP1 = 7.96% and CP2 = 9.04%) rates. Fillet moisture contents (73.74%) and crude protein (19.05%) were higher (p < 0.01) than for entire eviscerated fish (69.71% and 17.81%, respectively). Ash (2.15%) and lipid (10.48%) rates were higher (p < 0.01) for entire fish when compared to those of fillets (1.16% and 6.52%, respectively). The slaughter of fish weighing between 300 and 370 g and their fillets are more adequate for the market.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)