2 results for Phonetic alphabet.
in Repositório Científico da Universidade de Évora - Portugal
Abstract:
This dissertation studies the dialect of Marvão (Falar de Marvão), a border municipality in Alto Alentejo with low population density, a very aged population, and an illiteracy rate above the national and regional averages. The study comprises five chapters. The first two present the dialectological studies carried out in the district of Portalegre and characterize the municipality of Marvão. The study of the dialect itself unfolds over the three main chapters, which are devoted to phonetic-phonological and morphosyntactic aspects and to the lexicon relating to the human being. The Marvão dialect belongs to the central-southern Portuguese dialects, more specifically to the Beira Baixa and Alto Alentejo variety. It therefore shows most of the characteristics that twentieth-century linguists identified for this dialect region, while standing apart through certain particularities that distinguish it from the dialects of the surrounding municipalities, mainly in some phonetic-phonological aspects and in the lexicon.
Abstract:
Bangla OCR (Optical Character Recognition) is long-overdue software for the Bengali community worldwide. Numerous efforts suggest that, because of the inherent complexity of the Bangla alphabet and its word-formation process, developing a high-fidelity OCR system that produces reasonably acceptable output remains a challenge. One possible improvement is post-processing of the OCR output; algorithms such as edit distance and statistical n-gram information have been used to correct misspelled words in language processing. This work presents the first known approach to using these algorithms to replace misrecognized words produced by Bangla OCR. The assessment uses a set of fifty documents written in Bangla script and a dictionary of 541,167 words. The proposed correction model corrects several words, lowering the recognition error rate by 2.87% and 3.18% for the character-based n-gram and edit distance algorithms, respectively. The developed system suggests a list of five alternatives for a misspelled word. For the n-gram algorithm, the correct word is the topmost of the five suggestions in 33.82% of cases; for the edit distance algorithm, the first suggestion is correct in 36.31% of cases. This work is intended to stimulate ideas for further improvements in character recognition.
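
The two post-processing strategies named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy Latin-script dictionary, the choice of character bigrams, and the Dice coefficient as the n-gram similarity are all assumptions made for illustration. The abstract states only that edit distance and character-based n-grams are used to rank five dictionary alternatives for a misrecognized word.

```python
from typing import Iterable, List, Set


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_ngrams(word: str, n: int = 2) -> Set[str]:
    """Set of character n-grams, with '#' marking the word boundaries."""
    padded = f"#{word}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over character n-gram sets (an assumed measure)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    total = len(ga) + len(gb)
    return 2 * len(ga & gb) / total if total else 0.0


def suggest_by_edit_distance(word: str, dictionary: Iterable[str],
                             k: int = 5) -> List[str]:
    """The k dictionary words closest to `word` by edit distance."""
    return sorted(dictionary, key=lambda w: (edit_distance(word, w), w))[:k]


def suggest_by_ngrams(word: str, dictionary: Iterable[str],
                      k: int = 5) -> List[str]:
    """The k dictionary words most similar to `word` by n-gram overlap."""
    return sorted(dictionary, key=lambda w: (-ngram_similarity(word, w), w))[:k]


# Hypothetical toy dictionary; the study itself uses 541,167 Bangla words.
toy_dictionary = ["recognition", "recreation", "cognition",
                  "region", "reception", "record"]
misrecognized = "recogntion"

print(suggest_by_edit_distance(misrecognized, toy_dictionary))
print(suggest_by_ngrams(misrecognized, toy_dictionary))
```

Both functions scan the whole dictionary, which is acceptable for a toy list but would likely need an index or candidate filter at the scale reported here. The sketch also compares Unicode code points, so a Bangla character written with combining signs counts as several symbols; whether the original system works on code points or on larger units is not stated in the abstract.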