Digital Library

OCR Correction Tool for Linguistic Corpora

**Author(s):** Hakkarainen, Jussi-Pekka; Keskitalo, Esa-Pekka
Date(s)	03/07/2014 03/07/2014 10/06/2014
Abstract	Poster at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014. Posters, Demos and Developer "How-To's". We introduce a new tool for correcting OCR errors in materials held in a repository of cultural materials. The poster is aimed at all who are interested in digital humanities and who might find our tool useful. It focuses on the OCR correction tool and on the background processes. We have started a project on materials published in Finno-Ugric languages in the Soviet Union in the 1920s and 1930s. The materials are digitised in Russia. As they arrive, we publish them in DSpace (fennougrica.kansalliskirjasto.fi). For research purposes, the results of the OCR must be corrected manually. For this we have built a new tool. Although similar tools exist, we found in-house development necessary in order to serve the researchers' needs. The tool enables exporting the corrected text in the form required by the researchers. It also makes it possible to distribute the correction tasks and their supervision. After a supervisor has approved a text as finalised, the new version of the work replaces the old one in DSpace. The project has benefitted the small language communities, opened channels for cooperation in Russia, and increased our capabilities in digital humanities. The OCR correction tool will be available to others.
Identifier	http://www.doria.fi/handle/10024/97704 URN:NBN:fi-fe2014070432338
Language(s)	en
Relation	Poster Reception Open Repositories 2014 National Library of Finland, Finland
Keywords	#OCR #editing #linguistic corpora #crowdsourcing
Type	Poster
