OCR Correction Tool for Linguistic Corpora


Author(s): Hakkarainen, Jussi-Pekka; Keskitalo, Esa-Pekka
Date(s)

03/07/2014

03/07/2014

10/06/2014

Abstract

Poster at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Posters, Demos and Developer "How-To's"

We introduce a new tool for correcting OCR errors in materials held in a repository of cultural materials. The poster is aimed at all who are interested in digital humanities and who might find our tool useful. It focuses on the OCR correction tool and on the background processes. We have started a project on materials published in Finno-Ugric languages in the Soviet Union in the 1920s and 1930s. The materials are digitised in Russia. As they arrive, we publish them in DSpace (fennougrica.kansalliskirjasto.fi). For research purposes, the results of the OCR must be corrected manually. For this we have built a new tool. Although similar tools exist, we found in-house development necessary in order to serve the researchers' needs. The tool enables exporting the corrected text in the form required by the researchers. It also makes it possible to distribute the correction tasks and their supervision. After a supervisor has approved a text as finalised, the new version of the work replaces the old one in DSpace. The project has benefitted the small language communities, opened channels for cooperation in Russia, and increased our capabilities in digital humanities. The OCR correction tool will be made available to others.

Identifier

http://www.doria.fi/handle/10024/97704

URN:NBN:fi-fe2014070432338

Language(s)

en

Relation

Poster Reception

Open Repositories 2014

National Library of Finland, Finland

Keywords: #OCR #editing #linguistic corpora #crowdsourcing

Type

Poster