From Digitised Manuscript to Database: Automatic Processing of Civil Registers
Abstract
This article presents the first milestone in our research on the systematic processing of Hungarian civil registers, covering every stage from digitization, through integration of the data into a structured database, to publication of the results on an online platform. The pilot phase focused on the handwritten birth, marriage, and death certificates of the municipality of Abony in Pest County for the period 1895 to 1980. During this pilot we built an end-to-end workflow that converts digitized images into a structured SQL database using automated processing and machine learning techniques. Completing the pilot establishes a replicable framework that can be extended to other municipalities, since all components of the workflow are now operational and can be run in sequence.
Keywords:
handwritten text recognition, HTR, database, civil registry
