From Digitised Manuscript to Database: Automatic Processing of Civil Registers

Authors

  • Kata Ágnes Szűcs
    Affiliation
    National Archives of Hungary, digital humanist
  • Noémi Vadász
    Affiliation
    National Archives of Hungary, digital humanist
  • Zsolt Béla Záros
    Affiliation
    National Archives of Hungary, senior developer
  • Zoltán Szatucsek
    Affiliation
    National Archives of Hungary, director
  • Zsolt István Bánki
    Affiliation
    National Archives of Hungary, head of department
https://doi.org/10.3311/celisr.40895

Abstract

This article presents our research on the systematic processing of Hungarian civil registers, covering every stage from digitization through data integration into a structured database to the dissemination of results via an online platform. The initial phase of the project focused on the handwritten birth, marriage, and death registers of the municipality of Abony in Pest County, covering the period from 1895 to 1980. During this pilot we created a complete workflow that converts digitized images into a structured SQL database using automated processes and machine learning techniques. With all components of the workflow now operational, the pilot establishes a replicable framework that can be extended to other municipalities.

Keywords:

databases, HTR, handwritten text recognition, civil registry

References

Li, M. et al. (2022) TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models (No. arXiv:2109.10282). arXiv. https://doi.org/10.48550/arXiv.2109.10282

Nemeskey, D. M. (2021) Introducing huBERT. In: Berend, G., Gosztolya, G., Vincze, V. (eds.) XVII. Magyar Számítógépes Nyelvészeti Konferencia. pp. 3–14. ISBN 978-963-306-781-9. Available at: https://acta.bibl.u-szeged.hu/73353/ (Accessed: 6 August 2025)

Published Online

2025-09-29

How to Cite

Szűcs, K. Á., Vadász, N., Záros, Z. B., Szatucsek, Z. and Bánki, Z. I. (2025) From Digitised Manuscript to Database: Automatic Processing of Civil Registers, Central European Library and Information Science Review (CELISR), 2(3). https://doi.org/10.3311/celisr.40895

Section

Studies