Transcribing our way to improved searchability


It has been a good couple of months for the Archives in the online space. The Montevideo Maru website received considerable attention and continues to provide a great source of research for users both here and overseas. Destination Australia continues to grow during its beta phase, and just recently we released arcHIVE (the HIVE), our new transcription platform.

By transcribing documents in the HIVE, volunteers help the Archives improve the discoverability of documents by making their contents searchable.

Originally dubbed 'The Consignment list project' (a consignment list is a list of records), the HIVE came about as a way to improve access to, and the searchability of, the Archives’ holdings. Over 75% of the Archives’ collection is not listed at item level, making searching the lists a difficult and time-consuming task. A large portion of those records would not be accessible for many years, if ever, without assistance provided by the public.

We are partnering with our Brisbane office, which is undertaking the scanning of consignment lists among other activities. This partnership has given us the opportunity to trial new work processes, open source technologies and different digitisation standards.

This experimentation led us to scan images at a higher resolution (more pixels per inch). The increased resolution made the images better candidates for Optical Character Recognition (OCR), giving more accurate results for typed or printed documents. Using the Tesseract OCR engine, the scanned documents yielded an accuracy level of 80% to 90%. Handwritten documents continue to give poorer results; training the engine to improve this accuracy is on our to-do list.
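To make a figure like "80% to 90%" concrete, here is a minimal sketch of one common way to score OCR output against a trusted human transcription. It is an assumption for illustration only: this post does not say how the accuracy was measured, and the function name `character_accuracy` is hypothetical.

```python
from difflib import SequenceMatcher


def character_accuracy(ocr_text, reference_text):
    """Return a similarity score between 0.0 and 1.0.

    Illustrative only: a character-level similarity ratio, not
    necessarily the measure used for the figures quoted above.
    """
    # Collapse runs of whitespace so line-wrapping differences
    # between the scan and the transcription are not counted as errors.
    ocr = " ".join(ocr_text.split())
    ref = " ".join(reference_text.split())
    if not ocr and not ref:
        return 1.0
    # ratio() is 2 * matched characters / total characters in both strings.
    return SequenceMatcher(None, ocr, ref, autojunk=False).ratio()


if __name__ == "__main__":
    # A typical OCR slip: "m" misread as "rn".
    score = character_accuracy("Consignrnent list", "Consignment list")
    print(f"{score:.0%}")
```

A score like this only works where a checked transcription already exists, which is one more reason volunteer transcriptions are useful: they can double as reference text for judging OCR quality.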

Primarily, the HIVE allows members of the public to help transcribe records. Those records become searchable, making the HIVE a valuable research tool for traditionally undiscovered or unlisted records. The full-text search quickly searches titles, series numbers and the transcribed text.
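As a rough illustration of why transcription makes a record findable, the sketch below builds a tiny in-memory index over the three fields mentioned above. It is not how the HIVE is implemented; the class `RecordIndex`, its methods and the sample records (including the series number) are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class RecordIndex:
    """Toy inverted index: maps each word to the records containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, record_id, title, series_number, transcription=""):
        # Index all three fields together so one query searches them all.
        for field in (title, series_number, transcription):
            for token in tokenize(field):
                self._postings[token].add(record_id)

    def search(self, query):
        """Return ids of records that contain every word in the query."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result


if __name__ == "__main__":
    index = RecordIndex()
    # Hypothetical records; "X999" is a made-up series number.
    index.add(1, "Consignment list", "X999")
    index.add(2, "Consignment list", "X999", "Correspondence files, wharf leases")
    print(index.search("wharf leases"))  # only the transcribed record matches
```

Before transcription, a record can be found only by its title or series number; once the text is added, words from inside the document become search terms too.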

All that hard work is not without its rewards.

The beginnings of a game system have also been built into the HIVE. We know what it’s like for users to do hard work and not be rewarded for it. Users of the HIVE are able to build up points and then claim prizes for hitting the 50,000 mark and more, perhaps even having files scanned for them personally when they hit 500,000.
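The two milestones can be sketched as a simple threshold check. Only the figures 50,000 and 500,000 come from this post; the reward labels, the function name and the assumption that reaching a mark exactly counts as hitting it are illustrative.

```python
# Point milestones mentioned in the post.
PRIZE_THRESHOLD = 50_000
SCAN_THRESHOLD = 500_000


def rewards_unlocked(points):
    """List the rewards a volunteer with this many points could claim.

    Assumes a reward unlocks once the total reaches the threshold.
    """
    rewards = []
    if points >= PRIZE_THRESHOLD:
        rewards.append("prize")
    if points >= SCAN_THRESHOLD:
        rewards.append("file scan")
    return rewards


if __name__ == "__main__":
    for total in (10_000, 50_000, 500_000):
        print(total, rewards_unlocked(total))
```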
