Compelling PDFs to Give Up Their Text

With the advent of large language models, large collections of text matter more than ever, and PDFs are an abundant and important source of Maltese text. But how do you reliably extract clean Maltese text from them, given all the challenges involved? The NOMOCRAT project seeks to do just that: extract Maltese text while leaving out the errors.
