By Dr Ġorġ Mallia

Compelling PDFs to Give Up Their Text
With the advent of large language models, large collections of text matter more than ever, and PDFs are an abundant and important source of Maltese text. But how do you reliably extract clean Maltese text from them, given the challenges involved? The NOMOCRAT project seeks to do just that: extract Maltese text while leaving out errors.


