Can I extract data from an image-based PDF using some sort of OCR method? Does PP have that function readily available?
Unfortunately, that’s not a feature of OL Connect.
Some users have reported success calling Tesseract, a well-known third-party open-source OCR engine. It performs OCR on images, and one of its output options is a PDF containing only the recognized text.
Thanks, Phil. Would you know how I would go about calling Tesseract from the DataMapper?
I think you would use Workflow to handle the interaction with Tesseract and OCR the original PDF, and then you’d data map the post-OCR PDF.
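As a rough illustration of the Tesseract step only, here is a minimal Python sketch that builds and runs the Tesseract command line. It is not an OL Connect feature, and several things in it are assumptions: that the tesseract executable is on the PATH, that the PDF pages have already been converted to an image format Tesseract can read (it works on images such as TIFF or PNG, so the original PDF would need to be rasterized first with a separate tool), and that the file names and the helper names build_tesseract_command and ocr_to_pdf are purely hypothetical. How the script gets launched from Workflow is left open here.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, language="eng", text_only=False):
    """Build the argument list for: tesseract <image> <outputbase> [options] pdf

    Hypothetical helper for illustration. Tesseract appends ".pdf" to
    output_base itself, so output_base must not include the extension.
    """
    cmd = ["tesseract", str(image_path), str(output_base), "-l", language]
    if text_only:
        # textonly_pdf=1 asks Tesseract for a PDF with the text layer only,
        # without the page image.
        cmd += ["-c", "textonly_pdf=1"]
    # The trailing "pdf" config selects PDF output.
    cmd.append("pdf")
    return cmd


def ocr_to_pdf(image_path, output_base, language="eng", text_only=False):
    """Run Tesseract and return the path of the PDF it should have written."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    cmd = build_tesseract_command(image_path, output_base, language, text_only)
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(cmd, check=True)
    return Path(str(output_base) + ".pdf")


if __name__ == "__main__":
    # Assumed file names; replace with the rasterized page and desired output.
    result = ocr_to_pdf("scanned_page.tif", "scanned_page_ocr")
    print(f"OCR output written to {result}")
```

The resulting PDF would then be the file handed to the data mapping step, in place of the original image-only PDF.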