OL Learn

Extract area of PDF with Alambic API

I am trying to extract the datamatrix in the attached PDF and save it to disk as an image. The format of the image is not important, it could be PDF, JPG, TIFF, PNG, SVG; as long as it’s an image I can reuse in Connect.

The datamatrix is located approximately 7.1 cm from the left of the page and 2.66 cm from the top, and it has a height of 1.2 cm.

I have about 10 000 PDFs. All I need to do is take each PDF, extract the datamatrix into an image file, and save it. Someone hinted I could use the PDF Alambic API in Workflow to achieve this.

Is this possible in any way in Connect Workflow? Can anyone assist with this?
Here is the file, which I converted to JPG as it’s not possible to upload a PDF to this forum.

Our tools for editing images are rather limited, I’ll admit. But there would be another way of approaching this.

Within the Workflow, there is a Barcode Scan plugin which can be used to read the value of the barcode. This information could then be stored in a variable and passed into Connect by way of the datamapper.

At this point, Connect could recreate that barcode on any document you’re creating with it.

Sadly, the AlambicEdit API can’t crop a page or convert it to a bitmap at this moment.

Alternatively, Connect 2019.2, which will be released very soon, introduces a new PDF-to-bitmap plugin that converts a PDF to a bitmap. This will get you halfway there.

To crop the resulting bitmap, one can use an image editing library to reduce the image to just the box around the barcode, assuming of course that its location is stable. For example, ImageMagick can be called using the Run External Program task in Workflow, and its “-crop” option will do just that.
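As a rough sketch of the arithmetic involved, the Python snippet below turns the measurements given in the first post into an ImageMagick crop geometry (WxH+X+Y, in pixels). Several things here are assumptions rather than facts from this thread: the 300 dpi resolution of the converted bitmap, the 1.2 cm width (only the height was given, and a datamatrix is normally square), the file names page.png and barcode.png, and the "magick" executable name used by ImageMagick 7 (older installs use "convert"). The helper names are purely illustrative.

```python
import shlex

CM_PER_INCH = 2.54


def cm_to_px(cm, dpi):
    """Convert a length in centimetres to whole pixels at the given resolution."""
    return round(cm / CM_PER_INCH * dpi)


def crop_geometry(left_cm, top_cm, width_cm, height_cm, dpi):
    """Build an ImageMagick geometry string of the form WxH+X+Y (pixels)."""
    return "{}x{}+{}+{}".format(
        cm_to_px(width_cm, dpi),
        cm_to_px(height_cm, dpi),
        cm_to_px(left_cm, dpi),
        cm_to_px(top_cm, dpi),
    )


def crop_command(src, dst, geometry, executable="magick"):
    """Return the argument list for an ImageMagick crop.

    +repage resets the virtual canvas so the output is just the cropped box.
    """
    return [executable, src, "-crop", geometry, "+repage", dst]


if __name__ == "__main__":
    # Assumed values: 300 dpi bitmap, square barcode 1.2 cm wide.
    geometry = crop_geometry(left_cm=7.1, top_cm=2.66,
                             width_cm=1.2, height_cm=1.2, dpi=300)
    print(geometry)  # 142x142+839+314
    # Hypothetical file names; print the command line rather than running it.
    print(shlex.join(crop_command("page.png", "barcode.png", geometry)))
```

The printed command line is the kind of thing that would go into the Run External Program task. If there is text wrapped around the barcode, the box would need to be widened by a margin on every side, and the resolution must match whatever the PDF-to-bitmap step actually produces.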

Hi fortiny,

Will 2019.2 only convert to BMP? Not JPG/TIF etc.

Regards,
S

The plugin will allow you to pick between PNG and JPG output. The term “bitmap” does not refer to the BMP file format, which is only one of many types of bitmap formats.

Hi Phil,

Will tiff/tif ever make the cut? Also multi page tiff?

Regards,
S

We don’t have any immediate plans to add TIFF support at this stage, unless of course we start getting significant demand for it.

It must be noted that there are dozens of variants of the TIFF format, many of them very obscure, while others are required for specific uses, so implementing TIFF support always turns into quite the project.

Hmmm, I am not just extracting the datamatrix: there is also a piece of text that circles around the datamatrix, which is difficult to reproduce. In addition, the barcode is not always a datamatrix; it can sometimes be Code 39 or Code 128, and it may be in color or contain smileys, icons or other images. :frowning_face:

I was hoping to use the alamic.pdfrect to extract the region, but have been unable to get this to work reliably. :frowning_face:

May I ask what is the end goal for all this manipulation? Maybe there is another way to achieve what you want that doesn’t involve image extraction?

PDFRect was not designed with that use case in mind; it is used to set the page size when creating a new page in a PDF.

However, there is merit in your use case, whether to extract visual content or to change the page size. I’ll add this to our list of future improvements.