OL Learn

Handling blank pages in PDFs

I need to create a job which will take PDF input, set the boundary to 4 pages, check whether the third page in each set is blank and, if it is, discard pages 3 and 4. Any resulting sections with 2 pages should be printed simplex and any with 4 pages should be printed duplex.

I am happy that if I can determine if page 3 is blank I can then set a field to either Simplex or Duplex and can use this with a Retrieve Items action in the workflow to handle the print. My questions are -

How can I determine if a PDF page is empty? And how can I delete pages 3 and 4 of a set where page 3 is blank?

You can’t check directly whether a page is blank, but you can extract a region covering the whole page, trim the result and compare it to an empty string. If they match, the page contains no text. However, images on the page do not return anything as they are not characters, so an image-only page would also look empty. Your best bet would be to look for the absence of something that is always there, like a page number.
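As a rough illustration of that check outside the product (plain Python, not an OL Connect API), the sketch below assumes the text of each page has already been extracted into a list of strings. The names `is_blank_text` and `drop_blank_backs` and the sample data are hypothetical. An image-only page extracts to an empty string, so it is treated as blank here, which is exactly the limitation mentioned above.

```python
from typing import List


def is_blank_text(page_text: str) -> bool:
    """True when the extracted text is empty after trimming whitespace."""
    return page_text.strip() == ""


def drop_blank_backs(pages: List[str], set_size: int = 4) -> List[List[str]]:
    """Split pages into sets of set_size; keep only pages 1-2 when page 3 is blank."""
    sets = []
    for start in range(0, len(pages), set_size):
        group = pages[start:start + set_size]
        # Page 3 of the set is index 2; if it has no text, discard pages 3 and 4.
        if len(group) >= 3 and is_blank_text(group[2]):
            group = group[:2]
        sets.append(group)
    return sets


# Assumed input: per-page text, already extracted (hypothetical sample data).
pages = [
    "Letter p1", "Letter p2", "  \n", "",
    "Invoice p1", "Invoice p2", "Invoice p3", "Invoice p4",
]
sets = drop_blank_backs(pages)
print([len(s) for s in sets])  # [2, 4]
```

The first set shrinks to 2 pages (simplex) and the second keeps all 4 (duplex).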

Thank you. I had thought that looking at the whole page and comparing this with an empty string might be the way to go. The problem that I have with that is that the pages are completely “free-style” so there is no predictable content that is always there. The page can also contain images.

Could pages just have images and no text at all? If not, then the “compare to empty string” would work.
If yes, then I am afraid that the only way to do this would be to purchase Adobe libraries and write your own code to check each page of these PDFs.

Unfortunately the content of the pages I need to look at is completely unpredictable and can be text only, images only, a combination of the two, or totally blank.

I’m suggesting to the customer that when they create the PDF they determine at that point if an individual section should be simplex or duplex and, if simplex, don’t include the blank pages. There is a marker on each first page so I could use this to define boundaries and set a variable called simplex or duplex based on the number of pages. I could then use a Retrieve Items action to pick up all the simplex ones and then another Retrieve Items to pick up the duplex ones.
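The marker-based approach can be sketched in plain Python as follows. This is only an illustration of the logic, not Workflow or DataMapper code: it assumes a list of per-page flags saying whether the first-page marker was found, and the function names are hypothetical. Treating every section longer than 2 pages as duplex is also an assumption, since only 2-page and 4-page sections are expected.

```python
from typing import List


def section_lengths(has_marker: List[bool]) -> List[int]:
    """Count pages per section; a True flag starts a new section."""
    lengths: List[int] = []
    for flag in has_marker:
        if flag or not lengths:
            lengths.append(0)  # new section at each marker (or at the first page)
        lengths[-1] += 1
    return lengths


def plex_mode(page_count: int) -> str:
    """Assumed rule: up to 2 pages prints simplex, anything longer prints duplex."""
    return "Simplex" if page_count <= 2 else "Duplex"


# Hypothetical input: marker found on pages 1 and 3 of a 6-page PDF.
flags = [True, False, True, False, False, False]
modes = [plex_mode(n) for n in section_lengths(flags)]
print(modes)  # ['Simplex', 'Duplex']
```

The resulting value per section is what would be stored in the field that the two Retrieve Items actions filter on.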