Manipulating PDF Files with Metadata

In this How-To we’ll be taking a look at how to manipulate metadata in Workflow, without any sort of document or template. We’ll be using Metadata tools exclusively to sort, split, and merge a PDF job.

Why not do this in Connect? It could indeed be done in Connect, with Job and Output presets. However, when the document itself does not need to change, it is faster and less resource-intensive to use metadata alone!

Step 1 - Creating and Setting Up the Process

Since we’ll be working exclusively with a PDF file and metadata, let’s start by creating a brand new process and setting howto-pdf-metadata-sample.pdf as its debug file:

  • In the Home tab, click on the Process button.
  • In the Debug tab, click the Select button then browse to the howto-pdf-metadata-sample.pdf file you saved on your hard drive.
  • You can verify that the PDF is loaded by clicking View as PDF in the Debug tab. It should display 261 pages.

Step 2 - Generating Metadata and Splitting

Our next step is to create metadata from the PDF, and then separate it into proper invoice batches. This is done exclusively through the native Metadata tools available in Workflow.

  • In the Metadata Related category of the Plug-in Bar, locate the Create Metadata task and drag it in between the input and output.
  • Select the None/ Do Not Use a Password option in the list of documents.
  • Click OK to confirm adding the task.

At this point we can do our first debug run to see what happens after the task. In the Debug tab, click Step until the Create Metadata task has run. When that happens, the View Metadata button becomes available. Clicking it reveals a single job, containing a single group, with a single document that has 261 pages. Obviously, that’s not at all what we want! So let’s separate the whole batch at the Document level.

  • Click Stop to end the debug mode.
  • Drag the Metadata Level Creation task in after the existing Create Metadata task.
  • In the Document line, select Delimiter as Begin When.
  • Click in the Rule box, then on [...]
  • Click Insert a Condition at the top.
  • Set the left operand to the Page 1 of text on the page: right-click the left operand box, select Get Data Location, draw a box around Page 1 of and click OK.
  • Change the operator to Contains and the right operand to the text Page 1 of. This gives a preview condition that looks like (region(?,7.26041,1.10416,8.05208,1.35416,KeepCase,NoTrim) Contains Page 1 of )
  • Click OK, and then OK again to save and close.
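The Begin When rule above can be pictured with a small Python model. This is only a sketch under stated assumptions, not how Workflow works internally: each page is reduced to the text captured by the Get Data Location box (hypothetical input), and a new document starts on every page whose text contains Page 1 of. The name split_into_documents is made up for illustration.

```python
from typing import List


def split_into_documents(page_texts: List[str], marker: str = "Page 1 of") -> List[List[int]]:
    """Group 0-based page indexes into documents.

    A new document begins on every page whose region text contains the marker,
    mirroring a "Begin When" delimiter rule on the Document level.
    """
    documents: List[List[int]] = []
    for index, text in enumerate(page_texts):
        # Start a new document on a delimiter page; in this model, a leading
        # page without the marker still gets a document to belong to.
        if marker in text or not documents:
            documents.append([])
        documents[-1].append(index)
    return documents


# Hypothetical region texts for five pages: a 2-page and a 3-page invoice.
pages = ["Page 1 of 2", "Page 2 of 2", "Page 1 of 3", "Page 2 of 3", "Page 3 of 3"]
print(split_into_documents(pages))  # [[0, 1], [2, 3, 4]]
```

With a plain substring test like this one, Page 1 of does not match Page 11 of, so an invoice longer than ten pages would stay in one piece.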

If you step through the process again, past both tasks, you can go back to View Metadata and see that the group now contains 100 documents, each with a variable number of pages. Awesome!

Step 3 - Adding Fields from the Data

Now it’s time to add some fields to the metadata to help us group and sort it.

  • Add in a new task, Metadata Fields Management, after the Level Creation.
  • Leaving the action set to Add, click Field Information.
  • Set the Level to Document, and the Field Name to CustomerID.
  • For the Field value, right-click and select Get Data Location.
  • Draw a box around the variable Account # on the right (CU53615165), and click OK.

Now we’re in a slightly tricky situation: we need to add a Rule so that the field is only taken from the first page of the document. Otherwise, even-numbered documents would have their last page (the Terms & Conditions) act as the source, and that page does not contain the right data!

But that rule can only be added while there is active metadata, so let’s take care of that right now.

  • Close the dialog with OK, to save our current progress (the extraction).
  • Step through until the Metadata Level Creation task has run, but stop before the Metadata Fields Management task we just added.
  • Double-click on Metadata Fields Management. Now we have its properties, with an active metadata file.
  • In the line we added earlier, click in Rules then click the [...] button.
  • Click Insert a Condition at the top.
  • On the left operand, right-click and select Get Metadata Location. Switch to the Metadata tab at the top. Then navigate to the first Data Page entry on the left list. Click Data Page Index in Document then click OK.
  • Change the Operator to Not Equal
  • In the right operand, enter 1
  • Click OK, then OK again to save everything.
  • Click the Step button once more to run the task with those changes.
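Why the first-page rule matters can be illustrated with a hypothetical Python model of a single document. It assumes, purely for illustration, that every page the task processes overwrites the field, so without a rule the last page wins; extract_field and its inputs are made-up names, not part of Workflow.

```python
from typing import List


def extract_field(region_texts: List[str], first_page_only: bool = True) -> str:
    """Return the field value for one document.

    region_texts holds the text captured in the Account # box, one entry per
    page of the document (page 1 first). Simplifying assumption: every page
    that is processed overwrites the value, so the last processed page wins.
    """
    value = ""
    for page_index, text in enumerate(region_texts, start=1):
        if first_page_only and page_index != 1:
            continue  # skip every page whose index in the document is not 1
        value = text.strip()
    return value


# Hypothetical 2-page invoice: the back page (Terms & Conditions) has no account number.
invoice = ["CU53615165", ""]
print(extract_field(invoice))                               # CU53615165
print(repr(extract_field(invoice, first_page_only=False)))  # ''
```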

Now if you open the View Metadata window, every document should have a proper CustomerID field whose value comes from its first page.

However, let’s think ahead: our next tasks are going to require active metadata too. We can save both debugging time and complexity by saving our metadata file and loading it, the same way we load a debug file. In the View Metadata dialog, click the Save button to the right of the filename. Save it somewhere you’ll remember, then click OK and stop debugging.

Step 4 - Grouping and Sorting

Our next step is to sort and group the documents. Both of these tasks are configured while the saved metadata is loaded.

  • With Debugging stopped, click on Select in the Debug tab.
  • Go to the Metadata tab at the top, click on the Open a metadata file button at the right and load your file.
  • You should now see the metadata appear in the selector. Click Ok.
  • Drag in the Metadata Sorter after the fields management task.
  • In the Document line, click in the Sort By column. Choose the CustomerID field at the bottom (it will appear with a different icon as it’s a property we added)
  • Click Ok to save.
  • “Grouping” in this case is the use of another Metadata Level Creation task. Drag one in after the sorter.
  • This time it’s the Group line that we need to change to Begin When. Set the Rule as follows:
    • The Left Operand should be a Metadata Selection, where the selection is the CustomerID field at the Document level.
    • The Operator should be Value Changed
  • Click Ok and then Ok again.
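The sort-then-group logic can be sketched in Python as follows. It is a conceptual model rather than Workflow’s implementation: documents are plain dictionaries with a CustomerID key and a hypothetical Invoice number, and itertools.groupby plays the role of the Value Changed rule because it starts a new group each time the key differs from the previous one.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, List

Document = Dict[str, str]


def sort_and_group(documents: List[Document]) -> List[List[Document]]:
    """Sort documents by CustomerID, then group consecutive equal IDs."""
    by_customer = itemgetter("CustomerID")
    # Python's sorted() is stable: invoices of one customer keep their original order.
    ordered = sorted(documents, key=by_customer)
    # groupby() only merges *consecutive* equal keys, like a "Value Changed"
    # rule: a new group begins whenever CustomerID differs from the previous one.
    return [list(group) for _, group in groupby(ordered, key=by_customer)]


docs = [
    {"CustomerID": "CU2", "Invoice": "1001"},
    {"CustomerID": "CU1", "Invoice": "1002"},
    {"CustomerID": "CU2", "Invoice": "1003"},
]
groups = sort_and_group(docs)
print([[d["Invoice"] for d in g] for g in groups])  # [['1002'], ['1001', '1003']]
```

Without the sort, the same three documents would produce three groups (CU2, CU1, CU2), which is why the Metadata Sorter has to run before the second Metadata Level Creation.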

Now we can start debugging again, this time exactly where we want to. Because our loaded metadata was saved after the Metadata Fields Management task ran, we can start at the task that follows it:

  • Click on the Metadata Sorter task, then in the Debug tab, click Step From Here to start debug mode.
  • Click Step to run the Metadata Sorter (you can examine the metadata to confirm it has been sorted).
  • Click Step again to run the Metadata Level Creation. The metadata should now show 20 groups.

Great! Notice how much faster that was: the creation and field management steps are the slowest in this process because of the work they have to do.

At this point, you may want to complement your metadata with external sources, such as getting an email address from a CSV file based on the Customer ID, or other awesome things. Refer to the Complement Metadata from Workflow how-to for details on how to achieve this!
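The core of such a lookup can be sketched with Python’s standard csv module. This is only an illustration of the idea, not the method described in that how-to; the column names CustomerID and Email and the sample address are hypothetical.

```python
import csv
import io
from typing import Dict, TextIO


def load_emails(csv_file: TextIO) -> Dict[str, str]:
    """Build a CustomerID -> Email lookup from a CSV file with a header row.

    The column names CustomerID and Email are hypothetical; adjust them to your data.
    """
    return {row["CustomerID"]: row["Email"] for row in csv.DictReader(csv_file)}


# In-memory stand-in for a real file opened with open(path, newline="").
sample = io.StringIO("CustomerID,Email\nCU53615165,billing@example.com\n")
emails = load_emails(sample)
print(emails.get("CU53615165"))  # billing@example.com
print(emails.get("CU00000000"))  # None: no match, so no email to add
```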

Step 5 - Generating Separate PDF Files

Now on to the last step – let’s create PDF files for each customer, with all their invoices in them. “But how?” you ask. Super simple!

  • Add the Metadata Sequencer task.
  • Leave the Metadata Level set to Group, and set the Following number of occurrences of the level option to 1.
  • Click Ok
  • Add the Create PDF task. It’s in the Actions category!
  • Leave the Document set to “None” (passthrough).
  • Click Ok.
  • Replace the output with a Send to Folder task, found in the Output category.
  • Select the folder where the PDF files should be written.
  • Set the Filename to a Metadata Selection, which should be, again, our CustomerID field at the Document level. Add .pdf at the end (for good measure).
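To make the combined effect of these three tasks concrete, here is a hypothetical Python model that computes which pages would end up in which file. It does not write any PDFs; the customer IDs, page indexes and the plan_outputs name are invented for illustration, and it assumes every document in a group carries the same CustomerID (which the grouping rule ensures).

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


def plan_outputs(groups: List[List[Document]]) -> Dict[str, List[int]]:
    """Map each output filename to the page indexes its PDF would contain.

    One sequence per Group (1 occurrence of the level), one PDF per sequence,
    named after the group's CustomerID plus ".pdf". Every document in a group
    shares the same CustomerID, so the first document's value is used.
    """
    outputs: Dict[str, List[int]] = {}
    for group in groups:
        filename = f"{group[0]['CustomerID']}.pdf"
        # Concatenate the pages of every invoice in the group, in sorted order.
        outputs[filename] = [page for doc in group for page in doc["pages"]]
    return outputs


groups = [
    [{"CustomerID": "CU1", "pages": [2, 3]}, {"CustomerID": "CU1", "pages": [7]}],
    [{"CustomerID": "CU2", "pages": [0, 1]}],
]
print(plan_outputs(groups))  # {'CU1.pdf': [2, 3, 7], 'CU2.pdf': [0, 1]}
```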

Now you’re ready to launch the whole thing! If nothing has blown up so far, simply click the Run button in the Debug tab and just… watch it go! After a few moments, you should have 20 PDFs in your chosen folder, each named with its Customer ID. Success!
