We have an invoice process where we receive a number of invoices in XML format. A PDF is created for each XML.
For the invoices there are 2 possibilities:
1. Test prints - These are test invoices that do not go to the customer. The finance department can check them and remove any that they want from the batch (e.g. invoices in dispute).
2. Final prints - Once the test print batch is OK, they run the batch as final prints, which creates the invoices that go to the customer.
For both options the process goes through each individual XML file separately to create the PDF.
For option 1 each individual PDF is merged into 1 big PDF and sent to the finance person so they can check the batch. Individual invoices are not stored permanently in our document storage system.
For option 2 each individual PDF is saved, together with some metadata about the receiver, until the batch is complete. Each invoice is then sent to its customer and also stored in our document storage system.
In both cases the creation of the invoices takes a considerable amount of time and I want to shorten this.
I already found out that if I merge all the XML files into 1 big XML file at the beginning of the process and then pass the big file to my All in One Data mapping configuration, it saves me a lot of time.
However, this only works for option 1. For option 2 I need the individual PDF files because most of them are emailed separately to each customer.
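For reference, the merge step at the start is conceptually something like the sketch below. It is written in Python purely for illustration and is not what runs in my workflow; the element names (`batch`, `invoice`, `number`) and the function `merge_invoices` are made up.

```python
import xml.etree.ElementTree as ET


def merge_invoices(xml_documents, root_tag="batch"):
    """Wrap the root element of every invoice XML in one shared root.

    xml_documents is a list of XML strings, one per invoice.
    root_tag is a hypothetical name for the wrapper element.
    """
    batch = ET.Element(root_tag)
    for text in xml_documents:
        # Parse each invoice and append its root element to the batch.
        batch.append(ET.fromstring(text))
    return ET.tostring(batch, encoding="unicode")


if __name__ == "__main__":
    docs = [
        "<invoice><number>1001</number></invoice>",
        "<invoice><number>1002</number></invoice>",
    ]
    print(merge_invoices(docs))
```

The invoices stay in their original order inside the big file, which matters if the resulting PDF has to be split again afterwards.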
I am trying to find out if there is an efficient way to speed up option 2.
One way would be to split the resulting PDF file based on the page number by searching for a specific text in a certain region of each page.
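Roughly what I have in mind for the splitting, as a Python sketch. It assumes the text from that region has already been extracted for every page (the extraction itself is not shown), and that the first page of each invoice carries a marker such as `Page 1 of`. The marker and the function name `invoice_page_ranges` are only placeholders.

```python
def invoice_page_ranges(page_texts, marker="Page 1 of"):
    """Return (first_page, last_page) tuples, 1-based and inclusive.

    page_texts holds the text found in the searched region of each page,
    in page order. A page containing the marker starts a new invoice.
    Pages before the first marker are ignored.
    """
    starts = [i for i, text in enumerate(page_texts, start=1) if marker in text]
    ranges = []
    for pos, first in enumerate(starts):
        # An invoice runs until the page before the next start,
        # or to the last page of the document.
        if pos + 1 < len(starts):
            last = starts[pos + 1] - 1
        else:
            last = len(page_texts)
        ranges.append((first, last))
    return ranges


if __name__ == "__main__":
    regions = [
        "Page 1 of 2", "Page 2 of 2",
        "Page 1 of 1",
        "Page 1 of 3", "Page 2 of 3", "Page 3 of 3",
    ]
    print(invoice_page_ranges(regions))  # [(1, 2), (3, 3), (4, 6)]
```

The cost of this approach is that the region of every page has to be read before any boundary is known.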
But I was wondering if it is also possible to create metadata for each PDF invoice and store it in the PDF, so that I can later extract it to get the correct boundaries?
At the moment I store the customer information (name, email address, etc.) in a separate small XML file that I feed, together with the PDF, to my email process to send everything out to the customer.
If I can store this in the PDF as metadata, I would no longer need to store it separately and could then extract it from the invoice process.
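To make the idea concrete, this is the kind of per-invoice record I am thinking of, sketched in Python. Whether it can actually be embedded in the PDF is exactly my question, so the sketch simply keeps it as JSON. The field names (`name`, `email`, `pages`) and the function `build_manifest` are hypothetical. Given the page count of each invoice, the boundaries follow from a running total without looking at any page content.

```python
import json


def build_manifest(invoices):
    """Add 1-based, inclusive page boundaries to each invoice record.

    invoices is a list of dicts in batch order, each with a "pages" count.
    The input records are copied, not modified.
    """
    manifest = []
    next_page = 1
    for inv in invoices:
        entry = dict(inv)
        entry["first_page"] = next_page
        entry["last_page"] = next_page + inv["pages"] - 1
        # The next invoice starts on the page after this one ends.
        next_page = entry["last_page"] + 1
        manifest.append(entry)
    return manifest


if __name__ == "__main__":
    batch = [
        {"name": "Customer A", "email": "a@example.com", "pages": 2},
        {"name": "Customer B", "email": "b@example.com", "pages": 1},
    ]
    print(json.dumps(build_manifest(batch), indent=2))
```

A record like this would carry both the receiver details for the email process and the boundaries for the split.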
If it is possible to store the metadata in each PDF invoice, would it then be faster to extract the metadata and use it for splitting the invoices than to extract a region of the PDF?
Also, how would I go about this, as I currently do not do much with metadata?