OL Learn

Most efficient way to extract XMP data from a PDF into Workflow

I am trying to extract XMP data from a PDF to use as my data source in Workflow. Any ideas?

You could use the “Embed/Extract PlanetPress Suite Metadata” plugin to extract the XMP. That would become the metadata of your process. Then, using a script, you could convert the metadata into a text file (in the format of your choice) and use it as your input.

Here is an example made by one of the gurus in our Professional Services department.

Hope that helps.

Hi hamelj

I had tried the Extract plugin but with no joy, I get the following error:

W3976: Task could not find any metadata to extract.
Embed/Extract PlanetPress Workflow Metadata: W1603 : Plugin failed - 16:22:18 (elapsed time: 00:00:00:003)

The metadata is there:

I’m pretty sure this is going to require custom coding through the Acrobat JavaScript API. I’ll start you down that rabbit hole here: https://acrobatusers.com/tutorials/get_set_metadata

This eventually got me to their page about metadata which gives, at least, some concrete code examples of how to read/write this data. Sadly, I cannot link directly to it as copying/pasting the URL just takes you back to the table of contents located here: http://help.adobe.com/livedocs/acrobat_sdk/10/Acrobat10_HTMLHelp

You can find it by following the breadcrumbs though: JavaScript > JavaScript for Acrobat API Reference > JavaScript API > Doc > Doc properties > metadata

Have a look at ExifTool (http://owl.phy.queensu.ca/~phil/exiftool/). It is a command-line utility, so you could run it with the External Program plugin.
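To make the ExifTool suggestion concrete, here is a minimal sketch in Python that wraps the command line `exiftool -xmp -b input.pdf`, which writes the raw XMP block to standard output. The function names and file paths are made up for illustration, and it assumes ExifTool is installed and on the PATH. In the External Program plugin you would use the equivalent executable and arguments directly.

```python
import shutil
import subprocess


def build_exiftool_command(pdf_path, exiftool="exiftool"):
    """Build the argument list that asks ExifTool for the raw XMP block.

    -xmp selects the XMP block as a whole, -b outputs it in binary (raw) form.
    """
    return [exiftool, "-xmp", "-b", pdf_path]


def extract_xmp_with_exiftool(pdf_path, output_path, exiftool="exiftool"):
    """Run ExifTool and save whatever XMP it prints to output_path.

    Returns the number of bytes written (0 means no XMP was found).
    """
    # Fail early with a clear message if the executable cannot be located.
    if shutil.which(exiftool) is None:
        raise FileNotFoundError("ExifTool executable not found: " + exiftool)

    result = subprocess.run(
        build_exiftool_command(pdf_path, exiftool),
        capture_output=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    with open(output_path, "wb") as out:
        out.write(result.stdout)
    return len(result.stdout)
```

Calling extract_xmp_with_exiftool("job.pdf", "job.xmp") would leave the XMP packet in job.xmp, which could then be picked up as an XML data file.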

Alternatively, you could script it - there are some libraries available (see http://blog.matt-swain.com/post/25650072381/a-lightweight-xmp-parser-for-extracting-pdf)
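As a rough idea of what the scripted route can look like, here is a minimal sketch in Python that scans the raw PDF bytes for XMP packets. It is not the linked library's code, and the function names are hypothetical. It assumes the metadata stream is stored uncompressed and the PDF is not encrypted. A PDF can hold several packets (for example, one per embedded image), so the sketch returns all of them rather than guessing which one is the document-level packet.

```python
import re
import xml.etree.ElementTree as ET

# An XMP packet's root element runs from <x:xmpmeta ...> to </x:xmpmeta>.
XMP_PATTERN = re.compile(rb"<x:xmpmeta\b.*?</x:xmpmeta>", re.DOTALL)


def extract_xmp_packets(pdf_bytes):
    """Return every XMP packet found in the raw bytes, decoded as text."""
    return [
        match.group(0).decode("utf-8", errors="replace")
        for match in XMP_PATTERN.finditer(pdf_bytes)
    ]


def parse_xmp(packet):
    """Parse one packet into an ElementTree element.

    Raises xml.etree.ElementTree.ParseError if the packet is not well-formed XML.
    """
    return ET.fromstring(packet)
```

To use it, read the PDF in binary mode (open(path, "rb").read()), pass the bytes to extract_xmp_packets, and write the packet you need to a file that Workflow can read as XML.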

PDFlib and verypdf also have some command line tools on offer.

Looking deeper into this, it turns out that I was confusing XMP with our Workflow metadata.

Benoit, from our pre-sales team, came up with this script using the Alambic API:

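Option Explicit
Dim myPDF, myXMP

' Get a PDF editing object and open the current job file
Set myPDF = Watch.GetPDFEditObject
myPDF.Open Watch.GetJobFileName, False

' Read the XMP packet from the PDF
myXMP = myPDF.GetXMP

' Store the XMP in job info 9 (%9) and write it to the log at level 2
Watch.SetJobInfo 9, myXMP
Watch.Log myXMP, 2

myPDF.Save False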

In your Workflow process, you then use the CreateFile plugin and put the %9 job info in it, then use the created file as your input (XML).

Can you try this with your PDF?

Perfect, worked like a charm. Thanks, Hamel.