OL Learn

Extract metadata to CSV in Workflow

I need to create a companion file for each PDF generated by PReS with the following fields:

Docno
start-page
end-page
Customer Name
Number of Images
Number of paper pages

I think this needs to be done after the PDF output is generated, since that would be the only way to determine the start and end pages and the number of paper pages for each record.

Any help/tips would be greatly appreciated.

Can you clarify what you mean by “Number of Images” and “Number of paper pages”?

For the rest, you can certainly get data about media into and out of the Metadata. Here's a script that I use; I think Phil provided it originally.

var xhttp = new ActiveXObject("Microsoft.XMLHTTP");

var  metaFile = new ActiveXObject("MetadataLib.MetaFile");
metaFile.LoadFromFile (Watch.GetMetadataFilename());
var metaJob = metaFile.Job();
var metaGroup = metaJob.Group(0);

var totalPages = 0;
var documentPages = 0;
var documentCount = 0;
var mediaName;
var mediaCount;
var mediaMap = {};

var contentsetid = Watch.ExpandString("GetMeta(_vger_contentset_id[0], 10, Job.Group[0])");
var url = "http://localhost:9340/rest/serverengine/entity/contentsets/" + contentsetid + "/pages?detail=true";

xhttp.open('GET',url,false);
xhttp.setRequestHeader('Content-Type', 'application/json');
xhttp.onreadystatechange = handlerGetDataRecord;
xhttp.send();

Watch.SetVariable("totalPages", totalPages);
Watch.SetVariable("sheetCount", totalPages / 2);
Watch.SetVariable("documentCount", documentCount);
Watch.SetVariable("mediaCount", JSON.stringify(mediaMap));

function handlerGetDataRecord()
{
  if (xhttp.readyState == 4) {
    if (xhttp.status == 200) {
      var contentsetPageDetails = JSON.parse(xhttp.responseText);
      var contentsetPageDetailMap = {};

      documentCount = contentsetPageDetails.length;

      for (var idx=0; idx < contentsetPageDetails.length; idx++) {

        for (var pdx=0; pdx < contentsetPageDetails[idx].pages.length; pdx++) {
          documentPages += contentsetPageDetails[idx].pages[pdx].count;
          mediaName = contentsetPageDetails[idx].pages[pdx].media.name;
          if(isNaN(mediaMap[mediaName]))
          {
            mediaCount = 0;
          }
          else
          {
            mediaCount = mediaMap[mediaName];
          }
          mediaCount += contentsetPageDetails[idx].pages[pdx].count;
          mediaMap[''+mediaName] = mediaCount;
        }
        contentsetPageDetailMap[''+contentsetPageDetails[idx].id] = documentPages;
        documentPages = 0;
        mediaName = '';
        mediaCount = 0;
      }

      for (var idx=0; idx < metaGroup.Count; idx++) {
        var contentItemId =  metaGroup.Item(idx).FieldByName('_vger_contentitem_id');
//        metaGroup.Item(idx).Fields.Add('_vger_fld_pndPAGECOUNT', contentsetPageDetailMap[contentItemId]);
        totalPages += contentsetPageDetailMap[contentItemId];
      }

      metaFile.SaveToFile (Watch.GetMetadataFilename());
    }
  }
}

I create my own JSON file via Create File after the Run Script task:

{"filename":"%{filename}","pages":%{totalPages},"sheets":%{sheetCount},"documents":%{documentCount},"media":%{mediaCount}}
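For anyone who wants to try the counting logic outside Workflow, here is a minimal sketch that pulls the aggregation out of the handler into a plain function. The function name summarizePages and the sample data are made up for illustration, and the response shape (an array of documents, each with a pages array carrying count and media.name) is only inferred from the script above. Note that it totals every document in the content set, whereas the script above totals only the documents found in the Metadata.

```javascript
// Pure helper: summarize the parsed "pages?detail=true" response.
// Assumed shape (inferred from the script above, not from official docs):
//   [{ id: ..., pages: [{ count: n, media: { name: "..." } }, ...] }, ...]
function summarizePages(details) {
  var summary = { totalPages: 0, documentCount: details.length, media: {} };
  for (var i = 0; i < details.length; i++) {
    var pages = details[i].pages;
    for (var p = 0; p < pages.length; p++) {
      var name = String(pages[p].media.name);
      var count = pages[p].count;
      summary.totalPages += count;
      // Start at 0 the first time a media name is seen.
      summary.media[name] = (summary.media[name] || 0) + count;
    }
  }
  return summary;
}

// Illustrative data only, not real server output.
var sample = [
  { id: 101, pages: [{ count: 2, media: { name: "Plain" } }] },
  { id: 102, pages: [{ count: 1, media: { name: "Letterhead" } },
                     { count: 3, media: { name: "Plain" } }] }
];
var result = summarizePages(sample);
```

In the Run Script task, the fields of the returned object would be passed to Watch.SetVariable in the same way the script above does.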

Thanks for the feedback!

I am getting the following error:

[0012] W3602 : Error 0 on line 24, column 1: Script.NamedItemWatch: Variable does not exist.

Any ideas?

Very much appreciate any assistance.

Thanks!

The script stores the various counts in Workflow local variables, so you need to create every local variable it references (totalPages, sheetCount, documentCount and mediaCount) in the process before the script runs.

That did the trick!

I needed the information at the record level (detail per record) rather than at the job level (Summary), so I created a few arrays and saved the array to the local variable.
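A rough sketch of that record-level approach, in case it helps someone else. This is not the poster's actual code: the function names are hypothetical, and it assumes that documents come back in output order, that pages are numbered from 1 across the whole PDF, and that the job prints duplex (two pages per sheet). Customer name and image count are left out because they would have to come from the data record or Metadata fields.

```javascript
// Build one row per document with running start/end page numbers.
// Assumptions: documents are in output order, page numbering starts at 1
// for the whole PDF, and printing is duplex (2 pages per sheet).
function buildRecordRows(details) {
  var rows = [];
  var nextPage = 1;
  for (var i = 0; i < details.length; i++) {
    var pageCount = 0;
    for (var p = 0; p < details[i].pages.length; p++) {
      pageCount += details[i].pages[p].count;
    }
    rows.push({
      id: details[i].id,
      startPage: nextPage,
      endPage: nextPage + pageCount - 1,
      sheets: Math.ceil(pageCount / 2) // round up for odd page counts
    });
    nextPage += pageCount;
  }
  return rows;
}

// Quote a CSV field, doubling any embedded double quotes.
function csvField(value) {
  return '"' + String(value).replace(/"/g, '""') + '"';
}

// Turn the rows into CSV text with a header line.
function rowsToCsv(rows) {
  var lines = ["id,start-page,end-page,sheets"];
  for (var i = 0; i < rows.length; i++) {
    var r = rows[i];
    lines.push([csvField(r.id), r.startPage, r.endPage, r.sheets].join(","));
  }
  return lines.join("\r\n");
}

// Illustrative data only, not real server output.
var sampleDetails = [
  { id: 101, pages: [{ count: 2, media: { name: "Plain" } }] },
  { id: 102, pages: [{ count: 3, media: { name: "Plain" } }] }
];
var rows = buildRecordRows(sampleDetails);
var csv = rowsToCsv(rows);
```

The resulting text could then be stored with Watch.SetVariable in a local variable of your choosing (which must already exist in the process) and written out with Create File.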

Can you think of a more efficient way to approach this?

Not offhand, no. Whatever works!