XML: Using XPATH in the Repeat step

XML is, shall we say, a very versatile file format. Personally, I think it eclipses all other file formats in terms of simplicity of use and the variety of structures it can support. From simple addresses to multi-level jobs containing thousands of batches, clients, and invoices, there’s always a good use for XML.

But enough boasting about the format: today we’re going to talk about a very specific sub-feature of XML, namely XPATH. More precisely, we’ll be exploring the use of XPATH to handle flat-level XML files that we want to add a bit of structure to.

How is this useful? Think of HTTP POST data coming in from the HTTP Server Input task in Workflow. Because of the way HTTP POST works, there is no structure in the incoming data: there is only a series of fields, all at the same level, in the order they appear in the form that sends the data.

For the HTTP Server Input in Workflow, this means you at least get an XML data file with these fields identified as values, which we can process, somewhat, in the DataMapper.

Step 1: Identifying your base node

So let’s say you have a web form that includes a table with a variable number of lines in it (but always the same fields for each line, of course). When you receive that data in the HTTP Server Input task, it will look something like this:

<?xml version="1.0" ?>
<request type="POST">
<paths count="0"/>
<values count="15">
<active_0>on</active_0>
<active_1>on</active_1>
<active_2>on</active_2>
<active_3>on</active_3>
<active_4>on</active_4>
<text_0>Item text 1</text_0>
<text_1>The quick brown fox</text_1>
<text_2>jumped over the lazy dog</text_2>
<text_3>manha manha, tu tuu tu du du</text_3>
<text_4>We are the champions my friend!</text_4>
[...]
</values>
[...]
</request>

Here, there are 2 different fields we need to put in the table: active and text. We need to choose which one will serve as the base node, that is, the field that tells us how many lines the table has.

Checkboxes don’t always appear
Checkboxes, by design, do not appear at all in the submitted data if the box has not been checked. This means that if someone doesn’t check item “3”, you won’t get <active_3>off</active_3>; you simply won’t get that field at all.

So in the example above, we can safely say that it’s the text field we’ll need to use.
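
To see why the checkbox makes a poor base node, here is a minimal sketch in plain JavaScript (meant to be run outside the DataMapper, for example in Node.js). The list of field names and the countByPrefix helper are hypothetical, made up for illustration; they assume a submission where the box on line 3 was left unchecked.

```javascript
// Hypothetical field names from a submission where the checkbox
// on line 3 was left unchecked, so active_3 is simply absent.
const submittedNames = [
  'active_0', 'active_1', 'active_2', 'active_4',
  'text_0', 'text_1', 'text_2', 'text_3', 'text_4',
];

// Count the fields whose name begins with the given prefix.
function countByPrefix(names, prefix) {
  return names.filter((name) => name.startsWith(prefix)).length;
}

const activeCount = countByPrefix(submittedNames, 'active_');
const textCount = countByPrefix(submittedNames, 'text_');

console.log(`active_ fields: ${activeCount}, text_ fields: ${textCount}`);
// prints: active_ fields: 4, text_ fields: 5
```

Looping on the active fields would silently skip a line of the table, while the text fields are always all there.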

Step 2: Creating the initial loop

So now we need to create a loop on our base field and extract it to a new detail table. This is where we will be using XPATH!

  • Create a new data mapping configuration based on an XML data source.
  • Open an XML file that contains the above type of structure.
  • Extract all the regular fields that are not part of a detail table (you know the drill, select them all and drag them to the Data Model pane!).
  • Click on any of the fields in the <values> node, and click to add a new Repeat step.
  • Leave the Repeat Type to For Each. But we’ll need to change one thing: the Collection field determines what the loop will go through. Normally, this is every iteration of the current node, which is fine when you have a node that repeats and has children. In our case however, we’ll use XPATH magic and change this collection to: ./values/*[starts-with(name(), 'text_')] .
  • Then we’ll add an extract on the first text_0 field, which will extract all 5 iterations of the text in 5 lines of the detail table. Yay for the first loop!

Now, what does the XPATH value mean? It reads as follows: loop through each field in the values node (./values/*) whose name (name()) begins with (starts-with('string', 'otherstring')) the value text_. This loops through all 5 fields from text_0 to text_4, and the extraction inside the loop then picks up each value in turn.
Note: this example uses the starts-with() function. For an overview of XPath functions, see Mozilla: XPath Functions.
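
If the predicate still feels abstract, the following sketch mimics it in plain JavaScript (runnable outside the DataMapper, for example in Node.js). It is only an illustration of what the expression selects, not how the DataMapper evaluates XPATH internally. The values array and the selectByPrefix helper are hypothetical stand-ins for the children of the <values> node.

```javascript
// Hypothetical stand-in for the children of <values>:
// [name, text] pairs in the order they appear in the sample request.
const values = [
  ['active_0', 'on'],
  ['active_1', 'on'],
  ['active_2', 'on'],
  ['active_3', 'on'],
  ['active_4', 'on'],
  ['text_0', 'Item text 1'],
  ['text_1', 'The quick brown fox'],
  ['text_2', 'jumped over the lazy dog'],
  ['text_3', 'manha manha, tu tuu tu du du'],
  ['text_4', 'We are the champions my friend!'],
];

// Mimics ./values/*[starts-with(name(), 'text_')]:
// keep only the fields whose name begins with the given prefix.
function selectByPrefix(fields, prefix) {
  return fields.filter(([name]) => name.startsWith(prefix));
}

const textNodes = selectByPrefix(values, 'text_');

console.log(textNodes.map(([name]) => name).join(', '));
// prints: text_0, text_1, text_2, text_3, text_4
```

Each entry in textNodes corresponds to one iteration of the Repeat step, and so to one line in the detail table.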

Good! On to the next step!

Step 3: Extracting additional fields

In our example above, we also need to extract the other field: active. However, we can’t simply add an extract field to our existing extraction, because the loop specifically goes through the text nodes only. We need to do a tiny bit of gymnastics to get this to work: we’ll use a Record Property and a little bit of Javascript in our configuration:

  • In the Steps pane, click on the Preprocessor step.
  • Under the Properties section, click Add, and use the following values:
    • Under Name enter fld_index
    • Under Scope choose Each record
    • Under Type choose String
    • Finally under Default Value just leave '';
  • We’ll use this new field to save the current field index which will be the value 0 to 4.
  • Between the Repeat and Extraction steps in your Steps pane, add a new Action Step.
  • Under the Run Javascript task, enter the following code: sourceRecord.properties.fld_index=data.extract('name()').split('_')[1];
  • In the Extraction step in the repeat loop, add a new extract field (the Add javascript field button)
  • In the Expression box, add the following code: data.extract('../active_'+sourceRecord.properties.fld_index);
  • Change the name of the field to something like Active.

So what are we doing, here?
Essentially, the Source Record property is set by the Action Step in each iteration of the loop, where its value becomes 0, 1, 2, 3 and 4, for each respective iteration. This value is stored in sourceRecord.properties.fld_index. Then, the extraction uses that index to build the name of the matching sibling and attempts to extract its value, from active_0 to active_4.
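
The string handling behind these two scripts can be tried out in plain JavaScript, outside the DataMapper. The sketch below is only an approximation: the fields object, fieldIndex and siblingValue are hypothetical names, and siblingValue merely stands in for data.extract('../active_' + index). Returning an empty string for a missing sibling is an assumption of this sketch, not a statement about what the DataMapper does.

```javascript
// Hypothetical flat record standing in for the <values> node.
// active_3 is deliberately missing, as for an unchecked checkbox.
const fields = {
  active_0: 'on',
  active_1: 'on',
  active_2: 'on',
  active_4: 'on',
  text_0: 'Item text 1',
  text_1: 'The quick brown fox',
  text_2: 'jumped over the lazy dog',
  text_3: 'manha manha, tu tuu tu du du',
  text_4: 'We are the champions my friend!',
};

// Same string logic as the Action step: 'text_3' -> '3'.
function fieldIndex(name) {
  return name.split('_')[1];
}

// Stand-in for data.extract('../active_' + index).
// ASSUMPTION: a missing sibling yields '' in this sketch.
function siblingValue(record, prefix, index) {
  const key = prefix + index;
  return Object.prototype.hasOwnProperty.call(record, key) ? record[key] : '';
}

// One row per text_ field, with its matching active_ sibling.
const rows = Object.keys(fields)
  .filter((name) => name.startsWith('text_'))
  .map((name) => {
    const index = fieldIndex(name);
    return {
      index,
      text: fields[name],
      active: siblingValue(fields, 'active_', index),
    };
  });

console.log(rows);
```

Note that split('_')[1] only returns the index when the field name contains a single underscore. A name such as gps_data_0 would return data instead, so keep the prefixes free of extra underscores or adjust the script accordingly.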

Expanding to more data

Now that you know how to extract XML data using XPATH, let’s explore what’s necessary to expand this trick:

  • If you want to extract more fields in the same table, simply add more extract fields based on Javascript extractions, changing only the prefix of the field name. For example, to extract gpsdata_ fields 0 to 4, just use data.extract('../gpsdata_'+sourceRecord.properties.fld_index); and you’re done! You can repeat this for as many fields as you have.
  • If you want to create more than one table, you can still do this but you will need to add new Source Record properties – one for each table. You’ll also need a different Repeat loop, with its own Action step that sets the value of the new index, and Extract Step that extracts the current base node and its siblings using the JavaScript extract.

But what if my data is more complex?
You might be wondering what happens if you have multiple levels of data (tables within tables) or a more variable structure (such as a variable number of detail tables). Those can be supported in the DataMapper module, but will require additional processing before getting the extraction. This can be done with XSLT transforms or by using a JSON data structure in your submitted data and converting it to XML.

Complex POST data is difficult to deal with and thus goes beyond the scope of training & support. This means if you can’t handle it yourself, you’ll need to trust our Professional Services department to handle it for you. It’s worth it to avoid a permanent headache.
