Creating sample data

OL Learn Blog Performance

This article explains how to create dummy data using, a free online data generator. The resulting data can be processed via the OL Connect DataMapper or in the case of JSON data, imported in the JSON Sample Data editor of OL Connect Designer. With these skills in hand, you can create realistic-looking data and start working on your data mapping configuration and templates.

When creating an OL Connect template, one typically starts with sample/test data. This data simulates the data used in a live/production environment and is designed to cover personalization variations, capture exceptions, and test output performance. Crafting such data can be time-consuming and often involves modifying a data export from a Line Of Business application or even worse, recreating the model by hand.

Note! Dummy data is not only a useful aid when designing or testing a template; when anonymized, it can be provided along with a Tech Support request, or even shared on a forum post.

Mocking data allows you to quickly construct randomly generated data. It has an intuitive user interface and comes with a rich set of built-in field types, including an option to repeat rows for detail lines (array). The free version lets you generate 1 000 rows of data, which can then be exported to popular formats like XML, CSV, and JSON.

In order to save your Schema, you’ll need to create an account first. This unlocks features like managing Datasets, data APIs, and Projects.

The Mockaroo user interface

Visiting the Mockaroo website automatically creates a new schema. This schema contains several pre-defined fields as shown in the image below. default schema

Fields have a Name, Type, and Options that can be used to adjust the data. The Type defines what kind of data is going to populate that field. Click the Type field to view all available data types; some examples are First Name, Last Name, Full Name, Date, Boolean (true/false), Lists, Repeating Elements, etc. The Options available for a field are defined by the selected Type and include a way to do custom calculations in the Ruby programming language.

You can use the gripper on the left-hand side to change the order and the x-icon on the right-hand side to delete the field.

The bar below the fields table lets you Preview and Download the data. Specify the number of records for your sample and select the desired output format (for example CSV, XML, Excel, or JSON). download bar

The following image shows the schema for the sample template used in the Conditional print sections blog post. It contains fields of various types and some useful formulas. Fields prefixed with two underscore characters are hidden from the output.

The following shows the sample output in respectively CSV, XML, and JSON formats.

Lucho,Louiset,Sauthoff 727,3502 HD,Utrecht,3502HD727,true,21
Nita,Medlin,Birchwood 416,7409 XG,Deventer,7409XG416,false,19
<?xml version='1.0' encoding='UTF-8'?>
    <address>Upham 378</address>
    <zip>8204 ZN</zip>
  "first_name": "Royal",
  "last_name": "Waind",
  "address": "Fairfield 23",
  "zip": "5924 CV",
  "city": "Venlo",
  "kix": "5924CV23",
  "admitted": false,
  "age": 22


Let’s have a closer look at some of the formulas used in the admission letter example of the Conditional print sections blog post.

Address format

Mockaroo’s Street Address field type generates addresses prefixed with the house number/street number. In many European countries, it is common to put the house number after the street name. In our sample data, we use the house number field for the address but also need it for the postal barcode data. For this, we used the following setup.

A field for the house number was added using the Number field type. As the field is not required in the final output, it is prefixed with a double underscore. This Number field type returns a random number which is also handy to generate unit prices for invoice data (in that case, you’d set the decimals option to 2).

Next, a field was added of type Street Name. This field uses a formula to concatenate the street name and the value of the __house_number field. We cannot simply concatenate these two fields as they both need to be strings. The .to_s method is used to convert the house number data to a string. Skipping this would cause a Ruby engine error as string concatenation is only possible with string data. The formula looks like this:

this + " " + __house_number.to_s

Alternatively use Ruby’s string concat method but again the house number needs to be converted to a string:

concat( this, " " , __house_number.to_s )

In our case, the house number is stored in a separate field because its value is used in the formula for the postal barcode field, so keeping it separate allows for reusing the value. When you don’t need a separate field the house number could be generated in the formula of the Street Name field using the random(min,max) function.

this + " " + random(1,999).to_s

Zip/postal code

The country field was set to type Country with its value set to ‘Netherlands’. This forces fields of type Postal Code and City to use Dutch city names and the Dutch postal code-like notation (a sequence of four digits followed by two uppercase letters, for example, 1234 AB). Unfortunately, the resulting postal codes lack the two uppercase Latin characters. Let’s have a look at how this can be fixed using a formula.

The following formula solves the problem by appending two hard-coded uppercase characters to the value returned in the Postal Code field

this + " AB"

But the following code randomizes the two characters, making the postal codes a bit more realistic.

this + " " + ( ('A'..'Z').to_a + ('A'..'Z').to_a).sample(2).join 

All roads lead to Rome! An alternative approach would be using a field of type Character Sequence with the following pattern: #### ^^. This field type returns a simple sequence of characters, numbers, and symbols using a pattern where # returns a random digit and ^ a random uppercase letter. The downside of this approach is that the postal code no longer belongs to the sample city. But as you are generating sample data, this is probably not relevant.

Postal barcode string

The data model of our example also contains a postal barcode string. In the Netherlands, this is typically the KIX barcode, which is based on the Royal Mail 4 State Code (RM4SC). In its most basic form, it is simply a concatenation of the postal code and the house number. The following formula combines both variables; it also removes the space from the postal code and casts the house number to a string (just like we did in the formula for the address field):

zip.delete(' ') + __house_number.to_s

Detail tables

Back to generating data. Detail tables play an important role in many OL Connect templates. Typical examples are line items for an invoice or policy information for insurance policy documents.

The simplest way to create a detail table is by adding the Repeating Element field. It generates a random number of lines by defining a minimum and maximum for the repeats.

Subsequently, add one or more fields where each field name is prefixed with the name of the Repeating Element. The following shows a basic detail table set up for an invoice.

A simple expression was added to calculate the total price for each product. The formula multiplies the values in details.quantity and details.unitprice (quantity * unitprice). The following shows the result in JSON format.

  "details": [
      "quantity": 17,
      "description": "Artichoke - Hearts, Canned",
      "unitprice": 16.05,
      "total": 272.85
      "quantity": 19,
      "description": "Pepper - Yellow Bell",
      "unitprice": 12.31,
      "total": 233.89

Custom product names

To go the extra mile with detail tables, you could upload a dataset with real product names, product codes, and additional attributes like prices and product information. This information can be dynamically referenced as part of the repeat process.

Adding such data is done by uploading a CSV file in the Datasets section of the Mockaroo website. The following dataset was created by uploading a data file containing laboratory glassware products.

In order to select the product description from the dataset, you will need to change the field type of that field to Dataset Column. Subsequently, select the data set and the desired field. Repeat these steps for the unit price.

The data from the dataset is returned as a string. In order to use the unit price in our formula it needs to be cast to a floating-point number using .to_f.

quantity * unitprice.to_f

To create nested detail rows, add a new field and prefix it with the name of the previously defined repeating element. Set the field type of the new element to Repeating Element. Subsequently, add fields to the nested level using the dot notation as shown in the image below.

Nesting non-repeating data

To create non-repeating nested data write field names using the dot notation.

<?xml version='1.0' encoding='UTF-8'?>
      <name>Becky Margerison</name>


Mockaroo has proven to be of great help for many OL Connect projects. The various field types, their options, custom formulas, and features like datasets let you create representative data. Don’t let the Ruby programming language for formulas scare you off, there are many examples and code snippets available online (Google is your friend!).

As mentioned previously, the free Mockaroo account is limited to 1000 records. With the help of a text editor like NotePad++ or Visual Studio Code, it is relatively easy to create larger datasets in case you want to test your project with bulk data.

Data generated by Mockaroo can be used to develop a Data Model in OL Connect’s DataMapper tool or utilized in the JSON Sample Data editor of the Designer directly. The JSON Sample Data editor is an excellent tool to design the ideal Data Model for your template and mock data helps you test the different template variations. Once you have a Data Model, use the DataMapper to transpose the production data into the desired structure.

Tagged in: anonymize, data, datamapper

Leave a Reply

Your email address will not be published. Required fields are marked *