
Workflow and Connect Performance Guide

I really need a guide or utility for tuning Workflow and Connect for best possible performance.

How many of each type of engine should I run? What memory settings should I use for the various engines? How does the “self-replicating” setting in Workflow really work, which type of processes should get this setting, and what should the “threshold” value be?

If the Weaver Engine setting is too low, I get “glyph” errors in Output Creation; too high, and I get “IPC connection” errors. Both appear at random.

Is there a utility I can run on my server that will profile my server and make suggested settings?

I suggest you have a look at this link:
https://learn.objectiflune.com/howto/connect-performance-optimization-megathread/
You could also find a description of self replicating processes in the help file: https://help.objectiflune.com/EN/pres-workflow-user-guide/2020.2/Default.html#Workflow/Interface/Process_Properties.html#toc-0
Self-replication duplicates the process so that it can handle several files at once.

Those mainly describe Connect Design optimization. I want to understand the Connect Server and Engine performance settings at more than an intuitive or gut level. There have to be settings that work best depending on CPU count, memory amount, and so on.

I also understand what multithreading is in Workflow, but how the “threshold” works is a mystery, and how the thread instances relate to the Connect Engine settings is also unclear.

There have to be settings that work best depending on CPU count, memory amount, and so on
Unfortunately, there aren’t, because one of the greatest impacts on performance comes from the type of jobs you are running. Lots of small jobs require different settings than a smaller number of large jobs, and of course a mix of large and small jobs requires further fine-tuning.

Starting with Connect 2019.2, the Connect Server Configuration application makes it easier to properly configure those settings by selecting an appropriate preset. However, if you still want to customize the settings, here are a few rules of thumb:

Engine memory

  • DataMapper: engine memory should only be increased when dealing with large individual records. The size of the entire job is irrelevant. Each job is processed by a single engine, but we have found that having more than 2 DM engines running concurrently can actually have an adverse effect on overall performance, as it puts an undue load on the database.
  • Content creation: engine memory should be increased when dealing with large individual documents. The total number of documents is usually not relevant. Each individual document is always handled by a single engine, but a batch of documents can be handled by multiple engines running concurrently. There is no limit to the number of Content Creation engines you can run simultaneously, other than the physical limitations of the underlying hardware.
  • Output creation: these engines are used for paginated output as well as to provide conversion services to the rest of the system. The amount of memory to reserve for each engine depends on the overall size of the job and the complexity of the operations requested (for instance, additional content that relies on pre-processing the entire job will require more memory). In most environments, having 2 or 3 Output creation engines is more than sufficient to handle a steady load of incoming jobs; the sketch after this list sums up these rules.
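
If it helps to see those rules side by side, here is a tiny Python sketch. Only the size-versus-engine relationships and the engine-count figures come from the list above; the names and structure are mine, purely for illustration (this is not a Connect API).

```python
# Which job characteristic should drive the memory setting of each engine type,
# per the rules of thumb above. Illustrative summary only, not a Connect API.

def memory_driver_mb(engine: str, largest_record_mb: float,
                     largest_document_mb: float, job_size_mb: float) -> float:
    """Return the size (in MB) that matters for this engine's memory setting."""
    return {
        "DataMapper":      largest_record_mb,    # per-record size; whole-job size is irrelevant
        "ContentCreation": largest_document_mb,  # per-document size; document count is irrelevant
        "OutputCreation":  job_size_mb,          # whole-job size and full-job operations matter
    }[engine]

MAX_DATAMAPPER_ENGINES = 2   # more concurrent DM engines tends to overload the database
TYPICAL_OUTPUT_ENGINES = 3   # 2 or 3 OC engines usually cover a steady stream of jobs
```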

Hardware

  • You should always consider that each engine uses one CPU thread (that is not strictly true, but we’re generalizing here). So if you have an 8-core CPU (with 16 threads), you could for instance have 2 DM + 6 CC + 2 OC engines, which would use up 10 of the 16 threads. You want to keep a certain number of cores available for the rest of the system (Workflow, MySQL, Windows, etc.).
  • Make sure you have enough RAM for each engine to run. I usually recommend 2GB of RAM per CPU thread (in our example above, we would therefore have 32GB of RAM). That doesn’t mean to say you should assign 2GB of RAM per engine: in most instances, all engines work efficiently with less than 1GB each. But the rest of the system requires memory as well, and if your system is under a heavy load, you want to make sure you don’t run out of RAM, because then the system will start swapping memory to disk, which is extremely costly performance-wise. The sketch below puts these two rules of thumb into numbers.
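
To make the arithmetic concrete, here is a minimal sketch of the same back-of-the-envelope check. The one-thread-per-engine and 2GB-per-thread figures are the rules of thumb above; the 6-thread reserve for the rest of the system and the function itself are my own illustration, not a product tool.

```python
# Back-of-the-envelope check of an engine split against CPU threads and RAM,
# following the rules of thumb above. Just the arithmetic, nothing product-specific.

def plan_engines(cpu_threads: int, dm: int, cc: int, oc: int,
                 reserved_threads: int = 6) -> dict:
    """Check whether an engine split leaves headroom, and suggest total RAM."""
    engine_threads = dm + cc + oc                   # ~1 CPU thread per engine
    return {
        "engine_threads": engine_threads,
        "threads_left_for_system": cpu_threads - engine_threads,
        "fits": engine_threads <= cpu_threads - reserved_threads,
        "suggested_total_ram_gb": cpu_threads * 2,  # ~2 GB of RAM per CPU thread
    }

# The example above: an 8-core/16-thread CPU with 2 DM + 6 CC + 2 OC engines.
print(plan_engines(cpu_threads=16, dm=2, cc=6, oc=2))
# -> 10 engine threads, 6 left for Workflow/MySQL/Windows, ~32 GB of RAM suggested
```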

Self replicating processes
The most likely process candidates for self-replication are high-availability, low processing flows. For instance, web-based processes usually require the system to respond very quickly, so they are prime candidates for self-replication. Another example is processes that intercept print queues (or files) and simply route them to different outputs with little or no additional processing.

Processes that handle very large jobs are least likely to benefit from self-replication because each process might hog a number of resources from the system (disk, RAM, DB access), and having several processes fighting for those system resources concurrently can have an adverse effect on performance.

By default, Workflow allows a maximum of 50 jobs to run concurrently, and each self-replicating process can then use a certain percentage of that maximum. Obviously, if you have 10 processes that are set to use up to 20% of that maximum, then your processes will be fighting amongst themselves for the next available replicated thread. Note that with modern hardware (let’s stick with our 16vCore/32GB RAM system), you can easily double and even triple that max. On my personal system, I have set my max to 200 concurrent jobs, and that works fine. But that’s because my usage is atypical: I run a lot of performance tests, but I don’t need to prioritize one process over another.
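
To illustrate how the percentage bookkeeping plays out, here is a small sketch assuming the default maximum of 50 quoted above. This is only my illustration of the arithmetic, not how Workflow’s scheduler actually works.

```python
# Each self-replicating process may claim up to (threshold % of the Workflow-wide
# maximum) concurrent instances. Illustrative arithmetic only.

WORKFLOW_MAX_CONCURRENT = 50  # Workflow's default maximum; can be raised on stronger hardware

def replication_demand(thresholds_pct: list) -> None:
    """Print each process's instance cap and flag oversubscription of the maximum."""
    total = 0
    for i, pct in enumerate(thresholds_pct, start=1):
        cap = int(WORKFLOW_MAX_CONCURRENT * pct / 100)
        total += cap
        print(f"process {i}: up to {cap} concurrent instances ({pct}% of {WORKFLOW_MAX_CONCURRENT})")
    if total > WORKFLOW_MAX_CONCURRENT:
        print(f"combined demand is {total} > {WORKFLOW_MAX_CONCURRENT}: "
              "processes will compete for the next free slot")

# The scenario above: 10 processes, each allowed 20% of the maximum.
replication_demand([20] * 10)
```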

Final note on self-replication: if you are heavily using the PlanetPress Alambic module to generate/manipulate PDFs directly from Workflow, then you should be careful when self-replicating those processes because there is a limit to the number of Alambic engines that can run concurrently. Having 10 cloned processes fighting for 4 instances of Alambic will create a bottleneck.
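
The bottleneck is easy to see with a one-liner: effective parallelism is capped by whichever resource is scarcer. (The figures of 10 cloned processes and 4 Alambic instances are simply the ones used above; the function is illustrative.)

```python
# Cloned processes beyond the number of available Alambic instances simply queue.
def effective_pdf_parallelism(cloned_processes: int, alambic_instances: int) -> int:
    return min(cloned_processes, alambic_instances)

print(effective_pdf_parallelism(10, 4))  # -> 4; the other 6 clones wait for an engine
```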

Conclusion
Remember that all of the above are general rules of thumb. Many other users (and in fact, many of my own colleagues!) would probably tell you that from their own experience, the values I quoted are not the most efficient. But hey, gotta start somewhere!

Hope that helps a bit.

Adding to Phil’s very thorough explanation: Workflow is still a 32-bit app (we’re working on that - can’t say more), which means that it can address at most 1.8 GB of memory in total. So the number of replicated processes should be balanced not only against processors and the like, but also against the size of the data they are going to be processing concurrently.

Going back to those 200 replicated processes Phil mentioned… If they are small jobs, or Connect jobs for which you are using only the IDs in the metadata, it should be fine. But if you download an entire DataSet into the metadata for a 100 MB XML file, things might get problematic very quickly.

This can sometimes be hard to gauge, for the same reasons that Phil stated at the beginning of his comment. Fortunately, Workflow is usually pretty good at managing its memory efficiently. The vast majority of its plugins will stream the data rather than loading it entirely into memory. For the metadata, the size of the file is 1:1 with its size in memory. The biggest memory hog is XML, which, for reasons out of the scope of this discussion (read up on “DOM memory size” if you feel adventurous), can take 10x its file size in memory when being read.
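
To put rough numbers on that, here is a back-of-the-envelope sketch. The ~1.8 GB ceiling, the 1:1 metadata ratio and the ~10x XML blow-up are the figures quoted in this thread; treating them as flat multipliers, and the example sizes, are my simplification.

```python
# Rough estimate of concurrent data footprint versus Workflow's 32-bit ceiling.
WORKFLOW_ADDRESSABLE_MB = 1800  # ~1.8 GB addressable by the 32-bit Workflow process

def concurrent_footprint_mb(instances: int, metadata_mb: float = 0.0, xml_mb: float = 0.0) -> float:
    """Estimate memory used if `instances` replicas each hold this much metadata/XML at once."""
    per_instance = metadata_mb * 1.0 + xml_mb * 10.0  # metadata ~1:1, parsed XML ~10x file size
    return instances * per_instance

# 200 concurrent jobs carrying only Connect IDs in small metadata files: negligible.
print(concurrent_footprint_mb(200, metadata_mb=0.1))   # ~20 MB
# A handful of replicas each reading a 100 MB XML file: well past the ceiling.
big = concurrent_footprint_mb(5, xml_mb=100)
print(big, big > WORKFLOW_ADDRESSABLE_MB)              # 5000.0 True
```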