Upland OL User community

HTTP server translates UTF-8 accented chars to Unicode?

Running version 2022.2.1.5127. Using Postman to POST an XML file.
The UTF-8 character é (hex C3 A9) is somehow translated to the single byte E9 (the value of its Unicode code point, U+00E9), causing an error in the XSLT object.
Did the same on another server running 2020.1.0.64373: works like a charm. (Same Postman version, same project, same datafile).

I noticed that only StylusStudio gave me an error on the XML file that I captured AFTER the HTTP server input: A character not allowed by the current encoding (0xE9) has been found.

Checked all settings (not that there are many), they are identical to the settings on the 2020 server.

Did I run into a bug? Am I missing a setting?

Have you checked the Form Data Encoding setting in Preferences>Plug-in>Inputs-HTTP server 2?
I believe we added that parameter in version 2019.1.

yes … set to UTF-8 … no difference :frowning:
upgraded to 2022.2.3.5503 … same issue.

So basically it converts the two-byte UTF-8 sequence to a single byte (C3 A9 to E9), causing an XSLT engine error.
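For anyone who wants to see the byte-level difference, here is a minimal sketch in plain Python (run outside Workflow; the helper name `is_valid_utf8` is made up for illustration). It shows that é is two bytes in UTF-8 and one byte in windows-1252, and that the lone byte E9 is not valid UTF-8, which is presumably what a parser that expects UTF-8 trips over.

```python
# The same character, encoded two ways.
utf8_bytes = "é".encode("utf-8")      # two bytes: C3 A9
cp1252_bytes = "é".encode("cp1252")   # one byte:  E9


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A document that claims UTF-8 but contains the single byte E9.
broken = b"<name>caf\xe9</name>"
intact = b"<name>caf\xc3\xa9</name>"

print(is_valid_utf8(intact))
print(is_valid_utf8(broken))
```

The first check passes and the second fails, because E9 announces a multi-byte sequence that never follows.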

Set it back to “System language”. This will cause the HTTP server to save the file with the default Windows encoding. Could it be that your other system (on which the same process works) has a different default encoding?

You can compare the request files that are being received by both servers. The encoding is specified on the very first line of the request file that the HTTP server generates.
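A quick way to do that comparison outside Workflow is sketched below in plain Python. It reads the encoding from the XML declaration on the first line and tests whether the bytes actually decode with it. The function names are hypothetical, and the check is only dependable when the declared encoding is UTF-8: windows-1252 accepts almost any byte, so UTF-8 data mislabelled as windows-1252 will usually decode without error (just with the wrong characters).

```python
import re

# Matches the encoding attribute of an XML declaration at the start of the data.
DECL_RE = re.compile(rb'^<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']')


def declared_encoding(data: bytes) -> str:
    """Return the declared encoding, or 'utf-8' when none is declared."""
    match = DECL_RE.match(data)
    return match.group(1).decode("ascii") if match else "utf-8"


def matches_declaration(data: bytes) -> bool:
    """Return True if the bytes decode with the encoding they declare."""
    try:
        data.decode(declared_encoding(data))
        return True
    except (UnicodeDecodeError, LookupError):
        # LookupError covers encoding names Python does not know.
        return False


sample = b'<?xml version="1.0" encoding="utf-8"?><a>caf\xe9</a>'
print(declared_encoding(sample), matches_declaration(sample))
```

Running both captured request files through `matches_declaration` should show which one carries a declaration that no longer fits its bytes.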

done. both systems have

<?xml version="1.0" encoding="windows-1252"?>

and both have the same settings.

now we are getting somewhere: the datafile is UTF-8 encoded. So I did a search/replace from utf-8 to windows-1252 on the datafile … and now the error is gone.
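For reference, the same search/replace can be scripted. The sketch below is plain Python, not Workflow scripting, and the function name is made up. It assumes the data has already been converted to windows-1252 bytes while its XML declaration still says utf-8, and it only touches the first declaration rather than every occurrence of "utf-8" in the file.

```python
import re

# Captures everything around the encoding value so only the value is swapped.
_DECL = re.compile(rb'(<\?xml[^>]*encoding=["\'])utf-8(["\'])', re.IGNORECASE)


def redeclare_as_cp1252(data: bytes) -> bytes:
    """Change the declared encoding from utf-8 to windows-1252 (first match only)."""
    return _DECL.sub(rb"\g<1>windows-1252\g<2>", data, count=1)


converted = b'<?xml version="1.0" encoding="UTF-8"?><a>caf\xe9</a>'
fixed = redeclare_as_cp1252(converted)
print(fixed.decode("cp1252"))
```

This only makes the label match the bytes; any character that windows-1252 cannot represent is not brought back by it.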

So Workflow writes not only the SOAP envelope in windows-1252 but also the data, without changing the encoding declaration in the data's header … it should not do this, as UTF-8 can represent many more characters than windows-1252 :wink:

When I set UTF-8, there is no encoding info in the SOAP envelope AND the data is still converted.

Workflow is not fully Unicode-compliant.

We are working on alternative solutions for that, but at least, now you know how to address the issue.

yes … however, sometimes you need the UTF-8 encoding, and then you will run into problems when using characters that are not present in, in this case, code page 1252. Strange that the 2020 version did not have this issue. Was the datafile written “as is” to disk, i.e. as binary?
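To illustrate which characters are at risk, here is a small plain-Python sketch (the function name `lost_in_cp1252` is hypothetical). It lists the characters in a string that have no representation in code page 1252 and therefore cannot survive a conversion to it.

```python
def lost_in_cp1252(text: str) -> list:
    """Return the distinct characters that windows-1252 cannot encode, sorted by code point."""
    lost = set()
    for ch in text:
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            lost.add(ch)
    return sorted(lost)


print(lost_in_cp1252("café €"))    # é and € both exist in windows-1252
print(lost_in_cp1252("Erdős Ω"))   # ő and Ω do not
```

Western European accents such as é are safe, but Central European, Greek, Cyrillic or Asian characters in the data would be affected.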

Actually the old versions had a problem: they wrote using the local code page, but didn’t specify that code page in the encoding declaration, and a missing declaration means UTF-8 by default. In more recent versions, we have made a few changes here and there to try and eliminate some of those odd behaviours, but in doing so, we apparently broke something that was working well for you.
Sorry about that.

Note that you should now use the NodeJS Input server instead of the HTTP server. It is much more robust and efficient, although it still cannot fix the whole Unicode compliance issue all by itself.