Upland OL User community

SOLVED: XML exported with illegal characters in vbscript

In my workflow I have created a script that merges several smaller XML files into 1 big XML file.
The merging works fine, but afterwards I noticed that the output contains illegal characters such as ' (which should be converted into &apos;). Note that the source XML does not contain these characters in unescaped form.

I found some additional code on the internet that saves the file in a different way, but unfortunately it has no effect on the output. Here is a small snippet that just loads and saves an XML file:

Dim SourceXML
Set SourceXML = CreateObject("MSXML2.DomDocument.6.0")
SourceXML.async = False 'load synchronously so Load returns the final result

If SourceXML.Load(full_path_to_source_file) Then
    FormatDocToFile SourceXML, full_path_to_output_file
End If

' Serialize a DOM document to a file through a SAX reader/writer pair
Sub FormatDocToFile(xmlDom, sFileName)
    Dim strm, writer, reader
    Set strm = CreateObject("ADODB.Stream")
    With strm
        .Type = 1 'adTypeBinary
        .Open     'the stream must be open before the writer can use it
        Set writer = CreateObject("Msxml2.MXXMLWriter.6.0")
        With writer
            .omitXMLDeclaration = False
            .standalone = True
            .byteOrderMark = False
            .encoding = "ISO-8859-1"
            .indent = True
            .output = strm
            .disableOutputEscaping = False
            Set reader = CreateObject("Msxml2.SAXXMLReader.6.0")
            With reader
                'Route all SAX events from the reader to the writer
                Set .contentHandler = writer
                Set .dtdHandler = writer
                Set .errorHandler = writer
                .putProperty "http://xml.org/sax/properties/lexical-handler", writer
                .putProperty "http://xml.org/sax/properties/declaration-handler", writer
                .parse xmlDom
            End With
        End With

        .SaveToFile sFileName, 2 'adSaveCreateOverWrite
        .Close
    End With
End Sub

The source XML for example contains this tag:

<equiptype>20&apos; Tank Container</equiptype>

The output file will contain:

<equiptype>20' Tank Container</equiptype>

Is there a way to make sure that illegal characters are properly converted in my output file?

Couldn’t you simply use the Save() method from the DOMDocument object?


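// Create object and load job file
// (full_path_to_source_file is the placeholder used in the question)
var myXML = new ActiveXObject("MSXML2.DomDocument.6.0");
myXML.async = false;
myXML.load(full_path_to_source_file);

// Add some random element to the existing XML DOM
var NODE_ELEMENT = 1; // MSXML node type value for an element
var newNode = myXML.createNode(NODE_ELEMENT, "SomeDummyElement", "");
myXML.documentElement.appendChild(newNode);

// Save the file
myXML.save(full_path_to_output_file);
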
That method preserves the escaped version of characters that have to be escaped.
(That example is in JS, by the way… I really try not to do VB anymore…)

Unfortunately that does not work. I used your example and removed the code in the middle, because I only want a load and a save in order to find out whether the issue is in the load, the save, or both.
I assume the load function builds an XML tree in which any character entities are translated into regular characters, and that the save is responsible for doing the reverse.
Unfortunately the reverse does not happen: a ' (single quote) is not translated back into &apos; when the XML is saved, which in my understanding produces invalid XML.

Well, as soon as you use the DOM to manipulate the XML, some transformation occurs. On load, every escape sequence is resolved to its character; on save, only the characters that XML requires to be escaped are written as escape sequences again. Escapes that are not mandatory are therefore written as their plain-text equivalent (&apos; is not a mandatory escape sequence in element content).
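To make the rule concrete, here is a minimal sketch in plain JavaScript of which characters need escaping where. It does not use MSXML; the function names and the `item` element are made up for illustration, and it only models the escaping rules, not how any particular serializer is implemented.

```javascript
// Escape text for XML element content. Only & and < must be escaped there;
// > is escaped as well, which is always permitted. Quotes are left alone.
function escapeElementText(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Escape text for an attribute value. In addition to the characters above,
// only the quote character that delimits the value has to be escaped.
function escapeAttributeValue(text, quoteChar) {
  var escaped = escapeElementText(text);
  if (quoteChar === "'") {
    return escaped.replace(/'/g, "&apos;");
  }
  return escaped.replace(/"/g, "&quot;");
}

var value = "20' Tank Container";

// Element content: the single quote stays as-is and the XML is still valid.
var asElement = "<equiptype>" + escapeElementText(value) + "</equiptype>";

// Attribute in double quotes: the single quote still needs no escaping.
var asDoubleQuoted = '<item equiptype="' + escapeAttributeValue(value, '"') + '"/>';

// Attribute in single quotes: here the single quote must become &apos;.
var asSingleQuoted = "<item equiptype='" + escapeAttributeValue(value, "'") + "'/>";
```

Only the last case, a single quote inside a single-quoted attribute value, actually requires `&apos;`.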

So your only option to prevent that from happening would be to process those files as pure text files. But that is going to make your process a bit more complex because you will have to insert elements as text (including the appropriate XML syntax), as opposed to inserting an XML structure.
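As a rough idea of what the text-based approach could look like, the sketch below merges several XML strings without parsing them, so existing `&apos;` sequences pass through untouched. It is plain JavaScript with hypothetical names (`innerContent`, `mergeXmlAsText`, a `jobs` root element) and it assumes every input has the same root element name, that the root's start tag contains no `>` inside an attribute value, and that no other element name begins with the root name. Reading and writing the files, and adding an XML declaration that matches the encoding actually used, is left to the caller.

```javascript
// Return everything between the root start tag and the root end tag,
// treating the document purely as text (no entity resolution happens).
function innerContent(xml, rootName) {
  var open = xml.indexOf("<" + rootName);
  var openEnd = xml.indexOf(">", open);
  var close = xml.lastIndexOf("</" + rootName + ">");
  if (open === -1 || openEnd === -1 || close === -1 || close < openEnd) {
    throw new Error("Root element <" + rootName + "> not found");
  }
  return xml.substring(openEnd + 1, close);
}

// Concatenate the inner content of all documents under a single new root.
function mergeXmlAsText(docs, rootName) {
  var parts = [];
  for (var i = 0; i < docs.length; i++) {
    parts.push(innerContent(docs[i], rootName).trim());
  }
  return "<" + rootName + ">\n" + parts.join("\n") + "\n</" + rootName + ">\n";
}

// Usage with two small in-memory documents (illustrative data only).
var docs = [
  '<?xml version="1.0"?>\n<jobs><equiptype>20&apos; Tank Container</equiptype></jobs>',
  "<jobs><equiptype>40&apos; Box</equiptype></jobs>"
];
var merged = mergeXmlAsText(docs, "jobs");
```

The trade-off is the one described above: nothing validates the result, so any element inserted this way has to be written out with correct XML syntax by hand.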

Thank you for the clarification.
I was totally sure that this was an illegal character, but it is not. It only has to be escaped inside an attribute value, and only when that value is itself delimited by ' (single quote) characters. In element content it may be escaped, but that is not required.
I will leave my code then as is.