Hi all,
I'd like to describe a new problem I ran into when using DI with XML files.
I have a DI job that downloads an XML file declared as UTF-16. I've defined the XSD schema with the following line to set the encoding:
<?xml version="1.0" encoding="UTF-16"?>
The XML file read by the job contains the same line.
So the encoding should be correctly specified, shouldn't it?
The answer is: no.
If the file contains non-ASCII characters such as "ä", I get the following runtime error:
|Dataflow MKTDF_TRP_01|Reader READ MESSAGE MKT_SCHEMA_TRP_01 OUTPUT(MKT_SCHEMA_TRP_01)
XML parser failed: Error: <An exception occurred! Type:UTFDataFormatException, Message:invalid byte 2 (t) of a 3-byte sequence.> at line <1023>, char <109> in file <E:\...\TRP_01.xml>.
|Dataflow MKTDF_TRP_01|Reader READ MESSAGE MKT_SCHEMA_TRP_01 OUTPUT(MKT_SCHEMA_TRP_01)
XML parser failed because message is not encoded in valid UTF-8. Please see previously displayed error message.
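A minimal Python sketch of what the error message describes, assuming (as the ISO8859-1 workaround below suggests) that the downloaded file actually stores "ä" as the single ISO-8859-1 byte 0xE4 rather than as UTF-16. In UTF-8, 0xE4 is the lead byte of a 3-byte sequence, so a UTF-8 decoder expects two continuation bytes and fails as soon as the next byte is an ordinary ASCII letter such as "t", which matches "invalid byte 2 (t) of a 3-byte sequence". The helper name decodes_as is hypothetical.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` is valid in `encoding` (hypothetical helper)."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# The same two characters "ät" encoded three different ways
latin1 = "ät".encode("iso-8859-1")  # b'\xe4t'         one byte per character
utf8 = "ät".encode("utf-8")         # b'\xc3\xa4t'     "ä" takes two bytes
utf16 = "ät".encode("utf-16-le")    # b'\xe4\x00t\x00' two bytes per character, no BOM

# 0xE4 announces a 3-byte UTF-8 sequence, but the next byte is "t" (0x74),
# not a continuation byte (0x80-0xBF), so UTF-8 decoding fails.
print(decodes_as(latin1, "utf-8"))  # False
print(decodes_as(utf8, "utf-8"))    # True
```

Note that the UTF-16 form of "ä" is E4 00, not a lone E4 followed by "t", so a file that really were UTF-16 would not produce this particular byte pattern.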
My question is: why does the job's file format ignore the encoding specified in both the XSD schema and the XML file? Do I need to set some other value in the environment or in the database that hosts the DI repository? I don't understand where the problem is.
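One possible explanation, not confirmed for DI itself: the XML 1.0 specification (section 4.3.3 and Appendix F) has a parser guess the encoding family from the first bytes of the file before it reads the declaration, and it requires UTF-16 entities to begin with a byte order mark. A file whose first bytes are the plain ASCII "<?xml" cannot really be UTF-16, whatever its declaration says. The Python sketch below compares the declared encoding with what the leading bytes imply; the function names are hypothetical and the signature table covers only the common cases (no UTF-32, no EBCDIC).

```python
import re
from typing import Optional

# Leading-byte signatures from XML 1.0, Appendix F (common cases only).
_SIGNATURES = [
    (b"\xef\xbb\xbf", "UTF-8 (with BOM)"),
    (b"\xff\xfe", "UTF-16LE (with BOM)"),
    (b"\xfe\xff", "UTF-16BE (with BOM)"),
    (b"\x3c\x00\x3f\x00", "UTF-16LE (no BOM)"),
    (b"\x00\x3c\x00\x3f", "UTF-16BE (no BOM)"),
    (b"\x3c\x3f\x78\x6d", "ASCII-compatible 8-bit"),  # "<?xm"
]

# Codec used to read the prolog for each family; "utf-16" consumes the BOM.
_CODECS = {
    "UTF-8 (with BOM)": "utf-8-sig",
    "UTF-16LE (with BOM)": "utf-16",
    "UTF-16BE (with BOM)": "utf-16",
    "UTF-16LE (no BOM)": "utf-16-le",
    "UTF-16BE (no BOM)": "utf-16-be",
}


def sniff_encoding_family(data: bytes) -> str:
    """Guess the encoding family from the first bytes (hypothetical helper)."""
    for prefix, family in _SIGNATURES:
        if data.startswith(prefix):
            return family
    return "unknown"


def declared_encoding(data: bytes, family: str) -> Optional[str]:
    """Return the encoding="..." value of the XML declaration, if present."""
    # Only the prolog matters; latin-1 never fails on ASCII-compatible bytes.
    head = data[:200].decode(_CODECS.get(family, "latin-1"), errors="replace")
    match = re.match(r'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', head)
    return match.group(1) if match else None


def check(data: bytes) -> str:
    """Report whether the declaration and the actual bytes agree on UTF-16."""
    family = sniff_encoding_family(data)
    declared = declared_encoding(data, family)
    says_utf16 = (declared or "").upper().startswith("UTF-16")
    if says_utf16 != family.startswith("UTF-16"):
        return f"mismatch: declared {declared}, bytes look {family}"
    return f"consistent: declared {declared}, bytes look {family}"


doc = '<?xml version="1.0" encoding="UTF-16"?><a>ät</a>'
print(check(doc.encode("iso-8859-1")))  # mismatch: declared UTF-16, bytes look ASCII-compatible 8-bit
print(check(doc.encode("utf-16")))      # consistent (BOM byte order depends on the platform)
```

Running a check like this on the downloaded file would show whether the source system writes a UTF-16 declaration over 8-bit content.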
For now, I've worked around the problem by replacing the "UTF-16" string in the file with "ISO8859-1" at download time. With this change the job completes without any errors, but it is not a good solution.
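A slightly more robust variant of the same download-time fix, sketched in Python under the assumption that the real byte encoding is ISO-8859-1 (the successful ISO8859-1 workaround is consistent with that, but windows-1252 would behave the same for "ä"). Instead of only relabelling the file, it transcodes the content to UTF-8 and rewrites the declaration to match, so the bytes and the declaration agree and the parser, which the error message suggests reads the file as UTF-8, gets valid UTF-8. The function name to_utf8 and its actual_encoding parameter are hypothetical.

```python
import re


def to_utf8(data: bytes, actual_encoding: str = "iso-8859-1") -> bytes:
    """Re-encode an XML document as UTF-8 and make its declaration say so.

    `actual_encoding` is an assumption about the real byte encoding of the
    downloaded file; it is not read from the file itself.
    """
    text = data.decode(actual_encoding)
    # Rewrite only the encoding pseudo-attribute of the leading XML declaration.
    text = re.sub(
        r'^(<\?xml[^>]*encoding=)(["\'])[A-Za-z0-9._-]+\2',
        r"\1\2UTF-8\2",
        text,
        count=1,
    )
    return text.encode("utf-8")


src = '<?xml version="1.0" encoding="UTF-16"?><a>ät</a>'.encode("iso-8859-1")
out = to_utf8(src)
print(out.decode("utf-8"))  # <?xml version="1.0" encoding="UTF-8"?><a>ät</a>
```

If the source system can be changed instead, having it write real UTF-16 (with a byte order mark) or a correct declaration would avoid the rewrite altogether.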
Does anyone have an idea?
Kindest regards,
bye
marbis (BOB member since 2009-01-09)