XML encoding: does UTF-16 really work?

Hi to all,

I’m here to describe a new problem found using DI with XML files.

I have a DI job that downloads an XML file that uses the UTF-16 encoding. I’ve defined the XSD schema with this line to set the encoding:

<?xml version="1.0" encoding="UTF-16"?>

The same line is in the XML read by the job.

So, I think that the encoding is correctly specified, isn’t it?
The answer is: no.

If the file contains non-ASCII characters, such as “ä”, I get the following runtime error:

|Dataflow MKTDF_TRP_01|Reader READ MESSAGE MKT_SCHEMA_TRP_01 OUTPUT(MKT_SCHEMA_TRP_01)
XML parser failed: Error: <An exception occurred! Type:UTFDataFormatException, Message:invalid byte 2 (t) of a 3-byte sequence.> at line <1023>, char <109> in file <E:\...\TRP_01.xml>.

|Dataflow MKTDF_TRP_01|Reader READ MESSAGE MKT_SCHEMA_TRP_01 OUTPUT(MKT_SCHEMA_TRP_01)
XML parser failed because message is not encoded in valid UTF-8. Please see previously displayed error message.
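
A note on reading this error: in UTF-8, a byte in the range 0xE0 to 0xEF announces a 3-byte sequence, and 0xE4 is exactly the byte used for “ä” in ISO-8859-1. A plain “t” following it is not a valid continuation byte, which matches the "invalid byte 2 (t) of a 3-byte sequence" text. The short Python sketch below reproduces that situation outside DI. The sample word and the helper name `try_decode` are made up for illustration, and this only suggests (it does not prove) that the file bytes are single-byte encoded rather than real UTF-16.

```python
# Hypothetical sample text: "ä" followed by "t", as in the error message.
SAMPLE = "hätte"

def try_decode(data: bytes, encoding: str) -> str:
    """Decode data, or describe where and why decoding failed."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"{encoding} failed at byte {exc.start}: {exc.reason}"

# One byte per character: "ä" becomes the single byte 0xE4.
latin1_bytes = SAMPLE.encode("iso-8859-1")

# Reading those bytes as UTF-8 breaks at the "ä", not at the start of the data.
utf8_on_latin1 = try_decode(latin1_bytes, "utf-8")

# Reading them with the matching encoding works.
latin1_on_latin1 = try_decode(latin1_bytes, "iso-8859-1")

# Real UTF-16 data starts with a byte order mark, so a UTF-8 reader
# would already fail on the very first byte.
utf16_bytes = SAMPLE.encode("utf-16")
utf8_on_utf16 = try_decode(utf16_bytes, "utf-8")

print(utf8_on_latin1)
print(latin1_on_latin1)
print(utf8_on_utf16)
```

Since the parser only complained at line 1023, it apparently read everything before that as UTF-8 without trouble. A file that was truly UTF-16 would be expected to fail much earlier, so it may be worth checking the raw bytes of the downloaded file.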

The question is: why is the XML encoding specified both in the XSD schema and in the XML file ignored by the job’s file format? Must I set other “values” in the environment, or in the database that hosts the DI repository? I don’t understand where the problem is…

For now, I’ve worked around the problem by replacing the “UTF-16” string in the file with “ISO8859-1” at download time. This way, the job completes without any error. But this is not a good solution.
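
A possibly more robust variant of this workaround is to look at the actual bytes instead of trusting the declaration, and then rewrite the file as UTF-8 with a matching declaration before the job reads it. The following is only a sketch under assumptions: the function names `sniff_encoding` and `to_utf8_xml` are hypothetical, the ISO-8859-1 fallback is a guess that fits Western European data, and nothing here is a DI feature.

```python
import codecs
import re

def sniff_encoding(data: bytes, fallback: str = "iso-8859-1") -> str:
    """Guess the real encoding from the bytes, ignoring the XML declaration."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"       # byte order mark present: real UTF-16
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"    # UTF-8 with a byte order mark
    try:
        data.decode("utf-8")
        return "utf-8"        # decodes cleanly as UTF-8 (includes pure ASCII)
    except UnicodeDecodeError:
        return fallback       # assumption: single-byte Western European text

def to_utf8_xml(data: bytes) -> bytes:
    """Return the document re-encoded as UTF-8 with a matching declaration."""
    text = data.decode(sniff_encoding(data))
    # Replace only the encoding value inside the XML declaration.
    text = re.sub(
        r'(<\?xml[^>]*encoding=)(["\'])[^"\']*\2',
        r'\g<1>\g<2>UTF-8\g<2>',
        text,
        count=1,
    )
    return text.encode("utf-8")

# Made-up sample: declared as UTF-16, but the bytes are ISO-8859-1.
mislabeled = b'<?xml version="1.0" encoding="UTF-16"?>\n<a>h\xe4tte</a>'
print(sniff_encoding(mislabeled))
print(to_utf8_xml(mislabeled).decode("utf-8"))
```

The idea is that the declaration and the bytes agree after the conversion, so the parser no longer has to choose between them.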

Does anyone have an idea?

Kindest regards,
bye


marbis :it: (BOB member since 2009-01-09)

Did you find a resolution for this? I am facing the same problem. Please help.


sriprameela :india: (BOB member since 2009-10-22)

What version of DI is that?

Even though the source XSD is defined for UTF-16, which code page does the job actually use? Ideally it should switch to UTF-16, I believe. Do you see any code page switch message in the job run?


ganeshxp :us: (BOB member since 2008-07-17)