We are facing an issue while loading data from .csv files.
Problem Description
Source files arrive as pairs of .csv and .dat files from different markets and contain language-specific characters. When opening the .csv files in BODS, we get a file-open error, or the data does not load properly. When verifying the code page of each file (using BODS), we found that the code pages differ from file to file, and are not even consistent for the same country across different periods/months.
We can't modify the BODS/SQL Server settings because that may affect other jobs currently executing on the same servers, so we hope the issue can be corrected at the source itself. Please help us with your suggestions or any other workaround for this issue.
I cannot give you an answer as such, only point you to some helpful documentation (in case you are not already aware of it; others who come by this thread may find it useful too).
A file is just a large sequence of bytes. So what does the byte value 0xC0 mean? Is it a └ box-drawing character (code page 855)? Or an À (ISO Latin 1)? Or a beautiful Thai character?
How do you(!) know what the correct character should be?
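To make the ambiguity concrete, here is a small sketch (in Python, purely for illustration) that decodes the single byte 0xC0 under a few different code pages. The choice of code pages is mine, matching the examples above; cp874 is used here as a representative Thai code page.

```python
raw = b"\xc0"  # one byte, several different meanings

print(raw.decode("cp855"))    # DOS Cyrillic: the box-drawing character '└'
print(raw.decode("latin-1"))  # ISO Latin 1: 'À'
print(raw.decode("cp874"))    # Thai: 'ภ'

# As UTF-8 the lone byte is simply invalid, which is one reason a strict
# UTF-8 load can fail outright on legacy-encoded files:
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```

The same bytes, four different answers: without knowing the code page, the file on its own cannot tell you which character was meant.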
I would recommend using Unicode within the SQL Server database. That means setting the BODS code page to utf-8 (or possibly utf-16) on the datastore configuration and ensuring that all columns that can contain language-specific text are defined as nvarchar in the database.
For files, the code page setting is specific to each file format. I would start by trying utf-8 and seeing whether DS works it out. If your data suppliers can't be persuaded to pick a standard and stick to it, you will end up maintaining separate file formats and dataflows for each encoding.