We are facing an issue while loading data from .csv files.
Problem Description
Source files arrive as pairs of .csv and .dat files from different markets and contain language-specific characters. When opening the .csv files in BODS, we get a file-open error, or the data does not load properly. When verifying the code page of each file (using BODS), we found that the code pages differ from file to file, and are not even consistent for the same country across different periods/months.
We can't modify the BODS/SQL Server settings because that may affect other jobs currently executing on the same servers, so we hope the issue can be corrected at the source itself. Please help us with your suggestions or any other workaround for this issue.
I cannot give you an answer as such, only point you to some helpful documentation (in case you are not already aware of it; others who come by this thread may find it useful too).
A file is just a large sequence of bytes. So what does the byte value 0xC0 mean? Is it a └ box-drawing character (code page 855)? Or an À (ISO Latin 1)? Or a beautiful Thai character?
How do you(!) know what the correct character should be?
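To make the ambiguity concrete, here is a small sketch (in Python, purely for illustration) that decodes the single byte 0xC0 under a few different code pages. The choice of code pages is mine, matching the examples above; cp874 is used here as a representative Thai code page.

```python
raw = b"\xc0"  # one byte, several different meanings

print(raw.decode("cp855"))    # DOS Cyrillic: the box-drawing character '└'
print(raw.decode("latin-1"))  # ISO Latin 1: 'À'
print(raw.decode("cp874"))    # Thai: 'ภ'

# As UTF-8 the lone byte is simply invalid, which is one reason a strict
# UTF-8 load can fail outright on legacy-encoded files:
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```

The same bytes, four different answers: without knowing the code page, the file on its own cannot tell you which character was meant.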
I would recommend using Unicode within the SQL Server database. That means setting the BODS code page to utf-8 (or possibly utf-16) on the datastore configuration and ensuring that all columns that can contain language-specific text are defined as nvarchar in the database.
For files, the code page setting is specific to each file format. I would start by trying utf-8 and seeing whether DS works it out. If your data suppliers can't be persuaded to pick a standard and stick to it, you will end up maintaining separate file formats and dataflows for each encoding.