The bulkloader options in the Target had double quotes as the text delimiter, which is the default value. The bulkloader therefore treats double quotes not as column delimiters but as text field indicators: the contents between the quotes are loaded as string characters, and once it sees a closing double quote it assumes the field has ended, so any characters after it are taken as belonging to the next column.
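Purely as an illustration of what that behaviour does to the data (this is a toy sketch, not the actual bulkloader logic), here is a parser that simply toggles "inside a text field" on every double quote; a quote embedded in the data breaks the pairing and the row no longer splits into the right columns:

```python
# Toy example only, not the actual bulkloader code: every double quote toggles
# "inside a text field", so an embedded quote mispairs the field boundaries.
def naive_parse(line, sep=","):
    fields, buf, in_quotes = [], "", False
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == sep and not in_quotes:
            fields.append(buf)
            buf = ""
        else:
            buf += ch
    fields.append(buf)
    return fields

print(naive_parse('1,"a 5" pipe",done'))
# -> ['1', 'a 5 pipe,done']   (two columns instead of three; the comma that
#    should separate the last two columns ends up inside the data)
```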
There are a few things you could do:
1. Remove the default double quotes for string characters from the DB loader options.
2. Remove any double quotes from the text fields in a DI transform (e.g. a Query transform), provided that does not change the meaning of the text; see the sketch after this list.
3. Use a fixed-length file format instead of a delimited one.
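As a sketch of option 2 done outside DI (the file names are made up, and it assumes the quotes really carry no meaning and that the column separator itself never appears inside the text), a pre-processing step could be as small as this; inside DI the equivalent would be a string replace in a Query transform (e.g. with replace_substr):

```python
# Sketch of option 2 as a standalone pre-processing step: strip all double
# quotes before the file reaches the bulkloader. Only safe if the quotes carry
# no meaning and the column separator never occurs inside the text fields.
with open("input.csv", encoding="utf-8") as src, \
     open("cleaned.csv", "w", encoding="utf-8") as dst:
    for line in src:
        dst.write(line.replace('"', ""))
```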
I cannot remove the double quotes from the column, nor can I use a fixed-length format, and I cannot choose another character as the delimiter either. So on this one I will just have to discuss the options with my client.
Anyway, I’m adding the following to this post, in case it may be useful to others in the future.
When the default double quote text delimiter (that is at least what DI calls it) is removed from the bulkloader options, DI just puts it back.
When the bulkloader is set to NONE, columns with double quotes in the text are not a problem; they are copied as they are.
Hi,
Sounds like you need to escape the " character. I am having a similar issue with strings containing backslashes. The only thing I could come up with was using regular expressions to search for the characters that need to be escaped.
What is your database and how are you connecting? You might be able to do something with ANSI quote settings. Although, re-reading your post, it sounds like DI itself is doing the truncating of the string? I am going to start a thread on my own escaping issue and will try to follow up if I get anything useful. Good luck.
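For what it is worth, the regular-expression approach mentioned above could look roughly like this as a pre-processing step; the backslash escape style and the set of characters to escape are assumptions and depend entirely on what the target loader expects:

```python
import re

# Escape backslashes and double quotes with a backslash, for a loader that
# understands backslash escapes. The character set and the escape style are
# assumptions; adjust them to whatever your bulkloader/database expects.
def escape_field(value: str) -> str:
    return re.sub(r'([\\"])', r'\\\1', value)

print(escape_field('say "hi" and C:\\temp'))
# -> say \"hi\" and C:\\temp
```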
The csv format is very very very simple to implement, and first-year computer-science students are usually not presented with such simple tasks.
The only interesting part is that text fields may contain arbitrary characters, including the characters used for delimiting columns or rows, or for escaping strings. So it is not enough to define an escape mechanism for column delimiters, such as "put them in double quotes"; you also need an escape mechanism for those double quotes themselves, in case they appear in a string.
There are several simple solutions to this problem, and every time I run into a software product that has taken a shot at them, I find it to be an epic fail. I shouldn't be surprised anymore.
But, seriously? BODS as well? An ETL tool that some customers probably buy exclusively for the purpose of handling csv files?
Please tell me I am wrong and I missed something simple and essential.
Assume that you are part of the product development group of BODS. Or another application vendor?
What would your solution be?
Hint: BODS does support alternative separators out of the box. But as long as people/architects still use , and ; etc. as separators, or think that "" is nice to use…
By the way, this is not about csv, but about the usage of a bulk loader. BODS uses the bulkloader as provided by the database vendor…
Me? I am just a consultant who ended up helping companies with BODS by random chance.
(Not sure if I understand your last remark correctly. Sorry if this is off-topic. My concern is that I want to read / write data from / to files in a format that I can process further with my own code. CSV is just an obvious choice, but if there are any others I’m open to them.)
The point is not about any particular choice of separator symbols, but about data that may contain any symbol in some column, including column separator, line separator, quote, and double quote.
The solution Excel offers is this:
1. If a string contains column or line separators or double quotes, it is wrapped in double quotes.
2. If the wrapped string contained double quotes, each contained double quote is doubled once more (from " to ""). This way, a parser can recognise the end of the string by finding a double quote followed by anything other than another double quote.
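For reference, this doubling convention is what the common csv libraries implement (it is also what RFC 4180 describes), so a round trip is straightforward; here is a small sketch with made-up data:

```python
import csv, io

# Excel-style quoting: fields containing the separator or a double quote are
# wrapped in quotes, and embedded quotes are doubled ("" stands for ").
rows = [["id", "comment"], [1, 'a 5" pipe, cut to length']]

buf = io.StringIO()
csv.writer(buf).writerows(rows)          # default dialect = Excel conventions
print(buf.getvalue())
# id,comment
# 1,"a 5"" pipe, cut to length"

buf.seek(0)
print(list(csv.reader(buf)))
# [['id', 'comment'], ['1', 'a 5" pipe, cut to length']]
# (everything comes back as strings, but the quote and the comma survive)
```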
If I have not missed anything, BODS supports 1. but not 2. Therefore, no matter what quotation symbol I choose, if it ends up in a string there is no sound way for software reading the generated csv file to decide whether the string ends there or whether it just contains that symbol.
So I guess in short, my question is this: If I have no information about what characters can occur in some text column, how do I export it to a file, and with what algorithm can I parse it into my own software?
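To make the second half of that question concrete: with the doubled-quote convention, the parsing rule is simply "inside a quoted field, "" is a literal quote and a single " ends the field". A minimal hand-written sketch (one record per line, no embedded line breaks, for brevity):

```python
# Minimal parser for Excel-style csv records: quoted fields may contain the
# separator, and an embedded quote is represented as "" inside the field.
# Sketch only: assumes one record per call, no embedded line breaks.
def parse_record(line: str, sep: str = ",") -> list[str]:
    fields, buf, i, in_quotes = [], "", 0, False
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    buf += '"'          # doubled quote -> literal quote
                    i += 1
                else:
                    in_quotes = False   # closing quote ends the text field
            else:
                buf += ch
        else:
            if ch == '"':
                in_quotes = True        # opening quote starts a text field
            elif ch == sep:
                fields.append(buf)
                buf = ""
            else:
                buf += ch
        i += 1
    fields.append(buf)
    return fields

print(parse_record('1,"a 5"" pipe, cut to length",done'))
# -> ['1', 'a 5" pipe, cut to length', 'done']
```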