The bulkloader options in the Target had double quotes as the text delimiter, which is the default value. The bulkloader therefore treats double quotes not as column delimiters but as text field indicators: the contents between the quotes are loaded as string characters, and once it sees a closing double quote it assumes the field has ended, so any characters after it are taken as belonging to the next column.
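Purely as an illustration of what that behaviour does to the data (this is a toy sketch, not the actual bulkloader logic), here is a parser that simply toggles "inside a text field" on every double quote; a quote embedded in the data breaks the pairing and the row no longer splits into the right columns:

```python
# Toy example only, not the actual bulkloader code: every double quote toggles
# "inside a text field", so an embedded quote mispairs the field boundaries.
def naive_parse(line, sep=","):
    fields, buf, in_quotes = [], "", False
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == sep and not in_quotes:
            fields.append(buf)
            buf = ""
        else:
            buf += ch
    fields.append(buf)
    return fields

print(naive_parse('1,"a 5" pipe",done'))
# -> ['1', 'a 5 pipe,done']   (two columns instead of three; the comma that
#    should separate the last two columns ends up inside the data)
```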
There are a few things you could do:
1. Remove the default double quotes for string characters from the DB loader options.
2. Remove any double quotes from the text fields in a DI transform (e.g. a Query transform), provided that does not change the meaning of the text; see the sketch after this list.
3. Use a fixed-length file format instead of a delimited one.
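As a sketch of option 2 done outside DI (the file names are made up, and it assumes the quotes really carry no meaning and that the column separator itself never appears inside the text), a pre-processing step could be as small as this; inside DI the equivalent would be a string replace in a Query transform (e.g. with replace_substr):

```python
# Sketch of option 2 as a standalone pre-processing step: strip all double
# quotes before the file reaches the bulkloader. Only safe if the quotes carry
# no meaning and the column separator never occurs inside the text fields.
with open("input.csv", encoding="utf-8") as src, \
     open("cleaned.csv", "w", encoding="utf-8") as dst:
    for line in src:
        dst.write(line.replace('"', ""))
```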
I cannot remove the double quotes from the column, nor can I use a fixed-length format, and I cannot choose another character as the delimiter either. So on this one I will just have to discuss the options with my client.
Anyway, I’m adding the following to this post, in case it may be useful to others in the future.
When the default double quote text delimiter (that is at least what DI calls it) is removed from the bulkloader options, DI just puts it back.
When the bulkloader is set to NONE, columns with double quotes in the text are not a problem; they are copied as they are.
Hi,
Sounds like you need to escape the " character. I am having a similar issue with strings containing backslashes. The only thing I could come up with was using regular expressions to search for the characters that need to be escaped.
What is your database and how are you connecting? You might be able to do something with ANSI quote settings. Although, re-reading your post, it sounds like DI itself is doing the truncating of the string? I am going to start a thread on my own escaping issue and will try to follow up if I get anything useful. Good luck.
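For what it is worth, the regular-expression approach mentioned above could look roughly like this as a pre-processing step; the backslash escape style and the set of characters to escape are assumptions and depend entirely on what the target loader expects:

```python
import re

# Escape backslashes and double quotes with a backslash, for a loader that
# understands backslash escapes. The character set and the escape style are
# assumptions; adjust them to whatever your bulkloader/database expects.
def escape_field(value: str) -> str:
    return re.sub(r'([\\"])', r'\\\1', value)

print(escape_field('say "hi" and C:\\temp'))
# -> say \"hi\" and C:\\temp
```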
The csv format is very very very simple to implement, and first-year computer-science students are usually not presented with such simple tasks.
The only interesting part is that text fields may contain arbitrary characters, including the characters used for delimiting columns or rows, or for escaping strings. So it is not enough to define an escape mechanism for column delimiters, such as "put them in double quotes"; you also need an escape mechanism for those double quotes themselves, in case they appear in a string.
There are several simple solutions to this problem, and every time I run into a software product that has taken a shot at them, I find it to be an epic fail. I shouldn't be surprised anymore.
But, seriously? BODS as well? An ETL tool that some customers probably buy exclusively for the purpose of handling csv files?
Please tell me I am wrong and I missed something simple and essential.
Assume that you are part of the product development group of BODS. Or another application vendor?
What would your solution be?
Hint: BODS does support alternative separators out of the box. But as long as people/architects still use , and ; etc. as separators, or think that "" is nice to use…
By the way, this is not about csv, but about the usage of a bulk loader. BODS uses the bulkloader as provided by the database vendor…
Me? I am just a consultant who ended up helping companies with BODS by random chance.
(Not sure if I understand your last remark correctly. Sorry if this is off-topic. My concern is that I want to read / write data from / to files in a format that I can process further with my own code. CSV is just an obvious choice, but if there are any others I’m open to them.)
The point is not about any particular choice of separator symbols, but about data that may contain any symbol in some column, including column separator, line separator, quote, and double quote.
The solution Excel offers is this:
1. If a string contains column or line separators or double quotes, it is wrapped in double quotes.
2. If the wrapped string contained double quotes, each contained double quote is doubled once more (from " to ""). This way, a parser can recognise the end of the string by finding a double quote followed by anything other than another double quote.
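For reference, this doubling convention is what the common csv libraries implement (it is also what RFC 4180 describes), so a round trip is straightforward; here is a small sketch with made-up data:

```python
import csv, io

# Excel-style quoting: fields containing the separator or a double quote are
# wrapped in quotes, and embedded quotes are doubled ("" stands for ").
rows = [["id", "comment"], [1, 'a 5" pipe, cut to length']]

buf = io.StringIO()
csv.writer(buf).writerows(rows)          # default dialect = Excel conventions
print(buf.getvalue())
# id,comment
# 1,"a 5"" pipe, cut to length"

buf.seek(0)
print(list(csv.reader(buf)))
# [['id', 'comment'], ['1', 'a 5" pipe, cut to length']]
# (everything comes back as strings, but the quote and the comma survive)
```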
If I have not missed anything, BODS supports 1. but not 2. Therefore, no matter what quotation symbol I choose, if it ends up in a string there is no sound way for software reading the generated csv file to decide whether the string ends there or whether it just contains that symbol.
So I guess in short, my question is this: If I have no information about what characters can occur in some text column, how do I export it to a file, and with what algorithm can I parse it into my own software?
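To make the second half of that question concrete: with the doubled-quote convention, the parsing rule is simply "inside a quoted field, "" is a literal quote and a single " ends the field". A minimal hand-written sketch (one record per line, no embedded line breaks, for brevity):

```python
# Minimal parser for Excel-style csv records: quoted fields may contain the
# separator, and an embedded quote is represented as "" inside the field.
# Sketch only: assumes one record per call, no embedded line breaks.
def parse_record(line: str, sep: str = ",") -> list[str]:
    fields, buf, i, in_quotes = [], "", 0, False
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    buf += '"'          # doubled quote -> literal quote
                    i += 1
                else:
                    in_quotes = False   # closing quote ends the text field
            else:
                buf += ch
        else:
            if ch == '"':
                in_quotes = True        # opening quote starts a text field
            elif ch == sep:
                fields.append(buf)
                buf = ""
            else:
                buf += ch
        i += 1
    fields.append(buf)
    return fields

print(parse_record('1,"a 5"" pipe, cut to length",done'))
# -> ['1', 'a 5" pipe, cut to length', 'done']
```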