BODS 4.0: Overflow file option limitation

Hi All,

I currently have a huge load between SAP and MSSQL (more than a million records for the biggest table), with about 500 tables loaded every week.

I’m using the overflow file option in order to catch every type of error:

  • Duplicate
  • Null value in key column
  • Constraint violations …

That way I’m not obliged to use a Validation transform, as I don’t know every type of error before loading.

As the SAP data is not very clean, I have a great many records in the overflow file.

Issue: the disk became full and the job server shut down because of it, so other jobs couldn’t be executed as usual …

I’m looking for a way to limit the number of rejected records in those jobs and make them fail …

My questions :

  • Is there an option to limit the number of records rejected with the overflow file option?

  • Is there another way to capture the bad records without making the flow fail (apart from validation functions)?

Thank you,
C.


cthybert :switzerland: (BOB member since 2009-05-18)

If the problem isn’t one huge overflow file but the sum of the overflow files getting too big, you have multiple options:

  • Use auditing to let the load fail if the error count gets too high
  • Re-read the overflow files into a database table (without constraints) and delete the file afterwards
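The second option can be sketched roughly like this — a minimal Python sketch, assuming the overflow file is plain text; the staging table name is made up, and sqlite3 stands in for whatever constraint-free database table you actually use:

```python
import os
import sqlite3  # stands in for your real staging database


def archive_overflow_file(path, conn):
    """Load each rejected row from an overflow file into a
    constraint-free staging table, then delete the file so it
    stops filling up the job server's disk."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS overflow_archive "
        "(source_file TEXT, line_no INTEGER, raw_row TEXT)"
    )
    with open(path, encoding="utf-8", errors="replace") as f:
        rows = [
            (os.path.basename(path), i, line.rstrip("\n"))
            for i, line in enumerate(f, start=1)
        ]
    conn.executemany("INSERT INTO overflow_archive VALUES (?, ?, ?)", rows)
    conn.commit()
    os.remove(path)  # reclaim the disk space
    return len(rows)
```

Run periodically (or as a post-load script step), this keeps the rejected rows queryable without letting the flat files accumulate.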

If you generate too many errors in just one flow, you probably want to add some simple Validation transforms to catch at least some of the problems and reduce the error count.

Btw. of what use are overflow files if they get this large? Who is going to look through all the problems there?


HerdplattenToni (BOB member since 2011-05-13)

We can sometimes have a 10 GB overflow file … then the job server is at 100% CPU and everything is slow …

How do you set the audit option to limit errors? It could be helpful …

Something like: limit the overflow file to 100 records (as with bulk loading, or a file format source).

What matters in the overflow file is the SQL statement that was rejected … it makes the issue easier to track down …


cthybert :switzerland: (BOB member since 2009-05-18)

In your dataflow, click the “Audit” button in the menu bar. Choose your target table and the audit function “Count”, and in the Rules tab set a rule that raises an exception if the error count variable is too high.
Make sure the job is executed with auditing enabled.

Btw. I’m not sure if this applies to BODS 4.0, we are still on an older version here.


HerdplattenToni (BOB member since 2011-05-13)

What do you mean by “error count variable”?

Is it: table_attribute('.', 'Number_Of_Rows_Rejected')

I don’t see any other error count variable.

---- edit ----

Sorry, I had missed the proposed error variable … but there is an issue with the overflow file: the rejected records aren’t treated as errors … so no errors are counted …


cthybert :switzerland: (BOB member since 2009-05-18)

If you enable auditing on an object (e.g. a table), it generates two variables for you to use in the rules. The defaults are $Count_YOUR_TABLE_NAME and $CountError_YOUR_TABLE_NAME. As said, no idea whether it is done the same way in 4.0; you might want to read the matching chapters in the technical manuals.
audit.png
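As an illustration (the table name CUSTOMER_DIM and the threshold of 100 are made up; the variable follows the $CountError_YOUR_TABLE_NAME naming pattern described above), an audit rule that fails the job once too many rows are rejected might look like:

```
$CountError_CUSTOMER_DIM <= 100
```

with the rule’s action on failure set to raise an exception.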


HerdplattenToni (BOB member since 2011-05-13)

Theoretically speaking the solution is OK for me.

But I tried with a limit of 10 errors in the audit (the job fails at 10 errors) against 1000 bad records … the overflow file still gets every one of the 1000 bad records written to it anyway.

And the job only fails after writing every record to the overflow file …

and that takes a while …
Capturer.JPG


cthybert :switzerland: (BOB member since 2009-05-18)

I don’t normally recommend using an overflow file when there will be a large number of rows rejected. The overflow file for most of my clients will rarely have any rows written to it. The overflow file is the exception rather than the rule.

The above approach leads me to ask the obvious question: Why do you have so many rejected rows?

In some systems I’ve worked on, the ETL is not allowed to reject ANY rows. To prevent rejections, the ETL provides (valid) dummy values and then goes back the next day to replace the dummy values with real ones.
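The dummy-value pattern described above can be sketched like this — a minimal Python sketch, where the column name, the placeholder key, and the backfill flag are all assumptions for illustration:

```python
DUMMY_KEY = -1  # assumed placeholder; pick a value real data can never use


def patch_row(row, key_column="customer_id"):
    """Replace a missing/invalid key with a dummy value instead of
    rejecting the row; a later run swaps in the real value.

    The row is a plain dict; a 'needs_backfill' flag marks rows
    for the next-day repair job."""
    fixed = dict(row)
    if fixed.get(key_column) in (None, ""):
        fixed[key_column] = DUMMY_KEY
        fixed["needs_backfill"] = True
    else:
        fixed["needs_backfill"] = False
    return fixed
```

The design point is that every row lands in the target with a valid key, so nothing ever reaches the overflow file; the flagged rows are simply revisited once the real values are available.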


eganjp :us: (BOB member since 2007-09-12)