BusinessObjects Board

Processing time of query and loading

Hi All,

Is it possible to get the processing time of the query (from the source DB), the load (to the target DB) and the transforms (in DS) from a finished job?

Appreciate your ideas.

CK


ckyy (BOB member since 2011-03-28)

The AL_STATISTICS table will have most of that information. It does track information at the transform level.


eganjp :us: (BOB member since 2007-09-12)

Yes, the AL_STATISTICS table contains the execution statistics of the transforms (row count, cache size, execution time, etc.) within dataflows.
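
If it helps, here is a rough way to eyeball what is in there from outside the Designer. This is only a sketch: it assumes an ODBC connection to the repository database, the DSN name and credentials are placeholders, and since the exact columns differ between DS versions it simply dumps whatever the table holds.

```python
# Rough sketch: dump the AL_STATISTICS table from the DS repository database
# so you can see which transform-level statistics your version records.
# "DS_REPO", the user and the password are placeholders -- point them at
# your own repository.
import pyodbc

conn = pyodbc.connect("DSN=DS_REPO;UID=ds_user;PWD=secret")  # hypothetical DSN/credentials
cur = conn.cursor()

# Column names differ between versions, so select everything and inspect.
cur.execute("SELECT * FROM AL_STATISTICS")
print("\t".join(col[0] for col in cur.description))

for row in cur.fetchmany(50):   # the first 50 rows are enough to see the layout
    print("\t".join(str(value) for value in row))

conn.close()
```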

Regards,
Prasna


pnr6kor (BOB member since 2014-05-14)

Thanks a lot, eganjp and Prasna.
But it seems that the information in the AL_STATISTICS table is similar to the job monitor log; neither of them separates the query time (from the source DB) from the load time (to the target DB) :frowning:

I ran two tests:
job 1: source table -> query -> target table (takes 100 seconds)
job 2: the same source table -> query -> map_operation (discard all records) -> target table (takes 80 seconds, which I take to be the query time)

From AL_STATISTICS, for job 1, I can see the query from the source table took 100 seconds, and the load to the target table also shows 100 seconds, when it should be around 100 - 80 = 20 seconds. So how do I separate the query time (from the source) and the load time (to the target)?
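
For now the best I can do is estimate it myself by differencing the two runs, something like the sketch below, but that is hardly a general answer:

```python
# Rough estimate of the load time by differencing my two test runs above.
# This only holds if the source/query side behaves the same in both jobs.
full_job_seconds = 100      # job 1: source -> query -> target
discard_all_seconds = 80    # job 2: map_operation discards every row before the target

estimated_query_seconds = discard_all_seconds
estimated_load_seconds = full_job_seconds - discard_all_seconds
print(f"query ~{estimated_query_seconds}s, load ~{estimated_load_seconds}s")  # query ~80s, load ~20s
```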

Any idea about it?

Thanks,
CK


ckyy (BOB member since 2011-03-28)

Well, I did say it would have most of what you wanted.

You may have to parse the monitor log to get the lag time between query execution and query result set return.
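
If you do go that route, the idea is just to note, per object path, when rows were first and last reported. Rough sketch only: I'm assuming a delimited layout with a path, state, row count, elapsed time and absolute time per line, and the real delimiter and column order vary by version, so check an actual monitor log before trusting any of it.

```python
# Rough sketch of parsing a DS monitor log to see when each transform first
# and last reported rows. The layout varies by version; adjust DELIMITER and
# the column positions to match a real log from your environment.
from collections import defaultdict

DELIMITER = ","                   # placeholder: could be ";" or tabs in your version
LOG_FILE = "my_job_monitor.log"   # hypothetical file name

first_seen = {}
last_seen = defaultdict(float)

with open(LOG_FILE) as fh:
    for line in fh:
        parts = [p.strip() for p in line.split(DELIMITER)]
        if len(parts) < 5:
            continue                      # skip headers / malformed lines
        path, state, rows, elapsed, absolute = parts[:5]
        try:
            elapsed = float(elapsed)
        except ValueError:
            continue
        first_seen.setdefault(path, elapsed)
        last_seen[path] = max(last_seen[path], elapsed)

# The gap between the dataflow start and the reader's first sample is a rough
# proxy for how long the source query ran before rows started arriving.
for path in sorted(first_seen):
    print(f"{path}: first sample at {first_seen[path]}s, last at {last_seen[path]}s")
```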


eganjp :us: (BOB member since 2007-09-12)

Thanks! :+1:


ckyy (BOB member since 2011-03-28)

You may not be thanking me later. Parsing the monitor log isn’t much fun. I wrote a job a few years back to parse the trace log and it was…interesting.


eganjp :us: (BOB member since 2007-09-12)

If it’s important enough, I typically festoon my workflows & dataflows w/ custom statistics-capturing code. To take one simple example:

In your initial source-reading Query transform, I’d add a “now” datetime stamp column. Downstream of that query, the “stats-collecting branch” would go to another Query transform where I’d take the min() and max() of that timestamp, along w/ a count(*) for the record count. Downstream of that query, which would have only one record w/ three columns, I’d add some more columns w/ descriptive information: the context, the hierarchical path from the job down through workflows and dataflows to wherever you are (as an XML snippet, actually), a job identifier (so I could match the statistic as a child to the job as a parent, and then find the job-level information that’s certainly relevant to the statistics, like the system configuration in force, parameter values, etc.), and some other fields required for the “statistics subsystem.”

A full explanation would require a lengthy blog post, but the point is: you can roll your own stats collection.
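
Stripped of all the DS plumbing, the stats branch boils down to something like this toy sketch (in the real thing these are Query transforms, and the single output record goes to your own statistics table):

```python
# Toy illustration of the stats branch: every row gets a "now" timestamp as it
# is read, and the stats branch reduces those timestamps to min/max plus a row
# count. Purely to show the idea, not how you'd build it in DS.
from datetime import datetime
import time

def read_source():
    """Stand-in for the source reader: yields rows, stamping each with 'now'."""
    for i in range(5):
        time.sleep(0.1)                      # pretend the source is slow
        yield {"id": i, "stamp": datetime.now()}

rows = list(read_source())

stats = {
    "first_row_at": min(r["stamp"] for r in rows),   # min(stamp)
    "last_row_at":  max(r["stamp"] for r in rows),   # max(stamp)
    "row_count":    len(rows),                        # count(*)
}
# In the real job you'd add the context columns (job id, dataflow path, etc.)
# and insert this single record into your own statistics table.
print(stats)
```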

This is only going to start measuring records when they’re returned from your database server. If you wanted to figure out (manually) how long it was taking before your first record arrived at BODS, you could add a row_gen transform to your dataflow, timestamp it, and then stick it in the stats table. (The row_gen can be isolated.) You could also collect a timestamp in a parent workflow of the dataflow, but there’s a bit of a difference between a timestamp collected in a parent workflow and one collected at the “top” of the dataflow itself.

AL_STATISTICS is OK, but it was never adequate to what I wanted. Maybe it’s gotten better of late – haven’t checked.


JeffPrenevost :us: (BOB member since 2010-10-09)