Restrict data to 1000 rows

Hi All,

I have 10 million records in my source and I want to load just 1000 of them into my target. These 1000 rows are to be loaded as a sample, without any specific business rule. I know how to implement ‘where’ logic in a Query transform to restrict the data, but that is not my requirement.
Please help.

Thanks


avbaby :india: (BOB member since 2009-05-09)

What DB is your source?


ganeshxp :us: (BOB member since 2008-07-17)

How about the idea in this post:-


Nemesis :australia: (BOB member since 2004-06-09)

I do this a couple of different ways, both using Python scripts.

The Python random module has a sample method that returns a non-repeating random selection from a given range (choice only returns one item per call and can repeat). You simply generate N of these into a list, sort them, then write them out to be joined on.
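
A rough sketch of that first approach, assuming the source rows can be numbered 1 to 10 million so the picked numbers can be joined back to them. The function name and the seed parameter are made up for illustration, and it uses `random.sample`, which draws without replacement:

```python
import random


def pick_row_numbers(total_rows, sample_size, seed=None):
    """Return sample_size distinct row numbers in 1..total_rows, sorted."""
    rng = random.Random(seed)  # a fixed seed makes the pick repeatable
    # random.sample draws without replacement, so no row number repeats
    picked = rng.sample(range(1, total_rows + 1), sample_size)
    return sorted(picked)


row_numbers = pick_row_numbers(10_000_000, 1000, seed=42)
```

Writing `row_numbers` out to a file or staging table for the join is left out here, since that part depends on your job design.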

The other way is of course to use math and calculate which record you want to select: total records divided by the number of records to select gives the step, and you increment by that step starting from an offset. Python generators work really well for this.
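
Something like this is what I mean by the math approach. It is only a sketch: the function name is hypothetical, row numbers are assumed to start at 1, and the step is taken as the whole-number part of total / sample size:

```python
def nth_row_numbers(total_rows, sample_size, offset=0):
    """Yield sample_size evenly spaced row numbers in 1..total_rows."""
    step = total_rows // sample_size  # distance between selected rows
    if step < 1:
        raise ValueError("sample_size is larger than total_rows")
    if not 0 <= offset < step:
        raise ValueError("offset must be at least 0 and less than the step")
    for i in range(sample_size):
        # every step-th row, shifted by the offset, numbered from 1
        yield offset + i * step + 1


nth_rows = list(nth_row_numbers(10_000_000, 1000))
```

With 10 million rows and a 1000-row sample the step is 10000, so this selects rows 1, 10001, 20001 and so on.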

If you know the Nth offset, I'd imagine you could also use a Row Generation transform, multiply the offset by the row number, and round the result to get which record to select. You would have to do some checking to make sure you selected enough records and recursively Nth out the remainder. :crazy_face:
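
The arithmetic for that idea, sketched in Python rather than in the tool itself. The function name is hypothetical, the loop stands in for the generated row numbers, and it only reports how many records are still missing rather than Nth-ing out the remainder:

```python
def rounded_row_numbers(total_rows, sample_size):
    """Multiply a fractional offset by each generated row number and round."""
    offset = total_rows / sample_size  # fractional Nth offset
    picked = set()
    for row_number in range(1, sample_size + 1):  # stand-in for generated rows
        # note: Python's round() sends exact halves to the nearest even number
        target = round(row_number * offset)
        # keep the target inside the valid row range
        picked.add(max(1, min(total_rows, target)))
    # rounding can in principle collide, so check that enough rows came out
    shortfall = sample_size - len(picked)
    return sorted(picked), shortfall


rounded_rows, missing = rounded_row_numbers(10_000_000, 1000)
```

If `missing` comes back greater than zero, the remainder still has to be picked from the rows not yet selected.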

I spent a lot of time learning to Nth. :yesnod:


jlynn73 :us: (BOB member since 2009-10-27)

Let us ask our Forum Admins to create a separate sub-forum to collect all your Python script help!!! :hugs:

I love those handy scripts and used them!!!

Keep helping us!!! :wave:


ganeshxp :us: (BOB member since 2008-07-17)

My source is HANA DB


avbaby :india: (BOB member since 2009-05-09)