How to get the latest file from the folder into DS

Dear Experts,

I have a folder where users keep files with a date at the end of the filename. When my job runs, I need to get the latest file. How can I compare the dates at the end of the filenames and get the latest file into DS to run the job? Please guide me.

Thanks,
Siri


SBENT (BOB member since 2012-03-30)

Create a global variable that stores both the file name and today's date, then check in a script for a file with that name in the respective folder.

The global variable format is: FileName || to_char(sysdate(), 'YYYY/MM/DD')

Hope this helps…


kannar (BOB member since 2011-08-01)

What is the purpose of using today's date? Files can arrive in the folder on any day and at any time, and when we run the job it should pick the latest file. I asked the users to append the date at the end of each filename. Let's say I have files like the ones below in a folder:

file1_04042012.csv
file1_04062012.csv
file1_04132012.csv

My script has to pick “file1_04132012.csv” and process
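
A minimal sketch of that selection, written in Python outside DS purely to show the logic. It assumes the suffix is always MMDDYYYY (which the example names suggest, since 13 cannot be a month); the function names and the pattern are hypothetical, not part of DS.

```python
import re
from datetime import datetime

# Assumed naming pattern: <prefix>_<MMDDYYYY>.csv, as in the examples above.
NAME_PATTERN = re.compile(r"^(?P<prefix>.+)_(?P<date>\d{8})\.csv$")

def embedded_date(filename):
    """Return the date encoded at the end of the filename, or None if absent or invalid."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group("date"), "%m%d%Y").date()
    except ValueError:
        # Eight digits that do not form a real date, e.g. month 13.
        return None

def latest_by_embedded_date(filenames):
    """Pick the filename whose embedded date is most recent; None if none qualify."""
    dated = [(embedded_date(name), name) for name in filenames]
    dated = [(day, name) for day, name in dated if day is not None]
    if not dated:
        return None
    return max(dated)[1]

files = ["file1_04042012.csv", "file1_04062012.csv", "file1_04132012.csv"]
print(latest_by_embedded_date(files))  # prints file1_04132012.csv
```

The date has to be parsed rather than compared as text: with MMDDYYYY, file1_12312011.csv sorts after file1_01022012.csv as a string even though it is older.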

First of all, how can I get a file name from this folder into a global variable? Can I use a wildcard in the script or in the file format editor?

Thanks,
Siri


SBENT (BOB member since 2012-03-30)

Moderator Note: If someone can please tell me where this topic should go, I will move it. I don’t have any idea on DI.


Jansi :india: (BOB member since 2008-05-12)

Glenn, who reported this as being in the wrong forum, should be able to suggest where to move it.

Thanks for your suggestion Glenn.


SBENT (BOB member since 2012-03-30)

My concern is that the topics being posted at the top level really belong to areas covered by the sub-forums, whose purpose is to keep related conversations together in one stream.

For example, anything to do with job design, flow or techniques should probably be in the Designer sub-forum. Likewise, anything about the management console, administration or connectivity would go in that sub-forum.

In the long run I don't want to be seen as policing; it is a public community meant to generate discussion. It would just be nice to see the more generic posts about Data Services in the top-level folders and the more focused ones in the sub-forums.

Cheers

Glenn


GlennL :australia: (BOB member since 2005-12-29)

One way to get the latest file is as follows (which assumes you’re working in a Windows environment).

In a DS script, using the exec() function, run

cmd /c dir *.csv > dircsv.txt

(I think that will work, but I can't test it for you right now.) This sends a list of the file names with their last modified dates and sizes into dircsv.txt, overwriting the file. Make a DS file format for it and read it into a table.

If you actually set up the format to read the four separate columns, it'll complain about the summary lines. You can either 1) not care about that bit of slop, 2) get a 3rd party utility to generate your directory listing in a cleaner way (i.e., without summary lines), 3) configure your file format to read each line as one field and then parse the fields out in DS later, or 4) something else. (Myself, I'd take, and have taken, choice #2.)

Once you have the filenames and last modified dates in a table, the rest is easy: set up a WHILE loop, read the "latest" filename into a variable (you'll need to decide whether "latest" is determined by the file's last modified attribute or by that embedded date in the filename), use the variable as the filename of a file format that actually reads the CSV, etc.
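
As an illustration only, the "list the folder, then pick the latest by last modified date" step above can be sketched in Python. This is not DS script and not the exec() approach itself; latest_by_mtime is a hypothetical helper, and the demo files are created in a temporary folder just to make the sketch runnable. It reads the modification time directly instead of parsing dir output, so it behaves the same on Windows and Unix.

```python
import os
import tempfile
from pathlib import Path

def latest_by_mtime(folder, pattern="*.csv"):
    """Return the Path of the most recently modified matching file, or None if there is none."""
    candidates = [p for p in Path(folder).glob(pattern) if p.is_file()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)

# Demo: create three files with increasing modification times.
with tempfile.TemporaryDirectory() as demo_folder:
    for offset, name in enumerate(["a.csv", "b.csv", "c.csv"]):
        path = Path(demo_folder) / name
        path.write_text("id,value\n")
        # Set access and modification times explicitly so the order is deterministic.
        os.utime(path, (1_000_000 + offset, 1_000_000 + offset))
    print(latest_by_mtime(demo_folder).name)  # prints c.csv
```

Whether the modification time or the date embedded in the filename should decide "latest" is still a choice to make, as noted above; this sketch only covers the modification time.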

Hope somebody has a more elegant solution, though, as this seems a lot of hoopla for a simple task.

Btw, when reading data from a file, you can include the filename as a column. So, in your case, yet another approach would just be to read all the data, from all the CSV files, at once, into a table, including the filename, and then figuring out “the latest” afterwards. (Maybe that’s the “more elegant solution”!)
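
A rough sketch of that second approach, again in Python rather than DS and only to show the idea: read every CSV, tag each row with its source filename, and filter to the latest file afterwards. The column name source_file, the helper names and the MMDDYYYY suffix are all assumptions made for the example.

```python
import csv
import tempfile
from datetime import datetime
from pathlib import Path

def read_all_with_filename(folder, pattern="*.csv"):
    """Read every matching CSV into one list of dicts, adding a 'source_file' column."""
    rows = []
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                row["source_file"] = path.name
                rows.append(row)
    return rows

def file_date(filename):
    """Parse the assumed MMDDYYYY suffix, e.g. file1_04132012.csv gives 2012-04-13."""
    stamp = Path(filename).stem.rsplit("_", 1)[-1]
    return datetime.strptime(stamp, "%m%d%Y").date()

def rows_from_latest_file(rows):
    """Keep only the rows whose source file carries the most recent embedded date."""
    if not rows:
        return []
    newest = max(file_date(row["source_file"]) for row in rows)
    return [row for row in rows if file_date(row["source_file"]) == newest]

# Demo with three small files in a temporary folder.
with tempfile.TemporaryDirectory() as demo_folder:
    for name, value in [("file1_04042012.csv", "1"),
                        ("file1_04062012.csv", "2"),
                        ("file1_04132012.csv", "3")]:
        (Path(demo_folder) / name).write_text("id\n" + value + "\n")
    kept = rows_from_latest_file(read_all_with_filename(demo_folder))
    print([(row["id"], row["source_file"]) for row in kept])
```

The trade-off is that every file gets read on every run, so this is simplest when the folder stays small or old files are archived.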


JeffPrenevost :us: (BOB member since 2010-10-09)

Jeff,

DS is on a Unix server; I should have mentioned that earlier. On Windows your solution would work. Thank you so much. Can you please give me some guidance on how I can do this on Unix?

Thanks,
Siri


SBENT (BOB member since 2012-03-30)

Move this Topic to “DI: Designer and Job Design”


Tarunvig :us: (BOB member since 2011-01-18)

If you’re going to rely on the date-time encoding in the filename itself, then why not just take the solution whereby you read all the data in, from all the files, including the filename as a column, and then process “the latest” data afterwards?

If you wanted to take the first approach, write the equivalent in Unix. The "ls" command has many options; I'm sure you can figure it out.


JeffPrenevost :us: (BOB member since 2010-10-09)