(Here’s a reorganized list, as of January 2010. Wdaehn, if any of these have already been done, let us know and we can remove them from this post.)
Built-in transforms and Functions
- How about a Lookup transform? It would be similar to a Query transform, except that every column mapping would have a checkbox (lookup or not); if lookup, the transform would streamline the process of defining the lookup and make it really easy for the designer.
- An Aggregator transform, to enable aggregation of columns based on grouping of selected columns.
- Some of the Table Compare options should be settable through global variables, to allow dataflows that handle both incremental loading and full re-loads. In particular, if the “Detect Deleted Rows” and “Comparison Method” options could also read a variable, a single dataflow could do both an incremental load (no deletion detection; row-by-row comparison) and a full re-load (deletion detection; cached comparison table). Without this, you need two separate dataflows.
- A new transform, something different than History Preserve, more like “Existing Column Preserve”, that does the following:
- If the data in the target table has a value, leave it alone
- If the data in the target table is null, then fill in the incoming value
In this transform, you could click on each column and the tab/dialog area below would have options for:
- Normal (pass-through)
- Preserve if not null
- User-specified condition (and you could put in your own WHERE-like criteria)
This is useful for some columns where you always want to preserve a value like “initial load date”, even if the row is later Updated (rather than inserted). But you want to have the same dataflow, and not have to resort to bringing in the target table as a source. In fact, now that I think of it, this could be added into the Table Compare transform. Since you already have to read the entire target table’s row for that transform to work, you could have another tab in the table compare that lets you specify this column-by-column behavior.
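The per-column behavior described above can be sketched in a few lines. This is an illustrative sketch only, not DI code; the mode names and the `merge_row` helper are hypothetical:

```python
# Sketch of the proposed per-column "preserve" behavior, assuming the
# transform can see both the incoming row and the existing target row
# (as Table Compare already does). Mode names are hypothetical.
PASS_THROUGH = "pass-through"
PRESERVE_IF_NOT_NULL = "preserve-if-not-null"

def merge_row(incoming, existing, column_modes):
    """Return the row to write, applying per-column preserve rules."""
    result = {}
    for col, new_value in incoming.items():
        mode = column_modes.get(col, PASS_THROUGH)
        old_value = existing.get(col) if existing else None
        if mode == PRESERVE_IF_NOT_NULL and old_value is not None:
            result[col] = old_value   # keep e.g. the original load date
        else:
            result[col] = new_value   # normal mapping wins
    return result
```

A "user-specified condition" mode would just swap the `old_value is not None` test for an arbitrary predicate on the two rows.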
- A “Top” transform: I had a requirement to compute an aggregated fact table and only show the top customers or top products in the result. This was a minor pain to do, as it required several Query steps in succession:
- Aggregate the measures
- Sort by the customer/product and then by the Revenue, in DESC order
- Add a “gen_row_num_by_group” column
- Then filter out just the top 10.
If there were a “Top” transform, it could do all of the steps above in one step: just specify which columns are the dimensions, which are the measures, whether you want the highest or lowest values, and how big the TOP should be.
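The four steps above collapse into one operation. A minimal sketch of that combined behavior (illustrative only; `top_n` and its parameters are hypothetical, and the input is assumed to be already aggregated to the right grain):

```python
# One-step sketch of the proposed "Top" transform: group by the dimension,
# sort each group by the measure, and keep the top N rows per group --
# i.e. the aggregate / sort / gen_row_num_by_group / filter chain.
from collections import defaultdict

def top_n(rows, dim, measure, n, highest=True):
    """rows: list of dicts already aggregated to (dim, measure) grain."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dim]].append(row)
    out = []
    for members in groups.values():
        members.sort(key=lambda r: r[measure], reverse=highest)
        out.extend(members[:n])     # replaces gen_row_num_by_group + filter
    return out
```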
- The Merge transform should have a radio button for “Merge All” versus “Merge Distinct”, similar to UNION ALL vs. UNION in SQL itself. Yes, you can add a Query transform after the Merge with the Distinct checkbox checked, but a radio button would make for a simpler dataflow design (fewer steps) and would make it clearer to a new DI user what was going on.
- Lookup with formulae: being able to define a formula on both sides of the lookup fields.
- How about better support for IN / NOT IN in the WHERE clause of your query? One way might be a new function called “is_in_query” that would take these parameters:
- Datastore and table to look in (including an in-memory datastore from #11 above)
- IN or NOT IN
- Input column to be compared
- column – or formula – of the lookup table to compare to
- Filtering clauses of the lookup table (to get a subset or to match)
At run-time, DI would evaluate the datastore of the input column and the datastore of the IN column. If they were the same, it would push this down to a regular IN in the SQL. If they were different, it would get much more complicated; I suppose at that point DI could turn it into a regular lookup() and then add an in-memory filtering step. The overall idea is to avoid Pushdown_SQL, so that you get better table impact analysis, etc.
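The in-memory fallback path for the different-datastore case could look something like this. A hedged sketch, not DI's API; `is_in_query` and `filter_rows` are hypothetical names:

```python
# Sketch of the proposed is_in_query() fallback: when the two datastores
# differ and no SQL pushdown is possible, fetch the (filtered) lookup
# column once, then filter the data stream in memory.
def is_in_query(value, lookup_values, negate=False):
    """Emulate IN / NOT IN against a pre-fetched lookup column."""
    found = value in lookup_values
    return not found if negate else found

def filter_rows(rows, column, lookup_values, negate=False):
    lookup_set = set(lookup_values)       # one fetch, then O(1) probes
    return [r for r in rows if is_in_query(r[column], lookup_set, negate)]
```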
- Validation transform: I’d like to be able to add a column to the output that contains my error code, so I can write it to a table for analysis or use a lookup table to obtain a description of the error. It would also be useful to add a severity code.
DI Designer Usability and Developer Productivity
- Reusable column-level mappings (or at least lookup()s). Lookups are a pain to code, and the UI wizard isn’t much faster. Often you’re doing the same lookup in many dataflows, say, to the Customer table. It would be nice to create a library of lookups which could simply be pasted into your mapping with a click. Related: the idea of a library of lookups is very interesting; maybe the user could build his own library and access it by clicking a dedicated tab in the object library window.
- Different icons on the canvas if the output table has the Delete or Drop & Create option set.
- Convert embedded data flow objects to canvas objects.
- Sort by owner within a datastore. Tables are ordered by owner when importing metadata, but not in the object display.
- Ability to reimport a table right within the dataflow: right-click on a table and choose Reimport.
- In a Query step, you can click on a column in the right pane and the “used” source columns get highlighted in the left pane. How about the reverse, too? Click on a source column and quickly see which target columns use it.
- If columns are used only in the join/WHERE clause of a Query (and not in a mapping), there’s no indication that they are used or mapped. A separate kind of highlight on those source columns would be helpful.
- The ability to change a large number of mappings in a Query transform to a constant value (say, NULL) in one step.
- More keyboard shortcuts. Some things like “validate all” don’t have keyboard commands, which is bad from a Section 508 (accessibility) perspective as well as for productivity.
- I’d REALLY like an “undo” feature for those times when I want to back out a series of changes I’ve made to a flow; or at least an undo of the last step/change, if a multi-level undo is too difficult.
- I wish Designer handled database connection interruptions more gracefully. I’m in and out of VPN connections all day long (I’m a roving consultant), on and off networks intermittently, and sometimes I just kick my Ethernet patch cable out of the wall. Any disconnection from the network and you are totally stuck. We recently went through an intermittent network glitch that would drop everyone’s connection. The Oracle and SQL Server DBAs around me would just hit their Reconnect buttons, while I got to watch Designer crash and burn. I was saving work every few seconds during that period, or cursing myself for neglecting to when I was in “deep development mode”.
- How about the ability to have multiple instances of Designer open at the same time, each pointing to a different repository? For example, I’m in the development environment and want to look at something in the production repository. Currently I have to exit Designer, restart it, point it at the prod system, look at what I need, then lather-rinse-repeat to go back to my development environment.
- The graphic readability of dataflows would be enhanced if the lines from the inner source(s) to a Query containing an outer join could be displayed as dashes (‘-------’) or dots (‘…’), while the lines from the outer source remained solid, as now. This would avoid having to click into the Query’s Outer Join tab to discover the nature of the join. Ideally, this would be configurable in Tools > Options.
Connectivity, Database support, SQL generation
- Identity insert: I would like to be able to use SQL Server IDENTITY columns as designed. This would allow me to parallel-load tables and do INSERT INTO … SELECT … statements with full pushdown. Related: the Key Generation transform could use the database max value, not an in-memory calculation, to increment the key for every row. Related: this would also work for Oracle with a default ‘sequence’.NEXTVAL.
- Support for a RANK function with pushdown.
- Separation of JOIN conditions and WHERE conditions, to allow you to perform an outer join and a filter within a single query. (Granted, there are ways around it, but it would be nicer to make one query do the job; see the “Outer join and data filtering” thread.)
- Better management of joins/outer joins. We spend a lot of time creating views on the source database simply because we can’t get DI to generate an efficient query.
- Move Join Rank to the query, instead of (or in addition to) the source table setting.
- Easier ability to turn off implicit transactions, both for targets and for sql() script calls.
- Fix DECODE to translate to one CASE statement instead of a series of nested CASE statements in SQL Server.
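The desired translation can be shown with a small generator. A sketch of the idea only; `decode_to_case` is a hypothetical helper, not anything in DI, and it assumes Oracle-style decode(expr, search1, result1, …, default) arguments already rendered as SQL literals:

```python
# Build ONE flat searched-CASE statement from decode-style arguments,
# instead of the nested CASEs that DI currently generates for SQL Server.
def decode_to_case(expr, *args):
    """decode(expr, s1, r1, s2, r2, ..., [default]) -> single CASE string."""
    pairs, default = args, None
    if len(args) % 2 == 1:                    # trailing argument is the default
        pairs, default = args[:-1], args[-1]
    whens = [
        f"WHEN {expr} = {pairs[i]} THEN {pairs[i + 1]}"
        for i in range(0, len(pairs), 2)
    ]
    else_part = f" ELSE {default}" if default is not None else ""
    return "CASE " + " ".join(whens) + else_part + " END"
```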
- Ability to push down a HAVING clause.
- ADAPT01129419 - Support for reading native Excel 2007 files.
- An equivalent of pushdown_sql for column mappings, i.e. just pass the text through to the database and map the result as a column.
- The ability to specify an Oracle Home to use for a datastore would be useful. If the server is Oracle 10g but I need to connect to an older Oracle server, and I install an Oracle 8.1.7 client, it would be nice to be able to stipulate which Oracle Home the SQL*Net connection uses.
- I would really like to be able to use Active Directory as a data source/target! I have several ETLs that need to extract data from Active Directory, modify it, then reload it into AD. As it currently stands, I had to write a VB script to read AD into a CSV, then read that. To load AD, I had to write another DF to create an LDIFDE upload input file and then execute LDIFDE within the WF.
Lineage and Impact Analysis
- It would be convenient if, from within DI Designer, you could right-click on a datastore table and do not just a “view where used” but a “view lineage/impact”. This menu option would launch the DI web client and bring you right to the lineage/impact information for the object in question. I’d also like to see the entire lineage when I do a “view where used” for an object. For example, if a source table is used in a few dataflows, each dataflow is used in a couple of workflows, and those are used in a couple of jobs, I’d like to see the entire “tree” and navigate up and down its branches, without having to go all the way back to the beginning (the data source) to explore another path.
- Have each DI transform considered an object inside a dataflow, so that you could do a “view where used” on a particular transform and find that too.
- How about “view where used” for variables!
- Ability to do a “show where used” on an object in the central repo, and have it list the other places in the central repo where it is used (not the local repo).
Source code management
- Check in objects from the local repository, vs. having to find them in the (much longer) central repo list. You can do an “Add” from the local repo, but not a check-in.
- When looking at objects in the Local Object Library, be able to identify whether they are checked in or out, by whom, and the date of check-in or check-out. Also, use usernames rather than database schema names when showing who has checked something out.
- ADAPT01130292 - In the central repo, when right-clicking to view an object’s History, show details of check-outs (ones that weren’t undone), not just check-ins, especially if the object is currently checked out. Prompt for comments to be recorded at check-out time, not just check-in.
- The ability to drag a dataflow from the central repo window onto a job or workflow workspace, and have DI prompt you with the check-out options at that time, rather than making you go to the central repo, check out there, then go back to your local repo window, find the object, and drag that into your workflow.
- Give a super-admin the ability to undo a checkout in the central repo. At the moment, you have to hack the database with SQL scripts, which is a time-waster and is dangerous.
- Ability to tag objects in the central repo as “deprecated” (unused/retired), which would show them greyed out. (And if you did a “show where used” and the deprecated object was still being referenced, the dialog box would note it as such.)
- The ability to display the date-created / date-modified attributes for an object in the Local Object Library or central repo (in the list of column headers), and sort by them, so I can find what has most recently been worked on, or new additions, without going through each item’s properties!
- Add additional attributes to objects, such as category, and display/sort/filter by these in the Local Object Library (which also covers the grouping mentioned by previous posters).
Job migration, execution, & scheduling
- Disabling workflows/dataflows: I would be interested in disabling objects (DF/WF) within a job with a simple right-click! This function could be very useful, especially when you have a complex WF/DF and you want to test part of it. Related: if I am not mistaken, Cognos DecisionStream has this option and shows a big red ‘X’ when a component is disabled. Informatica also has a disable-session-task option.
- A schedule migration process that would make migrating an ETL process from one environment to another painless.
- Scheduling of jobs from within the DI Designer interface.
- The facility to lock a datastore so that it won’t be overwritten by an ATL import; or at least an “Are you sure?” prompt.
- A DI job scheduler that actually takes job dependencies into consideration when you have more than one job. We have many jobs with dependencies, and had to develop a database solution to the problem. Each DI job calls a workflow in which the job checks whether all the jobs it depends on have run. If so, it writes its start time to the database; if not, the database function sleeps until all dependencies are met, then allows the job to continue. After execution is complete, the job calls another workflow which writes its ending time to the database, thus allowing any dependent jobs currently sleeping to run. Yes, we have had situations where our cluster was quite full of sleeping jobs because a couple of jobs ran longer than normal; it has indeed brought our servers to a grinding halt before. Having job dependencies built into DI could save us some serious headaches…
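The wait-and-record protocol described above can be sketched compactly. This is an illustration of the workaround's logic, with the database replaced by an in-memory dict; `run_job` and the table it stands in for are hypothetical:

```python
# Sketch of the poll-until-dependencies-complete protocol: a job sleeps
# until every job it depends on has recorded an end time, then runs and
# records its own end time so that its dependents can proceed.
import time

completed = {}        # job name -> end time; stands in for the status table

def dependencies_met(deps):
    return all(dep in completed for dep in deps)

def run_job(name, deps, body, poll_seconds=0.01, timeout=1.0):
    """Block until all deps have finished, then run body and record it."""
    waited = 0.0
    while not dependencies_met(deps):
        time.sleep(poll_seconds)       # the "sleeping job" described above
        waited += poll_seconds
        if waited > timeout:
            raise TimeoutError(f"{name}: dependencies never completed")
    body()
    completed[name] = time.time()      # unblocks any waiting dependents
```

The cluster-full-of-sleeping-jobs failure mode is visible right in the sketch: every waiting job burns a slot in the polling loop, which is exactly why a native dependency graph in the scheduler would help.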
- Related, job chaining: the ability to call jobs on the same or different job servers from other jobs, with success, fail, and waiting statuses and associated error alert/reporting options. This would allow master/child jobs; for some of the EDWs I’ve built, it just isn’t possible to stuff everything into a single monster job. You can already do this using command-line execution, but it’s quite some work to develop a framework around it, and it would be very useful if Data Services offered this natively.
Other general functionality
- In-memory datastore for batch jobs (lookup to flow): several times I’ve created my own result set after a query or two and really needed to look up against that result further down the stream. Today I have to write it out to a table and then start a subsequent dataflow to be able to look up from it. Since the set is already in memory, I would like to be able to “fix” a position in a flow as a named set and use it as a lookup table without writing it out (if you write it, you only need to read it back later; if it’s already in memory and it’s a transient set, why bother?). Related: essentially, an in-memory datastore that works with batch jobs (not just real-time). Write out the results from your first step to a “table” that lives in a datastore which is pinned in memory. Right now you can only write to a persistent cache datastore, which is on disk; we haven’t found local persistent on-disk cache datastores to be any faster than re-retrieving the data from the database.
- A new variable output column in the Query transform, which would hold temporary results. Most of the time we are calculating intermediate values, and to do this we need to add one more Query transform. This is a concept very similar to the variable port in Informatica.
- Ability to use a Query transform to write results to variables. If you need to set a few variables to various columns from a query, you have to do this with a sql() command in a script. This means you lose the visibility of the table having been used, and you need to run the same SQL multiple times to get a different field back into each variable. To set a variable to a value from a query today, you use a script task with variable = sql(datastore, query). It would be nicer to use a Query transform (or similar) where the output is set to a variable instead of a field in the output stream.
- Flat files: be able to define a description and other metadata for columns in a flat file definition, the same way you can for columns in database tables.
- Flat files: writing multiple files. When the input file is 10 million lines long, it would be good to write multiple output files of X lines each, so that we can restart the job at the point of failure when it fails halfway through. This is similar to “commit every 10,000 rows” when writing to a database. Our workaround is to split the input file and then use a complex looping workflow to go through all the files in the folder; it works, but it is a headache to maintain and debug.
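The splitting half of that workaround is simple enough to sketch. A minimal illustration (the `split_file` helper and the chunk naming pattern are made up for this example):

```python
# Split one large input file into numbered chunk files of at most
# lines_per_file lines each -- the "multiple output files of X lines"
# behavior requested above, done outside the tool.
def split_file(src_path, lines_per_file, dst_pattern="chunk_{:04d}.txt"):
    """Returns the list of chunk file names written."""
    written, buf, part = [], [], 0
    with open(src_path, "r", encoding="utf-8") as src:
        for line in src:
            buf.append(line)
            if len(buf) == lines_per_file:
                name = dst_pattern.format(part)
                with open(name, "w", encoding="utf-8") as dst:
                    dst.writelines(buf)
                written.append(name)
                buf, part = [], part + 1
    if buf:                                   # final partial chunk
        name = dst_pattern.format(part)
        with open(name, "w", encoding="utf-8") as dst:
            dst.writelines(buf)
        written.append(name)
    return written
```

The restart-at-point-of-failure benefit comes from the numbered names: on rerun, the looping workflow can skip chunks that already loaded successfully.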
- Flat files: a template flat file, like a template table or template XML.
- Flat files: ability to automatically ignore columns to the right of those defined in the flat file format.
- In the SQL transform, a larger field for the datastore name (we have some problems with long names).
- Suppress some data type conversion warnings: mapping a constant 0 into a decimal(18,0) column, for example, generates a conversion warning.
- A trunc function like the Oracle one, to strip the time from a datetime.
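For reference, what the requested function would do, assuming Oracle's default TRUNC(date) semantics of keeping the date and zeroing the time:

```python
# Sketch of the requested trunc(): keep the date, drop the time-of-day,
# like Oracle's TRUNC(date) with the default 'DD' format.
from datetime import datetime

def trunc(dt):
    """Strip the time portion from a datetime."""
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)
```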
- ADAPT01130309 - The CMS Collector (metadata reporting) does not pull underlying tables if they are aliased in a universe, and does not consider table joins as a dependency. The enhancement would be for DI to record the underlying table (vs. the alias name) for full dependency checking, and to treat objects used in joins (but not as universe objects themselves) as dependencies.
- ADAPT01117847 - Ability to query the repository for post-load SQL on target tables. (This probably means an overhaul of how ATL is stored in the repository, or maybe an export to a more human-readable XML format.)
- Rather than DI abruptly crashing as it hits the 32-bit 2 GB barrier: if a job exceeds 1.9 GB of memory usage, it would be nice if DI logged a warning in the job console; and if it exceeds 1.99 GB, DI should shut the job down gracefully (i.e. with an error code rather than a crash dump). This could avoid a lot of issues with tech support and on BOB!
- Create, and include in the DI installation, a BO universe for reporting on the DI repository. This would seem like a slam-dunk integration point between these two BO products…
Management Console (Web Admin)
- It sure would be nice to be able to use the Management Console for everything. Once the job server and Management Console have been installed and are up and running, I would like to be able to use the Management Console for:
- Creating the repository or even upgrading
- Configuring the job server to talk to the repository
- Importing an ATL file to the repository - no more designer required for production.
- Selecting System Configuration settings.
I would like to see all DI configuration and administration available within the Web Admin. This would include:
- DSCONFIG
- Job Server setup
- Repo maintenance/setup
- Calculating dependencies/mappings
- Ability to stop/start DI services
- Since we’re asking for the moon, let’s also have this rolled into the BO Enterprise CMC entirely, so that there’s a single security model across all BO products. That way we could use AD authentication and not have to maintain separate account schemes, and let the BOE installation process worry about web/app servers, Java dependencies, etc.
- While using the web-based Management Console to look at batch job status, I’d like a hover-over menu on the job name, with options for:
- Show History - this would link you over to the Operational Job Execution History page (/odreports/odJobList.do) with the results already filtered to that particular job
- Auto Documentation - this would link you over to the Auto Doc section (/smartdoc.jsp) with that job already chosen
dnewton (BOB member since 2004-01-30)