Post your ideas here, and see what others might have to say (would others use the suggested functionality too? If so, it might give more “weight” to the idea within BO).
===================
Here is a collection of ideas from a previous thread, updated and duplicates removed, grouped into themes.
Usability, Developer Productivity & Built-in transforms
-
My ‘enhancement’ would be for BusObj to concentrate on improving the UI of Designer - I’ve found it particularly unintuitive, clumsy, slow and, most significantly, highly unstable. The last point is very frustrating and will prevent me from using DI again, or indeed recommending it to anyone else, until the product improves. It crashes/hangs on me at least 4 or 5 times a day. As far as I can tell, it happens whenever DI needs to read from or write to the local or central repositories.
-
Reusable column-level mappings (or at least lookup()s). Lookups are a pain to code, and the UI wizard isn’t much faster. Often you’re doing the same lookup in many dataflows, say, to the Customer table. It would be nice to create a library of lookups that simply “paste” into your mapping with a click. Related: the idea of a library of lookups is very interesting; the user could build his own library and, to access it, simply click on a dedicated tab in the object library window.
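For illustration, this is the kind of call that ends up copied into dataflow after dataflow today (a sketch using the classic lookup() signature; the datastore, owner, and column names are made up):

    lookup(DS_Target.DBO.CUSTOMER, CUST_NAME, 'UNKNOWN', 'PRE_LOAD_CACHE', CUST_ID, SRC.CUST_ID)

Saving that once as, say, “Customer name lookup” and pasting it with a click would remove a lot of repetitive typing.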
-
How about a Lookup transform? It’d be similar to a Query transform, except every column mapping would have a checkbox (lookup or not), and if it’s a lookup, the transform would expedite the process of doing the lookup – make it really easy for the designer.
-
Different icons on the canvas if the output has the delete or drop & create option set
-
Copy & paste of canvas objects, both within the same canvas and between canvases. Related: more “copy/paste” of transforms and scripts, and drag-and-drop of dataflows into workflows. (see Why IF and WHILE conditional cannot be reuse? )
-
Convert embedded data flow objects to canvas objects.
-
Sort by owner within a datastore. Objects are ordered when importing metadata, but not in the object display.
-
Ability to reimport a table right within the dataflow.
-
Check in objects from the Local repository, vs. having to find them in the central repo.
-
In a Query step, you can click on a column on the right pane and the “used” source columns will get highlighted on the left pane. How about doing the reverse, too? Click on a source column and quickly see which target columns are using it.
-
If columns are used only in the join/WHERE clause of a Query (and not in a mapping), there’s no indication that they are used or mapped. A separate kind of highlight would be helpful on those source columns.
-
Improvements to version control for team development, e.g.:
- When looking at objects in the Local Object Library, be able to identify whether they are checked in or out, by whom, and the date of check-in or check-out.
- Use usernames rather than database schema names
-
ADAPT01130292 - In the central repo, doing a right-click to view History of the object – show details of check-outs (ones that weren’t Undone), not just check-ins, especially if the object is currently checked out. Prompt for Comments which get recorded at check-out time, not just check-in.
-
The ability to change a large number of mappings in a Query Transform to a constant value (say, null) in one step.
-
It would be convenient if, from within DI Designer, you could right-click on a datastore table and do not just a “view where used” but a “View Lineage/Impact”. This menu option would launch the DI web client and bring you right to the lineage/impact information for the object in question.
-
Aggregator transform to enable aggregation of columns based on grouping by selected columns.
-
More keyboard shortcuts – some things like “validate all” don’t have keyboard commands, which is bad from a 508 (accessibility) perspective as well as productivity.
Connectivity, Database support, SQL generation
-
Identity Insert - I would like to be able to use SQL Server IDENTITY columns as designed. This would allow me to parallel-load tables, and do INSERT INTO … SELECT … statements with full pushdown. Related: Key Generation transform to use the database max value, not an in-memory calculation, to increment the key for every row. Related: this would also work for Oracle with a default sequence’s NEXTVAL.
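In other words, let the generated SQL look something like this sketch (table and column names are made up), with the database assigning the surrogate key itself:

    -- SQL Server: SALE_KEY is an IDENTITY column, so it is simply omitted
    INSERT INTO dbo.FACT_SALES (CUST_ID, AMOUNT)
    SELECT CUST_ID, AMOUNT FROM dbo.STG_SALES

    -- Oracle: the same idea using a sequence
    INSERT INTO FACT_SALES (SALE_KEY, CUST_ID, AMOUNT)
    SELECT SEQ_SALES.NEXTVAL, CUST_ID, AMOUNT FROM STG_SALES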
-
Support for a RANK function.
-
Separation of JOIN conditions and WHERE conditions, to allow you to perform an outer join and a filter within a single query. (Granted, there are ways around it, but it would be nicer to be able to make one query do the job. See Outer join and data filtering and Using sub-queries.)
Better management of joins/outer joins. We spend a lot of time creating views on the source DB simply because we can’t get DI to create an efficient query.
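The distinction matters because a predicate on the outer (NULL-supplying) table behaves differently in the ON clause than in the WHERE clause. A sketch of the kind of query a single transform should be able to produce (names are made up):

    SELECT c.CUST_ID, o.ORDER_ID
    FROM   CUSTOMER c
    LEFT OUTER JOIN ORDERS o
           ON  c.CUST_ID = o.CUST_ID  -- join condition
           AND o.STATUS  = 'OPEN'     -- filter on the outer table: must live in the ON clause
    WHERE  c.REGION = 'EMEA'          -- filter on the preserved table: belongs in WHERE

With only a single WHERE tab in the Query transform, you can’t express both at once, hence all the views.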
-
Ability to turn off implicit transactions, both for targets and for SQL() script calls.
-
Fixing DECODE to translate to one CASE statement instead of a series of nested CASE statements in SQL Server.
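For example, a mapping like decode(COL = 1, 'A', COL = 2, 'B', 'C') would ideally push down as one flat CASE rather than the nested form (sketch):

    -- desired
    CASE WHEN COL = 1 THEN 'A'
         WHEN COL = 2 THEN 'B'
         ELSE 'C'
    END

    -- roughly what gets generated today
    CASE WHEN COL = 1 THEN 'A'
         ELSE CASE WHEN COL = 2 THEN 'B' ELSE 'C' END
    END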
-
Ability to push down a HAVING clause.
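That is, when a Query filters on the result of an aggregate, generate something like this (sketch with made-up names) instead of aggregating in the engine and filtering in memory:

    SELECT   CUST_ID, SUM(AMOUNT) AS TOTAL_AMOUNT
    FROM     ORDERS
    GROUP BY CUST_ID
    HAVING   SUM(AMOUNT) > 1000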
-
ADAPT01129419 - support for reading Excel 2007 native files
-
An equivalent of pushdown_sql for column mappings - i.e. just pass the text through to the database and map the result as a column.
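Something like this hypothetical column mapping (the function name and signature are invented here, mirroring the existing pushdown_sql style):

    pushdown_column('DS_SRC', 'ROW_NUMBER() OVER (ORDER BY SALE_DATE)')

DI would pass the text straight into the generated SELECT list and simply treat the result as the column’s value.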
Job migration, execution, & scheduling
-
Disabling workflows/dataflows - I would be interested in disabling objects (DF/WF) within a job with a simple right-click! This function could be very useful, especially when you have a complex WF/DF and you want to test part of it. Related: if I am not mistaken, Cognos DecisionStream has this option and shows a big red ‘X’ when any component is disabled. Informatica also has a disable-session-task option.
-
Migration - another general point would be to improve the migration of Jobs/DFs/WFs between different environments (the current method is not intuitive to me… but I’m still working with the 6.5 version, so I don’t know about XI!)
-
We should be able to “PAUSE” a job. I don’t know why we’d need this, but it could come in handy in some situations.
-
A DI job scheduler that actually takes job dependencies into consideration when you have more than one job. We have many jobs with dependencies, and we had to develop a database solution to our dependency issue. Each DI job calls a workflow in which it checks whether all jobs it depends on have run. If so, it writes its start time to the database. If not, the database function sleeps until all dependencies are met, then allows the job to continue. After the DI execution is complete, the job calls another workflow which writes its ending time to the database, thus allowing any dependent jobs currently sleeping to run. Yes, we have had situations where our cluster was quite full of sleeping jobs because a couple of jobs ran longer than normal; it has indeed brought our servers to a grinding halt before. Having job dependencies built into DI could save us some serious headaches…
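The waiting workflow boils down to a script like this rough sketch (assuming a control table JOB_CONTROL(JOB_ID, START_TIME, END_TIME) of our own design, and using DI’s sql() and sleep() functions):

    # wait until all prerequisite jobs (IDs 1 and 2 here) have recorded an end time
    while (sql('DS_ETL', 'SELECT count(*) FROM JOB_CONTROL WHERE JOB_ID IN (1, 2) AND END_TIME IS NULL') > 0)
    begin
       sleep(60000);   # re-check once a minute
    end

Built-in dependencies would make all of this plumbing unnecessary.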
-
A schedule migration process that would make migrating ETL processes from one environment to another painless.
-
Scheduling of jobs from within the DI Designer interface.
Other general functionality
-
Lookup with formulae - being able to define a formula on both sides of the lookup fields
-
In-Memory Datastore for Batch jobs - lookup to a flow. Several times I’ve created my own result set after a query or two and really need to look up the result of that from somewhere else further down the stream. I have to write it out to a table and then start a subsequent dataflow to be able to look up from it. Since this set is already in memory, I would like to be able to “fix” a position in a flow as a named set and use it as a lookup table without having to write it out (if you write it out, you only need to read it back later; if it’s already in memory and it’s a transient set, why bother?). Related: essentially, an in-memory datastore that works with Batch jobs (not just real-time). Write out the results from your first step to a “table” that lives in a datastore which is defined as being pinned in memory. Right now you can only write out to a persistent cache datastore, which is on-disk; we haven’t found the local persistent on-disk cache datastores to be any faster than re-retrieving the data from the database.
-
A new Variable Output column in the Query transform, which would hold temporary results. Much of the time we are calculating intermediate values, and to do this we need to add one more Query transform.
The concept is very similar to a VARIABLE PORT in Informatica.
Related: the ability to use a Query transform to write results to variables. If you need to set a few variables to various columns from a query, you have to do this with a SQL command in a script. This means you lose the visibility of the table having been used, for starters; secondly, you need to run the same SQL multiple times to get a different field back into a different variable.
In order to set a variable to a value from a query, you need to use a script task to set the variable = sql( datastore, query). It would be nicer to be able to use a Query transform (or similar) where the output is set to a variable instead of a field in the output stream.
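Today that means one sql() call per variable, re-running the same query each time; a sketch (datastore and table names are made up):

    $V_NAME = sql('DS_SRC', 'SELECT CUST_NAME FROM CUSTOMER WHERE CUST_ID = [$V_ID]');
    $V_CITY = sql('DS_SRC', 'SELECT CUST_CITY FROM CUSTOMER WHERE CUST_ID = [$V_ID]');

A Query transform that mapped columns straight to $V_NAME and $V_CITY would hit the database once and keep the table visible to impact analysis.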
- How about better support for IN / NOT IN in the WHERE clause of your query? One way might be to make a new function called “is_in_query” that would take parameters:
- Datastore and table to look in (including an in-memory datastore, per the idea above)
- IN or NOT IN
- Input column to be compared
- column – or formula – of the lookup table to compare to
- Filtering clauses of the lookup table (to get a subset or to match)
At run-time, DI would evaluate the datastore of the input column and the datastore of the IN column. If they were the same, it would push this down to a regular IN in the SQL. If they were different… it would get much more complicated; I suppose at that point DI could turn it into a regular lookup() and then add an in-memory filtering step. The idea here, overall, is to avoid pushdown_sql so that you keep better table impact analysis, etc.
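To make the proposal concrete, a hypothetical call might look like this (everything here is invented, following the parameter list above):

    WHERE is_in_query(DS_SRC.DBO.BLACKLIST,  # datastore and table to look in
                      'NOT IN',              # IN or NOT IN
                      SRC.CUST_ID,           # input column to be compared
                      CUST_ID,               # column of the lookup table
                      'STATUS = 1')          # filtering clause on the lookup table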
-
Flat Files: be able to define a description and other metadata for columns in a flat file definition, the same way you can for columns in database tables.
-
Flat Files: writing multiple files. When the input file is 10 million lines long, it would be good to write multiple output files of X lines each, so that we can restart the job at the point of failure when it fails halfway through. This is similar to the “commit every 10000 rows” option when writing to a DB. Our workaround is to split the input file and then use a complex looping workflow to go through all the files in the folder - it works, but it’s a headache to maintain and debug.
-
Validation transform: I’d like to be able to add a column to the output that contains my error code, so I can write it to a table for analysis or use a lookup table to obtain a description of the error. It would also be useful to add a severity code.
-
Enable Dynamic Lookup
-
In the SQL Transform, have a larger field for the Datastore name (we have some problems with long names).
-
Suppress some data type conversion warnings: mapping a 0 into a decimal(18,0) generates a conversion warning. We’re just putting a zero in the field; I’m sure that works.
Similarly, why does substr(xxx, 2, 1) generate a warning when we put the result in a varchar(1)? BODI just looks at the input column and the output column; I’m sure my substr() will return one character or NULL.
-
A months_between function [added in DI 12.0 / DS] and a trunc function like the Oracle one, to strip off the time from a datetime.
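That is, something like the following (column names are made up):

    months_between(sysdate(), HIRE_DATE)   # added in DI 12.0 / Data Services
    trunc(ORDER_DATETIME)                  # proposed: return the date with the time set to 00:00:00, like Oracle's TRUNC(date)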
-
ADAPT01130309 - CMS Collector (metadata reporting) - does not pull underlying tables if Aliased in a universe, and, does not consider table joins as a dependency. Enhancement would be for DI to record the underlying table (vs. the alias name) for full dependency checking, and, for DI to consider objects used in Joins (but not as a universe object itself) as a dependency.
-
ADAPT01117847 - Ability to query repository for post-load SQL on target tables. (Probably means overhaul of how ATL is stored in repository – or maybe an export to XML format that is more human-readable.)
-
Rather than DI abruptly crashing as it hits the 32-bit 2 GB barrier… it would be nice if, when a job exceeds 1.9 GB of memory usage, DI logged a warning in the job console, and if it exceeds 1.99 GB, DI shut the job down gracefully (i.e. with an error code rather than a crash dump). This could avoid a lot of support issues with tech support and on BOB!
dnewton (BOB member since 2004-01-30)