I have a Dataflow that uses two tables as sources. Each table is from a different Datastore. The query transform that uses both of the tables has an outer join in it.
What I’m seeing is that DI is generating a single SQL statement that has BOTH tables in it. It’s blowing up, of course: Oracle returns a “Table does not exist” error.
I’ve tried setting different join ranks, cache/no-cache, etc.
DI should be generating two queries unless you specified that they are linked datastores in the properties of the datastore. If you did not, I would open a ticket with DI Support immediately.
I tweaked a lot of settings yesterday trying to get this to work. I think I changed the linked datastores, but just to make sure: this setting is on the properties of the Dataflow, right?
Nope. I was actually talking about the properties of the Datastore. Right click, Edit the datastore, then click the Advanced button and scroll down to Linked Datastores.
Neither of the Datastores involved had the Linked Datastore property checked on.
When I do the “Display Optimized SQL” I see only one Datastore in the left frame and the SELECT statement on the right frame has both tables/Datastores in there.
While in the Dataflow I can click on the magnifier button to view the data for each table and I see the correct results.
I think I understand what is going on. Both Datastores involved in this Dataflow use the same TNSNames entry. Data Integrator is making a fatal assumption: because they both use the same TNSNames entry, it assumes that connecting to one Oracle schema will give access to the other. These two schemas are distinctly different, and they may not even exist on the same server in the future.
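To make the failure concrete, here is a sketch of the kind of single pushed-down statement DI generates when it assumes the two datastores are linked. The schema, table, and column names are made up for illustration; they are not from the actual job:

```sql
-- Single query DI pushes down, referencing both schemas at once
-- (SCHEMA_A, SCHEMA_B, and the table/column names are hypothetical):
SELECT a.CUST_ID, b.ORDER_ID
  FROM SCHEMA_A.CUSTOMERS a
  LEFT OUTER JOIN SCHEMA_B.ORDERS b
    ON a.CUST_ID = b.CUST_ID;
-- Logged in as SCHEMA_A with no grant on SCHEMA_B's tables, Oracle
-- raises ORA-00942: table or view does not exist.
```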
I added a new TNSNames entry that uses the same connection information and switched one of the Datastores to use the new entry. Then everything started working right. Grrr.
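For reference, the workaround amounts to duplicating the existing TNS entry under a second alias, so DI no longer sees the two datastores as sharing a connection. A minimal tnsnames.ora sketch, where the host, port, and service name are placeholders:

```
# Original alias used by the first Datastore (connection details are
# placeholders, not from the real environment):
PROD_A =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost.example.com)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = prodsvc))
  )

# Identical connection details under a new alias for the second
# Datastore; DI now treats them as unrelated and generates two queries.
PROD_B =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost.example.com)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = prodsvc))
  )
```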
Ahhhh. Performing as designed. There are other folks who use this feature for logical separation of physical schemas. And if the TNS entry is the same, I was pretty sure the two cannot reside on different servers (since the TNS entry specifies the server name, port, and service)?
Right. But in this case connectivity to one schema doesn’t assure you of the ability to query the other schema. In my case being logged in to schema A does not give you any rights to query from schema B.
My options are:
Keep using two TNSNames entries
Set up a login that has rights to query both schemas
Figure out why DI is doing this when all the options I’ve set are supposed to tell it NOT to do it that way.
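If option 2 were chosen, the setup would look something like the following. The schema and table names are hypothetical, and the statements would be run as the owning schema or a DBA:

```sql
-- Let the login used by the Datastore (here SCHEMA_A) query
-- SCHEMA_B's tables, so a single pushed-down join can succeed:
GRANT SELECT ON SCHEMA_B.ORDERS TO SCHEMA_A;

-- Optionally, a synonym so references resolve without qualifying
-- the schema (requires CREATE SYNONYM privilege):
CREATE SYNONYM SCHEMA_A.ORDERS FOR SCHEMA_B.ORDERS;
```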
Within the DI Tech Manual it has this description for the job server option “Use Explicit Database Links”:
“Jobs with imported database links normally will show improved performance because Data Services uses these links to push down processing to a database. If you set this option to FALSE, all data flows will not use linked datastores. The use of linked datastores can also be disabled from any data flow properties dialog. The data flow level option takes precedence over this Job Server level option.”
However, I don’t think this applies in this situation. I’m still looking for other switches to flip.
I would call support and ask if there is perhaps a DSConfig setting for this. But using two TNS names seems to me to be both a harmless and easy-to-maintain solution. Why do you find it unworkable?
I don’t like the two TNSNames entry work around because it shouldn’t be necessary and I’ll have to update the TNSNames file for all developers and job servers.
Okay. Anyone else know a workaround? I would still ping support, they sometimes know of DSconfig entries that are not documented that help stuff like this.
Jim, could you please send me screenshots of the two datastores, the dataflow, and the Display Optimized SQL output? I want to confront the developer with that. They have been telling me we’ve had this behaviour for years.
The current behaviour is that if the tnsnames entry AND the effective user (after aliasing) are the same, then we implicitly link the datastores. The desired behaviour is that the user does not matter, but the dataflow property “use database link” is considered.
In 12.0.0.0 there is a bug: if aliases are used, even to alias the user to the same username, implicit database links are not used. Try that.
The TNSNames entry is the same but the user is definitely different. As far as I know I’m not using any aliasing - I’ll have to look that up. I had played around with the Dataflow property “use database link” and I had the same problem with both the ON and OFF setting.
I’ll have to get back to you on screen shots. I’m not at the client site today.
I believe this is the same issue I struggled with in my first posts. If memory serves me, the point at which it breaks is when you alias the datastore. I, on the other hand, want BODI to push down one query, since both datastores are readable by each user…
As of 11.7, datastore aliases break pushdown of a query accessing two datastores using the same tns name.
In 12.1 we use the “effective user” to decide if we pushdown or not.
So no matter what the table owner is according to the repo, after the alias got applied we check if the owners are the same and if they are, we pushdown, otherwise we don’t.
Not too happy with that constraint, because maybe the users are different but I have set up the grants so that one user can see the other user’s tables. I have asked development to ignore the user but honour the dataflow flag “use database links”: if that is set, we implicitly push down; otherwise we don’t.