The optimize SQL screen is executing the same code as the engine itself, so very likely a bug. But you can try yourself, execute the dataflow and set the trace options in the execution dialog trace_sql_reader=yes; show_sql_only=yes.
This will print all SQLs submitted by the readers into the trace log.
Even I faced similar situation.
The scenario was:
Table A from DB X
Table B from DB Y
My Query has to be like below:
select A.*, B.col3 from A,(select B.col1,B.col2,max(B.col3) as col3 from B group by B.col1,B.col2) B where
A.col1=B.col1
and A.col2=B.col2;
When I designed the job and clicked on Optimized dataflow, I got 2 separate SQL from 2 datastores (because table A and B are from different DB). The 2 queries were
select * from A;
select B.col1,B.col2,max(B.col3) as col3 from B group by B.col1,B.col2;
I gave ranks and tried couple of things, but nothing worked. I was not getting Al_variable in the query (which i was expecting). Strangely, I gave distinct for Table B query and when i saw the optimized query, I could see the Al_variable1 and Al_variable2. Is this a bug, or is this the way it works?