This is an interesting discussion on the 2 top ranked open source ETL tools http://blogs.ittoolbox.com/bi/websphere/archives/wiki-wednesday-comparing-talend-and-pentaho-kettle-open-source-etl-tools-16294
MikeD (BOB member since 2002-06-18)
Hello,
I would like to add to this post.
As time goes by, we see more open source solutions appearing on the market. To me, a few names come to mind when talking about open source data integration: Pentaho and Talend are two. Talend has been on the market for a few years now and I personally tend to use it more often than Kettle.
That is mainly because Talend's open source Talend Open Studio lets you perform data integration without being an expert. Even though the software is powerful and robust, it is easy to use, fast to operate and free to download. Talend has an active community able to solve problems and a special debugging team that quickly changes features in updates.
[Moderator Disclosure: This person appears to work for Talend, so take that into consideration when reading this post, thanks. - Dave ]
Tuguri (BOB member since 2009-01-23)
I don't much care for FOSS, but the willy-waving contest between Talend and Pentaho in the comments on that article is worth a read for a laugh.
cashworth (BOB member since 2005-02-09)
I tried doing stuff in Kettle and it got all too Javaery and Opensourcy for me, so I did it all in SSIS quicker.
Not many enterprises I work with will touch either of them - that and they're technically not free... not in an enterprise deployment anyway.
ABILtd (BOB member since 2006-02-08)
Agreed. There's a certain mindset behind ETL tools, and it seems like these were developed by Java programmers vs. database programmers. It'd probably be easier to grasp if you had no competing-tool ETL background and were familiar with IDEs like Eclipse.
I do think Pentaho has a lot of nice little ETL widgets in it.
dnewton (BOB member since 2004-01-30)
There is an interesting performance benchmark among Talend, Pentaho & CloverETL: http://www.cloveretl.org/_upload/clover-etl/Comparison%20CloverETL%20vs%20Talend%20and%20Pentaho.pdf
pavlisd (BOB member since 2009-02-23)
Hi pavlisd,
Please tell us about your connection with the CloverETL tool. The thing is that your post looks like an advertisement, which is not allowed on BOB:
https://bobj-board.org/tos#heading--dont-advertise
Thanks.
Marek Chladny (BOB member since 2003-11-27)
I've used Kettle on two live projects in Production. No complaints and plenty of features. Some error messages are in the form of a Java stack dump, but they aren't too hard to figure out if you just scroll down. I'm not even a real Java programmer.
We even used its sample job for creating a date dimension table as a base for our data warehouse.
Namlemez (BOB member since 2005-03-14)
I don't think I have ever seen an ETL benchmark test that was fair and balanced, and this particular Clover ETL benchmark is more rigged than most. If they really wanted a fair benchmark they would have invited the other vendors to take part rather than clumsily create jobs themselves.
vincent.mcburney (BOB member since 2007-04-14)
Well I was asked to explain my relationship to CloverETL - I am the founder of the project.
Regarding the test of CloverETL, Talend & Kettle: we have also tested Informatica & DataStage using the same approach. We can't openly publish the results for these two - enough to say that Informatica and DataStage outperformed all the OSS variants (in the order mentioned).
As for the "clumsily created jobs" - not sure that this really applies here. We asked an experienced ETL developer to put together the transformations.
For sure, he had our expertise with CloverETL at hand, but if you really check the transformations - they all look quite similar - regardless of whether it is Informatica, Talend or CloverETL. If he was able to create well-performing transformations in Informatica, DataStage or Clover, why would he have failed so miserably with Talend & Kettle?
And for sure, we would welcome anybody with vast experience in Kettle or Talend to verify the results. The document provides an exact definition of what was processed in terms of data - the TPC-H benchmark is well known and represents a certain type of transformation which is usually done by an ETL tool.
pavlisd (BOB member since 2009-02-23)
I think the forum members are taking offense at the underhand way you tried to introduce "your" benchmarks into the discussion.
I initiated the thread with no connotations or intentions and have no relationship to any of the tools. If I had, I would have said so.
I have used Talend, Kettle, DataStage and SSIS, and will probably get around to having a look at CloverETL sometime, but I have to concur with the previous posts that most toolset comparisons are slanted to the point of being useless. I will still read them, and even post them, primarily for information about what's happening on that side of the toolset front, and as a possible example of how people are comparing products, as many of us end up having to make toolset selections at some point in time.
There's a very good reason why the likes of Gartner and http://www.etltool.com/ etc. will sell a product comparison - and why people will in fact buy it ...
Thanks for the document link, but I don't see a similar post by you in the Pentaho and/or Talend communities, where I think you should be posting this if you wanted a genuine technical interaction on product comparisons involving the correct skillsets and inherent product know-how.
MikeD (BOB member since 2002-06-18)
@pavlisd ... you should probably fire your Kettle "experts" ... The ExtHashJoin of Clover is not done with a sort/merge join in Kettle, it's done with a stream lookup (which is very similar to your hash join). After that you should get similar speeds between Kettle and Clover.
Any reason why stream lookup wasn't used in Kettle... ignorance or malice?
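To illustrate why the choice of join strategy matters for a benchmark, here is a minimal Python sketch of the two approaches. It is not the code of Kettle's stream lookup or Clover's ExtHashJoin; the function names, the row format (lists of dicts) and the key name are all made up for illustration. The hash-style join builds an in-memory table from the lookup input and probes it per row, while the sort/merge join has to sort both inputs first.

```python
from operator import itemgetter

def hash_join(stream, lookup, key):
    """Hash-style join (hypothetical helper): build an in-memory table from
    the lookup input, then probe it once per streamed row. No sorting needed."""
    table = {}
    for row in lookup:
        table.setdefault(row[key], []).append(row)
    return [{**match, **row} for row in stream for match in table.get(row[key], [])]

def sort_merge_join(left, right, key):
    """Sort/merge join (hypothetical helper): sort both inputs on the key,
    then walk them in step and pair up rows with equal keys."""
    left = sorted(left, key=itemgetter(key))
    right = sorted(right, key=itemgetter(key))
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right-hand rows sharing this key value.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left-hand row with that key against the whole run.
            while i < len(left) and left[i][key] == lk:
                out.extend({**r, **left[i]} for r in right[j:j_end])
                i += 1
            j = j_end
    return out
```

Both functions return the same matched rows (in a different order), but the sort/merge version pays for two sorts up front, so a job built on it would be expected to run slower than one using a lookup whenever the lookup side fits in memory.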
ogmios (BOB member since 2009-03-09)
I think this whole thread speaks to the need for standardized performance comparisons/benchmarks. Similar to the TPC-type of benchmark. I think what Clover did was an interesting idea, to base it on TPC, but their execution seemed flawed.
It seems as though one could define a benchmark pretty easily, with source and target tables, some transformation logic, etc., and then let each vendor figure out how to design an optimal job to get the data to move.
The problem is, as Clover indicated, eliminating the variables due to databases (and the database hardware) that each vendor would have.
For what it's worth, here is MS's recent claim to have top performance:
dnewton (BOB member since 2004-01-30)