Replication versus ETL

Hi everyboby,

We work with SqlServer 2000

I would like to create a new datawarehouse

1)What is the best between the Sql Server Replication or an ETL tool?

Note : Today, we use a replication system to fill a datawarehouse
but it’s not the best (slowly and heavy process)?

  1. Have you an experience in the use of ETL ?
    What is your opinion about ?

Regards
Chris


Maitre B (BOB member since 2004-09-02)

I think you have to consider a lot of variables to come up with a solution about this, I think you should as your self / team the following:

  • Does the information you need for the DWH is collected from all the tables of the database?
  • Can the company afford the storage of the transactional database? (Duplicate use of hard disk)
  • What are the frame times that we can use to extract information from the transactional database?
  • Does the company need an ODS?

Well, I think those are some questions you have to answer, but I think every situation is very different; therefore you have to analyze this carefully.


elijar (BOB member since 2002-08-26)

  1. SQL server replication will allow you to transfer data straight as 1 to 1 copy. Full stop. No data modification is possible.
    ETL on the other side allows you to collect your own datasets as you require. Most often it is used to extract from an ER (Entity Relationship) type, transform it (best) into a DM (Dimensional Model), clean, apply business logic etc and then load into a table that can be used as fact table.
    This is the main work of it over night (or day) as dimensional tables should not change much.
    ETL has the advantage that you can restrict the data sets to what is needed instead of taking along all data as the replication process does.

  2. If you have SQL Server 2000 then you already have an ETL tool - it’s called Data Transformation Services.
    It’s for free but from all ETL tools I have evaluated it is one of the most powerful but comes with a few hooks.

:yesnod: Advantages:

  • any data source set up as ODBC is available
  • easy graphic interface
  • fairly easy translation between UI and VB code if needed
  • program flow documentation

:nonod: Disadvantages:

  • can get very messy very fast without team discipline
  • no in-built documentation other than text boxes and screen prints
  • one step can only work on one data provider if not VB coded

ETL tools have a nicer UI interface and hide a bit of the complexity of the job. Worth $5000+++ ? That would be up to your requirements and skills available.
Any good database administration team with a few programming skill should be able to use DTS easily.

The advantages of a DWH, if set up correctly, are improved query times based on business process grouping of the data instead of ER data (which improves transactional processing).

The main question are your requirements.
ETL helps really if you have multiple data sources that ought to be one reporting data source. While possible to put views over the ER model this normally results in performance decreases.
Transforming the data into a DM model will result in better query times for the cost of nightly transformations.

Personally I use DTS to good effect and it saved my company around £70K within the last 2 years. And my company is not big.

Crucial to the success of my DWH project are at the moment:

  • Does the data presented to the users is shown in business logic ?
    If the users can’t find the data, they won’t use it.
  • Is all data available ?
    Certain data types in certain types of BusObj are not supported. Datetimes and nvarchar/varchar can be a pain. I am still finding out things as I go along.
  • Is there enough time in the night ?
    Looks an easy question but I find to my cost that with data transfer, transformation, integrity checks and backup I am running out of minutes for the data still to come.
  • Training on the job or skills already there ?
    This will have a big impact on how fast and how good you will be able to deploy your DWH.
    One strength I have is that I know most business processes from how they are done, how the managers think (what data they expect) and how and where the data from these processes is. It enables me to work out data presentation to the users with minimum consulting.
    Most of them have never used a BI tool so they learn to see the benefits as they “play” around.

KMB :uk: (BOB member since 2004-02-11)

Thanks for your answers

I have a last question about the refresh of the data.

When do you refresh your data ?

Does the refresh disturb the users ?

Do you destroy and recreate your table when you refresh your DW ?

My problem is that I have every time (the morning, the night) users that works ont he DW :frowning:

Regards
Chris


Maitre B (BOB member since 2004-09-02)

When do you refresh your data ?

During over night, after the database backups and it takes about 2 hours to load the data into fact tables.

Does the refresh disturb the users ?

No

Do you destroy and recreate your table when you refresh your DW ?

No, we make and “incremental load”, it means that the ETL processes take the information of the last day only.

As mentioned, your data load will depend on how important is “fresh data” to the users you have.


elijar (BOB member since 2002-08-26)

It is because, you hav’nt users during the night
or because the process of refreshment of the ETL is cheapest (in sense of CPU time)? :smiley:

Chris


Maitre B (BOB member since 2004-09-02)