BusinessObjects Board

200 million records in Fact table !!

Hello Experts,

We have recently deployed a universe to our live environment.
Currently we have around 1 million records in the fact tables, but in future this is expected to grow to around 200 million records.

Could anyone please let me know whether my reports would run successfully with this huge amount of data?

What measures or changes should we make in the database or the universe to handle this huge amount of data so that our reports run successfully?

Regards,
Priyanka


priyankt :india: (BOB member since 2009-07-30)

Well, it depends on whether you want all 200 million records returned to your report. One would seriously hope not. I would be inclined to believe that your report would simply fail to open.

If it is required that some sort of calculation occur on all 200 million rows, then I would suggest some form of aggregation of the data, using sums on measures, and I would even consider using the aggregate_aware function.
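To illustrate the aggregate_aware idea: a measure object in the universe can be defined with @Aggregate_Aware so that queries hit a pre-summarised table whenever the dimensions in the query allow it, and only fall back to the detail fact table when they have to. This is only a sketch; the table and column names are invented for the example, and you would still need to set up aggregate navigation (incompatible objects) in Designer.

```
-- Universe measure definition (Designer SELECT), not plain SQL.
-- Arguments run from the most aggregated table to the detail fact table;
-- AGG_SALES_DAY and FACT_SALES are assumed names for illustration.
@Aggregate_Aware(
    sum(AGG_SALES_DAY.SALES_AMOUNT),
    sum(FACT_SALES.SALES_AMOUNT)
)
```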


plessiusa :netherlands: (BOB member since 2004-03-22)

This can help:


Marek Chladny :slovakia: (BOB member since 2003-11-27)

First thing to review is your table design.
The fact table will perform best when joined to its dimensions if:
All joins are inner joins
All join columns are integers
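For example, a minimal star-schema sketch along those lines (all table and column names are my own, invented for illustration) uses integer surrogate keys on each dimension and matching NOT NULL integer foreign keys on the fact table, so every join can be an inner join on integers:

```
-- Illustrative sketch only: names are assumptions, not from the original post.
CREATE TABLE DIM_DATE (
    DATE_KEY        INTEGER PRIMARY KEY,   -- integer surrogate key, e.g. 20091210
    CALENDAR_DAY    DATE NOT NULL
);

CREATE TABLE FACT_SALES (
    DATE_KEY        INTEGER NOT NULL REFERENCES DIM_DATE (DATE_KEY),
    PRODUCT_KEY     INTEGER NOT NULL,      -- FK to DIM_PRODUCT (not shown)
    BRANCH_KEY      INTEGER NOT NULL,      -- FK to DIM_BRANCH (not shown)
    SALESPERSON_KEY INTEGER NOT NULL,      -- FK to DIM_SALESPERSON (not shown)
    SALES_AMOUNT    DECIMAL(15,2)
);
```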

The three things that are the biggest wins for you are:
1/ Indexing - creating and maintaining indexes on such a table will be essential to performance. You are probably already seeing a slight performance dip if you haven’t been indexing. Index all join columns and any columns that are often used in conditions.
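For instance, indexes on the join keys of the fact table sketched above might look like this (generic SQL, column names assumed; your DBMS may offer bitmap or other index types better suited to a warehouse):

```
-- Index every foreign key used in joins (names follow the earlier sketch).
CREATE INDEX IX_FACT_SALES_DATE    ON FACT_SALES (DATE_KEY);
CREATE INDEX IX_FACT_SALES_PRODUCT ON FACT_SALES (PRODUCT_KEY);
CREATE INDEX IX_FACT_SALES_BRANCH  ON FACT_SALES (BRANCH_KEY);
-- ...plus an index on any column that regularly appears in report conditions.
```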

2/ Partitioning - creating and maintaining sensible partitions is essential. I’ve worked on a 3.2 billion row fact table that was partitioned at a day level (c. 2.5 million rows per day). Most major vendors support partitioning these days.
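To make that concrete, here is a hedged sketch of daily range partitioning in Oracle syntax (other vendors use different keywords, and the table, partition names and dates are invented; it is simply a partitioned variant of the earlier fact table sketch):

```
-- Oracle-style daily range partitioning; illustrative only.
CREATE TABLE FACT_SALES_PART (
    SALE_DATE    DATE    NOT NULL,
    DATE_KEY     INTEGER NOT NULL,
    PRODUCT_KEY  INTEGER NOT NULL,
    BRANCH_KEY   INTEGER NOT NULL,
    SALES_AMOUNT DECIMAL(15,2)
)
PARTITION BY RANGE (SALE_DATE)
(
    PARTITION P20091201 VALUES LESS THAN (TO_DATE('2009-12-02', 'YYYY-MM-DD')),
    PARTITION P20091202 VALUES LESS THAN (TO_DATE('2009-12-03', 'YYYY-MM-DD'))
    -- ...one partition per day, so a query restricted to a date range
    -- only reads the relevant partitions (partition pruning).
);
```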

3/ Aggregates - look at which queries don’t need the full level of granularity that the fact table provides - is there something that they all summarise to? Say you load 200,000 rows a day: how many rows would that become if, for example, you aggregated to a daily level?

If you are recording branch, product, salesperson and date/time for sales transactions, do a simple check to see how many unique rows there are for a given day and see how many rows there would be if you aggregated to date level. You should be looking at a reduction of at least 90% in the number of rows - e.g. 200,000 down to a maximum of 20,000 rows per day.
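As a sketch of that daily aggregate, and of the "simple check" just described (names are assumptions carried over from the earlier sketches, and the date key value is invented), the summary table could be built straight from the detail fact table:

```
-- Daily aggregate built from the detail fact table; names are assumptions.
CREATE TABLE AGG_SALES_DAY AS
SELECT   DATE_KEY,
         BRANCH_KEY,
         PRODUCT_KEY,
         SALESPERSON_KEY,
         SUM(SALES_AMOUNT) AS SALES_AMOUNT,
         COUNT(*)          AS TRANSACTION_COUNT
FROM     FACT_SALES
GROUP BY DATE_KEY, BRANCH_KEY, PRODUCT_KEY, SALESPERSON_KEY;

-- The simple check for one day: how many detail rows collapse into the aggregate?
SELECT COUNT(*) AS aggregated_rows
FROM  (SELECT DISTINCT BRANCH_KEY, PRODUCT_KEY, SALESPERSON_KEY
       FROM   FACT_SALES
       WHERE  DATE_KEY = 20091201) d;
```

This is also the kind of summary table an aggregate_aware measure in the universe would point to, so that reports only hit the detail fact table when they genuinely need transaction-level data.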

Regards,
Mark

Hi Mark,

Can you please explain the Partitioning part again? I didn’t understand what is meant by c. 2.5 million rows per day etc. I have never done this before and thought it may be useful to know for future :slight_smile:


Smith85 (BOB member since 2009-12-10)

c. means approximately; it’s an abbreviation of the Latin word circa.

This Wikipedia article links on to the partitioning documentation for the relevant DBMS for you:

And…

4/ Regular collection of statistics on the relevant table columns. There has to be a careful balance between the time it takes to collect the stats and the improvement in query performance. I’m working with day-level fact tables holding 6 billion records, and if the stats are missing the impact on query times is massive.
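For example, on Oracle this is typically done with DBMS_STATS; PostgreSQL's ANALYZE and SQL Server's UPDATE STATISTICS play a similar role on those platforms. The schema and table names below are assumptions, not from the original post:

```
-- Oracle: gather optimizer statistics on the fact table and its indexes.
BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(
        ownname          => 'DWH',
        tabname          => 'FACT_SALES',
        estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,  -- sample instead of a full scan
        cascade          => TRUE                          -- also gather index statistics
    );
END;
/
```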


anorak :uk: (BOB member since 2002-09-13)