Tuesday, November 10, 2020

Table Partitioning in Fact tables

Table partition is to break down the big table into smaller sub-tables, called partitions. Without table distribution, although there may be still single processing node, we may get performance gain from multi-threading and parallel IO. We may achieve better performance from using a multi-cored CPU and distribute the partitions in different or parallel disk device (e.g. SAN, RAID 10, etc.).

We may also get the following benefits from table partitioning

  • Parallel processing in each partition by multithreading and parallel IO.
  • Reduced data read operations through partition elimination from the query filtering conditions
  • Data Loading on the fly through a staging table with partition switching operation
  • Data purging or archiving using partition switch / drop operations
  • Avoid indexes fragmentation in existing loaded partitions if you load data into new partition only (e.g. daily batch loading in date partitioned table)

To enjoy the above benefits, it is important to decide the proper grain and key of the partition table. Data are usually loaded, purged and archived by date or month. So, we usually choose date or month key as the partition key of the fact table. 

And be remember that we may not get any benefit or even worse if the size of the table is too small or wrong partition key is chosen. The overhead will become significant when there are lot of partitions to be processed. Too many threads may be needed and they queue to wait for the available resources to start their processes. So, partition elimination is important and it is why we should choose the partition key carefully. And for the same reason, it doesn't make sense to partition a small table. That's why, usually, dimension tables are not justified to apply table partitioning.

Some quick guidelines for fact table partitioning

  • Proper Partition Key for fact tables
    • querying perspective
      • enjoy partition elimination
      • single int/bigint column preferred
      • no or less null value
      • usually date or month key
    • operations perspective
      • same grain as the frequent operation (e.g. loading)
      • usually date or month key

No comments:

Post a Comment

Extract: Performance Tips

Below listed some common performance tips for extract queries. Extract required columns only and specify in the query, avoid select * Extrac...