Wednesday, October 14, 2020

Junk Dimension

Junk Dimension is also known as Garbage Dimension. There may be a lot of miscellaneous, low-cardinality flags and indicators (e.g. TransactionType, TransactionStatus, VoidFlag, ActiveFlag, etc.) in a transaction. 

To deal with this challenge, the first potential solution is to make separated tiny dimensions for each flag and indicator. Its drawback is that there will be lots of tiny dimension table and dimension foreign key in the fact table. The second alternative is to degenerate such dimension attributes into the fact table. This way can avoid the creation of such a lot tiny dimension tables. However, the degenerated fields still increase the row length and total size of fact table, especially when the degenerated field values are lengthy long text. 

This last one is our recommended solution. We usually create a single (or sometime a few) junk dimension consolidating these attributes together. In the above case, we may name it as Transaction Profile Dimension. It is a good technique to reduce the number of dimension tables (which allow a thin fact table) and complexity (improve understandability) of our data model.

If those attribute values are limited, we may pre-generate the whole Junk Dimension table by Cartesian product for all possible combination of values. To avoid the junk dimension grows too huge in size, we may sometimes create a few junk dimension tables (e.g. when too many fields; 10 binary flags may come to 2^10 rows) 

No comments:

Post a Comment

Extract: Performance Tips

Below listed some common performance tips for extract queries. Extract required columns only and specify in the query, avoid select * Extrac...