Friday, October 16, 2020

Date Dimension

A special type of dimension to represent dates of business process event. Per the stability nature of calendar, we may simply use meaningful integer YYYYMMDD as its primary key. I believe that it is the only exceptional scenario that we accept meaningful primary key. It contains those date elements for displaying. It also contains flattened hierarchy information for filtering and grouping (e.g. calendar month, calendar year, fiscal month, fiscal year). We may also add some tailor made fields to the date dimension for our business. (e.g. holiday flag)

We recommend using star schema but not snowflake. So, we have one Date Dimension containing flattened hierarchy information stored in it without creating separated Month/Quarter Dimensions. We pre-generate rows for the date dimension table (e.g. recent 20 years). Similar to other dimensions, we also insert at least one row to represent unknown/null date (e.g. surrogate key = -1) 

We always have only one physical date dimension table. We may implement different views (see Role Playing Dimension) to help showing different semantic context (e.g. selling date vs ordering date). Similarly, to handle different time zones, we may have separated date key fields in the fact table to represent local and global date referencing the same physical date dimension. 

On the other hand, we usually don't create a separated time (of day) dimension table. It is because we are not likely to have complex filtering or grouping by time of day in the analysis. So, in most cases, a degenerated time field (e.g. integer field in HHMMSS format) is usually good enough. One of the exceptional cases is that you need to group and represent a period of time like "rush hour" or "first shift". You may create the time (of day) dimension in such cases.

No comments:

Post a Comment

Extract: Performance Tips

Below listed some common performance tips for extract queries. Extract required columns only and specify in the query, avoid select * Extrac...