Tuesday, October 20, 2020

Multivalued Dimension

Per the structure of dimensional schema, each dimension attached to a fact table has a single value. However, in reality, there are number of situations that dimension may contain multiple values. For example, in family insurance policy, multiple insured-person (family members) are attached to a policy. So, there are some techniques to handle such situation.

1. Keep a simple dimension and ignore the multiple values

We simply eliminate the multi-valued dimension challenge by ignoring it. We usually use this method if such multi-valued attributes are not useful in the analysis. We may simply choose one particular value and leave the others. Or we may insert a dimension row to represent the mixed values while leaving their individual attributes.

2. Add multiple columns to represent the individual values if values are known limited

If the number of values are known limited, we may use this method. We add fixed number of columns to store their individual attributes. However, it is not really recommended. It is because the data will be hard to be analyzed when attributes stored in both way horizontally and vertically.

3. Use bridge table with proper weighting factors. 

This method uses the bridge table which is a N-to-N relationship table between the fact and dimension tables. It is how we maintain multiple values in the dimensional schema. To manage the risk of double counting, we use a weighting factors field to represent the contribution ratio of each value. In each transaction (row of fact table), the sum of the corresponding weighting factors should be equal to 1. This method should be used carefully, all the measure calculation should associate with the weighting factors. Otherwise, double counting will still be happened. 

Multivalued Dimension using Bridge Table
Figure: Multivalued Dimension using Bridge Table


No comments:

Post a Comment

Extract: Performance Tips

Below listed some common performance tips for extract queries. Extract required columns only and specify in the query, avoid select * Extrac...