Sunday, September 27, 2020

BI/DW Best Practice #12 - Manage data integrity

The data warehouse / data mart are usually implemented in RDBMS. RDBMS provides lots of features to manage data which are useful for both operational and analytical systems. However, per the different natures between operational and analytical systems. We have different practices to manage our data.

Operational system may have concurrent DML from different users and different applications at any time. So, the constraints and triggers are useful to maintain the data integrity. We may also use the normalized data model to keep the data consistency. 

In Analytical system, performance is king. We are not likely to use some of these features, for example the foreign key constraints nor triggers. Only ETL jobs modify the data in the data warehouse /data marts. We keep the data integrity by the carefully implemented and tested ETL jobs instead of using constraints and triggers. They drop the DML performance. We use denormalized data model to improve the performance of user queries while the data consistency is maintained by ETL.

Apart from carefully implemented ETL jobs, data integrity are also checked by set of ETL processes or error reports. Value range check, unique checking of business key, referral checking of dimension key are usually done by verification ETL or error reports.

No comments:

Post a Comment

Extract: Performance Tips

Below listed some common performance tips for extract queries. Extract required columns only and specify in the query, avoid select * Extrac...