Wednesday, September 30, 2020

BI/DW Best Practice #16 - Start small, release in batches

Try to release reports batch by batch in smaller portions. Users and production supports will get crazy if you release too much and too fast. We should give enough time for the users to change their working procedures with the new reports and environment. Otherwise, users may stick to their old procedures instead of switching to the new system.

Tuesday, September 29, 2020

BI/DW Best Practice #15 - Be careful to be pilot project

The application defeats and stability of the tools using always impact to our projects. So, try to avoid them as much as possible.

Monday, September 28, 2020

BI/DW Best Practice #14 - Flexible, rerunable and fault tolerance ETL processes

There are several different source systems. So, we may have errors from different reasons. We may have three options to handle. Halt, Proceed but reject failed records and Proceed with notification.

BI/DW Best Practice #13 - Keep measures additive if possible

BI tools rollup the measures, so we keep the measures additive if possible. For example, we won't pre-calculate and keep the KPI "Unit per Transaction (UPT)" in single measure column as a ratio. Instead, we keep two measures, "Number of Units" and "Number of Transactions".

Sunday, September 27, 2020

BI/DW Best Practice #12 - Manage data integrity

The data warehouse / data mart are usually implemented in RDBMS. RDBMS provides lots of features to manage data which are useful for both operational and analytical systems. However, per the different natures between operational and analytical systems. We have different practices to manage our data.

Saturday, September 26, 2020

BI/DW Best Practice #11 - Decouple with source systems

Traditionally, there are data file interfaces between the source system and the DW. Extraction process extract the data into data files. The subsequence ETL process read the extracted data file. However, supported by many ETL tools today, it is possible to extract the source data through direct connection and streamline the data, transform and load into the target database. It enables the whole ETL process streaming the data in memory without intermediate file IO which maximize the performance. Should we take this approach? My answer is no.

Wednesday, September 16, 2020

BI/DW Best Practice #10 - Handle unknown and null

NULL columns sometime give unexpected query results to users in user point of view. It happens in filtering (i.e.  condition [COLUMN1 = 0 or COLUMN1 != 0] doesn't return all records, variance from the records with COLUMN1 is NULL). We should avoid those unexpected outcome by avoiding NULL in all dimension columns and dimension foreign key.

Extract: Performance Tips

Below listed some common performance tips for extract queries. Extract required columns only and specify in the query, avoid select * Extrac...