Try to release reports batch by batch in smaller portions. Users and production supports will get crazy if you release too much and too fast. We should give enough time for the users to change their working procedures with the new reports and environment. Otherwise, users may stick to their old procedures instead of switching to the new system.
Wednesday, September 30, 2020
Tuesday, September 29, 2020
BI/DW Best Practice #15 - Be careful to be pilot project
The application defeats and stability of the tools using always impact to our projects. So, try to avoid them as much as possible.
Monday, September 28, 2020
BI/DW Best Practice #14 - Flexible, rerunable and fault tolerance ETL processes
There are several different source systems. So, we may have errors from different reasons. We may have three options to handle. Halt, Proceed but reject failed records and Proceed with notification.
BI/DW Best Practice #13 - Keep measures additive if possible
BI tools rollup the measures, so we keep the measures additive if possible. For example, we won't pre-calculate and keep the KPI "Unit per Transaction (UPT)" in single measure column as a ratio. Instead, we keep two measures, "Number of Units" and "Number of Transactions".
Sunday, September 27, 2020
BI/DW Best Practice #12 - Manage data integrity
The data warehouse / data mart are usually implemented in RDBMS. RDBMS provides lots of features to manage data which are useful for both operational and analytical systems. However, per the different natures between operational and analytical systems. We have different practices to manage our data.
Saturday, September 26, 2020
BI/DW Best Practice #11 - Decouple with source systems
Traditionally, there are data file interfaces between the source system and the DW. Extraction process extract the data into data files. The subsequence ETL process read the extracted data file. However, supported by many ETL tools today, it is possible to extract the source data through direct connection and streamline the data, transform and load into the target database. It enables the whole ETL process streaming the data in memory without intermediate file IO which maximize the performance. Should we take this approach? My answer is no.
Wednesday, September 16, 2020
BI/DW Best Practice #10 - Handle unknown and null
NULL columns sometime give unexpected query results to users in user point of view. It happens in filtering (i.e. condition [COLUMN1 = 0 or COLUMN1 != 0] doesn't return all records, variance from the records with COLUMN1 is NULL). We should avoid those unexpected outcome by avoiding NULL in all dimension columns and dimension foreign key.
Extract: Performance Tips
Below listed some common performance tips for extract queries. Extract required columns only and specify in the query, avoid select * Extrac...
-
If we are using dimensional data model, make sure to build atomic grain base fact tables. It is true that they takes more space and performs...
-
Extract is the first step in the ETL process which obtains data from the source systems. This is our first question. How much data we need t...
-
Measures are the values in your fact tables which provide numeric meaning to the business process event. There are three kinds of numeric me...