Wednesday, September 30, 2020

BI/DW Best Practice #16 - Start small, release in batches

Try to release reports batch by batch in smaller portions. Users and production supports will get crazy if you release too much and too fast. We should give enough time for the users to change their working procedures with the new reports and environment. Otherwise, users may stick to their old procedures instead of switching to the new system.

Tuesday, September 29, 2020

BI/DW Best Practice #15 - Be careful to be pilot project

The application defeats and stability of the tools using always impact to our projects. So, try to avoid them as much as possible.

Monday, September 28, 2020

BI/DW Best Practice #14 - Flexible, rerunable and fault tolerance ETL processes

There are several different source systems. So, we may have errors from different reasons. We may have three options to handle. Halt, Proceed but reject failed records and Proceed with notification.

BI/DW Best Practice #13 - Keep measures additive if possible

BI tools rollup the measures, so we keep the measures additive if possible. For example, we won't pre-calculate and keep the KPI "Unit per Transaction (UPT)" in single measure column as a ratio. Instead, we keep two measures, "Number of Units" and "Number of Transactions".

Sunday, September 27, 2020

BI/DW Best Practice #12 - Manage data integrity

The data warehouse / data mart are usually implemented in RDBMS. RDBMS provides lots of features to manage data which are useful for both operational and analytical systems. However, per the different natures between operational and analytical systems. We have different practices to manage our data.

Saturday, September 26, 2020

BI/DW Best Practice #11 - Decouple with source systems

Traditionally, there are data file interfaces between the source system and the DW. Extraction process extract the data into data files. The subsequence ETL process read the extracted data file. However, supported by many ETL tools today, it is possible to extract the source data through direct connection and streamline the data, transform and load into the target database. It enables the whole ETL process streaming the data in memory without intermediate file IO which maximize the performance. Should we take this approach? My answer is no.

Wednesday, September 16, 2020

BI/DW Best Practice #10 - Handle unknown and null

NULL columns sometime give unexpected query results to users in user point of view. It happens in filtering (i.e.  condition [COLUMN1 = 0 or COLUMN1 != 0] doesn't return all records, variance from the records with COLUMN1 is NULL). We should avoid those unexpected outcome by avoiding NULL in all dimension columns and dimension foreign key.

Tuesday, September 15, 2020

BI/DW Best Practice #9 - Contain date dimension in all fact tables

If we are using dimensional data model, make sure to include a date dimension (at least one) in every fact tables. It must be an item in our checklist for our dimensional model design. I can't imagine if we can miss the date in any business activities. 

Monday, September 14, 2020

BI/DW Best Practice #8 - Use Surrogate key

We sometime hear the argument of using surrogate key or business key. There are number of reasons why we use surrogate key. I am not going through them one by one here. I would just emphasize my key reason here - to make the model capable to changes. 

Thursday, September 10, 2020

BI/DW Best Practice #7 - Atomic grain base fact table

If we are using dimensional data model, make sure to build atomic grain base fact tables. It is true that they takes more space and performs slower than aggregated fact table. We may build aggregated fact table on-top to improve the performance but don't skip the atomic grain base fact table. It can partly act as the single version of truth in your solution. The atomic grain base fact table is also likely reducing the efforts to rebuild everything from zero again if any future changes. 

BI/DW Best Practice #6 - Denormalize the data in the user access layer

Before I start writing this article, I want to simply suggest to use star schema. However, someone may not build a dimensional data warehouse, so I generalize my words to denormalization. Per performance concern, we always use denormalized data model to support fast user queries.

Saturday, September 5, 2020

BI/DW Best Practice #5 - Consolidate entities while keep correct grain

We may sometime consolidate similar analysis in a fact table. It may reduce the total size, complexity and effort from such shared fact table. However, we should make sure that we don't violate the correct granularity of the fact table. 

Conforming fact tables in wrong granularity will lead to wrong results in analysis. Feel free to create a new fact table if they are in different grain. It is ok that we can have multiple fact tables in a subject area.

BI/DW Best Practice #4 - Organization wide participation to ensure data quality

The technologies don't intelligent alone. The people are the intelligent and the BI solution assists them. We should introduce the best practice in the organization to maintain their data quality.

Tuesday, September 1, 2020

BI/DW Best Practice #3 - Understand the business processes

We need to understand the business processes and ask the right questions. Some managers will ask for whatever information they can get without thinking carefully about what they really need. Too much information can be as harmful as too little. Other managers may omit things they ought to know, or they may not know to ask for some types of information they should have.

BI/DW Best Practice #2 - Use common business languages

In the BI/DW projects, try to use common and aligned business languages in those deliverable including reports and dashboards. It can avoid the misunderstanding of the terms in deliverable between business units. 

Sometime, the business language may still be ambiguous between some business units. For example, don't confuse the terms - revenue, net profit and income. Are different business units use the different words talking the same thing? It is why we recommended having a organizational wide participation to define, document and spread the business terminologies within the organization. Usually, we may talk to the accountants of the organization first. They could give us great input before we propose the business terminology standard.

BI is not only a software solution, but also a set of practice to facilitate business decision making. The alignment of business terminology is one of them.

Extract: Performance Tips

Below listed some common performance tips for extract queries. Extract required columns only and specify in the query, avoid select * Extrac...