Try to release reports batch by batch in smaller portions. Users and production supports will get crazy if you release too much and too fast. We should give enough time for the users to change their working procedures with the new reports and environment. Otherwise, users may stick to their old procedures instead of switching to the new system.
Wednesday, September 30, 2020
Tuesday, September 29, 2020
BI/DW Best Practice #15 - Be careful to be pilot project
The application defeats and stability of the tools using always impact to our projects. So, try to avoid them as much as possible.
Monday, September 28, 2020
BI/DW Best Practice #14 - Flexible, rerunable and fault tolerance ETL processes
There are several different source systems. So, we may have errors from different reasons. We may have three options to handle. Halt, Proceed but reject failed records and Proceed with notification.
BI/DW Best Practice #13 - Keep measures additive if possible
BI tools rollup the measures, so we keep the measures additive if possible. For example, we won't pre-calculate and keep the KPI "Unit per Transaction (UPT)" in single measure column as a ratio. Instead, we keep two measures, "Number of Units" and "Number of Transactions".
Sunday, September 27, 2020
BI/DW Best Practice #12 - Manage data integrity
The data warehouse / data mart are usually implemented in RDBMS. RDBMS provides lots of features to manage data which are useful for both operational and analytical systems. However, per the different natures between operational and analytical systems. We have different practices to manage our data.
Saturday, September 26, 2020
BI/DW Best Practice #11 - Decouple with source systems
Traditionally, there are data file interfaces between the source system and the DW. Extraction process extract the data into data files. The subsequence ETL process read the extracted data file. However, supported by many ETL tools today, it is possible to extract the source data through direct connection and streamline the data, transform and load into the target database. It enables the whole ETL process streaming the data in memory without intermediate file IO which maximize the performance. Should we take this approach? My answer is no.
Wednesday, September 16, 2020
BI/DW Best Practice #10 - Handle unknown and null
NULL columns sometime give unexpected query results to users in user point of view. It happens in filtering (i.e. condition [COLUMN1 = 0 or COLUMN1 != 0] doesn't return all records, variance from the records with COLUMN1 is NULL). We should avoid those unexpected outcome by avoiding NULL in all dimension columns and dimension foreign key.
Tuesday, September 15, 2020
BI/DW Best Practice #9 - Contain date dimension in all fact tables
If we are using dimensional data model, make sure to include a date dimension (at least one) in every fact tables. It must be an item in our checklist for our dimensional model design. I can't imagine if we can miss the date in any business activities.
Monday, September 14, 2020
BI/DW Best Practice #8 - Use Surrogate key
We sometime hear the argument of using surrogate key or business key. There are number of reasons why we use surrogate key. I am not going through them one by one here. I would just emphasize my key reason here - to make the model capable to changes.
Thursday, September 10, 2020
BI/DW Best Practice #7 - Atomic grain base fact table
If we are using dimensional data model, make sure to build atomic grain base fact tables. It is true that they takes more space and performs slower than aggregated fact table. We may build aggregated fact table on-top to improve the performance but don't skip the atomic grain base fact table. It can partly act as the single version of truth in your solution. The atomic grain base fact table is also likely reducing the efforts to rebuild everything from zero again if any future changes.
BI/DW Best Practice #6 - Denormalize the data in the user access layer
Before I start writing this article, I want to simply suggest to use star schema. However, someone may not build a dimensional data warehouse, so I generalize my words to denormalization. Per performance concern, we always use denormalized data model to support fast user queries.
Saturday, September 5, 2020
BI/DW Best Practice #5 - Consolidate entities while keep correct grain
We may sometime consolidate similar analysis in a fact table. It may reduce the total size, complexity and effort from such shared fact table. However, we should make sure that we don't violate the correct granularity of the fact table.
Conforming fact tables in wrong granularity will lead to wrong results in analysis. Feel free to create a new fact table if they are in different grain. It is ok that we can have multiple fact tables in a subject area.
BI/DW Best Practice #4 - Organization wide participation to ensure data quality
The technologies don't intelligent alone. The people are the intelligent and the BI solution assists them. We should introduce the best practice in the organization to maintain their data quality.
Tuesday, September 1, 2020
BI/DW Best Practice #3 - Understand the business processes
We need to understand the business processes and ask the right questions. Some managers will ask for whatever information they can get without thinking carefully about what they really need. Too much information can be as harmful as too little. Other managers may omit things they ought to know, or they may not know to ask for some types of information they should have.
BI/DW Best Practice #2 - Use common business languages
In the BI/DW projects, try to use common and aligned business languages in those deliverable including reports and dashboards. It can avoid the misunderstanding of the terms in deliverable between business units.
Sometime, the business language may still be ambiguous between some business units. For example, don't confuse the terms - revenue, net profit and income. Are different business units use the different words talking the same thing? It is why we recommended having a organizational wide participation to define, document and spread the business terminologies within the organization. Usually, we may talk to the accountants of the organization first. They could give us great input before we propose the business terminology standard.
BI is not only a software solution, but also a set of practice to facilitate business decision making. The alignment of business terminology is one of them.
Extract: Performance Tips
Below listed some common performance tips for extract queries. Extract required columns only and specify in the query, avoid select * Extrac...
-
We usually implement the multi-dimensional data model in RDBMS or CUBE. Today, we focus on the Constraints and indexes in multi-dimensional ...
-
Measures are the values in your fact tables which provide numeric meaning to the business process event. There are three kinds of numeric me...
-
We always recommend to flatten the (fixed-depth) hierarchy information into the dimension. It is how we keep the star schema simple and nice...