Saturday, August 29, 2020

What is Data Warehouse?

Data warehouse (DW) is the architecture to collect and manage business data from various sources to provide meaningful business insight. It is different to the database of other operational systems. It collects the data from different business processes while operational database focuses in particular process. Its purpose for querying and analysis while operational database is for supporting its operations (e.g. frequent DML for sales transaction entries).

There are two well-known approaches to implement data warehouse.


Kimball approach

It defines a dimensional data warehouse based on the business requirements. It starts from the business processes analysis and build the model based on the facts, dimensions and grains identified. It is a dimensional and de-normalized data warehouse. The approach introduces dimensional data modeling techniques to organize the data in facts and dimensions. ETL processes are built to propagate the data from sources into the dimensional data model.

Advantages: 

  • Short development cycle
  • Easy to understand the data model
  • Good performance
  • Dimensional data model enables the usage of BI tools and self-serviced analysis

Disadvantages:

  • Missing of single version of truth
  • Potential data inconsistency between facts
  • Missing the completeness of entire enterprise data

Inmon approach

It defines a relational data warehouse (as known as corporate data model). It is a relational and normalized (3NF) data model act as the single version of truth. The data model covers all the entities and relationships in those major subject areas. Due to its relational structure, it is usually challenging in querying data directly from this kind of relational data warehouse. This approach proposes to further propagate the data into some de-normalized data marts (e.g. one data mart per department) for further querying and analysis. So, typically, there are two sets of ETL processes. The first set is propagating the data from sources system to the relational data warehouse. Another set is propagating the data from the relational data warehouse into the data marts.

Advantages:

  • Maintain a single version of truth
  • Better data consistency
  • Completeness of entire enterprise data

Disadvantages:

  • Long development cycle (costly)
  • Additional ETL processes
  • Data Expertise required to understand and maintain the entire corporate data model

No comments:

Post a Comment

Extract: Performance Tips

Below listed some common performance tips for extract queries. Extract required columns only and specify in the query, avoid select * Extrac...