Wednesday, November 18, 2020

Extract: Full or Incremental (Delta)

Extract is the first step in the ETL process which obtains data from the source systems. This is our first question. How much data we need to extract?

In the data migration or the initial load of the Data Warehouse, it is typical to use full extraction. We have nothing in the destination system, so we need to extract everything. It takes longer time to extract all data which we may divide them by batches. Anyway, it is simple and straight forward to make a full extraction from the source systems.

However, during the regular execution of the Data Warehouse, we don't think that we still need to make a full extraction from the source systems gain every time. It takes significant time and resources for the full extraction, as well as the full load. It also doesn't make sense to do it again for the data we have loaded. It is why we use incremental extraction in the regular extraction. We always recommend extracting the delta (changes) only since our last extraction whenever applicable.

Once we controlled the volume of data by incremental extraction, we also reduced the volume of data in the subsequence processes of transforming and loading.

No comments:

Post a Comment

Extract: Performance Tips

Below listed some common performance tips for extract queries. Extract required columns only and specify in the query, avoid select * Extrac...