Monday, September 28, 2020

BI/DW Best Practice #14 - Flexible, rerunable and fault tolerance ETL processes

There are several different source systems. So, we may have errors from different reasons. We may have three options to handle. Halt, Proceed but reject failed records and Proceed with notification.

Halt the whole ETL processes is the most easy handling method but not recommended. It makes the whole solution unreliable if it halts frequently. Users won't able to query the recent business data. We halt the system only in the situation that we can't avoid.

Reject records is one of the useful error handling. We simply reject and notify the users. So that they can fix and resend those data. At the same time, other good records can be proceeded. In this approach, a nice reload mechanism should be available to reload the fixed records. Technically, it looks good. However, you still need to estimate how long and how often the data will be fixed in source system. 

The last common method is to proceed the data anyway. We may mark them with quality issue in some way (e.g. error reports, audit dimension, etc.). "Augment" is one of the major techniques to proceed error records.

Sometime, wrong data may be found after they have been loaded for few weeks. A flexible ETL processes (runnable for any period, portions, etc.) will be very useful. So, we can use the same set of logic to reload historical data.

No comments:

Post a Comment

Extract: Performance Tips

Below listed some common performance tips for extract queries. Extract required columns only and specify in the query, avoid select * Extrac...