ETL stands for Extract, Transform and Load: the process of propagating data from one or more source systems to a destination system. First, data is extracted from the source system(s). Transform converts the extracted data into the required format; it may include cleansing, combining, splitting, deriving, grouping, sorting, aggregating and looking up. Finally, the transformed data is loaded into the destination system.
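As a minimal sketch of the three steps in Python, assuming a hypothetical CSV source file and a SQLite destination (the file layout, table and column names are illustrative, not from any particular system):

```python
import csv
import sqlite3

def extract(path):
    """Extract: read raw rows from a CSV source file."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Transform: cleanse and convert fields before loading."""
    out = []
    for row in rows:
        name = row["name"].strip().title()       # cleansing
        if not name:                             # drop invalid rows
            continue
        amount = round(float(row["amount"]), 2)  # type conversion
        out.append((name, amount))
    return out

def load(rows, conn):
    """Load: insert the transformed rows into the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
```

Real pipelines differ mainly in scale and robustness, not in this basic shape: each step stays separate so it can be tested, restarted and tuned on its own.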
The most common use case of ETL processes is the data warehouse: ETL processes extract data from various source systems, then transform and load it into the warehouse. Other common use cases include data migration and other data propagation between systems.
In this series, I am going to introduce some ETL techniques and practices.
Extract:
- Full or Incremental (Delta)
- Change Data Capture (CDC)
- Pull, Push or Publish/Subscribe
- Interface files/tables or pipeline
- Transfer extracted data
- Staging area
- Format and encoding of extracted data
- Performance Tips
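As a taste of the first topic, an incremental (delta) extract can be sketched by keeping a high-water mark of the last extracted change timestamp, so each run pulls only rows changed since the previous run (the table and column names are illustrative assumptions):

```python
import sqlite3

def extract_incremental(conn, watermark):
    """Delta extract: pull only rows changed since the last run."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # Advance the watermark to the newest row seen, so the
    # next run starts where this one left off.
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark
```

The returned watermark must be persisted between runs; losing it forces a full extract.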
Transform:
- Data Transformation
- Data cleansing
- Late-Arriving Dimension - Strategy 1 - Reject
- Late-Arriving Dimension - Strategy 2 - Load as unknown dimension
- Late-Arriving Dimension - Strategy 3 - Load with dimension augmentation
- Validation
- Performance Tips
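To give a flavour of the late-arriving-dimension strategies above, here is a sketch of Strategy 2: a fact row whose dimension member has not arrived yet is pointed at a reserved "unknown" member instead of being rejected (the key value -1 is an assumed convention, not a standard):

```python
UNKNOWN_KEY = -1  # reserved surrogate key for the "unknown" member

def resolve_dimension(natural_key, dim_lookup):
    """Return the surrogate key for a fact row's natural key.

    Strategy 2: if the dimension row has not arrived yet, map
    the fact to the reserved unknown member; a later pass can
    re-point it once the real dimension row lands.
    """
    return dim_lookup.get(natural_key, UNKNOWN_KEY)
```

The trade-off versus Strategy 1 (reject) is that the fact table stays complete, at the cost of a follow-up job to re-point the unknown rows.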
Load:
- Loading Strategies
- Loading Performance - General
- Loading Performance - Insert
- Loading Performance - Update
- Loading Performance - Delete
- Loading by multiple steps
- Loading Type 2 dimension
- Loading partitioned table
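Loading a Type 2 (slowly changing) dimension, one of the topics above, can be sketched as expiring the current version of a row and inserting a new one, so history is preserved rather than overwritten (the table and column names are illustrative):

```python
import sqlite3

def load_scd2(conn, natural_key, name, effective_date):
    """Apply a Type 2 change: close the current version, insert a new one."""
    cur = conn.execute(
        "SELECT name FROM dim_customer WHERE natural_key = ? AND is_current = 1",
        (natural_key,),
    ).fetchone()
    if cur is not None and cur[0] == name:
        return  # attribute unchanged, nothing to do
    if cur is not None:
        # Expire the old version instead of overwriting it.
        conn.execute(
            "UPDATE dim_customer SET is_current = 0, end_date = ? "
            "WHERE natural_key = ? AND is_current = 1",
            (effective_date, natural_key),
        )
    conn.execute(
        "INSERT INTO dim_customer "
        "(natural_key, name, start_date, end_date, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (natural_key, name, effective_date),
    )
    conn.commit()
```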
Overall ETL:
- Modularized and reusable
- Restartability and recovery
- Error Handling
- Data integrity
- Error Reports
- Audit Log and notification
- Notification - Exception only?
- Parallelism
- Dependency
- Automated maintenance
- Troubleshooting Performance Problems
- Common tuning tips in ETL engine
- ETL tool or hand-coded
- Reliable workflow
- Teamwork
- ETL Framework
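As a small illustration of restartability from the list above, a job can record a checkpoint after each completed step so that a rerun after a failure skips work already done (the JSON checkpoint file is an assumed format for this sketch):

```python
import json
import os

def run_with_checkpoints(steps, checkpoint_path):
    """Run named steps in order, skipping those already checkpointed."""
    done = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = set(json.load(f))
    for name, func in steps:
        if name in done:
            continue  # completed in a previous run
        func()
        done.add(name)
        # Persist progress after each step so a crash can resume here.
        with open(checkpoint_path, "w") as f:
            json.dump(sorted(done), f)
```

For this to be safe, each step must be idempotent or the checkpoint written atomically with the step's own output; both options will come up again in the restartability post.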