- Extract required columns only and specify in the query, avoid select *
- Extract required rows only. delta only
- Avoid unnecessary sorting operations
- union vs union all
- distinct, group by, order by: use only when necessary
- Avoid transformation logic in the extract queries. These logic should be offloaded to ETL engine.
- Indexes
- proper where clause;
- use AND;
- avoid OR/NOT/<>/NOT IN
- be careful using wildcard and range
- create proper indexes
- maintain indexes regularly
- Consider "Read uncommitted" (AKA dirty read, NOLOCK). Learn and use it carefully.
- Use native utility and driver to execute your query. They are usually tuned for better performance.
Raymond Law Blog
Wednesday, December 16, 2020
Extract: Performance Tips
Wednesday, December 2, 2020
Different file encoding and formats
Our ETL program need to handle different format and encoding coming from different sources. Below listed some common technical challenges.
- Encoding: ASCII vs EBCDIC
- Character Set: UTF8, Unicode, etc.
- Format: Fixed length, Delimited, JSON, XML, Excel Spreadsheet and more
- Numeric Format: Binary, Packed-decimal, Zoned-decimal and more
Monday, November 30, 2020
Transfer extracted data and checking
As mentioned in previous post, pushing or the publish/subscribe model are preferred. Todays topic is how to transfer the extracted files.
Tuesday, November 24, 2020
ETL: Interface files/tables or pipeline
We have different approaches to implement the ETL process. We may create an end-to-end ETL pipeline to propagate the data from extract, transform to load. Another common approach is to save the extracted data in some intermediate files or tables before transforming and loading.
Monday, November 23, 2020
Extract: Push, Pull or Publish/Subscribe
There are different extraction models. They are pushing, pulling and publish/subscribe models.
Thursday, November 19, 2020
Change Data Capture (CDC)
The concept of incremental extraction is strict forward. We just extract the changed data. Today's post is going to tell you how. How can we identify changed data?
Wednesday, November 18, 2020
Extract: Full or Incremental (Delta)
Extract is the first step in the ETL process which obtains data from the source systems. This is our first question. How much data we need to extract?
Extract: Performance Tips
Below listed some common performance tips for extract queries. Extract required columns only and specify in the query, avoid select * Extrac...
-
Measures are the values in your fact tables which provide numeric meaning to the business process event. There are three kinds of numeric me...
-
We always recommend to flatten the (fixed-depth) hierarchy information into the dimension. It is how we keep the star schema simple and nice.
-
There are different extraction models. They are pushing, pulling and publish/subscribe models.