Wednesday, December 16, 2020

Extract: Performance Tips

Listed below are some common performance tips for extract queries.
  1. Extract only the required columns and name them explicitly in the query; avoid SELECT *.
  2. Extract only the required rows, ideally just the delta (rows changed since the last extract).
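  3. Avoid unnecessary sorting operations
    • UNION vs UNION ALL: UNION removes duplicates, which costs a sort or hash step; use UNION ALL when duplicates cannot occur or do not matter
    • DISTINCT, GROUP BY, ORDER BY: use only when necessary
  4. Avoid transformation logic in the extract queries. This logic should be offloaded to the ETL engine.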
  5. Indexes
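    • write a proper WHERE clause so the indexes can be used
      • combine conditions with AND
      • avoid OR / NOT / <> / NOT IN, which often prevent index use
      • be careful with wildcards (especially leading ones) and range conditions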
    • create proper indexes
    • maintain indexes regularly
  6. Consider "read uncommitted" (AKA dirty read, NOLOCK). It avoids waiting on locks but can return data that is never committed, so learn it and use it carefully.
  7. Use the database's native utilities and drivers to execute your query. They are usually tuned for better performance.
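
As a rough illustration of tips 1, 2 and 5, here is a minimal sketch of a delta extract using Python's built-in sqlite3 module. The table `orders`, its columns, the index name and the `last_extracted_at` watermark are all hypothetical; a real extract would run against the source database through its native driver.

```python
import sqlite3


def extract_delta(conn, last_extracted_at):
    """Return only the needed columns of rows changed after the watermark."""
    sql = (
        # Tip 1: name the required columns instead of SELECT *.
        "SELECT order_id, customer_id, amount, updated_at "
        "FROM orders "
        # Tips 2 and 5: a simple range condition on an indexed column
        # limits the extract to the delta and lets the index be used.
        "WHERE updated_at > ?"
        # Tip 3: no ORDER BY or DISTINCT, since the ETL engine does not need them here.
    )
    return conn.execute(sql, (last_extracted_at,)).fetchall()


# Hypothetical source table, built in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "amount REAL, notes TEXT, updated_at TEXT)"
)
conn.execute("CREATE INDEX idx_orders_updated_at ON orders (updated_at)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 25.0, "old row", "2020-11-30T09:00:00"),
        (2, 11, 40.5, "new row", "2020-12-02T14:30:00"),
        (3, 10, 12.0, "new row", "2020-12-15T08:15:00"),
    ],
)

rows = extract_delta(conn, "2020-12-01T00:00:00")
for row in rows:
    print(row)
```

The watermark is passed as a bound parameter rather than concatenated into the SQL text, so the same statement can be reused for every run.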

Wednesday, December 2, 2020

Different file encodings and formats

Our ETL programs need to handle the different formats and encodings coming from different sources. Listed below are some common technical challenges.

  • Encoding: ASCII vs EBCDIC
  • Character Set: UTF-8, Unicode, etc.
  • Format: Fixed length, Delimited, JSON, XML, Excel Spreadsheet and more
  • Numeric Format: Binary, Packed-decimal, Zoned-decimal and more
Some of the above formats and encodings can be challenging to handle in hand-coded ETL. However, most modern ETL engines can deal with them.
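
To give a feel for the hand-coding effort, here is a minimal Python sketch that decodes a fixed-length record holding an EBCDIC text field and a packed-decimal amount. The record layout (5 bytes of text, 3 bytes packed with 2 decimal places) and the code page `cp037` are assumptions for illustration; a real source may use a different EBCDIC code page and layout.

```python
from decimal import Decimal


def unpack_packed_decimal(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed-decimal field: two digits per byte, sign in the last nibble."""
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)      # high nibble
        digits.append(byte & 0x0F)    # low nibble
    digits.append(raw[-1] >> 4)       # last byte: one digit plus the sign nibble
    sign_nibble = raw[-1] & 0x0F
    value = int("".join(str(d) for d in digits))
    if sign_nibble == 0x0D:           # 0xD marks a negative value
        value = -value
    return Decimal(value).scaleb(-scale)


def parse_record(record: bytes):
    """Split a hypothetical 8-byte fixed-length record into (name, amount)."""
    name = record[0:5].decode("cp037").rstrip()   # EBCDIC text (assumed code page 037)
    amount = unpack_packed_decimal(record[5:8], scale=2)
    return name, amount


# "HELLO" in EBCDIC followed by -123.45 as packed decimal (0x12 0x34 0x5D).
record = bytes.fromhex("C8C5D3D3D6" "12345D")
print(parse_record(record))
```

Zoned-decimal and binary fields need similar byte-level handling, which is where an ETL engine's built-in format support saves the most work.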
