Thursday, July 19, 2018 10:31:24 AM
I know that we were all taught to delete then replace as the standard practice for ETL. I personally don’t prefer that approach for these reasons:
When an ETL goes foul, the prior table’s data was truncated or the table dropped. Reverting is possible if a backup exists, but it is a painful process most often avoided.
The drop/truncate then select/insert into approach typically takes many minutes to perform. During that time, queries will return incomplete information (either none at all, or just the rows that have been ETL’d so far when query ran).
During the ETL, while the end table is in flux, end users are without good data.
The approach I prefer involves swapping a table from one schema to another (I refer to it as a flip-flop) – this swap process takes under a second, so the benefits are:
The end user can always rely on the table data – it is always a complete table from the ETL.
IT can easily revert to the prior table if the ETL results are found to be wanting for whatever reason. The end user effect is that they can continue using the older data while the ETL result is reviewed/repaired.
To set up the schema swap process, I establish two additional schema – new and old (but call them what you want). The new schema is where the table from the ETL is built and indexes added. The old schema is used to move the current dbo (or whatever schema it resides in) table so simple reversion can occur if needed.
Steps (entire flip-flop process takes under a second, no matter what the size of the table):
Either create table or select into ETL target table in the 'new' schema. Create indexes against as needed. These steps are basically the same as what is typically done in an ETL process, only done against a target table that isn't the currently active data.
Prepare for what I refer to as the 'flip-flop' process.
a. Get rid of the table if it exists in the 'old' schema
if exists (select * from INFORMATION_SCHEMA.TABLES where TABLE_SCHEMA = 'old' and TABLE_NAME = 'ETL Target Table') Drop Table [old].[ETL Target Table];
b. Place the current table into the 'old' schema
Alter Schema old TRANSFER [dbo].[ETL Target Table];
c. Move the 'new' ETL Target Table into the desired schema
if exists (select * from INFORMATION_SCHEMA.TABLES where TABLE_SCHEMA = 'new' and TABLE_NAME = 'ETL Target Table') Alter Schema dbo TRANSFER [new].[ETL Target Table];