Suppose the SSIS package contains around 30-40 tasks\ Containers and the execution time of ETL completion is approximately 18 hours. This means that the reports developed on the data mart needs to be refreshed after 18 hours. Now assume after 10th hour something happened and the ETL fails. Now what would you do? Start the package from beginning! Obviously not since Stake holders are waiting for the reports with refreshed data.
What you should do is ‘Implement Restart-ability’ in your package. You can implement restart-ability by either using SSIS features or by doing custom code. In this blog I will be discussing about SSIS feature to implement Restart-ability.
SSIS uses Checkpoints to restart the package from the point of failure instead of starting the package from the beginning. SSIS captures the restart point in the checkpoint file along with values of variables, details of successfully completed tasks and the unique identifier of the package. With the help of this unique identifier, SSIS picks up the correct checkpoint file and restarts the package. If the package identifier does not match with the identifier stored in Checkpoints file then restart will fail. After you implemented the fix in your code and restart the package, SSIS uses this checkpoint file to start execution from the point of failure, skipping all the tasks that successfully executed previously. Once the package is successful, the checkpoint file is removed from the system.
When package restarts from the point of failure, then:
- Committed transactions may run again
- Loop containers will always run again even if the task inside them successfully completed
Note: For the scenarios mentioned above, we have to go for Custom Checkpoints
Checkpoints and Data Integrity:
When the SSIS package use Checkpoints file, it tries to acquire lock on that file. Now if another instance of the same package is running and access the same checkpoint file, then it fails since the checkpoint file is locked by already running SSIS package. This ensures that no two instance of the same package can run simultaneously. Hence there will not be any data integrity issues.
Implementing Checkpoints:
To implement checkpoints, we need to configure only four properties:
- Checkpoint file name – Specify the name of checkpoint
- Checkpoint usage – Choose any 1 of three values: Never, Always and IfExists
- Save checkpoints – Set this property to true to restart the package from point of failure
- Set FailPackageOnFailure to true for every task\ container used in the package
Checkpoint usage properties:
- Never – Do not use checkpoints
- Always – Always use checkpoints, if the file does not exists package will fail
- IfExists – Package use checkpoint file if exists otherwise it creates it