Monthly Archives: September 2012

Checkpoints in SSIS – Implementing restart-ability

Suppose the SSIS package contains around 30-40 tasks\ Containers and the execution time of ETL completion is approximately 18 hours. This means that the reports developed on the data mart needs to be refreshed after 18 hours. Now assume after 10th hour something happened and the ETL fails. Now what would you do? Start the package from beginning! Obviously not since Stake holders are waiting for the reports with refreshed data.

What you should do is ‘Implement Restart-ability’ in your package. You can implement restart-ability by either using SSIS features or by doing custom code. In this blog I will be discussing about SSIS feature to implement Restart-ability.

SSIS uses Checkpoints to restart the package from the point of failure instead of starting the package from the beginning. SSIS captures the restart point in the checkpoint file along with values of variables, details of successfully completed tasks and the unique identifier of the package. With the help of this unique identifier, SSIS picks up the correct checkpoint file and restarts the package. If the package identifier does not match with the identifier stored in Checkpoints file then restart will fail. After you implemented the fix in your code and restart the package, SSIS uses this checkpoint file to start execution from the point of failure, skipping all the tasks that successfully executed previously. Once the package is successful, the checkpoint file is removed from the system.

When package restarts from the point of failure, then:

  1. Committed transactions may run again
  2. Loop containers will always run again even if the task inside them successfully completed

Note: For the scenarios mentioned above, we have to go for Custom Checkpoints

Checkpoints and Data Integrity:

When the SSIS package use Checkpoints file, it tries to acquire lock on that file. Now if another instance of the same package is running and access the same checkpoint file, then it fails since the checkpoint file is locked by already running SSIS package. This ensures that no two instance of the same package can run simultaneously. Hence there will not be any data integrity issues.

Implementing Checkpoints:

To implement checkpoints, we need to configure only four properties:

  1. Checkpoint file name –  Specify the name of checkpoint
  2. Checkpoint usage – Choose any 1 of three values: Never, Always and IfExists
  3. Save checkpoints – Set this property to true to restart the package from point of failure
  4. Set FailPackageOnFailure to true for every task\ container used in the package

Checkpoint usage properties:

  1. Never – Do not use checkpoints
  2. Always – Always use checkpoints, if the file does not exists package will fail
  3. IfExists – Package use checkpoint file if exists otherwise it creates it

Transactions in SSIS

Transactions in SSIS can be implemented at any level like package level, task level, Container level etc.

SSIS transactions support 3 values:

Supported: An SSIS object can only join the transaction, cannot initiate the transaction

Not Supported: An SSIS object neither initiate the transaction nor joins the transaction

Required: An SSIS object can initiate the transaction

Now we know the meaning of transaction values, let’s go a step further  and understand transaction’s behavior in case of Parent – child relationship:

Scenario 1: Transaction property set to ‘Required’ at parent level

Suppose at package level, I set the transaction property to ‘Required’. This means that package initiate the transaction. Now all its child whether sequence container or task would join this transaction. If any task fails, everything would be roll back.

Scenario 2: Transaction property set to ‘Supported’ at parent level

If the transaction property is set to ‘Supported’, and if any task fails, then only failed task is rolled back, rest of the tasks would be committed

Scenario 3: Transaction property not configured at parent level but is set to ‘Required’ at child level

In this scenario, the child will initiate the transaction.

 

Playing with SSIS is fun

Peculiar behavior of Merge Join Transformation

Hi guys

We have been using Merge join transformation to merge the data from two different sources. I would like to tell you few points that needs to be taken care of while using merge transformations:

  • We know that for merge transformations, the two inputs needs to be sorted. For that, we set IsSortProperty of the source to True and set SortKeyPosition of the input column as well on which we want to sort the data. However, this configuration does not suffice the Sort criteria. Merge join actually needs the data itself to be sorted. So do remember to add ‘Order By‘ clause in your source query
  • If you have an Identity column in your destination, then make sure the data type of the column is ‘int‘ not ‘bigint’. If the data type of the Identity column is bigint, then merge join will fail.

Enjoy guys!!

Happy Coding!!