Monthly Archives: March 2013

Using Data Profiling Task

Sometimes, you may need to review the data source before initiating processing of the data from that source. You may want to review the source for NULL values in columns, distinct rows etc. This initial analysis of data before processing it may help the developers in building the logic based on that data. To perform such an analysis, Data Profiling Task of SSIS can be used.

To use Data Profiling Task, drag the same from SSIS Tool Box to the Control flow. Double click the task and following window will be displayed:

Untitled

Click the drop down ,create a new File connection as shown below and Click Ok:

Untitled4

Now, since the destination connections are set, we will proceed to set the Requests.

Note: Requests indicate the different ways in which we want to analyze source.

Click the “Quick Profile” button shown in below image:

Untitled1

Configure settings as shown in below figure:

Untitled2

Now click Ok and a dialogue box will open showing the requests selected for analyzing data. Refer figure below:

Untitled3

Now run the Task. Once the execution is completed go to Data Profiler Viewer in Windows or double click the Data Profiling Task and open Data Profiler Viewer as shown:

Untitled5

After clicking the button, following window will open, Click Open as shown in below screenshot:

Untitled7

Browse to the location of XML file as specified in Destination of Data Profile Task. Click OK and following window will open:

Untitled6

Now select any of the options pointed in above figure and fetch the source information.

Like the page http://www.facebook.com/pages/SSISBlog/200678990060457  if the blog helps you in any aspect

Happy Coding!! Cheers!

Using File System Task

Sometimes, we need to copy files from one drive to another. One approach is you can perform a manual copy-paste. Now since it is not automated and in future you may need to do same task, it will kill your time. Other approach is to use a File System Task and develop a small ETL package to perform File Transfer.

Let’s proceed with a simple example. Create\ Populate a table in SQL Server using following script:

–Creating a table

 

CREATE TABLE [FileInfoTable]

(

[Source] varchar(50),

[FileName] varchar(20),

[Destination] varchar(50),

)

 

–Populating a table

INSERT INTO [FileInfoTable]

VALUES

(‘F:\S66\SOURCE’,’DG.txt’,’F:\D67′),

(‘F:\S66\SOURCE’,’HR.txt’,’F:\D67′),

(‘F:\S66\SOURCE’,’AB.txt’,’F:\D67′),

(‘F:\S66\SOURCE’,’DG.txt’,’F:\D68′),

(‘F:\S66\SOURCE’,’HR.txt’,’F:\D68′),

(‘F:\S66\SOURCE’,’AB.txt’,’F:\D68′),

(‘F:\S67\SOURCE’,’DG.txt’,’F:\D69′),

(‘F:\S67\SOURCE’,’HR.txt’,’F:\D69′),

(‘F:\S67\SOURCE’,’AB.txt’,’F:\D69′)

 

 

Create 2 folders in the F drive S66 and S77 and create some text files say DG.txt, HR.txt, AB.txt. This is just an example; you can create folders in any drive and have different files in it. Be sure to populate the table with the proper file paths.

Similarly create destinations where you want to dump these files. In my case, I have created 3 folders D67, D68, D69 under F: drive

Building SSIS package:

Below is the screenshot of SSIS package:

Untitled1

‘GetFileInfoTable’ will fetch the records from table created previously, store it in an Object and provide the same to For Each Loop Container via Object in the format below:

SourceName Destination
F:\S66\SOURCE\DG.txt F:\D67
F:\S66\SOURCE\HR.txt F:\D67
F:\S66\SOURCE\AB.txt F:\D67
F:\S66\SOURCE\DG.txt F:\D68
F:\S66\SOURCE\HR.txt F:\D68
F:\S66\SOURCE\AB.txt F:\D68
F:\S67\SOURCE\DG.txt F:\D69
F:\S67\SOURCE\HR.txt F:\D69
F:\S67\SOURCE\AB.txt F:\D69

One by one these records will be fetched by For Each Loop (stored in 2 SSIS variables SourceName and DestinationName) and be passed to File System Task.

Note: For passing an Object and configuring For Each Loop container please see https://ssisblog.wordpress.com/2013/03/06/working-with-loops-in-ssis/

Double click the FileSystemTask and set the properties as:

Untitled2

Run the package and check the destination. The files will be copied to the destination.

 

Leave a comment if it is helping you in any aspect.

Cheers! Happy Coding!!

Code to divide a column data into Range

DECLARE @NoOfIterations int, –Indicates in how many chunks you want to pull the data
@RangeAdd int,
@ColSIDStart int,
@ColSIDEnd int,
@Counter int ,
@MaxColSID int,
@MinColSID int

DECLARE @ColSIDRANGETBL TABLE
(
Iteration int,
ColSIDStart int,
ColSIDEnd int
)

SET @NoOfIterations = ?
SET @Counter = 1

SELECT @MinColSID = MIN(ColSID) FROM [dbo].[Table] (NOLOCK)
SELECT @MaxColSID = MAX(ColSID) FROM [dbo].[Table]] (NOLOCK)

SET @RangeAdd = (@MaxColSID – @MinColSID)/@NoOfIterations

SET @ColSIDStart = @MinColSID
SET @ColSIDEnd = @ColSIDStart + @RangeAdd

WHILE ( (@Counter<=@NoOfIterations) and (@ColSIDStart <= @ColSIDEnd) )
BEGIN
INSERT INTO @ColSIDRANGETBL
SELECT @Counter,@ColSIDStart,@ColSIDEnd

SET @ColSIDStart = @ColSIDEnd + 1
SET @ColSIDEnd = @ColSIDStart + @RangeAdd

IF(@ColSIDEnd>@MaxColSID)
BEGIN
SET @ColSIDEnd = @MaxColSID
END
SET @Counter = @Counter + 1
END

SELECT ColSIDStart, ColSIDEnd FROM @ColSIDRANGETBL

Working with Loops in SSIS

I have observed one peculiar issue while pulling data from source in one shot i.e. SSIS throws “Connection time out” error after pulling ~80% of the total records. To rectify this issue, I started pulling data in chunks. There, I made use of “For loop Container” which I am going to explain below:

Let’s say I need to pull data on the basis of ColSID (assuming ColSID is the name of integer column) and I have 100 records in source. This is just an example but in real time you may millions of records. For Loop Container shall accept the format as:

ColSIDStartRange ColSIDEndRange
1 20
21 40
41 60
61 80
81 100

Now, you have to write some logic and divide the ColSID column in a format acceptable by For Loop Container i.e. format similar to above mentioned table.

Note: For dividing the ColSID into table format please refer ‘SQL Helper’ section in my blog.

Below screenshot depicts the sample flow of execution:

Untitled1

Set SQL Task properties:

Double click SQL Task and set the properties as shown in below screenshot:

Untitled3

Go to Result Set and select the Range Variable name and set it as:

Untitled4

 

Now, since you have the format available, you will pass this format to For Loop container by selecting few properties as:

Untitled2

Note: The variable in which you are storing the Range should be of OBJECT Type. In our case it is ColSIDRange

Click ‘Variable Mappings’ and set the variables as shown in below screenshot:

Untitled5

Note: ColSIDStart and ColSIDEnd are integer variables.

Now double click the DFT task inside For Loop Container and set the properties as:

Untitled6

 

Now click the ‘Parameters’ tab and select parameters as:

Untitled7

 

Click OK and everything is set for you.

Map the source columns with your destination and start pulling the data.

 

Leave a comment if you like the blog and if it is helping you in any respect.

Happy Coding Cheers!!