Friday, 24 November 2017

SSIS Issues : A Day of Learning

Seldom are the days when you run into complex issues yet resolve them in record time.

Thanks to Larry Page and my fellow developers :-).

But jokes apart, I learned something new today, and that always feels wonderful.

Issue #1:- Unable to enlist in the distributed transaction

Symptom:- This error appears when running a package (containing a sequence container with the TransactionOption property set to Required) on your local machine.

Solution 
#1: Activate all the options for MSDTC (Distributed Transaction Coordinator)
#2: Set the TransactionOption property to Supported


Issue #2:- The OLE DB Source.Outputs[OLE DB Source Output].Columns[XXXXXXXX] on the non-error output has no corresponding output

Symptom:- You will get this error on the OLE DB Source, particularly when you add extra columns after the initial design/development of the package.

Solution
Remove the OLE DB Source and drag and drop a new one.


"Life is 10 % what happens and 90 % of how you react to it."

Thanks
Lokesh

https://www.linkedin.com/in/sqlbideveloper/

Thursday, 2 November 2017

SQL Query Experiment - "WHERE" & "ON"

Sometimes the mistakes you make in the coding lead you to a better understanding of a particular concept.

This happened to me recently. I accidentally replaced the 'WHERE' clause in a SQL query with "AND", and the results confused the heck out of me.

Let's look at the coding part now. Suppose we have two tables, #t1 and #t2.

#t1 consists of id and name
#t2 consists of id and department

Now the requirement is to fetch the name and department for id = 2.

CREATE TABLE #T1 (ID INT, NAM VARCHAR(100))
INSERT INTO #T1 VALUES (1, 'AMIT')
INSERT INTO #T1 VALUES (2, 'AMITT')
INSERT INTO #T1 VALUES (3, 'AMITTT')

CREATE TABLE #T2 (ID INT, DEP VARCHAR(100))
INSERT INTO #T2 VALUES (1, 'HR')
INSERT INTO #T2 VALUES (2, 'FINANCE')

SELECT * FROM #T1 
LEFT JOIN #T2 ON #T1.ID = #T2.ID 
WHERE #T1.ID = 2 

This is quite simple; the WHERE filter leaves a single row:

ID  NAM    ID  DEP
--  -----  --  -------
2   AMITT  2   FINANCE

But what if the select query becomes 

SELECT * FROM #T1 
LEFT JOIN #T2 ON #T1.ID = #T2.ID 
AND #T1.ID = 2

Any guesses what the result set would look like? Here it is:

ID  NAM     ID    DEP
--  ------  ----  -------
1   AMIT    NULL  NULL
2   AMITT   2     FINANCE
3   AMITTT  NULL  NULL

Explanation:-
1. It's a LEFT JOIN, and therefore all the rows from the #t1 table are retrieved, whether or not they find a match.
2. The "ON" clause has two conditions: one matching the id columns of the two tables, and another comparing #t1.id to the literal value 2. Rows are matched only where both conditions hold, so #t2 data is attached only where id is 2; every other #t1 row still comes back, but with NULLs in the #t2 columns.

Make sure to use the WHERE and ON clauses carefully.

Thanks
Lokesh








Tuesday, 6 June 2017

Be Aware of CARTESIAN PRODUCT When Using Join Keys in SQL

I got massive satisfaction last week when I was able to bring down the execution time of a SQL stored procedure from more than 3 hours to a mere 2 minutes. It all boiled down to a missing join key that led to a Cartesian product between the tables.

Here is what the basic structure of the query looked like (there were plenty of join columns; for our convenience, let's consider only 2):

Select * from
                     table1 t1 left join table2 t2 on t1.k1 = t2.k1 and t1.k2 = t2.k2
                     left join table3 t3 on t1.k1 = t3.k1 and t1.k2 = t3.k2
                     left join table4 t4 on t1.k1 = t4.k1


And this query ran forever when given a particular set of parameters. I started with the approach of joining the parent table with each subsequent table until I hit the roadblock. For example:-

Step 1
Select * from
                     table1 t1 left join table2 t2 on t1.k1 = t2.k1 and t1.k2 = t2.k2
was returning rows

Step 2 
Select * from
                     table1 t1 left join table2 t2 on t1.k1 = t2.k1 and t1.k2 = t2.k2
                     left join table3 t3 on t1.k1 = t3.k1 and t1.k2 = t3.k2
                             
was returning rows

Step 3
Select * from
                     table1 t1 left join table2 t2 on t1.k1 = t2.k1 and t1.k2 = t2.k2
                     left join table3 t3 on t1.k1 = t3.k1 and t1.k2 = t3.k2
                     left join table4 t4 on t1.k1 = t4.k1
ran forever

As you can clearly notice, the table4 join was missing a condition on column k2. After including the join condition for column k2, the query executed roughly 1000 times faster.

In the end, it turned out to be a bug or a blunder. But the takeaway is that whenever you have a query involving joins of multiple tables, always make sure that it does not lead to a Cartesian product.
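The blow-up is easy to demonstrate in miniature. In this sketch (Python's sqlite3, with made-up tables t1 and t4 standing in for the real ones), both tables share a single k1 value and differ only in k2, so dropping k2 from the join turns 100 matches into 100 × 100:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t1 (k1 INTEGER, k2 INTEGER)")
cur.execute("CREATE TABLE t4 (k1 INTEGER, k2 INTEGER)")
# 100 rows per table, all sharing k1 = 1 and distinguished only by k2.
cur.executemany("INSERT INTO t1 VALUES (?, ?)", [(1, i) for i in range(100)])
cur.executemany("INSERT INTO t4 VALUES (?, ?)", [(1, i) for i in range(100)])

# Join on the full key: each row finds exactly one match.
full = cur.execute(
    "SELECT COUNT(*) FROM t1 JOIN t4 ON t1.k1 = t4.k1 AND t1.k2 = t4.k2"
).fetchone()[0]

# Drop k2 from the join: every row matches every row sharing its k1.
partial = cur.execute(
    "SELECT COUNT(*) FROM t1 JOIN t4 ON t1.k1 = t4.k1"
).fetchone()[0]

print(full, partial)  # 100 10000
```

Scale those invented 100 rows up to production row counts and a "ran forever" query is exactly what you get.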

Happy coding!

Tuesday, 30 May 2017

Decrease Stored Proc Execution Time By Limiting the Table Size

Hi Guys,

Hope everyone is doing well. Last week, I was working on the most exciting task on which any database developer would love to work - Performance Tuning of a Stored Procedure.

Let me explain the code that this stored procedure consisted of. The basic task was to truncate the data from an existing table and fill it with new data based on some really complex logic and calculations. Various temp tables were being used inside the subsequent steps of the stored procedure.

Whenever the stored procedure consists of various sub-steps, my approach is to get the execution time of each one of the sub-steps (by running it as an ad-hoc query) and find out the possible solutions to decrease the execution time of this step. And yes, you can also look at the Actual/Estimated execution plans and try to check whether the code written follows the best practices or not.

So, after knowing the execution time of all the steps, I began to focus on the step which was consuming more than 70% of the execution time of the entire stored procedure. It looked something like this:

1. Creation of temp table say #temptb
2. Creation of another temp table #temptb2, which uses the previously created temp table and a master table (consisting of 40 million rows). It involved a left join with the master table, something like:

#temptb left join master_table on #temptb.key = master_table.key

What I did was limit this master_table to a smaller version by creating another temp table (#short_master_table, filled using where key in (select key from #temptb)) at the start of the stored proc.

So instead of scanning all 40 million rows, the query was now scanning some 1 million rows, which reduced the execution time of the stored proc by approximately 6 minutes.
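The pattern can be sketched like this (in Python's sqlite3 rather than T-SQL, with invented row counts; short_master_table plays the role of the #short_master_table temp table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Invented stand-in for the 40-million-row master table (100k rows here).
cur.execute('CREATE TABLE master_table ("key" INTEGER PRIMARY KEY, payload TEXT)')
cur.executemany("INSERT INTO master_table VALUES (?, ?)",
                [(i, f"row {i}") for i in range(100_000)])

# The driving temp table holds only the keys we actually care about.
cur.execute('CREATE TEMP TABLE temptb ("key" INTEGER)')
cur.executemany("INSERT INTO temptb VALUES (?)",
                [(i,) for i in range(0, 100_000, 100)])

# The tuning step: pre-filter the big table once, up front.
cur.execute('CREATE TEMP TABLE short_master_table AS '
            'SELECT * FROM master_table '
            'WHERE "key" IN (SELECT "key" FROM temptb)')

small = cur.execute("SELECT COUNT(*) FROM short_master_table").fetchone()[0]
print(small)  # 1000
```

Every subsequent join then touches the 1,000-row copy instead of the full table, which is the whole trick.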

Hope this makes sense. Thanks for reading; please do share your views or opinions on how you go about fine-tuning a particular stored procedure or SQL query!



Monday, 22 May 2017

Dump Data From SQL 2005 table to Excel 2016

Looking at the topic name, does the task sound simple to you? Well, I thought so, but it turned out to be quite a mammoth task with loads of learning.

The requirement is to transfer data from a SQL Server table to an Excel spreadsheet. The caveat is that it is not a one-time activity. Instead, every time we dump data into the spreadsheet, the previous data should be overwritten. Before going into the solution, here is all that I tried with no luck:-


  1. Tried using the BCP utility. It worked in transporting the data; however, on opening the Excel sheet you get a formatting error.
  2. Using an SSIS package, I was not able to use the Excel destination because it supports only up to Office 2007.
  3. So I had to use the OLE DB destination, changing the configuration so that it picks up the destination Excel file.
  4. Was not able to use the "Delete from [Sheet$]" command (needed to get rid of the data before the new data comes in), as DELETE is not supported by the Excel OLE DB connection manager.
  5. Using something like "Update [Sheet$] set field = Null" could be a workaround for delete. However, the new data then starts populating from the row after the data that was already there. For example, if the Excel file initially had 100 rows and we used the update command to blank the data, the next data load will start from row number 101. (This behaviour still needs to be confirmed as 100 percent true.)

The Solution: Not a perfect one, but it does the job. I used the File System Task to archive the Excel file once the data load is done and to create a new copy of the original file from a predefined template:

TargetFile - Excel file to which data needs to be uploaded
TargetFolder - Location of the target excel file.
ArchiveFolder - Folder having the target file template (or exact copy of the target file but empty)

  1. Copy the existing file to the archive folder with today's date stamp.
  2. Delete the target file which has all the data in it.
  3. The archive folder holds a template of the original file, meaning it has the same columns and its name is exactly the same as the target Excel file.
  4. Copy this template file into the target destination folder.
  5. Use a Data Flow to load the data.
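Steps 1-4 are plain file operations. Here is a minimal runnable sketch of the same archive-and-reset cycle in Python (folder layout and file names are invented for the demo; in the package itself this is done with File System Tasks):

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path

def reset_target(target_file: Path, archive_folder: Path, template_file: Path) -> Path:
    """Archive the loaded workbook, then restore an empty template copy."""
    # 1. Copy the existing file to the archive folder with today's date stamp.
    stamped = archive_folder / f"{target_file.stem}_{date.today():%Y%m%d}{target_file.suffix}"
    shutil.copy2(target_file, stamped)
    # 2. Delete the target file which has all the data in it.
    target_file.unlink()
    # 3/4. Copy the empty template back into the target folder.
    shutil.copy2(template_file, target_file)
    return stamped

# Demo with throwaway folders and plain-text stand-ins for the workbooks.
root = Path(tempfile.mkdtemp())
target_folder = root / "target"
archive_folder = root / "archive"
target_folder.mkdir()
archive_folder.mkdir()
target = target_folder / "TargetFile.xlsx"
template = archive_folder / "TargetFile.xlsx"
target.write_text("loaded data")
template.write_text("")  # the empty template kept in the archive folder

stamped = reset_target(target, archive_folder, template)
# The archive now holds the data; the target is an empty template again.
```

Because the replacement file is a byte-for-byte copy of an empty template, it sidesteps the "data starts at row 101" problem entirely.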


Let me know your comments: how would you have approached the task?

Saturday, 11 March 2017

Run a Job on SQL Azure (Using SSIS Package)

Requirements: A valid azure account, Local Instance of SQL Server (>= 2012), SQL Server Data Tools, Excel 2007 or above

Scenario: Dump Excel File Contents to a Table in SQL Azure

Steps:
 
 Develop SSIS Package-

  1. Drag Data Flow Task on the control flow window and double click on it to open the Data flow window
  2. Drag Excel source and configure it to the source file and the worksheet.
  3. Drag ADO.NET Destination.
  4. Create a new ADO.NET connection by selecting the provider .Net Providers\SqlClient Data Provider
  5. Put in the server name of a valid SQL Azure database.
  6. Use SQL Server Authentication and type in the username and password.
  7. Go to ALL window and set the properties
  8. Select the target database.
  9. Configure ADO.NET destination with this new connection manager.
  10. Point to the target table.
  11. Map the columns correctly.
  12. Save the package, specifying a password (using EncryptSensitiveWithPassword)
  13. Go to Project Properties --> Debugging and set the Run64BitRuntime property to False
  14. Run the package and make sure it runs without any error.
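For reference, the ADO.NET connection built in steps 4-8 typically boils down to a connection string of this shape (server, database, and credentials below are placeholders, not real values):

```
Server=tcp:yourserver.database.windows.net,1433;Initial Catalog=yourdatabase;User ID=youruser;Password=yourpassword;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;
```

Azure SQL requires the encrypted connection, which is why Encrypt=True matters here.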
Creating a SQL Job
  1. Open SSMS and connect to the local server instance.
  2. Go to SQL Server Agent and make sure the Agent service is running.
  3. Go to the Jobs node and expand it.
  4. Create a new job and add a step of type SQL Server Integration Services Package.
  5. Select the File System option and point to the SSIS package.
  6. Go to Execution options and select 32-bit runtime.
  7. Set the owner of the job to 'sa'.
  8. Run the job; it will load the data from the Excel file stored locally to the table residing in the Azure database.
Let me know your suggestions/observations.

See ya. 




Friday, 23 September 2016

SQL QUERY NIGHTMARE

Last week, we faced the basic and yet biggest problem that SQL developers/admins have to deal with, and yes, your guess is right: a SQL QUERY RUNNING SLOW. I will explain the problem below; however, I want everyone reading this to help me find answers, because I still haven't found one!

One more amazing aspect of the issue was that it worked fairly quickly in Development but ran like a tortoise in Production, even though the number of records was practically the same.

I'll start by throwing some light onto the execution plan, so that we can narrow down to the code which was the culprit here. The code is enclosed in a stored proc. On contacting my DBA to enquire about the issue, I got the following reply:-



I must hand it to our DBA for providing such a detailed response.

So my job was now to look at the code and modify it. What I did was shrink the calendar table into a temp table consisting of a few rows, which would speed up the operation. Below you will see the culprit code and a rectified version of it.

Culprit Code :-

-- Customer Table and latest advertiser with complete hierarchy details                                              
        INSERT  INTO reporting.Obtain_CustAM_Hier
                SELECT  cod.AdvertiserId ,
                        cod.SalesConsultantId AS end_employee_sid ,
                        COALESCE(st10.end_org_team_key, stmax.END_ORG_TEAM_KEY) end_org_team_key ,
                        COALESCE(st10.end_org_area_key, stmax.END_ORG_AREA_KEY) end_org_area_key ,
                        COALESCE(st10.end_org_region_key,
                                 stmax.END_ORG_REGION_KEY) end_org_region_key ,
                        COALESCE(st10.end_org_channel_key,
                                 stmax.END_ORG_CHANNEL_KEY) end_org_channel_key ,
                        COALESCE(st10.Cycle_Qtr, latest_cod.FIN_QTR_KEY) Cycle_Qtr ,
                        cod.FromDateKey ,
                        cod.ToDateKey
                FROM    #ADOWNER cod with (nolock)
                        INNER JOIN ( SELECT DISTINCT
                                            DT.FIN_QTR_KEY ,
                                            AdvertiserId ,
                                            MAX(fromdatekey) FROMDATEKEY ,
                                            MAX(ToDateKey) TODATEKEY
                                     FROM   ( SELECT    ADV.* ,
                                                        DT.FIN_QTR_KEY END_QTR
                                              FROM      ( SELECT
                                                              AdvertiserId ,
                                                              DT.FIN_QTR_KEY STRT_QTR ,
                                                              MAX(FromDateKey) AS fromdatekey ,
                                                              MAX(ToDateKey) AS ToDateKey
                                                          FROM
                                                              #ADOWNER AO
                                                              INNER JOIN CustomerMeasurementTool.workflow.Calendar DT ON DT.DATE_KEY = AO.FromDateKey
                                                              AND FIN_QTR_KEY >= 20131
                                                          GROUP BY AdvertiserId ,
                                                              DT.FIN_QTR_KEY
                                                        ) ADV
                                                        LEFT JOIN CustomerMeasurementTool.workflow.Calendar DT ON DT.DATE_KEY = ADV.ToDateKey
                                            ) AO
                                            LEFT JOIN CustomerMeasurementTool.workflow.Calendar DT ON DT.DATE_KEY BETWEEN fromdatekey AND ToDateKey
                                     WHERE  CALENDAR_DATE <= CONVERT(DATETIME, CONVERT(VARCHAR, GETDATE()
                                            - 1, 101))
                                     GROUP BY FIN_QTR_KEY ,
                                            AdvertiserId
                                   ) latest_cod ON cod.AdvertiserId = latest_cod.AdvertiserId
                                                   AND cod.FromDateKey = latest_cod.FROMDATEKEY
                        LEFT JOIN #ALL_CONSULATANTS_Prev st10 ON cod.SalesConsultantId = st10.end_employee_sid
                                                              AND latest_cod.FIN_QTR_KEY = st10.Cycle_Qtr
                        LEFT JOIN ( SELECT  EHD.EMPLOYEE_SID AS END_EMPLOYEE_SID ,
                                            EHD.ORG_TEAM_KEY AS END_ORG_TEAM_KEY ,
                                            EHD.ORG_AREA_KEY AS END_ORG_AREA_KEY ,
                                            EHD.ORG_REGION_KEY AS END_ORG_REGION_KEY ,
                                            EHD.ORG_CHANNEL_KEY AS END_ORG_CHANNEL_KEY
                                    FROM    [BIStage].[dbo].[EMPLOYEE_HIST_DIM_V] EHD with (nolock)
                                            INNER JOIN ( SELECT
                                                              EMPLOYEE_SID ,
                                                              MAX(BEGIN_DATE_KEY) AS BEGIN_DATE_KEY
                                                         FROM [BIStage].[dbo].[EMPLOYEE_HIST_DIM_V] with (nolock)
                                                         WHERE
                                                              POSITION_KEY > 0
                                                         GROUP BY EMPLOYEE_SID
                                                       ) EMP ON EHD.EMPLOYEE_SID = EMP.EMPLOYEE_SID
                                                              AND EHD.BEGIN_DATE_KEY = EMP.BEGIN_DATE_KEY
                                  ) stmax ON cod.SalesConsultantId = stmax.END_EMPLOYEE_SID; 
        
      

CHANGED CODE :-

I have changed the highlighted part of the code and inserted the new code as follows :-


-- Temp table to build a mini version of the Calendar table.
select * into #Mini_Cal from CustomerMeasurementTool.workflow.Calendar where FIN_QTR_KEY >= 20131

SELECT DISTINCT
                                            DT.FIN_QTR_KEY ,
                                            AdvertiserId ,
                                            MAX(fromdatekey) FROMDATEKEY ,
                                            MAX(ToDateKey) TODATEKEY into #latest_cod
                                     FROM   ( SELECT    ADV.* ,
                                                        DT.FIN_QTR_KEY END_QTR
                                              FROM      ( SELECT
                                                              AdvertiserId ,
                                                              DT.FIN_QTR_KEY STRT_QTR ,
                                                              MAX(FromDateKey) AS fromdatekey ,
                                                              MAX(ToDateKey) AS ToDateKey
                                                          FROM
                                                              #ADOWNER AO
                                                              INNER JOIN #Mini_Cal DT ON DT.DATE_KEY = AO.FromDateKey
                                                             
                                                          GROUP BY AdvertiserId ,
                                                              DT.FIN_QTR_KEY
                                                        ) ADV
                                                        LEFT JOIN #Mini_Cal DT ON DT.DATE_KEY = ADV.ToDateKey
                                            ) AO
                                            LEFT JOIN #Mini_Cal DT ON DT.DATE_KEY BETWEEN fromdatekey AND ToDateKey
                                     WHERE  CALENDAR_DATE <= CONVERT(DATETIME, CONVERT(VARCHAR, GETDATE()
                                            - 1, 101))
                                     GROUP BY FIN_QTR_KEY ,
                                            AdvertiserId

     
-- Customer Table and latest advertiser with complete hierarchy details                                              
        INSERT  INTO reporting.Obtain_CustAM_Hier
                SELECT  cod.AdvertiserId ,
                        cod.SalesConsultantId AS end_employee_sid ,
                        COALESCE(st10.end_org_team_key, stmax.END_ORG_TEAM_KEY) end_org_team_key ,
                        COALESCE(st10.end_org_area_key, stmax.END_ORG_AREA_KEY) end_org_area_key ,
                        COALESCE(st10.end_org_region_key,
                                 stmax.END_ORG_REGION_KEY) end_org_region_key ,
                        COALESCE(st10.end_org_channel_key,
                                 stmax.END_ORG_CHANNEL_KEY) end_org_channel_key ,
                        COALESCE(st10.Cycle_Qtr, latest_cod.FIN_QTR_KEY) Cycle_Qtr ,
                        cod.FromDateKey ,
                        cod.ToDateKey
                FROM    #ADOWNER cod with (nolock)
                        INNER JOIN  #latest_cod latest_cod ON cod.AdvertiserId = latest_cod.AdvertiserId
                                                   AND cod.FromDateKey = latest_cod.FROMDATEKEY
                        LEFT JOIN #ALL_CONSULATANTS_Prev st10 ON cod.SalesConsultantId = st10.end_employee_sid
                                                              AND latest_cod.FIN_QTR_KEY = st10.Cycle_Qtr
                        LEFT JOIN ( SELECT  EHD.EMPLOYEE_SID AS END_EMPLOYEE_SID ,
                                            EHD.ORG_TEAM_KEY AS END_ORG_TEAM_KEY ,
                                            EHD.ORG_AREA_KEY AS END_ORG_AREA_KEY ,
                                            EHD.ORG_REGION_KEY AS END_ORG_REGION_KEY ,
                                            EHD.ORG_CHANNEL_KEY AS END_ORG_CHANNEL_KEY
                                    FROM    [BIStage].[dbo].[EMPLOYEE_HIST_DIM_V] EHD with (nolock)
                                            INNER JOIN ( SELECT
                                                              EMPLOYEE_SID ,
                                                              MAX(BEGIN_DATE_KEY) AS BEGIN_DATE_KEY
                                                         FROM [BIStage].[dbo].[EMPLOYEE_HIST_DIM_V] with (nolock)
                                                         WHERE
                                                              POSITION_KEY > 0
                                                         GROUP BY EMPLOYEE_SID
                                                       ) EMP ON EHD.EMPLOYEE_SID = EMP.EMPLOYEE_SID
                                                              AND EHD.BEGIN_DATE_KEY = EMP.BEGIN_DATE_KEY

                                  ) stmax ON cod.SalesConsultantId = stmax.END_EMPLOYEE_SID;



Apologies for such messy code; however, if you are able to follow the code (by going through the changed rows) you will find that I have done the following things:-
1. Limited the Calendar table.
2. Taken the complicated code out and put it into a temp table.
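The first change, the #Mini_Cal table, can be sketched in miniature (Python's sqlite3, with an invented calendar of a few thousand day rows; mini_cal stands in for #Mini_Cal):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE calendar (date_key INTEGER, fin_qtr_key INTEGER)")

# Invented data: 90 day rows per quarter for fiscal years 2005-2015.
rows, key = [], 0
for year in range(2005, 2016):
    for qtr in range(1, 5):
        for _ in range(90):
            key += 1
            rows.append((key, year * 10 + qtr))
cur.executemany("INSERT INTO calendar VALUES (?, ?)", rows)

# The rewrite: materialise only the quarters the query can ever touch.
cur.execute("CREATE TEMP TABLE mini_cal AS "
            "SELECT * FROM calendar WHERE fin_qtr_key >= 20131")

full = cur.execute("SELECT COUNT(*) FROM calendar").fetchone()[0]
mini = cur.execute("SELECT COUNT(*) FROM mini_cal").fetchone()[0]
print(full, mini)  # 3960 1080
```

The original proc joined the calendar three separate times; pointing all three joins at the small pre-filtered copy is what paid for itself.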


The above changes did the trick, and the stored proc ran within the SLA of 6 minutes (it was taking 3 hours on that fateful day).

However, the question which remained unanswered is why the stored proc ran much faster in our DEV environment compared to the PROD environment.

I know I may not have given the complete info; please ask questions to solve this conundrum!

