Dev's Guide to SQL Server: TSQL

Showing posts with label TSQL. Show all posts

Tuesday, April 11, 2017

THROW versus RAISERROR

Error handling is just as important as verifying that your code runs and produces correct results. For developers, error handling is last chance to prepare code to fail gracefully instead of just blowing up and having to pick up pieces later.

When it comes to SQL Server, there are few options for error handling. Basics of error handling are wrapping the code that is likely to fail inside the TRY block and then dealing with failures inside the CATCH block.

Let's look at a most basic scenario below.

 
SELECT 1 / 0

SELECT 1

The first statement generates the error division by 0, but the second statement still executed as nothing is controlling what happens when errors encountered. In some cases, this is perfectly fine, but in others, we want to stop the execution of our script in the event of encountered errors. We might even want to roll back the transaction to prevent partial data change.

Now if we take the same code and place it inside TRY block we get different behavior. In this case, the first statement generates the error but the second statement is not run.

 
BEGIN TRY
SELECT 1 / 0

SELECT 1
END TRY
BEGIN CATCH

END CATCH

Now that we covered the basics let's look at controlling a flow of the code through the use of RAISERROR and THROW. In some cases, we need to fail code based on business rules which pass syntax test. For example, if our procedure is responsible for withdrawing money from an account, we might have a business rule that prevents going into negative. For the first example, we are just going to try to withdraw money.

 
CREATE PROCEDURE dbo.WithdrawMoney
    @AccountId INT
  , @Amount    DECIMAL(18, 2)
AS
    BEGIN
        UPDATE  dbo.Accounts
        SET Amount = Amount - @Amount
        WHERE   AccountID = @AccountId
    END

In this example we can keep withdrawing money until arighmetic overflow error will be thrown for creating large negative number. Since we want to avoid negative balance we are going to add code to where clause.

 
CREATE PROCEDURE dbo.WithdrawMoney
    @AccountId INT
  , @Amount    DECIMAL(18, 2)
AS
    BEGIN
        UPDATE  dbo.Accounts
        SET Amount = Amount - @Amount
        WHERE   AccountID              = @AccountId
                AND (Amount - @Amount) > 0
    END

Now, whenever you try to withdraw money that results in negative amount update will "fail." This failure is only reflected by the number of rows affected. In this instance, it is zero. Now we need something in front end application that reads the number of rows affected by stored procedure to understand what happened.

Another way of handling this would be to run a check first and to generate the error that is sent back to front end. So we write two stored procedures one using RAISERROR and another one using THROW and expect the same result.

 
CREATE PROCEDURE dbo.WithdrawMoney_RAISERROR
    @AccountId INT
  , @Amount    DECIMAL(18, 2)
AS
    BEGIN
        IF EXISTS
            (
                SELECT  1
                FROM    dbo.Accounts
                WHERE   AccountID              = @AccountId
                        AND (Amount - @Amount) > 0
            )
            RAISERROR( 'Insufficient Amount in account to perform withdrawal', 16, 1 )

        UPDATE  dbo.Accounts
        SET Amount = Amount - @Amount
        WHERE   AccountID = @AccountId
    END
GO
CREATE PROCEDURE dbo.WithdrawMoney_THROW
    @AccountId INT
  , @Amount    DECIMAL(18, 2)
AS
    BEGIN
        IF EXISTS
            (
                SELECT  1
                FROM    dbo.Accounts
                WHERE   AccountID              = @AccountId
                        AND (Amount - @Amount) > 0
            )
            THROW 51000, 'Insufficient Amount in account to perform withdrawal', 1

        UPDATE  dbo.Accounts
        SET Amount = Amount - @Amount
        WHERE   AccountID = @AccountId
    END

Upon executing the first procedure, we get the error message back to the front end, but after checking balance, we find that money withdrawn from the account, but in the case of the second procedure, the same error returned to the front end but money still there.

Now we begin to scratch our head trying to figure out why we lost the money even though we got errors in both cases. The truth behind is the fact that RAISERROR does not stop the execution of code if it is outside of TRY CATCH block. To get same behavior out of RAISERROR, we would need to rewrite procedure to look something like following example.

 
CREATE PROCEDURE dbo.WithdrawMoney_RAISERROR
    @AccountId INT
  , @Amount    DECIMAL(18, 2)
AS
    BEGIN
        BEGIN TRY
            IF EXISTS
                (
                    SELECT  1
                    FROM    dbo.Accounts
                    WHERE   AccountID              = @AccountId
                            AND (Amount - @Amount) > 0
                )
                RAISERROR( 'Insufficient Amount in account to perform withdrawal', 16, 1 )


            UPDATE  dbo.Accounts
            SET Amount = Amount - @Amount
            WHERE   AccountID = @AccountId
        END TRY
        BEGIN CATCH
            DECLARE @ErrorMessage  VARCHAR(4000)
                  , @ErrorSeverity INT
                  , @ErrorState    INT

            SELECT  @ErrorMessage  = ERROR_MESSAGE()
                  , @ErrorSeverity = ERROR_SEVERITY()
                  , @ErrorState    = ERROR_STATE()

            RAISERROR( @ErrorMessage, @ErrorSeverity, @ErrorState )
        END CATCH
    END

Now we added another complexity to the procedure because if we wrap RAISERROR in TRY CATCH block, we suppress the error and the front end left clueless as to what happened inside stored procedure. Therefore we need to "rethrow" it by catching the information about it and doing another RAISERROR. If we choose to do the same with THROW, it is a whole lot simpler. Just look at the next example.

 
CREATE PROCEDURE dbo.WithdrawMoney_THROW
    @AccountId INT
  , @Amount    DECIMAL(18, 2)
AS
    BEGIN
        BEGIN TRY
            IF EXISTS
                (
                    SELECT  1
                    FROM    dbo.Accounts
                    WHERE   AccountID              = @AccountId
                            AND (Amount - @Amount) > 0
                )
                THROW 51000, 'Insufficient Amount in account to perform withdrawal', 1


            UPDATE  dbo.Accounts
            SET Amount = Amount - @Amount
            WHERE   AccountID = @AccountId
        END TRY
        BEGIN CATCH
            THROW
        END CATCH
    END

In this case, we simply add THROW to CATCH block which acts as "RETHROW." Unfortunately for me, I write most of the code for SQL Server 2008 R2 which did not have THROW command. It is only available starting with SQL Server 2012. Therefore I have to use RAISERROR and always remember to wrap it inside TRY CATCH block to ensure that it always stops execution when I want it to do so.

Be sure to leave a comment and thank you so much for reading.

Tuesday, March 21, 2017

What kind of JOIN syntax is this?

I am not often puzzled when looking at TSQL syntax, but this one time I could not figure out the syntax. Syntax looked similar to the example below.

 
SELECT  t.*
      , s.*
FROM    Teacher AS t
LEFT OUTER JOIN dbo.Student AS s
JOIN dbo.Grades AS gr
    ON s.ID = gr.StudentID
       AND  gr.GradeLetter = 'A'
    ON s.TeacherID = t.ID

My first gut reaction was that this code is broken and would not run. To my amazement code ran just fine. Now came the hard part, which was to figure out what the code was doing because I have never seen this syntax before. Since I did not understand what I was looking at I could not BING "weird join syntax" to get an answer. As a developer, I learned long time ago to break down code into smallest possible chunks to get the answer.

After I have figured out the relationship between tables, I was able to understand what query was doing. To be able to read query better it can be rewritten in the following way.

 
SELECT  t.*
      , s.*
FROM    Teacher AS t
LEFT OUTER JOIN
            (
                SELECT  s.*
                      , gr.*
                FROM    dbo.students AS s
                JOIN dbo.Grades AS gr
                    ON s.ID                = gr.StudentID
                       AND  gr.GradeLetter = 'A'
            ) AS s
    ON s.TeacherID = t.ID

This code is much easier to read and to understand. Students filtered by Grades and we are only interested in Students who have "A" grade. Therefore we get all records from Teacher table but only get Students who have good grades. If we would try to rewrite the query in the following matter, we would only get Teachers that have Students with perfect grades as we specified inner join; therefore, it takes precedence over the left outer join.

 
SELECT  t.*
      , s.*
FROM    Teacher AS t
LEFT OUTER JOIN dbo.Student AS s
    ON s.TeacherID         = t.ID
JOIN dbo.Grades AS gr
    ON s.ID                = gr.StudentID
       AND  gr.GradeLetter = 'A'

Now that I knew what code was doing I wanted to know performance because I do not care how the code looks as long as it does not perform fast it is useless. The execution plans of the two queries are almost identical with the only difference being that when I rewrote the original query as sub-query, I got additional operation added to the plan. Physical joins were identical and execution time was almost exact.

I could not believe I learned something new after writing T-SQL for many years. All of the sudden I got another tool in my toolbelt to write better code. Now if I ever need to filter tables that should not effect result I can write with this new syntax. To be honest, I have not taken this to the extreme to see how many JOIN operations I can nest before it becomes unreadable. However, I am sure it would not take many before someone else could not figure out what was going on and would have to break it down piece by piece to see it clear.

Just imagine having to support code that looks like example 1 below instead of example 2. Even though the code in the following case produces same execution plan, it becomes hard to read and therefore hard to maintain, and if you have junior developers who are not strong in SQL, they are more likely to make mistakes.

 
--Example 1:
SELECT  t.FullName AS TeacherName
      , s.FullName AS StudentName
      , gr.Class
FROM    dbo.Teacher AS t
JOIN dbo.Student AS s
JOIN dbo.Grades AS gr
JOIN dbo.Class AS c
JOIN dbo.Room AS r
    ON c.ID = gr.ClassID
    ON r.ClassID = c.ID
    ON s.ID = gr.StudentID
    ON s.TeacherID = t.ID

--Example 2:
SELECT  t.FullName AS TeacherName
      , s.FullName AS StudentName
      , gr.Class
FROM  dbo.Teacher AS t
JOIN dbo.Student AS s
    ON t.ID = s.TeacherID
JOIN dbo.Grades AS gr
    ON s.ID = gr.StudentID
JOIN dbo.Class AS c
    ON c.ID = r.ClassID
JOIN dbo.Room AS r
    ON c.ID = gr.ClassID

P.S. I still don't know if this JOIN syntax has special name, but at least I now know what it does.

Thursday, December 29, 2016

Date Dimension aka Calendar Table

Before we get into discussing how to create it date dimension and how to use it, first let's talk about what it is and why do we need it. Depending on who you talk to, people can refer to this concept as "Calendar table" or "Date Dimension," which is usually found in Data Warehouse. No matter how it is called, at the end of the day, it is a table in SQL Server which is populated with different date/calendar related information to help speed up SQL queries which require specific parts of dates.

In my case, I have created it to be able to aggregate data by quarters, years and month. Depending on how large your requirements are it will add additional complexity to building it. Since I don't care about holidays (for now at least), I will not be creating holiday schedule which can be complicated to populate.

Now onto writing some SQL. Step 1: Create a table and add "covered" Nonclustered index that I need for my queries. Depending on your needs you will need to create additional indexes and/or modify the one I'm creating.

 
CREATE TABLE [dbo].[DateDimension]
    (
        [DateKey]             [INT]         NOT NULL
      , [Date]                [DATE]        NOT NULL
      , [Day]                 [TINYINT]     NOT NULL
      , [DaySuffix]           [CHAR](2)     NOT NULL
      , [Weekday]             [TINYINT]     NOT NULL
      , [WeekDayName]         [VARCHAR](10) NOT NULL
      , [IsWeekend]           [BIT]         NOT NULL
      , [DOWInMonth]          [TINYINT]     NOT NULL
      , [DayOfYear]           [SMALLINT]    NOT NULL
      , [WeekOfMonth]         [TINYINT]     NOT NULL
      , [WeekOfYear]          [TINYINT]     NOT NULL
      , [Month]               [TINYINT]     NOT NULL
      , [MonthName]           [VARCHAR](10) NOT NULL
      , [Quarter]             [TINYINT]     NOT NULL
      , [QuarterName]         [VARCHAR](6)  NOT NULL
      , [Year]                [INT]         NOT NULL
      , [MMYYYY]              [CHAR](6)     NOT NULL
      , [MonthYear]           [CHAR](7)     NOT NULL
      , [FirstDayOfMonth]     [DATE]        NOT NULL
      , [LastDayOfMonth]      [DATE]        NOT NULL
      , [FirstDayOfQuarter]   [DATE]        NOT NULL
      , [LastDayOfQuarter]    [DATE]        NOT NULL
      , [FirstDayOfYear]      [DATE]        NOT NULL
      , [LastDayOfYear]       [DATE]        NOT NULL
      , [FirstDayOfNextMonth] [DATE]        NOT NULL
      , [FirstDayOfNextYear]  [DATE]        NOT NULL ,
    )

GO

ALTER TABLE dbo.DateDimension
ADD CONSTRAINT PK_DateDimension
    PRIMARY KEY CLUSTERED (DateKey)

IF NOT EXISTS
    (
        SELECT  1
        FROM    sys.indexes AS i
        WHERE   i.name = 'IX_DateDimension_Date'
    )
    CREATE NONCLUSTERED INDEX [IX_DateDimension_Date]
        ON dbo.DateDimension ([Date])
        INCLUDE ([Month], [Quarter], [Year])
GO

The reason why I chose to include Month, Quarter and Year columns in my index is based on my requirements to aggregate data by those columns. Originally I did not have any columns included, which produced Key Lookup. Now to populate table I have taken some suggestions from Aaron's script, link at the bottom of the post. In my case, I'm truncating table and limiting it to 300 years which is more than plenty for my scenario.

 
TRUNCATE TABLE dbo.DateDimension

DECLARE @StartDate     DATE = '01/01/1900'
      , @NumberOfYears INT  = 300;

DECLARE @CutoffDate DATE = DATEADD( YEAR, @NumberOfYears, @StartDate );

-- prevent set or regional settings from interfering with 
-- interpretation of dates / literals
SET DATEFIRST 7;
SET DATEFORMAT MDY;
SET LANGUAGE US_ENGLISH;
-- Thanks to Aaron for this hint

WITH Pass0 AS
    (
        SELECT  1 AS C
        UNION ALL
        SELECT  1
    )
   , Pass1 AS
    (
        SELECT  1 AS C
        FROM    Pass0 AS A
              , Pass0 AS B
    )
   , Pass2 AS
    (
        SELECT  1 AS C
        FROM    Pass1 AS A
              , Pass1 AS B
    )
   , Pass3 AS
    (
        SELECT  1 AS C
        FROM    Pass2 AS A
              , Pass2 AS B
    )
   , Pass4 AS
    (
        SELECT  1 AS C
        FROM    Pass3 AS A
              , Pass3 AS B
    )
   , Pass5 AS
    (
        SELECT  1 AS C
        FROM    Pass4 AS A
              , Pass4 AS B
    )
   , Pass6 AS
    (
        SELECT TOP (DATEDIFF( DAY, @StartDate, @CutoffDate ))
            rn = ROW_NUMBER() OVER (ORDER BY Pass5.C)
        FROM    Pass5
    )
   , Pass7 AS
    (
        SELECT  a.Date
              , [day]          = DATEPART( DAY, a.Date )
              , [week]         = DATEPART( WEEK, a.Date )
              , [month]        = DATEPART( MONTH, a.Date )
              , [quarter]      = DATEPART( QUARTER, a.Date )
              , [year]         = DATEPART( YEAR, a.Date )
              , [DayOfWeek]    = DATEPART( WEEKDAY, a.Date )
              , [FirstOfMonth] = CONVERT( DATE, DATEADD( MONTH, DATEDIFF( MONTH, 0, a.Date ), 0 ))
              , FirstOfYear    = CONVERT( DATE, DATEADD( YEAR, DATEDIFF( YEAR, 0, a.Date ), 0 ))
              , [MonthName]    = DATENAME( MONTH, a.Date )
              , Style101       = CONVERT( CHAR(10), a.Date, 101 )
              , Style112       = CONVERT( CHAR(8), a.Date, 112 )
        FROM
            (
                SELECT  [Date] = DATEADD( DAY, Pass6.rn - 1, @StartDate )
                FROM    Pass6
            ) a
    )
INSERT  dbo.DateDimension WITH (TABLOCKX)
SELECT  DateKey             = CONVERT( INT, CONVERT( CHAR(8), pass7.Date, 112 ))
      , [Date]              = pass7.Date
      , [Day]               = CONVERT( TINYINT, pass7.day )
      , DaySuffix           = CONVERT(   CHAR(2)
                                       , CASE
                                             WHEN pass7.day / 10 = 1
                                                 THEN 'th'
                                         ELSE CASE RIGHT(pass7.day, 1)
                                                  WHEN '1'
                                                      THEN 'st'
                                                  WHEN '2'
                                                      THEN 'nd'
                                                  WHEN '3'
                                                      THEN 'rd'
                                              ELSE 'th'
                                              END
                                         END
                                     )
      , [Weekday]           = CONVERT( TINYINT, pass7.DayOfWeek )
      , [WeekDayName]       = CONVERT( VARCHAR(10), DATENAME( WEEKDAY, pass7.Date ))
      , [IsWeekend]         = CONVERT(   BIT
                                       , CASE
                                             WHEN pass7.DayOfWeek IN (
                                                                         1, 7
                                                                     )
                                                 THEN 1
                                         ELSE 0
                                         END
                                     )
      , [DOWInMonth]        = CONVERT(   TINYINT
                                       , ROW_NUMBER() OVER (PARTITION BY pass7.FirstOfMonth
                                                                       , pass7.DayOfWeek
                                                            ORDER BY pass7.Date
                                                           )
                                     )
      , [DayOfYear]         = CONVERT( SMALLINT, DATEPART( DAYOFYEAR, pass7.Date ))
      , WeekOfMonth         = CONVERT(   TINYINT
                                       , DENSE_RANK() OVER (PARTITION BY pass7.year
                                                                       , pass7.month
                                                            ORDER BY pass7.week
                                                           )
                                     )
      , WeekOfYear          = CONVERT( TINYINT, pass7.week )
      , [Month]             = CONVERT( TINYINT, pass7.month )
      , [MonthName]         = CONVERT( VARCHAR(10), pass7.MonthName )
      , [Quarter]           = CONVERT( TINYINT, pass7.quarter )
      , QuarterName         = CONVERT(   VARCHAR(6)
                                       , CASE pass7.quarter
                                             WHEN 1
                                                 THEN 'First'
                                             WHEN 2
                                                 THEN 'Second'
                                             WHEN 3
                                                 THEN 'Third'
                                             WHEN 4
                                                 THEN 'Fourth'
                                         END
                                     )
      , [Year]              = pass7.year
      , MMYYYY              = CONVERT( CHAR(6), LEFT(pass7.Style101, 2) + LEFT(pass7.Style112, 4))
      , MonthYear           = CONVERT( CHAR(7), LEFT(pass7.MonthName, 3) + LEFT(pass7.Style112, 4))
      , FirstDayOfMonth     = pass7.FirstOfMonth
      , LastDayOfMonth      = MAX( pass7.Date ) OVER (PARTITION BY pass7.year
                                                                 , pass7.month
                                                     )
      , FirstDayOfQuarter   = MIN( pass7.Date ) OVER (PARTITION BY pass7.year
                                                                 , pass7.quarter
                                                     )
      , LastDayOfQuarter    = MAX( pass7.Date ) OVER (PARTITION BY pass7.year
                                                                 , pass7.quarter
                                                     )
      , FirstDayOfYear      = pass7.FirstOfYear
      , LastDayOfYear       = MAX( pass7.Date ) OVER (PARTITION BY pass7.year)
      , FirstDayOfNextMonth = DATEADD( MONTH, 1, pass7.FirstOfMonth )
      , FirstDayOfNextYear  = DATEADD( YEAR, 1, pass7.FirstOfYear )
FROM    Pass7

Few things to consider in the script. Depending on requirements and local setting, part of the script would need to be adjusted to reflect correct data. For example, Sunday could the first day of the week in some countries and last day of the week in other coutnries. Same goes for date formats, in United States we have Month/Day/Year back home in Russia I always wrote Day/Month/Year which could create large issues when trying to compare data stored in one format to date dimension that is stored in a different format. That is why the following two commands are so important SET DATEFIRST 7 and SET DATEFORMAT MDY;

Now that table is populated you too can run aggregate queries without having to generate data at run time. Enjoy!

References:

Bertrand, Aaron. "Creating a Date Dimension or Calendar Table in SQL Server." Creating a Date Dimension or Calendar Table in SQL Server. N.p., 20 Oct. 2015. Web. 29 Dec. 2016. <https://www.mssqltips.com/sqlservertip/4054/creating-a-date-dimension-or-calendar-table-in-sql-server/>.

Tuesday, August 30, 2016

SQL Joins - Basics Part 2

Having already covered the basics of SQL Join syntax, now it is time to jump into a little more advanced stuff. Early in my career, I remember being confused about the difference in results when the same clause is placed in JOIN instead of WHERE clause. This post is aimed to clarify those questions and few others around ON and WHERE clauses. For demonstration I will use table built and populated in SQL Joins - Basics Part 1

First, let us look at basic LEFT join from Part 1. All rows are returned from Teacher table and all matching rows from Student table are displayed.

 
SELECT  t.FullName AS TeacherFullName
       ,s.FullName AS StudentFullName
FROM    Teacher t
LEFT OUTER JOIN Student s
        ON s.TeacherID = t.ID

TeacherFullName	StudentFullName
Roy Chad	Jepson Balfour
Roy Chad	Milburn Brett
Dudley Goddard	Clinton Schuylerr
Dudley Goddard	Norbert Kemp
Raphael Philander	Meriwether Kennedy
Raphael Philander	Braith Cornelius
Sonnie Davin	NULL

Now we introduce simple WHERE clause. By adding condition to WHERE clause we are now restricting entire results set to specific condition, any rows that do not satisfy that condition are excluded from results. Therefore we only end up with one row seen below.

 
SELECT  t.FullName AS TeacherFullName
       ,s.FullName AS StudentFullName
FROM    Teacher t
LEFT OUTER JOIN Student s
        ON s.TeacherID = t.ID
WHERE s.FullName = 'Jepson Balfour'

TeacherFullName	StudentFullName
Roy Chad	Jepson Balfour

So what happens when you move same clause to the ON clause? Ok, lets test it!

 
SELECT  t.FullName AS TeacherFullName
       ,s.FullName AS StudentFullName
FROM    Teacher t
LEFT OUTER JOIN Student s
        ON s.TeacherID = t.ID
  and s.FullName = 'Jepson Balfour'

TeacherFullName	StudentFullName
Roy Chad	Jepson Balfour
Dudley Goddard	NULL
Raphael Philander	NULL
Sonnie Davin	NULL

What happened? Result set looks nothing like first or second example. When I first did that I was thoroughly confused. To explain the result lets run another query.

 
SELECT s.FullName
FROM dbo.Student as s
WHERE s.FullName = 'Jepson Balfour'

StudentFullName
Jepson Balfour

Even though last two queries look different in reality they are placing exactly same restriction on Student table. On clause on Student table became where clause that restricts results to only rows that specific that critirea. Since Teacher table is not joined with INNER join it is not restricted by what happens to Student therefore we see all Teachers displyaed but only one of them actually showing a student.

Conclusion:

The ON clause is a powerfull way to change your result set exactly to what you need it to be, but if used without understanding of what happens to the data it can produce unpredicted result set. Each statement placed in ON clause will be evaulated prior to WHERE clause. This goes back to understanding order of operations in SQL Server. Below are just few of the operations listed in correct order. By knowing and understanding the order of operations in SQL Server it helps to understand why queries behaved the way that they did above. Each one was evaluated by SQL Server in correct order which produced correct output based on that structure.

FROM

ON

WHERE

SELECT

As always feel free to leave comments, questions, etc. For next post I will answer common question of TOP clause.

Tuesday, August 23, 2016

NOT IN - NOT EXISTS - LEFT JOIN - EXCEPT

Today I stumbled upon a blog post that pointed out how inefficient NOT IN clause and comment was made to use LEFT OUTER JOIN because it is "faster". They showed execution plans of both and SSMS query cost from Actual Execution plan. The first thing that jumped out at me were the Table scans that were present in both plans. Since I did some reading previously on how these suppose to yield almost similar results I decided to give this test a try myself.

To begin with, I created tables in my test database and pulled over data from AdventureWorks2014 database. To avoid extra operations that I did not want to skew results, I took the liberty of changing NVARCHAR to VARCHAR and drop extra columns that served no purpose in my tests.

 
--CREATE Clustered Tables
CREATE TABLE dbo.Person_Clustered
    (
      [BusinessEntityID] INT NOT NULL
                             PRIMARY KEY
    , [PersonType] CHAR(2) NOT NULL
    , [FirstName] VARCHAR(50) NOT NULL
    , [LastName] VARCHAR(50) NOT NULL,
    )
CREATE TABLE dbo.MyPeople_Clustered
    (
      [MyPeople_ID] INT NOT NULL
                        PRIMARY KEY
    , [PersonType] CHAR(2) NOT NULL
    , [FirstName] VARCHAR(50) NOT NULL
    , [LastName] VARCHAR(50) NOT NULL,
    )

--CREATE NonClustered Tables
CREATE TABLE dbo.Person_NonClustered
    (
      [BusinessEntityID] INT NOT NULL
    , [PersonType] CHAR(2) NOT NULL
    , [FirstName] VARCHAR(50) NOT NULL
    , [LastName] VARCHAR(50) NOT NULL,
    )
CREATE TABLE dbo.MyPeople_NonClustered
    (
      [MyPeople_ID] INT NOT NULL
    , [PersonType] CHAR(2) NOT NULL
    , [FirstName] VARCHAR(50) NOT NULL
    , [LastName] VARCHAR(50) NOT NULL,
    )

--Write all data from base table
INSERT  INTO dbo.Person_Clustered
        ( BusinessEntityID
        , PersonType
        , FirstName
        , LastName
        )
        SELECT  BusinessEntityID
              , CONVERT(CHAR(2), PersonType)
              , CONVERT(VARCHAR(20), FirstName)
              , LastName
        FROM    AdventureWorks2014.Person.Person

--Write some data to it, 
INSERT  INTO dbo.MyPeople_Clustered
        SELECT  *
        FROM    dbo.Person_Clustered AS p
        WHERE   p.BusinessEntityID % 133 = 0

INSERT  INTO dbo.MyPeople_NonClustered
        SELECT  *
        FROM    dbo.MyPeople_Clustered AS p

INSERT  INTO dbo.Person_NonClustered
        SELECT  *
        FROM    dbo.Person_Clustered AS p

SELECT  COUNT(*) dbo.Person_Clustered
SELECT  COUNT(*) dbo.Person_NonClustered
SELECT  COUNT(*) dbo.MyPeople_Clustered
SELECT  COUNT(*) dbo.MyPeople_NonClustered

For the actual test I created few different scenarios including NOT EXISTS and EXCEPT clauses. First 4 test are based on clustered primary keys and last 4 are based on non-clustered tables.

 

--Clustered Tables  - NOT IN
SELECT  p.BusinessEntityID
      , p.PersonType
      , p.FirstName
      , p.LastName
FROM    dbo.Person_Clustered AS p
WHERE   p.BusinessEntityID NOT IN ( SELECT  mp.MyPeople_ID
                                    FROM    dbo.MyPeople_Clustered mp )
--Clustered Tables  - NOT EXISTS
SELECT  p.BusinessEntityID
      , p.PersonType
      , p.FirstName
      , p.LastName
FROM    dbo.Person_Clustered AS p
WHERE   NOT EXISTS ( SELECT 1
                     FROM   dbo.MyPeople_Clustered mp
                     WHERE  mp.MyPeople_ID = p.BusinessEntityID )


--Clustered Tables  - LEFT OUTER JOIN
SELECT  p.BusinessEntityID
      , p.PersonType
      , p.FirstName
      , p.LastName
FROM    dbo.Person_Clustered AS p
LEFT OUTER JOIN dbo.MyPeople_Clustered mp
        ON mp.MyPeople_ID = p.BusinessEntityID
WHERE   mp.MyPeople_ID IS NULL

--Clustered Tables  - EXCEPT
SELECT  p.BusinessEntityID
      , p.PersonType
      , p.FirstName
      , p.LastName
FROM    dbo.Person_Clustered AS p
EXCEPT
SELECT  mp.MyPeople_ID
      , mp.PersonType
      , mp.FirstName
      , mp.LastName
FROM    dbo.MyPeople_Clustered AS mp


--NonClustered Tables  - NOT IN
SELECT  p.BusinessEntityID
      , p.PersonType
      , p.FirstName
      , p.LastName
FROM    dbo.Person_NonClustered AS p
WHERE   p.BusinessEntityID NOT IN ( SELECT  mp.MyPeople_ID
                                    FROM    dbo.MyPeople_NonClustered mp )

--NonClustered Tables  - NOT EXISTS
SELECT  p.BusinessEntityID
      , p.PersonType
      , p.FirstName
      , p.LastName
FROM    dbo.Person_NonClustered AS p
WHERE   NOT EXISTS ( SELECT 1
                     FROM   dbo.MyPeople_NonClustered mp
                     WHERE  mp.MyPeople_ID = p.BusinessEntityID )

--NonClustered Tables  - LEFT OUTER JOIN
SELECT  p.BusinessEntityID
      , p.PersonType
      , p.FirstName
      , p.LastName
FROM    dbo.Person_NonClustered AS p
LEFT OUTER JOIN dbo.MyPeople_NonClustered mp
        ON mp.MyPeople_ID = p.BusinessEntityID
WHERE   mp.MyPeople_ID IS NULL


--NonClustered Tables  - EXCEPT
SELECT  p.BusinessEntityID
      , p.PersonType
      , p.FirstName
      , p.LastName
FROM    dbo.Person_NonClustered AS p
EXCEPT
SELECT  mp.MyPeople_ID
      , mp.PersonType
      , mp.FirstName
      , mp.LastName
FROM    dbo.MyPeople_NonClustered AS mp

Since the point of this test was compare execution times and actual plans I captured actual execution plans using SSMS and compared run duration using SQL Sentry Plan Explorer.

Conclusion:

As it can be seen from my test, all 4 operations performed almost identical plans and almost within few milliseconds of each other. In case of Non-Clustered tables operations changed from clustered index scans to table scans but duration was still on point. In case of EXCEPT clause on Non-Clustered tables it introduced additional sort operations.

Looking at NOT IN and NOT EXISTS those operations produced identical plans and LEFT OUTER JOIN actually required one more operation to complete on top of those performed by first two.

This is exactly why I will continue to test code multiple ways and might use one way over another depending on situation.

Subscribe to: Posts ( Atom )

Pages

Tuesday, April 11, 2017

THROW versus RAISERROR

Tuesday, March 21, 2017

What kind of JOIN syntax is this?

Thursday, December 29, 2016

Date Dimension aka Calendar Table

References:

Tuesday, August 30, 2016

SQL Joins - Basics Part 2

Conclusion:

Tuesday, August 23, 2016

NOT IN - NOT EXISTS - LEFT JOIN - EXCEPT

Conclusion: