Tuesday, August 23, 2016

NOT IN - NOT EXISTS - LEFT JOIN - EXCEPT

Today I stumbled upon a blog post that pointed out how inefficient NOT IN clause and comment was made to use LEFT OUTER JOIN because it is "faster". They showed execution plans of both and SSMS query cost from Actual Execution plan. The first thing that jumped out at me were the Table scans that were present in both plans. Since I did some reading previously on how these suppose to yield almost similar results I decided to give this test a try myself.

To begin with, I created tables in my test database and pulled over data from AdventureWorks2014 database. To avoid extra operations that I did not want to skew results, I took the liberty of changing NVARCHAR to VARCHAR and drop extra columns that served no purpose in my tests.
 
--CREATE Clustered Tables
CREATE TABLE dbo.Person_Clustered
    (
      [BusinessEntityID] INT NOT NULL
                             PRIMARY KEY
    , [PersonType] CHAR(2) NOT NULL
    , [FirstName] VARCHAR(50) NOT NULL
    , [LastName] VARCHAR(50) NOT NULL,
    )
CREATE TABLE dbo.MyPeople_Clustered
    (
      [MyPeople_ID] INT NOT NULL
                        PRIMARY KEY
    , [PersonType] CHAR(2) NOT NULL
    , [FirstName] VARCHAR(50) NOT NULL
    , [LastName] VARCHAR(50) NOT NULL,
    )

--CREATE NonClustered Tables
CREATE TABLE dbo.Person_NonClustered
    (
      [BusinessEntityID] INT NOT NULL
    , [PersonType] CHAR(2) NOT NULL
    , [FirstName] VARCHAR(50) NOT NULL
    , [LastName] VARCHAR(50) NOT NULL,
    )
CREATE TABLE dbo.MyPeople_NonClustered
    (
      [MyPeople_ID] INT NOT NULL
    , [PersonType] CHAR(2) NOT NULL
    , [FirstName] VARCHAR(50) NOT NULL
    , [LastName] VARCHAR(50) NOT NULL,
    )

--Write all data from base table
INSERT  INTO dbo.Person_Clustered
        ( BusinessEntityID
        , PersonType
        , FirstName
        , LastName
        )
        SELECT  BusinessEntityID
              , CONVERT(CHAR(2), PersonType)
              , CONVERT(VARCHAR(20), FirstName)
              , LastName
        FROM    AdventureWorks2014.Person.Person

--Write some data to it, 
INSERT  INTO dbo.MyPeople_Clustered
        SELECT  *
        FROM    dbo.Person_Clustered AS p
        WHERE   p.BusinessEntityID % 133 = 0

INSERT  INTO dbo.MyPeople_NonClustered
        SELECT  *
        FROM    dbo.MyPeople_Clustered AS p

INSERT  INTO dbo.Person_NonClustered
        SELECT  *
        FROM    dbo.Person_Clustered AS p

SELECT  COUNT(*) dbo.Person_Clustered
SELECT  COUNT(*) dbo.Person_NonClustered
SELECT  COUNT(*) dbo.MyPeople_Clustered
SELECT  COUNT(*) dbo.MyPeople_NonClustered

For the actual test I created few different scenarios including NOT EXISTS and EXCEPT clauses. First 4 test are based on clustered primary keys and last 4 are based on non-clustered tables.
 

--Clustered Tables  - NOT IN
SELECT  p.BusinessEntityID
      , p.PersonType
      , p.FirstName
      , p.LastName
FROM    dbo.Person_Clustered AS p
WHERE   p.BusinessEntityID NOT IN ( SELECT  mp.MyPeople_ID
                                    FROM    dbo.MyPeople_Clustered mp )
--Clustered Tables  - NOT EXISTS
SELECT  p.BusinessEntityID
      , p.PersonType
      , p.FirstName
      , p.LastName
FROM    dbo.Person_Clustered AS p
WHERE   NOT EXISTS ( SELECT 1
                     FROM   dbo.MyPeople_Clustered mp
                     WHERE  mp.MyPeople_ID = p.BusinessEntityID )


--Clustered Tables  - LEFT OUTER JOIN
SELECT  p.BusinessEntityID
      , p.PersonType
      , p.FirstName
      , p.LastName
FROM    dbo.Person_Clustered AS p
LEFT OUTER JOIN dbo.MyPeople_Clustered mp
        ON mp.MyPeople_ID = p.BusinessEntityID
WHERE   mp.MyPeople_ID IS NULL

--Clustered Tables  - EXCEPT
SELECT  p.BusinessEntityID
      , p.PersonType
      , p.FirstName
      , p.LastName
FROM    dbo.Person_Clustered AS p
EXCEPT
SELECT  mp.MyPeople_ID
      , mp.PersonType
      , mp.FirstName
      , mp.LastName
FROM    dbo.MyPeople_Clustered AS mp


--NonClustered Tables  - NOT IN
SELECT  p.BusinessEntityID
      , p.PersonType
      , p.FirstName
      , p.LastName
FROM    dbo.Person_NonClustered AS p
WHERE   p.BusinessEntityID NOT IN ( SELECT  mp.MyPeople_ID
                                    FROM    dbo.MyPeople_NonClustered mp )

--NonClustered Tables  - NOT EXISTS
SELECT  p.BusinessEntityID
      , p.PersonType
      , p.FirstName
      , p.LastName
FROM    dbo.Person_NonClustered AS p
WHERE   NOT EXISTS ( SELECT 1
                     FROM   dbo.MyPeople_NonClustered mp
                     WHERE  mp.MyPeople_ID = p.BusinessEntityID )

--NonClustered Tables  - LEFT OUTER JOIN
SELECT  p.BusinessEntityID
      , p.PersonType
      , p.FirstName
      , p.LastName
FROM    dbo.Person_NonClustered AS p
LEFT OUTER JOIN dbo.MyPeople_NonClustered mp
        ON mp.MyPeople_ID = p.BusinessEntityID
WHERE   mp.MyPeople_ID IS NULL


--NonClustered Tables  - EXCEPT
SELECT  p.BusinessEntityID
      , p.PersonType
      , p.FirstName
      , p.LastName
FROM    dbo.Person_NonClustered AS p
EXCEPT
SELECT  mp.MyPeople_ID
      , mp.PersonType
      , mp.FirstName
      , mp.LastName
FROM    dbo.MyPeople_NonClustered AS mp

Since the point of this test was compare execution times and actual plans I captured actual execution plans using SSMS and compared run duration using SQL Sentry Plan Explorer.


Conclusion:
As it can be seen from my test, all 4 operations performed almost identical plans and almost within few milliseconds of each other. In case of Non-Clustered tables operations changed from clustered index scans to table scans but duration was still on point. In case of EXCEPT clause on Non-Clustered tables it introduced additional sort operations.

Looking at NOT IN and NOT EXISTS those operations produced identical plans and LEFT OUTER JOIN actually required one more operation to complete on top of those performed by first two.

This is exactly why I will continue to test code multiple ways and might use one way over another depending on situation.

No comments :

Post a Comment