To begin with, I created tables in my test database and pulled over data from AdventureWorks2014 database. To avoid extra operations that I did not want to skew results, I took the liberty of changing NVARCHAR to VARCHAR and drop extra columns that served no purpose in my tests.
--CREATE Clustered Tables
CREATE TABLE dbo.Person_Clustered
(
[BusinessEntityID] INT NOT NULL
PRIMARY KEY
, [PersonType] CHAR(2) NOT NULL
, [FirstName] VARCHAR(50) NOT NULL
, [LastName] VARCHAR(50) NOT NULL,
)
CREATE TABLE dbo.MyPeople_Clustered
(
[MyPeople_ID] INT NOT NULL
PRIMARY KEY
, [PersonType] CHAR(2) NOT NULL
, [FirstName] VARCHAR(50) NOT NULL
, [LastName] VARCHAR(50) NOT NULL,
)
--CREATE NonClustered Tables
CREATE TABLE dbo.Person_NonClustered
(
[BusinessEntityID] INT NOT NULL
, [PersonType] CHAR(2) NOT NULL
, [FirstName] VARCHAR(50) NOT NULL
, [LastName] VARCHAR(50) NOT NULL,
)
CREATE TABLE dbo.MyPeople_NonClustered
(
[MyPeople_ID] INT NOT NULL
, [PersonType] CHAR(2) NOT NULL
, [FirstName] VARCHAR(50) NOT NULL
, [LastName] VARCHAR(50) NOT NULL,
)
--Write all data from base table
INSERT INTO dbo.Person_Clustered
( BusinessEntityID
, PersonType
, FirstName
, LastName
)
SELECT BusinessEntityID
, CONVERT(CHAR(2), PersonType)
, CONVERT(VARCHAR(20), FirstName)
, LastName
FROM AdventureWorks2014.Person.Person
--Write some data to it,
INSERT INTO dbo.MyPeople_Clustered
SELECT *
FROM dbo.Person_Clustered AS p
WHERE p.BusinessEntityID % 133 = 0
INSERT INTO dbo.MyPeople_NonClustered
SELECT *
FROM dbo.MyPeople_Clustered AS p
INSERT INTO dbo.Person_NonClustered
SELECT *
FROM dbo.Person_Clustered AS p
SELECT COUNT(*) dbo.Person_Clustered
SELECT COUNT(*) dbo.Person_NonClustered
SELECT COUNT(*) dbo.MyPeople_Clustered
SELECT COUNT(*) dbo.MyPeople_NonClustered
For the actual test I created few different scenarios including NOT EXISTS and EXCEPT clauses. First 4 test are based on clustered primary keys and last 4 are based on non-clustered tables.
--Clustered Tables - NOT IN
SELECT p.BusinessEntityID
, p.PersonType
, p.FirstName
, p.LastName
FROM dbo.Person_Clustered AS p
WHERE p.BusinessEntityID NOT IN ( SELECT mp.MyPeople_ID
FROM dbo.MyPeople_Clustered mp )
--Clustered Tables - NOT EXISTS
SELECT p.BusinessEntityID
, p.PersonType
, p.FirstName
, p.LastName
FROM dbo.Person_Clustered AS p
WHERE NOT EXISTS ( SELECT 1
FROM dbo.MyPeople_Clustered mp
WHERE mp.MyPeople_ID = p.BusinessEntityID )
--Clustered Tables - LEFT OUTER JOIN
SELECT p.BusinessEntityID
, p.PersonType
, p.FirstName
, p.LastName
FROM dbo.Person_Clustered AS p
LEFT OUTER JOIN dbo.MyPeople_Clustered mp
ON mp.MyPeople_ID = p.BusinessEntityID
WHERE mp.MyPeople_ID IS NULL
--Clustered Tables - EXCEPT
SELECT p.BusinessEntityID
, p.PersonType
, p.FirstName
, p.LastName
FROM dbo.Person_Clustered AS p
EXCEPT
SELECT mp.MyPeople_ID
, mp.PersonType
, mp.FirstName
, mp.LastName
FROM dbo.MyPeople_Clustered AS mp
--NonClustered Tables - NOT IN
SELECT p.BusinessEntityID
, p.PersonType
, p.FirstName
, p.LastName
FROM dbo.Person_NonClustered AS p
WHERE p.BusinessEntityID NOT IN ( SELECT mp.MyPeople_ID
FROM dbo.MyPeople_NonClustered mp )
--NonClustered Tables - NOT EXISTS
SELECT p.BusinessEntityID
, p.PersonType
, p.FirstName
, p.LastName
FROM dbo.Person_NonClustered AS p
WHERE NOT EXISTS ( SELECT 1
FROM dbo.MyPeople_NonClustered mp
WHERE mp.MyPeople_ID = p.BusinessEntityID )
--NonClustered Tables - LEFT OUTER JOIN
SELECT p.BusinessEntityID
, p.PersonType
, p.FirstName
, p.LastName
FROM dbo.Person_NonClustered AS p
LEFT OUTER JOIN dbo.MyPeople_NonClustered mp
ON mp.MyPeople_ID = p.BusinessEntityID
WHERE mp.MyPeople_ID IS NULL
--NonClustered Tables - EXCEPT
SELECT p.BusinessEntityID
, p.PersonType
, p.FirstName
, p.LastName
FROM dbo.Person_NonClustered AS p
EXCEPT
SELECT mp.MyPeople_ID
, mp.PersonType
, mp.FirstName
, mp.LastName
FROM dbo.MyPeople_NonClustered AS mp
Since the point of this test was compare execution times and actual plans I captured actual execution plans using SSMS and compared run duration using SQL Sentry Plan Explorer.
Conclusion:
As it can be seen from my test, all 4 operations performed almost identical plans and almost within few milliseconds of each other. In case of Non-Clustered tables operations changed from clustered index scans to table scans but duration was still on point. In case of EXCEPT clause on Non-Clustered tables it introduced additional sort operations.Looking at NOT IN and NOT EXISTS those operations produced identical plans and LEFT OUTER JOIN actually required one more operation to complete on top of those performed by first two.
This is exactly why I will continue to test code multiple ways and might use one way over another depending on situation.
No comments :
Post a Comment