I don't think I have tested for every scenario possible, but it is better than only a single test. With that being said, I'm a little surprised at the result. I thought that User Defined Table Type would win in every scenario but it is clear that in some cases STRING_SPLIT() function is actually faster. Just like most other things in SQL Server the answer is: It Depends!
In the case of STRING_SPLIT() function Engineers at Microsoft did amazing job optimizing it over other custom solutions that people had to resort to in the past. Of course there are limitations with it, that need to be acknowledged and taken into consideration when designing a solution. In my opinion this could be quick go to solution that is great for initial testing but needs to be heavily tested when it comes to final solution for production environment.
Just like I mentioned in previous post, estimated row count always shows 50 rows and value returned from the function is a string and needs to be converted to target type. This did not prove to be large issue in my testing but in complex execution plans it could throw query optimizer into wrong plan when getting data from other tables which could cause major slow downs.
In my test scenarios outlined below, User Defined Table Type Won 4 out of 6 times. I'm sure I could have setup other scenarios where it would lose every time but in this case it won. Leave comments if you want to see full source code or want me to run any other scenarios. Thanks for reading!
SELECT a.*
, CASE WHEN ( MAX(a.AvgDuration) OVER ( PARTITION BY RowCnt ) ) = AvgDuration
THEN 'LOOSER' ELSE 'WINNER' END AS RoundDecision
, ROW_NUMBER() OVER ( ORDER BY RowCnt, SPName ) AS RowNumber
FROM (
SELECT SPName
, RowCnt
, COUNT(*) AS NumberOfExecutions
, MIN(Duration) AS MinDuration
, AVG(Duration) AS AvgDuration
, MAX(duration) AS MaxDuration
FROM dbo.SPExecutionTime
GROUP BY SPName
, RowCnt
) a
ORDER BY RowNumber
During SQL Saturday Indianapolis I heard about new function introduced in SQL Server 2016 STRING_SPLIT(). Presenter showed few examples of how to use it and mentioned that it is faster than other implementations of what people had to previously done. This got me excited and I wanted to see how it compares to User Defined Table Types which has always been my go to replacement for string splitting.
This post is not about comparing new function to other function, Aaron Bertrand already did a great post comparing it to other solutions. IT can be found here -> Performance Surprises and Assumptions : STRING_SPLIT(). What I wanted to do is compare it to User Defined Table Types because it always been faster (at least this is what I think about it).
Now unto test scenario. First time I had to replace string splitting function with User Defined Table Typed Parameter was when I had to rewrite stored procedure that accepted a list of 5000 items inside varchar(max) parameter. With that in mind I decided to test new function the same way. Pass a huge string, split it inside stored procedure and return result set back to .NET application. For this test I created two stored procedures running against Wide World Importers Test Database.
/*
==============================================================================================================================
Author: Vlady Oselsky
Create date: 08/15/2016
Description: Testing String Split Function
==============================================================================================================================
*/
CREATE PROCEDURE [dbo].[Invoices_ByTableType]
(
@List IdList READONLY
)
AS
BEGIN
SET NOCOUNT ON;
DECLARE @StartTime datetime2 = SYSDATETIME();
DECLARE @RwCnt int;
select iv.SalespersonPersonID
,iv.InvoiceDate
,iv.DeliveryInstructions
from WideWorldImporters.Sales.Invoices as iv
join @List as l
on iv.InvoiceID = l.ID
order by 1
SET @RwCnt = @@ROWCOUNT
DECLARE @EndTime Datetime2 = Sysdatetime();
INSERT INTO SPExecutionTime (SPName,RowCnt, Duration)
VALUES(OBJECT_NAME(@@PROCID),@rwCnt, DATEDIFF(MICROSECOND, @StartTime, @EndTime))
END
GO
/*
==============================================================================================================================
Author: Vlady Oselsky
Create date: 08/15/2016
Description: Testing String Split Function
==============================================================================================================================
*/
CREATE PROCEDURE [dbo].[Invoices_ByStringList]
(
@List VARCHAR(MAX)
)
AS
BEGIN
SET NOCOUNT ON;
DECLARE @StartTime datetime2 = SYSDATETIME();
DECLARE @RwCnt int;
select iv.SalespersonPersonID
,iv.InvoiceDate
,iv.DeliveryInstructions
from WideWorldImporters.Sales.Invoices as iv
join string_split(@List, ',') as l
on iv.InvoiceID = l.value
order by 1
SET @RwCnt = @@ROWCOUNT
DECLARE @EndTime Datetime2 = Sysdatetime();
INSERT INTO SPExecutionTime (SPName,RowCnt, Duration)
VALUES(OBJECT_NAME(@@PROCID),@rwCnt, DATEDIFF(MICROSECOND, @StartTime, @EndTime))
END
Below is the script for user defined table type and table to store results on SQL Side. I done this in order to compare SQL duration to total duration on .NET side
CREATE TYPE [dbo].[IdList] AS TABLE( [ID] [int] NULL )
GO
CREATE TABLE dbo.SPExecutionTime
(
ID INT IDENTITY(1,1) PRIMARY KEY
,SPName sysname
,RowCnt INT
,Duration INT
)
GO
Now onto .NET C# code. To be fair I wanted to pass entire value on .NET side instead of trying to build it just before calling stored procedure. Below is the code for calling TableType stored procedure, as you can see instead of having VarChar type, I have Structured type which accepts IEnumerable variable. In my case I'm passing DataTable.
stopwatch.Start(); //Time starts here
DataTable table = new DataTable();
using (var con = new SqlConnection(connectionString))
{
con.Open();
var cmd = new SqlCommand();
cmd.Connection = con;
cmd.CommandText = "Invoices_ByTableType";
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.Add("@List", SqlDbType.Structured, -1).Value = hugeListTable;
using (var da = new SqlDataAdapter(cmd))
{
da.Fill(table);
}
}
dataGridView2.DataSource = table;
stopwatch.Stop(); //Time ends Here
Next comes a call to stored procedure with VARCHAR(MAX).
stopwatch.Start();
DataTable table = new DataTable();
using (var con = new SqlConnection(connectionString))
{
con.Open();
var cmd = new SqlCommand();
cmd.Connection = con;
cmd.CommandText = "Invoices_ByStringList";
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.Add("@List", SqlDbType.VarChar, -1).Value = hugeListString;
using (var da = new SqlDataAdapter(cmd))
{
da.Fill(table);
}
}
dataGridView1.DataSource = table;
stopwatch.Stop();
On front end I setup simple interface with two buttons and a way to aggregate data together. After running each stored procedure 50 times. I came up with the following result.
Conclusion: Round 1 goes to User Defined Table Type. One thing I noticed when looking at execution plans. I saw that string_split() function always estimates 50 rows, whereas when calling with user defined table type estimated row count matches actual. Additionally there is implicit conversion because string returned from split function and I'm joining it to integer column. Next I want to make few other adjustment to stored procedures and front end to try to call it with different number of rows to see how it changes the outcome.