Monday, August 15, 2016

STRING_SPLIT() vs User Defined Table Types - Round 1

During SQL Saturday Indianapolis I heard about new function introduced in SQL Server 2016 STRING_SPLIT(). Presenter showed few examples of how to use it and mentioned that it is faster than other implementations of what people had to previously done. This got me excited and I wanted to see how it compares to User Defined Table Types which has always been my go to replacement for string splitting.

This post is not about comparing new function to other function, Aaron Bertrand already did a great post comparing it to other solutions. IT can be found here -> Performance Surprises and Assumptions : STRING_SPLIT(). What I wanted to do is compare it to User Defined Table Types because it always been faster (at least this is what I think about it).

Now unto test scenario. First time I had to replace string splitting function with User Defined Table Typed Parameter was when I had to rewrite stored procedure that accepted a list of 5000 items inside varchar(max) parameter. With that in mind I decided to test new function the same way. Pass a huge string, split it inside stored procedure and return result set back to .NET application. For this test I created two stored procedures running against Wide World Importers Test Database.
 
 /*
 ==============================================================================================================================
 Author:  Vlady Oselsky
 Create date: 08/15/2016
 Description:   Testing String Split Function
==============================================================================================================================
*/
CREATE PROCEDURE [dbo].[Invoices_ByTableType]
(
 @List IdList READONLY
)
AS
BEGIN
 SET NOCOUNT ON;
 DECLARE @StartTime datetime2 = SYSDATETIME();
 DECLARE @RwCnt int;

 select iv.SalespersonPersonID
  ,iv.InvoiceDate
  ,iv.DeliveryInstructions
 from WideWorldImporters.Sales.Invoices as iv
 join @List as l
  on iv.InvoiceID = l.ID
 order by 1

 SET @RwCnt = @@ROWCOUNT
 DECLARE @EndTime Datetime2 = Sysdatetime();
 INSERT INTO SPExecutionTime (SPName,RowCnt, Duration)
 VALUES(OBJECT_NAME(@@PROCID),@rwCnt, DATEDIFF(MICROSECOND, @StartTime, @EndTime))
END
GO
 /*
 ==============================================================================================================================
 Author:  Vlady Oselsky
 Create date: 08/15/2016
 Description:   Testing String Split Function
==============================================================================================================================
*/
CREATE PROCEDURE [dbo].[Invoices_ByStringList]
(
 @List VARCHAR(MAX)
)
AS
BEGIN
 SET NOCOUNT ON;
 DECLARE @StartTime datetime2 = SYSDATETIME();
 DECLARE @RwCnt int;

 select iv.SalespersonPersonID
  ,iv.InvoiceDate
  ,iv.DeliveryInstructions
 from WideWorldImporters.Sales.Invoices as iv
 join string_split(@List, ',') as l
  on iv.InvoiceID = l.value
 order by 1
 
 SET @RwCnt = @@ROWCOUNT
 DECLARE @EndTime Datetime2 = Sysdatetime();
 INSERT INTO SPExecutionTime (SPName,RowCnt, Duration)
 VALUES(OBJECT_NAME(@@PROCID),@rwCnt, DATEDIFF(MICROSECOND, @StartTime, @EndTime))
END
Below is the script for user defined table type and table to store results on SQL Side. I done this in order to compare SQL duration to total duration on .NET side
 
CREATE TYPE [dbo].[IdList] AS TABLE( [ID] [int] NULL )
GO
CREATE TABLE dbo.SPExecutionTime
(
 ID INT IDENTITY(1,1) PRIMARY KEY
 ,SPName sysname
 ,RowCnt INT
 ,Duration INT
)
GO

Now onto .NET C# code. To be fair I wanted to pass entire value on .NET side instead of trying to build it just before calling stored procedure. Below is the code for calling TableType stored procedure, as you can see instead of having VarChar type, I have Structured type which accepts IEnumerable variable. In my case I'm passing DataTable.
 
stopwatch.Start(); //Time starts here
DataTable table = new DataTable();
using (var con = new SqlConnection(connectionString))
{
    con.Open();
    var cmd = new SqlCommand();
    cmd.Connection = con;
    cmd.CommandText = "Invoices_ByTableType";
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.Add("@List", SqlDbType.Structured, -1).Value = hugeListTable;

    using (var da = new SqlDataAdapter(cmd))
    {
        da.Fill(table);
    }

}
dataGridView2.DataSource = table;
stopwatch.Stop(); //Time ends Here
Next comes a call to stored procedure with VARCHAR(MAX).
 
stopwatch.Start();
DataTable table = new DataTable();
            
using (var con = new SqlConnection(connectionString))
{
    con.Open();
    var cmd = new SqlCommand();
    cmd.Connection = con;
    cmd.CommandText = "Invoices_ByStringList";
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.Add("@List", SqlDbType.VarChar, -1).Value = hugeListString;

    using (var da = new SqlDataAdapter(cmd))
    {
        da.Fill(table);
    }

}
dataGridView1.DataSource = table;
stopwatch.Stop();
On front end I setup simple interface with two buttons and a way to aggregate data together. After running each stored procedure 50 times. I came up with the following result.

Invoices_ByStringList - 112.82 millisecond
Invoices_ByTableType - 79.4 millisecond

Conclusion: Round 1 goes to User Defined Table Type. One thing I noticed when looking at execution plans. I saw that string_split() function always estimates 50 rows, whereas when calling with user defined table type estimated row count matches actual. Additionally there is implicit conversion because string returned from split function and I'm joining it to integer column. Next I want to make few other adjustment to stored procedures and front end to try to call it with different number of rows to see how it changes the outcome.
Update:

No comments :

Post a Comment