Thursday, December 29, 2016

Date Dimension aka Calendar Table

Before we get into discussing how to create it date dimension and how to use it, first let's talk about what it is and why do we need it. Depending on who you talk to, people can refer to this concept as "Calendar table" or "Date Dimension," which is usually found in Data Warehouse. No matter how it is called, at the end of the day, it is a table in SQL Server which is populated with different date/calendar related information to help speed up SQL queries which require specific parts of dates.

In my case, I have created it to be able to aggregate data by quarters, years and month. Depending on how large your requirements are it will add additional complexity to building it. Since I don't care about holidays (for now at least), I will not be creating holiday schedule which can be complicated to populate.

Now onto writing some SQL. Step 1: Create a table and add "covered" Nonclustered index that I need for my queries. Depending on your needs you will need to create additional indexes and/or modify the one I'm creating.
 
CREATE TABLE [dbo].[DateDimension]
    (
        [DateKey]             [INT]         NOT NULL
      , [Date]                [DATE]        NOT NULL
      , [Day]                 [TINYINT]     NOT NULL
      , [DaySuffix]           [CHAR](2)     NOT NULL
      , [Weekday]             [TINYINT]     NOT NULL
      , [WeekDayName]         [VARCHAR](10) NOT NULL
      , [IsWeekend]           [BIT]         NOT NULL
      , [DOWInMonth]          [TINYINT]     NOT NULL
      , [DayOfYear]           [SMALLINT]    NOT NULL
      , [WeekOfMonth]         [TINYINT]     NOT NULL
      , [WeekOfYear]          [TINYINT]     NOT NULL
      , [Month]               [TINYINT]     NOT NULL
      , [MonthName]           [VARCHAR](10) NOT NULL
      , [Quarter]             [TINYINT]     NOT NULL
      , [QuarterName]         [VARCHAR](6)  NOT NULL
      , [Year]                [INT]         NOT NULL
      , [MMYYYY]              [CHAR](6)     NOT NULL
      , [MonthYear]           [CHAR](7)     NOT NULL
      , [FirstDayOfMonth]     [DATE]        NOT NULL
      , [LastDayOfMonth]      [DATE]        NOT NULL
      , [FirstDayOfQuarter]   [DATE]        NOT NULL
      , [LastDayOfQuarter]    [DATE]        NOT NULL
      , [FirstDayOfYear]      [DATE]        NOT NULL
      , [LastDayOfYear]       [DATE]        NOT NULL
      , [FirstDayOfNextMonth] [DATE]        NOT NULL
      , [FirstDayOfNextYear]  [DATE]        NOT NULL ,
    )

GO

ALTER TABLE dbo.DateDimension
ADD CONSTRAINT PK_DateDimension
    PRIMARY KEY CLUSTERED (DateKey)

IF NOT EXISTS
    (
        SELECT  1
        FROM    sys.indexes AS i
        WHERE   i.name = 'IX_DateDimension_Date'
    )
    CREATE NONCLUSTERED INDEX [IX_DateDimension_Date]
        ON dbo.DateDimension ([Date])
        INCLUDE ([Month], [Quarter], [Year])
GO


The reason why I chose to include Month, Quarter and Year columns in my index is based on my requirements to aggregate data by those columns. Originally I did not have any columns included, which produced Key Lookup. Now to populate table I have taken some suggestions from Aaron's script, link at the bottom of the post. In my case, I'm truncating table and limiting it to 300 years which is more than plenty for my scenario.

 
TRUNCATE TABLE dbo.DateDimension

DECLARE @StartDate     DATE = '01/01/1900'
      , @NumberOfYears INT  = 300;

DECLARE @CutoffDate DATE = DATEADD( YEAR, @NumberOfYears, @StartDate );

-- prevent set or regional settings from interfering with 
-- interpretation of dates / literals
SET DATEFIRST 7;
SET DATEFORMAT MDY;
SET LANGUAGE US_ENGLISH;
-- Thanks to Aaron for this hint

WITH Pass0 AS
    (
        SELECT  1 AS C
        UNION ALL
        SELECT  1
    )
   , Pass1 AS
    (
        SELECT  1 AS C
        FROM    Pass0 AS A
              , Pass0 AS B
    )
   , Pass2 AS
    (
        SELECT  1 AS C
        FROM    Pass1 AS A
              , Pass1 AS B
    )
   , Pass3 AS
    (
        SELECT  1 AS C
        FROM    Pass2 AS A
              , Pass2 AS B
    )
   , Pass4 AS
    (
        SELECT  1 AS C
        FROM    Pass3 AS A
              , Pass3 AS B
    )
   , Pass5 AS
    (
        SELECT  1 AS C
        FROM    Pass4 AS A
              , Pass4 AS B
    )
   , Pass6 AS
    (
        SELECT TOP (DATEDIFF( DAY, @StartDate, @CutoffDate ))
            rn = ROW_NUMBER() OVER (ORDER BY Pass5.C)
        FROM    Pass5
    )
   , Pass7 AS
    (
        SELECT  a.Date
              , [day]          = DATEPART( DAY, a.Date )
              , [week]         = DATEPART( WEEK, a.Date )
              , [month]        = DATEPART( MONTH, a.Date )
              , [quarter]      = DATEPART( QUARTER, a.Date )
              , [year]         = DATEPART( YEAR, a.Date )
              , [DayOfWeek]    = DATEPART( WEEKDAY, a.Date )
              , [FirstOfMonth] = CONVERT( DATE, DATEADD( MONTH, DATEDIFF( MONTH, 0, a.Date ), 0 ))
              , FirstOfYear    = CONVERT( DATE, DATEADD( YEAR, DATEDIFF( YEAR, 0, a.Date ), 0 ))
              , [MonthName]    = DATENAME( MONTH, a.Date )
              , Style101       = CONVERT( CHAR(10), a.Date, 101 )
              , Style112       = CONVERT( CHAR(8), a.Date, 112 )
        FROM
            (
                SELECT  [Date] = DATEADD( DAY, Pass6.rn - 1, @StartDate )
                FROM    Pass6
            ) a
    )
INSERT  dbo.DateDimension WITH (TABLOCKX)
SELECT  DateKey             = CONVERT( INT, CONVERT( CHAR(8), pass7.Date, 112 ))
      , [Date]              = pass7.Date
      , [Day]               = CONVERT( TINYINT, pass7.day )
      , DaySuffix           = CONVERT(   CHAR(2)
                                       , CASE
                                             WHEN pass7.day / 10 = 1
                                                 THEN 'th'
                                         ELSE CASE RIGHT(pass7.day, 1)
                                                  WHEN '1'
                                                      THEN 'st'
                                                  WHEN '2'
                                                      THEN 'nd'
                                                  WHEN '3'
                                                      THEN 'rd'
                                              ELSE 'th'
                                              END
                                         END
                                     )
      , [Weekday]           = CONVERT( TINYINT, pass7.DayOfWeek )
      , [WeekDayName]       = CONVERT( VARCHAR(10), DATENAME( WEEKDAY, pass7.Date ))
      , [IsWeekend]         = CONVERT(   BIT
                                       , CASE
                                             WHEN pass7.DayOfWeek IN (
                                                                         1, 7
                                                                     )
                                                 THEN 1
                                         ELSE 0
                                         END
                                     )
      , [DOWInMonth]        = CONVERT(   TINYINT
                                       , ROW_NUMBER() OVER (PARTITION BY pass7.FirstOfMonth
                                                                       , pass7.DayOfWeek
                                                            ORDER BY pass7.Date
                                                           )
                                     )
      , [DayOfYear]         = CONVERT( SMALLINT, DATEPART( DAYOFYEAR, pass7.Date ))
      , WeekOfMonth         = CONVERT(   TINYINT
                                       , DENSE_RANK() OVER (PARTITION BY pass7.year
                                                                       , pass7.month
                                                            ORDER BY pass7.week
                                                           )
                                     )
      , WeekOfYear          = CONVERT( TINYINT, pass7.week )
      , [Month]             = CONVERT( TINYINT, pass7.month )
      , [MonthName]         = CONVERT( VARCHAR(10), pass7.MonthName )
      , [Quarter]           = CONVERT( TINYINT, pass7.quarter )
      , QuarterName         = CONVERT(   VARCHAR(6)
                                       , CASE pass7.quarter
                                             WHEN 1
                                                 THEN 'First'
                                             WHEN 2
                                                 THEN 'Second'
                                             WHEN 3
                                                 THEN 'Third'
                                             WHEN 4
                                                 THEN 'Fourth'
                                         END
                                     )
      , [Year]              = pass7.year
      , MMYYYY              = CONVERT( CHAR(6), LEFT(pass7.Style101, 2) + LEFT(pass7.Style112, 4))
      , MonthYear           = CONVERT( CHAR(7), LEFT(pass7.MonthName, 3) + LEFT(pass7.Style112, 4))
      , FirstDayOfMonth     = pass7.FirstOfMonth
      , LastDayOfMonth      = MAX( pass7.Date ) OVER (PARTITION BY pass7.year
                                                                 , pass7.month
                                                     )
      , FirstDayOfQuarter   = MIN( pass7.Date ) OVER (PARTITION BY pass7.year
                                                                 , pass7.quarter
                                                     )
      , LastDayOfQuarter    = MAX( pass7.Date ) OVER (PARTITION BY pass7.year
                                                                 , pass7.quarter
                                                     )
      , FirstDayOfYear      = pass7.FirstOfYear
      , LastDayOfYear       = MAX( pass7.Date ) OVER (PARTITION BY pass7.year)
      , FirstDayOfNextMonth = DATEADD( MONTH, 1, pass7.FirstOfMonth )
      , FirstDayOfNextYear  = DATEADD( YEAR, 1, pass7.FirstOfYear )
FROM    Pass7


Few things to consider in the script. Depending on requirements and local setting, part of the script would need to be adjusted to reflect correct data. For example, Sunday could the first day of the week in some countries and last day of the week in other coutnries. Same goes for date formats, in United States we have Month/Day/Year back home in Russia I always wrote Day/Month/Year which could create large issues when trying to compare data stored in one format to date dimension that is stored in a different format. That is why the following two commands are so important SET DATEFIRST 7 and SET DATEFORMAT MDY;

Now that table is populated you too can run aggregate queries without having to generate data at run time. Enjoy!

References:
Bertrand, Aaron. "Creating a Date Dimension or Calendar Table in SQL Server." Creating a Date Dimension or Calendar Table in SQL Server. N.p., 20 Oct. 2015. Web. 29 Dec. 2016. <https://www.mssqltips.com/sqlservertip/4054/creating-a-date-dimension-or-calendar-table-in-sql-server/>.

No comments :

Post a Comment