Before we get into discussing how to create a date dimension and how to use it, let's first talk about what it is and why we need it. Depending on who you talk to, people refer to this concept as a "calendar table" or a "date dimension," which is usually found in a data warehouse. No matter what it is called, at the end of the day it is a table in SQL Server populated with date/calendar related information to help speed up SQL queries that need specific parts of dates.
In my case, I created it to be able to aggregate data by quarter, year, and month. The larger your requirements are, the more complexity it adds to building the table. Since I don't care about holidays (for now at least), I will not be creating a holiday schedule, which can be complicated to populate.
Now onto writing some SQL. Step 1: Create the table and add the "covering" nonclustered index that I need for my queries. Depending on your needs you may need to create additional indexes and/or modify the one I'm creating.
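Since the original script is not shown here, below is a minimal sketch of what such a table and index could look like. The table name, column list, and index definition are my assumptions and should be adjusted to your own requirements; only the inclusion of Month, Quarter, and Year in the index comes from the post.

```sql
-- Minimal date dimension sketch (table name and columns are assumptions)
CREATE TABLE dbo.DateDimension
(
    DateKey    INT         NOT NULL,  -- e.g. 20170101
    [Date]     DATE        NOT NULL,
    [Day]      TINYINT     NOT NULL,
    [Month]    TINYINT     NOT NULL,
    MonthName  VARCHAR(10) NOT NULL,
    [Quarter]  TINYINT     NOT NULL,
    [Year]     SMALLINT    NOT NULL,
    DayOfWeek  TINYINT     NOT NULL,
    DayName    VARCHAR(10) NOT NULL,
    CONSTRAINT PK_DateDimension PRIMARY KEY CLUSTERED (DateKey)
);

-- "Covering" nonclustered index with the columns used for aggregation
CREATE NONCLUSTERED INDEX IX_DateDimension_Date
    ON dbo.DateDimension ([Date])
    INCLUDE ([Month], [Quarter], [Year]);
```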
The reason I chose to include the Month, Quarter, and Year columns in my index is my requirement to aggregate data by those columns. Originally I did not include any columns, which produced a Key Lookup. To populate the table, I took some suggestions from Aaron's script (link at the bottom of the post). In my case, I'm truncating the table and limiting it to 300 years, which is more than plenty for my scenario.
A few things to consider in the script. Depending on requirements and locale settings, parts of the script may need to be adjusted to produce correct data. For example, Sunday may be the first day of the week in some countries and the last day of the week in others. The same goes for date formats: in the United States we write Month/Day/Year, while back home in Russia I always wrote Day/Month/Year, which can create large issues when comparing data stored in one format to a date dimension stored in a different format. That is why the following two commands are so important: SET DATEFIRST 7 and SET DATEFORMAT MDY;
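The population script itself is not reproduced here; the sketch below only illustrates the general shape, assuming the table sketched above, a 300-year range, and US settings. See Aaron's article in the references for a much more complete version.

```sql
SET DATEFIRST 7;     -- Sunday is the first day of the week
SET DATEFORMAT MDY;  -- Month/Day/Year

TRUNCATE TABLE dbo.DateDimension;

DECLARE @StartDate DATE = '01/01/1900';
DECLARE @NumberOfYears INT = 300;
DECLARE @EndDate DATE = DATEADD(YEAR, @NumberOfYears, @StartDate);

;WITH Dates AS
(
    SELECT [Date] = @StartDate
    UNION ALL
    SELECT DATEADD(DAY, 1, [Date])
    FROM   Dates
    WHERE  [Date] < DATEADD(DAY, -1, @EndDate)
)
INSERT INTO dbo.DateDimension
    (DateKey, [Date], [Day], [Month], MonthName, [Quarter], [Year], DayOfWeek, DayName)
SELECT
    CONVERT(INT, CONVERT(CHAR(8), [Date], 112)),
    [Date],
    DAY([Date]),
    MONTH([Date]),
    DATENAME(MONTH, [Date]),
    DATEPART(QUARTER, [Date]),
    YEAR([Date]),
    DATEPART(WEEKDAY, [Date]),
    DATENAME(WEEKDAY, [Date])
FROM Dates
OPTION (MAXRECURSION 0);
```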
Now that the table is populated, you too can run aggregate queries without having to generate date data at run time. Enjoy!
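As an illustration only, an aggregate query against a hypothetical Sales table (the Sales table and its columns are made up, not part of the original post) could look something like this:

```sql
-- Hypothetical fact table for illustration
SELECT  d.[Year],
        d.[Quarter],
        SUM(s.SaleAmount) AS TotalSales
FROM    dbo.Sales AS s
        INNER JOIN dbo.DateDimension AS d
            ON d.[Date] = s.SaleDate
GROUP BY d.[Year], d.[Quarter]
ORDER BY d.[Year], d.[Quarter];
```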
References:
Bertrand, Aaron. "Creating a Date Dimension or Calendar Table in SQL Server." MSSQLTips, 20 Oct. 2015. Accessed 29 Dec. 2016. <https://www.mssqltips.com/sqlservertip/4054/creating-a-date-dimension-or-calendar-table-in-sql-server/>.
Has it ever happened to you that the simplest piece of code produces results that make no sense whatsoever? Today, I helped a coworker fix one of those little issues, and I thought it deserved a post. Even though all men are created equal, SQL Server functions are not. Depending on who wrote a function and at what point in SQL Server's history, it can behave differently from what is expected.
Today I want to look at ISNULL vs COALESCE vs OR. To start, I will create a simple table and populate it with two records.
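The original table script is not included here; a minimal sketch that matches the results below could look like this (the table name and column sizes are my assumptions):

```sql
-- Assumed setup: a small test table with two rows
CREATE TABLE dbo.CodeTest
(
    ID   INT         NOT NULL,
    CODE VARCHAR(10) NOT NULL
);

INSERT INTO dbo.CodeTest (ID, CODE)
VALUES (1, 'BOB01'),
       (2, 'VLADIMIR01');

SELECT ID, CODE FROM dbo.CodeTest;
```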
Results are as follows.
ID | CODE
---|-----
1 | BOB01
2 | VLADIMIR01
Exactly what we would expect. Now onto something more fun. Let us try to limit the result with a local variable; this works exactly the same as a parameter in a stored procedure or a function.
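The original queries are not shown; the sketch below reproduces the pattern under one key assumption: the local variable is declared shorter than the CODE column (here VARCHAR(5)), which is what sets up the implicit conversion discussed later.

```sql
DECLARE @Code VARCHAR(5) = NULL;  -- assumption: variable is shorter than the column

-- ISNULL method
SELECT 'ISNULL- METHOD' AS Method, ID, CODE
FROM   dbo.CodeTest
WHERE  CODE = ISNULL(@Code, CODE);

-- COALESCE method
SELECT 'COALESCE - METHOD' AS Method, ID, CODE
FROM   dbo.CodeTest
WHERE  CODE = COALESCE(@Code, CODE);

-- OR method
SELECT 'OR- METHOD' AS Method, ID, CODE
FROM   dbo.CodeTest
WHERE  (@Code IS NULL OR CODE = @Code);
```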
After running the code we get the following results.

Method | ID | CODE
--- | --- | ---
ISNULL- METHOD | 1 | BOB01
Even though we would expect to see both records returned, we only get one record. Huh? This is exactly what puzzled my coworker; of course his query was not as simple as this one, but the same issue caused him to hit a roadblock.
In the case of the COALESCE and OR methods, the results are identical.
Method | ID | CODE
--- | --- | ---
COALESCE - METHOD | 1 | BOB01
COALESCE - METHOD | 2 | VLADIMIR01

Method | ID | CODE
--- | --- | ---
OR- METHOD | 1 | BOB01
OR- METHOD | 2 | VLADIMIR01
So what caused the results to be different for the first function but the same for the next two? I'm sure the title of the post already gave it away. Implicit conversion is what happens behind the scenes, usually when you least expect it. The root cause is that ISNULL returns the data type of its first argument, so the column value is implicitly converted (and truncated) to the variable's shorter type, while COALESCE returns the type with the higher precedence, which preserves the full column value. Unless you look at the execution plan, you would not be able to tell that CONVERT_IMPLICIT ever occurred. I know I can't stop raving about Plan Explorer and how much I love it; if we run the same query in PE and look at the base select statement, it shows CONVERT_IMPLICIT right on the plan, as seen in the screenshot below.
The moral of today's story is: don't ever assume that all SQL Server functions will behave exactly the same. In some cases, they might throw you for a spin and you will be scratching your head at the results without knowing what is causing it or how to fix it. In the case of what inspired today's post, my coworker will now be more educated and more cautious when working with SQL Server functions. The phrase "It depends!" has truly become a common theme for SQL Server operations. It is always best to test multiple ways of coding to figure out which is best and which will produce consistent results.
This is a quick little reference on how to update the password for a SQL login when you don't have SSMS installed on the machine running SQL Server, but you are able to log in to the machine with a user that has SA privileges on the SQL instance.
Step 1
Log in to the machine (RDP, Hyper-V, or however you can run something on the machine itself)
Step 2
Start PowerShell
Step 3
Connect to SQL Server by running the following command and update the password.
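The original PowerShell command is not shown here. To keep the example in T-SQL, below is the statement that ultimately needs to run against the instance; from PowerShell it could be executed through Invoke-Sqlcmd or sqlcmd. The login name and password are placeholders.

```sql
-- Placeholder login name and password: substitute your own
ALTER LOGIN [MyAppLogin] WITH PASSWORD = 'N3w$trongP@ssword!';
```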
In my case I did the following.
Step 4
Celebrate and go back to preparing for a presentation or whatever else you were doing.
Recently I ran into an issue that caused me to spend more time figuring out what to do than it took to fix it. I got a very simple ticket: a client reported that extra pages are printed for an SSRS report when it is sent to a specific printer, while other printers are fine, and printing to PDF is fine as well.
After some research, I found multiple articles online that talk about improper page and body setup resulting in extra pages. Since I'm not used to working on SSRS reports inside BIDS (Business Intelligence Development Studio), which was the precursor to SSDT (SQL Server Data Tools), it took me far longer than I expected to accomplish a simple task. Therefore I'm hoping the following screenshots will save someone (most likely me) time in fixing this issue.
Solution
Step 1:
Open the report that is causing issues in Design mode.
Step 2:
Verify the margins by right-clicking on empty space and selecting Report Properties.
Step 3:
Check all margins. For some older printers, margins should be at least .25in to be on the safe side. Newer printers can go smaller, but some older printers cannot.
Step 4:
Calculate the maximum report body size by subtracting the margins from the width and height; in my case:
Width - Left - Right (11 - .5 - .6) = 9.9in
Height - Top - Bottom (8.5 - .5 - .25) = 7.75in
Step 5:
Click on the body of the report to see its height and width in the Properties window, then adjust the size to be less than the size calculated in the previous step.
Conclusion
Now you are all done; all that is left is to publish the changes and deploy the report to the client.
This guide is more of a workaround for a specific situation I faced at work. Hopefully it can help someone else work around a problem that should be easy to fix, but where limitations prevent you from taking the simple solution.
Scenario:
You have upgraded to the newest Visual Studio 2015 and you are excited (not me) to make some reports. You get the path for the project and create a working directory using Visual SourceSafe (an old tool that was replaced by Team Foundation Server). Now that the files are on your local machine, you fire up Visual Studio and try to open the solution, only to get the following screen.
Now what? You talk to other people; they are still using Visual Studio 2010, and you are not allowed to upgrade the project and break it for everyone.
Solution:
Create a new project and name it something different from the original project to avoid confusion.
Now that we have a project, we need to copy all report definitions to the new location created for it. To do that, find all .RDL files in the old folder and copy them to the new folder.
Now just add the reports to the project by right-clicking on the Reports folder in Solution Explorer and selecting all reports that were copied over.
At this point the reports are in the new project and you are ready to develop (or are you?).
You find the most amazing report that looks something like this, and you click Preview only to get the following error.
This error, even though it is cryptic, actually means the Shared Data Source file that existed in the original project is missing.
Now all we have to do is copy the .RDS file from the original directory to the new directory where we placed the .RDL files, and then use Add Existing Item under Shared Data Sources.
At this point we can preview the report, but now arises the issue of deploying it to the server. Since we created a new project, it does not know where the report needs to be deployed. Luckily this can be easily fixed by opening the original project's .rptproj file in Notepad++ or another editor and looking for the Configurations element. It should look something like this.
Once you have that information, just place it in the project properties of your new project and you are done.
Notes:
A couple of things to consider: by doing this we have removed the project from the original source control solution, and it will take additional effort to commit it back. The easiest way to overcome this is to check files out and back in from the new location using the source control utility. In my case, I can easily check files out and back in using Visual SourceSafe Explorer.
As always, questions and comments are welcome, and if I can add any details to help out a fellow developer I would be happy to do so.
Having already covered the basics of SQL join syntax, now it is time to jump into slightly more advanced material. Early in my career, I remember being confused about the difference in results when the same condition is placed in the JOIN's ON clause instead of the WHERE clause. This post aims to clarify those questions and a few others around the ON and WHERE clauses. For demonstration I will use the tables built and populated in SQL Joins - Basics Part 1.
First, let us look at the basic LEFT join from Part 1. All rows are returned from the Teacher table, and all matching rows from the Student table are displayed.
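The original query is not shown here; a sketch that matches the result below, assuming the table and column names from Part 1 (the key column name is my assumption), would be:

```sql
SELECT  t.TeacherFullName,
        s.StudentFullName
FROM    dbo.Teacher AS t
        LEFT JOIN dbo.Student AS s
            ON s.TeacherID = t.TeacherID;
```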
TeacherFullName | StudentFullName
--- | ---
Roy Chad | Jepson Balfour
Roy Chad | Milburn Brett
Dudley Goddard | Clinton Schuylerr
Dudley Goddard | Norbert Kemp
Raphael Philander | Meriwether Kennedy
Raphael Philander | Braith Cornelius
Sonnie Davin | NULL
Now we introduce a simple WHERE clause. By adding a condition to the WHERE clause we restrict the entire result set to that condition; any rows that do not satisfy it are excluded from the results. Therefore we end up with only the one row seen below.
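A sketch of that query, under the same naming assumptions, restricting on the student's name:

```sql
SELECT  t.TeacherFullName,
        s.StudentFullName
FROM    dbo.Teacher AS t
        LEFT JOIN dbo.Student AS s
            ON s.TeacherID = t.TeacherID
WHERE   s.StudentFullName = 'Jepson Balfour';
```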
TeacherFullName | StudentFullName
--- | ---
Roy Chad | Jepson Balfour
So what happens when you move the same condition to the ON clause? OK, let's test it!
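Again as a sketch, the same condition moved from the WHERE clause into the ON clause:

```sql
SELECT  t.TeacherFullName,
        s.StudentFullName
FROM    dbo.Teacher AS t
        LEFT JOIN dbo.Student AS s
            ON  s.TeacherID = t.TeacherID
            AND s.StudentFullName = 'Jepson Balfour';
```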
TeacherFullName | StudentFullName
--- | ---
Roy Chad | Jepson Balfour
Dudley Goddard | NULL
Raphael Philander | NULL
Sonnie Davin | NULL
What happened? The result set looks nothing like the first or the second example. When I first did this I was thoroughly confused. To explain the result, let's run another query.
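A sketch of that query, run against the Student table alone with the same restriction:

```sql
SELECT  s.StudentFullName
FROM    dbo.Student AS s
WHERE   s.StudentFullName = 'Jepson Balfour';
```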
StudentFullName
---
Jepson Balfour
Even though the last two queries look different, in reality they place exactly the same restriction on the Student table. The condition in the ON clause acts as a filter that restricts the Student rows to those matching that criteria. Since the Teacher table is joined with a LEFT join rather than an INNER join, it is not restricted by what happens to the Student table, so we see all teachers displayed but only one of them actually showing a student.
Conclusion:
The ON clause is a powerful way to shape your result set into exactly what you need, but if used without understanding what happens to the data, it can produce an unexpected result set. Each condition placed in the ON clause is evaluated prior to the WHERE clause. This goes back to understanding the logical order of operations in SQL Server. Below are just a few of the operations listed in the correct order. Knowing and understanding this order helps explain why the queries above behaved the way they did: each one was evaluated by SQL Server in this order, which produced the correct output for that structure.
FROM
ON
WHERE
SELECT
As always, feel free to leave comments, questions, etc. In the next post I will answer the common question of the TOP clause.
Today I stumbled upon a blog post that pointed out how inefficient the NOT IN clause is, and a comment was made to use LEFT OUTER JOIN instead because it is "faster." They showed execution plans of both and the SSMS query cost from the actual execution plan. The first thing that jumped out at me were the table scans present in both plans. Since I had previously read that these are supposed to yield almost identical results, I decided to give this test a try myself.
To begin with, I created tables in my test database and pulled over data from the AdventureWorks2014 database. To avoid extra operations that could skew results, I took the liberty of changing NVARCHAR to VARCHAR and dropping extra columns that served no purpose in my tests.
For the actual test I created a few different scenarios, including NOT EXISTS and EXCEPT clauses. The first 4 tests are based on clustered primary keys and the last 4 on tables without a clustered index.
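The test queries themselves are not included here; as a sketch with hypothetical table and column names (not the actual AdventureWorks tables used in the test), the four patterns look roughly like this:

```sql
-- Hypothetical table and column names for illustration only
-- 1) NOT IN
SELECT a.ID FROM dbo.PersonA AS a
WHERE  a.ID NOT IN (SELECT b.ID FROM dbo.PersonB AS b);

-- 2) NOT EXISTS
SELECT a.ID FROM dbo.PersonA AS a
WHERE  NOT EXISTS (SELECT 1 FROM dbo.PersonB AS b WHERE b.ID = a.ID);

-- 3) LEFT OUTER JOIN ... IS NULL
SELECT a.ID FROM dbo.PersonA AS a
       LEFT OUTER JOIN dbo.PersonB AS b ON b.ID = a.ID
WHERE  b.ID IS NULL;

-- 4) EXCEPT
SELECT a.ID FROM dbo.PersonA AS a
EXCEPT
SELECT b.ID FROM dbo.PersonB AS b;
```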
Since the point of this test was to compare execution times and actual plans, I captured actual execution plans using SSMS and compared run durations using SQL Sentry Plan Explorer.
Conclusion:
As can be seen from my test, all 4 operations produced almost identical plans and finished within a few milliseconds of each other. In the case of the non-clustered tables, the operations changed from clustered index scans to table scans, but the durations were still close. In the case of the EXCEPT clause on the non-clustered tables, additional sort operations were introduced.
Looking at NOT IN and NOT EXISTS, those operations produced identical plans, while LEFT OUTER JOIN actually required one more operation on top of those performed by the first two.
This is exactly why I will continue to test code multiple ways and might use one approach over another depending on the situation.
During another day on Twitter, I got into a friendly discussion about which cursor options are faster than others. Since I love a little challenge, I decided to recreate the test scenario, but to be "fair" to the cursor (I have a soft spot for them from my Oracle days), I changed the code slightly by "fixing" a few things that could hinder a clean fight.
The first thing I did was place both cursors into stored procedures so I could easily call them, and next I modified the code to write test results to a table instead of trying to calculate them by hand after each run. At the bottom of the page is the code to set up the test.
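The full setup script lives at the bottom of the original post and is not reproduced here; the sketch below only shows the general shape of that setup, with object, column, and source-table names being my assumptions chosen to match the results table further down.

```sql
-- Assumed logging table for test results
CREATE TABLE dbo.CursorTestResult
(
    CallType   VARCHAR(50) NOT NULL,
    DurationMs INT         NOT NULL,
    RunDate    DATETIME2   NOT NULL DEFAULT SYSDATETIME()
);
GO

-- Assumed shape of one of the cursor procedures (FAST_FORWARD variant)
CREATE PROCEDURE dbo.Cursor_FASTForward
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @Start DATETIME2 = SYSDATETIME(), @Value INT;

    DECLARE c CURSOR LOCAL FAST_FORWARD FOR
        SELECT SomeColumn FROM dbo.SomeLargeTable;  -- placeholder source table

    OPEN c;
    FETCH NEXT FROM c INTO @Value;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- per-row work would go here
        FETCH NEXT FROM c INTO @Value;
    END
    CLOSE c;
    DEALLOCATE c;

    INSERT INTO dbo.CursorTestResult (CallType, DurationMs)
    VALUES ('Cursor_FASTForward', DATEDIFF(MILLISECOND, @Start, SYSDATETIME()));
END
GO
```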
I have done "cold" testing to remove extra factors that can contribute to inconclusive results. I ran each of the following query 20 times.
Query 1
Query 2
Query 3
Now for the results: just a simple aggregate query against the table that I created.
Test Result Query
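A sketch of that aggregate, assuming the logging table sketched above:

```sql
SELECT  CallType,
        MIN(DurationMs) AS MinDuration,
        MAX(DurationMs) AS MaxDuration,
        AVG(DurationMs) AS AvgDuration,
        COUNT(*)        AS NumberOfTries
FROM    dbo.CursorTestResult
GROUP BY CallType;
```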
Results
Call Type | MinDuration | MaxDuration | AvgDuration | NumberOfTries
--- | --- | --- | --- | ---
Cursor_FASTForward | 122 | 390 | 191 | 20
Cursor_KeySetTest | 162 | 398 | 232 | 20
Set Based | 77 | 275 | 98 | 20
Conclusion:
In my test, which was done on my desktop with Windows 10 and SQL Server 2016 Developer Edition, the SET BASED approach won with an average duration of 98 milliseconds. Second place goes to LOCAL FAST_FORWARD with a 191 millisecond average, roughly twice as slow as the set-based approach. Third place goes to the LOCAL KEYSET cursor with an average duration of 232 milliseconds.
In my test scenario the SET BASED approach is still faster and will likely be faster for most scenarios. So far I have encountered only one scenario in my work experience where a cursor was actually faster, and it involved a T-SQL based ETL loading very large tables with geospatial data. Cursors have their purpose and should be used with the understanding that they can create performance issues if used incorrectly. Until I see other information, I will likely stick to the LOCAL FAST_FORWARD options, which seem to provide the fastest cursor times.
I don't think I have tested every possible scenario, but it is better than only a single test. With that being said, I'm a little surprised at the results. I thought that the User Defined Table Type would win in every scenario, but it is clear that in some cases the STRING_SPLIT() function is actually faster. Just like most other things in SQL Server, the answer is: It Depends!
In the case of the STRING_SPLIT() function, the engineers at Microsoft did an amazing job optimizing it over the custom solutions people had to resort to in the past. Of course there are limitations that need to be acknowledged and taken into consideration when designing a solution. In my opinion this could be a quick go-to solution that is great for initial testing, but it needs to be heavily tested before becoming the final solution for a production environment.
Just like I mentioned in the previous post, the estimated row count always shows 50 rows, and the value returned from the function is a string that needs to be converted to the target type. This did not prove to be a large issue in my testing, but in complex execution plans it could push the query optimizer into a wrong plan when getting data from other tables, which could cause major slowdowns.
In my test scenarios outlined below, the User Defined Table Type won 4 out of 6 times. I'm sure I could have set up other scenarios where it would lose every time, but in this case it won. Leave comments if you want to see the full source code or want me to run any other scenarios. Thanks for reading!
During SQL Saturday Indianapolis I heard about a new function introduced in SQL Server 2016: STRING_SPLIT(). The presenter showed a few examples of how to use it and mentioned that it is faster than the other implementations people previously had to write. This got me excited, and I wanted to see how it compares to User Defined Table Types, which have always been my go-to replacement for string splitting.
This post is not about comparing the new function to other functions; Aaron Bertrand already did a great post comparing it to other solutions. It can be found here -> Performance Surprises and Assumptions : STRING_SPLIT(). What I wanted to do is compare it to User Defined Table Types, because they have always been faster (at least that is what I thought).
Now onto the test scenario. The first time I had to replace a string-splitting function with a User Defined Table Type parameter was when I had to rewrite a stored procedure that accepted a list of 5000 items inside a varchar(max) parameter. With that in mind, I decided to test the new function the same way: pass a huge string, split it inside a stored procedure, and return the result set back to a .NET application. For this test I created two stored procedures running against the Wide World Importers test database.
Below is the script for the user defined table type and the table to store results on the SQL side. I did this in order to compare SQL duration to the total duration on the .NET side.
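The original script is not included here; a sketch of what it could look like follows, with the type, table, column, and procedure names being my assumptions against the Wide World Importers sample database.

```sql
-- Assumed user defined table type for passing a list of integer keys
CREATE TYPE dbo.IntList AS TABLE (Value INT NOT NULL PRIMARY KEY);
GO

-- Assumed table to log SQL-side durations
CREATE TABLE dbo.SplitTestResult
(
    CallType   VARCHAR(50) NOT NULL,
    DurationMs INT         NOT NULL,
    RunDate    DATETIME2   NOT NULL DEFAULT SYSDATETIME()
);
GO

-- Procedure using the table type parameter
CREATE PROCEDURE dbo.GetOrders_TableType
    @OrderIDs dbo.IntList READONLY
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @Start DATETIME2 = SYSDATETIME();

    SELECT o.OrderID, o.OrderDate
    FROM   Sales.Orders AS o
           INNER JOIN @OrderIDs AS ids ON ids.Value = o.OrderID;

    INSERT INTO dbo.SplitTestResult (CallType, DurationMs)
    VALUES ('TableType', DATEDIFF(MILLISECOND, @Start, SYSDATETIME()));
END
GO

-- Procedure using STRING_SPLIT() on a VARCHAR(MAX) parameter
CREATE PROCEDURE dbo.GetOrders_StringSplit
    @OrderIDs VARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @Start DATETIME2 = SYSDATETIME();

    SELECT o.OrderID, o.OrderDate
    FROM   Sales.Orders AS o
           INNER JOIN STRING_SPLIT(@OrderIDs, ',') AS s
               ON s.value = o.OrderID;  -- implicit conversion: s.value is a string

    INSERT INTO dbo.SplitTestResult (CallType, DurationMs)
    VALUES ('StringSplit', DATEDIFF(MILLISECOND, @Start, SYSDATETIME()));
END
GO
```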
Now onto the .NET C# code. To be fair, I wanted to pass the entire value from the .NET side instead of trying to build it just before calling the stored procedure. Below is the code for calling the table type stored procedure; as you can see, instead of the VarChar type I use the Structured type, which accepts an IEnumerable variable. In my case I'm passing a DataTable.
Next comes the call to the stored procedure with the VARCHAR(MAX) parameter.
On the front end I set up a simple interface with two buttons and a way to aggregate data together. After running each stored procedure 50 times, I came up with the following results.
Conclusion: Round 1 goes to the User Defined Table Type. One thing I noticed when looking at the execution plans: the STRING_SPLIT() function always estimates 50 rows, whereas when calling with the user defined table type the estimated row count matches the actual. Additionally, there is an implicit conversion because the split function returns a string and I'm joining it to an integer column. Next I want to make a few other adjustments to the stored procedures and front end to call them with different numbers of rows and see how it changes the outcome.
I have another opportunity to share my T-SQL knowledge. This time, it will be in Birmingham, AL at dev-data DAY on August 20. This will be a free conference with SQL, .NET, Powershell/Azure and Professional Development sessions. I'm happy to present "Crash Course on Better SQL Development" session which includes elements of T-SQL coding, error handling, a look at execution plans and finally exploring some capabilities of SSDT (SQL Server Data Tools).
This year will be my 5th time speaking at local events. Each new event brings a new perspective on local user group communities and the different knowledge that technology professionals bring to the market. For me, speaking became a way to share and at the same time polish my own knowledge. Depending on the experience and knowledge of each group, different questions come up that lead a session down a slightly different path. Presenting the same content takes a different shape with every new group of people.
For me, one of the biggest challenges I had to overcome in order to become a speaker was realizing that it is perfectly fine to admit that I don't know the answer to every question. Whenever a difficult question is asked, I do my best to answer based on my knowledge at the time and then research it afterward to be prepared for it next time. I'm not an expert in any given field and will never claim to be one. I like to be a jack of all trades. Being diverse in my knowledge helps me approach problems from a different perspective and think of solutions from different angles.
Today was proof that I have a lot to learn about DBA tasks. I took on the somewhat simple task of renaming a database, and it turned into a 30-minute ordeal. Hopefully this post will save you some time by avoiding the issues I ran into. Below I outline one possible way of getting this task done. Please note that this can result in an outage and should not be done in a production (live) environment unless you are confident in what you are doing and it is done during a maintenance window.
Step 1
Before we can rename the DB, we need to identify the logical and physical file names. I recommend copying the result to Notepad or somewhere else, as it will be easier to paste into later queries. To get the names we can query a system view in the master DB. In my scenario, my database name is "VladTesting".
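The original query is not shown; a sketch using sys.master_files, which returns the columns shown below:

```sql
SELECT  name          AS LogicalName,
        physical_name AS DBFilePath,
        type_desc     AS FileType,
        state_desc    AS [State]
FROM    sys.master_files
WHERE   database_id = DB_ID('VladTesting');
```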
Logical Name | DB File Path | File Type | State
--- | --- | --- | ---
VladTesting | C:\SQL Server Data Files\MSSQL11\Data\VladTesting.mdf | ROWS | ONLINE
VladTesting_log | C:\SQL Server Data Files\MSSQL11\Log\VladTesting_log.ldf | LOG | ONLINE
Step 2
Now we are ready to start the process. Before the logical file names can be renamed, we need exclusive rights to the DB.
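A sketch of how that could be done, by putting the database into single-user mode:

```sql
ALTER DATABASE VladTesting
SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
```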
Step 3
Now when we execute the following statements, the logical file names will be renamed.
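The original statements are not shown; a sketch, with the new logical names being my assumptions:

```sql
-- New logical names (VladProduction) are assumptions for illustration
ALTER DATABASE VladTesting MODIFY FILE (NAME = VladTesting,     NEWNAME = VladProduction);
ALTER DATABASE VladTesting MODIFY FILE (NAME = VladTesting_log, NEWNAME = VladProduction_log);
```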
Step 4
At this point we have several options; the following is one of the easiest. We need to DETACH the database, rename the files, and then attach it with the new name later.
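A sketch of the detach step:

```sql
USE master;
EXEC sp_detach_db @dbname = N'VladTesting';
```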
Another option is to take the DB offline, rename the files, and later rename the DB.
Step 5
Now, renaming the files can be done from SSMS with T-SQL or directly on the file system. NOTE: when I ran the following command it messed up file permissions, so this should be used with caution. It is safer to ask the system admins or someone else with proper file system permissions to rename the files directly on the file system; then it is possible to shift the blame if something goes wrong (just kidding).
Most of the time XP_CMDSHELL is not enabled, as it is considered dangerous. To enable it, just run the following code.
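A sketch of enabling it:

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;
```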
Step 6
Now, to actually rename the files, we simply provide the old file name (with path) and the new file name to the XP_CMDSHELL command. If XP_CMDSHELL was not enabled in the previous step, it must be enabled before the rename command will run.
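A sketch of the rename, using the physical paths from Step 1 and the assumed new name from Step 3:

```sql
-- New file names follow the assumed new logical/database name
EXEC xp_cmdshell 'REN "C:\SQL Server Data Files\MSSQL11\Data\VladTesting.mdf" "VladProduction.mdf"';
EXEC xp_cmdshell 'REN "C:\SQL Server Data Files\MSSQL11\Log\VladTesting_log.ldf" "VladProduction_log.ldf"';
```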
Step 7
Now for the last step: we attach the DB with the new name and the new physical file names that we just renamed.
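A sketch of the attach, again using the assumed new name:

```sql
CREATE DATABASE VladProduction
ON (FILENAME = 'C:\SQL Server Data Files\MSSQL11\Data\VladProduction.mdf'),
   (FILENAME = 'C:\SQL Server Data Files\MSSQL11\Log\VladProduction_log.ldf')
FOR ATTACH;
```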
Step 8 - Housekeeping
Since xp_cmdshell is considered dangerous, it is best to disable it before we forget that it had to be enabled. The first command will disable xp_cmdshell and the second command will hide advanced options.
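A sketch of the cleanup:

```sql
EXEC sp_configure 'xp_cmdshell', 0;
RECONFIGURE;
EXEC sp_configure 'show advanced options', 0;
RECONFIGURE;
```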
So now you know how to rename a database. Usually this should only be done when there is a specific business reason to do so. If the sp_renamedb command or ALTER DATABASE ... MODIFY NAME is used, it does not rename the physical or logical files, which can cause issues later when someone tries to create a new database with the same name as the old one, or restores a backup without changing the options.
This topic takes me back to my first SQL Server database class in college. Back then it was an extremely difficult subject for me, partly because I had worked very little with databases and partly because of the way it was presented. Taking all that into account, let's try to attack it in the most basic manner. A SQL JOIN in its basic form is just a way to connect two tables together. There are several types of JOINs that can be defined: LEFT, RIGHT, FULL, INNER, and CROSS. There is a time and place for each one, but most of the time I can get away with using only LEFT and INNER.
It is great to talk about JOINs, but I'm sure there are other people who, like me, don't learn by hearing; they learn by seeing and trying it themselves. To do that, we need to create two basic tables and populate them with some data.
First, let us create two tables to store data for our tests. Since the concept of student and teacher is easily understood, I'm creating a Teacher table and a Student table with a key linking back to the Teacher table.
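The original CREATE TABLE script is not shown here; a minimal sketch that fits the examples below (the ID column names are my assumptions, while the name columns come from the result sets):

```sql
CREATE TABLE dbo.Teacher
(
    TeacherID       INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    TeacherFullName VARCHAR(100)      NOT NULL
);

CREATE TABLE dbo.Student
(
    StudentID       INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    StudentFullName VARCHAR(100)      NOT NULL,
    TeacherID       INT               NULL  -- key linking back to Teacher
);
```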
Now to load some data, I generated an insert script with random names from the following website.
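The generated script is not reproduced here; a sketch that produces the same rows used in the results below (the TeacherID values assigned to students are my assumptions, based on the identity order above):

```sql
INSERT INTO dbo.Teacher (TeacherFullName)
VALUES ('Roy Chad'), ('Dudley Goddard'), ('Raphael Philander'), ('Sonnie Davin');

INSERT INTO dbo.Student (StudentFullName, TeacherID)
VALUES ('Jepson Balfour',     1),
       ('Milburn Brett',      1),
       ('Clinton Schuylerr',  2),
       ('Norbert Kemp',       2),
       ('Meriwether Kennedy', 3),
       ('Braith Cornelius',   3),
       ('Dion Hayden',        NULL);
```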
Now that we have some data loaded into the tables, let the JOINing commence. First we will look at the INNER JOIN.
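A sketch of the query (naming assumptions as above):

```sql
SELECT  t.TeacherFullName,
        s.StudentFullName
FROM    dbo.Teacher AS t
        INNER JOIN dbo.Student AS s
            ON s.TeacherID = t.TeacherID;
```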
INNER JOIN throws out all records that do not match. In our case, Teacher "Sonnie Davin" and Student "Dion Hayden" do not appear in the result because there was no way to connect those records.
TeacherFullName | StudentFullName
--- | ---
Roy Chad | Jepson Balfour
Roy Chad | Milburn Brett
Dudley Goddard | Clinton Schuylerr
Dudley Goddard | Norbert Kemp
Raphael Philander | Meriwether Kennedy
Raphael Philander | Braith Cornelius
LEFT OUTER JOIN returns all records from the main table and attempts to match records from the secondary table.
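A sketch of the query:

```sql
SELECT  t.TeacherFullName,
        s.StudentFullName
FROM    dbo.Teacher AS t
        LEFT OUTER JOIN dbo.Student AS s
            ON s.TeacherID = t.TeacherID;
```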
As seen in this result list, Teacher "Sonnie Davin" appears in the list but does not have any student records associated. LEFT JOIN is most useful whenever you are not sure that all records from the main table are matched in the secondary table. Almost every time I write a query, I use a LEFT JOIN the first time I'm joining tables together, to know for sure that I'm not excluding any records I wanted to include.
TeacherFullName | StudentFullName
--- | ---
Roy Chad | Jepson Balfour
Roy Chad | Milburn Brett
Dudley Goddard | Clinton Schuylerr
Dudley Goddard | Norbert Kemp
Raphael Philander | Meriwether Kennedy
Raphael Philander | Braith Cornelius
Sonnie Davin | NULL
RIGHT OUTER JOIN returns all records from the right table and matches records from the left table. The same query can be written as either a RIGHT or a LEFT join (with the table order flipped) and produce the same results.
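A sketch of the RIGHT join, along with the equivalent LEFT join with the tables flipped:

```sql
-- RIGHT OUTER JOIN form
SELECT  t.TeacherFullName,
        s.StudentFullName
FROM    dbo.Teacher AS t
        RIGHT OUTER JOIN dbo.Student AS s
            ON s.TeacherID = t.TeacherID;

-- Equivalent LEFT OUTER JOIN form with the tables flipped
SELECT  t.TeacherFullName,
        s.StudentFullName
FROM    dbo.Student AS s
        LEFT OUTER JOIN dbo.Teacher AS t
            ON t.TeacherID = s.TeacherID;
```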
In this result set, because we reversed the direction of the LEFT JOIN above, we can see the record from the Student table, "Dion Hayden", who is not matched with a teacher record.
TeacherFullName | StudentFullName
--- | ---
Roy Chad | Jepson Balfour
Roy Chad | Milburn Brett
Dudley Goddard | Clinton Schuylerr
Dudley Goddard | Norbert Kemp
Raphael Philander | Meriwether Kennedy
Raphael Philander | Braith Cornelius
NULL | Dion Hayden
FULL OUTER JOIN attempts to match records from both tables just like INNER JOIN, and also returns all additional records from either table that do not match.
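A sketch of the query:

```sql
SELECT  t.TeacherFullName,
        s.StudentFullName
FROM    dbo.Teacher AS t
        FULL OUTER JOIN dbo.Student AS s
            ON s.TeacherID = t.TeacherID;
```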
In this result set we have all the records we had from the INNER JOIN, plus the extra record from the LEFT JOIN and the one record from the RIGHT JOIN. Teacher "Sonnie Davin" appears in the result set without a student, and Student "Dion Hayden" appears without a teacher. This join is most useful when you need all records whether a match exists or not.
TeacherFullName | StudentFullName
--- | ---
Roy Chad | Jepson Balfour
Roy Chad | Milburn Brett
Dudley Goddard | Clinton Schuylerr
Dudley Goddard | Norbert Kemp
Raphael Philander | Meriwether Kennedy
Raphael Philander | Braith Cornelius
Sonnie Davin | NULL
NULL | Dion Hayden
CROSS JOIN returns all records from the main table and matches them to every record from the secondary table.
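A sketch of the query (note there is no ON clause):

```sql
SELECT  t.TeacherFullName,
        s.StudentFullName
FROM    dbo.Teacher AS t
        CROSS JOIN dbo.Student AS s;
```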
What we get back is the following result set, which pairs every record from one table with every record from the other. Since no join condition is specified, every combination appears in the result set. The number of records in the result set will always equal the number of records in TableA multiplied by the number of records in TableB. In our case we had 4 teachers and 7 students, therefore we got 28 records in the result. But what if we had 4 million teachers and 70 million students? This JOIN has the worst performance and requires the most memory to complete the operation.
TeacherFullName | StudentFullName
--- | ---
Roy Chad | Jepson Balfour
Roy Chad | Milburn Brett
Roy Chad | Clinton Schuylerr
Roy Chad | Norbert Kemp
Roy Chad | Meriwether Kennedy
Roy Chad | Braith Cornelius
Roy Chad | Dion Hayden
Dudley Goddard | Jepson Balfour
Dudley Goddard | Milburn Brett
Dudley Goddard | Clinton Schuylerr
Dudley Goddard | Norbert Kemp
Dudley Goddard | Meriwether Kennedy
Dudley Goddard | Braith Cornelius
Dudley Goddard | Dion Hayden
Raphael Philander | Jepson Balfour
Raphael Philander | Milburn Brett
Raphael Philander | Clinton Schuylerr
Raphael Philander | Norbert Kemp
Raphael Philander | Meriwether Kennedy
Raphael Philander | Braith Cornelius
Raphael Philander | Dion Hayden
Sonnie Davin | Jepson Balfour
Sonnie Davin | Milburn Brett
Sonnie Davin | Clinton Schuylerr
Sonnie Davin | Norbert Kemp
Sonnie Davin | Meriwether Kennedy
Sonnie Davin | Braith Cornelius
Sonnie Davin | Dion Hayden
This concludes the first look at logical JOIN operations. Just like I mentioned at the beginning of the post, just about every time I write a query it ends up being a LEFT or INNER JOIN. Have I ever had to use the others? Yes, but for specific business cases.
Please leave a comment with questions and/or feedback on this post.