Thursday, October 15, 2015

Performance Pitfall - Act III: the Shyamalan twist!!

***** Update *****
#CarefulWhatYouAskFor :-)
Paul White (@SQL_kiwi) was willing to take a look at this, and gave it a very thorough treatment in the blog post below.  Thanks, Paul!

Hash Joins on Nullable Columns
The last few days I've provided some details of my recent testing of SQL Server joins on NULLable columns.

Time for a surprise twist!  You'll remember that I am creating a table X_1 that looks like this:


The primary key in the first column is an increasing integer, step 1.  I'm using 2012 compatibility mode today, so rather than the 600,000 rows I used with 2014 compatibility mode I have just 72048 rows.

Still going to use 32,000  rows in my X_2 table.  But here's my gimmick for today.  In the 32,000 rows of table X_2, I'm going to vary the number of NULL values.  No big trick to doing that - just use a CASE statement with a MOD operator to assign value of 1 (or NULL) when MOD N is 0, ELSE NULL (or 1, as the case may be to get the desired count of NULL values).

In my first post on this investigation you'll see the basic code I'm using to populate these tables.

Recall that the almost-innocent-looking query I am testing is this:

FROM       X_1 AS t1
INNER JOIN X_2 AS t2 ON t1.NOT_PK_X_1 = t2.NOT_PK_X_2
INNER JOIN X_2 AS t3 ON t2.PK_X_2     = t3.PK_X_2

So - what happens to execution time of the query as I change the number and distribution of NULL values throughout the X_2 table?

I'm glad you asked.

As the number of NULL values in the second column increased (while maintaining a total row count of 32,000), query CPU_ms dropped in a fairly linear fashion.  Makes sense - as the number of NULL values increase, the join has less work to do - because NULL values can't join unless you do something tricky.

Ready for the Shyamalan twist?

Maybe it looks like I went all the way to 32,000 rows in the graph above.  I didn't.  Let me dramatically zoom out to catch what happened at 32,000 NULL values and ZERO non-NULL values.


Wow.  OK.  Maybe that's not "Sixth Sense" level twist.  How about "Lady in the Lake"? :-)

So... SQL Server isn't treating the query the same without the implicit NULL exclusion that accompanies the join on the NULLable columns.  But it doesn't REALLY cause a problem unless the column has only NULL values.  And then, only after a certain number of rows in the table.  And I haven't been able to tease out the problem unless I've got at least two joins.

No comments:

Post a Comment