Friday, August 25, 2017

#SQLServer 2016 CCI COUNT(*) Aggregate Pushdown Tipping Point - Part I

I've been working with SQL Server 2016 SP1 CCIs more lately... anything I figure out before Niko Neugebauer or Joe Obbish have given them full treatment in their blogs I'm gonna try to get in here right quick :-) Niko's and Joe's blogs are both on my 'must read list' by the way.
Niko's blog can be found here:
http://www.nikoport.com/

And Joe's blog can be found here:
https://orderbyselectnull.com/

Today I was tinkering with some CCIs... doing something that I often do: speculating about a system I haven't been able to access directly, trying to come up with theories to explain unexpected behavior.  Along the way I found something interesting I hadn't heard anyone mention: a tipping point for count(*) aggregate pushdown against a CCI.

*****Update 26 August 2017*****
Niko checked out my plans and tried to reproduce on both SQL Server 2017 and SQL Server 2016... no dice.  That reminds me - I should have included this for system context :-)




Without the same system context, results may differ.  I'll update the blog post again once I determine if trace flags, compatibility level, optimizer hot fixes, CE version are necessary components for this behavior. (Turns out the main consideration for tipping point of Bigint CCI column is MAXDOP.  CE plays a minor role - new CE gives integral row estimates for BIGINT columns.)

*****

Here's a quick repro if you want to follow along.


SET NOCOUNT ON;
GO
CREATE FUNCTION dbo.GetNums(@low AS BIGINT, @high AS BIGINT) RETURNS TABLE
AS
RETURN
  WITH
    L0   AS (SELECT c FROM (SELECT 1 UNION ALL SELECT 1) AS D(c)),
    L1   AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
    L2   AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
    L3   AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
    L4   AS (SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
    L5   AS (SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
    Nums AS (SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum
             FROM L5)
  SELECT TOP(@high - @low + 1) @low + rownum - 1 AS n
  FROM Nums
  ORDER BY rownum;
----------------------------------------------------------------------
-- © Itzik Ben-Gan
----------------------------------------------------------------------
/* http://tsql.solidq.com/SourceCodes/GetNums.txt */
GO

CREATE TABLE dbo.Tipper 
([Key] bigint NOT NULL,
 INDEX CCI_Tipper CLUSTERED COLUMNSTORE);

set statistics time on;
INSERT INTO dbo.Tipper
SELECT TOP (52428800) nums.n
FROM dbo.getNums(1, 52428800) nums;

create statistics Tipper_key on Tipper([key]) with fullscan;

/* predicate pushdown!! */
select count(*) from 
(SELECT [key] FROM Tipper where [key] > 0 and [key] < 27213.9) subQ

/* NO predicate pushdown!! */
select count(*) from 
(SELECT [key] FROM Tipper where [key] > 0 and [key] < 27213.8) subQ

Let me walk through it quickly.

I use Itzik Ben-Gan's getnums function a lot, so I'm more than happy to keep citing it :-)

So create the getnums function and the Tipper CCI in my database.


OK, lets use the getnums function to add 50 fully populated (1024 * 1024 rows) to the CCI, with incrementing values in the [key] column.


Now lets create fullscan stats on the [key] column.  Fullscan stats and values starting at 1, incrementing by 1 give a little more predictability to row estimates in the plans.


Below is what we want post-execution plans to look like when counting rows in a range - the thin arrow coming out of the Columnstore scan is a hint that predicate pushdown was successful.  I didn't specify MAXDOP in a query hint, and Resource Governor isn't supplying MAXDOP; MAXDOP 8 is coming from the Database Scoped Configuration.  The degree of parallelism turns out to be a significant factor in determining the tipping point.  The [key] column is a BigInt.  Maybe its surprising that I'm using 27213.9 as the upper bound.  But... check out the estimated number of rows :-) Again - this estimate is coming from the Legacy CE, specified in the database scoped configuration for my database.



Look what happens if the upper limit is 27213.8 at MAXDOP 8.  Row estimate of 27213.8 and... no more predicate pushdown!



Above I mentioned that DOP is an important part of determining the tipping point, with the cardinality estimator version playing a minor role.  Here are the tipping point values for legacy and new CE, at DOP 1 to DOP 8.


The tipping point values are row estimates, remember.  That's why the CE only plays a minor role here - since it's responsible for the estimates.  The estimates of the new CE are slightly different than the legacy CE - and for this case where the filter is against a BigInt column it looks like the new CE is giving integral row estimates, while the legacy CE estimates are not necessarily integral.

Anyway... a graph of the Legacy CE tipping point values against MAXDOP makes the nearly linear relationship clear.