Wednesday, November 30, 2016

Three Minutes to compile a very simple #SQLServer Query plan?!?

***Update 2017 Jan 2***
Here's the deal. When a quickstats query is issued to amend the histogram for the first ordinal key of a filtered index... it doesn't include the index's filter. Without the filter, the filtered index can't satisfy the quickstats query - even though it was the reason for the query. Unless there's another, nonfiltered index with the same first ordinal key to satisfy it, a full table scan is needed to get the quickstats results.

For a 600 million row table that can take a while. Here's a little salt for the wound: multiple tasks in query compilation may load the filtered index stats - for example if the first key of the index is in a "col=X or col=Y or col=Z" situation. Each task that loads the filtered index stats will issue another quickstats query - the amended histogram isn't retained from one task to the next while the plan is compiling, even if there is no change to the underlying data in the interim. Takes 3 minutes to complete a quickstats query due to a full table scan? And there are 6 of them? That just added 18 minutes to the query compile time. Yikes.
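To make that concrete, here's a minimal sketch of the kind of filtered include index involved - the name and definition below are my illustration using the columns from the validation query later in this post, not the actual production index.

CREATE INDEX ix_Fact_Category_1_filtered  --hypothetical index name and definition
    ON Fact (Category_1)
    INCLUDE (Id_1)
    WHERE Id_1 IS NOT NULL AND Bit = 1 AND String_1 IS NOT NULL;

-- The quickstats query issued during compile amends the histogram on Category_1,
-- but carries no WHERE clause matching the index filter.  So this filtered index
-- can't answer its own quickstats query; unless a nonfiltered index also leads
-- on Category_1, the quickstats query needs a full table scan.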

And the next query against the same data and statistics runs into the same behavior.

Check out Michael J Swart's connect item, and the repro details behind the "Details" link in the connect item. Notice in his results... the quickstats query for the filtered index doesn't have the expected filter.

Trace flag 2390 can cause large compile time when dealing with filtered indexes. (Active)
https://connect.microsoft.com/SQLServer/feedback/details/2528743/

In the blog post below, I detail my theory that SQL Server 2014 SP1 CU9/SP2 CU2 kb3189645 significantly changes behavior in these cases by exempting filtered indexes from quickstats queries.

SQL Server Trace flags 2389/2390/4139 ~ Quickstats Queries for Filtered Indexes?
http://sql-sasquatch.blogspot.com/2016/12/sql-server-trace-flags-238923904139.html


***Update 2016 Dec 9***
Trace flag 4139 is part of the stew, as is the legacy cardinality estimator.
Forcing the new cardinality estimator with QUERYTRACEON 2312 allowed fast compile.
So did a global disable of trace flag 4139 - even though trace flags 2389 and 2390 remained enabled.
Brute force method - a manual update of ALL stats on the table (which was unchanging during these tests) - also allowed fast compile.
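For reference, here's roughly what those three workarounds look like - a sketch, using the table and columns from the validation query later in this post.

-- 1. Force the new cardinality estimator for just this query
SELECT TOP 1 Id_1
FROM   Fact
WHERE  Category_1 = '55'
AND    Id_1 IS NOT NULL
AND    Bit = 1
AND    String_1 IS NOT NULL
OPTION (QUERYTRACEON 2312, RECOMPILE)

-- 2. Disable trace flag 4139 globally (2389 and 2390 stay enabled)
DBCC TRACEOFF(4139, -1)

-- 3. Brute force: manually update all statistics on the table
UPDATE STATISTICS Fact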

So... it looks like quickstats updates of these filtered include indexes due to trace flag 4139 are part of the problem.  Maybe the quickstats updates are failing?
Not sure yet.

Additional work on this issue here:
http://sql-sasquatch.blogspot.com/2016/12/retrieve-recent-sql-server-stats-update.html
And here:
http://sql-sasquatch.blogspot.com/2016/12/sql-server-trace-flag-2388-shows.html
*****

2016 Nov 30

If this issue isn't reproducible again tomorrow I'll be upset.  Hoping to finally get to the bottom of this.  Let me know if you have ideas to investigate.  My plan is to use trace flags 9292 and 9204 to find out which stats the optimizer is interested in.  Then create a new, empty copy of table Fact, transplanting the stats to the empty table.  And hopefully compiling the query against the empty table will take just as long.  If I can reproduce the problem in a completely clean context I'll have the most options for collaborative investigation.
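The stats-discovery step will look something like this - the same shape as the validation query below, with the diagnostic trace flags swapped in.

SELECT  TOP 1 Id_1
FROM    Fact
WHERE   Category_1 = '55'
AND     Id_1 IS NOT NULL
AND     Bit = 1
AND     String_1 IS NOT NULL
OPTION (QUERYTRACEON 3604, --redirect trace flag output
        QUERYTRACEON 9292, --list stats objects the optimizer considers interesting
        QUERYTRACEON 9204, --list stats objects actually loaded
        RECOMPILE)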

A very simple validation query is being issued (below) - a filtered include index exists that can satisfy all of the conditions with the first row it returns.

And in fact execution time itself is very, very fast.  But... compile time is ridiculous.  Over 3 minutes.  Output from trace flag 8675 is below.  Time to compile was almost identical with and without trace flag 8757 added to the mix.

This system is SQL Server 2012 SP3. Because it's a production system with production data, no windbg.  But if the issue is present in a query against an empty table with transferred stats, the next step will be trying to port the empty table, stats, and problem to a dev instance :-)
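Creating the empty structural copy is the easy part - something like the sketch below. Indexes and statistics don't come along with SELECT INTO; they'd have to be scripted onto the copy separately (the SSMS Generate Scripts wizard can include statistics and histograms).

-- empty copy of the table: same columns, no rows; target name is hypothetical
SELECT *
INTO   Fact_empty
FROM   Fact
WHERE  1 = 0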




SELECT  TOP 1 Id_1 
FROM    Fact 
WHERE   Category_1 = '55' 
AND     Id_1 IS NOT NULL 
AND     Bit = 1 
AND     String_1 IS NOT NULL
OPTION (QUERYTRACEON 3604, --redirect trace flag output
        QUERYTRACEON 8675, --optimizer stages & timing
        QUERYTRACEON 8757, --no early quit with trivial plan
        RECOMPILE)

End of simplification, time: 0 net: 0 total: 0 net: 0

end exploration, tasks: 36 no total cost time: 198.649 net: 195.236 total: 198.65 net: 195.237

end search(1),  cost: 0.0032832 tasks: 74 time: 0 net: 0 total: 198.65 net: 195.237

End of post optimization rewrite, time: 0 net: 0 total: 198.651 net: 195.237

End of query plan compilation, time: 0 net: 0 total: 198.651 net: 195.237

****

The elapsed time for the session was almost completely CPU time - not dominated by waits at all.  There was some other activity on the server at the time of observation - but not much.  Here are the spinlock deltas from measurements before and after the query was executed.

name                       collisions   spins        sleep_time   backoffs
SOS_BLOCKDESCRIPTORBUCKET  365291       893170987    827998       223884
SOS_OBJECT_STORE           117525       425010979    126664       90510
SOS_CACHESTORE_CLOCK       787          138375000    2319         584

That doesn't tell me much... since just about the only web search hits for SOS_BLOCKDESCRIPTORBUCKET are my own tweets wondering what they might mean :-)
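The general recipe for deltas like these: snapshot sys.dm_os_spinlock_stats before and after the query, and subtract.  A minimal sketch:

-- snapshot before running the query under investigation
SELECT name, collisions, spins, sleep_time, backoffs
INTO   #spinlock_before
FROM   sys.dm_os_spinlock_stats

-- ... run the problem query ...

-- deltas: after minus before
SELECT a.name,
       a.collisions - b.collisions AS collisions,
       a.spins      - b.spins      AS spins,
       a.sleep_time - b.sleep_time AS sleep_time,
       a.backoffs   - b.backoffs   AS backoffs
FROM   sys.dm_os_spinlock_stats AS a
JOIN   #spinlock_before AS b ON b.name = a.name
ORDER BY a.spins - b.spins DESC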


An idea occurred to me - maybe this resulted from an overly full plan cache hash bucket?

SELECT plans_in_bucket, COUNT(*) AS number_of_buckets
FROM
(
    SELECT bucketid, COUNT(*) AS plans_in_bucket
    FROM sys.dm_exec_cached_plans
    GROUP BY bucketid
) inr
GROUP BY plans_in_bucket
ORDER BY plans_in_bucket DESC

plans_in_bucket  number_of_buckets
46               1
39               1
36               1
34               1
30               1
24               2
23               1
22               1
20               2
19               4
17               2
16               4
15               7
14               8
13               15
12               16
11               34
10               128
9                264
8                708
7                1499
6                3115
5                5158
4                7484
3                8631
2                7398
1                4242

Not likely the culprit, with the highest-count bucket holding only 46 plans.  Total cached plans: 140630.
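The total cached plan count comes from a straight count of the same DMV - something like:

SELECT COUNT(*) AS total_cached_plans
FROM   sys.dm_exec_cached_plans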

Tuesday, November 1, 2016

A little bit of play when measuring #SQLServer Resource Governor resource pool reads vs LogicalDisk reads


I use perfmon a lot :-)
So I'm always looking for new patterns.

Today I just happened to graph these two perfmon metrics against each other.  Perfmon was collected with a 30 second interval here.
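The two metrics graphed are of the flavor below (exact counter object and instance names are approximate here - adjust for named instances); the Resource Governor side can also be sampled from sys.dm_os_performance_counters.

-- Windows side:    \LogicalDisk(_Total)\Disk Reads/sec
-- SQL Server side: \SQLServer:Resource Pool Stats(default)\Disk Read IO/sec
-- Note: the DMV exposes per-second counters as cumulative values, so getting a
-- rate requires two samples and a delta.
SELECT object_name, counter_name, instance_name, cntr_value
FROM   sys.dm_os_performance_counters
WHERE  object_name LIKE '%Resource Pool Stats%'
AND    counter_name = 'Disk Read IO/sec'
AND    instance_name = 'default'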

It's not too surprising that there is a difference.  But how could there be *more* read IOs per second for the default resource pool than the total on the server? (See just before 4:00 for a great example.)


Maybe if I sum the two metrics over the 4 hour period they'll even out?

23446639.19   << sum of LogicalDisk reads
22787670        << sum of default pool reads

Huh.  LogicalDisk reads are 2.89% higher than default pool reads - (23446639.19 - 22787670) / 22787670 ≈ 0.0289.  Not too hard to believe - except for the small stretches where the default pool reads are significantly higher.

Oh well.

Works well for trending.  Not so well if trying to accurately count individual reads.