Monday, June 17, 2019

SQL Server Query Store: Usability Soft Spots -- Part 3

Others in this series...

SQL Server Query Store: Usability Soft Spots -- Part 1
https://sql-sasquatch.blogspot.com/2019/06/sql-server-query-store-usability-soft.html

 SQL Server Query Store: Usability Soft Spots -- Part 2
https://sql-sasquatch.blogspot.com/2019/06/sql-server-query-store-usability-soft_16.html

Summary of this post: comb through the surprise plans, finding an example that looks like the description of "morally equivalent plans"... and one that kinda doesn't.

~~~~~

A reminder of where this set of tests is currently.
In Part 2, the repro code was tweaked a bit, in order to no longer use a tempdb table in a CTAS statement ("SELECT... INTO", Create Table As SELECT) and to TRUNCATE results table between iterations rather than DROP. After 1600 iterations of the test query with some filler queries also executed with each iteration, 19 plan_id values were displayed in Query Store "Tracked Queries".

Attempting to force each one of those plan_id values from Query Store resulted in 2 successful forces and 17 uses of surprise plans!

A grid of all 34 plan_id values is below.  Two interesting things are highlighted.  In the blue box a single row is highlighted, where plan_id 195 produced plan_id 292 when forced.  That's unusual because it's the only example in the list where one of the original 17 plans produced another one of the 17 originals as the "Surprise Plan". The other box highlights two consecutive rows.  Plan_id values 814 and 899 both produced "Surprise Plan" plan_id 3250.



OK, let's see if any of those "surprise plans" resulted in a last_force_failure_reason_desc.


SELECT plan_id, query_id, engine_version, compatibility_level,  
       is_forced_plan, last_force_failure_reason_desc
FROM sys.query_store_plan 
WHERE query_id = 2
AND plan_id < 3220 
AND plan_id NOT IN (292, 501)
ORDER BY plan_id ASC;


Nope, not a single one is listed as a "force failure", even though the plan forced at the time was not used.

Let's compare plan_id 195 and 292.

That's pretty much what we'd expect from a "morally equivalent plan".  Both plans have the same shape.  All joins and plan operators are in the same location of the two graphical plans.




After seeing that plan_id 195 and plan_id 292 have all the same operators in all the same places, why did new estimates appear for several operators, resulting in plan_id 292 being used when plan_id 195 was forced?

Was it due to a stats update?
Because the new CE is used, optimizer stats info is included in plan XML.  Here's OptimizerStatsUsage for plan_id 195.

<OptimizerStatsUsage>
  <StatisticsInfo LastUpdate="2019-06-16T17:14:24.22" ModificationCount="193" SamplingPercent="100" Statistics="[plan_persist_plan_cidx]" Table="[plan_persist_plan]" Schema="[sys]" Database="[surprise_plan]">
  </StatisticsInfo>
  <StatisticsInfo LastUpdate="2019-06-16T17:14:24.23" ModificationCount="189" SamplingPercent="100" Statistics="[plan_persist_query_text_cidx]" Table="[plan_persist_query_text]" Schema="[sys]" Database="[surprise_plan]">
  </StatisticsInfo>
  <StatisticsInfo LastUpdate="2019-06-16T17:14:24.23" ModificationCount="189" SamplingPercent="100" Statistics="[_WA_Sys_00000002_09DE7BCC]" Table="[plan_persist_query_text]" Schema="[sys]" Database="[surprise_plan]">
  </StatisticsInfo>
  <StatisticsInfo LastUpdate="2019-06-16T17:14:24.23" ModificationCount="193" SamplingPercent="100" Statistics="[plan_persist_plan_idx1]" Table="[plan_persist_plan]" Schema="[sys]" Database="[surprise_plan]">
  </StatisticsInfo>
  <StatisticsInfo LastUpdate="2019-06-16T17:14:24.23" ModificationCount="189" SamplingPercent="100" Statistics="[plan_persist_query_idx1]" Table="[plan_persist_query]" Schema="[sys]" Database="[surprise_plan]">
  </StatisticsInfo>
  <StatisticsInfo LastUpdate="2019-06-16T17:14:24.22" ModificationCount="189" SamplingPercent="100" Statistics="[plan_persist_query_cidx]" Table="[plan_persist_query]" Schema="[sys]" Database="[surprise_plan]">
  </StatisticsInfo>
</OptimizerStatsUsage>


And here's OptimizerStatsUsage for plan 292.

<OptimizerStatsUsage>
  <StatisticsInfo LastUpdate="2019-06-16T17:14:24.22" ModificationCount="290" SamplingPercent="100" Statistics="[plan_persist_plan_cidx]" Table="[plan_persist_plan]" Schema="[sys]" Database="[surprise_plan]">
  </StatisticsInfo>
  <StatisticsInfo LastUpdate="2019-06-16T17:14:24.23" ModificationCount="285" SamplingPercent="100" Statistics="[plan_persist_query_text_cidx]" Table="[plan_persist_query_text]" Schema="[sys]" Database="[surprise_plan]">
  </StatisticsInfo>
  <StatisticsInfo LastUpdate="2019-06-16T17:14:24.23" ModificationCount="285" SamplingPercent="100" Statistics="[_WA_Sys_00000002_09DE7BCC]" Table="[plan_persist_query_text]" Schema="[sys]" Database="[surprise_plan]">
  </StatisticsInfo>
  <StatisticsInfo LastUpdate="2019-06-16T17:14:24.23" ModificationCount="290" SamplingPercent="100" Statistics="[plan_persist_plan_idx1]" Table="[plan_persist_plan]" Schema="[sys]" Database="[surprise_plan]">
  </StatisticsInfo>
  <StatisticsInfo LastUpdate="2019-06-16T17:14:24.23" ModificationCount="285" SamplingPercent="100" Statistics="[plan_persist_query_idx1]" Table="[plan_persist_query]" Schema="[sys]" Database="[surprise_plan]">
  </StatisticsInfo>
  <StatisticsInfo LastUpdate="2019-06-16T17:14:24.22" ModificationCount="285" SamplingPercent="100" Statistics="[plan_persist_query_cidx]" Table="[plan_persist_query]" Schema="[sys]" Database="[surprise_plan]">
  </StatisticsInfo>
</OptimizerStatsUsage>


Notice that LastUpdate for each of the six optimizer stats is the same in plan_id 195 and plan_id 292.
Its not a stats update that results in different estimates.  ModificationCount shows a difference, though, with each of the six ModificationCount values higher in plan_id 292 than plan_id 195.

Here's a relevant passage from a blog post by Itzik Ben Gan.


Seek and You Shall Scan Part II: Ascending Keys
https://www.itprotoday.com/sql-server/seek-and-you-shall-scan-part-ii-ascending-keys
  • "The new cardinality estimator identifies that the query filter exceeds the maximum value in the histogram. It has access to the column modification counter (call it in short modctr), so it knows how many changes took place since the last refresh. When the column is unique and of an integer type or numeric with a scale of zero, SQL Server assumes it’s ascending. So it interpolates the estimate based on the distribution of the values within the existing histogram and the number of changes that took place since the last refresh."

*This* definitely seems like the description I've heard of a "morally equivalent plan".  As estimates for various operators changed along with modctr, eventually the previous plan no longer "fit" even though the newer plan still had the same plan shape, with the same plan operators in the same positions.

Let's compare plan_id 814 to plan_id 3250.


All operator in the same spots in the two plans, the difference between them being the estimates.  Its the same when comparing plan_id 899 to plan_id 3250.

But when I look at the planXML for plan_id 3250, I see something extraordinary.  The QueryHash value is the same as the QueryPlanHash value.


QueryHash="0xE91EDF2B4F8EA8B4" QueryPlanHash="0xE91EDF2B4F8EA8B4"

Well.  Let's see how widespread that phenomenon is. First we';ll check the original 19 plan_id values.


;WITH XMLNAMESPACES(DEFAULT N'http://schemas.microsoft.com/sqlserver/2004/07/showplan'),
planXML AS
(SELECT plan_id, planXML = CONVERT(XML, query_plan)
 FROM sys.query_store_plan WHERE query_id = 2 AND plan_id < 3220)
SELECT plan_id, xml_query_hash = RelOp.pln.value(N'@QueryHash', N'varchar(50)'), 
       xml_plan_hash = RelOp.pln.value(N'@QueryPlanHash', N'varchar(50)'),
       TerminationReason = RelOp.pln.value(N'@StatementOptmEarlyAbortReason', N'varchar(50)')
FROM planXML
CROSS APPLY planXML.planXML.nodes(N'//StmtSimple') RelOp (pln);

Nope, none of them have QueryHash = QueryPlanHash in the plan XML.


What about the 15 surprise plans that only showed up once we started forcing other plans?

;WITH XMLNAMESPACES(DEFAULT N'http://schemas.microsoft.com/sqlserver/2004/07/showplan'),
planXML AS
(SELECT plan_id, planXML = CONVERT(XML, query_plan)
 FROM sys.query_store_plan WHERE query_id = 2 AND plan_id > 3219)
SELECT plan_id, xml_query_hash = RelOp.pln.value(N'@QueryHash', N'varchar(50)'), 
       xml_plan_hash = RelOp.pln.value(N'@QueryPlanHash', N'varchar(50)'),
       TerminationReason = RelOp.pln.value(N'@StatementOptmEarlyAbortReason', N'varchar(50)')
FROM planXML
CROSS APPLY planXML.planXML.nodes(N'//StmtSimple') RelOp (pln);

Wow.  15 "surprise plans" first showed up after other plans were forced for query_id 2.  Every single one of them has QueryHash equal to QueryPlanHash in the plan XML.


That is... hm.  So even if the forced query plan doesn't fit the estimates, or there's some other "near-miss" reason for avoiding the forced query plan... the QueryHash value in the plan XML should always be the same as [sys].[query_store_query].[query_hash].  Its not in this case.  And strangely QueryHash = QueryPlanHash.

(Microsoft hasn't yet confirmed this kind of QueryHash behavior is a bug... but I honestly can't imagine a design that anticipates or intends this behavior.)


Q

No comments:

Post a Comment