Monday, June 17, 2019

SQL Server Query Store: Usability Soft Spots -- Part 3

Others in this series...

SQL Server Query Store: Usability Soft Spots -- Part 1
https://sql-sasquatch.blogspot.com/2019/06/sql-server-query-store-usability-soft.html

 SQL Server Query Store: Usability Soft Spots -- Part 2
https://sql-sasquatch.blogspot.com/2019/06/sql-server-query-store-usability-soft_16.html

Summary of this post: comb through the surprise plans, finding an example that looks like the description of "morally equivalent plans"... and one that kinda doesn't.

~~~~~

A reminder of where this set of tests is currently.
In Part 2, the repro code was tweaked a bit, in order to no longer use a tempdb table in a CTAS statement ("SELECT... INTO", Create Table As SELECT) and to TRUNCATE results table between iterations rather than DROP. After 1600 iterations of the test query with some filler queries also executed with each iteration, 19 plan_id values were displayed in Query Store "Tracked Queries".

Attempting to force each one of those plan_id values from Query Store resulted in 2 successful forces and 17 uses of surprise plans!

A grid of all 34 plan_id values is below.  Two interesting things are highlighted.  In the blue box a single row is highlighted, where plan_id 195 produced plan_id 292 when forced.  That's unusual because it's the only example in the list where one of the original 17 plans produced another one of the 17 originals as the "Surprise Plan". The other box highlights two consecutive rows.  Plan_id values 814 and 899 both produced "Surprise Plan" plan_id 3250.



OK, let's see if any of those "surprise plans" resulted in a last_force_failure_reason_desc.


SELECT plan_id, query_id, engine_version, compatibility_level,  
       is_forced_plan, last_force_failure_reason_desc
FROM sys.query_store_plan 
WHERE query_id = 2
AND plan_id < 3220 
AND plan_id NOT IN (292, 501)
ORDER BY plan_id ASC;


Nope, not a single one is listed as a "force failure", even though the plan forced at the time was not used.

Let's compare plan_id 195 and 292.

That's pretty much what we'd expect from a "morally equivalent plan".  Both plans have the same shape.  All joins and plan operators are in the same location of the two graphical plans.




After seeing that plan_id 195 and plan_id 292 have all the same operators in all the same places, why did new estimates appear for several operators, resulting in plan_id 292 being used when plan_id 195 was forced?

Was it due to a stats update?
Because the new CE is used, optimizer stats info is included in plan XML.  Here's OptimizerStatsUsage for plan_id 195.

<OptimizerStatsUsage>
  <StatisticsInfo LastUpdate="2019-06-16T17:14:24.22" ModificationCount="193" SamplingPercent="100" Statistics="[plan_persist_plan_cidx]" Table="[plan_persist_plan]" Schema="[sys]" Database="[surprise_plan]">
  </StatisticsInfo>
  <StatisticsInfo LastUpdate="2019-06-16T17:14:24.23" ModificationCount="189" SamplingPercent="100" Statistics="[plan_persist_query_text_cidx]" Table="[plan_persist_query_text]" Schema="[sys]" Database="[surprise_plan]">
  </StatisticsInfo>
  <StatisticsInfo LastUpdate="2019-06-16T17:14:24.23" ModificationCount="189" SamplingPercent="100" Statistics="[_WA_Sys_00000002_09DE7BCC]" Table="[plan_persist_query_text]" Schema="[sys]" Database="[surprise_plan]">
  </StatisticsInfo>
  <StatisticsInfo LastUpdate="2019-06-16T17:14:24.23" ModificationCount="193" SamplingPercent="100" Statistics="[plan_persist_plan_idx1]" Table="[plan_persist_plan]" Schema="[sys]" Database="[surprise_plan]">
  </StatisticsInfo>
  <StatisticsInfo LastUpdate="2019-06-16T17:14:24.23" ModificationCount="189" SamplingPercent="100" Statistics="[plan_persist_query_idx1]" Table="[plan_persist_query]" Schema="[sys]" Database="[surprise_plan]">
  </StatisticsInfo>
  <StatisticsInfo LastUpdate="2019-06-16T17:14:24.22" ModificationCount="189" SamplingPercent="100" Statistics="[plan_persist_query_cidx]" Table="[plan_persist_query]" Schema="[sys]" Database="[surprise_plan]">
  </StatisticsInfo>
</OptimizerStatsUsage>


And here's OptimizerStatsUsage for plan 292.

<OptimizerStatsUsage>
  <StatisticsInfo LastUpdate="2019-06-16T17:14:24.22" ModificationCount="290" SamplingPercent="100" Statistics="[plan_persist_plan_cidx]" Table="[plan_persist_plan]" Schema="[sys]" Database="[surprise_plan]">
  </StatisticsInfo>
  <StatisticsInfo LastUpdate="2019-06-16T17:14:24.23" ModificationCount="285" SamplingPercent="100" Statistics="[plan_persist_query_text_cidx]" Table="[plan_persist_query_text]" Schema="[sys]" Database="[surprise_plan]">
  </StatisticsInfo>
  <StatisticsInfo LastUpdate="2019-06-16T17:14:24.23" ModificationCount="285" SamplingPercent="100" Statistics="[_WA_Sys_00000002_09DE7BCC]" Table="[plan_persist_query_text]" Schema="[sys]" Database="[surprise_plan]">
  </StatisticsInfo>
  <StatisticsInfo LastUpdate="2019-06-16T17:14:24.23" ModificationCount="290" SamplingPercent="100" Statistics="[plan_persist_plan_idx1]" Table="[plan_persist_plan]" Schema="[sys]" Database="[surprise_plan]">
  </StatisticsInfo>
  <StatisticsInfo LastUpdate="2019-06-16T17:14:24.23" ModificationCount="285" SamplingPercent="100" Statistics="[plan_persist_query_idx1]" Table="[plan_persist_query]" Schema="[sys]" Database="[surprise_plan]">
  </StatisticsInfo>
  <StatisticsInfo LastUpdate="2019-06-16T17:14:24.22" ModificationCount="285" SamplingPercent="100" Statistics="[plan_persist_query_cidx]" Table="[plan_persist_query]" Schema="[sys]" Database="[surprise_plan]">
  </StatisticsInfo>
</OptimizerStatsUsage>


Notice that LastUpdate for each of the six optimizer stats is the same in plan_id 195 and plan_id 292.
Its not a stats update that results in different estimates.  ModificationCount shows a difference, though, with each of the six ModificationCount values higher in plan_id 292 than plan_id 195.

Here's a relevant passage from a blog post by Itzik Ben Gan.


Seek and You Shall Scan Part II: Ascending Keys
https://www.itprotoday.com/sql-server/seek-and-you-shall-scan-part-ii-ascending-keys
  • "The new cardinality estimator identifies that the query filter exceeds the maximum value in the histogram. It has access to the column modification counter (call it in short modctr), so it knows how many changes took place since the last refresh. When the column is unique and of an integer type or numeric with a scale of zero, SQL Server assumes it’s ascending. So it interpolates the estimate based on the distribution of the values within the existing histogram and the number of changes that took place since the last refresh."

*This* definitely seems like the description I've heard of a "morally equivalent plan".  As estimates for various operators changed along with modctr, eventually the previous plan no longer "fit" even though the newer plan still had the same plan shape, with the same plan operators in the same positions.

Let's compare plan_id 814 to plan_id 3250.


All operator in the same spots in the two plans, the difference between them being the estimates.  Its the same when comparing plan_id 899 to plan_id 3250.

But when I look at the planXML for plan_id 3250, I see something extraordinary.  The QueryHash value is the same as the QueryPlanHash value.


QueryHash="0xE91EDF2B4F8EA8B4" QueryPlanHash="0xE91EDF2B4F8EA8B4"

Well.  Let's see how widespread that phenomenon is. First we';ll check the original 19 plan_id values.


;WITH XMLNAMESPACES(DEFAULT N'http://schemas.microsoft.com/sqlserver/2004/07/showplan'),
planXML AS
(SELECT plan_id, planXML = CONVERT(XML, query_plan)
 FROM sys.query_store_plan WHERE query_id = 2 AND plan_id < 3220)
SELECT plan_id, xml_query_hash = RelOp.pln.value(N'@QueryHash', N'varchar(50)'), 
       xml_plan_hash = RelOp.pln.value(N'@QueryPlanHash', N'varchar(50)'),
       TerminationReason = RelOp.pln.value(N'@StatementOptmEarlyAbortReason', N'varchar(50)')
FROM planXML
CROSS APPLY planXML.planXML.nodes(N'//StmtSimple') RelOp (pln);

Nope, none of them have QueryHash = QueryPlanHash in the plan XML.


What about the 15 surprise plans that only showed up once we started forcing other plans?

;WITH XMLNAMESPACES(DEFAULT N'http://schemas.microsoft.com/sqlserver/2004/07/showplan'),
planXML AS
(SELECT plan_id, planXML = CONVERT(XML, query_plan)
 FROM sys.query_store_plan WHERE query_id = 2 AND plan_id > 3219)
SELECT plan_id, xml_query_hash = RelOp.pln.value(N'@QueryHash', N'varchar(50)'), 
       xml_plan_hash = RelOp.pln.value(N'@QueryPlanHash', N'varchar(50)'),
       TerminationReason = RelOp.pln.value(N'@StatementOptmEarlyAbortReason', N'varchar(50)')
FROM planXML
CROSS APPLY planXML.planXML.nodes(N'//StmtSimple') RelOp (pln);

Wow.  15 "surprise plans" first showed up after other plans were forced for query_id 2.  Every single one of them has QueryHash equal to QueryPlanHash in the plan XML.


That is... hm.  So even if the forced query plan doesn't fit the estimates, or there's some other "near-miss" reason for avoiding the forced query plan... the QueryHash value in the plan XML should always be the same as [sys].[query_store_query].[query_hash].  Its not in this case.  And strangely QueryHash = QueryPlanHash.

(Microsoft hasn't yet confirmed this kind of QueryHash behavior is a bug... but I honestly can't imagine a design that anticipates or intends this behavior.)


Q

Sunday, June 16, 2019

SQL Server Query Store: Usability Soft Spots -- Part 2

Part 2 Summary: the stored procedure was modified to no longer use "SELECT... INTO" with a table from tempdb, and to truncate the results table with each iteration rather than drop the table.  Of the initial 19 plan_id values for the test query, only 2 would successfully force from Query Store.  After trying to force all of the plans (including the surprise plans which first showed up and were used while another plan was forced), from the 34 total plan_id values, the tally stood at 17 plans which would successfully force, and 17 plans which would not.

Others in the series:
SQL Server Query Store: Usability Soft Spots -- Part 1
https://sql-sasquatch.blogspot.com/2019/06/sql-server-query-store-usability-soft.html

~~~~~

Yesterday I started a blog post series.

SQL Server Query Store: Usability Soft Spots -- Part 1 
https://sql-sasquatch.blogspot.com/2019/06/sql-server-query-store-usability-soft.html

One of my goals for problem repros is to eliminate as many extraneous vulnerable points as possible.  For example, when reproducing a problem with Query Store plan forcing, I want to remove as many known or suspected plan forcing issues as possible, other than the issue the repro is meant to draw attention to.

Erin Stellato pointed out that temp tables are associated with some atypical plan forcing behavior.  Dropping of tables, indexes, or statistics can also result in atypical plan forcing behavior.

For these reasons I've reworked the repro which I'll be building on in these blog posts.

In my initial blog post, the test query used "SELECT... INTO" with a temp table.

The new version of the repro contains a CREATE TABLE statement before the test query, and no longer uses a temp table.

I'm still creating the test database the same way.


USE [master]
GO
CREATE DATABASE [surprise_plan]
 ON  PRIMARY 
( NAME = N'surprise_plan', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL15.CTP_3_0__2019\MSSQL\DATA\surprise_plan.mdf' , SIZE = 139264KB , MAXSIZE = UNLIMITED, FILEGROWTH = 65536KB )
 LOG ON 
( NAME = N'surprise_plan_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL15.CTP_3_0__2019\MSSQL\DATA\surprise_plan_log.ldf' , SIZE = 270336KB , MAXSIZE = 2048GB , FILEGROWTH = 65536KB )
 WITH CATALOG_COLLATION = DATABASE_DEFAULT
GO
ALTER DATABASE [surprise_plan] SET  MULTI_USER 
GO
ALTER DATABASE [surprise_plan] SET QUERY_STORE = ON
GO
ALTER DATABASE [surprise_plan] SET QUERY_STORE (OPERATION_MODE = READ_WRITE, CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30), DATA_FLUSH_INTERVAL_SECONDS = 60, INTERVAL_LENGTH_MINUTES = 1, MAX_STORAGE_SIZE_MB = 1000, QUERY_CAPTURE_MODE = ALL, SIZE_BASED_CLEANUP_MODE = AUTO)
GO
ALTER DATABASE [surprise_plan] SET  READ_WRITE 
GO

The stored procedure follows the same type of strategy.  Note there are now two parameters. @testQuery_reps indicates how many instances of the test query will be executed.  In addition to the test query itself, the stored procedure executes some additional queries to increase the rate of change in the Query Store structures.  @filler_reps indicates how times the 'filler query' section of code will execute.

USE [surprise_plan]
GO
CREATE OR ALTER PROCEDURE [dbo].[surprise_plan2test] @testQuery_reps INT, @filler_reps INT = 2
AS
SET NOCOUNT ON
DROP TABLE IF EXISTS sq__surprise_plan_tests;
CREATE TABLE sq__surprise_plan_tests(
     query_id BIGINT,
     query_hash BINARY(8),
     plan_id BIGINT,
     query_plan_hash BINARY(8),
     runtime_stats_id BIGINT,
     count_executions BIGINT);
 
DECLARE @ij INT = 0, @query_count INT, @sqlcmd2 NVARCHAR(MAX), @sqlcmd NVARCHAR(MAX) =
'INSERT INTO sq__surprise_plan_tests
 SELECT /*surprise2test*/ qsq.query_id, qsq.query_hash, qsp.plan_id,
        qsp.query_plan_hash, qsrs.runtime_stats_id, qsrs.count_executions
 FROM sys.query_store_runtime_stats qsrs
 JOIN sys.query_store_plan qsp ON qsrs.plan_id = qsp.plan_id
 JOIN sys.query_store_query qsq ON qsp.query_id = qsq.query_id
 JOIN sys.query_store_query_text qsqt ON qsq.query_text_id = qsqt.query_text_id
 WHERE qsqt.query_sql_text LIKE ''%surprise2test%''
 ORDER BY qsrs.runtime_stats_id DESC;' 
 
WHILE @testQuery_reps > 0
BEGIN
     TRUNCATE TABLE sq__surprise_plan_tests;
     DBCC FREEPROCCACHE WITH no_infomsgs;
     EXEC (@sqlcmd);
     SET @ij = 0;
     WHILE @ij < @filler_reps
     BEGIN
          SELECT TOP 1 
@sqlcmd2 = 'SELECT TOP (1) plan_id INTO #tmp 
                             FROM SYS.QUERY_STORE_RUNTIME_STATS 
                             WHERE runtime_stats_interval_id = '
                             + CONVERT(NVARCHAR(10), ((query_id - 1) * @testQuery_reps) + @ij) + ';
                             DROP TABLE IF EXISTS #tmp;'
          FROM sys.query_store_query
          WHERE @testQuery_reps = @testQuery_reps AND @ij = @ij
          ORDER BY query_id DESC;
 
EXEC (@sqlcmd2);
          SET @ij = @ij + 1;
     END 
     SELECT @testQuery_reps = @testQuery_reps - 1;
END
  
SELECT * FROM sq__surprise_plan_tests
ORDER BY runtime_stats_id DESC;

Ok.  Let's execute the test query 1600 times, with each iteration of the test query submission accompanied by two iterations of the filler query section of code.


EXEC [dbo].[surprise_plan2test] @testQuery_reps = 1600, @filler_reps = 2;

The result... 19 unique plan_id values for the query from 1600 executions.



So let's take a look in the "Tracked Queries" Query Store activity.


Interesting that the order of plan_id values in the right-hand legend is immediately sensible.  Let's see what happens when I force plan_id 2...



I execute the test query again twice, with 1 time 'round for the filler queries. I could execute the test query a single time but I like it when the new plans show up in the query results the test query returns... due to timing issues that is reliable when the test query is executed twice but not when the test query is executed a single time only.


EXEC [dbo].[surprise_plan2test] @testQuery_reps = 2, @filler_reps = 1;

Well look at that! At the top of the results list is plan 3220 now.


Let's refresh the "Tracked Queries" activity.

Ohhhh.  I forced plan_id 2 (in the purple box below) but what showed up was plan_id 3220 (in the yellow box below).


Ok, that's weird.  Did a last_force_failure_reason_desc show up for plan_id 2?


SELECT plan_id, query_id, engine_version, compatibility_level,  
       is_forced_plan, last_force_failure_reason_desc
FROM sys.query_store_plan WHERE plan_id = 2;



Hm.  So I forced plan_id 2, but plan_id 3220 showed up instead.  And there is no last_force_failure_reason_desc for plan_id 2.

Kendra Little uses the term “morally equivalent plan” in some excellent blog posts, mentioning that the term likely came from Conor Cunningham via Erland Sommarskog.
I've listed those blog posts below, as well as a related Azure Feedback suggestion.

Forced Plan Confusion: Is_Forced vs Use Plan = True

https://sqlworkbooks.com/2018/03/forced-plan-confusion-is_forced-vs-use-plan-true/

What is a “Morally Equivalent Execution Plan,” and Why Is It Good?
https://sqlworkbooks.com/2018/03/what-is-a-morally-equivalent-execution-plan-and-why-is-it-good/

Indicate “morally equivalent” forced plan where use_plan = true but is_forced = 0
https://feedback.azure.com/forums/908035-sql-server/suggestions/33562136-indicate-morally-equivalent-forced-plan-where-us

So... something funky happened when I tried to force plan_id 2.  What if I try forcing the other plans for query_id 2?

Here's a grid of the plan_id values selected for execution after forcing each originally available plan_id for query_id 2.




Of 19 query plans available from the first 1600 executions of the test query, only two plan_id values were able to be successfully forced: 292 and 501.  Seventeen of the original plan_id values were not selected by the optimizer after they were forced from Query Store. That seems like the cards are stacked against me.

But as the plan_id values were forced, new plans appeared.  Plenty of them LoL.  If I add in the new plans and attempt to force them as well...



So, with all 34 of the plans in scope, it's a 50-50 chance.  17 of the original plans wouldn't force, 2 of them would.  Add in 15 surprise plans, and there are 17 plans that won't force and 17 plans that do force successfully. And the "Tracked Queries" activity now looks like...


 
Ok.  So with this example I hope I've shown that, at least as through SQL Server 2019 CTP 3.0, encountering a “morally equivalent plan” choice after Query Store forcing a plan might not be as rare as commonly held.

If you're still with me, I've still got a little ways to go with this one... next installment won't be at least until tomorrow, though.

Thanks for reading!

Ciao!!

Q

Saturday, June 15, 2019

SQL Server Query Store: Usability Soft Spots -- Part 1

Part 1 Summary: pretty easy to get lots of Query Store plan_id values per query_id, and sometimes the sort of plan_id values in "Tracked Queries" is downright unhelpful.

SQL Server Query Store: Usability Soft Spots -- Part 2
https://sql-sasquatch.blogspot.com/2019/06/sql-server-query-store-usability-soft_16.html
~~~~

For now, I'm working with a Developer Edition SQL Server 2019 CTP 3.0 installation, on my laptop.

No global trace flags enabled.  Pretty vanilla.

 
OK, let's create a database just for this series of blog posts. No tricks up my sleeve, enable Query Store out the gate, change QUERY_CAPTURE_MODE from default AUTO to ALL, DATA_FLUSH_INTERVAL_SECONDS = 60 and INTERVAL_LENGTH_MINUTES = 1 because these will be pretty quick tests.

USE [master]
GO
CREATE DATABASE [surprise_plan]
 ON  PRIMARY 
( NAME = N'surprise_plan', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL15.CTP_3_0__2019\MSSQL\DATA\surprise_plan.mdf' , SIZE = 139264KB , MAXSIZE = UNLIMITED, FILEGROWTH = 65536KB )
 LOG ON 
( NAME = N'surprise_plan_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL15.CTP_3_0__2019\MSSQL\DATA\surprise_plan_log.ldf' , SIZE = 270336KB , MAXSIZE = 2048GB , FILEGROWTH = 65536KB )
 WITH CATALOG_COLLATION = DATABASE_DEFAULT
GO
ALTER DATABASE [surprise_plan] SET  MULTI_USER 
GO
ALTER DATABASE [surprise_plan] SET QUERY_STORE = ON
GO
ALTER DATABASE [surprise_plan] SET QUERY_STORE (OPERATION_MODE = READ_WRITE, CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30), DATA_FLUSH_INTERVAL_SECONDS = 60, INTERVAL_LENGTH_MINUTES = 1, MAX_STORAGE_SIZE_MB = 1000, QUERY_CAPTURE_MODE = ALL, SIZE_BASED_CLEANUP_MODE = AUTO)
GO
ALTER DATABASE [surprise_plan] SET  READ_WRITE 
GO

Now let's create stored procedure [surprise_plan1test] for some fun.  Nothing too clever here.  A query is executed against query store as dynamic SQL.  Do it as dynamic SQL so the offset is predictable and it always matches to the same Query Store query_id as long as context_settings_id stays the same.


USE [surprise_plan]
GO
CREATE OR ALTER   PROCEDURE [dbo].[surprise_plan1test] @max_reps INT
AS
SET NOCOUNT ON
DECLARE @ij INT = 0, @query_count INT, @sqlcmd2 NVARCHAR(MAX), @sqlcmd NVARCHAR(MAX) =
'SELECT /*surprise1test*/ qsq.query_id, qsq.query_hash, qsp.plan_id, qsp.query_plan_hash, qsrs.runtime_stats_id,
                      r_last_execution_time = MAX(qsrs.last_execution_time), r_count_executions = SUM(qsrs.count_executions)
 INTO ##results
 FROM sys.query_store_runtime_stats qsrs
 JOIN sys.query_store_plan qsp ON qsrs.plan_id = qsp.plan_id
 JOIN sys.query_store_query qsq ON qsp.query_id = qsq.query_id
 JOIN sys.query_store_query_text qsqt ON qsq.query_text_id = qsqt.query_text_id
 WHERE qsqt.query_sql_text LIKE ''%surprise1test%''
 GROUP BY qsq.query_id, qsq.query_hash, qsp.plan_id, qsp.query_plan_hash, qsrs.runtime_stats_id
 ORDER BY qsrs.runtime_stats_id DESC, r_last_execution_time DESC;'
SELECT @query_count = COUNT(*) FROM sys.query_store_query;
WHILE @ij < @max_reps
BEGIN
DROP TABLE IF EXISTS ##results;
DBCC FREEPROCCACHE WITH no_infomsgs;
EXEC (@sqlcmd);
SET @sqlcmd2 = 'SELECT TOP (1) * 
INTO #tmp 
FROM SYS.QUERY_STORE_RUNTIME_STATS 
WHERE runtime_stats_interval_id = ' + CONVERT(NVARCHAR(10), @query_count + @ij) + ';
DROP TABLE IF EXISTS #tmp;'
EXEC (@sqlcmd2);
EXEC (@sqlcmd2);
SET @ij = @ij +1;
END
SELECT * FROM ##results
ORDER BY runtime_stats_id DESC;



EXEC [dbo].[surprise_plan1test] @max_reps = 1600


All right.  Let's head over to the Query Store activities. Since I know I'm interested in query_id 4, that's the one I'll pull up.


Yeah. That's a lotta plans in the right-hand legend.  22 of them.  In a not very helpful order.  In fact... though I've tried to figure it out, I don't know what type of order is used for that right-hand legend.  It's not chronological.  It's not based on duration which is the metric displayed by the graph. I dunno.

Usually with a few more executions of query_id 4 and a few refreshes of this activity, the right-legend becomes sorted in descending plan_id from top-to-bottom. And that's pretty durable.  But sometimes it switches back to this nonsensical (to me) ordering.

Got plenty more to show but I wanted to keep this blog post short as an introduction to what I'd be working with.

In the field, I see this quite often - a query_id with over 20 plans that show up in the Tracked Queries activity.  And the unhelpful ordering of plans in the right-hand legend, while a minor and fleeting inconvenience, is usability soft spot number 1. 

For an update to the repro code and the next step in looking at Query Store Usability Soft Spots continue at...
SQL Server Query Store: Usability Soft Spots -- Part 2
https://sql-sasquatch.blogspot.com/2019/06/sql-server-query-store-usability-soft_16.html

Thanks for reading!

Ciao!!

Q

Tuesday, May 28, 2019

SQL Server memory node memory footprints

Day-in, day-out i use perfmon to monitor SQL Server resource utilization and behavior, because its observer overhead is so low.

But its useful to have a few TSQL queries in the stable, too, for adhoc use or if a given memory state needs to result in some action taken within SQL Server.

i'm sure there's a clever wayt to use 'pivot', 'unpivot' and/or 'cross apply' to make this query more elegant. But i just wanted to get this here somewhere. if i figure out a more elegant way to get the resultset i'll update this blog post.
~~~~~

;WITH dbc AS (SELECT node=instance_name, dbc = cntr_value
     FROM sys.dm_os_performance_counters 
     WHERE counter_name = 'Database Node Memory (KB)'),
stl AS (SELECT [node]=instance_name, stl = cntr_value
     FROM sys.dm_os_performance_counters
     WHERE counter_name = 'Stolen Node Memory (KB)'),
fre AS (SELECT [node]=instance_name, fre = cntr_value
     FROM sys.dm_os_performance_counters
     WHERE counter_name = 'Free Node Memory (KB)'),
tot AS (SELECT [node]=instance_name, tot = cntr_value
     FROM sys.dm_os_performance_counters
     WHERE counter_name = 'Total Node Memory (KB)'),
tgt AS (SELECT [node]=instance_name, tgt = cntr_value
     FROM sys.dm_os_performance_counters
     WHERE counter_name = 'Target Node Memory (KB)'),
fgn AS (SELECT [node]=instance_name, fgn = cntr_value
     FROM sys.dm_os_performance_counters
     WHERE counter_name = 'Foreign Node Memory (KB)')
SELECT [memNode] = dbc.[node], 
       [dbCache] = dbc.dbc, 
       [stolen]  = stl.stl, 
       [free]    = fre.fre,
       [total]   = tot.tot, 
       [target]  = tgt.tgt, 
       [foreign] = fgn.fgn
FROM dbc
JOIN stl ON dbc.[node] = stl.[node]
JOIN fre ON dbc.[node] = fre.[node]
JOIN tot ON dbc.[node] = tot.[node]
JOIN tgt ON dbc.[node] = tgt.[node]
JOIN fgn ON dbc.[node] = fgn.[node];


These results - in the order of their capture - are much more interesting.





Let's look at the results below a bit.
The target for the instance is 4 * 180, 991, 992 kb =  723, 967, 968 kb = 707, 000 mb.
Total for the instance is 883, 498, 216 = 862, 791 mb. 155, 791 mb is a pretty big overage.



Total foreign memory is 207, 386, 832 kb = 202, 506 mb. That's a lot of foreign memory.







Q.


Thursday, May 23, 2019

SQL Server 2016: Here a NUMA, there a NUMA... Part 2

*****
This post is a continued look at some memory behaviors explored earlier in the following blog post.

SQL Server 2017: here a NUMA, there a NUMA... Part 1
https://sql-sasquatch.blogspot.com/2019/05/sql-server-2017-here-numa-there-numa.html

The previous blog post gives detail around a significant performance degradation event experienced on a 4 node perf/scale test system. This blog post instead details observations from a production 4 node system.
*****

4x14 vcpus, 1472 total GB vRAM (368 GB per vNUMA node).

Max server memory set at 1375000 mb. 

See if you can spot the unexpected system state below.  🙂







The [db cache] on node 000 is ginormous!!

There is only 368 GB of vRAM on the vNUMA node.  Node 000 is accounting for [db cache] that alone approaches the full amount of vRAM on 3 vNUMA nodes!!

How does that happen if instance wide [total] memory stays constant at [max server memory], and [total] memory stays constant on the other 3 NUMA nodes at [target]?

Got memory counted twice.  Comparing [stolen] memory from all 4 nodes to the instance-wide measure of [stolen] memory shows that something's up 🙂  On this system, nodes 001, 002, and 003 each have a portion of their memory which they account for as [stolen] but node 000 accounts for as [db cache].



There were no free list stalls during this period, thankfully.Somehow the instance is able to eke along keeping a small amount of free memory throughout.



But there is still a cost to pay for this unusual memory state.

I suspect that the cost of cross-node accounting and double-counting the memory is in part reflected by latch waits.




Wednesday, May 15, 2019

MADPASS 2019 May 15 - perfmon black box & AMA

Nothing to see here yet :-)

This is a place holder for a follow-up blog post to tonight's 5 minute intro to the value of a locally run perfmon black box, followed by the AMA panel with myself, Joe Obbish, and John Eisbrener.

Once there's something of interest here I'll tweet the link to give it a proper introduction.

Windows groups Performance Monitor Users and Performance Log Users.
Error message when you try to access the Performance Monitor (Perfmon.exe) on a remote computer: "Access Is Denied"
https://support.microsoft.com/en-us/help/969639/


How to create a “black box” performance counter data collector
(Clint Huffman, 2014 May 23)
https://blogs.technet.microsoft.com/clinth/2014/05/23/how-to-create-a-black-box-performance-counter-data-collector/

My m.o. is to use a counters file. I typically keep the counters file in the same directory as my perfmon start script(s).

For example this text file lives at C:\perfmon\blackbox__counters.txt.

Note the final two lines - date and my name.  Because they don't match to legit counters these lines are ignored; i use this for commenting in my counters files.

\NUMA Node Memory(*)\Free & Zero Page List MBytes
\Paging File(_Total)\% Usage
\Memory\Pages Input/sec
\Memory\Pages Output/sec
\Network Interface(*)\Bytes Received/sec 
\Network Interface(*)\Bytes Sent/sec
\Processor Information(*,_Total)\% Processor Time
\LogicalDisk(*)\Current Disk Queue Length
\20190515
\Lonny Niederstadt -- Blackbox


The start script can be run interactively("Run as administrator"), but I also enjoy setting them to run on system start with schtasks.  It lives at C:\perfmon\blackbox__start.bat


:: local perfmon blackbox Lonny Niederstadt 20190515
setlocal enableextensions
cd /d "%~dp0"
logman.exe stop blackbox_%COMPUTERNAME%
logman.exe delete blackbox_%COMPUTERNAME%
logman.exe create counter blackbox_%COMPUTERNAME% -f csv -v mmddhhmm -max 500 -cf blackbox__counters.txt -o "C:\perfmon\blackbox_%computername%" -si 00:00:30 -cnf 24:00:00 -rf 672:00:00
logman.exe start blackbox_%COMPUTERNAME%
::pause

The real magic is in the "logman create counter" command.
logman create counter
https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/logman-create-counter 

Additional resources:

https://techcommunity.microsoft.com/t5/Ask-The-Performance-Team/Two-Minute-Drill-LOGMAN-EXE/ba-p/373061

Monday, May 13, 2019

SQL Server 2017: here a NUMA, there a NUMA... Part 1

aka Performance Bamboozle

*****
This blog post considers a perf/scale testing system.  See a following blog post for similar concerns from a SQL Server 2016 production system.
SQL Server 2016: Here a NUMA, there a NUMA... Part 2
https://sql-sasquatch.blogspot.com/2019/05/sql-server-2016-here-numa-there-numa.html
*****

Let's observe a workload on SQL Server 2017 CU13, running on a 4x24 vcpu VM with 900 GB vRAM. [Max server memory] is set to 750 GB.  From 5:00 am to 12:30 pm [target server memory] is equal to [max server memory]; below I will refer to [target] only.


The yellow box marked in the graphs below is unusual.  Although CPU utilization generally trends with both active requests and parallel workers in the [default] workload group, CPU utilization is far lower in the yellow box than predicted by the number of active requests or parallel workers.  The end of the yellow box coincided with an intervention step.  My quick-thinking colleague Joe Obbish issued a [dbcc dropcleanbuffers] and the recovery was nothing short of miraculous.  But certainly that's not an operation to execute frequently on a busy system.  I'll be referring to the yellow box as a "performance bamboozle" for the remainder of this blog post.  I've gotta do something to keep myself amused.  What went wrong - and why did [dbcc dropcleanbuffers] help?



Only two DOP 8 queries ran for the majority of the yellow box.  Lets compare their Query Store details to those of similar queries with the same query_hash values.  All eight of the query executions detailed in the following 2 result sets ran at DOP 8; the particular queries of interest are highlighted in red.

Wow.  All 8 queries returned a single row.  All 8 had similar CPU ms, physical reads, logical reads, tempdb use, query memory use. But those two highlighted in red had elapsed time roughly 20 to 30 times that of the other queries. What happened?




The workers for these queries spent the majority of the yellow box time in sleep_task wait.  Often sleep_task is often ignored as a benign wait.

Read more about sleep_task waits in the
SQLSkills SQL Server Wait Type Library
https://www.sqlskills.com/help/waits/sleep_task/

Though perfmon includes details of several wait types in [\SQLServer:Wait Statistics(*)\*], sleep_task is not among them.

But perfmon offers other evidence of what was happening in those 2 queries.  Check out those free list stalls!!



The free list stalls call to mind the instance-wide perfmon counter [\SQLServer:Memory Manager\Free Memory (KB)].

SQL Server [total] memory is comprised of three categories: [database cache], [stolen], [free].

Just in case someone out there loves a little perfmon math as much as I do...

  [\SQLServer:Memory Manager\Database Cache Memory (KB)]
+ [\SQLServer:Memory Manager\Stolen Server Memory (KB)]
+ [\SQLServer:Memory Manager\Free Memory (KB)]
= [\SQLServer:Memory Manager\Total Server Memory (KB)]



It seems fairly intuitive that [\SQLServer:Memory Manager\Free Memory (KB)] would be pretty low when free list stalls are occurring.  And indeed that's what is seen in the graph below:the high plateau of free list stalls occurs during a valley of SQLOS free memory.


On this 4 vNUMA node vm, perfmon counters [\SQLServer:Memory Node(*)\Free Node Memory (KB)] will account for the portion of [\SQLServer:Memory Manager\Free Memory (KB)] on each SQLOS node.  The graph below shows that (although every now and then a stray errant value shows up in the individual node values).


The changes in free memory across the SQLOS nodes almost seem to predict something...


OK, let's look at the [\SQLServer:Memory Node(*)\Database Node Memory (KB)] counters.
Interesting that almost no database cache is listed on node 002 during the performance bamboozle.


When looking at [Stolen] memory in the next graph, something doesn't quite add up.  Some memory in the instance is being double-counted by SQLOS: counted on one node as [db cache] and on another as [stolen].  I've blogged about that briefly in the following two blog posts.

SQL Server 2016 Memory Accounting: A Suspicious Surprise 
http://sql-sasquatch.blogspot.com/2018/07/sql-server-2016-memory-accounting.html 

SQL Server 2016 Memory Accounting Part II: Another Suspicious Example
http://sql-sasquatch.blogspot.com/2018/10/sql-server-2016-memory-accounting-part.html

The massive amount of [stolen] memory on node 002 makes sense given the almost complete lack of [Db Cache] on node 002. It still looks kind of unusual.


~~~~~

Now, when free list stalls occur, one way to replenish the free lists is for clean buffer pool pages to be evicted by the lazy writer thread(s). Page writes can make clean pages out of dirty pages 🙂

So, when free list stalls occur, I expect to see page writes coming out of SQL Server.  But there weren't many page writes during this performance bamboozle.  Lazy writer wrote out a few pages.  Notice that as lazy writer wrote pages, the number of free list stalls slightly decreased. Checkpoint writes occurred up to the start of the performance bamboozle.  But not during.

There were no background writer writes, because the active databases were using automatic checkpoints rather than indirect checkpoints.


The slow pace of page writes contributed to the performance bamboozle. Or perhaps share a common underlying cause.

Note especially the second large bout of free list stalls at approximately 10:15 am.  During that round of free lists stalls, SQL Server page writes and lazy writer writes were plentiful.  That's how I expect the system to respond to free list stalls.

The [dbcc dropcleanbuffers] that Joe initiated to end the performance bamboozle was able to do something that the lazywriter was not able to do.  It broke a logjam, freeing up a bunch of memory within SQL Server [total] memory and allowing the lazywriter threads to get back to normal work.

~~~~~~~~~~

Many may already be familiar with the exceptional behavior shown in the next graph.  [Target server memory] is not as often enforced for SQL Server [total] memory as it used to be.  Batch mode columnstore queries especially seem prone to make [total] exceed [target].  This behavior is not unique to systems with multiple NUMA nodes; it can also be seen when there is a single SQLOS memory node.

Notice that [total] exceeds [target] almost immediately after the yellow box below, in addition to other times in the graph..



Description of [total] exceeding [target] can be seen in the following KB article, with a screenshot excerpt following.

Memory configuration and sizing considerations in SQL Server 2012 and later versions
https://support.microsoft.com/en-us/help/2663912/memory-configuration-and-sizing-considerations-in-sql-server-2012-and


For various reasons, I do not consider [total] > [target] to be a contributor to the performance bamboozle.  Rather, the brief time that [total] exceeded [target] after the performance bamboozle was a part of SQL Server recovering to normal behavior.

~~~~~~~~~~

The graph below shows this instance also experienced persistent and gradually accumulating [foreign] memory in the SQLOS memory nodes.  Fluctuation in the amount of [foreign] memory is somehow related to [total] exceeding [target].  This behavior is unique to systems with multiple NUMA nodes and multiple SQLOS memory nodes.


The persistent and accumulating [foreign] memory, similar to [total] > [target] does not seem to have been a contributor to the performance bamboozle.  Rather, the increase in total 


~~~~~~~~~~

At 9:31:07 top level memory blocks were allocated.



Harshdeep Singh discusses the role of top level memory blocks in maintaining free lists in the following blog post.

An In-depth look at memory – SQL Server 2012/2014
https://blogs.msdn.microsoft.com/sqljourney/2015/04/27/an-in-depth-look-at-memory-sql-server-20122014/

Late(r than expected) allocation of top level memory blocks may have contributed to the performance bamboozle.


Q.