Wednesday, June 21, 2017

Are TCP Registry Settings KeepAliveTime & TimedWaitDelay still applicable in Windows Server 2016? Signs point to yes...



Documentation for Windows TCP parameters can be really hard to track down.  Recently I was asked if TCP registry settings KeepAliveTime and TimedWaitDelay still apply in Windows Server 2012 R2.  I haven't seen any Microsoft documentation referencing them since Windows Server 2008R2... neither had they.

The best answer I can come up with for now (since not enough time to test) is: signs point to yes.

Windows Server 2016 'Performance Tuning Network Adapters' page
Windows Server 2012 'Performance Tuning Network Adapters' page
 
Under the Windows Server 2016 'Performance Tuning TCP>>Deprecated TCP parameters', the list of deprecated TCP parameters below appears.
Neither KeepAliveTime nor TimedWaitDelay are listed as deprecated since Windows Server 2003.  Because KeepAliveTime and TimedWaitDelay are omitted from the list of deprecated TCP registry settings, for now I assume that they are still valid even though Windows Server 2012/2012R2 documentation seems to make no mention of them.

Please let me know if you see Windows Server 2012/2012R2/2016 documentation referencing them... or if you test and can confirm active/deprecated.

Thanks!

Monday, June 5, 2017

#SQLServer Checktable OLEDB Waits Part IV - how bow dah?


OLEDB Waits in #SQLServer checktable Part I - Disappearing Act
http://sql-sasquatch.blogspot.com/2017/06/oledb-waits-in-sqlserver-checktable.html

OLEDB Waits in #SQLServer Checktable Part II - the song remains very nearly the same
http://sql-sasquatch.blogspot.com/2017/06/oledb-waits-in-sqlserver-checktable_5.html


Checktable OLEDB Waits Part III - the more things change, the more... wait, WAT?!
http://sql-sasquatch.blogspot.com/2017/06/checktable-oledb-waits-part-iii-more.html


OK.  Through the first 3 parts of this epic saga we've taken a glance at checktable operations on a simple heap(with no indexes), a simple clustered index (no additional indexes).  Let's take a look at a heap with a nonclustered index!

CREATE TABLE OLEDB_TEST_HEAP_NCI
([num] BIGINT INDEX nci_OLEDB_TEST_HEAP_NCI NONCLUSTERED,
 [str] NVARCHAR(4000)
 );

DECLARE @str NVARCHAR(max);
SET @str = REPLICATE ('A', 4000);

;WITH cnum(num) AS
(SELECT TOP (80) 1 FROM sys.columns)
INSERT INTO OLEDB_TEST_HEAP_NCI
SELECT row_number() OVER (ORDER BY (SELECT NULL)),
@str
FROM cnum c1
CROSS JOIN cnum c2;



EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI', @MAXDOP = 8, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI', @MAXDOP = 7, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI', @MAXDOP = 6, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI', @MAXDOP = 5, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI', @MAXDOP = 4, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI', @MAXDOP = 3, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI', @MAXDOP = 2, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI', @MAXDOP = 1, @PhysicalOnly = 'Y', @TABLOCK = 'N';


The results below look vaguely familiar.  6456 total pages used.  Starting at 6464 OLEDB wait events for maxdop 1 and 2.  Up to 6471 OLEDB wait events at maxdop 8.  In the image below I've also circled a few of the OLEDB wait_ms numbers from system and session level wait DMVs.  Circled because today is the first day I've noticed differences in the numbers - although I've yet to see a difference greater than 1 ms.  Still - as far as I'm concerned even a difference of 1 ms is a hint that *maybe* session and system level waits are tracked separately rather than a single source that updates both of the views.



Now - if we change to running without physical_only, will we get some multiple of the table row count, since there is a nonclustered index?  Recall that the clustered index sustained nearly four times the row/page count as the number of OLEDB waits in a checktable without physical_only.

EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI', @MAXDOP = 8, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI', @MAXDOP = 7, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI', @MAXDOP = 6, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI', @MAXDOP = 5, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI', @MAXDOP = 4, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI', @MAXDOP = 3, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI', @MAXDOP = 2, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI', @MAXDOP = 1, @PhysicalOnly = 'N', @TABLOCK = 'N';

Nope.  Just a small incremental additional number of OLEDB wait events, starting at 6519 and working up to 6526.  Huh.




All right.  What if we use a heap with a nonclustered include index?  Based on the table and index definition below, I expect the nonclustered include index nci_OLEDB_TEST_HEAP_NCI_INCLUDE to be remarkably similar to the clustered index we previously used.


CREATE TABLE OLEDB_TEST_HEAP_NCI_INCLUDE
([num] BIGINT,
 [str] NVARCHAR(4000)
 );
CREATE INDEX nci_OLEDB_TEST_HEAP_NCI_INCLUDE ON OLEDB_TEST_HEAP_NCI_INCLUDE(num) INCLUDE([str]);

DECLARE @str NVARCHAR(max);
SET @str = REPLICATE ('A', 4000);

;WITH cnum(num) AS
(SELECT TOP (80) 1 FROM sys.columns)
INSERT INTO OLEDB_TEST_HEAP_NCI_INCLUDE
SELECT row_number() OVER (ORDER BY (SELECT NULL)),
@str
FROM cnum c1
CROSS JOIN cnum c2;

How many pages are we using? 12865.



Let's proceed with 8 physical_only checktable operations.

EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI_INCLUDE', @MAXDOP = 8, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI_INCLUDE', @MAXDOP = 7, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI_INCLUDE', @MAXDOP = 6, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI_INCLUDE', @MAXDOP = 5, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI_INCLUDE', @MAXDOP = 4, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI_INCLUDE', @MAXDOP = 3, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI_INCLUDE', @MAXDOP = 2, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI_INCLUDE', @MAXDOP = 1, @PhysicalOnly = 'Y', @TABLOCK = 'N';

How's it look?  Pretty familiar.  12865 pages in use, a small overhead gets us 12909 to 12916 OLEDB wait events.



Ok, now 8 checktable operations without the physical_only option.

EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI_INCLUDE', @MAXDOP = 8, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI_INCLUDE', @MAXDOP = 7, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI_INCLUDE', @MAXDOP = 6, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI_INCLUDE', @MAXDOP = 5, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI_INCLUDE', @MAXDOP = 4, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI_INCLUDE', @MAXDOP = 3, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI_INCLUDE', @MAXDOP = 2, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP_NCI_INCLUDE', @MAXDOP = 1, @PhysicalOnly = 'N', @TABLOCK = 'N';

And there's that really large jump in OLEDB wait events as the checktable operation drops the physical_only operation.



Checktable OLEDB Waits Part III - the more things change, the more... wait, WAT?!

Third part of an adventure in #SQLServer Checktable.
OLEDB waits in #SQLServer checktable Part I - Disappearing Act
http://sql-sasquatch.blogspot.com/2017/06/oledb-waits-in-sqlserver-checktable.html

OLEDB waits in #SQLServer Checktable Part II - the song remains very nearly the same

OK. We've seen some stuff with a simple heap.  Let's trade in the heap for a clustered index with the same row values.

CREATE TABLE OLEDB_TEST_CLINDEX
([num] BIGINT,
 [str] NVARCHAR(4000)
 constraint pk_OLEDB_TEST_CLINDEX primary key clustered(num));

DECLARE @str NVARCHAR(max);
SET @str = REPLICATE ('A', 4000);

;WITH cnum(num) AS
(SELECT TOP (80) 1 FROM sys.columns)
INSERT INTO OLEDB_TEST_CLINDEX
SELECT row_number() OVER (ORDER BY (SELECT NULL)),
@str
FROM cnum c1
CROSS JOIN cnum c2;

The clustered index uses a few more pages - that makes sense.  More structure to maintain for a clustered index than for a heap.



So let's perform the physical_only checktable operations on OLEDB_TEST_CLINDEX.  Based on the previous results, we'll expect the number of OLEDB wait events to be close to the number of pages in the clustered index.  And we expect OLEDB session waits to not update for MAXDOP 2 or 1, even though instance level OLEDB waits will update. 

EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_CLINDEX', @MAXDOP = 8, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_CLINDEX', @MAXDOP = 7, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_CLINDEX', @MAXDOP = 6, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_CLINDEX', @MAXDOP = 5, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_CLINDEX', @MAXDOP = 4, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_CLINDEX', @MAXDOP = 3, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_CLINDEX', @MAXDOP = 2, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_CLINDEX', @MAXDOP = 1, @PhysicalOnly = 'Y', @TABLOCK = 'N';


How do the results match our expectations?  Pretty well. OLEDB_TEST_HEAP used 6408 pages, and OLEDB waits started at 6428 for MAXDOP 1 and MAXDOP 2, increasing by one for each increase in MAXDOP from 3 to 8.  OLEDB_TEST_CLINDEX used 6438 pages, and OLEDB waits start at 6458 for MAXDOP 1 and 2, and increase by one for each increase in MAXDOP from 3 to 8.


All right.  Let's move on to removing the physical_only parameter.  With OLEDB_TEST_HEAP we saw a minimal change in the number of OLEDB wait events between a checktable with and without physical_only.


EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_CLINDEX', @MAXDOP = 8, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_CLINDEX', @MAXDOP = 7, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_CLINDEX', @MAXDOP = 6, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_CLINDEX', @MAXDOP = 5, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_CLINDEX', @MAXDOP = 4, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_CLINDEX', @MAXDOP = 3, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_CLINDEX', @MAXDOP = 2, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_CLINDEX', @MAXDOP = 1, @PhysicalOnly = 'N', @TABLOCK = 'N';

So how do the results match with expectations?  Whoa!!  25745 OLEDB wait events as the starting point for MAXDOP 1 and 2.  Then single-stepping up to 25752 OLEDB wait events for MAXDOP 8.  25752 is exactly 4 times the used page count of 6438 for the clustered index.  So the OLEDB wait events went from a small number more than the count of pages/rows with the heap... to almost 4 times as many with the clustered index.  *That* was a bit unexpected.


[
Continue with this thrilling tale here...
#SQLServer Checktable OLEDB Waits Part IV - how bow dah?
]
 

OLEDB waits in #SQLServer Checktable Part II - the song remains very nearly the same

In Part I, the main focuse was on OLEDB waits in a checktable with physical_only on a heap table with 6400 rows, at 1 row per 8k page.  Part II is gonna be pretty brief - but there's more coming I promise!!

OLEDB waits in #SQLServer checktable [part I] - Disappearing Act
http://sql-sasquatch.blogspot.com/2017/06/oledb-waits-in-sqlserver-checktable.html

What happens if the logical checks are included in the checktable operation on this table, by removing the physical_only option?


EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP', @MAXDOP = 8, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP', @MAXDOP = 7, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP', @MAXDOP = 6, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP', @MAXDOP = 5, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP', @MAXDOP = 4, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP', @MAXDOP = 3, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP', @MAXDOP = 2, @PhysicalOnly = 'N', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP', @MAXDOP = 1, @PhysicalOnly = 'N', @TABLOCK = 'N';

In this case, not very much.


In comparison to the physical_only checktable operation, at each DOP the number of OLEDB wait events increased by one.  Maxdop 2 and 1 still have the same number of OLEDB wait events, while from maxdop 3 to 8 the number of wait events increases by one each time maxdop increases by one.  And as was seen previously, at maxdop 2 and maxdop 1 the session waits DMV isn't catching the OLEDB wait activity associated with the checktable operation, even though it is being captured at the system level.

On this very simple heap, with nearly the same number of rows as pages in the heap, the number of OLEDB wait events is nearly the same - only increased by one - when "upgrading" the checktable operation from physical_only to a checktable with logical checks.

[
carry on with this epic saga here...
Checktable OLEDB Waits Part III - the more things change, the more... wait, WAT?!
]
 


OLEDB waits in #SQLServer checktable Part I - Disappearing Act

Let's finally start looking at behavior surrounding the OLEDB wait type during checktable operations.  These tests, as previously, are running on an 8 vcpu Windows server 2012 R2 vm (VMware) with SQL Server 2016 SP1.

Since I plan to do a lot of testing, create a table to store some test definitions and results.


DROP TABLE IF EXISTS oledb_wait_test;

CREATE TABLE oledb_wait_test
(test_id      INT,
 captured     NVARCHAR(256),
 capture_time DATETIME,
 spid         INT,
 tableName    NVARCHAR(256), 
 [maxdop]     INT,
 physicalonly NVARCHAR(1),
 [tablock]    NVARCHAR(1),
 wait_type    NVARCHAR(256),
 ows_wait_ct  BIGINT,
 ows_wait_ms  BIGINT,
 sws_wait_ct  BIGINT,
 sws_wait_ms  BIGINT
 CONSTRAINT pk_oledb_wait_test PRIMARY KEY CLUSTERED (test_id, captured))
 WITH (data_compression = page);

OK. Let's create a very basic heap to submit to checktable operations.


CREATE TABLE OLEDB_TEST_HEAP
([num] BIGINT,
 [str] NVARCHAR(4000));


DECLARE @str NVARCHAR(max);
SET @str = REPLICATE ('A', 4000);

/*for 80 * 80 = 6400 rows OLEDB_TEST_HEAP will allow parallel checktable with physical_only
  for 79 * 79 = 6249 rows OLEDB_TEST_HEAP will only run checktable with physical_only in serial
*/
;WITH cnum(num) AS
(SELECT TOP (80) 1 FROM sys.columns)
INSERT INTO OLEDB_TEST_HEAP
SELECT row_number() OVER (ORDER BY (SELECT NULL)),
@str
FROM cnum c1
CROSS JOIN cnum c2;


Its really, really easy for me to skip a detail like coalescing NULL values in wait tables to zero in order to get valid deltas... so lets encapsulate the test and results capture in a stored procedure.

Here's my simple stored procedure:
  • take parameters for a checktable operation
  • record the parameters and tare OLEDB wait values from system and session levels
  • execute dynamic SQL for checktable operation
  • record system and session level OLEDB wait values after checktable completion
CREATE OR ALTER PROCEDURE CHECKTABLE_OLEDB_WAIT_TEST 
(@tableName NVARCHAR(256), @MAXDOP INT = 8, 
 @PhysicalOnly NVARCHAR(1) = 'Y', @TABLOCK NVARCHAR(1) = 'N')
AS

DECLARE @dynSQL NVARCHAR(1000);
DECLARE @testID INT;

SELECT @testID = COALESCE(max(test_Id), 0) + 1
FROM   oledb_wait_test;

SET @dynSQL = N'DBCC CHECKTABLE ([' + @tableName + ']) WITH MAXDOP = ' 
              + CONVERT(NVARCHAR(2), @MAXDOP) 
              + CASE WHEN @PhysicalOnly = N'Y' THEN N', PHYSICAL_ONLY' ELSE '' END
              + CASE WHEN @TABLOCK = N'Y' THEN N', TABLOCK' ELSE '' END
     + ';';

INSERT INTO oledb_wait_test
SELECT @testId       AS test_Id,
       'BEFORE'      AS captured,
       getdate()     AS capture_time,
       @@spid        AS spid,
       @tableName    AS tableName, 
       @MAXDOP       AS [maxdop],
       @PhysicalOnly AS physicalonly,
       @TABLOCK      AS [tablock],
       ows.wait_type AS wait_type,
       COALESCE(ows.waiting_tasks_count, 0) AS ows_wait_ct,
       COALESCE(ows.wait_time_ms, 0)        AS ows_wait_ms,
       COALESCE(sws.waiting_tasks_count, 0) AS sws_wait_ct,
       COALESCE(sws.wait_time_ms, 0)        AS sws_wait_ms
FROM   sys.dm_os_wait_stats ows
LEFT OUTER JOIN
       sys.dm_exec_session_wait_stats sws
ON     sws.wait_type  = ows.wait_type
AND    sws.session_id = @@spid
WHERE  ows.wait_type  = 'OLEDB';

EXEC (@dynSQL);

INSERT INTO oledb_wait_test
SELECT @testId       AS test_Id,
       'AFTER'       AS captured,
       getdate()     AS capture_time,
       @@spid        AS spid,
       @tableName    AS tableName, 
       @MAXDOP       AS [maxdop],
       @PhysicalOnly AS physicalonly,
       @TABLOCK      AS [tablock],
       ows.wait_type AS wait_type,
       COALESCE(ows.waiting_tasks_count, 0) AS ows_wait_ct,
       COALESCE(ows.wait_time_ms, 0)        AS ows_wait_ms,
       COALESCE(sws.waiting_tasks_count, 0) AS sws_wait_ct,
       COALESCE(sws.wait_time_ms, 0)        AS sws_wait_ms
FROM   sys.dm_os_wait_stats ows
LEFT OUTER JOIN
       sys.dm_exec_session_wait_stats sws
ON     sws.wait_type = ows.wait_type
AND    sws.session_id = @@spid
WHERE  ows.wait_type = 'OLEDB';

All right.  Let's run 8 tests - from MAXDOP 8 in decreasing order to MAXDOP 1.


EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP', @MAXDOP = 8, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP', @MAXDOP = 7, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP', @MAXDOP = 6, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP', @MAXDOP = 5, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP', @MAXDOP = 4, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP', @MAXDOP = 3, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP', @MAXDOP = 2, @PhysicalOnly = 'Y', @TABLOCK = 'N';
EXEC CHECKTABLE_OLEDB_WAIT_TEST @tableName = N'OLEDB_TEST_HEAP', @MAXDOP = 1, @PhysicalOnly = 'Y', @TABLOCK = 'N';

And what was captured for OLEDB wait stats during each checktable?

They say a picture is worth a thousand words...


OLEDB wait count at the system level ranged from 6428 at maxdop 1 and maxdop 2 up to 6435 at maxdop 8.  Interesting.  The table OLEDB_TEST_HEAP was constructed with 80 * 80 = 6400 rows, each of which are expected to consume an 8k block (because of the 4000 character NVARCHAR and the lack of compression on the table).  Since these results are from physical_only tests, could the number of waits be related to the number of blocks in the heap?

That seems plausible.


Let's resize OLEDB_TEST_HEAP to 10000 rows and see what happens.


TRUNCATE TABLE OLEDB_TEST_HEAP;
DECLARE @str NVARCHAR(max);
SET @str = REPLICATE ('A', 4000);

;WITH cnum(num) AS
(SELECT TOP (100) 1 FROM sys.columns)
INSERT INTO OLEDB_TEST_HEAP
SELECT row_number() OVER (ORDER BY (SELECT NULL)),
@str
FROM cnum c1
CROSS JOIN cnum c2;

Now run the same tests as above, with maxdop decreasing from 8 to 1, and display the results captured.  Interesting.  Very interesting.


So... yeah.  Definitely a relationship between the number of pages in the heap and the number of OLEDB waits.  Something(s) increase the number of OLEDB waits beyond the number of data pages, the number of used pages and the number of reserved pages in a way that I haven't deciphered yet, though.


One thing is clear - above maxdop 2, one OLEDB wait is added for each incremental increase in maxdop.  That's interesting.  Another interesting point: at maxdop 2 and maxdop 1 the number of OLEDB system waits are equal rather than decreasing again by 1.  Hmmm.

And the most interesting thing: at maxdop 2 and maxdop 1, the session OLEDB waits seem not to update.

I created a connect item for the OLEDB wait disappearing act in the session wait DMV sys.dm_exec_session_wait_stats.  The repro in this blog post is better than the one I put in the details of that connect item :-)  I'll update the connect item later.

OLEDB waits not always updating in sys.dm_exec_session_waits
https://connect.microsoft.com/SQLServer/feedback/details/3135073

There's more to come as I continue to look at OLEDB wait behavior in checktable operations...


[
carry on with this epic saga here...
OLEDB waits in #SQLServer Checktable Part II - the song remains very nearly the same
http://sql-sasquatch.blogspot.com/2017/06/oledb-waits-in-sqlserver-checktable_5.html



]


Tuesday, May 30, 2017

A Quick Look at scalability [Part II]: #SQLServer Checktable without physical_only option

In my last post, we looked at the scalability of 'dbcc checktable... with physical_only' for two tables, from dop 1 to dop 8.

A Quick Look at Scalability [Part I]: #SQLServer Checktable with physical_only
http://sql-sasquatch.blogspot.com/2017/05/a-quick-look-at-scalability-sqlserver.html

While the operation achieved high utilization of a single vcpu for both tables at dop 1, as dop increased cpu utilization did not keep pace.  Total cpu_ms for the operation was fairly stable.  The results was that elapsed time did not decrease in the same proportion as cpu resources were added by increasing dop.  The main decreases in elapsed time were realized by dop 4, further decreases from dop 5 to 8 were minimal.  The following two graphs summarize these findings well.




All right.   What if we look at full checktable operations, without the physical_only option?  Does that operation scale the same, better, or worse than physical_only?

Here's the results from TableA.  Cool!  Once again, cpu_ms for the operation remained fairly stable.  Rather than continuing to increase as dop increased, idle_ms reached an early peak and remained fairly stable afterward.  The result is decreasing elapsed time with each increase in dop from 1 to 8.  It looks as if elapsed time may continue to decrease beyond dop 8... someday I'll get to test :-) But the gains after dop 8 look as if they'll be smaller and will almost certainly drop off by dop 12.


The results from TableB sent me back to test again, and dig further into the data I gathered.  Check it out.  The cpu_ms is still fairly stable.  Elapsed time decreases with each increase of dop.  But... whoa.  What an unexpected shape from idle_ms.


I thought the 'hill' must be due to a 'one-off' abnormality, not likely to be repeated.  So I repeated tests for dop 2, 3, 4, and 5.  Plugged the results into the graph and... whoa.


Here's one more test run.  This is probably the most characteristic... but so many factors can throw the numbers off.


So the good news for today is that checktable operations without the physical_only option scale farther/better on my test tables than checktable with physical_only.  While with physical_only scaling benefits in elapsed time are primarily seen only to dop 4, without the physical_only option elapsed time benefits to increasing dop extend at least to dop 8.

And we saw that the shape of scalability graphs is pretty volatile :-)  That's largely because modest changes in elapsed time are multiplied by dop in this calculation to arrive at the idle_ms number - that idle_ms number is the one that changes shape most readily.

Friday, May 26, 2017

A Quick Look at Scalability [Part I]: #SQLServer Checktable with physical_only

Ewald Cress and I have been discussing the OLEDB wait for a while - and at this point he's been pulling all the weight in terms of blogging :-)
His post on the nature of OLEDB wait is great - I highly recommend it.

Scheduler stories: OLEDB, the external wait that isn’t preemptive
http://sqlonice.com/scheduler-stories-oledb-the-external-wait-that-isnt-preemptive/

Before I start digging into the waits experienced by checktable, and in particular OLEDB wait, I want to take a high level look at the scalability of checktable operations, starting with checktable with physical_only.  Below are graphs and additional information showing checktable executions on my 8 vcpu VM (with 64 gb vRAM).  Got results from maxdop 1 to maxdop 8. (Although I actually ran the tests in descending order of maxdop for reasons unknown even to me 😜 )

Lets talk about the difference between "performance" and "scalability".  Performance is a measure of the pace of work that a workload can achieve on a given system - or alternatively the measure of time required to complete a unit of work on that system.  Scalability is the capacity of the workload and system to increase pace of work as additional resources are added.

So we can describe the performance of a checktable operation on a given table and system, at a given degree of parallelism (since the degree of parallelism as well as the number of (v)CPUs on the system may both be limiters of compute resources for the workload).  As the degree of parallelism changes from 1 to the number of (v)CPUs on the system, the performance of checktable may change.  Describing that change in performance will characterize the scalability of checktable on that system.  In the final analysis, a scalability limit may come from the workload(application level coding or database level coding), or it may come from the system.    

All of the numbers below are from perfmon - collected with a 1 second interval, logged to a csv.   The X axis is number of seconds since the start of the checktable(1-indexed rather than 0-indexed).  Don't worry - as I continue I'll be pulling in lots of stuff from DMVs that can't easily be gleaned from perfmon. :-)  But any of the info I grab from DMVs will be summary of time period fgrom start to end opf operation, rather than in 1 second increments.  Observer overhead of the DMV queries is too high for my tastes otherwise.

This system is running SQL Server 2016 SP1.

Here's the operation at DOP 1.  All of the checktable operations are running in a Resource Pool named after me.  In this one you'll see something interesting that I can't explain.  For approximately 60 seconds at the tail end of the activity, CPU utilization hovered near 25%.  My DOP 1 checktable was the only active request in SQL Server, though (confirmed by looking for active requests in the default and internal Resource Pools, the only other Resource Pools in the instance).

Something outside of SQL Server was using 12.5% CPU utilization in the VM.  Oh well... it didn't have enough of a footprint in memory or disk to cause an issue, and my checktable was happily running along on its own vcpu.  Notice that neither the rate of disk read bytes and the rate of logical scan read bytes was able to achieve previous maximum sustained levels, and the CPU usage reported for my resource pool remained at a steady 12.5% throughout.  


At DOP 2, on the 8 vcpu vm the checktable could account for up to 25% of vm-wide cpu utilization.  But wait time appears to have increased substantially, and cpu utilization reaches 25% only briefly near the end.


At maxdop 3, checktable could consume 37.5% cpu.  But it seems to be falling farther from its potential maximum.


Maxdop 4 *could* see up to 50% CPU utilization... doesn't get too close.


62.5% maximum possible at maxdop 5 - but it seems that wait time increases must be outpacing declines in elapsed time.  This checktable does not seem to be scaling very well.


Maxdop 6 would allow for a maximum of 75% CPU utilization.


Maxdop 7 would allow for up to 87.5% CPU utilization.  Even the brief spikes don't get too close.



At maxdop 8 on the 8 vcpu vm, it appears about half of the cpu time is spent in a wait state.



I think its very important to look at time series graphs to understand behavior... hope I didn't lose you to boredom by including all eight of the time series.  There's interesting stuff I hope to return to later in there - especially that always-present peak in CPU utilization in the parallel operations about 2/3 of the way in.

But to compare scalability of checktable across tables, and across options such as physical_only, we'll have to look at things from a higher level.

Each of the schedulers engaged by checktable is either accruing cpu_ms for the checktable (as reported by sys.dm_exec_requests) or not at any given time.  Since these tests are running in isolation on the vm, for now we can assume that time for the schedulers/vcpus that isn't accounted for in cpu_ms is idle_ms.

So we can use this formula when dop, elapsed_ms, and cpu_ms are known:

DOP * elapsed_ms = cpu_ms + idle_ms


That allows the 8 checktable operations to be summarized in this graph.  From DOP1 to DOP 8 the cpu_ms of the operation is extremely steady.  From DOP 1 to DOP 4 there are significant decreases in elapsed time as dop increases.  After dop 4, reduction in elapsed time is slight.  Throughout the tested range, idle_ms increased at a nearly linear rate.




I've got another table - TableB that I've been working with extensively.
Here's how scalability for checktable with physical_only looked on TableB.  Remarkably similar to scalability for TableA.  Hmmm...