Wednesday, September 23, 2020

SQL Server 2016++, checkdb/checktable, and trace flags 2549 & 2562

TL;DR

Trace flag 2549 is no longer required for checkdb or checktable in SQL Server 2016++.

Trace flag 2562 may still benefit checkdb on a system (depends on disk characteristics and tempdb) in SQL Server 2016++, but is not expected to benefit checktable due to the already-limited scope of checktable activity.

~~~~~~~~~~~~~

Trace flags 2549 and 2562 were introduced in kb2634571, linked below.

2549 - before SQL Server 2016, checkdb (and checktable) planned its concurrent disk IO per Windows drive letter by default, in order to achieve a reasonable pace without overloading the disk subsystem.

The problem came in for systems using mount points.  One system might put SQL Server data files on eight Windows volumes as eight separate drive letters. Another system could put the data files on eight Windows volumes under a single drive letter, using mount points.

The system with eight separate drive letters would experience much higher concurrent disk traffic from checkdb/checktable than the system with a single drive letter.  And if that system could handle the higher disk IO pace, checkdb/checktable would perform better on that system.

Trace flag 2549 was introduced to level the playing field - increasing the pace of checkdb/checktable disk IO for systems using mount points.  This was done by planning the disk IO per database file, rather than per drive letter.

In SQL Server 2016, changes to the components used by checkdb/checktable made trace flag 2549 unnecessary. The second link below describes the move from the MultiObjectScanner to the CheckScanner (introduced in SQL Server 2016).


2562 - by default, checkdb groups the heaps and b-tree indexes into batches of up to 512 for its activity.

With trace flag 2562 enabled, checkdb puts all heaps and b-tree indexes into a single batch.

This trace flag often speeds up checkdb with physical_only on systems using hard drive storage (by minimizing disk head movement and lowering average disk read service times).

However tempdb is a serious consideration for this trace flag when checkdb is running without physical_only (when logical checks are run).

Tempdb use for logical checks increases until the batch of heaps and b-tree indexes is complete.  On a system with many batches, tempdb use by checkdb can be seen to increase as each batch runs, then drop as the next batch starts.

If all heaps and b-tree indexes are in a single batch due to trace flag 2562, tempdb use increases until checkdb is complete.

Another consideration with trace flag 2562: a single nonpartitioned table will never require more than one batch - it already fits in a single batch.

So trace flag 2562 is not expected to affect the operation of checktable.
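For anyone who wants to test the effect of trace flag 2562 themselves, it can be enabled globally before a physical_only check and disabled afterward.  A sketch (the database name is a placeholder; DBCC TRACEON with -1 is instance-global, so mind any concurrent checkdb activity):

DBCC TRACEON (2562, -1);
DBCC CHECKDB (N'MyDatabase') WITH PHYSICAL_ONLY, NO_INFOMSGS;
DBCC TRACEOFF (2562, -1);

Remember the tempdb caveat above before trying this with full logical checks.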

 

Improvements for the DBCC CHECKDB command may result in faster performance when you use the PHYSICAL_ONLY option

https://support.microsoft.com/en-us/help/2634571/improvements-for-the-dbcc-checkdb-command-may-result-in-faster-perform

 

SQL 2016 - It Just Runs Faster: DBCC Scales 7x Better

https://docs.microsoft.com/en-us/archive/blogs/psssql/sql-2016-it-just-runs-faster-dbcc-scales-7x-better

Friday, September 18, 2020

SQL Server 2019 - the strangest thing happened on the way to my cardinality estimate

Today was kind of a rough day.  It started with a colleague asking if I knew any handy tricks that could magically make dozens of queries go back to the seconds or minutes they ran at under SQL Server 2012 rather than the increments of half-hours they were taking since an upgrade to SQL Server 2016.

Well - I know a few.  Was trace flag 4199 globally enabled on 2012? Yep. And on 2016?  Database Compat level 2016, and optimizer hotfixes enabled in database scoped configuration.

Ok.  Well, in SQL Server 2012 they were using the Legacy CE because it was the only one around.  How about on SQL Server 2016?  Legacy CE enabled in database scoped configuration.

Oh.

How about ascending key? Trace flag 4139 was globally enabled in SQL Server 2012.  And in SQL Server 2016? Yep, globally enabled there, too.

Trace flag 4138 to disable row goal adjustments has sometimes made a huge difference.  I didn't mention it yet, though.  Better to see an affected plan first.  And once I saw an affected plan... well, it really didn't look like trace flag 4138 could help, either.  The bad plan was all nested loop joins, filters, parallelism operators and compute scalar operators.  By adding a HASH JOIN hint, the plan looked more familiar to my colleague and query performance went back to the few seconds normally expected.  But the developers wanted to understand the regression, and if possible avoid adding the hint to dozens or maybe even more than a hundred queries.

That was just about the end of my quick-n-easy list.  For good measure my colleague checked what happened to the plan with SQL Server 2012 compat level: no substantive change.  And with the default CE: no substantive change.

I started staring at statistics.  Maybe some very important, very popular stats had fallen prey to extremely unlucky samples.

I became very suspicious when I noticed that a fairly important auto-created statistics object was on a varchar(66) column.  But the query compared that column against integer values heavily in its predicates.

Sure 'nuf, the RANGE_HI_KEY values in the histogram were all integers, but because the column was varchar, they were dictionary sorted.  That always makes me kinda suspicious.

And in that original query plan, a three million row table was pushed through a filter with a row estimate of 1 coming out.  That's very suspicious.

So for several hours I tested various hinky ideas.

And finally, late on a Friday night, I think I discovered the problem.  I won't know for sure until Monday when similar tests can be run on-site against their data on a SQL Server 2016 and 2012 system.  But... even if this isn't *that* problem, it is *a* problem and it's present in SQL Server 2019 CU6.

Time for show and tell.  This work is being done in SQL Server 2019 CU6.

Let's create a simple 3-column table. All three columns are varchar.  Create statistics on col1.  Populate the table with 50,000 rows.  The first column, col1, gets populated with integers from 9900 to 14800, in multiples of 100.  Then let's update statistics.

CREATE TABLE [dbo].[stats_test4](
	[col1]  [varchar](66)   NOT NULL,
	[fluff] [varchar](4200)     NULL,
	[col2]  [varchar](66)   NOT NULL
) ON [PRIMARY];

CREATE STATISTICS stats_test4__col1 ON stats_test4(col1);

;with nums as
(select top (1000) n = row_number() over (order by (select null))
 from master..spt_values)
, recur as
(select top (50) n = row_number() over (order by (select null))
 from master..spt_values)
, fluff as
(select z = replicate('Z', 4100))
 insert into stats_test4
 select 10000 + 100 * (nums.n % 50 - 1), fluff.z, 1
 from nums
 cross join recur
 cross join fluff;

UPDATE STATISTICS stats_test4(stats_test4__col1);

Here's what the statistics look like... there are 47 steps and only 26 of them are in the picture below... but the last 21 are boring anyway.
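(The histogram screenshot isn't reproduced here.  To see the steps directly, the documented DBCC SHOW_STATISTICS syntax can be used against the table and statistics created above:)

DBCC SHOW_STATISTICS ('dbo.stats_test4', stats_test4__col1) WITH HISTOGRAM;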


Let's generate an estimated plan for a query with an IN list in the WHERE clause.  These two examples are with the legacy cardinality estimator.  There are 19 integer members of that IN list.  You can see the implicit conversion warning on the SELECT operator below.  Column col1 is varchar(66), but it is being evaluated against a list of integer values.  That filter operator is where the relevant action is right now.  An estimate of almost 16000 rows is fine by me.

But the world is not enough.  Nineteen integers?  How about 20?

With 20 integers in the IN list, the estimate at the filter drops from nearly 16000 to... 1.

The downstream effects can easily be that desired hash joins become loop joins due to low row estimates.
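The queries themselves aren't shown above, only their plans.  A sketch of the shape being described - the specific 20 literal values here are my assumption, chosen to land on histogram step values:

SELECT *
FROM dbo.stats_test4
WHERE col1 IN (9900, 10000, 10100, 10200, 10300, 10400, 10500,
               10600, 10700, 10800, 10900, 11000, 11100, 11200,
               11300, 11400, 11500, 11600, 11700, 11800);

With the legacy CE, drop any one member to get back to 19 and the filter estimate recovers; at 20 integer members it collapses to 1.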

Now let's see what happens with the default cardinality estimator.  Let's start with a single-integer IN list.

An estimate of 1000.

And with two integers in the IN list the estimate is at 1492.47.

The decreasing rate of growth is the work of exponential backoff...


With three integers in the IN list the exponential backoff is even more evident...


Here's 4 integers...


At 5 integers in the IN list, the estimate of 1858.58 hasn't grown from the previous estimate.  From here on out the estimate won't grow. 
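These numbers line up with the default CE's exponential backoff for disjunctions: with a per-value selectivity of 0.02 (1,000 of 50,000 rows), the four most selective predicates are combined as 1 - (1-s)·(1-s)^(1/2)·(1-s)^(1/4)·(1-s)^(1/8), and members beyond the fourth are ignored - which is why the estimate plateaus.  The arithmetic can be checked directly (a sketch; float literals avoid T-SQL numeric truncation in POWER):

SELECT two_members   = 50000 * (1 - POWER(0.98e0, 1.5)),    -- ~1492.47
       three_members = 50000 * (1 - POWER(0.98e0, 1.75)),   -- ~1736.9
       capped        = 50000 * (1 - POWER(0.98e0, 1.875));  -- ~1858.58

The exponents 1.5, 1.75, and 1.875 are the running sums 1 + 1/2, + 1/4, + 1/8; the capped value matches the 1858.58 estimate seen at 5 or more IN list members.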


Certainly there are other details surrounding this issue.  I'll be back to fill in more as I learn more.  In particular on Monday I should be able to transplant what I've learned in SQL Server 2019 to some on-site 2012 and 2016 databases.  That should conclusively indicate whether this is part of the current bad stew... or something I should remember for the future when I run into it again.


Friday, September 11, 2020

SQL Server snapshot db temporary stats - created or updated

For a great introduction to this topic, please see this post at SQLSkills from Joe Sack.

Temporary Statistics for Database Snapshots in SQL Server 2012

https://www.sqlskills.com/blogs/joe/temporary-statistics-for-database-snapshots-in-sql-server-2012/

Today's test instance is running SQL Server 2019 CU6.  

OK, let's create a test database.

USE [master]
GO

CREATE DATABASE [snap_test_src]
 CONTAINMENT = NONE
 ON  PRIMARY 
( NAME = N'snap_test_src', SIZE = 8192KB , MAXSIZE = UNLIMITED, FILEGROWTH = 65536KB 
, FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\DATA\snap_test_src.mdf')
 LOG ON 
( NAME = N'snap_test_src__log', SIZE = 8192KB , MAXSIZE = 2GB , FILEGROWTH = 65536KB 
, FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\DATA\snap_test_src__log.ldf')

ALTER DATABASE [snap_test_src] SET AUTO_CREATE_STATISTICS ON
ALTER DATABASE [snap_test_src] SET AUTO_UPDATE_STATISTICS ON
ALTER DATABASE [snap_test_src] SET AUTO_UPDATE_STATISTICS_ASYNC OFF

ALTER DATABASE [snap_test_src] SET ANSI_NULL_DEFAULT OFF
ALTER DATABASE [snap_test_src] SET ANSI_NULLS OFF
ALTER DATABASE [snap_test_src] SET ANSI_PADDING OFF
ALTER DATABASE [snap_test_src] SET ANSI_WARNINGS OFF 
ALTER DATABASE [snap_test_src] SET ARITHABORT OFF
ALTER DATABASE [snap_test_src] SET AUTO_CLOSE OFF 
ALTER DATABASE [snap_test_src] SET AUTO_SHRINK OFF
ALTER DATABASE [snap_test_src] SET CURSOR_CLOSE_ON_COMMIT OFF 
ALTER DATABASE [snap_test_src] SET CURSOR_DEFAULT  GLOBAL 
ALTER DATABASE [snap_test_src] SET CONCAT_NULL_YIELDS_NULL OFF 
ALTER DATABASE [snap_test_src] SET NUMERIC_ROUNDABORT OFF 
ALTER DATABASE [snap_test_src] SET QUOTED_IDENTIFIER OFF
ALTER DATABASE [snap_test_src] SET RECURSIVE_TRIGGERS OFF
ALTER DATABASE [snap_test_src] SET DISABLE_BROKER 
ALTER DATABASE [snap_test_src] SET DATE_CORRELATION_OPTIMIZATION OFF
ALTER DATABASE [snap_test_src] SET TRUSTWORTHY OFF 
ALTER DATABASE [snap_test_src] SET ALLOW_SNAPSHOT_ISOLATION OFF 
ALTER DATABASE [snap_test_src] SET PARAMETERIZATION SIMPLE 
ALTER DATABASE [snap_test_src] SET READ_COMMITTED_SNAPSHOT OFF 
ALTER DATABASE [snap_test_src] SET HONOR_BROKER_PRIORITY OFF 
ALTER DATABASE [snap_test_src] SET RECOVERY FULL 
ALTER DATABASE [snap_test_src] SET PAGE_VERIFY CHECKSUM  
ALTER DATABASE [snap_test_src] SET DB_CHAINING OFF 
ALTER DATABASE [snap_test_src] SET FILESTREAM( NON_TRANSACTED_ACCESS = OFF ) 
ALTER DATABASE [snap_test_src] SET TARGET_RECOVERY_TIME = 60 SECONDS 
ALTER DATABASE [snap_test_src] SET DELAYED_DURABILITY = DISABLED 
ALTER DATABASE [snap_test_src] SET QUERY_STORE = OFF
ALTER DATABASE [snap_test_src] SET READ_WRITE
ALTER DATABASE [snap_test_src] SET MULTI_USER
 

Now let's create a very boring table and put 1000 numbers into each of 2 columns.

 
USE snap_test_src;
CREATE TABLE stats_test(col1 INT NOT NULL, col2 INT NOT NULL);

;WITH num AS 
(SELECT TOP (1000) n = ROW_NUMBER()
                       OVER (ORDER BY (SELECT NULL))
 FROM master..spt_values)
INSERT INTO stats_test
SELECT n, 1001 - n
FROM num;

Because auto_create_statistics is on in this database, a little query will result in creating a statistic on col1.

SELECT * FROM stats_test where col1 < 3;

And here is that statistic.

SELECT ss.name, ss.stats_id, ss.auto_created, ss.user_created, ss.is_temporary
     , sp.last_updated
FROM snap_test_src.sys.stats ss
CROSS APPLY snap_test_src.sys.dm_db_stats_properties(ss.object_id, ss.stats_id) sp
WHERE ss.object_id = object_id('stats_test');

Now let's make that statistic stale 😆 A thousand more numbers oughtta do it.
 
USE snap_test_src;

;WITH num AS 
(SELECT TOP (1000) n = ROW_NUMBER()
                       OVER (ORDER BY (SELECT NULL))
 FROM master..spt_values)
INSERT INTO stats_test
SELECT n, 1001 - n
FROM num;

All right.  Got a table with 2000 rows and a single auto-created statistics object that's now stale.
Let's create a snapshot database!!

CREATE DATABASE snap_test ON
    ( NAME = snap_test_src, 
	  FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\DATA\snap_test_src.ss')
AS SNAPSHOT OF snap_test_src;

OK... now for a query in the snapshot database...

USE snap_test
SELECT * FROM stats_test
WHERE col1 < 3 AND col2 > 998;

Let's check those stats... now there are two stats and both are marked as temporary.

SELECT ss.name, ss.stats_id, ss.auto_created, ss.is_temporary
     , sp.last_updated
FROM snap_test.sys.stats ss
CROSS APPLY snap_test.sys.dm_db_stats_properties(ss.object_id, ss.stats_id) sp
WHERE ss.object_id = object_id('stats_test');

Let's take another look, and compare the stats in the source database and the snapshot database.

(*** update 2020 September 24 ***
this little trick where I reference dm_db_stats_properties in a different database by adding the database name to the reference works in SQL Server 2019 CU6.  But it didn't work on a SQL Server 2016 instance where I tried the same trick - it looks like both references to the function resolved to the database context my query ran in.)

USE snap_test
SELECT ss.name, source_rows = sp2.rows, source_sample = sp2.rows_sampled
     , snap_rows = sp.rows, snap_sample = sp.rows_sampled
FROM snap_test.sys.stats ss
CROSS APPLY snap_test.sys.dm_db_stats_properties(ss.object_id, ss.stats_id) sp
OUTER APPLY snap_test_src.sys.dm_db_stats_properties(ss.object_id, ss.stats_id) sp2
WHERE ss.object_id = object_id('stats_test');


The source database and the snapshot database both have a stats_test table with 2000 rows, of course.  But the source database has only a statistic on col1, not col2.  And the col1 statistic was last updated when the table had 1000 rows.  The snapshot database has stats on col1, also.  They've been updated - as a temporary statistic - with the row count of 2000 and the sample of all 2000 rows.  That was current as of the time of the snap.

 Consider a snapshot database which is created daily.  The purpose is to provide analytics reporting access while maintenance or data loads take place in the source database.  In the snapshot database, analytics reports have no locking concerns from the activity in the underlying source database.  And the temporary statistics provided by SQL Server, combined with the statistics inherited from the source database, provide a lot of information to the optimizer for query plan selection.

But what if significant fact tables are queried in the snapshot and leave a situation like col2 in stats_test?  A column which generates an auto-created statistic in the snapshot, but never gets a statistic created in the source database.  Each day, the cost of creating that statistic and every statistic like it will be paid as part of the workload - even if the underlying table is a now-stable dimension.

If a snapshot database is being used to provide reporting access, it may be valuable to check for temporary statistics that are perpetually being created - or updated - in the snapshot.  In some cases, creating or updating the statistic in the underlying source database may provide a benefit by eliminating redundant work.
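A sketch of such a check, run in the snapshot database, using the same sys.stats and sys.dm_db_stats_properties columns queried earlier in this post (temporary statistics exist only in the snapshot, so each hit is a candidate to create or update in the source instead):

SELECT table_name = o.name, ss.name, ss.auto_created, sp.last_updated
FROM sys.stats ss
JOIN sys.objects o ON o.object_id = ss.object_id
CROSS APPLY sys.dm_db_stats_properties(ss.object_id, ss.stats_id) sp
WHERE ss.is_temporary = 1;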
