Friday, September 11, 2020

SQL Server snapshot db temporary stats - created or updated

For a great introduction to this topic, please see this post at SQLSkills from Joe Sack.

Temporary Statistics for Database Snapshots in SQL Server 2012

https://www.sqlskills.com/blogs/joe/temporary-statistics-for-database-snapshots-in-sql-server-2012/

Today's test instance is running SQL Server 2019 CU6.  

OK, let's create a test database.

USE [master]
GO

CREATE DATABASE [snap_test_src]
 CONTAINMENT = NONE
 ON  PRIMARY 
( NAME = N'snap_test_src', SIZE = 8192KB , MAXSIZE = UNLIMITED, FILEGROWTH = 65536KB 
, FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\DATA\snap_test_src.mdf')
 LOG ON 
( NAME = N'snap_test_src__log', SIZE = 8192KB , MAXSIZE = 2GB , FILEGROWTH = 65536KB 
, FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\DATA\snap_test_src__log.ldf')

ALTER DATABASE [snap_test_src] SET AUTO_CREATE_STATISTICS ON
ALTER DATABASE [snap_test_src] SET AUTO_UPDATE_STATISTICS ON
ALTER DATABASE [snap_test_src] SET AUTO_UPDATE_STATISTICS_ASYNC OFF

ALTER DATABASE [snap_test_src] SET ANSI_NULL_DEFAULT OFF
ALTER DATABASE [snap_test_src] SET ANSI_NULLS OFF
ALTER DATABASE [snap_test_src] SET ANSI_PADDING OFF
ALTER DATABASE [snap_test_src] SET ANSI_WARNINGS OFF 
ALTER DATABASE [snap_test_src] SET ARITHABORT OFF
ALTER DATABASE [snap_test_src] SET AUTO_CLOSE OFF 
ALTER DATABASE [snap_test_src] SET AUTO_SHRINK OFF
ALTER DATABASE [snap_test_src] SET CURSOR_CLOSE_ON_COMMIT OFF 
ALTER DATABASE [snap_test_src] SET CURSOR_DEFAULT  GLOBAL 
ALTER DATABASE [snap_test_src] SET CONCAT_NULL_YIELDS_NULL OFF 
ALTER DATABASE [snap_test_src] SET NUMERIC_ROUNDABORT OFF 
ALTER DATABASE [snap_test_src] SET QUOTED_IDENTIFIER OFF
ALTER DATABASE [snap_test_src] SET RECURSIVE_TRIGGERS OFF
ALTER DATABASE [snap_test_src] SET DISABLE_BROKER 
ALTER DATABASE [snap_test_src] SET DATE_CORRELATION_OPTIMIZATION OFF
ALTER DATABASE [snap_test_src] SET TRUSTWORTHY OFF 
ALTER DATABASE [snap_test_src] SET ALLOW_SNAPSHOT_ISOLATION OFF 
ALTER DATABASE [snap_test_src] SET PARAMETERIZATION SIMPLE 
ALTER DATABASE [snap_test_src] SET READ_COMMITTED_SNAPSHOT OFF 
ALTER DATABASE [snap_test_src] SET HONOR_BROKER_PRIORITY OFF 
ALTER DATABASE [snap_test_src] SET RECOVERY FULL 
ALTER DATABASE [snap_test_src] SET PAGE_VERIFY CHECKSUM  
ALTER DATABASE [snap_test_src] SET DB_CHAINING OFF 
ALTER DATABASE [snap_test_src] SET FILESTREAM( NON_TRANSACTED_ACCESS = OFF ) 
ALTER DATABASE [snap_test_src] SET TARGET_RECOVERY_TIME = 60 SECONDS 
ALTER DATABASE [snap_test_src] SET DELAYED_DURABILITY = DISABLED 
ALTER DATABASE [snap_test_src] SET QUERY_STORE = OFF
ALTER DATABASE [snap_test_src] SET READ_WRITE
ALTER DATABASE [snap_test_src] SET MULTI_USER
 

Now let's create a very boring table and put 1000 numbers into each of 2 columns.

 
USE snap_test_src;
CREATE TABLE stats_test(col1 INT NOT NULL, col2 INT NOT NULL);

;WITH num AS 
(SELECT TOP (1000) n = ROW_NUMBER()
                       OVER (ORDER BY (SELECT NULL))
 FROM master..spt_values)
INSERT INTO stats_test
SELECT n, 1001 - n
FROM num;

Because auto_create_statistics is on in this database, a little query will result in creating a statistic on col1.

SELECT * FROM stats_test where col1 < 3;

And here is that statistic.

SELECT ss.name, ss.stats_id, ss.auto_created, ss.user_created, ss.is_temporary
     , sp.last_updated
FROM snap_test_src.sys.stats ss
CROSS APPLY snap_test_src.sys.dm_db_stats_properties(ss.object_id, ss.stats_id) sp
WHERE ss.object_id = object_id('stats_test');

Now let's make that statistic stale 😆 A thousand more numbers oughtta do it.
 
USE snap_test_src;

;WITH num AS 
(SELECT TOP (1000) n = ROW_NUMBER()
                       OVER (ORDER BY (SELECT NULL))
 FROM master..spt_values)
INSERT INTO stats_test
SELECT n, 1001 - n
FROM num;

All right.  Got a table with 2000 rows and a single auto-created statistics object that's now stale.
Let's create a snapshot database!!

CREATE DATABASE snap_test ON
    ( NAME = snap_test_src, 
	  FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\DATA\snap_test_src.ss')
AS SNAPSHOT OF snap_test_src;

OK... now for a query in the snapshot database...

USE snap_test
SELECT * FROM stats_test
WHERE col1 < 3 AND col2 > 998;

Let's check those stats... now there are two stats and both are marked as temporary.

SELECT ss.name, ss.stats_id, ss.auto_created, ss.is_temporary
     , sp.last_updated
FROM snap_test.sys.stats ss
CROSS APPLY snap_test.sys.dm_db_stats_properties(ss.object_id, ss.stats_id) sp
WHERE ss.object_id = object_id('stats_test');

Let's take another look, and compare the stats in the source database and the snapshot database.

(*** update 2020 September 24 ***
this little trick where I reference dm_db_stats_properties in a different database by adding the database name to the reference works in SQL Server 2019 CU6.  But it didn't work on a SQL Server 2016 instance I tried the same trick on - looks like both references to the function resolved to the database context my query was in.)

USE snap_test
SELECT ss.name, source_rows = sp2.rows, source_sample = sp2.rows_sampled
     , snap_rows = sp.rows, snap_sample = sp.rows_sampled
FROM snap_test.sys.stats ss
CROSS APPLY snap_test.sys.dm_db_stats_properties(ss.object_id, ss.stats_id) sp
OUTER APPLY snap_test_src.sys.dm_db_stats_properties(ss.object_id, ss.stats_id) sp2
WHERE ss.object_id = object_id('stats_test');


The source database and the snapshot database both have a stats_test table with 2000 rows, of course.  But the source database has only a statistic on col1, not col2.  And the col1 statistic was last updated when the table had 1000 rows.  The snapshot database has stats on col1, also.  They've been updated - as a temporary statistic - with the row count of 2000 and the sample of all 2000 rows.  That was current as of the time of the snap.

 Consider a snapshot database which is created daily.  The purpose is to provide analytics reporting access while maintenance or data loads take place in the source database.  In the snapshot database, analytics reports have no locking concerns from the activity in the underlying source database.  And the temporary statistics provided by SQL Server, combined with the statistics inherited from the source database, provide a lot of information to the optimizer for query plan selection.

But what if significant fact tables are queried in the snapshot and leave a situation like col2 in stats_test?  A column which generates an auto-created stat in the snapshot, but never gets a statistic created in the source database.  Each day, the cost of creating that statistic and every statistic like it will be paid as part of the workload.  Even if the underlying table is a now-stable dimension. 

If a snapshot database is being used to provide reporting access, it may be valuable to check for temporary statistics that are perpetually being created - or updated - in the snapshot.  In some cases, creating or updating the statistic in the underlying source database may provide a benefit by eliminating redundant work.

q

No comments:

Post a Comment