Saturday, June 12, 2021

#SQLServer Snapshot Isolation Level - increase query CPU time and elapsed time by a factor of 8!

OK. This is a little test in my SQL Server 2019 CU11 instance.  But the behavior I will demonstrate goes way back... and seems like it just may go way forward, too.

First, let's create a test database, with a nice little 3 column CCI table in it.

USE [master];
GO

CREATE DATABASE [test_SI] 
ON PRIMARY (	NAME = N'test_SI'
,				FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL15.V2019CU11\MSSQL\DATA\test_SI.mdf')
LOG ON (		NAME = N'test_SI_log'
,				FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL15.V2019CU11\MSSQL\DATA\test_SI_log.ldf');

ALTER DATABASE [test_SI] SET ALLOW_SNAPSHOT_ISOLATION ON;
GO
USE [test_SI]
DROP TABLE IF EXISTS dbo.SI_DeletedRows;

CREATE TABLE dbo.SI_DeletedRows
(	x_key		[bigint]	NOT NULL
,	x_value		[int]		NULL
,	x_comment	[varchar](512)	NULL
,	INDEX [CCI__SI_DeletedRows] CLUSTERED COLUMNSTORE
);

Let's take 2 minutes (or less) to insert 104,203,264 rows into the clustered columnstore index created. 

;WITH sixteen AS
(	SELECT num = 1 UNION ALL SELECT 1
        UNION ALL SELECT 1 UNION ALL SELECT 1
        UNION ALL SELECT 1 UNION ALL SELECT 1
        UNION ALL SELECT 1 UNION ALL SELECT 1
        UNION ALL SELECT 1 UNION ALL SELECT 1
        UNION ALL SELECT 1 UNION ALL SELECT 1
        UNION ALL SELECT 1 UNION ALL SELECT 1
        UNION ALL SELECT 1 UNION ALL SELECT 1)
INSERT INTO     dbo.SI_deletedRows
SELECT ROW_NUMBER() OVER (ORDER BY s1.number DESC), s1.number, s1.name FROM master..spt_values s1
CROSS JOIN	master..spt_values s2	
CROSS JOIN sixteen;
/* 104203264 rows
min - 1:37
max - 1:56 */

All right - now let's check on the rowgroup quality.  Pretty nice; 100 rowgroups. 99 of them at maximum size, and 1 with almost 400,000 rows.

SELECT		csrgps.state_desc, num_groups = count(*), rows_in_group = csrgps.total_rows
,		csrgps.trim_reason_desc, csrgps.transition_to_compressed_state_desc
FROM		sys.dm_db_column_store_row_group_physical_stats csrgps
WHERE		csrgps.object_id = object_id('dbo.SI_DeletedRows')
GROUP BY	csrgps.state_desc, csrgps.total_rows, csrgps.trim_reason_desc
,		csrgps.transition_to_compressed_state_desc;


Let's take a minute and delete 1 row out of every 15 from the CCI.

DELETE FROM dbo.SI_DeletedRows WHERE x_key % 15 = 0;
/* 6946884 rows, 6.67%
0:51 */ 

Here's a quick little query against the CCI, in Read Committed transaction isolation level. The query is run all by itself on the instance, at DOP 1.

SET TRANSACTION ISOLATION LEVEL Read Committed
SET STATISTICS TIME ON
SELECT	COUNT(1) FROM dbo.SI_DeletedRows
WHERE	x_comment IN ('money', 'smallmoney', 'offline', 'Little Vito')
OR	x_comment IS NULL
OPTION (MAXDOP 1);
/*  SQL Server Execution Times:
    CPU time = 312 ms,  elapsed time = 327 ms. */

Let's run the query again at DOP 1. But change the isolation level to Snapshot. No other queries are running on the instance, so there is no competition for resources.  There is nothing in the version store, and there are no other open transactions against the objects in the query.  Snapshot isolation all by itself is responsible for increasing the CPU time and the elapsed time of this query on a CCI with deleted rows by a factor of eight.

SET TRANSACTION ISOLATION LEVEL SNAPSHOT
SET STATISTICS TIME ON
SELECT	COUNT(1) FROM dbo.SI_DeletedRows 
WHERE	x_comment IN ('money', 'smallmoney', 'offline', 'Little Vito')
OR	x_comment IS NULL
OPTION (MAXDOP 1);
/* SQL Server Execution Times:
   CPU time = 2625 ms,  elapsed time = 2638 ms. */

I've seen this behavior on production systems.  But in the field, the effect can be even more pronounced than this example.  I've seen more complicated queries involving CCIs have snapshot isolation increase their runtime from under 1 second to over 60 seconds.

That's too bad, since snapshot isolation is a canonical solution that can facilitate read analytics queries against the same database supporting read-write workloads.

If Snapshot Isolation does not meet workload goals, often RCSI is evaluated as an option. In this case, RCSI introduces the same performance overhead as snapshot isolation.

An Availability Group readable secondary also experiences the same performance overhead.

Removing all deleted rows from a CCI allows CCI queries in Read Committed and Snapshot Isolation levels to perform comparably (assuming no competition for resources or locks).  But, CCI reorgs will only remove deleted rows from rowgroups with at least 10% of their rows deleted.  And CCI rebuilds may not be appropriate given resource and other constrains.