Friday, March 10, 2023

Big-Big #SQLServer Table Migration via bcp (echoes of Friday, February 8, 2013)

Today (2023 March 10) I am having memories of helping with bcp from one SQL Server 2008 R2 database to another of a 9.5 billion+ row table back in 2013.  Yikes. bcp out - no need to think about uniqueifiers and error 666.

bcp in of 9.5 billion rows to a table with a clustered index and some NCIs on the other hand...

SELECT text FROM sys.messages WHERE message_id = 666 and language_id=1033

The maximum system-generated unique value for a duplicate group was exceeded for index with partition ID %I64d. Dropping and re-creating the index may resolve this; otherwise, use another clustering key.
SQL 2005 kb 937533 is relevant.  Table spool may only be able to handle 2,147,483,648 input rows before exhausting the available values for uniqueifier.

Here's a SQL Server 2008 example from the wayback machine.

First posted 2010 March 8 
General considerations around uniqueifier and error 666 here.
2018 February 16
Now i *think* if the -b parameter for bcp specifies a batch size (maybe 100,000 or a million in this case) for an humongous import it will:
- lower the liability for transaction log full that accompanies simple recovery model, ADR not enabled in SQL Server 2019 and beyond, and a clustered index which is non-empty to start with
- lower the maximum footprint of spool or spill in tempdb for sort before clustered index insert
- avoid potential uniqueifier 666 errors
Once i get the most recent trouble with big-big table migration sorted out, i'll work on a repro of the problem.

Tuesday, January 31, 2023

#SQLServer - USE HINT ('DISABLE_OPTIMIZER_ROWGOAL') - SURPRISE! TOP operator in graphical query plan


 Sometimes perhaps the things I care about will really leave people wondering. :-)

In this case, it's a SQL Server graphical query plan oddity.

I don't think this behavior is necessarily related to a performance problem.

But I think this behavior can make it *harder* to troubleshoot SQL Server graphical query plans for performance.  This is a case where a TOP operator shows up in a graphical plan unexpectedly - and seemingly against the expressed intent of the USE HINT ('DISABLE_OPTIMIZER_ROWGOAL') hint.

Here are some sources for additional background about the general idea.

KB2667211 - A query may take a long time to run if the query optimizer uses the Top operator in SQL Server 2008 R2 or in SQL Server 2012

KB4051361 - Optimizer row goal information in query execution plan added in SQL Server 2014, 2016 and 2017

Inside the Optimizer: Row Goals In Depth
Paul White
2010 August 18

All right. Game on.

A little bit of info about the system.

Let's do some setup (I'll append all of the T-SQL at the end of the blog post for those that want to follow along at home).


OK, let's see what our folly hath wrought. Good, good. Let the A9 flow...

I know this query seems almost non-sensical.  But I assure you it was lovingly distilled from a much more complex case I ran into in the field.  Sure, the query could be simplified.  But that's not the point of this exercise :-) Just grabbing the estimated plan here, because that's all I care about in this repro.

(If t1 and t0 were actually different heaps this would be about as good SQL Server could do without any indexes.)

OK, it's a funky query... but nothing too unusual yet in the graphical plan.

So we've got a query that probably no *person* would write and the graphical plan isn't really too objectionable.  Why don't we add a hint that no *person* would add to this query, for good measure?  One that would be expected to have no effect? If there isn't an apparent row goal already, what change could come about if we added the DISABLE_OPTIMIZER_ROWGOAL hint?

Oh. That's weird.  A TOP operator magically appeared in the plan.

The attribute EstimateRowsWithoutRowGoal (discussed in the second KB article linked above) doesn't appear in the plan XML.  So... I guess this is "row goal lite" or something?


Martin Smith recommended testing the a hint to disable a rule: QUERYRULEOFF GbAggToConstScanOrTop.

The surprise TOP operator still shows up in the estimated plan.


And... I didn't realize this until after I initially published this blog post.

The sum of the costs of the operators in the graphical plans with the surprise TOP operator... is 106%.



Here's the T-SQL for those that want to have their own fun :-)


, n2 AS (SELECT n = 1 FROM n1 a1 CROSS JOIN n1 a2)
, n3 AS (SELECT n = 1 FROM n2 a1 CROSS JOIN n2 a2)
, n4 AS (SELECT n = 1 FROM n3 a1 CROSS JOIN n3 a2)
, n5 AS (SELECT n = 1 FROM n4 a1 CROSS JOIN n4 a2)
, n6 AS (SELECT n = 1 FROM n5 a1 CROSS JOIN n3 a2)
            WHEN n%10=0 THEN 'A2'
            WHEN n%9=0 THEN 'A3'
            WHEN n%8=0 THEN 'A4'
            WHEN n%7=0 THEN 'A5'
            WHEN n%6=0 THEN 'A6'
            WHEN n%5=0 THEN 'A7'
            WHEN n%4=0 THEN 'A8'
            ELSE 'A9'
FROM n7;

SELECT   col1, how_many = count(*) 
FROM     #temp 

SELECT  col1
FROM    #temp t0
					FROM   #temp t1
					WHERE  t1.col1 = 'A9'
					AND    t0.col1 = t1.col1 )

SELECT  col1
FROM    #temp t0
					FROM   #temp t1
					WHERE  t1.col1 = 'A9'
					AND    t0.col1 = t1.col1 )

Thursday, December 8, 2022

Let's check up on the SQL Server transaction log buffers in SQL Server 2019 CU 16!!!

 I'm looking into a severe LOGFLUSHQ spinlock issue in SQL Server 2019 CU16.

The issue first presented itself afaik in CU 15.

I'll probably blog more thoroughly about all that another day.

Here I just wanted to return to a detail from yesteryear.

Last time I checked was probably SQL Server 2014 - at that time each database had 128 in-memory transaction log buffers, each of which could be up to 60 kb (hence why each transaction log write could be up to 60 kb).

Since the logflushq spinlock contention is in the context of transaction log flush/write queue management, I figured I'd better check if the number of buffers had increased, or if the maximum size of those buffers had increased.

The T-SQL code I used is down below...

These results from SSMS show that SQL Server 2019 is what I am used to so far:

128 transaction log buffers, each with a max size of 60k.

And lc_state for those buffers can be retrieved from DBCC DBTABLE.

([ParentObject] VARCHAR(255),
[Object] VARCHAR(255),
[Field] VARCHAR(255),
[Value] VARCHAR(255)

'dbcc dbtable(' + quotename(db_name()) + ') with tableresults, no_infomsgs';

EXEC (@sqlT);

SELECT SQLServer_Version = @@version;

SELECT ParentObject
FROM #table
WHERE LEFT(ParentObject, 
CASE WHEN PATINDEX('% @%', ParentObject) > 2 
THEN PATINDEX('% @%', ParentObject) - 1 ELSE 1 END) = 'LogMgr'
GROUP BY ParentObject;

(SELECT ParentObject
FROM #table
WHERE LEFT(ParentObject, 
CASE WHEN PATINDEX('% @%', ParentObject) > 2 
THEN PATINDEX('% @%', ParentObject) - 1 ELSE 1 END) = 'LogMgr'
GROUP BY ParentObject),
LogMgr_objects AS
CASE WHEN PATINDEX('% @%', t.Object) > 2 
THEN PATINDEX('% @%', t.Object) - 1 ELSE 1 END),
FROM #table t
JOIN LogMgr ON t.ParentObject = LogMgr.ParentObject
GROUP BY LEFT(t.Object, 
CASE WHEN PATINDEX('% @%', t.Object) > 2 
THEN PATINDEX('% @%', t.Object) - 1 ELSE 1 END), t.object)
SELECT LO.OBJECT_TYPE, object_count = COUNT(*)
FROM LogMgr_objects LO

SELECT t.Field, t.value, NUM_log_buffers = count(*)
FROM #table t
WHERE t.object LIKE 'LC @0x%'
AND t.field = 'lc_maxDataSize'
GROUP BY t.field, t.value;

SELECT t.Field, t.value, NUM_log_buffers = count(*)
FROM #table t
WHERE t.object LIKE 'LC @0x%'
AND t.field = 'lc_state'
GROUP BY t.field, t.value;

Monday, April 4, 2022

SQL Server - recursive CTE to retrieve dependency chain of Resource Governor Classifier Function

SQL Server Resource Governor classifier functions can include references to functions as well as tables in the master database.  And each referenced function can in turn reference additional functions and tables.

In order to get a thorough understanding of how sessions are classified on a given system, one may need some familiarity with each function or table in the dependency tree.

This recursive function will retrieve all object names in the dependency tree (object names will be retrieved but column names will not be retrieved, due to the referenced_minor_id = 0 qualifier.

(	SELECT		root_id		= classifier_function_id
, referencing_id = classifier_function_id , referenced_id = classifier_function_id , NestLevel = 1 FROM sys.dm_resource_governor_configuration UNION ALL SELECT root_id         = rgc.classifier_function_id , d1.referencing_id , d1.referenced_id , NestLevel + 1 FROM sys.sql_expression_dependencies d1 JOIN DTree r ON d1.referencing_id = r.referenced_id CROSS JOIN sys.dm_resource_governor_configuration rgc WHERE d1.referenced_minor_id = 0) SELECT classifier_function = OBJECT_NAME(t1.root_id) , referencing_name = OBJECT_NAME(t1.referencing_id) , referenced_name = OBJECT_NAME(t1.referenced_id) , referenced_type = so.type_desc , t1.NestLevel FROM DTree t1 JOIN sys.objects so ON so.object_id = t1.referenced_id ORDER BY t1.NestLevel, t1.referencing_id;

SQL Server - recursive CTE to retrieve rowstore index IAM chain

 A SQL Server recursive CTE can reach a maximum level of recursion of 32768.

Strting with the first IAM page for a rowstore index, the entire IAM chain can be retrieved, as each IAM page indicates the next page in the chain.

So a recursive CTE could reliably retrieve IAM chains for the rowstore indexes in a filegroup until the filegroup exceeds 128 TB in aggregate file size. (Because each IAM page accounts for database file usage in a ~4GB range of a file, for a single rowstore index.)

This CTE will retrieve an IAM chain. AFAIK, it's performance will beat the brakes off any similar query relying on sys.dm_db_database_page_allocations.

DECLARE @index_id	SMALLINT	= 1; 
DECLARE @type_desc	VARCHAR(20)	= 'IN_ROW_DATA';
;WITH start_page AS 
(	SELECT				fp.file_id
	,				fp.page_id
	FROM				sys.system_internals_allocation_units siau
	JOIN				sys.partitions sp 
	ON				siau.container_id	= sp.partition_id
	CROSS APPLY			sys.fn_PhysLocCracker(siau.first_iam_page) fp
	WHERE				sp.object_id		= @object_id
	AND				sp.index_id		= @index_id
	AND				siau.type_desc		= @type_desc	)
, rec_cte AS
(	SELECT				next_page_file_id	= CONVERT(BIGINT, sp.file_id)
	,				next_page_page_id	= CONVERT(BIGINT, sp.page_id)
	,				iam_page_ordinal	= 1
	FROM				start_page sp
	SELECT				CONVERT(BIGINT, dpi.next_page_file_id)
	,				CONVERT(BIGINT, dpi.next_page_page_id)
	,				iam_page_ordinal = iam_page_ordinal + 1
	FROM				rec_cte
	CROSS APPLY			sys.dm_db_page_info(DB_ID(), next_page_file_id, next_page_page_id, 'DETAILED') dpi
	WHERE				dpi.next_page_file_id <> 0    ) 
SELECT		rc.iam_page_ordinal, file_id = rc.next_page_file_id, page_id = rc.next_page_page_id
FROM		rec_cte rc
WHERE		rc.next_page_file_id IS NOT NULL
ORDER BY	rc.iam_page_ordinal
OPTION		(maxrecursion 32767);

Monday, February 28, 2022

#SQLServer Trivial Plans for Inserts; Stats Expectations and Reality

OK.  SQL Server trivial plans for rowstore table INSERT. And related optimizer stats interaction.

TL;DR cached trivial plans for INSERT can be surprisingly stubborn. If a query matches to one, it won't perform or queue a stats update even if the stats are stale.  If the stats have been updated and would otherwise warrant a per-index plan - but there is a matching cached trivial plan for a per-row plan... outta luck. Might hafta DBCC FREEPROCCACHE or add OPTION(RECOMPILE) hint to make sure a cached trivial plan doesn't prevent a per-index update for an INSERT when you really want one.


The blog post referenced below, first published 2015 March 16, is pretty good.

The relevant takeaway from that post is: in a database with AUTO_UPDATE_STATISTICS = ON, a statistics update will NOT invalidate a relevant cached query plan. A subsequent query which qualifies to use a cached plan will continue to use the cached trivial plan. This is Scenario 1 in the blog post. The following text is directly from the blog post.

When a plan is trivial, it’s unnecessary to recompile the query even statistics has been updated.  Optimizer generates trivial plan for very simple queries (usually referencing a single table).  In XML plan, you will see statementOptmLevel="TRIVIAL". In such case, it’s futile and you won't get a better or different plan.

The demo code in the blog post uses SELECT queries only. In my examples in my own blog post, I'm concerned with INSERT queries.  Are INSERT queries eligible for both trivial plans and plans that have been more fully optimized? If INSERT queries can use a trivial plan, is it possible for a cached trivial plan to remain in use when a better and/or different plan could be selected?

Perhaps the reader is familiar with the Paul White blog post referenced below, from 2013 January 26. In that blog post, Paul White discusses wide, per-index INSERT query plans and narrow, per-row INSERT query plans.  How do these plan types relate to trivial plans?

OK. Enough jibber-jabber. Time to do something. Like create a stored procedure.  Which will be used in a database with AUTO_UPDATE_STATISTICS = ON and AUTO_UPDATE_STATISTICS_ASYNC = OFF.

This stored procedure will drop two tables if they already exist, then create those two tables.  The source table has three columns and a single index - the clustered primary key.  The target table also has three columns and a clustered primary key. In addition the target table has two non-clustered indexes.

The @insert_no_rows parameter controls whether or not there will be an INSERT query from the empty source table to the empty target table. That wouldn't move any data - but maybe it will cache a plan?

The INSERT query is dynamic SQL - both in the stored procedure and when I issue it adhoc later.  It isn't necessary for this query to be dynamic SQL - it was just easier that way to make sure the batch SQL text matched and plan re-use happened when it was a possibility. While writing this blog post at one point the query SQL text in the stored procedure has a different number of tabs preceding the formatted T-SQL than appeared in my adhoc query afterward and it took me forever to figure out why plan re-use wasn't occurring. :-) 

CREATE OR ALTER PROCEDURE test__trivial_plan_inserts @insert_no_rows INT = 0, @x INT = 0
	DROP TABLE IF EXISTS trivial_test_source;
	DROP TABLE IF EXISTS trivial_test_target;

	CREATE TABLE trivial_test_source
	(	col1 INT CONSTRAINT pk__trivial_test_source PRIMARY KEY CLUSTERED
	,	col2 INT
	,	col3 INT

	CREATE TABLE trivial_test_target
	(	col1 INT CONSTRAINT pk__trivial_test_target PRIMARY KEY CLUSTERED
	,	col2 INT INDEX nci__trivial_test_target__col2
	,	col3 INT INDEX nci__trivial_test_target__col3

	-- white space in the INSERT dynamic SQL must match exactly
	-- in stored procedure and adhoc EXEC for cached plan reuse
	IF @insert_no_rows = 1
		DECLARE @sqlT NVARCHAR(1000) = N'INSERT INTO trivial_test_target SELECT * FROM trivial_test_source;'
		EXEC (@sqlT);
	-- white space in the INSERT dynamic SQL must match exactly
	-- in stored procedure and adhoc EXEC for cached plan reuse

	;WITH	n0 AS (	SELECT TOP (32) n = 1 
					FROM master.dbo.spt_values)
					FROM n0 t1 CROSS JOIN n0 t2)
	INSERT INTO	trivial_test_source
	SELECT		TOP (@x) n, n, n
	FROM		n1;

For the first experiment, let's not worry about caching plans. Let's call the stored procedure like this:

EXEC test__trivial_plan_inserts @insert_no_rows = 0, @x = 1023;

So, give me 1023 rows in table trivial_test_source.

After the stored procedure executes, let's grab an actual plan from this simple INSERT query:

DECLARE @sqlT NVARCHAR(100) = N'INSERT INTO trivial_test_target SELECT * FROM trivial_test_source;'
EXEC (@sqlT);
-- white space in the INSERT dynamic SQL must match exactly
-- in stored procedure and adhoc EXEC for cached plan reuse

And here's the actual plan.  That's a narrow, per-row plan. Note the estimate of 1023 rows; dead-on.

The Object properties in the graphical plan specify the target table indexes which will be updated for each row inserted into the clustered index. So far, so good.

If we look at the plan XML, we can see this is a trivial plan.

OK. Now let's call the stored procedure like this:
EXEC test__trivial_plan_inserts @insert_no_rows = 0, @x = 1024;

That will drop the source and target tables, recreate the tables, and populate the source table with 1024 rows instead of the previous 1023.

Now get an actual plan for the insert again:

DECLARE @sqlT NVARCHAR(100) = N'INSERT INTO trivial_test_target SELECT * FROM trivial_test_source;'
EXEC (@sqlT);
-- white space in the INSERT dynamic SQL must match exactly
-- in stored procedure and adhoc EXEC for cached plan reuse

And... bam!!! Yeah, the graphical plan below certainly looks like a different plan. This is what Paul White refers to as a wide, per-index update.

First, at the right of the blue box below, the source table scan operator feeds rows into the clustered index insert operator. No sort is needed to optimize the insert because the source and target tables have the same primary key clustered index definition. These rows are also fed into a table spool at the left of the blue box below.  The table spool at the left of the blue box is the same table spool at the left of the gold box - just different zones of the same plan.

Table spool populated, it is used as a source for a sort, then insert into a non-clustered index.  This happens once in the upper blue box, then again in the lower gold box.

Demonstrating the benefit of a per-index update over a per-row update for an INSERT of many thousand rows is left as an exercise for the reader.

While I won't investigate at this time, I do want to make note of an interesting switcheroo that takes place. Recall that within the CREATE TABLE, the clustered primary key and the two non-clustered indexes on col2 and col3 were created with inline syntax.

How did the non-clustered index on col3 become index_id 2, while the nonclustered index on col2 became index_id 3?  I don't know - and i don't know of anywhere that could be consequential. Yet.

How did I notice that little switcheroo?  Well, every per-index INSERT graphical plan I've seen has shown indexes in index_id order from top-down.  And when I've observed large per-index updates with sys.dm_tran_locks, the acquisition of locks over time by the session indicates indexes are tended to in index_id order.  I won't at this time be investigating further whether per-index updates always handle indexes in index_id order, or how the little index_id switcheroo occurred. Just an interesting bread crumb.

Before we do the next part I want to make sure the database is set up like I think...
Excellent, exactly what I wanted.

So. This time we drop the tables and recreate them. We issue the INSERT INTO... SELECT while both source and target tables are empty. Then populate the source table with 1024 rows.

EXEC test__trivial_plan_inserts @insert_no_rows = 1, @x = 1024;

And now we are ready to grab an actual plan...

DECLARE @sqlT NVARCHAR(100) = N'INSERT INTO trivial_test_target SELECT * FROM trivial_test_source;'
EXEC (@sqlT);
-- white space in the INSERT dynamic SQL must match exactly
-- in stored procedure and adhoc EXEC for cached plan reuse

So now we have 1024 rows - that number of rows previously got a per-index update plan.  But this time it got a per-row plan. Huh.

Well, what the heck.  If we look at the plan XML we see a clue. RetrievedFromCache.

Now wait just a minute. That source table went from 0 rows to 1024 rows; did the stats get updated?

Wow. I guess not. When the insert matched a cached trivial plan, the stats did not get updated even though they were stale and auto update stats is true in this database.

So - what if I try this again? Drop the tables, recreate the tables, run the insert with zero rows to put a plan in the cache. Put 1024 rows in the source table. Then... I'll explicitly update stats.

EXEC test__trivial_plan_inserts @insert_no_rows = 1, @x = 1024;
UPDATE STATISTICS trivial_test_source;

OK, stats are updated.

So what happens now? Ouch. It doesn't matter.  If the stats are stale and there's a matching cached trivial plan - the stats don't get updated.  If the stale stats ARE updated before the query is executed - the matching trivial plan isn't invalidated. So that matching cached trivial plan will still be used. Even though I don't want it to be used. I just want SQL Server to take a fresh look at what's going on.

And that fresh look can solve the problem.  Clearing the plan cache with DBCC FREEPROCCACHE (or ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE CACHE in the relevant database) would cause SQL Server to take a fresh look, and decide if the INSERT should get a per-index plan instead of a per-row plan*.  Tacking OPTION(RECOMPILE) hint on to the end of that INSERT would also work in a much more targeted manner. Sometimes, in order to prevent a cached trivial plan from forcing a per-row update when we really want a per-index update, gonna hafta encourage SQL Server to take a fresh look in one of those ways.

*There are a number of SQL Server instance-wide configuration settings changes to which will also clear plan cache, such as maxdop and cost threshold for parallelism. But i don't recommend changing any of them solely to clear plan cache. Microsoft lists the configuration options which clear plan cache on the following BOL page.

Tuesday, December 7, 2021

#SQLServer - Database Cache Suppression

Third blog post in a quick series about potential trouble at the high AND low side of [\SQLServer:Buffer Manager\Target pages].  So far I've primarily talked about trouble on the low side of [Target pages], due to artificial restriction on the size of the database cache instance-wide, and an uneven expectation of [Free Memory] across the SQLOS memory nodes which can result in some SQLOS memory nodes having almost no database cache.  This blog post is another example of the uneven demand for [Free Memory] across SQLOS memory nodes.

Here are the first two blog posts, in order, where I started discussing this topic.

#SQLServer Column Store Object Pool -- the Houdini Memory Broker Clerk AND Perfmon [\SQLServer:Buffer Manager\Target pages]

#SQLServer Database Cache Memory Magic at SQLOS Node Level -- Chaotic or Even A Little Bit Sinister

The previous two posts have focused on December 1 on a particular system. That system is unusual in that it has 7 vNUMA nodes.  That's a pretty surprising number, and it comes from the vNUMA configuration of that 80 vcpu vm not being aligned with physical resources beneath.  That'll get straightened out sometime soon and it'll live the rest of its life as a 4x20 as soon as it does.

How much of the problem do I expect to go away once that system is a 4x20?  A little. But it won't remove the problem entirely.

Let's look at a different system in this post - a 4x22, 88 vcpu system set up as good as I know how.

Well... yeah, there's something I don't necessarily like to see.  Shortly after 3:15 am the database page read rate increases but database cache size decreases.   Sometimes that happens, due to other memory commitments.  There's only so much memory and if there's another need, database cache might hafta shrink even if it means more reads and more read waits.  But in this case it isn't apparent there is a need for ~256 gb of SQLOS [Free Memory] to be maintained for over an hour.  It looks like maybe something was anticipated, but never happened?

Sure enough - as in the example we've been looking at previously, there's suddenly a huge swing in [Target pages], and it predicts the change in [Free Memory] retention.

Adjust the ratio of Y-axes to 8:1 to account for the left axis measured in KB, and the right axis measured in 8kb pages...

Now [Target pages] is pretty nicely redictive of [Database Cache Memory (KB)].  Until nearly 4:45 am, anyway, when [Target pages] goes back to the stratosphere. 

So for well over an hour, on this SQL Server 2019 instance on a nicely laid out 4x22, 88 vcpus vm SQLOS maintained over 256 gb of [Free Memory].  While also under considerable database page read load.  That very likely lead to much higher page re-reads than would have been necessary otherwise.  Other than that - how fairly distributed among the SQLOS memory nodes was production and maintenance of [Free Memory]?  Ouch. The intensity of page freeing on individual SQLOS nodes is just as intense on this 4 node system as it was on the 7 node system.  Individual nodes still get down to almost no database cache.  But conditions seem to change faster - things stay really bad for less time than on the 7 node system.

I haven't looked at a 2 node system yet.  Or an 8 node sysem.  I definitely predict the 8 node system will look... scary.  I might be able to round up a 2 node system before an 8 node system to evaluate.