Wednesday, December 7, 2016

30,000 maximum non-indexed column stats in a SQL Server database? << Breakin' the law, Breakin' the law...

**Update 2016 12 07**
Turns out the per-table limit is 30,000 statistics :-)
See end of post for my boring test that showed this.
***** 

Current SQL Server documentation (as of 2016 December 7) indicates that the maximum number of non-indexed columns with statistics in a database is 30,000.

https://msdn.microsoft.com/en-us/library/ms143432.aspx






I suspect that is an old limitation that hasn't been updated.  (Or perhaps there's a wrinkle I haven't considered?)

I know of a pretty robust database :-) Let's check the version first, so my results are reproducible...

SELECT @@version AS SQLServer_version


SQLServer_version
Microsoft SQL Server 2014 - 12.0.4422.0 (X64)
    Jul 27 2015 16:56:19
    Copyright (c) Microsoft Corporation
    Enterprise Edition: Core-based Licensing (64-bit) on Windows NT 6.3 <X64> (Build 9600: )

That's SQL Server 2014 SP1 CU2.
So how many auto-created stats, and how many columns, are in the database?

SELECT
(SELECT COUNT(*)
 FROM sys.stats s
 JOIN sys.tables t ON t.object_id = s.object_id
 WHERE t.type_desc = 'USER_TABLE'
 AND t.is_ms_shipped = 0
 AND s.auto_created = 1
 AND NOT EXISTS (SELECT 1 FROM sys.indexes i
                 WHERE s.stats_id = i.index_id
                 AND s.object_id = i.object_id)
 ) AS user_table_auto_created_stats,
(SELECT COUNT(*)
 FROM sys.columns c
 JOIN sys.tables t ON c.object_id = t.object_id
 WHERE t.type_desc = 'USER_TABLE'
 AND t.is_ms_shipped = 0) AS user_table_columns;


user_table_auto_created_stats    user_table_columns
64765                            158796

Wow.  That's a lot more than the documented limit of 30,000 for stats on non-indexed columns.

I thought about it a bit... maybe some of those auto-created stats were on columns that aren't in the first ordinal position of an index - yet are still considered "indexed columns" for this purpose?

Modified the query a little, as below, to also exclude columns that appear anywhere in sys.index_columns - even as "include" columns.  That exclusion didn't decrease the count of auto-created stats in the database at all.  So it looks to me like the documented maximum is well below the actual maximum.  Maybe next year I'll poke around to find the actual maximum... it's not all that important to me at the moment, just an oddity I came across.

SELECT COUNT (*) AS user_table_auto_created_stats
FROM  sys.stats s
JOIN sys.tables t ON t.object_id = s.object_id
JOIN sys.stats_columns sc ON sc.object_id = s.object_id AND sc.stats_id = s.stats_id
WHERE t.type_desc = 'user_table'
AND t.is_ms_shipped = 0
AND s.auto_created = 1
AND NOT EXISTS (SELECT 1 FROM sys.indexes i 
                WHERE s.stats_id = i.index_id 
                AND s.object_id = i.object_id)
AND NOT EXISTS (SELECT 1 FROM sys.index_columns ic 
                WHERE s.stats_id = ic.index_id 
                AND s.object_id = ic.object_id
                AND sc.column_id = ic.column_id)


user_table_auto_created_stats
64765

***** Updated *****

A friend pointed out that the same reference indicates a maximum of 30,000 columns in a wide table.  That got me thinking - maybe 30,000 stats is a per-table maximum?

Not too hard to test.  Yep - limit per table.

PRINT @@version;
CREATE TABLE dbo.max_stats (bigint_1 BIGINT);
DECLARE @count INT = 1, @sqlText NVARCHAR(1000);
WHILE @count < 30100
BEGIN
    SET @sqlText = 'CREATE STATISTICS max_stats' + CONVERT(NVARCHAR(256), @count) + ' ON dbo.max_stats (bigint_1)';
    EXEC (@sqlText);
    SET @count = @count + 1;
END


Microsoft SQL Server 2014 - 12.0.4422.0 (X64)
    Jul 27 2015 16:56:19
    Copyright (c) Microsoft Corporation
    Enterprise Edition: Core-based Licensing (64-bit) on Windows NT 6.3 <X64> (Build 9600: )


Msg 1910, Level 16, State 1, Line 1
Could not create statistics 'max_stats30001' because it exceeds the maximum of 30000 allowed per table or view.


00:03:13 
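Since the limit is per table, a quick way to keep an eye on it is to count rows in sys.stats per table.  A minimal sketch - the 25,000 threshold is just an arbitrary warning level I picked:

-- tables approaching the 30,000 stats-per-table limit
SELECT t.name AS table_name, COUNT(*) AS stats_count
FROM sys.stats s
JOIN sys.tables t ON t.object_id = s.object_id
GROUP BY t.object_id, t.name
HAVING COUNT(*) > 25000
ORDER BY stats_count DESC;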

Wednesday, November 30, 2016

Three Minutes to compile a very simple #SQLServer Query plan?!?

If this issue isn't reproducible again tomorrow I'll be upset.  Hoping to finally get to the bottom of this.  Let me know if you have ideas to investigate.  My plan is to use trace flags 9292 and 9204 to find out which stats the optimizer is interested in.  Then I'll create a new, empty copy of table Fact, transplanting the stats to the empty table.  Hopefully compiling the query against the empty table will take just as long.  If I can reproduce the problem in a completely clean context I'll have the most options for collaborative investigation.
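For reference, here's a sketch of that first step - the same validation query that appears below, with the stats-related trace flags swapped in (TF 3604 redirects the trace flag output to the client):

SELECT  TOP 1 Id_1
FROM    Fact
WHERE   Category_1 = '55'
AND     Id_1 IS NOT NULL
AND     Bit = 1
AND     String_1 IS NOT NULL
OPTION (QUERYTRACEON 3604, --redirect trace flag output
        QUERYTRACEON 9292, --stats considered "interesting" by the optimizer
        QUERYTRACEON 9204, --stats actually loaded
        RECOMPILE);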

A very simple validation query is being issued (below) - a filtered index with included columns exists that satisfies all conditions with the first row returned.

And in fact execution time itself is very, very fast.  But... compile time is ridiculous: over 3 minutes.  Output from trace flag 8675 is below.  Time to compile was almost identical with and without trace flag 8757 added to the mix.

This system is SQL Server 2012 SP3. Because it's a production system with production data, no windbg.  But if the issue is present in a query against an empty table with transferred stats, the next step will be trying to port the empty table, stats, and problem to a dev instance :-)




SELECT  TOP 1 Id_1 
FROM    Fact 
WHERE   Category_1 = '55' 
AND     Id_1 IS NOT NULL 
AND     Bit = 1 
AND     String_1 IS NOT NULL
OPTION (QUERYTRACEON 3604, --redirect trace flag output
        QUERYTRACEON 8675, --optimizer stages & timing
        QUERYTRACEON 8757, --no early quit with trivial plan
        RECOMPILE)

End of simplification, time: 0 net: 0 total: 0 net: 0

end exploration, tasks: 36 no total cost time: 198.649 net: 195.236 total: 198.65 net: 195.237

end search(1),  cost: 0.0032832 tasks: 74 time: 0 net: 0 total: 198.65 net: 195.237

End of post optimization rewrite, time: 0 net: 0 total: 198.651 net: 195.237

End of query plan compilation, time: 0 net: 0 total: 198.651 net: 195.237

****

The elapsed time for the session was almost completely CPU time - not dominated by waits at all.  There was some other activity on the server at the time of observation, but not much.  Here are the spinlock deltas from measurements taken before and after the query was executed.

name                       collisions   spins        sleep_time   backoffs
SOS_BLOCKDESCRIPTORBUCKET  365291       893170987    827998       223884
SOS_OBJECT_STORE           117525       425010979    126664       90510
SOS_CACHESTORE_CLOCK       787          138375000    2319         584

That doesn't tell me much... since just about the only web search hits for SOS_BLOCKDESCRIPTORBUCKET are my own tweets wondering what they might mean :-)
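For anyone who wants to reproduce the capture: the deltas above came from a snapshot-and-diff approach.  A minimal sketch against sys.dm_os_spinlock_stats:

-- snapshot spinlock stats before the query
SELECT name, collisions, spins, sleep_time, backoffs
INTO #spin_before
FROM sys.dm_os_spinlock_stats;

-- ... run the query under investigation here ...

-- diff the current values against the snapshot
SELECT cur.name,
       cur.collisions - pre.collisions AS collisions_delta,
       cur.spins      - pre.spins      AS spins_delta,
       cur.sleep_time - pre.sleep_time AS sleep_time_delta,
       cur.backoffs   - pre.backoffs   AS backoffs_delta
FROM sys.dm_os_spinlock_stats cur
JOIN #spin_before pre ON pre.name = cur.name
WHERE cur.spins > pre.spins
ORDER BY spins_delta DESC;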


An idea occurred to me - maybe this resulted from an overly full plan cache hash bucket?

SELECT plans_in_bucket, COUNT(*) AS number_of_buckets
FROM
(
    SELECT bucketid, COUNT(*) AS plans_in_bucket
    FROM sys.dm_exec_cached_plans
    GROUP BY bucketid
) inr
GROUP BY plans_in_bucket
ORDER BY plans_in_bucket DESC;

plans_in_bucket    number_of_buckets
46                 1
39                 1
36                 1
34                 1
30                 1
24                 2
23                 1
22                 1
20                 2
19                 4
17                 2
16                 4
15                 7
14                 8
13                 15
12                 16
11                 34
10                 128
9                  264
8                  708
7                  1499
6                  3115
5                  5158
4                  7484
3                  8631
2                  7398
1                  4242

Not likely, with the fullest bucket holding only 46 plans.  Total cached plans: 140630.
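Another angle on the same question: the cache store hash tables can be examined directly.  A minimal sketch against sys.dm_os_memory_cache_hash_tables, filtered to the two plan cache stores:

-- bucket counts and lengths for the SQL and object plan cache stores
SELECT name, type, buckets_count, buckets_in_use_count,
       buckets_avg_length, buckets_max_length
FROM sys.dm_os_memory_cache_hash_tables
WHERE type IN ('CACHESTORE_SQLCP', 'CACHESTORE_OBJCP');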

Tuesday, November 1, 2016

A little bit of play when measuring #SQLServer Resource Governor resource pool reads vs LogicalDisk reads


I use perfmon a lot :-)
So I'm always looking for new patterns.

Today I just happened to graph these two perfmon metrics against each other.  Perfmon was collected with a 30 second interval here.

It's not too surprising that there is a difference.  But how could there be *more* read IOs per second for the default resource pool than the total for the server? (See just before 4:00 for a great example.)


Maybe if I sum the two metrics over the 4 hour period they'll even out?

23446639.19  << sum of LogicalDisk reads
22787670     << sum of default pool reads

Huh.  LogicalDisk reads are 2.89% higher than default pool reads.  Not too hard to believe - except for the small stretches where the default pool reads are significantly higher.

Oh well.

Works well for trending.  Not so well if you're trying to accurately count individual reads.
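For a cross-check outside of perfmon, the resource governor DMV exposes cumulative per-pool read counts (the IO columns appeared in SQL Server 2014).  A minimal sketch - cumulative since pool creation, so two snapshots are needed to derive a rate:

-- cumulative read IO per resource pool
SELECT name,
       read_io_completed_total,
       read_bytes_total
FROM sys.dm_resource_governor_resource_pools;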


Friday, October 28, 2016

SQL Server Memory Accounting with Perfmon

After learning to account for database memory with DMVs & DMFs, the numbers from perfmon can be downright confusing.  Lack of alignment with the DMV/DMF values notwithstanding*, I believe using perfmon to profile SQL Server resource utilization and response times can be extremely valuable.  For me, it's an indispensable tool.  Using a perfmon counters file and a bat script with a logman command, I can easily distribute a tool to collect metrics from a physical server or VM running SQL Server in 30 second intervals.  I wouldn't want to try to collect the same information in 30 second or even 1 minute intervals with SQL Agent or Service Broker.  Anyway...

Let's look at the total of SQL Server shared memory for SQL Server 2012 and beyond.  As I've mentioned before, the shared memory can be fully accounted for with database cache, stolen, and free categories.
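The same accounting can be pulled from T-SQL; a minimal sketch using the counters exposed in sys.dm_os_performance_counters - the first three values should sum to the fourth:

-- database cache + stolen + free = total server memory
SELECT counter_name, cntr_value AS kb
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%:Memory Manager%'
AND counter_name IN ('Database Cache Memory (KB)',
                     'Stolen Server Memory (KB)',
                     'Free Memory (KB)',
                     'Total Server Memory (KB)');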


By zooming in on the 108 GB to 124 GB range we can see that the sum of the three categories is equal to "Total Server Memory".


Let's look at stolen memory a bit.  The relationship between memory grants and stolen memory is probably the least intuitive.  Remember - if a query gets a memory grant, the grant happens at the beginning of query execution.  It's just a promise of sort/hash memory to be made available when the query needs it.  The granted memory isn't stolen immediately - rather, it's stolen in small allocations over time, as needed by the query.

In the graph immediately below, the outstanding grants are shown over time.  There are no pending grants during the observation period.  Granted memory and reserved memory are both shown as areas, with reserved memory in front of granted memory.  Granted memory is consistently greater than reserved memory (in this case, no resource pools have been added beyond the pre-existing default and internal pools).  This is how we can determine that the reserved memory is granted memory which hasn't been stolen yet.


Here is the difference between granted memory and reserved memory, graphed as a single-measure area graph below, with the number of outstanding grants in a line graph.
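That difference is easy to compute from T-SQL as well; a minimal sketch against the same Memory Manager counters:

-- used query memory = granted workspace memory - reserved memory
SELECT MAX(CASE WHEN counter_name = 'Granted Workspace Memory (KB)' THEN cntr_value END)
     - MAX(CASE WHEN counter_name = 'Reserved Server Memory (KB)'   THEN cntr_value END)
       AS used_query_memory_kb
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%:Memory Manager%';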

Now let's stack several memory components in an area graph in front of stolen memory.  Let's put in several perfmon metrics directly, as well as the difference between granted and reserved memory.



Notice that plan cache pages increase slowly throughout the capture period.  The amount of query memory used (granted kb - reserved kb) is more variable, and together with the other selected memory metrics, trends very well with total stolen memory.

There's definitely at least one significant stolen memory consumer I haven't identified, since up to 5 GB of stolen memory is currently unaccounted for.

Main things I wanted to show here:
-used query memory is accounted for in perfmon stolen memory
-although plan cache bloat is a fairly common topic, some workloads experience much more pressure from workspace memory than from the plan cache.

*as well as some other issues with perfmon that I'll detail at a later date

Tuesday, October 25, 2016

Harmonic mean of SQLOS Buffer Node PLEs on NUMA servers

By default, on a (v)NUMA server/vm, SQL Server will partition its shared memory resources by using a count of SQLOS memory nodes equal to the visible (v)NUMA nodes.

In that strategy, each SQLOS memory node gets its own IO completion port and its own lazy writer, and in SQL Server 2016 there will be one transaction log writer per node (up to 4, on (v/l)cpus 0-3 as needed).  Each SQLOS node gets a portion of the database cache, stolen memory, and free memory.
When pages read from disk are inserted into the bpool for a worker, they are inserted into the database cache for the SQLOS memory node associated with the compute node for the scheduler running the worker.
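As an aside, the per-node memory split can be seen from T-SQL.  A minimal sketch using sys.dm_os_memory_nodes (column names per SQL Server 2012 and later; the DAC shows up as memory_node_id 64):

-- memory committed per SQLOS memory node
SELECT memory_node_id,
       pages_kb,
       virtual_address_space_committed_kb
FROM sys.dm_os_memory_nodes;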

Put all that together: each SQLOS memory node may experience a different rate and footprint of pages read, first-writes to new database pages, steals against memory grants (and other steals), and memory freeing.  I don't know the formula used for PLE... but each of those factors is part of the calculation.  So... each SQLOS node has its own PLE, and these PLE values can vary greatly.
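The node-level PLEs are easy to pull from T-SQL as well as perfmon; a minimal sketch that returns one row per SQLOS buffer node, plus the instance-wide value from Buffer Manager:

-- PLE per buffer node, plus the overall Buffer Manager value
SELECT object_name, instance_name AS node, cntr_value AS ple_seconds
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy';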


Yet node-level PLE is a metric seldom checked.  Rather, the overall calculated PLE for the instance is the metric usually consulted.  Here's that metric added into the same graph.




I've been puzzling for quite a while over how the overall PLE is calculated.  It's fairly obvious it's not an arithmetic mean of the node PLEs - it varies gradually even as individual PLEs vary greatly.  I figured perhaps it was a weighted average, maybe with weight determined by the amount of database cache on each node.

But that could also be quite volatile, since especially in the case of large hashes, memory can be stolen against a grant very quickly - rapidly shrinking the amount of database cache on an individual SQLOS node.

Eventually this Paul Randal blog post was pointed out to me.  The post is originally from 2011, but after Matt Slocum pointed out that an arithmetic mean didn't fit the overall PLE, it was updated in 2015.

"The calculation is: add the reciprocals of (1000 x PLE) for each node, divide that into the number of nodes and then divide by 1000."

OK... let's plug that formula into Excel along with the data for the 4 SQLOS nodes and the overall PLE to see if it fits.



That's a pretty good fit.  The very first harmonic mean I calculated rounds to 142 rather than the 144 reported as the overall PLE.  That's not surprising - after seeing how good the fit is otherwise, I suspect it was a timing issue: volatility in the 000 and 001 node PLEs probably led to a slightly different value between the time the individual SQLOS node values were reported and the time, shortly after, when the overall PLE was reported.

A graph of my calculated harmonic mean and the overall PLE shows just how good the fit is...



That's good enough for me :-)

I'm glad that Paul Randal and Matt Slocum dug into this... one fewer question in my heap.







Thursday, October 6, 2016

Migration to pvscsi from LSI for SQL Server on VMware; It *really* matters

I expected it to make a big difference.  Even so, I'm still pleasantly surprised by how much of a difference it made.

About the VM:
8 vcpu system.
19 logicaldisks/physicaldisks other than the C: Windows install drive.
About the guest vdisks:
Each guest physicaldisk is its own datastore; each datastore is on a single ESXi host LUN.

On July 27th, the 19 SQL Server vdisks were distributed among 3 LSI vHBAs (with 1 additional LSI vHBA reserved for the C: install drive).

I finally caught back up with this system.  An LSI vHBA for the C: install drive has been retained, but the remaining 3 LSI vHBAs have been switched out for pvscsi vHBAs.

The nature of the workload is the same on both days, even though the amount of work done is different.  It's a concurrent ETL of many tables, with threads managed in a pool, and the pool size is constant between the two days.

Quite a dramatic change at the system level :-)

Let's first look at read behavior before and after the change.  I start to cringe when read latency for this workload is over 150 ms; 100 ms I *might* be able to tolerate.  After changing to the pvscsi vHBA, it looks very healthy at under 16 ms.


OK, what about write behavior?

Ouch!! The workload can tolerate up to 10 ms average write latency for a bit; 5 ms is the performance target.  With several measures above 100 ms write latency on July 28th, the system is at risk of transaction log buffer waits, SQL Server free list stalls, and more-painful-than-usual waits on tempdb.  But after the change to pvscsi, all averages are below 10 ms, with the majority of time below 5 ms.  Whew!
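Latency can also be cross-checked from inside SQL Server.  A minimal sketch using sys.dm_io_virtual_file_stats - cumulative since instance startup, so deltas between snapshots are needed to match a perfmon interval:

-- average read/write latency per database since startup
SELECT DB_NAME(vfs.database_id) AS database_name,
       SUM(vfs.io_stall_read_ms)  * 1.0 / NULLIF(SUM(vfs.num_of_reads), 0)  AS avg_read_ms,
       SUM(vfs.io_stall_write_ms) * 1.0 / NULLIF(SUM(vfs.num_of_writes), 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) vfs
GROUP BY DB_NAME(vfs.database_id);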



Looking at queuing behavior is the most intriguing :-) Maximum device and adapter queue depth is one of the most significant differences between the pvscsi and LSI vHBA adapters.  The pvscsi adapter allows increasing the maximum adapter queue depth from the default 256 all the way to 1024 (by setting a Windows registry parameter for "ringpages").  It also allows increasing device queue depth from the default 64 to 256 (although storport will pass no more than 254 requests at a time to the lower layer).  By contrast, LSI adapter and device queue depths are both lower, and no increase is possible.

It may be counter-intuitive unless you consider the nature of the measure (instantaneous) and the nature of what's being measured (outstanding disk IO operations at that instant).  But by using the vHBA with higher adapter and device queue depths (thus allowing a higher queue length from the application side), the measured queue length was consistently lower.  A *lot* lower. :-)



Wednesday, August 31, 2016

Perfmon SSIS Counters missing? Try running as admin...

A colleague and I were hoping to review SSIS perfmon counters on a VM.  We use a logman command with a counters file to log perfmon counters to csv.

Opened up the csv that was captured on the VM... there were all of my typical SQL Server counters... but the following SSIS counters were missing.

\SQLServer:SSIS Service\SSIS Package Instances
\SQLServer:SSIS Pipeline\Buffer memory
\SQLServer:SSIS Pipeline\Buffers in use
\SQLServer:SSIS Pipeline\Buffers spooled
\SQLServer:SSIS Pipeline\Flat buffer memory
\SQLServer:SSIS Pipeline\Flat buffers in use
\SQLServer:SSIS Pipeline\Private buffer memory
\SQLServer:SSIS Pipeline\Private buffers in use
\SQLServer:SSIS Pipeline\Rows read
\SQLServer:SSIS Pipeline\Rows written

Huh.

Got on the vm, and used:
typeperf -q | find "SSIS"

That command returned one line:
\SQLServer:SSIS Service 11.0\SSIS Package Instances

Huh.

I recalled recently learning that the VMware perfmon counters require elevated privileges to display.  As I prattled on about that, my colleague simply launched cmd.exe as an administrator.

In that elevated cmd.exe, the previous typeperf command returned all of the SSIS counters we expected.

Except they were all versioned, so we updated the counters file like so:
\SQLServer:SSIS Service 11.0\SSIS Package Instances
\SQLServer:SSIS Pipeline 11.0\Buffer memory
\SQLServer:SSIS Pipeline 11.0\Buffers in use
\SQLServer:SSIS Pipeline 11.0\Buffers spooled
\SQLServer:SSIS Pipeline 11.0\Flat buffer memory
\SQLServer:SSIS Pipeline 11.0\Flat buffers in use
\SQLServer:SSIS Pipeline 11.0\Private buffer memory
\SQLServer:SSIS Pipeline 11.0\Private buffers in use
\SQLServer:SSIS Pipeline 11.0\Rows read
\SQLServer:SSIS Pipeline 11.0\Rows written

With the perf counters file updated, and executing the logman command as an administrator, all of the expected counters appeared in our csv.

Now I should point out: there are other reasons the counters may be missing.  Sometimes they need to be reloaded with a lodctr command, as in the following post.

Perfmon Counters for SSIS Pipeline

But that'll have to be a story for another day.

Finé