Monday, August 25, 2014

NTFS cache & SQL Server 2014 perfmon Resource Pool Stats (*)\Disk Read Bytes/sec

I like using SQL Server backup to null device to characterize read throughput of a system.

Some background reading on the idea.
BACKUP DATABASE TO DISK = N'NUL' – and misconceptions
http://sqlservermct.wordpress.com/2011/10/07/backup-database-to-disk-n%E2%80%98nul%E2%80%99-%E2%80%93-and-misconceptions/

Here's me performing a backup to NUL on one of the performance test rigs while the main RDBMS performance testing engineer is away at VMWorld - sssshhhhh :-)





In this case the database data footprint is 2.24 TB nearly evenly split over 8 LUNs and 8 equally sized data files, as detailed above.

568 MB/sec average over 1 hour 8 minutes 59 seconds isn't too bad on this server, really.  The SAN storage limits this server to 800 MB/sec of bandwidth anyway.

Still... I always look over time graphs when I want to learn something.  And my only real plan for job security is: try to learn things really, really fast :-)

Here's something pretty interesting.  Perfmon yields a lot of useful information for Resource Governor Pools.  You can separate some disk IO measurements at the Resource Governor level with counters such as these:
\SQLServer:Resource Pool Stats(*)\Disk Read IO/sec   
\SQLServer:Resource Pool Stats(*)\Disk Read Bytes/sec   
\SQLServer:Resource Pool Stats(*)\Avg Disk Read IO (ms)

And many DBAs are excited about the ability to throttle SQL Server IO with Resource Governor:
\SQLServer:Resource Pool Stats(*)\Disk Read IO Throttled/sec

I've got my reasons to be wary of IO throttling at the database level... for now I'll let the perfmon capture illustrate one of the reasons.

During the 'Arbor' database backup to NUL, no other significant activity was taking place within SQL Server or on this server.  Since 800 mb/sec was previously established as disk bandwidth upper bound on this server, "\SQLServer:Resource Pool Stats(internal)\Disk Read Bytes/sec" peak values nearly double the bandwidth limit may seem out of place.






 However, if you are familiar with the ideas discussed in the following blog posts, you may have an idea how the server is seemingly exceeding its upper bound for disk throughput.
http://www.oraclerealworld.com/oracle-physical-io-not-always-physical/
http://blog.delphix.com/mhayward/2013/02/21/lies-damned-lies-and-io-statistics/

The use of filesystem cache can skew SQL Server database IO numbers just as it can Oracle IO numbers.  The following perfmon counters quantify the read bytes & read operations satisfied from NTFS filesystem cache.
\System\File Read Bytes/sec   
\System\File Read Operations/sec

Notice in the graph above that "\LogicalDisk(_Total)\Disk Read Bytes/sec" remains well below 800 mb/sec.  But if those values have the values for "\System\File Read Bytes/sec" stacked on top... the result is "\SQLServer:Resource Pool Stats(internal)\Disk Read Bytes/sec".

The graph immediately below shows the same to be true for read operations as well as read bytes/second.  There was a bit of an outlier for a few minutes at noon - lots of file read operations that aren't accounted for in the Resource Pool stats.   If I were desperate I'd dig into the data to find out what happened - maybe startup of a third party monitoring agent, maybe a scheduled job outside of SQL Server.  If I were dishonest I'd just remove the outlier and smooth the data :-) But its pretty easy to see that this is really how these counters fit together... with some variance introduced by activity across SQL Server Resource Governor Resource Pools and potentially even outside SQL Server itself.



Notice that at peak, about half of the IOPs accounted for in the Resource Pool Stats are actually satisfied from memory.

Hmmm...



Satisfying the read into the buffer pool from filesystem cache rather than from the SAN is much faster.  So much faster that in this case, where 50% of read activity comes from filesystem cache, the read latency reported by the Resource Pool is actually only *half* the read latency from storage!

I'll follow up on this in the future; remaining questions:
1. Are reads from filesystem cache counted by Resource Governor against IOPs limits?  I imagine they are... but at the same time that means that admins may be in part limiting access to filesystem cache when they want to limit access to storage.
2. Are the read latencies reported in the virtual file stats DMV also representative of both "physical reads" and "filesystem cache reads"?

Keep lookin',
keep learnin'
...