Friday, November 22, 2013

Multiple tempdb data files for performance - What could go wrong?

Sure enough, there can be trouble with too many tempdb files on a single LUN.  I am seeing it more and more often these days.  Believe me, it's none too easy to convince folks to change when, for the most part, the most respected names in SQL Server performance talk frequently about multiple tempdb data files, but almost never about using multiple LUNs to host those files.
Let's dive into that, at least for a quick lap.

OK.  Let's start with a reasonable measuring stick for whether or not there is a performance problem.  I'll use Paul Randal's recent categorization of disk latency (though for DSS/data warehouse systems I typically categorize IO performance in a different manner).
http://www.sqlskills.com/blogs/paul/are-io-latencies-killing-your-performance/

    Excellent: < 1ms
    Very good: < 5ms
    Good: 5 – 10ms
    Poor: 10 – 20ms
    Bad: 20 – 100ms
    Shockingly bad: 100 – 500ms
    WOW!: > 500ms
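Those buckets are easy to encode.  Here's a minimal Python sketch (my own helper, not from Paul's post); note that perfmon reports Avg. Disk sec/Read in seconds, so convert to milliseconds before bucketing:

```python
def latency_bucket(avg_disk_sec):
    """Map a perfmon 'Avg. Disk sec/Read' sample (reported in seconds)
    to one of the latency buckets above."""
    ms = avg_disk_sec * 1000.0  # perfmon reports seconds; buckets are in ms
    if ms < 1:
        return "Excellent"
    if ms < 5:
        return "Very good"
    if ms <= 10:
        return "Good"
    if ms <= 20:
        return "Poor"
    if ms <= 100:
        return "Bad"
    if ms <= 500:
        return "Shockingly bad"
    return "WOW!"

# A 150 ms sample lands squarely in 'Shockingly bad'
print(latency_bucket(0.150))
```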

Ok... here's the system: an HP DL380 with two 3.07GHz hex-core Xeon X5675 processors, just shy of 96GB RAM, and two dual-port Emulex HBAs.
The OS is Windows Server 2008 R2 SP1, and the database version is Microsoft SQL Server 2008 R2 (SP2).

Details as they appear in a smash-n-grab...

****
OS Name:                   Microsoft Windows Server 2008 R2 Enterprise
OS Version:                6.1.7601 Service Pack 1 Build 7601

Total Physical Memory:     98,294 MB

System Manufacturer:       HP
System Model:              ProLiant DL380 G7
System Type:               x64-based PC
Processor(s):              2 Processor(s) Installed.
                           [01]: Intel64 Family 6 Model 44 Stepping 2 GenuineIntel ~3066 Mhz

Intel(R) Xeon(R) CPU           X5675  @ 3.07GHz

Emulex AJ763A/AH403A, PCI Slot 2, Storport Miniport Driver 
Emulex AJ763A/AH403A, PCI Slot 2, Storport Miniport Driver                         
Emulex AJ763A/AH403A, PCI Slot 3, Storport Miniport Driver 
Emulex AJ763A/AH403A, PCI Slot 3, Storport Miniport Driver
****

This is not a monster system by any means.  The storage is pretty standard: virtual pools on an EMC VNX.

The file layout includes 24 tempdb files - one for each logical processor (hyperthreading is enabled).  There are 12 tempdb files on each of two LUNs.

U:\tempdb\tempdb.mdf
U:\tempdb\Tempdev_2.ndf
...
...
U:\tempdb\Tempdev_12.ndf
V:\tempdb\Tempdev_13.ndf
...
...
V:\tempdb\Tempdev_24.ndf
W:\MSSQL\Log\templog.ldf



Perfmon data was collected from the system with a 15 second interval.
Let's tackle an easy question first.  On this system, is it accurate to say that 'most of tempdb is in the buffer pool'?  Let's use these perfmon counters to determine that.
\\SASQ1024\SQLServer:Buffer Manager\Database pages
\\SASQ1024\SQLServer:Transactions\Free Space in tempdb (KB)


During the observed window, tempdb free space ranges from 172 GB to 320 GB.  So at a minimum, the high-water mark of the tempdb data footprint was 148 GB during this timeframe.
At the same time, bpool database pages ranged from 38.3 GB to 58.8 GB.  Even if the bpool database pages contained only tempdb, the pool would hold less than half of the high-water mark of the tempdb data footprint.
So, on this system disk concerns for tempdb can be a significant performance factor.
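For anyone who wants the arithmetic behind that claim, here it is with the observed figures hard-coded (a sketch of the reasoning, not a general-purpose script):

```python
# Observed perfmon figures from the window above, in GB
tempdb_free_max = 320.0   # Free Space in tempdb, high point
tempdb_free_min = 172.0   # Free Space in tempdb, low point
bpool_pages_max = 58.8    # Buffer Manager Database pages, high point

# Free space dropped by 148 GB, so at least that much tempdb data
# existed on disk at the low point.
tempdb_footprint_min = tempdb_free_max - tempdb_free_min

# Even if every buffer pool page were a tempdb page, the pool could
# cover well under half of that footprint.
coverage = bpool_pages_max / tempdb_footprint_min
print(f"tempdb footprint >= {tempdb_footprint_min:.0f} GB; "
      f"bpool could cover at most {coverage:.0%} of it")
```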

How busy is this system?  Not so busy, according to these perfmon metrics.
\\SASQ1024\Processor(_Total)\% Processor Time   
\\SASQ1024\SQLServer:Buffer Manager\Page lookups/sec
 

So, the server is not too busy.  How do read and write latencies look?
   
\\SASQ1024\LogicalDisk(U:)\Avg. Disk sec/Read
\\SASQ1024\LogicalDisk(V:)\Avg. Disk sec/Read


Uh-oh.  The server is not very busy, but it's spending a lot of time above the 'Poor' threshold, with lots of 'Shockingly bad' values and some 'WOW!'.

Let's try to correlate the high read latency with other perfmon metrics, in order to find out how to help.

\\SASQ1024\LogicalDisk(U:)\Disk Reads/sec
\\SASQ1024\LogicalDisk(V:)\Disk Reads/sec
\\SASQ1024\LogicalDisk(U:)\Disk Read Bytes/sec   
\\SASQ1024\LogicalDisk(V:)\Disk Read Bytes/sec

Hmmm... so read throughput, whether operations or bytes, doesn't seem to correlate well with the spikes in latency.

What about write throughput?

\\SASQ1024\LogicalDisk(U:)\Disk Writes/sec
\\SASQ1024\LogicalDisk(V:)\Disk Writes/sec
\\SASQ1024\LogicalDisk(U:)\Disk Write Bytes/sec
\\SASQ1024\LogicalDisk(V:)\Disk Write Bytes/sec

So, writes and write bytes to tempdb seem to correlate pretty well with the spikes in read and write latency.  Interesting.  Why would writes interfere with read latency in such a strongly correlated way?
  
The 'Current Disk Queue Length' values give us a clue, although they don't explain the whole story.  And neither will I.  Not today.  I'm sleepy.
\\SASQ1024\LogicalDisk(U:)\Current Disk Queue Length
\\SASQ1024\LogicalDisk(V:)\Current Disk Queue Length

I'll end for now by repeating my recommendations for tempdb, which I recently gave out on the #sqlhelp hashtag. 

Go ahead and use multiple tempdb files (a reasonable number) - but one file per LUN.  This helps:
1. limit IO weaving
2. limit filesystem fragmentation when files expand
3. limit the contention for the disk queue when multiple files on a single LUN are hot

Consider using trace flag 1117 (all database files in the same filegroup grow together) and trace flag 1118 (no mixed extents).

Increase the maximum transfer size on the fibre channel HBA from the Windows default of 512 KB to 2 MB.  This is good for all kindsa reasons: fewer round trips on the data network when SQL Server wants a big read, and when larger reads are common there will be a lower disk queue length for the same total amount of data, etc.
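Back-of-the-envelope arithmetic for that queue-length point (a sketch; the 512 MB burst is just an illustrative number, not from this system):

```python
total_bytes = 512 * 1024**2           # a hypothetical 512 MB read burst

# Windows default HBA max transfer vs. the raised 2 MB setting
for xfer_kb in (512, 2048):
    ios = total_bytes // (xfer_kb * 1024)
    print(f"{xfer_kb:>4} KB transfers -> {ios} I/Os for the same 512 MB")
```

Fewer, larger I/Os for the same bytes means fewer requests sitting in the disk queue at any moment.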

OK... Be well, y'all.

2 comments:

  1. Very interesting. How was the issue resolved? Did the recommendations help?
  2. I'll post a followup in the future. Increasing the maximum transfer size had the intended effect of lowering the IOPS and current disk queue length across the board. But along the way we've realized there is a yet-undiagnosed issue limiting overall throughput to the server as well. Even though the full bandwidth at the host HBAs and at the array front-end adapter ports should allow for 1.6 GB/second, the server never gets above 800 MB/second, even though there is CPU to spare, queue lengths are long (although shorter than before), and read/write latency is still well below the targets.
    So we opted not to change the tempdb configuration yet - once the bandwidth issue is addressed we can re-baseline. Then comparing that baseline against new perfmon data after the tempdb config has changed will show the benefit of the change. If we changed the tempdb config while bandwidth was constrained, part of the benefit would be obscured: removing the bandwidth bottleneck later would yield a performance gain from the combined effect of increased bandwidth and a more favorable tempdb config, and it would be very hard to attribute the gain to the individual changes.