Wednesday, August 31, 2016

Perfmon SSIS Counters missing? Try running as admin...

A colleague and I were hoping to review SSIS perfmon counters on a VM.  We use a logman command with a counters file to log perfmon to csv.

Opened up the csv that was captured on the VM... there were all of my typical SQL Server counters... but the following SSIS counters were missing.

\SQLServer:SSIS Service\SSIS Package Instances
\SQLServer:SSIS Pipeline\Buffer memory
\SQLServer:SSIS Pipeline\Buffers in use
\SQLServer:SSIS Pipeline\Buffers spooled
\SQLServer:SSIS Pipeline\Flat buffer memory
\SQLServer:SSIS Pipeline\Flat buffers in use
\SQLServer:SSIS Pipeline\Private buffer memory
\SQLServer:SSIS Pipeline\Private buffers in use
\SQLServer:SSIS Pipeline\Rows read
\SQLServer:SSIS Pipeline\Rows written


Got on the vm, and used:
typeperf -q | find "SSIS"

That command returned one line:
\SQLServer:SSIS Service 11.0\SSIS Package Instances


I recalled recently learning that the VMware perfmon counters require privilege to display.  As I prattled on about that, my colleague simply launched a cmd.exe as an administrator.

In that elevated cmd.exe, the previous typeperf command returned all of the SSIS counters we expected.

Except they were all versioned, so we updated the counters file like so:
\SQLServer:SSIS Service 11.0\SSIS Package Instances
\SQLServer:SSIS Pipeline 11.0\Buffer memory
\SQLServer:SSIS Pipeline 11.0\Buffers in use
\SQLServer:SSIS Pipeline 11.0\Buffers spooled
\SQLServer:SSIS Pipeline 11.0\Flat buffer memory
\SQLServer:SSIS Pipeline 11.0\Flat buffers in use
\SQLServer:SSIS Pipeline 11.0\Private buffer memory
\SQLServer:SSIS Pipeline 11.0\Private buffers in use
\SQLServer:SSIS Pipeline 11.0\Rows read
\SQLServer:SSIS Pipeline 11.0\Rows written

With the perf counters file updated, and executing the logman command as an administrator, all of the expected counters appeared in our csv.

Now I should point out: there are other reasons the counters may be missing.  Sometimes they need to be reloaded with a lodctr command, like the following post.

Perfmon Counters for SSIS Pipeline

But that'll have to be a story for another day.


Wednesday, August 24, 2016

Windows Guest: VMware Perfmon metrics intermittently missing?

I'm using logman to collect Windows, SQL Server, and VMware metrics in a csv file(30 second intervals).

Not sure what's causing the sawtooth pattern below.  I thought it was a timekeeping problem on the ESXi host, now not so sure.

See this Ryan Ries blog post for what may be a similar issue, caused by clock sync of guest with ESXi host.!
But that is also a reversal of the problem somewhat.  In that case, guest metrics were returning -1.  In my case, its the VMware metrics (passed through from the host) that are missing.

The same pattern for "Host processor speed in MHz" adds to the mystery.
If only "Effective VM Speed in MHz" was missing, I'd chalk it up to a possible arithmetic issue, with a negative denominator resulting from time skew between guest and host.
But... "Host processor speed in MHz" should be a constant 2600 in this case.  Maybe somehow its still calculated with a time interval, and time skew can screw it up?

For now, I've got a loop running on this VM logging local time, and using NET TIME to retrieve time from another VM as well.  That's turned up variations, but it seems to be variations of up to 5 seconds in contacting the other VM rather than large time skew between the VMs.

Guess I'll see what turns up...

So far this is the closest I've found.  Similar problem reported - no resolution.

"Windows VM perfmon counters - VM Processor"

Thursday, August 4, 2016

Modeling SQL Server Behavior for Performance Intervention: Feed the CPUs

Before recommending any changes on a SQL Server system, I like to collect a number of days worth of perfmon data.  -- I also like to ask if there are particular windows of time or particular queries that are performance concerns.  That can help prevent me from spending lots of time on time windows or queries that are considered low value :-) But that's a story for another day.

When reviewing perfmon data with the intent of finding system optimizations, I like to say:
"It helps to have expected models of behavior, and a theory of performance."

I don't say that just to sound impressive - honest (although if people tell me it sounds impressive I'll probably say it more frequently).  There are a lot of thoughts behind that, I'll try to unpack them over time.

When I work with SQL Server batch-controlled workflows, I use the theory "feed the CPUs".  That's the simplest positive adaptation I could come up with of Kevin Closson's paradigm "Everything is a CPU problem" :-)

What I mean by "Feed the CPUs" is that memory and disk response times are primary factors determining the maximum rate for the CPUs to process the data.  Nuts & bolts of such a model for SQL Server are slightly different than a similar model for Oracle.  SQL Server access to persistent data is always through database cache, while Oracle uses shared access to database cache in SGA and private access to persistent data through direct access in PGA.

Here's where I'll throw in a favorite quote.

"Essentiallyall models are wrongbut some are useful."
Box, George E. P.; Norman R. Draper (1987)
Empirical Model-Building and Response Surfaces, p. 424
Wiley ISBN 0471810339

The workflows I work with most often are quite heavy on disk access.  The direction of change of CPU utilization over time will usually match the direction of change in read bytes per second.  So if you want to improve performance - ie shorten the elapsed time of a batch workload on a given system - Feed the CPUs.  Increase average disk read bytes per second over the time window.

That's a very basic model.  It ignores concepts like compression.  In my case, all tables and indexes should be page compressed.  If not I'll recommend it :)  It kinda sidesteps portions of the workload that may be 100% database cache hit.  (Although I consider that included in the "usually" equivocation.)  It definitely doesn't account for portions of the workload where sorting or other intensive work (like string manipulation) dominates CPU utilization.  Still, a simple model can be very useful.

Here's a 2 hour time window from a server doing some batch ETL processing.  You can kinda see how CPU utilization trends upward as read bytes per second trends upward, right?  I've highlighted two areas with boxes to take a closer look.

Let's look at the 8:15 to 8:30 am window first.  By zooming in on just that window & changing the scale of the CPU utilization axis, the relationship between CPU utilization and read bytes/second is much easier to see.  That relationship - direction of change - is the same as other parts of the 2 hour window where the scale is different.  Depending on the complexity of the work (number of filters, calculations to perform on the data, etc) with the data, the amount of CPU used per unit of data will vary.  But the relationship of directionality* still holds.  I'm probably misusing the term 'directionality' but it feels right in these discussions :-)

All right.  Now what about the earlier time window?  I've zoomed in a bit closer and its 7:02 to 7:07 that's really interesting.

Assuming that SQL Server is the only significant CPU consumer, and assuming that that whole 5 minute time isn't dominated by spinlocks (which at ~40% CPU utilization doesn't seem too likely), the most likely explanation is simple: 5 minutes of 100% cache hit workload, no physical reads needed.  That can be supported by looking at perfmon counters for page lookups/second (roughly logical IO against the database cache) and the difference between granted memory and reserved memory (which is equal to the amount of used sort/hash memory at that time).

OK... well, that's all the time I want to spend justifying the model for now :-)

Here's the original graph again, minus the yellow boxes.  There's plenty of CPU available most of the time.  (No end users to worry about on the server at that time so we can push it to ~90% CPU utilization comfortably.) Plenty of bandwidth available.  If we increase average read bytes/sec we can increase average CPU utilization.  Assuming that the management overhead is insignificant (spinlocks, increased context switching among busy threads, etc), increasing average CPU utilization will result in a corresponding decrease in elapsed time for the work.

All right.  So next we can investigate how we'll increase read bytes per second on this system.

Ciao for now!

Tuesday, August 2, 2016

SSAS Multidimensional Cube Processing - Shrinking the unfamiliar window

Last week I posted some of my perfmon graphs from an SSAS server.  I want to model the work happening on an SSAS server during multidimensional cubes processing - both dimension and fact processing.

That's here:
SSAS Multidimensional Cube Processing - What Kind of Work is Happening?

There was a window at the end of the observation period where data was still flowing across the network, CPU utilization was still high enough to indicate activity, but the counters for rows read/converted/written/created per second were all zero.  The index rows/sec counter was also zero.

A colleague read my post, looked at my data... and pointed out:
"One of your graphs shows what was happening there!"

Sure enough, he was right.  Even though I time-shifted my graphs to somewhat obscure the data, he noticed what activity filled the gaps.  I'm gonna hafta work hard to make sure these guys don't lap me soon. :-)

In-memory file work, mainly for dimensions. Well, that shrinks the mystery window for me :-)

Let's look at the very tail end of the observation.

At the very tail end of the observation period, data stops coming over the network.  But CPU utilization increases. And CPU utilization drops off as "In-Memory Other File KB" drops off.

So this "other file" was somewhat expensive to handle, even though only for a limited time.

Wonder what that file was, since it wasn't a Dimension Index, Property, or String file?

By now you're probably wondering - why don't you just use profiler already? :-)

I'll get there - soon, probably.  But I'm trying to lay the groundwork for understanding system activity independent of knowing the individual cubes being processed.  My colleagues and I will likely be reviewing many perfmon captures across many cubes servers.  Some of the captures will almost certainly be started after cubes processing has begun.

So I'm hoping to get as far as possible from just the perfmon files, before returning for another source of information.

But certainly - being able to tie out system activity to process step beginning and end from profiler is preferable if in a position to influence cube design and development from the front end :-)