Wednesday, February 13, 2013

MSSQL bpool disfavoring

I expect trace flag 8015 + trace flag 8048 to be an all around win for the systems I work with, when on NUMA servers. These systems have two primary workloads: regularly scheduled batch ETL (thousands of tables, concurrency managed by an external agent) and regularly scheduled report batches (thousands of reports, concurrency managed from report app servers).

Some simulated workflow testing, and live system observation, can confirm or negate my expectations. But messing with memory management and task scheduling with a single trace flag is a big deal. The main workflows are not the only important considerations. To etl and batch report, I'll add as important considerations, at least: dbcc checkdb. Index rebuilds. Statistics updates. Backups. Recovery. Restore. Tx logging.

In order to understand more fully the implications of trace flag 8015 on a NUMA system, I want to learn more about the management of the bpool.

Trace flag 8015 is expected to help in three ways:
1. More balanced scheduling of tasks across all schedulers.
2. More consistently balanced use of bpool across the server, instead of very imbalanced PLE and buffer churn across memory nodes.
3. Elimination of potential query memory inflation due to away buffer lists, leading to more consistent query memory use for a given query for trending, investigation, and optimization purposes.

The following blog post includes some points about bpool management that I haven't seen addressed anywhere else. (There's also mention in Bob Dorr's SQL server io presentation that the touch count algorithm previously used has been abandoned in favor of time-of-last-access; must be time of penultimate access for LRU-K/2 algorithm? Oracle still uses touch count, I believe.)

No questions on the following two matters, yet :)
-"bstat" in DBCC PAGE output in page header has one bit, that says if the page is disfavoured or not, but these bits are not documented
-DBCC CHECKDB disfavours pages read in the buffer pool
-sys.dm_db_index_physical_stats also uses disfavouring

But the following two points raise lots of questions.
##UPDATE 2/13/2013## I don't feel too badly about having lots of questions about the disfavor of scans based on size relative to the bpool.  Paul Randal posted about the disfvoring done by dbcc checkdb today, and mentioned that he hasn't reproduced the scan resulting in disfavoring yet.

-Lazywriter uses the BUF structure to know the last 2 times, when a page was accessed
-When a scan reads data of more than 10% of the buffer pool, the pages are disfavoured immediately

The questions so far:
*Does BUF structure track physical access times(such as physical write and readahead), or logical only?

*In Oracle there is a stat "prefetched blocks aged out before use". Under bpool pressure can SQL Server evict readahead blocks before they are used?

* On a NUMA server, does disfavor of a scan begin at 10% of relevant NUMA node buffer pool?

* How does this work with bpool ramp up?

* is the decision to disfavor made before the scan starts?

* Do all pages of the scan get disfavored, or only those after the 10% threshold is exceeded?

* if the first scan pages to be disfavored are those after the 10% threshold, are the pages already in the bpool 'retro-disfavored'?

No comments:

Post a Comment