Thursday, September 17, 2015

Planning for VMware database OIO

I've mentioned before that planning for database disk io is a negotiation-ideally among application owners, DBA, and storage administrators (additional stakeholders may be virtualization admins or server administrators).
This blog post will demonstrate why I think of this optimization as a negotiation.

A bunch of thoughts about the blog post below, which I'll clean up and then reference in the comments there.  Maybe even add pictures?

I don't want this to be recieved as a rant, moreso a method for planning to accommodate expected levels of database outstanding io (OIO) on VMware systems.


Disk queues always fill from application/database level down-eg guest LUN => guest vHBA adapter => host LUN => host HBA adapter.

Since there is only 1 guest LUN for the database, and maximum guest LUN service queue depth is 254, that's the max oio that LUN will pass to the guest adapter*, which will get passed to the host LUN, and on to the host adapter.  Additional slots at lower levels won't be used.  The 512 command per host LUN limit leads to lots of empty slots if the guest can only send 254.  Add another guest LUN on the host LUN, and max OIO for both guest LUNs becomes 508, a better match.

*But* in the article EMC VNX & VMAX storage are mentioned. For the VNX host LUN queue depth should be limited: if using virtual provisioning no greater than 32 to prevent QFULL messages.  Otherwise, (14*data disks)+32 is the VNX max LUN queue depth to avoid QFULL messages.

The VMAX is a bit more generous. 64 is commonly used as a LUN queue depth, and presented LUNs are guaranteed resources for a queue depth of 64.  Resources can be borrowed to serve a queue length up to 384 if the queue depth was set higher than 64.

So - it's not just the front end port max aggregate queue depth that needs considered in planning, but host LUN max queue depth as well. Hitachi VSP is another example: recommended max host LUN queue depth of 32.

Application side determines io needs, and can design from top down.  Disk Io needs can be expressed in terms of peak read & write bytes/sec, iops, outstanding io with a duration of peak.  Further characterization can come from descriptions of burstiness, sequential/random, and distribution of IO sizes.

On the hardware side, design from bottom up.  Read IOPs & read bytes/sec are important considerations for determining a sufficient number of HDDs or other devices to meet spec. 

Write bytes/sec is used to validate the amount of SAN write cache available.  Rate of destaging that write cache is dependent on how many devices underneath, RAID CPU and write amplification factors, and how many read cache misses are satisfied while write cache is destaging.

Total front end IOPs is used to validate sufficiency of front end CPU.

Total bytes/sec(read and write) is used to validate storage network bandwidth end-to-end.  

The host LUN queue depth is based on array considerations.
The amount of oio the application needs determines the number of host LUNs.  Guest LUNs are in the same count, with a guest LUN queue depth that agrees with the host LUN queue depth.
Now, present as many host LUNs (with guest LUNs on top) to meet the outstanding IO requirement.  If outstanding IO requirement is 512 and LUN queue depth is 32, a minimum of 16 LUNs are needed.

*An extra consideration, especially for Windows guests: Windows has an aggregate limit per Windows PhysicalDisk of 256 for wait queue and service queue.  Once (service queue length + wait queue length) is 256, additional threads that want to submit io must sleep until a queue slot opens for them.





Monday, September 14, 2015

I got a hunch about SQL Server Fragmentation Part V

Picking up from previous work in a recent post, which can be found here.

I got a hunch about SQL Server Fragmentation Part IV
http://sql-sasquatch.blogspot.com/2015/09/i-got-hunch-about-sql-server_14.html

Lets create two more tables, with the same structure as sq_CI.


CREATE TABLE [dbo].[sq_CI_2]
            ([ID_INT] [int] NOT NULL,
             [VARCHAR_4096] [varchar](4096) NULL
             CONSTRAINT [PK_sq_CI_2] 
   PRIMARY KEY CLUSTERED([ID_INT] ASC) ON [PRIMARY]
             ) ON [PRIMARY];
CREATE NONCLUSTERED INDEX nci_sq_CI_2 
       ON sq_CI_2 ( ID_INT ASC ) INCLUDE ( VARCHAR_4096 ); 

CREATE TABLE [dbo].[sq_CI_3]
            ([ID_INT] [int] NOT NULL,
             [VARCHAR_4096] [varchar](4096) NULL
             CONSTRAINT [PK_sq_CI_3] 
   PRIMARY KEY CLUSTERED([ID_INT] ASC) ON [PRIMARY]
             ) ON [PRIMARY];
CREATE NONCLUSTERED INDEX nci_sq_CI_3 
       ON sq_CI_3 ( ID_INT ASC ) INCLUDE ( VARCHAR_4096 );

K. Now lets populate them.


INSERT INTO sq_CI_2 SELECT * FROM sq_CI ORDER BY ID_INT;
INSERT INTO sq_CI_3 SELECT * FROM sq_CI ORDER BY ID_INT;

How do they look?


SELECT   object_name(pstats.object_id) AS tname,
         index_type_desc,index_level,
         avg_fragment_size_in_pages,
         fragment_count,page_count,
         avg_fragmentation_in_percent 
FROM     sys.dm_db_index_physical_stats(db_id(),NULL,NULL,NULL,'DETAILED') AS pstats
WHERE    pstats.object_id IN (object_id('sq_CI'),object_id('sq_CI_2'),object_id('sq_CI_3'))
ORDER BY pstats.object_id ;



I guess that's expected. The original table sq_CI is still optimized from previous work. The new tables sq_CI_2 and sq_CI_3 look pretty good  - but not as good as sq_CI.

Lets look a little closer, though, with Merrill Aldrich's File Layout Viewer*.
Table sq_CI (below) looks as expected - a large, mainly contiguous chunk for the clustered index PK_sq_CI, and another large, mainly contiguous chunk for nonclustered index NCI_sq_CI.


What about tables sq_CI_2 and sq_CI_3?  Here's what sq_CI_2 looks like in the viewer.  The more blue-based of the shades is the clustered index - several large chunks for the 4088 rows/pages.  The more red-based shade is the nonclustered index, with the same contiguity and interleaving as the clustered index. Table sq_CI_3 looks the same - is no other activity in this SQL Server instance, there aren't any other data structures to become interleaved with other than those I am creating, and there's enough contiguous free space in the single mdf file to allow SQL Server its first choice for extents and pages as growth occurs.


That's actually kinda boring, though.  I'm really interested in what happens if the two full-table inserts are happening concurrently.


TRUNCATE TABLE sq_CI_2;
TRUNCATE TABLE sq_CI_3;

So I'll use a .bat script to execute sqlcmd, with "START /b" so that a single script can launch both inserts nearly simultaneously**.

Contents of my .bat script (local to the SQL Server).

START /b sqlcmd -E -S localhost -d sq_E_single -Q "INSERT INTO sq_CI_2 SELECT * FROM sq_CI ORDER BY ID_INT"
START /b sqlcmd -E -S localhost -d sq_E_single -Q "INSERT INTO sq_CI_3 SELECT * FROM sq_CI ORDER BY ID_INT"


After executing the .bat script, let's take a look with Merrill Aldrich's viewer (can't recommend this tool* enough for exploration of fragmentation considerations).  Yeah, that's what I wanted to show.


What does sys.dm_db_index_physical_stats have to say?


SELECT   object_name(pstats.object_id) AS tname,
         index_type_desc,index_level,
         avg_fragment_size_in_pages,
         fragment_count,page_count,
         avg_fragmentation_in_percent 
FROM     sys.dm_db_index_physical_stats(db_id(),NULL,NULL,NULL,'DETAILED') AS pstats
WHERE    pstats.object_id IN (object_id('sq_CI'),object_id('sq_CI_2'),object_id('sq_CI_3'))
ORDER BY pstats.object_id ;




So... avg_fragmentation_in_percent for sq_CI_2 and sq_CI_3 - the same as it was earlier. 
The avg_fragmentation_in_percent value is so low at ~0.0245% that I don't think anyone would do anything about it.

But fragment_count and avg_fragment_size_in_pages hint at the physical interleaving that is graphically displayed by the viewer.

What happens with a fullscan of sq_CI_3 clustered index (since it has the most reported fragments)?


IF OBJECT_ID('tempdb..#blackhole') IS NOT NULL DROP TABLE #blackhole 
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
SELECT num_of_reads,num_of_bytes_read from sys.dm_io_virtual_file_stats(DB_ID(),1);
SET STATISTICS IO ON;
SELECT * INTO #blackhole FROM sq_CI_3 WHERE varchar_4096 like '%b%'
SET STATISTICS IO OFF;
SELECT num_of_reads,num_of_bytes_read from sys.dm_io_virtual_file_stats(DB_ID(),1);

Well, first of all I shouldn't assume the clustered index will be used :-) In this case, although I can't be sure its why the optimizer chose the NCI, a good reason would be expected lower IO cost due to fewer fragments for full scan.


Of course, I didn't think of that until after I ran the query :-) So here's what sys.dm_io_virtual_file_stats and STATISTICS IO had to say.

320 reads, 33742848 bytes read => 4119 eight kb database pages read.  2 of those reads were counted by STATISTICS IO as physical reads, the remaining 318 reads reported by sys.dm_io_virtual_file_stats included the 4103 read-ahead pages. Remember that the optimized clustered index AND nonclustered index of sq_CI was shown to do full scan in 68 reads, 33619968 bytes read - 4104 8k pages.  This nonclustered index is using 252 more reads (and reading 15 more 8k pages) to scan the same logical contents.

We were coming for the clustered index, though.


IF OBJECT_ID('tempdb..#blackhole') IS NOT NULL DROP TABLE #blackhole 
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
SELECT num_of_reads,num_of_bytes_read from sys.dm_io_virtual_file_stats(DB_ID(),1);
SET STATISTICS IO ON;
SELECT * INTO #blackhole FROM sq_CI_3 WITH (INDEX(pk_sq_CI_3)) WHERE varchar_4096 like '%b%'
SET STATISTICS IO OFF;
SELECT num_of_reads,num_of_bytes_read from sys.dm_io_virtual_file_stats(DB_ID(),1);

Check the estimated plan for a clustered index scan cuz I don't wanna make the same mistake twice.


Good to go.  Execute the SQL batch, check the results.



338 reads, 33742848 bytes = 4119 eight kb pages.  Very similar to the results from the nonclustered index, but requiring an additional 18 reads (which I'm fairly certainly were all to cover the read-ahead pages, even though both clustered and nonclustered index full scans used read-ahead for 4103 pages.

So, there you have it.  In a single database file, populating these two tables in parallel rather than one-at-a-time resulted in 252 (NCI) or 270 (CI, slightly less contiguous than NCI) more reads than the optimized sq_CI table.  The 15 more eight kb pages read by the non-optimized full scans in my opinion pales in comparison to the inflation of read operations.

There will be more fun, I promise!!  We've still gotta see what happens when we try to correct fragmentation in a "mature" database file, gotta see the effects of multiple files in a filegroup, and once we've got multiple files in the filegroup can't stay away from a -E startup option showdown!  (I'm getting there, @DBArgenis, its just taking me a while :-})


*The SQL File Layout Viewer v1.2 can be found at the link below. Its good stuff. Mainly for educational purposes, in current form not suitable for large databases.
http://sqlblog.com/blogs/merrill_aldrich/archive/2013/10/09/sql-file-layout-viewer-1-2.aspx
**Additional information on "start" command.
https://technet.microsoft.com/en-us/library/bb491005.aspx




I got a hunch about SQL Server fragmentation Part IV

In my previous post, I used a rather small table.  Clustered index containing 7 rows, 7 data pages + 1 index page.  Optimally 1 extent, though not always placed that way. Nonclustered index also of 8 total pages.  Optimally 1 extent, but not always placed that way.

You can find that adventure here.
I got a hunch about SQL Server fragmentation Part III
http://sql-sasquatch.blogspot.com/2015/09/i-got-hunch-about-sql-server_13.html 
If you want to continue after this post, keep going here.
I got a hunch about SQL Server fragmentation Part V
http://sql-sasquatch.blogspot.com/2015/09/i-got-hunch-about-sql-server_79.html

Let's crank it up a notch, shall we?


TRUNCATE TABLE sq_CI
SET NOCOUNT ON
DECLARE @string VARCHAR(4096) = 'AA', @maxVal INT = 4088;
DECLARE @ij     INT           = 1   , @jk     INT = 12; 
--stuff @string to 4096 characters
WHILE @jk > 1
      BEGIN
            SELECT @string = STUFF(@string,2,0,@string);
            SELECT @jk = @jk - 1;
      END ;

WHILE @ij < @maxVal+1
      BEGIN
            INSERT INTO sq_CI VALUES (@ij, @string); 
            SELECT @ij=@ij+1;
      END ;


SELECT object_name(pstats.object_id) AS tname,
       index_type_desc,index_level,
       avg_fragment_size_in_pages,
       fragment_count,page_count,
       avg_fragmentation_in_percent 
FROM   sys.dm_db_index_physical_stats(db_id(),NULL,NULL,NULL,'DETAILED') AS pstats
WHERE  pstats.object_id IN (object_id('sq_CI'))


4088 data pages may seem like a mighty odd choice, especially given the total number of allocated pages at 4103. But I got my reasons :-)

First note that the avg_fragmentation_in_percent is equal to 14/4088 - the number of level 1 index pages divided by the number of level 0 or data pages.

15 fragments for almost 32mb isn't bad, but I think I can do better.


ALTER INDEX PK_sq_CI ON sq_CI
REBUILD
WITH 
(SORT_IN_TEMPDB = ON,
 MAXDOP = 1,
 ONLINE = ON);

SELECT object_name(pstats.object_id) AS tname,
       index_type_desc,index_level,
       avg_fragment_size_in_pages,
       fragment_count,page_count,
       avg_fragmentation_in_percent 
FROM   sys.dm_db_index_physical_stats(db_id(),NULL,NULL,NULL,'DETAILED') AS pstats
WHERE  pstats.object_id IN (object_id('sq_CI'))



Yep, that's even better... at least according to what's reported in sys.dm_db_index_physical_stats. I've never been one to stop at "better". After playing around a bit, I found that my best bet for lowest reported fragmentation in a clustered index is to rebuild it without the online option.





USE sq_E_single
ALTER INDEX PK_sq_CI ON sq_CI
REBUILD
WITH 
(SORT_IN_TEMPDB = ON,
 MAXDOP = 1);
 --ONLINE = ON);

SELECT object_name(pstats.object_id) AS tname,
       index_type_desc,index_level,
       avg_fragment_size_in_pages,
       fragment_count,page_count,
       avg_fragmentation_in_percent 
FROM   sys.dm_db_index_physical_stats(db_id(),NULL,NULL,NULL,'DETAILED') AS pstats
WHERE  pstats.object_id IN (object_id('sq_CI'))



I've got to admit, I like the looks of that. Exactly 4096 pages. 0% reported avg_fragmentation_in_percent! Pretty cool.

The reported 0% fragmentation surprised me - I was expecting at least 7/4088 or  .17%.  Something interesting is happening here.

Well - how about a full scan?


IF OBJECT_ID('tempdb..#blackhole') IS NOT NULL DROP TABLE #blackhole 
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
SELECT num_of_reads,num_of_bytes_read from sys.dm_io_virtual_file_stats(DB_ID(),1);
SET STATISTICS IO ON;
SELECT * INTO #blackhole FROM sq_CI WHERE varchar_4096 like '%b%';
SET STATISTICS IO OFF;
SELECT num_of_reads,num_of_bytes_read from sys.dm_io_virtual_file_stats(DB_ID(),1);






Not bad. 68 reads, 33619968 bytes read - 4104 8k pages.
2 physical reads reported by STATISTICS IO, leaving 66 read operations to cover the 4088 read-ahead pages.

Now, if I were really an expert I'd go straight to my "gaps and islands" query of sys.dm_db_database_page_allocations to show each contiguous range of pages and reconstruct the reads.  I'd probably also use procmon to capture each read as an individual call with file offset.

But its late and I'm not all that good at those things.  Maybe 3 or 4 posts from now :-)

Lets add a nonclustered index.


CREATE NONCLUSTERED INDEX nci_sq_CI 
       ON sq_CI ( ID_INT ASC ) INCLUDE ( VARCHAR_4096 ) 
       WITH ( SORT_IN_TEMPDB = ON, MAXDOP = 1, ONLINE = ON );
SELECT object_name(pstats.object_id) AS tname,
       index_type_desc,index_level,
       avg_fragment_size_in_pages,
       fragment_count,page_count,
       avg_fragmentation_in_percent 
FROM   sys.dm_db_index_physical_stats(db_id(),NULL,NULL,NULL,'DETAILED') AS pstats
WHERE  pstats.object_id IN (object_id('sq_CI'));


Pretty good. Not as good as the clustered index. What if we create the index without the online option?


DROP INDEX sq_CI.nci_sq_CI;
CREATE NONCLUSTERED INDEX nci_sq_CI 
       ON sq_CI ( ID_INT ASC ) INCLUDE ( VARCHAR_4096 ) 
       WITH ( SORT_IN_TEMPDB = ON, MAXDOP = 1 );
SELECT object_name(pstats.object_id) AS tname,
       index_type_desc,index_level,
       avg_fragment_size_in_pages,
       fragment_count,page_count,
       avg_fragmentation_in_percent 
FROM   sys.dm_db_index_physical_stats(db_id(),NULL,NULL,NULL,'DETAILED') AS pstats
WHERE  pstats.object_id IN (object_id('sq_CI'));


Sweet!

Now if we do a full scan of the nonclustered index?


IF OBJECT_ID('tempdb..#blackhole') IS NOT NULL DROP TABLE #blackhole 
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
SELECT num_of_reads,num_of_bytes_read from sys.dm_io_virtual_file_stats(DB_ID(),1);
SET STATISTICS IO ON;
SELECT * INTO #blackhole FROM sq_CI WITH (INDEX(nci_sq_CI)) WHERE varchar_4096 like '%b%' --NCI fullscan
SET STATISTICS IO OFF;
SELECT num_of_reads,num_of_bytes_read from sys.dm_io_virtual_file_stats(DB_ID(),1);


68 reads. 33619968 bytes read. 2 physical reads reported by Statistics IO, leaving 66 reads to cover the 4088 read-ahead pages reported.  Exactly the same for the nonclustered index as for the clustered index.

So - for today we've seen that the ONLINE option for CREATE INDEX and ALTER INDEX REBUILD can have a slight effect on fragmentation as reported by sys.dm_db_index_physical_stats.  If a clustered index is rebuilt without ONLINE = ON, it may have a little bit of fragmentation which may be eliminated by a rebuild with ONLINE = ON.  OTOH, creating a nonclustered index seems to be most optimal without ONLINE = ON.

And we've seen that its possible to have a 32 mb clustered index AND a 32 mb nonclustered index which both report 0% average fragmentation via  sys.dm_db_index_physical_stats. 

Sunday, September 13, 2015

I got a hunch about SQL Server fragmentation Part III

In my last post about SQL Server fragmentation, I showed that
avg_fragmentation_in_percent from sys.dm_db_index_physical_stats can report ~14% fragmentation for a clustered index with 7 data pages that is properly ordered and within a single extent.  For the table/clustered index I used as an example, there is no operation that will remove the "fragmentation" reported.

You can see me previous musing here
I got a hunch about SQL Server fragmentation Part II
http://sql-sasquatch.blogspot.com/2015/09/i-got-hunch-about-sql-server.html
If you're seeking more after this post, you can find the next one at
I got a hunch about SQL Server fragmentation Part IV
http://sql-sasquatch.blogspot.com/2015/09/i-got-hunch-about-sql-server_14.html

 

What about a non-clustered index? I'll create one that's patently ridiculous - a nonclustered index that is redundant to the clustered index, including the varchar column. I'm using sort_in_tempdb to eliminate the possibility that intermediate sort runs are injecting fragmentation. MAXDOP is set to 1 in the index creation, to eliminate the possibility of fragmentation due to contention during allocation (my tests are the only activity in this instance of SQL Server).


CREATE NONCLUSTERED INDEX nci_sq_CI 
       ON sq_CI (ID_INT ASC)
       INCLUDE ( varchar_4096 )
       WITH ( SORT_IN_TEMPDB = ON, MAXDOP = 1 );

SELECT object_name(pstats.object_id) AS tname,
       index_type_desc,index_level,
       avg_fragment_size_in_pages,
       fragment_count,page_count,
       avg_fragmentation_in_percent 
FROM   sys.dm_db_index_physical_stats(db_id(),NULL,NULL,NULL,'DETAILED') AS pstats
WHERE  pstats.object_id IN (object_id('sq_CI'))



That makes sense.  I've constructed the clustered index and the nonclustered index so they have the same contents, in the same order.

So full scans of the CI and the NCI should look the same - same number of bytes read, same number of reads.  Let's check it out.


IF OBJECT_ID('tempdb..#blackhole') IS NOT NULL DROP TABLE #blackhole 
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
SET STATISTICS IO ON;
SELECT num_of_reads,num_of_bytes_read from sys.dm_io_virtual_file_stats(DB_ID(),1);
SELECT * INTO #blackhole FROM sq_CI WITH (INDEX(pk_sq_CI)) WHERE varchar_4096 like '%b%' --CI fullscan
SELECT num_of_reads,num_of_bytes_read from sys.dm_io_virtual_file_stats(DB_ID(),1);
SET STATISTICS IO OFF;


Full scan of the clustered index: 1 read, 65536 bytes.  IO statistics agrees.



OK, now a full scan of the nonclustered index. Isn't this just glorious fun?


IF OBJECT_ID('tempdb..#blackhole') IS NOT NULL DROP TABLE #blackhole 
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
SET STATISTICS IO ON;
SELECT num_of_reads,num_of_bytes_read from sys.dm_io_virtual_file_stats(DB_ID(),1);
SELECT * INTO #blackhole FROM sq_CI WITH (INDEX(nci_sq_CI)) WHERE varchar_4096 like '%b%' --NCI fullscan
SELECT num_of_reads,num_of_bytes_read from sys.dm_io_virtual_file_stats(DB_ID(),1);
SET STATISTICS IO OFF;
 
 

Well now!!  That's peculiar!  More peculiar than I can dig into for today.  3 reads and 180224 read bytes reported by sys.dm_io_virtual_file_stats. Note IO statistics read-ahead count 14 is higher than logical read count 9.  No time for that today :-)

Read-ahead count 14 is also by itself higher than the 3 reads reported by sys.dm_io_virtual_file_stats. Only got time today to give a quick explanation, not demonstrate: in IO statistics, read-ahead reads is a count of pages that were read by read-ahead - not a count of the read operations that were performed.  And the read operations which handled the read-ahead reads are not in the physical read count reported by IO statistics - they are in addition to that count.  So sys.dm_io_virtual_file_stats reported 3 reads: one read accounted for by IO statistics physical reads = 1, and two of those reads accomplished the read-ahead of the 14 pages reported by IO statistics as read-ahead reads = 14.

OK.  Let's go to sys.dm_db_database_page_allocations to figure this out.


SELECT 
extent_page_id,allocated_page_page_id,is_allocated,
page_type_desc,page_level,next_page_page_id,previous_page_page_id
FROM sys.dm_db_database_page_allocations(DB_ID(),object_id('sq_CI'),2,NULL,'DETAILED')
WHERE page_type_desc <> 'IAM_PAGE'


So page 664 is the first page in an extent, and its the level 1 page.  The 7 level 0 pages are in logical order:
624=>632=>633=>634=>635=>636=>637
The first page is by itself in an extent.  The next 6 level 0 pages are in the 3rd extent.
3 extents.  3 reads.  Same reported avg_fragmentation_in_pct as the clustered index with all 7 data pages and an index page in a single 64kb extent - against which a fullscan required 1 read of 65536 bytes.

Turns out in this case, sys.dm_db_index_physical_physical_stats alone doesn't help predict the difference in behavior for the nonclustered index and the clustered index.

Now what happens when I rebuild the NCI?


ALTER INDEX NCI_sq_CI ON sq_CI
REBUILD
WITH 
(SORT_IN_TEMPDB = ON,
 MAXDOP = 1,
 ONLINE = ON);

IF OBJECT_ID('tempdb..#blackhole') IS NOT NULL DROP TABLE #blackhole 
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
SET STATISTICS IO ON;
SELECT num_of_reads,num_of_bytes_read from sys.dm_io_virtual_file_stats(DB_ID(),1);
SELECT * INTO #blackhole FROM sq_CI WITH (INDEX(nci_sq_CI)) WHERE varchar_4096 like '%b%' --NCI fullscan
SELECT num_of_reads,num_of_bytes_read from sys.dm_io_virtual_file_stats(DB_ID(),1);
SET STATISTICS IO OFF;


All right!  1 read now for the full scan, 65536 bytes.  But, what about sys.dm_db_index_physical_stats?


SELECT object_name(pstats.object_id) AS tname,
       index_type_desc,index_level,
       avg_fragment_size_in_pages,
       fragment_count,page_count,
       avg_fragmentation_in_percent 
FROM   sys.dm_db_index_physical_stats(db_id(),NULL,NULL,NULL,'DETAILED') AS pstats
WHERE  pstats.object_id IN (object_id('sq_CI'))


Oh.  So rebuilding the NCI got all of the pages into a single extent.  Full scan of the NCI dropped from 3 physical reads for a total of 180224 bytes to 1 read for a total of 65536 bytes.  But sys.dm_db_index_physical_stats gives no hint of that change in read behavior - it looks exactly as it did before.

On the other hand, sys.dm_db_database_page_allocations shows what happened with the index rebuild.


SELECT 
extent_page_id,allocated_page_page_id,is_allocated,
page_type_desc,page_level,next_page_page_id,previous_page_page_id
FROM sys.dm_db_database_page_allocations(DB_ID(),object_id('sq_CI'),2,NULL,'DETAILED')
WHERE page_type_desc <> 'IAM_PAGE'


OK, now I really have to go for today...

Saturday, September 12, 2015

I got a hunch about SQL Server fragmentation Part II

Time to talk about another hunch I had regarding SQL Server fragmentation.
Previous musing here.
I got a hunch about SQL Server fragmentation Part I
http://sql-sasquatch.blogspot.com/2015/08/i-got-hunch-about-sql-server.html
And if you are hungering for more after this, please proceed to
I got a hunch about SQL Server fragmentation Part III
http://sql-sasquatch.blogspot.com/2015/09/i-got-hunch-about-sql-server_13.html

I'm working in a newly created test database, which has a single database file.

I'll start by creating my favorite (for now) type of table - two columns with one an integer clustered index key, and the second a 4096 character VARCHAR.  Because the clustered index is not compressed, each row in this table will be 1 page.

CREATE TABLE [dbo].[sq_CI]
            ([ID_INT] [int] NOT NULL,
             [VARCHAR_4096] [VARCHAR](4096) NULL
             CONSTRAINT [PK_sq_CI] 
               PRIMARY KEY CLUSTERED([ID_INT] ASC) ON [PRIMARY]
             ) ON [PRIMARY];

Important to disclose that in the SQL Server instance for my testing, trace flag 1118 is enabled globally.
You can read more about trace flag 1118 in Paul Randal's post below.

Misconceptions around TF 1118
http://www.sqlskills.com/blogs/paul/misconceptions-around-tf-1118/

Lets populate the table, and grab fragmentation information.


DECLARE @string VARCHAR(4096) = 'AA', @maxVal INT = 7;
DECLARE @ij     INT           = 1   , @jk     INT = 12; 
--stuff @string to 4096 characters
WHILE @jk > 1
      BEGIN
            SELECT @string = STUFF(@string,2,0,@string);
            SELECT @jk = @jk - 1;
      END ;

WHILE @ij < @maxVal+1
      BEGIN
            INSERT INTO sq_CI VALUES (@ij, @string); 
            SELECT @ij=@ij+1;
      END ;

SELECT object_name(pstats.object_id) AS tname,
       index_type_desc,index_level,
       avg_fragment_size_in_pages,
       fragment_count,page_count,
       avg_fragmentation_in_percent 
FROM   sys.dm_db_index_physical_stats(db_id(),NULL,NULL,NULL,'DETAILED') AS pstats
WHERE  pstats.object_id IN (object_id('sq_CI'))


The result:


Huh. 14% fragmentation, 2 fragments.  I thought it'd be better than that.

What happens if we checkpoint(to write out any dirty database pages), dropcleanbuffers and do a full tablescan?

CHECKPOINT;
DBCC DROPCLEANBUFFERS;
SET STATISTICS IO ON;
SELECT num_of_reads,num_of_bytes_read from sys.dm_io_virtual_file_stats(DB_ID(),1);
SELECT * FROM sq_CI where varchar_4096 like '%b%'
SELECT num_of_reads,num_of_bytes_read from sys.dm_io_virtual_file_stats(DB_ID(),1);
SET STATISTICS IO OFF;
The results:

1 Read. 65536 bytes.  That's one extent.  How can two fragments be read in one 64kb read?
Maybe sys.dm_db_database_page_allocations can help clear the matter up?


This information makes it clear that all of the DATA_PAGE pages are in the same 64kb extent.
In fact, the physical order of DATA_PAGE pages is in agreement with the logical order of DATA_PAGE pages:
78168=>78170=>78171=>78172=>78173=>78174=>78175
So the ordering of the clustered index DATA_PAGE pages is good - in fact it can't be made any better. (Dare ya to try :-})
 All of DATA_PAGE pages can be retrieved in a single disk read.
So why does sys.dm_db_index_physical_stats report over 14% fragmentation?
Why does sys.dm_db_index_physical_stats report 2 fragments, when all pages can be retrieved in one read?

I suspect an algorithm error.  My guess is that the algorithm is looking at the next logical DATA_PAGE for each DATA_PAGE, and comparing to the next ALLOCATED_PAGE_PAGE_ID that IS_ALLOCATED for the Index Id. By that algorithm, the placement of INDEX_PAGE 78169 divides the 7 DATA_PAGE pages incorrectly into two "fragments".

Instead the algorithm should look at the next logical DATA_PAGE for each DATA_PAGE, and compare to the next ALLOCATED_PAGE_PAGE_ID that IS_ALLOCATED for the Index Id for DATA_PAGE page_type_desc.

That's a subtle distinction that would make ~14% "fragmentation" magically disappear in this case.

That's it for now I guess...