Monday, September 14, 2015

I got a hunch about SQL Server fragmentation Part IV

In my previous post, I used a rather small table.  Clustered index containing 7 rows, 7 data pages + 1 index page.  Optimally 1 extent, though not always placed that way. Nonclustered index also of 8 total pages.  Optimally 1 extent, but not always placed that way.

You can find that adventure here.
I got a hunch about SQL Server fragmentation Part III
http://sql-sasquatch.blogspot.com/2015/09/i-got-hunch-about-sql-server_13.html 
If you want to continue after this post, keep going here.
I got a hunch about SQL Server fragmentation Part V
http://sql-sasquatch.blogspot.com/2015/09/i-got-hunch-about-sql-server_79.html

Let's crank it up a notch, shall we?


TRUNCATE TABLE sq_CI
SET NOCOUNT ON
DECLARE @string VARCHAR(4096) = 'AA', @maxVal INT = 4088;
DECLARE @ij     INT           = 1   , @jk     INT = 12; 
--stuff @string to 4096 characters
WHILE @jk > 1
      BEGIN
            SELECT @string = STUFF(@string,2,0,@string);
            SELECT @jk = @jk - 1;
      END ;

WHILE @ij < @maxVal+1
      BEGIN
            INSERT INTO sq_CI VALUES (@ij, @string); 
            SELECT @ij=@ij+1;
      END ;


SELECT object_name(pstats.object_id) AS tname,
       index_type_desc,index_level,
       avg_fragment_size_in_pages,
       fragment_count,page_count,
       avg_fragmentation_in_percent 
FROM   sys.dm_db_index_physical_stats(db_id(),NULL,NULL,NULL,'DETAILED') AS pstats
WHERE  pstats.object_id IN (object_id('sq_CI'))


4088 data pages may seem like a mighty odd choice, especially given the total number of allocated pages at 4103. But I got my reasons :-)

First note that the avg_fragmentation_in_percent is equal to 14/4088 - the number of level 1 index pages divided by the number of level 0 or data pages.

15 fragments for almost 32mb isn't bad, but I think I can do better.


ALTER INDEX PK_sq_CI ON sq_CI
REBUILD
WITH 
(SORT_IN_TEMPDB = ON,
 MAXDOP = 1,
 ONLINE = ON);

SELECT object_name(pstats.object_id) AS tname,
       index_type_desc,index_level,
       avg_fragment_size_in_pages,
       fragment_count,page_count,
       avg_fragmentation_in_percent 
FROM   sys.dm_db_index_physical_stats(db_id(),NULL,NULL,NULL,'DETAILED') AS pstats
WHERE  pstats.object_id IN (object_id('sq_CI'))



Yep, that's even better... at least according to what's reported in sys.dm_db_index_physical_stats. I've never been one to stop at "better". After playing around a bit, I found that my best bet for lowest reported fragmentation in a clustered index is to rebuild it without the online option.





USE sq_E_single
ALTER INDEX PK_sq_CI ON sq_CI
REBUILD
WITH 
(SORT_IN_TEMPDB = ON,
 MAXDOP = 1);
 --ONLINE = ON);

SELECT object_name(pstats.object_id) AS tname,
       index_type_desc,index_level,
       avg_fragment_size_in_pages,
       fragment_count,page_count,
       avg_fragmentation_in_percent 
FROM   sys.dm_db_index_physical_stats(db_id(),NULL,NULL,NULL,'DETAILED') AS pstats
WHERE  pstats.object_id IN (object_id('sq_CI'))



I've got to admit, I like the looks of that. Exactly 4096 pages. 0% reported avg_fragmentation_in_percent! Pretty cool.

The reported 0% fragmentation surprised me - I was expecting at least 7/4088 or  .17%.  Something interesting is happening here.

Well - how about a full scan?


IF OBJECT_ID('tempdb..#blackhole') IS NOT NULL DROP TABLE #blackhole 
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
SELECT num_of_reads,num_of_bytes_read from sys.dm_io_virtual_file_stats(DB_ID(),1);
SET STATISTICS IO ON;
SELECT * INTO #blackhole FROM sq_CI WHERE varchar_4096 like '%b%';
SET STATISTICS IO OFF;
SELECT num_of_reads,num_of_bytes_read from sys.dm_io_virtual_file_stats(DB_ID(),1);






Not bad. 68 reads, 33619968 bytes read - 4104 8k pages.
2 physical reads reported by STATISTICS IO, leaving 66 read operations to cover the 4088 read-ahead pages.

Now, if I were really an expert I'd go straight to my "gaps and islands" query of sys.dm_db_database_page_allocations to show each contiguous range of pages and reconstruct the reads.  I'd probably also use procmon to capture each read as an individual call with file offset.

But its late and I'm not all that good at those things.  Maybe 3 or 4 posts from now :-)

Lets add a nonclustered index.


CREATE NONCLUSTERED INDEX nci_sq_CI 
       ON sq_CI ( ID_INT ASC ) INCLUDE ( VARCHAR_4096 ) 
       WITH ( SORT_IN_TEMPDB = ON, MAXDOP = 1, ONLINE = ON );
SELECT object_name(pstats.object_id) AS tname,
       index_type_desc,index_level,
       avg_fragment_size_in_pages,
       fragment_count,page_count,
       avg_fragmentation_in_percent 
FROM   sys.dm_db_index_physical_stats(db_id(),NULL,NULL,NULL,'DETAILED') AS pstats
WHERE  pstats.object_id IN (object_id('sq_CI'));


Pretty good. Not as good as the clustered index. What if we create the index without the online option?


DROP INDEX sq_CI.nci_sq_CI;
CREATE NONCLUSTERED INDEX nci_sq_CI 
       ON sq_CI ( ID_INT ASC ) INCLUDE ( VARCHAR_4096 ) 
       WITH ( SORT_IN_TEMPDB = ON, MAXDOP = 1 );
SELECT object_name(pstats.object_id) AS tname,
       index_type_desc,index_level,
       avg_fragment_size_in_pages,
       fragment_count,page_count,
       avg_fragmentation_in_percent 
FROM   sys.dm_db_index_physical_stats(db_id(),NULL,NULL,NULL,'DETAILED') AS pstats
WHERE  pstats.object_id IN (object_id('sq_CI'));


Sweet!

Now if we do a full scan of the nonclustered index?


IF OBJECT_ID('tempdb..#blackhole') IS NOT NULL DROP TABLE #blackhole 
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
SELECT num_of_reads,num_of_bytes_read from sys.dm_io_virtual_file_stats(DB_ID(),1);
SET STATISTICS IO ON;
SELECT * INTO #blackhole FROM sq_CI WITH (INDEX(nci_sq_CI)) WHERE varchar_4096 like '%b%' --NCI fullscan
SET STATISTICS IO OFF;
SELECT num_of_reads,num_of_bytes_read from sys.dm_io_virtual_file_stats(DB_ID(),1);


68 reads. 33619968 bytes read. 2 physical reads reported by Statistics IO, leaving 66 reads to cover the 4088 read-ahead pages reported.  Exactly the same for the nonclustered index as for the clustered index.

So - for today we've seen that the ONLINE option for CREATE INDEX and ALTER INDEX REBUILD can have a slight effect on fragmentation as reported by sys.dm_db_index_physical_stats.  If a clustered index is rebuilt without ONLINE = ON, it may have a little bit of fragmentation which may be eliminated by a rebuild with ONLINE = ON.  OTOH, creating a nonclustered index seems to be most optimal without ONLINE = ON.

And we've seen that its possible to have a 32 mb clustered index AND a 32 mb nonclustered index which both report 0% average fragmentation via  sys.dm_db_index_physical_stats. 

No comments:

Post a Comment