If you're intrigued by this idea of determining which parallel query thread, at a given DOP, a value or values will be assigned to by hash partitioning, please also see this post by Joe Obbish.
A Serial Parallel Query
The SQL below will create a table with 2^(n-1) rows with num value n, for each integer n between 1 and 16. The total number of rows in the table is 2^16 - 1, or 65535.
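The row-count arithmetic behind that layout is easy to sanity check. Here's a quick Python sketch of the counts (just the arithmetic, not the T-SQL):

```python
# Row-count design of the table: num value n appears 2^(n-1) times,
# for n from 1 to 16.
counts = {n: 2 ** (n - 1) for n in range(1, 17)}
total = sum(counts.values())
print(total)  # 65535, i.e. 2^16 - 1
```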
This first draft code isn't elegant or very efficient. It took 8 minutes and 11 seconds to complete on the test vm. But a table with this structure should be very valuable in experiments to learn about hash distribution of rows among parallel threads. Originally I thought it might also lead to learning a bit about how the parallel page supplier works. (Not so much.)
Joe Obbish read my code above and had some comments:
- that code is so confusing to read
- if you change it like so, it'll finish in 2 seconds
- still not the way I'd probably do it if starting from scratch
Thanks, Joe!! Sure enough, I modified the code according to his suggestions and 65535 rows were created in 2 seconds. So I increased the expected total number of rows from 65535 to 131071 which is where I wanted to be anyway. That took four seconds. To create 131071 rows with my original code would have taken... a *lot* longer. 😁
After Joe's help the code looked like this.
Much faster, much nicer. So here's the idea: since each integer value n from 1 to 17 is represented in the table with 2^(n-1) rows, when rows are distributed to parallel threads by hash, the number of rows each thread handles will tell me which values went to that thread, too.
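Put another way: because the per-value row counts are distinct powers of two, a thread's total row count decomposes uniquely back into the values it received. A little Python sketch of that decoding (the 98304-row input is a made-up example of mine, not a number from any plan in this post):

```python
def values_from_rowcount(rowcount: int, max_value: int = 17) -> list[int]:
    # Each value n contributes exactly 2^(n-1) rows, so the per-value
    # counts are distinct powers of two. A thread's total row count has
    # a unique binary decomposition: bit (n-1) set means value n was
    # hashed to that thread.
    return [n for n in range(1, max_value + 1) if rowcount & (1 << (n - 1))]

# Hypothetical example: a thread reporting 98304 rows must have
# received values 16 and 17 (32768 + 65536 rows).
print(values_from_rowcount(98304))  # [16, 17]
```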
Will this also allow me to make observations about the parallel page supplier? The threads of a parallel scan get their page sets from the parallel page supplier... but there's not much information beyond that publicly available.
Later on I thought "hey, I'll save myself some thought if I do this in base 10 rather than base 2." I also wanted to see how Joe Obbish would have approached creating this type of table from scratch. I asked, and he taught me a new trick (which I'll likely have to keep coming back to here otherwise I won't remember how it worked).
This isn't exactly how Joe did it, but it's as close as I can get from memory.
This query shows how it works.
So here's how it comes together to make my table for exploring.
That created 111111 rows in 47 seconds. Let's check up on it... how many data pages?
111,111 data pages - good.
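The base-10 layout follows the same pattern as the base-2 one - value n appears 10^(n-1) times, for n from 1 to 6 - so the 111111 total checks out. A quick Python check of the arithmetic (not the T-SQL):

```python
# Base-10 version of the table: num value n appears 10^(n-1) times,
# for n from 1 to 6.
counts = {n: 10 ** (n - 1) for n in range(1, 7)}
print(sum(counts.values()))  # 111111
```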
OK, now let's have some fun.
Let's take a look at the spread of rows/pages in that top right clustered index scan. Thread 1 handled 55366 rows and thread 2 handled 55745. That's a pretty even split of work between the threads, and that's what we want out of the storage engine scan: split the work fairly evenly among worker threads.
What about the clustered index scan right below it? That's about a 90-10 split. The storage engine's parallel page supplier isn't dividing pages up for the threads based on values in the clustered index - neither split shows the kind of numbers needed for the distribution to align with my crafty set of data. Figuring out how the parallel page supplier comes up with sets of pages will take more investigation than this post can afford, since it'll require at least one separate investigation method. I'll have to explore the parallel page supplier more another time.
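One way to see why neither split lines up with the data: a thread count that aligned with whole values would have to equal the sum of some subset of the per-value counts 1, 10, ..., 100000 - that is, a number whose decimal digits are all 0s and 1s. A quick Python sketch of that check (my own, for illustration):

```python
from itertools import combinations

# Per-value row counts in the base-10 table: 1, 10, ..., 100000.
per_value = [10 ** n for n in range(6)]

def is_value_aligned(rowcount: int) -> bool:
    # True only if the count equals the sum of some subset of the
    # per-value counts, i.e. the thread got whole values only.
    return any(sum(subset) == rowcount
               for r in range(1, len(per_value) + 1)
               for subset in combinations(per_value, r))

print(is_value_aligned(55366))   # False - the scan split isn't value-aligned
print(is_value_aligned(110000))  # True - values 5 and 6 together
```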
On to a peek at how rows are distributed among parallel threads by a hash function. The Repartition Streams operator details indicate a hash is used to distribute the rows. Thread 1 handled almost 100 times as many rows as thread 2. Based on the number of rows of this table each thread handled, we can tell that thread 1 handled the rows with values 5 and 6 while thread 2 handled the rows with values 1, 2, 3, and 4.
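The arithmetic behind that deduction: values 5 and 6 account for 110000 rows, values 1 through 4 for 1111, a roughly 99:1 split - hence "almost 100 times as many rows."

```python
# Per-value row counts in the base-10 table.
counts = {n: 10 ** (n - 1) for n in range(1, 7)}
thread1 = counts[5] + counts[6]                # values 5 and 6
thread2 = sum(counts[n] for n in range(1, 5))  # values 1 through 4
print(thread1, thread2)          # 110000 1111
print(round(thread1 / thread2))  # 99 - "almost 100 times"
```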
Here's what distribution of rows looked like at that operator for each DOP from 2 to 6. I'm intrigued, to say the least. Especially interesting to me is that DOP 5 and DOP 6 each had only 4 threads active. But the 4 active threads had different values going to them at DOP 5 and DOP 6.
All right. The role of the fluff was to help explore the parallel page supplier. I won't get anywhere with that tonight, so let's cut the fluff. GTFOH - Get the fluff out of here! 😁 And let's go back to base 2 - it'll allow me to work more values into a given number of rows. Might as well page compress them, too. This got me 262143 rows in 30 seconds.
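A quick check on that total: 262143 is 2^18 - 1, so the base-2 table now covers values 1 through 18.

```python
# Back to base 2: value n appears 2^(n-1) times, now for n from 1 to 18.
total = sum(2 ** (n - 1) for n in range(1, 19))
print(total)  # 262143, i.e. 2^18 - 1
```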
Pretty tiny table, though. There's room for more - this can only get more interesting with more values in the table. But, I'll stick with this for now.
Here's the query I'm using. At DOP 2, the Repartition Streams operator highlighted before isn't present, so for now I'll look at DOP 3 to DOP 12.
For DOP 3 to DOP 7...
And for DOP 8 to DOP 12...
Ciao for now!