Which Compute ID for me?

The first time you look at some compute code, you need to work out what each thread is going to do. Since everything is driven from the system IDs passed to the entry point, you need to know what each one means. Later, when you’re writing your own compute code, you need to remember the names and values of those system IDs. Every time, I end up opening the ID3D11DeviceContext::Dispatch() page to get the pretty/confusing diagram, and then I’m still challenged to work out which one I need. Not any more! Here’s what you need based on what you’re doing:

1D Processing

  • Use uint SV_DispatchThreadID/S_DISPATCH_THREAD_ID to index an element in a 1D array.
  • Use uint SV_GroupIndex/S_GROUP_INDEX (SV_GroupThreadID/S_GROUP_THREAD_ID holds the same value in the 1D case) to index within the group – maybe not for sharing between threads, but you could use LDS as a per-thread value cache.
  • Use uint SV_GroupID/S_GROUP_ID to know which group of 64 you’re in – for example, if you wanted to do a reduction.

For example, assume we have N elements to process. We’ll handle 64¹ at a time with thread groups defined as numthreads(64,1,1), which requires a group count of (N+63)/64 (integer division, so N is rounded up to a whole number of groups) and a dispatch(groupCount, 1, 1). Here is a visual concept of what that means:

[Figure: 1D Dispatch IDs]
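To make the 1D numbers concrete, here is a minimal CPU-side sketch in C++ that models what each thread would see. It is not shader code: the names `Ids1D`, `GroupCount1D` and `EnumerateIds1D` are made up for illustration, and on the GPU these values simply arrive through the system-value semantics.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical CPU-side model of a 1D dispatch, for illustration only.
constexpr uint32_t kGroupSize = 64; // matches numthreads(64,1,1)

struct Ids1D {
    uint32_t groupId;          // SV_GroupID.x
    uint32_t groupThreadId;    // SV_GroupThreadID.x
    uint32_t groupIndex;       // SV_GroupIndex (equals groupThreadId in 1D)
    uint32_t dispatchThreadId; // SV_DispatchThreadID.x
};

// Round up so a partial final group is still dispatched.
constexpr uint32_t GroupCount1D(uint32_t n) {
    return (n + kGroupSize - 1) / kGroupSize;
}

// Enumerate the IDs every thread would receive for
// dispatch(GroupCount1D(n), 1, 1).
std::vector<Ids1D> EnumerateIds1D(uint32_t n) {
    std::vector<Ids1D> ids;
    for (uint32_t g = 0; g < GroupCount1D(n); ++g) {
        for (uint32_t t = 0; t < kGroupSize; ++t) {
            ids.push_back({g, t, t, g * kGroupSize + t});
        }
    }
    return ids;
}
```

With N = 100 this gives 2 groups and 128 threads, so any thread whose dispatch thread ID is 100 or more falls past the end of the data and should skip its work.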

2D Processing

  • Use uint SV_GroupIndex/S_GROUP_INDEX to index linearly within the group for LDS access.
  • Use uint2 SV_GroupThreadID/S_GROUP_THREAD_ID to index the pixel in the tile.
  • Use uint2 SV_DispatchThreadID/S_DISPATCH_THREAD_ID to index pixel/texel.
  • Use uint2 SV_GroupID/S_GROUP_ID to index a matching tile in metadata (assuming 8×8).

Here we’ll consider a 2D array of dimensions W by H. It is split into 8×8 tiles with numthreads(8,8,1), which means we have (W+7)/8 tiles in X and (H+7)/8 tiles in Y, and we start the shader with dispatch(tilesX, tilesY, 1). In the case of a 16×16 array (or 2×2 tiles), we get these values:

[Figure: 2D Dispatch IDs]
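The same idea in 2D, again as a rough CPU-side C++ sketch rather than shader code. `UInt2`, `Ids2D`, `TileCount` and `IdsForPixel` are hypothetical names for illustration; the function works backwards from a pixel coordinate to the IDs its thread would receive with numthreads(8,8,1).

```cpp
#include <cstdint>

// Hypothetical stand-in for HLSL's uint2.
struct UInt2 { uint32_t x, y; };

constexpr uint32_t kTile = 8; // matches numthreads(8,8,1)

struct Ids2D {
    UInt2 groupId;          // SV_GroupID.xy: which tile
    UInt2 groupThreadId;    // SV_GroupThreadID.xy: pixel within the tile
    uint32_t groupIndex;    // SV_GroupIndex: linear index within the tile
    UInt2 dispatchThreadId; // SV_DispatchThreadID.xy: pixel in the whole array
};

// Number of tiles needed along one axis, rounding up.
constexpr uint32_t TileCount(uint32_t size) {
    return (size + kTile - 1) / kTile;
}

// IDs the thread covering pixel (px, py) would receive.
Ids2D IdsForPixel(uint32_t px, uint32_t py) {
    UInt2 group{px / kTile, py / kTile};
    UInt2 local{px % kTile, py % kTile};
    uint32_t linear = local.y * kTile + local.x;
    return Ids2D{group, local, linear, UInt2{px, py}};
}
```

For the 16×16 case, pixel (10, 3) lands in tile (1, 0) at local position (2, 3), with a group index of 3*8 + 2 = 26.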

Something to note

One thing to know is that the only values that need to be passed to your shader are SV_GroupID/S_GROUP_ID and SV_GroupThreadID/S_GROUP_THREAD_ID. The other values are calculated based on these combined with the values from numthreads:

SV_DispatchThreadID = SV_GroupID * numthreads
                    + SV_GroupThreadID;

SV_GroupIndex = SV_GroupThreadID.z*numthreads.x*numthreads.y
              + SV_GroupThreadID.y*numthreads.x
              + SV_GroupThreadID.x;

This means there are implicit multiply-adds to calculate these values, and on some platforms we can shave a few cycles by calculating them manually with the 24-bit versions of the multiplies rather than the full 32-bit ones that the compiler may select (this assumes you have fewer than 16M-odd (2^24) threads, so the operands fit in 24 bits). The minor problem with this is that you need to duplicate the numthreads values into the handwritten version. Check your assembly!
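As a rough illustration of the handwritten version, here is a CPU-side C++ sketch of the two formulas built on an emulated 24-bit multiply. Everything here is an assumption for illustration: `UInt3`, `Mul24`, `ManualDispatchThreadId` and `ManualGroupIndex` are made-up names, and `Mul24` models an instruction that multiplies only the low 24 bits of each operand (as I understand AMD GCN's v_mul_u32_u24 to do). How you actually reach that instruction from a shader depends on your platform and compiler.

```cpp
#include <cstdint>

// Hypothetical stand-in for HLSL's uint3.
struct UInt3 { uint32_t x, y, z; };

// Has to be kept in sync with the shader's numthreads by hand.
constexpr UInt3 kNumThreads{8, 8, 1};

// Emulated 24-bit multiply: only the low 24 bits of each operand take part.
// Assumption: this mirrors hardware 24-bit multiplies; check your target.
constexpr uint32_t Mul24(uint32_t a, uint32_t b) {
    return (a & 0xFFFFFFu) * (b & 0xFFFFFFu);
}

// SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID
constexpr UInt3 ManualDispatchThreadId(UInt3 groupId, UInt3 groupThreadId) {
    return UInt3{Mul24(groupId.x, kNumThreads.x) + groupThreadId.x,
                 Mul24(groupId.y, kNumThreads.y) + groupThreadId.y,
                 Mul24(groupId.z, kNumThreads.z) + groupThreadId.z};
}

// SV_GroupIndex = z*numthreads.x*numthreads.y + y*numthreads.x + x,
// factored as (z*numthreads.y + y)*numthreads.x + x.
constexpr uint32_t ManualGroupIndex(UInt3 groupThreadId) {
    return Mul24(Mul24(groupThreadId.z, kNumThreads.y) + groupThreadId.y,
                 kNumThreads.x) + groupThreadId.x;
}
```

The masking in `Mul24` is why the thread count limit matters: an operand of 2^24 or more silently loses its top bits, so the result is only right while every ID fits in 24 bits.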

¹ 64 – it’s always going to be 64!
