DXIL has no concept of subgroup mask ops, relative
shuffle ops, and everything is scalar.
Most wave broadcast ops support i1 overloads, except
for quad swap operations. Go figure. Use lower_bit_size
to promote those to i32 instead.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20801>
DXIL doesn't have a "subgroup ID" or "num subgroups" construct,
so add lowering to construct them. Subgroup ID is done using
once-per-subgroup atomics on a workgroup-shared variable, and
then broadcasting that (using read_first_invocation) to the other
threads. Num subgroups is just a division with the workgroup size.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20777>
Some D3D12 drivers, like my PC's AMD driver, don't like using a
dynamic index to load from a constant buffer that's bound via
root constants. Instead, just go ahead and load the full set of
vertex data and just bcsel which one to use.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20778>
This is a weird way to do queries, but in multiview, each query
takes up N slots, where N is the number of views. D3D doesn't do
it that way, and only has one result, which fortunately is a valid
way to do Vulkan queries. We just need to take care to zero out
the other view results, and make sure they get "signaled" when
the cmdbuf is submitted.
Note that it is invalid in D3D to use ResolveQueryData on query
slots that have never actually been begun/ended, so we zero out
the data by copying zeroes into the buffer. This probably could
be optimized but oh well.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20650>
For draws, when we're emulating multiview, we need to loop them
and set up the right sysval. For clears, we always need to loop.
When not emulating, we also need to set up the right view instance
mask.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20650>
D3D's view instancing is an optional feature, and even when it's
supported, it only goes up to 4 views, where Vulkan requires a
minimum of 6 supported views. So, we need to have a path for handling
the cases where we can't use the native feature.
In this mode, pass the view ID as a runtime var. The caller is then
responsible for looping the draw calls and filling out the constant
buffer value correctly for each draw. When we get to the last pre-rast
stage, we'll additionally want to write out gl_Layer to select the
right RTV array slice. Lastly, for the fragment shader, if there's
any input attachments, those get loaded using the RTV slice instead
of the view ID. RTV slice input into the PS is done with a signature
entry (which must be output from the previous stage) rather than a
system value.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20650>
This adds support for D3D12-native view instancing to the compiler.
Essentially, it's just the ability to load SV_ViewID (dx.op.viewID),
set the right capability, and fill out some more PSV data. Note that
the PSV data is currently garbage. Ideally, we'd fill out a proper
input -> output and viewID -> output dependency table, but AFAIK
this is only used to enforce D3D API validation, and drivers ignore
it, so it's less critical to get it right.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20650>
In addition to the DLL names being different, we don't have to do the versioning work since we don't have to fuss with known bad versions (for example).
Co-authored-by: Ethan Lee <flibitijibibo@gmail.com>
Co-authored-by: David Jacewicz <david.jacewicz@protonmail.com>
Co-authored-by: tieuchanlong <tieuchanlong@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19022>