It's mildly tempting to reuse the src0_alpha source for color1 since
the two features should never overlap, but for now we add an extra
optional source.
We require SIMD16 for now as we only have SIMD16 messages. Eventually,
we're likely to want to support SIMD32 with 2x16 sends, but this gets
us going for now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41872>
Surprisingly, this actually appears to come up. Two Baldur's Gate 3
shaders were optimized down to an unconditional "demote" with no other
side-effects, meaning no writes occur and we can eliminate the entire
program. One of the shaders still did a fair amount of math to
produce color values that were never used.
We introduce a pass to detect store_render_target_intel intrinsics
where discard == true and eliminate them. We then DCE and see if we
eliminated the entire program other than "demote" or "terminate" and
drop those too. We then add back a Null RT store if needed.
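
As a rough illustration of the steps above, here is a toy Python model. It
is not the real pass: instructions are plain dicts rather than NIR, the
function names and the "null_rt" key are made up for this sketch, and "a
Null RT store is needed" is assumed to mean "no render target store is left".

```python
# Toy model only: instructions are dicts, not real NIR. The intrinsic and
# op names come from the commit text; everything else is hypothetical.

SIDE_EFFECTS = {"store_render_target_intel", "demote", "terminate"}


def eliminate_discarded_stores(instrs):
    """Drop render target stores whose discard flag is constant true."""
    return [
        i for i in instrs
        if not (i["op"] == "store_render_target_intel"
                and i.get("discard") is True)
    ]


def dce(instrs):
    """Tiny DCE: keep side-effecting ops plus whatever feeds them."""
    live = set()
    kept = []
    for i in reversed(instrs):
        if i["op"] in SIDE_EFFECTS or i.get("dest") in live:
            live.update(i.get("srcs", []))
            kept.append(i)
    return kept[::-1]


def run_pass(instrs, needs_null_rt_store=True):
    instrs = dce(eliminate_discarded_stores(instrs))
    # If only demote/terminate survive, nothing observable is left.
    if all(i["op"] in ("demote", "terminate") for i in instrs):
        instrs = []
    # Assumption: add a Null RT store back when no RT store remains.
    has_store = any(i["op"] == "store_render_target_intel" for i in instrs)
    if needs_null_rt_store and not has_store:
        instrs.append({"op": "store_render_target_intel",
                       "null_rt": True, "srcs": []})
    return instrs


shader = [
    {"op": "fmul", "dest": "a", "srcs": ["x", "y"]},
    {"op": "fadd", "dest": "c", "srcs": ["a", "a"]},
    {"op": "demote", "srcs": []},
    {"op": "store_render_target_intel", "discard": True, "srcs": ["c"]},
]
result = run_pass(shader)
```

In this model the math feeding the discarded store goes away with it, and
the whole shader collapses to a single Null RT store.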
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41872>
otherwise we get garbage in the other lanes. this was a pain to debug.
dEQP-VK.subgroups.clustered.compute.subgroupclusteredand_bvec2
this should be optimized (and maybe reworked/simplified too) but now this should
be /correct/ at least.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41872>
The stack pointer starts out at b.shader->scratch_size, plus per-lane
offsets. Every time we spill/fill, we adjust the stack pointer to
the offset for our desired memory location, and leave it there. Over
the course of each block's spills/fills, we track the current delta from
the original value, and restore the original value at the end of the block.
However, when we started clobbering lane 0 and rematerializing it,
we were recreating it as the original base value (b.shader->scratch_size
+ sizeof(uint32_t) * 0). We need to include sp_delta_B too, or else we
will calculate our deltas incorrectly for that lane, and restore it
incorrectly at the end of the block too.
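
The arithmetic can be checked with a small Python model. This is only a
sketch: the lane count, the scratch size, the 64-byte move and all helper
names are made up for illustration; only scratch_size, sp_delta_B and the
sizeof(uint32_t) * lane layout come from the description above.

```python
SIZEOF_UINT32 = 4


def lane_offset(scratch_size, lane):
    """Initial per-lane stack pointer, as described in the commit text."""
    return scratch_size + SIZEOF_UINT32 * lane


def rematerialize_lane0(scratch_size, sp_delta_B, include_delta=True):
    """Rebuild lane 0's stack pointer after it was clobbered."""
    base = lane_offset(scratch_size, 0)
    return base + sp_delta_B if include_delta else base


scratch_size = 256  # arbitrary example value
sp = [lane_offset(scratch_size, lane) for lane in range(4)]
sp_delta_B = 0

# A spill moves every lane's pointer to the target slot and leaves it there.
move = 64
sp = [v + move for v in sp]
sp_delta_B += move

# Lane 0 gets clobbered, then rebuilt without and with the delta.
sp_buggy = [rematerialize_lane0(scratch_size, sp_delta_B, False)] + sp[1:]
sp_fixed = [rematerialize_lane0(scratch_size, sp_delta_B)] + sp[1:]

# End of block: undo the tracked delta on every lane.
restored_buggy = [v - sp_delta_B for v in sp_buggy]
restored_fixed = [v - sp_delta_B for v in sp_fixed]
```

Without the delta, lane 0 ends the block 64 bytes below where it started
while the other lanes are restored correctly.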
Found while debugging the issue fixed by the previous commit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41872>
The idea here is that the scratch surface address is stored in
ADDRESS_REGISTER, while per-lane offsets are stored in `sp', an
array of UGPR[dispatch_width]. When we encounter an opcode that
needs to clobber the address register, we stash it in the first
UGPR of `sp'. This clobbers the first lane offset, but that's
easy to reconstruct since it's lane 0. When we need to spill/fill,
we restore the address register and rematerialize the offset for
lane 0.
This is all good. However, we were saving the address register
every time we found an opcode that clobbered it...even if we'd
_already_ clobbered it. So if you had back to back shuffles,
the first would save the scratch surface address, and the second
would save...some part of the first shuffle. So we'd never get
the scratch address back again. Easy fix, only save if valid.
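
A toy Python model of the save/restore logic, to show why the second save
loses the scratch address. The class, the addr_valid flag and the example
values are hypothetical names for this sketch, not the identifiers used in
jay/lower_spill.

```python
class SpillState:
    def __init__(self, scratch_addr, lane_offsets):
        self.address_register = scratch_addr  # scratch surface address
        self.sp = list(lane_offsets)          # per-lane offsets; sp[0] is the stash
        self.lane0_offset = lane_offsets[0]   # easy to rebuild: it is lane 0
        self.addr_valid = True                # hypothetical validity flag

    def clobber_address(self, new_value, save_only_if_valid=True):
        """An opcode (e.g. a shuffle) overwrites the address register."""
        if self.addr_valid or not save_only_if_valid:
            self.sp[0] = self.address_register
        self.address_register = new_value
        self.addr_valid = False

    def before_spill_or_fill(self):
        """Restore the scratch address and rematerialize lane 0's offset."""
        if not self.addr_valid:
            self.address_register = self.sp[0]
            self.sp[0] = self.lane0_offset
            self.addr_valid = True


def run(save_only_if_valid):
    state = SpillState(scratch_addr=0x1000, lane_offsets=[0, 4, 8, 12])
    # Two back to back shuffles, each clobbering the address register.
    state.clobber_address(0xAAAA, save_only_if_valid)
    state.clobber_address(0xBBBB, save_only_if_valid)
    state.before_spill_or_fill()
    return state.address_register
```

Here run(True) gets the scratch address 0x1000 back, while run(False)
restores 0xAAAA, a leftover from the first shuffle.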
Fixes misrendering in Baldur's Gate 3 compute shaders.
Fixes: 64acab1d69 ("jay/lower_spill: use 1 less temporary")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41872>
instead of appending instructions like brw_eu helpers, just construct a single
gen_inst at a time. this involves a lot less indirection.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41872>
Equivalent now that the IR allows it.
For the dynamic case:
< (32&W) mov.u16 g0, g38<16,8,2> │ I@1
---
> (32&W) mov.u16 g0, g38<2> │ I@1
For the constant case it's actually better since copyprop can see through it:
< (1&W) mov.u32 u0.0, 0xaaaaaaaa │
< (32&W) mov.u16 g1, u0.0 │ I@1
---
> (32&W) mov.u16 g0, 0xaaaa │
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41872>
Previously we had special ops doing data-model-breaking things on GPRs. But
there's no real reason for that: we can calculate lane IDs as UGPR vectors
within the Jay data model just fine. Adjust jay_ir/jay_validate to define packed
16-bit UGPR vectors, giving them the natural semantics, then use that to
calculate lane IDs, peeling back all the hacks we added along the way.
This also unfortunately pessimizes inverse_ballot() but only in a corner case
that could be revisited later. Stats are net positive.
In addition to the code cleanup, this has 3 other benefits:
* Now that we can rematerialize the lane ID code anywhere we want, we could
theoretically reduce register pressure in some scenarios. Stats show this
doesn't help in the current implementation, though.
* Now that we can calculate lane IDs in control flow, the issues with divergent
function calls all go away. (Well, the lane ID issue. There are other issues.)
* Now that we use UGPRs for this, we don't need a stride=16 GRF in shaders that
don't actually use 16-bit math, meaning less shuffling from bad partitions.
That's reflected in the positive stats here.
SIMD16:
Totals from 1643 (62.07% of 2647) affected shaders:
Instrs: 2227750 -> 2221032 (-0.30%); split: -0.44%, +0.14%
CodeSize: 33138416 -> 33034224 (-0.31%); split: -0.52%, +0.20%
SIMD32:
Totals from 1643 (62.07% of 2647) affected shaders:
Instrs: 2864583 -> 2806217 (-2.04%); split: -2.22%, +0.19%
CodeSize: 43088064 -> 42171504 (-2.13%); split: -2.29%, +0.17%
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41872>
look at what the program actually does instead of hardcoding a worst-case.
SIMD16:
Totals from 1965 (74.23% of 2647) affected shaders:
Instrs: 2603230 -> 2539932 (-2.43%); split: -3.44%, +1.01%
CodeSize: 38826160 -> 37811904 (-2.61%); split: -3.59%, +0.97%
Number of spill instructions: 1206 -> 555 (-53.98%)
Number of fill instructions: 1194 -> 551 (-53.85%)
SIMD32:
Totals from 1974 (74.57% of 2647) affected shaders:
Instrs: 3998126 -> 3033333 (-24.13%); split: -24.18%, +0.05%
CodeSize: 59563952 -> 45580448 (-23.48%); split: -23.52%, +0.05%
Number of spill instructions: 43534 -> 37471 (-13.93%); split: -13.97%, +0.04%
Number of fill instructions: 43118 -> 36412 (-15.55%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41872>
These queries need to be used for partitioning too. This also degunks the
core RA logic in jay_register_allocate.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41872>
panvk previously reported DRM format modifiers only through
VkDrmFormatModifierPropertiesListEXT.
Report them through VkDrmFormatModifierPropertiesList2EXT as well.
Cc: mesa-stable
Signed-off-by: Gyeyoung Baek <gye976@gmail.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41772>
VkFormatProperties uses VkFormatFeatureFlags, whose valid bits are limited to
VK_ALL_FORMAT_FEATURE_FLAG_BITS (0x7fffffffu).
Without this mask, bit 31 of the 64-bit flags2 value leaks into the 32-bit
legacy field.
Use vk_format_features2_to_features() helper when filling VkFormatProperties so
flags2-only bits are not leaked through legacy feature fields.
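
The effect of the mask can be modelled in a few lines of Python. This
assumes the helper amounts to an AND with VK_ALL_FORMAT_FEATURE_FLAG_BITS;
the function names and the two example bit constants are stand-ins for
this sketch, not Vulkan or Mesa identifiers.

```python
VK_ALL_FORMAT_FEATURE_FLAG_BITS = 0x7FFFFFFF  # value quoted above

# Stand-ins for "some flags2-only bit"; not real Vulkan enum names.
FLAGS2_ONLY_BIT_31 = 1 << 31
FLAGS2_ONLY_BIT_32 = 1 << 32


def truncate_to_u32(features2):
    """What a plain assignment to a 32-bit field does."""
    return features2 & 0xFFFFFFFF


def features2_to_features(features2):
    """Assumed behaviour of the helper: keep only legacy-valid bits."""
    return features2 & VK_ALL_FORMAT_FEATURE_FLAG_BITS


features2 = 0x1 | FLAGS2_ONLY_BIT_31 | FLAGS2_ONLY_BIT_32
leaky = truncate_to_u32(features2)
masked = features2_to_features(features2)
```

Truncation alone drops bit 32 but keeps bit 31, so leaky is 0x80000001
while masked is 0x1.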
Cc: mesa-stable
Signed-off-by: Gyeyoung Baek <gye976@gmail.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41772>
This test has been marked as flaking on G925, but I've also seen it
flaking on G610 recently. Let's just move it to the common flake-file
instead. Also drop it from the fails-file on G925, as having it in both
isn't really needed.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40398>
This only fails sometimes, and it doesn't seem to take the whole system
down with it. Let's mark it as a flake instead of skipping it.
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40398>
Skips accumulate over time, but rarely get reevaluated to see if
they're still relevant. To combat this problem, I've dropped all skips,
and added back those that actually serve a practical use.
The result might be a bit more instability in the short term. But
hopefully this pays off in the long term.
Acked-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40398>