This is analogous to `Vec::push_mut()`, which was stabilized in Rust
1.95.0. Since we can't use that Rust version yet, we internally
implement it as `push()` followed by `last_mut().unwrap()`.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41941>
Bifrost/Valhall descriptor pointers are incorrectly assigned.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Fixes: 11fcb23f74 ("pan/desc: Add a struct for valhall/bifrost to the union in pan_tiler_context")
Signed-off-by: Ashley Smith <ashley.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41126>
This fixes state leakage when using the RADEON_DEBUG=notcl debug option.
This manifested as heavy desktop corruption when running GL clients with
this flag, since the R300_VAP_TCL_BYPASS state would leak into other HWTCL
users such as Xorg/glamor or Wayland compositors.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41996>
Using bitfields results in nondeterministic bit patterns in the unused
bits. Since ir3_shader_output is stored in the cache, this makes it
difficult to verify cache equality between different builds.
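
As a rough illustration only (this is not the real ir3_shader_output
layout; the struct, its members and the helper names are hypothetical),
a record built from fixed-width members with no implicit padding has a
fully defined byte representation, so two independently built copies
can be compared byte for byte:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a cached output record. Every byte belongs to
 * a named member, so there are no unused bits to hold leftover values. */
struct example_output {
   uint8_t slot;
   uint8_t regid;
   uint8_t view;
   uint8_t half; /* 0 or 1; a whole byte instead of a 1-bit bitfield */
};

/* Assumption: four uint8_t members pack into four bytes on the target ABI. */
static_assert(sizeof(struct example_output) == 4,
              "no implicit padding expected");

static struct example_output
make_output(uint8_t slot, uint8_t regid, uint8_t view, bool half)
{
   struct example_output out;

   /* Zero everything first so the result does not depend on stack contents. */
   memset(&out, 0, sizeof(out));
   out.slot = slot;
   out.regid = regid;
   out.view = view;
   out.half = half ? 1 : 0;
   return out;
}

/* Byte-wise comparison, the kind of check a cache verifier might perform. */
static bool
outputs_equal(const struct example_output *a, const struct example_output *b)
{
   return memcmp(a, b, sizeof(*a)) == 0;
}
```

The trade-off in this sketch is a slightly larger record in exchange for
a byte pattern that depends only on the member values.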
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41999>
Similar to RADV, restart the render pass with resolve attachments. This
is not ideal for tiling, but we don't even use native resolve for
built-in modes due to Metal format limitations.
Reviewed-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41888>
Multiview often involves a loop over view indexes, and our output
handling assumes that everything is constant-indexed. Unrolling
the loops takes care of this. (brw already does this.)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41872>
It's mildly tempting to reuse the src0_alpha source for color1 since
the two features should never overlap, but for now we add an extra
optional source.
We require SIMD16 for now as we only have SIMD16 messages. Eventually,
we're likely to want to support SIMD32 with 2x16 sends, but this gets
us going for now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41872>
Surprisingly, this actually appears to come up. Two Baldur's Gate 3
shaders optimized away to have unconditional "demote" in a shader with
no other side-effects, meaning no writes occur and we can eliminate the
entire program. One of the shaders still did a fair amount of math to
produce color values that were never used.
We introduce a pass to detect store_render_target_intel intrinsics
where discard == true and eliminate them. We then DCE and see if we
eliminated the entire program other than "demote" or "terminate" and
drop those too. We then add back a Null RT store if needed.
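
The steps above can be sketched on a toy instruction list. This is not
the real pass: the toy_* names are hypothetical, DCE is replaced by the
simplifying assumption that ALU results feed only render target stores,
and the Null RT store is added only when the program ends up empty.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_op {
   TOY_ALU,
   TOY_STORE_RT,
   TOY_DEMOTE,
   TOY_TERMINATE,
   TOY_NULL_RT_STORE,
};

struct toy_instr {
   enum toy_op op;
   bool discard; /* only meaningful for TOY_STORE_RT */
};

/* Compacts prog in place and returns the new instruction count. */
static size_t
toy_eliminate_discarded_stores(struct toy_instr *prog, size_t count)
{
   size_t n = 0;
   bool has_store = false;

   if (count == 0)
      return 0;

   /* Step 1: drop render target stores whose discard flag is set. */
   for (size_t i = 0; i < count; i++) {
      if (prog[i].op == TOY_STORE_RT && prog[i].discard)
         continue;
      if (prog[i].op == TOY_STORE_RT)
         has_store = true;
      prog[n++] = prog[i];
   }

   /* A surviving store keeps the math that feeds it alive. */
   if (has_store)
      return n;

   /* Step 2 (stand-in for DCE): with no stores left, ALU work is dead. */
   count = n;
   n = 0;
   for (size_t i = 0; i < count; i++) {
      if (prog[i].op != TOY_ALU)
         prog[n++] = prog[i];
   }

   /* Step 3: anything besides demote/terminate means the program stays. */
   for (size_t i = 0; i < n; i++) {
      if (prog[i].op != TOY_DEMOTE && prog[i].op != TOY_TERMINATE)
         return n;
   }

   /* Step 4: drop demote/terminate too and leave a single Null RT store. */
   prog[0] = (struct toy_instr){ .op = TOY_NULL_RT_STORE, .discard = false };
   return 1;
}
```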
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41872>
Otherwise we get garbage in the other lanes. This was a pain to debug.
dEQP-VK.subgroups.clustered.compute.subgroupclusteredand_bvec2
This should be optimized (and maybe reworked/simplified too), but now it
should at least be /correct/.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41872>
The stack pointer starts out at b.shader->scratch_size, plus per-lane
offsets. Every time we spill/fill, we adjust the stack pointer to
the offset for our desired memory location, and leave it there. Over
the course of each block's spills/fills, we track the current delta from
the original value, and restore it to there at the end of the block.
However, when we started clobbering lane 0 and rematerializing it,
we were recreating it as the original base value (b.shader->scratch_size
+ sizeof(uint32_t) * 0). We need to include sp_delta_B too, or else we
will calculate our deltas incorrectly for that lane, and restore it
incorrectly at the end of the block too.
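
A minimal numeric sketch of the difference, assuming the stack pointer
is a plain byte offset (the function names are hypothetical and do not
come from the pass itself):

```c
#include <stdint.h>

/* Lane 0 recreated as the original base value, ignoring the block's delta. */
static uint32_t
remat_lane0_without_delta(uint32_t scratch_size, int32_t sp_delta_B)
{
   (void)sp_delta_B;
   return scratch_size + (uint32_t)sizeof(uint32_t) * 0;
}

/* Lane 0 recreated with the current delta from the original value. */
static uint32_t
remat_lane0_with_delta(uint32_t scratch_size, int32_t sp_delta_B)
{
   return scratch_size + (uint32_t)sizeof(uint32_t) * 0 + (uint32_t)sp_delta_B;
}

/* End-of-block restore: undo the delta accumulated over the block. */
static uint32_t
restore_at_block_end(uint32_t sp, int32_t sp_delta_B)
{
   return sp - (uint32_t)sp_delta_B;
}
```

With scratch_size = 256 and sp_delta_B = 64, the version that includes
the delta yields 320 and restores to 256, while the version without it
yields 256 and "restores" to 192.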
Found while debugging the issue fixed by the previous commit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41872>
The idea here is that the scratch surface address is stored in
ADDRESS_REGISTER, while per-lane offsets are stored in `sp', an
array of UGPR[dispatch_width]. When we encounter an opcode that
needs to clobber the address register, we stash it in the first
UGPR of `sp'. This clobbers the first lane offset, but that's
easy to reconstruct since it's lane 0. When we need to spill/fill,
we restore the address register and rematerialize the offset for
lane 0.
This is all good. However, we were saving the address register
every time we found an opcode that clobbered it...even if we'd
_already_ clobbered it. So if you had back to back shuffles,
the first would save the scratch surface address, and the second
would save...some part of the first shuffle. So we'd never get
the scratch address back again. Easy fix: only save if valid.
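
The save-only-if-valid rule can be modelled with a small state machine.
This is a simplified sketch, not the actual lowering code: the struct,
its members and both helpers are hypothetical, and the address register
and the first UGPR of `sp' are modelled as plain integers.

```c
#include <stdbool.h>
#include <stdint.h>

struct spill_state {
   uint32_t address_reg; /* models ADDRESS_REGISTER */
   uint32_t sp0;         /* models the first UGPR of sp (lane 0 offset) */
   bool address_valid;   /* does address_reg hold the scratch address? */
};

/* Called for an opcode that overwrites the address register. */
static void
clobber_address(struct spill_state *s, uint32_t new_value)
{
   /* Only stash the register while it still holds the scratch address.
    * Saving it again after a previous clobber would overwrite the stash
    * with leftovers from that earlier opcode. */
   if (s->address_valid) {
      s->sp0 = s->address_reg;
      s->address_valid = false;
   }
   s->address_reg = new_value;
}

/* Called before a spill/fill that needs the scratch address back. */
static void
prepare_spill(struct spill_state *s, uint32_t lane0_offset)
{
   if (!s->address_valid) {
      s->address_reg = s->sp0;
      s->sp0 = lane0_offset; /* rematerialize the offset for lane 0 */
      s->address_valid = true;
   }
}
```

Two back-to-back calls to clobber_address() followed by prepare_spill()
leave address_reg holding the original scratch address in this model.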
Fixes misrendering in Baldur's Gate 3 compute shaders.
Fixes: 64acab1d69 ("jay/lower_spill: use 1 less temporary")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41872>
Instead of appending instructions like the brw_eu helpers, just construct a
single gen_inst at a time. This involves a lot less indirection.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41872>
Equivalent now that the IR allows it.
For the dynamic case:
< (32&W) mov.u16 g0, g38<16,8,2> │ I@1
---
> (32&W) mov.u16 g0, g38<2> │ I@1
For the constant case it's actually better since copyprop can see through it:
< (1&W) mov.u32 u0.0, 0xaaaaaaaa │
< (32&W) mov.u16 g1, u0.0 │ I@1
---
> (32&W) mov.u16 g0, 0xaaaa │
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41872>
Previously we had special ops doing data-model-breaking things on GPRs. But
there's no real reason for that; we can calculate lane IDs as UGPR vectors
within the Jay data model just fine. Adjust jay_ir/jay_validate to define packed
16-bit UGPR vectors, giving them the natural semantics, then use that to
calculate lane IDs, peeling back all the hacks we added along the way.
This also unfortunately pessimizes inverse_ballot() but only in a corner case
that could be revisited later. Stats are net positive.
In addition to the code cleanup, this has three other benefits:
* Now that we can rematerialize the lane ID code anywhere we want, we could
  theoretically reduce register pressure in some scenarios. Stats show this
  doesn't help in the current implementation, though.
* Now that we can calculate lane IDs in control flow, the issues with divergent
  function calls all go away. (Well, the lane ID issue. There are other issues.)
* Now that we use UGPRs for this, we don't need a stride=16 GRF in shaders that
  don't actually use 16-bit math, meaning less shuffling from bad partitions.
  That's reflected in the positive stats here.
SIMD16:
Totals from 1643 (62.07% of 2647) affected shaders:
Instrs: 2227750 -> 2221032 (-0.30%); split: -0.44%, +0.14%
CodeSize: 33138416 -> 33034224 (-0.31%); split: -0.52%, +0.20%
SIMD32:
Totals from 1643 (62.07% of 2647) affected shaders:
Instrs: 2864583 -> 2806217 (-2.04%); split: -2.22%, +0.19%
CodeSize: 43088064 -> 42171504 (-2.13%); split: -2.29%, +0.17%
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41872>