We were failing to tell the allocator about the restriction that scalar
texture instructions (allocated as scalar regs) couldn't be allocated such
that the start of the full unwritemasked vector started before r0. There
was a patch in select_reg_callback on a6xx that tried to work around that,
but you could still end up backed into a corner you shouldn't be because
we didn't tell the RA what it needed.
Fixes compiler assertion failures on a300-a400's blit_z shader, used for
Z32F gmem blits.
Looks like as a result we get tighter register allocation but more nops:
instructions in affected programs: 757945 -> 760356 (0.32%)
nops in affected programs: 317983 -> 320468 (0.78%)
non-nops in affected programs: 27525 -> 27451 (-0.27%)
mov in affected programs: 3098 -> 3023 (-2.42%)
dwords in affected programs: 109664 -> 110656 (0.90%)
last-baryf in affected programs: 112701 -> 112847 (0.13%)
full in affected programs: 4326 -> 4011 (-7.28%)
sstall in affected programs: 120550 -> 120836 (0.24%)
(ss) in affected programs: 13939 -> 13918 (-0.15%)
(sy) in affected programs: 3006 -> 2786 (-7.32%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
When the GS lowering was working on store_output intrinsics, we had to
clean up the split vars to avoid getting confused. Now that we shadow
the output vars instead, there's no confusion and we can drop this
hack.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
We mostly got away with replacing a store_output with a store_var, but
for complex types like structs, that doesn't work. Once the IO has
been lowered from vars to intrinsic, we've lost the deref chains and
can't properly shadow the outputs.
This commits moves the GS lowering up so we do it before the output
variables get lowered to store_output. This way the pass works much
like nir_lower_io_to_temporaries() and cleanly shadows the outputs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
This pass lowers per-vertex input intrinsics to load_shared_ir3. This
was open coded in the TCS and GS lowering passes before - this way we
can share it. Furthermore, we'll need to run the rest of the GS
lowering earlier (before lowering IO) so we need to split off this
part that operates on the IO intrinsics first.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
Render Target Array Index has moved from R0.0[26:16] to
R1.1[26:16] on gen12.
Fixes dEQP-VK.multiview.input_attachments.*
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4836>
Not much is known currently about these fields and their values, but
this gets things going in the scenarios we have been testing with so
far.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4832>
Has been observed that after the job chain has completed, those fields
become populated.
tiler_heap_next_start contains an address inside the tiler heap, a bit
before the value that the GPU writes to tiler_heap_free.
used_hierarchy_mask contains a hex value that corresponds to values
observed as hierarchy masks.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4832>
Similar to what the blob does. My reason for doing this was mainly so
traces weren't as different, which makes it more work to spot
relevant differences.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4832>
On Bifrost traces, we can observe that this bit is always enabled.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4832>
Found upon inspection.
Signed-off-by: Andres Gomez <agomez@igalia.com
Reviewed-and-Tested-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4840>
`wc -l $file` returns the number of lines and the filename.
Fixes: b8c66aeb93 ("ci: Clean up some excessive use of pipes in dEQP results processing.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4829>
They're not real strides like linear textures but the hw does use them
so we do have to get it right annoyingly.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4844>
In detecting the case where we actually do need to re-emit LRZ state
(due to new batch), we were checking `ctx->last.dirty` to detect when
we cannot trust previous state. But this is cleared before we check
it.
Move where it is cleared to the end of the draw_vbo() path.
Fixes: dfa702e94b ("freedreno/a6xx: limit LRZ state emit")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4842>
If use NIR's 1-bit bool representation , we get exactly the bool behavior
the hardware provides: CMPS produces true or false, AND/OR/XOR work as
intended without extra absnegs, and we can pass those half values directly
to other CMPS. We emit an absneg for b2b1 ("turn a memory load into a
1-bit NIR boolean"), but we would have done so for the ir3_n2b() on the
use of that value anyway. The most awkward bit is that inot(a@1) is now a
sub(1, a), but we can encode the 1 as an immediate so it's fine.
No significant changes to GL_TIME_ELAPSED on my set of traces (n=21).
instructions in affected programs: 1570638 -> 1548702 (-1.40%)
nops in affected programs: 624053 -> 611381 (-2.03%)
non-nops in affected programs: 959061 -> 949797 (-0.97%)
mov in affected programs: 5258 -> 5252 (-0.11%)
cov in affected programs: 15099 -> 15902 (5.32%)
dwords in affected programs: 469600 -> 452768 (-3.58%)
last-baryf in affected programs: 162211 -> 154726 (-4.61%)
full in affected programs: 4881 -> 4797 (-1.72%)
sstall in affected programs: 173953 -> 174545 (0.34%)
(ss) in affected programs: 10922 -> 10934 (0.11%)
(sy) in affected programs: 728 -> 745 (2.34%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4518>
DCC_DECOMPRESS doesn't work. Instead of trying to figure out why,
use a compute blit where the load is compressed and the store is
uncompressed.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4761>
This prevents an infinite recursion with a compute-based DCC decompression
when it restores shader images.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4761>
In addition to the cleanup, there are these changes in behavior:
- clear_render_target waits for idle after the dispatch and then flushes
L0-L1 caches (this was missing)
- sL0 is no longer invalidated before the dispatch, because src resources
don't use it
- sL0 is no longer invalidated after the dispatch if dst is an image
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4761>