It's useful to determine how much memory a nir_shader consumes for debugging
memory bloat, particularly for persistent NIR library. Add a helper that lets us
compute this.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31892>
This patterns were all found in the AGX quads tessellator, a medium-sized OpenCL
kernel. LLVM generates a lot of garbage around booleans which we need to chew
through. Though there's nothing AGX or really OpenCL specific here, so some of
this could help graphics shaders too.
Together, their effect is significant for that kernel instr count & occupancy:
before: 2966 inst, 2310 alu, 2310 fscib, 1216 ic, 23148 bytes, 239 regs, 384 threads
after: 2848 inst, 2246 alu, 2246 fscib, 1000 ic, 22260 bytes, 231 regs, 448 threads
No significant changes on GL shaderdb (a single godot shader regressed 1
instruction, 1344->1345).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31892>
Commit 0e6aaab00a ("pan/cs: add block to handle registers backup in
exception handler") broke the lazy allocation case by checking the
current chunk capacity too early.
Fixes: 0e6aaab00a ("pan/cs: add block to handle registers backup in exception handler")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Tested-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Tested-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Reviewed-by: Benjamin Lee <benjamin.lee@collabora.com>
Tested-by: Benjamin Re <benjamin.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31884>
Secondary command buffers need seqno registers to hold the current value
at the start of command buffer execution in order to calculate correct
wait values.
Fixes: c2299b6642 ("panvk/csf: Implement vkCmdExecuteCommands")
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31813>
The removed assertion was originally added to enforce
> VUID-vkCmdExecuteCommands-pCommandBuffers-00100:
> If vkCmdExecuteCommands is not being called within a render pass
> instance, each element of pCommandBuffers must not have been recorded
> with the VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT
However, if a render pass instance is entered with vkCmdBeginRendering,
vk_command_buffer::render_pass is unset.
Code change was done by Boris, only commit description was added.
Fixes: c2299b6642 ("panvk/csf: Implement vkCmdExecuteCommands")
Co-authored-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Rebecca Mckeever <rebecca.mckeever@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31813>
This used to abort (see the previous commit) when the hardware wasn't
able to sample all SPM counters because the BO was too small. The SPM
BO can now be resized like the SQTT BO.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31883>
The spec says
depthWriteEnable controls whether depth writes are enabled when
depthTestEnable is VK_TRUE. Depth writes are always disabled when
depthTestEnable is VK_FALSE.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-By: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31878>
Instead of passing 2 different 4-bit precision values via the SGPR, pass
the quant mode enum + log_samples as 3 bits, and 2-bit log_samples
separately. This saves 3 bits in the SGPR, which we'll need for culling
states.
This completely changes how the small prim precision is computed from
the state bits.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31865>
Computing 'htile_size/meta_size' is allowed for RADEON_SURF_MODE_1D when
RADEON_SURF_TC_COMPATIBLE_HTILE isn't set.
Lacking of computing causes performance degradation in some scenarios.
Fixes: d4d9ec55c5 ("radeonsi: implement TC-compatible HTILE")
Signed-off-by: Lu Yao <yaolu@kylinos.cn>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31617>
Stuff COMPUTE_WALKER_BODY in COMPUTER_WALKER in both iris and anv.
This also fixes the tracepoint for ray dispatches. Stuffing
COMPUTE_WALKER_BODY allow us to set the
cmd_buffer->state.last_compute_walker.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31822>
For combined image/sampler descriptors, each user-facing descriptor gets
two entries in the descriptor table. Indexes must be strided to account
for this.
Signed-off-by: Benjamin Lee <benjamin.lee@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31777>
This code isn't really wrong, but it makes some assumptions that are a
bit hard to grok. Let's thread a bit more carefully, by adding an assert
that hopefully clears things up a tad.
We area after all choosing in the range of RAW8 to RAW128.
CID: 1605056
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31767>
If the FS isn't used, there's no reason to consult it. This was inspired
by a coverity report, which was technically wrong, but made me look
closer at the code.
CID: 1620442
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31767>
A vs is always required, and we already dereference it in this function
unconditionally. Let's add an assert to be sure, and drop the run-time
check here.
CID: 1620449
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31767>
While this is perfectly valid, stuffing the conditional into the define
makes us evaluate it over and over again, causing some warnings about
nonsensical compares with Coverity.
CID: 1618771
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31767>
The value can't be larger than 31 here anyway, due to the bitfield
width. So the assert is completely needless.
CID: 1633082
Fixes: b8bfbbdf66 ("panvk: check against texfeat_bit")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31767>