Every layer has its own dispatch table that it can use to call down the
layer stack. This allows us to use RRA and RGP tracing simultaneously.
Using application layers with tracing should work as well.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20439>
radv_before_taskmesh_draw will no longer call radv_before_draw and
instead implement the necessary functionality on its own.
radv_before_draw will no longer have to emit mesh shader descriptors.
As a result, both functions should have a lower CPU overhead now.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18829>
Any unused split_vector definition can always use the same register
as the operand. This avoids creating unnecessary copies.
Fossil DB stats on Rembrandt (RDNA2):
Totals from 3904 (2.89% of 134906) affected shaders:
CodeSize: 18326692 -> 18271688 (-0.30%)
Instrs: 3386632 -> 3372888 (-0.41%)
Latency: 42337481 -> 42330085 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 6566731 -> 6566424 (-0.00%); split: -0.01%, +0.00%
Copies: 224301 -> 210559 (-6.13%)
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>
Eliminate unnecessary copies when the operand registers of a
p_split_vector instruction are not clobbered between the p_split_vector
and the user of its definitions.
This happens when p_split_vector doesn't kill its operand and its
definitions have a shorter lifespan that the operand. It affects every
NGG culling shader among other things.
This optimization exists because it's too difficult to solve it
in RA, and should be removed after we solved this in RA.
v2 by Daniel Schürmann:
- Rearrange and simplify conditions for the new optimization
- Fix a few bugs
v3 by Daniel Schürmann:
- Check number of encoded ALU operands
Fossil DB stats on Rembrandt (RDNA2):
Totals from 64896 (48.10% of 134906) affected shaders:
CodeSize: 175693348 -> 175434944 (-0.15%)
Instrs: 33333912 -> 33269388 (-0.19%)
Latency: 183766084 -> 183763432 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 28589651 -> 28589340 (-0.00%); split: -0.00%, +0.00%
Copies: 2806550 -> 2742038 (-2.30%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>
This allows is_overwritten_since to return false when the last
writer instruction of a register can't be tracked but we know it wasn't
written in the current block.
Fossil DB stats on Rembrandt (RDNA2):
Totals from 1163 (0.86% of 134906) affected shaders:
CodeSize: 9815920 -> 9805016 (-0.11%)
Instrs: 1843688 -> 1840962 (-0.15%)
Latency: 19219153 -> 19209171 (-0.05%)
InvThroughput: 3354375 -> 3353852 (-0.02%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>
This works because of SSA and should be safer than just setting 'not_written_yet'.
No Fossil DB changes on Rembrandt (RDNA2).
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>
Some HW may be able to fold only some of dst types, e.g.
for Adreno folding i32 -> i16 could cause a different result since
folded variant clamps the result instead of masking it.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20396>
These files are relatively big (amounting to ~4.5GB for one run) but
are not really useful.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20415>
This fixes an issue when conditional rendering and multiple views
were used with a task+mesh draw call.
Fixes: 2479b62869
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20098>
This fixes a mistake which assumes there is always at least 1 preamble.
This assumption is currently incorrect on transfer queues.
Fixes: e10b2f273e
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20401>
Only enable mesh+task shaders when IBs and gang submit are enabled.
We won't support gang submit with noibs.
Also remove the RADV_PERFTEST=ext_ms option.
Side note, GFX11 task/mesh support is still a TODO.
Don't skip the CTS tests which require GFX->ACE synchronization.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20010>
This function was added because previously ACE and GFX work was
submitted separately and we needed to make sure they both use the
same BOs. Now they are part of the same submission so this
function is not necessary anymore.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20010>
These are no longer used by any part of RADV, so we
can just safely delete it.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20010>
Add new preambles and postambles for synchronizing gang members in a
gang submission using semaphores.
These semaphores are both located in a small BO.
Gang wait preambles:
- gang leader writes 1 to a semaphore
- gang member waits for it to be written
When task shaders are used, make sure ACE waits until GFX starts to execute.
Userspace is required to emit this wait to make sure it behaves correctly
in a multi-process environment, because task shader dispatches are not
meant to be executed on multiple compute engines at the same time.
Gang wait postambles:
- gang member writes 1 to a semaphore
- gang leader waits for it to be written
This ensures that the gang leader waits for the whole gang,
which is necessary because the kernel signals the userspace fence
as soon as the gang leader is done, which may lead to bugs because the
same command buffers could be submitted again while still being executed.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20010>
This is now handled in the queue submission code so is not necessary.
However, keep the semaphore for future use.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20010>
Move processing of the command buffer array inside the loop that
splits a submission.
We now also add the perf counter lock/unlock to each submission
instead of just the first and last.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20010>
It would be too difficult to keep the radv_queue_submit_with_ace
function while also refactoring the radv_queue_submit_with_ace function,
so let's delete it first.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20010>
Add the ability to have multiple preambles,
except for the sysmem (NOIBS/GFX6) code path which still only allows 1.
This is necessary because with gang submit we will need a way to submit
a preamble to different queues at the same time.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20010>
This is necessary for supporting gang submit.
With gang submit, the kernel now allows us to submit multiple IBs
with different IP types. Therefore, RADV will also need to group
various CSs with different IP types together and remember the IP
type of each CS.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20010>
We are going to need additional data which is not present in
the currently used struct.
This commit just adds the new struct but does not yet add
new fields to it.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20010>
When using gang submit, the last IB is considered the "gang leader"
and its IP type will determine which fence to signal when the
submission is done. Therefore, use the last CS to set the IP type.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20010>
This is necessary because we are going to want to allow using
more than just 1 preamble.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20010>