When the hardware doesn't natively support 32-bit predication, the
driver has a fallback which allocates a 64-bit predicate to the upload
BO in order to copy the original value.
But when conditional rendering is enabled in the stateCommandBuffer
which is used by preprocess() and the execute() is recorded also in the
stateCommandBuffer. If the preprocess() is recorded in a different
cmdbuf which is submitted before the cmdbuf that contains execute(),
the fallback (ie. alloc + COPY_DATA) will be performed after. This would
cause the predicate value to be always 0.
To fix that, keep track of the user predication VA which is the only
VA that needs to be used by DGC because it reads 32-bit from the shader.
This fixes a very weird corner case with vkd3d-proton.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13143
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34953>
Only the bottom 16 bits matter of the select source matter so we can
throw away the top 16 bits and avoid any i20 encoding issues. All of
the back-ends were already doing this except SM70 which has 32-bit
immediates anyway. However, doing it in a common place where it's
documented is better than skattering it everywhere. Also, doing it as
part of legalization ensures that we see the same thing in the
post-legalize IR as gets encoded.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34678>
Every back-end has code to mask these because the hardware only has
limited encoding space. However, this can be done as a common
legalization operation and doing so means that our post-legalize IR
matches what actually gets encoded.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34678>
SM20 was smart enough to reduce shift immediates instead of just
detecting i20 overflow and adding copies. This adds helpers to make
this easier and propagates the improvement out to all the back-ends.
Even though it isn't necessary on Volta+, we might as well do it there
for consistency and because smaller shift values are easier to read in
the final assembly.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34678>
The checked_shr wasn't returning the correct value if .wrap was not set.
We also weren't checking this case in the unit tests so we missed it.
While we're here, get rid of a bunch of pointhess `as u64` as well.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34678>
As other video memories for AV1 are already allocated for the maximum
sizes, now it does the same for MV buffers too.
This fixes a bunch of artifacts of AV1 playing.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34866>
Reduction ops don't return anything, including predicates. On Turing
through Hopper, this doesn't matter because these bits are ignored.
However, Blackwell uses those bits to adjust address calculations for
reduction ops.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34910>
We had a bool for most of them and an enum for OpTld4. Now we have an
enum for all of them and we just reserve PerPx for OpTld4. While we're
here, rework printing to put the "." in the enum display method.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34910>
Pre-blackwell, it's ignored so we can set whatever. On Blackwell+, it
seems to be take into account somehow (more RE needed?) so we need to
set it to rZ to get the old behavior.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34910>
I can't see this in traces on Blackwell and it causes hangs.
These regs are in the hopper class headers so should be fine there.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34910>
Now that it is possible to have more than one initrd, let's switch to
the common b2c kernel which requires two additional initrds:
* The GPU initrd which contains amdgpu, i915, nouveau, radeon, and xe,
along with their necessary firmware
* The depmod initrd which contains what's necessary to modprobe the
modules of the GPU initrd
Since the GPU initrd is huge (73 MB), let's reduce the size by dropping
all the firmware that is not related to AMD.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34881>
This saves a bit of bandwidth when we're not going to use the value.
Improves renderpass times across 4 affected traces I tested (bioshock,
stranded deep, transport fever, and godot material testers) on sysmem by
.3% +/- .1%.
A similar change for avoiding stencil reads showed no change on the one
app affected among all of our renderdoc traces, so that's left out.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34964>
The driconf options were leaked when the panvk instance was destroyed.
Fixes: aa8fec638f ("panvk: add basic driconf infrastructure")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34986>
si_get_shader_variant_info doesn't need to check the kill flags because
killed stores are removed from NIR before that.
Only shader variants need to clear the writes_* flags if the epilog kills
them.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34492>
Just use nir_def_bits_used() instead of the manually written conditions.
These are gathered from bits of load_scalar_arg(vs_state_bits):
- uses_vs_state_indexed
- uses_gs_state_provoking_vtx_first
- uses_gs_state_outprim
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34492>