Blender uses atomic operations as part of its virtual shadow mapping
implementation. Virtual shadow mapping page tagging in compute shaders
benefits from divergent atomics fusion, while fragment shaders doing the
atomic raster step in general have worse performance with this
optimization turned on.
Thus, an option is added to only apply divergent atomics fusion to compute
shaders in ANV, and this option is enabled for Blender.
Initial support for divergent atomics fusion optimization in ANV was added
in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40631.
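
The per-stage gating described above can be pictured as a small predicate.
This is only a sketch in Python, not the ANV code: the function name, the
`compute_only` flag and the stage strings are hypothetical and chosen for
illustration.

```python
def should_fuse_divergent_atomics(stage: str, fusion_enabled: bool,
                                  compute_only: bool) -> bool:
    """Decide whether the optimization runs for a shader stage (illustrative)."""
    if not fusion_enabled:
        return False
    if compute_only:
        # Hypothetical per-application option: restrict fusion to compute.
        return stage == "compute"
    return True
```

With the option set, a fragment shader keeps its atomics unfused while a
compute shader still gets the optimization.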
Signed-off-by: Christoph Neuhauser <christoph.neuhauser@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41706>
In shader-db, with `-p skl`, shaders/0ad/12.shader_test does not
compact an instruction because precompact overwrites portions of the
instruction. (It treats the three-source instruction as a two-source
instruction when accessing instruction fields.)
This instruction could be compacted:
mad(8) g65<1>F g61<4,4,1>F g64<4,4,1>F -g17<4,4,1>F { align16 1Q };
But, since precompact erroneously sets bits, the instruction isn't
compacted.
Fossil testing:
* Tested with 0a3f3fd193 ("brw: drop unused color_outputs_valid
key") reverted, as fossils are currently producing inconsistent
results otherwise.
* Tested skl, icl, dg2, mtl, lnl, bmg and ptl. Only skl had a change.
SKL:
Totals:
CodeSize: 8335219296 -> 8320248992 (-0.18%)
Totals from 359508 (14.42% of 2492689) affected shaders:
CodeSize: 2838254352 -> 2823284048 (-0.53%)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41588>
Instead of doing the iadd manually, we can use the uniform slot of the
ld/st/atom instruction, getting rid of the iadd altogether.
Additionally, for global memory we can also consume a 32-bit offset instead
of requiring it to be 64-bit.
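
As a rough picture of the folding, here is a sketch on a toy IR written in
Python. None of these types or names come from the compiler; `Reg`, `IAdd`,
`Load` and `select_load` are hypothetical and only model the idea of moving
a uniform operand of the address add into the instruction's uniform slot.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Reg:
    name: str
    uniform: bool = False  # True models a value living in a uniform register


@dataclass(frozen=True)
class IAdd:
    a: Reg
    b: Reg


@dataclass(frozen=True)
class Load:
    addr: Reg
    uoffset: Optional[Reg] = None  # models the instruction's uniform slot


def select_load(addr: Union[Reg, IAdd]) -> Optional[Load]:
    """Fold a uniform operand of the address add into the uniform slot."""
    if isinstance(addr, IAdd):
        if addr.b.uniform:
            return Load(addr=addr.a, uoffset=addr.b)
        if addr.a.uniform:
            return Load(addr=addr.b, uoffset=addr.a)
        return None  # no uniform operand: the explicit iadd has to stay
    return Load(addr=addr)
```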
Totals from 158539 (13.07% of 1212873) affected shaders:
CodeSize: 2308216336 -> 2242231136 (-2.86%); split: -2.86%, +0.00%
Number of GPRs: 8682436 -> 8662675 (-0.23%); split: -0.26%, +0.04%
SLM Size: 238816 -> 238604 (-0.09%)
Static cycle count: 2169063422 -> 2147747544 (-0.98%); split: -0.99%, +0.01%
Spills to memory: 25845 -> 25799 (-0.18%); split: -0.20%, +0.02%
Fills from memory: 25845 -> 25799 (-0.18%); split: -0.20%, +0.02%
Spills to reg: 45053 -> 45273 (+0.49%); split: -0.04%, +0.53%
Fills from reg: 36385 -> 36757 (+1.02%); split: -0.04%, +1.06%
Max warps/SM: 6027232 -> 6034616 (+0.12%); split: +0.12%, -0.00%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384>
This tries to handle all combinations we might run into. We should be able
to rely on previous optimizations to ensure the more difficult cases never
happen.
As a side benefit, instead of lowering a UGPR to a GPR, it will now be
moved to the UGPR slot.
Totals from 258010 (21.27% of 1212873) affected shaders:
CodeSize: 3742700224 -> 3576740928 (-4.43%); split: -4.44%, +0.01%
Number of GPRs: 13606055 -> 13496463 (-0.81%); split: -0.86%, +0.05%
SLM Size: 589740 -> 589660 (-0.01%)
Static cycle count: 3271547493 -> 3272550831 (+0.03%); split: -0.47%, +0.50%
Spills to memory: 56180 -> 56136 (-0.08%)
Fills from memory: 56180 -> 56136 (-0.08%)
Spills to reg: 108211 -> 110013 (+1.67%); split: -0.63%, +2.30%
Fills from reg: 99216 -> 100471 (+1.26%); split: -0.30%, +1.56%
Max warps/SM: 9921228 -> 9972060 (+0.51%); split: +0.52%, -0.00%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384>
Adding the zero constants has a minor impact on stats due to some unlucky
interactions with nir_opt_cse, opt_instr_sched_prepass and assign_regs.
Totals from 61 (0.01% of 1212873) affected shaders:
CodeSize: 1044720 -> 1047472 (+0.26%); split: -0.00%, +0.27%
Static cycle count: 1198932 -> 1198490 (-0.04%); split: -0.07%, +0.04%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384>
Stop passing drmVersionPtr to backends and make sure all
manual version checks are transitioned to
pan_kmod_driver_version_at_least() to encourage new checks
to do the same.
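
For illustration, the kind of comparison such a helper centralizes can be
modelled in a few lines of Python. The real signature of
pan_kmod_driver_version_at_least() is not shown in this message, so the
function below and its `(major, minor)` tuple argument are assumptions.

```python
def driver_version_at_least(version, major, minor):
    """True when version, a (major, minor) pair, is at least major.minor."""
    # Tuple comparison is lexicographic: major first, then minor.
    return tuple(version) >= (major, minor)
```

Keeping the comparison in one helper avoids open-coded checks that compare
only the minor number and forget the major one.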
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41704>
v14+ supports up to 256 layers in a single tiler descriptor. This comes
with the limitation that only one tiler descriptor is allowed per render
pass.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41640>
buffer_size is uint32_t so we must be careful to not overflow it.
radeonsi had code for this but radv doesn't, which means it will
hang if RADV_THREAD_TRACE_BUFFER_SIZE is too big or if buffer_size
is being doubled up to the point it overflows.
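
The wrap-around can be modelled in Python by masking to 32 bits. This is a
sketch of the arithmetic only: the function names are hypothetical, and
raising an error on overflow is an assumption, since this message does not
say how the driver reacts once the limit is reached.

```python
UINT32_MAX = 0xFFFFFFFF


def double_u32_wrapping(size: int) -> int:
    """Unchecked uint32_t doubling: the result wraps modulo 2**32."""
    return (size * 2) & UINT32_MAX


def double_u32_checked(size: int) -> int:
    """Double the size, refusing to grow past what a uint32_t can hold."""
    if size > UINT32_MAX // 2:
        raise OverflowError("buffer size would overflow uint32_t")
    return size * 2
```

Doubling a 2 GiB size (0x80000000) without the check wraps to 0, which is
the kind of value that leads to the hang described above.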
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41383>
Instead of having each driver:
- define options through DRI_CONF_OPT_* macros
- call driQueryOption*() to parse those options
- define all the variables to hold those options' values
We add one script to do it all for you. All you have to do now is list
all the options you want in a python file.
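
A minimal sketch of the idea, assuming a simple list-of-dicts input. The
option names, the input format and the generator functions are all
hypothetical; the actual script and its file layout are not shown in this
message. Only driQueryOptionb() and driQueryOptioni() are existing
xmlconfig functions.

```python
# Hypothetical option list; the real input format may differ.
OPTIONS = [
    {"name": "example_bool_option", "type": "bool"},
    {"name": "example_int_option", "type": "int"},
]

# C type and driQueryOption*() variant used for each option type.
C_TYPES = {
    "bool": ("bool", "driQueryOptionb"),
    "int": ("int", "driQueryOptioni"),
}


def gen_struct_fields(options):
    """Emit one C struct field per option to hold its value."""
    return [f"{C_TYPES[o['type']][0]} {o['name']};" for o in options]


def gen_parse_calls(options, cache="cache", dst="opts"):
    """Emit the C statements that read each option into its field."""
    return [
        f'{dst}->{o["name"]} = {C_TYPES[o["type"]][1]}({cache}, "{o["name"]}");'
        for o in options
    ]
```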
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41697>
Delay sqtt init until all states/funcs have been set.
Also, num_contexts is initialized at the end of si_create_context,
so use num_contexts == 0 to test whether this is the first context.
Fixes: b2db3e1ddc ("radeonsi: add si_gfx_context.c and move code from si_pipe.c")
Reviewed-by: Yogesh Mohan Marimuthu <yogesh.mohanmarimuthu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41474>
nir_instrs_equal normalizes some indices, but hash_intrinsic
wasn't normalizing them. Reorganize the code so both do it using the
same helper.
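
The invariant at stake is that two instructions that compare equal must
also hash equal, otherwise a hash-set lookup can miss a match. Below is a
small Python model of routing both through one helper. The `Intrinsic`
type, the helper names and the choice of `fp_math_ctrl` as the ignored
index (taken from the Fixes tag) are assumptions, not the NIR code.

```python
from dataclasses import dataclass

# Indices assumed not to matter for CSE in this model.
IGNORED_INDICES = {"fp_math_ctrl"}


@dataclass(frozen=True)
class Intrinsic:
    op: str
    indices: tuple  # sequence of (index_name, value) pairs


def normalized_key(instr: Intrinsic):
    """Shared helper: drop the indices that equality ignores."""
    kept = tuple((n, v) for n, v in instr.indices if n not in IGNORED_INDICES)
    return (instr.op, kept)


def instrs_equal(a: Intrinsic, b: Intrinsic) -> bool:
    return normalized_key(a) == normalized_key(b)


def hash_instr(instr: Intrinsic) -> int:
    # Hashing the same normalized key keeps hash and equality consistent.
    return hash(normalized_key(instr))
```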
Fixes: b2bc57551a ("nir/instr_set: allow cse with fp_math_ctrl mismatches for intrinsics")
Assisted-by: Pi coding agent (GPT-5.5)
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41606>
Everything else related to VPE is already in mm subfolder, so let's
move the pipe_video_codec implementation there as well.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41439>
Unlike most other things where the MOCS setting combines the MOCS Index
and the protected memory bit, the EXECUTE_INDIRECT_DRAW/DISPATCH
commands take only the MOCS Index, and it's limited to only 4 bits.
Enabling the feature on ARL-H caused some tests to hit an assert when
the MOCS selected ended up out of range.
Rename the field to avoid confusion (and match documentation) and set it
through a helper function that calls the same old function and shifts it
down to fit.
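
The shift can be illustrated with a short Python model. The bit layout
used here (protected bit in bit 0, index above it) is an assumption made
for the sketch, and both function names are hypothetical; only the 4-bit
limit comes from the text above.

```python
MOCS_INDEX_BITS = 4  # the command field holds only a 4-bit index


def full_mocs(index: int, protected: bool) -> int:
    """Stand-in for the existing function: index plus protected bit."""
    return (index << 1) | int(protected)


def mocs_index_only(index: int, protected: bool) -> int:
    """Reuse full_mocs() and shift the protected bit away."""
    value = full_mocs(index, protected) >> 1
    assert value < (1 << MOCS_INDEX_BITS), "MOCS index out of range"
    return value
```

In this model, index 9 with the protected bit set gives a combined value
of 19, which does not fit in 4 bits, while the shifted value 9 does.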
Fixes: d1109f67bb ("iris: Emit EXECUTE_INDIRECT_DRAW when available")
Fixes: d161e3c2e2 ("iris: Emit a EXECUTE_INDIRECT_DISPATCH when available")
Fixes: 580728564e ("anv: Emit a EXECUTE_INDIRECT_DISPATCH when available")
Fixes: 6d4f43f0d6 ("anv: Emit EXECUTE_INDIRECT_DRAW when available")
Fixes: 7a9e82e82f ("genxml/12.5: Add the EXECUTE_INDIRECT_DISPATCH instruction")
Fixes: 4229757309 ("genxml/12.5: Add the EXECUTE_INDIRECT_DRAW instruction")
Signed-off-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41372>
Unless the tristate is unset, which it is not, it will always be true when
cast to bool, the type this function returns.
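
The pitfall can be reproduced with a small Python model. The enum name and
its values are assumptions, chosen so that only the unset state is zero, as
the sentence above implies; they are not copied from the driver.

```python
from enum import IntEnum


class Tristate(IntEnum):
    UNSET = 0
    NO = 1
    YES = 2


def as_bool_by_cast(value: Tristate) -> bool:
    """Plain truthiness: every non-zero state, including NO, becomes True."""
    return bool(value)


def as_bool_by_compare(value: Tristate) -> bool:
    """Explicit comparison: only YES becomes True."""
    return value == Tristate.YES
```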
Fixes: 2741ddd75a ("anv: fix issues found with indirect data stride")
Signed-off-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41372>