Instead of doing the iadd manually, we can use the uniform slot of the
ld/st/atom instruction, getting rid of the iadd altogether.
Additionally, for global memory we can also consume a 32-bit offset instead
of requiring it to be 64-bit.
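To illustrate the folding described above, here is a minimal C sketch over a hypothetical, simplified IR. The `reg_file`, `src`, `iadd`, and `mem_op` types are invented for this example and are not the compiler's real data structures. Because iadd is commutative, the sketch accepts the UGPR operand on either side. It does not model the 32-bit global offset case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified IR for illustration only; these are not the
 * compiler's real types. */
enum reg_file { FILE_GPR, FILE_UGPR };

struct src {
    enum reg_file file;
    int index;
};

/* An address computed as iadd(a, b) that feeds a ld/st/atom. */
struct iadd {
    struct src a, b;
};

/* A memory instruction with a per-thread address plus an optional
 * uniform (UGPR) operand that the hardware adds to it. */
struct mem_op {
    struct src addr;
    bool has_uniform;
    struct src uniform;
};

/* Try to fold the iadd into the memory instruction. iadd is commutative,
 * so the UGPR operand may be on either side. Returns false when there is
 * no GPR + UGPR pair to split, leaving *op untouched. */
static bool fold_iadd_into_mem_op(const struct iadd *add, struct mem_op *op)
{
    const struct src *gpr, *ugpr;

    if (add->a.file == FILE_GPR && add->b.file == FILE_UGPR) {
        gpr = &add->a;
        ugpr = &add->b;
    } else if (add->a.file == FILE_UGPR && add->b.file == FILE_GPR) {
        gpr = &add->b;
        ugpr = &add->a;
    } else {
        return false;
    }

    op->addr = *gpr;
    op->uniform = *ugpr;
    op->has_uniform = true;
    return true;
}
```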
Totals from 158539 (13.07% of 1212873) affected shaders:
CodeSize: 2308216336 -> 2242231136 (-2.86%); split: -2.86%, +0.00%
Number of GPRs: 8682436 -> 8662675 (-0.23%); split: -0.26%, +0.04%
SLM Size: 238816 -> 238604 (-0.09%)
Static cycle count: 2169063422 -> 2147747544 (-0.98%); split: -0.99%, +0.01%
Spills to memory: 25845 -> 25799 (-0.18%); split: -0.20%, +0.02%
Fills from memory: 25845 -> 25799 (-0.18%); split: -0.20%, +0.02%
Spills to reg: 45053 -> 45273 (+0.49%); split: -0.04%, +0.53%
Fills from reg: 36385 -> 36757 (+1.02%); split: -0.04%, +1.06%
Max warps/SM: 6027232 -> 6034616 (+0.12%); split: +0.12%, -0.00%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384>
This tries to handle all combinations we might run into. We should be
able to rely on previous optimizations to ensure that the more difficult
cases never happen.
As a side benefit, instead of being lowered to a GPR, a UGPR will now be
moved to the UGPR slot.
Totals from 258010 (21.27% of 1212873) affected shaders:
CodeSize: 3742700224 -> 3576740928 (-4.43%); split: -4.44%, +0.01%
Number of GPRs: 13606055 -> 13496463 (-0.81%); split: -0.86%, +0.05%
SLM Size: 589740 -> 589660 (-0.01%)
Static cycle count: 3271547493 -> 3272550831 (+0.03%); split: -0.47%, +0.50%
Spills to memory: 56180 -> 56136 (-0.08%)
Fills from memory: 56180 -> 56136 (-0.08%)
Spills to reg: 108211 -> 110013 (+1.67%); split: -0.63%, +2.30%
Fills from reg: 99216 -> 100471 (+1.26%); split: -0.30%, +1.56%
Max warps/SM: 9921228 -> 9972060 (+0.51%); split: +0.52%, -0.00%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384>
Adding the zero constants has a minor impact on stats due to some unlucky
interactions with nir_opt_cse, opt_instr_sched_prepass and assign_regs.
Totals from 61 (0.01% of 1212873) affected shaders:
CodeSize: 1044720 -> 1047472 (+0.26%); split: -0.00%, +0.27%
Static cycle count: 1198932 -> 1198490 (-0.04%); split: -0.07%, +0.04%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39384>
Stop passing drmVersionPtr to backends and make sure all
manual version checks are transitioned to
pan_kmod_driver_version_at_least() to encourage new checks
to do the same.
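The following minimal sketch shows why a single helper beats ad-hoc checks. It assumes a hypothetical `struct driver_version` with `major` and `minor` fields, which are not the real pan_kmod types or the signature of pan_kmod_driver_version_at_least(). A common mistake in hand-written checks is `major >= X && minor >= Y`, which wrongly rejects 2.0 when asking for at least 1.9. Comparing lexicographically avoids that.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel driver version; not the real
 * pan_kmod structure. */
struct driver_version {
    int major;
    int minor;
};

/* Compare (major, minor) lexicographically: the minor number only
 * matters when the majors are equal. */
static bool driver_version_at_least(const struct driver_version *v,
                                    int major, int minor)
{
    if (v->major != major)
        return v->major > major;
    return v->minor >= minor;
}
```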
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41704>
v14+ supports up to 256 layers in a single tiler descriptor. This comes
with the limitation that only one tiler descriptor is allowed per render
pass.
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41640>
buffer_size is a uint32_t, so we must be careful not to overflow it.
radeonsi had code for this but radv doesn't, which means radv will
hang if RADV_THREAD_TRACE_BUFFER_SIZE is too big or if buffer_size
keeps being doubled until it overflows.
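Here is a minimal sketch of overflow-safe handling for a uint32_t buffer size. It assumes the retry path doubles the size and the user-supplied value is first parsed into a 64-bit integer. The helper names are hypothetical and are not radv's or radeonsi's actual code. Clamping is just one possible policy, and alignment requirements are not handled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Double *size for a retry, refusing to wrap around a uint32_t.
 * Returns false (leaving *size unchanged) when doubling would overflow. */
static bool double_buffer_size(uint32_t *size)
{
    if (*size > UINT32_MAX / 2)
        return false;
    *size *= 2;
    return true;
}

/* Convert a user-requested size (e.g. parsed from an environment variable
 * as a 64-bit value) to uint32_t, clamping instead of silently truncating. */
static uint32_t clamp_buffer_size(uint64_t requested)
{
    return requested > UINT32_MAX ? UINT32_MAX : (uint32_t)requested;
}
```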
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41383>
Instead of having each driver:
- define options through DRI_CONF_OPT_* macros
- call driQueryOption*() to parse those options
- define all the variables to hold those options' values
we add one script to do it all for you. All you have to do now is list
all the options you want in a Python file.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41697>
Delay sqtt init until all states/funcs have been set.
Also, num_contexts is initialized at the end of si_create_context,
so use num_contexts == 0 to test whether this is the first context.
Fixes: b2db3e1ddc ("radeonsi: add si_gfx_context.c and move code from si_pipe.c")
Reviewed-by: Yogesh Mohan Marimuthu <yogesh.mohanmarimuthu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41474>
nir_instrs_equal normalizes some indices, but hash_intrinsic
wasn't normalizing them. Reorganize the code so both do it using the
same helper.
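The invariant behind this fix is that two instructions considered equal must hash to the same value. The sketch below uses a hypothetical `struct intrin` and treats an `fp_math_ctrl`-like field as the index to ignore. This is an assumption for illustration, not NIR's actual code. Routing both the hash and the equality check through one `normalize()` helper keeps them from disagreeing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical intrinsic with one index that must not affect CSE. */
struct intrin {
    uint32_t op;
    uint32_t base;
    uint32_t fp_math_ctrl;
};

/* Single helper used by BOTH hashing and equality, so they can never
 * disagree about which indices are ignored. */
static struct intrin normalize(struct intrin i)
{
    i.fp_math_ctrl = 0;
    return i;
}

/* FNV-1a over the bytes of the normalized fields. */
static uint32_t hash_intrin(const struct intrin *in)
{
    struct intrin n = normalize(*in);
    uint32_t fields[3] = { n.op, n.base, n.fp_math_ctrl };
    uint32_t h = 2166136261u;

    for (int f = 0; f < 3; f++) {
        for (int b = 0; b < 4; b++) {
            h ^= (fields[f] >> (8 * b)) & 0xffu;
            h *= 16777619u;
        }
    }
    return h;
}

static bool intrin_equal(const struct intrin *a, const struct intrin *b)
{
    struct intrin na = normalize(*a), nb = normalize(*b);
    return na.op == nb.op && na.base == nb.base &&
           na.fp_math_ctrl == nb.fp_math_ctrl;
}
```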
Fixes: b2bc57551a ("nir/instr_set: allow cse with fp_math_ctrl mismatches for intrinsics")
Assisted-by: Pi coding agent (GPT-5.5)
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41606>
Everything else related to VPE is already in the mm subfolder, so let's
move the pipe_video_codec implementation there as well.
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41439>
Unlike most other things where the MOCS setting combines the MOCS Index
and the protected memory bit, the EXECUTE_INDIRECT_DRAW/DISPATCH
commands take only the MOCS Index, and it's limited to only 4 bits.
Enabling the feature on ARL-H caused some tests to hit an assert when
the selected MOCS ended up out of range.
Rename the field to avoid confusion (and to match the documentation), and
set it through a helper function that calls the same old function and
shifts the result down to fit.
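Here is a minimal sketch of such a helper. For illustration only, it assumes that a full MOCS value stores the protected-memory bit in bit 0 and the MOCS index in the bits above it. The function name is hypothetical and is not the actual iris/anv helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (illustration only): bit 0 = protected-memory bit,
 * bits above it = MOCS index. */
static uint32_t mocs_index_from_mocs(uint32_t mocs)
{
    uint32_t index = mocs >> 1;   /* drop the protected-memory bit */
    assert(index < (1u << 4));    /* EXECUTE_INDIRECT_* field is 4 bits */
    return index;
}
```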
Fixes: d1109f67bb ("iris: Emit EXECUTE_INDIRECT_DRAW when available")
Fixes: d161e3c2e2 ("iris: Emit a EXECUTE_INDIRECT_DISPATCH when available")
Fixes: 580728564e ("anv: Emit a EXECUTE_INDIRECT_DISPATCH when available")
Fixes: 6d4f43f0d6 ("anv: Emit EXECUTE_INDIRECT_DRAW when available")
Fixes: 7a9e82e82f ("genxml/12.5: Add the EXECUTE_INDIRECT_DISPATCH instruction")
Fixes: 4229757309 ("genxml/12.5: Add the EXECUTE_INDIRECT_DRAW instruction")
Signed-off-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41372>
Unless the tristate is unset, which it is not, it will always be true
when cast to bool, which is what this function returns.
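Here is a minimal sketch of the pitfall. It assumes a hypothetical tristate enum in which the unset state is 0 and both set states are nonzero. The real enum's names and values may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tristate: UNSET is 0, so both "set" states are nonzero. */
enum tristate { TRISTATE_UNSET = 0, TRISTATE_FALSE, TRISTATE_TRUE };

/* Buggy: the implicit conversion turns any set value, including
 * TRISTATE_FALSE, into true. */
static bool is_enabled_buggy(enum tristate t)
{
    return t;
}

/* Fixed: compare against the state that actually means "true". */
static bool is_enabled(enum tristate t)
{
    return t == TRISTATE_TRUE;
}
```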
Fixes: 2741ddd75a ("anv: fix issues found with indirect data stride")
Signed-off-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41372>
This patch adds u_tracepoint_type to mark begin/end tracepoints.
Tracepoints inside a begin/end range will be printed with an
indentation.
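Here is a minimal sketch of the indentation logic. It uses a hypothetical `enum tp_type` in place of u_tracepoint_type, and the real enumerator names may differ. The depth is decremented before an end tracepoint is printed, so the end lines up with its begin.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the tracepoint kind. */
enum tp_type { TP_BEGIN, TP_END, TP_PLAIN };

struct tp {
    enum tp_type type;
    const char *name;
};

/* Print tracepoints, indenting anything nested inside begin/end pairs
 * by two spaces per level. Returns the final nesting depth (0 if every
 * begin had a matching end). */
static int print_tracepoints(FILE *out, const struct tp *tps, int n)
{
    int depth = 0;

    for (int i = 0; i < n; i++) {
        if (tps[i].type == TP_END && depth > 0)
            depth--;              /* an end lines up with its begin */
        fprintf(out, "%*s%s\n", depth * 2, "", tps[i].name);
        if (tps[i].type == TP_BEGIN)
            depth++;
    }
    return depth;
}
```

For the sequence begin `render_pass`, plain `draw`, end `render_pass_end`, this prints `draw` indented by two spaces between the other two lines.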
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41271>
Stop allocating events in chunks. u_trace_event is allocated using a
linear allocator, which has minimal overhead. Buffers for timestamps are
allocated using a custom allocator.
As a side effect, it is possible to deduplicate consecutive tracepoints.
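For readers unfamiliar with linear allocators, here is a minimal single-block bump allocator sketch. It is not Mesa's own linear allocator API, and real allocators typically obtain a new block when the current one is full. Each allocation is just an aligned offset bump, and everything is released at once. That is why the per-event overhead is so small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Single-block bump arena: no per-object free, everything is released
 * together by arena_free_all(). */
struct linear_arena {
    unsigned char *base;
    size_t size;
    size_t used;
};

static int arena_init(struct linear_arena *a, size_t size)
{
    a->base = malloc(size);
    a->size = size;
    a->used = 0;
    return a->base != NULL;
}

/* Allocate n bytes at an offset aligned to 'align' (a power of two).
 * Offsets are aligned relative to base, which malloc aligns suitably for
 * any standard type. Returns NULL when the arena is exhausted. */
static void *arena_alloc(struct linear_arena *a, size_t n, size_t align)
{
    size_t start = (a->used + align - 1) & ~(align - 1);

    if (start > a->size || n > a->size - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}

static void arena_free_all(struct linear_arena *a)
{
    free(a->base);
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}
```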
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41271>
For nir_intrinsic_load_inline_data_intel, base_offset was not used to
create the uniform; instead, only the special BRW_INLINE_PARAM_REG value
was used, which is later replaced by the inline_data fixed register.
So use base_offset for both intrinsics, adding BRW_INLINE_PARAM_REG for
nir_intrinsic_load_inline_data_intel, and then, in
brw_shader::assign_curb_setup, check for
inst->src[i].nr >= BRW_INLINE_PARAM_REG and offset the brw_reg by the
remainder left after subtracting BRW_INLINE_PARAM_REG.
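Here is a minimal sketch of the encode/decode scheme described above. A hypothetical `INLINE_PARAM_REG_MARKER` stands in for BRW_INLINE_PARAM_REG, and its value here is invented. The sketch assumes regular push-data offsets always stay below the marker, so any register number at or above it can be decoded back into an inline-data offset.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical marker value; the real BRW_INLINE_PARAM_REG differs. */
#define INLINE_PARAM_REG_MARKER 1000u

/* Build the register number for a uniform load: push-data loads use
 * base_offset directly, inline-data loads add the marker on top. */
static unsigned uniform_reg_nr(unsigned base_offset, bool is_inline_data)
{
    return base_offset + (is_inline_data ? INLINE_PARAM_REG_MARKER : 0u);
}

/* Later, when fixed registers are assigned: anything at or above the
 * marker is inline data, and the remainder is the offset into it. */
static bool decode_inline_offset(unsigned nr, unsigned *offset)
{
    if (nr < INLINE_PARAM_REG_MARKER)
        return false;
    *offset = nr - INLINE_PARAM_REG_MARKER;
    return true;
}
```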
Fixes: 7f19814414 ("brw/nir: handle inline_data_intel more like push_data_intel")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41607>