this is subtle, but the relevant igc:
// In case of shooting down of this instruction, we need to add sync to
// preserve the swsb id sync, so that it's safe to clear the dep
if (currInst.hasPredication() ||
(currInst.getExecSize() != dep.getInstruction()->getExecSize()) ||
(currInst.getChannelOffset() != dep.getInstruction()->getChannelOffset()))
needSyncForShootDownInst = true;
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40960>
This appears to cause some sort of prefetching which is causing
page faults for linear textures on the following page after the
texture allocation.
This might be okay for tiled, but for now just disable it.
The test crashing this was to allocate an 800x409 linear 2D texture
which gnome-initial-setup was doing.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15277
Cc: mesa-stable
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40939>
We need to use these registers on another file and I don't want to add
another copy of their definition to our code base.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40937>
This check for ">= 125" is already inside a check for ">= 125". Also,
let's take this opportunity to comment the #else and #endif of the
relevant check to make the code easier to follow.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40937>
This helps RA a bit by reducing the size of the vectors passed to tex
instructions and therefore eliminate a few movs.
Totals from 145533 (12.51% of 1163204) affected shaders:
CodeSize: 1868329120 -> 1855817520 (-0.67%); split: -0.70%, +0.03%
Number of GPRs: 7007196 -> 7007028 (-0.00%); split: -0.01%, +0.01%
Static cycle count: 1157484762 -> 1153189018 (-0.37%); split: -0.46%, +0.09%
Spills to reg: 30581 -> 30580 (-0.00%)
Fills from reg: 33263 -> 33262 (-0.00%)
Max warps/SM: 5911104 -> 5911100 (-0.00%); split: +0.00%, -0.00%
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40900>
The option uses the dumping already implemented for rra to gather
statistics about BVHs on the CPU and write them to a csv file. This csv
file can then be compared using a tool similar to report-fossils to
judge the impact of changes to the bvh build code.
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38463>
On unconditional branches qpu_set_branch_targets() can fill the delay slots
with a copy of the first instructions of the successor block.
As the qpu validator is sequential it would detect an incorrect hazard
when the MULTOP was copied but the UMUL24 wasn't.
This was identified in debug build when running gfxbench5.aztec_ruins_vk.
Assisted-by: Claude Opus 4.6
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40923>
The validation of branch instructions happening in branch and thrsw
delay slots has been dead code since it was introduced as the check
was after:
if (inst->type != V3D_QPU_INSTR_TYPE_ALU)
return;
Now last_branch_ip is updated and checks in_branch_delay_slots()
are active.
Fixes in_branch_delay_slots, as for branch there are always 3 delay slots.
As scheduler enforces this restrictions shader-db does not show any
regression.
Assisted-by: Claude Opus 4.6
Fixes: 90269ba353 ("broadcom/vc5: Use THRSW to enable multi-threaded shaders.")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40923>
Partial revert of a22ad99bdd ("pvr: set device features/props/extensions to
Vulkan 1.0 minimums (unless implemented)"), as this optional feature is fully
implemented already.
Tested with:
dEQP-VK.*wide*
dEQP-VK.dynamic_state.monolithic.line_width.*
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40890>
Partial revert of a22ad99bdd ("pvr: set device features/props/extensions to
Vulkan 1.0 minimums (unless implemented)"), as this optional feature is fully
implemented already.
Tested with:
dEQP-VK.*depth_bias*
dEQP-VK.*bias_clamp*
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40890>
Partial revert of a22ad99bdd ("pvr: set device features/props/extensions to
Vulkan 1.0 minimums (unless implemented)"), as this optional feature is fully
implemented already.
Tested with:
dEQP-VK.draw.*_multi_draw
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40890>
Partial revert of a22ad99bdd ("pvr: set device features/props/extensions to
Vulkan 1.0 minimums (unless implemented)"), as this optional feature is fully
implemented already.
It also turns out that Vulkan CTS was already testing this feature even though
it wasn't being advertised as supported,
dEQP-VK.draw.renderpass.indexed_draw.draw_instanced_indexed_triangle_list being
an example of this.
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40890>
We do this later on lowered IO anyway, in radv_nir_trim_fs_color_exports.
That pass is also per component, and not per output slot.
No Foz-DB changes on Navi48.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40928>
Loop unrolling can create phis when constants are defined in the loop but
used outside of it. Ideally this should not happen, but for now we have
to remove these as soon as possible before they trip up other passes.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40928>
Instead of duplicating fake devices.
This requires to move amdgpu_devices.* to the common folder so
they can be shared between shim and tests.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40656>