The fmul+fadd -> fma rules in nir_opt_algebraic are marked imprecise,
because they are a contraction. However, they respect signed zero/Inf/NaN rules.
As such, it is legal to do this fusion with shader float controls as long as the
exact bit is not set (mapping to SPIR-V NoContract).
Unfortunately, NIR's imprecise rules do not distinguish between contraction
issues versus float special case issues, forcing nir_search to skip all
imprecise rules when any shader float control modes are used. This notably
affects DXVK, which sets shader float controls to get D3D11 float behaviour and
hence loses FMA fusing.
Therefore, we plumb in the exact bit to express NoContract independent of the
float controls, and weaken the requirement for fma fusion to allowable
contraction. For fma splitting, it's a similar issue, as inexact GLSL fma in
SPIR-V is just a multiply add that we're allowed to contract rather than the
real deal.
Drivers that use their own FMA fusing passes (notably, Intel and AMD) are
unaffected, but DXVK-capable drivers using fuse_ffma should like this. Results
on hk shown:
Totals from 2194 (4.06% of 54019) affected shaders:
MaxWaves: 2174272 -> 2175936 (+0.08%); split: +0.08%, -0.01%
Instrs: 1173283 -> 1131494 (-3.56%); split: -3.57%, +0.01%
CodeSize: 8568168 -> 8381724 (-2.18%); split: -2.18%, +0.01%
Spills: 1094 -> 747 (-31.72%)
Fills: 988 -> 681 (-31.07%)
Scratch: 4444 -> 3820 (-14.04%)
ALU: 953032 -> 913149 (-4.18%); split: -4.19%, +0.01%
FSCIB: 953032 -> 913149 (-4.18%); split: -4.19%, +0.01%
IC: 215398 -> 215274 (-0.06%)
GPRs: 139865 -> 139032 (-0.60%); split: -1.56%, +0.96%
Uniforms: 414886 -> 414466 (-0.10%); split: -0.14%, +0.04%
Preamble instrs: 646398 -> 644017 (-0.37%); split: -0.43%, +0.07%
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35989>
Similar to nir_lower_alu_width(), the callback can return the
desired number of components for a phi, or 0 for no lowering.
The previous behavior of nir_lower_phis_to_scalar() with lower_all=true
can be elicited via nir_lower_all_phis_to_scalar() while the previous
behavior with lower_all=false now corresponds to nir_lower_phis_to_scalar()
with NULL callback.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35783>
This check causes unnecessary overhead and can be replaced by simply
checking whether a phi_src is from a loop continue block.
Except for rare edge cases, the result will be the same.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35783>
With the check for the cap gone no driver will expose
ARB_shader_clock. Unfortunately the CI doesn't catch this
because it doesn't provide expectations whether a test should
pass or be skipped. In this case
spec@arb_shader_clock@execution@clock
spec@arb_shader_clock@execution@clock2x32
went from pass to skip. (Tested on r600, but on radeonsi one
can also see that the extension ARB_shader_clock is no longer
available).
Adding the test for the cap back in fixes this.
Fixes: 2ce201707e (Add support for EXT_shader_clock)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35992>
Add a new field userq_num_hqds to drm_amdgpu_info_hw_ip to expose the
number of available hardware queue descriptors (HQDs) for user queues.
This allows userspace to query the maximum number of user queues that
can be created for a particular IP block.
the patch link in driver side:
https://lists.freedesktop.org/archives/amd-gfx/2025-June/126686.html
v2: we should also put userq_num_hqds into radeon_info and
print it where other fields are printed. (Marek Olšák)
v3: rename num_userqs to num_queue_slots
and add print log in ac_print_gpu_info. (Marek Olšák)
v4: rename userq_num_hqds to userq_num_slots in hw_ip_info,
and update the hw information (Marek Olšák)
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35850>
Traces jobs upload their results at the end, making them incompatible
with the current design of CI-tron which doesn't allow internet access
for security reasons, so they are not included for now.
We're working on a solution for controlled access to specific domains,
and will add the traces jobs once that's ready.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35215>
It's done later in nir_lower_io_passes only for shader stages not
supporting indirect access.
Unfortunately we have add a hack into nir_lower_io_passes to get rid of
output loads. A later commit will remove it.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35945>
If the last block is empty, nir_block_last_instr returns NULL, which
sets the cursor to NULL, which crashes.
I think this can't crash currently because if xfb is present, there is
always at least 1 output store in the last block due to
lower_io_vars_to_temporaries, but that won't be true after we stop
calling it in a later commit.
Fixes: fa9cee4247 - glsl: implement lower_xfb_varying() as a NIR pass
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35945>
The GLSL compiler always lowers inputs to temps for VS and GS, so exclude
them from driver support because the GLSL compiler will no longer do that
unconditionally. Thus, indirect VS and GS inputs are completely untested
and broken in a lot of drivers.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35945>
These drivers set lower_all_io_to_temps = true, which means all indirect
access is always lowered except TCS, which is skipped by
nir_lower_io_vars_to_temporaries. Based on that, these drivers have never
received indirect IO for non-TCS shaders.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35945>
From the Vulkan spec:
`If pColorAttachmentLocations is NULL, it is
equivalent to setting each element to its index
within the array.`
Use similar logic to what we do in
CmdSetRenderingInputAttachmentIndices to handle
this behaviour properly.
Signed-off-by: Autumn Ashton <misyl@froggi.es>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35948>
using a screen method for this is broken since the value can change
before it is flushed. it must be passed along with the methods that use it
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35866>