Commit graph

10886 commits

Author SHA1 Message Date
Qiang Yu
799806d85e all: rename PIPE_SHADER_MESH_TYPES to MESA_SHADER_MESH_STAGES
Use command:
  find . -type f -not -path '*/.git/*' -exec sed -i 's/\bPIPE_SHADER_MESH_TYPES\b/MESA_SHADER_MESH_STAGES/g' {} +

Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36569>
2025-08-06 10:28:40 +08:00
Qiang Yu
7729920d92 all: rename PIPE_SHADER_MESH to MESA_SHADER_MESH
Use command:
  find . -type f -not -path '*/.git/*' -exec sed -i 's/\bPIPE_SHADER_MESH\b/MESA_SHADER_MESH/g' {} +

Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36569>
2025-08-06 10:28:39 +08:00
Qiang Yu
f60ea0a3cd all: rename PIPE_SHADER_COMPUTE to MESA_SHADER_COMPUTE
Use command:
  find . -type f -not -path '*/.git/*' -exec sed -i 's/PIPE_SHADER_COMPUTE/MESA_SHADER_COMPUTE/g' {} +

Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36569>
2025-08-06 10:28:39 +08:00
Marek Olšák
fee8e92855 nir: use gc_ctx for nir_variable to reduce ralloc/malloc overhead
gc_ctx uses a slab allocator. This reduces GLSL compile times by 1-3%
with the gallium noop driver.

This reduces the number of ralloc_size calls for Heaven shaders by 14.3%.
Note that gc_ctx also uses ralloc_size, so the reduction is a net change.
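
A slab allocator hands out fixed-size objects from larger preallocated chunks
and recycles freed objects through a free list, so most allocations never call
malloc. The C below is a minimal sketch of that general technique, not gc_ctx's
implementation; `toy_slab`, its functions and the chunk size of 64 objects are
hypothetical choices for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical slab: fixed-size objects are carved out of larger chunks,
 * and freed objects are recycled through an intrusive free list. */
#define TOY_SLAB_OBJS_PER_CHUNK 64

struct toy_slab_free_node {
   struct toy_slab_free_node *next;   /* overlays the storage of a free object */
};

struct toy_slab_chunk {
   struct toy_slab_chunk *next;       /* object storage follows this header */
};

struct toy_slab {
   size_t obj_size;                   /* rounded up to pointer alignment */
   struct toy_slab_free_node *free_list;
   struct toy_slab_chunk *chunks;     /* all chunks, released by toy_slab_destroy */
   size_t chunk_mallocs;              /* how many times malloc was called */
};

void toy_slab_init(struct toy_slab *p, size_t obj_size)
{
   const size_t align = sizeof(void *);
   if (obj_size < sizeof(struct toy_slab_free_node))
      obj_size = sizeof(struct toy_slab_free_node);
   p->obj_size = (obj_size + align - 1) / align * align;
   p->free_list = NULL;
   p->chunks = NULL;
   p->chunk_mallocs = 0;
}

void *toy_slab_alloc(struct toy_slab *p)
{
   if (!p->free_list) {
      /* Free list is empty: grab one chunk and thread all of its objects
       * onto the free list. */
      struct toy_slab_chunk *c =
         malloc(sizeof(*c) + p->obj_size * TOY_SLAB_OBJS_PER_CHUNK);
      if (!c)
         return NULL;
      c->next = p->chunks;
      p->chunks = c;
      p->chunk_mallocs++;
      char *base = (char *)(c + 1);
      for (size_t i = 0; i < TOY_SLAB_OBJS_PER_CHUNK; i++) {
         struct toy_slab_free_node *n =
            (struct toy_slab_free_node *)(base + i * p->obj_size);
         n->next = p->free_list;
         p->free_list = n;
      }
   }
   /* Pop the most recently freed (or newly carved) object. */
   struct toy_slab_free_node *obj = p->free_list;
   p->free_list = obj->next;
   return obj;
}

void toy_slab_free(struct toy_slab *p, void *ptr)
{
   assert(ptr != NULL);
   struct toy_slab_free_node *n = ptr;   /* reuse the object's memory as a node */
   n->next = p->free_list;
   p->free_list = n;
}

void toy_slab_destroy(struct toy_slab *p)
{
   while (p->chunks) {
      struct toy_slab_chunk *next = p->chunks->next;
      free(p->chunks);
      p->chunks = next;
   }
   p->free_list = NULL;
}
```

With 64 objects per chunk, allocating 100 objects in this sketch costs two
malloc calls instead of 100.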

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36538>
2025-08-05 22:55:14 +00:00
Marek Olšák
44350bce1f nir: add nir_variable_create_zeroed helper
This will allow us to switch nir_variable from ralloc to gc_ctx,
which uses a slab allocator.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36538>
2025-08-05 22:55:14 +00:00
Marek Olšák
b769d5dcde nir: don't use variables as ralloc parents, use the shader instead
This is so that we can switch variables to gc_ctx.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36538>
2025-08-05 22:55:13 +00:00
Marek Olšák
dadd4e4555 nir/clone: don't call ralloc_strdup with a NULL pointer for intrinsic names
No impact, but it was affecting my ralloc_strdup stats for
nir_intrinsic_instr names.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36538>
2025-08-05 22:55:13 +00:00
Marek Olšák
3c4a64e807 nir: eliminate most ralloc/malloc for nir_variable names
Store small names in a fixed-sized string in nir_variable.
GLSL IR does the same thing.

When compiling my shader-db with the gallium noop driver, it improves GLSL
compile times by 0.7% (much lower than anticipated).

For Unigine Heaven shaders:
- it eliminates 95.6% of ralloc calls for nir_variable names
- the total number of ralloc calls is reduced by 11%

It also adds only 16B to nir_variable, while just the ralloc header
for the name would occupy 40B.
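
The fixed-size name buffer works like a small-string optimization: a name that
fits in an array embedded in the struct needs no separate allocation, and only
longer names fall back to the heap. A minimal C sketch, assuming a hypothetical
`struct var_name` with a 16-byte inline buffer; the real nir_variable layout and
helper names are not shown here and may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical inline capacity, including the NUL terminator. */
#define VAR_NAME_INLINE_SIZE 16

struct var_name {
   char *heap;                              /* non-NULL only for long names */
   char inline_buf[VAR_NAME_INLINE_SIZE];   /* used when heap == NULL */
};

void var_name_init(struct var_name *v)
{
   v->heap = NULL;
   v->inline_buf[0] = '\0';
}

const char *var_name_get(const struct var_name *v)
{
   return v->heap ? v->heap : v->inline_buf;
}

bool var_name_is_inline(const struct var_name *v)
{
   return v->heap == NULL;
}

/* Copies `name` into the struct. Short names need no allocation; only
 * names that do not fit inline fall back to malloc. Returns false if that
 * allocation fails (the old name is kept). */
bool var_name_set(struct var_name *v, const char *name)
{
   assert(name != NULL);
   size_t size = strlen(name) + 1;

   if (size <= VAR_NAME_INLINE_SIZE) {
      /* memmove tolerates `name` pointing into inline_buf itself. */
      memmove(v->inline_buf, name, size);
      free(v->heap);   /* copy first, then release any old long name */
      v->heap = NULL;
      return true;
   }

   char *copy = malloc(size);
   if (!copy)
      return false;
   memcpy(copy, name, size);
   free(v->heap);
   v->heap = copy;
   return true;
}

void var_name_fini(struct var_name *v)
{
   free(v->heap);
   v->heap = NULL;
}
```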

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36538>
2025-08-05 22:55:12 +00:00
Marek Olšák
96ffc24e4e nir: add nir_variable_{set,append,steal}_name{f}() to modify nir_variable names
Setting variable names currently always uses ralloc, but the new
nir_variable_* helpers will mostly eliminate ralloc/malloc in a later
commit.

This just updates all places that touch nir_variable names to use the new
helpers.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36538>
2025-08-05 22:55:12 +00:00
Marek Olšák
05749922b0 nir: don't allocate nir_constant::elements if there are none
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36538>
2025-08-05 22:55:11 +00:00
Dave Airlie
b1242e6b30 spirv: move cmat store barrier after the store.
Fixes: b98f87612b ("spirv: Implement SPV_KHR_cooperative_matrix")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36583>
2025-08-05 22:28:03 +00:00
Job Noorman
ae66bd1c00 nir/opt_uniform_subgroup: use ballot_bit_count
Using bit_count on the result of ballot doesn't work for targets where
ballot's num_components > 1.
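
For example, a 64-lane ballot may be returned as two 32-bit components. The C
sketch below models such a ballot as a `uint32_t` array (an assumption for
illustration; the function names are hypothetical, not NIR's) and contrasts
counting a single component with counting across all of them, which is what
ballot_bit_count is defined to do.

```c
#include <assert.h>
#include <stdint.h>

/* Portable population count of one 32-bit component. */
unsigned popcount32(uint32_t x)
{
   unsigned n = 0;
   while (x) {
      x &= x - 1;   /* clear the lowest set bit */
      n++;
   }
   return n;
}

/* Broken pattern: only the first component is counted, so lanes
 * 32 and above are ignored. */
unsigned count_first_component_only(const uint32_t *ballot,
                                    unsigned num_components)
{
   assert(num_components >= 1);
   return popcount32(ballot[0]);
}

/* ballot_bit_count semantics: count set bits across every component. */
unsigned ballot_bit_count_model(const uint32_t *ballot, unsigned num_components)
{
   unsigned n = 0;
   for (unsigned i = 0; i < num_components; i++)
      n += popcount32(ballot[i]);
   return n;
}
```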

Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Fixes: d2e1e4442a ("ir3: enable nir_opt_uniform_subgroup")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35669>
2025-08-05 17:09:27 +00:00
Antonio Ospite
5649a0aa06 libcl: avoid calling UNREACHABLE(str) macro without arguments
In commit 9ced3148ca ("util: avoid calling UNREACHABLE(str) macro
without arguments", 2025-07-30) the argument type check in the
UNREACHABLE(str) macro in src/util/macros.h was improved to also avoid
calling it without arguments, but the definition in
src/compiler/libcl/libcl.h was not updated.

Apply a similar change to src/compiler/libcl/libcl.h to keep the C and
CL macros in sync.

Fixes: 9ced3148ca ("util: avoid calling UNREACHABLE(str) macro without arguments", 2025-07-30)

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> on gfx8 (Polaris 20)
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36508>
2025-08-04 23:15:18 +02:00
Georg Lehmann
1d885fab9c nir/opt_algebraic: optimize pack_half_rtz of b2f
Foz-DB Navi21:
Totals from 13 (0.02% of 80255) affected shaders:
Instrs: 2313 -> 2306 (-0.30%); split: -0.35%, +0.04%
CodeSize: 13452 -> 13480 (+0.21%)
Latency: 12066 -> 12013 (-0.44%); split: -0.45%, +0.01%
InvThroughput: 2172 -> 2163 (-0.41%)
Copies: 112 -> 114 (+1.79%)
VALU: 1480 -> 1472 (-0.54%)
SALU: 154 -> 155 (+0.65%)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36535>
2025-08-04 19:42:22 +00:00
Georg Lehmann
bc3b09c5dd nir/opt_algebraic: optimize pack_half_rtz of bcsel with constant
Foz-DB Navi21:
Totals from 448 (0.56% of 80255) affected shaders:
Instrs: 345474 -> 344791 (-0.20%); split: -0.20%, +0.00%
CodeSize: 1917784 -> 1913324 (-0.23%); split: -0.25%, +0.02%
VGPRs: 22344 -> 22416 (+0.32%)
Latency: 2320847 -> 2318161 (-0.12%); split: -0.13%, +0.01%
InvThroughput: 543008 -> 541722 (-0.24%)
SClause: 11450 -> 11459 (+0.08%)
Copies: 19991 -> 19949 (-0.21%); split: -0.23%, +0.02%
PreSGPRs: 19129 -> 19114 (-0.08%)
PreVGPRs: 19695 -> 19696 (+0.01%); split: -0.01%, +0.01%
VALU: 257627 -> 256948 (-0.26%)
SALU: 30432 -> 30422 (-0.03%)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36535>
2025-08-04 19:42:22 +00:00
Georg Lehmann
8512479097 nir/opt_algebraic: create 16bit fmin/fmax if only used by pack_half_2x16_rtz_split
Foz-DB Navi21:
Totals from 1842 (2.30% of 80066) affected shaders:
Instrs: 869152 -> 866751 (-0.28%)
CodeSize: 4687316 -> 4682496 (-0.10%); split: -0.14%, +0.03%
VGPRs: 75216 -> 75312 (+0.13%)
Latency: 7297749 -> 7297929 (+0.00%); split: -0.01%, +0.02%
InvThroughput: 1864933 -> 1860706 (-0.23%); split: -0.23%, +0.00%
Copies: 52679 -> 52463 (-0.41%)
VALU: 665076 -> 662890 (-0.33%)
SALU: 56226 -> 56010 (-0.38%)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36535>
2025-08-04 19:42:22 +00:00
Georg Lehmann
22afe83473 nir/opt_algebraic: remove fneg around fmin/fmax
Foz-DB Navi21:
Totals from 282 (0.35% of 80255) affected shaders:
Instrs: 310515 -> 309755 (-0.24%)
CodeSize: 1721236 -> 1714540 (-0.39%)
Latency: 1366446 -> 1365141 (-0.10%); split: -0.10%, +0.00%
InvThroughput: 352528 -> 351097 (-0.41%); split: -0.41%, +0.00%
Copies: 24623 -> 24630 (+0.03%)
VALU: 231716 -> 230951 (-0.33%)
SALU: 28774 -> 28779 (+0.02%)
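
The identity behind this kind of rewrite is that negation reverses ordering, so
-fmin(-a, -b) equals fmax(a, b), and symmetrically for fmax. A scalar C check,
assuming non-NaN inputs; the exact patterns this commit adds to
nir_opt_algebraic may differ.

```c
#include <assert.h>

/* Reference min/max for non-NaN inputs (NaN is not modeled; avoids libm). */
float min_ref(float a, float b) { return a < b ? a : b; }
float max_ref(float a, float b) { return a > b ? a : b; }

/* Before: negations around a minimum. */
float neg_min_neg(float a, float b)
{
   return -min_ref(-a, -b);
}

/* After: a single maximum with no negations. */
float plain_max(float a, float b)
{
   return max_ref(a, b);
}
```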

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36535>
2025-08-04 19:42:22 +00:00
Rhys Perry
d4b329219e nir/lower_memory_model: remove empty lowered barriers
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36080>
2025-08-04 15:36:51 +00:00
Rhys Perry
0512ba8743 vtn: remove acquire/release around make visible/available barriers
These are not necessary and can be expensive. I think they were added
because of a misunderstanding of the informative descriptions in the
Vulkan memory model, or because the memory model requires make
visible/available barriers to have these semantics.

Because we use these to implement MakePointerVisible/MakePointerAvailable,
we can skip that requirement in NIR.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36080>
2025-08-04 15:36:51 +00:00
Rhys Perry
ae6e39a8f5 nir: don't move accesses across make visible/available barriers
Otherwise, the barrier would no longer affect the access.

nir_opt_dead_write_vars should be fine, since it's removing stores, not
moving them.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36080>
2025-08-04 15:36:50 +00:00
Rhys Perry
d54f2ca84f vtn: fix placement of barriers for MakeAvailable/MakeVisible
From Vulkan 1.4.321 spec:
The implicit availability operation is program-ordered between the barrier
or atomic and all other operations program-ordered before the barrier or
atomic.
...
The implicit visibility operation is program-ordered between the barrier
or atomic and all other operations program-ordered after the barrier or
atomic.
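
Applied to memory operands, this means the availability barrier for a store
with MakePointerAvailable belongs after the store, and the visibility barrier
for a load with MakePointerVisible belongs before the load. The C sketch below
only encodes that ordering; the `enum op` instruction list and the emit
functions are hypothetical, not vtn's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical op kinds for a tiny instruction list (not NIR). */
enum op {
   OP_LOAD,
   OP_STORE,
   OP_MAKE_AVAILABLE,   /* barrier making earlier writes available */
   OP_MAKE_VISIBLE,     /* barrier making available data visible to later reads */
};

/* Store with MakePointerAvailable: the availability barrier must follow
 * the store it covers. Returns the number of ops written to `out`. */
size_t emit_store_make_available(enum op *out)
{
   size_t n = 0;
   out[n++] = OP_STORE;
   out[n++] = OP_MAKE_AVAILABLE;
   return n;
}

/* Load with MakePointerVisible: the visibility barrier must precede
 * the load it covers. Returns the number of ops written to `out`. */
size_t emit_load_make_visible(enum op *out)
{
   size_t n = 0;
   out[n++] = OP_MAKE_VISIBLE;
   out[n++] = OP_LOAD;
   return n;
}
```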

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36080>
2025-08-04 15:36:49 +00:00
Mary Guillemard
440e0c283c libcl: Add stdatomic.h
Useful when using C11 atomics with CL C.
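
For reference, this is the kind of C11 atomics code the header is meant to
enable. The example is ordinary hosted C11; which `<stdatomic.h>` functions and
memory orders libcl's version actually provides for CL C is an assumption not
confirmed here.

```c
#include <assert.h>
#include <stdatomic.h>

/* Increments a shared counter `times` times with relaxed ordering and
 * returns the value observed just before the final increment. */
unsigned bump_counter(atomic_uint *counter, unsigned times)
{
   unsigned last = 0;
   for (unsigned i = 0; i < times; i++)
      last = atomic_fetch_add_explicit(counter, 1u, memory_order_relaxed);
   return last;
}
```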

Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35724>
2025-08-04 12:12:51 +00:00
Rhys Perry
4c36e08854 glsl_to_nir,vtn: insert barriers around begin/end invocation interlock
Backends probably already deal with this, but these would be needed to
prevent NIR passes from moving accesses outside the critical section.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36513>
2025-08-04 09:30:06 +00:00
Marek Olšák
8462b1dc71 glsl: switch ir_variable_refcount to linear_ctx
Compiling my shader-db with the gallium noop driver is 6.8% faster now.
The theoretical, stat-based results below don't always reflect real-world
results.

When compiling Heaven shaders with the gallium noop driver,
134610 calloc calls are removed.

134610 / (total ralloc call count) = 6%, so this removes roughly the
equivalent of 6% of the cost of all ralloc calls. The shift from calloc to
linear_alloc increases ralloc calls by 0.4%, so the overhead changes from
roughly 6% to roughly 0.4%, a net reduction of about 5.6%.

Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36539>
2025-08-04 02:07:00 +00:00
Marek Olšák
dfe45d1b67 glsl: switch ir_instruction to linear_ctx to eliminate malloc overhead
Compiling my shader-db with the gallium noop driver is 3.6% faster now.

malloc calls from ralloc+linear_alloc are reduced by 34% when compiling
Heaven shaders with the gallium noop driver. That's due to a shift of
malloc calls from ralloc to linear_alloc.
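
A linear (bump, or arena) allocator serves each allocation by advancing an
offset inside a large chunk and frees everything at once when the context is
destroyed, so individual objects need neither their own malloc call nor a
per-object free. A minimal C sketch of that technique; `struct arena`,
`arena_alloc` and `arena_free_all` are hypothetical and simplified, not Mesa's
linear_ctx API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ARENA_CHUNK_SIZE 4096   /* hypothetical default chunk size */

struct arena_chunk {
   struct arena_chunk *next;    /* older chunks */
   size_t used;                 /* bytes already handed out */
   size_t capacity;             /* bytes available in data */
   max_align_t data[];          /* storage, aligned for any object type */
};

struct arena {
   struct arena_chunk *head;    /* current chunk */
};

void *arena_alloc(struct arena *a, size_t size)
{
   const size_t align = _Alignof(max_align_t);
   size = (size + align - 1) / align * align;   /* keep every object aligned */

   struct arena_chunk *c = a->head;
   if (!c || c->capacity - c->used < size) {
      /* Current chunk is full: start a new one (large requests get their
       * own, exactly sized chunk). */
      size_t cap = size > ARENA_CHUNK_SIZE ? size : ARENA_CHUNK_SIZE;
      c = malloc(sizeof(*c) + cap);
      if (!c)
         return NULL;
      c->next = a->head;
      c->used = 0;
      c->capacity = cap;
      a->head = c;
   }
   void *ptr = (char *)c->data + c->used;   /* bump the offset */
   c->used += size;
   return ptr;
}

/* Releases every allocation at once. No per-object free or destructor
 * ever runs, so objects placed here must not rely on one. */
void arena_free_all(struct arena *a)
{
   while (a->head) {
      struct arena_chunk *next = a->head->next;
      free(a->head);
      a->head = next;
   }
}
```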

Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36539>
2025-08-04 02:07:00 +00:00
Marek Olšák
6b2cb71560 glsl: add support for linear_ctx into ir_instruction
The type of the "new operator" parameter determines whether ir_instruction
is allocated with linear_ctx or ralloc. The ralloc operators will be
removed in the next commit.

GCC expects classes with virtual functions to have a virtual destructor,
but linear_ctx has static assertions that expect no destructor to be
present. Remove the assertions, as that's our only option. The destructor
is empty, including in all derived classes, so it doesn't need to run.

Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36539>
2025-08-04 02:07:00 +00:00
Marek Olšák
ae5b168051 ralloc/linalloc: allow adding custom code to LINEAR_ALLOC new operator
This is for GLSL IR.

Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36539>
2025-08-04 02:07:00 +00:00
Marek Olšák
4f2b8e7713 glsl/tests: fix memory leaks
Fixes: 09cc5f0c37 ("glsl: use pipe_screen::nir_options instead of NirOptions")

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36539>
2025-08-04 02:06:59 +00:00
Alyssa Rosenzweig
e8ff9eb9cb nir/opt_varyings: link interpolation qualifiers
Some hardware (AGX, Imagination, Arm) really wants to know the interpolation
qualifiers when compiling the vertex shader. Even though we need to handle this
dynamically for separate shaders, we can improve performance by linking.
nir_opt_varyings already has all the information to do this, so just do so.

Note this has to be done in common code for Gallium, which links varyings within
the GLSL linker but then presents the linked programs as separate shader
objects. This models that nicely, allowing Gallium drivers to optimize without
weird sidebands.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36501>
2025-08-03 21:57:25 +00:00
Alyssa Rosenzweig
66740d9c91 nir: gather interpolation qualifiers
We'll want this to be able to link interpolation qualifiers in a simple way with
nir_opt_varyings. Add the metadata for it and the FS gathering pass.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36501>
2025-08-03 21:57:25 +00:00
Alyssa Rosenzweig
b8f50b6317 nir: gather info in opt_varyings_bulk
The shader info is left inconsistent after this pass, so we need to gather it
again right afterwards. Merge that gathering code into opt_varyings_bulk.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36501>
2025-08-03 21:57:25 +00:00
Alyssa Rosenzweig
3e8575c037 nir,agx: pull lower_printf_buffer into backend
There are no other users now.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36516>
2025-08-03 21:27:50 +00:00
Alyssa Rosenzweig
1c28fc0a86 nir: add nir_inline_sysval pass
Several drivers have their own versions of this, so we might as well make a common one.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: John Anthony <john.anthony@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36516>
2025-08-03 21:27:47 +00:00
Emma Anholt
d5826506ce nir,agx: Move AGX's loop (generalized) to shared NIR code.
When I went to use opt_reassociate for tu, I was advised that you want to
do this loop to get the best results.  If everyone needs it, let's make it
common code and explain what's going on.

In the process, also make it skip work appropriately when there's no
progress.
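
The general shape of such a loop: repeat a pass until it reports no progress,
and run the follow-up cleanup only in iterations where the pass actually
changed something. A C sketch with a hypothetical pass signature and a toy
program state; it does not reproduce the shared NIR helper's actual passes or
their ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pass signature: returns true if it changed the program. */
typedef bool (*pass_fn)(void *program);

/* Runs `pass` until it stops making progress (or `max_iters` is reached).
 * `cleanup` only runs after iterations in which `pass` changed something,
 * so no cleanup work is done once a fixed point is reached.
 * Returns the number of iterations that made progress. */
unsigned run_to_fixed_point(void *program, pass_fn pass, pass_fn cleanup,
                            unsigned max_iters)
{
   assert(pass != NULL && cleanup != NULL);
   unsigned progress_iters = 0;
   for (unsigned i = 0; i < max_iters; i++) {
      if (!pass(program))
         break;
      progress_iters++;
      cleanup(program);
   }
   return progress_iters;
}

/* Toy program state for demonstration: a count of pending rewrites. */
struct toy_program {
   int pending;
   int cleanups;
};

bool toy_rewrite(void *p)
{
   struct toy_program *t = p;
   if (t->pending == 0)
      return false;   /* nothing left to rewrite: no progress */
   t->pending--;
   return true;
}

bool toy_cleanup(void *p)
{
   struct toy_program *t = p;
   t->cleanups++;
   return true;
}
```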

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36342>
2025-08-03 20:58:28 +00:00
Emma Anholt
062a35b554 nir/lower_sample_shading: Set the sample qualifier on in vars.
This is another setup step that zink would like to have.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36496>
2025-08-03 20:27:39 +00:00
Emma Anholt
d3ada77a6a nir: Move ST's force-persample-shading NIR pass to shared code.
This is about to grow a little.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36496>
2025-08-03 20:27:39 +00:00
Alyssa Rosenzweig
aca4948997 clc: force exact! across libclc
libclc seems to have piles of bugs where it relies on precise floating point
behaviours to meet CL precision requirements but doesn't actually disable fast
math in its own SPIR-V. I am tired of playing this whack-a-mole game. Let's just
assume that the math in CLC is right and should not be optimized in unsafe ways,
and force the exact bit across libclc. This works around a large class of libclc
bugs that keep cropping up from innocuous NIR changes.

This does not force the exact bit for application shaders using libclc, just for
the calculations inside libclc itself. All things considered, this seems like
the right tradeoff: anything "fast" bypasses libclc anyway.

Fixes generated_tests/cl/builtin/math/builtin-float-pow-1.0.generated.cl on
drivers using nir_opt_reassociate, and probably other stuff.
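
A minimal IEEE-754 illustration of the kind of breakage the exact bit guards
against: reassociating `(a + b) - a` into `b` is algebraically valid but changes
the result when `a` is much larger than `b`. This is a generic single-precision
example, not one of the specific libclc bugs referred to above.

```c
#include <assert.h>

/* Computes (a + b) - a as written. Each intermediate is stored to a
 * volatile float, which forces rounding to single precision. */
float sum_then_subtract(float a, float b)
{
   volatile float sum = a + b;
   volatile float diff = sum - a;
   return diff;
}

/* What an unsafe reassociation would compute instead: (a - a) + b. */
float reassociated(float a, float b)
{
   volatile float zero = a - a;
   volatile float r = zero + b;
   return r;
}
```

Near 1.0e8 the spacing between adjacent floats is 8, so `1.0e8f + 1.0f` rounds
back to `1.0e8f` and the as-written expression yields 0, while the reassociated
form yields 1.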

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36527>
2025-08-01 21:00:47 +00:00
Georg Lehmann
cfd5fbfde1 nir/opt_algebraic: make fmin/fmax(a, #b) 16bit if only used by f2f16
Foz-DB Navi31:
Totals from 11 out of 14 FSR4 shaders:
Instrs: 58298 -> 58374 (+0.13%); split: -0.08%, +0.21%
CodeSize: 397836 -> 398108 (+0.07%); split: -0.08%, +0.15%
Latency: 209634 -> 211438 (+0.86%); split: -0.14%, +1.00%
InvThroughput: 229152 -> 229314 (+0.07%); split: -0.03%, +0.10%
VClause: 826 -> 847 (+2.54%); split: -0.36%, +2.91%
Copies: 2954 -> 3040 (+2.91%); split: -1.56%, +4.47%
VALU: 49637 -> 49711 (+0.15%); split: -0.06%, +0.21%
VOPD: 1916 -> 1400 (-26.93%)

These stats look bad, but it's actually just unlucky RA.
Replacing 1 VOPD (two v_dual_max_f32) with 1 VOP3P (v_pk_max_f16)
should still be a win from a register bandwidth perspective.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>
2025-08-01 20:29:30 +00:00
Georg Lehmann
3168ebe2c5 nir/range_analysis: look through vec2
Foz-DB Navi31:
Totals from 11 out of 14 FSR4 shaders:
Instrs: 58987 -> 58298 (-1.17%)
CodeSize: 402844 -> 397836 (-1.24%)
Latency: 209630 -> 209634 (+0.00%); split: -0.66%, +0.66%
InvThroughput: 230240 -> 229152 (-0.47%); split: -0.48%, +0.00%
VClause: 838 -> 826 (-1.43%); split: -1.55%, +0.12%
Copies: 3019 -> 2954 (-2.15%); split: -2.82%, +0.66%
VALU: 50196 -> 49637 (-1.11%)
VOPD: 1950 -> 1916 (-1.74%); split: +0.72%, -2.46%

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>
2025-08-01 20:29:29 +00:00
Georg Lehmann
caf89c97de nir/range_analysis: look through f2f
Foz-DB Navi31:
Totals from 93 (0.12% of 80273) affected shaders:
Instrs: 123927 -> 121073 (-2.30%); split: -2.30%, +0.00%
CodeSize: 670832 -> 653332 (-2.61%); split: -2.61%, +0.00%
Latency: 337678 -> 322803 (-4.41%); split: -4.41%, +0.00%
InvThroughput: 63277 -> 61083 (-3.47%)
VClause: 460 -> 373 (-18.91%)
SClause: 2178 -> 2100 (-3.58%)
Copies: 7637 -> 7744 (+1.40%)
PreSGPRs: 4414 -> 4287 (-2.88%)
PreVGPRs: 4229 -> 4230 (+0.02%)
VALU: 77375 -> 75693 (-2.17%)
SALU: 16497 -> 16383 (-0.69%); split: -0.73%, +0.04%
VMEM: 561 -> 477 (-14.97%)
SMEM: 3197 -> 3113 (-2.63%)

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>
2025-08-01 20:29:28 +00:00
Georg Lehmann
261239a492 nir/opt_algebraic: use range analysis to detect no-op fmin/fmax
Foz-DB Navi31:
Totals from 418 (0.52% of 80273) affected shaders:
Instrs: 564550 -> 564387 (-0.03%); split: -0.04%, +0.01%
CodeSize: 2983860 -> 2982684 (-0.04%); split: -0.05%, +0.01%
Latency: 4387264 -> 4386397 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 717464 -> 716874 (-0.08%); split: -0.08%, +0.00%
Copies: 40126 -> 40125 (-0.00%)
VALU: 352128 -> 352003 (-0.04%); split: -0.04%, +0.01%
SALU: 50290 -> 50283 (-0.01%)
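
As an illustration of the idea: if range analysis has proven a value is
non-negative, then fmax of that value with a constant that is at most 0 just
returns the value, and symmetrically for fmin. The sketch below uses a
hypothetical, much-simplified sign lattice and ignores NaN and the sign of
zero; NIR's real range analysis tracks more cases, and the exact patterns added
by this commit may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified sign lattice for a value. */
enum fp_range {
   RANGE_UNKNOWN,
   RANGE_LT_ZERO,
   RANGE_LE_ZERO,
   RANGE_EQ_ZERO,
   RANGE_GE_ZERO,
   RANGE_GT_ZERO,
};

bool range_is_ge_zero(enum fp_range r)
{
   return r == RANGE_EQ_ZERO || r == RANGE_GE_ZERO || r == RANGE_GT_ZERO;
}

bool range_is_le_zero(enum fp_range r)
{
   return r == RANGE_EQ_ZERO || r == RANGE_LE_ZERO || r == RANGE_LT_ZERO;
}

/* fmax(a, c) == a when a >= 0 is known and the constant c <= 0. */
bool fmax_with_const_is_noop(enum fp_range a, float c)
{
   return range_is_ge_zero(a) && c <= 0.0f;
}

/* fmin(a, c) == a when a <= 0 is known and the constant c >= 0. */
bool fmin_with_const_is_noop(enum fp_range a, float c)
{
   return range_is_le_zero(a) && c >= 0.0f;
}
```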

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>
2025-08-01 20:29:28 +00:00
Georg Lehmann
a0665e79e9 nir/opt_algebraic: push fsat into bcsel with constant
bcsel doesn't have a free clamp modifier on AMD hardware,
but what's inside might have free clamp.
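
Written out, the rewrite relies on fsat distributing over a select:
fsat(bcsel(c, x, K)) equals bcsel(c, fsat(x), fsat(K)), where fsat(K) folds to a
constant and fsat(x) can then fold into the clamp of whatever instruction
computes x. A scalar C check of the equivalence, ignoring NaN; the function
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* fsat: clamp to [0, 1] (NaN handling is not modeled). */
float fsat(float x)
{
   return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

float bcsel(bool cond, float a, float b)
{
   return cond ? a : b;
}

/* Before: saturate the selected value. */
float fsat_outside(bool c, float x, float k)
{
   return fsat(bcsel(c, x, k));
}

/* After: saturate each arm; fsat(k) is a constant when k is. */
float fsat_pushed(bool c, float x, float k)
{
   return bcsel(c, fsat(x), fsat(k));
}
```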

Foz-DB Navi31:
Totals from 873 (1.09% of 80273) affected shaders:
MaxWaves: 22008 -> 21968 (-0.18%)
Instrs: 4624956 -> 4623950 (-0.02%); split: -0.04%, +0.02%
CodeSize: 24152780 -> 24142884 (-0.04%); split: -0.05%, +0.01%
VGPRs: 57900 -> 57960 (+0.10%)
Latency: 28762622 -> 28749889 (-0.04%); split: -0.06%, +0.02%
InvThroughput: 5320810 -> 5320145 (-0.01%); split: -0.02%, +0.00%
VClause: 115879 -> 115929 (+0.04%); split: -0.10%, +0.14%
SClause: 93058 -> 93059 (+0.00%); split: -0.01%, +0.02%
Copies: 335674 -> 335845 (+0.05%); split: -0.05%, +0.10%
PreSGPRs: 53819 -> 53843 (+0.04%); split: -0.01%, +0.05%
PreVGPRs: 50908 -> 50939 (+0.06%); split: -0.02%, +0.08%
VALU: 2816395 -> 2815514 (-0.03%); split: -0.04%, +0.01%
SALU: 509988 -> 509987 (-0.00%); split: -0.02%, +0.02%

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>
2025-08-01 20:29:27 +00:00
Georg Lehmann
e9e5146848 nir/opt_algebraic: optimize fsat(fmax(a, b)) where b is not positive
Foz-DB Navi31:
Totals from 946 (1.18% of 80273) affected shaders:
Instrs: 4986082 -> 4983988 (-0.04%); split: -0.04%, +0.00%
CodeSize: 25998700 -> 25989796 (-0.03%); split: -0.04%, +0.00%
Latency: 45514742 -> 45510330 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 8163529 -> 8162325 (-0.01%); split: -0.02%, +0.00%
VClause: 112105 -> 112104 (-0.00%); split: -0.00%, +0.00%
SClause: 109694 -> 109688 (-0.01%)
Copies: 372356 -> 372284 (-0.02%); split: -0.03%, +0.01%
Branches: 132636 -> 132633 (-0.00%)
PreVGPRs: 58997 -> 58979 (-0.03%); split: -0.03%, +0.00%
VALU: 3025662 -> 3024191 (-0.05%); split: -0.05%, +0.00%
SALU: 551712 -> 551714 (+0.00%); split: -0.00%, +0.00%
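
The reasoning: if b <= 0, fmax(a, b) either returns a, or returns b when
a < b <= 0; in that second case fsat(a) and fsat(b) are both 0. Either way
fsat(fmax(a, b)) equals fsat(a). A scalar C check, assuming non-NaN inputs
(function names are hypothetical):

```c
#include <assert.h>

/* fsat: clamp to [0, 1] (NaN handling is not modeled). */
float fsat(float x)
{
   return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

float max_ref(float a, float b)
{
   return a > b ? a : b;
}

/* Original expression. */
float fsat_of_fmax(float a, float b)
{
   return fsat(max_ref(a, b));
}

/* Simplified form, only valid when b is not positive. */
float fsat_only(float a, float b)
{
   assert(b <= 0.0f);   /* precondition the optimization relies on */
   return fsat(a);
}
```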

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36468>
2025-08-01 20:29:27 +00:00
Alyssa Rosenzweig
bcf1a1c20b treewide: use nir_def_block
Via Coccinelle patch:

    @@
    expression definition;
    @@

    -definition->parent_instr->block
    +nir_def_block(definition)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>
2025-08-01 15:34:24 +00:00
Alyssa Rosenzweig
82ae8b1d33 treewide: simplify nir_def_rewrite_uses_after
Most of the time with nir_def_rewrite_uses_after, you want to rewrite after the
replacement. Make that the default thing to be more ergonomic and to drop
parent_instr uses.

We leave nir_def_rewrite_uses_after_instr defined if you really want the old
signature with an arbitrary after point.

Via Coccinelle patch:

    @@
    expression a, b;
    @@

    -nir_def_rewrite_uses_after(a, b, b->parent_instr)
    +nir_def_rewrite_uses_after_def(a, b)

Followed by a bunch of sed.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>
2025-08-01 15:34:24 +00:00
Alyssa Rosenzweig
cc6e3b84cb treewide: use nir_def_as_*
Via Coccinelle patch:

    @@
    expression definition;
    @@

    -nir_instr_as_alu(definition->parent_instr)
    +nir_def_as_alu(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_intrinsic(definition->parent_instr)
    +nir_def_as_intrinsic(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_phi(definition->parent_instr)
    +nir_def_as_phi(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_load_const(definition->parent_instr)
    +nir_def_as_load_const(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_deref(definition->parent_instr)
    +nir_def_as_deref(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_tex(definition->parent_instr)
    +nir_def_as_tex(definition)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>
2025-08-01 15:34:24 +00:00
Alyssa Rosenzweig
114bf69956 nir: add nir_def_block helper
Another common composition.
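
The helper is shorthand for `def->parent_instr->block`. The C below is a
reduced sketch: the structs are hypothetical stand-ins trimmed to the fields
the helper touches (the real definitions live in Mesa's nir.h and differ).

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins for the NIR types (hypothetical, not Mesa's layout). */
typedef struct nir_block { int index; } nir_block;
typedef struct nir_instr { nir_block *block; } nir_instr;
typedef struct nir_def { nir_instr *parent_instr; } nir_def;

/* Returns the block containing the instruction that defines `def`. */
static inline nir_block *nir_def_block(nir_def *def)
{
   assert(def != NULL && def->parent_instr != NULL);
   return def->parent_instr->block;
}
```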

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>
2025-08-01 15:34:24 +00:00
Alyssa Rosenzweig
3624f054f2 nir: add nir_def_as_* helpers
We want to get rid of nir_def::parent_instr eventually, requiring the accessor
function nir_def_parent_instr(def) instead, so to mitigate the hit to NIR
ergonomics, let's add helpers for common patterns using parent_instr. This gets
us an immediate win for NIR ergonomics and then reduces the surface area for the
later flag day hiding parent_instr.

This commit starts us off by adding compositions for nir_instr_as_* with
parent_instr's, which are common.
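
A reduced C sketch of such a composition: nir_def_as_alu(def) is shorthand for
nir_instr_as_alu(def->parent_instr), a downcast from the embedded nir_instr to
its containing nir_alu_instr. The struct layouts are hypothetical stand-ins
trimmed to what the helper needs; the real definitions live in Mesa's nir.h and
differ.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins for the NIR types (hypothetical, not Mesa's layout). */
typedef enum { nir_instr_type_alu, nir_instr_type_intrinsic } nir_instr_type;

typedef struct nir_instr { nir_instr_type type; } nir_instr;
typedef struct nir_def { nir_instr *parent_instr; } nir_def;

typedef struct nir_alu_instr {
   nir_instr instr;   /* embedded base; downcasts recover the container */
   int op;
   nir_def def;
} nir_alu_instr;

/* Downcast from the base instruction to the ALU instruction containing it. */
static inline nir_alu_instr *nir_instr_as_alu(nir_instr *instr)
{
   assert(instr->type == nir_instr_type_alu);
   return (nir_alu_instr *)((char *)instr - offsetof(nir_alu_instr, instr));
}

/* The composition: def -> defining instruction -> ALU instruction. */
static inline nir_alu_instr *nir_def_as_alu(nir_def *def)
{
   return nir_instr_as_alu(def->parent_instr);
}
```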

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>
2025-08-01 15:34:24 +00:00
Lionel Landwerlin
83cb02206c compiler: add gl_shader_stage_is_graphics
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36512>
2025-08-01 11:35:00 +00:00
Marek Olšák
c64c6a0c31 nir/opt_group_loads: support tex instructions without resource srcs for i915
Fixes: aa732f6f ("nir/group_loads: handle more loads") (or a later commit)
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13624

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36498>
2025-07-31 23:30:20 -04:00