gc_ctx uses a slab allocator. This reduces GLSL compile times by 1-3%
with the gallium noop driver.
This reduces the number of ralloc_size calls for Heaven shaders by 14.3%.
Note that gc_ctx also uses ralloc_size, so the reduction is a net change.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36538>
Store small names in a fixed-sized string in nir_variable.
GLSL IR does the same thing.
When compiling my shader-db with the gallium noop driver, it improves GLSL
compile times by 0.7% (much lower than anticipated).
For Unigine Heaven shaders:
- it eliminates 95.6% ralloc calls for nir_variable names
- the total number of ralloc calls is reduced by 11%
It also adds only 16B to nir_variable, while just the ralloc header
for the name would occupy 40B.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36538>
Setting variable names currently always uses ralloc, but the new
nir_variable_* helpers will mostly eliminate ralloc/malloc in a later
commit.
This just updates all places that touch nir_variable names to use the new
helpers.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36538>
Using bit_count on the result of ballot doesn't work for targets where
ballot's num_components > 1.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Fixes: d2e1e4442a ("ir3: enable nir_opt_uniform_subgroup")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35669>
In commit 9ced3148ca ("util: avoid calling UNREACHABLE(str) macro
without arguments", 2025-07-30) the argument type check in the
UNREACHABLE(str) macro in src/util/macros.h was improved to also avoid
calling it without arguments, but the definition in
src/compiler/libcl/libcl.h was not updated.
Apply a similar change to src/compiler/libcl/libcl.h to keep the C and
CL macros in sync.
Fixes: 9ced3148ca ("util: avoid calling UNREACHABLE(str) macro without arguments", 2025-07-30)
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> on gfx8 (Polaris 20)
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36508>
These are not necessary and can be expensive. I think they were added
because of a misunderstanding of the informative descriptions in the
Vulkan memory model, or because the memory model requires make
visible/available barriers to have these semantics.
Because we use these to implement MakePointerVisible/MakePointerAvailable,
we can skip that requirement in NIR.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36080>
Otherwise, the barrier would no longer affect the access.
nir_opt_dead_write_vars should be fine, since it's removing stores, not
moving them.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36080>
From Vulkan 1.4.321 spec:
The implicit availability operation is program-ordered between the barrier
or atomic and all other operations program-ordered before the barrier or
atomic.
...
The implicit visibility operation is program-ordered between the barrier
or atomic and all other operations program-ordered after the barrier or
atomic.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36080>
Useful when using C11 atomics with CL C.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35724>
Backends probably already deal with this, but these would be needed to
prevent NIR passes from moving accesses outside the critical section.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36513>
Compiling my shader-db with the gallium noop driver is 6.8% faster now.
Theoretical stat-based results are below, which don't always reflect real
results.
When compiling Heaven shaders with the gallium noop driver,
134610 calloc calls are removed.
134610 / ralloc count = 6%, so it's roughly the equivalent of 6% of
the cost of all ralloc calls that's removed. The shift from calloc to
linear_alloc increases ralloc calls by 0.4%, so the approximate reduction
is 6% -> 0.4% overhead change.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36539>
Compiling my shader-db with the gallium noop driver is 3.6% faster now.
malloc calls from ralloc+linear_alloc are reduced by 34% when compiling
Heaven shaders with the gallium noop driver. That's due to a shift of
malloc calls from ralloc to linear_alloc.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36539>
The type of the "new operator" parameter determines whether ir_instruction
is allocated with linear_ctx or ralloc. The ralloc operators will be
removed in the next commit.
GCC expects classes with virtual functions to have a virtual destructor,
but linear_ctx has static assertions that expects that no destructor is
present. Remove the assertions, as that's our only option. The destructor
is empty including in all derived classes, so it doesn't have to execute.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36539>
Some hardware (AGX, Imagination, Arm) really want to know the interpolation
qualifiers when compiling the vertex shader. Even though we need to handle this
dynamic for separate shaders, we can improve performance by linking.
nir_opt_varyings already has all the information to do this, so just do so.
Note this has to be done in common code for Gallium, which links varyings within
the GLSL linker but then presents the linked programs as separate shader
objects. This models that nicely, allowing Gallium drivers to optimize without
weird sidebands.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36501>
we'll want this to be able to link interpolation qualifiers in a simple way with
nir_opt_varyings. add the metadata for it and the FS gathering pass.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36501>
the info is all messed up so we need to do this right after. merge this
code.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36501>
no other users now.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36516>
a bunch of drivers have versions of this, might as well make a common one.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: John Anthony <john.anthony@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36516>
When I went to use opt_reassociate for tu, I was advised that you want to
do this loop to get the best results. If everyone needs it, let's make it
common code and explain what's going on.
In the process, also make it skip work appropriately when there's no
progress.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36342>
libclc seems to have piles of bugs where it relies on precise floating point
behaviours to meet CL precision requirements but doesn't actually disable fast
math in its own spir-v. I am tired of playing this whack-a-mole game. Let's just
assume that the math in CLC is right and should not be optimized in unsafe ways,
and force the exact bit across libclc. This works around a large class of libclc
bugs that keep cropping up from innocuous NIR changes.
This does not force the exact bit for application shaders using libclc, just for
the calculations inside of libclc itself. This seems like the right tradeoff all
considered, anything "fast" bypasses libclc anyway.
Fixes generated_tests/cl/builtin/math/builtin-float-pow-1.0.generated.cl on
drivers using nir_opt_reassociate, and probably other stuff.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36527>
Most of the time with nir_def_rewrite_uses_after, you want to rewrite after the
replacement. Make that the default thing to be more ergonomic and to drop
parent_instr uses.
We leave nir_def_rewrite_uses_after_instr defined if you really want the old
signature with an arbitrary after point.
Via Coccinelle patch:
@@
expression a, b;
@@
-nir_def_rewrite_uses_after(a, b, b->parent_instr)
+nir_def_rewrite_uses_after_def(a, b)
Followed by a bunch of sed.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>
Another common composition.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>
We want to get rid of nir_def::parent_instr eventually, requiring an accessor
function instead nir_def_parent_instr(def), so to mitigate the hit to NIR
ergonomics, let's add helpers for common patterns using parent_instr. This gets
us an immediate win for NIR ergonomics and then reduces the surface area for the
later flag day hiding parent_instr.
This commit starts us off by adding compositions for nir_instr_as_* with
parent_instr's, which are common.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>