This was something that came up in the slop MR. Not sure it's actually a
good idea or not but kind of curious what people think, given we have a
sound tool (Coccinelle) to do the transform. Saves a redundant branch
but means extra noninlined function calls.. likely no actual perf impact
but saves some code.
Via Coccinelle patches:
@@
expression ptr;
@@
-if (ptr) {
-free(ptr);
-}
+free(ptr);
@@
expression ptr;
@@
-if (ptr) {
-FREE(ptr);
-}
+FREE(ptr);
@@
expression ptr;
@@
-if (ptr) {
-ralloc_free(ptr);
-}
+ralloc_free(ptr);
@@
expression ptr;
@@
-if (ptr != NULL) {
-free(ptr);
-}
-
+free(ptr);
@@
expression ptr;
@@
-if (ptr != NULL) {
-FREE(ptr);
-}
-
+FREE(ptr);
@@
expression ptr;
@@
-if (ptr != NULL) {
-ralloc_free(ptr);
-}
-
+ralloc_free(ptr);
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v3d]
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org> [venus]
Reviewed-by: Frank Binns <frank.binns@imgtec.com> [powervr]
Reviewed-by: Janne Grunau <j@jannau.net> [asahi]
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> [radv]
Reviewed-by: Job Noorman <jnoorman@igalia.com> [ir3]
Acked-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Job Noorman <jnoorman@igalia.com>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37892>
Here we switch to using a sorted arrays for the binary search which
significantly speeds up the lookups. For a shader with ~8000
uniforms its up to 10x faster and the godot-tps-gles3-high.trace
in issue #13894 returns to its original runtime length before we
switched to using range remap in e052254066
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37754>
This will allow us to use the linked list for validation of inserts
during linking then switch to using an array for fast binary searches.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37754>
If the ptr value is NULL and we match an existing entry during an
insert call we now just return the existing value. This allows us
to drop an extra lookup, this will become important as the
following patches change `util_range_remap()` lookups to use a
sorted array that is not created until after we have added all
entries to our linked list.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37754>
Recursion in Rust can be annoyingly expensive. The recursive DFS
implementation we have now is probably not terrible but these things
bloat in debug builds more than you'd expect. This replaceas it with a
non-recursive implementation which uses a Vec instead.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Lorenzo Rossi <git@rossilorenzo.dev>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37536>
There are four depth first searches in cfg.rs. This adds a DFS
abstraction which we can later optimize.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Lorenzo Rossi <git@rossilorenzo.dev>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37536>
Sample positions aren't uniform when the sample id is divergent.
This was a regression when we started lowering fragment shader
barycentrics in NIR.
Fixes: 7f444fc72c ("nir: add nir_intrinsic_load_sample_positions_amd")
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843>
This is more in line with similar opcodes like umul_32x16.
Also change its const expr: the masking based on bit size was
unnecessary as it is only defined for 32 bits. Use simple casts instead.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37863>
Intel platforms will soon implement both bfi and bitfield_select. bfi is
more efficient for bitfield_insert.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
The eliminated SENDs are from a single app that has a bunch of
fragment shaders with a sequence like:
con 32 %495 = fmul! %203.i, %1 (0.000000)
con 32 %496 = ffma! %203.j, %1 (0.000000), %495
con 32 %497 = ffma! %203.k, %1 (0.000000), %496
con 32 %498 = ffma! %203.l, %1 (0.000000), %497
con 32 %499 = @load_reloc_const_intel (param_idx=1, base=0)
con 32 %500 = @load_reloc_const_intel (param_idx=0, base=0)
con 32 %501 = f2u32 %498
con 32 %502 = umin %501, %172 (0x4)
con 32 %503 = ishl %502, %172 (0x4)
con 32 %504 = load_const (0x00000040 = 64)
con 32 %505 = umin %503, %504 (0x40)
con 32 %506 = iadd %500, %505
The `f2u` is replaced with 0, and that makes the `ffma` dot-product
sequence be unused. Since it is unused, most of the preceeding block
gets eliminated. A lot of instructions after the `f2u` are also
eliminated by other algebraic optimizations. Most importantly, %203 is
the result of a `load_ubo_uniform_block_intel` that is eliminated.
No shader-db changes on any Intel platform.
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 919895603 -> 919804051 (-0.01%); split: -0.01%, +0.00%
Send messages: 40892036 -> 40887569 (-0.01%)
Cycle count: 99176770712 -> 99174971806 (-0.00%); split: -0.00%, +0.00%
Max live registers: 190030365 -> 190030367 (+0.00%)
Max dispatch width: 47415040 -> 47415024 (-0.00%)
Non SSA regs after NIR: 228872538 -> 228863608 (-0.00%); split: -0.00%, +0.00%
Totals from 2234 (0.11% of 1955134) affected shaders:
Instrs: 1989743 -> 1898191 (-4.60%); split: -4.60%, +0.00%
Send messages: 44179 -> 39712 (-10.11%)
Cycle count: 25416114 -> 23617208 (-7.08%); split: -7.08%, +0.00%
Max live registers: 367357 -> 367359 (+0.00%)
Max dispatch width: 39184 -> 39168 (-0.04%)
Non SSA regs after NIR: 471173 -> 462243 (-1.90%); split: -1.90%, +0.00%
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
If the source -1.0 < x < 0.0, simply removing the ftrun will introduce
undefined behavior. By chance of how at least Intel and NVIDIA GPUs
implement f2u, this has Just Worked.
No shader-db changes on any Intel platform.
fossil-db:
Lunar Lake
Totals:
Instrs: 913264354 -> 913264366 (+0.00%)
Cycle count: 104953995530 -> 104953996854 (+0.00%)
Max live registers: 189266026 -> 189266058 (+0.00%)
Non SSA regs after NIR: 227779417 -> 227779369 (-0.00%)
Totals from 24 (0.00% of 1984794) affected shaders:
Instrs: 4669 -> 4681 (+0.26%)
Cycle count: 50610 -> 51934 (+2.62%)
Max live registers: 1222 -> 1254 (+2.62%)
Non SSA regs after NIR: 1174 -> 1126 (-4.09%)
Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 1001288026 -> 1001288038 (+0.00%)
Cycle count: 92813392671 -> 92813392791 (+0.00%)
Max live registers: 121935383 -> 121935399 (+0.00%)
Max dispatch width: 19949928 -> 19949912 (-0.00%)
Totals from 2 (0.00% of 2284670) affected shaders:
Instrs: 1380 -> 1392 (+0.87%)
Cycle count: 18940 -> 19060 (+0.63%)
Max live registers: 136 -> 152 (+11.76%)
Max dispatch width: 32 -> 16 (-50.00%)
No fossil-db changes on Skylake.
Suggested-by: Georg Lehmann
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
There are no shader-db changes on ELK platforms because those platforms
don't support 8- or 16-bit integer types.
v2: Restrict patterns generated such that the integer limits are exactly
representable in the specified floating point format. With the exception
of the value 0, this requires that float_sz > int_sz. This had no impact
on shader-db or fossil-db on any Intel platform. Noticed by Georg.
v3: Add a missing is_a_number.
shader-db:
All Intel platforms had similar results. (Lunar Lake shown)
total cycles in shared programs: 889936056 -> 889934082 (<.01%)
cycles in affected programs: 65806 -> 63832 (-3.00%)
helped: 2 / HURT: 0
fossil-db:
Lunar Lake
Totals:
Instrs: 233284796 -> 233282917 (-0.00%); split: -0.00%, +0.00%
Cycle count: 32756399804 -> 32754972188 (-0.00%); split: -0.01%, +0.00%
Spill count: 519861 -> 519813 (-0.01%)
Fill count: 663650 -> 663626 (-0.00%); split: -0.01%, +0.01%
Max live registers: 71738626 -> 71738696 (+0.00%)
Non SSA regs after NIR: 67837902 -> 67837648 (-0.00%)
Totals from 1236 (0.16% of 790723) affected shaders:
Instrs: 2134504 -> 2132625 (-0.09%); split: -0.09%, +0.01%
Cycle count: 604922278 -> 603494662 (-0.24%); split: -0.48%, +0.25%
Spill count: 16509 -> 16461 (-0.29%)
Fill count: 32760 -> 32736 (-0.07%); split: -0.22%, +0.15%
Max live registers: 250112 -> 250182 (+0.03%)
Non SSA regs after NIR: 302368 -> 302114 (-0.08%)
Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 264095370 -> 264094056 (-0.00%); split: -0.00%, +0.00%
Cycle count: 26554146277 -> 26553027268 (-0.00%); split: -0.01%, +0.01%
Spill count: 530603 -> 530615 (+0.00%)
Fill count: 613231 -> 613273 (+0.01%)
Max live registers: 46559041 -> 46559087 (+0.00%)
Totals from 1237 (0.14% of 905547) affected shaders:
Instrs: 2262517 -> 2261203 (-0.06%); split: -0.07%, +0.01%
Cycle count: 518219799 -> 517100790 (-0.22%); split: -0.59%, +0.37%
Spill count: 17518 -> 17530 (+0.07%)
Fill count: 32273 -> 32315 (+0.13%)
Max live registers: 128360 -> 128406 (+0.04%)
Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 269849640 -> 269848198 (-0.00%); split: -0.00%, +0.00%
Cycle count: 26718329643 -> 26718289020 (-0.00%); split: -0.00%, +0.00%
Max live registers: 46878430 -> 46878462 (+0.00%)
Totals from 1233 (0.14% of 905427) affected shaders:
Instrs: 2324225 -> 2322783 (-0.06%); split: -0.06%, +0.00%
Cycle count: 531467501 -> 531426878 (-0.01%); split: -0.11%, +0.10%
Max live registers: 130782 -> 130814 (+0.02%)
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
v2: Add a comment around has_f2[ui]_sat explaining which opcodes it
enables. Suggested by Georg. Cast u_uintN_max and friends to double in
nir_opcodes.py. This ensures that an exact conversion is made.
Eliminate duplicate conversions from half float to double. Both noticed
by Georg.
v3: Apply "NaN should be zero" fix suggested by Georg.
Co-authored-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
The Intel Gfx12.x generation of GPU has an architecture feature called
EU fusion in which 2 subgroups run lock step. A typical case where
this happens is a compute shader with 1x1x1 local workgroup size and a
dispatch command of 2x1x1. In that case 2 threads will be run in lock
step for each of the workgroup.
This has been the sources of some troubles in the backend because one
subgroup can run with all lanes disabled, requiring care for SEND
messages using the NoMask flag (execution regardless of the lane mask).
We found out that other things are happening when 2 subgroups run
together :
- the HW will use the surface/sampler handle from only one subgroup
- the HW will use the sampler header from only one subgroup
So one of the fused subgroup can access the wrong surface/sampler if
the value is different between the 2 subgroups and that can happen
even with subgroup uniform values.
Fortunately we can flag SEND instructions to disable the fusion
behavior (most likely at a performance cost).
This change introduce a new divergence mode that tries to compute
things divergent between subgroups so that we can flag instructions
accordingly.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>
nir_recompute_io_bases will modify i/o intrinsics, which is not the
expected behaviour when the keep_intrinsics flag is set.
Fixes: 83aecc8f3f ("mesa/st, nir: commonize unlower_io_to_vars pass")
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37725>
When comparing an vec3 and a vec4 array, scalar type is the same for
both (float). Instead use the array element type to compare (that is,
vec3 vs vec4).
Fixes
spec@glsl-1.20@compiler@invalid-vec4-array-to-vec3-array-conversion.vert
piglit test.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37783>
Sometimes we need to select between ishr/ushr based some condition; this
builder makes this less verbose.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37793>
All vtn_bindgen2-generated files use the same 'vtn_bindgen_dummy' struct
name. When linking more than one file (like in panfrost), the
constructor and destructor symbols collide and every instance ends up
running the same initialization. In panfrost, this results in us
dropping any printf format strings that don't occur in v6.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Fixes: b7447a94c8 ("vtn: add vtn_bindgen2 tool")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37798>
When you're trying to figure out what shader some NIR pass broke, use
nir_shader_bisect_select() to decide between NIR pass behaviors, and then
nir_shader_bisect.py will help you automatically bisect down to which
source_blake3 is at fault. Once it's identified, it prints you a C call
you can use for selecting that shader specifically, which you can use for
continuing on in your debugging.
On a test I was looking at, this took 10 steps to bisect 134 shaders down
to the source_blake3 of the NIR shader in question.
This idea is heavily lifted from Job Noorman's ir3_shader_bisect.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37468>
Usage of '--outdir' argument in python scripts makes it very
complicated for tools like ninja-to-soong to generate the Android
equivalent build file.
This is because the option is less clear on what will be generated.
Instead, change it for '--out' where we give the full path of the file
to generate. This has the good point of deduplicating the locations of
the file name to have it only in 'meson.build'.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37741>
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>
This is so that passes and backends can tell if a coherent load/store is
atomic or not, instead of having to assume it could be either.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>
These instructions don't need a sampler.
This doesn't fix anything now because this helper isn't unused yet, but
it will help for descriptor heap.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37720>