This reduces indirect indexing of 1-element arrays to indexing with 0.
Without this, we fail an assertion later.
Discovered when writing a test.
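A minimal sketch of the idea (illustrative builder code, not the patch itself):

/* Any in-bounds index into a one-element array must be 0, so the
 * indirect deref index can be rewritten to a constant.  Surrounding
 * pass logic and cursor setup are assumed. */
if (glsl_get_length(nir_deref_instr_parent(deref)->type) == 1 &&
    !nir_src_is_const(deref->arr.index)) {
   nir_def *zero = nir_imm_intN_t(b, 0, deref->arr.index.ssa->bit_size);
   nir_src_rewrite(&deref->arr.index, zero);
}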
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38113>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38097>
Slight differences due to different optimization order.
Totals from 135 (0.17% of 79839) affected shaders (Navi48):
Instrs: 287852 -> 287527 (-0.11%); split: -0.15%, +0.03%
CodeSize: 1522972 -> 1521764 (-0.08%); split: -0.12%, +0.04%
Latency: 1806803 -> 1825754 (+1.05%); split: -0.08%, +1.12%
InvThroughput: 242693 -> 244703 (+0.83%); split: -0.02%, +0.84%
VClause: 4092 -> 4084 (-0.20%)
SClause: 7462 -> 7478 (+0.21%)
Copies: 20509 -> 20401 (-0.53%); split: -0.74%, +0.21%
Branches: 6395 -> 6386 (-0.14%)
PreSGPRs: 7334 -> 7337 (+0.04%); split: -0.03%, +0.07%
PreVGPRs: 6375 -> 6382 (+0.11%)
VALU: 151787 -> 151595 (-0.13%); split: -0.15%, +0.02%
SALU: 52967 -> 52910 (-0.11%); split: -0.23%, +0.12%
VMEM: 6704 -> 6696 (-0.12%)
SMEM: 12099 -> 12129 (+0.25%)
Tested on a small collection of 2518 shaders from Dredge with callgrind using RADV:
baseline:
nir_opt_algebraic was called 12917 times from radv_optimize_nir()
nir_opt_cse was called 15204 times from radv_optimize_nir()
relative time spent in radv_optimize_nir(): 31.48%
total instruction fetch cost: 28,642,638,021
with "nir/algebraic: ad-hoc constant-fold ALU instructions":
nir_opt_algebraic was called 12797 times from radv_optimize_nir()
nir_opt_cse was called 12963 times from radv_optimize_nir()
relative time spent in radv_optimize_nir(): 30.63%
total instruction fetch cost: 28,284,386,123
=> ~1.27% improvement in total compile times
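The gist of the measured change, as a rough sketch (gather_constant_sources is a hypothetical helper; b and shader come from the surrounding pass):

/* While nir_opt_algebraic already walks every ALU instruction, fold it
 * on the spot when all of its sources are constant instead of leaving
 * the work to a later nir_opt_constant_folding run, saving extra trips
 * through the optimization loop. */
nir_const_value *srcs[NIR_MAX_VEC_COMPONENTS];
if (gather_constant_sources(alu, srcs)) {
   nir_const_value value[NIR_MAX_VEC_COMPONENTS];
   nir_eval_const_opcode(alu->op, value, alu->def.num_components,
                         alu->def.bit_size, srcs,
                         shader->info.float_controls_execution_mode);
   nir_def_replace(&alu->def,
                   nir_build_imm(b, alu->def.num_components,
                                 alu->def.bit_size, value));
}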
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37195>
Wrapping jump instructions that are located inside ifs can break SSA
invariants because the else block no longer dominates the merge block.
Repair the SSA to make the validator happy again.
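Roughly (sketch; the actual call site in the patch may differ):

/* After wrapping the jump, insert the phis needed to restore SSA
 * dominance before the validator runs. */
if (progress)
   nir_repair_ssa(shader);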
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37957>
Most of the time, we can infer the type to append in
util_dynarray_append using __typeof__, which is standardized in C23 and
supported in recent MSVC. This patch drops the type argument most of
the time, making util_dynarray a little more ergonomic to use.
This is done in four steps.
First, rename util_dynarray_append -> util_dynarray_append_typed:
bash -c "find . -type f -exec sed -i -e 's/util_dynarray_append(/util_dynarray_append_typed(/g' \{} \;"
Then, add a new append that infers the type. This is much more ergonomic
for what you want most of the time.
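A sketch of how the inference can work (the macro in the patch may differ in detail):

/* Infer the element type from the expression itself; __typeof__ is
 * standardized in C23 and widely supported as an extension. */
#define util_dynarray_append(buf, element) \
   util_dynarray_append_typed(buf, __typeof__(element), element)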
Next, use type-inferred append as much as possible, via Coccinelle
patch (plus manual fixup):
@@
expression dynarray, element;
type type;
@@
-util_dynarray_append_typed(dynarray, type, element);
+util_dynarray_append(dynarray, element);
Finally, hand fixup cases that Coccinelle missed or incorrectly
translated, of which there were several because we can't use the
untyped append with a literal (since the sizeof won't do what you want).
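For example (illustrative):

/* Wrong: __typeof__(0) is int, so only sizeof(int) bytes would be
 * appended to an array holding 64-bit elements. */
/* util_dynarray_append(&arr, 0); */

/* Right: keep the explicit type for literals. */
util_dynarray_append_typed(&arr, uint64_t, 0);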
All four steps are squashed to produce a single patch changing every
util_dynarray_append call site in tree to either drop a type parameter
(if possible) or insert a _typed suffix (if we can't infer). As such,
the final patch is best reviewed by hand even though it was
tool-assisted.
No Large Language Models were involved in the making of this patch.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38038>
In !35844, there was some discussion about allowing 64-bit bcsel that
would be lowered in the driver. One challenge there would be if a 64-bit
bcsel was transformed into integer min or max by an algebraic
optimization. I believe these were the only algebraic patterns that
could create new integer min or max that would not be immediately
constant folded.
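For illustration, the shape of the transform in plain C:

#include <stdint.h>

/* A bcsel whose condition compares its own operands is an integer
 * min/max; patterns of this shape are what could manufacture a 64-bit
 * imin/imax out of a 64-bit bcsel. */
static int64_t bcsel_ilt(int64_t a, int64_t b)
{
   return (a < b) ? a : b; /* == imin(a, b) */
}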
There were no shader-db or fossil-db changes on any Intel platform.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38033>
Introduces an opt pass that replaces
load_barycentric_at_{sample,offset} with simpler load_barycentric_*
equivalents where possible, and optionally lowers
load_barycentric_at_sample to load_barycentric_at_offset with a position
derived from the sample ID instead.
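Conceptually, the lowering computes (plain C illustration, not the pass's builder code):

struct vec2 { float x, y; };

/* barycentric_at_sample(s) becomes barycentric_at_offset(pos(s) - 0.5),
 * where pos(s) is the sample's position within the pixel in [0,1)^2 and
 * the offset is relative to the pixel center. */
static struct vec2 sample_pos_to_offset(struct vec2 sample_pos)
{
   return (struct vec2){ sample_pos.x - 0.5f, sample_pos.y - 0.5f };
}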
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37658>
nir_lower_compute_system_values will attempt to lower
load_workgroup_size unless workgroup_size_variable is set. For precomp
shaders, the workgroup size is set statically for each entrypoint by
nir_precompiled_build_variant. Because we call
nir_lower_compute_system_values early, before that happens, it lowers
the workgroup size to zero. Temporarily setting
workgroup_size_variable while we are still processing all the
entrypoints together inhibits this.
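A sketch of the workaround (call sites hypothetical):

/* While all entrypoints still live in one shader, pretend the size is
 * variable so the pass leaves load_workgroup_size alone; the real size
 * is filled in later by nir_precompiled_build_variant. */
nir->info.workgroup_size_variable = true;
NIR_PASS(progress, nir, nir_lower_compute_system_values, NULL);
nir->info.workgroup_size_variable = false;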
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Fixes: 20970bcd96 ("panfrost: Add base of OpenCL C infrastructure")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37799>
Not sure about creating u64vec16 loads, but creating unaligned loads is
possible with opt_if_rewrite_uniform_uses.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37953>
Those are normally always uniform, but for the purposes of fused-thread
handling we need to check their sources.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ca1533cd03 ("nir/divergence: add a new mode to cover fused threads on Intel HW")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37929>
df1876f615 ("nir: Mark negative re-distribution on fadd as imprecise")
fixed the fadd case by marking it as imprecise. This commit fixes the
ffma case for the same reason.
However, "imprecise" isn't necessary, and nowadays we have "nsz", which
is more accurate here. Use that for both fadd and ffma.
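A small standalone illustration of the signed-zero hazard that motivated the original fix:

#include <stdio.h>

int main(void)
{
   float a = 0.0f, b = -0.0f;
   printf("%g\n", -(a + b));    /* -0: negate of the sum  */
   printf("%g\n", (-a) + (-b)); /*  0: sum of the negates */
   return 0;
}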
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: 62795475e8 ("nir/algebraic: Distribute source modifiers into instructions")
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37930>
Introduce a new NIR intrinsic for getting a "sink" read-only address,
and another intrinsic for converting an address to read-write (allowing
the implementation to replace the read-only "sink" with another
address, as required for Asahi).
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37914>
This is used by the geometry lowering that we are going to move to
common code.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37914>
This was something that came up in the slop MR. Not sure whether it's
actually a good idea, but I'm kind of curious what people think, given
we have a sound tool (Coccinelle) to do the transform. It saves a
redundant branch but means extra non-inlined function calls... likely
no actual perf impact, but it saves some code (a note on why the
transform is sound follows the patches).
Via Coccinelle patches:
@@
expression ptr;
@@
-if (ptr) {
-free(ptr);
-}
+free(ptr);
@@
expression ptr;
@@
-if (ptr) {
-FREE(ptr);
-}
+FREE(ptr);
@@
expression ptr;
@@
-if (ptr) {
-ralloc_free(ptr);
-}
+ralloc_free(ptr);
@@
expression ptr;
@@
-if (ptr != NULL) {
-free(ptr);
-}
-
+free(ptr);
@@
expression ptr;
@@
-if (ptr != NULL) {
-FREE(ptr);
-}
-
+FREE(ptr);
@@
expression ptr;
@@
-if (ptr != NULL) {
-ralloc_free(ptr);
-}
-
+ralloc_free(ptr);
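The transform is sound because free(NULL) is defined to do nothing (C11 7.22.3.3), and FREE() and ralloc_free() are likewise NULL-safe; for illustration:

#include <stdlib.h>

int main(void)
{
   char *p = NULL;
   free(p); /* well-defined no-op, so the guarding branch was redundant */
   return 0;
}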
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v3d]
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org> [venus]
Reviewed-by: Frank Binns <frank.binns@imgtec.com> [powervr]
Reviewed-by: Janne Grunau <j@jannau.net> [asahi]
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> [radv]
Reviewed-by: Job Noorman <jnoorman@igalia.com> [ir3]
Acked-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Job Noorman <jnoorman@igalia.com>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37892>
Sample positions aren't uniform when the sample id is divergent.
This was a regression when we started lowering fragment shader
barycentrics in NIR.
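A sketch of the shape of the fix (function name hypothetical):

static bool
sample_positions_divergent(const nir_intrinsic_instr *instr)
{
   /* The position is only uniform if the sample id source is. */
   return instr->src[0].ssa->divergent;
}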
Fixes: 7f444fc72c ("nir: add nir_intrinsic_load_sample_positions_amd")
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843>
This is more in line with similar opcodes like umul_32x16.
Also change its const expr: the masking based on bit size was
unnecessary, as the op is only defined for 32 bits. Use simple casts
instead.
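The simplified folder has roughly this shape (illustrative; modeled on umul_32x16's semantics):

#include <stdint.h>

static int32_t fold_mul_32x16(int32_t src0, int32_t src1)
{
   /* Low 16 bits of src1, sign-extended; no bit-size masking needed. */
   return src0 * (int16_t)src1;
}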
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37863>
Intel platforms will soon implement both bfi and bitfield_select. bfi is
more efficient for bitfield_insert.
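For reference, the two ops in plain C (paraphrasing NIR's semantics from memory; treat as a sketch):

#include <stdint.h>

static uint32_t bitfield_select(uint32_t mask, uint32_t a, uint32_t b)
{
   return (a & mask) | (b & ~mask);
}

static uint32_t bfi(uint32_t mask, uint32_t insert, uint32_t base)
{
   if (mask == 0)
      return base;
   /* bfi also aligns the inserted value to the mask's lowest set bit,
    * which is exactly the shift bitfield_insert needs. */
   return ((insert << __builtin_ctz(mask)) & mask) | (base & ~mask);
}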
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>
The eliminated SENDs are from a single app that has a bunch of
fragment shaders with a sequence like:
con 32 %495 = fmul! %203.i, %1 (0.000000)
con 32 %496 = ffma! %203.j, %1 (0.000000), %495
con 32 %497 = ffma! %203.k, %1 (0.000000), %496
con 32 %498 = ffma! %203.l, %1 (0.000000), %497
con 32 %499 = @load_reloc_const_intel (param_idx=1, base=0)
con 32 %500 = @load_reloc_const_intel (param_idx=0, base=0)
con 32 %501 = f2u32 %498
con 32 %502 = umin %501, %172 (0x4)
con 32 %503 = ishl %502, %172 (0x4)
con 32 %504 = load_const (0x00000040 = 64)
con 32 %505 = umin %503, %504 (0x40)
con 32 %506 = iadd %500, %505
The `f2u` is replaced with 0, and that makes the `ffma` dot-product
sequence unused. Since it is unused, most of the preceding block gets
eliminated. A lot of instructions after the `f2u` are also eliminated
by other algebraic optimizations. Most importantly, %203 is the result
of a `load_ubo_uniform_block_intel` that is eliminated.
No shader-db changes on any Intel platform.
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 919895603 -> 919804051 (-0.01%); split: -0.01%, +0.00%
Send messages: 40892036 -> 40887569 (-0.01%)
Cycle count: 99176770712 -> 99174971806 (-0.00%); split: -0.00%, +0.00%
Max live registers: 190030365 -> 190030367 (+0.00%)
Max dispatch width: 47415040 -> 47415024 (-0.00%)
Non SSA regs after NIR: 228872538 -> 228863608 (-0.00%); split: -0.00%, +0.00%
Totals from 2234 (0.11% of 1955134) affected shaders:
Instrs: 1989743 -> 1898191 (-4.60%); split: -4.60%, +0.00%
Send messages: 44179 -> 39712 (-10.11%)
Cycle count: 25416114 -> 23617208 (-7.08%); split: -7.08%, +0.00%
Max live registers: 367357 -> 367359 (+0.00%)
Max dispatch width: 39184 -> 39168 (-0.04%)
Non SSA regs after NIR: 471173 -> 462243 (-1.90%); split: -1.90%, +0.00%
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>