Commit graph

939 commits

Author SHA1 Message Date
Ella Stanforth
bb07364c54 v3d/compiler: remove num_samplers_used from shader key
This is only ever used by assertions.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34742>
2025-05-08 06:25:22 +00:00
Ella Stanforth
01d0ccd664 v3d/compiler: remove unused texture swizzle
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34742>
2025-05-08 06:25:22 +00:00
Ella Stanforth
76e27d2d0d v3d/compiler: remove return_channels from the shader key
This isn't used anywhere.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34742>
2025-05-08 06:25:22 +00:00
Ella Stanforth
b39fc710ee v3d/compiler: remove int/uint tracking
We don't need this anymore as we do not support anything older than 4.2.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34742>
2025-05-08 06:25:22 +00:00
Ella Stanforth
42154029fc v3d/compiler: Implement software blend lowering
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33942>
2025-04-23 09:03:41 +00:00
Ella Stanforth
a6f67d5b69 v3d/compiler: Only lower logic ops for color buffers that exist
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33942>
2025-04-23 09:03:41 +00:00
Ella Stanforth
1ec0cdb733 v3d/compiler: Fixup output types for all 8 outputs
Cc: mesa-stable
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33942>
2025-04-23 09:03:41 +00:00
Juan A. Suarez Romero
f5e36e382f broadcom/compiler: initialize register
This fixes issue detected by static analyzer: passed-by-value struct
argument contains uninitialized data (e.g., field: 'file').

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34050>
2025-04-04 15:55:13 +00:00
Juan A. Suarez Romero
0e50b09d4a broadcom/compiler: don't use VLA on emit alu
Using constant-size array instead of variable-length array is preferred
due several issues with the latter.

Particularly, for this case using VLA generates several warnings by
static analyzer: passed-by-value struct argument contains uninitialized
data (e.g., field: 'file').

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34050>
2025-04-04 15:55:13 +00:00
Juan A. Suarez Romero
01151f045f broadcom/compiler: use safe iterator to remove instructions
The current approach has an issue detected by static analyzer: use of
memory after it is freed.

Using a proper iterator makes things safer.

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34050>
2025-04-04 15:55:13 +00:00
Connor Abbott
7a55e13939 nir, compiler: Rename needs_quad_helper_invocations
This currently treats coarse and fine derivatives the same, but Qualcomm
needs to know whether just coarse derivatives are used or fine
derivatives/quad ops are also used. Rename this to
needs_coarse_quad_helper_invocations make clear the difference from the
new field, needs_full_quad_helper_invocations.

Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Fixes: 264d8a6766 ("ir3: Set need_full_quad depending on info.fs.require_full_quads")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33862>
2025-03-14 21:55:57 +00:00
Ella Stanforth
332b313547 v3d: enable framebuffer fetch
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33766>
2025-03-12 13:28:16 +00:00
Ella Stanforth
6023a46d02 v3d/compiler: Implement load_output
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33766>
2025-03-12 13:28:16 +00:00
Alyssa Rosenzweig
9a58a8257e treewide: Switch to nir_progress
Via the Coccinelle patch at the end of the commit message, followed by

sed -ie 's/progress = progress | /progress |=/g' $(git grep -l 'progress = prog')
ninja -C ~/mesa/build clang-format
cd ~/mesa/src/compiler/nir && clang-format -i *.c
agxfmt

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -nir_metadata_preserve(impl, metadata);
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    -return prog;
    +return nir_progress(prog, impl, metadata);

    @@
    expression prog_expr, impl, metadata;
    @@

    -if (prog_expr) {
    -nir_metadata_preserve(impl, metadata);
    -return true;
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -return false;
    -}
    +bool progress = prog_expr;
    +return nir_progress(progress, impl, metadata);

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -nir_metadata_preserve(impl, prog ? (metadata) : nir_metadata_all);
    -return prog;
    +return nir_progress(prog, impl, metadata);

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -nir_metadata_preserve(impl, prog ? (metadata) : nir_metadata_all);
    +nir_progress(prog, impl, metadata);

    @@
    expression impl, metadata;
    @@

    -nir_metadata_preserve(impl, metadata);
    -return true;
    +return nir_progress(true, impl, metadata);

    @@
    expression impl;
    @@

    -nir_metadata_preserve(impl, nir_metadata_all);
    -return false;
    +return nir_no_progress(impl);

    @@
    identifier other_prog, prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -nir_metadata_preserve(impl, metadata);
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    -other_prog |= prog;
    +other_prog = other_prog | nir_progress(prog, impl, metadata);

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -nir_metadata_preserve(impl, metadata);
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    +nir_progress(prog, impl, metadata);

    @@
    identifier other_prog, prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -nir_metadata_preserve(impl, metadata);
    -other_prog = true;
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    +other_prog = other_prog | nir_progress(prog, impl, metadata);

    @@
    expression prog_expr, impl, metadata;
    identifier prog;
    @@

    -if (prog_expr) {
    -nir_metadata_preserve(impl, metadata);
    -prog = true;
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    +bool impl_progress = prog_expr;
    +prog = prog | nir_progress(impl_progress, impl, metadata);

    @@
    identifier other_prog, prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -other_prog = true;
    -nir_metadata_preserve(impl, metadata);
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    +other_prog = other_prog | nir_progress(prog, impl, metadata);

    @@
    expression prog_expr, impl, metadata;
    identifier prog;
    @@

    -if (prog_expr) {
    -prog = true;
    -nir_metadata_preserve(impl, metadata);
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    +bool impl_progress = prog_expr;
    +prog = prog | nir_progress(impl_progress, impl, metadata);

    @@
    expression prog_expr, impl, metadata;
    @@

    -if (prog_expr) {
    -nir_metadata_preserve(impl, metadata);
    -} else {
    -nir_metadata_preserve(impl, nir_metadata_all);
    -}
    +bool impl_progress = prog_expr;
    +nir_progress(impl_progress, impl, metadata);

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -nir_metadata_preserve(impl, metadata);
    -prog = true;
    +prog = nir_progress(true, impl, metadata);

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -nir_metadata_preserve(impl, metadata);
    -}
    -return prog;
    +return nir_progress(prog, impl, metadata);

    @@
    identifier prog;
    expression impl, metadata;
    @@

    -if (prog) {
    -nir_metadata_preserve(impl, metadata);
    -}
    +nir_progress(prog, impl, metadata);

    @@
    expression impl;
    @@

    -nir_metadata_preserve(impl, nir_metadata_all);
    +nir_no_progress(impl);

    @@
    expression impl, metadata;
    @@

    -nir_metadata_preserve(impl, metadata);
    +nir_progress(true, impl, metadata);

squashme! sed -ie 's/progress = progress | /progress |=/g' $(git grep -l 'progress = prog')

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33722>
2025-02-26 15:19:53 +00:00
Georg Lehmann
f26069fdd9 nir: replace nir_opt_conditional_discard with nir_opt_peephole_select
Foz-DB Navi21:
Totals from 118 (0.15% of 79377) affected shaders:
Instrs: 208001 -> 207355 (-0.31%); split: -0.33%, +0.01%
CodeSize: 1080428 -> 1078432 (-0.18%); split: -0.20%, +0.02%
SpillSGPRs: 202 -> 211 (+4.46%)
Latency: 1923508 -> 1919093 (-0.23%); split: -0.62%, +0.39%
InvThroughput: 407475 -> 407081 (-0.10%); split: -0.12%, +0.02%
SClause: 7050 -> 7033 (-0.24%); split: -0.31%, +0.07%
Copies: 12156 -> 11821 (-2.76%); split: -3.04%, +0.28%
PreSGPRs: 8198 -> 8331 (+1.62%); split: -0.02%, +1.65%
PreVGPRs: 7628 -> 7528 (-1.31%)
VALU: 155747 -> 155657 (-0.06%); split: -0.06%, +0.00%
SALU: 18295 -> 17782 (-2.80%); split: -2.98%, +0.18%
SMEM: 10521 -> 10519 (-0.02%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33590>
2025-02-20 21:59:17 +00:00
Georg Lehmann
ca8147edbe nir/peephole_select: add options struct
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33590>
2025-02-20 21:59:16 +00:00
Juan A. Suarez Romero
1e0e521a7d broadcom/compiler: move stores to the end of shader
It is possible that shader comes with output stores executed before
loading inputs. As the memory to read the inputs and store the outputs
is the same, this mean it could be overwriting the inputs before reading
them.

This move avoids this situation.

This partially improves
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33053.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33310>
2025-02-03 17:10:47 +00:00
Konstantin Seurer
60a20bcf3d nir: Stop using instructions for debug info
Annotating ssa defs without affecting compilation is impossible with
debug info instructions since referencing a nir_def from the debug info
instr will add uses.

The old approach also stops worrking if passes reorder instructions.

This patch proposes a solution which should not regress performance just
like the old approach. The difference is that this one allocates a bit
more space for debug info instead of adding a new instruction for it.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33141>
2025-01-30 20:14:01 +00:00
Daniel Schürmann
f3be7ce01b nir/from_ssa: only consider divergence if requested
This pass used to unconditionally use divergence information
which forced the caller to either call divergence_analysis or
ensure that the divergence is properly reset.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33009>
2025-01-23 01:31:23 +00:00
Alyssa Rosenzweig
7a4469681e nir: pass a callback to nir_lower_robust_access
rather than try to enumerate everything a driver might want with an unmanageable
collection of booleans, just do a filter callback + data. this ends up simpler
overall, and will allow Intel to use this pass for just 64-bit images without
needing to add even more booleans.

while we're churning the pass signature, also do a quick port to
nir_shader_intrinsics_pass

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> [NIR and V3D]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32907>
2025-01-08 15:59:05 +00:00
Juan A. Suarez Romero
0d14e129bc v3d: avoid 0-size variable length array
Declaring a variable-length array (VLA) based on a variable that can be
0 is declared dangerous.

In this case, the variable can't take value 0, so adding an assertion
fixes the issue.

This was detected by static analyzer.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32819>
2025-01-07 14:21:42 +00:00
Marek Olšák
c21bc65ba7 nir/opt_load_store_vectorize: make hole_size signed to indicate overlapping loads
A negative hole size means the loads overlap. This will be used by drivers
to handle overlapping loads in the callback easily.

Reviewed-by: Mel Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32699>
2025-01-01 00:03:55 +00:00
Juan A. Suarez Romero
fd19106773 broadcom/compiler: fix fp16 conversion operations
The case for converting a 32-bit integer to 16-bit float is not
correctly implemented.

Fixes: 214121e9b0 ("broadcom/compiler: handle fp16 conversion ops")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32593>
2024-12-16 10:56:38 +00:00
Juan A. Suarez Romero
8ffdf5a2ab broadcom/compiler: ensure offset source exists
As the lowering is applied on a load uniform intrinsic, there must be an
offset source number.

This fixes CID#1604734 ("Negative array index read") detected by
Coverity Scan.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32593>
2024-12-16 10:56:38 +00:00
Marek Olšák
7f4e36ff7d gallium: replace PIPE_SHADER_CAP_INDIRECT_INPUT/OUTPUT_ADDR with NIR options
This is a prerequisite for enabling nir_opt_varyings for all gallium
drivers.

nir_lower_io_passes (called by the GLSL linker) only uses NIR options
to lower indirect IO access before lowering IO and calling
nir_opt_varyings.

Most drivers report full support for indirect IO and lower it themselves,
which prevents compaction of lowered indirectly accessed varyings because
nir_opt_varyings doesn't touch indirect varyings.

Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> (Rb for asahi)
Reviewed-by: Pavel Ondračka <pavel.ondracka@gmail.com> (for r300)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32423>
2024-12-03 12:57:36 +00:00
Iago Toral Quiroga
f988a2f336 broadcom: move double-buffer heuristic helpers to the compiler
This avoids pulling the dependency on NIR headers in
libbroadcom_v3d.

Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32240>
2024-11-21 07:21:47 +00:00
Marek Olšák
25d4943481 nir: make use_interpolated_input_intrinsics a nir_lower_io parameter
This will need to be set to true when the GLSL linker lowers IO, which
can later be unlowered by st/mesa, and then drivers can lower it again
without load_interpolated_input. Therefore, it can't be a global
immutable option.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32229>
2024-11-20 02:45:37 +00:00
Rhys Perry
45c1280d2c nir_lower_mem_access_bit_sizes: pass access to callback
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904>
2024-11-13 12:59:26 +00:00
Rhys Perry
61752152f7 nir_lower_mem_access_bit_sizes: add nir_mem_access_shift_method
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904>
2024-11-13 12:59:26 +00:00
Jose Maria Casanova Crespo
5b951bcdd7 v3d: Enable Early-Z with discards when depth updates are disabled
The Early-Z optimization is disabled when there is a discard
instruction in the shader used in the draw call.

But if discard is the only reason to disable Early-Z, and at
draw call time the updates in the draw call are disabled we
can enable Early-Z using a shader variant.

If there are occlussion queries active we also need to disable
Early-z optimization.

So this patch enables Early-Z in this scenario.

The performance improvement is significant when running gfxbench
benchmark showing an average improvement of 11.15%

fps_avg  helped:  gl_gfxbench_aztec_high.trace:  3.13 ->  3.73 (19.13%)
fps_avg  helped:  gl_gfxbench_aztec.trace:       4.82 ->  5.68 (17.88%)
fps_avg  helped:  gl_gfxbench_manhattan31.trace: 5.10 ->  6.00 (17.59%)
fps_avg  helped:  gl_gfxbench_manhattan.trace:   7.24 ->  8.36 (15.52%)
fps_avg  helped:  gl_gfxbench_trex.trace:       19.25 -> 20.17 ( 4.81%)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32028>
2024-11-12 13:26:38 +00:00
Daniel Schürmann
c8348139fd nir: change signature of nir_src_is_divergent()
Now, it takes nir_src * instead of nir_src.
Also move the implementation to nir_divergence_analysis.c.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>
2024-10-24 10:06:17 +00:00
Daniel Schürmann
c25c63ebc0 nir/divergence: separately indicate whether loops have divergent continues or breaks
bool nir_loop_is_divergent(nir_loop *)
 replaces the previous loop->divergent indicator.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>
2024-10-24 10:06:17 +00:00
Marek Olšák
65ace5649b nir: reject unsupported component counts from all vectorize callbacks
If you allow an unsupported component count in the callback for loads,
nir_opt_load_store_vectorize will align num_components to the next supported
vector size, essentially overfetching.

This changes all callbacks to reject it. AMD will enable it in a later commit.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
2024-10-15 05:50:24 +00:00
Marek Olšák
02923e237d nir: add hole_size parameter into the vectorize callback
It will be used to allow merging loads with a hole between them.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29398>
2024-10-15 05:50:24 +00:00
Christian Gmeiner
cf939334e6 v3d: Add a few function traces
Sprinkle around a few traces that were useful in locating submit and
fence waits.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31575>
2024-10-14 12:21:51 +00:00
Iago Toral Quiroga
4d1971f17f broadcom: fix pairing tmu lookup with previous ldtmu
There are some restrictions when pairing a new TMU lookup with
a previous LDTMU and we had code to handle this but we were not
limiting the restriction only to TMU lookups.

total instructions in shared programs: 10856992 -> 10823967 (-0.30%)
instructions in affected programs: 1823670 -> 1790645 (-1.81%)
helped: 10212
HURT: 110
Instructions are helped.

total max-temps in shared programs: 2234069 -> 2233153 (-0.04%)
max-temps in affected programs: 15100 -> 14184 (-6.07%)
helped: 660
HURT: 3
Max-temps are helped.

total sfu-stalls in shared programs: 15935 -> 15967 (0.20%)
sfu-stalls in affected programs: 317 -> 349 (10.09%)
helped: 31
HURT: 57
Inconclusive result (%-change mean confidence interval includes 0).

total inst-and-stalls in shared programs: 10872927 -> 10839934 (-0.30%)
inst-and-stalls in affected programs: 1824656 -> 1791663 (-1.81%)
helped: 10199
HURT: 111
Inst-and-stalls are helped.

total nops in shared programs: 185612 -> 185767 (0.08%)
nops in affected programs: 4865 -> 5020 (3.19%)
helped: 164
HURT: 256
Nops are HURT.

Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31574>
2024-10-10 06:58:15 +00:00
Iago Toral Quiroga
c58bfb355a broadcom/compiler: generate mali opcodes for clamping on Pi5
Models C0 and D0 support these opcodes too.

total instructions in shared programs: 10869461 -> 10856992 (-0.11%)
instructions in affected programs: 1467666 -> 1455197 (-0.85%)
helped: 6012
HURT: 1413
Instructions are helped.

total threads in shared programs: 431014 -> 431010 (<.01%)
threads in affected programs: 8 -> 4 (-50.00%)
helped: 0
HURT: 2

total uniforms in shared programs: 5432771 -> 5430909 (-0.03%)
uniforms in affected programs: 183047 -> 181185 (-1.02%)
helped: 976
HURT: 128
Uniforms are helped.

total max-temps in shared programs: 2235272 -> 2234069 (-0.05%)
max-temps in affected programs: 38163 -> 36960 (-3.15%)
helped: 1262
HURT: 168
Max-temps are helped.

total spills in shared programs: 4331 -> 4363 (0.74%)
spills in affected programs: 964 -> 996 (3.32%)
helped: 6
HURT: 47

total fills in shared programs: 6527 -> 6622 (1.46%)
fills in affected programs: 2047 -> 2142 (4.64%)
helped: 6
HURT: 47

total sfu-stalls in shared programs: 15807 -> 15935 (0.81%)
sfu-stalls in affected programs: 787 -> 915 (16.26%)
helped: 71
HURT: 172
Sfu-stalls are HURT.

total inst-and-stalls in shared programs: 10885268 -> 10872927 (-0.11%)
inst-and-stalls in affected programs: 1469423 -> 1457082 (-0.84%)
helped: 5998
HURT: 1417
Inst-and-stalls are helped.

total nops in shared programs: 184280 -> 185612 (0.72%)
nops in affected programs: 10000 -> 11332 (13.32%)
helped: 311
HURT: 1193
Nops are HURT.

The results show a reduction in register pressure, but an increase in
spills, which looks contradictory. This is because for some reason, this
optimization makes the NIR scheduler produce code for some shaders in Godot
that cause additional spilling, but the problem seems to be exclusive to
Godot shaders and not really related to the optimization itself but to
how the NIR scheduler works. Excluding Godot shaders we actually see a
decrease in spills and a slightly larger improvement in instruction
counts:

total instructions in shared programs: 10720106 -> 10707621 (-0.12%)
instructions in affected programs: 1375316 -> 1362831 (-0.91%)
helped: 5948
HURT: 1364
Instructions are helped.

total threads in shared programs: 428248 -> 428244 (<.01%)
threads in affected programs: 8 -> 4 (-50.00%)
helped: 0
HURT: 2

total spills in shared programs: 3729 -> 3712 (-0.46%)
spills in affected programs: 451 -> 434 (-3.77%)
helped: 6
HURT: 0

total fills in shared programs: 4738 -> 4714 (-0.51%)
fills in affected programs: 564 -> 540 (-4.26%)
helped: 6
HURT: 0

Comparing only shaders from Godot:

total instructions in shared programs: 149355 -> 149371 (0.01%)
instructions in affected programs: 92350 -> 92366 (0.02%)
helped: 64
HURT: 49
Inconclusive result (value mean confidence interval includes 0).

total max-temps in shared programs: 16477 -> 16472 (-0.03%)
max-temps in affected programs: 180 -> 175 (-2.78%)
helped: 5
HURT: 0
Max-temps are helped.

total spills in shared programs: 602 -> 651 (8.14%)
spills in affected programs: 513 -> 562 (9.55%)
helped: 0
HURT: 47

total fills in shared programs: 1789 -> 1908 (6.65%)
fills in affected programs: 1483 -> 1602 (8.02%)
helped: 0
HURT: 47

Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31480>
2024-10-03 09:02:08 +00:00
Iago Toral Quiroga
c57be33d96 broadcom/compiler: implement NIR mali opcodes for clamping
These translate directly to new unpack modifiers on V3D 7.x.

Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31480>
2024-10-03 09:02:08 +00:00
Iago Toral Quiroga
5a62d47762 broadcom/compiler: don't use small immediates in geometry stages
Shader-db shows this is beneficial, even if it comes with a small
increase in register pressure.

total instructions in shared programs: 10889197 -> 10869857 (-0.18%)
instructions in affected programs: 3625014 -> 3605674 (-0.53%)
helped: 14911
HURT: 8324
Instructions are helped.

total threads in shared programs: 431034 -> 431014 (<.01%)
threads in affected programs: 40 -> 20 (-50.00%)
helped: 0
HURT: 10
Threads are HURT.

total uniforms in shared programs: 5308006 -> 5432767 (2.35%)
uniforms in affected programs: 2204951 -> 2329712 (5.66%)
helped: 9
HURT: 30766
Uniforms are HURT.

total max-temps in shared programs: 2226471 -> 2235269 (0.40%)
max-temps in affected programs: 272670 -> 281468 (3.23%)
helped: 2372
HURT: 8479
Max-temps are HURT.

total spills in shared programs: 4318 -> 4331 (0.30%)
spills in affected programs: 39 -> 52 (33.33%)
helped: 2
HURT: 7

total fills in shared programs: 6514 -> 6527 (0.20%)
fills in affected programs: 42 -> 55 (30.95%)
helped: 2
HURT: 7

total sfu-stalls in shared programs: 15166 -> 15808 (4.23%)
sfu-stalls in affected programs: 2389 -> 3031 (26.87%)
helped: 513
HURT: 944
Inconclusive result (%-change mean confidence interval includes 0).

total inst-and-stalls in shared programs: 10904363 -> 10885665 (-0.17%)
inst-and-stalls in affected programs: 3660930 -> 3642232 (-0.51%)
helped: 14878
HURT: 8450
Inst-and-stalls are helped.

total nops in shared programs: 183672 -> 184256 (0.32%)
nops in affected programs: 12532 -> 13116 (4.66%)
helped: 1841
HURT: 2251
Nops are HURT.

Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31355>
2024-09-25 14:21:46 +00:00
Iago Toral Quiroga
390849f6a2 broadcom/compiler: don't add const offset to unifa if it is 0
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31355>
2024-09-25 14:21:46 +00:00
Iago Toral Quiroga
09e0e53a3b broadcom/compiler: avoid register conflict with ldunif(a) and ldvary
ldvary instructions have implicit writes to rf0 (r5 in Pi4) that are
read in follow-up instructions to complete the interpolation calculations
so we rather not allocate ldunif(a)'s dst to rf0/r5  during these sequence
too to facilitate pairing.

This gives us -0.25% of instructions for fragment shaders in shader-db for
Pi5 and -0.64% on Pi4.

Shader-db Pi5:

total instructions in shared programs: 10890641 -> 10889197 (-0.01%)
instructions in affected programs: 575506 -> 574062 (-0.25%)
helped: 2506
HURT: 1378
Instructions are helped.

total max-temps in shared programs: 2226555 -> 2226471 (<.01%)
max-temps in affected programs: 5061 -> 4977 (-1.66%)
helped: 139
HURT: 78
Max-temps are helped.

total sfu-stalls in shared programs: 15143 -> 15166 (0.15%)
sfu-stalls in affected programs: 310 -> 333 (7.42%)
helped: 134
HURT: 195
Inconclusive result (value mean confidence interval includes 0).

total inst-and-stalls in shared programs: 10905784 -> 10904363 (-0.01%)
inst-and-stalls in affected programs: 577053 -> 575632 (-0.25%)
helped: 2497
HURT: 1415
Inst-and-stalls are helped.

total nops in shared programs: 183945 -> 183672 (-0.15%)
nops in affected programs: 3862 -> 3589 (-7.07%)
helped: 478
HURT: 234
Nops are helped.

Shader-db Pi4:

total instructions in shared programs: 12842116 -> 12835720 (-0.05%)
instructions in affected programs: 996970 -> 990574 (-0.64%)
helped: 6027
HURT: 367
Instructions are helped.

total max-temps in shared programs: 2251877 -> 2251707 (<.01%)
max-temps in affected programs: 2670 -> 2500 (-6.37%)
helped: 167
HURT: 9
Max-temps are helped.

total sfu-stalls in shared programs: 21132 -> 21093 (-0.18%)
sfu-stalls in affected programs: 114 -> 75 (-34.21%)
helped: 92
HURT: 55
Sfu-stalls are helped.

total inst-and-stalls in shared programs: 12863248 -> 12856813 (-0.05%)
inst-and-stalls in affected programs: 1008237 -> 1001802 (-0.64%)
helped: 6070
HURT: 359
Inst-and-stalls are helped.

total nops in shared programs: 281645 -> 281200 (-0.16%)
nops in affected programs: 2241 -> 1796 (-19.86%)
helped: 501
HURT: 88
Nops are helped.

Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31355>
2024-09-25 14:21:46 +00:00
Iago Toral Quiroga
917e8e5439 broadcom/compiler: rename is_ldunif_dst to try_rf0
We flag nodes used to ldunif dst so we can try and favor allocating
rf0 to them, so be more explicit about its purpose.

Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31355>
2024-09-25 14:21:46 +00:00
Iago Toral Quiroga
68014b0d9b broadcom/compiler: skip small immediates optimization on vpm instructions
total instructions in shared programs: 11164938 -> 10890641 (-2.46%)
instructions in affected programs: 6557250 -> 6282953 (-4.18%)
helped: 59134
HURT: 9752
Instructions are helped.

total threads in shared programs: 431068 -> 431034 (<.01%)
threads in affected programs: 68 -> 34 (-50.00%)
helped: 0
Threads are HURT.

total uniforms in shared programs: 3880437 -> 5308006 (36.79%)
uniforms in affected programs: 2669367 -> 4096936 (53.48%)
helped: 2
HURT: 74046
Uniforms are HURT.

total max-temps in shared programs: 2244298 -> 2226555 (-0.79%)
max-temps in affected programs: 463611 -> 445868 (-3.83%)
helped: 17473
HURT: 8040
Max-temps are helped.

total spills in shared programs: 4312 -> 4318 (0.14%)
spills in affected programs: 0 -> 6
helped: 0
HURT: 2

total fills in shared programs: 6508 -> 6514 (0.09%)
fills in affected programs: 0 -> 6
helped: 0
HURT: 2

total sfu-stalls in shared programs: 14794 -> 15143 (2.36%)
sfu-stalls in affected programs: 1261 -> 1610 (27.68%)
helped: 238
HURT: 586
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

total inst-and-stalls in shared programs: 11179732 -> 10905784 (-2.45%)
inst-and-stalls in affected programs: 6570407 -> 6296459 (-4.17%)
helped: 59126
HURT: 9786
Inst-and-stalls are helped.

total nops in shared programs: 273422 -> 183945 (-32.72%)
nops in affected programs: 139446 -> 49969 (-64.17%)
helped: 60679
HURT: 2277
Nops are helped.

Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31259>
2024-09-23 07:45:46 +00:00
Konstantin Seurer
ce24486ee4 nir: Introduce nir_debug_info_instr
Adds a new instruction type that stores metadata that might be useful
for debugging purposes. Passes must ignore these instructions when
making decisions.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18903>
2024-08-25 10:26:33 +00:00
Iago Toral Quiroga
ad9ff707ce broadcom: drop backend implementation of nir_op_ufind_msb
We can have NIR do this for us now that we have uclz.

Suggested by Georg Lehmann.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30614>
2024-08-13 13:16:18 +02:00
Iago Toral Quiroga
35a10f5d5a broadcom: implement nir_op_uclz
This enables some algebraic optimizations.

No changes in shader-db, but it does cause some CTS tests to
produce less instructions.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30614>
2024-08-13 13:16:11 +02:00
Alyssa Rosenzweig
c3d999dec9 broadcom: switch to derivative intrinsics
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30570>
2024-08-09 13:54:11 +00:00
Zan Dobersek
7fd5f76393 nir/lower_vars_to_scratch: calculate threshold-limited variable size separately
ir3's lowering of variables to scratch memory has to treat 8-bit values as
16-bit ones when comparing such value's size against the given threshold
since those values are handled through 16-bit half-registers. But those
values can still use natural 8-bit size and alignment for storing inside
scratch memory.

nir_lower_vars_to_scratch now accepts two size-and-alignment functions,
one used for calculating the variable size and the other for calculating
the size and alignment needed for storing inside scratch memory. Non-ir3
uses of this pass can just duplicate the currently-used function. ir3
provides a separate variable-size function that special-cases 8-bit types.

Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29875>
2024-08-07 14:32:28 +00:00
Iago Toral Quiroga
086ed1e54b broadcom/compiler: emit instructions producing flags earlier
We usually emit flags right before consuming them but this is
suboptimal from the point of view of register pressure: if an
instruction is only used to generate flags then waiting to emit
it right before reading the flags extends the liveness of the
sources used to generate the flags for no gain. This pass will
check for such instructions and try to move them as early as
possible.

Shader-db results below show this is effective to reduce register
pressure, allowing a few shaders to increase thread counts and/or
reduce spilling:

total instructions in shared programs: 11057173 -> 11057076 (<.01%)
instructions in affected programs: 1955543 -> 1955446 (<.01%)
helped: 4214
HURT: 3905
Inconclusive result (value mean confidence interval includes 0).

total threads in shared programs: 425096 -> 425170 (0.02%)
threads in affected programs: 74 -> 148 (100.00%)
helped: 37
HURT: 0
Threads are helped.

total uniforms in shared programs: 3846275 -> 3845674 (-0.02%)
uniforms in affected programs: 23574 -> 22973 (-2.55%)
helped: 217
HURT: 30
Uniforms are helped.

total max-temps in shared programs: 2222910 -> 2220488 (-0.11%)
max-temps in affected programs: 61904 -> 59482 (-3.91%)
helped: 2145
HURT: 113
Max-temps are helped.

total spills in shared programs: 4294 -> 4280 (-0.33%)
spills in affected programs: 148 -> 134 (-9.46%)
helped: 8
HURT: 0

total fills in shared programs: 6497 -> 6468 (-0.45%)
fills in affected programs: 291 -> 262 (-9.97%)
helped: 8
HURT: 0

total sfu-stalls in shared programs: 14344 -> 14611 (1.86%)
sfu-stalls in affected programs: 1308 -> 1575 (20.41%)
helped: 217
HURT: 335
Inconclusive result (%-change mean confidence interval includes 0).

total inst-and-stalls in shared programs: 11071517 -> 11071687 (<.01%)
inst-and-stalls in affected programs: 1946767 -> 1946937 (<.01%)
helped: 4191
HURT: 3909
Inconclusive result (value mean confidence interval includes 0).

total nops in shared programs: 270628 -> 269829 (-0.30%)
nops in affected programs: 22032 -> 21233 (-3.63%)
helped: 1213
HURT: 571
Inconclusive result (%-change mean confidence interval includes 0).

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30511>
2024-08-07 09:28:39 +02:00
Daniel Stone
e05415a82e format: Generate endian-independent format aliases
Instead of having a hardcoded list of endian-independent format aliases
in the header, generate them from the format definitions.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29649>
2024-07-19 13:50:42 +00:00