Commit graph

367 commits

Author SHA1 Message Date
Kenneth Graunke
7c579f448f intel/brw: Mark all UBO access with a direct buffer index as speculative
UBO loads with a non-indirect buffer index should be safe to perform
speculatively.  With a direct offset, we may sometimes turn them into
push constants, at which point it's just reading a register with no
cost at all.  Otherwise, we access them via messages that use surface
state, and automatically perform bounds checking.  So we shouldn't have
any issues with reading out of bounds and page faulting, for example.

This allows nir_opt_peephole_sel() to operate on load_ubo intrinsics,
so we can turn simple if's with loads on both sides to bcsels.  In some
cases this can collapse a surprising amount of control flow, allowing
other optimizations to work better.
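
A minimal sketch of the if-to-select idea in plain C rather than NIR. The helper load_ubo_checked() is hypothetical and only stands in for a bounds-checked UBO read; it is not a Mesa function. Because the checked load cannot fault, both sides can be loaded unconditionally and the branch reduces to a select:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bounds-checked UBO read: out-of-range
 * reads return 0 instead of faulting, which is what makes the load
 * safe to execute speculatively. */
static uint32_t load_ubo_checked(const uint32_t *ubo, size_t size, size_t idx)
{
    assert(ubo != NULL);
    return idx < size ? ubo[idx] : 0;
}

/* Original shape: control flow with one load on each side. */
static uint32_t with_branch(const uint32_t *ubo, size_t size, int cond,
                            size_t a, size_t b)
{
    if (cond)
        return load_ubo_checked(ubo, size, a);
    else
        return load_ubo_checked(ubo, size, b);
}

/* After if-conversion: both loads run unconditionally, then a select
 * (a bcsel in NIR terms) picks the result. */
static uint32_t with_select(const uint32_t *ubo, size_t size, int cond,
                            size_t a, size_t b)
{
    uint32_t va = load_ubo_checked(ubo, size, a);
    uint32_t vb = load_ubo_checked(ubo, size, b);
    return cond ? va : vb;
}
```

Both functions return the same value for every input, including out-of-range indices; the second form simply has no control flow left around the loads.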

The i965 OpenGL driver used load_uniform intrinsics, which are allowed
in NIR's peephole select pass.  But iris uses the Gallium NIR pass that
translates uniforms to loads from UBO 0, so we haven't been able to take
advantage of NIR's peephole select pass there.  The backend pass was
still able to handle this to some extent, however.

fossil-db results on Alchemist:

   Totals:
   Instrs: 150656329 -> 150645307 (-0.01%); split: -0.01%, +0.00%
   Cycles: 12635230179 -> 12633696811 (-0.01%); split: -0.02%, +0.00%
   Send messages: 7416330 -> 7416261 (-0.00%)
   Spill count: 52471 -> 52473 (+0.00%)
   Fill count: 100818 -> 100803 (-0.01%); split: -0.02%, +0.00%
   Scratch Memory Size: 3197952 -> 3198976 (+0.03%)

   Totals from 1848 (0.29% of 630003) affected shaders:
   Instrs: 1412300 -> 1401278 (-0.78%); split: -0.80%, +0.02%
   Cycles: 1809789567 -> 1808256199 (-0.08%); split: -0.11%, +0.03%
   Send messages: 59829 -> 59760 (-0.12%)
   Spill count: 3870 -> 3872 (+0.05%)
   Fill count: 9693 -> 9678 (-0.15%); split: -0.18%, +0.02%
   Scratch Memory Size: 174080 -> 175104 (+0.59%)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>
2024-08-05 19:17:55 -07:00
Sushma Venkatesh Reddy
0116430d39 intel/brw: Handle 16-bit sampler return payloads
The API requires samplers to return 32-bit values even though the
hardware can handle 16-bit floating point, so we detect that case and
make more efficient use of memory bandwidth. This improves token encode
and decode performance in LLM workloads by at least 5% across multiple
platforms.
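
As a rough illustration of why a 16-bit return payload is cheaper, the arithmetic below compares payload sizes. The lane count and component count are assumptions chosen for the example, not values taken from this commit:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes written back by one sampler message: one value per lane per
 * returned component. All parameters are illustrative assumptions. */
static size_t return_payload_bytes(unsigned lanes, unsigned components,
                                   unsigned bits_per_value)
{
    assert(bits_per_value == 16 || bits_per_value == 32);
    return (size_t)lanes * components * (bits_per_value / 8);
}
```

For a hypothetical 16-lane message returning four components, the payload is 256 bytes at 32 bits per value and 128 bytes at 16 bits per value.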

Thank you Kenneth Graunke for suggesting and guiding me throughout
this implementation.

Signed-off-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30447>
2024-07-31 21:26:46 +00:00
Marek Olšák
b2d32ae246 nir: add nir_intrinsic_load_per_primitive_input, split from io_semantics flag
Instead of having 1 bit in nir_io_semantics indicating a per-primitive
FS input, add a dedicated intrinsic for it.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29895>
2024-07-23 16:13:16 +00:00
Qiang Yu
3151f5ec47 nir: add filter parameter to nir_lower_array_deref_of_vec
To be used by later commits to limit the lowering to specific
variables.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29799>
2024-07-03 02:06:56 +00:00
Francisco Jerez
e8007c9325 intel/fs/xe2+: Don't lower barycentric load offsets to fixed-point format on Xe2+.
Floating-point offsets work fine in combination with the
floating-point arithmetic we're about to lower these intrinsics into,
and they require fewer instructions than converting to fixed-point and
then back.  No reason to take the precision/range hit nor the extra
instructions.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29847>
2024-06-27 00:18:00 +00:00
Alyssa Rosenzweig
da752ed7c1 treewide: use nir_def_replace sometimes
Two Coccinelle patches here. Didn't catch nearly as much as I would've liked but
it's a start.

Coccinelle patch:

    @@
    expression intr, repl;
    @@

    -nir_def_rewrite_uses(&intr->def, repl);
    -nir_instr_remove(&intr->instr);
    +nir_def_replace(&intr->def, repl);

Coccinelle patch:

    @@
    identifier intr;
    expression instr, repl;
    @@

    nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);
    ...
    -nir_def_rewrite_uses(&intr->def, repl);
    -nir_instr_remove(instr);
    +nir_def_replace(&intr->def, repl);

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com> [broadcom]
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> [lima]
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> [etna]
Reviewed-by: Pavel Ondračka <pavel.ondracka@gmail.com> [r300]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29817>
2024-06-21 15:36:56 +00:00
Alyssa Rosenzweig
15257b65c6 treewide: use nir_metadata_control_flow
Via Coccinelle patch:

    @@
    @@

    -nir_metadata_block_index | nir_metadata_dominance
    +nir_metadata_control_flow

...plus some manual fixups for call sites missed by coccinelle.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Juan A. Suarez Romero <jasuarez@igalia.com> [broadcom]
Acked-by: Vasily Khoruzhick <anarsoul@gmail.com> [lima]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29745>
2024-06-17 16:28:14 -04:00
Ian Romanick
7b7e5cf5d4 nir/algebraic: intel/fs: Optimize some patterns before lowering 64-bit integers
v2: Add some comments explaining some of the nuance of the shift
optimizations. Fix a bug in the shift count calculation of the upper
32 bits. Move the @64 from the variable to the opcode. All suggested
by Jordan.

No shader-db changes on any Intel platform.

fossil-db:

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 154507026 -> 154506576 (-0.00%)
Cycle count: 17436298868 -> 17436295016 (-0.00%)
Max live registers: 32635309 -> 32635297 (-0.00%)

Totals from 42 (0.01% of 632575) affected shaders:
Instrs: 5616 -> 5166 (-8.01%)
Cycle count: 133680 -> 129828 (-2.88%)
Max live registers: 1158 -> 1146 (-1.04%)

No fossil-db changes on any other Intel platform.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29148>
2024-05-31 09:13:23 -07:00
Lionel Landwerlin
9a36278475 intel/nir: add printf lowering
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25814>
2024-05-15 13:13:38 +00:00
Ian Romanick
3f151c03af intel/brw: Handle fsign optimization in a NIR algebraic pass
This is a lot less code, and it makes it easier to experiment with other
pattern-based optimizations in the future.

The results here are nearly identical to the results I got from Ken's
"intel/brw: Make fsign (for 16/32-bit) in SSA form"... which are not
particularly good.

In this commit and in Ken's, all of the shader-db shaders hurt for
spills and fills are from Deus Ex Mankind Divided. Each shader has a
bunch of texture instructions with a single fsign between the
blocks. With the dependency on the flag removed, the scheduler puts all
of the texture instructions at the start... and there are a LOT of them.

shader-db:

All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19647060 -> 19650207 (0.02%)
instructions in affected programs: 734718 -> 737865 (0.43%)
helped: 382 / HURT: 1984

total cycles in shared programs: 823238442 -> 822785913 (-0.05%)
cycles in affected programs: 426901157 -> 426448628 (-0.11%)
helped: 3408 / HURT: 3671

total spills in shared programs: 3887 -> 3891 (0.10%)
spills in affected programs: 256 -> 260 (1.56%)
helped: 0 / HURT: 4

total fills in shared programs: 3236 -> 3306 (2.16%)
fills in affected programs: 882 -> 952 (7.94%)
helped: 0 / HURT: 12

LOST:   37
GAINED: 34

fossil-db:

DG2 and Meteor Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 154005469 -> 154008294 (+0.00%); split: -0.00%, +0.00%
Cycle count: 17551859277 -> 17554293955 (+0.01%); split: -0.02%, +0.04%
Spill count: 142078 -> 142090 (+0.01%)
Fill count: 266761 -> 266729 (-0.01%); split: -0.02%, +0.01%
Max live registers: 32593578 -> 32593858 (+0.00%)
Max dispatch width: 5535944 -> 5536816 (+0.02%); split: +0.02%, -0.01%

Totals from 5867 (0.93% of 631350) affected shaders:
Instrs: 5475544 -> 5478369 (+0.05%); split: -0.04%, +0.09%
Cycle count: 1649032029 -> 1651466707 (+0.15%); split: -0.24%, +0.39%
Spill count: 26411 -> 26423 (+0.05%)
Fill count: 57364 -> 57332 (-0.06%); split: -0.10%, +0.04%
Max live registers: 431561 -> 431841 (+0.06%)
Max dispatch width: 49784 -> 50656 (+1.75%); split: +2.38%, -0.63%

Tiger Lake
Totals:
Instrs: 149530671 -> 149533588 (+0.00%); split: -0.00%, +0.00%
Cycle count: 15261418953 -> 15264764921 (+0.02%); split: -0.00%, +0.03%
Spill count: 60317 -> 60316 (-0.00%); split: -0.02%, +0.01%
Max live registers: 32249201 -> 32249464 (+0.00%)
Max dispatch width: 5540608 -> 5540584 (-0.00%)

Totals from 5862 (0.93% of 630309) affected shaders:
Instrs: 4740800 -> 4743717 (+0.06%); split: -0.04%, +0.10%
Cycle count: 566531248 -> 569877216 (+0.59%); split: -0.13%, +0.72%
Spill count: 11709 -> 11708 (-0.01%); split: -0.09%, +0.08%
Max live registers: 424560 -> 424823 (+0.06%)
Max dispatch width: 50304 -> 50280 (-0.05%)

Ice Lake
Totals:
Instrs: 150499705 -> 150502608 (+0.00%); split: -0.00%, +0.00%
Cycle count: 15105629116 -> 15105425880 (-0.00%); split: -0.00%, +0.00%
Spill count: 60087 -> 60090 (+0.00%)
Fill count: 100542 -> 100541 (-0.00%); split: -0.00%, +0.00%
Max live registers: 32605215 -> 32605495 (+0.00%)
Max dispatch width: 5617752 -> 5617792 (+0.00%); split: +0.00%, -0.00%

Totals from 5882 (0.93% of 634934) affected shaders:
Instrs: 4737206 -> 4740109 (+0.06%); split: -0.04%, +0.10%
Cycle count: 598882104 -> 598678868 (-0.03%); split: -0.08%, +0.05%
Spill count: 10278 -> 10281 (+0.03%)
Fill count: 22504 -> 22503 (-0.00%); split: -0.01%, +0.01%
Max live registers: 424184 -> 424464 (+0.07%)
Max dispatch width: 50216 -> 50256 (+0.08%); split: +0.25%, -0.18%

Skylake
Totals:
Instrs: 139092612 -> 139095257 (+0.00%); split: -0.00%, +0.00%
Cycle count: 14533550285 -> 14533544716 (-0.00%); split: -0.00%, +0.00%
Spill count: 58176 -> 58172 (-0.01%)
Fill count: 95877 -> 95796 (-0.08%)
Max live registers: 31924594 -> 31924874 (+0.00%)
Max dispatch width: 5484568 -> 5484552 (-0.00%); split: +0.00%, -0.00%

Totals from 5789 (0.93% of 625512) affected shaders:
Instrs: 4481987 -> 4484632 (+0.06%); split: -0.04%, +0.10%
Cycle count: 578310124 -> 578304555 (-0.00%); split: -0.05%, +0.05%
Spill count: 9248 -> 9244 (-0.04%)
Fill count: 19677 -> 19596 (-0.41%)
Max live registers: 415340 -> 415620 (+0.07%)
Max dispatch width: 49720 -> 49704 (-0.03%); split: +0.10%, -0.13%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29095>
2024-05-14 01:28:20 +00:00
Kenneth Graunke
873fcdff38 intel/brw: Stop using long BRW_REGISTER_TYPE enum names
s/BRW_REGISTER_TYPE/BRW_TYPE/g

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28847>
2024-04-25 11:41:48 +00:00
Ian Romanick
24cdbbdaa2 intel/brw: Delete stray nir_opt_dce
No shader-db or fossil-db changes on any Intel platform.

Fixes: f76f4be301 ("intel/compiler: move gen5 final pass to actually be final pass")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28136>
2024-04-04 23:42:27 +00:00
Ian Romanick
6377e8fd29 intel/brw: Don't call nir_opt_remove_phis before nir_convert_from_ssa
Per discussion in #10727, removing phis breaks LCSSA form which in turn
invalidates divergence analysis.

shader-db:

All Skylake and newer platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 20299612 -> 20299695 (<.01%)
instructions in affected programs: 20829 -> 20912 (0.40%)
helped: 6 / HURT: 13

total cycles in shared programs: 842149085 -> 842148399 (<.01%)
cycles in affected programs: 15146222 -> 15145536 (<.01%)
helped: 40 / HURT: 45

fossil-db:

All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 165505077 -> 165505603 (+0.00%); split: -0.00%, +0.00%
Cycles: 15144183575 -> 15144235695 (+0.00%); split: -0.00%, +0.00%
Spill count: 45213 -> 45220 (+0.02%)
Fill count: 74166 -> 74184 (+0.02%)

Totals from 94 (0.01% of 656116) affected shaders:
Instrs: 263079 -> 263605 (+0.20%); split: -0.00%, +0.20%
Cycles: 28411487 -> 28463607 (+0.18%); split: -0.18%, +0.37%
Spill count: 3474 -> 3481 (+0.20%)
Fill count: 6713 -> 6731 (+0.27%)

Fixes: 6dbb5f1e07 ("intel/fs: rerun divergence analysis prior to convert_from_ssa")
Closes: #10727
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28136>
2024-04-04 23:42:27 +00:00
Dylan Baker
75ede9d9bc intel/brw: track last successful pass and leave the loop early
This is similar to what RADV implements using the NIR_LOOP_PASS
helpers. I have not used those helpers for a couple of reasons:

 1. They use the pointer to the optimization function, which doesn't
    work if the same function is called multiple times in one invocation
    of the loop (fixable)
 2. After fixing them, due to Intel's use of sub-expressions, the amount
    of code added to wrap the shared macro becomes more than simply
    reimplementing them for the Intel compiler

On most workloads the results are a wash, but on compile heavy
workloads like Cyberpunk 2077 and Rise of the Tomb Raider, I saw
fossil-db runtimes fall by 1-2% on my ICL, with no changes to the
compiled shaders. Caio saw closer to 2.5% on TGL.
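
A minimal sketch of the "last successful pass" idea, using a toy shader and toy passes. None of these names come from Mesa, and the passes are assumed to run to their own fixed point in a single call. The loop remembers which pass last made progress and stops as soon as it comes back around to that pass with nothing having changed in between:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy "shader": a number we simplify by stripping small factors. */
struct toy_shader {
    unsigned value;
    int calls;      /* how many times any pass was invoked */
};

/* Toy pass: strip every factor f; reports whether anything changed. */
static bool strip_factor(struct toy_shader *s, unsigned f)
{
    bool progress = false;
    s->calls++;
    while (s->value != 0 && s->value % f == 0) {
        s->value /= f;
        progress = true;
    }
    return progress;
}

static void toy_optimize(struct toy_shader *s)
{
    static const unsigned factors[] = {2, 3, 5};
    const int num_passes = 3;
    int last_progress = -1;   /* index of the last pass that changed something */
    bool progress;

    assert(s != NULL);
    do {
        progress = false;
        for (int i = 0; i < num_passes; i++) {
            /* Every pass since this one made no progress, so running it
             * again cannot change anything: leave the loop early. */
            if (i == last_progress)
                return;
            if (strip_factor(s, factors[i])) {
                progress = true;
                last_progress = i;
            }
        }
    } while (progress);
}
```

Starting from the value 12, the sketch invokes four passes in total, whereas a loop that always finishes the second iteration would invoke six.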

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27510>
2024-03-21 23:02:32 +00:00
Alyssa Rosenzweig
a6123a80da nir/opt_shrink_vectors: shrink some intrinsics from start
If the backend supports it, intrinsics with a component() are straightforward to
shrink from the start. Notably helps vectorized I/O.

v2: add an option for this and enable only on grown up backends, because some
backends ignore the component() parameter.
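
A small sketch of the bookkeeping involved, assuming a load described only by its starting component and vector width. The struct and function names are hypothetical, not NIR API. Unused leading channels are dropped by advancing the component, and unused trailing channels by reducing the width:

```c
#include <assert.h>

struct shrunk_load {
    unsigned component;       /* first channel actually loaded */
    unsigned num_components;  /* number of channels loaded */
};

/* used_mask has bit i set when channel i of the original load is read. */
static struct shrunk_load shrink_load(unsigned component,
                                      unsigned num_components,
                                      unsigned used_mask)
{
    assert(num_components >= 1 && num_components <= 4);
    used_mask &= (1u << num_components) - 1;
    assert(used_mask != 0);   /* fully unused loads are a job for DCE */

    unsigned first = 0;
    while (!(used_mask & (1u << first)))
        first++;              /* shrink from the start */

    unsigned last = num_components - 1;
    while (!(used_mask & (1u << last)))
        last--;               /* shrink from the end */

    struct shrunk_load r = { component + first, last - first + 1 };
    return r;
}
```

For example, a vec4 load at component 0 where only .zw are read becomes a vec2 load at component 2.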

RADV GFX11:
Totals from 921 (1.16% of 79439) affected shaders:
Instrs: 616558 -> 615529 (-0.17%); split: -0.30%, +0.14%
CodeSize: 3099864 -> 3095632 (-0.14%); split: -0.25%, +0.11%
Latency: 2177075 -> 2160966 (-0.74%); split: -0.79%, +0.05%
InvThroughput: 299997 -> 298664 (-0.44%); split: -0.47%, +0.02%
VClause: 16343 -> 16395 (+0.32%); split: -0.01%, +0.32%
SClause: 10715 -> 10714 (-0.01%)
Copies: 24736 -> 24701 (-0.14%); split: -0.37%, +0.23%
PreVGPRs: 30179 -> 30173 (-0.02%)
VALU: 353472 -> 353439 (-0.01%); split: -0.03%, +0.02%
SALU: 40323 -> 40322 (-0.00%)
VMEM: 25353 -> 25352 (-0.00%)

AGX:

total instructions in shared programs: 2038217 -> 2038049 (<.01%)
instructions in affected programs: 10249 -> 10081 (-1.64%)

total alu in shared programs: 1593094 -> 1592939 (<.01%)
alu in affected programs: 7145 -> 6990 (-2.17%)

total fscib in shared programs: 1589254 -> 1589102 (<.01%)
fscib in affected programs: 7217 -> 7065 (-2.11%)

total bytes in shared programs: 13975666 -> 13974722 (<.01%)
bytes in affected programs: 65942 -> 64998 (-1.43%)

total regs in shared programs: 592758 -> 591187 (-0.27%)
regs in affected programs: 6936 -> 5365 (-22.65%)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v1)
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28004>
2024-03-12 18:17:17 +00:00
Caio Oliveira
865ef36609 intel/brw: Remove brw_shader.h
Find a better home for its existing content.  Some functions are
now just static functions at the usage sites.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27861>
2024-02-29 19:28:06 +00:00
Kenneth Graunke
5fbba530cf intel/brw: Delete compiler->supports_shader_constants
True for all drivers using this compiler.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27872>
2024-02-29 18:00:14 +00:00
Caio Oliveira
63a4a4400a intel/brw: Remove edgeflag_is_last VS parameter
Suggested by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:39 +00:00
Caio Oliveira
5a3f65e678 intel/brw: Remove unused attrib workarounds
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:39 +00:00
Caio Oliveira
d3e451780b intel/brw: Inline brw_nir_apply_sampler_key code
It doesn't use the prog_key anymore, so just move the nir_lower_tex
pass call to its single call site.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:39 +00:00
Caio Oliveira
a1e694a890 intel/brw: Remove Gfx8- code from NIR passes
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:38 +00:00
Caio Oliveira
7c23b90537 intel/brw: Always use scalar shaders
Remove scalar_stage[] array, since now it is always scalar.  This
removes any usage of vec4 shaders in brw.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:37 +00:00
Caio Oliveira
303fd4e935 intel/brw: Move type_size_* functions out of vec4-specific file
This will make it easier to delete the vec4 files later.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27691>
2024-02-28 05:45:37 +00:00
Ian Romanick
535caaf3e0 nir: Optimize uniform iadd, fadd, and ixor reduction operations
This adds optimizations for iadd, fadd, and ixor with reduce,
inclusive scan, and exclusive scan.
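
A sketch of the underlying identities for the integer cases, simulated on the CPU with a 32-lane active mask. This is an illustration under assumed semantics, not the NIR pass itself; the function names are hypothetical. When the operand is the same in every active lane, the reduction only depends on how many lanes are active:

```c
#include <assert.h>
#include <stdint.h>

static uint32_t popcount32(uint32_t v)
{
    uint32_t n = 0;
    while (v) {
        v &= v - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Reference versions: walk the lanes one by one. */
static uint32_t reduce_iadd_ref(uint32_t x, uint32_t active)
{
    uint32_t sum = 0;
    for (unsigned lane = 0; lane < 32; lane++)
        if (active & (1u << lane))
            sum += x;
    return sum;
}

static uint32_t reduce_ixor_ref(uint32_t x, uint32_t active)
{
    uint32_t acc = 0;
    for (unsigned lane = 0; lane < 32; lane++)
        if (active & (1u << lane))
            acc ^= x;
    return acc;
}

/* Uniform-operand versions: no per-lane work needed. */
static uint32_t reduce_iadd_uniform(uint32_t x, uint32_t active)
{
    return x * popcount32(active);
}

static uint32_t reduce_ixor_uniform(uint32_t x, uint32_t active)
{
    return (popcount32(active) & 1) ? x : 0;
}

/* Exclusive scan: only active lanes below this one contribute. */
static uint32_t exclusive_scan_iadd_uniform(uint32_t x, uint32_t active,
                                            unsigned lane)
{
    assert(lane < 32);
    return x * popcount32(active & ((1u << lane) - 1));
}
```

With x = 7 and active mask 0xB6 (five active lanes), the iadd reduction is 35, the ixor reduction is 7, and the exclusive iadd scan seen by lane 5 is 21.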

NOTE: The fadd and ixor optimizations had no shader-db or fossil-db
changes on any Intel platform.

NOTE 2: This change "fixes" arb_compute_variable_group_size-local-size
and base-local-size.shader_test on DG2 and MTL. This is just changing
the code path taken to not use whatever path was not working properly
before.

This is a subset of the things optimized by ACO. See also
https://gitlab.freedesktop.org/mesa/mesa/-/issues/3731#note_682802. The
min, max, iand, and ior exclusive_scan optimizations are not
implemented.

Broadwell on shader-db is not happy. I have not investigated.

v2: Silence some warnings about discarding const.

v3: Rename mbcnt to count_active_invocations. Add a big comment
explaining the differences between the two paths. Suggested by Rhys.

shader-db:

All Gfx9 and newer platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 20300384 -> 20299545 (<.01%)
instructions in affected programs: 19167 -> 18328 (-4.38%)
helped: 35 / HURT: 0

total cycles in shared programs: 842809750 -> 842766381 (<.01%)
cycles in affected programs: 2160249 -> 2116880 (-2.01%)
helped: 33 / HURT: 2

total spills in shared programs: 4632 -> 4626 (-0.13%)
spills in affected programs: 206 -> 200 (-2.91%)
helped: 3 / HURT: 0

total fills in shared programs: 5594 -> 5581 (-0.23%)
fills in affected programs: 664 -> 651 (-1.96%)
helped: 3 / HURT: 1

fossil-db results:

All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 165551893 -> 165513303 (-0.02%)
Cycles: 15132539132 -> 15125314947 (-0.05%); split: -0.05%, +0.00%
Spill count: 45258 -> 45204 (-0.12%)
Fill count: 74286 -> 74157 (-0.17%)
Scratch Memory Size: 2467840 -> 2451456 (-0.66%)

Totals from 712 (0.11% of 656120) affected shaders:
Instrs: 598931 -> 560341 (-6.44%)
Cycles: 184650167 -> 177425982 (-3.91%); split: -3.95%, +0.04%
Spill count: 983 -> 929 (-5.49%)
Fill count: 2274 -> 2145 (-5.67%)
Scratch Memory Size: 52224 -> 35840 (-31.37%)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
2024-02-27 09:44:11 -08:00
Ian Romanick
c63ea755fe intel/fs: Use nir_opt_uniform_subgroup
shader-db:

All Skylake and newer platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 20300435 -> 20300384 (<.01%)
instructions in affected programs: 303 -> 252 (-16.83%)
helped: 2 / HURT: 0

total cycles in shared programs: 842810326 -> 842809750 (<.01%)
cycles in affected programs: 8374 -> 7798 (-6.88%)
helped: 2 / HURT: 0

fossil-db:

All Intel platforms (note below) had similar results. (Ice Lake shown)
Instrs: 165559735 -> 165551893 (-0.00%)
Cycles: 15133083961 -> 15132539132 (-0.00%); split: -0.00%, +0.00%
Spill count: 45262 -> 45258 (-0.01%)
Fill count: 74293 -> 74286 (-0.01%)

Totals from 854 (0.13% of 656120) affected shaders:
Instrs: 3461998 -> 3454156 (-0.23%)
Cycles: 154252729 -> 153707900 (-0.35%); split: -0.36%, +0.01%
Spill count: 2655 -> 2651 (-0.15%)
Fill count: 3881 -> 3874 (-0.18%)

DG2 did not see changes in spills or fills.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
2024-02-27 08:38:45 -08:00
Ian Romanick
b22fff90d5 intel/fs: Enable nir_opt_uniform_atomics in all shader stages
The problem seems to have been related to
nir_intrinsic_load_global_block_intel being marked as non-divergent.

No shader-db or fossil-db changes on any Intel platform.

v2: Rebase on splitting ELK from BRW. Remove devinfo->ver >= 8 check.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27044>
2024-02-27 08:37:05 -08:00
Sagar Ghuge
269d2c4a3f intel/compiler: Enable packing of offset with LOD or Bias
Move intel_nir_lower_texture just before nir_lower_tex since we need to
operate on the offset and those are getting lowered.

v2: (Ian)
- Rename variable name to intel_tex_options

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27447>
2024-02-27 00:22:46 +00:00
Caio Oliveira
d8f9a05f32 intel/compiler: Rename the passes and files related to intel_nir.h
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27644>
2024-02-16 22:35:05 +00:00
Caio Oliveira
dc76cfc781 intel/compiler: Collect NIR-only passes in intel_nir.h
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27644>
2024-02-16 22:35:05 +00:00
Caio Oliveira
c5b80de583 intel/compiler: Rename brw_vue_map to intel_vue_map
And move to the intel_shader_enums.h file.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27475>
2024-02-14 22:31:23 -08:00
Lionel Landwerlin
2437556d83 intel/fs: rerun divergence prior to lowering non-uniform interpolate at sample
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 74a40cc4b6 ("intel/fs: move lower of non-uniform at_sample barycentric to NIR")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26797>
2024-02-13 00:06:44 +00:00
Sagar Ghuge
98b62434bd intel/compiler: Lower texture operation to combine LOD and AI
We have to push the lowering of texture operations a bit further down
the pipeline since nir_lower_tex gets invoked twice and if there is no LOD
source present, nir_lower_tex adds that as a source. Once that's all
done we can easily combine the LOD and array index into a single 32-bit
value.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27458>
2024-02-12 21:25:48 +00:00
Sagar Ghuge
15129c7634 intel/compiler: Use nir_tex_src_backend1 to pack LOD and array index
Since this lowering is totally Intel specific, we don't have to
introduce a new texture source. We can use the nir_tex_src_backend1
source to pack LOD/LOD Bias and array index into a single 32-bit value.
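
A sketch of what such packing could look like. The bit layout used here (float LOD bits in [31:9], array index clamped into [8:0]) is an assumption made for illustration and is not stated in this commit; the function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ASSUMED layout: upper 23 bits hold the LOD (or LOD bias) as a float
 * with its low mantissa bits dropped, lower 9 bits hold the array index. */
static uint32_t pack_lod_and_index(float lod, uint32_t array_index)
{
    uint32_t lod_bits;

    assert(sizeof(lod) == sizeof(lod_bits));
    memcpy(&lod_bits, &lod, sizeof(lod_bits));  /* bit-cast, no conversion */

    if (array_index > 511)
        array_index = 511;                      /* clamp to 9 bits */

    return (lod_bits & 0xfffffe00u) | array_index;
}
```

Under that assumed layout, the precision lost is the low nine mantissa bits of the LOD, in exchange for sending one value instead of two.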

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27458>
2024-02-12 21:25:48 +00:00
Ian Romanick
84de7a88d3 intel/compiler/xe2: Emit texture instructions w/ combined LOD and array index
The extra assertions are just there to help validate
pack_lod_and_array_index (in nir_lower_tex.c).

v2: Split got_lod_or_bias into two variables. This simplifies some
changes that Sagar is working on. Suggested by Sagar.

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305>
2024-02-02 02:39:10 +00:00
Caio Oliveira
4af079960d intel/compiler: Enable lower_rotate_to_shuffle in subgroup lowering
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27272>
2024-01-25 19:07:42 +00:00
Daniel Schürmann
a3ed36da1a treewide: replace calls to nir_opt_trivial_continues() with nir_opt_loop()
Totals from 850 (1.11% of 76636) affected shaders: (RADV, GFX11)
MaxWaves: 18134 -> 18130 (-0.02%)
Instrs: 3011298 -> 3008585 (-0.09%); split: -0.17%, +0.08%
CodeSize: 15836804 -> 15841972 (+0.03%); split: -0.09%, +0.12%
VGPRs: 63580 -> 63604 (+0.04%)
SpillSGPRs: 966 -> 1148 (+18.84%); split: -0.83%, +19.67%
Latency: 36102291 -> 30186144 (-16.39%); split: -16.41%, +0.02%
InvThroughput: 9058100 -> 7011821 (-22.59%); split: -22.61%, +0.02%
VClause: 65369 -> 65364 (-0.01%); split: -0.03%, +0.02%
SClause: 100309 -> 100305 (-0.00%); split: -0.04%, +0.04%
Copies: 335658 -> 336472 (+0.24%); split: -0.70%, +0.94%
Branches: 110806 -> 108945 (-1.68%); split: -1.94%, +0.26%
PreSGPRs: 73476 -> 73934 (+0.62%); split: -0.25%, +0.87%
PreVGPRs: 58809 -> 58840 (+0.05%); split: -0.01%, +0.06%

Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24940>
2024-01-03 20:48:04 +00:00
Dave Airlie
f76f4be301 intel/compiler: move gen5 final pass to actually be final pass
This got broken by the register conversion; this pass needs to run
after all the others.

Fixes: ce75c3c3fe ("intel: Switch to intrinsic-based registers")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26731>
2023-12-18 07:24:37 +00:00
Lionel Landwerlin
6dbb5f1e07 intel/fs: rerun divergence analysis prior to convert_from_ssa
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9964
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26235>
2023-11-17 06:40:49 +00:00
Rhys Perry
f695a9fed2 intel/compiler: use nir_lower_fp16_casts
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25566>
2023-11-16 11:02:31 +00:00
Caio Oliveira
d2125dac85 intel/compiler: Take more precise params in brw_nir_optimize()
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25986>
2023-11-08 18:10:31 +00:00
Caio Oliveira
c4be90b4ba intel/compiler: Remove unused parameter from brw_nir_adjust_payload()
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25986>
2023-11-08 18:10:31 +00:00
Iván Briano
54498937c5 intel/compiler: round f2f16 correctly for RTNE case
v2: bcsel -> b2i32 (Ian)

Fixes upcoming Vulkan CTS tests:
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.rounding_rte_conv_from_fp64_up
dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.rounding_rte_conv_from_fp64_up_nostorage
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_conv_from_fp64_up_vert
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_conv_from_fp64_up_nostorage_vert
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_conv_from_fp64_up_frag
dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_conv_from_fp64_up_nostorage_frag

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25281>
2023-10-09 23:37:52 +00:00
Connor Abbott
4282386311 nir/spirv: Add inverse_ballot intrinsic
This is actually a no-op on AMD, so we really don't want to lower it to
something more complicated.  There may be a more efficient way to do
this on Intel too. In addition, in the future we'll want to use this for
lowering boolean reduce operations, where the inverse ballot will
operate on the backend's "natural" ballot type as indicated by
options->ballot_bit_size, instead of uvec4 as produced by SPIR-V. In
total, there are now three possible lowerings we may have to perform:

- inverse_ballot with source type of uvec4 from SPIR-V to inverse_ballot
with natural source type, when the backend supports inverse_ballot
natively.
- inverse_ballot with source type of uvec4 from SPIR-V to arithmetic,
when the backend doesn't support inverse_ballot.
- inverse_ballot with natural source type from reduce operation, when
the backend doesn't support inverse_ballot.

Previously we just did the second lowering unconditionally in vtn, but
it's just a combination of the first and third. We add support here for
the first and third lowerings in nir_lower_subgroups, instead of simply
moving the second lowering, to avoid unnecessary churn.
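
A CPU-side sketch of the three lowerings, assuming a "natural" ballot size of 32 bits and at most 32 invocations. The function names are hypothetical, not NIR API. It shows why the second lowering is just the first followed by the third:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* First lowering: uvec4 ballot from SPIR-V to the natural ballot type
 * (assumed here to be a single 32-bit word). */
static uint32_t ballot_uvec4_to_natural(const uint32_t ballot[4])
{
    return ballot[0];
}

/* Third lowering: natural ballot to arithmetic, for backends without a
 * native inverse_ballot. Each invocation tests its own bit. */
static bool inverse_ballot_natural(uint32_t ballot, unsigned invocation)
{
    assert(invocation < 32);
    return (ballot >> invocation) & 1u;
}

/* Second lowering: uvec4 ballot straight to arithmetic. */
static bool inverse_ballot_uvec4(const uint32_t ballot[4], unsigned invocation)
{
    assert(invocation < 128);
    return (ballot[invocation / 32] >> (invocation % 32)) & 1u;
}
```

For any invocation index below 32, inverse_ballot_uvec4(b, i) equals inverse_ballot_natural(ballot_uvec4_to_natural(b), i).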

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25123>
2023-09-20 14:41:18 +00:00
Pavel Ondračka
1c72c71bdf nir/move_vec_src_uses_to_dest: allow to skip reuse of constant sources
And enable this for r300 and intel-vec4

crocus HSW (mostly helps a few Dolphin ubershaders):
total instructions in shared programs: 1576736 -> 1576589 (<.01%)
instructions in affected programs: 38235 -> 38088 (-0.38%)
helped: 12
HURT: 0
total cycles in shared programs: 111025838 -> 110944796 (-0.07%)
cycles in affected programs: 5646582 -> 5565540 (-1.44%)
helped: 15
HURT: 6
total spills in shared programs: 447 -> 432 (-3.36%)
spills in affected programs: 186 -> 171 (-8.06%)
helped: 12
HURT: 0
total fills in shared programs: 792 -> 774 (-2.27%)
fills in affected programs: 291 -> 273 (-6.19%)
helped: 12
HURT: 0

r300 RV530:
total instructions in shared programs: 96655 -> 96304 (-0.36%)
instructions in affected programs: 15020 -> 14669 (-2.34%)
helped: 79
HURT: 18
total temps in shared programs: 13027 -> 12952 (-0.58%)
temps in affected programs: 677 -> 602 (-11.08%)
helped: 41
HURT: 9
total cycles in shared programs: 147745 -> 147314 (-0.29%)
cycles in affected programs: 21831 -> 21400 (-1.97%)
helped: 84
HURT: 19

r300 RV370:
total instructions in shared programs: 63678 -> 63669 (-0.01%)
instructions in affected programs: 931 -> 922 (-0.97%)
helped: 12
HURT: 6
total temps in shared programs: 10028 -> 10013 (-0.15%)
temps in affected programs: 339 -> 324 (-4.42%)
helped: 33
HURT: 10
total cycles in shared programs: 101118 -> 101087 (-0.03%)
cycles in affected programs: 2659 -> 2628 (-1.17%)
helped: 22
HURT: 6

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24932>
2023-09-19 18:05:37 +02:00
Alyssa Rosenzweig
d1eb17e92e treewide: Drop nir_ssa_for_src users
Via Coccinelle patch:

    @@
    expression b, s, n;
    @@

    -nir_ssa_for_src(b, *s, n)
    +s->ssa

    @@
    expression b, s, n;
    @@

    -nir_ssa_for_src(b, s, n)
    +s.ssa

Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25247>
2023-09-18 10:25:17 -04:00
Ian Romanick
5eddf60e56 intel/compiler: Combine control barriers with identical memory semantics
This prevents the second barrier from generating a spurious fence
message identical to the one generated by the first barrier.
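
The idea can be sketched on a toy model. Everything below is
hypothetical: struct barrier and combine_adjacent_barriers are not Mesa
types or functions, and the list is assumed to contain only barriers
with no other instructions between them. Two neighbours whose memory
scope, modes and semantics all match collapse into one, keeping the
wider execution scope.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified barrier description; not Mesa's NIR types. */
struct barrier {
   unsigned exec_scope;    /* larger value = wider execution scope */
   unsigned mem_scope;
   unsigned mem_modes;
   unsigned mem_semantics;
};

static bool
same_memory_side(const struct barrier *a, const struct barrier *b)
{
   return a->mem_scope == b->mem_scope &&
          a->mem_modes == b->mem_modes &&
          a->mem_semantics == b->mem_semantics;
}

/* Folds each barrier into the previously kept one when their memory
 * side is identical, so only one fence would be emitted for the pair.
 * Compacts the list in place and returns the new length. */
static size_t
combine_adjacent_barriers(struct barrier *list, size_t count)
{
   assert(list != NULL || count == 0);

   size_t kept = 0;
   for (size_t i = 0; i < count; i++) {
      if (kept > 0 && same_memory_side(&list[kept - 1], &list[i])) {
         if (list[i].exec_scope > list[kept - 1].exec_scope)
            list[kept - 1].exec_scope = list[i].exec_scope;
         continue; /* drop the redundant barrier */
      }
      list[kept++] = list[i];
   }
   return kept;
}
```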

fossil-db stats on Alchemist:

   Totals:
   Instrs: 196513342 -> 196512777 (-0.00%); split: -0.00%, +0.00%
   Cycles: 14271426028 -> 14271404569 (-0.00%); split: -0.00%, +0.00%
   Send messages: 8021892 -> 8021770 (-0.00%)

   Totals from 46 (0.01% of 653252) affected shaders:
   Instrs: 76761 -> 76196 (-0.74%); split: -0.75%, +0.01%
   Cycles: 2027946 -> 2006487 (-1.06%); split: -1.45%, +0.39%
   Send messages: 7589 -> 7467 (-1.61%)

Nothing in shader-db was affected.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24842>
2023-09-09 04:41:25 +00:00
Lionel Landwerlin
10e75aae1b intel/nir: rerun lower_tex if it lowers something
nir_lower_tex can lower tg4 coords into tg4 offsets, which on DG2+ we
also need to lower into constant offsets.

Unfortunately the nir_lower_tex pass is not able to lower the
instructions it itself generates, so the easy fix for when
nir_lower_tex lowers tg4 coords into tg4 offsets is to rerun the pass.
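
A toy model of why one run is not enough (the enum and both functions
below are made up for illustration and are not the nir_lower_tex API):
the pass advances each instruction by a single step, so anything it
produces is only picked up by a second run.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the forms a tg4 instruction goes through. */
enum tex_form { TG4_COORD_ADJUSTED, TG4_WITH_OFFSET, TG4_CONST_OFFSETS };

/* Toy pass: rewrites each instruction by one step only, so the form it
 * creates is never lowered further within the same run. */
static bool
toy_lower_tex(enum tex_form *instrs, size_t count)
{
   bool progress = false;
   for (size_t i = 0; i < count; i++) {
      if (instrs[i] == TG4_COORD_ADJUSTED) {
         instrs[i] = TG4_WITH_OFFSET;
         progress = true;
      } else if (instrs[i] == TG4_WITH_OFFSET) {
         instrs[i] = TG4_CONST_OFFSETS;
         progress = true;
      }
   }
   return progress;
}

/* Rerun the pass once if the first run lowered something. */
static void
toy_lower_tex_with_rerun(enum tex_form *instrs, size_t count)
{
   if (toy_lower_tex(instrs, count))
      toy_lower_tex(instrs, count);
}
```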

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9735
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25015>
2023-09-05 13:35:51 +00:00
Lionel Landwerlin
74a40cc4b6 intel/fs: move lower of non-uniform at_sample barycentric to NIR
We use a non-uniform lowering loop in the backend which we can do
better in NIR because we can also use divergence analysis there.

This change also limits VGRF usage to a single VGRF to hold the sample
ID in the backend.
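
For readers unfamiliar with the technique, a non-uniform lowering loop
can be modelled on the CPU as below. This is only a sketch: LANES,
uniform_op and lower_non_uniform are hypothetical names, and uniform_op
stands in for whatever work needs a uniform sample ID. Each iteration
takes the sample ID of the first remaining lane, handles every lane
that shares it, and retires those lanes.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 8

/* Hypothetical operation that requires a subgroup-uniform sample ID. */
static int
uniform_op(uint32_t sample_id)
{
   return (int)sample_id * 10;
}

/* CPU model of the loop. Returns how many iterations were needed,
 * which equals the number of distinct sample IDs across the lanes. */
static unsigned
lower_non_uniform(const uint32_t sample_id[LANES], int result[LANES])
{
   bool done[LANES] = { false };
   unsigned iterations = 0;

   for (unsigned first = 0; first < LANES; first++) {
      if (done[first])
         continue;

      /* Value of the first remaining lane, broadcast to all lanes. */
      uint32_t uniform_id = sample_id[first];
      int value = uniform_op(uniform_id);

      for (unsigned lane = first; lane < LANES; lane++) {
         if (!done[lane] && sample_id[lane] == uniform_id) {
            result[lane] = value;
            done[lane] = true;
         }
      }
      iterations++;
   }
   return iterations;
}
```

When divergence analysis can prove the sample ID is already uniform,
the loop is unnecessary and the operation can be issued once.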

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24716>
2023-08-29 23:19:13 +00:00
Alyssa Rosenzweig
cda1961835 treewide: Also handle struct nir_builder form
Via Coccinelle patch:

    @def@
    typedef bool;
    typedef nir_builder;
    typedef nir_instr;
    typedef nir_def;
    identifier fn, instr, intr, x, builder, data;
    @@

    static fn(struct nir_builder* builder,
    -nir_instr *instr,
    +nir_intrinsic_instr *intr,
    ...)
    {
    (
    -   if (instr->type != nir_instr_type_intrinsic)
    -      return false;
    -   nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);
    |
    -   nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);
    -   if (instr->type != nir_instr_type_intrinsic)
    -      return false;
    )

    <...
    (
    -instr->x
    +intr->instr.x
    |
    -instr
    +&intr->instr
    )
    ...>

    }

    @pass depends on def@
    identifier def.fn;
    expression shader, progress;
    @@

    (
    -nir_shader_instructions_pass(shader, fn,
    +nir_shader_intrinsics_pass(shader, fn,
    ...)
    |
    -NIR_PASS_V(shader, nir_shader_instructions_pass, fn,
    +NIR_PASS_V(shader, nir_shader_intrinsics_pass, fn,
    ...)
    |
    -NIR_PASS(progress, shader, nir_shader_instructions_pass, fn,
    +NIR_PASS(progress, shader, nir_shader_intrinsics_pass, fn,
    ...)
    )
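
To show what the converted callbacks gain, here is a self-contained
model using stand-in types. All toy_* names are invented for this
sketch and the real NIR definitions differ (the real callbacks also
take a builder argument, omitted here). The type check and downcast
happen once in the walker, so each callback receives the intrinsic
directly.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types for the example only; not the real NIR structs. */
typedef enum { toy_instr_type_alu, toy_instr_type_intrinsic } toy_instr_type;
typedef struct { toy_instr_type type; } toy_instr;
typedef struct { toy_instr instr; int intrinsic; } toy_intrinsic_instr;

typedef bool (*toy_intrinsic_cb)(toy_intrinsic_instr *intr, void *data);

/* Model of an intrinsics-only walker: filters by type and downcasts
 * here, instead of repeating that boilerplate in every callback. */
static bool
toy_shader_intrinsics_pass(toy_instr **instrs, size_t count,
                           toy_intrinsic_cb cb, void *data)
{
   bool progress = false;
   for (size_t i = 0; i < count; i++) {
      if (instrs[i]->type != toy_instr_type_intrinsic)
         continue;
      /* Valid because instr is the first member of toy_intrinsic_instr. */
      if (cb((toy_intrinsic_instr *)instrs[i], data))
         progress = true;
   }
   return progress;
}

/* Callback in the converted form: no type check, no cast. */
static bool
count_intrinsic(toy_intrinsic_instr *intr, void *data)
{
   unsigned *count = data;
   (void)intr;
   (*count)++;
   return true;
}
```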

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24852>
2023-08-24 15:48:02 +00:00
Alyssa Rosenzweig
465b138f01 treewide: Use nir_shader_intrinsics_pass sometimes
This converts a lot of trivial passes. Nice boilerplate deletion. Via Coccinelle
patch (with a small manual fix-up for panfrost where coccinelle got confused by
genxml + ninja clang-format squashed in, and for Zink because my semantic patch
was slightly buggy).

    @def@
    typedef bool;
    typedef nir_builder;
    typedef nir_instr;
    typedef nir_def;
    identifier fn, instr, intr, x, builder, data;
    @@

    static fn(nir_builder* builder,
    -nir_instr *instr,
    +nir_intrinsic_instr *intr,
    ...)
    {
    (
    -   if (instr->type != nir_instr_type_intrinsic)
    -      return false;
    -   nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);
    |
    -   nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);
    -   if (instr->type != nir_instr_type_intrinsic)
    -      return false;
    )

    <...
    (
    -instr->x
    +intr->instr.x
    |
    -instr
    +&intr->instr
    )
    ...>

    }

    @pass depends on def@
    identifier def.fn;
    expression shader, progress;
    @@

    (
    -nir_shader_instructions_pass(shader, fn,
    +nir_shader_intrinsics_pass(shader, fn,
    ...)
    |
    -NIR_PASS_V(shader, nir_shader_instructions_pass, fn,
    +NIR_PASS_V(shader, nir_shader_intrinsics_pass, fn,
    ...)
    |
    -NIR_PASS(progress, shader, nir_shader_instructions_pass, fn,
    +NIR_PASS(progress, shader, nir_shader_intrinsics_pass, fn,
    ...)
    )

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24852>
2023-08-24 15:48:02 +00:00