Commit graph

167 commits

Author SHA1 Message Date
Lionel Landwerlin
c467444670 brw/nir: use a new intrinsic for fs_msaa_flag
Avoid NIR code doing offset computations.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34109>
2025-05-08 06:48:34 +00:00
Lionel Landwerlin
c434050a00 brw: add pre ray trace intrinsic moves
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Some intrinsics are implemented by reading memory location that could
be rewritten by a further tracing calls. So we need to move those
reads prior to tracing operations in the shaders.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8979
Tested-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34214>
2025-05-06 13:34:53 +00:00
Ian Romanick
07dc1d4043 brw/algebraic: Clear condition modifier on optimized SEL instruction
The condition modifier on SEL means something completely different than
it means on MOV.  On MOV it means to modify the flags based on the value
written to the destination. On SEL it means to compare the sources using
that mode and pick the result (i.e., as min() or max()) without
modifying the flags.

The resulting MOV should not have a condition modifier for the same
reason it (already) doesn't have a predicate. This bug was found by
inspection, so I added a unit test.

No shader-db or shader-db changes on any Intel platform.

Fixes: fab92fa1cb ("i965/fs: Optimize SEL with the same sources into a MOV.")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34192>
2025-04-15 23:59:31 +00:00
Caio Oliveira
cfc4067b0e brw: Add a few basic tests for register coalesce
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34253>
2025-04-09 19:06:48 +00:00
Faith Ekstrand
436f175187 intel/compiler: Use nir_split_conversions()
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34266>
2025-04-07 17:45:21 -05:00
Ian Romanick
2d13acf9d9 brw: Add passes to generate and lower load_reg
v2: Add support for WE_all instructions... this already just worked, so
I only had to delete the check and the FINISHME comment.

v3: Use logic more like def_analysis::update_for_reads to determine when
to not insert LOAD_REG instructions. Based on a suggestion by Ken.

v4: Eliminate "store" from all the names since STORE_REG does not exist
anymore. Fold insert_load_reg into brw_insert_load_reg. Elminate extra
call to s.def_analysis.require() after progress. Pull a loop-invariant
check out of the inst->srouces loop. Drop call to
brw_opt_split_virtual_grfs after lowering load_reg. All suggested by
Caio.

v5: Assert that LOAD_REG doesn't already exist in
brw_insert_load_reg. Update comment before fully_defines. Both
suggested by Caio.

v6: Don't explicitly special-case SHADER_OPCODE_MEMORY_STORE_LOGICAL.
Move the inst->dst.file != VGRF check earlier to avoid the loop over
sources. Both suggested by Ken. Move the call the brw_insert_load_reg
a little bit later, and explain why it's at that location. Suggested
by Caio.

v7: Many changes to the for-each-source loop in brw_insert_load_reg.
Removes incorrect multiplication of s.alloc.sizes with reg_unit. Adds
checks for matching SIMD size and NoMask in the search for pre-existing
LOAD_REG of same value.

v8: Add some unit tests. Suggested by Caio.

shader-db:

Lunar Lake
total instructions in shared programs: 16923237 -> 16921895 (<.01%)
instructions in affected programs: 450565 -> 449223 (-0.30%)
helped: 251 / HURT: 377

total cycles in shared programs: 910428418 -> 889920590 (-2.25%)
cycles in affected programs: 719248184 -> 698740356 (-2.85%)
helped: 9076 / HURT: 9082

total fills in shared programs: 2242 -> 2218 (-1.07%)
fills in affected programs: 116 -> 92 (-20.69%)
helped: 2 / HURT: 0

total sends in shared programs: 848635 -> 848421 (-0.03%)
sends in affected programs: 810 -> 596 (-26.42%)
helped: 10 / HURT: 0

LOST:   82
GAINED: 78

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19875784 -> 19871694 (-0.02%)
instructions in affected programs: 1050091 -> 1046001 (-0.39%)
helped: 251 / HURT: 2403

total cycles in shared programs: 905328238 -> 882446458 (-2.53%)
cycles in affected programs: 682736344 -> 659854564 (-3.35%)
helped: 7869 / HURT: 7911

total spills in shared programs: 5512 -> 5032 (-8.71%)
spills in affected programs: 1830 -> 1350 (-26.23%)
helped: 8 / HURT: 0

total fills in shared programs: 5648 -> 4782 (-15.33%)
fills in affected programs: 3312 -> 2446 (-26.15%)
helped: 8 / HURT: 0

total sends in shared programs: 1032942 -> 1032722 (-0.02%)
sends in affected programs: 572 -> 352 (-38.46%)
helped: 10 / HURT: 0

LOST:   138
GAINED: 53

Tiger Lake
total instructions in shared programs: 19711930 -> 19715591 (0.02%)
instructions in affected programs: 1040623 -> 1044284 (0.35%)
helped: 317 / HURT: 2474

total cycles in shared programs: 862988990 -> 860573870 (-0.28%)
cycles in affected programs: 612392461 -> 609977341 (-0.39%)
helped: 7447 / HURT: 7686

total sends in shared programs: 1034763 -> 1034555 (-0.02%)
sends in affected programs: 784 -> 576 (-26.53%)
helped: 8 / HURT: 0

LOST:   56
GAINED: 143

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20545461 -> 20545220 (<.01%)
instructions in affected programs: 422405 -> 422164 (-0.06%)
helped: 180 / HURT: 459

total cycles in shared programs: 872697345 -> 866874523 (-0.67%)
cycles in affected programs: 573117917 -> 567295095 (-1.02%)
helped: 6783 / HURT: 6980

total spills in shared programs: 4335 -> 4336 (0.02%)
spills in affected programs: 90 -> 91 (1.11%)
helped: 1 / HURT: 2

total fills in shared programs: 4194 -> 4196 (0.05%)
fills in affected programs: 463 -> 465 (0.43%)
helped: 1 / HURT: 2

total sends in shared programs: 1079446 -> 1079238 (-0.02%)
sends in affected programs: 784 -> 576 (-26.53%)
helped: 8 / HURT: 0

LOST:   117
GAINED: 37

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 209708136 -> 209695617 (-0.01%); split: -0.02%, +0.01%
Send messages: 10927753 -> 10927640 (-0.00%)
Cycle count: 30540172048 -> 30427084732 (-0.37%); split: -0.99%, +0.62%
Spill count: 511621 -> 510932 (-0.13%); split: -0.22%, +0.08%
Fill count: 621166 -> 618440 (-0.44%); split: -0.56%, +0.12%
Scratch Memory Size: 35574784 -> 35648512 (+0.21%); split: -0.06%, +0.26%
Max live registers: 65453860 -> 65453140 (-0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 75374990 -> 35195764 (-53.31%)

Totals from 503284 (71.25% of 706391) affected shaders:
Instrs: 180203778 -> 180191259 (-0.01%); split: -0.02%, +0.01%
Send messages: 9699732 -> 9699619 (-0.00%)
Cycle count: 30080349592 -> 29967262276 (-0.38%); split: -1.01%, +0.63%
Spill count: 511584 -> 510895 (-0.13%); split: -0.22%, +0.08%
Fill count: 621120 -> 618394 (-0.44%); split: -0.56%, +0.12%
Scratch Memory Size: 35443712 -> 35517440 (+0.21%); split: -0.06%, +0.27%
Max live registers: 52566092 -> 52565372 (-0.00%); split: -0.01%, +0.00%
Non SSA regs after NIR: 70110949 -> 29931723 (-57.31%)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31497>
2025-04-04 06:45:02 +00:00
Lionel Landwerlin
67ae49dede intel: move lower_texture to brw
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33138>
2025-03-29 02:15:18 +00:00
Caio Oliveira
91feef40db brw: Simplify the test code for brw passes
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The key change is to use a builder to write the expected shader result
and compare that.  To make this less error prone, a few helper functions
were added

- a way to allocate VGRFs from both shaders in parallel, that way the
  same brw_reg can be used in both of them;
- assertions that a pass will make progress or not, and proper output
  when the unexpected happens;
- use a common brw_shader_pass_test class so to collect some of the helpers;
- make some helpers work directly with builder.

The idea is to improve the signal in tests, so that the disasm comments
are not necessary anymore.  For example

```
TEST_F(saturate_propagation_test, basic)
{
   brw_reg dst1 = bld.vgrf(BRW_TYPE_F);
   brw_reg src0 = bld.vgrf(BRW_TYPE_F);
   brw_reg src1 = bld.vgrf(BRW_TYPE_F);
   brw_reg dst0 = bld.ADD(src0, src1);
   set_saturate(true, bld.MOV(dst1, dst0));

   /* = Before =
    *
    * 0: add(16)       dst0  src0  src1
    * 1: mov.sat(16)   dst1  dst0
    *
    * = After =
    * 0: add.sat(16)   dst0  src0  src1
    * 1: mov(16)       dst1  dst0
    */

   brw_calculate_cfg(*v);
   bblock_t *block0 = v->cfg->blocks[0];

   EXPECT_EQ(0, block0->start_ip);
   EXPECT_EQ(1, block0->end_ip);

   EXPECT_TRUE(saturate_propagation(v));
   EXPECT_EQ(0, block0->start_ip);
   EXPECT_EQ(1, block0->end_ip);
   EXPECT_EQ(BRW_OPCODE_ADD, instruction(block0, 0)->opcode);
   EXPECT_TRUE(instruction(block0, 0)->saturate);
   EXPECT_EQ(BRW_OPCODE_MOV, instruction(block0, 1)->opcode);
   EXPECT_FALSE(instruction(block0, 1)->saturate);
}
```

becomes

```
TEST_F(saturate_propagation_test, basic)
{
   brw_builder bld = make_shader(MESA_SHADER_FRAGMENT, 16);
   brw_builder exp = make_shader(MESA_SHADER_FRAGMENT, 16);

   brw_reg dst0 = vgrf(bld, exp, BRW_TYPE_F);
   brw_reg dst1 = vgrf(bld, exp, BRW_TYPE_F);
   brw_reg src0 = vgrf(bld, exp, BRW_TYPE_F);
   brw_reg src1 = vgrf(bld, exp, BRW_TYPE_F);

   bld.ADD(dst0, src0, src1);
   bld.MOV(dst1, dst0)->saturate = true;

   EXPECT_PROGRESS(brw_opt_saturate_propagation, bld);

   exp.ADD(dst0, src0, src1)->saturate = true;
   exp.MOV(dst1, dst0);

   EXPECT_SHADERS_MATCH(bld, exp);
}
```

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33936>
2025-03-13 17:43:17 +00:00
Sagar Ghuge
1bfe2571f5 intel/compiler: Lower sample index into coord for MSRT messages
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32690>
2025-03-07 23:06:14 +00:00
Lionel Landwerlin
ce7208c3ee brw: add support for texel address lowering
The expectations are :
  - no MSAA images
  - a single tiling mode is used when not linear

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32676>
2025-02-23 15:16:50 +00:00
Caio Oliveira
ace5daabbd intel/compiler: Use -Werror=vla
Acked-by: Dylan Baker <dylan.c.baker@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32965>
2025-02-11 11:25:48 +00:00
Caio Oliveira
352a63122f intel/brw: Rename files brw_fs.cpp/h to brw_shader.cpp/h
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32536>
2025-02-11 09:13:28 +00:00
Caio Oliveira
6b471e4e26 intel/brw: Merge brw_fs_visitor.cpp into brw_fs.cpp
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32536>
2025-02-11 09:13:28 +00:00
Caio Oliveira
f8a979466b intel/brw: Rename and move thread_payload types to own header
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32536>
2025-02-11 09:13:28 +00:00
Caio Oliveira
b50c925bd6 intel/brw: Fold simple_allocator into the shader
This was originally turned into a separate struct for reuse between vec4
and fs backends, that's not needed anymore.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33334>
2025-02-06 08:33:03 -08:00
Caio Oliveira
e2f354587d intel/brw: Merge brw_ir_analysis.h into brw_analysis.h
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33048>
2025-02-05 21:47:07 +00:00
Caio Oliveira
c943fb0c20 intel/brw: Move analysis passes without own file to brw_analysis.cpp
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33048>
2025-02-05 21:47:06 +00:00
Caio Oliveira
0ebb75743d intel/brw: Use brw_analysis prefix for performance analysis files
Move declaration to the common header and rename definition file.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33048>
2025-02-05 21:47:06 +00:00
Caio Oliveira
6a23749332 intel/brw: Use brw_analysis prefix for def analysis file
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33048>
2025-02-05 21:47:06 +00:00
Caio Oliveira
e0614e8ea1 intel/brw: Use brw_analysis prefix for liveness analysis files
Move declaration to the common header and rename definition file.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33048>
2025-02-05 21:47:06 +00:00
Caio Oliveira
e5369540ea intel/brw: Add brw_analysis.h
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33048>
2025-02-05 21:47:06 +00:00
Caio Oliveira
1332d84500 intel/brw: Rename file brw_fs_nir.cpp to brw_from_nir.cpp
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Dylan Baker <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33330>
2025-02-03 23:08:11 +00:00
Caio Oliveira
9b0d359737 intel/brw: Move fs_inst implementation code together
Move them to brw_inst.h/cpp.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33114>
2025-01-31 00:57:20 +00:00
Caio Oliveira
fbacf3761f intel: Add meson option -Dintel-elk
Defaults to true.  When set to false Iris and various tools can be
built without ELK support.  In both cases this means supporting
only Gfx9+.  This option must be true to build Crocus or Hasvk.

This allows skipping re-building ELK when developing for newer platforms
with tools/tests enabled.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11575
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33054>
2025-01-30 00:45:59 +00:00
Lionel Landwerlin
5adac011b8 meson: rework mesa-clc=system handling
In theory you can build a driver using OpenCL kernels with a
-Dmesa-clc=system. That shouldn't require any LLVM/Clang/etc...

But the checks to find the pre-compiled mesa_clc & vtn_bindgen
binaries are in meson files or conditions only triggered if you build
with LLVM (:

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dylan Baker <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33014>
2025-01-25 03:28:07 +00:00
Lionel Landwerlin
bf8a1e1e71 brw/elk: move internal kernel parsing out of intel_clc
So it can be called internally.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dylan Baker <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33014>
2025-01-25 03:28:07 +00:00
Caio Oliveira
62dd470d0a intel/brw: Rename brw_fs_reg_allocate.cpp to brw_reg_allocate.cpp
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33112>
2025-01-21 07:33:49 -08:00
Caio Oliveira
b3001e4946 intel/brw: Move a few builder helpers to brw_builder.h/cpp
Add brw prefix when necessary.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33108>
2025-01-18 20:48:57 +00:00
Caio Oliveira
f2d4c9db92 intel/brw: Rename brw_fs_builder.h to brw_builder.h
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33076>
2025-01-18 16:12:54 +00:00
Caio Oliveira
3659934862 intel/brw: Add brw_generator.h header
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32844>
2025-01-17 00:04:41 +00:00
Caio Oliveira
a5a9f42a39 intel/brw: Rename brw_fs_generator.cpp to brw_generator.cpp
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32844>
2025-01-17 00:04:41 +00:00
Caio Oliveira
634daf2827 intel/brw: Rename brw_fs_validate to brw_validate
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32843>
2025-01-13 23:56:22 +00:00
Lionel Landwerlin
8ac7802ac8 brw: move final send lowering up into the IR
Because we do emit the final send message form in code generation, a
lot of emissions look like this :

  add(8)  vgrf0,    u0, 0x100
  mov(1)   a0.1, vgrf0          # emitted by the generator
  send(8)   ...,  a0.1

By moving address register manipulation in the IR, we can get this
down to :

  add(1)  a0.1,   u0, 0x100
  send(8)  ..., a0.1

This reduce register pressure around some send messages by 1 vgrf.

All lost shaders in the below results are fragment SIMD32, due to the
throughput estimator. If turned off, we loose no SIMD32 shaders with
this change.

DG2 results:

  Assassin's Creed Valhalla:
  Totals from 2044 (96.87% of 2110) affected shaders:
  Instrs: 852879 -> 832044 (-2.44%); split: -2.45%, +0.00%
  Subgroup size: 23832 -> 23824 (-0.03%)
  Cycle count: 53345742 -> 52144277 (-2.25%); split: -5.08%, +2.82%
  Spill count: 729 -> 554 (-24.01%); split: -28.40%, +4.39%
  Fill count: 2005 -> 1256 (-37.36%)
  Scratch Memory Size: 25600 -> 19456 (-24.00%); split: -32.00%, +8.00%
  Max live registers: 116765 -> 115058 (-1.46%)
  Max dispatch width: 19152 -> 18872 (-1.46%); split: +0.21%, -1.67%

  Cyberpunk 2077:
  Totals from 1181 (93.43% of 1264) affected shaders:
  Instrs: 667192 -> 663615 (-0.54%); split: -0.55%, +0.01%
  Subgroup size: 13016 -> 13032 (+0.12%)
  Cycle count: 17383539 -> 17986073 (+3.47%); split: -0.93%, +4.39%
  Spill count: 12 -> 8 (-33.33%)
  Fill count: 9 -> 6 (-33.33%)

  Dota2:
  Totals from 173 (11.59% of 1493) affected shaders:
  Cycle count: 274403 -> 280817 (+2.34%); split: -0.01%, +2.34%
  Max live registers: 5787 -> 5779 (-0.14%)
  Max dispatch width: 1344 -> 1152 (-14.29%)

  Hitman3:
  Totals from 5072 (95.39% of 5317) affected shaders:
  Instrs: 2879952 -> 2841804 (-1.32%); split: -1.32%, +0.00%
  Cycle count: 153208505 -> 165860401 (+8.26%); split: -2.22%, +10.48%
  Spill count: 3942 -> 3200 (-18.82%)
  Fill count: 10158 -> 8846 (-12.92%)
  Scratch Memory Size: 257024 -> 223232 (-13.15%)
  Max live registers: 328467 -> 324631 (-1.17%)
  Max dispatch width: 43928 -> 42768 (-2.64%); split: +0.09%, -2.73%

  Fortnite:
  Totals from 360 (4.82% of 7472) affected shaders:
  Instrs: 778068 -> 777925 (-0.02%)
  Subgroup size: 3128 -> 3136 (+0.26%)
  Cycle count: 38684183 -> 38734579 (+0.13%); split: -0.06%, +0.19%
  Max live registers: 50689 -> 50658 (-0.06%)

  Hogwarts Legacy:
  Totals from 1376 (84.00% of 1638) affected shaders:
  Instrs: 758810 -> 749727 (-1.20%); split: -1.23%, +0.03%
  Cycle count: 27778983 -> 28805469 (+3.70%); split: -1.42%, +5.12%
  Spill count: 2475 -> 2299 (-7.11%); split: -7.47%, +0.36%
  Fill count: 2677 -> 2445 (-8.67%); split: -9.90%, +1.23%
  Scratch Memory Size: 99328 -> 89088 (-10.31%)
  Max live registers: 84969 -> 84671 (-0.35%); split: -0.58%, +0.23%
  Max dispatch width: 11848 -> 11920 (+0.61%)

  Metro Exodus:
  Totals from 92 (0.21% of 43072) affected shaders:
  Instrs: 262995 -> 262968 (-0.01%)
  Cycle count: 13818007 -> 13851266 (+0.24%); split: -0.01%, +0.25%
  Max live registers: 11152 -> 11140 (-0.11%)

  Red Dead Redemption 2 :
  Totals from 451 (7.71% of 5847) affected shaders:
  Instrs: 754178 -> 753811 (-0.05%); split: -0.05%, +0.00%
  Cycle count: 3484078523 -> 3484111965 (+0.00%); split: -0.00%, +0.00%
  Max live registers: 42294 -> 42185 (-0.26%)

  Spiderman Remastered:
  Totals from 6820 (98.02% of 6958) affected shaders:
  Instrs: 6921500 -> 6747933 (-2.51%); split: -4.16%, +1.65%
  Cycle count: 234400692460 -> 236846720707 (+1.04%); split: -0.20%, +1.25%
  Spill count: 72971 -> 72622 (-0.48%); split: -8.08%, +7.61%
  Fill count: 212921 -> 198483 (-6.78%); split: -12.37%, +5.58%
  Scratch Memory Size: 3491840 -> 3410944 (-2.32%); split: -12.05%, +9.74%
  Max live registers: 493149 -> 487458 (-1.15%)
  Max dispatch width: 56936 -> 56856 (-0.14%); split: +0.06%, -0.20%

  Strange Brigade:
  Totals from 3769 (91.21% of 4132) affected shaders:
  Instrs: 1354476 -> 1321474 (-2.44%)
  Cycle count: 25351530 -> 25339190 (-0.05%); split: -1.64%, +1.59%
  Max live registers: 199057 -> 193656 (-2.71%)
  Max dispatch width: 30272 -> 30240 (-0.11%)

  Witcher 3:
  Totals from 25 (2.40% of 1041) affected shaders:
  Instrs: 24621 -> 24606 (-0.06%)
  Cycle count: 2218793 -> 2217503 (-0.06%); split: -0.11%, +0.05%
  Max live registers: 1963 -> 1955 (-0.41%)

LNL results:

  Assassin's Creed Valhalla:
  Totals from 1928 (98.02% of 1967) affected shaders:
  Instrs: 856107 -> 835756 (-2.38%); split: -2.48%, +0.11%
  Subgroup size: 41264 -> 41280 (+0.04%)
  Cycle count: 64606590 -> 62371700 (-3.46%); split: -5.57%, +2.11%
  Spill count: 915 -> 669 (-26.89%); split: -32.79%, +5.90%
  Fill count: 2414 -> 1617 (-33.02%); split: -36.62%, +3.60%
  Scratch Memory Size: 62464 -> 44032 (-29.51%); split: -36.07%, +6.56%
  Max live registers: 205483 -> 202192 (-1.60%)

  Cyberpunk 2077:
  Totals from 1177 (96.40% of 1221) affected shaders:
  Instrs: 682237 -> 678931 (-0.48%); split: -0.51%, +0.03%
  Subgroup size: 24912 -> 24944 (+0.13%)
  Cycle count: 24355928 -> 25089292 (+3.01%); split: -0.80%, +3.81%
  Spill count: 8 -> 3 (-62.50%)
  Fill count: 6 -> 3 (-50.00%)
  Max live registers: 126922 -> 125472 (-1.14%)

  Dota2:
  Totals from 428 (32.47% of 1318) affected shaders:
  Instrs: 89355 -> 89740 (+0.43%)
  Cycle count: 1152412 -> 1152706 (+0.03%); split: -0.52%, +0.55%
  Max live registers: 32863 -> 32847 (-0.05%)

  Fortnite:
  Totals from 5354 (81.72% of 6552) affected shaders:
  Instrs: 4135059 -> 4239015 (+2.51%); split: -0.01%, +2.53%
  Cycle count: 132557506 -> 132427302 (-0.10%); split: -0.75%, +0.65%
  Spill count: 7144 -> 7234 (+1.26%); split: -0.46%, +1.72%
  Fill count: 12086 -> 12403 (+2.62%); split: -0.73%, +3.35%
  Scratch Memory Size: 600064 -> 604160 (+0.68%); split: -1.02%, +1.71%

  Hitman3:
  Totals from 4912 (97.09% of 5059) affected shaders:
  Instrs: 2952124 -> 2916824 (-1.20%); split: -1.20%, +0.00%
  Cycle count: 179985656 -> 189175250 (+5.11%); split: -2.44%, +7.55%
  Spill count: 3739 -> 3136 (-16.13%)
  Fill count: 10657 -> 9564 (-10.26%)
  Scratch Memory Size: 373760 -> 318464 (-14.79%)
  Max live registers: 597566 -> 589460 (-1.36%)

  Hogwarts Legacy:
  Totals from 1471 (96.33% of 1527) affected shaders:
  Instrs: 748749 -> 766214 (+2.33%); split: -0.71%, +3.05%
  Cycle count: 33301528 -> 34426308 (+3.38%); split: -1.30%, +4.68%
  Spill count: 3278 -> 3070 (-6.35%); split: -8.30%, +1.95%
  Fill count: 4553 -> 4097 (-10.02%); split: -10.85%, +0.83%
  Scratch Memory Size: 251904 -> 217088 (-13.82%)
  Max live registers: 168911 -> 168106 (-0.48%); split: -0.59%, +0.12%

  Metro Exodus:
  Totals from 18356 (49.81% of 36854) affected shaders:
  Instrs: 7559386 -> 7621591 (+0.82%); split: -0.01%, +0.83%
  Cycle count: 195240612 -> 196455186 (+0.62%); split: -1.22%, +1.84%
  Spill count: 595 -> 546 (-8.24%)
  Fill count: 1604 -> 1408 (-12.22%)
  Max live registers: 2086937 -> 2086933 (-0.00%)

  Red Dead Redemption 2:
  Totals from 4171 (79.31% of 5259) affected shaders:
  Instrs: 2619392 -> 2719587 (+3.83%); split: -0.00%, +3.83%
  Subgroup size: 86416 -> 86432 (+0.02%)
  Cycle count: 8542836160 -> 8531976886 (-0.13%); split: -0.65%, +0.53%
  Fill count: 12949 -> 12970 (+0.16%); split: -0.43%, +0.59%
  Scratch Memory Size: 401408 -> 385024 (-4.08%)

  Spiderman Remastered:
  Totals from 6639 (98.94% of 6710) affected shaders:
  Instrs: 6877980 -> 6800592 (-1.13%); split: -3.11%, +1.98%
  Cycle count: 282183352210 -> 282100051824 (-0.03%); split: -0.62%, +0.59%
  Spill count: 63147 -> 64218 (+1.70%); split: -7.12%, +8.82%
  Fill count: 184931 -> 175591 (-5.05%); split: -10.81%, +5.76%
  Scratch Memory Size: 5318656 -> 5970944 (+12.26%); split: -5.91%, +18.17%
  Max live registers: 918240 -> 906604 (-1.27%)

  Strange Brigade:
  Totals from 3675 (92.24% of 3984) affected shaders:
  Instrs: 1462231 -> 1429345 (-2.25%); split: -2.25%, +0.00%
  Cycle count: 37404050 -> 37345292 (-0.16%); split: -1.25%, +1.09%
  Max live registers: 361849 -> 351265 (-2.92%)

  Witcher 3:
  Totals from 13 (46.43% of 28) affected shaders:
  Instrs: 593 -> 660 (+11.30%)
  Cycle count: 28302 -> 28714 (+1.46%)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199>
2025-01-11 08:41:42 +00:00
Caio Oliveira
25384dccc0 intel/brw: Remove 'fs' prefix from passes filenames
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32813>
2025-01-02 18:11:05 +00:00
Caio Oliveira
3ca6fa7487 intel/brw: Gather brw_reg related implementations in brw_reg.cpp
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32800>
2024-12-30 18:26:59 +00:00
Caio Oliveira
9caa845e0f intel/brw: Rename brw_inst.h to brw_eu_inst.h
Acked-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32643>
2024-12-30 17:16:15 +00:00
Mary Guillemard
5ddeea9a62 meson: Add precomp-compiler and install-precomp-compiler options
As Asahi, Intel and soon Panfrost requires an offline compiler for their
respective internal shaders, this commit adds generic new options to
workaround meson current limitations around cross-compillation.

Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32719>
2024-12-23 15:09:41 +00:00
Kenneth Graunke
6341b3cd87 brw: Combine convergent texture buffer fetches into fewer loads
Borderlands 3 (both DX11 and DX12 renderers) have a common pattern
across many shaders:

  con 32x4 %510 = (uint32)txf %2 (handle), %1191 (0x10) (coord), %1 (0x0) (lod), 0 (texture)
  con 32x4 %512 = (uint32)txf %2 (handle), %1511 (0x11) (coord), %1 (0x0) (lod), 0 (texture)
  ...
  con 32x4 %550 = (uint32)txf %2 (handle), %1549 (0x25) (coord), %1 (0x0) (lod), 0 (texture)
  con 32x4 %552 = (uint32)txf %2 (handle), %1551 (0x26) (coord), %1 (0x0) (lod), 0 (texture)

A single basic block contains piles of texelFetches from a 1D buffer
texture, with constant coordinates.  In most cases, only the .x channel
of the result is read.  So we have something on the order of 28 sampler
messages, each asking for...a single uint32_t scalar value.  Because our
sampler doesn't have any support for convergent block loads (like the
untyped LSC transpose messages for SSBOs)...this means we were emitting
SIMD8/16 (or SIMD16/32 on Xe2) sampler messages for every single scalar,
replicating what's effectively a SIMD1 value to the entire register.
This is hugely wasteful, both in terms of register pressure, and also in
back-and-forth sending and receiving memory messages.

The good news is we can take advantage of our explicit SIMD model to
handle this more efficiently.  This patch adds a new optimization pass
that detects a series of SHADER_OPCODE_TXF_LOGICAL, in the same basic
block, with constant offsets, from the same texture.  It constructs a
new divergent coordinate where each channel is one of the constants
(i.e <10, 11, 12, ..., 26> in the above example).  It issues a new
NoMask divergent texel fetch which loads N useful channels in one go,
and replaces the rest with expansion MOVs that splat the SIMD1 result
back to the full SIMD width.  (These get copy propagated away.)

We can pick the SIMD size of the load independently of the native shader
width as well.  On Xe2, those 28 convergent loads become a single SIMD32
ld message.  On earlier hardware, we use 2 SIMD16 messages.  Or we can
use a smaller size when there aren't many to combine.

In fossil-db, this cuts 27% of send messages in affected shaders, 3-6%
of cycles, 2-3% of instructions, and 8-12% of live registers.  On A770,
this improves performance of Borderlands 3 by roughly 2.5-3.5%.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32573>
2024-12-12 00:05:42 +00:00
Caio Oliveira
9537b62759 intel/brw: Add SHADER_OPCODE_REDUCE
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30496>
2024-10-11 06:40:29 +00:00
Ian Romanick
119801e647 intel/brw: Move fsat instructions closer to the source
Intel GPUs have a saturate destination modifier, and
brw_fs_opt_saturate_propagation tries to replace explicit saturate
operations with this destination modifier. That pass is limited in
several ways. If the source of the explicit saturate is in a different
block or if the source of the explicit saturate is live after the
explicit saturate, brw_fs_opt_saturate_propagation will be unable to
make progress.

This optimization exists to help brw_fs_opt_saturate_propagation make
more progress. It tries to move NIR fsat instructions to the same block
that contains the definition of its source. It does this only in cases
where it will not create additional live values. It also attempts to do
this only in cases where the explicit saturate will ultimiately be
converted to a destination modifier.

v2: Fix metadata_preserve when theres no progress and use
nir_metadata_control_flow when there is progress. All suggested by
Alyssa.

v3: Fix a typo in the file header comment. Noticed by Ken.  Don't
require nir_metadata_instr_index. Use nir_def_rewrite_uses_after instead
of open-coding something slightly more specific. Both suggested by Ken.

shader-db:

All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19733645 -> 19733028 (<.01%)
instructions in affected programs: 193300 -> 192683 (-0.32%)
helped: 246
HURT: 1
helped stats (abs) min: 2 max: 48 x̄: 2.51 x̃: 2
helped stats (rel) min: 0.18% max: 0.39% x̄: 0.33% x̃: 0.34%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.31% max: 0.31% x̄: 0.31% x̃: 0.31%
95% mean confidence interval for instructions value: -2.87 -2.13
95% mean confidence interval for instructions %-change: -0.34% -0.32%
Instructions are helped.

total cycles in shared programs: 916180971 -> 916264656 (<.01%)
cycles in affected programs: 30197180 -> 30280865 (0.28%)
helped: 194
HURT: 142
helped stats (abs) min: 1 max: 21251 x̄: 872.75 x̃: 19
helped stats (rel) min: <.01% max: 23.17% x̄: 2.59% x̃: 0.23%
HURT stats (abs)   min: 1 max: 28058 x̄: 1781.68 x̃: 399
HURT stats (rel)   min: <.01% max: 37.21% x̄: 4.85% x̃: 1.63%
95% mean confidence interval for cycles value: -196.84 694.97
95% mean confidence interval for cycles %-change: -0.17% 1.27%
Inconclusive result (value mean confidence interval includes 0).

fossil-db:

Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 151512021 -> 151511351 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17209013596 -> 17209840995 (+0.00%); split: -0.02%, +0.02%
Max live registers: 32013312 -> 32013549 (+0.00%)
Max dispatch width: 5512304 -> 5512136 (-0.00%)

Totals from 774 (0.12% of 630172) affected shaders:
Instrs: 1559285 -> 1558615 (-0.04%); split: -0.05%, +0.01%
Cycle count: 1312656268 -> 1313483667 (+0.06%); split: -0.24%, +0.30%
Max live registers: 82195 -> 82432 (+0.29%)
Max dispatch width: 6664 -> 6496 (-2.52%)

Ice Lake
Totals:
Instrs: 151416791 -> 151416137 (-0.00%); split: -0.00%, +0.00%
Cycle count: 15162468885 -> 15163298824 (+0.01%); split: -0.00%, +0.01%
Max live registers: 32471367 -> 32471603 (+0.00%)
Max dispatch width: 5623752 -> 5623712 (-0.00%)

Totals from 733 (0.12% of 635598) affected shaders:
Instrs: 877965 -> 877311 (-0.07%); split: -0.09%, +0.01%
Cycle count: 190763628 -> 191593567 (+0.44%); split: -0.21%, +0.64%
Max live registers: 72067 -> 72303 (+0.33%)
Max dispatch width: 6216 -> 6176 (-0.64%)

Skylake
Totals:
Instrs: 140794845 -> 140794075 (-0.00%); split: -0.00%, +0.00%
Cycle count: 14665159301 -> 14665320514 (+0.00%); split: -0.00%, +0.01%
Max live registers: 31783341 -> 31783662 (+0.00%); split: -0.00%, +0.00%

Totals from 659 (0.11% of 625670) affected shaders:
Instrs: 829061 -> 828291 (-0.09%); split: -0.09%, +0.00%
Cycle count: 185478478 -> 185639691 (+0.09%); split: -0.33%, +0.41%
Max live registers: 67491 -> 67812 (+0.48%); split: -0.01%, +0.48%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>
2024-08-09 14:26:10 -07:00
Kenneth Graunke
c19e5a0a75 intel/brw: Replace predicated break optimization with a simple peephole
We can achieve most of what brw_fs_opt_predicated_break() does with
simple peepholes at NIR -> BRW conversion time.

For predicated break and continue, we can simply look at an IF ... ENDIF
sequence after emitting it.  If there's a single instruction between the
two, and it's a BREAK or CONTINUE, then we can move the predicate from
the IF onto the jump, and delete the IF/ENDIF.  Because we haven't built
the CFG at this stage, we only need to remove them from the linked list
of instructions, which is trivial to do.

For the predicated while optimization, we can rely on the fact that we
already did the predicated break optimization, and simply look for a
predicated BREAK just before the WHILE.  If so, we move the predicate
onto the WHILE, invert it, and remove the BREAK.

There are a few cases where this approach does a worse job than the old
one: nir_convert_from_ssa may introduce load_reg and store_reg in blocks
containing break, and nir_trivialize_registers may decide it needs to
insert movs into those blocks.  So, at NIR -> BRW time, we'll actually
emit some MOVs there, which might have been possible to copy propagate
out after later optimizations.

However, the fossil-db results show that it's still pretty competitive.
For instructions, 1017 shaders were helped (average -1.87 instructions),
while only 62 were hurt (average +2.19 instructions).  In affected
shaders, it was -0.08% for instructions.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>
2024-08-05 19:17:55 -07:00
Kenneth Graunke
fad63d6483 intel/brw: Delete the brw_fs_opt_dead_control_flow_eliminate() pass
With the select peephole gone, this no longer does much of anything.

No instruction changes in fossil-db on Alchemist.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>
2024-08-05 19:17:55 -07:00
Kenneth Graunke
06e8335e11 intel/brw: Delete the brw_fs_opt_peephole_select() pass
Now that we can handle load_ubo in NIR's peephole select pass, the
backend pass isn't really useful anymore.

fossil-db results on Alchemist show almost no impact:

   Totals:
   Instrs: 150646561 -> 150647106 (+0.00%); split: -0.00%, +0.00%
   Cycles: 12633748945 -> 12633760459 (+0.00%)

   Totals from 261 (0.04% of 630008) affected shaders:
   Instrs: 404946 -> 405491 (+0.13%); split: -0.00%, +0.14%
   Cycles: 23947172 -> 23958686 (+0.05%)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>
2024-08-05 19:17:55 -07:00
Caio Oliveira
bb7f2db5a2 intel/brw: Move printing functions to its own file
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Caio Oliveira
c92b8a802e intel/brw: Move remaining compile stages to their own files
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30169>
2024-07-25 15:37:13 +00:00
Daniel Stone
e05415a82e format: Generate endian-independent format aliases
Instead of having a hardcoded list of endian-independent format aliases
in the header, generate them from the format definitions.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29649>
2024-07-19 13:50:42 +00:00
Caio Oliveira
f48b3bee31 intel/brw: Split off assembler logic into library
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30006>
2024-07-12 19:34:23 +00:00
David Heidelberg
68215332a8 build: pass licensing information in SPDX form
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric@igalia.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: David Heidelberg <david@ixit.cz>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29972>
2024-06-29 12:42:49 -07:00
Ian Romanick
531461d576 intel/brw: Test corner case CSE of ADD3 instructions
When the destination of both instructions is NULL and the conditional
modifier matches, operands_match (by way of instructions_match) will
only test the first two operands. This can result in bad CSE
happening.

This is a very, very narrow edge case.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29848>
2024-06-27 18:34:53 +00:00
Kenneth Graunke
0d144821f0 intel/brw: Add a new def analysis pass
This introduces a new analysis pass that opportunistically looks for
VGRFs which happen to satisfy the SSA definition properties.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
2024-06-18 09:02:25 +00:00