Commit graph

13226 commits

Author SHA1 Message Date
Ian Romanick
db2b1e4d76 brw/nir: Treat load_btd_{global,local}_arg_addr_intel and load_btd_shader_type_intel as convergent
No shader-db changes on any Intel platform. No fossil-db changes on
Tiger Lake, Ice Lake, or Skylake.

fossil-db:

Lunar Lake
Totals:
Instrs: 141808714 -> 141808513 (-0.00%); split: -0.00%, +0.00%
Cycle count: 22177889310 -> 22181410192 (+0.02%); split: -0.00%, +0.02%
Spill count: 69892 -> 69890 (-0.00%); split: -0.01%, +0.01%
Fill count: 128313 -> 128331 (+0.01%)
Max live registers: 48052083 -> 48052742 (+0.00%); split: -0.00%, +0.00%

Totals from 549 (0.10% of 551446) affected shaders:
Instrs: 911251 -> 911050 (-0.02%); split: -0.10%, +0.07%
Cycle count: 1244153266 -> 1247674148 (+0.28%); split: -0.04%, +0.32%
Spill count: 15849 -> 15847 (-0.01%); split: -0.04%, +0.03%
Fill count: 35087 -> 35105 (+0.05%)
Max live registers: 68047 -> 68706 (+0.97%); split: -0.25%, +1.22%

Meteor Lake
Totals:
Instrs: 152744298 -> 152741241 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17410258529 -> 17405949054 (-0.02%); split: -0.04%, +0.01%
Spill count: 78528 -> 78598 (+0.09%); split: -0.01%, +0.09%
Fill count: 147893 -> 147978 (+0.06%); split: -0.00%, +0.06%
Scratch Memory Size: 3962880 -> 3969024 (+0.16%)
Max live registers: 31887206 -> 31887413 (+0.00%); split: -0.00%, +0.00%

Totals from 552 (0.09% of 633315) affected shaders:
Instrs: 907279 -> 904222 (-0.34%); split: -0.48%, +0.15%
Cycle count: 1152358569 -> 1148049094 (-0.37%); split: -0.56%, +0.19%
Spill count: 15290 -> 15360 (+0.46%); split: -0.03%, +0.48%
Fill count: 35313 -> 35398 (+0.24%); split: -0.02%, +0.26%
Scratch Memory Size: 1313792 -> 1319936 (+0.47%)
Max live registers: 34218 -> 34425 (+0.60%); split: -0.47%, +1.08%

DG2
Totals:
Instrs: 152766492 -> 152763061 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17406058608 -> 17406396943 (+0.00%); split: -0.02%, +0.02%
Spill count: 78626 -> 78624 (-0.00%); split: -0.01%, +0.01%
Fill count: 147956 -> 148007 (+0.03%); split: -0.01%, +0.04%
Scratch Memory Size: 3962880 -> 3969024 (+0.16%)
Max live registers: 31887158 -> 31887365 (+0.00%); split: -0.00%, +0.00%

Totals from 552 (0.09% of 633315) affected shaders:
Instrs: 908513 -> 905082 (-0.38%); split: -0.47%, +0.09%
Cycle count: 1148162185 -> 1148500520 (+0.03%); split: -0.23%, +0.26%
Spill count: 15364 -> 15362 (-0.01%); split: -0.07%, +0.06%
Fill count: 35343 -> 35394 (+0.14%); split: -0.03%, +0.17%
Scratch Memory Size: 1313792 -> 1319936 (+0.47%)
Max live registers: 34218 -> 34425 (+0.60%); split: -0.47%, +1.08%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick
f3593df877 brw/nir: Treat load_reloc_const_intel as convergent
shader-db:

Lunar Lake, Meteor Lake, DG2, and Tiger Lake had similar results. (Lunar Lake shown)
Lunar Lake
total instructions in shared programs: 18096549 -> 18096537 (<.01%)
instructions in affected programs: 26128 -> 26116 (-0.05%)
helped: 7 / HURT: 2

total cycles in shared programs: 922073090 -> 922093922 (<.01%)
cycles in affected programs: 10574198 -> 10595030 (0.20%)
helped: 19 / HURT: 76

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20503943 -> 20504053 (<.01%)
instructions in affected programs: 23378 -> 23488 (0.47%)
helped: 6 / HURT: 5

total cycles in shared programs: 875477036 -> 875480112 (<.01%)
cycles in affected programs: 13840528 -> 13843604 (0.02%)
helped: 22 / HURT: 55

total spills in shared programs: 4546 -> 4552 (0.13%)
spills in affected programs: 8 -> 14 (75.00%)
helped: 0 / HURT: 1

total fills in shared programs: 5280 -> 5298 (0.34%)
fills in affected programs: 24 -> 42 (75.00%)
helped: 0 / HURT: 1

One compute shader in Tomb Raider was hurt for spills and fills.

fossil-db:

Lunar Lake
Totals:
Instrs: 141808815 -> 141808714 (-0.00%); split: -0.00%, +0.00%
Cycle count: 22185066952 -> 22177889310 (-0.03%); split: -0.05%, +0.02%
Spill count: 69859 -> 69892 (+0.05%); split: -0.03%, +0.07%
Fill count: 128344 -> 128313 (-0.02%); split: -0.04%, +0.01%
Scratch Memory Size: 5833728 -> 5829632 (-0.07%)

Totals from 13384 (2.43% of 551446) affected shaders:
Instrs: 13852162 -> 13852061 (-0.00%); split: -0.00%, +0.00%
Cycle count: 7691993336 -> 7684815694 (-0.09%); split: -0.15%, +0.06%
Spill count: 53266 -> 53299 (+0.06%); split: -0.03%, +0.10%
Fill count: 96492 -> 96461 (-0.03%); split: -0.05%, +0.02%
Scratch Memory Size: 3827712 -> 3823616 (-0.11%)

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 152744735 -> 152744298 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17400199290 -> 17410258529 (+0.06%); split: -0.01%, +0.07%
Max live registers: 31887208 -> 31887206 (-0.00%)

Totals from 12435 (1.96% of 633315) affected shaders:
Instrs: 13445310 -> 13444873 (-0.00%); split: -0.00%, +0.00%
Cycle count: 6941685096 -> 6951744335 (+0.14%); split: -0.03%, +0.18%
Max live registers: 1071302 -> 1071300 (-0.00%)

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
Totals:
Instrs: 150644063 -> 150643944 (-0.00%); split: -0.00%, +0.00%
Cycle count: 15618718733 -> 15622092285 (+0.02%); split: -0.01%, +0.03%
Spill count: 58816 -> 58790 (-0.04%)
Fill count: 101054 -> 101065 (+0.01%)
Max live registers: 31792771 -> 31792766 (-0.00%); split: -0.00%, +0.00%

Totals from 13383 (2.12% of 632544) affected shaders:
Instrs: 12016285 -> 12016166 (-0.00%); split: -0.00%, +0.00%
Cycle count: 5239956851 -> 5243330403 (+0.06%); split: -0.02%, +0.08%
Spill count: 28977 -> 28951 (-0.09%)
Fill count: 47568 -> 47579 (+0.02%)
Max live registers: 1001554 -> 1001549 (-0.00%); split: -0.00%, +0.00%

Skylake
Totals:
Instrs: 140943195 -> 140943154 (-0.00%); split: -0.00%, +0.00%
Cycle count: 14818940190 -> 14816706154 (-0.02%); split: -0.02%, +0.00%
Max live registers: 31663173 -> 31663168 (-0.00%); split: -0.00%, +0.00%

Totals from 12625 (2.01% of 629351) affected shaders:
Instrs: 11598223 -> 11598182 (-0.00%); split: -0.00%, +0.00%
Cycle count: 4519027823 -> 4516793787 (-0.05%); split: -0.05%, +0.00%
Max live registers: 970275 -> 970270 (-0.00%); split: -0.00%, +0.00%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick
fb9b363376 brw/nir: Treat load_inline_data_intel as convergent
No shader-db changes on any Intel platform.

fossil-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 141808595 -> 141808815 (+0.00%); split: -0.00%, +0.00%
Cycle count: 22181300418 -> 22185066952 (+0.02%); split: -0.01%, +0.03%
Max live registers: 48052077 -> 48052083 (+0.00%)

Totals from 720 (0.13% of 551446) affected shaders:
Instrs: 116778 -> 116998 (+0.19%); split: -0.01%, +0.20%
Cycle count: 1197931082 -> 1201697616 (+0.31%); split: -0.21%, +0.53%
Max live registers: 56552 -> 56558 (+0.01%)

No fossil-db changes on any other Intel platform.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick
3e63920ca5 brw/nir: Treat some load_ubo as convergent
v2: Fix for Xe2.

No changes in shader-db or fossil-db on Lunar Lake, Meteor Lake, or DG2.

shader-db:

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
total instructions in shared programs: 19626547 -> 19634353 (0.04%)
instructions in affected programs: 1591181 -> 1598987 (0.49%)
helped: 925 / HURT: 3595

total cycles in shared programs: 865236718 -> 866682659 (0.17%)
cycles in affected programs: 151284264 -> 152730205 (0.96%)
helped: 3430 / HURT: 5510

total sends in shared programs: 1032237 -> 1032233 (<.01%)
sends in affected programs: 20 -> 16 (-20.00%)
helped: 4 / HURT: 0

LOST:   48
GAINED: 141

fossil-db:

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 150662952 -> 150641175 (-0.01%); split: -0.03%, +0.02%
Subgroup size: 7768880 -> 7768888 (+0.00%)
Send messages: 7502265 -> 7502044 (-0.00%)
Cycle count: 15621785298 -> 15618640525 (-0.02%); split: -0.06%, +0.04%
Spill count: 58818 -> 58816 (-0.00%)
Fill count: 101063 -> 101054 (-0.01%)
Max live registers: 31795403 -> 31792179 (-0.01%); split: -0.01%, +0.00%
Max dispatch width: 5572160 -> 5571488 (-0.01%); split: +0.00%, -0.01%

Totals from 10278 (1.62% of 632539) affected shaders:
Instrs: 5276493 -> 5254716 (-0.41%); split: -0.89%, +0.48%
Subgroup size: 156432 -> 156440 (+0.01%)
Send messages: 279259 -> 279038 (-0.08%)
Cycle count: 6483576378 -> 6480431605 (-0.05%); split: -0.16%, +0.11%
Spill count: 27133 -> 27131 (-0.01%)
Fill count: 49384 -> 49375 (-0.02%)
Max live registers: 675781 -> 672557 (-0.48%); split: -0.49%, +0.01%
Max dispatch width: 97256 -> 96584 (-0.69%); split: +0.08%, -0.77%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick
c48570d2b2 brw/nir: Treat some ALU results as convergent
v2: Fix for Xe2.

v3: Fix handling of 64-bit CMP results.

v4: Scalarize 16-bit comparison temporary destination when used as a
source (as was already done for 64-bit). Suggested by Ken.

shader-db:

Lunar Lake
total instructions in shared programs: 18096500 -> 18096549 (<.01%)
instructions in affected programs: 15919 -> 15968 (0.31%)
helped: 8 / HURT: 21

total cycles in shared programs: 921841300 -> 922073090 (0.03%)
cycles in affected programs: 115946336 -> 116178126 (0.20%)
helped: 386 / HURT: 135

Meteor Lake and DG2 (Meteor Lake shown)
total instructions in shared programs: 19836053 -> 19836016 (<.01%)
instructions in affected programs: 19547 -> 19510 (-0.19%)
helped: 21 / HURT: 18

total cycles in shared programs: 906713777 -> 906588541 (-0.01%)
cycles in affected programs: 96914584 -> 96789348 (-0.13%)
helped: 335 / HURT: 134

total fills in shared programs: 6712 -> 6710 (-0.03%)
fills in affected programs: 52 -> 50 (-3.85%)
helped: 1 / HURT: 0

LOST:   1
GAINED: 1

Tiger Lake
total instructions in shared programs: 19641284 -> 19641278 (<.01%)
instructions in affected programs: 12358 -> 12352 (-0.05%)
helped: 10 / HURT: 19

total cycles in shared programs: 865413131 -> 865460513 (<.01%)
cycles in affected programs: 74641489 -> 74688871 (0.06%)
helped: 388 / HURT: 100

total spills in shared programs: 3899 -> 3898 (-0.03%)
spills in affected programs: 17 -> 16 (-5.88%)
helped: 1 / HURT: 0

total fills in shared programs: 3249 -> 3245 (-0.12%)
fills in affected programs: 51 -> 47 (-7.84%)
helped: 1 / HURT: 0

LOST:   1
GAINED: 1

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20495826 -> 20496111 (<.01%)
instructions in affected programs: 53220 -> 53505 (0.54%)
helped: 28 / HURT: 16

total cycles in shared programs: 875173550 -> 875243910 (<.01%)
cycles in affected programs: 51700652 -> 51771012 (0.14%)
helped: 400 / HURT: 39

total spills in shared programs: 4546 -> 4546 (0.00%)
spills in affected programs: 288 -> 288 (0.00%)
helped: 1 / HURT: 2

total fills in shared programs: 5224 -> 5280 (1.07%)
fills in affected programs: 795 -> 851 (7.04%)
helped: 0 / HURT: 4

LOST:   1
GAINED: 1

fossil-db:

Lunar Lake
Totals:
Instrs: 141811551 -> 141807640 (-0.00%); split: -0.00%, +0.00%
Cycle count: 22183128332 -> 22181285594 (-0.01%); split: -0.06%, +0.05%
Spill count: 69890 -> 69859 (-0.04%); split: -0.09%, +0.04%
Fill count: 128877 -> 128344 (-0.41%); split: -0.42%, +0.00%
Max live registers: 48053415 -> 48051613 (-0.00%); split: -0.00%, +0.00%

Totals from 6817 (1.24% of 551443) affected shaders:
Instrs: 4300169 -> 4296258 (-0.09%); split: -0.14%, +0.05%
Cycle count: 17263755610 -> 17261912872 (-0.01%); split: -0.08%, +0.07%
Spill count: 41822 -> 41791 (-0.07%); split: -0.15%, +0.07%
Fill count: 75523 -> 74990 (-0.71%); split: -0.71%, +0.01%
Max live registers: 733647 -> 731845 (-0.25%); split: -0.29%, +0.04%

Meteor Lake and all older Intel platforms had similar results. (Meteor Lake shown)
Totals:
Instrs: 152735305 -> 152735801 (+0.00%); split: -0.00%, +0.00%
Subgroup size: 7733536 -> 7733616 (+0.00%)
Cycle count: 17398725539 -> 17400873100 (+0.01%); split: -0.00%, +0.02%
Max live registers: 31887018 -> 31885742 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5561696 -> 5561712 (+0.00%)

Totals from 5672 (0.90% of 633314) affected shaders:
Instrs: 2817606 -> 2818102 (+0.02%); split: -0.05%, +0.07%
Subgroup size: 81128 -> 81208 (+0.10%)
Cycle count: 10021470543 -> 10023618104 (+0.02%); split: -0.01%, +0.03%
Max live registers: 306520 -> 305244 (-0.42%); split: -0.43%, +0.01%
Max dispatch width: 74136 -> 74152 (+0.02%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick
7eab2cb67e brw/nir: Treat load_workgroup_id as convergent
v2: Fix for Xe2.

shader-db:

Lunar Lake Meteor Lake, DG2, and Tiger Lake had similar results. (Lunar Lake shown)
total instructions in shared programs: 18096526 -> 18096500 (<.01%)
instructions in affected programs: 6759 -> 6733 (-0.38%)
helped: 9 / HURT: 3

total cycles in shared programs: 921727804 -> 921841300 (0.01%)
cycles in affected programs: 110049730 -> 110163226 (0.10%)
helped: 90 / HURT: 372

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20496591 -> 20496402 (<.01%)
instructions in affected programs: 48757 -> 48568 (-0.39%)
helped: 25 / HURT: 8

total cycles in shared programs: 875253948 -> 875237902 (<.01%)
cycles in affected programs: 56760140 -> 56744094 (-0.03%)
helped: 363 / HURT: 34

total spills in shared programs: 4555 -> 4546 (-0.20%)
spills in affected programs: 174 -> 165 (-5.17%)
helped: 2 / HURT: 0

total fills in shared programs: 5243 -> 5224 (-0.36%)
fills in affected programs: 382 -> 363 (-4.97%)
helped: 2 / HURT: 0

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 141811577 -> 141811551 (-0.00%); split: -0.00%, +0.00%
Cycle count: 22173792370 -> 22183128332 (+0.04%); split: -0.00%, +0.04%
Max live registers: 48053498 -> 48053415 (-0.00%)

Totals from 3911 (0.71% of 551443) affected shaders:
Instrs: 2164804 -> 2164778 (-0.00%); split: -0.00%, +0.00%
Cycle count: 2404062476 -> 2413398438 (+0.39%); split: -0.02%, +0.41%
Max live registers: 413583 -> 413500 (-0.02%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick
6fab1b77c2 brw/nir: Treat some load_uniform as convergent
No shader-db changes on any Intel platform.

v2: Fix for Xe2.

v3: Rework the way that we determine that an intrinsic can actually be
convergent. This will now depend on whether or not the important
sources have previously be determined to be convergent. Fixes
intermitent failures in some test cases (including
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.push_constant_float_16_to_32.scalar_frag).

v4: s/the it/it/ in a comment. Noticed by Ken.

fossil-db:

No fossil-db changes on Lunar Lake.

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 152743449 -> 152743161 (-0.00%)
Cycle count: 17399179660 -> 17399193488 (+0.00%)

Totals from 144 (0.02% of 633314) affected shaders:
Instrs: 5936 -> 5648 (-4.85%)
Cycle count: 51616 -> 65444 (+26.79%)

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 150646195 -> 150645907 (-0.00%)
Cycle count: 15618427818 -> 15618428942 (+0.00%)

Totals from 144 (0.02% of 632567) affected shaders:
Instrs: 6218 -> 5930 (-4.63%)
Cycle count: 39968 -> 41092 (+2.81%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick
341e5117ec brw/nir: Treat load_const as convergent
opt_combine_constants goes to great effort to pack 8 constants into a
single register, this can't have much effect.

There is a lot of fossil-db variation among platforms, but the results
are generally positive.

v2: Fix for Xe2.

shader-db:

Lunar Lake
total instructions in shared programs: 18095100 -> 18092845 (-0.01%)
instructions in affected programs: 158931 -> 156676 (-1.42%)
helped: 423 / HURT: 0

total cycles in shared programs: 921523326 -> 921522784 (<.01%)
cycles in affected programs: 7522774 -> 7522232 (<.01%)
helped: 225 / HURT: 228

LOST:   1
GAINED: 7

Meteor Lake and all older Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19820211 -> 19820303 (<.01%)
instructions in affected programs: 53087 -> 53179 (0.17%)
helped: 135 / HURT: 1

total cycles in shared programs: 906380523 -> 906383031 (<.01%)
cycles in affected programs: 1402315 -> 1404823 (0.18%)
helped: 156 / HURT: 100

LOST:   1
GAINED: 16

fossil-db:

Lunar Lake
Totals:
Instrs: 141876801 -> 141783010 (-0.07%); split: -0.07%, +0.00%
Subgroup size: 10994624 -> 10994704 (+0.00%)
Cycle count: 22173441950 -> 22172949188 (-0.00%); split: -0.01%, +0.01%
Spill count: 69850 -> 69890 (+0.06%); split: -0.00%, +0.06%
Fill count: 129285 -> 128877 (-0.32%)
Max live registers: 48047900 -> 48043650 (-0.01%); split: -0.01%, +0.00%

Totals from 29837 (5.41% of 551396) affected shaders:
Instrs: 7842512 -> 7748721 (-1.20%); split: -1.23%, +0.03%
Subgroup size: 940320 -> 940400 (+0.01%)
Cycle count: 3444846368 -> 3444353606 (-0.01%); split: -0.09%, +0.08%
Spill count: 23358 -> 23398 (+0.17%); split: -0.01%, +0.18%
Fill count: 52296 -> 51888 (-0.78%)
Max live registers: 3183481 -> 3179231 (-0.13%); split: -0.16%, +0.03%

Meteor Lake
Totals:
Instrs: 152709353 -> 152666543 (-0.03%); split: -0.03%, +0.00%
Cycle count: 17397176906 -> 17397668904 (+0.00%); split: -0.00%, +0.01%
Fill count: 147896 -> 147893 (-0.00%)
Max live registers: 31862891 -> 31861888 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5559664 -> 5561776 (+0.04%); split: +0.08%, -0.04%

Totals from 20913 (3.30% of 633046) affected shaders:
Instrs: 6676676 -> 6633866 (-0.64%); split: -0.64%, +0.00%
Cycle count: 1498330125 -> 1498822123 (+0.03%); split: -0.06%, +0.09%
Fill count: 41010 -> 41007 (-0.01%)
Max live registers: 1799295 -> 1798292 (-0.06%); split: -0.06%, +0.00%
Max dispatch width: 12880 -> 14992 (+16.40%); split: +33.29%, -16.89%

DG2 and Tiger Lake had similar results. (DG2 shown)
Totals:
Instrs: 152730878 -> 152688139 (-0.03%); split: -0.03%, +0.00%
Cycle count: 17394835605 -> 17394179808 (-0.00%); split: -0.01%, +0.00%
Max live registers: 31862843 -> 31861840 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5559664 -> 5561776 (+0.04%); split: +0.08%, -0.04%

Totals from 20912 (3.30% of 633046) affected shaders:
Instrs: 6563021 -> 6520282 (-0.65%); split: -0.65%, +0.00%
Cycle count: 1201999616 -> 1201343819 (-0.05%); split: -0.08%, +0.03%
Max live registers: 1798392 -> 1797389 (-0.06%); split: -0.06%, +0.00%
Max dispatch width: 12872 -> 14984 (+16.41%); split: +33.31%, -16.90%

Ice Lake
Totals:
Instrs: 151914872 -> 151868108 (-0.03%)
Cycle count: 15262958696 -> 15262665082 (-0.00%); split: -0.00%, +0.00%
Max live registers: 32194225 -> 32193192 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5650880 -> 5650608 (-0.00%); split: +0.02%, -0.03%

Totals from 22192 (3.48% of 637223) affected shaders:
Instrs: 6419739 -> 6372975 (-0.73%)
Cycle count: 184733818 -> 184440204 (-0.16%); split: -0.36%, +0.20%
Max live registers: 1989950 -> 1988917 (-0.05%); split: -0.05%, +0.00%
Max dispatch width: 5744 -> 5472 (-4.74%); split: +23.40%, -28.13%

Skylake
Totals:
Instrs: 141027379 -> 140811741 (-0.15%)
Cycle count: 14817704293 -> 14817418611 (-0.00%); split: -0.01%, +0.01%
Max live registers: 31628796 -> 31627791 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5535176 -> 5539880 (+0.08%); split: +0.14%, -0.06%

Totals from 22218 (3.53% of 628840) affected shaders:
Instrs: 5944856 -> 5729218 (-3.63%)
Cycle count: 182845101 -> 182559419 (-0.16%); split: -0.60%, +0.44%
Max live registers: 1974576 -> 1973571 (-0.05%); split: -0.07%, +0.02%
Max dispatch width: 16912 -> 21616 (+27.81%); split: +46.93%, -19.11%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:58 -08:00
Ian Romanick
d0f1a94e3d brw/build: Prepare BROADCAST for scalar values
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:58 -08:00
Ian Romanick
5ea9ed4798 brw/nir: Prepare try_rebuild_source for scalar values
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:58 -08:00
Ian Romanick
59f66b4150 brw/emit: Allow scalar sources to HF math instructions on Xe2
v2: Add a comment explaining the context of the workaround. Suggested by
Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:58 -08:00
Ian Romanick
4457073c32 brw/lower: Properly handle UNIFORM globals address in lower_trace_ray_logical_send
v2: Don't shadow previous declaration of globals_addr. Suggested by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:58 -08:00
Ian Romanick
007c92b2ac brw/lower: Adjust source stride on DF is_scalar sources to MAD on Gfx9
This commit used to be "brw/emit: Allow scalar sources to 64-bit
3-source instructions". These instructions were fixed up in
brw_eu_emit. There seems to be some conflict with the <0,1,0> stride an
post-RA scheduling. The only difference between the passing code
generated by this commit and the failing code generated by the older
commit is some post-RA scheduling.

v2: Change the stride of a MAD even if the instruction isn't
lowered. MAD instructions that are already SIMD8 have to follow the same
rules. 🤦

v3: Pull the lowering out to its own pass. Update the comment in
brw_fs_validate. Suggested by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:58 -08:00
Ian Romanick
d5d7ae22ae brw/nir: Fix up handling of sources that might be convergent vectors
Sources that are scalars (almost all source) and convergent generally
want <0,1,0> source stride. Sources that are vectors (e.g., texture
coordinates, SSBO write data, etc.) and convergent want no extra strides
applied. In nearly all cases LOAD_PAYLOAD lowering will do the right
thing.

v2: Use VEC in emit_pixel_interpolater_send. Suggested by Ken.

v3: With the elimination of offset_to_component(), offset() may not
convert an is_scalar source to have a zero stride. Explicitly do this
in get_nir_src and prepare_alu_destination_and_sources.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:58 -08:00
Ian Romanick
9e6bd5bf97 brw/lower: Allow uniform and scalar sources to many kinds of SEND
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:58 -08:00
Ian Romanick
1bff4f93ca brw: Basic infrastructure to store convergent values as scalars
In SIMD16 and SIMD32, storing convergent values in full 16- or
32-channel registers is wasteful. It wastes register space, and in most
cases on SIMD32, it wastes instructions. Our register allocator is not
clever enough to handle scalar allocations. It's fundamental unit of
allocation is SIMD8. Start treating convergent values as SIMD8.

Add a tracking bit in brw_reg to specify that a register represents a
convergent, scalar value. This has two implications:

1. All channels of the SIMD8 register must contain the same value. In
   general, this means that writes to the register must be
   force_writemask_all and exec_size = 8;

2. Reads of this register can (and should) use <0,1,0> stride. SIMD8
   instructions that have restrictions on source stride can us <8,8,1>.

Values that are vectors (e.g., results of load_uniform or texture
operations) will be stored as multiple SIMD8 hardware registers.

v2: brw_fs_opt_copy_propagation_defs fix from Ken. Fix for Xe2.

v3: Eliminte offset_to_scalar(). Remove mention of vec4 backend in
brw_reg.h. Both suggested by Caio. The offset_to_scalar() change
necessitates some trickery in the fs_builder offset() function, but I
think this is an improvement overall. There is also some rework in
find_value_for_offset to account for the possibility that is_scalar
sources in LOAD_PAYLOAD might be <8;8,1> or <0;1,0>.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:58 -08:00
Ian Romanick
ef3dc401da brw: Add devinfo parameter to fs_inst::regs_read
This isn't used now, but future commits will add uses. Doing this as a
separate commit removes a lot of "just typing" churn from commits that
have real changes to review.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:58 -08:00
Marek Olšák
73d675451b ci: update fail lists and trace checksums
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31942>
2024-12-24 05:54:07 -05:00
Rohan Garg
308c2b9828 anv: refactor choose_isl_tiling_flags to pass fewer arguments
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: José Roberto de Souza <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32771>
2024-12-23 19:33:36 +00:00
Rohan Garg
f96b2c002d isl: disable aux when creating uncompressed TileY/Tile64 surfaces from compressed ones
Fixes: 8e96b51 ('intel/isl: Assert alignments of surface addresses')
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: José Roberto de Souza <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32771>
2024-12-23 19:33:36 +00:00
Mary Guillemard
5ddeea9a62 meson: Add precomp-compiler and install-precomp-compiler options
As Asahi, Intel and soon Panfrost requires an offline compiler for their
respective internal shaders, this commit adds generic new options to
workaround meson current limitations around cross-compillation.

Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32719>
2024-12-23 15:09:41 +00:00
Caio Oliveira
5c0c3120ca intel/brw: Use variable instead of manually count the passes
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32709>
2024-12-20 22:41:20 +00:00
Caio Oliveira
ada898bb1c intel/brw: Disallow cmod in some cases of ARF scalar as destination
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32758>
2024-12-20 14:03:15 -08:00
Francisco Jerez
43d59c6186 intel/brw/xe3+: Relax SEND EOT register assignment restrictions.
These restrictions have been removed from the hardware.  Make the code
enforcing and validating them conditional.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32758>
2024-12-20 14:03:15 -08:00
Rohan Garg
9a4f5b739e intel/compiler: disable mesh autostrip for WA 16020916187
Disable mesh autostrip for platforms that need WA 16020916187.
Additionally, zero out the layer and viewport slots when a shading rate
is found through the brw_nir_initialize_mue pass.

Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32751>
2024-12-20 17:02:20 +01:00
Kenneth Graunke
cb756ae8a2 brw: Don't rely on SIMD splitting in opt_combine_convergent_txfs
The SIMD splitting pass does not handle wide force_writemask_all
instructions correctly at the moment.  For example, a SIMD32 TXF
on pre-Xe2 would get split to a pair of SIMD16.  But it will set
the groups to operate on channels 15:0 and 31:16.  That's not what
we want for a NoMask instruction - both should be 15:0, i.e.
bld.group(inst->exec_size, 0).

We could (and perhaps should) fix the SIMD splitting pass to handle
this, but the pass already has subtle complexity in which builders
are used.  Or we could alter fs_builder::group(), but that has broader
implications.  As a stop-gap, just make opt_combine_covergent_txfs stop
relying on SIMD splitting.  It's trivial to do and fixes the issue
without risking other breakage.

Fixes: 6341b3cd87 ("brw: Combine convergent texture buffer fetches into fewer loads")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32714>
2024-12-19 23:16:12 +00:00
Erik Faye-Lund
97dec34a89 hasvk: use vk_descriptor_type_is_dynamic
No need to open-code this one now that we have a generic helper.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32627>
2024-12-19 15:12:58 +00:00
Erik Faye-Lund
e17abeca44 anv: use vk_descriptor_type_is_dynamic
No need to open-code this one now that we have a generic helper.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32627>
2024-12-19 15:12:58 +00:00
José Roberto de Souza
2bd3df75e5 anv: Emit STATE_SYSTEM_MEM_FENCE_ADDRESS
According to HAS it is necessary to emit this instruction once per
context so MI_MEM_FENCE works properly.

Fixes: 86813c60a4 ("mi-builder: add read/write memory fencing support on Gfx20+")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32680>
2024-12-18 17:16:05 +00:00
José Roberto de Souza
b8f93bfd38 anv: Always create anv_async_submit in init_copy_video_queue_state()
A next patch will emit more instructions in video and copy queues
for Gfx 200 and newer but the current code only creates anv_async_submit
if device has aux_map.
Instead we can always create anv_async_submit and only submit it to
hardware if any instruction was emited.

Fixes: 86813c60a4 ("mi-builder: add read/write memory fencing support on Gfx20+")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32680>
2024-12-18 17:16:05 +00:00
José Roberto de Souza
edb33b47ab intel/genxml/xe2: Add STATE_SYSTEM_MEM_FENCE_ADDRESS instruction
Fixes: 86813c60a4 ("mi-builder: add read/write memory fencing support on Gfx20+")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32680>
2024-12-18 17:16:05 +00:00
Valentine Burley
61d9c47944 ci/lava: Use CI_JOB_TIMEOUT instead of separate variable
The CI_JOB_TIMEOUT variable is the GitLab-defined job timeout in
seconds.
Use this variable in LAVA instead of the separate JOB_TIMEOUT,
which was intended to represent the test phase timeout (job timeout
minus 5 minutes), but was often overlooked.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32609>
2024-12-18 09:23:27 +00:00
Valentine Burley
3d1dd22bb4 anv/ci: Update expectations
Remove bogus failures caused by wrong GPU_VERSION configuration,
delete tests that no longer exist in current CTS versions, and
update expectations.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32681>
2024-12-18 07:13:44 +00:00
Valentine Burley
526ec3e7dd anv/ci: Remove fails that are in .gitlab-ci/all-skips.txt
These tests are always skipped in Mesa.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32681>
2024-12-18 07:13:44 +00:00
Valentine Burley
f42d670ea6 anv/ci: Re-enable TGL and JSL manual jobs
Thanks to the speedup achieved by increasing tests_per_group,
nightly jobs are now within reasonable time limits, allowing them
to be re-enabled.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32681>
2024-12-18 07:13:44 +00:00
Valentine Burley
eb7fb2e919 anv/ci: Bump the number of tests per group for TGL
Due to the slow startup time of deqp-vk, the previous default of
500 tests per group caused the jobs to run up to twice as slowly
compared to using a higher number of tests per group.

Increase the number of tests per group for all subsets of the
deqp-runner suites, which allows decreasing the fractions.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32681>
2024-12-18 07:13:44 +00:00
Valentine Burley
629b19a59f anv/ci: Bump the number of tests per group for JSL
Due to the slow startup time of deqp-vk, the previous default of
500 tests per group caused the jobs to run up to twice as slowly
compared to using a higher number of tests per group.

Increase the number of tests per group for all subsets of the
deqp-runner suites, which allows decreasing the fractions.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32681>
2024-12-18 07:13:44 +00:00
Valentine Burley
e68f9bb856 anv/ci: Bump the number of tests per group for ADL
Due to the slow startup time of deqp-vk, the previous default of
500 tests per group caused the jobs to run up to twice as slowly
compared to using a higher number of tests per group.

Increase the number of tests per group for all subsets of the
deqp-runner suites, which allows decreasing the fractions.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32681>
2024-12-18 07:13:44 +00:00
Valentine Burley
e7e9ceceb3 anv/ci: Fix GPU_VERSION configuration for anv-jsl and anv-jsl-full
The GPU_VERSION was incorrectly set to iris-jsl for these ANV jobs,
causing mismatched expectations.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32681>
2024-12-18 07:13:44 +00:00
Ian Romanick
b4d472cd67 brw/emit: Fix BROADCAST when value is uniform and index is immediate
Fixes: c74511f5dc ("i965: Introduce the BROADCAST pseudo-opcode.")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Tried-to-help-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32668>
2024-12-17 21:57:26 +00:00
Kevin Chuang
1b55f10105 anv/bvh: Dump BVH synchronously upon command buffer completion
Modified the BVH dumping mechanism to synchronously wait for the command
buffer to complete before saving BVH data to files. This approach is
more robust compared to the previous method of dumping during
acceleration strucutre destruction.

Note: if DEBUG_BVH_ANY is enabled but intel-rt is disabled, we will wait
for nothing.

Signed-off-by: Kevin Chuang <kaiwenjon23@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32585>
2024-12-16 23:01:11 +00:00
Caio Oliveira
93dfe504f2 intel/brw: Add SHADER_OPCODE_READ_FROM_CHANNEL and LIVE_CHANNEL
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32412>
2024-12-14 11:38:14 -08:00
Caio Oliveira
d325de316d intel/brw: Add some tests for new Xe2 register regioning restrictions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28636>
2024-12-14 02:15:18 +00:00
Caio Oliveira
f308be16a0 intel/brw: Add validation for some Xe2 register regioning restrictions
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28636>
2024-12-14 02:15:18 +00:00
Caio Oliveira
6a5a316312 intel/brw: Extract format enum in EU validation code
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28636>
2024-12-14 02:15:18 +00:00
Caio Oliveira
57b703cec3 intel/brw: Skip some regioning EU validation for Vx1 and VxH modes
Skip the ones that check the VertStride -- which is set to a special
value in those modes.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28636>
2024-12-14 02:15:18 +00:00
Felix DeGrood
0f46c53b0c anv: Use vfg distribution mode = RR_STRICT for Xe2+
Performance tuning. Round Robin strict faster on Xe2 for some
workloads.

Speedup:
 - Borderlands3-dx11-trace: +4%
 - WolfensteinYoungblood-vk.g6: +1.5%
 - Cyberpunk2077-dx12vk-2160p-ultra: +0.5%

Acked-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32566>
2024-12-13 19:15:48 +00:00
Deborah Brouwer
b6435207ab ci: python-test rename artifacts
The current python-test job creates and compresses python related
artifacts for use by future jobs. The artifacts are currently named
`mesa-python-test` which is somewhat misleading because they are not
needed for testing python scripts or libraries.

Rename the artifacts generated by the python-test job to be more
descriptive of their purpose.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32340>
2024-12-13 10:04:03 -08:00
Sagar Ghuge
d3f9139e49 intel: Use Morton compute walk order
According to HSD 14016252163 if compute shader uses the sample
operation, morton walk order and set the thread group batch size to 4 is
expected to increase sampler cache hit rates by increasing sample
address locality within a subslice.

Rework:
 * Caio: "||" => "&&" for type checking in instr_uses_sampler()
 * Jordan: Use nir's foreach macros rather than
   nir_shader_lower_instructions()

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32430>
2024-12-12 19:56:47 -08:00
Sagar Ghuge
4bd958243d intel/genxml: Update COMPUTE_WALKER_BODY
For PTL, we can have one more additional walk order along with the
"Thread Group Batch Size" field.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32430>
2024-12-12 19:56:47 -08:00