Commit graph

3835 commits

Author SHA1 Message Date
Iván Briano
aee04bf4fb intel/rt: fix ray_query stack address calculation
While the documentation says to use NUM_SIMD_LANES_PER_DSS for the stack
address calculation, what the HW actually uses is
NUM_SYNC_STACKID_PER_DSS. The former may vary depending on the platform,
while the latter is fixed to 2048 for all current platforms.

Fixes: 6c84cbd8c9 ("intel/dev/xe: Set max_eus_per_subslice using topology query")

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32049>
2024-11-08 18:31:52 +00:00
Ian Romanick
7aad19ccd2 brw/lower: Lower invalid source conversion to better code
There are two fragment shaders from RDR2 that is hurt for spills and
fills on Lunar Lake.

    Totals from 2 (0.00% of 551413) affected shaders:
    Spill count: 1252 -> 1317 (+5.19%)
    Fill count: 2518 -> 2642 (+4.92%)

Those shaders... have a lot of room for improvement. There are some
patterns in those shaders that we handle very, very poorly. Improving
those patterns would likely improve the spills and fills in these
shaders quite dramatically.

Given how much other platforms are helped, I don't this should block
this commit.

No shader-db or fossil-db changes on any pre-Gfx12.5 Intel platforms.

v2: Add some comments and an additional assertion. Suggested by Ken.

shader-db:

Lunar Lake
total instructions in shared programs: 18094517 -> 18094511 (<.01%)
instructions in affected programs: 809 -> 803 (-0.74%)
helped: 6 / HURT: 0

total cycles in shared programs: 921532158 -> 921532168 (<.01%)
cycles in affected programs: 2266 -> 2276 (0.44%)
helped: 0 / HURT: 3

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19820845 -> 19820839 (<.01%)
instructions in affected programs: 803 -> 797 (-0.75%)
helped: 6 / HURT: 0

total cycles in shared programs: 906372999 -> 906372949 (<.01%)
cycles in affected programs: 3216 -> 3166 (-1.55%)
helped: 6 / HURT: 0

fossil-db:

Lunar Lake
Totals:
Instrs: 141887377 -> 141884465 (-0.00%); split: -0.00%, +0.00%
Cycle count: 21990301498 -> 21990267232 (-0.00%); split: -0.00%, +0.00%
Spill count: 69732 -> 69797 (+0.09%)
Fill count: 128521 -> 128645 (+0.10%)

Totals from 349 (0.06% of 551413) affected shaders:
Instrs: 506117 -> 503205 (-0.58%); split: -0.79%, +0.21%
Cycle count: 32362996 -> 32328730 (-0.11%); split: -0.52%, +0.41%
Spill count: 1951 -> 2016 (+3.33%)
Fill count: 4899 -> 5023 (+2.53%)

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 152773732 -> 152761383 (-0.01%); split: -0.01%, +0.00%
Cycle count: 17187529968 -> 17187450663 (-0.00%); split: -0.00%, +0.00%
Spill count: 79279 -> 79003 (-0.35%)
Fill count: 148803 -> 147942 (-0.58%)
Scratch Memory Size: 3949568 -> 3946496 (-0.08%)
Max live registers: 31879325 -> 31879230 (-0.00%)

Totals from 366 (0.06% of 633185) affected shaders:
Instrs: 557377 -> 545028 (-2.22%); split: -2.22%, +0.01%
Cycle count: 26171205 -> 26091900 (-0.30%); split: -0.54%, +0.24%
Spill count: 3238 -> 2962 (-8.52%)
Fill count: 10018 -> 9157 (-8.59%)
Scratch Memory Size: 257024 -> 253952 (-1.20%)
Max live registers: 28187 -> 28092 (-0.34%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
2024-11-08 17:46:45 +00:00
Ian Romanick
2a57568ebd brw/build: Add scalar_group() helper
Some uses of the old pattern still exist. The use in brw_fs_nir.cpp is
deleted by commits !29884. The use in brw_lower_logical_sends.cpp seems
different, so I decided to keep it.

The next commit wants to use this.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
2024-11-08 17:46:45 +00:00
Ian Romanick
5dfea87623 brw/opt: Always do both kinds of copy propagation before lower_load_payload
shader-db:

All Intel platforms except Skylake had similar results. (Lunar Lake shown)
total instructions in shared programs: 18092932 -> 18092713 (<.01%)
instructions in affected programs: 139290 -> 139071 (-0.16%)
helped: 103
HURT: 18
helped stats (abs) min: 1 max: 8 x̄: 2.43 x̃: 2
helped stats (rel) min: 0.02% max: 9.09% x̄: 0.73% x̃: 0.29%
HURT stats (abs)   min: 1 max: 5 x̄: 1.72 x̃: 1
HURT stats (rel)   min: 0.02% max: 0.55% x̄: 0.10% x̃: 0.08%
95% mean confidence interval for instructions value: -2.17 -1.45
95% mean confidence interval for instructions %-change: -0.83% -0.38%
Instructions are helped.

total cycles in shared programs: 922792268 -> 921495900 (-0.14%)
cycles in affected programs: 400296984 -> 399000616 (-0.32%)
helped: 765
HURT: 635
helped stats (abs) min: 2 max: 77018 x̄: 6739.33 x̃: 60
helped stats (rel) min: <.01% max: 35.59% x̄: 1.98% x̃: 0.32%
HURT stats (abs)   min: 2 max: 88658 x̄: 6077.51 x̃: 152
HURT stats (rel)   min: <.01% max: 51.33% x̄: 2.75% x̃: 0.63%
95% mean confidence interval for cycles value: -1620.41 -231.54
95% mean confidence interval for cycles %-change: -0.10% 0.44%
Inconclusive result (%-change mean confidence interval includes 0).

LOST:   4
GAINED: 3

Skylake
total instructions in shared programs: 18658324 -> 18579715 (-0.42%)
instructions in affected programs: 2089957 -> 2011348 (-3.76%)
helped: 9842
HURT: 23
helped stats (abs) min: 1 max: 24 x̄: 7.99 x̃: 8
helped stats (rel) min: 0.05% max: 40.00% x̄: 5.37% x̃: 4.52%
HURT stats (abs)   min: 1 max: 5 x̄: 1.57 x̃: 1
HURT stats (rel)   min: 0.02% max: 1.28% x̄: 0.36% x̃: 0.24%
95% mean confidence interval for instructions value: -7.98 -7.95
95% mean confidence interval for instructions %-change: -5.43% -5.29%
Instructions are helped.

total cycles in shared programs: 860031654 -> 860237548 (0.02%)
cycles in affected programs: 449175235 -> 449381129 (0.05%)
helped: 7895
HURT: 4416
helped stats (abs) min: 1 max: 14129 x̄: 113.70 x̃: 22
helped stats (rel) min: <.01% max: 40.95% x̄: 1.31% x̃: 0.56%
HURT stats (abs)   min: 1 max: 33397 x̄: 249.89 x̃: 34
HURT stats (rel)   min: <.01% max: 67.47% x̄: 2.65% x̃: 0.65%
95% mean confidence interval for cycles value: 1.46 31.98
95% mean confidence interval for cycles %-change: 0.02% 0.19%
Cycles are HURT.

LOST:   557
GAINED: 900

fossil-db:

Lunar Lake
Totals:
Instrs: 141933621 -> 141884681 (-0.03%); split: -0.03%, +0.00%
Cycle count: 21990657282 -> 21990200212 (-0.00%); split: -0.14%, +0.14%
Spill count: 69754 -> 69732 (-0.03%); split: -0.05%, +0.02%
Fill count: 128559 -> 128521 (-0.03%); split: -0.05%, +0.02%
Scratch Memory Size: 5934080 -> 5925888 (-0.14%)
Max live registers: 48021653 -> 48051253 (+0.06%); split: -0.00%, +0.06%

Totals from 13510 (2.45% of 551410) affected shaders:
Instrs: 19497180 -> 19448240 (-0.25%); split: -0.25%, +0.00%
Cycle count: 2455370202 -> 2454913132 (-0.02%); split: -1.25%, +1.23%
Spill count: 10975 -> 10953 (-0.20%); split: -0.32%, +0.12%
Fill count: 21709 -> 21671 (-0.18%); split: -0.28%, +0.10%
Scratch Memory Size: 674816 -> 666624 (-1.21%)
Max live registers: 2502653 -> 2532253 (+1.18%); split: -0.01%, +1.19%

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 152763523 -> 152772716 (+0.01%); split: -0.00%, +0.01%
Cycle count: 17188701887 -> 17187510768 (-0.01%); split: -0.10%, +0.09%
Spill count: 79280 -> 79279 (-0.00%); split: -0.00%, +0.00%
Fill count: 148809 -> 148803 (-0.00%)
Max live registers: 31879240 -> 31879093 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5559984 -> 5559712 (-0.00%); split: +0.00%, -0.01%

Totals from 20524 (3.24% of 633183) affected shaders:
Instrs: 20366964 -> 20376157 (+0.05%); split: -0.01%, +0.05%
Cycle count: 2406162382 -> 2404971263 (-0.05%); split: -0.68%, +0.63%
Spill count: 19935 -> 19934 (-0.01%); split: -0.02%, +0.01%
Fill count: 34487 -> 34481 (-0.02%)
Max live registers: 1745598 -> 1745451 (-0.01%); split: -0.01%, +0.01%
Max dispatch width: 117992 -> 117720 (-0.23%); split: +0.03%, -0.26%

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
Totals:
Instrs: 150694108 -> 150683859 (-0.01%); split: -0.01%, +0.00%
Cycle count: 15526754059 -> 15529031079 (+0.01%); split: -0.10%, +0.12%
Max live registers: 31791599 -> 31791441 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5569488 -> 5569296 (-0.00%); split: +0.00%, -0.01%

Totals from 15000 (2.37% of 632406) affected shaders:
Instrs: 10965577 -> 10955328 (-0.09%); split: -0.11%, +0.02%
Cycle count: 2025347115 -> 2027624135 (+0.11%); split: -0.80%, +0.91%
Max live registers: 983373 -> 983215 (-0.02%); split: -0.02%, +0.00%
Max dispatch width: 83064 -> 82872 (-0.23%); split: +0.12%, -0.35%

Skylake
Totals:
Instrs: 140588784 -> 140413758 (-0.12%); split: -0.13%, +0.00%
Cycle count: 14724286265 -> 14723402393 (-0.01%); split: -0.04%, +0.04%
Fill count: 100130 -> 100129 (-0.00%)
Max live registers: 31418029 -> 31417146 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5513400 -> 5535192 (+0.40%); split: +0.89%, -0.49%

Totals from 39733 (6.35% of 625986) affected shaders:
Instrs: 17240737 -> 17065711 (-1.02%); split: -1.02%, +0.01%
Cycle count: 1994668203 -> 1993784331 (-0.04%); split: -0.31%, +0.27%
Fill count: 44481 -> 44480 (-0.00%)
Max live registers: 2766781 -> 2765898 (-0.03%); split: -0.03%, +0.00%
Max dispatch width: 210600 -> 232392 (+10.35%); split: +23.23%, -12.89%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
2024-11-08 17:46:45 +00:00
Ian Romanick
be26012f1d brw/opt: Always do copy prop, DCE, and register coalesce after lower_regioning
shader-db:

Lunar Lake
total instructions in shared programs: 18100289 -> 18083853 (-0.09%)
instructions in affected programs: 790048 -> 773612 (-2.08%)
helped: 3058 / HURT: 1

total cycles in shared programs: 921691992 -> 921293816 (-0.04%)
cycles in affected programs: 37210762 -> 36812586 (-1.07%)
helped: 2329 / HURT: 624

LOST:   27
GAINED: 26

Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Meteor Lake shown)
total instructions in shared programs: 19825635 -> 19821391 (-0.02%)
instructions in affected programs: 138675 -> 134431 (-3.06%)
helped: 877 / HURT: 0

total cycles in shared programs: 907900598 -> 907885713 (<.01%)
cycles in affected programs: 7127161 -> 7112276 (-0.21%)
helped: 318 / HURT: 242

total spills in shared programs: 5790 -> 5758 (-0.55%)
spills in affected programs: 660 -> 628 (-4.85%)
helped: 8 / HURT: 0

total fills in shared programs: 6744 -> 6712 (-0.47%)
fills in affected programs: 708 -> 676 (-4.52%)
helped: 8 / HURT: 0

LOST:   10
GAINED: 0

Skylake
total instructions in shared programs: 18722197 -> 18637637 (-0.45%)
instructions in affected programs: 2757553 -> 2672993 (-3.07%)
helped: 12290 / HURT: 1

total cycles in shared programs: 859716039 -> 859432560 (-0.03%)
cycles in affected programs: 113731837 -> 113448358 (-0.25%)
helped: 9555 / HURT: 2422

LOST:   265
GAINED: 714

fossil-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 142000618 -> 141928331 (-0.05%); split: -0.05%, +0.00%
Subgroup size: 10995136 -> 10995072 (-0.00%)
Cycle count: 21994723230 -> 21990481140 (-0.02%); split: -0.08%, +0.06%
Spill count: 69911 -> 69754 (-0.22%); split: -0.23%, +0.00%
Fill count: 128723 -> 128559 (-0.13%); split: -0.15%, +0.02%
Scratch Memory Size: 5936128 -> 5934080 (-0.03%)
Max live registers: 48006880 -> 48020936 (+0.03%); split: -0.01%, +0.04%

Totals from 17450 (3.16% of 551410) affected shaders:
Instrs: 14984149 -> 14911862 (-0.48%); split: -0.48%, +0.00%
Subgroup size: 365744 -> 365680 (-0.02%)
Cycle count: 2585095128 -> 2580853038 (-0.16%); split: -0.71%, +0.54%
Spill count: 20893 -> 20736 (-0.75%); split: -0.76%, +0.00%
Fill count: 44181 -> 44017 (-0.37%); split: -0.44%, +0.07%
Scratch Memory Size: 995328 -> 993280 (-0.21%)
Max live registers: 2378069 -> 2392125 (+0.59%); split: -0.20%, +0.79%

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 150719758 -> 150676269 (-0.03%); split: -0.04%, +0.01%
Subgroup size: 7764560 -> 7764632 (+0.00%)
Cycle count: 15526689814 -> 15525687740 (-0.01%); split: -0.03%, +0.02%
Spill count: 60120 -> 59472 (-1.08%); split: -1.17%, +0.10%
Fill count: 105973 -> 104675 (-1.22%); split: -1.40%, +0.17%
Scratch Memory Size: 2396160 -> 2381824 (-0.60%); split: -0.73%, +0.13%
Max live registers: 31782879 -> 31788857 (+0.02%); split: -0.01%, +0.03%
Max dispatch width: 5569200 -> 5569344 (+0.00%); split: +0.00%, -0.00%

Totals from 10089 (1.60% of 632405) affected shaders:
Instrs: 6389866 -> 6346377 (-0.68%); split: -0.87%, +0.19%
Subgroup size: 102912 -> 102984 (+0.07%)
Cycle count: 681310278 -> 680308204 (-0.15%); split: -0.65%, +0.51%
Spill count: 19571 -> 18923 (-3.31%); split: -3.61%, +0.30%
Fill count: 38229 -> 36931 (-3.40%); split: -3.88%, +0.48%
Scratch Memory Size: 808960 -> 794624 (-1.77%); split: -2.15%, +0.38%
Max live registers: 677473 -> 683451 (+0.88%); split: -0.45%, +1.33%
Max dispatch width: 88672 -> 88816 (+0.16%); split: +0.27%, -0.11%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
2024-11-08 17:46:45 +00:00
Ian Romanick
b2d7a823be brw/lower: Don't emit spurious moves to or from NULL register
Previously an instruction like

    cmp.l.f0.0(16) null:F, v359:F, 0f

would get lowered to

    undef(16) v13703:UD
    cmp.l.f0.0(16) v13703:F, v359:F, 0f
    mov(16) null:UD, v13703:UD

After copy propagation and dead-code elimination are run again, the
original CMP gets turned back into its original form!

Some cases can also emit MOVs from the original NULL register.

It should be possible to not do any lowering here, but there are some
interactions with source lowering passes for things like

    cmp.l.f0.0(16) null:HF, g89.1<16,16,1>:HF, 0hf

What inspired this was... diff'ing step-by-step dumps from
INTEL_DEBUG=optimizer had a lot of useless changes due to these MOVs
and undefs. It was very annoying.  This low-effort change gets the
majority of the possible benefit.

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
2024-11-08 17:46:45 +00:00
Ian Romanick
9aba731d03 brw/cse: Don't eliminate instructions that write flags
With other changes in my tree, I observed this code from
dEQP-VK.subgroups.vote.compute.subgroupallequal_float have the second
cmp.z removed.

    undef(8) %69:UD
    cmp.z.f0.0(8) %69:F, %37:F, %57+0.0<0>:F
    mov(1) v58+0.0:D, 0d NoMask group0
    (+f0.0) mov(1) v58+0.0:D, -1d NoMask group0
    cmp.nz.f0.0(8) null:D, v58+0.0<0>:D, 0d
    ...
    undef(8) %72:UD
    cmp.z.f0.0(8) %72:F, %37:F, %57+0.0<0>:F
    mov(1) v63+0.0:D, 0d NoMask group0
    (+f0.0) mov(1) v63+0.0:D, -1d NoMask group0

This was also fixed by running dead-code elimination before CSE. That
seems more like avoiding the problem than fixing it, though.

I believe this affects shader-db results because leaving the second
CMP in the shader can give more opportunities for cmod propagation.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: 234c45c929 ("intel/brw: Write a new global CSE pass that works on defs")

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total cycles in shared programs: 922097690 -> 922260862 (0.02%)
cycles in affected programs: 3178926 -> 3342098 (5.13%)
helped: 130
HURT: 88
helped stats (abs) min: 2 max: 2194 x̄: 296.71 x̃: 16
helped stats (rel) min: <.01% max: 16.56% x̄: 1.86% x̃: 0.18%
HURT stats (abs)   min: 4 max: 11992 x̄: 2292.55 x̃: 47
HURT stats (rel)   min: 0.04% max: 57.32% x̄: 11.82% x̃: 0.61%
95% mean confidence interval for cycles value: 320.36 1176.63
95% mean confidence interval for cycles %-change: 1.59% 5.73%
Cycles are HURT.

LOST:   2
GAINED: 1

fossil-db:

Lunar Lake, Meteor Lake, Tiger Lake had similar results. (Lunar Lake shown)
Totals:
Instrs: 142022960 -> 142022928 (-0.00%); split: -0.00%, +0.00%
Cycle count: 21995242782 -> 21995384040 (+0.00%); split: -0.00%, +0.00%
Max live registers: 48013385 -> 48013343 (-0.00%)

Totals from 507 (0.09% of 551441) affected shaders:
Instrs: 886191 -> 886159 (-0.00%); split: -0.01%, +0.01%
Cycle count: 69302492 -> 69443750 (+0.20%); split: -0.66%, +0.86%
Max live registers: 94413 -> 94371 (-0.04%)

DG2
Totals:
Instrs: 152856370 -> 152856093 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17237159885 -> 17236804052 (-0.00%); split: -0.00%, +0.00%
Fill count: 150673 -> 150631 (-0.03%)
Max live registers: 31871520 -> 31871476 (-0.00%)

Totals from 506 (0.08% of 633197) affected shaders:
Instrs: 831795 -> 831518 (-0.03%); split: -0.04%, +0.01%
Cycle count: 55578509 -> 55222676 (-0.64%); split: -1.38%, +0.74%
Fill count: 2779 -> 2737 (-1.51%)
Max live registers: 51383 -> 51339 (-0.09%)

Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 152017826 -> 152017793 (-0.00%); split: -0.00%, +0.00%
Cycle count: 15180773451 -> 15180761166 (-0.00%); split: -0.00%, +0.00%
Fill count: 106610 -> 106614 (+0.00%)
Max live registers: 32195006 -> 32194966 (-0.00%)

Totals from 411 (0.06% of 637268) affected shaders:
Instrs: 705935 -> 705902 (-0.00%); split: -0.01%, +0.01%
Cycle count: 47830019 -> 47817734 (-0.03%); split: -0.05%, +0.02%
Fill count: 2865 -> 2869 (+0.14%)
Max live registers: 42883 -> 42843 (-0.09%)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
2024-11-08 17:46:45 +00:00
Ian Romanick
80a5d158ae brw/copy: Don't copy propagate through smaller entry dest size
Copy propagation would incorrectly occur in this code

    mov(16) v4+2.0:UW, u0<0>:UW NoMask
    ...
    mov(8) v6+2.0:UD, v4+2.0:UD NoMask group0

to create

    mov(16) v4+2.0:UW, u0<0>:UW NoMask
    ...
    mov(8) v6+2.0:UD, u0<0>:UD NoMask group0

This has different behavior. I think I just made a mistake when I
changed this condition in e3f502e007.

It seems like this condition could be relaxed to cover cases like (note
the change of destination stride)

    mov(16) v4+2.0<2>:UW, u0<0>:UW NoMask
    ...
    mov(8) v6+2.0:UD, v4+2.0:UD NoMask group0

I'm not sure it's worth it.

No shader-db or fossil-db changes on any Intel platform. Even the code
for the test case mentioned in the original commit did not change.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: e3f502e007 ("intel/fs: Allow copy propagation between MOVs of mixed sizes")
Closes: #12116
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
2024-11-08 17:46:45 +00:00
Ian Romanick
c1c09e3c4a brw/emit: Add correct 3-source instruction assertions for each platform
Specifically, allow two immediate sources for BFE on Gfx12+. I stumbled
on this while trying some stuff with !31852.

v2: Don't be lazy. Add proper assertions for all the things on all the
platforms. Based on a suggestion by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: 7bed11fbde ("intel/brw: Allow immediates in the BFE instruction on Gfx12+")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31858>
2024-11-08 16:48:57 +00:00
Kenneth Graunke
22b511ef02 intel: Set shader_spilling_rate=11 in intel_clc
A while back Matt enabled shader_spilling_rate by default for anv.
But intel_clc doesn't use the driconf mechanism that we use there.

The GRL shaders spill a lot, and with us now compiling additional
generations of the shaders, Mesa build time is getting prohibitively
expensive.  By setting this, we drop the time taken for a clean debug
build by approximately 35% on my current laptop.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31993>
2024-11-06 01:57:10 +00:00
Sviatoslav Peleshko
3a962a28e7 intel/elk_asm: Add BranchCtrl support
We emit it for gfx8, so the assembler should support it too.

Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31747>
2024-11-02 18:01:20 +00:00
Sviatoslav Peleshko
cd4c328408 intel/elk: List all instructions that have BranchCtrl bit
Previously this bit was not clearly documented in PRMs, but gfx12 PRMs
finally list all the instructions where it is present.

Although it's unclear if it's functional for anything other than "if",
"else", and "goto", we probably still should acknowledge its existence
in other instructions.

Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31747>
2024-11-02 18:01:20 +00:00
Sviatoslav Peleshko
445df8d611 intel/brw_asm: Add BranchCtrl support
We emit it for gfx9, so the assembler should support it too.

Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31747>
2024-11-02 18:01:19 +00:00
Sviatoslav Peleshko
aea7366613 intel/brw: List all instructions that have BranchCtrl bit
Previously this bit was not clearly documented in PRMs, but gfx12 PRMs
finally list all the instructions where it is present.

Although it's unclear if it's functional for anything other than "if",
"else", and "goto", we probably still should acknowledge its existence
in other instructions.

Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31747>
2024-11-02 18:01:19 +00:00
Paulo Zanoni
5ca883505e brw: add a NOP in between WHILE instructions on LNL
This is a workaround that is still in progress, see HSD 22020521218.
If we don't have these NOPs we may see rendering corruption or even
GPU hangs.

While we still don't fully understand the issue from the hardware
point of view, let's have this workaround so we can pass CTS and move
things forward. If we need to change this later, we can. Besides, the
impact is minimal. Shaderdb/fossilize report no changes for this
patch.

On our Blackops trace, the lack of this patch causes corruption in fog
rendering (rectangles where fog was supposed to be shown don't show
the fog).

On dEQP-VK.graphicsfuzz.cov-array-copies-loops-with-limiters, without
this patch we get a GPU hang.

Backport-to: 24.2
Testcase: dEQP-VK.graphicsfuzz.cov-array-copies-loops-with-limiters
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11813
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31331>
2024-10-31 23:57:10 +00:00
Iván Briano
13db5fad27 brw: fix task/mesh push constant loading
The InlineData passed to the shader is a fixed size unrelated to the
register size. It happens to match pre-Xe2, but by considering it the
same in Xe2, we ended up reading pushed constants from the wrong place
when they didn't fit in the InlineData.

Fixes: 97b17aa0b1 ("brw/nir: rework inline_data_intel to work with compute")

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31856>
2024-10-26 18:12:41 +00:00
Jordan Justen
35ace9d4e2 intel/compiler: Xe2 and Xe3 use the same compaction tables
Ref: bspec 56709
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31838>
2024-10-26 07:39:30 +00:00
Jordan Justen
688a673c5a intel/brw: Allow Xe3 in brw_stage_has_packed_dispatch()
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31838>
2024-10-26 07:39:30 +00:00
Jordan Justen
cd33b7766a intel/compiler: Add compiler enum for Xe3
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31838>
2024-10-26 07:39:30 +00:00
Ian Romanick
04e1783278 brw: Call brw_fs_opt_algebraic less often
No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31729>
2024-10-25 23:39:36 +00:00
Ian Romanick
ac64b78f1f brw/copy: Perform constant folding with constant propagation
No shader-db or fossil-db changes on any Intel platform.

v2: Simlify the logic for when to try constant folding. Do
commute_immediates before constant folding. Both suggested by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31729>
2024-10-25 23:39:36 +00:00
Ian Romanick
2cc1575a31 brw/algebraic: Refactor constant folding out of brw_fs_opt_algebraic
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31729>
2024-10-25 23:39:36 +00:00
Ian Romanick
5dcad54902 brw/sat: Convert nearly all tests to use new style builders
v2: Use new style builder for second ADD in other_non_saturated_use
too.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31834>
2024-10-25 20:31:45 +00:00
Ian Romanick
19ae7aceb5 brw/sat: Fix small typos, copy and paste, etc.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31834>
2024-10-25 20:31:45 +00:00
Ian Romanick
de45273307 brw/builder: Add new style ALU3 builder
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31834>
2024-10-25 20:31:45 +00:00
Ian Romanick
8329c04521 brw/copy: Don't remove instructions w/ conditional modifier
Fixes: 9e750f00c3 ("intel/brw: Make opt_copy_propagation_defs clean up its own trash")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31834>
2024-10-25 20:31:44 +00:00
Kenneth Graunke
d949d47f09 brw/emit: Fix align16 3src subregister encodings for HF types
Prior to Cherryview, align16 3src instruction sources had to have their
subregister number be DWord-aligned.  Cherryview added a discontiguous
bit in the encoding to represent bit 1 of the subregister number.  This
allows us to use packed HF sources.

Update the ISA encoding helpers to properly handle bit 1.  While we're
at it, make them take a full subregister number and adjust accordingly,
rather than making the callers divide or multiply by some alignment.

Note that the destination subregister must still be DWord aligned, so
HF destinations must be strided.

Thanks to Ian Romanick for discovering that we were botching this.

BSpec: 12054, 12081

v2 (idr): Fix ordering of high and low bit parameters to brw_inst_bits.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31834>
2024-10-25 20:31:44 +00:00
Kenneth Graunke
33cd5a49f1 brw/validate: Return an error for Align16 access mode on Icelake+
Gfx11+ doesn't support Align16 instructions anymore - only Align1 mode.

Bailing early for Align16 is important so that brw_hw_decode_inst
doesn't try to read Align16 related instruction fields on generations
where they no longer exist (which could trigger assertions).

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31834>
2024-10-25 20:31:44 +00:00
Francisco Jerez
e2eba3c7da intel/brw/xe2+: Adjust performance analysis divergence weight due to EU fusion removal.
This reduces the penalty the heuristic gives to SIMD32 shaders
relative to SIMD16 in presence of discard control flow on Xe2+.  The
penalty was meant to account for the inefficient divergence behavior
of SIMD32 shaders on Gfx12.x platforms, since Gfx12 hardware had EUs
bundled in groups of two, and each pair shared control flow logic so
both EUs could only execute instructions in lockstep, which meant that
SIMD32 shaders had an effective warp size of 64 on Gfx12.x.

This change switches back to more optimistic modelling of discard
divergence.  With it we gain about 6% performance in a Shadow of the
Tomb Raider trace (tested on BMG).

One may wonder if there are still workloads that would suffer
materially from enabling SIMD32 for all pixel shaders on Xe2 instead
of using this heuristic, since Xe2 EUs have twice the GRF space, twice
the FPU throughput and better divergence behavior than Xe, but the
answer seems to be yes unfortunately: E.g. Superposition has some
pixel shaders where SIMD32 has substantially worse scheduling due to
the increased number of false dependencies due to higher register
pressure, and using SIMD32 for them reduces performance significantly.
The heuristic seems to model this correctly so it doesn't look like we
can do without it at least right now on Xe2.

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31697>
2024-10-24 22:06:52 +00:00
Kenneth Graunke
7bed11fbde intel/brw: Allow immediates in the BFE instruction on Gfx12+
We weren't allowing immediates in BFE at all.  Gfx12+ supports
immediates in src0 (value) and src2 (width), but not src1 (offset).

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31437>
2024-10-24 21:31:28 +00:00
Daniel Schürmann
87cb42f953 treewide: don't lower to LCSSA before calling nir_divergence_analysis()
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>
2024-10-24 10:06:17 +00:00
Daniel Schürmann
c8348139fd nir: change signature of nir_src_is_divergent()
Now, it takes nir_src * instead of nir_src.
Also move the implementation to nir_divergence_analysis.c.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>
2024-10-24 10:06:17 +00:00
Paulo Zanoni
c0bceaf057 brw: don't emit instruction to add zero in spilling code
When the spill_offset is zero, don't emit an instruction that adds
zero.

Results on MTL:

- Shaders:
instructions helped:   shaders/blender/581.shader_test FS SIMD8:         6760 -> 6759 (-0.01%) (scheduled: none)
instructions helped:   shaders/blender/1017.shader_test FS SIMD8:        6760 -> 6759 (-0.01%) (scheduled: none)
instructions helped:   shaders/blender/1045.shader_test FS SIMD8:        6474 -> 6473 (-0.02%) (scheduled: none)
instructions helped:   shaders/blender/723.shader_test FS SIMD8:         6458 -> 6457 (-0.02%) (scheduled: none)
instructions helped:   shaders/blender/1042.shader_test FS SIMD8:        6458 -> 6457 (-0.02%) (scheduled: none)
instructions helped:   shaders/blender/917.shader_test FS SIMD8:         4900 -> 4897 (-0.06%) (scheduled: none)
instructions helped:   shaders/blender/455.shader_test FS SIMD8:         4832 -> 4829 (-0.06%) (scheduled: none)

cycles helped:   shaders/blender/917.shader_test FS SIMD8:         891856 -> 891832 (<.01%) (scheduled: none)
cycles helped:   shaders/blender/455.shader_test FS SIMD8:         894692 -> 894660 (<.01%) (scheduled: none)

total instructions in shared programs: 1596934 -> 1596923 (<.01%)
instructions in affected programs: 42642 -> 42631 (-0.03%)
helped: 7
HURT: 0

- Fossils:

Instrs: 151744378 -> 151741213 (-0.00%)
Cycle count: 16007811131 -> 16007643963 (-0.00%); split: -0.00%, +0.00%

Totals from 1353 (0.21% of 632545) affected shaders:
Instrs: 3925143 -> 3921978 (-0.08%)
Cycle count: 2292838118 -> 2292670950 (-0.01%); split: -0.01%, +0.00%

 RELATIVE IMPROVEMENTS - Instrs                                      Before     After      Delta      Percentage
 mesa/benchmarks/gravity_mark/3e9c48cebaddf012/cs/0                  1947       1941       -6         -0.31%
 mesa/steam-native/red_dead_redemption2/571534e21fb7bd2a/fs.8/0      3431       3421       -10        -0.29%
 mesa/steam-dxvk/batman_arkham_city_goty/d783eacc9ebe324d/fs.8/0     717        715        -2         -0.28%
 mesa/steam-dxvk/batman_arkham_city_goty/14e0878a6a9605c9/fs.8/0     724        722        -2         -0.28%
 mesa/steam-dxvk/batman_arkham_city_goty/d859c2ae858269dc/fs.8/0     744        742        -2         -0.27%
 mesa/steam-dxvk/total_war_warhammer3/18b9d4a3b1961616/vs/0          1539       1535       -4         -0.26%
 mesa/steam-dxvk/total_war_warhammer3/a21827ce57dc0e29/vs/0          1539       1535       -4         -0.26%
(and a bunch of others where the delta is -2, -4 or -6)

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31694>
2024-10-23 20:19:48 +00:00
Sviatoslav Peleshko
ebd6738260 intel/elk/chv: Implement WaClearArfDependenciesBeforeEot
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31746>
2024-10-23 15:02:27 +00:00
Sviatoslav Peleshko
2a4efe21c5 intel/brw/gfx9: Implement WaClearArfDependenciesBeforeEot
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11928
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31746>
2024-10-23 15:02:27 +00:00
Kenneth Graunke
834b919f6a brw: Optimize 16-bit texture fetches later
At the point we were calling this, we hadn't necessarily cleaned up
derefs via nir_lower_vars_to_ssa, nor movs/vecs via copy propagation,
so it wasn't necessarily easy for this pass to see the actual usage of
the destination.

Moving this later allows us to detect f2f32(txf(...)) and avoid
converting it to a 16-bit txf (why convert with ALU instructions
when the sampler could do it for us?).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31750>
2024-10-22 01:15:10 +00:00
Caio Oliveira
019770f026 intel/brw: Add SHADER_OPCODE_VOTE_*
Add opcodes for VOTE_ALL, VOTE_ANY and VOTE_EQUAL.  The first two
are also used for the quad variants.  Move their lowering from
NIR conversion to brw_lower_subgroup_ops.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31029>
2024-10-19 02:44:20 +00:00
Caio Oliveira
f20df2984d intel/brw: Ensure BROADCAST() value respect register alignment
If we have a non-register-aligned source, MOV it to a new register
so that the invariant expected when generating SHADER_OPCODE_BROADCAST
is respected.

Added to ensure a later patch won't hit the `src.subnr == 0` assertion
in brw_broadcast() generation code.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31029>
2024-10-19 02:44:20 +00:00
Caio Oliveira
d97381efd8 intel/brw: Add fs_builder::BROADCAST() helper
Include in the helper which already take care of using exec_all() and
taking the first component of the result.  Both are expected by
SHADER_OPCODE_BROADCAST.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31029>
2024-10-19 02:44:20 +00:00
Lionel Landwerlin
608d521086 elk: Don't apply discard_if condition opt if it can change results
Replicates the change from 57344052b6 ("intel/brw: Don't apply
discard_if condition opt if it can change results")

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0ba9497e66 ("intel/fs: Improve discard_if code generation")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31604>
2024-10-18 01:57:58 +00:00
Lionel Landwerlin
97b17aa0b1 brw/nir: rework inline_data_intel to work with compute
This intrinsic was initially dedicated to mesh/task shaders, but the
mechanism it exposes also exists in the compute shaders on Gfx12.5+.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>
2024-10-17 19:35:59 +00:00
Lionel Landwerlin
1dc125338e brw: fix mesh fence emission
In SIMD32, the fence instruction is currently going to read grf0-3
leading to such assertions in the backend :

 ../src/intel/compiler/brw_fs_reg_allocate.cpp:206:
   void fs_visitor::calculate_payload_ranges(bool, unsigned int, int*) const:
     Assertion `j < payload_node_count' failed.

The reason we haven't seen the problem yet is that there always enough
payload register to accomodate this. But the following change is going
to make the inline parameter register optional.

Since SHADER_OPCODE_MEMORY_FENCE is emitted in the generator as SIMD1
NoMask (see brw_memory_fence), we can limit ourselves to SIMD1
exec_all() in the IR as well so that the IR accounts for grf0 as a
source.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>
2024-10-17 19:35:59 +00:00
Lionel Landwerlin
b2c5ca0ade brw: remove rebuild single element special case
No shader-db difference on DG2.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>
2024-10-17 19:35:59 +00:00
Lionel Landwerlin
19eb601cfc brw: avoid clashing nested loop indices
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>
2024-10-17 19:35:59 +00:00
Lionel Landwerlin
f5d123b977 brw: delay printf lowering
Useful to insert debug traces a bit later in the lowering process (in
particular after load/store vectorization).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>
2024-10-17 19:35:59 +00:00
Lionel Landwerlin
be3f62af15 brw: remove unused prototype
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31508>
2024-10-17 19:35:59 +00:00
Georg Lehmann
cba575f4df nir: always emit ddx intrinsics
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31014>
2024-10-17 09:50:19 +00:00
Georg Lehmann
6cb6bc7133 elk: remove alu fddx/fddy check
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31014>
2024-10-17 09:50:19 +00:00
Kenneth Graunke
4cb67cb07a intel/brw: Use whole 512-bit registers in constant combining on Xe2
Xe2 increased the register size from 256-bits to 512-bits.  So we can
store 32 16-bit values in a register, rather than 16 values.  Prior to
this patch, we hadn't updated the pass, so the second half of each of
our registers was unused.

Backport-to: 24.2
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31499>
2024-10-15 18:14:37 +00:00
Kenneth Graunke
d9e5022650 intel/brw: Delete more Gfx8 code from brw_fs_combine_constants
These platforms are supported by elk, not brw.

Backport-to: 24.2
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31499>
2024-10-15 18:14:37 +00:00