Commit graph

197436 commits

Author SHA1 Message Date
Ian Romanick
7aad19ccd2 brw/lower: Lower invalid source conversion to better code
There are two fragment shaders from RDR2 that is hurt for spills and
fills on Lunar Lake.

    Totals from 2 (0.00% of 551413) affected shaders:
    Spill count: 1252 -> 1317 (+5.19%)
    Fill count: 2518 -> 2642 (+4.92%)

Those shaders... have a lot of room for improvement. There are some
patterns in those shaders that we handle very, very poorly. Improving
those patterns would likely improve the spills and fills in these
shaders quite dramatically.

Given how much other platforms are helped, I don't this should block
this commit.

No shader-db or fossil-db changes on any pre-Gfx12.5 Intel platforms.

v2: Add some comments and an additional assertion. Suggested by Ken.

shader-db:

Lunar Lake
total instructions in shared programs: 18094517 -> 18094511 (<.01%)
instructions in affected programs: 809 -> 803 (-0.74%)
helped: 6 / HURT: 0

total cycles in shared programs: 921532158 -> 921532168 (<.01%)
cycles in affected programs: 2266 -> 2276 (0.44%)
helped: 0 / HURT: 3

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19820845 -> 19820839 (<.01%)
instructions in affected programs: 803 -> 797 (-0.75%)
helped: 6 / HURT: 0

total cycles in shared programs: 906372999 -> 906372949 (<.01%)
cycles in affected programs: 3216 -> 3166 (-1.55%)
helped: 6 / HURT: 0

fossil-db:

Lunar Lake
Totals:
Instrs: 141887377 -> 141884465 (-0.00%); split: -0.00%, +0.00%
Cycle count: 21990301498 -> 21990267232 (-0.00%); split: -0.00%, +0.00%
Spill count: 69732 -> 69797 (+0.09%)
Fill count: 128521 -> 128645 (+0.10%)

Totals from 349 (0.06% of 551413) affected shaders:
Instrs: 506117 -> 503205 (-0.58%); split: -0.79%, +0.21%
Cycle count: 32362996 -> 32328730 (-0.11%); split: -0.52%, +0.41%
Spill count: 1951 -> 2016 (+3.33%)
Fill count: 4899 -> 5023 (+2.53%)

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 152773732 -> 152761383 (-0.01%); split: -0.01%, +0.00%
Cycle count: 17187529968 -> 17187450663 (-0.00%); split: -0.00%, +0.00%
Spill count: 79279 -> 79003 (-0.35%)
Fill count: 148803 -> 147942 (-0.58%)
Scratch Memory Size: 3949568 -> 3946496 (-0.08%)
Max live registers: 31879325 -> 31879230 (-0.00%)

Totals from 366 (0.06% of 633185) affected shaders:
Instrs: 557377 -> 545028 (-2.22%); split: -2.22%, +0.01%
Cycle count: 26171205 -> 26091900 (-0.30%); split: -0.54%, +0.24%
Spill count: 3238 -> 2962 (-8.52%)
Fill count: 10018 -> 9157 (-8.59%)
Scratch Memory Size: 257024 -> 253952 (-1.20%)
Max live registers: 28187 -> 28092 (-0.34%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
2024-11-08 17:46:45 +00:00
Ian Romanick
2a57568ebd brw/build: Add scalar_group() helper
Some uses of the old pattern still exist. The use in brw_fs_nir.cpp is
deleted by commits !29884. The use in brw_lower_logical_sends.cpp seems
different, so I decided to keep it.

The next commit wants to use this.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
2024-11-08 17:46:45 +00:00
Ian Romanick
5dfea87623 brw/opt: Always do both kinds of copy propagation before lower_load_payload
shader-db:

All Intel platforms except Skylake had similar results. (Lunar Lake shown)
total instructions in shared programs: 18092932 -> 18092713 (<.01%)
instructions in affected programs: 139290 -> 139071 (-0.16%)
helped: 103
HURT: 18
helped stats (abs) min: 1 max: 8 x̄: 2.43 x̃: 2
helped stats (rel) min: 0.02% max: 9.09% x̄: 0.73% x̃: 0.29%
HURT stats (abs)   min: 1 max: 5 x̄: 1.72 x̃: 1
HURT stats (rel)   min: 0.02% max: 0.55% x̄: 0.10% x̃: 0.08%
95% mean confidence interval for instructions value: -2.17 -1.45
95% mean confidence interval for instructions %-change: -0.83% -0.38%
Instructions are helped.

total cycles in shared programs: 922792268 -> 921495900 (-0.14%)
cycles in affected programs: 400296984 -> 399000616 (-0.32%)
helped: 765
HURT: 635
helped stats (abs) min: 2 max: 77018 x̄: 6739.33 x̃: 60
helped stats (rel) min: <.01% max: 35.59% x̄: 1.98% x̃: 0.32%
HURT stats (abs)   min: 2 max: 88658 x̄: 6077.51 x̃: 152
HURT stats (rel)   min: <.01% max: 51.33% x̄: 2.75% x̃: 0.63%
95% mean confidence interval for cycles value: -1620.41 -231.54
95% mean confidence interval for cycles %-change: -0.10% 0.44%
Inconclusive result (%-change mean confidence interval includes 0).

LOST:   4
GAINED: 3

Skylake
total instructions in shared programs: 18658324 -> 18579715 (-0.42%)
instructions in affected programs: 2089957 -> 2011348 (-3.76%)
helped: 9842
HURT: 23
helped stats (abs) min: 1 max: 24 x̄: 7.99 x̃: 8
helped stats (rel) min: 0.05% max: 40.00% x̄: 5.37% x̃: 4.52%
HURT stats (abs)   min: 1 max: 5 x̄: 1.57 x̃: 1
HURT stats (rel)   min: 0.02% max: 1.28% x̄: 0.36% x̃: 0.24%
95% mean confidence interval for instructions value: -7.98 -7.95
95% mean confidence interval for instructions %-change: -5.43% -5.29%
Instructions are helped.

total cycles in shared programs: 860031654 -> 860237548 (0.02%)
cycles in affected programs: 449175235 -> 449381129 (0.05%)
helped: 7895
HURT: 4416
helped stats (abs) min: 1 max: 14129 x̄: 113.70 x̃: 22
helped stats (rel) min: <.01% max: 40.95% x̄: 1.31% x̃: 0.56%
HURT stats (abs)   min: 1 max: 33397 x̄: 249.89 x̃: 34
HURT stats (rel)   min: <.01% max: 67.47% x̄: 2.65% x̃: 0.65%
95% mean confidence interval for cycles value: 1.46 31.98
95% mean confidence interval for cycles %-change: 0.02% 0.19%
Cycles are HURT.

LOST:   557
GAINED: 900

fossil-db:

Lunar Lake
Totals:
Instrs: 141933621 -> 141884681 (-0.03%); split: -0.03%, +0.00%
Cycle count: 21990657282 -> 21990200212 (-0.00%); split: -0.14%, +0.14%
Spill count: 69754 -> 69732 (-0.03%); split: -0.05%, +0.02%
Fill count: 128559 -> 128521 (-0.03%); split: -0.05%, +0.02%
Scratch Memory Size: 5934080 -> 5925888 (-0.14%)
Max live registers: 48021653 -> 48051253 (+0.06%); split: -0.00%, +0.06%

Totals from 13510 (2.45% of 551410) affected shaders:
Instrs: 19497180 -> 19448240 (-0.25%); split: -0.25%, +0.00%
Cycle count: 2455370202 -> 2454913132 (-0.02%); split: -1.25%, +1.23%
Spill count: 10975 -> 10953 (-0.20%); split: -0.32%, +0.12%
Fill count: 21709 -> 21671 (-0.18%); split: -0.28%, +0.10%
Scratch Memory Size: 674816 -> 666624 (-1.21%)
Max live registers: 2502653 -> 2532253 (+1.18%); split: -0.01%, +1.19%

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 152763523 -> 152772716 (+0.01%); split: -0.00%, +0.01%
Cycle count: 17188701887 -> 17187510768 (-0.01%); split: -0.10%, +0.09%
Spill count: 79280 -> 79279 (-0.00%); split: -0.00%, +0.00%
Fill count: 148809 -> 148803 (-0.00%)
Max live registers: 31879240 -> 31879093 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5559984 -> 5559712 (-0.00%); split: +0.00%, -0.01%

Totals from 20524 (3.24% of 633183) affected shaders:
Instrs: 20366964 -> 20376157 (+0.05%); split: -0.01%, +0.05%
Cycle count: 2406162382 -> 2404971263 (-0.05%); split: -0.68%, +0.63%
Spill count: 19935 -> 19934 (-0.01%); split: -0.02%, +0.01%
Fill count: 34487 -> 34481 (-0.02%)
Max live registers: 1745598 -> 1745451 (-0.01%); split: -0.01%, +0.01%
Max dispatch width: 117992 -> 117720 (-0.23%); split: +0.03%, -0.26%

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
Totals:
Instrs: 150694108 -> 150683859 (-0.01%); split: -0.01%, +0.00%
Cycle count: 15526754059 -> 15529031079 (+0.01%); split: -0.10%, +0.12%
Max live registers: 31791599 -> 31791441 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5569488 -> 5569296 (-0.00%); split: +0.00%, -0.01%

Totals from 15000 (2.37% of 632406) affected shaders:
Instrs: 10965577 -> 10955328 (-0.09%); split: -0.11%, +0.02%
Cycle count: 2025347115 -> 2027624135 (+0.11%); split: -0.80%, +0.91%
Max live registers: 983373 -> 983215 (-0.02%); split: -0.02%, +0.00%
Max dispatch width: 83064 -> 82872 (-0.23%); split: +0.12%, -0.35%

Skylake
Totals:
Instrs: 140588784 -> 140413758 (-0.12%); split: -0.13%, +0.00%
Cycle count: 14724286265 -> 14723402393 (-0.01%); split: -0.04%, +0.04%
Fill count: 100130 -> 100129 (-0.00%)
Max live registers: 31418029 -> 31417146 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5513400 -> 5535192 (+0.40%); split: +0.89%, -0.49%

Totals from 39733 (6.35% of 625986) affected shaders:
Instrs: 17240737 -> 17065711 (-1.02%); split: -1.02%, +0.01%
Cycle count: 1994668203 -> 1993784331 (-0.04%); split: -0.31%, +0.27%
Fill count: 44481 -> 44480 (-0.00%)
Max live registers: 2766781 -> 2765898 (-0.03%); split: -0.03%, +0.00%
Max dispatch width: 210600 -> 232392 (+10.35%); split: +23.23%, -12.89%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
2024-11-08 17:46:45 +00:00
Ian Romanick
be26012f1d brw/opt: Always do copy prop, DCE, and register coalesce after lower_regioning
shader-db:

Lunar Lake
total instructions in shared programs: 18100289 -> 18083853 (-0.09%)
instructions in affected programs: 790048 -> 773612 (-2.08%)
helped: 3058 / HURT: 1

total cycles in shared programs: 921691992 -> 921293816 (-0.04%)
cycles in affected programs: 37210762 -> 36812586 (-1.07%)
helped: 2329 / HURT: 624

LOST:   27
GAINED: 26

Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Meteor Lake shown)
total instructions in shared programs: 19825635 -> 19821391 (-0.02%)
instructions in affected programs: 138675 -> 134431 (-3.06%)
helped: 877 / HURT: 0

total cycles in shared programs: 907900598 -> 907885713 (<.01%)
cycles in affected programs: 7127161 -> 7112276 (-0.21%)
helped: 318 / HURT: 242

total spills in shared programs: 5790 -> 5758 (-0.55%)
spills in affected programs: 660 -> 628 (-4.85%)
helped: 8 / HURT: 0

total fills in shared programs: 6744 -> 6712 (-0.47%)
fills in affected programs: 708 -> 676 (-4.52%)
helped: 8 / HURT: 0

LOST:   10
GAINED: 0

Skylake
total instructions in shared programs: 18722197 -> 18637637 (-0.45%)
instructions in affected programs: 2757553 -> 2672993 (-3.07%)
helped: 12290 / HURT: 1

total cycles in shared programs: 859716039 -> 859432560 (-0.03%)
cycles in affected programs: 113731837 -> 113448358 (-0.25%)
helped: 9555 / HURT: 2422

LOST:   265
GAINED: 714

fossil-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 142000618 -> 141928331 (-0.05%); split: -0.05%, +0.00%
Subgroup size: 10995136 -> 10995072 (-0.00%)
Cycle count: 21994723230 -> 21990481140 (-0.02%); split: -0.08%, +0.06%
Spill count: 69911 -> 69754 (-0.22%); split: -0.23%, +0.00%
Fill count: 128723 -> 128559 (-0.13%); split: -0.15%, +0.02%
Scratch Memory Size: 5936128 -> 5934080 (-0.03%)
Max live registers: 48006880 -> 48020936 (+0.03%); split: -0.01%, +0.04%

Totals from 17450 (3.16% of 551410) affected shaders:
Instrs: 14984149 -> 14911862 (-0.48%); split: -0.48%, +0.00%
Subgroup size: 365744 -> 365680 (-0.02%)
Cycle count: 2585095128 -> 2580853038 (-0.16%); split: -0.71%, +0.54%
Spill count: 20893 -> 20736 (-0.75%); split: -0.76%, +0.00%
Fill count: 44181 -> 44017 (-0.37%); split: -0.44%, +0.07%
Scratch Memory Size: 995328 -> 993280 (-0.21%)
Max live registers: 2378069 -> 2392125 (+0.59%); split: -0.20%, +0.79%

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 150719758 -> 150676269 (-0.03%); split: -0.04%, +0.01%
Subgroup size: 7764560 -> 7764632 (+0.00%)
Cycle count: 15526689814 -> 15525687740 (-0.01%); split: -0.03%, +0.02%
Spill count: 60120 -> 59472 (-1.08%); split: -1.17%, +0.10%
Fill count: 105973 -> 104675 (-1.22%); split: -1.40%, +0.17%
Scratch Memory Size: 2396160 -> 2381824 (-0.60%); split: -0.73%, +0.13%
Max live registers: 31782879 -> 31788857 (+0.02%); split: -0.01%, +0.03%
Max dispatch width: 5569200 -> 5569344 (+0.00%); split: +0.00%, -0.00%

Totals from 10089 (1.60% of 632405) affected shaders:
Instrs: 6389866 -> 6346377 (-0.68%); split: -0.87%, +0.19%
Subgroup size: 102912 -> 102984 (+0.07%)
Cycle count: 681310278 -> 680308204 (-0.15%); split: -0.65%, +0.51%
Spill count: 19571 -> 18923 (-3.31%); split: -3.61%, +0.30%
Fill count: 38229 -> 36931 (-3.40%); split: -3.88%, +0.48%
Scratch Memory Size: 808960 -> 794624 (-1.77%); split: -2.15%, +0.38%
Max live registers: 677473 -> 683451 (+0.88%); split: -0.45%, +1.33%
Max dispatch width: 88672 -> 88816 (+0.16%); split: +0.27%, -0.11%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
2024-11-08 17:46:45 +00:00
Ian Romanick
b2d7a823be brw/lower: Don't emit spurious moves to or from NULL register
Previously an instruction like

    cmp.l.f0.0(16) null:F, v359:F, 0f

would get lowered to

    undef(16) v13703:UD
    cmp.l.f0.0(16) v13703:F, v359:F, 0f
    mov(16) null:UD, v13703:UD

After copy propagation and dead-code elimination are run again, the
original CMP gets turned back into its original form!

Some cases can also emit MOVs from the original NULL register.

It should be possible to not do any lowering here, but there are some
interactions with source lowering passes for things like

    cmp.l.f0.0(16) null:HF, g89.1<16,16,1>:HF, 0hf

What inspired this was... diff'ing step-by-step dumps from
INTEL_DEBUG=optimizer had a lot of useless changes due to these MOVs
and undefs. It was very annoying.  This low-effort change gets the
majority of the possible benefit.

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
2024-11-08 17:46:45 +00:00
Ian Romanick
9aba731d03 brw/cse: Don't eliminate instructions that write flags
With other changes in my tree, I observed this code from
dEQP-VK.subgroups.vote.compute.subgroupallequal_float have the second
cmp.z removed.

    undef(8) %69:UD
    cmp.z.f0.0(8) %69:F, %37:F, %57+0.0<0>:F
    mov(1) v58+0.0:D, 0d NoMask group0
    (+f0.0) mov(1) v58+0.0:D, -1d NoMask group0
    cmp.nz.f0.0(8) null:D, v58+0.0<0>:D, 0d
    ...
    undef(8) %72:UD
    cmp.z.f0.0(8) %72:F, %37:F, %57+0.0<0>:F
    mov(1) v63+0.0:D, 0d NoMask group0
    (+f0.0) mov(1) v63+0.0:D, -1d NoMask group0

This was also fixed by running dead-code elimination before CSE. That
seems more like avoiding the problem than fixing it, though.

I believe this affects shader-db results because leaving the second
CMP in the shader can give more opportunities for cmod propagation.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: 234c45c929 ("intel/brw: Write a new global CSE pass that works on defs")

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total cycles in shared programs: 922097690 -> 922260862 (0.02%)
cycles in affected programs: 3178926 -> 3342098 (5.13%)
helped: 130
HURT: 88
helped stats (abs) min: 2 max: 2194 x̄: 296.71 x̃: 16
helped stats (rel) min: <.01% max: 16.56% x̄: 1.86% x̃: 0.18%
HURT stats (abs)   min: 4 max: 11992 x̄: 2292.55 x̃: 47
HURT stats (rel)   min: 0.04% max: 57.32% x̄: 11.82% x̃: 0.61%
95% mean confidence interval for cycles value: 320.36 1176.63
95% mean confidence interval for cycles %-change: 1.59% 5.73%
Cycles are HURT.

LOST:   2
GAINED: 1

fossil-db:

Lunar Lake, Meteor Lake, Tiger Lake had similar results. (Lunar Lake shown)
Totals:
Instrs: 142022960 -> 142022928 (-0.00%); split: -0.00%, +0.00%
Cycle count: 21995242782 -> 21995384040 (+0.00%); split: -0.00%, +0.00%
Max live registers: 48013385 -> 48013343 (-0.00%)

Totals from 507 (0.09% of 551441) affected shaders:
Instrs: 886191 -> 886159 (-0.00%); split: -0.01%, +0.01%
Cycle count: 69302492 -> 69443750 (+0.20%); split: -0.66%, +0.86%
Max live registers: 94413 -> 94371 (-0.04%)

DG2
Totals:
Instrs: 152856370 -> 152856093 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17237159885 -> 17236804052 (-0.00%); split: -0.00%, +0.00%
Fill count: 150673 -> 150631 (-0.03%)
Max live registers: 31871520 -> 31871476 (-0.00%)

Totals from 506 (0.08% of 633197) affected shaders:
Instrs: 831795 -> 831518 (-0.03%); split: -0.04%, +0.01%
Cycle count: 55578509 -> 55222676 (-0.64%); split: -1.38%, +0.74%
Fill count: 2779 -> 2737 (-1.51%)
Max live registers: 51383 -> 51339 (-0.09%)

Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 152017826 -> 152017793 (-0.00%); split: -0.00%, +0.00%
Cycle count: 15180773451 -> 15180761166 (-0.00%); split: -0.00%, +0.00%
Fill count: 106610 -> 106614 (+0.00%)
Max live registers: 32195006 -> 32194966 (-0.00%)

Totals from 411 (0.06% of 637268) affected shaders:
Instrs: 705935 -> 705902 (-0.00%); split: -0.01%, +0.01%
Cycle count: 47830019 -> 47817734 (-0.03%); split: -0.05%, +0.02%
Fill count: 2865 -> 2869 (+0.14%)
Max live registers: 42883 -> 42843 (-0.09%)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
2024-11-08 17:46:45 +00:00
Ian Romanick
80a5d158ae brw/copy: Don't copy propagate through smaller entry dest size
Copy propagation would incorrectly occur in this code

    mov(16) v4+2.0:UW, u0<0>:UW NoMask
    ...
    mov(8) v6+2.0:UD, v4+2.0:UD NoMask group0

to create

    mov(16) v4+2.0:UW, u0<0>:UW NoMask
    ...
    mov(8) v6+2.0:UD, u0<0>:UD NoMask group0

This has different behavior. I think I just made a mistake when I
changed this condition in e3f502e007.

It seems like this condition could be relaxed to cover cases like (note
the change of destination stride)

    mov(16) v4+2.0<2>:UW, u0<0>:UW NoMask
    ...
    mov(8) v6+2.0:UD, v4+2.0:UD NoMask group0

I'm not sure it's worth it.

No shader-db or fossil-db changes on any Intel platform. Even the code
for the test case mentioned in the original commit did not change.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: e3f502e007 ("intel/fs: Allow copy propagation between MOVs of mixed sizes")
Closes: #12116
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
2024-11-08 17:46:45 +00:00
Samuel Pitoiset
ced2404cb4 vulkan/runtime: return same cmdbuf level from the command pool freelist
This fixes a performance issue on RADV because secondaries are
allocated in GTT instead of VRAM for primaries.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12119
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32010>
2024-11-08 17:20:43 +00:00
Ian Romanick
c1c09e3c4a brw/emit: Add correct 3-source instruction assertions for each platform
Specifically, allow two immediate sources for BFE on Gfx12+. I stumbled
on this while trying some stuff with !31852.

v2: Don't be lazy. Add proper assertions for all the things on all the
platforms. Based on a suggestion by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: 7bed11fbde ("intel/brw: Allow immediates in the BFE instruction on Gfx12+")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31858>
2024-11-08 16:48:57 +00:00
Gurchetan Singh
aebc6c974f gfxstream: use vulkan_lite_runtime
This results in faster compiles.

Reviewed-by: Marcin Radomski <dextero@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32015>
2024-11-08 08:10:09 -08:00
Gurchetan Singh
dd5244e6ac gfxstream: nuke android::base::SubAllocator
Use u_mm, one less dependency on libaemu v0.1.2

Reviewed-by: Marcin Radomski <dextero@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32015>
2024-11-08 08:10:05 -08:00
Gurchetan Singh
6a9eb986c2 gfxstream: move isHostVisible function
It's separable from the rest of CoherentMemory class.

Reviewed-by: Marcin Radomski <dextero@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32015>
2024-11-08 08:10:00 -08:00
Gurchetan Singh
5d299a0bd4 util: add c++ guards to u_mm.h
Needed for gfxstream.

Reviewed-by: Marcin Radomski <dextero@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32015>
2024-11-08 08:09:49 -08:00
Hans-Kristian Arntzen
5f70858ece vulkan/wsi/wayland: Use X11-style image count strategy when using FIFO.
This is required, otherwise we regress latency in cases where
applications are using FIFO without explicit KHR_present_wait.
This is an unacceptable regression.

The fix is to normalize the behavior to X11 WSI.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Fixes: d052b0201e ("vulkan/wsi/wayland: Use fifo protocol for FIFO")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32029>
2024-11-08 14:28:08 +00:00
Samuel Pitoiset
437bd63265 radv,aco: dump m0 and exec from the trap handler
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32026>
2024-11-08 14:00:15 +00:00
Samuel Pitoiset
d1d41be43f aco: declare phys regs for tba_hi/tma_hi
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32026>
2024-11-08 14:00:15 +00:00
Samuel Pitoiset
13bab450a2 aco: fix storing SQ_WAVE_STATUS in the trap handler shader
SQ_WAVE_STATUS can change inside the trap because of SCC.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32026>
2024-11-08 14:00:14 +00:00
Samuel Pitoiset
494050d2ea aco: add a helper to dump SGPR to memory for the trap handler
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32026>
2024-11-08 14:00:14 +00:00
Samuel Pitoiset
8c6f2fef1b aco: use scalar buffer stores for dumping SGPRS from the trap on GFX8
This avoids using any VGPRs on GFX8.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32026>
2024-11-08 14:00:14 +00:00
Samuel Pitoiset
17f6b4e51e aco: save/restore SCC in the trap handler shader
SCC is only updated on GFX9+ but let's do it by default because the
trap handler shader is likely going to be more and more complex over
time.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32026>
2024-11-08 14:00:14 +00:00
Samuel Pitoiset
7b4386facd aco: cleanup using fixed registers in the trap handler shader
It's easier to read and potentially less error prone.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32026>
2024-11-08 14:00:14 +00:00
Pierre-Eric Pelloux-Prayer
9c3ac69568 ac/perfcounter: fix buffer overflow
If block->b->selectors is larger than 999, "+ 4" is not enough.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31841>
2024-11-08 13:31:02 +00:00
Pierre-Eric Pelloux-Prayer
8467f57e30 radeonsi/tests: update expected results
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31841>
2024-11-08 13:31:02 +00:00
Pierre-Eric Pelloux-Prayer
cce45dc0bf ac: switch AMD_FORCE_FAMILY handling to using ac_fake_hw_db
ac_fake_hw_db can be the single place where radeon_info content
is emulated when overriding the GPU type.

For some fields we need to avoid overriding them with the value
coming from the ioctls to get the correct behavior.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31841>
2024-11-08 13:31:02 +00:00
Pierre-Eric Pelloux-Prayer
c097c37455 ac: add 'polaris12' gpu to ac_fake_hw_db
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31841>
2024-11-08 13:31:02 +00:00
Pierre-Eric Pelloux-Prayer
1c31cec31e ac: rename ac_surface_test_common -> ac_fake_hw_db
The next commit will reuse the radeon_info when AMD_FORCE_FAMILY
is used.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31841>
2024-11-08 13:31:02 +00:00
Pierre-Eric Pelloux-Prayer
2ff67083e5 radeonsi: refuse to import texture with family_overriden being set
If the gfx version is overriden by the exporter process, the surface
layout might not be compatible with the importer process (which uses
the real gfx version).

So fail early, except if the layout is LINEAR (because it should work
on all gen) or a modifier is used (which should be rejected elsewhere
if not supported).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31841>
2024-11-08 13:31:02 +00:00
Pierre-Eric Pelloux-Prayer
9d0aba1f97 ac/surface: add flags to surface metadata
Instead of increasing the version number to describe which fields
are set, use the lower 16 bits for the metadata format version,
and the other bits as flags.
This way the version number defines the layout, and the flags
tell which values are set.

The format version is bumped to 3 (= can have flags), and 2 flags
are defined:
* AC_SURF_METADATA_FLAG_EXTRA_MD_BIT: replaces what was version
  number = 2. This means the metadata contains extra information
  for tools.
* AC_SURF_METADATA_FLAG_FAMILY_OVERRIDEN_BIT: if set, it means the
  surface was allocated from a context that used an overriden gfx
  family. This allows the importer process to fail the import early,
  as the surface is likely to be invalid.
  It also adds an extra dw at the end, to store the fake family.

This is a breaking change for existing code that interpreted
"version > 1" as 2, but only in one case:
AC_SURF_METADATA_FLAG_FAMILY_OVERRIDEN_BIT being set, but not
AC_SURF_METADATA_FLAG_EXTRA_MD_BIT, which produces a version number
of 0x20001 but there's not extra data.
I think this is ok, since both gfx family overriding and extra_md
are debugging tools.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31841>
2024-11-08 13:31:02 +00:00
Pierre-Eric Pelloux-Prayer
acc32cadf5 radv: set info->family_overridden when RADV_FORCE_FAMILY is used
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31841>
2024-11-08 13:31:02 +00:00
Karol Herbst
3154920c36 gallium: drop PIPE_SHADER_IR_NIR_SERIALIZED
It's not used anymore

Acked-by: David Heidelberg <david@ixit.cz>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27783>
2024-11-08 12:49:23 +00:00
Karol Herbst
80c4ffb61a clover: drop support for nir drivers
People had enough time to migrate to rusticl, also nobody would support
this anyway anymore.

Acked-by: David Heidelberg <david@ixit.cz>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27783>
2024-11-08 12:49:23 +00:00
Karol Herbst
277925471e nvc0: return NULL instead of asserting in nvc0_resource_from_user_memory
Fixes: 212f1ab40e ("nvc0: support PIPE_CAP_RESOURCE_FROM_USER_MEMORY_COMPUTE_ONLY")
Acked-by: David Heidelberg <david@ixit.cz>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27783>
2024-11-08 12:49:23 +00:00
Corentin Noël
89d709a43e virgl: Propagate the GL_MAX_stage_SHADER_STORAGE_BLOCKS for each stage
Some hardware have a higher number in the computer stage than in others, let's
simply propagate everything when possible.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12003

Signed-off-by: Corentin Noël <corentin.noel@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31666>
2024-11-08 12:26:06 +00:00
Collabora's Gfx CI Team
85d25cc5c8 Uprev Piglit to eebe1b555f51dbb702f696d08ad5ae8153bcdcdd
c2b3133392...eebe1b555f

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32020>
2024-11-08 11:21:05 +00:00
David Rosca
79b12001fd radeonsi/vcn: Stop clearing decode internal buffers
FW will clear them if needed.

Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31677>
2024-11-08 09:48:54 +00:00
David Rosca
1f00dfd1a7 radeonsi: Support PIPE_VIDEO_CAP_SKIP_CLEAR_SURFACE
Starting with .59 amdgpu now clears VRAM on allocation, so we don't
need to clear video buffers which are always allocated in VRAM.

Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31677>
2024-11-08 09:48:54 +00:00
David Rosca
b4b74617ae frontends/vdpau: Support skip clear on surface creation
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31677>
2024-11-08 09:48:54 +00:00
David Rosca
5df9097c95 frontends/va: Support skip clear on surface creation
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31677>
2024-11-08 09:48:54 +00:00
David Rosca
76df53f59b gallium: Add PIPE_VIDEO_CAP_SKIP_CLEAR_SURFACE
Used to skip calling clear_render_target when creating surface.

Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31677>
2024-11-08 09:48:54 +00:00
Karol Herbst
47a1565c3d nv/codegen: Do not use a zero immediate for tex instructions
They aren't always legal for tex instructions, specifically for TXQ when
an actual source is needed.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11999
Fixes: 85a31fa1fc ("nv50/ir/nir: fix txq emission on MS textures")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32043>
2024-11-08 09:18:54 +00:00
David Rosca
2c3dd2a37d frontends/va: Add minus_1 to AV1 render_width/height
Rename to match the spec and to match the actual value.

Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31977>
2024-11-08 08:39:49 +00:00
David Rosca
7f2624e6ae radeonsi/vcn: Fix coding AV1 render size
This is only header metadata hint, so it should be passed directly
from packed headers to output. Also fix the value as render_width from
frontend is actually render_width_minus_1 (and same for height).

Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31977>
2024-11-08 08:39:49 +00:00
Eric Engestrom
4ad8a5443b ci/build: add workaround for incorrect maybe-uninitialized error
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31890>
2024-11-08 07:09:15 +00:00
Eric Engestrom
f09ae95c10 ci/build: drop "verify after bump to F39" as that did not help
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31890>
2024-11-08 07:09:15 +00:00
Eric Engestrom
45e1ffeceb ci: upgrade the fedora image from 38 to 41
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31890>
2024-11-08 07:09:15 +00:00
Lionel Landwerlin
3ecf2a0518 anv: fix extent computation in image->image host copies
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0317c44872 ("anv: add VK_EXT_host_image_copy support")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32027>
2024-11-07 22:44:41 +00:00
Eric Engestrom
625ad5bc52 freedreno/ci: add more flakes seen recently
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32045>
2024-11-07 21:49:29 +01:00
Eric Engestrom
a1b309a177 broadcom/ci: add more flakes seen recently
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32045>
2024-11-07 21:49:29 +01:00
Eric Engestrom
e83613d906 radv/ci: add more flakes seen recently
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32045>
2024-11-07 21:49:29 +01:00
Eric Engestrom
9229bcaf13 radeonsi/ci: add more flakes seen recently
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32045>
2024-11-07 21:49:29 +01:00