Commit graph

13129 commits

Author SHA1 Message Date
Rhys Perry
8d5bd3ca48 aco: check logical_phi_info at p_logical_end when eliminating exec writes
This is when the copies actually happen, not at the branch.

fossil-db (gfx1100):
Totals from 1 (0.00% of 79332) affected shaders:
Instrs: 424 -> 423 (-0.24%)
CodeSize: 2172 -> 2168 (-0.18%)
Latency: 2899 -> 2896 (-0.10%)
Copies: 24 -> 23 (-4.17%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25244>
2023-09-18 11:19:28 +00:00
Rhys Perry
ac48334ecd aco/optimizer_postRA: check overwritten_subdword in is_overwritten_since()
Fixes crash for
dEQP-VK.mesh_shader.ext.in_out.with_f16.permutation_0.mesh_only and
similar tests on GFX11.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Fixes: 3d29779a25 ("aco/optimizer_postRA: Distinguish overwritten untrackable and subdword.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25223>
2023-09-18 10:26:16 +00:00
Samuel Pitoiset
fa4d4f84a1 ac/spm: enable support for multi-instance counters
This is what RGP expects and this will give us more fine grained
results given that all shader engines/shader arrays etc would be
sampled.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25240>
2023-09-18 07:07:32 +00:00
Samuel Pitoiset
414783162a ac/spm: move the counter instance to ac_spm_counter_create_info
This will allow us to configure multi-instance counters.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25240>
2023-09-18 07:07:32 +00:00
Samuel Pitoiset
d5a5473185 ac,radv,radeonsi: prepare support for multi-instance SPM generic counters
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25240>
2023-09-18 07:07:31 +00:00
Samuel Pitoiset
ed0d3d8cbd ac,radv,radeonsi: prepare support for multi-instance SPM SQ counters
Each SQG modules can configure up to 16 counters.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25240>
2023-09-18 07:07:31 +00:00
Samuel Pitoiset
af4d4f5203 ac/spm: fix number of instances of GL2C
It's a global block, so the number of instances is equal.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25240>
2023-09-18 07:07:31 +00:00
Samuel Pitoiset
0e4d5b171a radv,radeonsi: make sure to emit GRBM_GFX_INDEX before SQ select registers
This was missing, but not sure if it was required given that only the
first SE is currently sampled.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25240>
2023-09-18 07:07:31 +00:00
Samuel Pitoiset
3e8922d9f7 ac/spm: select correct segment type for per-SE blocks
This currently does nothing because only the first instance is used,
but this will be needed for multi-instance.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25240>
2023-09-18 07:07:31 +00:00
Samuel Pitoiset
033e49995d ac/spm: use block flags to initialize instance mapping
This simplify this a bit, ideally we would also have a per-SA flag
for performance counters.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25240>
2023-09-18 07:07:31 +00:00
Samuel Pitoiset
037d7d0f5b radv: reserve more CS space in SQTT/SPM paths
This will prevent an assertion when SPM will emit more counters.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25240>
2023-09-18 07:07:31 +00:00
Konstantin Seurer
73fec95358 radv: Remove ray tracing shader module identifier skips
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25142>
2023-09-14 16:07:46 +00:00
Konstantin Seurer
28dcc5959d radv/rt: Handle stages without nir properly
Fixes: e039e3cd76 ('radv/rt: Store NIR shaders separately')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25142>
2023-09-14 16:07:46 +00:00
Konstantin Seurer
3fd9894e3a radv: Update navi21 llvm fails
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25010>
2023-09-14 15:39:39 +00:00
Konstantin Seurer
77bf1408f3 radv: Don't advertise features requiring PS epilogs with LLVM
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25010>
2023-09-14 15:39:39 +00:00
Konstantin Seurer
4c168635f8 ac/llvm: Use float types for float atomics
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25010>
2023-09-14 15:39:39 +00:00
Konstantin Seurer
60e7b1c69c ac/llvm: Use the correct return type for uadd_carry and usub_borrow
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25010>
2023-09-14 15:39:39 +00:00
Konstantin Seurer
3ae0562c23 ac/llvm: Fix typed loads with 16bit formats
For some reason, LLVM can't handle those. Emit a 32bit load and type
conversion instead,

Fixes: 22ca8c8 ("ac/llvm: Implement typed buffer load intrinsic.")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25010>
2023-09-14 15:39:38 +00:00
Konstantin Seurer
0cada27826 radv/ci: Improve ray tracing skips
I didn't know they were regexes. This also excludes all "1048576" tests.
They build an acceleration structure with 1 primitive 1048576 times
which only warms up the Valve farm and doesn't accomplish anything else.

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24720>
2023-09-14 15:12:44 +00:00
Konstantin Seurer
97b1caf9f6 radv: Perform multiple sorts in parallel
This was the last part that didn't scale with multiple infos. Reducing
the amount of barriers in this case improves DOOM Eternal performance by
50%. (Running with low resolution)

Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24720>
2023-09-14 15:12:44 +00:00
Konstantin Seurer
44c47054bc radv/radix_sort: Vendor the radix sort dispatch code
This needs to be done so we can optimize it for occpuancy when building
multiple acceleration structures in parallel. Changes to the original
code:

- Change // to /* */
- clang-format
- Replace vkCmd calls with calls to the driver entrypoints
- Add a light weight info struct
- Use radv_fill_buffer directly

Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24720>
2023-09-14 15:12:44 +00:00
Konstantin Seurer
1cacc64ea7 radv: Remove dead radix_sort_vk_get_memory_requirements call
Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24720>
2023-09-14 15:12:43 +00:00
Samuel Pitoiset
84390c5c98 ac/spm: initialize and set instance mapping for counters
This configures global, per-SE and per-SA counters with different
indexes. This is still unused because only for the first instance is
used by RADV/RadeonSI, but this will be changed.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25211>
2023-09-14 14:17:19 +00:00
Samuel Pitoiset
0864a7dfa9 ac/spm: rework how segment muxsel RAM are filled
This is more close to PAL and it will be easier to add GFX11 support
on top of it.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25211>
2023-09-14 14:17:19 +00:00
Samuel Pitoiset
6ae64900e2 ac/spm: fix checking if the counter instance is valid
This should be compared against the number of global instances, and
there is also an off-by-one error.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25211>
2023-09-14 14:17:19 +00:00
Samuel Pitoiset
90d9406436 ac/perfcounter: compute the number of global instances of TCP,SQ,GL1C and GL2C
This will be used by SPM.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25211>
2023-09-14 14:17:19 +00:00
Samuel Pitoiset
60cb257d26 ac/perfcounter: set the number of instances of GL1C to 4
According to PAL there is 4 GL1C quadrants. This will also be used
by SPM.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25211>
2023-09-14 14:17:18 +00:00
Samuel Pitoiset
10dc97b20f ac/gpu_info: init num_cu_per_sh from the kernel
This will be used to configure the number of instances of TCP.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25211>
2023-09-14 14:17:18 +00:00
Samuel Pitoiset
9552716208 ac/spm: add SPM block definition for GFX10-GFX10.3
Instead of using magic values.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25211>
2023-09-14 14:17:18 +00:00
Samuel Pitoiset
b1ce30539b ac/spm: remove useless SPM block setting for GFX9 and older GPUs
SPM is only implemented for GFX10+ on RADV/RadeonSI, although it's
technically possible on GFX9 but unused by RGP, so don't care.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25211>
2023-09-14 14:17:18 +00:00
Samuel Pitoiset
303184e4e5 radv,radeonsi: use AC_SPM_SEGMENT_TYPE_xxx instead of magic values
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25211>
2023-09-14 14:17:18 +00:00
Samuel Pitoiset
db6e16a515 radv: enable the PKT3 CAM bit for some SPM register writes
PAL does that.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25211>
2023-09-14 14:17:18 +00:00
David Rosca
d57241d290 radeonsi/vcn: Set H264/HEVC chroma sample location in bitstream
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25078>
2023-09-14 13:39:59 +00:00
Samuel Pitoiset
aca2adc36c ac/spm: add SPM counters configuration for GFX11
All SQ counters changed to SQ_WGP and the L2 miss changed too.
Sourced from PAL.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25175>
2023-09-14 12:30:52 +00:00
Samuel Pitoiset
42d67183e7 ac/perfcounter: add new SQ_WGP block for GFX11+
According to PAL, these SQ counters can be sampled at WGP granularity.
Some SPM counters captured for RGP are using this GPU block.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25175>
2023-09-14 12:30:52 +00:00
Samuel Pitoiset
31e6c05527 ac,radv,radeonsi: rework SPM counters configuration and share it
This should be easier to add GFX11 support.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25175>
2023-09-14 12:30:52 +00:00
Daniel Schürmann
6eaf416f35 aco/insert_exec_mask: Simplify WQM handling (2/2)
by calculating WQM requirements on demand.

No fossil-db changes.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038>
2023-09-14 09:25:23 +00:00
Daniel Schürmann
5f66723188 aco/insert_exec_mask: Simplify WQM handling (1/2)
by using p_end_wqm as indicator for when to end WQM mode.

Totals from 10049 (13.12% of 76572) affected shaders: (GFX11)

MaxWaves: 301126 -> 301136 (+0.00%)
Instrs: 7061909 -> 7049272 (-0.18%); split: -0.21%, +0.03%
CodeSize: 37720684 -> 37664244 (-0.15%); split: -0.18%, +0.03%
VGPRs: 357204 -> 357180 (-0.01%); split: -0.13%, +0.12%
Latency: 62757830 -> 62827080 (+0.11%); split: -0.06%, +0.17%
InvThroughput: 8589248 -> 8589963 (+0.01%); split: -0.02%, +0.02%
VClause: 132541 -> 132547 (+0.00%); split: -0.03%, +0.03%
SClause: 322916 -> 322964 (+0.01%); split: -0.04%, +0.05%
Copies: 546446 -> 547657 (+0.22%); split: -0.13%, +0.35%
Branches: 189527 -> 188293 (-0.65%)
PreSGPRs: 332792 -> 332529 (-0.08%); split: -0.08%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038>
2023-09-14 09:25:23 +00:00
Daniel Schürmann
45f6d38a76 aco: insert a single p_end_wqm after the last derivative calculation
This new instruction replaces p_wqm.

Totals from 28065 (36.65% of 76572) affected shaders: (GFX11)
MaxWaves: 823922 -> 823952 (+0.00%); split: +0.01%, -0.01%
Instrs: 22221375 -> 22180465 (-0.18%); split: -0.26%, +0.08%
CodeSize: 117310676 -> 117040684 (-0.23%); split: -0.30%, +0.07%
VGPRs: 1183476 -> 1186656 (+0.27%); split: -0.19%, +0.46%
SpillSGPRs: 2305 -> 2302 (-0.13%)
Latency: 176559310 -> 176427793 (-0.07%); split: -0.21%, +0.14%
InvThroughput: 26245204 -> 26195550 (-0.19%); split: -0.26%, +0.07%
VClause: 368058 -> 369460 (+0.38%); split: -0.21%, +0.59%
SClause: 857077 -> 842588 (-1.69%); split: -2.06%, +0.37%
Copies: 1245650 -> 1249434 (+0.30%); split: -0.33%, +0.63%
Branches: 394837 -> 396070 (+0.31%); split: -0.01%, +0.32%
PreSGPRs: 1019139 -> 1019567 (+0.04%); split: -0.02%, +0.06%
PreVGPRs: 925739 -> 931860 (+0.66%); split: -0.00%, +0.66%

Changes are due to scheduling and re-enabling cross-lane optimizations.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038>
2023-09-14 09:25:23 +00:00
Daniel Schürmann
28904839da aco: don't insert a copy when emitting p_wqm
Totals from 351 (0.46% of 76572) affected shaders: (GFX11)

Instrs: 709202 -> 709600 (+0.06%); split: -0.02%, +0.08%
CodeSize: 3606364 -> 3608040 (+0.05%); split: -0.01%, +0.06%
Latency: 3589841 -> 3590756 (+0.03%); split: -0.01%, +0.03%
InvThroughput: 463303 -> 463324 (+0.00%)
SClause: 28147 -> 28201 (+0.19%); split: -0.02%, +0.22%
Copies: 43243 -> 43204 (-0.09%); split: -0.24%, +0.15%
PreSGPRs: 21028 -> 21042 (+0.07%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038>
2023-09-14 09:25:22 +00:00
Daniel Schürmann
040142684c aco: make p_wqm a marker instruction without Operands/Definitions
Totals from 28277 (36.93% of 76572) affected shaders: (GFX11)

MaxWaves: 833930 -> 833898 (-0.00%); split: +0.01%, -0.01%
Instrs: 21366950 -> 21353346 (-0.06%); split: -0.11%, +0.05%
CodeSize: 112855368 -> 112610508 (-0.22%); split: -0.24%, +0.03%
VGPRs: 1157748 -> 1158540 (+0.07%); split: -0.10%, +0.17%
SpillSGPRs: 2465 -> 2463 (-0.08%); split: -0.16%, +0.08%
Latency: 168339886 -> 168383646 (+0.03%); split: -0.10%, +0.12%
InvThroughput: 25164895 -> 25158376 (-0.03%); split: -0.08%, +0.06%
VClause: 347660 -> 346256 (-0.40%); split: -0.55%, +0.15%
SClause: 794460 -> 799521 (+0.64%); split: -0.33%, +0.97%
Copies: 1151908 -> 1148370 (-0.31%); split: -0.54%, +0.23%
Branches: 359447 -> 359437 (-0.00%); split: -0.01%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038>
2023-09-14 09:25:22 +00:00
Daniel Schürmann
1275981df8 aco: don't optimize cross-lane instructions across p_wqm
We will use p_wqm as a marker in the next step.

Totals from 8846 (11.55% of 76572) affected shaders: (GFX11)

Instrs: 7031274 -> 7072729 (+0.59%); split: -0.02%, +0.60%
CodeSize: 37060272 -> 37355244 (+0.80%); split: -0.01%, +0.80%
VGPRs: 402660 -> 398724 (-0.98%); split: -0.99%, +0.01%
Latency: 62231926 -> 62322311 (+0.15%); split: -0.01%, +0.15%
InvThroughput: 10341361 -> 10392589 (+0.50%); split: -0.00%, +0.50%
VClause: 105344 -> 105368 (+0.02%); split: -0.03%, +0.05%
SClause: 218330 -> 218469 (+0.06%); split: -0.07%, +0.14%
Copies: 378609 -> 377644 (-0.25%); split: -0.42%, +0.17%
Branches: 97218 -> 97207 (-0.01%); split: -0.01%, +0.00%
PreSGPRs: 307654 -> 307644 (-0.00%); split: -0.08%, +0.08%
PreVGPRs: 314744 -> 308650 (-1.94%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038>
2023-09-14 09:25:22 +00:00
Daniel Schürmann
0907b53740 aco/insert_exec_mask: set Exact mode after p_discard_if when necessary
Fixes: 5e9df85b1a ('aco: optimize discard_if when WQM is not needed afterwards')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038>
2023-09-14 09:25:22 +00:00
Rhys Perry
41b6020ff3 aco: remove fast path in insert_exec_mask's process_instructions
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038>
2023-09-14 09:25:22 +00:00
Daniel Schürmann
0e8192a76b aco: append p_logical_end after monolithic RT shaders
Fixes: bdec044c88 ('aco: Do not fixup registers if there are no shader calls')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038>
2023-09-14 09:25:22 +00:00
Dave Airlie
c5fb2fff18 ac,radeonsi: move vcn enc av1 default cdf file to common
This can be used by radv.

Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25196>
2023-09-14 07:51:24 +00:00
Dave Airlie
daa01703cc ac,radeonsi: move vcn enc structs to common
This just moves the header to make it easier to share with radv.

Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25196>
2023-09-14 07:51:24 +00:00
Samuel Pitoiset
f8a7c8edd1 radv: emit relocation for mesh/task shaders
RGP requires shaders to be contiguous in memory.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25144>
2023-09-14 07:19:26 +00:00
Samuel Pitoiset
312103e0ff radv: set THREAD_TRACE_MARKER_ENABLE for mesh/task draws
PAL does that.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25144>
2023-09-14 07:19:25 +00:00
Samuel Pitoiset
505c2ee92d ac/rgp: use correct API stage string for mesh/task shaders
This allows RGP to report the ISA.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25144>
2023-09-14 07:19:25 +00:00