Rhys Perry
8d5bd3ca48
aco: check logical_phi_info at p_logical_end when eliminating exec writes
...
This is when the copies actually happen, not at the branch.
fossil-db (gfx1100):
Totals from 1 (0.00% of 79332) affected shaders:
Instrs: 424 -> 423 (-0.24%)
CodeSize: 2172 -> 2168 (-0.18%)
Latency: 2899 -> 2896 (-0.10%)
Copies: 24 -> 23 (-4.17%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25244 >
2023-09-18 11:19:28 +00:00
Rhys Perry
ac48334ecd
aco/optimizer_postRA: check overwritten_subdword in is_overwritten_since()
...
Fixes crash for
dEQP-VK.mesh_shader.ext.in_out.with_f16.permutation_0.mesh_only and
similar tests on GFX11.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Fixes: 3d29779a25 ("aco/optimizer_postRA: Distinguish overwritten untrackable and subdword.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25223 >
2023-09-18 10:26:16 +00:00
Samuel Pitoiset
fa4d4f84a1
ac/spm: enable support for multi-instance counters
...
This is what RGP expects and this will give us more fine grained
results given that all shader engines/shader arrays etc would be
sampled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25240 >
2023-09-18 07:07:32 +00:00
Samuel Pitoiset
414783162a
ac/spm: move the counter instance to ac_spm_counter_create_info
...
This will allow us to configure multi-instance counters.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25240 >
2023-09-18 07:07:32 +00:00
Samuel Pitoiset
d5a5473185
ac,radv,radeonsi: prepare support for multi-instance SPM generic counters
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25240 >
2023-09-18 07:07:31 +00:00
Samuel Pitoiset
ed0d3d8cbd
ac,radv,radeonsi: prepare support for multi-instance SPM SQ counters
...
Each SQG modules can configure up to 16 counters.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25240 >
2023-09-18 07:07:31 +00:00
Samuel Pitoiset
af4d4f5203
ac/spm: fix number of instances of GL2C
...
It's a global block, so the number of instances is equal.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25240 >
2023-09-18 07:07:31 +00:00
Samuel Pitoiset
0e4d5b171a
radv,radeonsi: make sure to emit GRBM_GFX_INDEX before SQ select registers
...
This was missing, but not sure if it was required given that only the
first SE is currently sampled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25240 >
2023-09-18 07:07:31 +00:00
Samuel Pitoiset
3e8922d9f7
ac/spm: select correct segment type for per-SE blocks
...
This currently does nothing because only the first instance is used,
but this will be needed for multi-instance.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25240 >
2023-09-18 07:07:31 +00:00
Samuel Pitoiset
033e49995d
ac/spm: use block flags to initialize instance mapping
...
This simplify this a bit, ideally we would also have a per-SA flag
for performance counters.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25240 >
2023-09-18 07:07:31 +00:00
Samuel Pitoiset
037d7d0f5b
radv: reserve more CS space in SQTT/SPM paths
...
This will prevent an assertion when SPM will emit more counters.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25240 >
2023-09-18 07:07:31 +00:00
Konstantin Seurer
73fec95358
radv: Remove ray tracing shader module identifier skips
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25142 >
2023-09-14 16:07:46 +00:00
Konstantin Seurer
28dcc5959d
radv/rt: Handle stages without nir properly
...
Fixes: e039e3cd76 ('radv/rt: Store NIR shaders separately')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25142 >
2023-09-14 16:07:46 +00:00
Konstantin Seurer
3fd9894e3a
radv: Update navi21 llvm fails
...
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25010 >
2023-09-14 15:39:39 +00:00
Konstantin Seurer
77bf1408f3
radv: Don't advertise features requiring PS epilogs with LLVM
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25010 >
2023-09-14 15:39:39 +00:00
Konstantin Seurer
4c168635f8
ac/llvm: Use float types for float atomics
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25010 >
2023-09-14 15:39:39 +00:00
Konstantin Seurer
60e7b1c69c
ac/llvm: Use the correct return type for uadd_carry and usub_borrow
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25010 >
2023-09-14 15:39:39 +00:00
Konstantin Seurer
3ae0562c23
ac/llvm: Fix typed loads with 16bit formats
...
For some reason, LLVM can't handle those. Emit a 32bit load and type
conversion instead,
Fixes: 22ca8c8 ("ac/llvm: Implement typed buffer load intrinsic.")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25010 >
2023-09-14 15:39:38 +00:00
Konstantin Seurer
0cada27826
radv/ci: Improve ray tracing skips
...
I didn't know they were regexes. This also excludes all "1048576" tests.
They build an acceleration structure with 1 primitive 1048576 times
which only warms up the Valve farm and doesn't accomplish anything else.
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24720 >
2023-09-14 15:12:44 +00:00
Konstantin Seurer
97b1caf9f6
radv: Perform multiple sorts in parallel
...
This was the last part that didn't scale with multiple infos. Reducing
the amount of barriers in this case improves DOOM Eternal performance by
50%. (Running with low resolution)
Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24720 >
2023-09-14 15:12:44 +00:00
Konstantin Seurer
44c47054bc
radv/radix_sort: Vendor the radix sort dispatch code
...
This needs to be done so we can optimize it for occpuancy when building
multiple acceleration structures in parallel. Changes to the original
code:
- Change // to /* */
- clang-format
- Replace vkCmd calls with calls to the driver entrypoints
- Add a light weight info struct
- Use radv_fill_buffer directly
Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24720 >
2023-09-14 15:12:44 +00:00
Konstantin Seurer
1cacc64ea7
radv: Remove dead radix_sort_vk_get_memory_requirements call
...
Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24720 >
2023-09-14 15:12:43 +00:00
Samuel Pitoiset
84390c5c98
ac/spm: initialize and set instance mapping for counters
...
This configures global, per-SE and per-SA counters with different
indexes. This is still unused because only for the first instance is
used by RADV/RadeonSI, but this will be changed.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25211 >
2023-09-14 14:17:19 +00:00
Samuel Pitoiset
0864a7dfa9
ac/spm: rework how segment muxsel RAM are filled
...
This is more close to PAL and it will be easier to add GFX11 support
on top of it.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25211 >
2023-09-14 14:17:19 +00:00
Samuel Pitoiset
6ae64900e2
ac/spm: fix checking if the counter instance is valid
...
This should be compared against the number of global instances, and
there is also an off-by-one error.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25211 >
2023-09-14 14:17:19 +00:00
Samuel Pitoiset
90d9406436
ac/perfcounter: compute the number of global instances of TCP,SQ,GL1C and GL2C
...
This will be used by SPM.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25211 >
2023-09-14 14:17:19 +00:00
Samuel Pitoiset
60cb257d26
ac/perfcounter: set the number of instances of GL1C to 4
...
According to PAL there is 4 GL1C quadrants. This will also be used
by SPM.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25211 >
2023-09-14 14:17:18 +00:00
Samuel Pitoiset
10dc97b20f
ac/gpu_info: init num_cu_per_sh from the kernel
...
This will be used to configure the number of instances of TCP.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25211 >
2023-09-14 14:17:18 +00:00
Samuel Pitoiset
9552716208
ac/spm: add SPM block definition for GFX10-GFX10.3
...
Instead of using magic values.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25211 >
2023-09-14 14:17:18 +00:00
Samuel Pitoiset
b1ce30539b
ac/spm: remove useless SPM block setting for GFX9 and older GPUs
...
SPM is only implemented for GFX10+ on RADV/RadeonSI, although it's
technically possible on GFX9 but unused by RGP, so don't care.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25211 >
2023-09-14 14:17:18 +00:00
Samuel Pitoiset
303184e4e5
radv,radeonsi: use AC_SPM_SEGMENT_TYPE_xxx instead of magic values
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25211 >
2023-09-14 14:17:18 +00:00
Samuel Pitoiset
db6e16a515
radv: enable the PKT3 CAM bit for some SPM register writes
...
PAL does that.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25211 >
2023-09-14 14:17:18 +00:00
David Rosca
d57241d290
radeonsi/vcn: Set H264/HEVC chroma sample location in bitstream
...
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25078 >
2023-09-14 13:39:59 +00:00
Samuel Pitoiset
aca2adc36c
ac/spm: add SPM counters configuration for GFX11
...
All SQ counters changed to SQ_WGP and the L2 miss changed too.
Sourced from PAL.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25175 >
2023-09-14 12:30:52 +00:00
Samuel Pitoiset
42d67183e7
ac/perfcounter: add new SQ_WGP block for GFX11+
...
According to PAL, these SQ counters can be sampled at WGP granularity.
Some SPM counters captured for RGP are using this GPU block.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25175 >
2023-09-14 12:30:52 +00:00
Samuel Pitoiset
31e6c05527
ac,radv,radeonsi: rework SPM counters configuration and share it
...
This should be easier to add GFX11 support.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25175 >
2023-09-14 12:30:52 +00:00
Daniel Schürmann
6eaf416f35
aco/insert_exec_mask: Simplify WQM handling (2/2)
...
by calculating WQM requirements on demand.
No fossil-db changes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038 >
2023-09-14 09:25:23 +00:00
Daniel Schürmann
5f66723188
aco/insert_exec_mask: Simplify WQM handling (1/2)
...
by using p_end_wqm as indicator for when to end WQM mode.
Totals from 10049 (13.12% of 76572) affected shaders: (GFX11)
MaxWaves: 301126 -> 301136 (+0.00%)
Instrs: 7061909 -> 7049272 (-0.18%); split: -0.21%, +0.03%
CodeSize: 37720684 -> 37664244 (-0.15%); split: -0.18%, +0.03%
VGPRs: 357204 -> 357180 (-0.01%); split: -0.13%, +0.12%
Latency: 62757830 -> 62827080 (+0.11%); split: -0.06%, +0.17%
InvThroughput: 8589248 -> 8589963 (+0.01%); split: -0.02%, +0.02%
VClause: 132541 -> 132547 (+0.00%); split: -0.03%, +0.03%
SClause: 322916 -> 322964 (+0.01%); split: -0.04%, +0.05%
Copies: 546446 -> 547657 (+0.22%); split: -0.13%, +0.35%
Branches: 189527 -> 188293 (-0.65%)
PreSGPRs: 332792 -> 332529 (-0.08%); split: -0.08%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038 >
2023-09-14 09:25:23 +00:00
Daniel Schürmann
45f6d38a76
aco: insert a single p_end_wqm after the last derivative calculation
...
This new instruction replaces p_wqm.
Totals from 28065 (36.65% of 76572) affected shaders: (GFX11)
MaxWaves: 823922 -> 823952 (+0.00%); split: +0.01%, -0.01%
Instrs: 22221375 -> 22180465 (-0.18%); split: -0.26%, +0.08%
CodeSize: 117310676 -> 117040684 (-0.23%); split: -0.30%, +0.07%
VGPRs: 1183476 -> 1186656 (+0.27%); split: -0.19%, +0.46%
SpillSGPRs: 2305 -> 2302 (-0.13%)
Latency: 176559310 -> 176427793 (-0.07%); split: -0.21%, +0.14%
InvThroughput: 26245204 -> 26195550 (-0.19%); split: -0.26%, +0.07%
VClause: 368058 -> 369460 (+0.38%); split: -0.21%, +0.59%
SClause: 857077 -> 842588 (-1.69%); split: -2.06%, +0.37%
Copies: 1245650 -> 1249434 (+0.30%); split: -0.33%, +0.63%
Branches: 394837 -> 396070 (+0.31%); split: -0.01%, +0.32%
PreSGPRs: 1019139 -> 1019567 (+0.04%); split: -0.02%, +0.06%
PreVGPRs: 925739 -> 931860 (+0.66%); split: -0.00%, +0.66%
Changes are due to scheduling and re-enabling cross-lane optimizations.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038 >
2023-09-14 09:25:23 +00:00
Daniel Schürmann
28904839da
aco: don't insert a copy when emitting p_wqm
...
Totals from 351 (0.46% of 76572) affected shaders: (GFX11)
Instrs: 709202 -> 709600 (+0.06%); split: -0.02%, +0.08%
CodeSize: 3606364 -> 3608040 (+0.05%); split: -0.01%, +0.06%
Latency: 3589841 -> 3590756 (+0.03%); split: -0.01%, +0.03%
InvThroughput: 463303 -> 463324 (+0.00%)
SClause: 28147 -> 28201 (+0.19%); split: -0.02%, +0.22%
Copies: 43243 -> 43204 (-0.09%); split: -0.24%, +0.15%
PreSGPRs: 21028 -> 21042 (+0.07%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038 >
2023-09-14 09:25:22 +00:00
Daniel Schürmann
040142684c
aco: make p_wqm a marker instruction without Operands/Definitions
...
Totals from 28277 (36.93% of 76572) affected shaders: (GFX11)
MaxWaves: 833930 -> 833898 (-0.00%); split: +0.01%, -0.01%
Instrs: 21366950 -> 21353346 (-0.06%); split: -0.11%, +0.05%
CodeSize: 112855368 -> 112610508 (-0.22%); split: -0.24%, +0.03%
VGPRs: 1157748 -> 1158540 (+0.07%); split: -0.10%, +0.17%
SpillSGPRs: 2465 -> 2463 (-0.08%); split: -0.16%, +0.08%
Latency: 168339886 -> 168383646 (+0.03%); split: -0.10%, +0.12%
InvThroughput: 25164895 -> 25158376 (-0.03%); split: -0.08%, +0.06%
VClause: 347660 -> 346256 (-0.40%); split: -0.55%, +0.15%
SClause: 794460 -> 799521 (+0.64%); split: -0.33%, +0.97%
Copies: 1151908 -> 1148370 (-0.31%); split: -0.54%, +0.23%
Branches: 359447 -> 359437 (-0.00%); split: -0.01%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038 >
2023-09-14 09:25:22 +00:00
Daniel Schürmann
1275981df8
aco: don't optimize cross-lane instructions across p_wqm
...
We will use p_wqm as a marker in the next step.
Totals from 8846 (11.55% of 76572) affected shaders: (GFX11)
Instrs: 7031274 -> 7072729 (+0.59%); split: -0.02%, +0.60%
CodeSize: 37060272 -> 37355244 (+0.80%); split: -0.01%, +0.80%
VGPRs: 402660 -> 398724 (-0.98%); split: -0.99%, +0.01%
Latency: 62231926 -> 62322311 (+0.15%); split: -0.01%, +0.15%
InvThroughput: 10341361 -> 10392589 (+0.50%); split: -0.00%, +0.50%
VClause: 105344 -> 105368 (+0.02%); split: -0.03%, +0.05%
SClause: 218330 -> 218469 (+0.06%); split: -0.07%, +0.14%
Copies: 378609 -> 377644 (-0.25%); split: -0.42%, +0.17%
Branches: 97218 -> 97207 (-0.01%); split: -0.01%, +0.00%
PreSGPRs: 307654 -> 307644 (-0.00%); split: -0.08%, +0.08%
PreVGPRs: 314744 -> 308650 (-1.94%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038 >
2023-09-14 09:25:22 +00:00
Daniel Schürmann
0907b53740
aco/insert_exec_mask: set Exact mode after p_discard_if when necessary
...
Fixes: 5e9df85b1a ('aco: optimize discard_if when WQM is not needed afterwards')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038 >
2023-09-14 09:25:22 +00:00
Rhys Perry
41b6020ff3
aco: remove fast path in insert_exec_mask's process_instructions
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038 >
2023-09-14 09:25:22 +00:00
Daniel Schürmann
0e8192a76b
aco: append p_logical_end after monolithic RT shaders
...
Fixes: bdec044c88 ('aco: Do not fixup registers if there are no shader calls')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038 >
2023-09-14 09:25:22 +00:00
Dave Airlie
c5fb2fff18
ac,radeonsi: move vcn enc av1 default cdf file to common
...
This can be used by radv.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25196 >
2023-09-14 07:51:24 +00:00
Dave Airlie
daa01703cc
ac,radeonsi: move vcn enc structs to common
...
This just moves the header to make it easier to share with radv.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25196 >
2023-09-14 07:51:24 +00:00
Samuel Pitoiset
f8a7c8edd1
radv: emit relocation for mesh/task shaders
...
RGP requires shaders to be contiguous in memory.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25144 >
2023-09-14 07:19:26 +00:00
Samuel Pitoiset
312103e0ff
radv: set THREAD_TRACE_MARKER_ENABLE for mesh/task draws
...
PAL does that.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25144 >
2023-09-14 07:19:25 +00:00
Samuel Pitoiset
505c2ee92d
ac/rgp: use correct API stage string for mesh/task shaders
...
This allows RGP to report the ISA.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25144 >
2023-09-14 07:19:25 +00:00