Commit graph

2893 commits

Author SHA1 Message Date
Qiang Yu
5f601361ed ac/nir: lower access for shared and scratch memory
OpenCL may load and store vec16 data, while ACO only
support <=32byte. Radeonsi is going to use
ac_nir_lower_mem_access_bit_sizes() for lowering these
access.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32781>
2024-12-27 01:58:38 +00:00
Marek Olšák
c0e5e8f932 amd: update addrlib
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32687>
2024-12-26 21:02:21 +00:00
Marek Olšák
c6fd69bd5e ac: remove unused code
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32780>
2024-12-26 10:12:43 +00:00
Marek Olšák
de996ac481 radeonsi: kill Z and stencil PS outputs if depth or stencil is disabled
This adds kill_z and kill_stencil flags to the shader PS epilog key, which
removes those outputs if depth or stencil are disabled.

It must be implemented in:
* ACO PS epilog
* LLVM PS epilog
* ac_nir_lower_ps for monolithic shaders

Some of the samplemask code wasn't completely correct, but probably harmless.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32713>
2024-12-24 12:02:20 +00:00
Daniel Schürmann
28a214728c ac/lower_ngg: move readlane into break blocks in streamout code generation for gfx12/ACO
This avoids unnecessary shuffle code and s_wait_loadcnt.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32743>
2024-12-21 12:32:25 +00:00
Daniel Schürmann
47227089d6 ac/lower_ngg: move break blocks after loop in streamout code generation for gfx12/ACO
By inverting the break condition, the loop becomes shorter.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32743>
2024-12-21 12:32:25 +00:00
Daniel Schürmann
39dcd9dedb ac/lower_ngg: Fix collecting buffer offsets from 4 lanes on gfx12
Also use readlane for improved performance.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32743>
2024-12-21 12:32:25 +00:00
Marek Olšák
4d8a508510 ac/nir: call nir_gather_tcs_info only once for RADV
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31673>
2024-12-18 11:07:59 +00:00
Marek Olšák
8c2f9f0665 radv: switch to the new TCS LDS/offchip size computation
to use the same logic as radeonsi. This could be improved, see TODOs.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31673>
2024-12-18 11:07:59 +00:00
Marek Olšák
3056bf1cb1 ac/nir: add new helpers for computing the TCS LDS/offchip size accurately
This is based on how the HS lowering passes address TCS inputs and
outputs. The new LDS size is lower in some cases.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31673>
2024-12-18 11:07:59 +00:00
Marek Olšák
85c20def94 ac,radv,radeonsi: enable TCS input reads from VGPRs for all compatible loads
Cross-invocation TCS input access doesn't prevent same-invocation access.
This improves shaders that use both for the same inputs.

Also, if some components of a vec4 slot only use same-invocation access and
other components only use cross-invocation access (it's possible after
compaction), this takes the VGPR path for the components with
same-invocation access, which didn't happen previously because all masks
only describe whole vec4s.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31673>
2024-12-18 11:07:59 +00:00
Marek Olšák
99a03dc9d5 ac/nir: allow a TCS input to be available from both VGPRs and LDS
Both can be used. Cross-invocation access can read it from LDS, while
same-invocation access can read it from VGPRs.

The entrypoints of the passes don't allow that flexibility yet,
but the logic inside the pass allows it.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31673>
2024-12-18 11:07:59 +00:00
Marek Olšák
b49eab68a8 ac/nir: use s_sendmsg(HS_TESSFACTOR) to optimize writing tess factors for gfx11
This uses the new shader message. It eliminates memory stores and latency
for simple cases of tess level values.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31673>
2024-12-18 11:07:59 +00:00
Marek Olšák
f4eebb373c ac/nir: reserve the first LDS vec4 for the HS tf0/1 group vote in TCS
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31673>
2024-12-18 11:07:59 +00:00
Marek Olšák
cdecbee922 radeonsi/gfx12: adjust HiZ/HiS logic
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32653>
2024-12-16 21:54:28 +00:00
Marek Olšák
e3cef02c24 radeonsi/gfx12: set DB_RENDER_OVERRIDE based on stencil state
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32653>
2024-12-16 21:54:28 +00:00
Marek Olšák
8328e57512 ac/surface/gfx12: enable DCC 256B compressed blocks and reorder modifiers
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32653>
2024-12-16 21:54:27 +00:00
Marek Olšák
e6345e2fd3 ac: update SPI_GRP_LAUNCH_GUARANTEE_* register values for gfx12
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32653>
2024-12-16 21:54:27 +00:00
Marek Olšák
3943ed8199 ac/lower_ngg: improve streamout code generation for gfx12/ACO to match LLVM
ACO is still not perfect:
* It generates s_wait_loadcnt 0x0-0x3 when the only required wait instruction
  is s_wait_loadcnt 0x5.
* It generates a lot of unnecessary jumps and blocks for uniform loop breaks.
  Only scc1 jumps are necessary to break the loop. This is 10x better than
  LLVM, but even ACO might consider using nir_intrinsic_ordered_add_loop_gfx12_amd
  for the best performance.

How to print the streamout asm on any GPU:
    PIGLIT_PLATFORM=gbm AMD_FORCE_FAMILY=gfx12_16pipe AMD_DEBUG=vs,mono,asm,useaco ../piglit/bin/shader-io-rate vs_out_xfb

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32570>
2024-12-16 07:35:07 +00:00
Qiang Yu
b14cc34415 ac/surf: add more modifiers to gfx12 supported list
OpenGL will export these modifiers for various sized
textures.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32570>
2024-12-16 07:35:06 +00:00
Qiang Yu
b3a218d444 ac/surface/tests: support all block sizes
We are going to add more modifiers.

GFX9 has 4K DCC and non-DCC modifiers while others only have
4K non-DCC modifiers.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32570>
2024-12-16 07:35:06 +00:00
Timur Kristóf
9224b9a752 ac/nir/ngg: Add ability to store primitive ID as per-primitive.
This configuration will be enabled in RADV in a subsequent commit.

On GFX10.3:
Do this together with the primitive export, to avoid adding extra
CF, and to ensure optimal access of the export space.

On GFX11:
It's not an export but a memory store instruction, so always do
it earlier and ensure the optimal attribute ring access pattern.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32270>
2024-12-12 18:11:45 +00:00
Samuel Pitoiset
c3a050da07 radv: fix alpha-to-coverage with alpha-to-one without MRTZ
This injects a MRTZ export with only the alpha channel to select it
with COVERAGE_TO_MASK_ENABLE for alpha-to-coverage.

Co-Authored-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32583>
2024-12-12 10:07:25 +00:00
Tim Huang
ad75b9f1a6 amd: add GFX v11.5.3 support
This enables support for GFX version 11.5.3.

Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32567>
2024-12-11 19:14:34 +00:00
Samuel Pitoiset
1037830098 ac/nir: export alpha to MRTZ.a and one to MRT0.a for alpha-to-one on GFX11
When alpha-to-coverage and alpha-to-one are both enabled in the
fragment shader, the alpha value should be exported through MRTZ and
one to MRT0.a. Otherwise, alpha-to-one will be performed before
alpha-to-coverage.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32523>
2024-12-11 10:50:31 +00:00
Rhys Perry
033e76a82a ac/nir: have ac_nir_lower_mem_access_bit_sizes preserve >128 bit SMEM
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32408>
2024-12-09 16:56:29 +00:00
Marek Olšák
d8468d5463 amd,zink: remove options.varying_estimate_instr_cost callbacks
They are a maintainenance burden since they would need changes to
support more instruction types that nir_opt_varyings will be able to
move between shaders, and they are almost identical to
default_varying_estimate_instr_cost, so just use that.

The cost threshold is adjusted for AMD because
default_varying_estimate_instr_cost is slightly different.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
2024-12-04 13:40:41 +00:00
Samuel Pitoiset
9df3c9e4a1 ac/parse_ib: print VA for the SDMA CONSTANT_FILL/WRITE packets
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32456>
2024-12-03 15:29:40 +00:00
Samuel Pitoiset
31524d42a2 ac/parse_ib: fix parsing SDMA CONSTANT_FILL packet
This packet only has 5 DWORDS.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32456>
2024-12-03 15:29:39 +00:00
Yogesh Mohan Marimuthu
f930201898 ac/gpu_info: populate fw info using new fw info ioctl for userq
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29010>
2024-12-03 12:02:06 +00:00
Yogesh Mohan Marimuthu
8447cb563f winsys/amdgpu: send hdp flush packet for userq
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29010>
2024-12-03 12:02:06 +00:00
Yogesh Mohan Marimuthu
94c41852bd ac: add inherit vmid field to indirect buffer packet
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29010>
2024-12-03 12:02:06 +00:00
Yogesh Mohan Marimuthu
cda75d6497 ac: add new userq signal and wait packet id
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29010>
2024-12-03 12:02:06 +00:00
Yogesh Mohan Marimuthu
093cf74b26 ac/gpuinfo: add use_userq and AMD_USERQ variable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29010>
2024-12-03 12:02:06 +00:00
Shashank Sharma
42d49faee5 amd: add new AMDGPU_INFO subquery for userqueue metadata
This patch:
- adds a new subquery (AMDGPU_INFO_UQ_FW_AREAS) in AMDGPU_INFO_IOCTL
  to get the size and alignment of shadow and csa objects from the
  kernel. This information is required for a userqueue consumer (like
  MESA/libdrm) to create the userqueue metadata objects properly.
- also adds supporting metadata structures and a high level wrapper
  function (amdgpu_query_uq_metadata_info) to the query, to make it
  easy to use.

The corresponding kernel changes for this UAPI extension can be found
in amd-gfx mailing list, link:
https://patchwork.freedesktop.org/patch/621390/?series=139715&rev=2

This patch adds support only for the GFX IP, and the other engines may
be supported in subsequent development.

This patch was reviewed in libdrm library at
https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/400

Cc: Marek Olsak <marek.olsak@amd.com>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Arvind Yadav <arvind.yadav@amd.com>
Reviewed-by: Marek Olsak <marek.olsak@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29010>
2024-12-03 12:02:06 +00:00
Arvind Yadav
b0a70da496 amd: Add amdgpu userqueue IOCTL functions
This patch adds new IOCTL functions to support
userqueue create, remove, signal and wait etc.

This patch was reviewed in libdrm library at
https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/392

Cc: Deucher, Alexander <alexander.deucher@amd.com>
Cc: Koenig, Christian <christian.koenig@amd.com>
Cc: Sharma, Shashank <shashank.sharma@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29010>
2024-12-03 12:02:06 +00:00
Yogesh Mohan Marimuthu
3981b017eb amd: include amdgpu_drm.h from mesa instead of system for ac_fake_hw_db.h
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29010>
2024-12-03 12:02:06 +00:00
David Rosca
308bae950f ac/surface: Add RADEON_SURF_VIDEO_REFERENCE
Select supported swizzle mode for VCN DPB surfaces.

Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32303>
2024-12-02 13:48:22 +00:00
Benjamin Cheng
323b59a5b5 radv/video: support event for pre-VCN4 decode queues
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32400>
2024-11-29 10:03:48 +10:00
Benjamin Cheng
152b06acd8 ac/vcn: allow sq signature package to be skipped
This is preparing for radv event support on pre-VCN4 encode queues.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32400>
2024-11-29 10:01:49 +10:00
David Rosca
489ba819b0 radeonsi/vcn: Support tiling for JPEG decode
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32301>
2024-11-28 08:52:37 +00:00
Pierre-Eric Pelloux-Prayer
272addc672 ac/nir: remove prim_stride_ret arg from ngg_build_streamout_buffer_info
This is not used outside of this function, so declare it as a local
variable instead.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32281>
2024-11-27 19:00:20 +00:00
Georg Lehmann
239c0124df radv: optimize sample mask comparisons
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32327>
2024-11-26 18:44:39 +00:00
Marek Olšák
a3516dafc9 util,amd: add inlinable versions of drmIoctl/drmCommandWrite*
The reason for this is to inline those calls in drivers.
They are very trivial, so why not.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32067>
2024-11-26 00:16:02 -05:00
Marek Olšák
049641ca54 amd: import libdrm_amdgpu ioctl wrappers
This imports 35 libdrm_amdgpu functions into Mesa.

The following 15 functions are still in use:
   amdgpu_bo_alloc
   amdgpu_bo_cpu_map
   amdgpu_bo_cpu_unmap
   amdgpu_bo_export
   amdgpu_bo_free
   amdgpu_bo_import
   amdgpu_create_bo_from_user_mem
   amdgpu_device_deinitialize
   amdgpu_device_get_fd
   amdgpu_device_initialize
   amdgpu_get_marketing_name
   amdgpu_query_sw_info
   amdgpu_va_get_start_addr
   amdgpu_va_range_alloc
   amdgpu_va_range_free

We can't import them because they make sure that we only use 1 VMID
per process shared by all APIs. (except the marketing name)

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32067>
2024-11-25 21:03:41 -05:00
Timur Kristóf
8653abac09 ac/nir/ngg: Remove erroneous NUW addition from workgroup scan.
This may add constant -1 so naturally it can indeed cause
an unsigned wrap.

Fixes: 492d8f3778
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12204
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32338>
2024-11-25 21:43:45 +00:00
Timur Kristóf
45c523104a ac/nir/ngg: Implement optional primitive compaction.
It's an experimental feature that we may enable later.
Instead of exporting NULL primitives, perform a compaction
on primitives after culling.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32290>
2024-11-25 01:56:20 +01:00
Timur Kristóf
492d8f3778 ac/nir/ngg: Workgroup scan over two bools.
Implement two workgroup scans over two boolean values in parallel,
so that they can be done with very minimal ALU overhead.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32290>
2024-11-25 01:56:08 +01:00
Timur Kristóf
78f77e161c ac/nir/ngg: Pass wg_repack_result as pointer instead of returning it.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32290>
2024-11-25 01:55:30 +01:00
Timur Kristóf
73fc29b25c ac/nir/ngg: Slightly refactor workgroup scan.
No functional changes, just makes the code more readable.
Use inverse_ballot instead of elect.
Wrap if contents, rename if.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31973>
2024-11-22 01:01:39 +01:00