ACO is still not perfect:
* It generates s_wait_loadcnt 0x0-0x3 when the only required wait instruction
is s_wait_loadcnt 0x5.
* It generates a lot of unnecessary jumps and blocks for uniform loop breaks.
Only scc1 jumps are necessary to break the loop. This is 10x better than
LLVM, but even ACO might consider using nir_intrinsic_ordered_add_loop_gfx12_amd
for the best performance.
How to print the streamout asm on any GPU:
PIGLIT_PLATFORM=gbm AMD_FORCE_FAMILY=gfx12_16pipe AMD_DEBUG=vs,mono,asm,useaco ../piglit/bin/shader-io-rate vs_out_xfb
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32570>
This configuration will be enabled in RADV in a subsequent commit.
On GFX10.3:
Do this together with the primitive export, to avoid adding extra
CF, and to ensure optimal access of the export space.
On GFX11:
It's not an export but a memory store instruction, so always do
it earlier and ensure the optimal attribute ring access pattern.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32270>
This injects a MRTZ export with only the alpha channel to select it
with COVERAGE_TO_MASK_ENABLE for alpha-to-coverage.
Co-Authored-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32583>
When alpha-to-coverage and alpha-to-one are both enabled in the
fragment shader, the alpha value should be exported through MRTZ and
one to MRT0.a. Otherwise, alpha-to-one will be performed before
alpha-to-coverage.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32523>
They are a maintainenance burden since they would need changes to
support more instruction types that nir_opt_varyings will be able to
move between shaders, and they are almost identical to
default_varying_estimate_instr_cost, so just use that.
The cost threshold is adjusted for AMD because
default_varying_estimate_instr_cost is slightly different.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>
This patch:
- adds a new subquery (AMDGPU_INFO_UQ_FW_AREAS) in AMDGPU_INFO_IOCTL
to get the size and alignment of shadow and csa objects from the
kernel. This information is required for a userqueue consumer (like
MESA/libdrm) to create the userqueue metadata objects properly.
- also adds supporting metadata structures and a high level wrapper
function (amdgpu_query_uq_metadata_info) to the query, to make it
easy to use.
The corresponding kernel changes for this UAPI extension can be found
in amd-gfx mailing list, link:
https://patchwork.freedesktop.org/patch/621390/?series=139715&rev=2
This patch adds support only for the GFX IP, and the other engines may
be supported in subsequent development.
This patch was reviewed in libdrm library at
https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/400
Cc: Marek Olsak <marek.olsak@amd.com>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Arvind Yadav <arvind.yadav@amd.com>
Reviewed-by: Marek Olsak <marek.olsak@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29010>
This patch adds new IOCTL functions to support
userqueue create, remove, signal and wait etc.
This patch was reviewed in libdrm library at
https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/392
Cc: Deucher, Alexander <alexander.deucher@amd.com>
Cc: Koenig, Christian <christian.koenig@amd.com>
Cc: Sharma, Shashank <shashank.sharma@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29010>
This imports 35 libdrm_amdgpu functions into Mesa.
The following 15 functions are still in use:
amdgpu_bo_alloc
amdgpu_bo_cpu_map
amdgpu_bo_cpu_unmap
amdgpu_bo_export
amdgpu_bo_free
amdgpu_bo_import
amdgpu_create_bo_from_user_mem
amdgpu_device_deinitialize
amdgpu_device_get_fd
amdgpu_device_initialize
amdgpu_get_marketing_name
amdgpu_query_sw_info
amdgpu_va_get_start_addr
amdgpu_va_range_alloc
amdgpu_va_range_free
We can't import them because they make sure that we only use 1 VMID
per process shared by all APIs. (except the marketing name)
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32067>
It's an experimental feature that we may enable later.
Instead of exporting NULL primitives, perform a compaction
on primitives after culling.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32290>
Implement two workgroup scans over two boolean values in parallel,
so that they can be done with very minimal ALU overhead.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32290>
No functional changes, just makes the code more readable.
Use inverse_ballot instead of elect.
Wrap if contents, rename if.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31973>
Move the NIR control flow out of the cull_small_primitive_triangle
function to make it more readable and follow the other functions.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31973>
Change the workgroup scan to be inclusive and adjust
the scalar operations after it.
This gets rid of 1 VALU instruction for 2 SALU. Win!
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31973>
dot_op would be dead code when v_dot instructions are unavailable.
It was originally added there because ACO didn't have an ILP
scheduler yet, but now it does so let's trust it to do its job.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31973>
This will need to be set to true when the GLSL linker lowers IO, which
can later be unlowered by st/mesa, and then drivers can lower it again
without load_interpolated_input. Therefore, it can't be a global
immutable option.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32229>
This is to match the code names used in other enums.
Also add comments to separate GFX11.5 and GFX12 chips.
v2 by Marek Olšák:
- Rename GFX1103 to in addrlib also
- Rework ac_get_family_name
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32170>
This was enabled by default in nir_opt_varyings, but vc4 can't handle
when shader outputs write Y but not X. Add an option for it and enable
it only for the driver that benefits from it.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32174>
If nir_tcs_info::all_invocations_define_tess_levels is true, the pass
doesn't have to insert a barrier and use output loads to get tess level
output values. It can just use the SSA defs that are being stored (or phis
thereof) to get the tess level output values.
The remaining tcs_info fields will be used by the HS shader message.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32171>
ac_fake_hw_db can be the single place where radeon_info content
is emulated when overriding the GPU type.
For some fields we need to avoid overriding them with the value
coming from the ioctls to get the correct behavior.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31841>
The next commit will reuse the radeon_info when AMD_FORCE_FAMILY
is used.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31841>