Lowering them earlier right after VTN would allow us to implement
embedded samplers for descriptor heap properly for merged shaders.
Non-immediate samplers are still lowered in
radv_nir_apply_pipeline_layout because they require shader arguments.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37688>
Use vk_video_is_profile_supported first, and add AMD specific
restrictions later.
vulkaninfo reports on Navi31:
H.264 Decode (4:2:0 8-bit) Baseline progressive
H.264 Decode (4:2:0 8-bit) Main progressive
H.264 Decode (4:2:0 8-bit) High progressive
H.264 Decode (4:2:0 8-bit) Baseline interlaced (interleaved lines)
H.264 Decode (4:2:0 8-bit) Main interlaced (interleaved lines)
H.264 Decode (4:2:0 8-bit) High interlaced (interleaved lines)
H.264 Decode (monochrome 8-bit) High progressive
H.264 Decode (monochrome 8-bit) High interlaced (interleaved lines)
H.265 Decode (4:2:0 8-bit) Main
H.265 Decode (4:2:0 8-bit) Main 10
H.265 Decode (4:2:0 8-bit) Main Still Picture
H.265 Decode (4:2:0 10-bit) Main 10
VP9 Decode (4:2:0 8-bit) Profile 0
VP9 Decode (4:2:0 10-bit) Profile 2
AV1 Decode (4:2:0 8-bit) Main with film grain support
AV1 Decode (4:2:0 8-bit) Main without film grain support
AV1 Decode (4:2:0 10-bit) Main with film grain support
AV1 Decode (4:2:0 10-bit) Main without film grain support
AV1 Decode (4:2:0 12-bit) Professional with film grain support
AV1 Decode (4:2:0 12-bit) Professional without film grain support
AV1 Decode (monochrome 8-bit) Main with film grain support
AV1 Decode (monochrome 8-bit) Main without film grain support
AV1 Decode (monochrome 10-bit) Main with film grain support
AV1 Decode (monochrome 10-bit) Main without film grain support
AV1 Decode (monochrome 12-bit) Professional with film grain support
AV1 Decode (monochrome 12-bit) Professional without film grain support
H.264 Encode (4:2:0 8-bit) Baseline
H.264 Encode (4:2:0 8-bit) Main
H.264 Encode (4:2:0 8-bit) High
H.265 Encode (4:2:0 8-bit) Main
H.265 Encode (4:2:0 8-bit) Main 10
H.265 Encode (4:2:0 8-bit) Main Still Picture
H.265 Encode (4:2:0 10-bit) Main 10
AV1 Encode (4:2:0 8-bit) Main
AV1 Encode (4:2:0 10-bit) Main
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37656>
Currently we wait until the second dword in feedback buffer changes
from 0 to 1, and then the rest of the feedback is read. There is no
guarantee that the rest of the feedback will be available, which can
cause bitstream size to be incorrectly returned as 0.
Add write memory command after encode, marking the query as available
to ensure the entire feedback buffer is ready.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13601
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36772>
This can trigger an assert otherwise. The space reserved before
executing DGC IBs is an arbitrary number which should be large enough
in all cases.
Found this while implementing descriptor heap.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37681>
Based on a patch by llyyr <llyyr.public@gmail.com>:
!36827 added the copy_sync_payloads function, but didn't enable use of
it in radv. This commit mirrors similar MRs for anv/panvk/nvk and uses
the common vk_drm_syncobj_copy_payloads function for copy_sync_payloads.
I'm not too familiar with radv internals, so there's potentially a good
reason why this isn't a good change. However, I've personally been using
this patch locally for around a month and have experienced no
regressions and around 8% uplift on vkmark test scores with a 6600 XT.
[vertex] device-local=true: 45110 -> 48489 (+7.5%)
[vertex] device-local=false: 17529 -> 17488 (-0.2%)
[texture] anisotropy=0: 44768 -> 48679 (+8.7%)
[texture] anisotropy=16: 44920 -> 48572 (+8.1%)
[shading] shading=gouraud: 44931 -> 48467 (+7.9%)
[shading] shading=blinn-phong-inf: 44849 -> 48740 (+8.7%)
[shading] shading=phong: 44695 -> 48645 (+8.8%)
[shading] shading=cel: 44809 -> 47938 (+7.0%)
[effect2d] kernel=edge: 45185 -> 47837 (+5.9%)
[effect2d] kernel=blur: 26919 -> 26762 (-0.6%)
[desktop] <default>: 40974 -> 44034 (+7.5%)
[cube] <default>: 45090 -> 49270 (+9.3%)
[clear] <default>: 41102 -> 44375 (+8.0%)
(https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37606)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37640>
If there are no leaves, the root node bounds still span -inf/inf.
Making empty BLASs infinite-sized guarantees ray traversal needs to
enter the BLAS (and immediately exit because it's empty). Remove the
BLAS from the BVH entirely by marking its bounds as NaN. As a bonus,
this works around RADV encountering issues in Silent Hill 2 on RDNA4 due
to infinite-sized BVHs.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37492>
This reduces duplication: we only need to distinguish between Windows
and Unix in one place.
The previous code was inconsistent about using either the `platforms`
option, or the `host_machine`. Following the logic described in
commit 94379377 "lavapipe: build "Windows" check should use the host machine, not the `platforms` option.",
I've assumed that checking the host machine is the more-correct version
and used that.
Signed-off-by: Simon McVittie <smcv@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37576>
This consistently uses `NAME.dll` on Windows, `libNAME.dylib` on Darwin
derivatives such as macOS, and `libNAME.so` on Linux, *BSD and so on.
It's also consistent about using the local variable name `icd_file_name`
for this name in every Vulkan driver, which was already the case in many
but not all drivers.
Some of these drivers probably don't make sense (or don't work) on
Windows and/or macOS, but if this is kept consistent for all drivers,
it should avoid the need for driver-specific commits like
commit 611e9f29e "lavapipe: fix icd generation for windows",
commit 951f3287 "lavapipe: set empty dll prefix",
commit 13e7a39f "lavapipe: fixes for macOS support",
commit 7008e655 "radv: Update JSON generator if Windows" and so on,
each time a driver is found to be relevant on more platforms than
previously believed.
Signed-off-by: Simon McVittie <smcv@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37576>
Because s_sendmsg dealloc_vgprs waits for every counter except vs_count,
and the message bus has limited throughput, we should only insert the dealloc
when we know that it's beneficial.
Foz-DB Navi31:
Totals from 5280 (6.58% of 80273) affected shaders:
Instrs: 4186851 -> 4197416 (+0.25%)
CodeSize: 21910004 -> 21952264 (+0.19%)
Latency: 31679067 -> 31679173 (+0.00%)
InvThroughput: 9182625 -> 9183417 (+0.01%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37508>
Only create nir_load_rt_arg_scratch_offset_amd if needed.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35069>
[WHY]
get a clear definition of fastload support and actual 3d lut
container size
[HOW]
Added related code
Acked-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Nawwar Ali <Nawwar.Ali@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37504>
[Why]
Route case where dest rect has
zero dimensions to perform background
color fill.
Acked-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Iswara Nagulendran <Iswara.Nagulendran@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37504>
[WHY & How]
Ensures type-safe comparison for the sys_event callback assignment by
casting the NULL constant to the appropriate function pointer type.
Acked-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Muhammad Ansari <Muhammad.Ansari@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37504>
[HOW]
Created check_blending_support function and condition to check for
readable purpose
Acked-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Zhao, Jiali <Jiali.Zhao@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37504>
radv_cmd_buffer_upload_alloc_aligned is used with alignment=0, which
guarantees that the alignment is at least 4.
Fixes: 9e16ed7a13 - ac/nir: switch nir_load_smem_amd uses to ac_nir_load_smem wrapper
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37345>
Usually wave64 performs better for fragment shaders,
because LDS sharing for interpolation is better.
But the rt traversal loop divergence is likely high enough to make
wave32 better on GFX10.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37360>
If the application really thinks it needs pswave32, let it use it.
Fragment shaders also have no concept of full subgroups, so the existing
code that chooses the subgroup size will work already.
For pre raster stages, we cannot allow this because of potential mismatches
in merged stages.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37360>