Commit graph

18229 commits

Author SHA1 Message Date
Daniel Schürmann
e32eec52f0 aco/ra: generalize register affinities
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345>
2025-08-01 17:15:54 +00:00
Daniel Schürmann
caa2c22d8b aco/tests: Fix p_startpgm definitions to registers
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36345>
2025-08-01 17:15:54 +00:00
Alyssa Rosenzweig
82ae8b1d33 treewide: simplify nir_def_rewrite_uses_after
Most of the time with nir_def_rewrite_uses_after, you want to rewrite after the
replacement. Make that the default thing to be more ergonomic and to drop
parent_instr uses.

We leave nir_def_rewrite_uses_after_instr defined if you really want the old
signature with an arbitrary after point.

Via Coccinelle patch:

    @@
    expression a, b;
    @@

    -nir_def_rewrite_uses_after(a, b, b->parent_instr)
    +nir_def_rewrite_uses_after_def(a, b)

Followed by a bunch of sed.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>
2025-08-01 15:34:24 +00:00
Alyssa Rosenzweig
cc6e3b84cb treewide: use nir_def_as_*
Via Coccinelle patch:

    @@
    expression definition;
    @@

    -nir_instr_as_alu(definition->parent_instr)
    +nir_def_as_alu(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_intrinsic(definition->parent_instr)
    +nir_def_as_intrinsic(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_phi(definition->parent_instr)
    +nir_def_as_phi(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_load_const(definition->parent_instr)
    +nir_def_as_load_const(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_deref(definition->parent_instr)
    +nir_def_as_deref(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_tex(definition->parent_instr)
    +nir_def_as_tex(definition)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>
2025-08-01 15:34:24 +00:00
Martin Roukala (né Peres)
eab24d7793 radv/ci: document new flakes
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36439>
2025-08-01 09:42:03 +03:00
Eric Engestrom
15d554592b radv/ci: add missing GPU_VERSION for navi10 in kws farm
Fixes: d40438031c ("radv/ci: deduplicate GPU_VERSION in ci-tron jobs")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36499>
2025-07-31 21:50:12 +00:00
Collabora's Gfx CI Team
f99a60f499 Uprev Piglit to c3a3e29d59e0972650a6d30d20de930c87739c14
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
0980079dcf...c3a3e29d59

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36340>
2025-07-31 21:05:20 +00:00
Antonio Ospite
ddf2aa3a4d build: avoid redefining unreachable() which is standard in C23
In the C23 standard unreachable() is now a predefined function-like
macro in <stddef.h>

See https://android.googlesource.com/platform/bionic/+/HEAD/docs/c23.md#is-now-a-predefined-function_like-macro-in

And this causes build errors when building for C23:

-----------------------------------------------------------------------
In file included from ../src/util/log.h:30,
                 from ../src/util/log.c:30:
../src/util/macros.h:123:9: warning: "unreachable" redefined
  123 | #define unreachable(str)    \
      |         ^~~~~~~~~~~
In file included from ../src/util/macros.h:31:
/usr/lib/gcc/x86_64-linux-gnu/14/include/stddef.h:456:9: note: this is the location of the previous definition
  456 | #define unreachable() (__builtin_unreachable ())
      |         ^~~~~~~~~~~
-----------------------------------------------------------------------

So don't redefine it with the same name, but use the name UNREACHABLE()
to also signify it's a macro.

Using a different name also makes sense because the behavior of the
macro was extending the one of __builtin_unreachable() anyway, and it
also had a different signature, accepting one argument, compared to the
standard unreachable() with no arguments.

This change improves the chances of building mesa with the C23 standard,
which for instance is the default in recent AOSP versions.

All the instances of the macro, including the definition, were updated
with the following command line:

  git grep -l '[^_]unreachable(' -- "src/**" | sort | uniq | \
  while read file; \
  do \
    sed -e 's/\([^_]\)unreachable(/\1UNREACHABLE(/g' -i "$file"; \
  done && \
  sed -e 's/#undef unreachable/#undef UNREACHABLE/g' -i src/intel/isl/isl_aux_info.c

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36437>
2025-07-31 17:49:42 +00:00
Georg Lehmann
a6a6c2f691 aco/ra: convert bitwise instruction to gfx11+ 16bit on demand
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
The 32bit versions are smaller, allow more optimizations and VOPD,
so only use the 16bit opcodes if nessecary.

Foz-DB Navi31:
Totals from 84 (0.10% of 80237) affected shaders:
Instrs: 176673 -> 176347 (-0.18%); split: -0.20%, +0.01%
CodeSize: 970148 -> 969716 (-0.04%); split: -0.08%, +0.03%
VGPRs: 5876 -> 5864 (-0.20%)
Latency: 2805974 -> 2805674 (-0.01%); split: -0.02%, +0.01%
InvThroughput: 769007 -> 768738 (-0.03%); split: -0.04%, +0.01%
VClause: 2593 -> 2597 (+0.15%)
Copies: 23749 -> 23487 (-1.10%); split: -1.11%, +0.00%
VALU: 107124 -> 106862 (-0.24%); split: -0.25%, +0.00%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35919>
2025-07-31 12:07:07 +00:00
Georg Lehmann
404e1f13e8 aco/print_asm: use real true16 instr on gfx11+
Fake16 doesn't print opsel on v_cndmask_b16, so it looks really broken.
Restrict to LLVM20+ because older versions have incomplete tru16 support.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35919>
2025-07-31 12:07:07 +00:00
Eric Engestrom
d40438031c radv/ci: deduplicate GPU_VERSION in ci-tron jobs
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36354>
2025-07-30 23:18:04 +00:00
Eric Engestrom
84ca8c54f7 radv/ci: deduplicate DEQP_SUITE: radv-valve in ci-tron jobs
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36354>
2025-07-30 23:18:03 +00:00
Georg Lehmann
4683187f49 radv/nir/lower_cmat: load gfx11 8bit ACC using the B layout to get aligned loads
This allows us to use aligned loads that can be vectorized, without any
downside as 8bit scalar loads always write 16bits of a register.

Foz-DB Navi31:
Totals from 10 out of 14 FSR4 shader:
MaxWaves: 71 -> 68 (-4.23%)
Instrs: 60146 -> 59781 (-0.61%); split: -0.67%, +0.06%
CodeSize: 412448 -> 413428 (+0.24%); split: -0.11%, +0.35%
VGPRs: 2112 -> 2160 (+2.27%)
SpillVGPRs: 89 -> 68 (-23.60%)
Scratch: 11776 -> 8704 (-26.09%)
Latency: 196628 -> 193770 (-1.45%); split: -2.62%, +1.17%
InvThroughput: 224944 -> 226274 (+0.59%); split: -0.02%, +0.61%
VClause: 862 -> 796 (-7.66%)
Copies: 3166 -> 3342 (+5.56%); split: -6.22%, +11.78%
Branches: 37 -> 38 (+2.70%)
PreSGPRs: 311 -> 312 (+0.32%)
PreVGPRs: 2153 -> 2214 (+2.83%); split: -1.35%, +4.18%
VALU: 51073 -> 51448 (+0.73%); split: -0.03%, +0.77%
SALU: 1072 -> 1074 (+0.19%)
VMEM: 3275 -> 2765 (-15.57%)
VOPD: 1739 -> 1783 (+2.53%); split: +7.99%, -5.46%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36117>
2025-07-30 07:25:51 +00:00
Marek Olšák
8d3e76c250 nir: split nir_move_load_frag_coord from nir_move_load_input
It's a pure system value on AMD, not an input.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36357>
2025-07-29 16:20:48 -04:00
Antonio Ospite
0dd8051f34 radv: fix returning _Bool instead of pointer
When building for C23 the compiler warns about returning a boolean when
a different type is expected instead.

Change the code to return NULL instead of false, fixing the following
errors:

-----------------------------------------------------------------------
../src/amd/vulkan/radv_pipeline_cache.c: In function ‘radv_pipeline_cache_object_search’:
../src/amd/vulkan/radv_pipeline_cache.c:338:14: error: incompatible types when returning type ‘_Bool’ but ‘struct radv_pipeline_cache_object *’ was expected
  338 |       return false;
      |              ^~~~~
../src/amd/vulkan/radv_pipeline_cache.c:352:14: error: incompatible types when returning type ‘_Bool’ but ‘struct radv_pipeline_cache_object *’ was expected
  352 |       return false;
      |              ^~~~~
-----------------------------------------------------------------------

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36323>
2025-07-29 14:07:06 +00:00
llyyr
c869971d05 radv: don't set HOST_IMAGE_TRANSFER_BIT if host_image_copy not enabled
This can't work if the extension isn't enabled, so only set if the
extension is enabled.

Fixes: d89b11011f ("radv: add support for formats with host-transfer")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36400>
2025-07-29 13:48:59 +00:00
Nagulendran, Iswara
6088dbe05c amd/vpelib: Fix cost profiling support
Add additional changes/logs to profile total register writes.

Signed-off-by: Iswara Nagulendran <Iswara.Nagulendran@amd.com>
Reviewed-by: Roy Chan <Roy.Chan@amd.com>
Reviewed-by: Jesse Agate <Jesse.Agate@amd.com>
Acked-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36433>
2025-07-29 09:17:25 +00:00
Chang, Tomson
02beb30d6b amd/vpelib: Update register header and definitions macros
Update header and related macros and functions

Reviewed-by: Min-Hsuan You <Min-Hsuan.You@amd.com>
Reviewed-by: Ricky Lin <Ricky.Lin@amd.com>
Acked-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Tomson Chang <tomson.chang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36433>
2025-07-29 09:17:25 +00:00
Chan, Roy
08fd9aab30 amd/vpelib: fix memory corruption
[WHY]
Wrong structure size being allocated

[HOW]
fixed the structure size during allocation

Reviewed-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com>
Acked-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36433>
2025-07-29 09:17:25 +00:00
Nagulendran, Iswara
74c88f740f amd/vpelib: Fix Issues with Background Color insertions
[WHY]
Background Color Insertion, test cases involving studio output fails

[HOW]
Move background color convertion into revision specific resource
files and isolated what needed to be executed for VPE

Reviewed-by: Jesse Agate <Jesse.Agate@amd.com>
Reviewed-by: Roy Chan <Roy.Chan@amd.com>
Acked-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Iswara Nagulendran <Iswara.Nagulendran@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36433>
2025-07-29 09:17:25 +00:00
Assadian, Navid
b529d38ae9 amd/vpelib: Exit when VPE not support in debug
When the debug flag is set to assert when vpe is not supported, instead
of assert it is preferable to exit so the CI aborts the process instead
of waiting on the assert message.

[WHY]
In debug mode for CI, when assert the process doesn't abort and the CI
terminates on time out.

[HOW]
Using exit instead of assert

Reviewed-by: Jesse Agate <Jesse.Agate@amd.com>
Reviewed-by: Roy Chan <Roy.Chan@amd.com>
Acked-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Navid Assadian <navid.assadian@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36433>
2025-07-29 09:17:24 +00:00
Kovac, Krunoslav
5b8b5c4c49 amd/vpelib: Fix Possible dereferencing null
pointer issue

[WHY]
Mostly dereferencing possible null ptrs

[HOW]
Add checks / refactor code.

Reviewed-by: Roy Chan <Roy.Chan@amd.com>
Reviewed-by: Jesse Agate <Jesse.Agate@amd.com>
Acked-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36433>
2025-07-29 09:17:24 +00:00
Chang, Tomson
48495c142a amd/vpelib: Add missing swizzle and dcc info
Add missing swizzle mode and dcc info

Reviewed-by: Ricky Lin <Ricky.Lin@amd.com>
Reviewed-by: Jude Shih <Jude.Shih@amd.com>
Acked-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Tomson Chang <tomson.chang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36433>
2025-07-29 09:17:24 +00:00
Hsieh, Mike
d281a5587d amd/vpelib: add max/min input output capability
[WHY]
Capability need to show max and min input/output size.

[HOW]
Add max_input_size, max_ouptut_size, min_output_size and min_imput_size.

Reviewed-by: Roy Chan <Roy.Chan@amd.com>
Reviewed-by: Ricky Lin <Ricky.Lin@amd.com>
Acked-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Mike Hsieh <Mike.Hsieh@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36433>
2025-07-29 09:17:24 +00:00
Hsieh, Mike
6b279abcac amd/vpelib: bug fix: remove unnecessary free
[WHY]
vpe_priv.resource should not be freed when destroy resource

[HOW]
Remove unnecessary free.

Reviewed-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com>
Reviewed-by: Brendan Steve Leder <BrendanSteven.Leder@amd.com>
Acked-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Mike Hsieh <Mike.Hsieh@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36433>
2025-07-29 09:17:24 +00:00
Hsieh, Mike
e1ff093e63 amd/vpelib: add format, colorspace check function
[WHY]
VPE does not support pixel format and colorspace support check.

[HOW]
add vpe_create_engine function to support stateless API.
Add new function to support pixel format check and colorspace support
check.

Reviewed-by: Roy Chan <Roy.Chan@amd.com>
Reviewed-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com>
Acked-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Mike Hsieh <Mike.Hsieh@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36433>
2025-07-29 09:17:24 +00:00
Agate, Jesse
0541d73cbd amd/vpelib: Use Ceil Division Macro
Use available ceil division macro

[WHY]
Code Cleanup

[HOW]
Use available macro

Reviewed-by: Roy Chan <Roy.Chan@amd.com>
Reviewed-by: Navid Assadian <Navid.Assadian@amd.com>
Acked-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36433>
2025-07-29 09:17:24 +00:00
Georg Lehmann
b12db991eb aco/gfx10: optimize subgroupRotate(x, 32) and subgroupShuffleXor(x, 32)
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
We don't have v_permlane64_b32 yet, but we can still optimize it using
shared vgprs. Using the DPP16 row mask, we can even avoid writing exec.

With v0 input/output and v24/v25 as shared vgprs, this results in:
v_mov_b32_dpp v24, v0 quad_perm:[0,1,2,3] row_mask:0x3 bank_mask:0xf
v_mov_b32_dpp v25, v0 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
v_mov_b32_dpp v0, v24 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
v_mov_b32_dpp v0, v25 quad_perm:[0,1,2,3] row_mask:0x3 bank_mask:0xf

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36390>
2025-07-29 06:33:20 +00:00
Georg Lehmann
eb4df58a3d aco/isel: refactor shared vgpr usage
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36390>
2025-07-29 06:33:20 +00:00
Georg Lehmann
8a2aca8d6f aco/select_alu: avoid vector get_alu_src for instructions with scalar operands
Foz-DB Navi21:
Totals from 1 (0.00% of 80237) affected shaders:
Instrs: 22 -> 21 (-4.55%)
CodeSize: 112 -> 108 (-3.57%)
Latency: 392 -> 386 (-1.53%)
InvThroughput: 25 -> 24 (-4.00%)
Copies: 4 -> 3 (-25.00%)
PreVGPRs: 8 -> 4 (-50.00%)
VALU: 10 -> 9 (-10.00%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35728>
2025-07-29 06:07:15 +00:00
Georg Lehmann
ad9c340d86 aco: insert VALU s_delay_alu for WMMA
This should avoid some SIMD stalls.

I think this special case was added to try to handle this case:

First Instruction: WMMA
Second Instruction: WMMA instruction with same VGPR of previous WMMA instruction’s Matrix D as Matrix C
Stall if the first and second instruction are not the same type of WMMA or use ABS/NEG on SRC2 of the second instruction

If I read it correctly, we shouldn't need a delay if the type is the same and no
modifier is used. That's kind of complex to handle, so leave it for now.
Not inserting any delays likely hurts more than this.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36328>
2025-07-29 05:48:29 +00:00
Georg Lehmann
413d0d2ec8 aco/statistics: update GFX12 WMMA cost
Based on marketing numbers, but they seem to match RGP.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36328>
2025-07-29 05:48:29 +00:00
Georg Lehmann
8f61c85880 aco/statistics: add latency to WMMA
Assume the normal VALU latency of 4 cycles.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36328>
2025-07-29 05:48:29 +00:00
Georg Lehmann
004f8aa2f4 aco: optimize get_alu_src with constant source and size > 1
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Emulated FSR4, Navi31:
Totals from 14 (100.00% of 14) affected shaders:
MaxWaves: 130 -> 131 (+0.77%)
Instrs: 67887 -> 67470 (-0.61%); split: -0.70%, +0.09%
CodeSize: 464428 -> 461668 (-0.59%); split: -0.67%, +0.07%
VGPRs: 2544 -> 2520 (-0.94%)
SpillVGPRs: 92 -> 89 (-3.26%)
Latency: 256823 -> 257574 (+0.29%); split: -0.37%, +0.66%
InvThroughput: 253895 -> 252929 (-0.38%); split: -0.40%, +0.02%
VClause: 997 -> 984 (-1.30%); split: -2.11%, +0.80%
Copies: 4501 -> 3788 (-15.84%); split: -17.35%, +1.51%
PreSGPRs: 504 -> 519 (+2.98%)
PreVGPRs: 2460 -> 2448 (-0.49%)
VALU: 57202 -> 56726 (-0.83%); split: -0.88%, +0.05%
SALU: 1231 -> 1384 (+12.43%)
VMEM: 3807 -> 3801 (-0.16%)
VOPD: 2693 -> 2303 (-14.48%); split: +1.19%, -15.67%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36090>
2025-07-25 11:33:00 +00:00
David Rosca
cc7178b9eb radv/video: Use the new defines for H264 SPS info flags
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Also set gaps_in_frame_num_value_allowed_flag.

Cc: mesa-stable
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36225>
2025-07-25 09:32:08 +00:00
Konstantin Seurer
48d15c3cf8 radv/bvh: Specialize the update shader for geometryCount==1
The geometry data can be loaded from push constants in that case.

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35445>
2025-07-25 09:05:13 +00:00
Konstantin Seurer
b20ab07e4a radv/bvh: Update leaf nodes before refitting
This should reduce latency between refitting nodes and their parent
nodes.

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35445>
2025-07-25 09:05:13 +00:00
Konstantin Seurer
33a694fe9b radv: Initialize base IDs when doing a BVH update with src!=dst
Fixes: 2d48b2c ("radv: Use subgroup OPs for BVH updates on GFX12")
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35445>
2025-07-25 09:05:12 +00:00
Konstantin Seurer
4a4251dc16 radv/bvh: Use a fixed indices midpoint on GFX12
This saves a couple of loads inside the update shader.

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35445>
2025-07-25 09:05:12 +00:00
Konstantin Seurer
7ad02416f6 radv/bvh: Fix flush in bit_writer_skip_to
If temp is not cleared, the next flushed dword will contain data from
the previous one.

Fixes: 97f6287 ("radv: Use the BVH8 format on GFX12")
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35445>
2025-07-25 09:05:11 +00:00
Konstantin Seurer
6201e24307 radv: Only write leaf node offsets when required
They are only used for serialization and position fetch which makes them
unnecessary most of the times.

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35445>
2025-07-25 09:05:11 +00:00
Konstantin Seurer
703a154f29 radv: Add and use RADV_OFFSET_UNUSED
This deduplicates the logic to figure out what needs to be written.

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35445>
2025-07-25 09:05:10 +00:00
David Rosca
1dda9d56cb radv/video: Disable rate control modes for H265 encode on VCN1
VCN1 doesn't have FW interface to enable cu_qp_delta with rate control
disabled, which means we can only support either rate control enabled or
disabled. Spec requires VK_VIDEO_ENCODE_RATE_CONTROL_MODE_DISABLED_BIT_KHR
to always be supported, thus the rate control modes needs to be disabled
on VCN1.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36353>
2025-07-25 08:46:14 +00:00
David Rosca
627fdb368d radv/video: Fix session_init and rc_per_pic on VCN2
Cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36353>
2025-07-25 08:46:14 +00:00
David Rosca
c11508ad41 radv/video: Fix setting H265 encode cu_qp_delta on VCN2
Fixes H265 encoding with rate control disabled.

Cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36353>
2025-07-25 08:46:13 +00:00
David Rosca
70473690f5 radv/video: Fix encode bitstream buffer offset and alignment
Caused issues on VCN2.

Cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36353>
2025-07-25 08:46:13 +00:00
David Rosca
a30f91b71a radv/video: Add more encode session params overrides
Cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36353>
2025-07-25 08:46:13 +00:00
David Rosca
e3715df4ee radv/video: Send slice control, spec misc and deblocking params every frame
These params can change per frame, so we need to send the values
to firmware on every frame instead of only once at session init.

Cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36353>
2025-07-25 08:46:12 +00:00
David Rosca
947e647df8 radv/video: Always send the latency command
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36353>
2025-07-25 08:46:12 +00:00
David Rosca
8368e3519e radv/video: Set H264 encode cabac_init_idc and Cb/Cr QP offsets
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36353>
2025-07-25 08:46:11 +00:00