Commit graph

16171 commits

Author SHA1 Message Date
Friedrich Vock
8135a7614d radv/rt: Remove nir_intrinsic_execute_callable instrs in monolithic mode
It's allowed to place OpExecuteCallableKHR in a SPIR-V, even if the RT
pipeline doesn't contain any callable shaders. Unreal hits this case and
crashes. We can assume the intrinsic never gets executed, so we can
simply remove it.

Cc: mesa-stable
(cherry picked from commit 0c02a7e8e8)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32730>
2024-12-13 09:18:00 -08:00
Samuel Pitoiset
cab3f06713 radv: fix disabling DCC for stores with drirc
Displayable DCC should also be disabled, otherwise it's asserting
somewhere in ac_surface.c

Fixes: e3d1f27b31 ("radv: add radv_disable_dcc_stores and enable for Indiana Jones: The Great Circle")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 4d1aa9a2d0)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32730>
2024-12-12 09:43:47 -08:00
Samuel Pitoiset
f565dcdf54 radv: add radv_disable_dcc_stores and enable for Indiana Jones: The Great Circle
Likely a game bug but can't be 100% sure because the game uses RT by
default and renderdoc still doesn't have support for it.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit e3d1f27b31)

Conflicts:
	src/util/00-radv-defaults.conf

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32730>
2024-12-12 09:43:43 -08:00
Samuel Pitoiset
9a73b89f28 radv: fix initializing HTILE when the image has VRS rates
VRS rates should only be preserved for clears, otherwise the HTILE
buffer should be cleared completely.

This fixes some failures/flakes in CI.

Fixes: 8197d744f5 ("radv: Do not overwrite VRS rates when doing fast clears")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 8b755840fc)

Conflicts:
	src/amd/vulkan/meta/radv_meta_clear.c

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32730>
2024-12-12 09:38:30 -08:00
Friedrich Vock
ee78db8c14 aco/lower_to_hw_instr: Check the right instruction's opcode
instr is the branch instruction, its opcode won't ever be writelane. We
should check inst instead.

Found by inspection.

Cc: mesa-stable
(cherry picked from commit 845660f2b7)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32730>
2024-12-12 09:32:44 -08:00
Georg Lehmann
e92e02a71f aco/ra: don't write to scc/ttmp with s_fmac
Fixes: 4bd229ac50 ("aco/gfx11.5: select SOP2 float instructions")

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
(cherry picked from commit 65506e635b)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32730>
2024-12-12 09:32:42 -08:00
Georg Lehmann
1a98685055 aco/ra: disallow s_cmpk with scc operand
Fixes: 2d6b0a4177 ("aco/optimizer: Optimize SOPC with literal to SOPK.")

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
(cherry picked from commit 0b9e2a5427)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32730>
2024-12-12 09:32:42 -08:00
Georg Lehmann
2ba6a1f300 aco/ra: don't write to exec/ttmp with mulk/addk/cmovk
ttmp sgprs are readonly outside of trap handlers, so the instructions were
probably skipped. RA should also never create additional exec writes.

Fixes: e06773281b ("aco/ra: Optimize some SOP2 instructions with literal to SOPK.")

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
(cherry picked from commit fe0c72caec)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32730>
2024-12-12 09:32:41 -08:00
Dave Airlie
37d72c978f radv/video: set max slice counts to 1 for h264/5 encode
Right now the driver doesn't support multi-slice encodes, so
report the correct value.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Autumn Ashton
Cc: mesa-stable
(cherry picked from commit 699afb88ec)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32730>
2024-12-12 09:32:36 -08:00
Rhys Perry
3cffcc3da7 aco: don't CSE p_shader_cycles_hi_lo_hi
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes: fae2a85d57 ("aco/gfx12: implement subgroup shader clock")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12243
(cherry picked from commit ab26b99c2c)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32730>
2024-12-06 08:44:04 -08:00
Georg Lehmann
f750108aa9 aco/gfx12: disable vinterp ddx/ddy optimization
This only seems to work on gfx11 and gfx11.5, and it's only faster on gfx11.5.

We could continue to use vinterp, with constants copied to vgprs, but
whether that's beneficial depends on the shader.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

Fixes: bee487df48 ("aco/gfx11.5+: use vinterp for fddx/fddy")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12250
(cherry picked from commit 7425e71ae0)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32730>
2024-12-06 08:44:03 -08:00
Samuel Pitoiset
69e950d853 radv: fix skipping on-disk shaders cache when not useful
This was just broken because individual shaders were still stored
on-disk in many situations:
- for shader object, all compute/graphics shaders were stored
- for fast-GPL, graphics shaders were stored
- for pipeline binaries, when the create flag was used
- for rt capture/replay and ray history

This should stop storing unused binaries on-disk and save space.

Found this by inspection.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32476>
2024-12-04 14:09:01 -08:00
Georg Lehmann
dade5eab3f radv: fix reporting mesh/task/rt as supported dgc indirect stages
Fixes: 8300378bf3 ("radv: advertise VK_EXT_device_generated_commands on GFX8+")

Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32457>
(cherry picked from commit b961537a17)
2024-12-04 09:54:34 -08:00
David Rosca
236b71542e radv/video: Always use setup reference slot when valid
Reviewed-by: Benjamin Cheng <ben@bcheng.me>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10977
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32414>
(cherry picked from commit 76e3004fef)
2024-12-02 10:54:59 -08:00
Konstantin
108ab09453 radv: Do not overwrite VRS rates when doing fast clears
Fixes a whole bunch of VRS tests on navi24.

cc: mesa-stable

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32318>
(cherry picked from commit 8197d744f5)
2024-12-02 10:54:54 -08:00
Hans-Kristian Arntzen
d68c6558bd radv: Fix missing gang barriers for task shaders.
It's also possible to use ALL_GRAPHICS and PRE_RASTERIZATION as
alternatives.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32323>
(cherry picked from commit fc9ae4b974)
2024-11-25 14:44:37 -08:00
Daniel Schürmann
9448cd6071 aco/ra: use bitset for sgpr_operands_alias_defs
We cannot rely on SGPR Temps being fully aligned to 64 SGPRs.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32217>
(cherry picked from commit 17da551133)
2024-11-21 09:13:33 -08:00
Daniel Schürmann
b1f8e15781 aco/ra: set Pseudo_instruction::scratch_sgpr to SCC if it doesn't need to be preserved
Also ensure that 'needs_scratch_reg' is always true if SCC might be overwritten.
Few changes, because some p_split_vector get SCC as scratch reg assigned,
and thus, can inhibit some postRA optimizations.

Totals from 3 (0.00% of 79395) affected shaders: (Navi31)
Instrs: 10501 -> 10500 (-0.01%); split: -0.02%, +0.01%
CodeSize: 51580 -> 51520 (-0.12%); split: -0.12%, +0.01%
Latency: 84166 -> 84174 (+0.01%)
InvThroughput: 13109 -> 13111 (+0.02%)
SALU: 859 -> 860 (+0.12%)

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32217>
(cherry picked from commit a04e096339)
2024-11-21 09:13:32 -08:00
Samuel Pitoiset
185ae19141 radv: add a new drirc option to disable DCC for mips and enable it for RDR2
The game aliases two images. It binds a memory object to two different
images, the first one being an image with 4 mips and the second with
only one mip but the bind offset is incorrect. It's like it queried
the first image size with different usage flags, so that DCC was
disabled.

Force disabling DCC for mips fixes the incorrect rendering and doesn't
hurt performance.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10200
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 2f13723c0a)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
2024-11-19 14:29:07 -08:00
Michel Zou
49e5090f79 ac/gpu_info: Fix missing prototype mingw error
Fixes: 246051ebc6 ("ac/gpu_info: print 32bpp modifiers")

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Closes #8858

(cherry picked from commit 795a36325a)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
2024-11-19 14:29:06 -08:00
Samuel Pitoiset
b4b12c6708 radv: fix ignoring src stage mask when dst stage mask is BOTTOM_OF_PIPE
Otherwise the driver doesn't synchronize if there are image layout
transitions.

This fixes rendering issues with displayable DCC (usually black squares
in the bottom of screen). This mostly happens when an application
uses a lower resolution than the screen supports and fshack
(wine/proton) which upscales images uses COMPUTE_SHADER->BOTTOM_OF_PIPE
for the barrier after a dispatch.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11547
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11600
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11789
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8705
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9890
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit c08d2c40ed)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
2024-11-18 09:40:05 -08:00
David Rosca
c1517edde6 radv/video: Avoid selecting rc layer over maximum
Vulkan spec doesn't say if this is allowed or not, but trying
to do this will hang.

Fixes: 4a19047d32 ("radv/video: Select temporal layer when encoding each frame")
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit d1c1a33b35)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
2024-11-14 09:31:38 -08:00
David Rosca
bab3391381 radv/video: Report correct encodeInputPictureGranularity
Only aligned size can be encoded.

Fixes: 54d499818c ("radv/video: add initial support for encoding with h264.")
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit e941acfb9d)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
2024-11-14 09:31:37 -08:00
David Rosca
42822bbca2 radv/video: Fix HEVC slice control
This needs to use aligned size, otherwise it will output two
slices when the size is not 64 aligned.

Fixes: 967e4e09de ("radv/video: add h265 encode support")
Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit e4ec135d8b)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
2024-11-14 09:31:36 -08:00
David Rosca
ecc3f03d83 radv/video: Fix H264 slice control
This needs to use aligned size, otherwise it will output two
slices when the size is not 16 aligned.

Fixes: 54d499818c ("radv/video: add initial support for encoding with h264.")

Reviewed-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 6a121f1507)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
2024-11-14 09:31:36 -08:00
Samuel Pitoiset
9cc07bbd09 radv: mark some GFX6-7 GPUs as Vulkan 1.3 conformant
It's the first time RADV is Vulkan conformant on GFX6-7! Some chips
are missing because we don't have access but most of the GFX6-7 GPUs
are covered.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32022>
2024-11-07 11:50:10 +00:00
Samuel Pitoiset
b67218645d radv: save the trap handler report in the HOME directory
It's similar to where GPU hang reports are saved.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31988>
2024-11-07 09:28:16 +01:00
Rhys Perry
215c44c124 aco: apply extract to v_cvt_f32_ubyte0
No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31762>
2024-11-06 19:31:20 +00:00
Rhys Perry
f1a932bc29 aco: apply extract to p_extract_vector
fossil-db (navi21):
Totals from 46 (0.06% of 79395) affected shaders:
Instrs: 80126 -> 79944 (-0.23%); split: -0.27%, +0.04%
CodeSize: 486860 -> 485668 (-0.24%); split: -0.31%, +0.06%
Latency: 1615395 -> 1614218 (-0.07%); split: -0.07%, +0.00%
InvThroughput: 705479 -> 705013 (-0.07%); split: -0.07%, +0.00%
Copies: 18934 -> 18797 (-0.72%); split: -0.98%, +0.25%
VALU: 52452 -> 52268 (-0.35%); split: -0.41%, +0.06%
SALU: 17253 -> 17255 (+0.01%); split: -0.02%, +0.03%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31762>
2024-11-06 19:31:20 +00:00
Rhys Perry
6cb9d39bc2 aco: combine extracts with sub-dword definitions
fossil-db (navi21):
Totals from 23 (0.03% of 79395) affected shaders:
Instrs: 55133 -> 55099 (-0.06%)
CodeSize: 335744 -> 335512 (-0.07%)
Latency: 1709146 -> 1709031 (-0.01%)
InvThroughput: 613788 -> 613713 (-0.01%)
Copies: 14405 -> 14407 (+0.01%); split: -0.03%, +0.04%
VALU: 37038 -> 37000 (-0.10%)
SALU: 11125 -> 11131 (+0.05%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31762>
2024-11-06 19:31:20 +00:00
Rhys Perry
30af7ae44f aco: add and use apply_extract_twice helper
This will be used in the next commit.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31762>
2024-11-06 19:31:20 +00:00
Rhys Perry
05d0fa894e aco: allow applying sign-extended sel to p_extract more often
In the case of v1=p_extract(v1=p_extract(src, 0, 16, 1), 0, 32, 0).
When we apply extracts with sub-dword definitions, this will also
include v2b=p_extract(v2b=p_extract(src, 0, 8, 1), 0, 16, 0).

No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31762>
2024-11-06 19:31:20 +00:00
Rhys Perry
e47bc3e750 aco: shrink code size of some p_extract
fossil-db (navi21):
Totals from 37 (0.05% of 79395) affected shaders:
CodeSize: 2048204 -> 2047836 (-0.02%)

fossil-db (navi31):
Totals from 307 (0.39% of 79395) affected shaders:
CodeSize: 3075732 -> 3065236 (-0.34%); split: -0.34%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31762>
2024-11-06 19:31:20 +00:00
Rhys Perry
d285333800 aco: add a bit more p_extract/p_insert validation
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31762>
2024-11-06 19:31:20 +00:00
Rhys Perry
d3ac69f79b aco: handle SGPR limitations when applying extract
We were already doing this, but missing it in a few places.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31762>
2024-11-06 19:31:20 +00:00
Rhys Perry
07e28dad75 aco: disallow p_extract(,,32,)
Nothing uses these.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31762>
2024-11-06 19:31:20 +00:00
Rhys Perry
f528597906 aco: check for SDWA before applying extract to lshl/cvt_f32
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31762>
2024-11-06 19:31:20 +00:00
Rhys Perry
6ce51ea168 aco/gfx11: fix v1b=p_extract(src, 0, 16, 0)
This is weird, but the SDWA path supports this.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31762>
2024-11-06 19:31:20 +00:00
Rhys Perry
b318fe47e9 aco: don't byte align global VMEM loads if it might be unsafe
Using the byte align path can be unsafe even when 12 byte loads are
supported.

fossil-db (navi21):
Totals from 185 (0.23% of 79395) affected shaders:
Instrs: 391501 -> 391575 (+0.02%); split: -0.03%, +0.05%
CodeSize: 2147336 -> 2147672 (+0.02%); split: -0.03%, +0.05%
Latency: 3762613 -> 3860941 (+2.61%); split: -0.01%, +2.62%
InvThroughput: 871429 -> 888013 (+1.90%); split: -0.08%, +1.98%
VClause: 9712 -> 10210 (+5.13%)
Copies: 53775 -> 53010 (-1.42%); split: -1.46%, +0.04%
VALU: 254009 -> 252146 (-0.73%)
SALU: 56698 -> 56699 (+0.00%); split: -0.00%, +0.00%
VMEM: 18503 -> 19601 (+5.93%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes: 391bf3ea30 ("aco: don't expand smem/mubuf global loads")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31807>
2024-11-06 19:07:16 +00:00
Marek Olšák
2a9d590b6c Revert "amd/ci: adjust stoney traces checksums"
This reverts commit 5882b5b93b.

It was added because nir_opt_varyings was accidentally disabled.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31994>
2024-11-06 15:51:51 +00:00
Eric Engestrom
cdeb284dce amd/ci: document flakes seen lately
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32007>
2024-11-06 14:14:38 +00:00
Georg Lehmann
2cd8a9fef7 amd: lower gl_FragCoord.w rcp in NIR
This allows NIR to remove the rcps if the application uses rcp(gl_FragCoord.w).
D3D provides w, not 1/w like GL/VK in the shader, so this is commonly used.

Foz-DB Navi21:
Totals from 2068 (2.61% of 79206) affected shaders:
MaxWaves: 45636 -> 45652 (+0.04%)
Instrs: 2173444 -> 2169671 (-0.17%); split: -0.18%, +0.00%
CodeSize: 11881304 -> 11867208 (-0.12%); split: -0.12%, +0.01%
VGPRs: 118000 -> 117968 (-0.03%)
Latency: 35689676 -> 35675909 (-0.04%); split: -0.06%, +0.02%
InvThroughput: 9167199 -> 9159801 (-0.08%); split: -0.08%, +0.00%
VClause: 45076 -> 45078 (+0.00%); split: -0.01%, +0.02%
SClause: 92503 -> 92366 (-0.15%); split: -0.31%, +0.17%
Copies: 140282 -> 140303 (+0.01%); split: -0.13%, +0.14%
Branches: 53347 -> 53346 (-0.00%); split: -0.01%, +0.00%
PreVGPRs: 96495 -> 96465 (-0.03%)
VALU: 1522980 -> 1519252 (-0.24%); split: -0.25%, +0.01%
SALU: 213451 -> 213460 (+0.00%); split: -0.02%, +0.02%

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31967>
2024-11-06 12:57:08 +00:00
Rhys Perry
5375d77488 aco: wait for scratch stores to complete before dealloc_vgprs
fossil-db (navi31):
Totals from 392 (0.49% of 79395) affected shaders:
Instrs: 5052043 -> 5054100 (+0.04%)
CodeSize: 26701200 -> 26709428 (+0.03%)
Latency: 43614861 -> 43615368 (+0.00%)
InvThroughput: 7353147 -> 7353216 (+0.00%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24884>
2024-11-06 09:58:05 +00:00
Rhys Perry
575f24d19f aco: don't emit early exit over dealloc_vgprs
fossil-db (navi31):
Totals from 3308 (4.17% of 79395) affected shaders:
Instrs: 387145 -> 375373 (-3.04%)
CodeSize: 2018276 -> 1964380 (-2.67%)
Latency: 6588004 -> 6549068 (-0.59%)
InvThroughput: 458792 -> 457025 (-0.39%); split: -0.39%, +0.00%
Branches: 10710 -> 7402 (-30.89%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24884>
2024-11-06 09:58:05 +00:00
Rhys Perry
295b7d606f aco: insert NOP before dealloc_vgpr in the insert_NOPs pass
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24884>
2024-11-06 09:58:05 +00:00
Rhys Perry
4dfc564669 aco: fix printing of block_kind_discard_early_exit
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24884>
2024-11-06 09:58:04 +00:00
Rhys Perry
0ad713ca9f aco: add waitcnt build helper
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24884>
2024-11-06 09:58:04 +00:00
Timur Kristóf
766617e8da radv: Enable NGG culling by default on GFX10.
We never took the time to actually test this, but it works fine.
Improves performance on Navi 10 in the following test cases:

Baldur's Gate 3 Vulkan: up to 10%
Witcher 3 D3D11: around 4%
Granite primitive stress test: 107%
FSR2 sample app: 57%

Notes:
NGG is still disabled on Navi 14.
Not tested on Navi 12.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31971>
2024-11-06 03:16:54 +00:00
Timur Kristóf
6bf19b2d70 radv: Increase NGG culling PS param limit to 12 on GFX10.
Helps performance in Baldur's Gate 3 on Navi 10
when NGG culling is enabled.

Also fix the description of the RADV_PERFTEST=nggc env var.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31971>
2024-11-06 03:16:53 +00:00
Evan
c3c80491f9 amd/vpelib: Input Format Adjustment
Reviewed-by: Jiali Zhao <Jiali.Zhao@amd.com>
Reviewed-by: Jesse Agate <Jesse.Agate@amd.com>
Acked-by: Chenyu Chen <Chen-Yu.Chen@amd.com>
Signed-off-by: Evan <evan.damphousse@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31918>
2024-11-06 02:19:39 +00:00