Commit graph

1251 commits

Author SHA1 Message Date
Samuel Pitoiset
034014a165 aco: restore m0/exec before exiting the trap handler
Dumping VGPRs will overwrite m0 and exec and they need to be restored
if we want to return to the shader.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32090>
2024-11-13 15:27:53 +00:00
Samuel Pitoiset
f4cf6a71ed aco: use a 64-bit mov to save exec in the trap handler shader
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32090>
2024-11-13 15:27:53 +00:00
Rhys Perry
17cc8a5a54 aco: remove load byte_align
8/16-bit loads given to instruction selection now always use VMEM and
scalar load instructions unless alignment easily allows a vector load.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904>
2024-11-13 12:59:26 +00:00
Rhys Perry
d3ae1842a2 aco,ac/nir: flag loads to use smem in NIR
This pass will be re-used later.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904>
2024-11-13 12:59:26 +00:00
Rhys Perry
0619e4db63 nir,aco,ac/llvm: add nir_op_alignbyte_amd
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904>
2024-11-13 12:59:26 +00:00
Rhys Perry
db0cbb7e9b aco: optimize nir_op_shfr with <32 src1
No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904>
2024-11-13 12:59:26 +00:00
Samuel Pitoiset
0c77469995 aco: fix saving/restoring VGPRS in the trap handler on GFX9
When ADD_TID_ENABLE=1, DATA_FORMAT is STRIDE[14:17], so the stride
was too large.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32109>
2024-11-13 11:12:54 +00:00
Georg Lehmann
7e8a08ae77 aco: use nir_def_all_uses_ignore_sign_bit
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31844>
2024-11-12 18:03:57 +00:00
Samuel Pitoiset
2d5df46c25 aco: emit nir_intrinsic_debug_break
s_trap is used to enter the trap.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32061>
2024-11-12 16:05:17 +00:00
Samuel Pitoiset
5f79b8ea2d radv,aco: save/restore overwritten VGPRs in the trap handler shader
The trap currently doesn't return to the shader but it will be needed
for example for the debug mode.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32056>
2024-11-12 11:16:13 +00:00
Samuel Pitoiset
6ec0c85908 radv,aco: use the trap handler layout struct while compiling the shader
It's less error prone to rely on the layout for offsets.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32056>
2024-11-12 11:16:13 +00:00
Samuel Pitoiset
44dfeb4479 radv,aco: add a separate function to compile the trap handler shader
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32056>
2024-11-12 11:16:13 +00:00
Samuel Pitoiset
62e335c779 radv,aco: dump more SQ_WAVE regs from the trap handler
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32056>
2024-11-12 11:16:13 +00:00
Konstantin Seurer
f8ef1afec8 aco: Handle nir_debug_info_instr
Propagated debug info using p_debug_info and Program::debug_info.
Offsets into the shader binary are gathered during assembly.
This will be usefull for mapping back the disassembled shader to
nir, glsl or spirv.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29298>
2024-11-11 08:39:13 +00:00
Samuel Pitoiset
437bd63265 radv,aco: dump m0 and exec from the trap handler
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32026>
2024-11-08 14:00:15 +00:00
Samuel Pitoiset
d1d41be43f aco: declare phys regs for tba_hi/tma_hi
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32026>
2024-11-08 14:00:15 +00:00
Samuel Pitoiset
13bab450a2 aco: fix storing SQ_WAVE_STATUS in the trap handler shader
SQ_WAVE_STATUS can change inside the trap because of SCC.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32026>
2024-11-08 14:00:14 +00:00
Samuel Pitoiset
494050d2ea aco: add a helper to dump SGPR to memory for the trap handler
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32026>
2024-11-08 14:00:14 +00:00
Samuel Pitoiset
8c6f2fef1b aco: use scalar buffer stores for dumping SGPRS from the trap on GFX8
This avoids using any VGPRs on GFX8.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32026>
2024-11-08 14:00:14 +00:00
Samuel Pitoiset
17f6b4e51e aco: save/restore SCC in the trap handler shader
SCC is only updated on GFX9+ but let's do it by default because the
trap handler shader is likely going to be more and more complex over
time.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32026>
2024-11-08 14:00:14 +00:00
Samuel Pitoiset
7b4386facd aco: cleanup using fixed registers in the trap handler shader
It's easier to read and potentially less error prone.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32026>
2024-11-08 14:00:14 +00:00
Rhys Perry
b318fe47e9 aco: don't byte align global VMEM loads if it might be unsafe
Using the byte align path can be unsafe even when 12 byte loads are
supported.

fossil-db (navi21):
Totals from 185 (0.23% of 79395) affected shaders:
Instrs: 391501 -> 391575 (+0.02%); split: -0.03%, +0.05%
CodeSize: 2147336 -> 2147672 (+0.02%); split: -0.03%, +0.05%
Latency: 3762613 -> 3860941 (+2.61%); split: -0.01%, +2.62%
InvThroughput: 871429 -> 888013 (+1.90%); split: -0.08%, +1.98%
VClause: 9712 -> 10210 (+5.13%)
Copies: 53775 -> 53010 (-1.42%); split: -1.46%, +0.04%
VALU: 254009 -> 252146 (-0.73%)
SALU: 56698 -> 56699 (+0.00%); split: -0.00%, +0.00%
VMEM: 18503 -> 19601 (+5.93%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes: 391bf3ea30 ("aco: don't expand smem/mubuf global loads")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31807>
2024-11-06 19:07:16 +00:00
Georg Lehmann
2cd8a9fef7 amd: lower gl_FragCoord.w rcp in NIR
This allows NIR to remove the rcps if the application uses rcp(gl_FragCoord.w).
D3D provides w, not 1/w like GL/VK in the shader, so this is commonly used.

Foz-DB Navi21:
Totals from 2068 (2.61% of 79206) affected shaders:
MaxWaves: 45636 -> 45652 (+0.04%)
Instrs: 2173444 -> 2169671 (-0.17%); split: -0.18%, +0.00%
CodeSize: 11881304 -> 11867208 (-0.12%); split: -0.12%, +0.01%
VGPRs: 118000 -> 117968 (-0.03%)
Latency: 35689676 -> 35675909 (-0.04%); split: -0.06%, +0.02%
InvThroughput: 9167199 -> 9159801 (-0.08%); split: -0.08%, +0.00%
VClause: 45076 -> 45078 (+0.00%); split: -0.01%, +0.02%
SClause: 92503 -> 92366 (-0.15%); split: -0.31%, +0.17%
Copies: 140282 -> 140303 (+0.01%); split: -0.13%, +0.14%
Branches: 53347 -> 53346 (-0.00%); split: -0.01%, +0.00%
PreVGPRs: 96495 -> 96465 (-0.03%)
VALU: 1522980 -> 1519252 (-0.24%); split: -0.25%, +0.01%
SALU: 213451 -> 213460 (+0.00%); split: -0.02%, +0.02%

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31967>
2024-11-06 12:57:08 +00:00
Samuel Pitoiset
32a537b25b aco: use inlined constant offsets for storing SGPRs in the trap handler
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31976>
2024-11-05 11:55:24 +00:00
Samuel Pitoiset
9bcf17ef5a aco: add support for the trap handler shader on GFX11
This has been verified on navi31.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31960>
2024-11-05 07:58:38 +00:00
Samuel Pitoiset
6d5a2ae928 aco: clear the current wave exception in the trap handler
This is required to re-enable VALU instructions in this wave, only
float exception seem to be affected.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31960>
2024-11-05 07:58:38 +00:00
Samuel Pitoiset
81f4670ed6 radv,aco: dump all SGPRS from the trap handler
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31960>
2024-11-05 07:58:38 +00:00
Georg Lehmann
a58d2b59e9 aco: implement load_pixel_coord
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31864>
2024-11-04 12:34:30 +00:00
Samuel Pitoiset
1fa0fe1e0c aco: add support for the trap handler shader on GFX9-GFX10.3
This has been tested on navi21.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31926>
2024-11-04 10:48:52 +00:00
Samuel Pitoiset
281eb14df8 aco: fix reading registers from the trap handler shader
It should read 32-bit values, otherwise some MSB are 0 and it's missing
some information.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31926>
2024-11-04 10:48:52 +00:00
Samuel Pitoiset
49682fc0cb radv,aco: save SQ_WAVE_GPR_ALLOC from the trap handler
This would be used to dump SGPRs.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31934>
2024-11-01 15:40:25 +00:00
Daniel Schürmann
10958d04d5 aco: Respect addressible SGPR limit in VS prologs
On Tonga, the effective SGPR limit is 96, including VCC.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31859>
2024-10-28 11:29:06 +00:00
Daniel Schürmann
c8348139fd nir: change signature of nir_src_is_divergent()
Now, it takes nir_src * instead of nir_src.
Also move the implementation to nir_divergence_analysis.c.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>
2024-10-24 10:06:17 +00:00
Daniel Schürmann
e2705a9d85 aco: set Precolored flag before register allocation
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31387>
2024-10-22 12:29:18 +00:00
Georg Lehmann
10951bb11a aco: fix 64bit extract_i8/extract_i16
The old code only sign extended to 32bit, with a zero hi half.

Fixes: 1f2518ef9f ("aco: implement nir_op_extract/nir_op_insert")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31734>
2024-10-21 07:13:57 +00:00
Rhys Perry
33eb2d7fe4 aco: skip uniformization of certain merge phis
If a source is a VGPR, then skip if it's safe. This fixes the regressions
from the previous commit.

fossil-db (navi31):
Totals from 5118 (6.45% of 79395) affected shaders:
MaxWaves: 159560 -> 159520 (-0.03%); split: +0.01%, -0.03%
Instrs: 2165351 -> 2138456 (-1.24%); split: -1.26%, +0.02%
CodeSize: 11260340 -> 11152460 (-0.96%); split: -0.98%, +0.02%
VGPRs: 218124 -> 225144 (+3.22%); split: -0.13%, +3.35%
Latency: 11059208 -> 11116102 (+0.51%); split: -0.18%, +0.69%
InvThroughput: 1252148 -> 1230193 (-1.75%); split: -1.77%, +0.01%
VClause: 39513 -> 39518 (+0.01%); split: -0.48%, +0.49%
SClause: 59434 -> 59378 (-0.09%); split: -0.11%, +0.02%
Copies: 165997 -> 156172 (-5.92%); split: -6.68%, +0.76%
PreSGPRs: 181203 -> 181094 (-0.06%)
PreVGPRs: 139393 -> 139731 (+0.24%)
VALU: 1244301 -> 1220769 (-1.89%); split: -1.91%, +0.02%
SALU: 200240 -> 199567 (-0.34%); split: -0.34%, +0.00%

fossil-db (navi21):
Totals from 35520 (44.74% of 79395) affected shaders:
MaxWaves: 951870 -> 951830 (-0.00%)
Instrs: 20229388 -> 20227776 (-0.01%); split: -0.01%, +0.00%
CodeSize: 105379916 -> 105513740 (+0.13%); split: -0.01%, +0.13%
VGPRs: 1375232 -> 1375400 (+0.01%)
Latency: 81046435 -> 81013986 (-0.04%); split: -0.04%, +0.00%
InvThroughput: 15269166 -> 15273295 (+0.03%); split: -0.01%, +0.04%
VClause: 354314 -> 354310 (-0.00%); split: -0.00%, +0.00%
SClause: 417049 -> 417047 (-0.00%); split: -0.00%, +0.00%
Copies: 1699445 -> 1699488 (+0.00%); split: -0.01%, +0.01%
Branches: 591274 -> 591269 (-0.00%); split: -0.00%, +0.00%
PreSGPRs: 1371062 -> 1370567 (-0.04%)
PreVGPRs: 1100716 -> 1100953 (+0.02%)
VALU: 11076189 -> 11075167 (-0.01%); split: -0.01%, +0.00%
SALU: 3648002 -> 3647378 (-0.02%); split: -0.02%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30211>
2024-10-10 14:59:27 +00:00
Daniel Schürmann
30e7644e5f aco: simplify Definition constructors
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31362>
2024-10-07 07:00:19 +00:00
Georg Lehmann
07032102e9 aco: use s_pack_lh for bitfield_select(0xffff)
Foz-DB Navi31
Totals from 13 (0.02% of 79206) affected shaders:
Instrs: 44871 -> 44838 (-0.07%)
CodeSize: 223804 -> 223608 (-0.09%)
Latency: 220186 -> 220191 (+0.00%); split: -0.01%, +0.02%
InvThroughput: 54169 -> 54186 (+0.03%); split: -0.00%, +0.03%
SALU: 5048 -> 5023 (-0.50%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31509>
2024-10-05 17:55:08 +00:00
Georg Lehmann
a6f82cf16d aco: use s_pack_hl for shfr16
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31509>
2024-10-05 17:55:08 +00:00
Georg Lehmann
78b8ec9c93 aco: optimize lanecount_to_mask
s_bfe uses 7 bits for the size, so when we extract from -1,
we can get all possible lane masks in one instruction.

Foz-DB Navi31:
Totals from 38601 (48.62% of 79395) affected shaders:
Instrs: 13670163 -> 13509738 (-1.17%)
CodeSize: 68011644 -> 67368308 (-0.95%)
Latency: 61203404 -> 61065419 (-0.23%); split: -0.23%, +0.00%
InvThroughput: 6897028 -> 6894634 (-0.03%); split: -0.05%, +0.01%
SALU: 1491291 -> 1342553 (-9.97%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31184>
2024-09-26 14:29:14 +00:00
Georg Lehmann
bcfc5c09fa amd: add offset to is_subgroup_invocation_lt_amd
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31184>
2024-09-26 14:29:13 +00:00
Georg Lehmann
0e21cd9e15 aco/gfx10+: work around non uniform ds_append wave64 result
In wave64 for hw with native wave32, ds_append seems to be split in a load for
the low half and an atomic for the high half, and other LDS instructions can
be scheduled between the two.
Which means the result of the low half is unusable because it might be out of date.

I was only able to reproduce this issue in WGP mode, but be conservative and
apply the workaround in CU mode too.

Foz-DB Navi31:
Totals from 13 (0.02% of 79395) affected shaders:
Instrs: 7599 -> 7656 (+0.75%)
CodeSize: 39708 -> 39972 (+0.66%)
Latency: 83174 -> 83572 (+0.48%)
InvThroughput: 8271 -> 8357 (+1.04%)
Copies: 718 -> 717 (-0.14%)
VALU: 3689 -> 3703 (+0.38%)
SALU: 935 -> 965 (+3.21%)

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11921
Fixes: 45e935800a ("aco: implement nir_shared_append/consume_amd")

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31301>
2024-09-23 13:17:58 +00:00
Georg Lehmann
96b9f695d4 aco/isel: use upper bound for v_mul_u32_u24
The optimizer can use this.

Foz-DB Navi31:
Totals from 577 (0.73% of 79395) affected shaders:
Instrs: 4209237 -> 4206859 (-0.06%); split: -0.06%, +0.00%
CodeSize: 21511192 -> 21511984 (+0.00%); split: -0.02%, +0.02%
SpillSGPRs: 679 -> 671 (-1.18%)
Latency: 28448559 -> 28443863 (-0.02%); split: -0.04%, +0.03%
InvThroughput: 5221932 -> 5218443 (-0.07%); split: -0.09%, +0.02%
Copies: 297965 -> 298076 (+0.04%); split: -0.01%, +0.05%
VALU: 2385304 -> 2383500 (-0.08%)
SALU: 485553 -> 485533 (-0.00%); split: -0.01%, +0.00%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31245>
2024-09-19 17:08:47 +00:00
Georg Lehmann
45e935800a aco: implement nir_shared_append/consume_amd
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31075>
2024-09-19 16:21:48 +00:00
Georg Lehmann
27cf11dc8a aco: remove per block inf/nan/sz control
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31172>
2024-09-18 20:46:17 +00:00
Georg Lehmann
9850f759dd aco/isel: set per instruction float control modes
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31172>
2024-09-18 20:46:17 +00:00
Georg Lehmann
fc4b23130c aco/isel: add function to create builder for alu
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31172>
2024-09-18 20:46:17 +00:00
Samuel Pitoiset
15b1790a1e radv,aco: fix legacy vertex attributes when offset >= stride on GFX6-7
The indexing needs to be adjusted and the best solution seems to
use soffset instead of const_offset, it's simpler and generate less
prologs than passing the vertex binding strides to the prolog.

Fixes dEQP-VK.pipeline.*.vertex_input.legacy_vertex_attributes.*stride_1*.

Fixes: 38cbc3c605 ("radv: advertise VK_EXT_legacy_vertex_attributes")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31209>
2024-09-18 07:21:28 +00:00
Samuel Pitoiset
26d8f1a306 aco,radv,radeonsi: move has_epilog to the fragment shader info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31150>
2024-09-16 07:53:00 +00:00
Georg Lehmann
9bb10b58f3 aco: use v_cvt_pk_u8_f32 for f2u8
Foz-DB Navi31:
Totals from 42 (0.05% of 79395) affected shaders:
Instrs: 3253747 -> 3248867 (-0.15%); split: -0.15%, +0.00%
CodeSize: 16690136 -> 16661772 (-0.17%); split: -0.17%, +0.00%
VGPRs: 4176 -> 4128 (-1.15%)
Latency: 18485157 -> 18479752 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 3659404 -> 3658222 (-0.03%); split: -0.03%, +0.00%
Copies: 231891 -> 228145 (-1.62%)
VALU: 1785800 -> 1782054 (-0.21%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29532>
2024-09-02 19:49:20 +00:00