Commit graph

184801 commits

Author SHA1 Message Date
Timur Kristóf
edde762b56 ac/nir/ngg: Move emitting GS vertex param exports to if.
On GFX10-10.3 (when no attribute ring is present), only emit
the GS vertex parameter exports on the vertex export threads.
Other threads don't have anything to export.

Move this code around to make it a bit easier to follow.
Also add some comments to better explain what's what.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32640>
2024-12-28 10:31:23 -06:00
Timur Kristóf
68dbcdd935 ac/nir/ngg: Move wait attr ring workaround for GS to better place.
The call depends on the phis created by create_output_phis so
the code becomes more readable if we move it closer to that.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32640>
2024-12-28 10:31:20 -06:00
Timur Kristóf
9acc2f2435 ac/nir/ngg: Remove dead code for attribute ring stores.
These are replaced by the new helpers added in previous commits.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32640>
2024-12-28 10:31:17 -06:00
Timur Kristóf
f528de896e ac/nir/ngg: Refactor export_pos0_wait_attr_ring.
There is no need to create phis in this function anymore,
because they can be already created by create_output_phis before.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32640>
2024-12-28 10:31:14 -06:00
Timur Kristóf
badbb01c5d ac/nir/ngg: Refactor GS attribute ring stores.
Use the new helper.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32640>
2024-12-28 10:31:11 -06:00
Timur Kristóf
23c615bde2 ac/nir/ngg: Refactor VS/TES attribute ring stores.
Use the new helper.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32640>
2024-12-28 10:31:09 -06:00
Timur Kristóf
f38680aa1c ac/nir: Introduce ac_nir_store_parameters_to_attr_ring.
This function is going to be used for storing parameter outputs
to the attribute ring, instead of the current implementation.

It is going to be taken into use in the following commits.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32640>
2024-12-28 10:31:06 -06:00
Timur Kristóf
c4b45f1ec8 ac/nir: Pass ac_nir_prerast_out to ac_nir_export_position.
In a subsequent commit, ac_nir_export_position will
start using other fields from ac_nir_prerast_out.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32640>
2024-12-28 10:31:04 -06:00
Timur Kristóf
3d291a98c4 ac/nir: Pass ac_nir_prerast_out to ac_nir_export_parameters.
In a subsequent commit, ac_nir_export_parameters will
start using other fields from ac_nir_prerast_out.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32640>
2024-12-28 10:31:01 -06:00
Timur Kristóf
896237b52e ac/nir/ngg: Simplify updating mesh shader output info.
All 64-bit outputs are already lowered to 32-bit.
There is no need to handle them here.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32640>
2024-12-28 10:30:58 -06:00
Timur Kristóf
f460e3a36b ac/nir/ngg: Use ac_nir_prerast_out in mesh shader lowering.
This will help us share more code between the mesh shader lowering
and other passes.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32640>
2024-12-28 10:30:53 -06:00
David Rosca
a642ff15a6 frontends/va: Fix deinterlace filter
Deinterlace filter uses interlaced buffer for output which needs
to be converted to progressive. Add back code that handles this.

Fixes: c324364f39 ("frontends/va: Only use interlaced surfaces when progressive is not supported")
Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32768>
2024-12-28 12:02:42 +00:00
Lionel Landwerlin
5e4aeb3ad7 anv: fix index buffer size changes
With vkCmdBindIndexBuffer2KHR, it is possible for only the provided size
to change, and that case currently fails to reprogram the index buffer.
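
The kind of check this fix implies can be sketched as follows. This is a minimal illustration, not anv's code: the struct and function names are hypothetical, and the only point is that the size takes part in the "did the binding change" comparison alongside the address and index type.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the currently bound index buffer. */
struct index_binding {
    uint64_t address;
    uint64_t size;
    uint32_t index_type;
};

/* Returns true when the hardware state has to be reprogrammed.
 * Comparing only address and index_type would miss a rebind that
 * changes nothing but the size. */
static bool index_binding_changed(const struct index_binding *old_b,
                                  const struct index_binding *new_b)
{
    return old_b->address != new_b->address ||
           old_b->index_type != new_b->index_type ||
           old_b->size != new_b->size;
}
```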

Signed-off-by: Lionel Landwerlin <llandwerlin@gmail.com>
Fixes: 5c2aca456e ("anv: implement vkCmdBindIndexBuffer2KHR")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12376
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32785>
2024-12-27 13:20:49 +00:00
David Rosca
96cb12ac68 radv/amdgpu: Set VCN version for ac_parse_ib
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32760>
2024-12-27 08:17:16 +00:00
David Rosca
e3d602de98 ac/parse_ib: Parse VCN IB_COMMON_OP_WRITEMEMORY
And more small fixes.

Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32760>
2024-12-27 08:17:16 +00:00
Qiang Yu
b0c47871ec ac: remove ac_nir_lower_subdword_loads
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32781>
2024-12-27 01:58:38 +00:00
Qiang Yu
403cdacaff radeonsi: replace ac_nir_lower_subdword_loads
ac_nir_lower_mem_access_bit_sizes() already does the same work.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32781>
2024-12-27 01:58:38 +00:00
Qiang Yu
955ae53efd radeonsi: fix OpenCL piglit tests fails when using ACO
There are now no regressions compared with using LLVM.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32781>
2024-12-27 01:58:38 +00:00
Qiang Yu
21f888a3ed ac,radv: move ac_nir_lower_bit_size_callback to common place
To be used by radeonsi for OpenCL.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32781>
2024-12-27 01:58:38 +00:00
Qiang Yu
5f601361ed ac/nir: lower access for shared and scratch memory
OpenCL may load and store vec16 data, while ACO only
supports accesses of up to 32 bytes. Radeonsi is going to use
ac_nir_lower_mem_access_bit_sizes() to lower these
accesses.
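
As a rough sketch of what such a lowering has to achieve, the helper below splits one wide access into pieces of at most 32 bytes. It is not the implementation of ac_nir_lower_mem_access_bit_sizes(); the function name and the MAX_ACCESS_BYTES constant are assumptions made for illustration, with the 32-byte limit taken from the message above.

```c
#include <stddef.h>

/* Assumed per-access limit, taken from the "<=32byte" statement above. */
#define MAX_ACCESS_BYTES 32u

/* Splits an access of total_bytes into chunks of at most MAX_ACCESS_BYTES.
 * Writes the chunk sizes to chunk_sizes[] and returns how many chunks were
 * produced (never more than max_chunks). */
static size_t split_access(size_t total_bytes, size_t chunk_sizes[],
                           size_t max_chunks)
{
    size_t n = 0;
    while (total_bytes > 0 && n < max_chunks) {
        size_t chunk = total_bytes < MAX_ACCESS_BYTES ? total_bytes
                                                      : MAX_ACCESS_BYTES;
        chunk_sizes[n++] = chunk;
        total_bytes -= chunk;
    }
    return n;
}
```

For a vec16 of 32-bit components (64 bytes), this yields two 32-byte accesses.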

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32781>
2024-12-27 01:58:38 +00:00
Qiang Yu
9a8eef282b radeonsi: fix OpenCL shader compile fail
sel->stage is statically assigned MESA_SHADER_COMPUTE, so the change
to use nir->info.stage also needs to handle MESA_SHADER_KERNEL.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12372
Fixes: 9b7ea720c9 ("radeonsi: use nir->info instead of sel->info.base")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32781>
2024-12-27 01:58:38 +00:00
Marek Olšák
c0e5e8f932 amd: update addrlib
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32687>
2024-12-26 21:02:21 +00:00
Georg Lehmann
33a73203b0 aco/isel: skip and(exec) for top level demote_if/terminate_if
In nested control flow this is necessary to not demote/terminate invocations
that are part of the global but not part of the local mask.

At the top level, the masks are the same and no additional invocations
can be accidentally disabled.
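
The reasoning can be illustrated with a small bitmask model. This is a sketch under stated assumptions, not ACO's instruction selection: the function name, the parameters, and the one-bit-per-invocation representation are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a wave, one bit per invocation:
 *   live: invocations not yet demoted (the "global" mask)
 *   exec: invocations active in the current region (the "local" mask)
 *   cond: per-invocation demote condition
 * Returns the new live mask. */
static uint64_t live_after_demote(uint64_t live, uint64_t exec,
                                  uint64_t cond, bool top_level)
{
    /* In nested control flow, restrict the condition to the local mask so
     * that invocations outside the current region are not demoted. At the
     * top level exec equals live, so the extra AND can be skipped. */
    uint64_t to_demote = top_level ? cond : (cond & exec);
    return live & ~to_demote;
}
```

With live = 0xF, exec = 0x3 and cond = 0x6 in a nested region, only invocation 1 is demoted (result 0xD). Skipping the AND there would also remove invocation 2, which is not part of the local mask.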

Foz-DB Navi21:
Totals from 2095 (2.64% of 79395) affected shaders:
Instrs: 1058326 -> 1056839 (-0.14%)
CodeSize: 5632480 -> 5626616 (-0.10%)
Latency: 12082761 -> 12080520 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 2246677 -> 2246636 (-0.00%); split: -0.00%, +0.00%
Copies: 114446 -> 114433 (-0.01%)
SALU: 230585 -> 229098 (-0.64%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32755>
2024-12-26 18:34:38 +00:00
Georg Lehmann
5b4b195f1b nir: optimize unpacking 8bit values from a 64bit source
Useful for load vectorization.
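
The message does not spell out the rewrite. One plausible reading, assumed here, is that extracting byte i of a 64-bit value is the same as extracting byte (i & 3) of the 32-bit half that contains it. The C sketch below only demonstrates that identity; the function names are hypothetical and this is not the NIR pattern itself.

```c
#include <stdint.h>

/* Reference: byte i (0..7) taken directly from the 64-bit value. */
static uint8_t byte_from_u64(uint64_t v, unsigned i)
{
    return (uint8_t)(v >> (8 * i));
}

/* Same result, computed by first selecting the low or high 32-bit half
 * and then extracting a byte from that half. */
static uint8_t byte_via_halves(uint64_t v, unsigned i)
{
    uint32_t half = (i < 4) ? (uint32_t)v : (uint32_t)(v >> 32);
    return (uint8_t)(half >> (8 * (i & 3)));
}
```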

Foz-DB Navi21:
Totals from 299 (0.38% of 79395) affected shaders:
Instrs: 287818 -> 284333 (-1.21%); split: -1.21%, +0.00%
CodeSize: 1557124 -> 1540544 (-1.06%); split: -1.07%, +0.00%
Latency: 4009407 -> 4012389 (+0.07%); split: -0.05%, +0.12%
InvThroughput: 1260613 -> 1262530 (+0.15%); split: -0.01%, +0.17%
VClause: 5472 -> 5369 (-1.88%); split: -1.92%, +0.04%
SClause: 5419 -> 5305 (-2.10%); split: -2.58%, +0.48%
Copies: 36709 -> 36060 (-1.77%); split: -1.81%, +0.04%
PreSGPRs: 11861 -> 11655 (-1.74%)
SALU: 66920 -> 64310 (-3.90%)

Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32778>
2024-12-26 17:50:32 +00:00
Marek Olšák
47cdec24ee radeonsi: remove unused code
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32780>
2024-12-26 10:12:43 +00:00
Marek Olšák
357ee7f699 radeonsi: switch si_get_blitter_vs to IO intrinsics
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32780>
2024-12-26 10:12:43 +00:00
Marek Olšák
a0579f75fb radeonsi: fix a TCS regression
This change caused the regression:
@@ -853,7 +853,7 @@ bool si_llvm_compile_shader(struct si_screen *sscreen, struct ac_llvm_compiler *

       /* Reset the shader context. */
       ctx.shader = shader;
-      ctx.stage = sel->stage;
+      ctx.stage = nir->info.stage;

       bool same_thread_count = shader->key.ge.opt.same_patch_vertices;
       si_build_wrapper_function(&ctx, parts, same_thread_count);

because "nir" contains the previous shader (LS), not the current shader (HS).
Fix it by using prev_nir for the previous shader, so that we can keep using
"nir".

Fixes: 9b7ea720c9 ("radeonsi: use nir->info instead of sel->info.base")

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32780>
2024-12-26 10:12:43 +00:00
Marek Olšák
227a894775 radeonsi/ci: update failures
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32780>
2024-12-26 10:12:43 +00:00
Marek Olšák
19c00c586e ac/llvm: remove unused code
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32780>
2024-12-26 10:12:43 +00:00
Marek Olšák
c6fd69bd5e ac: remove unused code
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32780>
2024-12-26 10:12:43 +00:00
Evan
4e89690878 amd/vpelib: Shaper Refactor
- Refactor Shaper code to apply linear OR PQ based on input transfer function
- Program gamma based on shaper expected input CS
- fix fp16 input handling
- fix snake case in update_whitepoint

Reviewed-by: Jesse Agate <Jesse.Agate@amd.com>
Reviewed-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com>
Signed-off-by: Evan Damphousse <evan.damphousse@amd.com>
Acked-by: Chih-Wei Chien <Chih-Wei.Chien@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32695>
2024-12-26 01:23:59 +00:00
Hsieh, Mike
596d9ff8cf amd/vpelib: Refactor 3D LUT parameters
Reviewed-by: Jesse Agate <Jesse.Agate@amd.com>
Reviewed-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com>
Acked-by: Chih-Wei Chien <Chih-Wei.Chien@amd.com>
Signed-off-by: Mike Hsieh <Mike.Hsieh@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32695>
2024-12-26 01:23:59 +00:00
Chen, Phoebe
7d326ab082 amd/vpelib: Refactor YUV format check
Use the general vpe_is_yuv* helper functions for the condition check.

Reviewed-by: Evan Damphousse <evan.damphousse@amd.com>
Reviewed-by: Roy Chan <Roy.Chan@amd.com>
Reviewed-by: Navid Assadian <Navid.Assadian@amd.com>
Acked-by: Chih-Wei Chien <Chih-Wei.Chien@amd.com>
Signed-off-by: Phoebe Chen <phoebe.chen@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32695>
2024-12-26 01:23:59 +00:00
Ian Romanick
0f3a350087 brw/nir: Don't generate scalar byte to float conversions on DG2+ in optimize_extract_to_float
The lowering code does not generate efficient code. It is better to
just not emit the bad thing in the first place. The shaders that I
examined had blocks of NIR like:

    con 32     %527 = extract_u8 %456.o, %5 (0x0)
    con 32     %528 = extract_u8 %456.o, %35 (0x1)
    con 32     %529 = extract_u8 %456.o, %14 (0x2)
    con 32     %530 = extract_u8 %456.o, %11 (0x3)
    con 32     %531 = u2f32 %527
    con 32     %532 = u2f32 %528
    con 32     %533 = u2f32 %529
    con 32     %534 = u2f32 %530

In some cases the u2f results are multiplied with 1/255. There may be
a slightly more efficient way to do this by doing something like

    mov(8)    g40<1>UW        g12.1<32,8,4>UB
    mov(8)    g41<1>UW        g12.2<32,8,4>UB
    mov(8)    g42<1>UW        g12.3<32,8,4>UB
    mov(8)    g60<1>F         g12<32,8,4>UB
    mov(8)    g61<1>F         g40<1,1,0>UW
    mov(8)    g62<1>F         g41<1,1,0>UW
    mov(8)    g63<1>F         g42<1,1,0>UW

In SIMD16 and SIMD32 that would save temporary register space. It could
save a register in SIMD8 by using g40.8 instead of g42. Making that
happen might be tricky. Maybe we should just add a special NIR opcode
that converts a packed uint32 to a vec4?
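
For reference, the operation that the extract_u8 / u2f32 block above performs, and that a dedicated "packed uint32 to vec4" opcode would express in one step, can be sketched in C as follows. The function name is hypothetical and no such NIR opcode is implied to exist.

```c
#include <stdint.h>

/* Unpacks four unsigned bytes from a packed 32-bit value and converts each
 * to float, matching the extract_u8 + u2f32 pairs shown above. Byte 0 is
 * the least significant byte. */
static void unpack_u8x4_to_float(uint32_t packed, float out[4])
{
    for (unsigned i = 0; i < 4; i++)
        out[i] = (float)((packed >> (8 * i)) & 0xffu);
}
```

Shaders that multiply the results by 1/255 are then normalizing each component to the [0, 1] range.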

v2: Add a bunch of documentation explaining what's going on. Suggested
by Ken.

shader-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 18228689 -> 18228720 (<.01%)
instructions in affected programs: 43091 -> 43122 (0.07%)
helped: 0 / HURT: 30

total cycles in shared programs: 932542994 -> 932544290 (<.01%)
cycles in affected programs: 8150758 -> 8152054 (0.02%)
helped: 15 / HURT: 17

fossil-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 142890605 -> 142890392 (-0.00%); split: -0.00%, +0.00%
Cycle count: 21655049536 -> 21654693720 (-0.00%); split: -0.00%, +0.00%

Totals from 181 (0.03% of 553251) affected shaders:
Instrs: 188022 -> 187809 (-0.11%); split: -0.12%, +0.01%
Cycle count: 85291658 -> 84935842 (-0.42%); split: -0.47%, +0.05%

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 154438050 -> 154436980 (-0.00%)
Cycle count: 15334650326 -> 15334644375 (-0.00%); split: -0.00%, +0.00%
Spill count: 56754 -> 56706 (-0.08%)
Fill count: 95919 -> 95808 (-0.12%)
Scratch Memory Size: 2306048 -> 2304000 (-0.09%)
Max live registers: 32469924 -> 32469899 (-0.00%)

Totals from 112 (0.02% of 642922) affected shaders:
Instrs: 156186 -> 155116 (-0.69%)
Cycle count: 11111478 -> 11105527 (-0.05%); split: -0.62%, +0.56%
Spill count: 1766 -> 1718 (-2.72%)
Fill count: 2815 -> 2704 (-3.94%)
Scratch Memory Size: 78848 -> 76800 (-2.60%)
Max live registers: 11526 -> 11501 (-0.22%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick
1a7593ed36 brw/nir: Treat some ballot as convergent
v2: Fix for Xe2.

v3: Add a comment explaining the use of bld instead of xbld. Suggested
by Ken. Fix a bug in handling an is_scalar source. Noticed by me while
applying Ken's review feedback.

shader-db:

Lunar Lake, Meteor Lake, DG2, and Tiger Lake had similar results. (Lunar Lake shown)
total instructions in shared programs: 18228657 -> 18228689 (<.01%)
instructions in affected programs: 9333 -> 9365 (0.34%)
helped: 2 / HURT: 26

total cycles in shared programs: 932511560 -> 932542994 (<.01%)
cycles in affected programs: 2263040 -> 2294474 (1.39%)
helped: 7 / HURT: 27

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20700370 -> 20700392 (<.01%)
instructions in affected programs: 18579 -> 18601 (0.12%)
helped: 1 / HURT: 28

total cycles in shared programs: 888385851 -> 888386325 (<.01%)
cycles in affected programs: 2571368 -> 2571842 (0.02%)
helped: 14 / HURT: 6

total spills in shared programs: 4373 -> 4371 (-0.05%)
spills in affected programs: 71 -> 69 (-2.82%)
helped: 1 / HURT: 0

total fills in shared programs: 4657 -> 4653 (-0.09%)
fills in affected programs: 196 -> 192 (-2.04%)
helped: 1 / HURT: 0

fossil-db:

Lunar Lake
Totals:
Instrs: 142887258 -> 142890605 (+0.00%); split: -0.00%, +0.00%
Cycle count: 21653599282 -> 21655049536 (+0.01%); split: -0.00%, +0.01%
Max live registers: 47942973 -> 47942837 (-0.00%)

Totals from 22209 (4.01% of 553251) affected shaders:
Instrs: 4337679 -> 4341026 (+0.08%); split: -0.00%, +0.08%
Cycle count: 261852040 -> 263302294 (+0.55%); split: -0.38%, +0.93%
Max live registers: 1299670 -> 1299534 (-0.01%)

Meteor Lake, DG2, Tiger Lake, and Skylake had similar results. (Meteor Lake shown)
Totals:
Instrs: 156599915 -> 156590882 (-0.01%); split: -0.01%, +0.00%
Cycle count: 16940072009 -> 16940902317 (+0.00%); split: -0.01%, +0.01%
Max live registers: 32610801 -> 32610488 (-0.00%)
Max dispatch width: 5730736 -> 5731744 (+0.02%); split: +0.12%, -0.11%

Totals from 35528 (5.52% of 643617) affected shaders:
Instrs: 6175409 -> 6166376 (-0.15%); split: -0.21%, +0.06%
Cycle count: 230679923 -> 231510231 (+0.36%); split: -0.46%, +0.82%
Max live registers: 1354716 -> 1354403 (-0.02%)
Max dispatch width: 167648 -> 168656 (+0.60%); split: +4.26%, -3.66%

Ice Lake
Totals:
Instrs: 155330276 -> 155318037 (-0.01%); split: -0.01%, +0.00%
Cycle count: 15019092327 -> 15019637026 (+0.00%); split: -0.00%, +0.01%
Max live registers: 32640341 -> 32637305 (-0.01%)
Max dispatch width: 5780720 -> 5780688 (-0.00%); split: +0.02%, -0.02%

Totals from 37773 (5.85% of 645641) affected shaders:
Instrs: 6643030 -> 6630791 (-0.18%); split: -0.24%, +0.05%
Cycle count: 223589025 -> 224133724 (+0.24%); split: -0.29%, +0.53%
Max live registers: 1491781 -> 1488745 (-0.20%)
Max dispatch width: 167600 -> 167568 (-0.02%); split: +0.75%, -0.77%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick
f2d2014636 brw/nir: Simplify get_nir_image_intrinsic_image and get_nir_buffer_intrinsic_index
shader-db:

All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 20041625 -> 20041634 (<.01%)
instructions in affected programs: 1206 -> 1215 (0.75%)
helped: 0 / HURT: 5

total cycles in shared programs: 929993812 -> 929993816 (<.01%)
cycles in affected programs: 10930 -> 10934 (0.04%)
helped: 1 / HURT: 2

fossil-db:

Lunar Lake
Totals:
Instrs: 142892951 -> 142893049 (+0.00%)
Send messages: 6591165 -> 6591186 (+0.00%)
Cycle count: 21653727624 -> 21653732470 (+0.00%); split: -0.00%, +0.00%
Scratch Memory Size: 5664768 -> 5660672 (-0.07%)
Max live registers: 47944999 -> 47944983 (-0.00%)

Totals from 19 (0.00% of 553292) affected shaders:
Instrs: 10671 -> 10769 (+0.92%)
Send messages: 697 -> 718 (+3.01%)
Cycle count: 234508 -> 239354 (+2.07%); split: -0.01%, +2.08%
Scratch Memory Size: 38912 -> 34816 (-10.53%)
Max live registers: 2203 -> 2187 (-0.73%)

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 156744203 -> 156743428 (-0.00%); split: -0.00%, +0.00%
Send messages: 7654787 -> 7654808 (+0.00%)
Cycle count: 16942341318 -> 16942329195 (-0.00%); split: -0.00%, +0.00%
Spill count: 75549 -> 75499 (-0.07%)
Fill count: 140094 -> 140012 (-0.06%)
Scratch Memory Size: 3945472 -> 3944448 (-0.03%)
Max live registers: 32642020 -> 32642009 (-0.00%)

Totals from 19 (0.00% of 644000) affected shaders:
Instrs: 12489 -> 11714 (-6.21%); split: -7.00%, +0.79%
Send messages: 697 -> 718 (+3.01%)
Cycle count: 203873 -> 191750 (-5.95%); split: -6.77%, +0.82%
Spill count: 50 -> 0 (-inf%)
Fill count: 82 -> 0 (-inf%)
Scratch Memory Size: 25600 -> 24576 (-4.00%)
Max live registers: 1150 -> 1139 (-0.96%)

No fossil-db changes on Tiger Lake, Ice Lake, or Skylake.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick
9a967c5ec4 brw/nir: Don't try optimize around emit_uniformize
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick
63e395fa87 brw/nir: Eliminate nir_to_brw_state::uniform_values
No shader-db changes on any Intel platform. No fossil-db changes on
Tiger Lake, Ice Lake, or Skylake.

fossil-db:

Lunar Lake
Totals:
Cycle count: 21653230858 -> 21653230518 (-0.00%); split: -0.00%, +0.00%
Max live registers: 47941741 -> 47941737 (-0.00%)

Totals from 17 (0.00% of 553202) affected shaders:
Cycle count: 201232 -> 200892 (-0.17%); split: -0.19%, +0.02%
Max live registers: 1354 -> 1350 (-0.30%)

Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 156455123 -> 156453396 (-0.00%); split: -0.00%, +0.00%
Cycle count: 16904545026 -> 16904393943 (-0.00%); split: -0.00%, +0.00%
Max live registers: 32638039 -> 32638035 (-0.00%)

Totals from 1201 (0.19% of 643905) affected shaders:
Instrs: 509360 -> 507633 (-0.34%); split: -0.34%, +0.00%
Cycle count: 1579931758 -> 1579780675 (-0.01%); split: -0.01%, +0.00%
Max live registers: 59633 -> 59629 (-0.01%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick
a13244e57b brw/nir: Treat some resource_intel as convergent
No shader-db changes on any Intel platform. No fossil-db changes on Ice
Lake or Skylake.

fossil-db:

Lunar Lake
Totals:
Cycle count: 21653232202 -> 21653230858 (-0.00%); split: -0.00%, +0.00%

Totals from 4 (0.00% of 553202) affected shaders:
Cycle count: 14276568 -> 14275224 (-0.01%); split: -0.01%, +0.00%

Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 156453398 -> 156455123 (+0.00%); split: -0.00%, +0.00%
Cycle count: 16904394153 -> 16904545026 (+0.00%); split: -0.00%, +0.00%

Totals from 1189 (0.18% of 643905) affected shaders:
Instrs: 502891 -> 504616 (+0.34%); split: -0.00%, +0.34%
Cycle count: 1579688485 -> 1579839358 (+0.01%); split: -0.00%, +0.01%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick
1b24612c57 brw/nir: Treat load_*_uniform_block_intel as convergent
Between 5 and 10 shaders (depending on the platform) from Blender are
massively helped for spills and fills (e.g., from 45 spills to 0, and
180 fills to 0).

Previously this commit caused a lot of spill and fill damage to
Wolfenstein Youngblood and Red Dead Redemption 2. I believe due to
!32041 and !32097, this is no longer the case. RDR2 is helped, and
Wolfenstein Youngblood has no changes.

However, q2rtx/q2rtx-rt-pipeline is hurt:

    Spill count: 126 -> 175 (+38.89%); split: -0.79%, +39.68%
    Fill count: 156 -> 235 (+50.64%); split: -1.92%, +52.56%

By the end of this series this damage is fixed, and q2rtx is helped
overall by -0.79% spills and -1.92% fills.

v2: Fix for Xe2.

v3: Just keep using bld for the group(1, 0) call. Suggested by Ken.

v4: Major re-write. Pass bld and xbld to fs_emit_memory_access. The big
fix is changing the way srcs[MEMORY_LOGICAL_ADDRESS] is calculated
(around line 7180). In previous versions of the commit, the address
would be calculated using bld (which is now xbld) even if the address
source was not is_scalar. This could cause the emit_uniformize (later in
the function) to fetch garbage. This also drops the special case
handling of constant offset. Constant propagation and algebraic will
handle this.

v5: Fix a subtle bug that was ultimately caused by the removal of
offset_to_component. The MEMORY_LOGICAL_ADDRESS for
load_shared_uniform_block_intel was being calculated as SIMD16 on LNL,
but the later emit_uniformize would treat it as SIMD32. This caused GPU
hangs in Assassin's Creed Valhalla.

v6: Fix a bug in D16 to D16U32 expansion. Noticed by Ken. Add a comment
explaining bld vs xbld vs ubld in fs_nir_emit_memory_access. Suggested
by Ken.

v7: Revert some of the v6 changes related to D16 to D16U32
expansion. This code was mostly correct. xbld is correct because DATA0
needs to be generated in the size of the eventual SEND instruction. Using
offset(nir_src, xbld, c) will cause offset() to correctly add
component(..., 0) if nir_src.is_scalar but xbld is not scalar_group().

v8: nir_intrinsic_load_shared_uniform_block_intel was removed. This
caused reproducible hangs in Assassin's Creed: Valhalla. There are some
other compiler issues related to this game, and we're not yet sure
exactly what the cause of any of it is.

shader-db:

Lunar Lake
total instructions in shared programs: 18058270 -> 18068886 (0.06%)
instructions in affected programs: 5196846 -> 5207462 (0.20%)
helped: 4442 / HURT: 11416

total cycles in shared programs: 921324492 -> 919819398 (-0.16%)
cycles in affected programs: 733274162 -> 731769068 (-0.21%)
helped: 11312 / HURT: 31788

total spills in shared programs: 3633 -> 3585 (-1.32%)
spills in affected programs: 48 -> 0
helped: 5 / HURT: 0

total fills in shared programs: 2277 -> 2198 (-3.47%)
fills in affected programs: 79 -> 0
helped: 5 / HURT: 0

LOST:   123
GAINED: 377

Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
total instructions in shared programs: 19703458 -> 19699173 (-0.02%)
instructions in affected programs: 5885251 -> 5880966 (-0.07%)
helped: 4545 / HURT: 14971

total cycles in shared programs: 903497253 -> 902054570 (-0.16%)
cycles in affected programs: 691762248 -> 690319565 (-0.21%)
helped: 16412 / HURT: 28080

total spills in shared programs: 4894 -> 4646 (-5.07%)
spills in affected programs: 248 -> 0
helped: 7 / HURT: 0

total fills in shared programs: 6638 -> 5581 (-15.92%)
fills in affected programs: 1057 -> 0
helped: 7 / HURT: 0

LOST:   427
GAINED: 978

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20384200 -> 20384889 (<.01%)
instructions in affected programs: 5295084 -> 5295773 (0.01%)
helped: 5309 / HURT: 12564

total cycles in shared programs: 873002832 -> 872515246 (-0.06%)
cycles in affected programs: 463413458 -> 462925872 (-0.11%)
helped: 16079 / HURT: 13339

total spills in shared programs: 4552 -> 4373 (-3.93%)
spills in affected programs: 546 -> 367 (-32.78%)
helped: 11 / HURT: 0

total fills in shared programs: 5298 -> 4657 (-12.10%)
fills in affected programs: 1798 -> 1157 (-35.65%)
helped: 10 / HURT: 0

LOST:   380
GAINED: 925

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 141528822 -> 141728392 (+0.14%); split: -0.21%, +0.35%
Subgroup size: 10968048 -> 10968144 (+0.00%)
Send messages: 6567930 -> 6567909 (-0.00%)
Cycle count: 22165780202 -> 21624534624 (-2.44%); split: -3.09%, +0.65%
Spill count: 69890 -> 66665 (-4.61%); split: -5.06%, +0.44%
Fill count: 128331 -> 120189 (-6.34%); split: -7.44%, +1.09%
Scratch Memory Size: 5829632 -> 5664768 (-2.83%); split: -2.86%, +0.04%
Max live registers: 47928290 -> 47611371 (-0.66%); split: -0.71%, +0.05%

Totals from 364369 (66.18% of 550563) affected shaders:
Instrs: 113448842 -> 113648412 (+0.18%); split: -0.26%, +0.44%
Subgroup size: 7694080 -> 7694176 (+0.00%)
Send messages: 5308287 -> 5308266 (-0.00%)
Cycle count: 21885237842 -> 21343992264 (-2.47%); split: -3.13%, +0.65%
Spill count: 65152 -> 61927 (-4.95%); split: -5.42%, +0.47%
Fill count: 122811 -> 114669 (-6.63%); split: -7.77%, +1.14%
Scratch Memory Size: 5438464 -> 5273600 (-3.03%); split: -3.07%, +0.04%
Max live registers: 34355310 -> 34038391 (-0.92%); split: -1.00%, +0.07%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick
db2b1e4d76 brw/nir: Treat load_btd_{global,local}_arg_addr_intel and load_btd_shader_type_intel as convergent
No shader-db changes on any Intel platform. No fossil-db changes on
Tiger Lake, Ice Lake, or Skylake.

fossil-db:

Lunar Lake
Totals:
Instrs: 141808714 -> 141808513 (-0.00%); split: -0.00%, +0.00%
Cycle count: 22177889310 -> 22181410192 (+0.02%); split: -0.00%, +0.02%
Spill count: 69892 -> 69890 (-0.00%); split: -0.01%, +0.01%
Fill count: 128313 -> 128331 (+0.01%)
Max live registers: 48052083 -> 48052742 (+0.00%); split: -0.00%, +0.00%

Totals from 549 (0.10% of 551446) affected shaders:
Instrs: 911251 -> 911050 (-0.02%); split: -0.10%, +0.07%
Cycle count: 1244153266 -> 1247674148 (+0.28%); split: -0.04%, +0.32%
Spill count: 15849 -> 15847 (-0.01%); split: -0.04%, +0.03%
Fill count: 35087 -> 35105 (+0.05%)
Max live registers: 68047 -> 68706 (+0.97%); split: -0.25%, +1.22%

Meteor Lake
Totals:
Instrs: 152744298 -> 152741241 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17410258529 -> 17405949054 (-0.02%); split: -0.04%, +0.01%
Spill count: 78528 -> 78598 (+0.09%); split: -0.01%, +0.09%
Fill count: 147893 -> 147978 (+0.06%); split: -0.00%, +0.06%
Scratch Memory Size: 3962880 -> 3969024 (+0.16%)
Max live registers: 31887206 -> 31887413 (+0.00%); split: -0.00%, +0.00%

Totals from 552 (0.09% of 633315) affected shaders:
Instrs: 907279 -> 904222 (-0.34%); split: -0.48%, +0.15%
Cycle count: 1152358569 -> 1148049094 (-0.37%); split: -0.56%, +0.19%
Spill count: 15290 -> 15360 (+0.46%); split: -0.03%, +0.48%
Fill count: 35313 -> 35398 (+0.24%); split: -0.02%, +0.26%
Scratch Memory Size: 1313792 -> 1319936 (+0.47%)
Max live registers: 34218 -> 34425 (+0.60%); split: -0.47%, +1.08%

DG2
Totals:
Instrs: 152766492 -> 152763061 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17406058608 -> 17406396943 (+0.00%); split: -0.02%, +0.02%
Spill count: 78626 -> 78624 (-0.00%); split: -0.01%, +0.01%
Fill count: 147956 -> 148007 (+0.03%); split: -0.01%, +0.04%
Scratch Memory Size: 3962880 -> 3969024 (+0.16%)
Max live registers: 31887158 -> 31887365 (+0.00%); split: -0.00%, +0.00%

Totals from 552 (0.09% of 633315) affected shaders:
Instrs: 908513 -> 905082 (-0.38%); split: -0.47%, +0.09%
Cycle count: 1148162185 -> 1148500520 (+0.03%); split: -0.23%, +0.26%
Spill count: 15364 -> 15362 (-0.01%); split: -0.07%, +0.06%
Fill count: 35343 -> 35394 (+0.14%); split: -0.03%, +0.17%
Scratch Memory Size: 1313792 -> 1319936 (+0.47%)
Max live registers: 34218 -> 34425 (+0.60%); split: -0.47%, +1.08%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick
f3593df877 brw/nir: Treat load_reloc_const_intel as convergent
shader-db:

Lunar Lake, Meteor Lake, DG2, and Tiger Lake had similar results. (Lunar Lake shown)
Lunar Lake
total instructions in shared programs: 18096549 -> 18096537 (<.01%)
instructions in affected programs: 26128 -> 26116 (-0.05%)
helped: 7 / HURT: 2

total cycles in shared programs: 922073090 -> 922093922 (<.01%)
cycles in affected programs: 10574198 -> 10595030 (0.20%)
helped: 19 / HURT: 76

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20503943 -> 20504053 (<.01%)
instructions in affected programs: 23378 -> 23488 (0.47%)
helped: 6 / HURT: 5

total cycles in shared programs: 875477036 -> 875480112 (<.01%)
cycles in affected programs: 13840528 -> 13843604 (0.02%)
helped: 22 / HURT: 55

total spills in shared programs: 4546 -> 4552 (0.13%)
spills in affected programs: 8 -> 14 (75.00%)
helped: 0 / HURT: 1

total fills in shared programs: 5280 -> 5298 (0.34%)
fills in affected programs: 24 -> 42 (75.00%)
helped: 0 / HURT: 1

One compute shader in Tomb Raider was hurt for spills and fills.

fossil-db:

Lunar Lake
Totals:
Instrs: 141808815 -> 141808714 (-0.00%); split: -0.00%, +0.00%
Cycle count: 22185066952 -> 22177889310 (-0.03%); split: -0.05%, +0.02%
Spill count: 69859 -> 69892 (+0.05%); split: -0.03%, +0.07%
Fill count: 128344 -> 128313 (-0.02%); split: -0.04%, +0.01%
Scratch Memory Size: 5833728 -> 5829632 (-0.07%)

Totals from 13384 (2.43% of 551446) affected shaders:
Instrs: 13852162 -> 13852061 (-0.00%); split: -0.00%, +0.00%
Cycle count: 7691993336 -> 7684815694 (-0.09%); split: -0.15%, +0.06%
Spill count: 53266 -> 53299 (+0.06%); split: -0.03%, +0.10%
Fill count: 96492 -> 96461 (-0.03%); split: -0.05%, +0.02%
Scratch Memory Size: 3827712 -> 3823616 (-0.11%)

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 152744735 -> 152744298 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17400199290 -> 17410258529 (+0.06%); split: -0.01%, +0.07%
Max live registers: 31887208 -> 31887206 (-0.00%)

Totals from 12435 (1.96% of 633315) affected shaders:
Instrs: 13445310 -> 13444873 (-0.00%); split: -0.00%, +0.00%
Cycle count: 6941685096 -> 6951744335 (+0.14%); split: -0.03%, +0.18%
Max live registers: 1071302 -> 1071300 (-0.00%)

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
Totals:
Instrs: 150644063 -> 150643944 (-0.00%); split: -0.00%, +0.00%
Cycle count: 15618718733 -> 15622092285 (+0.02%); split: -0.01%, +0.03%
Spill count: 58816 -> 58790 (-0.04%)
Fill count: 101054 -> 101065 (+0.01%)
Max live registers: 31792771 -> 31792766 (-0.00%); split: -0.00%, +0.00%

Totals from 13383 (2.12% of 632544) affected shaders:
Instrs: 12016285 -> 12016166 (-0.00%); split: -0.00%, +0.00%
Cycle count: 5239956851 -> 5243330403 (+0.06%); split: -0.02%, +0.08%
Spill count: 28977 -> 28951 (-0.09%)
Fill count: 47568 -> 47579 (+0.02%)
Max live registers: 1001554 -> 1001549 (-0.00%); split: -0.00%, +0.00%

Skylake
Totals:
Instrs: 140943195 -> 140943154 (-0.00%); split: -0.00%, +0.00%
Cycle count: 14818940190 -> 14816706154 (-0.02%); split: -0.02%, +0.00%
Max live registers: 31663173 -> 31663168 (-0.00%); split: -0.00%, +0.00%

Totals from 12625 (2.01% of 629351) affected shaders:
Instrs: 11598223 -> 11598182 (-0.00%); split: -0.00%, +0.00%
Cycle count: 4519027823 -> 4516793787 (-0.05%); split: -0.05%, +0.00%
Max live registers: 970275 -> 970270 (-0.00%); split: -0.00%, +0.00%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick
fb9b363376 brw/nir: Treat load_inline_data_intel as convergent
No shader-db changes on any Intel platform.

fossil-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 141808595 -> 141808815 (+0.00%); split: -0.00%, +0.00%
Cycle count: 22181300418 -> 22185066952 (+0.02%); split: -0.01%, +0.03%
Max live registers: 48052077 -> 48052083 (+0.00%)

Totals from 720 (0.13% of 551446) affected shaders:
Instrs: 116778 -> 116998 (+0.19%); split: -0.01%, +0.20%
Cycle count: 1197931082 -> 1201697616 (+0.31%); split: -0.21%, +0.53%
Max live registers: 56552 -> 56558 (+0.01%)

No fossil-db changes on any other Intel platform.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick
3e63920ca5 brw/nir: Treat some load_ubo as convergent
v2: Fix for Xe2.

No changes in shader-db or fossil-db on Lunar Lake, Meteor Lake, or DG2.

shader-db:

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
total instructions in shared programs: 19626547 -> 19634353 (0.04%)
instructions in affected programs: 1591181 -> 1598987 (0.49%)
helped: 925 / HURT: 3595

total cycles in shared programs: 865236718 -> 866682659 (0.17%)
cycles in affected programs: 151284264 -> 152730205 (0.96%)
helped: 3430 / HURT: 5510

total sends in shared programs: 1032237 -> 1032233 (<.01%)
sends in affected programs: 20 -> 16 (-20.00%)
helped: 4 / HURT: 0

LOST:   48
GAINED: 141

fossil-db:

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 150662952 -> 150641175 (-0.01%); split: -0.03%, +0.02%
Subgroup size: 7768880 -> 7768888 (+0.00%)
Send messages: 7502265 -> 7502044 (-0.00%)
Cycle count: 15621785298 -> 15618640525 (-0.02%); split: -0.06%, +0.04%
Spill count: 58818 -> 58816 (-0.00%)
Fill count: 101063 -> 101054 (-0.01%)
Max live registers: 31795403 -> 31792179 (-0.01%); split: -0.01%, +0.00%
Max dispatch width: 5572160 -> 5571488 (-0.01%); split: +0.00%, -0.01%

Totals from 10278 (1.62% of 632539) affected shaders:
Instrs: 5276493 -> 5254716 (-0.41%); split: -0.89%, +0.48%
Subgroup size: 156432 -> 156440 (+0.01%)
Send messages: 279259 -> 279038 (-0.08%)
Cycle count: 6483576378 -> 6480431605 (-0.05%); split: -0.16%, +0.11%
Spill count: 27133 -> 27131 (-0.01%)
Fill count: 49384 -> 49375 (-0.02%)
Max live registers: 675781 -> 672557 (-0.48%); split: -0.49%, +0.01%
Max dispatch width: 97256 -> 96584 (-0.69%); split: +0.08%, -0.77%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick
c48570d2b2 brw/nir: Treat some ALU results as convergent
v2: Fix for Xe2.

v3: Fix handling of 64-bit CMP results.

v4: Scalarize 16-bit comparison temporary destination when used as a
source (as was already done for 64-bit). Suggested by Ken.

shader-db:

Lunar Lake
total instructions in shared programs: 18096500 -> 18096549 (<.01%)
instructions in affected programs: 15919 -> 15968 (0.31%)
helped: 8 / HURT: 21

total cycles in shared programs: 921841300 -> 922073090 (0.03%)
cycles in affected programs: 115946336 -> 116178126 (0.20%)
helped: 386 / HURT: 135

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19836053 -> 19836016 (<.01%)
instructions in affected programs: 19547 -> 19510 (-0.19%)
helped: 21 / HURT: 18

total cycles in shared programs: 906713777 -> 906588541 (-0.01%)
cycles in affected programs: 96914584 -> 96789348 (-0.13%)
helped: 335 / HURT: 134

total fills in shared programs: 6712 -> 6710 (-0.03%)
fills in affected programs: 52 -> 50 (-3.85%)
helped: 1 / HURT: 0

LOST:   1
GAINED: 1

Tiger Lake
total instructions in shared programs: 19641284 -> 19641278 (<.01%)
instructions in affected programs: 12358 -> 12352 (-0.05%)
helped: 10 / HURT: 19

total cycles in shared programs: 865413131 -> 865460513 (<.01%)
cycles in affected programs: 74641489 -> 74688871 (0.06%)
helped: 388 / HURT: 100

total spills in shared programs: 3899 -> 3898 (-0.03%)
spills in affected programs: 17 -> 16 (-5.88%)
helped: 1 / HURT: 0

total fills in shared programs: 3249 -> 3245 (-0.12%)
fills in affected programs: 51 -> 47 (-7.84%)
helped: 1 / HURT: 0

LOST:   1
GAINED: 1

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20495826 -> 20496111 (<.01%)
instructions in affected programs: 53220 -> 53505 (0.54%)
helped: 28 / HURT: 16

total cycles in shared programs: 875173550 -> 875243910 (<.01%)
cycles in affected programs: 51700652 -> 51771012 (0.14%)
helped: 400 / HURT: 39

total spills in shared programs: 4546 -> 4546 (0.00%)
spills in affected programs: 288 -> 288 (0.00%)
helped: 1 / HURT: 2

total fills in shared programs: 5224 -> 5280 (1.07%)
fills in affected programs: 795 -> 851 (7.04%)
helped: 0 / HURT: 4

LOST:   1
GAINED: 1

fossil-db:

Lunar Lake
Totals:
Instrs: 141811551 -> 141807640 (-0.00%); split: -0.00%, +0.00%
Cycle count: 22183128332 -> 22181285594 (-0.01%); split: -0.06%, +0.05%
Spill count: 69890 -> 69859 (-0.04%); split: -0.09%, +0.04%
Fill count: 128877 -> 128344 (-0.41%); split: -0.42%, +0.00%
Max live registers: 48053415 -> 48051613 (-0.00%); split: -0.00%, +0.00%

Totals from 6817 (1.24% of 551443) affected shaders:
Instrs: 4300169 -> 4296258 (-0.09%); split: -0.14%, +0.05%
Cycle count: 17263755610 -> 17261912872 (-0.01%); split: -0.08%, +0.07%
Spill count: 41822 -> 41791 (-0.07%); split: -0.15%, +0.07%
Fill count: 75523 -> 74990 (-0.71%); split: -0.71%, +0.01%
Max live registers: 733647 -> 731845 (-0.25%); split: -0.29%, +0.04%

Meteor Lake and all older Intel platforms had similar results. (Meteor Lake shown)
Totals:
Instrs: 152735305 -> 152735801 (+0.00%); split: -0.00%, +0.00%
Subgroup size: 7733536 -> 7733616 (+0.00%)
Cycle count: 17398725539 -> 17400873100 (+0.01%); split: -0.00%, +0.02%
Max live registers: 31887018 -> 31885742 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5561696 -> 5561712 (+0.00%)

Totals from 5672 (0.90% of 633314) affected shaders:
Instrs: 2817606 -> 2818102 (+0.02%); split: -0.05%, +0.07%
Subgroup size: 81128 -> 81208 (+0.10%)
Cycle count: 10021470543 -> 10023618104 (+0.02%); split: -0.01%, +0.03%
Max live registers: 306520 -> 305244 (-0.42%); split: -0.43%, +0.01%
Max dispatch width: 74136 -> 74152 (+0.02%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick
7eab2cb67e brw/nir: Treat load_workgroup_id as convergent
v2: Fix for Xe2.

shader-db:

Lunar Lake, Meteor Lake, DG2, and Tiger Lake had similar results. (Lunar Lake shown)
total instructions in shared programs: 18096526 -> 18096500 (<.01%)
instructions in affected programs: 6759 -> 6733 (-0.38%)
helped: 9 / HURT: 3

total cycles in shared programs: 921727804 -> 921841300 (0.01%)
cycles in affected programs: 110049730 -> 110163226 (0.10%)
helped: 90 / HURT: 372

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20496591 -> 20496402 (<.01%)
instructions in affected programs: 48757 -> 48568 (-0.39%)
helped: 25 / HURT: 8

total cycles in shared programs: 875253948 -> 875237902 (<.01%)
cycles in affected programs: 56760140 -> 56744094 (-0.03%)
helped: 363 / HURT: 34

total spills in shared programs: 4555 -> 4546 (-0.20%)
spills in affected programs: 174 -> 165 (-5.17%)
helped: 2 / HURT: 0

total fills in shared programs: 5243 -> 5224 (-0.36%)
fills in affected programs: 382 -> 363 (-4.97%)
helped: 2 / HURT: 0

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 141811577 -> 141811551 (-0.00%); split: -0.00%, +0.00%
Cycle count: 22173792370 -> 22183128332 (+0.04%); split: -0.00%, +0.04%
Max live registers: 48053498 -> 48053415 (-0.00%)

Totals from 3911 (0.71% of 551443) affected shaders:
Instrs: 2164804 -> 2164778 (-0.00%); split: -0.00%, +0.00%
Cycle count: 2404062476 -> 2413398438 (+0.39%); split: -0.02%, +0.41%
Max live registers: 413583 -> 413500 (-0.02%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick
6fab1b77c2 brw/nir: Treat some load_uniform as convergent
No shader-db changes on any Intel platform.

v2: Fix for Xe2.

v3: Rework the way that we determine that an intrinsic can actually be
convergent. This will now depend on whether or not the important
sources have previously been determined to be convergent. Fixes
intermittent failures in some test cases (including
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.push_constant_float_16_to_32.scalar_frag).

v4: s/the it/it/ in a comment. Noticed by Ken.
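
The v3 rule (an intrinsic is treated as convergent only when its important
sources were already determined to be convergent) can be illustrated with a
small sketch. This is not the brw code: the Instr type, the op sets, and
mark_convergent are hypothetical names chosen for illustration, and the real
pass operates on NIR rather than on a flat list.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of an instruction; not a Mesa data structure.
@dataclass
class Instr:
    name: str                                  # name of the value this instruction defines
    op: str                                    # opcode, e.g. "load_const" or "load_uniform"
    srcs: list = field(default_factory=list)   # names of the sources that matter

# Assumption: these ops yield the same value in every invocation.
ALWAYS_CONVERGENT = {"load_const"}
# Assumption: these ops are convergent only if all listed sources are.
SOURCE_DEPENDENT = {"load_uniform"}

def mark_convergent(instrs):
    """Walk instructions in program order and return the convergent value names."""
    convergent = set()
    for ins in instrs:
        if ins.op in ALWAYS_CONVERGENT:
            convergent.add(ins.name)
        elif ins.op in SOURCE_DEPENDENT and all(s in convergent for s in ins.srcs):
            # Sources were visited earlier, so their status is already known.
            convergent.add(ins.name)
    return convergent

program = [
    Instr("c0", "load_const"),
    Instr("tid", "load_subgroup_invocation"),   # differs per invocation
    Instr("u0", "load_uniform", ["c0"]),        # offset is convergent
    Instr("u1", "load_uniform", ["tid"]),       # offset is divergent
]
result = mark_convergent(program)
```

In this sketch u0 ends up convergent and u1 does not, because the decision is
made per instruction from the state of its sources rather than from the
opcode alone.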

fossil-db:

No fossil-db changes on Lunar Lake.

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 152743449 -> 152743161 (-0.00%)
Cycle count: 17399179660 -> 17399193488 (+0.00%)

Totals from 144 (0.02% of 633314) affected shaders:
Instrs: 5936 -> 5648 (-4.85%)
Cycle count: 51616 -> 65444 (+26.79%)

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 150646195 -> 150645907 (-0.00%)
Cycle count: 15618427818 -> 15618428942 (+0.00%)

Totals from 144 (0.02% of 632567) affected shaders:
Instrs: 6218 -> 5930 (-4.63%)
Cycle count: 39968 -> 41092 (+2.81%)
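
The percentage on each fossil-db line is the relative change between the
before and after totals. A minimal sketch that re-derives it from a report
line follows; LINE_RE and relative_change are hypothetical helpers, not part
of the fossil-db tooling.

```python
import re

# Matches "<name>: <before> -> <after>" at the start of a report line.
LINE_RE = re.compile(r"^(?P<name>[^:]+):\s*(?P<before>\d+)\s*->\s*(?P<after>\d+)")

def relative_change(line):
    """Return (statistic name, percent change) for one report line."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"not a statistics line: {line!r}")
    before, after = int(m["before"]), int(m["after"])
    return m["name"], 100.0 * (after - before) / before

name, pct = relative_change("Cycle count: 51616 -> 65444 (+26.79%)")
print(f"{name}: {pct:+.2f}%")   # prints "Cycle count: +26.79%"
```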

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick
341e5117ec brw/nir: Treat load_const as convergent
Since opt_combine_constants goes to great effort to pack 8 constants into a
single register, this can't have much effect.
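
As a toy illustration of the "8 constants per register" arithmetic, the
sketch below assigns distinct constants to (register, slot) pairs. It assumes
32-bit constants and a 32-byte register. REG_BYTES, CONST_BYTES, and
pack_constants are made-up names, and this is not how opt_combine_constants
is implemented.

```python
REG_BYTES = 32                        # assumption: 256-bit register
CONST_BYTES = 4                       # assumption: 32-bit constants
SLOTS = REG_BYTES // CONST_BYTES      # 8 constants fit in one register

def pack_constants(values):
    """Give each distinct constant a (register index, slot index) pair."""
    placement = {}
    for v in values:
        if v not in placement:        # duplicates share one slot
            placement[v] = divmod(len(placement), SLOTS)
    return placement

placement = pack_constants(range(10))
```

With ten distinct constants, the first eight share register 0 and the last
two spill into register 1.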

There is a lot of fossil-db variation among platforms, but the results
are generally positive.

v2: Fix for Xe2.

shader-db:

Lunar Lake
total instructions in shared programs: 18095100 -> 18092845 (-0.01%)
instructions in affected programs: 158931 -> 156676 (-1.42%)
helped: 423 / HURT: 0

total cycles in shared programs: 921523326 -> 921522784 (<.01%)
cycles in affected programs: 7522774 -> 7522232 (<.01%)
helped: 225 / HURT: 228

LOST:   1
GAINED: 7

Meteor Lake and all older Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19820211 -> 19820303 (<.01%)
instructions in affected programs: 53087 -> 53179 (0.17%)
helped: 135 / HURT: 1

total cycles in shared programs: 906380523 -> 906383031 (<.01%)
cycles in affected programs: 1402315 -> 1404823 (0.18%)
helped: 156 / HURT: 100

LOST:   1
GAINED: 16

fossil-db:

Lunar Lake
Totals:
Instrs: 141876801 -> 141783010 (-0.07%); split: -0.07%, +0.00%
Subgroup size: 10994624 -> 10994704 (+0.00%)
Cycle count: 22173441950 -> 22172949188 (-0.00%); split: -0.01%, +0.01%
Spill count: 69850 -> 69890 (+0.06%); split: -0.00%, +0.06%
Fill count: 129285 -> 128877 (-0.32%)
Max live registers: 48047900 -> 48043650 (-0.01%); split: -0.01%, +0.00%

Totals from 29837 (5.41% of 551396) affected shaders:
Instrs: 7842512 -> 7748721 (-1.20%); split: -1.23%, +0.03%
Subgroup size: 940320 -> 940400 (+0.01%)
Cycle count: 3444846368 -> 3444353606 (-0.01%); split: -0.09%, +0.08%
Spill count: 23358 -> 23398 (+0.17%); split: -0.01%, +0.18%
Fill count: 52296 -> 51888 (-0.78%)
Max live registers: 3183481 -> 3179231 (-0.13%); split: -0.16%, +0.03%

Meteor Lake
Totals:
Instrs: 152709353 -> 152666543 (-0.03%); split: -0.03%, +0.00%
Cycle count: 17397176906 -> 17397668904 (+0.00%); split: -0.00%, +0.01%
Fill count: 147896 -> 147893 (-0.00%)
Max live registers: 31862891 -> 31861888 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5559664 -> 5561776 (+0.04%); split: +0.08%, -0.04%

Totals from 20913 (3.30% of 633046) affected shaders:
Instrs: 6676676 -> 6633866 (-0.64%); split: -0.64%, +0.00%
Cycle count: 1498330125 -> 1498822123 (+0.03%); split: -0.06%, +0.09%
Fill count: 41010 -> 41007 (-0.01%)
Max live registers: 1799295 -> 1798292 (-0.06%); split: -0.06%, +0.00%
Max dispatch width: 12880 -> 14992 (+16.40%); split: +33.29%, -16.89%

DG2 and Tiger Lake had similar results. (DG2 shown)
Totals:
Instrs: 152730878 -> 152688139 (-0.03%); split: -0.03%, +0.00%
Cycle count: 17394835605 -> 17394179808 (-0.00%); split: -0.01%, +0.00%
Max live registers: 31862843 -> 31861840 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5559664 -> 5561776 (+0.04%); split: +0.08%, -0.04%

Totals from 20912 (3.30% of 633046) affected shaders:
Instrs: 6563021 -> 6520282 (-0.65%); split: -0.65%, +0.00%
Cycle count: 1201999616 -> 1201343819 (-0.05%); split: -0.08%, +0.03%
Max live registers: 1798392 -> 1797389 (-0.06%); split: -0.06%, +0.00%
Max dispatch width: 12872 -> 14984 (+16.41%); split: +33.31%, -16.90%

Ice Lake
Totals:
Instrs: 151914872 -> 151868108 (-0.03%)
Cycle count: 15262958696 -> 15262665082 (-0.00%); split: -0.00%, +0.00%
Max live registers: 32194225 -> 32193192 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5650880 -> 5650608 (-0.00%); split: +0.02%, -0.03%

Totals from 22192 (3.48% of 637223) affected shaders:
Instrs: 6419739 -> 6372975 (-0.73%)
Cycle count: 184733818 -> 184440204 (-0.16%); split: -0.36%, +0.20%
Max live registers: 1989950 -> 1988917 (-0.05%); split: -0.05%, +0.00%
Max dispatch width: 5744 -> 5472 (-4.74%); split: +23.40%, -28.13%

Skylake
Totals:
Instrs: 141027379 -> 140811741 (-0.15%)
Cycle count: 14817704293 -> 14817418611 (-0.00%); split: -0.01%, +0.01%
Max live registers: 31628796 -> 31627791 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5535176 -> 5539880 (+0.08%); split: +0.14%, -0.06%

Totals from 22218 (3.53% of 628840) affected shaders:
Instrs: 5944856 -> 5729218 (-3.63%)
Cycle count: 182845101 -> 182559419 (-0.16%); split: -0.60%, +0.44%
Max live registers: 1974576 -> 1973571 (-0.05%); split: -0.07%, +0.02%
Max dispatch width: 16912 -> 21616 (+27.81%); split: +46.93%, -19.11%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:58 -08:00
Ian Romanick
d0f1a94e3d brw/build: Prepare BROADCAST for scalar values
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:58 -08:00
Ian Romanick
5ea9ed4798 brw/nir: Prepare try_rebuild_source for scalar values
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:58 -08:00