The meson.build_root() method has been deprecated, so let's switch to
meson.project_build_root(), which usually means the same thing. The two
differ only when Mesa is a subproject of some other project, but in that
case I believe we want the build root of Mesa, not of the parent project,
anyway.
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20907>
Allow the compiler to assume that the shader always has full subgroups,
meaning that the initial EXEC mask is -1 in all waves (all lanes enabled).
This assumption is incorrect for ray tracing and internal (meta) shaders
because they can use unaligned dispatch.
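A minimal sketch of what "full subgroups" means, assuming a linear range
of invocations packed into waves in order. The function names and the
wave size default are hypothetical and only illustrate the EXEC mask
arithmetic; this is not code from the compiler.

```python
WAVE_SIZE = 64  # assumed wave size; the same arithmetic applies to wave32


def initial_exec_masks(num_invocations, wave_size=WAVE_SIZE):
    """Return the initial EXEC mask of each wave covering num_invocations lanes."""
    masks = []
    remaining = num_invocations
    while remaining > 0:
        active = min(remaining, wave_size)
        masks.append((1 << active) - 1)  # low 'active' lanes enabled
        remaining -= active
    return masks


def has_full_subgroups(num_invocations, wave_size=WAVE_SIZE):
    """True if every wave starts with all lanes enabled (EXEC == -1)."""
    full = (1 << wave_size) - 1
    return all(m == full for m in initial_exec_masks(num_invocations, wave_size))
```

With 128 invocations both waves start with every lane enabled. With 100
invocations the second wave only has 36 active lanes, which is the kind
of partial wave that breaks the assumption.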
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20670>
This was already handled by apply_extract but was missing from
can_apply_extract, so the optimization may not have been applied
everywhere it could be.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17924>
If we pass a physical 2D texture descriptor we should also pass 2
coords. Otherwise it just uses whatever happens to be in the second
register, which sometimes gives wrong results.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20696>
Driver workarounds for game bugs can be easily broken. This one
shouldn't be applied to meta shaders and this restores previous logic.
Fixes: da32cbb5c6 ("aco: fix missing uses of MRT output flags")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20637>
It looks like the LLVM assembler promotes true 16-bit instructions to VOP3
in this case.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20251>
This lets us use less scratch if both VGPR spilling and scratch intrinsics
are used.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20534>
fossil-db (gfx1100):
Totals from 112 (0.08% of 134574) affected shaders:
Scratch: 1513472 -> 1455360 (-3.84%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20534>
p_cvt_f16_f32_rtne will be lowered to v_cvt_f16_f32 and we already know that
preserves the high bits.
I tested the others on GFX1036.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20574>
This makes these instructions free when considering pipeline statistics
and s_delay_alu insertion.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20512>
Use gpr_map to determine how many cycles each dependency of the
s_delay_alu needs. This information helps the pass avoid further
s_delay_alu instructions.
fossil-db (gfx1100):
Totals from 13097 (9.73% of 134574) affected shaders:
Instrs: 30711894 -> 30702692 (-0.03%)
CodeSize: 153462500 -> 153425692 (-0.02%)
Latency: 372758612 -> 372741922 (-0.00%)
InvThroughput: 50164111 -> 50160717 (-0.01%); split: -0.01%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20512>
Any unused split_vector definition can always use the same register
as the operand. This avoids creating unnecessary copies.
Fossil DB stats on Rembrandt (RDNA2):
Totals from 3904 (2.89% of 134906) affected shaders:
CodeSize: 18326692 -> 18271688 (-0.30%)
Instrs: 3386632 -> 3372888 (-0.41%)
Latency: 42337481 -> 42330085 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 6566731 -> 6566424 (-0.00%); split: -0.01%, +0.00%
Copies: 224301 -> 210559 (-6.13%)
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>
Eliminate unnecessary copies when the operand registers of a
p_split_vector instruction are not clobbered between the p_split_vector
and the user of its definitions.
This happens when p_split_vector doesn't kill its operand and its
definitions have a shorter lifespan than the operand. It affects every
NGG culling shader among other things.
This optimization exists because it's too difficult to solve this
in RA, and it should be removed once we have solved it in RA.
v2 by Daniel Schürmann:
- Rearrange and simplify conditions for the new optimization
- Fix a few bugs
v3 by Daniel Schürmann:
- Check number of encoded ALU operands
Fossil DB stats on Rembrandt (RDNA2):
Totals from 64896 (48.10% of 134906) affected shaders:
CodeSize: 175693348 -> 175434944 (-0.15%)
Instrs: 33333912 -> 33269388 (-0.19%)
Latency: 183766084 -> 183763432 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 28589651 -> 28589340 (-0.00%); split: -0.00%, +0.00%
Copies: 2806550 -> 2742038 (-2.30%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>
This allows is_overwritten_since to return false when the last
writer instruction of a register can't be tracked but we know it wasn't
written in the current block.
Fossil DB stats on Rembrandt (RDNA2):
Totals from 1163 (0.86% of 134906) affected shaders:
CodeSize: 9815920 -> 9805016 (-0.11%)
Instrs: 1843688 -> 1840962 (-0.15%)
Latency: 19219153 -> 19209171 (-0.05%)
InvThroughput: 3354375 -> 3353852 (-0.02%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>
This works because of SSA and should be safer than just setting 'not_written_yet'.
No Fossil DB changes on Rembrandt (RDNA2).
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16161>
If a shader doesn't export any color targets and instead only exports
mrtz, the discard early exit block should match.
Fixes artifacts on Lara in Rise of the Tomb Raider benchmark and hair in
The Witcher 3 (classic).
https://reviews.llvm.org/D128185
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: bc8da20dda ("aco: export MRT0 instead of NULL on GFX11")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20345>
Similar to emit_gfx10_wave64_bpermute, but uses the new
v_permlane64_b32 instruction to swap data between wave halves.
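The idea can be modelled as below. This is a sketch under two
assumptions: that the ordinary bpermute can only read lanes within the
caller's own 32-lane half, and that v_permlane64_b32 makes lane i read
lane i ^ 32. The Python helpers are hypothetical models, not the
instruction sequence that is actually emitted.

```python
HALF = 32
WAVE = 64


def permlane64(data):
    """Model of swapping the two wave halves: lane i receives lane i ^ 32."""
    return [data[i ^ HALF] for i in range(WAVE)]


def half_wave_bpermute(index, data):
    """Model of a bpermute that only reads lanes in the caller's own half."""
    return [data[(lane & HALF) | (index[lane] % HALF)] for lane in range(WAVE)]


def wave64_bpermute(index, data):
    """Full 64-lane bpermute built from the two primitives above."""
    same_half = half_wave_bpermute(index, data)
    other_half = half_wave_bpermute(index, permlane64(data))
    return [
        # Pick the swapped result when the source lane lives in the other half.
        same_half[lane] if (index[lane] & HALF) == (lane & HALF) else other_half[lane]
        for lane in range(WAVE)
    ]
```

Each lane selects between the result computed on the original data and
the result computed on the half-swapped data, depending on which half
its source lane is in.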
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>
Different sequences are emitted for these, so it makes sense to
have different opcodes too.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>
The sequence emitted for this pseudo instruction is not ready
to handle constants or literals at all.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>
There are a lot of optimizations in opt_algebraic that match ('ine', a,
0), but there are almost none that match i2b. Instead of adding a huge
pile of additional patterns (including variations that include both ine
and i2b), always lower i2b to a != 0.
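As an illustration of the lowering, here is a toy rewriter over nested
tuples in the same ('op', src, ...) style as the patterns quoted in this
message. It is a sketch only: lower_i2b is a hypothetical helper and not
part of the real nir_opt_algebraic machinery.

```python
def lower_i2b(expr):
    """Recursively replace ('i2b', a) with ('ine', a, 0)."""
    if not isinstance(expr, tuple):
        return expr  # a variable name or a constant
    op, *srcs = expr
    srcs = [lower_i2b(s) for s in srcs]
    if op == 'i2b':
        return ('ine', srcs[0], 0)
    return (op, *srcs)
```

After this rewrite, every existing pattern that matches ('ine', a, 0)
also covers what used to be an i2b.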
At this point in the series, it should be impossible for anything to
generate i2b, so there /should not/ be any changes.
The failing test on d3d12 is a pre-existing bug that is triggered by
this change. I talked to Jesse about it, and, after some analysis, he
suggested just adding it to the list of known failures.
v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b.
v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py.
v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after
nir_lower_doubles makes progress. The latter can generate b2i
instructions, but nir_lower_int64 can't handle them (anymore).
v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I
had accidentally removed the f2b(b2f(x)) optimization.
v6: Just eliminate the i2b instruction.
v7: Remove missed i2b32 in midgard_compile.c. Remove (now unused)
emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp. Previously this
function was still used. 🤷
No shader-db changes on any Intel platform.
All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141165875 -> 141165873 (-0.0%)
Instructions helped: 2
Cycles in all programs: 9098956382 -> 9098956350 (-0.0%)
Cycles helped: 2
The two Vulkan shaders are helped because of the "new" (('b2i32',
('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1)) algebraic pattern.
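The pattern is valid because a 1-bit unsigned extract can only produce 0
or 1, so converting "!= 0" back to an integer returns the extracted
value itself. A quick check, using simplified Python models of the three
opcodes (assumptions for illustration, not the exact NIR definitions):

```python
def ubfe(value, offset, bits):
    """Simplified unsigned bitfield extract on a 32-bit value."""
    value &= 0xFFFFFFFF
    return (value >> offset) & ((1 << bits) - 1)


def ine(a, b):
    """Integer not-equal, returning a Python bool."""
    return a != b


def b2i32(b):
    """Boolean to 32-bit integer: True -> 1, False -> 0."""
    return 1 if b else 0


def pattern_holds(value, offset):
    """Check b2i32(ine(ubfe(a, b, 1), 0)) == ubfe(a, b, 1) for one input."""
    extracted = ubfe(value, offset, 1)
    return b2i32(ine(extracted, 0)) == extracted
```

The identity relies on the width being 1. With a wider field the
extracted value can exceed 1 and the two sides differ.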
Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version]
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>