We want it to be set to wherever the push constants ended up.
Setting it close to the setup_payload_push() call makes this easier.
We'll also be adding some extra UGPRs for the fragment shader payload
soon, and the partitioning code will just have one big UGPR partition
for payload fields, push constants, and general purpose UGPRs, so it
really won't know how to do this very well without duplicating a bunch
of information.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
The documentation is large and hard to follow due to all the optional
fields and the SIMD16 vs. SIMD32 split for barycentrics. This quick
summary helps clarify what fields exist, which are split for SIMD32
or kept together, and which pairs of registers are involved for splits.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
Constructing the render target store payload is more complex than we can
reasonably handle at the NIR level. The main reason is that samplemask
and stencil are packed 16-bit and 8-bit parameters, respectively, which
are intermixed with other values that are 32-bit. In SIMD32 mode, the
packed sub-32-bit values take up fewer registers than normal values.
Currently we also don't specialize the NIR for each FS dispatch width,
and we can't construct the message descriptor without knowing it.
So, we alter nir_intrinsic_store_render_target_intel to take each of
the expected parameters - colour, depth, stencil, samplemask,
src0_alpha, and discard predicate. We construct the payloads and
descriptors in the backend.
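
To illustrate why the dispatch width matters for the payload layout, here is
a minimal sketch, not the backend's code. regs_per_value() is a hypothetical
helper, and the register size is passed in as an assumption rather than taken
from the hardware documentation.

```c
#include <assert.h>

/* Hypothetical helper: number of whole registers needed to hold one value
 * per lane. grf_bytes (bytes per register) is an assumption supplied by the
 * caller; bits_per_lane is expected to be 8, 16, or 32. */
static unsigned
regs_per_value(unsigned simd_width, unsigned bits_per_lane, unsigned grf_bytes)
{
   assert(grf_bytes > 0);

   unsigned bytes = simd_width * (bits_per_lane / 8);

   /* Round up to a whole number of registers. */
   return (bytes + grf_bytes - 1) / grf_bytes;
}
```

With an assumed 64-byte register, regs_per_value(32, 32, 64) is 2 while
regs_per_value(32, 16, 64) and regs_per_value(32, 8, 64) are both 1, so the
packed samplemask and stencil occupy fewer registers than a 32-bit value at
SIMD32.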
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
Implement a simple pre-RA bottom-up list scheduler with the goal of decreasing
register pressure. On Xe2, this significantly reduces spilling.
SSA form allows us to estimate register demand cheaply and accurately, which
theoretically [1] gives this algorithm the two Hippocratic properties:
1. Shaders with low register pressure are unaffected.
2. Register pressure can only be decreased, never increased.
In other words: first, do no harm.
The heuristic itself is very simple: greedily choose instructions that decrease
liveness using a backwards list scheduler. This is far from optimal! But thanks
to the above properties, even a heuristic that picked random instructions would
be a win overall - by construction, we can only ever win.
In other words: this scheduler is your older brother powering off the game
console any time he's about to lose a game, maintaining a 100% win rate.
[1] In reality, neither property is strictly satisfied due to the messy details
of mapping our clean logical model onto Intel's many weird physical register
files. Nevertheless, the algorithm is well-motivated and the empirical results
on Xe2 are excellent.
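
The greedy heuristic can be sketched on a toy model. This is only an
illustration under stated assumptions, not the scheduler added here: struct
instr, liveness_delta() and schedule_block() are hypothetical names, every
instruction defines exactly one single-register SSA value, and side effects,
latency and the physical register files are ignored. It also makes no attempt
to enforce the two properties above.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INSTRS 64

/* Toy SSA instruction: instruction i defines value i and reads up to two
 * values defined earlier in the block (-1 means "no source"). */
struct instr {
   int src[2];
};

/* Change in the live-value count if instruction i is scheduled next while
 * walking bottom-up: its own value dies, not-yet-live sources become live. */
static int
liveness_delta(const struct instr *in, const bool *live, unsigned i)
{
   int delta = live[i] ? -1 : 0;

   for (unsigned s = 0; s < 2; s++) {
      int src = in[i].src[s];
      bool repeated = (s == 1 && src == in[i].src[0]);

      if (src >= 0 && !live[src] && !repeated)
         delta++;
   }

   return delta;
}

/* Fills order[] (top to bottom) and returns the peak number of values live
 * immediately before any instruction in the new order. */
static unsigned
schedule_block(const struct instr *in, unsigned n, unsigned *order)
{
   bool live[MAX_INSTRS] = { false };
   bool done[MAX_INSTRS] = { false };
   unsigned pending_uses[MAX_INSTRS] = { 0 };
   unsigned pressure = 0, peak = 0;

   assert(n <= MAX_INSTRS);

   for (unsigned i = 0; i < n; i++) {
      for (unsigned s = 0; s < 2; s++) {
         if (in[i].src[s] >= 0)
            pending_uses[in[i].src[s]]++;
      }
   }

   /* Fill the block from the last slot to the first. */
   for (unsigned slot = n; slot-- > 0;) {
      int best = -1, best_delta = 0;

      /* An instruction is ready once all of its users are scheduled.
       * Ties go to the later instruction, preserving the original order. */
      for (unsigned i = 0; i < n; i++) {
         if (done[i] || pending_uses[i] != 0)
            continue;

         int delta = liveness_delta(in, live, i);
         if (best < 0 || delta <= best_delta) {
            best = (int)i;
            best_delta = delta;
         }
      }
      assert(best >= 0);

      if (live[best]) {
         live[best] = false;
         pressure--;
      }

      for (unsigned s = 0; s < 2; s++) {
         int src = in[best].src[s];
         if (src < 0)
            continue;
         if (!live[src]) {
            live[src] = true;
            pressure++;
         }
         pending_uses[src]--;
      }

      done[best] = true;
      order[slot] = (unsigned)best;
      if (pressure > peak)
         peak = pressure;
   }

   return peak;
}
```

For a block of four loads followed by (v0 + v1), (v2 + v3) and the sum of
those two, the sketch sinks the last two loads below the first add, so at
most three values are live at once instead of four.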
SIMD16:
Totals:
Instrs: 2754194 -> 2753957 (-0.01%); split: -0.23%, +0.22%
CodeSize: 41094768 -> 41092768 (-0.00%); split: -0.23%, +0.23%
Number of spill instructions: 1724 -> 1129 (-34.51%)
Number of fill instructions: 1912 -> 1119 (-41.47%)
Totals from 168 (6.35% of 2647) affected shaders:
Instrs: 850994 -> 850757 (-0.03%); split: -0.75%, +0.73%
CodeSize: 12825680 -> 12823680 (-0.02%); split: -0.74%, +0.73%
Number of spill instructions: 1724 -> 1129 (-34.51%)
Number of fill instructions: 1912 -> 1119 (-41.47%)
SIMD32:
Totals:
Instrs: 4688858 -> 4557800 (-2.80%); split: -3.53%, +0.74%
CodeSize: 70177200 -> 68214816 (-2.80%); split: -3.53%, +0.74%
Number of spill instructions: 50316 -> 45795 (-8.99%); split: -9.56%, +0.57%
Number of fill instructions: 51526 -> 45075 (-12.52%); split: -13.23%, +0.71%
Totals from 819 (30.94% of 2647) affected shaders:
Instrs: 3810182 -> 3679124 (-3.44%); split: -4.35%, +0.91%
CodeSize: 57044000 -> 55081616 (-3.44%); split: -4.35%, +0.91%
Number of spill instructions: 49264 -> 44743 (-9.18%); split: -9.76%, +0.58%
Number of fill instructions: 50182 -> 43731 (-12.86%); split: -13.58%, +0.73%
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
logically it doesn't matter because we'll bail on a later check, but this is
still UB and therefore releases nasal demons.
I am jealous of Faith's Rust compilers. There, I said it.
==107281== Conditional jump or move depends on uninitialised value(s)
==107281== at 0x7069768: propagate_backwards (jay_opt_propagate.c:327)
==107281== by 0x7069768: jay_opt_propagate_backwards (jay_opt_propagate.c:367)
==107281== by 0x7058960: jay_compile (jay_from_nir.c:2677)
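
The general shape of this kind of bug, and one way to avoid it, can be
sketched as follows. This is not the code in jay_opt_propagate.c; struct
use_info, find_single_use() and can_propagate() are hypothetical names, and
the actual fix may differ (for example by reordering the checks).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result of a use search. */
struct use_info {
   bool single_use;
   size_t user;
};

static struct use_info
find_single_use(const int *srcs, size_t n, int value)
{
   /* Initialise every field up front, so that a caller that reads .user
    * before testing .single_use never touches an indeterminate value. */
   struct use_info info = { .single_use = false, .user = 0 };
   unsigned count = 0;

   for (size_t i = 0; i < n; i++) {
      if (srcs[i] == value) {
         info.user = i;
         count++;
      }
   }

   info.single_use = (count == 1);
   return info;
}

static bool
can_propagate(const int *srcs, size_t n, int value, size_t limit)
{
   struct use_info info = find_single_use(srcs, n, value);

   /* The first operand is evaluated even when the later check bails, which
    * is only well defined because info.user is always initialised. */
   return info.user < limit && info.single_use;
}
```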
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
Challenging to hit, but this fixes
dEQP-GLES3.functional.shaders.swizzle_math_operations.vector_multiply.mediump_ivec4_wzyx_zyxw_fragment
once the scheduling changes are applied.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
on top of scheduler changes, compile-time of shaders/blender/1017.shader_test:
Difference at 95.0% confidence
-0.00173202 +/- 0.00116931
-0.791537% +/- 0.532384%
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
We'll get three new opcodes to properly model float multiply-add.
ffma_old is temporary and will be deleted at the end of this series.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41165>
This is a less obtuse error message for why things break.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41535>
The hardware expects it to be present for every colour target.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41535>
allows deleting piles of moves & reduces pressure.
simd16 results:
Totals:
Instrs: 2759547 -> 2753358 (-0.22%); split: -0.29%, +0.06%
CodeSize: 41141280 -> 41071072 (-0.17%); split: -0.23%, +0.06%
Totals from 332 (12.54% of 2647) affected shaders:
Instrs: 648080 -> 641891 (-0.95%); split: -1.23%, +0.28%
CodeSize: 9782272 -> 9712064 (-0.72%); split: -0.97%, +0.25%
simd32 is a loss because of RA being stupid. again, this is obviously the right
thing to do so we're doing it. stats are just a hint.
Totals:
Instrs: 4683556 -> 4689193 (+0.12%); split: -0.25%, +0.37%
CodeSize: 70072256 -> 70171920 (+0.14%); split: -0.23%, +0.38%
Number of spill instructions: 50320 -> 50316 (-0.01%)
Number of fill instructions: 51530 -> 51526 (-0.01%)
Totals from 351 (13.26% of 2647) affected shaders:
Instrs: 1349954 -> 1355591 (+0.42%); split: -0.86%, +1.28%
CodeSize: 20484224 -> 20583888 (+0.49%); split: -0.80%, +1.29%
Number of spill instructions: 21762 -> 21758 (-0.02%)
Number of fill instructions: 26328 -> 26324 (-0.02%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>
this is both a correctness fix (insufficient MEM registers reserved in some
cases) and a performance fix (unnecessary allocations & zeroing in the RA when
we don't spill).
fixes dEQP-VK.dgc.ext.compute.misc.scratch_space
stats are noise but positive i guess.
Totals from 35 (1.32% of 2647) affected shaders:
Instrs: 396770 -> 396690 (-0.02%)
CodeSize: 6040832 -> 6039600 (-0.02%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>
poking around, it seems branches stall the pipelines so we don't need to do any
dataflow analysis, but we do need to fall through for correctness. just keep
going across block boundaries. this isn't optimal yet but it reduces a
pile of A@1's already.
Totals from 1389 (52.47% of 2647) affected shaders:
CodeSize: 56385376 -> 56325776 (-0.11%); split: -0.13%, +0.03%
--
this also fixes issues where the first instruction of a block is a SEND that has
an unmet register dependency, since the old code was fundamentally broken. oops.
lol. fixes
dEQP-VK.compute.pipeline.workgroup_memory_explicit_layout.zero.uint8_t_array_to_uint_array_1
among many others.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>
Lets us use more accumulators. I think this is well motivated; I saw this in a
test shader.
Totals from 242 (9.14% of 2647) affected shaders:
Instrs: 1365060 -> 1365035 (-0.00%); split: -0.00%, +0.00%
CodeSize: 20678592 -> 20680096 (+0.01%); split: -0.01%, +0.02%
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>
We can just have iris look at its own program key and change the
fragment shader output variable's location/index in the NIR. By
doing this before lowering fragment shader outputs, the rest of
the output lowering does the right thing, and the backend no longer
has to consider hacks for broken OpenGL apps.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41122>