Fixes "multiple stores to the same location" assertions in tests like
dEQP-VK.pipeline.monolithic.color_write_enable_maxa.cwe_after_bind.attachments3_more0
In that case, the stores were actually to different locations, but some
constant additions hadn't been folded into the location field yet.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
jay is trying to use the fragment shader dispatch mask for helper
invocation lowering, but it was using load_sample_mask_in for that
(now load_coverage_mask_intel). But this isn't the MSAA coverage
mask, the two are different payload fields.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
We want it to be set to wherever the push constants ended up.
Setting it close to the setup_payload_push() call makes this easier.
We'll also be adding some extra UGPRs for the fragment shader payload
soon, and the partitioning code will just have one big UGPR partition
for payload fields, push constants, and general purpose UGPRs, so it
really won't know how to do this very well without duplicating a bunch
of information.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
The documentation is large and hard to follow due to all the optional
fields and the SIMD16 vs. SIMD32 split for barycentrics. This quick
summary helps clarify what fields exist, which are split for SIMD32
or kept together, and which pairs of registers are involved for splits.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
Constructing the render target store payload is more complex than we can
reasonably handle at the NIR level. The main reason is that samplemask
and stencil are packed 16-bit and 8-bit parameters, respectively, which
are intermixed with other values that are 32-bit. In SIMD32 mode, the
packed sub-32-bit values take up fewer registers than normal values.
Currently we also don't specialize the NIR for each FS dispatch width,
and we can't construct the message descriptor without knowing it.
So, we alter nir_intrinsic_store_render_target_intel to take each of
the expected parameters - colour, depth, stencil, samplemask,
src0_alpha, and discard predicate. We construct the payloads and
descriptors in the backend.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
Implement a simple pre-RA bottom-up list scheduler with the goal of decreasing
register pressure. On Xe2, this significantly reduces spilling.
SSA form allows us to estimate register demand cheaply and accurately, which
theoretically [1] gives this algorithm the two Hippocratic properties:
1. Shaders with low register pressure are unaffected.
2. Register pressure can only be decreased, never increased.
In other words: first, do no harm.
The heuristic itself is very simple: greedily choose instructions that decrease
liveness using a backwards list scheduler. This is far from optimal! But thanks
to the above properties, even a heuristic that picked random instructions would
be a win overall - by construction, we can only ever win.
In other words: this scheduler is your older brother powering off the game
console any time he's about to lose a game, maintaining a 100% win rate.
[1] In reality, neither property is strictly satisfied due to the messy details
of mapping our clean logical model onto Intel's many weird physical register
files. Nevertheless, the algorithm is well-motivated and the empirical results
on Xe2 are excellent.
SIMD16:
Totals:
Instrs: 2754194 -> 2753957 (-0.01%); split: -0.23%, +0.22%
CodeSize: 41094768 -> 41092768 (-0.00%); split: -0.23%, +0.23%
Number of spill instructions: 1724 -> 1129 (-34.51%)
Number of fill instructions: 1912 -> 1119 (-41.47%)
Totals from 168 (6.35% of 2647) affected shaders:
Instrs: 850994 -> 850757 (-0.03%); split: -0.75%, +0.73%
CodeSize: 12825680 -> 12823680 (-0.02%); split: -0.74%, +0.73%
Number of spill instructions: 1724 -> 1129 (-34.51%)
Number of fill instructions: 1912 -> 1119 (-41.47%)
SIMD32:
Totals:
Instrs: 4688858 -> 4557800 (-2.80%); split: -3.53%, +0.74%
CodeSize: 70177200 -> 68214816 (-2.80%); split: -3.53%, +0.74%
Number of spill instructions: 50316 -> 45795 (-8.99%); split: -9.56%, +0.57%
Number of fill instructions: 51526 -> 45075 (-12.52%); split: -13.23%, +0.71%
Totals from 819 (30.94% of 2647) affected shaders:
Instrs: 3810182 -> 3679124 (-3.44%); split: -4.35%, +0.91%
CodeSize: 57044000 -> 55081616 (-3.44%); split: -4.35%, +0.91%
Number of spill instructions: 49264 -> 44743 (-9.18%); split: -9.76%, +0.58%
Number of fill instructions: 50182 -> 43731 (-12.86%); split: -13.58%, +0.73%
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
logically it doesn't matter because we'll bail on a later check, but this is
still UB and therefore releases nasal demons.
i am jealous of Faith's Rust compilers. there, I said it.
==107281== Conditional jump or move depends on uninitialised value(s)
==107281== at 0x7069768: propagate_backwards (jay_opt_propagate.c:327)
==107281== by 0x7069768: jay_opt_propagate_backwards (jay_opt_propagate.c:367)
==107281== by 0x7058960: jay_compile (jay_from_nir.c:2677)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
Challenging to hit but fixes
dEQP-GLES3.functional.shaders.swizzle_math_operations.vector_multiply.mediump_ivec4_wzyx_zyxw_fragment
with scheduling changes.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
on top of scheduler changes, compile-time of shaders/blender/1017.shader_test:
Difference at 95.0% confidence
-0.00173202 +/- 0.00116931
-0.791537% +/- 0.532384%
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
We generalize the sample_mask_in lowering to handle this too.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
In shader-db, with `-p skl`, shaders/0ad/12.shader_test does not
compact an instruction because precompact overwrites portions of the
instruction. (Treating the three source instruction as a two source
when accessing instruction fields.)
This instruction could be compacted:
mad(8) g65<1>F g61<4,4,1>F g64<4,4,1>F -g17<4,4,1>F { align16 1Q };
But, since precompact erroneously sets bits, the instruction isn't
compacted.
Fossil testing:
* Tested with 0a3f3fd193 ("brw: drop unused color_outputs_valid
key") reverted, as fossils are currently producing inconsitent
results otherwise.
* Tested skl, icl, dg2, mtl, lnl, bmg and ptl. Only skl had a change.
SKL:
Totals:
CodeSize: 8335219296 -> 8320248992 (-0.18%)
Totals from 359508 (14.42% of 2492689) affected shaders:
CodeSize: 2838254352 -> 2823284048 (-0.53%)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41588>
In case of nir_intrinsic_load_inline_data_intel it was not using base_offset to
create the uniform, instead it was using only the special BRW_INLINE_PARAM_REG
value that later will be replaced by the inline_data fixed register.
So here using base_offset for both intrinsics, adding BRW_INLINE_PARAM_REG if
nir_intrinsic_load_inline_data_intel and then in brw_shader::assign_curb_setup
checking for inst->src[i].nr >= BRW_INLINE_PARAM_REG and adjusting brw_reg by
the remaining of the subtraction with BRW_INLINE_PARAM_REG.
Fixes: 7f19814414 ("brw/nir: handle inline_data_intel more like push_data_intel")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41607>
We'll get three new opcodes to properly model float multiply-add.
ffma_old is temporary and will be deleted at the end of this series.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41165>
Passes the ACCESS_CAN_REORDER flag from NIR on to the backend so that we
can lower the loads to a non-volatile SEND. This allows the scheduler to
freely reorder them around stores or fences.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41008>
Our scheduler is overly conservative about reordering instructions
around memory writes or fences. Fortunately, there are several simple
assumptions we can make about our IR to schedule these things a lot
more fluidly:
* Unless its an EOT, a SEND instruction's side effects will only be
observed through other SEND instructions
* The effects of workgroup barriers, memory fences, and BRW_OPCODE_SYNC,
are only used in the IR to synchronize SEND instructions
* All other scheduler dependencies related to memory access are already
expressed through the source and destination operands
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41008>
We can determine used components earlier.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41501>
SEND operands don't have regions or types, hardware don't use those
bits except for possibly an old workaround. So from the perspective
of assembler, we shouldn't need to add them. For now brw_asm grammar
requires at least a type, so normalize to UD.
This will make easier to swap the parser syntax and code later.
Assisted-by: Pi coding agent (opus-4.7)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41456>
From the perspective of assembler, regions and types for ARF null are
not relevant -- so ignore them. We still have some validation relying
on the byte-stride of the destination, so keep those for now.
In the long run, if a certain Gfx version HW requires some specific
matching, the encoder (or the parser) should take care of it.
This change will make easier to swap the parser syntax and code later.
Assisted-by: Pi coding agent (opus-4.7)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41456>
This is a less obtuse error message for why things break.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41535>
The hardware expects it to be present for every colour target.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41535>
When lowering tg4 sparse testing to a non-gather opcode, we were adding
an explicit LOD 0 parameter. But we might already have a LOD or bias.
Fixes tests like:
dEQP-VK.glsl.texture_gather.basic.2d.rgba8.base_level.sparse_level_1_amd_lod
dEQP-VK.glsl.texture_gather.basic.2d.rgba8.base_level.sparse_level_1_amd_bias
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41535>
allows deleting piles of moves & pressure.
simd16 results:
Totals:
Instrs: 2759547 -> 2753358 (-0.22%); split: -0.29%, +0.06%
CodeSize: 41141280 -> 41071072 (-0.17%); split: -0.23%, +0.06%
Totals from 332 (12.54% of 2647) affected shaders:
Instrs: 648080 -> 641891 (-0.95%); split: -1.23%, +0.28%
CodeSize: 9782272 -> 9712064 (-0.72%); split: -0.97%, +0.25%
simd32 is a loss because of RA being stupid. again, this is obviously the right
thing to do so we're doing it. stats are just a hint.
Totals:
Instrs: 4683556 -> 4689193 (+0.12%); split: -0.25%, +0.37%
CodeSize: 70072256 -> 70171920 (+0.14%); split: -0.23%, +0.38%
Number of spill instructions: 50320 -> 50316 (-0.01%)
Number of fill instructions: 51530 -> 51526 (-0.01%)
Totals from 351 (13.26% of 2647) affected shaders:
Instrs: 1349954 -> 1355591 (+0.42%); split: -0.86%, +1.28%
CodeSize: 20484224 -> 20583888 (+0.49%); split: -0.80%, +1.29%
Number of spill instructions: 21762 -> 21758 (-0.02%)
Number of fill instructions: 26328 -> 26324 (-0.02%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41510>