This commit makes three changes:
1. Default all newly created registers divergent because this is the
safer default.
2. Make divergence analysis do something sane with register divergence.
It's not perfect because divergence analysis isn't able to prove
registers divergent based on stores but at least if someone uses
registers a bit they'll end up with safe defaults. This matches
what they'd get with nir_ssa_def_init().
3. Make the load_reg() helper automatically propagate divergence from
the register. Because the defaults for both nir_ssa_def_init() and
nir_decl_reg() are to mark everything divergent, this only means
that nir_load_reg() of a uniform reg is now uniform.
Putting all these together, nir_from_ssa should now be producing
load_reg intrinsics with the proper uniform information.
Fixes: 7229bffcb1 ("nir: Add intrinsics for register access")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24153>
In the case of:
halt
// succs: b9
if %618 {
block b3:// preds:
break
// succs: b6
} else {
block b4: // preds: , succs: b5
}
block b5: // preds: b4
32 %556 = iadd %617, %2 (0x1)
opt_constant_if() doesn't work because stitch_blocks() can't join blocks if the
before ends in a jump and the after isn't empty.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24235>
After the previous commit there are so few to reuse that this is no
longer worth doing and actually causes compilation to slow down.
The Blender shader compile time in issue #9326 improves as folows:
21.11 seconds -> 9.90 seconds
The CTS test dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20
improves as follows:
0.92 seconds -> 0.68 seconds
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9326
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24227>
Most of the variables in the hash table will never actually be looked up
for any given block so cloning every possible value just creates a bunch
of unrequired memcpy calls.
Here we change the code to only clone the copies array once it is
actually looked up for the first time.
The Blender shader compile time in issue #9326 improves as folows:
151.09 seconds -> 21.11 seconds
The CTS test dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20
improves as follows:
1.67 seconds -> 0.92 seconds
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24227>
If kill alias results in the hash table entry holding an empty
copies array then remove the hash entry and return the dynamic array
to the unused pool.
This helps avoid hash table size getting out of control in very large
shaders.
151.09 seconds -> 118.60 seconds
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24227>
Here we change things to simply clone the entire hash table. This
is much faster than trying to rebuild it and is needed to avoid
slow compilation of very large shaders.
The Blender shader compile time in issue #9326 improves as folows:
251.29 seconds -> 151.09 seconds
The CTS test dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20
improves as follows:
2.38 seconds -> 1.67 seconds
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24227>
There is no point doing an expensive clone of the copies if the
if-branch is empty.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24227>
Nothing produces them any more, so remove them from NIR. This massively reduces
the size of nir_src, which should improve performance all over.
nir_src size reduced from 56 bytes -> 40 bytes (pahole results on arm64, x86_64
should be similar.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24253>
Non of the know etnaviv GPUs support this feature in hardware
and the binary blob provides sizes via uniforms too.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24217>
It can't be 0 in Vulkan.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24158>
Lowers tess_coord to tess_coord_xy and math. Based on ir3's version.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24159>
This intrinsic (vec2 tess_coord) is generally useful for non-r600 backends.
Promote it.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24159>
This was looking at the wrong sources. src0 is the condition.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Fixes: 72ac3f6026 ("nir: add nir_unsigned_upper_bound and nir_addition_might_overflow")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23990>
Read-after-write hazards require special handling on AGX, since image loads are
implemented with texturing. Add intrinsics to handle these hazards.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24148>
Consider code like:
32x4 %2 = @load_interpolated_input (%1, %0 (0x0)) (base=0, component=0, dest_type=float32, io location=VARYING_SLOT_VAR0 slots=1 mediump) // Color
32x4 %3 = fabs %2
32x4 %4 = fsat %3
32x4 %5 = fsin %4
The existing logic would incorrectly tell the backend that both fabs and fsat
could be folded, and then half the shader disappears. Whoops. Fix by stopping
the folding in this case. I choose to do this check in the fsat rather than the
fabs because it's more straightforward (1 source vs N uses) but it's somewhat
arbitrary.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24116>
Consider the IR:
%0 = load_reg
%1 = fneg %0
%2 = ffloor %1
%3 = bcsel .., .., %1
Because the fneg has both foldable and non-foldable users, nir/legacy does not
fold the fneg into the load_reg. This ensures that the backend correctly emits a
dedicated fneg instruction (with the load_reg folded in) for the bcsel to use.
However, because the chasing helpers did not previously take other uses of a
modifier into account, the helpers would fuse in the fneg to the ffloor. Except
that doesn't work, because the load_reg instruction is supposed to be
eliminated. So we end up with broken chased IR:
1 = fneg r0
2 = ffloor -NULL
3 = bcsel, ..., 1
The fix is easy: only fold modifiers into ALU instructions if the modifiers can
be folded away. If we can't eliminate the modifier instruction altogether, it's
not necessarily beneficial to fold it anyway from a register pressure
perspective. So this is probably ok. With that check in place we get correct IR
1 = fneg r0
2 = ffloor 1
3 = bcsel, ..., 1
Fixes carchase/230.shader_test under softpipe.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24116>
This avoids the silly compiler versions. Some bits are slightly more
complicated, because they have to account for inverted enum values (rather than
a separate invert bit), but this is a LOT friendlier to drivers using the pass
and it makes the pass itself more readable.
The conversion functions in panfrost/panvk will go away momentarily.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24076>
Convert it to an opt-in for backends to prefer and use nir_load_texture_scale
instead of txs for nir lowerings.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Suggested-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24054>
Non-SSA functionality will become obsolete after nir_register is removed, so
there's no need to keep the tests around, and they will interfere with the
nir_register de-clawing.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
It is now unused, as all internal producers of registers have been switched over
to intrinsics and no drivers call it.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
Yet another internal use of nir_register that gets lowered back to SSA after the
pass. Easy enough to replace with intrinsic-based registers instead.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
This is a variant of nir_lower_vec_to_movs that produces register intrinsics
(store_reg with write masks) instead of masked moves.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
This isn't so bad. I still duplicated the pass because it makes a lot easier to
have them coexist, switch users over one by one, and then garbage collect the
old when we're done.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
At this point, everything is SSA. Also, NIR no longer allows different
numbers of components on the two sides of a phi so we can just assert
rather than trying to gracefully handle mismatches.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
A number of passes lower SSA partially to registers, do work that would be
invalid in SSA, and then go back into SSA with nir_lower_regs_to_ssa. As a step
towards replacing nir_register with intrinsics,
the nir_lower_{phis,ssa_defs}_to_regs passes are changed to produce intrinsics
instead of nir_registers, and their callers are updated to call
nir_lower_reg_intrinsics_to_ssa instead of nir_lower_regs_to_ssa to compensate.
Jointly authored with Faith.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
It doesn't do anything yet. We leave that to the subsequent patches so we can
keep the tree-wide refactor as simple as possible.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
in the sense of operating on register intrinsics instead of nir_registers.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
These are registerful versions of core nir_src/nir_dest which will become
SSA-only soon enough, and modifierful versions of nir_alu_src/nir_alu_dest.
The latter will let us remove modifiers from nir_alu_instr finally.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
After running the pass, all register access intrinsics are guaranteed to be
"trivial" in the sense that the program is free of hazards preventing
propagating them away without inserting any copies.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
Note the writemask handling is chosen for consistency with the rest of NIR. In
every other instance, writemask=w requires a vec4 source. This is hardcoded into
nir_validate and nir_print as what it means to have a writemask.
More importantly, consistency with how register writemasks currently work.
nir_print hides it, but r0.w = fneg ssa_1.x is actually a vec4 instruction with
source ssa_1.xxxx. As a silly example nir_dest_num_components(that) = 4 in the
old model. I realize this is quite strange coming from a scalar ISA, but it's
perfectly natural for the class of vec4 hardware for which this was designed. In
that hardware, conceptually all instructions are vec4`, so the sequence "fneg
ssa_1 and write to channel w" is implemented as "fneg a vec4 with ssa_1.x in the
last component and write that vec4 out but mask to write only the w channel".
Isn't this inefficient? It can be. To save power, Midgard has scalar ALUs in
addition to vec4 ALUs. Those details are confined to the backend VLIW scheduler;
the instruction selection is still done as vec4. This mechanism has little in
common with AMD's SALUs. Midgard has a wave size of 1, with special hacks for
derivatives.
As a result, all backends consuming register writemasks are expecting this
pattern of code. Changing the store to take a vec1 instead of a vec4 would
require changing every backend to reswizzle the sources to resurrect the vec4. I
started typing a branch to do this yesterday, but it made a mess of both Midgard
and nir-to-tgsi. Without any good reason to think it'd actually help
performance, I abandoned the idea. Getting all 15 backends converted to the
helpers is enough of a challenge without forcing 10 backends to reswizzle their
sources too.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
Previously, nir_opt_dead_cf could skip dead CF nodes because overwriting
cur after dead_cf_block is not enough to cover the whole CF list.
foreach_list_typed would select the next node, skipping the node that
previously made progress:
block 1
if (true) {}
block 2
if (true) {}
block 3
if (true) {}
Would turn into:
block 1, then, block 2
if (true) { }
block 3, then
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22064>
This avoids spilling deref instructions by wrapping shader calls inside
dummy blocks, rematerializing derefs in their use blocks and removing
the dummy blocks.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22064>
`y_vu` will be used to convert NV21 to RGB.
`yv_yu` will be used to convert YVYU and VYUY to RGB when the
subsampling formats PIPE_FORMAT_R8B8_R8G8 and PIPE_FORMAT_B8R8_G8R8
are supported.
`yx_xvxu` and `xy_vxux` will be used to convert YVYU and VYUY to RGB
when those subsampling formats are not supported.
Signed-off-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21219>