Here we change things to simply clone the entire hash table. This
is much faster than trying to rebuild it and is needed to avoid
slow compilation of very large shaders.
The Blender shader compile time in issue #9326 improves as folows:
251.29 seconds -> 151.09 seconds
The CTS test dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20
improves as follows:
2.38 seconds -> 1.67 seconds
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24227>
There is no point doing an expensive clone of the copies if the
if-branch is empty.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24227>
Nothing produces them any more, so remove them from NIR. This massively reduces
the size of nir_src, which should improve performance all over.
nir_src size reduced from 56 bytes -> 40 bytes (pahole results on arm64, x86_64
should be similar.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24253>
Non of the know etnaviv GPUs support this feature in hardware
and the binary blob provides sizes via uniforms too.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24217>
From OpenGL ES 3.0 spec, page 56:
"Binding more than one attribute name to the same location
is referred to as aliasing, and is not permitted in OpenGL
ES Shading Language 3.00 vertex shaders. LinkProgram will
fail when this condition exists. However, aliasing is
possible in OpenGL ES Shading Language 1.00 vertex shaders.
This will only work if only one of the aliased attributes
is active in the executable program, or if no path through
the shader consumes more than one attribute of a set of
attributes aliased to the same location. A link error can
occur if the linker determines that every path through the
shader consumes multiple aliased attributes, but implemen-
tations are not required to generate an error in this case."
So here we make sure to allow the optimisations before validation
for earlier ES shader versions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes: 80c001013c ("glsl: do vs attribute validation in NIR linker")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9342
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24205>
It can't be 0 in Vulkan.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24158>
Lowers tess_coord to tess_coord_xy and math. Based on ir3's version.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24159>
This intrinsic (vec2 tess_coord) is generally useful for non-r600 backends.
Promote it.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24159>
This was looking at the wrong sources. src0 is the condition.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Fixes: 72ac3f6026 ("nir: add nir_unsigned_upper_bound and nir_addition_might_overflow")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23990>
Read-after-write hazards require special handling on AGX, since image loads are
implemented with texturing. Add intrinsics to handle these hazards.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24148>
Consider code like:
32x4 %2 = @load_interpolated_input (%1, %0 (0x0)) (base=0, component=0, dest_type=float32, io location=VARYING_SLOT_VAR0 slots=1 mediump) // Color
32x4 %3 = fabs %2
32x4 %4 = fsat %3
32x4 %5 = fsin %4
The existing logic would incorrectly tell the backend that both fabs and fsat
could be folded, and then half the shader disappears. Whoops. Fix by stopping
the folding in this case. I choose to do this check in the fsat rather than the
fabs because it's more straightforward (1 source vs N uses) but it's somewhat
arbitrary.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24116>
Consider the IR:
%0 = load_reg
%1 = fneg %0
%2 = ffloor %1
%3 = bcsel .., .., %1
Because the fneg has both foldable and non-foldable users, nir/legacy does not
fold the fneg into the load_reg. This ensures that the backend correctly emits a
dedicated fneg instruction (with the load_reg folded in) for the bcsel to use.
However, because the chasing helpers did not previously take other uses of a
modifier into account, the helpers would fuse in the fneg to the ffloor. Except
that doesn't work, because the load_reg instruction is supposed to be
eliminated. So we end up with broken chased IR:
1 = fneg r0
2 = ffloor -NULL
3 = bcsel, ..., 1
The fix is easy: only fold modifiers into ALU instructions if the modifiers can
be folded away. If we can't eliminate the modifier instruction altogether, it's
not necessarily beneficial to fold it anyway from a register pressure
perspective. So this is probably ok. With that check in place we get correct IR
1 = fneg r0
2 = ffloor 1
3 = bcsel, ..., 1
Fixes carchase/230.shader_test under softpipe.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24116>
This avoids the silly compiler versions. Some bits are slightly more
complicated, because they have to account for inverted enum values (rather than
a separate invert bit), but this is a LOT friendlier to drivers using the pass
and it makes the pass itself more readable.
The conversion functions in panfrost/panvk will go away momentarily.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24076>
Convert it to an opt-in for backends to prefer and use nir_load_texture_scale
instead of txs for nir lowerings.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Suggested-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24054>
Non-SSA functionality will become obsolete after nir_register is removed, so
there's no need to keep the tests around, and they will interfere with the
nir_register de-clawing.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
It is now unused, as all internal producers of registers have been switched over
to intrinsics and no drivers call it.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
Yet another internal use of nir_register that gets lowered back to SSA after the
pass. Easy enough to replace with intrinsic-based registers instead.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
This is a variant of nir_lower_vec_to_movs that produces register intrinsics
(store_reg with write masks) instead of masked moves.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
This isn't so bad. I still duplicated the pass because it makes a lot easier to
have them coexist, switch users over one by one, and then garbage collect the
old when we're done.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
At this point, everything is SSA. Also, NIR no longer allows different
numbers of components on the two sides of a phi so we can just assert
rather than trying to gracefully handle mismatches.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
A number of passes lower SSA partially to registers, do work that would be
invalid in SSA, and then go back into SSA with nir_lower_regs_to_ssa. As a step
towards replacing nir_register with intrinsics,
the nir_lower_{phis,ssa_defs}_to_regs passes are changed to produce intrinsics
instead of nir_registers, and their callers are updated to call
nir_lower_reg_intrinsics_to_ssa instead of nir_lower_regs_to_ssa to compensate.
Jointly authored with Faith.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
It doesn't do anything yet. We leave that to the subsequent patches so we can
keep the tree-wide refactor as simple as possible.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
in the sense of operating on register intrinsics instead of nir_registers.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
These are registerful versions of core nir_src/nir_dest which will become
SSA-only soon enough, and modifierful versions of nir_alu_src/nir_alu_dest.
The latter will let us remove modifiers from nir_alu_instr finally.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
After running the pass, all register access intrinsics are guaranteed to be
"trivial" in the sense that the program is free of hazards preventing
propagating them away without inserting any copies.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
Note the writemask handling is chosen for consistency with the rest of NIR. In
every other instance, writemask=w requires a vec4 source. This is hardcoded into
nir_validate and nir_print as what it means to have a writemask.
More importantly, consistency with how register writemasks currently work.
nir_print hides it, but r0.w = fneg ssa_1.x is actually a vec4 instruction with
source ssa_1.xxxx. As a silly example nir_dest_num_components(that) = 4 in the
old model. I realize this is quite strange coming from a scalar ISA, but it's
perfectly natural for the class of vec4 hardware for which this was designed. In
that hardware, conceptually all instructions are vec4`, so the sequence "fneg
ssa_1 and write to channel w" is implemented as "fneg a vec4 with ssa_1.x in the
last component and write that vec4 out but mask to write only the w channel".
Isn't this inefficient? It can be. To save power, Midgard has scalar ALUs in
addition to vec4 ALUs. Those details are confined to the backend VLIW scheduler;
the instruction selection is still done as vec4. This mechanism has little in
common with AMD's SALUs. Midgard has a wave size of 1, with special hacks for
derivatives.
As a result, all backends consuming register writemasks are expecting this
pattern of code. Changing the store to take a vec1 instead of a vec4 would
require changing every backend to reswizzle the sources to resurrect the vec4. I
started typing a branch to do this yesterday, but it made a mess of both Midgard
and nir-to-tgsi. Without any good reason to think it'd actually help
performance, I abandoned the idea. Getting all 15 backends converted to the
helpers is enough of a challenge without forcing 10 backends to reswizzle their
sources too.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
Previously, nir_opt_dead_cf could skip dead CF nodes because overwriting
cur after dead_cf_block is not enough to cover the whole CF list.
foreach_list_typed would select the next node, skipping the node that
previously made progress:
block 1
if (true) {}
block 2
if (true) {}
block 3
if (true) {}
Would turn into:
block 1, then, block 2
if (true) { }
block 3, then
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22064>
This avoids spilling deref instructions by wrapping shader calls inside
dummy blocks, rematerializing derefs in their use blocks and removing
the dummy blocks.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22064>
`y_vu` will be used to convert NV21 to RGB.
`yv_yu` will be used to convert YVYU and VYUY to RGB when the
subsampling formats PIPE_FORMAT_R8B8_R8G8 and PIPE_FORMAT_B8R8_G8R8
are supported.
`yx_xvxu` and `xy_vxux` will be used to convert YVYU and VYUY to RGB
when those subsampling formats are not supported.
Signed-off-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21219>
OpenCL 3.0 core requires __opencl_c_subgroups to be set, the OpenCL
cl_khr_subgroups extenions can only be enabled if and only if the driver
guarentees independent forward progress between subgroups.
See CL_DEVICE_SUB_GROUP_INDEPENDENT_FORWARD_PROGRESS for more information.
Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Nora Allen <blackcatgames@protonmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22893>