v1. Replace the existed bool type with new bitfield and edit register
files to take a mask instead of duplicating codes to do masking.
v2. Use fragcoord_compmask != 0 instead of fragcoord_compmask > 0 since
it represents a bitfield.
Tested with
dEQP-VK.glsl.builtin_var.simple.fragcoord_xyz/w
dEQP-GLES2.functional.shaders.builtin_variable.fragcoord_xyz/w
Closes: #2680
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4723>
Apparently this is allowed, and the CTS started doing this more often
recently which resulted in frequent hangs running the entire CTS. I
copied the code to create an empty FS from radv.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4928>
For compressed formats, we need to align the number of blocks, not the
logical number of pixels in the texture. Only compressed formats have
block width/height > 1, so we can just unconditionally multiply the
alignment by the block width/height.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4868>
robclark noted that the blob wasn't doing range reduction in the mediump
case, and I confirmed it on
dEQP-GLES3.functional.shaders.operator.angle_and_trigonometry.sin.mediump_float_fragment
vs
dEQP-GLES3.functional.shaders.operator.angle_and_trigonometry.sin.highp_float_fragment.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4893>
These come from the disasm tests, and fix our disasm of blob's
uniform/nonuniform cat6 operands. We also now include human-readable names
for all the modes we know about (though bindless gets distinguished by its
.baseN, like Connor's original disasm).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4857>
With this I also brought in a few new control flow instruction disasm
tests that I'd made back when I wrote the disasm test, but which were too
far from correct to include until now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4857>
We can remove a bunch of conditional code at key comparison time by
computing a bitmask of used key bits at ir3_shader creation time. This
also gives us a nice place to put additional key simplification to reduce
how many variants we create (like skipping rastflat if we don't read
colors in the FS, or skipping vclamp_color if we don't write colors).
It does mean walking the whole key to AND it, but the key is just 28 bytes
so far so that seems pretty fine.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
We weren't filling in the tess mode of the key, or setting has_gs on GS
shaders, resulting in assertion failures when NIR intrinsics didn't get
lowered.
We have to make a guess at prim mode for TCS, but it should be better to
have some shader-db coverage than none, and it will avoid these failures
happening when we start precompiling shaders.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
Some of the negative API tests make shaders for tess stages that don't do
all the stores they need to. Once we start precompiling (or doing
shader-db of tess), we need to at least not segfault when generating them.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
We were failing to tell the allocator about the restriction that scalar
texture instructions (allocated as scalar regs) couldn't be allocated such
that the start of the full unwritemasked vector started before r0. There
was a patch in select_reg_callback on a6xx that tried to work around that,
but you could still end up backed into a corner you shouldn't be because
we didn't tell the RA what it needed.
Fixes compiler assertion failures on a300-a400's blit_z shader, used for
Z32F gmem blits.
Looks like as a result we get tighter register allocation but more nops:
instructions in affected programs: 757945 -> 760356 (0.32%)
nops in affected programs: 317983 -> 320468 (0.78%)
non-nops in affected programs: 27525 -> 27451 (-0.27%)
mov in affected programs: 3098 -> 3023 (-2.42%)
dwords in affected programs: 109664 -> 110656 (0.90%)
last-baryf in affected programs: 112701 -> 112847 (0.13%)
full in affected programs: 4326 -> 4011 (-7.28%)
sstall in affected programs: 120550 -> 120836 (0.24%)
(ss) in affected programs: 13939 -> 13918 (-0.15%)
(sy) in affected programs: 3006 -> 2786 (-7.32%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
When the GS lowering was working on store_output intrinsics, we had to
clean up the split vars to avoid getting confused. Now that we shadow
the output vars instead, there's no confusion and we can drop this
hack.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
We mostly got away with replacing a store_output with a store_var, but
for complex types like structs, that doesn't work. Once the IO has
been lowered from vars to intrinsic, we've lost the deref chains and
can't properly shadow the outputs.
This commits moves the GS lowering up so we do it before the output
variables get lowered to store_output. This way the pass works much
like nir_lower_io_to_temporaries() and cleanly shadows the outputs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
This pass lowers per-vertex input intrinsics to load_shared_ir3. This
was open coded in the TCS and GS lowering passes before - this way we
can share it. Furthermore, we'll need to run the rest of the GS
lowering earlier (before lowering IO) so we need to split off this
part that operates on the IO intrinsics first.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
If use NIR's 1-bit bool representation , we get exactly the bool behavior
the hardware provides: CMPS produces true or false, AND/OR/XOR work as
intended without extra absnegs, and we can pass those half values directly
to other CMPS. We emit an absneg for b2b1 ("turn a memory load into a
1-bit NIR boolean"), but we would have done so for the ir3_n2b() on the
use of that value anyway. The most awkward bit is that inot(a@1) is now a
sub(1, a), but we can encode the 1 as an immediate so it's fine.
No significant changes to GL_TIME_ELAPSED on my set of traces (n=21).
instructions in affected programs: 1570638 -> 1548702 (-1.40%)
nops in affected programs: 624053 -> 611381 (-2.03%)
non-nops in affected programs: 959061 -> 949797 (-0.97%)
mov in affected programs: 5258 -> 5252 (-0.11%)
cov in affected programs: 15099 -> 15902 (5.32%)
dwords in affected programs: 469600 -> 452768 (-3.58%)
last-baryf in affected programs: 162211 -> 154726 (-4.61%)
full in affected programs: 4881 -> 4797 (-1.72%)
sstall in affected programs: 173953 -> 174545 (0.34%)
(ss) in affected programs: 10922 -> 10934 (0.11%)
(sy) in affected programs: 728 -> 745 (2.34%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4518>
Similar to OUT_REG(), this has the benefits of:
1. No more messing up pkt size
2. Detects errors of mixing up the order of dwords in the packet
3. Optimizes to more efficient code
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4813>
The existing structure dates back to when this code was part of libdrm,
and we wanted some of this not to be exposed as ABI between libdrm and
mesa. Now that this is no longer a constraint, inline things.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4813>
The new option replaces the two other _split lowering options, since
there's no need for separate options.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4738>