The old delay calculation relied on the SSA information staying around,
and wouldn't work once we start introducing phi nodes and making
"normal" values defined in multiple blocks not array regs anymore.
What's worse is that properly inserting phi nodes when splitting live
ranges would make that code even more complicated, and this was the last
place post-RA that actually needed that information.
The new version only compares the physical registers of sources and
destinations. It works by going backwards up to a maximum number of
cycles, so it might be slightly slower when the definition is closer but
should be faster when it is farther away.
To avoid complicating the new method, the old method is kept around, but
only for pre-RA scheduling and it can therefore be drastically
simplified as the array case can be dropped.
ir3_delay_calc() is split into a few variants to avoid an explosion of
boolean arguments in users, especially now that merged_regs now has to
be passed to it.
The new method is a little more complicated when it comes to handling
(rptN), because both the assigner and consumer may be (rptN). This adds
some unit tests for those cases, in addition to dropping the to-SSA code
in the test harness since it's no longer needed.
Finally, ir3_legalize has to be switched to using physical registers for
the branch condition. This was the one place where IR3_REG_SSA remained
after RA.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
In particular, make sure they have a physreg assigned. This was the last
place after RA where SSA registers were created, which won't work with
the new post-RA delay calculation that relies on the physreg.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
There were two different approaches I saw in the post-RA code for
figuring out what regiser range a relative access touched:
1. Use reg->array.offset and reg->array.size. This is wrong in case
reg->array.offset was non-zero before RA, because array.size is
the size of the whole array and array.offset has the const offset
within the array baked in.
2. Lookup the array from the array ID and use the base + range there.
This is correct, but won't work with the new RA, where an array might
not always be assigned to the same register.
This replaces both methods with a new ir3_register::array.base field,
and switches all the users I could find to it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
To simplify the pre-RA merge set code and express the result live-range
splitting in RA, we need to add support for parallel copy instructions,
and for the merge set code these parallel copies need to be in SSA form.
Parallel copies have multiple destinations by necessity, but there was
no way to express this in the existing IR. In particular there was no
support for marking a register as being a destination, and no support
for indicating which destination register out of several an SSA source
refers to. This replaces ir3_register::instr with ir3_register::def and
re-purposes ir3_register::instr. I haven't propagated this into common
helpers, like ssa(), because that would vastly increase the amount of
churn and the number of places that produce such instructions should be
limited -- only RA will create parallel copies and they will be
destroyed right after RA. In the future swz will have multiple
destinations too, but it will only be created after RA via parallel copy
lowering.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
Haven't seen these tests flake, and we don't even run dEQP-GLES2 on G52
in CI anymore. (I still do local runs, and I don't see them flake
there.)
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
We have CI, we're just a few tests away from conformance on v7, and
Midgard is just a few hundred tests behind. Given the branch point isn't
for another month, I think this is a good time to flip the switch.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
Suboptimal but fixes KHR-GLES31.core.compute_shader.pipeline-post-xfb,
which is stubbornly still broken with memory barriers implemented and
cache flush jobs inserted. More investigation needed but probably not
right now.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
This is inefficient but so far I see the DDK doing the same thing. Fixes
KHR-GLES31.core.shader_storage_buffer_object.advanced-usage-sync-vsfs
In the future we should look into cache flush jobs.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
Transform feedback, SSBO writes, and image writes in particular can
affect this and have bad interactions. Fixes
KHR-GLES31.core.shader_atomic_counters.basic-usage-vs
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
Instead just group the fields about validity into a simpler structure in
panfrost_resource. Panvk can do the same. Common code shouldn't be
thinking in terms of this 'larger' structure anyway.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
Job type is alone with bitsize in the bottom byte of the addressed
worse, so if we use an 8-bit store we avoid the RMW complexity.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
In actuality, this just shadows the crc_valid for pan_cs... the
data_valid checks are contained in the caller and just add noise.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
CRC is intended for final render targets and especially for UI, not the
kind of things you'd mipmap. Meanwhile CRC only works for a single
level, meaning at any given point, half the CRC buffer would be wasted
for a full miptree.
"Arm Mali Best Practices Guide" tells developers that the DDK only
enables CRC for non-mipmapped resources (at least the Vulkan DDK), so
let's do the same, save some memory, and simplify our code.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
Somehow XFB gets so little use we never noticed. Fixes:
KHR-GLES31.core.vertex_attrib_binding.basic-input-case9
KHR-GLES31.core.vertex_attrib_binding.basic-input-case11
KHR-GLES31.core.vertex_attrib_binding.basic-inputI-case2
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
It's pointless and confuses the hardware. Fixes (on Bifrost)
KHR-GLES31.core.draw_buffers_indexed.color_masks
Yes, this is a silly edge case. Yes, we still have to handle it
correctly.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>
Based on a misunderstanding of how the scissor test works, and in
particular breaks transform feedback and SSBO writes from vertex
shaders.
Replace it with a moral equivalent to rasterizer_discard so vertex
shaders still run.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11123>