This will allow us to encode properties about the load/store ops like we
do for ALU ops. We include now properties about whether we have a store,
and if there are special cases on the load/store op. We also tag each
instruction by its natural size... this is probably not totally right,
but it's a start.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The trick is realizing even with a destination override, the masks are encoded in the same mode as the
instruction itself, rather than stepping down. The override means that
the smaller type is used, but the mask is parsed as if it were the
higher type. Overriding down is down by printed by blinding doing this. Overriding up can be thought of as printing in the upper size, but shifting the alphabet to use the upper half, i.e. shifting xyzw to become abcd.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Add some comments explaining what's going on in a more natural flow in
order to solve the actual bug.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes: 2d914ebe81 ("pan/midgard: Fix memory corruption in register spilling")
It doesn't make sense. You already spilled it once, and it didn't help.
Don't try again, or you'll end up in a loop.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Essentially an off-by-one error ... bit of an edge case, but seems to
occur in some glamor shaders.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We don't really need to impose this condition, but we do need to cope
with the slightly more general case.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Now that we have live_out calculated per block as metadata, calculating
liveness of an instruction at a given point in the program becomes O(n)
to the size of the block worst-case, rather than O(n) the program.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Callers should have liveness info ready. Ideally we'd have a nice
metadata tracking framework like NIR to handle this automatically, but
for now this will allow us to make forward progress... when we're about
to do something with liveness, invalidate everything ahead to force a
clean calculation.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This will allow us to explicitly invalidate liveness analysis results so
we can cache liveness results.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
By definition, once liveness analysis has occurred:
live_out = OR {succ} succ->live_in
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
There are unfortunately two distinct liveness analysis passes in the
compiler right now -- one good (but complex) pass used by RA based on
solving data flow equations, and one awful (but simple) pass used for
dead code elimination and bundling based on an abstract walk of the AST.
Let's move RA's pass into shared code so we can work on unifying.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This allows us to fill in ctx->temp_count explicitly, even if we haven't
squished down the MIR.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We already enforce this with the SSA/register distinction in the
backend. There is no need to duplicate this logic merely for an assert.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Now that we have constant adjustment logic abstracted, we can do this
safely. Along with the csel inversion patch, this allows many more
common csel ops to inline their condition in the bundle.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
If we can reuse constant slots from other instructions, we would like to
do so to include more instructions per bundle.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
If an instruction could be scheduled to vmul to satisfy the writeout
conditions, let's do that and save an instruction+cycle per fragment
shader.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We still emit in-order but we switch to using the bundles created from
the new scheduler, which will allow greater flexibility and room for
out-of-order optimization.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We require chosen instructions to be "close", to avoid ballooning
register pressure. This is a kludge that will go away once we have
proper liveness tracking in the scheduler, but for now it prevents a lot
of needless spilling.
v2: Lower threshold to 6 (from 8). Schedule is hurt, but a few shaders
that spilled excessively are fixed.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Derp
We can bundle two load/store together. This eliminates the need for
explicit load/store pairing in a prepass, as well.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Conditions for branches don't have a swizzle explicitly in the emitted
binary, but they do implicitly get swizzled in whatever instruction
wrote r31, so we need to handle that.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Conditional instructions (csel and conditional branches) require their
condition to be written to a special condition pipeline register (r31.w
for scalar, r31.xyzw for vector). However, pipeline registers are live
only for the duration of a single bundle. As such, the logic to schedule
conditionals correct is surprisingly complex. Essentially, we see if we
could stuff the conditional within the same bundle as the csel/branch
without breaking anything; if we can, we do that. If we can't, we add a
dummy move to make room.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
A bit of a kludge but allows setting an implicit dependency of synthetic
conditional moves on the actual condition, fixing code generated like:
vmul.feq r0, ..
sadd.imov r31, .., r0
vadd.fcsel [...]
The imov runs simultaneous with feq so it gets garbage results, but it's
too late to add an actual dependency practically speaking, since the new
synthetic imov doesn't have a node associated.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
In the future, we will want to keep track of which components of
constants of various sizes correspond to which parts of the bundle
constants, like in the old scheduler. For now, let's just stub it out
for a simple rule of one instruction with embedded constants per bundle.
We can eventually do better, of course.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We don't actually do any scheduling here yet, but add per-tag helpers to
consume an instruction, print it, pop it off the worklist.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
It's not always obvious what the optimal bundle type should be. Let's
break out the logic to decide.
Currently set for purely in-order operation.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
After we've chosen an instruction, popped it off, and processed it, it's
time to update the worklist, removing that instruction from the
dependency graph to allow its dependents to be put onto the worklist.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
In the future, this routine will implement the core scheduling logic to
decide which instruction out of the worklist will be scheduled next, in
a way that minimizes cycle count and register pressure.
In the present, we are more interested in replicating in-order
scheduling with the much-more-powerful out-of-order model. So rather
than discriminating by a register pressure estimate, we simply choose
the latest possible instruction in the worklist.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We would like to flatten a linked list of midgard_instructions into an
array of midgard_instruction pointers on the heap.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>