Scalarizing preambles in NIR isn't really necessary, we can do it more
efficiently in the backend. This makes the final NIR a lot less annoying to
read; the backend IR was already nice to read thanks to all the scalarized moves
being copypropped. Plus, this is a lot simpler.
No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21122>
Move the decision of "can I copyprop this uniform?" from copyprop to a
standalone lowering pass. This is more straightforward and will enable the next
patch. This has the side effect of sinking load_preamble instructions, for a
nice reduction in register pressure. Instruction count increase is from
rematerializing some moves, which should be more than balanced out by the
reduced register pressure.
total instructions in shared programs: 1523285 -> 1523317 (<.01%)
instructions in affected programs: 1148 -> 1180 (2.79%)
helped: 0
HURT: 13
HURT stats (abs) min: 1.0 max: 4.0 x̄: 2.46 x̃: 2
HURT stats (rel) min: 0.69% max: 7.69% x̄: 3.65% x̃: 2.61%
95% mean confidence interval for instructions value: 1.78 3.14
95% mean confidence interval for instructions %-change: 2.16% 5.15%
Instructions are HURT.
total bytes in shared programs: 10444532 -> 10444724 (<.01%)
bytes in affected programs: 7386 -> 7578 (2.60%)
helped: 0
HURT: 13
HURT stats (abs) min: 6.0 max: 24.0 x̄: 14.77 x̃: 12
HURT stats (rel) min: 0.63% max: 7.14% x̄: 3.40% x̃: 2.48%
95% mean confidence interval for bytes value: 10.68 18.85
95% mean confidence interval for bytes %-change: 2.02% 4.78%
Bytes are HURT.
total halfregs in shared programs: 419444 -> 416434 (-0.72%)
halfregs in affected programs: 27080 -> 24070 (-11.12%)
helped: 634
HURT: 0
helped stats (abs) min: 1.0 max: 30.0 x̄: 4.75 x̃: 2
helped stats (rel) min: 2.90% max: 54.55% x̄: 13.13% x̃: 8.51%
95% mean confidence interval for halfregs value: -5.08 -4.41
95% mean confidence interval for halfregs %-change: -14.03% -12.23%
Halfregs are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21122>
We can avoid reading both width and height when the texture is a cube map, and
we do so more simply by relying on CSE+DCE (Alyssa).
Closes: #7541
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20628>
Match iadd(x, #y). The format shift will get constant-folded away and, if y
is sufficiently small, the constant will be inlined by the AGX backend
optimizer. This gets rid of piles of 64-bit arithmetic from lowering UBOs. It
probably doesn't matter for perf since that's happening in preamble shaders but
it *is* noisy.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21108>
The offset is in vec4s, not words (unlike the component). This doesn't matter
right now since we get everything lowered (offset -> 0) but it will come up if
we implement clip distances natively (instead of lowering in FS).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21097>
This works around bugs in a LOT of applications, since fp16 texture coordinates
are almost never appropriate even though it's a valid implementation of the GLES
spec. It also doesn't seem to matter for perf.
Code from the Bifrost compiler which implements the same workaround for slightly
different reasons.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21082>
Builds on the work of !15121. This gets to delete even more code
because many drivers shared a lot of code for i2b and f2b.
No shader-db or fossil-db changes on any Intel platform.
v2: Rebase on 1a35acd8d9.
v3: Update a comment in nir_opcodes_c.py. Suggested by Konstantin.
v4: Another rebase. Remove f2b stuff from Midgard.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20509>
This pass needs to run early (because it depends on early I/O), but it doesn't
actually need the shader key. Why not? If we overestimate the number of render
targets, extra store_output intrinsics will be generated, but they will be
deleted by AGX tilebuffer lowering later.
Note we'll probably want something smarter than this for fragment epilogues in
the future to avoid piles of unnecessary moves.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21065>
Lowering buffer textures will interact with multiple of our existing lowerings,
and it's convenient to have it all in one place. This also keeps the pass
ordering dependencies centralized.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21060>
nir_opt_preamble is now aware of the internal uniforms we insert, so it can use
the whole uniform file available to it. This lets us push more (all?) uniform
loads in Dolphin ubershaders to the preamble.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20562>
To comply with The Ekstrand Rule.
AGX has a large number of "uniform registers" available. These may be loaded
with arbitrary ranges of GPU memory by the driver, or they can be written by the
preamble shader. Currently, the compiler runs nir_opt_preamble on the first half
of the uniform file, and then translates NIR sysvals to moves from the second
half of the uniform file, passing back a uniform->sysval map for the GL driver
to respect. This has (at least) two issues:
* Since nir_opt_preamble runs before gathering sysvals, it has to assume the
maximum number of sysvals are pushed, which can prevent it from moving some
computation to the preamble due to running out of partitioned uniform registers.
This is a problem for Dolphin's ubershaders, though it's unclear how much it
matters for Dolphin perf.
* This violates The Ekstrand Rule and apparently will be a problem for our
Vulkan driver. I'm just a compiler+GL girl, so I wouldn't know.
To fix this, we invert the order of operations. At the end of this series, we
instead lower NIR system values to NIR load_preamble instructions in the GL
driver. The compiler just translates directly to uniform registers reads. The
Vulkan driver will need its own version of this code, but maybe it can do
something clever and descriptor set aware.
This means that there will already be some load_preamble instructions when
nir_opt_preamble runs, so I've made minor changes to nir_opt_preamble to handle
that gracefully. This is a bit lazy... The alternative is to introduce a
`load_uniform_agx` intrinsic which `load_preamble` gets lowered to trivially.
But that's another pass over the IR (and due to AGX's shader variant hell I'm
sensitive to backend compile time) and it would be more complicated than what's
implemented here.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Ella Stanforth <ella@iglunix.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20562>
Fixes assertion fails in piglit isinf-and-isnan, which uses a constant infinity,
which has an out-of-bounds mantissa (but the function contract says that's
fine and we just return something undefined.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20563>
Rely on the common address arithmetic optimizations. We don't need the
special formats for UBO loads anyway, so this is simpler and optimizes
out the ushr.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20558>
It is invalid to use both sample_mask and zs_emit in the same shader. We'll need
to do something similar for sample mask writes.
Fixes Dolphin ubershaders.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20446>
The exact semantics of sample_mask aren't quite clear to me yet, but executing
multiple sample_mask instructions seems to raise a fault :|
Fixes SuperTuxKart's advanced renderer.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20446>
Must be used at the end of a vertex shader that does NOT write any varyings, has
rasterizer discard enabled, and is run only for its side effects.
The encoding looks like st_var, but I don't know what this actually *does*. I
just know that the GPU faults if this is omitted.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20446>
...and therefore it needs to be after a "logical end". This means that
"after_block_logical" will do the right thing for the last block.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20446>
Check the invariant that the widths of vectors in the IR are consistent, by
checking that write registers and read registers match up between the writers
and readers respectively.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20446>
Walk the IR instead. This happens when preprocessing so it doesn't really
matter, but it complicates the nir_variable audit.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20446>
Prior to this change, agx_opt_cse is our most expensive backend pass, due to the
time spent hashing instructions. hash_instr was calling into XXH32 a massive
number of times, often to hash only a single bit. It's much faster to hash
entire blocks of memory at a time. Optimize to do just that.
With this change, agx_opt_cse is now cheaper than instruction selection as
it should be.
No shader-db changes (except CPU time decrease).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20446>
We do need to use undefs instead of zeroes in this internal collect. While this
vector gets copypropped out, it'd cause us to fail compilation if noopt is on.
Fix that.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20446>
This will be used for compute kernels (and transform feedback) in the (near)
future. For now, let's get the opcode plumbed in the backend to reduce some of
the rebase pain.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20446>
See 0afd691f29 ("panfrost: clang-format the tree") for why I'm doing this.
Asahi already mostly follows Mesa style so this doesn't do much. But this means
we can all stop thinking about formatting and trust the robot poets to do that
for us.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20434>