Commit graph

169515 commits

Author SHA1 Message Date
Asahi Lina
0a132b0640 asahi: Add a helper macro for debug/error messages
This includes the program short name in the message, which is useful
when running entire desktop sessions with a single log to figure out who
is doing what.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:04 +00:00
Asahi Lina
883ba4b161 asahi: Make BO import path failures more robust
These operations can fail for complex reasons through no fault of mesa,
so we should have proper runtime checks for them even in release builds.

Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:04 +00:00
Asahi Lina
fcf594d00b asahi: Implement valid buffer range tracking
A common pattern is to allocate a vertex/etc buffer and write to it in
subsets. Some games interleave this with draw calls using the buffer.
This causes very expensive flushing for every draw call.

Fix this by tracking which range of a buffer has been written to, and
elide syncs when the range was previously uninitialized.

Fixes Source engine game performance and probably helps a bunch of
others.

Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:04 +00:00
Asahi Lina
00064ba4e3 asahi: Fix style nits
Found with a grep abomination which is probably too broken/silly to
actually implement in CI... but hey, at least it found some.

Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:04 +00:00
Asahi Lina
a88b9c5540 asahi: Locate low VA BOs correctly
These need the shader_base added to them.

Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:04 +00:00
Asahi Lina
030b2306a4 asahi: Enable glthread
This helps a lot with FEX, since the GPU driver runs emulated (and only
64bit supports thunking).

Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:04 +00:00
Asahi Lina
4a5115c47b asahi: Make agx_alloc_staging() take a screen instead of a context
This makes it clear that it is thread-safe.

Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:04 +00:00
Asahi Lina
75e3212809 Revert "asahi: Advertise dual-source blending"
This reverts commit f4e2b22646.

This is broken until GL3 is enabled, possibly due to a core Mesa bug,
but it's a corner case not worth fixing.

Fixes Chromium.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:04 +00:00
Alyssa Rosenzweig
8a6d74d15b agx: Make signal_pix instructions explicit
Rather than implicitly packing them with the sample_mask. Again, this is just
changing where they're emitted, no functional changes yet. Bug for bug
compatibility with the old behaviour.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:04 +00:00
Alyssa Rosenzweig
bb530760a2 agx: Rename writeout to wait_pix
This is the name applegpu is currently using, to capture the semantics of a
pixel fence. I'm not sure what Apple calls this but wait_pix is closer than
writeout for sure.

This commit just does the rename. It doesn't fix the broken semantics we've had,
this is to ease review and bisection.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:04 +00:00
Alyssa Rosenzweig
2028e7b88b agx: Tease apart some sample_mask packing magic
There's a second instruction here, and a second source in the first instruction.
applegpu has known about the encodings for a while but I never updated the
packing code. We will need to stop hardcoding this for multisampling support, as
preparation tease apart the magic pieces.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:04 +00:00
Alyssa Rosenzweig
13b3da822b asahi: Clamp texture buffer sizes
Per the spec / freedreno. Fixes
arb_texture_buffer_object-texture-buffer-size-clamp

Fixes: 6b22a02f90 ("asahi,agx: Implement buffer textures with gnarly NIR")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:04 +00:00
Alyssa Rosenzweig
c4175c5fc8 asahi: Dirty track depth bias uploads
Reduces how much we upload in SuperTuxKart.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:04 +00:00
Alyssa Rosenzweig
23880daa8d asahi: Lower 1D to 2D
Khronos APIs require that we support mipmapping even for 1D textures. However,
it isn't clear if this is supported in the hardware, and how it would work even
if it is. But 1D textures are pretty useless, so we just lower 1D textures to 2D
textures instead of worrying about that.

Fixes piles of Piglits relating to 1D textures.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
098295f1a0 asahi: Implement null textures
Use the same silly workaround that Metal does, to fill in texture descriptors
when there's nothing bound in the interest of robust behaviour.

Fixes null pointer dereference in
arb_shading_language_420pack-active-sampler-conflict.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
1fb4e34020 asahi: Honour sampler count
It may not be equal to the texture count. Prevents a regression from the next
commit.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
203c9c12e2 agx: Don't overallocate registers
We need to account for the full vector lengths. Especially important once we
start restricting the reg file.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
42c5d6140b agx: Coalesce more collects
Try harder to coalesce collects, by trying to allocate collects only to regions
of the register file where we actually have a full vector worth of registers
free. If we already know that the vector will be blocked later, it's not a good
base register to pick since we'd be force to shuffle later. So, this tweak to
the collect coalescing heuristic lets us eliminate a pile of pointless copying.

shader-db results are excellent. Note that, although we use more registers,
none of the shaders tested had their thread count affected, likely because the
max HURT isn't too high and most of the scary % here is from using a few more
registers when the register pressure is already low. In the near future, that
property will become guaranteed thanks to live range splitting, too.

total instructions in shared programs: 1507337 -> 1500562 (-0.45%)
instructions in affected programs: 428137 -> 421362 (-1.58%)
helped: 2658
HURT: 167
helped stats (abs) min: 1.0 max: 34.0 x̄: 2.63 x̃: 2
helped stats (rel) min: 0.10% max: 25.00% x̄: 3.04% x̃: 2.14%
HURT stats (abs)   min: 1.0 max: 10.0 x̄: 1.24 x̃: 1
HURT stats (rel)   min: 0.20% max: 23.81% x̄: 3.90% x̃: 3.57%
95% mean confidence interval for instructions value: -2.49 -2.31
95% mean confidence interval for instructions %-change: -2.76% -2.51%
Instructions are helped.

total bytes in shared programs: 10333670 -> 10293172 (-0.39%)
bytes in affected programs: 2996682 -> 2956184 (-1.35%)
helped: 2660
HURT: 175
helped stats (abs) min: 2.0 max: 204.0 x̄: 15.70 x̃: 12
helped stats (rel) min: 0.08% max: 23.08% x̄: 2.64% x̃: 1.83%
HURT stats (abs)   min: 2.0 max: 60.0 x̄: 7.26 x̃: 6
HURT stats (rel)   min: 0.12% max: 22.39% x̄: 3.19% x̃: 2.78%
95% mean confidence interval for bytes value: -14.81 -13.76
95% mean confidence interval for bytes %-change: -2.39% -2.18%
Bytes are helped.

total halfregs in shared programs: 417284 -> 427363 (2.42%)
halfregs in affected programs: 49814 -> 59893 (20.23%)
helped: 95
HURT: 3018
helped stats (abs) min: 1.0 max: 8.0 x̄: 2.29 x̃: 2
helped stats (rel) min: 2.44% max: 28.57% x̄: 9.20% x̃: 6.06%
HURT stats (abs)   min: 1.0 max: 14.0 x̄: 3.41 x̃: 4
HURT stats (rel)   min: 2.08% max: 150.00% x̄: 36.54% x̃: 27.27%
95% mean confidence interval for halfregs value: 3.17 3.31
95% mean confidence interval for halfregs %-change: 34.05% 36.23%
Halfregs are HURT.

total threads in shared programs: 16465280 -> 16465280 (0.00%)
threads in affected programs: 0 -> 0
helped: 0
HURT: 0

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
43b221cd59 asahi: Set PIPE_CAP_LOAD_CONSTBUF
The CAP is a bit of a misnomer, what it really does is relax the alignment
requirements for UBO packing. It should work fine and save us some memory.
Noticed while debugging piglit fails.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
8e501b758a asahi/decode: Print VDM barriers
Instead of just decoding silently.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
0bbd8b502a asahi/decode: Remove agxdecode_dump_bo
Now that we have proper parsing this is more of a nuissance than not.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
e713983875 agx: Add helper for calculating occupancy
Add information about the relationship between program register usage and
program occupancy (the maximum number of threads that may execute concurrently
on a single shader core). This table is derived from studying the
maxTotalThreadsPerThreadgroup property in Metal while varying the register
usage, something I blogged about a few years back. It's probably not 100%
accurate and it hasn't been tested against hardware, but it matters "only" for
performance (not correctness) so I'm not super stressed about the details.

In the (near) future, RA will be able to make use of this information to know
exactly when it can use more registers without hurting performance. In the
present, it's just used for better shader-db statistics.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
05e614cc31 agx: Set loads_varying accurately
Instead of just always mashing to true. Should be better for depth-only passes.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
80adaa47e5 asahi: Add perf debug for shader variants
Compiling this can cause jank. This is still an issue in Quake3. There is a way
to solve it but it's rather involved and certainly not this weekend's project.
Better perf debugging on the other hand apparently is ^_^

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
3a4920e928 asahi: Add perf debug for generate_mipmap
The current implementation leaves a lot of perf on the table, so call it out on
ASAHI_MESA_DEBUG=perf to help debugging perf problems, especially if this
ever happens in a real application (i.e. not a benchmark).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
3a87d2cfbd agx: Don't destroy usub_sat with constant
Fixes KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-pad

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
8ec91ee16f agx: Don't allow uniform source to local_atomic
Fixes KHR-GLES31.core.compute_shader.atomic-case3

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
c643f42dc6 agx: Constify agx_{read,write}_registers
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
da9c8a4627 agx: Assert that we don't overflow registers
This will become particularly important when we bound to smaller register files.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
7c7b95ba2a agx: DCE even with noopt
To simplify live range splitting, RA will soon assume that DCE has run (removing
extraneous vectors). So run DCE even when otherwise disabling backend
optimizations. AGX_MESA_DEBUG=noopt is still useful for disabling instruction
combining, which is the more-likely-to-be-buggy pass anyway.

This also fixes IR not being printed with noopt.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
75b858e904 asahi: Support more renderable formats
Fixes KHR-GLES3.copy_tex_image_conversions.forbidden.*

Arguably working around a mesa/st issue but more format support is good for
compatibility and performance anyway.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Janne Grunau <j@jannau.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Yiwei Zhang
fc22380c32 venus/docs: sync to latest venus supported extensions
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22243>
2023-04-07 03:05:02 +00:00
Yiwei Zhang
bb7424b4b4 venus: add VK_EXT_rasterization_order_attachment_access support
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22243>
2023-04-07 03:05:02 +00:00
Yiwei Zhang
9c19d426cd venus: add VK_EXT_load_store_op_none support
There's no feature/properties structs associated with this extension.

Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22243>
2023-04-07 03:05:02 +00:00
Yiwei Zhang
303a2136a4 venus: sync latest protocol for layering extensions
- VK_EXT_load_store_op_none
- VK_EXT_rasterization_order_attachment_access

Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22243>
2023-04-07 03:05:02 +00:00
Sajeesh Sidharthan
ab3507691a radeonsi/vcn: optimize bitstream buffer resize logic
bitstream buffer is unmapped, resized and mapped again if new size
is greater than the current bitstream buffer size. This will be done
for each input buffer. This patch will avoid that and do resize
only once irrespective of number of input buffers. With the new logic,
total size is calculated first and call unmap, resize and map only once.

Signed-off-by: Sajeesh Sidharthan <sajeesh.sidharthan@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22308>
2023-04-07 02:31:24 +00:00
Alyssa Rosenzweig
d1b569d26f nir/print: Don't print sampler_index for txf
NIR's docs for sampler_index say

    The following operations do not require a sampler and, as such, this
    field should be ignored:
       - nir_texop_txf
       - nir_texop_txf_ms
       - nir_texop_txs
       - nir_texop_query_levels
       - nir_texop_texture_samples
       - nir_texop_samples_identical

Contrary to this documentation, we were still printing the sampler_index anyway,
even though the value is formally undefined. This was helpful for
PIPE_CAP_TEXTURE_BUFFER_SAMPLER drivers that (despite the NIR docs) respected the
sampler_index anyway. There are no longer any such drivers, so we should stop
printing sampler_index for txf to avoid confusion (and noise).

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22223>
2023-04-07 01:15:41 +00:00
Alyssa Rosenzweig
a9f9953928 docs/gallium: Note samplers are not used for txf
Now that PIPE_CAP_TEXTURE_BUFFER_SAMPLER is gone, txf does not require samplers
for any texture on any Gallium driver. NIR already requires drivers to ignore
sampler_index for non-sampler operation (mainly txf), and nowadays all Gallium
drivers ingest NIR. So, document that samplers aren't bound for txf (etc) as
part of the Gallium frontend-driver contract.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Suggested-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22223>
2023-04-07 01:15:41 +00:00
Alyssa Rosenzweig
6ba29d37c8 gallium: Remove PIPE_CAP_TEXTURE_BUFFER_SAMPLER
No more users. It was already not respected by rusticl so you couldn't set it if
you wanted OpenCL support. I regret introducing the CAP in the first place, and
no more drivers should use it.

Reverts d5d3f77e4a ("gallium: Add new cap PIPE_CAP_TEXTURE_BUFFER_SAMPLER").

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22223>
2023-04-07 01:15:41 +00:00
Alyssa Rosenzweig
e406e74aa4 panfrost: Unset TEXTURE_BUFFER_SAMPLERS
We no longer need this CAP, as we can easily synthesize our own internal sampler
for this case. Gallium doesn't need to know about this quirk of our hardware.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22223>
2023-04-07 01:15:41 +00:00
Alyssa Rosenzweig
b9cc2b2a98 pan/{mdg,bi}: Always use sampler 0 for txf
Now that we upload workaround samplers for txf, sampler 0 is guaranteed to be
valid but other samplers are not. So ignore whatever the current sampler_index
value is (it's formally undefined in NIR) and use 0, which we know is valid. We
already do this on Valhall for OpenCL, just need to generalize for Midgard and
Bifrost.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22223>
2023-04-07 01:15:41 +00:00
Alyssa Rosenzweig
e15603bdf1 panfrost: Always upload a workaround sampler
The hardware requires a valid sampler even for texelFetch (txf), even though its
contents are ignored. We'd rather not pass on this requirement to the frontends,
so we should handle it by uploading our own workaround sampler in the case when
no sampler is already present. We already do this on Valhall (for rusticl), so
we just need to port the same workaround back to Midgard/Bifrost.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22223>
2023-04-07 01:15:40 +00:00
Mike Blumenkrantz
06bfe07212 zink: don't try copying multiple results for conditional render copy
conditional render is only a single result, so multiple results need
to first be aggregated

fixes #8798

cc: mesa-stable

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22345>
2023-04-07 00:52:27 +00:00
Ian Romanick
72a9d12c96 nir/tests: Port almost all loop_analyze tests to new macro-based infastructure
The one test that remains would have an automatically generated name
that would conflict with another test. This test is also a little
special (per the comment in the test), so it's probably best to leave it
separate anyway.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3445>
2023-04-06 23:50:27 +00:00
Yevhenii Kolesnikov
9427aaeab7 nir/loop_analyze: Determine iteration counts for more kinds of loops
If loop iterator is incremented with something other than regular
addition, it would be more error prone to calculate the number of
iterations theoretically. What we can do instead, is try to emulate the
loop, and determine the number of iterations empirically.

These operations are covered:
 - imul
 - fmul
 - ishl
 - ishr
 - ushr

Also add unit tests for loop unrollment.

Improves performance of Aztec Ruins (sixonix
gfxbench5.aztec_ruins_vk_high) by -1.28042% +/- 0.498555% (N=5) on Intel
Arc A770.

v2 (idr): Rebase on 3 years. :( Use nir_phi_instr_add_src in the test
cases.

v3 (idr): Use try_eval_const_alu in to evaluate loop termination
condition in get_iteration_empirical. Also restructure the loop
slightly. This fixed off by one iteration errors in "inverted" loop
tests (e.g., nir_loop_analyze_test.ushr_ieq_known_count_invert_31).

v4 (idr): Use try_eval_const_alu in to evaluate induction variable
update in get_iteration_empirical. This fixes non-commutative update
operations (e.g., shifts) when the induction varible is not the first
source. This fixes the unit test
nir_loop_analyze_test.ishl_rev_ieq_infinite_loop_unknown_count.

v5 (idr): Fix _type parameter for fadd and fadd_rev loop unroll
tests. Hopefully that fixes the failure on s390x. Temporarily disable
fmul. This works-around the revealed problem in
glsl-fs-loop-unroll-mul-fp64, and there were no shader-db or fossil-db
changes.

v6 (idr): Plumb max_unroll_iterations into get_iteration_empirical. I
was going to do this, but I forgot. Suggested by Tim.

v7 (idr): Disable fadd tests on s390x. They fail because S390 is weird.

Almost all of the shaders affected (OpenGL or Vulkan) are from gfxbench
or geekbench. A couple shaders in Deus Ex (OpenGL), Dirt Rally (OpenGL),
Octopath Traveler (Vulkan), and Rise of the Tomb Raider (Vulkan) are
helped.

The lost / gained shaders in OpenGL are an Aztec Ruins shader that goes
from SIMD16 to SIMD8. The spills / fills affected are in a single Aztec
Ruins (Vulkan) compute shader.

shader-db results:

Skylake, Ice Lake, and Tiger Lake had similar results. (Tiger Lake shown)
total loops in shared programs: 5514 -> 5470 (-0.80%)
loops in affected programs: 62 -> 18 (-70.97%)
helped: 37 / HURT: 0

LOST:   2
GAINED: 2

Haswell and Broadwell had similar results. (Broadwell shown)
total loops in shared programs: 5346 -> 5298 (-0.90%)
loops in affected programs: 66 -> 18 (-72.73%)
helped: 39 / HURT: 0

fossil-db results:

Skylake, Ice Lake, and Tiger Lake had similar results. (Tiger Lake shown)
Instructions in all programs: 157374679 -> 157397421 (+0.0%)
Instructions hurt: 28

SENDs in all programs: 7463800 -> 7467639 (+0.1%)
SENDs hurt: 28

Loops in all programs: 38980 -> 38950 (-0.1%)
Loops helped: 28

Cycles in all programs: 7559486451 -> 7557455384 (-0.0%)
Cycles helped: 28

Spills in all programs: 11405 -> 11403 (-0.0%)
Spills helped: 1

Fills in all programs: 19578 -> 19588 (+0.1%)
Fills hurt: 1

Lost: 1

Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3445>
2023-04-06 23:50:27 +00:00
Yevhenii Kolesnikov
f051967f19 nir/loop_analyze: Track induction variables incremented by more operations
These operations are covered:

 - imul
 - fmul
 - ishl
 - ishr
 - ushr

The only cases that can be currently affected are those where the
calculated loop-trip count would be zero.

v2 (idr): Split out from original commit. Rebase on lots of other work.

v3 (idr): Move operand size assertion. This code only cares that the
operands have the same size for the iadd and fadd cases. In other
cases, such as shifts, the sizes may not match. Fixes assertion
failures in
tests/spec/arb_gpu_shader_int64/glsl-fs-loop-unroll-ishl-int64.shader_test.

No shader-db or fossil-db changes on any Intel platform.

Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3445>
2023-04-06 23:50:27 +00:00
Ian Romanick
bc170e895f nir/loop_analyze: Use try_eval_const_alu and induction variable basis info
This dramatically simplifies will_break_on_first_iteration, and, much
more importantly, makes it significantly more flexible. It is now
possible to handle loops with more complex exit condition and other
kinds of increment operations.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3445>
2023-04-06 23:50:27 +00:00
Ian Romanick
99a7a6648d nir/loop_analyze: Change invert_cond instead of changing the condition
This ensures that scenarios like
nir_loop_analyze_test.iadd_inot_ilt_rev_known_count_5 don't regress in
the next commit. It also means we don't change float comparisons. These
are probably fine... but it still made me a little uneasy.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3445>
2023-04-06 23:50:27 +00:00
Ian Romanick
aeb8af1141 nir/loop_analyze: Track induction variable basis information
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3445>
2023-04-06 23:50:27 +00:00
Ian Romanick
30879a760c nir/loop_analyze: Add a function to evaluate an ALU as constant
...with a substitution. This function is largely a copy-and-paste of
try_fold_alu (nir_opt_constant_folding.c), and an argument could be made
that this function belongs in that file.

v2: Some changes were mistakenly squashed in to "nir/loop_analyze: Use
try_eval_const_alu and induction variable basis info" that should have
been here.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3445>
2023-04-06 23:50:27 +00:00