Commit graph

495 commits

Author SHA1 Message Date
Alyssa Rosenzweig
f5d73a9989 agx: Use unified atomics
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22914>
2023-05-12 20:39:46 +00:00
Asahi Lina
64a595291e asahi: Add some more system registers
Core and opfifo stuff from the compute helper blob, vm_slot because it
was the only one changing when I poked around yesterday and it hit me
what it was ^^

Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22971>
2023-05-11 23:24:48 +00:00
Alyssa Rosenzweig
5a80bf2eb0 agx: Optimize multiplies
We have an imad instruction and our iadd has a small immediate shift on the
second source. Together, these allow expressing lots of integer multiplies more
efficiently. Add some rules to optimize these now that the backend compiler can
ingest the optimized forms.

Half-register changes are from load_const scheduling changing in some vertex
shaders.

   total instructions in shared programs: 1539092 -> 1537949 (-0.07%)
   instructions in affected programs: 167896 -> 166753 (-0.68%)

   total bytes in shared programs: 10543012 -> 10533866 (-0.09%)
   bytes in affected programs: 1218068 -> 1208922 (-0.75%)

   total halfregs in shared programs: 483180 -> 483448 (0.06%)
   halfregs in affected programs: 1942 -> 2210 (13.80%)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22695>
2023-05-11 09:23:23 -04:00
Alyssa Rosenzweig
c2793a304d agx: Fix packing of imsub instructions
The negate for imad is on the third source (a * b - c), not the second source.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22695>
2023-05-11 09:23:23 -04:00
Alyssa Rosenzweig
8289fa253b agx: Handle imadshl_agx, imsubshl_agx
Same hardware instructions as iadd/isub/imad/imsub, just with the extra input
represented in NIR as required.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22695>
2023-05-11 09:23:23 -04:00
Alyssa Rosenzweig
3df4ae3334 agx: Use nir_alu_src_as_uint
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22695>
2023-05-11 09:23:04 -04:00
Alyssa Rosenzweig
e79e743674 agx: Lower I/O to scalar later
This lets us preserve vectorized stores for transform feedback shaders.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
2023-05-07 09:10:36 -04:00
Alyssa Rosenzweig
a561a6c468 agx: Validate that collect sources are the same size
RA asserts this, but by then if you've messed it up, the failure is inscrutable.
Let's check it in the validator for more pleasant debugging.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
2023-05-07 09:10:36 -04:00
Alyssa Rosenzweig
9337f6a865 agx: Rework z/s emit
We were being sloppy with the sizes before. It mostly worked out, but there were
some corner cases where we would end up with mixed sized collects and that won't
end well for us. Let's rework the logic to make all the sizes explicit in NIR --
32-bit for depth and 16-bit stencil -- and then do the needed promotions to make
it happen in the AGX IR side.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
2023-05-07 09:10:36 -04:00
Alyssa Rosenzweig
f4f9269b66 agx: Ensure load_frag_coord has the right sizes
In case .x isn't read, it'll be null which has the wrong size and will fail
the validation added later in this series. We fix this by padding with sized
undefs (something that exists of defined size but undefined value) rather than
nothingness (of undefined size).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
2023-05-07 09:10:36 -04:00
Alyssa Rosenzweig
7f71e1bc2d agx/lower_address: Match multiplies, not only shifts
Sometimes a shader might index with a non-power-of-two stride. For example, if
it's indexing into an array of structures where the structure size is not a
power of two, we'll get a multiply with a constant as opposed to a shift. We
want to handle these cases, too. To do so, we generalize our pattern matching to
look for any kind of multiply (with our new helper), rather than hardcoding
logic for ishl. This eliminates right-shifts in a pile of compute shaders, which
makes me happy from a "I read lots of shader assembly when debugging"
perspective.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
2023-05-07 09:10:36 -04:00
Alyssa Rosenzweig
032d7bd302 agx/lower_address: Add helper to match multiplies
Currently, we hardcode logic in the addressing chasing code to look for ishl
instructions that shift by constants. We can generalize this to looking for
integer multiplies by constants to optimize more addressing patterns. Add a
helper to do so.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
2023-05-07 09:10:36 -04:00
Alyssa Rosenzweig
31c805d0aa agx: Don't wait at the end of the shader
This is totally pointless. This saves some waits at the ends of compute kernels
(waiting for stores to complete before terminating the thread). I don't know
how much this would matter for performance, since the hardware may have to do
these waits internally, but it makes the generated code less silly which is
always nice.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
2023-05-07 09:05:39 -04:00
Alyssa Rosenzweig
87e57eae09 asahi: Rename no colour output to tag write disable
Comparison with PowerVR's XML shows that this is the actual name... And it needs
to be set a bit more carefully than "no colour output" in order to get correct
behaviour for depth-only passes that use sample mask / discard. Fix the name
first, the extra conditions will come when they're needed for multisampling.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
2023-05-07 09:05:39 -04:00
Alyssa Rosenzweig
e13f9caa25 agx: Fix packing for iadd with shift
Wrong bit pattern was packed, oops.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
2023-05-07 09:05:39 -04:00
Alyssa Rosenzweig
bd9c33e16a agx: Defeature fsub
All has_fsub does is fuse fsubs (they're unfused otherwise), no point doing that
if we're going to just going to lower.

shader-db is mostly noise.

total instructions in shared programs: 1487217 -> 1487035 (-0.01%)
instructions in affected programs: 22658 -> 22476 (-0.80%)
helped: 85
HURT: 2
helped stats (abs) min: 1.0 max: 12.0 x̄: 2.19 x̃: 1
helped stats (rel) min: 0.38% max: 2.46% x̄: 0.87% x̃: 0.65%
HURT stats (abs)   min: 1.0 max: 3.0 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.58% max: 1.08% x̄: 0.83% x̃: 0.83%
95% mean confidence interval for instructions value: -2.51 -1.67
95% mean confidence interval for instructions %-change: -0.97% -0.70%
Instructions are helped.

total bytes in shared programs: 10189996 -> 10189288 (<.01%)
bytes in affected programs: 158132 -> 157424 (-0.45%)
helped: 85
HURT: 2
helped stats (abs) min: 4.0 max: 48.0 x̄: 8.75 x̃: 4
helped stats (rel) min: 0.22% max: 1.44% x̄: 0.51% x̃: 0.38%
HURT stats (abs)   min: 6.0 max: 30.0 x̄: 18.00 x̃: 18
HURT stats (rel)   min: 0.90% max: 0.91% x̄: 0.91% x̃: 0.91%
95% mean confidence interval for bytes value: -9.98 -6.30
95% mean confidence interval for bytes %-change: -0.56% -0.39%
Bytes are helped.

total halfregs in shared programs: 462536 -> 462556 (<.01%)
halfregs in affected programs: 131 -> 151 (15.27%)
helped: 1
HURT: 4
helped stats (abs) min: 2.0 max: 2.0 x̄: 2.00 x̃: 2
helped stats (rel) min: 28.57% max: 28.57% x̄: 28.57% x̃: 28.57%
HURT stats (abs)   min: 4.0 max: 8.0 x̄: 5.50 x̃: 5
HURT stats (rel)   min: 12.77% max: 36.36% x̄: 25.01% x̃: 25.45%
95% mean confidence interval for halfregs value: -0.65 8.65
95% mean confidence interval for halfregs %-change: -18.64% 47.23%
Inconclusive result (value mean confidence interval includes 0).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
2023-05-07 09:05:39 -04:00
Alyssa Rosenzweig
1185ac931f agx: Remove bogus assert
I->mask isn't even valid for iter instructions.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
2023-05-07 09:00:59 -04:00
Alyssa Rosenzweig
fc88876329 agx: Handle linear 2D array textureSize()
We handle linear 2D arrays internally for blit shaders, so we need textureSize
to work for these. That requires some special casing, because there's a line
stride where the layer count would otherwise be. But it's not too bad.

Fixes
dEQP-GLES3.functional.shaders.texture_functions.texturesize.sampler2darray_*
when forcing linear textures.

Since we clamp array access to the maximum layer, we need textureSize() to work
for even the most basic array texturing. So this should fix blits from linear 2D
arrays as well, which finally unlocks support for compressed arrays/cubes/3D
textures.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
2023-05-07 09:00:59 -04:00
Alyssa Rosenzweig
21d7049925 agx/lower_zs_emit: Fix progress returning
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
2023-05-07 09:00:56 -04:00
Alyssa Rosenzweig
c8e331bf72 agx: Fix abs/neg propagation into fcmpsel
The first two sources are floats, the latter two sources and destination (and
hence the opcode) are not. Reflect that when packing and optimizing. Noticed
while debugging a silly dEQP test.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
2023-05-07 09:00:56 -04:00
Alyssa Rosenzweig
632014ece0 agx: Handle splits of uniforms
This is straightforward, and can happen with certain u2u16 patterns.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
2023-05-07 09:00:56 -04:00
Alyssa Rosenzweig
e9b471d1b3 asahi: Fix disk cache disable with AGX_MESA_DEBUG
We go to initialize the disk cache before we've compiled any shaders so
agx_compiler_debug is 0 at this point. Don't try to read it, instead go through
sa safe getter that will do the right thing.

Fixes: 5e9538c12e ("agx: isolate compiler debug flags")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
2023-05-07 09:00:40 -04:00
Asahi Lina
00064ba4e3 asahi: Fix style nits
Found with a grep abomination which is probably too broken/silly to
actually implement in CI... but hey, at least it found some.

Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:04 +00:00
Alyssa Rosenzweig
8a6d74d15b agx: Make signal_pix instructions explicit
Rather than implicitly packing them with the sample_mask. Again, this is just
changing where they're emitted, no functional changes yet. Bug for bug
compatibility with the old behaviour.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:04 +00:00
Alyssa Rosenzweig
bb530760a2 agx: Rename writeout to wait_pix
This is the name applegpu is currently using, to capture the semantics of a
pixel fence. I'm not sure what Apple calls this but wait_pix is closer than
writeout for sure.

This commit just does the rename. It doesn't fix the broken semantics we've had,
this is to ease review and bisection.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:04 +00:00
Alyssa Rosenzweig
2028e7b88b agx: Tease apart some sample_mask packing magic
There's a second instruction here, and a second source in the first instruction.
applegpu has known about the encodings for a while but I never updated the
packing code. We will need to stop hardcoding this for multisampling support, as
preparation tease apart the magic pieces.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:04 +00:00
Alyssa Rosenzweig
23880daa8d asahi: Lower 1D to 2D
Khronos APIs require that we support mipmapping even for 1D textures. However,
it isn't clear if this is supported in the hardware, and how it would work even
if it is. But 1D textures are pretty useless, so we just lower 1D textures to 2D
textures instead of worrying about that.

Fixes piles of Piglits relating to 1D textures.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
098295f1a0 asahi: Implement null textures
Use the same silly workaround that Metal does, to fill in texture descriptors
when there's nothing bound in the interest of robust behaviour.

Fixes null pointer dereference in
arb_shading_language_420pack-active-sampler-conflict.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
203c9c12e2 agx: Don't overallocate registers
We need to account for the full vector lengths. Especially important once we
start restricting the reg file.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
42c5d6140b agx: Coalesce more collects
Try harder to coalesce collects, by trying to allocate collects only to regions
of the register file where we actually have a full vector worth of registers
free. If we already know that the vector will be blocked later, it's not a good
base register to pick since we'd be force to shuffle later. So, this tweak to
the collect coalescing heuristic lets us eliminate a pile of pointless copying.

shader-db results are excellent. Note that, although we use more registers,
none of the shaders tested had their thread count affected, likely because the
max HURT isn't too high and most of the scary % here is from using a few more
registers when the register pressure is already low. In the near future, that
property will become guaranteed thanks to live range splitting, too.

total instructions in shared programs: 1507337 -> 1500562 (-0.45%)
instructions in affected programs: 428137 -> 421362 (-1.58%)
helped: 2658
HURT: 167
helped stats (abs) min: 1.0 max: 34.0 x̄: 2.63 x̃: 2
helped stats (rel) min: 0.10% max: 25.00% x̄: 3.04% x̃: 2.14%
HURT stats (abs)   min: 1.0 max: 10.0 x̄: 1.24 x̃: 1
HURT stats (rel)   min: 0.20% max: 23.81% x̄: 3.90% x̃: 3.57%
95% mean confidence interval for instructions value: -2.49 -2.31
95% mean confidence interval for instructions %-change: -2.76% -2.51%
Instructions are helped.

total bytes in shared programs: 10333670 -> 10293172 (-0.39%)
bytes in affected programs: 2996682 -> 2956184 (-1.35%)
helped: 2660
HURT: 175
helped stats (abs) min: 2.0 max: 204.0 x̄: 15.70 x̃: 12
helped stats (rel) min: 0.08% max: 23.08% x̄: 2.64% x̃: 1.83%
HURT stats (abs)   min: 2.0 max: 60.0 x̄: 7.26 x̃: 6
HURT stats (rel)   min: 0.12% max: 22.39% x̄: 3.19% x̃: 2.78%
95% mean confidence interval for bytes value: -14.81 -13.76
95% mean confidence interval for bytes %-change: -2.39% -2.18%
Bytes are helped.

total halfregs in shared programs: 417284 -> 427363 (2.42%)
halfregs in affected programs: 49814 -> 59893 (20.23%)
helped: 95
HURT: 3018
helped stats (abs) min: 1.0 max: 8.0 x̄: 2.29 x̃: 2
helped stats (rel) min: 2.44% max: 28.57% x̄: 9.20% x̃: 6.06%
HURT stats (abs)   min: 1.0 max: 14.0 x̄: 3.41 x̃: 4
HURT stats (rel)   min: 2.08% max: 150.00% x̄: 36.54% x̃: 27.27%
95% mean confidence interval for halfregs value: 3.17 3.31
95% mean confidence interval for halfregs %-change: 34.05% 36.23%
Halfregs are HURT.

total threads in shared programs: 16465280 -> 16465280 (0.00%)
threads in affected programs: 0 -> 0
helped: 0
HURT: 0

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
e713983875 agx: Add helper for calculating occupancy
Add information about the relationship between program register usage and
program occupancy (the maximum number of threads that may execute concurrently
on a single shader core). This table is derived from studying the
maxTotalThreadsPerThreadgroup property in Metal while varying the register
usage, something I blogged about a few years back. It's probably not 100%
accurate and it hasn't been tested against hardware, but it matters "only" for
performance (not correctness) so I'm not super stressed about the details.

In the (near) future, RA will be able to make use of this information to know
exactly when it can use more registers without hurting performance. In the
present, it's just used for better shader-db statistics.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
3a87d2cfbd agx: Don't destroy usub_sat with constant
Fixes KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-pad

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
8ec91ee16f agx: Don't allow uniform source to local_atomic
Fixes KHR-GLES31.core.compute_shader.atomic-case3

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
c643f42dc6 agx: Constify agx_{read,write}_registers
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
da9c8a4627 agx: Assert that we don't overflow registers
This will become particularly important when we bound to smaller register files.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Alyssa Rosenzweig
7c7b95ba2a agx: DCE even with noopt
To simplify live range splitting, RA will soon assume that DCE has run (removing
extraneous vectors). So run DCE even when otherwise disabling backend
optimizations. AGX_MESA_DEBUG=noopt is still useful for disabling instruction
combining, which is the more-likely-to-be-buggy pass anyway.

This also fixes IR not being printed with noopt.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
2023-04-07 03:23:03 +00:00
Emma Anholt
094b5a71d7 agx: Enable nir_lower_frexp.
Needed for Vulkan, and for dropping GLSL frontend lowering for the deqp
coverage override case.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22083>
2023-04-06 02:32:01 +00:00
Emma Anholt
2a33ea95d6 glsl: Retire ldexp lowering in favor of the nir lowering flag.
Compilers need to set the nir flag anyway for vulkan, so just pass ldexp
through to NIR and let that handle it.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22083>
2023-04-06 02:32:00 +00:00
Alyssa Rosenzweig
0f974d1f90 asahi: Convert to SPDX headers
Also drop my email address in the copyright lines and fix some "Copyright 208
Alyssa Rosenzweig" lines, I'm not *that* old. Together this drops a lot of
boilerplate without losing any meaningful licensing information. SPDX is already
in use for the MIT-licensed code in turnip, venus, and a few other scattered
parts of the tree, so this should be ok from a Mesa licensing standpoint.

This reduces friction to create new files, by parsing the copy/paste boilerplate
and being short enough you can easily type it out if you want.  It makes new
files seem less daunting: 20 lines of header for 30 lines of code is
discouraging, but 2 lines of header for 30 lines of code is reasonable for a
simple compiler pass. This has technical effects, as lowering the barrier to
making new files should encourage people to split code into more modular files
with (hopefully positive) effects on project compile time.

This helps with consistency between files. Across the tree we have at least a
half dozen variants of the MIT license text (probably more), plus code that uses
SPDX headers instead. I've already been using SPDX headers in Asahi manually, so
you can tell old vs new code based on the headers.

Finally, it means less for reviewers to scroll through adding files. Minimal
actual cognitive burden for reviewers thanks to banner blindness, but the big
headers still bloat diffs that add/delete files.

I originally proposed this in December (for much more of the tree) but someone
requested I wait until January to discuss. I've been trying to get in touch with
them since then. It is now almost April and, with still no response, I'd like to
press forward with this. So with a joint sign-off from the major authors of the
code in question, let's do this.

Signed-off-by: Asahi Lina <lina@asahilina.net>
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Rose Hudson <rose@krx.sh>
Acked-by: Lyude Paul [over IRC: "yes I'm fine with that"]
Meh'd-by: Rob Clark <robdclark@chromium.org>

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22062>
2023-03-28 05:14:00 +00:00
Eric Engestrom
8e6ac35658 asahi: fix a few typos
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21947>
2023-03-17 22:11:33 +00:00
Alyssa Rosenzweig
45554a957a agx: Lower discard late
Fixes regression with Dolphin's ubershaders.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21855>
2023-03-11 23:34:56 +00:00
Alyssa Rosenzweig
933b5c76f6 agx: Switch to scoped_barrier
Rather than ingesting separate control and memory barriers, ingest only the
combined and optimized scoped_barrier intrinsic. For barriers originating from
GLSL, this makes it easier to ensure correctness. For barriers originating from
SPIR-V, this is required for translation at all, as spirv_to_nir knows only
scoped barriers. So this gets us closer to Vulkan and OpenCL.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21752>
2023-03-11 16:20:06 +00:00
Alyssa Rosenzweig
b768a254f7 agx: Use nir_lower_mem_access_bit_sizes
Lowers away 64-bit loads, which we'll create in the sysval lowering for
dynamically indexed UBOs/VBOs. The lowering generates pack_64_2x32 instructions,
so lower those too.

No shader-db changes.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21674>
2023-03-11 14:15:50 +00:00
Alyssa Rosenzweig
8a53050d7d agx: Implement extract_[ui]16
Instead of lowering to bitwise ops. Yet another way of subdividing in NIR.
Probably insignificant but makes it easy to check that the pass ordering from the
previous pass is right. It does let us get much better codegen for
unpacksnorm2x16, whatever that's worth.

No shader-db changes.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21674>
2023-03-11 14:15:50 +00:00
Alyssa Rosenzweig
706815488e agx: Fix subdivision coalescing
As intended. We can't CSE with partial null destinations in the way, so we
shouldn't eliminate dead destinations until after CSE has run. But we should
still eliminate dead instructions to ensure CSE doesn't move things around
needlessly, hurting register pressure.

Noticed while debugging live range splitting.

No GLES3.0 shader-db changes.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21674>
2023-03-11 14:15:50 +00:00
Alyssa Rosenzweig
5ea9c2e634 agx: Make partial DCE optional
Our dead code elimination pass does two things:

1. delete instructions that are entirely unnecessary
2. delete unnecessary destinations of necessary instructions

To deal with pass ordering issues, we sometimes want to do #1 without #2.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21674>
2023-03-11 14:15:50 +00:00
Alyssa Rosenzweig
16f8bfb042 agx: Don't set lower_pack_split
We should handle nir_op_unpack_32_2x16_split_* natively, since we can generate
better code with agx_subdivide (coalescing the ops away) than the bitshift
lowering.

That said, we do need some extra instructions for the floating point
conversions.

No shader-db changes (which makes sense because we're targetting the GLES3.0
shader-db, which doesn't have the packing GLSL functions).

The real motivation of this change isn't optimizing some GLSL pack functions,
though, it's avoiding a code regression from using NIR's memory bit size
lowering in a future MR. That lowering will turn things like "load i16vec4" into
"load i32vec2 + unpack_32_2x16", so we need to be able to coalesce that unpack.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21674>
2023-03-11 14:15:50 +00:00
Alyssa Rosenzweig
6b22a02f90 asahi,agx: Implement buffer textures with gnarly NIR
Implement buffer textures in full generality.  There are a few issues here:

* OpenGL requires buffer textures support a minimum size of 65536 elements,
  however 1D textures in AGX are (at most) 8192 elements.

* OpenGL 4.0 (and OpenGL ES) require buffer textures to support the "RGB32"
  texture formats. These are 3 packed channels of 32-bits each. In general,
  non-power-of-two texel sizes are problematic. AGX does not support any such
  formats and we rely on the GL frontend to lower to a padded format (RGBX) if
  necessary. Such a lowering cannot work for buffer textures, however, so we
  need to find a way to implement RGB32 buffer textures.

We solve these issues in the follow way:

* Use 2D texture descriptors for buffer textures, with a large fixed
  power-of-two size along one axis. Then large texel indices may be accessed at
  a small vec2 texel coordinate, and since the fixed dimension is a
  power-of-two, that vector may be recovered by simply shifting and masking.
  This effectively avoids size restriction. We do need to clamp texel indices to
  the buffer size to avoid faulting on OOB reads, since we may read past the end
  of the buffer (if the app binds a non-page-aligned offset into the buffer).

* Use a general purpose memory load for RGB32 buffer textures. Lower the texture
  load instruction to a memory load from the buffer and some address arithmetic.
  There's no format conversion needed for RGB32, other than maybe filling in a
  format-appropriate alpha, so this is straightforward. Again, we need to clamp
  the texel index for robustness with OOB reads.

Each of these solutions brings its own problem.

* Using 2D textures instead of 1D requires physically rounding up the buffer
  size when packing the descriptor, so we can no longer implement textureSize()
  by reading off the texture descriptor like normal.

* We don't know at compile-time whether a given texture load will read from an
  RGB32 buffer texture or not, so we need to emit code for both. In Vulkan, we
  can't key the shader to this property, either, since it's descriptor set state
  and not pipeline state.

And each of these problems in turn brings its own solution:

* The texture descriptor is linear, so the "compression buffer address" field is
  ignored by the hardware. We stash the real buffer size there so that
  textureSize becomes a load from the texture descriptor like usual, without
  requiring a sideband (which would complicate bindless textures).

* If we determine a texture descriptor contains RGB32 data, then it will never
  be interpreted by the hardware and hence does not need to be a valid texture
  descriptor. So, we extend the hardware's format enum to contain a
  software-defined RGB32 format enum. Then, when lowering texture buffer loads,
  we either read it as a typed RGB32 memory load or as a texture load depending
  on the value of the format field in the texture descriptor.

All of this is accomplished with a big NIR pass generating a pile of strange
looking code. But it should be good enough in practice for this silly feature.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21672>
2023-03-11 02:26:31 +00:00
Alyssa Rosenzweig
826649ba19 asahi, agx: Implement dummy samplers
In NIR, texelFetch (txf) does not use a sampler, but in AGX, it does -- even
though the contents of the sampler are semantically irrelevant. Rather than
requiring the state tracker to bind a sampler anyway (indicated for texture
buffers with PIPE_CAP_TEXTURE_BUFFER_SAMPLER), just add a dummy sampler
ourselves if txf is used and there are otherwise no samplers. This is helpful
because PIPE_CAP_TEXTURE_BUFFER_SAMPLER isn't honoured by Rusticl or seemingly
mesa/st's PBO code, and after implementing this dummy sampler workaround in
Panfrost for Rusticl, I realized this CAP is silly and shouldn't exist in the
first place. (And I regret pushing for its reinclusion.)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21672>
2023-03-11 02:26:31 +00:00
Alyssa Rosenzweig
ee6785309e agx: Handle indirect texture/samplers
Get the texture/sampler index from the texture/sampler_offset source (which
is an offset from 0 thanks to the lower_index_to_offset lowering) and feed it in
as corresponding 16-bit texture instruction sources.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21704>
2023-03-10 14:14:42 +00:00