Commit graph

2000 commits

Author SHA1 Message Date
Alyssa Rosenzweig
2c7635ab63 agx: add tests for sign/zero-extend propagate
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32081>
2024-11-11 14:33:02 +00:00
Alyssa Rosenzweig
6d56c8bc02 agx: fold zext into int sources
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32081>
2024-11-11 14:33:01 +00:00
Alyssa Rosenzweig
200d0794e2 agx: optimize signext+iadd
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32081>
2024-11-11 14:33:01 +00:00
Alyssa Rosenzweig
cfe0a9acec agx: add pseudo for signext
easier to optimize

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32081>
2024-11-11 14:33:01 +00:00
Alyssa Rosenzweig
8de339c0d8 agx: change int conversion test
it's not useful as is but we can salvage

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32081>
2024-11-11 14:33:01 +00:00
Alyssa Rosenzweig
0a81434adf agx: rewrite address mode lowering
AGX load/stores supports a single family of addressing modes:

   64-bit base + sign/zero-extend(32-bit) << (format shift + optional shift)

This is a base-index addressing mode, where the index is minimally in elements
(never bytes, unless we're doing 8-bit load/stores). Both base and the resulting
address must be aligned to the format size; the mandatory shift means that
alignment of base is equivalent to alignment of the final address, which is
taken care of by lower_mem_access_bit_size anyhow.

The other key thing to note is that this is a 64-bit shift, after the sign- or
zero-extension of the 32-bit index. That means that AGX does NOT implement

   64-bit base + sign/zero-extend(32-bit << shift)

This has sweeping implications.

For addressing math from C-based languages (including OpenCL C), the AGX mode is
more helpful, since we tend to get 64-bit shifts instead of 32-bit shifts.
However, for addressing math coming from GLSL, the AGX mode is rather annoying
since we know UBOs/SSBOs are at most 4GB so nir_lower_io & friends are all
32-bit byte indexing. It's tricky to teach them to do otherwise, and would not
be optimal either since 64-bit adds&shifts are *usually* much more expensive
than 32-bit on AGX *except* for when fused into the load/store.

So we don't want 32-bit NIR, since then we can't use the hardware addressing
mode at all. We also don't want 64-bit NIR, since then we have excessive 64-bit
math resulting from deep deref chains from complex struct/array cases. Instead,
we want a middle ground: 32-bit operations that are guaranteed not to overflow
32-bit and can therefore be losslessly promoted to 64-bit.

We can make that no-overflow guarantee as a consequence of the maximum UBO/SSBO
size, and indeed Mesa relies on this already all over the place. So, in this
series, we use relaxed amul opcodes for addressing
math. Then, we rewrite our address mode pattern matching to fuse AGX address
modes.

The actual pattern matching is rewritten. The old code was brittle handwritten
nir_scalar chasing, based on a faulty model of the hardware (with the 32-bit
shift). We delete it all, it's broken. In the new approach, we add some NIR
pseudo-opcodes for address math (ulea_agx/ilea_agx) which we pattern match with
NIR algebraic rules. Then the chasing required to fuse LEA's into load/stores is
trivial because we never go deeper than 1 level. After fusing, we then lower the
leftover lea/amul opcodes and let regular nir_opt_algebraic take it from
here.

We do need to be very careful around pass order to make sure things like
load/store vectorization still happen. Some passes are shuffled in this commit
to make this work. We also need to cleanup amul before fusing since we
specifically do not have nir_opt_algebraic do so - the entire point of the
pseudo-opcodes is to make nir_opt_algebraic ignore the opcodes until we've had a
chance to fuse. If we simply used the .nuw bit on iadd/imul, nir_opt_algebraic
would "optimize" things and lose the bit and then we would fail to fuse
addressing modes, which is a much more expensive failure case than anything
nir_opt_algebraic can do for us. I don't know what the "optimal" pass order for
AGX would look like at this point, but what we have here is good enough for now
and is a net positive for shader-db.

That all ends up being much less code and much simpler code, while fixing the
soundness holes in the old code, and also optimizing a significantly richer set
of addressing calculations. Now we don't juts optimize GL/VK modes, but also CL.
This is crucial even for GL/VK performance, since we rely on CL via libagx even
in graphics shaders.

Terraintessellation is up 10% to ~310fps, which is quite nice.

The following stats are for the end of the series together, including this
change + libagx change + the NIR changes building up to this... but not
including the SSBO vectorizer stats or the IC modelling fix. In other words,
these are the stats for "rewriting address mode handling". This is on OpenGL,
and since the old code was targeted at GL, anything that's not a loss is good
enough - we need this for the soundness fix regardless.

total instructions in shared programs: 2751356 -> 2750518 (-0.03%)
instructions in affected programs: 372143 -> 371305 (-0.23%)
helped: 715
HURT: 75
Instructions are helped.

total alu in shared programs: 2279559 -> 2278721 (-0.04%)
alu in affected programs: 304170 -> 303332 (-0.28%)
helped: 715
HURT: 75
Alu are helped.

total fscib in shared programs: 2277843 -> 2277008 (-0.04%)
fscib in affected programs: 304167 -> 303332 (-0.27%)
helped: 715
HURT: 75
Fscib are helped.

total ic in shared programs: 632686 -> 621886 (-1.71%)
ic in affected programs: 113078 -> 102278 (-9.55%)
helped: 1159
HURT: 82
Ic are helped.

total bytes in shared programs: 21489034 -> 21477530 (-0.05%)
bytes in affected programs: 3018456 -> 3006952 (-0.38%)
helped: 751
HURT: 107
Bytes are helped.

total regs in shared programs: 865148 -> 865114 (<.01%)
regs in affected programs: 1603 -> 1569 (-2.12%)
helped: 10
HURT: 9
Inconclusive result (value mean confidence interval includes 0).

total uniforms in shared programs: 2120735 -> 2120792 (<.01%)
uniforms in affected programs: 22752 -> 22809 (0.25%)
helped: 76
HURT: 49
Inconclusive result (value mean confidence interval includes 0).

total threads in shared programs: 27613312 -> 27613504 (<.01%)
threads in affected programs: 1536 -> 1728 (12.50%)
helped: 3
HURT: 0

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31964>
2024-11-08 21:15:42 -04:00
Alyssa Rosenzweig
d466ccc6bd libagx: promote math to use AGX address mode
we want to fit into the 64 + ext() << #n pattern to let us fuse address
arithmetic into our loads, so rework some libagx addressing to better match that

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31964>
2024-11-08 21:15:42 -04:00
Alyssa Rosenzweig
77ce91e99b hk: reduce max SSBO size
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31964>
2024-11-08 21:15:42 -04:00
Alyssa Rosenzweig
01d2aa1d53 agx: fix bfeil timing
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31964>
2024-11-08 21:15:42 -04:00
Alyssa Rosenzweig
db8d467ec6 agx: model IC dispatch
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31964>
2024-11-08 21:15:42 -04:00
Alyssa Rosenzweig
3c222da6c0 agx: vectorize SSBOs
this was missed due to the lowering, and mitigates a lot of stats weirdness with
the address mode rework.

total instructions in shared programs: 2755170 -> 2751399 (-0.14%)
instructions in affected programs: 16323 -> 12552 (-23.10%)
helped: 71
HURT: 0
helped stats (abs) min: 10 max: 178 x̄: 53.11 x̃: 42
helped stats (rel) min: 2.04% max: 50.00% x̄: 34.73% x̃: 40.79%
95% mean confidence interval for instructions value: -60.94 -45.28
95% mean confidence interval for instructions %-change: -37.81% -31.65%
Instructions are helped.

total alu in shared programs: 2169888 -> 2168281 (-0.07%)
alu in affected programs: 9547 -> 7940 (-16.83%)
helped: 71
HURT: 0
helped stats (abs) min: 5 max: 90 x̄: 22.63 x̃: 16
helped stats (rel) min: 1.02% max: 43.33% x̄: 25.39% x̃: 29.41%
95% mean confidence interval for alu value: -26.33 -18.93
95% mean confidence interval for alu %-change: -27.91% -22.87%
Alu are helped.

total fscib in shared programs: 2165597 -> 2163990 (-0.07%)
fscib in affected programs: 9547 -> 7940 (-16.83%)
helped: 71
HURT: 0
helped stats (abs) min: 5 max: 90 x̄: 22.63 x̃: 16
helped stats (rel) min: 1.02% max: 43.33% x̄: 25.39% x̃: 29.41%
95% mean confidence interval for fscib value: -26.33 -18.93
95% mean confidence interval for fscib %-change: -27.91% -22.87%
Fscib are helped.

total bytes in shared programs: 21517750 -> 21489352 (-0.13%)
bytes in affected programs: 126270 -> 97872 (-22.49%)
helped: 71
HURT: 0
helped stats (abs) min: 80 max: 1084 x̄: 399.97 x̃: 324
helped stats (rel) min: 1.77% max: 50.57% x̄: 35.07% x̃: 42.31%
95% mean confidence interval for bytes value: -455.66 -344.28
95% mean confidence interval for bytes %-change: -38.34% -31.79%
Bytes are helped.

total regs in shared programs: 864490 -> 865162 (0.08%)
regs in affected programs: 4567 -> 5239 (14.71%)
helped: 4
HURT: 61
helped stats (abs) min: 6 max: 6 x̄: 6.00 x̃: 6
helped stats (rel) min: 4.51% max: 5.13% x̄: 4.82% x̃: 4.82%
HURT stats (abs)   min: 2 max: 24 x̄: 11.41 x̃: 12
HURT stats (rel)   min: 1.98% max: 82.35% x̄: 21.05% x̃: 16.00%
95% mean confidence interval for regs value: 8.52 12.16
95% mean confidence interval for regs %-change: 14.91% 24.00%
Regs are HURT.

total threads in shared programs: 27613056 -> 27613312 (<.01%)
threads in affected programs: 3200 -> 3456 (8.00%)
helped: 4
HURT: 0
helped stats (abs) min: 64 max: 64 x̄: 64.00 x̃: 64
helped stats (rel) min: 7.69% max: 8.33% x̄: 8.01% x̃: 8.01%
95% mean confidence interval for threads value: 64.00 64.00
95% mean confidence interval for threads %-change: 7.42% 8.60%
Threads are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31964>
2024-11-08 21:15:42 -04:00
Alyssa Rosenzweig
4a931ec9eb asahi/clc: ingest spir-v
use mesa_clc for the spir-v part, this improves incremental build granularity.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31923>
2024-11-01 13:25:37 -07:00
Alyssa Rosenzweig
0f278bf3c5 hk: enable constant promotion
reduce the perf gap with GL :)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
c8870da833 agx: fold more inots
noticed in the tessellator.

total instructions in shared programs: 2757905 -> 2757078 (-0.03%)
instructions in affected programs: 105372 -> 104545 (-0.78%)
helped: 115
HURT: 0
helped stats (abs) min: 1 max: 29 x̄: 7.19 x̃: 6
helped stats (rel) min: 0.02% max: 6.67% x̄: 2.01% x̃: 2.44%
95% mean confidence interval for instructions value: -8.67 -5.71
95% mean confidence interval for instructions %-change: -2.31% -1.71%
Instructions are helped.

total alu in shared programs: 2172400 -> 2171573 (-0.04%)
alu in affected programs: 82535 -> 81708 (-1.00%)
helped: 115
HURT: 0
helped stats (abs) min: 1 max: 29 x̄: 7.19 x̃: 6
helped stats (rel) min: 0.03% max: 9.58% x̄: 2.90% x̃: 3.30%
95% mean confidence interval for alu value: -8.67 -5.71
95% mean confidence interval for alu %-change: -3.33% -2.47%
Alu are helped.

total fscib in shared programs: 2168107 -> 2167280 (-0.04%)
fscib in affected programs: 82535 -> 81708 (-1.00%)
helped: 115
HURT: 0
helped stats (abs) min: 1 max: 29 x̄: 7.19 x̃: 6
helped stats (rel) min: 0.03% max: 9.58% x̄: 2.90% x̃: 3.30%
95% mean confidence interval for fscib value: -8.67 -5.71
95% mean confidence interval for fscib %-change: -3.33% -2.47%
Fscib are helped.

total bytes in shared programs: 21534940 -> 21528976 (-0.03%)
bytes in affected programs: 774528 -> 768564 (-0.77%)
helped: 115
HURT: 1
helped stats (abs) min: 2 max: 192 x̄: 51.88 x̃: 42
helped stats (rel) min: 0.01% max: 6.06% x̄: 1.85% x̃: 2.11%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.10% max: 0.10% x̄: 0.10% x̃: 0.10%
95% mean confidence interval for bytes value: -62.70 -40.13
95% mean confidence interval for bytes %-change: -2.14% -1.52%
Bytes are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
d51ae1b634 agx: don't upload constant padding at the start
noticed in vkcube.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
d6d66bf72d asahi,agx: rework constant promotion upload
stuff promoted constants into the binary, this simplifies state management.
saves a big pile of alloc&copy in the gl driver. will unblock this for VK.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
a3696f29c1 agx: run algebraic later
to deal with ldst vectorize leftover

ironically worse due to nir_opt_preamble lottery, but confirmed it fixes ldst
vectorize silliness in preambles, making preambles a *lot* shorter.

total instructions in shared programs: 2759806 -> 2759882 (<.01%)
instructions in affected programs: 26821 -> 26897 (0.28%)
helped: 0
HURT: 10
HURT stats (abs)   min: 1 max: 15 x̄: 7.60 x̃: 6
HURT stats (rel)   min: 0.07% max: 1.33% x̄: 0.47% x̃: 0.19%
95% mean confidence interval for instructions value: 3.65 11.55
95% mean confidence interval for instructions %-change: 0.09% 0.85%
Instructions are HURT.

total alu in shared programs: 2174292 -> 2174340 (<.01%)
alu in affected programs: 25727 -> 25775 (0.19%)
helped: 1
HURT: 10
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.05% max: 0.05% x̄: 0.05% x̃: 0.05%
HURT stats (abs)   min: 1 max: 11 x̄: 5.00 x̃: 4
HURT stats (rel)   min: 0.09% max: 0.52% x̄: 0.27% x̃: 0.23%
95% mean confidence interval for alu value: 1.92 6.81
95% mean confidence interval for alu %-change: 0.12% 0.37%
Alu are HURT.

total fscib in shared programs: 2170011 -> 2170059 (<.01%)
fscib in affected programs: 25727 -> 25775 (0.19%)
helped: 1
HURT: 10
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.05% max: 0.05% x̄: 0.05% x̃: 0.05%
HURT stats (abs)   min: 1 max: 11 x̄: 5.00 x̃: 4
HURT stats (rel)   min: 0.09% max: 0.52% x̄: 0.27% x̃: 0.23%
95% mean confidence interval for fscib value: 1.92 6.81
95% mean confidence interval for fscib %-change: 0.12% 0.37%
Fscib are HURT.

total bytes in shared programs: 18414728 -> 18415244 (<.01%)
bytes in affected programs: 234114 -> 234630 (0.22%)
helped: 1
HURT: 11
helped stats (abs) min: 8 max: 8 x̄: 8.00 x̃: 8
helped stats (rel) min: 0.02% max: 0.02% x̄: 0.02% x̃: 0.02%
HURT stats (abs)   min: 4 max: 90 x̄: 47.64 x̃: 34
HURT stats (rel)   min: 0.03% max: 1.18% x̄: 0.39% x̃: 0.18%
95% mean confidence interval for bytes value: 20.47 65.53
95% mean confidence interval for bytes %-change: 0.08% 0.63%
Bytes are HURT.

total regs in shared programs: 864549 -> 864533 (<.01%)
regs in affected programs: 117 -> 101 (-13.68%)
helped: 3
HURT: 0
helped stats (abs) min: 4 max: 6 x̄: 5.33 x̃: 6
helped stats (rel) min: 10.26% max: 15.38% x̄: 13.68% x̃: 15.38%

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
25c302d337 agx: test immediate packing opt
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
6d4dc9d9bf agx: negate iadd/imsub constants
total instructions in shared programs: 892853 -> 892841 (<.01%)
instructions in affected programs: 44400 -> 44388 (-0.03%)
helped: 12
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.02% max: 0.03% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.03% -0.03%
Instructions are helped.

total alu in shared programs: 676057 -> 676045 (<.01%)
alu in affected programs: 28599 -> 28587 (-0.04%)
helped: 12
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.04% max: 0.05% x̄: 0.04% x̃: 0.04%
95% mean confidence interval for alu value: -1.00 -1.00
95% mean confidence interval for alu %-change: -0.05% -0.04%
Alu are helped.

total fscib in shared programs: 675565 -> 675553 (<.01%)
fscib in affected programs: 28599 -> 28587 (-0.04%)
helped: 12
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.04% max: 0.05% x̄: 0.04% x̃: 0.04%
95% mean confidence interval for fscib value: -1.00 -1.00
95% mean confidence interval for fscib %-change: -0.05% -0.04%
Fscib are helped.

total bytes in shared programs: 6047050 -> 6046978 (<.01%)
bytes in affected programs: 303744 -> 303672 (-0.02%)
helped: 12
HURT: 0
helped stats (abs) min: 6 max: 6 x̄: 6.00 x̃: 6
helped stats (rel) min: 0.02% max: 0.03% x̄: 0.02% x̃: 0.02%
95% mean confidence interval for bytes value: -6.00 -6.00
95% mean confidence interval for bytes %-change: -0.03% -0.02%
Bytes are helped.

total uniforms in shared programs: 552413 -> 552315 (-0.02%)
uniforms in affected programs: 13800 -> 13702 (-0.71%)
helped: 48
HURT: 0
helped stats (abs) min: 2 max: 4 x̄: 2.04 x̃: 2
helped stats (rel) min: 0.39% max: 5.26% x̄: 0.96% x̃: 1.04%
95% mean confidence interval for uniforms value: -2.13 -1.96
95% mean confidence interval for uniforms %-change: -1.18% -0.75%
Uniforms are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
f6d8bb9a66 agx: optimize wait_pix a bit
this is a start at least.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
85b3dc90e0 nir,agx: lower fmin/fmax in NIR
we want to elide flushes, doing so requires more sophisticated analysis than I'd
like in the middle of isel. also, it should be done before forming preambles for
efficiency (notice the uniform reduction here). let's do it with a NIR pass.

total instructions in shared programs: 2768481 -> 2757832 (-0.38%)
instructions in affected programs: 644084 -> 633435 (-1.65%)
helped: 2242
HURT: 18
helped stats (abs) min: 1 max: 349 x̄: 4.77 x̃: 3
helped stats (rel) min: 0.01% max: 34.91% x̄: 3.19% x̃: 2.19%
HURT stats (abs)   min: 1 max: 19 x̄: 2.89 x̃: 1
HURT stats (rel)   min: 0.24% max: 7.94% x̄: 1.27% x̃: 0.81%
95% mean confidence interval for instructions value: -5.20 -4.22
95% mean confidence interval for instructions %-change: -3.30% -3.01%
Instructions are helped.

total alu in shared programs: 2182880 -> 2172352 (-0.48%)
alu in affected programs: 513166 -> 502638 (-2.05%)
helped: 2235
HURT: 16
helped stats (abs) min: 1 max: 349 x̄: 4.73 x̃: 3
helped stats (rel) min: 0.02% max: 37.65% x̄: 3.70% x̃: 2.59%
HURT stats (abs)   min: 1 max: 19 x̄: 2.50 x̃: 1
HURT stats (rel)   min: 0.33% max: 3.74% x̄: 1.04% x̃: 0.91%
95% mean confidence interval for alu value: -5.16 -4.20
95% mean confidence interval for alu %-change: -3.83% -3.49%
Alu are helped.

total fscib in shared programs: 2178643 -> 2168059 (-0.49%)
fscib in affected programs: 514666 -> 504082 (-2.06%)
helped: 2243
HURT: 17
helped stats (abs) min: 1 max: 349 x̄: 4.74 x̃: 3
helped stats (rel) min: 0.02% max: 37.65% x̄: 3.74% x̃: 2.59%
HURT stats (abs)   min: 1 max: 19 x̄: 2.65 x̃: 1
HURT stats (rel)   min: 0.33% max: 14.71% x̄: 1.85% x̃: 0.93%
95% mean confidence interval for fscib value: -5.16 -4.20
95% mean confidence interval for fscib %-change: -3.87% -3.53%
Fscib are helped.

total bytes in shared programs: 18467348 -> 18403042 (-0.35%)
bytes in affected programs: 4403648 -> 4339342 (-1.46%)
helped: 2247
HURT: 20
helped stats (abs) min: 2 max: 2132 x̄: 28.73 x̃: 18
helped stats (rel) min: 0.01% max: 33.53% x̄: 2.80% x̃: 1.94%
HURT stats (abs)   min: 4 max: 72 x̄: 12.60 x̃: 6
HURT stats (rel)   min: 0.23% max: 6.58% x̄: 1.06% x̃: 0.75%
95% mean confidence interval for bytes value: -31.29 -25.45
95% mean confidence interval for bytes %-change: -2.90% -2.64%
Bytes are helped.

total regs in shared programs: 864605 -> 864442 (-0.02%)
regs in affected programs: 4692 -> 4529 (-3.47%)
helped: 68
HURT: 48
helped stats (abs) min: 1 max: 54 x̄: 7.25 x̃: 3
helped stats (rel) min: 4.26% max: 43.20% x̄: 13.21% x̃: 10.53%
HURT stats (abs)   min: 1 max: 36 x̄: 6.88 x̃: 6
HURT stats (rel)   min: 3.64% max: 91.67% x̄: 23.12% x̃: 24.00%
95% mean confidence interval for regs value: -3.60 0.79
95% mean confidence interval for regs %-change: -2.10% 5.75%
Inconclusive result (value mean confidence interval includes 0).

total uniforms in shared programs: 2120927 -> 2120911 (<.01%)
uniforms in affected programs: 770 -> 754 (-2.08%)
helped: 6
HURT: 0
helped stats (abs) min: 2 max: 4 x̄: 2.67 x̃: 2
helped stats (rel) min: 1.79% max: 2.70% x̄: 2.13% x̃: 1.96%
95% mean confidence interval for uniforms value: -3.75 -1.58
95% mean confidence interval for uniforms %-change: -2.50% -1.76%
Uniforms are helped.

total threads in shared programs: 27612224 -> 27613056 (<.01%)
threads in affected programs: 7168 -> 8000 (11.61%)
helped: 6
HURT: 3
helped stats (abs) min: 64 max: 192 x̄: 170.67 x̃: 192
helped stats (rel) min: 8.33% max: 23.08% x̄: 20.62% x̃: 23.08%
HURT stats (abs)   min: 64 max: 64 x̄: 64.00 x̃: 64
HURT stats (rel)   min: 8.33% max: 9.09% x̄: 8.59% x̃: 8.33%
95% mean confidence interval for threads value: -3.17 188.06
95% mean confidence interval for threads %-change: -0.92% 22.69%
Inconclusive result (value mean confidence interval includes 0).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
b3ef0f5aa8 asahi: don't leak drm version
valgrind.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
b40fd95eee asahi/clc: strip nir
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
e86a35dad2 libagx: always tessellate clockwise
easy enough to flip later in the pipeline instead and reduce significantly the
tessellator variants.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
87e6324459 libagx: make points mode dynamic
it's a cold enough path.

16 to 10 tessellator variants.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
a7843643c6 libagx: drop generated VDM tess path (for now?)
this was definitely the coolest thing I did in my career. but it has a lot of
drawbacks:

* complexity across the whole stack.
* perf #s even for synthetic tess workloads are... lackluster. couple % on
  terraintess. and games tend not to be tess heavy (and if they are we're
  screwed and this won't save us.) ironically, sascha willem's non-terrain tess
  is sped up a ton by the prefix sum path, so.. lol?
* more brittle for (M3?) porting.
* makes it harder to make the tessellator common code.
* doesn't play nice with the indirect path.
* pile of extra tessellator variants.
* harder to test, coverage is already not great here.

so... drop it, for now. the code isn't gone, and the idea may come back in a
future iteration, perhaps based on mesh shaders. but in its current form I don't think it's worth keeping right now.

my main resevation is actually about heap usage from doubling the index buffer
size. hopefully this is tolerable in practice.

this gets us from 24 to 16 tessellator variants.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
02e29bdea4 libagx: don't rely on loop unroll in txs
silly, should reduce memory footprint.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
4607e0bf31 libagx: fix missing statics
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
ddf2f2f5b1 hk: make tess partitioning dynamic
no discernible difference in perf in terraintessellation. reduces tessellator
variants from 72 to 24.

before:
SHADER-DB:  - MESA_SHADER_COMPUTE shader: 2966 inst, 2310 alu, 2310 fscib, 1216 ic, 23148 bytes, 239 regs, 180 uniforms, 0 scratch, 384 threads, 17 loops, 0:0 spills:fills

after:
SHADER-DB:  - MESA_SHADER_COMPUTE shader: 3011 inst, 2343 alu, 2343 fscib, 1264 ic, 23508 bytes, 235 regs, 188 uniforms, 0 scratch, 384 threads, 17 loops, 0:0 spills:fills

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
d3d22039e1 hk: allow tess modes in either stage
this makes shader objects more flexible.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
9bbe93d158 hk: fix alpha-to-coverage with sample shading
fixes sascha willem's deferredmultisampling.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
d0b3b4c309 agx: move binary_size into info
this simplifies serialization, and will simplify future work.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
6c5be08269 agx: pack agx_cf_binding
dramatically reduces shader info size.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
5d628e3892 agx: fix uniform packing with local_load
fixes with constant promotion:

dEQP-VK.compute.shader_object_spirv.workgroup_memory_explicit_layout.alias.i32_to_u16_array_scalar_func_read_write_barrier

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Asahi Lina
02169e76dd agx: Fix queue destroy op for virtgpu
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
12ab3abaac hk: bump max push size
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
6dad37c812 hk: use push size macro
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
fe84aaa8a7 hk: switch to 64-bit queries
for timestamps

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
b1b125dbe6 hk: drop store_op_dontcare w/a
updated CTS fixes this.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
3ee036b2b5 hk: wire up indirect tess
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
a4e3ca1fc5 hk: add mechanism to test indirects
while we wait for cts.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
b8ae31948d hk: plumb indirect_local
for tess.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
e68aed3a13 libagx: extend indirect tess to handle indexed
this was missed, vk cts hits it

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Mary Guillemard
053d4c5666 hk: Fill deviceUUID
Before this the device UUID was never filled.

Fixes: 5bc8284816 ("hk: add Vulkan driver for Apple GPUs")
Signed-off-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Sergio Lopez
167744dce9 hk: allow overriding sysmem with an env var
When running in a VM, the amount of RAM seen through /proc/meminfo
doesn't necessarily reflect the real amount of memory available in the
system to be used as VRAM. For instance, on a system with 8 GB of
physical RAM, the user might want to run a VM with 2 GB of RAM and still
be able to use 4 GB as VRAM.

To achieve this, allow the reported sysmem to be overridden by the
value (in bytes) of the HK_SYSMEM environment variable.

Signed-off-by: Sergio Lopez <slp@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Sergio Lopez
2c18b0c6aa hk: limit the number of free BOs in each cmd pool
Some clients like to have long living command pools, each one with
multiple command buffers, and trigger bursts of BO allocations on them.
This leads to the an ever increasing number of BOs allocated on each
command pool, easily reaching over a couple thousand BOs per pool.

The visible effect is an ever increasing amount of VRAM consumption,
which has been observed to easily get over 700MB for a workload
consuming around 4GB of heap.

Here we limit the number of free BOs in each pool to 32. The rest are
dereferenced when the command buffer is reset. In practice, with
workloads triggering bursts of BO allocations on command buffers, those
BOs aren't really released, keeping them in the BO cache and getting
reused shortly after. This has the nice size effect of doing bookkeeping
on the BO cache, trimming it up when needed.

With this change, the tested workloads present a significant reduction
of VRAM, combined with an stabilization of BO allocations.

Signed-off-by: Sergio Lopez <slp@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
2c513e8689 agx: consistent ffma name
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
7e9d468891 agx: encoding_32 -> encoding
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
22a8a0fe6d agx: drop encoding_16
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00
Alyssa Rosenzweig
ce02690902 agx: special case mov_imm
so we can rm a worse special case

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31908>
2024-10-30 10:14:07 -04:00